Openness in data formats

Me and Tantek

Tantek wrote this thought-provoking entry about data formats and openness, which I can't help but partly agree with and partly disagree with. So first, his entry.

  1. ASCII is dependable. Project Gutenberg insists on publishing their e-books as plain ASCII text as Mark Pilgrim noted, and their reasons are solid.
  2. Compatible XHTML is now also dependable. In the 15+ years since its public introduction, I believe that HTML has established itself sufficiently prominently worldwide that I feel quite comfortable declaring that HTML will be accepted to be as reliable as ASCII in coming years. In particular, authoring what I like to call Compatible XHTML, that is, valid XHTML 1.0 strict that conforms to Appendix C, is IMHO the way to author HTML that will have longevity as good as ASCII. Note that files in most file systems have no sense of “MIME-type”, thus the winged-mythological-creatures-on-the-head-of-a-pin style arguments about text/html vs. application/xhtml+xml that are often used to discredit either HTML or XHTML (or both) are irrelevant for the most common case of keeping archives of files in file systems.
  3. Plain old XML (POX) formats in the long run are no better than proprietary binary formats. XML, both in technology and as a “technical culture” is too biased towards Tower of Babel outcomes. I've spoken on this many times, but in short, the culture surrounding XML, especially the unquestioned faith in namespaces and misplaced assumed requirement thereof, leads to (has already lead to) Tower of Babel style interoperability failures. As this is a cultural bias (whether intentional or not) built into the very foundations of XML, I don't think it can be saved. There may be a few XML formats that survive and converge sufficiently to be dependable (maybe RSS, maybe Atom), but for now XHTML is IMHO the only longerm reliable XML format, and that has more to do with it being based on HTML than it being XML.
  4. Formats that are smaller (e.g. define fewer terms) tend to be more reliable.
  5. Formats that are simpler (e.g. define fewer restrictions/rules for publishers) tend to be more reliable.
  6. Formats that are more compatible with existing reliable formats tend to be more reliable, e.g. HTML worked well with existing systems that supported “plain text” (AKA ASCII)
  7. Formats that are easier to use, i.e. publish, and more immediately useful, rapidly become widely adopted, and thus become reliable as a breadth of software and services catches up with a breadth of published data in those formats.

The microformats principles were based on these observations. Now this doesn't mean I think microformats will replace existing reliable formats. Not at all. For example, I feel quite confident storing files in the following formats:

  • ASCII / “plain text” / .txt / (UTF8 only if necessary)
  • mbox
  • (X)HTML
  • JPEG
  • PNG
  • WAV
  • MP3
  • MPEG

So my take on Tantek's thoughts.

Plain old XML (POX) formats in the long run are no better than proprietary binary formats. See, I take issue with this. I understand what Tantek is getting at, but I would say plain XML without a schema isn't leaning towards the Tower of Babel. And as Tantek already mentioned, RSS and Atom are pretty close to the non-Tower of Babel direction. I would also add FOAF and OPML to the list. I would love for SVG to be included in this too, but alas it's not.

Formats that are smaller (e.g. define fewer terms) tend to be more reliable. Good point, which is why things should be broken down, like how XHTML and SVG got Modularization.

My list of formats is slightly different too.

  • XHTML (Unicode)
  • XML (Unicode)
  • JPEG
  • PNG
  • MP3 audio
  • MPEG-4 video
  • WAVE
  • SVG


Blojsom 3.0 adds database storage and an even stronger API

My favourite blogging server, Blojsom, is shifting to database storage for its next version. David Czarnecki, the owner of the open source project, outlined its very active history.

  • 01/29/2003 – blojsom project was registered on SourceForge and development was started.
  • 02/02/2003 – blojsom 1.0 was officially released. 18 releases were made in the 1.x cycle.
  • 09/10/2003 – blojsom 2.0 was officially released.
  • 06/28/2004 – Apple officially announces Tiger Server wherein blojsom is bundled as Weblog Server.
  • 03/14/2006 – blojsom 2.30 was officially released. 30 releases have been made in the 2.x cycle.

I remember running Blojsom betas; I think I started at Blojsom 0.7, when it could only handle one blog at a time. Then Blojsom 2.x came around and gave the whole project a real boost because it could easily handle many blogs under one install. I think the record is still 25,000, by some university in Australia. During the 1.x life of Blojsom, lots of plugins were developed and Blojsom was seriously deconstructed by the guys at HP research labs as part of their semantic blogging project. It's one of the things I loved about Blojsom: its nod towards something bigger than simply blogging. Jon Udell did a talk about controlling our own data at Etech recently, and one of the snippets I heard was about how he would run XPath searches over his blog to pull out certain things. It's a step beyond tagging, but one of the things Blojsom has had for quite some time (Q3 2003 actually). Blojsom also has some other great stuff going for it, like LDAP support!
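To give a feel for the kind of XPath search mentioned above, here is a minimal Python sketch. The feed structure (blog, entry, title and category elements) and the function name are made up for illustration; this is not how Blojsom or Jon Udell's setup stores entries, and it only uses the limited XPath subset in Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical blog export; the element names are assumptions, not Blojsom's format.
BLOG_XML = """
<blog>
  <entry><title>Firefox 1.5 and SVG</title><category>svg</category><category>firefox</category></entry>
  <entry><title>Note taking and outlining</title><category>opml</category></entry>
  <entry><title>The opening keynote from SVG Open</title><category>svg</category></entry>
</blog>
"""

def titles_in_category(xml_source, category):
    """Return the titles of all entries that carry the given category."""
    root = ET.fromstring(xml_source)
    # Predicate [category='...'] keeps entries with a matching <category> child.
    # Note: a category containing a quote character would need escaping first.
    path = f".//entry[category='{category}']/title"
    return [title.text for title in root.findall(path)]

print(titles_in_category(BLOG_XML, "svg"))
```

The point is that the query is a one-liner against the structure of the content, which is the step beyond plain tagging.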

Anyway, it's an awesome blogging server and I believe Blojsom 3.0 will be better than WordPress. It's outgrown its roots in Blosxom, which I believe is now struggling to stay around? And it's outgrown all the Java solutions like Roller and SnipSnap. Being Java based will keep it out of the mainstream, because most people have a LAMP setup at their host, but otherwise Blojsom 3.0 would be a bigger deal. Anyway, more details about Blojsom 3.0:

The first major change has been in the way blojsom is “wired” together. I've rewritten blojsom to use Spring for its dependency injection and bean management. There were aspects of the blojsom 2.x codebase that were more “patchwork” with respect to how certain components used or referenced other components.

The second major change has been in the datastore. I don't necessarily think I've exhausted all that can be done using the filesystem as a content database, but I've been feeling like there's a lot of development energy into making relations between data in the filesystem that can be expressed very easy using a relational database.

In blojsom 3.0, I've settled on using a relational database for the datastore. I'm using Hibernate as the ORM library to manage the data. This means goodbye to all the .properties files for configuration! It was fun while it lasted. The templates and themes are still stored on the filesystem, but I'd envision also storing the template data within the database as well. I've already prototyped use of the Velocity database template loader. I imagine removing any filesystem dependency will allow blojsom to be used in a clustered environment more easily.

Ultimately I think this will allow blojsom to scale much more than I think it can using the filesystem as a content database. I don't believe there are any esoteric relationships among the data in blojsom as to require a full-time DBA to manage an installation of blojsom.

The last major change has been in evolving blojsom's API.

For awhile now there are aspects of the API that were a throwback to needing certain data or referring to elements a certain way. I just wanted a more self-documenting and less redundant API.

For example, I've renamed the BlojsomPlugin interface to Plugin. I felt that having the org.blojsom.plugin package was declarative enough, but that keeping BlojsomPlugin was too redundant. None of the APIs have gone away, they're just more simple and straightforward.

The long and short of it is that you can do all of the things in blojsom 3.0 that were done in previous releases of blojsom. There are a few more components and plugins to migrate to 3.0, but I'm happy with how far things have come in such a short time given the scope of the changes.

You're more than welcome to start playing with blojsom 3.0 right now. All that you need to do after setting up your database is to add a blog and a user for that blog and you'll be able to login through the administration console.

If any of this interests you, feel free to participate on the blojsom-developers mailing list.

Being hosted with Hub.org, it would be wrong for me not to choose PostgreSQL for my database backend. I would love to try other storage backends like an XML database, but I can't quite experiment with this blog till I've tested it fully. Maybe there will be a way to run one blog on a database and another on a filesystem or XML database? Because that would be great. If worst comes to worst, I will just run another copy of Blojsom for testing purposes.


Semantically changing cubicgarden

This page is xhtml 1.1 valid

It's been all of about a week since I wrote anything. I've been quite busy, but I've actually been working on this blog. I've changed the structure of the pages, which does cause some problems for those of you using Internet Explorer, but most of you are using the RSS/Atom feeds so it's low on my list of changes. I've also finally sorted out most of the issues behind why the site didn't validate. As you can see, it now validates. This won't always be the case, due to that much talked about entity problem in copied and pasted URLs. I'm also going to try and use Microformats more than I have in the past. I've not dumped OPML for outlining, but I like XOXO and am actively looking for an application which supports it for quick editing. In the past I was using JOE (Java Outline Editor), which is great because it allows you to run Python scripts which can do many things. But it's not had many updates of late. So can anyone suggest a XOXO editor besides the JavaScript one? If not, there are XSLs to convert between OPML and XOXO, so I'm not that worried.
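For anyone wondering what an OPML to XOXO conversion involves, here is a rough Python sketch standing in for one of those XSLs. It assumes the simplest case: OPML outline elements with a text attribute become nested list items inside an ol with class "xoxo". The function names are my own, and other OPML attributes (type, url and so on) are ignored.

```python
import xml.etree.ElementTree as ET

def _outlines_to_list(outlines, top_level=False):
    """Turn a sequence of OPML <outline> elements into an <ol> element."""
    ol = ET.Element("ol")
    if top_level:
        ol.set("class", "xoxo")  # XOXO marks the root list with class="xoxo"
    for outline in outlines:
        li = ET.SubElement(ol, "li")
        li.text = outline.get("text", "")
        children = outline.findall("outline")
        if children:
            # Nested outlines become a nested list inside the same item.
            li.append(_outlines_to_list(children))
    return ol

def opml_to_xoxo(opml_source):
    """Convert an OPML document (as a string) to a XOXO list (as a string)."""
    body = ET.fromstring(opml_source).find("body")
    if body is None:
        raise ValueError("not an OPML document: no <body> element")
    xoxo = _outlines_to_list(body.findall("outline"), top_level=True)
    return ET.tostring(xoxo, encoding="unicode")

SAMPLE_OPML = (
    '<opml version="1.0"><head><title>Notes</title></head><body>'
    '<outline text="Formats"><outline text="OPML"/><outline text="XOXO"/></outline>'
    '<outline text="Editors"/>'
    '</body></opml>'
)

print(opml_to_xoxo(SAMPLE_OPML))
```

Because XOXO is just XHTML lists, the output can be dropped straight into a page, which is a big part of its appeal over OPML.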


Tim Berners-Lee Semantic web lecture

Tim Berners Lee in Oxford

After the mad panic of trying to get the train up to Oxford, due to the Trainline machine at work not working, we arrived at the Oxford University venue well before the start time and picked a great spot for the lecture. Tim Berners-Lee was good to see live; you could see he certainly was no Steve Jobs. He was more like Bill Gates, a little uneasy with public speaking but happy to talk about his vision and his work towards that vision. That vision is the Semantic Web. Rather than me explaining every aspect of the talk, it's best I point you towards Tim's S5 presentation, a webcast (coming soon), this blog and my notes. I've also added my photos from the lecture to Flickr.

So generally I'm even more sure that the semantic web is happening, but within certain domains. Will the semantic web happen across the whole web? Doubtful at best. Recent developments in web 2.0 have really pushed the web towards a richer semantic web, but away from top-down ontologies and rules.

Oh and believe it or not, me and Miles were quoted in the New Statesman blog.


Live clipboard from Microsoft

Before I've even had the chance to play with Microsoft's Simple Sharing Extensions, Ray Ozzie has just shared a prototype they have been playing with internally. It's called Live Clipboard and is basically a clipboard for the semantic web.

It's a JavaScript-based solution which works in most browsers, like Internet Explorer and Firefox. It stores data on the page as actual XML data trees which can be copied and pasted without having to select the text content. It's a difficult concept to explain, but luckily Ray's got tons of screencasts to show how it works. The interesting thing is that Live Clipboard works not only in the browser domain but also in the desktop domain. Thanks to 25hours a day for the Etech trip report, which alerted me to Live Clipboard in my RSS reader today.

Honestly, when I first read the post, I did think this would be perfect as a Firefox extension or even a Greasemonkey script, but you would miss out on the desktop side of things. I'll be interested to know how flexible Live Clipboard is. For example, will it read all types of Microformats? How about FOAF and XFN? Hmm, I wonder if you could do something between a Firefox extension and a Yahoo Widget?


An XSL transformation mindset

Someone asks on Metafilter.

When you imagine XSLT transformations happening in your mind's eye, what does it look like?

It's a really good question and opens up a whole range of thinking about the differences in people's thought processes. So first Jeff talks about the question.

This is a very powerful question to ask, because ancient, procedurally oriented developers like me sometimes have trouble following the non-linear, pattern-driven processing that takes place when an XSLT template is applied to a tree of XML elements. In fact I have noticed that non-developers sometimes have an easier time with XSLT than do experienced developers, because they don't try as hard to figure out what is happening beneath the covers.

I would kind of agree with that statement. There's something about XSL and XML which just makes sense in my head. I'm not from a traditional software or computer science background, so I still find it weird to be called a programmer by some of my peers. John wrote this fantastic comment.

My first project with XSLT a few years back was to actually generate XSLT *from* XML and XSLT and forced me to break my ideas of how it worked. When I finally got the whole “it happens all at once” approach, it started to make sense. However, every programmer that I've brought on board to an XSLT project since has had trouble getting out of the procedural thinking and that ends up being the biggest source for their mistakes.

Unfortunately, like MagicEye images, some people just aren't able to unfocus their minds in the right way to really grok XSLT beyond the simplest examples.

I have heard programmers compare XSLT to Prolog and even Lisp. I'm not sure how true this is, but it's certain that you can't approach XSLT in a regular way. Recursion is one of those things which seems to drive people mad. In XSL there's a lot of recursion and declaration, which seems to fit the way I think. I always wanted to create an SVG of an XSLT process, so you can see in lines and boxes which templates are being called and add some kind of dimension to XSL. I'm sure it's not that hard, and even my experiments with transforming Cocoon's Sitemap file into SVG didn't require too much work. Talking about recursion, someone posted this nice animated GIF of how it all works. There's no doubt that XSL requires a different mindset, and working with a programming language like Java or Perl will be more of a hindrance than an advantage.

I posted this question to a few of the XSL developers I know and got a variety of answers. In my own mind I see lots of lines and trees which get broken into branches.
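For the procedurally minded, the pattern-driven recursion can be loosely imitated in Python. This is only an analogy and not XSLT: the template registry, the decorator and the element names are all invented for the sketch. Each "template" declares which element it matches, and an apply_templates step walks the tree and hands every child to whichever template matches, falling back to a built-in rule that just copies text through.

```python
import xml.etree.ElementTree as ET

TEMPLATES = {}  # maps an element name to the function that renders it

def template(match):
    """Register a function as the 'template' for elements named `match`."""
    def register(func):
        TEMPLATES[match] = func
        return func
    return register

def apply_templates(node):
    """Render a node's text and children, dispatching each child by name."""
    parts = [node.text or ""]
    for child in node:
        rule = TEMPLATES.get(child.tag, builtin)
        parts.append(rule(child))
        parts.append(child.tail or "")
    return "".join(parts)

def builtin(node):
    """Fallback for unmatched elements: drop the tag, keep processing inside."""
    return apply_templates(node)

@template("section")
def section(node):
    return "<div>" + apply_templates(node) + "</div>"

@template("title")
def title(node):
    return "<h2>" + apply_templates(node) + "</h2>"

@template("para")
def para(node):
    return "<p>" + apply_templates(node) + "</p>"

def transform(xml_source):
    root = ET.fromstring(xml_source)
    return TEMPLATES.get(root.tag, builtin)(root)

print(transform(
    "<section><title>Trees</title><para>Lines and <em>branches</em></para>"
    "<section><title>Nested</title></section></section>"
))
```

Nothing in the code says "first do the title, then the paragraph, then loop over sections". The order comes from the shape of the input tree, and the nested section is handled by the same rule as the outer one, which is exactly the bit that trips up people looking for a main loop.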


Tagging which way? How about my way?

Story telling fest

Looking through my "to read at some point in the future" tagged category in Great News, I found this useful summary of the problem with tagging online at the moment. Tag formats: Can’t we all just get along? covers the main tagging applications online and shows the confusion between space-separated keywords and the comma-separated method.

So where do I fall on this issue? Well, although I use Flickr and Del.icio.us almost every day, I think they could both benefit from using commas to separate tags. All the latest services I've used which support tagging have used commas, because they make a lot more sense. As Victor says in the comments,

commas are faster than quotes.

as i see it (in my own experience) tags can be annoying if you don’t really care about them when you have to enter them. Usually you care about them later on, when you cannot find what you’re looking for. but they’re still a(nother) time-consuming task.

i’d use fast, thus i’d use commas.

The only thing which puts me off commas is the language issue, which is that some languages use commas for other things. There was a suggestion to use semicolons, but I feel that would go down like listening to your iPod in a church service. Other solutions which I've seen around the web include autosensing spaces or commas, and the Amazon box model type thing, which I personally think sucks because it takes too long to fill the boxes in. I wonder why no one's written a Greasemonkey script to allow people to pick a method which will be translated across all tagging services, so I can type commas into Flickr and it just translates them into spaces for me. Yeah, it's very lazyweb stuff. But as FataL points out, this can't be that hard.

Computer now smart enough to parse them all:
south asia, africa = [south asia] [africa]
“south asia” africa = [south asia] [africa]
‘south asia’ africa = [south asia] [africa]
(south asia) africa = [south asia] [africa]
south asia – africa = [south asia] [africa]
It’s not so hard to program all this I believe.
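To back up the "not so hard" claim, here is a small Python sketch that handles the five inputs listed above. The rules are my own guess at what FataL means, not a spec from any tagging service: if the input contains a comma or a dash with spaces around it, that is treated as the separator; otherwise quoted or bracketed phrases count as one tag and every remaining word is its own tag.

```python
import re

# Phrases wrapped in quotes or parentheses count as a single tag.
_GROUPED = re.compile(
    r'"([^"]*)"'                    # "straight double quotes"
    r"|\u201c([^\u201d]*)\u201d"    # curly double quotes
    r"|\u2018([^\u2019]*)\u2019"    # curly single quotes
    r"|(?<!\w)'([^']*)'(?!\w)"      # 'straight single quotes', but not apostrophes
    r"|\(([^)]*)\)"                 # (parentheses)
)

# A comma, or a hyphen / en dash / em dash with whitespace on both sides.
_SEPARATOR = re.compile(r",|\s[-\u2013\u2014]\s")

def parse_tags(text):
    """Split a free-form tag string into a list of tags."""
    text = text.strip()
    if _SEPARATOR.search(text):
        # Explicit separators win: everything between them is one tag.
        parts = _SEPARATOR.split(text)
        return [" ".join(part.split()) for part in parts if part.strip()]
    tags = []
    position = 0
    for match in _GROUPED.finditer(text):
        # Loose words before the grouped phrase are one tag each.
        tags.extend(text[position:match.start()].split())
        phrase = next(group for group in match.groups() if group is not None)
        if phrase.strip():
            tags.append(" ".join(phrase.split()))
        position = match.end()
    tags.extend(text[position:].split())
    return tags

for example in ["south asia, africa", '"south asia" africa', "(south asia) africa"]:
    print(example, "=", parse_tags(example))
```

A Greasemonkey script would still have to turn the result back into whatever each service expects, but the parsing side really is only a couple of regular expressions.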


Firefox 1.5 now out but with limited SVG support

Firefox 1.5 released

Firefox 1.5 is released, hooray! And it's the same as Firefox 1.5 RC3, which I've been using for a while now, hooray again… But it doesn't come with full support for the SVG 1.1 Full, Tiny or Basic profiles. This is a crying shame, but it still marks another step forward for SVG on the desktop. The full version which supports SVG is still in development and should be available in Firefox 3, according to SVG news. At least SVG is doing much better in the mobile space: almost 100 phones and counting.

If you want to see what's possible with Firefox 1.5 and SVG, do check out the Canvas painter demos which are popping up everywhere. Vladimir has a link to the best ones.


All your bases belong to google

This entry by Greg at Blogdigger, titled Someone set up us The Bomb, is excellent. I honestly hadn't really looked into Google Base because the idea of marking up my data just for Google gives me the creeps, but Greg's angle gives me an even creepier feeling.

In an effort to push things in the proper direction, a small group of individuals and companies began working on ways to structure information, in an attempt to prevent SDL (Semantic Data Loss) and create better search in the process. The history here goes back quite a bit, so I'll skip to the end, which is often called datablogging, microformats and/or structured blogging, all of which attempt to make the process of capturing the meaning of content easier both for the producer and the consumer. Things were moving along nicely in that direction; Google Base, however sends a proverbial “Make your time” to all those services, since Google Base essentially allows content producers to explicitly tell Google what all those little bits of data mean and how to interpret them.

Greg is right, but this is the dilemma. Google is offering a solution for putting large amounts of structured data online, while datablogging hasn't gone that far and Microformats, as much as I love them, are still an afterthought when blogging. I mean, I'm an XML guy and I usually write the text, add the basic links etc., then some tags and maybe trackbacks. The adding of microformats usually comes afterwards, so imagine what most people do.

We really need to start adding microformats to the Blogging applications, and soon.


Using Microformats in blog entries

I'm going to start using Microformats a lot more from now on. I've set up Wbloggar with a load of custom tags and hope to use them when blogging. I want to use it as an experiment to see how practical it is to use Microformats in everyday life. I even looked back into XFN, for describing relationships. I'll come back to how well it goes, but I'm considering using ecto instead, as I heard it can have scripts, which means I could put in a real form instead of just code.
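As an idea of what a script behind such a form might do, here is a hedged Python sketch that builds an XFN link. The function name and the example URL are made up; the relationship values are the ones defined by XFN 1.1, and anything else is rejected so typos don't end up in the markup.

```python
import html

# Relationship values defined by XFN 1.1.
XFN_VALUES = {
    "contact", "acquaintance", "friend", "met",
    "co-worker", "colleague", "co-resident", "neighbor",
    "child", "parent", "sibling", "spouse", "kin",
    "muse", "crush", "date", "sweetheart", "me",
}

def xfn_link(url, name, relationships):
    """Return an <a> element whose rel attribute carries XFN values."""
    unknown = [value for value in relationships if value not in XFN_VALUES]
    if unknown:
        raise ValueError("not XFN values: " + ", ".join(unknown))
    return '<a href="{}" rel="{}">{}</a>'.format(
        html.escape(url, quote=True),
        " ".join(relationships),
        html.escape(name),
    )

# Hypothetical example: a friend I have met in person.
print(xfn_link("http://example.org/", "Miles", ["friend", "met"]))
```

The whole microformat is just a rel attribute on an ordinary link, which is why it is practical to add from a custom tag or a small form.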


Does presentation matter in a world of RSS?

So Ben Metcalfe asks the question: Does presentation matter anymore? This is exactly what me, Miles, Harry and Dave talked about one night over dinner. Honestly, I think it does, but as Ben identifies, it's moved around the chain now. Let's take it for a moment that RSS has a huge audience and that it hasn't changed a lot from its current form (aka no JS, CSS, Ajax, etc. in RSS or Atom). The presentation shifts to feed promotion and the news reader style. For example Great News, which I'm using as my desktop aggregator, supports CSS and I can actually define a style sheet per feed if I want to. This was useful today when Google News was delivering me all the WorldService and ArabicTV stories, as I could use the brief stylesheet to show a lot of entries on one screen, while I use the readability stylesheet for reading Ben's blog and most other RSS content.

But it goes deeper than that; design isn't just about presentation. A designer should have a hand in the structured elements of the RSS feed, the usability of how it's pushed and pulled around the internet, and the accessibility of the feed and its content. It's the whole process, which I prefer to call the Flow of the content. It's part of what I do and I feel it's part of the emerging role for new media designers. I mean, is it too much to ask for a designer to build a client-side XSL page for an RSS feed?

Just stepping away from the world of huge RSS audiences now. There's something which smart designers understand well: the medium they're designing for. Web media isn't print media. Sounds obvious, but we're talking about the vision for how the site should look and work being thrown out of the window. I'm not talking about just browser quirks, screen resolutions and font size differences. I'm talking about the range of toolbars, extensions and the like which deconstruct the website beyond the control of the tightest web designer. Then if you go down the Greasemonkey path, you have something where you can actually share your deconstructions. Smart designers understand and embrace this and actually push for CSS-driven sites to make this even easier. There are a few even testing the waters with client-side XSL transformations for all content, with CSS for style.

I've included a screenshot of how I currently see BBC news story pages and how it's meant to look. I custom built this simple script because it makes loading up BBC news stories from my RSS reader quicker and is easier to read for myself. Others would disagree, but then I would suggest you write your own Greasemonkey script.

So back to the question: yes, presentation does matter and the role of a designer is very important, but like everything, roles shift with the times and media. Branding is another issue which I won't go into right now either…

I found this great little post about Windows Longhorn/Vista's redline designs. Ryan suggests redlines are a throwback to another generation of design, and I have to agree. Dactylx asks this question in the comments:

I'm down with that idea, but then how do you as a designer communicate how the design should be rendered to a developer? What can we use to replace the redlines?

And Ryan replies with a slightly optimistic but good answer.

Here is the first step. Do not separate the teams. There should be no technical team and design team working separately (on different floors or on different continents). They should sit right next to each other and *understand* the problem just as great as the designers. Design is manifested in code, so if the coders don't understand, then the product is inevitable to fail.

I'm once again in total agreement. In my experience the best projects are always the ones where everyone is involved in the problem, not passed around like a rugby ball on a winter's day.


The opening keynote from SVG Open

Taken from Kurt Cagle's presentation at SVG Open 2005, The Future of SVG and the Web

I think a few of us (okay, maybe all of us) wish that this process was going faster, but its worth putting things into perspective. Two years ago, I had to explain to most programmers I worked with what SVG was. A year ago, I had to explain to most non-programmers I worked with what SVG was. Today, companies are hiring SVG developers, SVG is on our phones, is moving into our browsers, is appearing in embedded display systems on our trains and planes. This did not happen in a vacuum. It occurred because you took the message of SVG, of open standards, into your workplace, into your schools, into your government offices.

And that's only the start. Kurt later runs through different points which he feels add to the changing landscape of the net. One of the key points, I feel, is his one about the rise of domain experts and platform independence.

Rise of Domain Experts, Not Programmers. XGUI based systems separate the abstract representation of applications from their implementation, which means that increasingly (likely using tools) specialized programmers will be replaced by domain expert non-programmers. This is already happening in fields like GIS. GUIs for designing such XGUI applications will similarly look more like flash editing tools or web layout tools with a few “access points” into scripting exceptions than they will complex IDEs. This doesn't make programmers obsolete, but it does increasingly push them into a component developer role.

Data/Platform/Language Independence. XML is increasingly abstracting the form of data access, turning complex and arcane queries (and updates) against LDAP servers, SQL databases, web services, mail services and so forth away from dissonant technologies and towards common XML ones. XML based XGUIs abstract the underlying platform interfaces and turn them increasingly into XML-oriented virtual machines that can degrade gracefully in the face of more limited capabilities, and makes such religious issues as Java vs. C++ vs. C# vs. flavor of the month language irrelevant – you use what works on the system to implement the abstraction. This doesn't eliminate the need for software – you still need to have those component implementations, and many of them may be extraordinarily complex and specialized in the back end, but it goes a long way toward eliminating the need for re-engineering the 90% of actions that we still do using the web now, from gaming to e-commerce to communication.

I have to say this is key! XSLT is so powerful once you're able to get everything down to an XML level. Proprietary ways are moving aside, while bridge applications are being used to open the data up as XML. I actually remember when I first started using Cocoon, and my fear was that there would not be enough XML sources to really make use of its ability. Boy was I wrong. I'm seeing lots of new web APIs built on a RESTful interface, bridge apps for IM, email and newsgroups, and even operating system information stored and generated in XML. SVG adoption has indeed been slow, but it's growing and it will be just another commonplace namespace.

It's well worth reading the whole of Kurt's entry yourself. I actually found it quite moving…


Note taking and outlining

I thought I had it covered: JOE on the TabletPC and Pocket Thinker on the iPAQ. Both support OPML natively, but Pocket Thinker does not import OPML from JOE. So I'm back to the start with note taking.

I'm seriously doubting whether OPML is the right thing for the task. Uche goes one step further and suggests XML formats for outlining are complete rubbish. Danny Ayers also puts the boot in on OPML. Honestly, he has a point, but he offers up a couple of others which I had not looked into before. OPL is a reaction to the ugliness of OPML; looking at the spec, I'm not sure it goes quite far enough. XBEL, on the other hand, looks wildly different but useful for outlining. Uche also did a follow-up review. I like the idea of XOXO, but prefer the idea of using XHTML or RDF, which is easily parsed and integrated into other processes.

Then I found Wikipad… and had high hopes for a PocketPC version like this Palm version or even this mobile phone type version. Wikipad doesn't have the name of something like Voodoopad, but it certainly does a good job of note taking for now…


The problem with Language RSS

Shoshannah Forbes and I have been sending emails back and forth about the issue of RSS adoption in right-to-left languages like Hebrew and Arabic. It fits so closely with what I'm going to say tomorrow at XTECH (where I happen to be right now, actually) that it's almost uncanny. I asked Shoshannah if I could blog her reply to my question about her RSS feeds. Basically her RSS feeds include the HTML attribute dir to indicate the direction of the text, which makes them invalid and may break quite a few of the RSS readers out there. Anyhow, here is the email, complete with my agreements and additional comments. Please remember that, as usual, these comments are my own views and not those of the BBC World Service (my employer).

Shoshannah Forbes wrote:

> The problem I am facing is simple:
> If I use valid RSS with no dir=rtl, then 99% of the RSS readers will display the text block as LTR, with punctuation digits and English in wrong locations, making the whole thing unreadable.
> When adding dir=rtl, at least I can get about 50% of the RSS readers to display the post body properly (titles are still a mess).

Agreed, but I feel there are two ways of looking at the problem. From your point of view it makes sense to include dir="rtl", because very few software developers are going to change their code to take this into consideration. For us (the BBC World Service), we have the might to speak to developers and get them to change their code. Even if we do not do it for ourselves, we owe it to our audience (my own feelings).

> I don't use unicode control characters for a few reasons:
> * They are a real pain to input- it is like entering the control characters for CR/LF or < font > tag manually (but worse)- there are just to many places to enter them.

Yep totally agree

> * Most keyboard layouts do not have a direct way to enter them.

Yeah, we're using virtual keyboards for some languages and they're a nightmare!

> * They make a mess of the text- they are only used for the RSS, and unneeded for the editing or the html display, and can produce unexpected results when entered into the text.

Yep, agreed

> * There are many clients that incorrectly display them as visible characters in the text.

Yeah, it's a shame and that will change, but it's too much trouble at the moment.

> * They make the text much more difficult to edit- if you change the text, you need to go back and change them as well. And since they are invisible, you get an awful lot of trial an error.

Indeed! You really need to understand them to edit with them. This would require extra training for our language services

> * They force me to use explicit directionality, which complicates things and makes the text less portable.

Yeah, there is an idea of reuse throughout our language services. This is tricky already; who knows how much more tricky it would be if text was Unicode directional too.

> * My web app that creates the RSS from my HTML does not know how to add them automatically.

Yep, I know my blogging app (Blojsom) supports Unicode directionality IF I put the characters in at the start, but then we're back to the editor problem of virtual keyboards and sticking in hidden characters! The same is true of the BBC World Service systems. We use XSL with Saxon, so if the characters are there, they should (not tested by myself) pass through to the RSS.

> * Since they are rarely used in other contexts, I can't focus on the content when writing, and have to start thinking more closely about the presentation.

Yeah indeed! Our language services are already busy as hell; Unicode directionality would just add a level of complexity on top of an already stressful job.

> * Moving from me to other users- most Hebrew/Arabic users don't know about them, and don't want to know. You try to explain to your mother that when she is writing in her weblog, she can't write in here usual manner, but has to enter this strange codes in a foreign language which have complicated rules (I have seen many pros get confuses with these characters, I don't expect laypeople to understand them).

Right on the nail! One of my points for tomorrow is that Unicode directionality is too damn difficult and very confusing! I expect some will challenge me about this tomorrow, and honestly I will just admit that if it's too difficult for me, it's even more difficult for others. Plus we should be making things easier for people, not harder. The barrier to entry should be at a level where your mum or my mum could use it and write it.

> * It doesn't scale- think about a an Israeli blog hosting service- they want to offer RSS feeds for all the blogs, with minimum work for the users. Relaying on unicode control characters just doesn't do it.

Yeah, plus from the Israeli blog hosting point of view, you want to get people going quickly and easily, not put them off with complex editing. It's the reason why Blogger does so well: 3 steps and you've got your own blog.

> * Since they are complex, it is difficult to create a GUI for entering them (unlike general RTL/LTR controls, which are available everywhere).

Yeah, it almost needs to be just like the direction attribute in HTML. I'm suggesting an attribute like this for RSS tomorrow.

> Not having the dir attribute in RSS gets rid of some markup- in favor of lower level much more complex control characters. A bad deal, IMO, and one which is a major cause for the problems when dealing with Hebrew/Arabic RSS.

Indeed, it was an ideal solution but the real world use is too painful.

> I think that the root of the problem is that bidi is part presentation and part structure. And since even in the best of cases (for example, the automatic bidi control in recent QT or GTK applications on Linux) there are still many many cases that can *not* be covered reliably by the display algorithms of the software, I tend to think that for practical prepossess, bidi is more structure then presentation.

Yeah agreed, there's lots of push to put bidi information inside CSS instead of HTML even, which is correct if you see bidi as presentation.

> I sure wish there was a way in RSS to tell the client “this element is RTL” or “this area is LTR” without resorting to HTML hacks. But at the moment, those hacks are the only practical tool I have to get at least *some* of the readers out there to display the text properly (more like “mostly properly”).

I feel your pain and I'm not even writing my own content in a right-to-left language! It's such a shame that HTML hacks are the only way we can move forward on this. The crux of my presentation and paper is that developers and content providers need to work much closer together, and the RSS specification needs to make full use of attributes like xml:lang and maybe some other kind of attribute for direction.
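To make the two approaches in this exchange a bit more concrete, here is a rough Python sketch. It is an illustration only, not what Shoshannah's web app, Blojsom or the World Service systems do: the function names are mine, and the direction is guessed with a simple first-strong-character heuristic. One function wraps the text in the invisible Unicode embedding characters, the other emits the HTML dir attribute that the feeds currently rely on.

```python
import html
import unicodedata

RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING (invisible control character)
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (closes the embedding)

def base_direction(text):
    """Guess 'rtl' or 'ltr' from the first strongly directional character."""
    for char in text:
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class in ("R", "AL"):  # Hebrew, Arabic and friends
            return "rtl"
        if bidi_class == "L":          # Latin and other left-to-right letters
            return "ltr"
    return "ltr"  # digits, punctuation or empty text: assume left to right

def embed_with_control_characters(text):
    """The 'pure' approach: hide the direction inside the text itself."""
    if base_direction(text) == "rtl":
        return RLE + text + PDF
    return text

def wrap_with_dir_attribute(text):
    """The pragmatic approach: say the direction in markup."""
    return '<div dir="{}">{}</div>'.format(base_direction(text), html.escape(text))

sample = "\u05e9\u05dc\u05d5\u05dd RSS"  # Hebrew word followed by Latin text
print(repr(embed_with_control_characters(sample)))
print(wrap_with_dir_attribute(sample))
```

Even this toy version shows the editing problem: the control characters are invisible in the output of the first function, so an editor changing the text later has no idea they are there, while the dir attribute sits in plain sight and leaves the text itself untouched.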
