I've been wanting to blog this for quite some time now. When we think of blocking and censorship, everyone goes on about China. Well, there are many other nations with their own levels of censorship and blocking. It started with the blocking of BBC Persian content in Iran, after which we started to syndicate more via public RSS and email. Then Mario wrote an instant messenger bot which takes the BBC Persian RSS feeds and republishes them on the MSN network if you subscribe to the bot (just add bbcpersian@hotmail.co.uk to your buddy list). Next Mario added support for the Jabber network (just add bbcpersian@menti.name to your buddy list) and tried to get YIM (Yahoo) working, as that's the most popular instant messaging tool in Iran. Now he's trying out JRS, a publishing tool for the XMPP (Jabber) network, as the Perl Yahoo module is broken and/or out of date. Then Hoder (Hossein Derakhshan) gave a good talk about censorship in Iran to the BBC.
Some observations along the way. Right-to-left text should be easy with most Unicode-compliant instant messaging clients, but this simply is not the case. The markup of right-to-left languages is still a very difficult thing to do. Dan Brickley sent a good email to the W3C internationalisation core group. I keep meaning to respond myself, but I still have a draft which I keep rewriting. I'm happy Martin Duerst and others have read my paper from XTech 2005, but I would like a little more clarity on Martin's reply.
In Ian's article and in Mario's messages, there is also some extent of confusion with regards to bidi. If the text in a line or paragraph contains only rtl characters, or neutral characters such as punctuation, any application is supposed to display it in the correct order. No attributes are necessary, except for where to start the line (flush left or flush right), which can be considered a matter of taste (in mixed English/Farsi text, I wouldn't consider having all English messages flush left and all Farsi messages flush right necessarily always the best display) and which could be handled by a switch in the user agent.
It's only when a line or paragraph mixes both rtl and ltr text where having additional information becomes really necessary, to indicate whether the text is a (e.g.) Farsi sentence with some English embedded or the other way round (or even a more complicated structure).
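To make Martin's distinction concrete, here is a rough sketch of how a bot might check whether a message mixes directions before deciding it needs extra markup. It leans on the bidi classes that Python's standard unicodedata module exposes. The function names are my own invention for illustration, and this is a simplification under those assumptions, not the full Unicode bidirectional algorithm.

```python
import unicodedata

# Strong right-to-left classes: "R" (e.g. Hebrew) and "AL" (Arabic-script
# letters, which covers Persian). "L" is strong left-to-right.
RTL_CLASSES = {"R", "AL"}
LTR_CLASSES = {"L"}


def strong_directions(text):
    """Return the set of strong directions ('rtl', 'ltr') found in text.

    Neutral characters (spaces, punctuation) and digits are ignored.
    """
    found = set()
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in RTL_CLASSES:
            found.add("rtl")
        elif bidi_class in LTR_CLASSES:
            found.add("ltr")
    return found


def needs_direction_markup(text):
    """Hypothetical helper: True only when rtl and ltr text are mixed."""
    return len(strong_directions(text)) > 1


def first_strong_direction(text, default="ltr"):
    """Guess a base direction from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in RTL_CLASSES:
            return "rtl"
        if bidi_class in LTR_CLASSES:
            return "ltr"
    return default
```

A pure Farsi headline comes back as not needing markup, while something like "BBC فارسی" does. Guessing from the first strong character is only a heuristic, and which language is embedded in which is exactly the information it cannot recover for mixed text.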
See, this is great in theory, but in practice applications don't do this correctly. It's good to see I was correct about Atom and RSS when it comes to language support.
It very clearly shows that more thought should go into supporting internationalization markup in all kinds of document or document-like (in the sense that they use free text rather than data items) formats.
The only blog format that got that right (sic!) from the start is Atom (http://www.ietf.org/rfc/rfc4287.txt). Elements such as title all allow for embedded XHTML markup, which then can take a dir attribute. RSS 1.0 has a content module that could do the same thing, but I'm not sure how well it is supported.
Certainly, it's hardly supported in the RSS space. Atom is the only one which had this from the start, so all the developers who built their readers have built in the ability to handle markup inside content, including directionality.
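For anyone wondering what that looks like in a feed, here is a minimal sketch of an Atom entry whose title is an XHTML text construct carrying a dir attribute, built with Python's standard xml.etree.ElementTree. The build_entry function and its parameters are made up for this example. RFC 4287 requires the XHTML content to be wrapped in a single div, and a valid entry also needs id and updated elements, which I've left out to keep it short.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_entry(title_text, direction="rtl", lang="fa"):
    """Hypothetical helper: serialise an Atom entry with an XHTML title.

    Only the title is emitted; a real entry also needs id and updated.
    """
    # Serialise Atom as the default namespace and XHTML under a prefix.
    ET.register_namespace("", ATOM_NS)
    ET.register_namespace("xhtml", XHTML_NS)

    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    title = ET.SubElement(entry, f"{{{ATOM_NS}}}title", {"type": "xhtml"})

    # type="xhtml" content must be a single XHTML div; dir and xml:lang
    # go on that div.
    div = ET.SubElement(
        title,
        f"{{{XHTML_NS}}}div",
        {"dir": direction, f"{{{XML_NS}}}lang": lang},
    )
    div.text = title_text
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    print(build_entry("خبر فارسی"))
```

Whether a given reader actually honours the dir attribute is a separate question, and one worth testing client by client.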