revjim.net

January 14th, 2004:

Humans don’t read XML

I’ve stayed out of this discussion so far because I don’t have the time to argue. But there are some points that need to be made that some people aren’t recognizing.

First of all, I’d like to say that I generally agree with Mark. He’s a smart guy who thinks things through in much the same fashion that I do. More than anyone, he’s helped to solidify RSS as a syndication format, and he has contributed a lot to the difficult task of checking hundreds of websites for new content and displaying that information in a simple, easy-to-read location, reducing the amount of time it takes to catch up on what’s being said in the world.

However, on this most recent issue, I disagree, slightly, with his stance. I think it all boils down to this comment he made:

Julian, thanks for that link. You have convinced me that XML is the wrong solution for anything that involves end users.

I’m pretty sure Mark is being sarcastic here. And Julian says exactly what needs to be said. End users neither produce nor read XML. If you have an argument where an end user might do so, I’m sure it’s a very obscure case. End users type words into little boxes on their screens and hope that those words make it to their website. End users type URLs into address bars and wait for their browsers to figure out what they want and show it to them. End users do not read XML. Hell, developers rarely read XML except when they are developing something. XML isn’t fun to read. Sure, it’s readable. But reading it is not what it was made for.

Weblog content tools generate XML, not users. And feed parsers/readers read XML, not users. A small case can be made that users who hand-type XHTML are users who generate XML, and it’s possible that an exception should be made for them. However, I strongly feel that if an end user can produce valid XHTML, then she is acting as a developer. And if an end user can’t produce valid XHTML, she should stick to HTML and/or use a content creation application to produce XHTML for her. If an end user is hand-typing her XML on a regular basis, she needs better tools. If she is reading XML documents as text, then she really should find an application created to handle that specific type of XML. I read XML on a semi-regular basis in my line of work, but I only do so when developing. If I had to read XML just to get updated on the day’s news, I’d certainly find a better way to do it.

HTML parsers must be liberal. They were created before weblog authoring tools were available. They were created at a time when much of the world was hand-typing HTML, and many of the people typing these foreign characters into text boxes had no clue what HTML was supposed to look like. They had a vague idea. If the browsers of the time hadn’t accepted what these authors wrote, chances are the authors wouldn’t have been able to fix it. In many ways, this is still true. Many of the people who author HTML today don’t know a thing about how it’s supposed to look, what’s valid, and what isn’t. They just know how to copy what they see and hope that it’s good enough to render something close to what they were hoping for.
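To make the contrast concrete, here is a minimal Python sketch of a liberal HTML parser and a strict XML parser fed the same sloppy input. The sample markup and the helper names (`TagCollector`, `html_tags`, `xml_accepts`) are made up for illustration; only `html.parser` and `xml.etree.ElementTree` from the standard library are assumed.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical hand-typed markup: neither <p> nor <b> is ever closed.
SLOPPY = "<p>Hello <b>world"


class TagCollector(HTMLParser):
    """Records every start tag the liberal HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tags(markup):
    """Feed markup to the HTML parser and return the start tags it saw."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def xml_accepts(markup):
    """Return True only if markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

The HTML parser walks through the unclosed tags without complaint, while the XML parser rejects the same string outright. That is the difference in expectations being argued about here.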

RSS used to be this way. When RSS was first used, many of those feeds were typed by hand by people who did so only because they thought they needed to. They are the same people who coded their HTML by hand without the use of any content engine whatsoever. This time, however, is no longer upon us. There might be a handful of people out there who still hand-code RSS. People who don’t use any sort of content engine. They are the few, and the proud. But I no longer believe that these people should be catered to. The time has come when just about anyone can run a content generation engine like Movable Type. If you can’t, that’s fine. But, just as I don’t expect the people who consume JPEG data with their various tools to find methods of accepting JPEG data that might be error-ridden because it was created by hand, I don’t believe the RSS/Atom reading world should be plagued with that task either.

Computers read JPEG data, not humans. In that same fashion, computers read RSS/Atom data, not humans. Computers also generate JPEG data, not humans. And computers should generate RSS/Atom data, not humans. If you want to be a rogue do-it-yourselfer, that’s fine. But don’t expect the rest of the world to make up for your mistakes. Regardless of what format you are publishing data in, if you hand-coded that data, validate it, and correct it if there are mistakes. The rest of the world shouldn’t be required to accommodate your laziness.
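For anyone who does hand-code a feed, a first check can be as small as the Python sketch below. It only tests XML well-formedness, which is a weaker condition than passing a real feed validator, and the function name and the two sample feeds are invented for illustration.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(feed_text):
    """Return None if feed_text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(feed_text)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical feeds: the second contains a bare ampersand, a common
# hand-typing mistake that makes the document ill-formed.
GOOD_FEED = '<rss version="2.0"><channel><title>Example</title></channel></rss>'
BAD_FEED = '<rss version="2.0"><channel><title>Fish & Chips</title></channel></rss>'

for name, feed in (("good", GOOD_FEED), ("bad", BAD_FEED)):
    problem = well_formedness_error(feed)
    print(name, "OK" if problem is None else "ERROR: " + problem)
```

Running something like this before publishing puts the burden of catching the mistake on the person who made it, not on every reader downstream.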

(And before anyone goes crazy and tells me that my RSS feed doesn’t validate: you’re right, it doesn’t. However, that’s because no matter whom I asked (including Mark Pilgrim), no one was able to tell me the proper way to encode international characters in an RSS feed. So, without a proper method to use, I had no choice but to do it my own way.)
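At the XML level, at least, one option is to write non-ASCII characters as numeric character references, which any conforming XML parser resolves back to the original characters. The sketch below is an assumption about what could work, not a statement of what the RSS specification requires or what this feed actually does; `ascii_safe` is a made-up helper, and it does not escape `&` or `<`.

```python
import xml.etree.ElementTree as ET


def ascii_safe(text):
    """Replace each non-ASCII character with an XML numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Hypothetical title containing e-acute (U+00E9) and u-umlaut (U+00FC).
title = "Caf\u00e9 Z\u00fcrich"
item = "<title>" + ascii_safe(title) + "</title>"

# An XML parser turns the references back into the original characters.
round_tripped = ET.fromstring(item).text
```

Here `item` is pure ASCII, so it survives regardless of the encoding the feed is served with, and `round_tripped` equals the original title.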