Bloglines doesn't handle inter-element white space properly

Posted by Tom Moertel Tue, 11 Sep 2007 16:51:00 GMT

If you’re reading my blog via Bloglines, you may have noticed that some of my posts look terrible, especially when they contain code snippets. I am sorry for that, but it’s not my fault. Bloglines doesn’t handle white space properly.

Here’s the more detailed explanation. When you request one of my feeds in, say, Atom format, you get back a bunch of XML that contains the most-recent posts from my blog. Each post is represented as lovingly crafted HTML, escaped per the Atom specs. When Bloglines gets its hands on this very same HTML, it attempts to scrub it nice and clean – get rid of any naughty bits, you know. And there’s nothing wrong with that. Except when the scrubbing goes horribly, horribly wrong. Which is exactly what happens when Bloglines encounters perfectly legitimate markup that represents syntax-highlighted code snippets.

What does Bloglines do then? It strips out all of the significant white space, turning each block of code into a single, mile-long, unbreakable line of NoSpaceText that forces your web browser to expand the page until it is wide enough to enshroud a small solar system. Then you are forced to scroll forever to read each line of the text column. Ugh.

More specifically, each syntax-highlighted code block is represented in HTML as a preformatted (PRE) text block. Each word in that block is wrapped in a SPAN element whose class attribute indicates the word’s role in the original source code. Keywords get one class, identifiers another, and so on. For example, the code “import List” might be represented as follows:

<span class="kwd">import</span> <span class="name">List</span>

But when Bloglines gets its hands on that markup, it strips out the white space between the SPAN elements:

<span class="kwd">import</span><span class="name">List</span>

Thus the markup renders as “importList” when it hits your web browser. Now imagine the same space-denuding bad behavior applied to all of the inter-element white space in a full-length block of code. That’s right, what you end up with is a single, insanely long LineOfUnbreakableText that your web browser chokes on. Again: Ugh.
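To make the failure concrete, here is a minimal Python sketch that reproduces the observed effect. It is not Bloglines's actual scrubbing code, which I have not seen. The function names and the regular expression are my own assumptions, chosen only to mimic the behavior described above.

```python
import re
from html.parser import HTMLParser

# The example markup from this post.
SNIPPET = '<span class="kwd">import</span> <span class="name">List</span>'


def strip_inter_element_whitespace(markup: str) -> str:
    """Mimic the faulty scrubbing: drop whitespace-only text between tags.

    Hypothetical reconstruction of the symptom, not the real implementation.
    """
    return re.sub(r">\s+<", "><", markup)


class _TextCollector(HTMLParser):
    """Collects the character data a browser would display."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def rendered_text(markup: str) -> str:
    """Return the text content of the markup with all tags removed."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


print(rendered_text(SNIPPET))                                  # import List
print(rendered_text(strip_inter_element_whitespace(SNIPPET)))  # importList
```

Inside a PRE block every one of those white-space text nodes is significant, so a scrubber has to pass them through untouched.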

The folks at Bloglines have had similar problems in the past, most of which have been fixed. I hope they fix this particular problem soon, too.

Until that time, however, you might want to consider other feed readers.


Claiming my blog on Bloglines

Posted by Tom Moertel Wed, 05 Jul 2006 14:44:00 GMT

Bloglines now offers a way to claim your blogs. Ordinarily, I never bother to do stuff like this. But Bloglines has at least three versions of my blog in their catalog. I would like to consolidate these into a single entry, something Bloglines claims I can do if I register my blog.

The registration process is somewhat annoying, but I can see why it is necessary. In short, I must add a Bloglines-given identifier to my blog’s HTML template. Then I must add another Bloglines-given identifier to a blog post. These allow Bloglines to verify that the blog’s website and feed are both under my control.

I’ll let you know how it goes.

Update: I was able to claim my blog but not its outdated entries in Bloglines's catalog. Neither Bloglines's instructions nor error reporting is specific enough for me to figure out what is wrong. I'm giving up for now.

Update 2 (2006-07-15): I was able to claim one of the outdated entries that had caused Bloglines to choke before. The other entries, however, are still problematic. Bloglines does have more-descriptive error reporting now, but those reports do not inspire confidence:

Verification Failed:
An Unidentified Error occured [sic] while talking to (null) or http://blog.moertel.com/xml/rss/feed.xml?snip=start.

Capitalizing “Unidentified Error” is a nice touch: it makes the error seem both mysterious and important.

Update 3 (2006-07-20): All of my feeds on Bloglines are now consolidated under a single entry. What's the secret? Ruthless URL canonicalization. I perused my logs and found all of the various URLs that Bloglines was using to access my feed. Then I configured my front-end proxy server to redirect (301 permanent) all of them to my preferred feed URL. After a week or so, Bloglines's software took the hint and consolidated the entries.
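The canonicalization idea can be sketched in a few lines of Python. This is only an illustration of the technique, not my proxy configuration. The paths below are hypothetical placeholders, and the choice to ignore the query string when matching is an assumption made for the example.

```python
from http import HTTPStatus
from typing import Optional, Tuple

# Hypothetical values for illustration only.
CANONICAL_FEED = "/feed.atom"
FEED_ALIASES = {"/old/rss.xml", "/old/atom.xml", "/old/rss20.xml"}


def canonical_redirect(request_path: str) -> Optional[Tuple[int, str]]:
    """Return (301, canonical URL) for a known alias, or None otherwise.

    The query string is ignored when matching (an assumption), so that
    variants such as "?snip=start" collapse onto the same feed.
    """
    bare_path = request_path.split("?", 1)[0]
    if bare_path in FEED_ALIASES:
        return (int(HTTPStatus.MOVED_PERMANENTLY), CANONICAL_FEED)
    return None  # Not an alias: serve the request normally.


print(canonical_redirect("/old/rss.xml?snip=start"))  # (301, '/feed.atom')
print(canonical_redirect("/feed.atom"))               # None
```

The permanent status is the point: it tells a well-behaved client to replace the old URL with the new one rather than keep asking for both.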
