If unit testing can't keep Rails safe from string-escaping problems, what makes you think it will keep your projects safe?

By
Posted on
Tags: rails, testing, strings-problem, escaping

Recently I wrote about unit testing being a tool, not a goal in itself. I argued that unit testing was not a reliable way to fight certain kinds of common coding errors and, therefore, that unit testing ought to be supplemented with other tools.

To support my argument, I gave an example of a common, important coding error that unit testing does a bad job of helping programmers control. That error is failing to manage and escape strings properly: the “strings problem.” It is the mother of XSS and SQL-injection security vulnerabilities, not to mention the cause of legions of broken links and bad HTML on the web.

If you think I’m overstating the problem, or if you think that unit testing is a good way of solving it, let me show you how easy it is for even smart developers to get it wrong.

Consider Ruby on Rails, a great framework for developing web applications. Rails has an extensive suite of unit tests, and the Rails development guidelines require that changes to Rails be accompanied by unit tests that “prove [the] change works.”

Now consider that one of Rails’s most-used and most-scrutinized methods – the venerable link_to helper – contains a fundamental string-escaping error:

require 'rubygems'
require_gem 'rails'
include ActionView::Helpers::UrlHelper

url = "http://example.com?ohms_law?volt=1&amp=3"
puts link_to("TEST", url)

The code, when executed, prints the following HTML snippet:

<a href="http://example.com?ohms_law?volt=1&amp=3">TEST</a>

The HTML snippet represents a hypertext link. The link should point to the URL given in the code, but because the URL was not properly escaped when it was converted into HTML by the link_to helper, the link is broken:

CORRECT:  http://example.com?ohms_law?volt=1&amp=3
LINK_TO:  http://example.com?ohms_law?volt=1&=3
                                             ^ oops

Here’s what’s going on. Because the URL was not escaped, web browsers misinterpret its “amp” parameter as a character-entity reference, which gets gobbled up when the link’s href attribute is parsed. (To see this for yourself, save the output of the Ruby code into an HTML file, open the file with your favorite web browser, and see where the link points.)

Now, how come the unit tests didn’t catch this problem? It turns out, the tests got it wrong, too, by expecting broken output:

# in url_helper_test.rb

def test_link_tag_with_query
  assert_dom_equal \
    "<a href=\"http://www.example.com?q1=v1&amp;q2=v2\">Hello</a>",
    link_to("Hello", "http://www.example.com?q1=v1&amp;q2=v2")
end

The point isn’t that the Rails developers are dumb. The point is that the Rails developers are smart. If they can’t get the strings problem right, even with all their brains and all their unit testing, what reason does any programmer have to think that unit testing is going to solve this problem reliably?

If, then, you want to solve the strings problem – and you really, seriously ought to want to solve the strings problem – you should consider options beyond unit testing.

Update 2007-09-04: I just noticed that the documentation for link_to has been revised to state that if you pass a string as its options parameter, the string will be interpreted not as a URL but as an HTML href attribute value, that is, an HTML-encoded URL. The old documentation:

def link_to(name, options = {}, html_options = nil, *parms)
Creates a link tag of the given name using an URL created by the set of options…. It’s also possible to pass a string instead of an options hash to get a link tag that just points without consideration.

The relevant part of the revised documentation:

It’s also possible to pass a string instead of an options hash to get a link tag that uses the value of the string as the href for the link.

So, according to the updated documentation, the test I described in my article is actually correct. Does this mean that string-handling code is Rails is worry free? The existence of helper methods like fix_double_escape suggests the answer is no.