If unit testing can't keep Rails safe from string-escaping problems, what makes you think it will keep your projects safe?

Posted by Tom Moertel Thu, 12 Oct 2006 20:06:00 GMT

Recently I wrote about unit testing being a tool, not a goal in itself. I argued that unit testing was not a reliable way to fight certain kinds of common coding errors and, therefore, that unit testing ought to be supplemented with other tools.

To support my argument, I gave an example of a common, important coding error that unit testing does a bad job of helping programmers control. That error is failing to manage and escape strings properly: the “strings problem.” It is the mother of XSS and SQL-injection security vulnerabilities, not to mention the cause of legions of broken links and bad HTML on the web.

If you think I’m overstating the problem, or if you think that unit testing is a good way of solving it, let me show you how easy it is for even smart developers to get it wrong.

Consider Ruby on Rails, a great framework for developing web applications. Rails has an extensive suite of unit tests, and the Rails development guidelines require that changes to Rails be accompanied by unit tests that “prove [the] change works.”

Now consider that one of Rails’s most-used and most-scrutinized methods – the venerable link_to helper – contains a fundamental string-escaping error:

require 'rubygems'
require_gem 'rails'
include ActionView::Helpers::UrlHelper

url = "http://example.com?ohms_law?volt=1&amp=3" 
puts link_to("TEST", url)

The code, when executed, prints the following HTML snippet:

<a href="http://example.com?ohms_law?volt=1&amp=3">TEST</a>

The HTML snippet represents a hypertext link. The link should point to the URL given in the code, but because the URL was not properly escaped when it was converted into HTML by the link_to helper, the link is broken:

CORRECT:  http://example.com?ohms_law?volt=1&amp=3
LINK_TO:  http://example.com?ohms_law?volt=1&=3
                                             ^ oops

Here’s what’s going on. Because the URL was not escaped, web browsers misinterpret its “amp” parameter as a character-entity reference, which gets gobbled up when the link’s href attribute is parsed. (To see this for yourself, save the output of the Ruby code into an HTML file, open the file with your favorite web browser, and see where the link points.)

Now, how come the unit tests didn’t catch this problem? It turns out, the tests got it wrong, too, by expecting broken output:

# in url_helper_test.rb

def test_link_tag_with_query
  assert_dom_equal \
    "<a href=\"http://www.example.com?q1=v1&amp;q2=v2\">Hello</a>",
    link_to("Hello", "http://www.example.com?q1=v1&amp;q2=v2")
end

The point isn’t that the Rails developers are dumb. The point is that the Rails developers are smart. If they can’t get the strings problem right, even with all their brains and all their unit testing, what reason does any programmer have to think that unit testing is going to solve this problem reliably?

If, then, you want to solve the strings problem – and you really, seriously ought to want to solve the strings problem – you should consider options beyond unit testing.

Update 2007-09-04: I just noticed that the documentation for link_to has been revised to state that if you pass a string as its options parameter, the string will be interpreted not as a URL but as an HTML href attribute value, that is, an HTML-encoded URL. The old documentation:

def link_to(name, options = {}, html_options = nil, *parms)
Creates a link tag of the given name using an URL created by the set of options.... It’s also possible to pass a string instead of an options hash to get a link tag that just points without consideration.

The relevant part of the revised documentation:

It’s also possible to pass a string instead of an options hash to get a link tag that uses the value of the string as the href for the link.

So, according to the updated documentation, the test I described in my article is actually correct. Does this mean that string-handling code is Rails is worry free? The existence of helper methods like fix_double_escape suggests the answer is no.

Posted in , ,
Tags , , , ,
5 comments
no trackbacks
Reddit Delicious

Comments

  1. Levi said about 1 hour later:

    Arg. I was patiently awaiting your Haskell solution to this problem, and thought, when this article showed up in my feeds, that you were going to present it here.

    I will now go back to waiting patiently. :)

  2. John Nowak said 115 days later:

    Aye.. What other sort of solution was possible here? Whoever wrote the test expected broken behaviour. He tested for broken behaviour. His test passed, and hence, we got broken behaviour. No language can save a programmer from herself if she doesn’t know what her own functions are supposed to be doing.

  3. Tom Moertel said 115 days later:

    John, what other sort of solution was possible? Take a look at a type-based solution to the strings problem for one.

    Cheers! —Tom

  4. Samuel said 148 days later:

    The problem is inadequate specification. Unit tests can back up specifications, but if the specs are wrong, the unit tests necessarily must also be wrong.

    The problem IS NOT the practice of unit testing. It’s with the client’s expectation of how link_to works.

    Your arguments against unit testing so far have been thin and vacuous.

    However, I agree overall that unit tests aren’t for everything. Unit tests are best at isolating “corner cases”, but do relatively little for the “common cases.” For example, to prove the correctness of a 2048-bit modulo integer multiplication routine, for example, you’d never want to use unit tests. :)

    What I’m saying is that your thesis is good, but the supporting argument is bad.

    Based on reading this article, I’m going to predict that the Haskell-based solution is to use “newtype” on String to identify unescaped versus escaped strings, thus making it patently clear to the coder (and the compiler) what is expected of the function. This DOES have the effect of reducing unnecessary unit tests. But it won’t eliminate unit tests entirely.

    There’s always been a major war between the “formal methods” and “unit testing” camps. The truth is, as in all chaotic systems, the interesting parts are in the interfaces between the two camps. If you want excellent code coverage AND provably correct code, you must use BOTH techniques.

  5. Tom Moertel said 149 days later:

    Samuel,

    Thanks for your comments.

    I’m not sure that you accurately understand my claims. If you have not seen it (and it seems like you have not), please read my article Unit testing is a tool, not a goal, which explains what I believe.

    To be clear, I’m not saying that unit testing is bad, or that the problem with bad code is the practice of unit testing. I’m saying that unit testing should not be our focus. Our focus ought to be confidence in our code. For many kinds of situations, unit testing is an efficient way to establish that confidence. For others, the strings problem being one of them, unit testing doesn’t work so well.

    Based on reading this article, I’m going to predict that the Haskell-based solution is to use “newtype” on String to identify unescaped versus escaped strings …

    It’s actually a bit more complex (involving type classes, newtypes for different kinds of strings, and Template Haskell), but that’s the gist of it. Please read A type-based solution to the strings problem if you’re interested.

    Cheers,
    Tom

Trackbacks

Use the following link to trackback from your own site:
http://blog.moertel.com/articles/trackback/184

(leave url/email »)

   Comment Markup Help Preview comment