A type-based solution to the “strings problem”: a fitting end to XSS and SQL-injection holes?

Posted on
Tags: haskell, ruby, types, strings, testing, safestrings, sqlinjection, xss

Even skilled programmers have a hard time keeping their web applications free of XSS and SQL-injection vulnerabilities. And it shows: a sobering portion of web sites are open to some scary security threats.

Why are so many sites vulnerable to these well-known holes? Probably because it’s insanely hard for programmers to solve the fundamental “strings problem” at the heart of these vulnerabilities. The problem itself is easy to understand, but we humans aren’t equipped to carry out the solution. Simply put, we just plain suck at keeping a bazillion different strings straight in our heads, let alone consistently and reliably rendering their interactions safe whenever they cross paths in a modern web application. It’s easy to say, “just escape the darn things,” but it’s hard to get it right, every single time.

Computers, on the other hand, are pretty good at keeping track of details by the bucket-full. Wouldn’t it be nice, then, if we could delegate this nasty “strings problem” to our computers, which could then devote their unwavering mechanical precision to grinding the problem out of existence? Isn’t that the kind of thing modern programming languages are supposed to be good at?

I’d like to think the answer to that question is a big, you betcha.

So let’s grab a modern programming language and solve the strings problem.

Let’s solve the strings problem in Haskell

In this article, we will look at one way (among many) to solve the strings problem: by adding Ruby-style string templates to Haskell. These templates support “interpolation” via the usual, convenient #{var} syntax, but here interpolation is type safe. Haskell’s type system will prevent us from inadvertently mixing incompatible string types, and it will detect mistakes at compile time, before they can become live XSS or SQL-injection holes. Further, our solution will offer us these benefits without making us jump through hoops or pay some onerous syntax penalty.

To be more specific, the system offers the following benefits:

(This is a long one, so grab an espresso, lean back, and read on in style. Also, if you have a smoking jacket, you might want to get it now.)

Before I describe this Haskell-based solution, let’s take a closer look at the strings problem and review why a type-based approach makes sense. (If you already understand the strings problem and are convinced that it is both important and tricky to solve, feel free to skim the first third of this article.)

Examining the “strings problem”

Most web applications are just business-logic-driven string processors. They take strings from user-submitted forms, database queries, web-service responses, templates, and myriad other sources, and they combine the strings to generate yet more strings, which they emit as output and fling across the Internet, into your web browser.

For example, consider this snippet of Ruby (on Rails) code that I used to add submit-to-Reddit and submit-to-del.icio.us buttons to articles on my blog:

When writing this code, I had to keep track of at least three different kinds of strings:

In code like this, each type of string must conform to the requirements of its own little language, and it’s the programmer’s job – your job – to make sure that differences in these requirements are accounted for when combining strings. Getting it right is a difficult trick to pull off, and getting it right consistently is something even the best developers have difficulty doing.

In the tiny snippet of code above, for example, I had to remember to do all of these things:

  1. URL-escape (using the u helper method) the article’s title before inserting it into the submit-URL template
  2. URL-escape the URL for the article’s permalink before inserting it into the submit-URL template
  3. HTML-escape (using the h helper method) the final, expanded submit-URL template before inserting it into the hypertext-link template
  4. HTML-escape the submit-title (e.g., “Submit to Reddit”) before inserting it into the hypertext-link template
  5. HTML-escape the article’s title before inserting it into the hypertext-link template

That’s a lot to keep track of when coding.

But that’s not all. I also had to know not to escape the result of calling image_tag, because that helper method returns an HTML fragment, which is already in the language of the hypertext-link template into which it is inserted. Escaping it would have turned the image-element markup into embedded text that happens to look a lot like HTML markup.

And that’s not the worst of it. If you screw up any one of these steps for the typical web application, you open the door to a host of nasty problems. If you’re lucky, the damage will be contained to broken links or a rendering problem that most people won’t notice, maybe a weird database error now and again. In the worst case, however, you’re screwed: Your application’s customers become vulnerable to cross-site-scripting (XSS) attacks and your database is opened to injected SQL, through which enterprising crackers might steal your customers’ account data or do even nastier things.

Clearly, the strings problem is common enough and nasty enough to merit our attention. Many of our favorite problem-stomping practices, however, have not proved effective on the ever-tricky strings problem.

Unit testing is an inefficient solution to the strings problem

Unit testing is one of the most efficient programming practices for increasing the quality of software. If you write unit tests pervasively as you code, you are likely to nip many kinds of programming problems in the bud, saving time and effort, which you can then re-invest in your code. Further, unit-testing suites make for swell regression-detection nets and thus free you to refactor crufty code without fear of introducing breakage elsewhere. As a result, you’re more likely to keep your code lean and mean.

Despite its general effectiveness, unit testing is an inefficient way to defend against the perils of the strings problem. That’s because the strings problem is caused by knowledge deficits, which you can’t test for. If you don’t realize that you must escape one URL before you stuff it into another URL, you probably won’t think to write tests for that requirement.

Moreover, if you do think to write the tests, it’s expensive to get them right. In most unit testing scenarios, getting the tests right is usually easier or at least comparable in difficulty to getting the code that’s being tested right. That’s why unit testing is usually so efficient. For the strings problem, however, getting the tests right is often much more expensive than writing typical string-handling code. In my code sample above, for example, there are at least six ways the strings problem can cause trouble. How do you test for them all without making a mistake? It’s not easy.

In sum, unit testing probably isn’t the answer to the strings problem.

Other solutions to the strings problem

If unit testing isn’t the answer, what is?

Joel Spolsky wrote about the strings problem and suggested that using Hungarian notation was an effective solution. It might work, but it’s clunky.

In the database-programming world, many programmers have adopted the convention of never inserting a string into a SQL template by hand. Instead, they insert placeholders, typically question marks, into a template to indicate where they would like strings to be inserted. The template and the strings are then given to a special function that safely inserts the strings, escaping them as necessary. In Ruby on Rails, which has a fairly typical implementation, template expansion looks like this:

The question-marks-in-the-template solution is effective, but it’s also clunky, especially when you’re trying to insert a lot of strings. By comparison, Ruby’s native string-interpolation feature, in which the syntax #{…} lets us inject strings into a string template, is unsafe but much easier to follow:

In sum, the Hungarian-notation solution and the question-marks solution are reasonable responses to the strings problem, but both are clunky, especially when compared to the straightforwardness of good-old string interpolation.

Perhaps we can do better.

Eating and having one’s cake: a type-based solution

An ideal solution would combine the safety of the question-marks solution with the straightforward convenience of string interpolation, and it would work for all kinds of strings, not just SQL, and, because I’m implementing it in Haskell, it would lovingly nestle into Haskell’s type system and gain the full benefits of type-inferencing goodness.

How would it work? Well, let’s back up and think about strings for a moment. We can divide strings into two classes: (1) those that represent text, in which every character represents literally itself; and (2) those that represent fragments of interpreted languages, such as XML or SQL, where each character’s interpretation depends on the rules of the associated language. In text, for example, an ampersand (“&”) represents an ampersand, but in XML an ampersand represents the start of a character-entity reference.

It doesn’t make sense, then, to join text strings directly with language-fragment strings. If you did join them, text characters could be misinterpreted as language characters. For the same reason, it doesn’t make sense to join fragments of different languages together. (It does make sense, however, to escape text strings or language fragments “into” a target language and then join them with strings in the target language.)

A sound solution, therefore, should enforce the following fundamental, safe-string-handling rule: Do not allow strings that represent fragments of one language to be directly joined with strings that represent either plain text or fragments of another language.

The trick is making the computer enforce this rule for us. As it turns out, modern type systems absolutely love to do this kind of thing.

A solution to the strings problem in Haskell

Making the computer enforce our safe-string-handling rule in Haskell is fairly easy. All it takes is a little code. (As we go through the following code, remember that we’re writing a library. Normally, as users of the library, this code would be invisible to us.)

To begin, we create a module for our code and export the essential types and functions that make up our about-to-be-written safe-string kernel:

In order to create safe strings that correspond to particular languages, we need to tell the computer what we mean by Language:

Here we’re saying that Language is the class of languages, i.e., all data types l for which we can provide four functions:

  1. litfrag – converts a string that represents a language fragment into a language fragment
  2. littext – converts a string that represents plain text into a language fragment that represents the text (via escaping)
  3. natrep – converts a language fragment, verbatim, into a string that represents the language fragment
  4. language – returns the name of the language associated with a given fragment

Further, we need to declare a few “language laws” that conforming Language types must obey. These laws are for us. They will keep us honest when teaching the computer about new languages. Here are the two laws we will require language types to satisfy:

  1. natrep (litfrag s) = s
  2. natrep (littext s) = escapeL s

The first law requires that (natrep . litfrag) be equivalent to the identity function for strings. The second law requires that (natrep . littext) be equivalent to the text-escaping function for a given language L. For example, for the language XML:

Next, let’s construct a type-safe container for strings having a known language:

This data-type definition says that if l is a language, we can construct SafeString values for that language. Each value can represent an empty fragment of the language (via SSEmpty), a non-empty fragment of the language (via SSFragment), or the concatenation of two other SafeString values for the language (via SSCat).

Now comes the interesting part. We are going to use the type system to enforce the safe-string-handling rule for us.

We will do this using the SafeString data type we just defined. We have already placed the data type’s definition into a module that does not export the type’s data constructors. That means we will not be able to create SafeString values for ourselves. Instead, we must ask a small set of kernel functions, which are exported, to create the values on our behalf.

These kernel functions, which we are about to write, will create SafeString values only in accordance with our safe-string-handling rule. In particular, they will require us to certify that an existing string represents either text or a language fragment before creating a corresponding SafeString value for us. From then on, the type system will know which language the string is associated with and prevent us from joining it to regular strings or to SafeString values associated with other languages.

Let’s write these constructor functions now:

Here’s what the functions do:

Once the kernel creates SafeString values for us, we need some way to combine them safely. Thus we define the +++ operator and the cat function:

Finally, we need a way to convert SafeString values into normal strings so that we can pass them through the boundaries of our safe-string-protected code and into the outside world. For this, we write the render function:

As a convenience, let’s round out our kernel with a Show instance that tells Haskell how to format SafeString values for display.

And that’s our SafeStrings kernel.

Another look at the SafeStrings kernel

The following illustration, complete with poorly chosen colors, provides a visual summary of our system:

Stunning visual interpretation of the SafeStrings kernel and its relationship to the evil outside world

(Ignore the $(q “…”) stuff for the moment, we’ll talk about it later.)

Activating our mad art-interpretation skillz, we can now decipher the illustration:

Regular strings gain “admittance” to the SafeStrings kernel only via the text and frag certification functions, which we use to create corresponding safe strings for a given language. Once created, the safe strings live their entire lives in the fleshy-colored, egg-shaped protective sac that is the kernel, whose safe-string functions and operators use Haskell’s type system to prevent us from accidentally mixing the strings in unsafe ways. Further, because the kernel does not export its underlying data structures, we can’t screw around with the innards of our safe strings to break the kernel’s promises. When our safe strings have finally reached their ultimate, beautiful state, we can render them into regular strings and pass them bravely into the cruel outside world – where, most likely, somebody else’s broken code will screw them up anyway. But at least we tried.

Our first SafeString module: SafeXml

Now that we have written our SafeStrings kernel, let’s use it to create a SafeXml module that we can use for working with XML. Again, we will be writing library code that under normal circumstances would be hidden from view.

First, we will create a new module that uses the SafeStrings kernel:

Next, we will create a wrapper type to testify that a string represents a fragment of XML:

If you go back and look at the export list for the module, you’ll see that the XmlString data type is not exported. It is internal to the module, and thus we, as clients of the module, can’t create values of that type. That means we can’t “forge” XML strings into existence. We can create them only through the safe-string kernel, and even then only by certifying a regular string as representing text or a language fragment.

Like all good language types, XmlString needs to be a member of the Language type class, so we provide the necessary instance functions:

Note that the functions satisfy the language laws we defined earlier.

Next, we need to write a function to implement the escaping rule for XML:

Next, because we expect to work with XML frequently, we will create a convenient type synonym, Xml, for SafeString values that represent XML:

Finally, we will create a few convenience functions to create and render XML fragments. These functions are identical to the SafeString kernel’s frag and render functions but for the Xml type exclusively. When we use these functions, we won’t need to provide additional type annotations; the computer will know we are dealing with XML strings:

And we’re done.

Before going on, let me point out two things:

  1. If you think the code we have written so far is long or perhaps confusing, please remember that it is library code. Typically, you would never see it. All you would do is import SafeXml and start using the library.
  2. The SafeXml implementation is formulaic, and we can replace all of it except for the escaping function’s definition with a single line of code, something we will do later.

A quick test drive of our SafeXml module

Let’s give our SafeXml module a spin in the GHC interactive shell.

We can create an XML fragment by certifying that a regular string represents a language fragment and telling Haskell that we expect a result of type Xml.

Ok, modules loaded: SafeXml, SafeStrings.
*SafeXml> frag "<em>wow!</em>" :: Xml

Note how the output is prefixed with the label “xml:” to tell us that our kernel certifies this value to represent an XML fragment.

Because entering type annotations can be inconvenient, we can instead use the xml function, which certifies a string not just as a fragment but as an XML fragment:

*SafeXml> xml "<em>wow!</em>"

If we want to represent text in XML, the kernel will automatically escape it for us:

*SafeXml> text "ham & eggs" :: Xml
xml:"ham &amp; eggs"

Now let’s try to do something naughty. Will the type system let us?

*SafeXml> let someXml = xml "<em>Hi!</em>"
*SafeXml> let plainOldText = "ham & eggs"
*SafeXml> someXml ++ plainOldText

    Couldn't match `[a]' against `Xml'
      Expected type: [a]
      Inferred type: Xml
    In the first argument of `(++)', namely `someXml'
    In the definition of `it': it = someXml ++ plainOldText

In Haskell, the ++ operator is used to join strings. In the code above, we tried to use this operator to join an XML fragment to a plain-old string, which would have violated our safe-string-handling rule. Fortunately, we were unable to fool the type system into allowing this ill-conceived union to occur.

In fact, the union was never even attempted: our mistake was caught at compile time, before the code was ever converted into executable form. This is a big deal. Mistakes like this are programming errors that open security holes. Being able to catch these errors at compile time means you have the opportunity to track the errors to their source and fix them there. If you caught ill-conceived string unions only at run time, the logical errors that led to the attempted unions could have been in upstream code that has already executed – launching the missiles, perhaps. By then, it may be too late to undo the consequences.

Returning to our example, if we certify that the plain-old string represents text, we can make a safe union, so the type system lets us go ahead:

*SafeXml> someXml +++ text plainOldText
xml:"<em>Hi!</em>ham &amp; eggs"

And that’s basically all there is to it.

Syntactic sugar for safe strings

Not having to worry about the strings problem is fabulous and all, but having to type in frag, text, and +++ is kind of clunky. Let’s get rid of the clunkiness by introducing some syntactic sugar.

The common case when dealing with strings in web applications is templates. For example, here’s a simplified version of the link_to method from the deservedly popular Ruby on Rails. The method wraps a hypertext link around some content by “interpolating” the content and a URL into a link template:

In this code, we need to HTML-escape the URL (via the h helper) before interpolating it into the template. We do not need to escape the content, however, because it is already in the template’s language, XHTML.

Now, to introduce our syntactic sugar, here’s link_to rewritten in Haskell and using safe strings:

The type signature makes clear to everybody that the content parameter is XHTML, the url parameter is a URL, and the result is XHTML. The signature isn’t needed, but link_to is the stuff of libraries, and so annotations are good form.

The interpolation syntax is like Ruby’s, but with slightly different modifiers:

It’s pretty easy to tell which interpolation option is right for any situation, but late-night coding sessions make fools of us all. That’s why the type system is there to catch us when we make a dumb mistake.

Let’s try out the sugary link_to method:

> link_to (text "Tom's Weblog") (url "http://blog.moertel.com/")
    xml:"<a href="http://blog.moertel.com/">Tom's Weblog</a>"

Let’s take advantage of type inferencing in the next example:

> link_to $(q "<em>Espresso!</em>")
          $(q "http://google.com/search?q=espresso&oe=utf-8")

    xml:"<a href="http://google.com/search?q=espresso&amp;oe=utf-8">

This time we just supplied templates as input parameters. Haskell figured out their types and took care of the escaping for us.

Now that we know what the syntactic sugar looks like, let’s see how to implement it.

Implementing the syntactic sugar using Template Haskell

We implement the SafeString library’s syntactic sugar using Template Haskell. A small function q parses the sugared syntax at compile time and emits equivalent code using our safe-string functions frag, text, and so on. For example, the following sugar:

becomes the following code:

The code that makes it happen is fairly straightforward if you know Template Haskell, so I’ll skip the explanation because this article is already way too long. As usual, it’s library code, so normally we wouldn’t see it or care about it. All we care about is the $(q) sugar that the code makes available to us.

Here it is:

More sugar: defining additional safe-string types

One additional bit of Template Haskell code, which I won’t reprint here, defines declareSafeString. This function lets us eliminate the boilerplate code when defining new safe-string types. For example, compare our earlier definition of the SafeXml module with the following implementation of a module for safe URL strings:

The final line generates the boilerplate code for the wrapper type, the language definition, the Url type synonym, and the url and renderUrl language-specific convenience functions.

One big example to wrap things up

Because we have been discussing mainly library code, let’s take a step back and see some typical user-level code that uses safe strings. After all, that’s what counts.

Here is a Haskellized, safe-strings version of the Ruby code that I presented at the beginning of the article to add submit-to-Reddit and submit-to-del.icio.us buttons to my blog:

The code looks fairly similar to the original Ruby code, with the exception of some extra backslashes, courtesy of Haskell’s rather-unfortunate syntax for multi-line string constants.

The other big difference is that, in this version, the type system has automatically checked the code for strings-problem errors.

For completeness, here is the example’s supporting code . This code also makes extensive use of safe-string templates:

Have we done it?

Have we rid ourselves of the strings problem? If we use a programming language like Haskell and a library like SafeStrings, I think we can answer yes.

To be clear, the fundamental problem of having to manage different kinds of strings is still with us. As programmers, we still must understand the differences between URLs, XML, SQL, untrusted user input, and so on. But now, we don’t have to be perfect. As long as we can reliably slap the right type on a string when it first appears – a responsibility that for the most part can be carried out by libraries and frameworks – we can let the computer worry about it from then on. If we forget to escape the string later, as it winds its way through the twisty code of a large web application and interacts with other strings in potentially dangerous ways, the computer will catch our mistake – at compile time, before it can possibly become a live security hole.

But if slapping the right types on strings – certifying them – is a pain in the neck, we won’t do it. We will happily go back to our days of winging it, where every string interaction becomes an opportunity for a perfectly human mistake to give birth to a nasty security vulnerability.

That’s why syntax matters. That’s why Template Haskell, Lisp macros, and other meta-programming tools are important: they let us craft friendly syntaxes that encourage the use of programming aids like SafeStrings. That’s why type inferencing is important: it lets us do away with redundant annotations and makes working with types convenient, so we can reap the benefits of strong guarantees without having to pay prohibitive costs.

If there is a moral to this story, it’s that modern type systems and macro systems are powerful tools. They let us do things that otherwise would be impractically inconvenient. They extend our reach as programmers and let us solve problems that we couldn’t solve before.

Update: minor edits for clarity.