A type-based solution to the "strings problem": a fitting end to XSS and SQL-injection holes?

Posted by Tom Moertel Thu, 19 Oct 2006 01:40:00 GMT

Even skilled programmers have a hard time keeping their web applications free of XSS and SQL-injection vulnerabilities. And it shows: a sobering portion of web sites are open to some scary security threats.

Why are so many sites vulnerable to these well-known holes? Probably because it’s insanely hard for programmers to solve the fundamental “strings problem” at the heart of these vulnerabilities. The problem itself is easy to understand, but we humans aren’t equipped to carry out the solution. Simply put, we just plain suck at keeping a bazillion different strings straight in our heads, let alone consistently and reliably rendering their interactions safe whenever they cross paths in a modern web application. It’s easy to say, “just escape the little buggers,” but it’s hard to get it right, every single time.

Computers, on the other hand, are pretty good at keeping track of details by the bucket-full. Wouldn’t it be nice, then, if our programming languages gave us the power to delegate this nasty “strings problem” to our computers, which could then devote their unwavering mechanical precision to grinding the problem out of existence? Isn’t that the kind of thing modern programming languages are supposed to be good at?

I’d like to think the answer to that question is a big, you betcha.

So let’s grab a modern programming language and solve the strings problem.

Let’s solve the strings problem in Haskell

In this article, we will look at one way (among many) to solve the strings problem: by adding Ruby-style string templates to Haskell. These templates support “interpolation” via the usual, convenient #{var} syntax, but here interpolation is type safe. Haskell’s type system will prevent us from inadvertently mixing incompatible string types, and it will detect mistakes at compile time, before they can become live XSS or SQL-injection holes. Further, our solution will offer us these benefits without making us jump through hoops or pay some onerous syntax penalty.

To be more specific, the system offers the following benefits:

  • It provides a string-management kernel that lets you create “safe strings” by certifying a regular string as representing either text or a fragment of a known language.
  • It allows you to conveniently define new language types for any string-based language that you can provide an escaping rule for (e.g., XML, URLs, SQL, untrusted user input).
  • It provides compile-time syntactic sugar (via Template Haskell) that makes working with safe strings as convenient as working with string interpolation in languages like Ruby and Perl.
  • It catches and reports (at compile time) the following commonly made programming errors:
    • failing to escape a plain-old-text string before mixing it into a string that represents a language fragment
    • mixing strings that represent fragments of incompatible languages
    • mixing strings that represent fragments of compatible languages in an ambiguous way (the system will force you to disambiguate)

(This is a long one, so grab an espresso, lean back, and read on in style. Also, if you have a smoking jacket, you might want to get it now.)

Read more...

Posted in , , , , , ,
Tags , , , ,
37 comments
no trackbacks
Reddit Delicious

Talk: Embedded domain-specific languages for Perl

Posted by Tom Moertel Tue, 14 Mar 2006 22:39:00 GMT

Last week I gave a brief talk for the Pittsburgh Perl Mongers about embedding domain-specific languages into Perl. The slides from the talk are now available: Embedding an XHTML template language into Perl.

Title slide from my talk on embedding domain-specific languages into Perl

Posted in , ,
Tags , , ,
no comments
no trackbacks
Reddit Delicious

Wondrous oddities: R's function-call semantics

Posted by Tom Moertel Fri, 20 Jan 2006 23:02:00 GMT

Every so often, I am going to write about wondrous oddities – obscure programming-language features that are so cool they deserve wider notice. Today, in the first installment, I want to show you the function-call semantics of R, a great system for statistical computing.

You might not expect a statistics system to have a first-class programming language at it’s heart, but if you think about it, it does make sense. The R language, actually a dialect of the S language, is described as “a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.” All true. It gives me the feeling of an infix Lisp or Scheme whose syntax is slanted toward mathematics and vector operations. The language has an object layer, too, but that’s not why we are here.

No, we are here to look at R’s uncommonly interesting function-call semantics, in particular argument binding and evaluation. Let’s dig in.

Read more...

Posted in , ,
Tags ,
2 comments
no trackbacks
Reddit Delicious

Scope herding with delimited continuations

Posted by Tom Moertel Tue, 13 Sep 2005 14:24:00 GMT

Recently I took advantage of delimited continuations to create a more natural Haskell-based kernel for GIML. The amazing scope-herding abilities of reset and shift were the magic that made it possible.

Ready to get wild? Grab an espresso and read on.

Read more...

Posted in , ,
Tags , ,
2 comments
no trackbacks
Reddit Delicious

Closures and the professional programmer

Posted by Tom Moertel Tue, 30 Aug 2005 18:56:00 GMT

I came across Tim Bray’s thoughts on Ruby via the ever-delightful Lambda the Ultimate and found the following bit fascinating:

I’ve had access to languages with closures and continuations and suchlike constructs for years and years, and I’ve never ever written one. While I’m impressed by how natural this stuff is in Ruby, I’m still unconvinced that these are a necessary part of the professional programmer’s arsenal. [Emphasis mine.]

While Tim Bray may be unconvinced, I am a true believer.

Read more...

Posted in , , ,
Tags , ,
12 comments
1 trackback
Reddit Delicious

Cool stuff: Composable memory transactions

Posted by Tom Moertel Sat, 09 Apr 2005 18:10:00 GMT

If you write software that deals with concurrency, you are doubtless familiar with the painful limitations of the concurrency abstractions that most programming languages, runtimes, and operating systems offer us humble programmers. The one-big-select idiom, the need to impose a global ordering policy on lock taking, and the myriad things that can unexpectedly bite you in the behind when managing threads are not-so-subtle reminders that the programming world still has a few fundamental problems to solve.

Thus I was impressed when I read Tim Harris, Simon Marlow, Simon Peyton Jones, and Maurice Herlihy’s paper on Composable memory transactions a couple of months ago. The paper introduced some Very Cool Stuff (especially if you program in Haskell, for which there is an implementation available). More recently at the Links meeting at ETAPS (another cool thing a’brewing), the same team gave a talk on the subject: Concurrency Unlocked: transactional memory for composable concurrency. Check out the slides from the talk for a summary of the problem and the STM solution.

The gist is that today’s ubiquitous concurrency abstraction – the lock – is fundamentally at odds with the most successful technique we humans have for building complex systems: gluing simple systems together. Composable memory transactions, on the other hand, do not have this problem. As a result, they offer a fundamentally simpler and more mentally scalable solution for building complex concurrent systems.

To quantify this coolness, we have only to look at section 4.2 of the paper. In about 25 lines of code the authors give an implementation for what is effectively the heart of an instant messaging server. I am not kidding. Multiple writers with serialized writes into each channel, multiple readers on each channel with independent read positions and buffering. Yeah, it’s in there.

Do yourself a favor. Check out the slides from the talk and then read the paper. This is some seriously cool stuff. You ought to know about it.

Update 2007-01-16: Since I wrote this article, Software Transactional Memory has received a lot of attenttion. Here are a couple of pointers worth checking out:

Posted in , ,
Tags , , ,
no comments
no trackbacks
Reddit Delicious

It's official: I like Ruby

Posted by Tom Moertel Fri, 08 Apr 2005 16:00:00 GMT

Recently, I decided to spend some quality time with Ruby. Two weeks of coding later, I’m a happy man. Ruby is fun.

Did you hear that? Ruby is fun. It made my list of Legitimately Fun Programming Languages and, if I am being honest, even edged out Perl:

  1. Haskell
  2. Ruby
  3. Perl

(Haskell is still tops, but you knew that, right?)

Languages that don’t thrill me anymore include C, C++, C#, and Java. Sorry guys, you’re tedious. Hell, you don’t even listen to me. When I assign a string to a variable, why do you make me tell you again that the variable is a string? Couldn’t you figure that out for yourselves? It’s like the left hand side doesn’t know what the right hand side is doing! Yeeeesh!

Anyhoo, Ruby doesn’t make me repeat myself. Ruby also supports many of my favorite functional programming idioms. For instance, the inject method, available on all Enumerable objects, is actually my functional friend foldl:

foldl ( \x y -> ... ) z list   -- Haskell
list.inject(z) { |x,y| ... }   #  Ruby

And Ruby has lambda, callcc, and the strangely wonderful binding. (But sadly no tail-call optimization.) Meta-programming is well supported and seems to be fairly commonplace in Ruby culture. I like that.

If you haven’t tried Ruby, what’s holding you back? Give it a test drive. You never know, you might actually have fun.

Posted in ,
no comments
no trackbacks
Reddit Delicious

Writing a simple Ruby evaluator in Haskell

Posted by Tom Moertel Sat, 26 Mar 2005 00:06:00 GMT

In the last few days I have been learning Ruby, something I have had on my to-do list for a long time. Luckily, I now have a project for which Ruby on Rails is perfect, and so now is the perfect time to get more into Ruby.

Naturally, I am making much use of the second edition of “The Pickaxe,” (pragmatic) Dave Thomas’s book Programming Ruby (the first edition of which is available online). Overall, it is a great book: good organization, lively writing, and superb examples.

But I must say I have one source of frustration. I am a computer-language guy, and I frequently find myself thinking, that’s great, but what exactly does this mean? I’ll give you an example, which I found quite surprising.

Using the following Ruby code, you can create what is in effect your own while-loop construct:

def my_while(cond)
  break unless cond
  yield
  retry
end

i = 0
my_while i < 10 do
  print i
  i += 1
end  

# output: 0123456789

At first, this blew my mind. Why? Because while reading the book, I was building up an operational semantics for the Ruby language in my head, and my understanding of what it means to call a function (iterator) with a block was wrong. In my semantics, calling a function results in evaluating its arguments in the caller’s evaluation context, entering the evaluation context of the function, binding the argument values, passing in the associated block, and then evaluating the function body. When retry is called, I thus reasoned mistakenly, the evaluation stops and begins anew at the beginning of the function body, in this case, back at the break expression.

But, clearly, that cannot be what is happening. If it were, the loop would never terminate. The condition i < 10 would be evaluated only once – when the my_while function was called – and thus true would be forever bound to cond within the evaluation context of the function’s body.

At this point, my brain went into hyper-curiosity mode. Why – and more to the point, how – does this thing work? I started looking for the calling semantics of Ruby. (No luck finding them, btw.) Are arguments passed as thunks that get reevaluated upon each access? No, that seemed too wasteful and bizarre.

This is where the Pickaxe II let me down. It said, “retry will reevaluate any arguments to the iterator before restarting it.” Yes, clearly, that is what is happening. But how is it happening and what exactly does that simple English statement really mean?

So, after thinking about it, I concluded that what is going on is that a function call in Ruby works like this. Given a function f, a block b, and arguments xs, the call f(xs){b} means this:

  1. let k be the current continuation (i.e., just before the call)
  2. evaulate xs and bind the resulting values to f’s formal arguments
  3. bind b internally to the current block
  4. evaluate the body of f

Now, if inside of f’s body we encounter a retry, the evaluator basically calls k (with a nil argument, I expect). This jumps back to step 2, from which evaluation continues. Any side effects up to this point are retained (so we could have previously incremented i, for example), which is what eventually allows the code within the function body to choose an execution path which does not contain a retry expression, and thus avoid looping forever.

Just to make sure I really had the semantics down, I wrote an evaluator for a mini-Ruby in Haskell. (I find that I understand something better after I build it from the ground up.)

{-# OPTIONS -fglasgow-exts #-}

-- Tom Moertel <tom@moertel.com>
-- 2005-03-26

module MiniRuby where

import Control.Monad.Cont
import Control.Monad.Reader
import Control.Monad.State
import Data.List
import Data.Maybe

-- ========================================================
-- DATA TYPES

-- To keep things simple, there is only value type: string.

type Identifier = String
type Value = String

-- Evaluation occurs with in the continuation monad (which is used to
-- handle control flow), wrapped around a reader monad (which keeps
-- track of the calling context), wrapped around a state monad (which
-- keeps track of the evaluator's variables.)

type Env = [(Identifier, Value)] -- identifiers => values (strings)
type RubyEvalCxt a = ContT a (ReaderT FcallCxt (State Env)) a
data FcallCxt = FC { retryCall :: Exp
                   , blockCont :: Value -> Exp }

-- An expression is just something that can be evaluated to a value
-- within a ruby evaluation context.

type Exp = RubyEvalCxt Value

-- ========================================================
-- EVALUATOR

-- The following function evaluates an expression within a given
-- evaluation context.   The result is a value.

eval :: Env -> FcallCxt -> Exp -> Value
eval env fc =
    (`evalState` env) . (`runReaderT` fc) . (`runContT` return)

-- This version evaluates an expression at the "top level."

evalTop = eval [] $
    FC { retryCall = return "TOPLEVEL RETRY"
       , blockCont = const $ return "TOPLEVEL BLOCK" }

-- A function call takes a function, a list of variable bindings, and
-- corresponding block.  It then creates a new execution context,
-- binds the variables, and evaluates the function body.

fcall :: Exp -> [(Identifier, Exp)] -> Exp -> Exp
fcall fn args blk = callCC evalFn
  where
    evalFn fnCont = (`local` do { bindArgs; fn }) $ \fc -> 
        fc { retryCall = evalFn fnCont >>= fnCont
           , blockCont = const blk }
    bindArgs = mapM_ (uncurry (=:=)) args

-- Yield remembers the current continuation and passes control to the
-- associated block (which we get from the calling context).  For
-- extra spice, this version differs from Ruby's in that upon yielding
-- it makes the yielding function seem like a block to the block to
-- which it yields.  This lets the function and block yield back and
-- forth to each other, passing values along the way.

yield_ = yield "YIELD" -- yield w/o value

yield :: Value -> Exp
yield value = callCC $ \k -> do
    bc <- asks blockCont
    local (\fc -> fc { blockCont = k }) (bc value)

-- Retry restarts the current computation w/in the calling context's
-- continuation.

retry :: Exp
retry = do
    k <- asks retryCall
    k

-- ========================================================
-- VARIABLES

-- Bind associates a value with an identifier

bind :: Identifier -> Value -> Exp
bind i v = do
    modify ((i,v) :)
    return v

-- More convenient syntax (=:=) for binding

infixr 1 =:=
class Bindable v where (=:=) :: Identifier -> v -> Exp
instance Bindable Value where (=:=) = bind
instance Bindable Exp where i =:= e = bind i =<< e

-- Lookup the value associated with an identifier

val :: Identifier -> Exp
val i = gets $ fromMaybe (i ++ "=UNDEFINED") . lookup i

-- ========================================================
-- SAMPLE CODE

-- This code shows how "retry" works.  It is equivalent
-- to the following Ruby code:
-- 
--   def my_while(cond)
--     if cond
--       yield
--       retry
--     end
--   end
-- 
--   i = 0
--   my_while i < 10 do
--     i += 1
--   end  
-- 
--   i

test1 = do
    "i" =:= "0"
    my_while [("cond", condExp)] $
        "i" += 1 -- block, passed to my_while
    val "i"
  where
    my_while = fcall $ do
        cond <- val "cond"
        if cond == "true"
           then do { yield_; retry }
           else return cond

    -- Ruby's += operator
    a += b = a =:= (liftM $ show . (b+) . read) (val a)

    -- the expression "i < 10"
    condExp = do
        i <- val "i"
        return $ if (read i) < 10 then "true" else "false"

-- This example tests out yield.  It is equivalent to the following
-- Ruby code:
--
--   def f
--     @i = "I"
--     @k = "K"
--     yield
--   end
-- 
--   f do
--     @j = "J"
--     @l = "L"
--   end
--
--   [@i,@j,@k,@l].join(" ")

test2 = do
    f [] $ do
        "j" =:= "J"
        "l" =:= "L"
    mapM val (words "i j k l") >>= return . unwords
  where
    f = fcall $ do
        "i" =:= "I"
        "k" =:= "K"
        yield_

-- This sample is somewhat trickier.  It uses the evaluator's
-- extended yield semantics to do what this Ruby code would
-- do if it were legal:
--
-- def f
--   @i = "I"
--   @j = yield
--   @k = "K"
--   @l = yield "Right-back-atcha!"
--   @m = yield
-- end
-- 
-- f do
--   rba = yield "J-via-yield"  # not Ruby: yields *back* to f's body
--   yield rba
--   yield "M-via-yield"
-- end
--
-- [@i,@j,@k,@l,@m].join(" ")

test3 = do
    f [] $ do
        rba <- yield "J-via-yield"
        yield rba
        yield "M-via-yield"
    mapM val (words "i j k l m") >>= return . unwords
  where
    f = fcall $ do
        "i" =:= "I"
        "j" =:= yield_
        "k" =:= "K"
        "l" =:= yield "Right-back-atcha!"
        "m" =:= yield_

Here’s what the code does when executed:

> evalTop test1
"10"

> evalTop test2
"I J K L"

> evalTop test3
"I J-via-yield K Right-back-atcha! M-via-yield"

I must say that I really like Ruby’s semantics. So far, I find Ruby to be a seriously cool programming language.

Posted in , , ,
Tags , ,
6 comments
no trackbacks
Reddit Delicious