Scope herding with delimited continuations

By
Posted on
Tags: continuations, scope, languages

Recently I took advantage of delimited continuations to create a more natural Haskell-based kernel for GIML (Genetics Information Manipulation Language). The amazing scope-herding abilities of reset and shift were the magic that made it possible.

Ready to get wild? Grab an espresso and read on. Consider the GIML code below. A block, delimited by curly braces, establishes a new evaluation frame. Variable bindings (such as those for x below) shadow earlier bindings, and each remains in effect until its enclosing block goes out of scope.

# GIML code (ex. 1)
x <- 1
{
    x <- 2   # local binding shadows earlier binding
    x        # evaluates to 2
}            # local binding goes out of scope with block
x   # evaluates to 1

The Reader monad provides most of this block-scoping behavior for free, and so it makes a natural part of the monad underlying the GIML evaluator. Its job is to make an environment – in this case, a Haskell Data.Map containing our active bindings – available to the actions running within it. The Reader monad provides the ask and asks combinators, which we can use to get and work with our bindings. We can use asks, for example, to write a combinator evalVar that takes a variable name and returns an action that looks up the variable’s value in the active bindings:

-- Haskell code (ex. 2)
evalVar var = do
    val <- asks (Map.findWithDefault 0 var)
    return val

The Reader monad also supplies a combinator local that can be used to run actions within a locally modified environment. The semantics of local ensure that the modifications cannot escape: after the local actions have been executed, the local environment goes out of scope and the previous, containing environment is restored. And that’s the block-scoping behavior we want.

Let’s use local to represent the following GIML block:

# GIML code (ex. 3)
{
    x <- 2
    x
}

Here is a local-based Haskell representation of the GIML block:

-- Haskell code (ex. 4)
local (Map.insert "x" 2) $ do
    evalVar "x"

The Haskell code above says, roughly translated, “Create a local environment by binding x to 2 (on top of any bindings that may have been effective within the outer environment) and then run the evalVar action within the local environment.”

Any actions before or after the local action will run in the outer environment, which will be unaffected by the local action. For example, consider the following Haskell code:

do
evalVar "x"
local (Map.insert "x" 2) $ do
    evalVar "x"
evalVar "x"

In the code above, if the first evalVar action results in 1, so must the third evalVar action because both run in the exact same environment. The second evalVar action, however, runs in a locally modified environment where x is bound to 2, and so it will result in 2.

Unfortunately, this translation scheme for blocks and bindings has a limitation. Once we enter a block, we can’t change its environment because the only way to make changes is via local, and those changes would be confined to a new, local environment and would not affect the environment of the block’s remaining actions. Thus the only way we can effect a mid-block binding is to introduce another block, nested within the first.

For example, consider the following GIML code:

# GIML code (ex. 5)
{
    x <- 2
    y <- x + 1
    x + y
}

How do we translate this into Haskell? Our earlier translation method doesn’t work on this code because we have a mid-block binding. If we rewrite the GIML code to make the scope of each binding explicit, however, we get an equivalent GIML expression that can be translated:

# GIML code (ex. 6)
{
    x <- 2
    {
        y <- x + 1
        x + y
    }
}

The Haskell translation of the above:

-- Haskell code (ex. 7)
local (Map.insert "x" 2) $ do
    x <- evalVar "x"
    local (Map.insert "y" (x + 1)) $ do
        x <- evalVar "x"
        y <- evalVar "y"
        return (x + y)

This is, ultimately, the translation we want, but deriving it is awkward. Who wants to scan the AST and insert scoping blocks? Not me.

Instead, why not break the chains between binding and block-scoping? We ought to be able to make a local binding anywhere, not just at the point where a block begins. Let’s aim for code like the following, which is a direct translation of the earlier GIML code and preserves the user-supplied blocking:

-- Haskell code (ex. 8)
enterBlock $ do
    bindLocal "x" 2
    x <- evalVar "x"
    bindLocal "y" (x + 1)
    x <- evalVar "x"
    y <- evalVar "y"
    return (x + y)

The key to this more pleasant translation is the magic pair of combinators enterBlock and bindLocal. The enterBlock combinator creates a new block, and bindLocal sets up a binding that goes out of scope when the block that contains it does. Note that bindLocal can be used anywhere, and not just at the beginning of a block.

How, then, do we implement these new combinators? This is where delimited continuations come in. Take a look at the combinators’ definitions:

-- Haskell code
enterBlock action = reset action
bindLocal var val = shift $ \c -> local (Map.insert var val) (c ())

Before going further, let’s review reset and shift, the dynamic duo of delimited continuations. The reset combinator brackets an action, drawing a boundary around it. The shift combinator is a bit trickier. When we invoke shift within a bracketed action, something amazing happens. The bracketed action is abortively replaced by another action that is constructed by the template function that we supply to shift.

Now, here’s where it gets wild. Our template function constructs the replacement action from an internal template, which can pull in the shift action’s context. The context is a delimited continuation that represents the unexecuted portion of the reset-bracketed action – except for the shift action itself, which has been removed, leaving a hole in the context. Our template fills in the hole with a value by passing the value to the context, which is really just another function.

If it sounds confusing, the following example may help. Consider this code, which represents an action within a monad that supports delimited continuations and IO:

-- Haskell code (ex. 9)
reset $ do
    puts "before"
    msg <- shift $ \cxt -> do
        -- our "template"
        puts "template start"
        cxt "template cxt(first)"
        cxt "template cxt(second)"
        puts "template end"
    puts msg
    puts "after"
  where
    puts = liftIO . putStrLn

Let’s break it down. First, reset brackets everything between it and the where clause.

Second, the initial puts action prints “before” to the screen.

Third, the shift action takes control. Blammo! The reset-delimited action is aborted. Its unexecuted portion is captured as the context, which we can approximate with an anonymous function that looks like this:

( \hole -> reset $ do
    msg <- return hole
    puts msg
    puts "after" )

This context function is then passed by shift to our template function, which binds the context function to its cxt variable.

Then the template function builds the action that will replace the aborted action. Before “expansion,” the template looks like this:

do
puts "template start"
cxt "template cxt(first)"
cxt "template cxt(second)"
puts "template end"

Each call to the cxt function drops the captured context into the template and fills in the context’s hole with the supplied argument. Expanding these calls results in the final template:

do
puts "template start"
reset $ do
    msg <- return "template cxt(first)"
    puts msg
    puts "after"
reset $ do
    msg <- return "template cxt(second)"
    puts msg
    puts "after"
puts "template end"

Because the reset-delimited portions of the code above do not contain any shift actions, reset acts like an identity function, and the final expansion of our template is thus as follows:

do
puts "template start"
msg <- return "template cxt(first)"
puts msg
puts "after"
msg <- return "template cxt(second)"
puts msg
puts "after"
puts "template end"

As we would now expect, running the original action results in the following output:

before
template start
template cxt(first)
after
template cxt(second)
after
template end

(Recall that the first line of output was emitted before the shift action was invoked.)

The ability to grab the remainder of an action and stuff it into a template is what makes the magic of enterBlock and bindLocal possible. When they appear in our Haskellized GIML code from earlier:

-- Haskell code
enterBlock $ do
    bindLocal "x" 2
    x <- evalVar "x"
    bindLocal "y" (x + 1)
    x <- evalVar "x"
    y <- evalVar "y"
    return (x + y)

they expand into this:

reset $ do
    shift $ \c -> local (Map.insert "x" 2) (c ())
    x <- evalVar "x"
    shift $ \c -> local (Map.insert "y" (x + 1)) (c ())
    x <- evalVar "x"
    y <- evalVar "y"
    return (x + y)

And that, thanks to delimited continuations, is “re-scoped” into to this:

reset $ do
    reset $ local (Map.insert "x" 2) $ do
        x <- evalVar "x"
        shift $ \c -> local (Map.insert "y" (x + 1)) (c ())
        x <- evalVar "x"
        y <- evalVar "y"
        return (x + y)

and then into this:

reset $ do
    reset $ local (Map.insert "x" 2) $ do
        x <- evalVar "x"
        reset $ local (Map.insert "y" (x + 1)) $ do
            x <- evalVar "x"
            y <- evalVar "y"
            return (x + y)

Finally, because the remaining reset combinators contain no shift actions, we can treat them like identity functions to arrive at our final code, which is identical to the Haskell code we wrote by hand back in Example 7 to represent the GIML code that we added explicit blocking to (again, by hand) in Example 6:

local (Map.insert "x" 2) $ do
    x <- evalVar "x"
    local (Map.insert "y" (x + 1)) $ do
        x <- evalVar "x"
        y <- evalVar "y"
        return (x + y)

And that, my friends, is what we were after, all along. Only this time around, we didn’t have to do it by hand – our combinators did the hard work for us.

Update 2007-03-15: Fixed typo.