Writing a simple Ruby evaluator in Haskell

Posted on March 25, 2005

In the last few days I have been learning Ruby, something I have had on my to-do list for a long time. Luckily, I now have a project for which Ruby on Rails is perfect, and so now is the perfect time to get more into Ruby.

Naturally, I am making much use of the second edition of “The Pickaxe,” (pragmatic) Dave Thomas’s book Programming Ruby (the first edition of which is available online). Overall, it is a great book: good organization, lively writing, and superb examples.

But I must say I have one source of frustration. I am a computer-language guy, and I frequently find myself thinking, that’s great, but what exactly does this mean? I’ll give you an example, which I found quite surprising.

Using the following Ruby code, you can create what is in effect your own while-loop construct:

def my_while(cond)
  break unless cond
  yield
  retry
end

i = 0
my_while i < 10 do
  print i
  i += 1
end

# output: 0123456789

At first, this blew my mind. Why? Because while reading the book, I was building an understanding of the semantics of the Ruby language in my head, and my understanding of what it means to call a function (iterator) with a block was wrong. In my understanding, calling a function results in evaluating its arguments in the caller’s evaluation context, entering the evaluation context of the function, binding the argument values, passing in the associated block, and then evaluating the function body. When retry is called, I reasoned mistakenly, the evaluation stops and begins anew at the beginning of the function body, in this case, back at the break expression.

But, clearly, that cannot be what is happening. If it were, the loop would never terminate. The condition i < 10 would be evaluated only once – when the my_while function was called – and thus true would be forever bound to cond within the evaluation context of the function’s body.

At this point, I became curious. What’s really going on with retry? To understand its relationship to function calls, I started looking for the calling semantics of Ruby. (No luck finding them, btw.) Are arguments passed as thunks that get reevaluated upon each access? No, that seemed too wasteful and bizarre.

Pickaxe II said, “retry will reevaluate any arguments to the iterator before restarting it.” Yes, clearly, that is what is happening. But what is happening under the hood? What does that statement really mean?

So, after thinking about it, I concluded that what is going on is that a function call in Ruby works like this. Given a function f, a block b, and arguments xs, the call f(xs){b} means this:

let k be the current continuation (i.e., just before the call)
evaulate xs and bind the resulting values to f’s formal arguments
bind b internally to the current block
evaluate the body of f

Now, if inside of f’s body we encounter a retry, the evaluator basically calls k (with a nil argument, I expect). This jumps back to step 2, from which evaluation continues. Any side effects up to this point are retained (so we could have previously incremented i, for example), which is what eventually allows the code within the function body to choose an execution path which does not contain a retry expression, and thus avoid looping forever.

Just to make sure I really had the semantics down, I wrote an evaluator for a mini-Ruby in Haskell. (I find that I understand something better after I build it from the ground up.)

{-# OPTIONS -fglasgow-exts #-}

-- Tom Moertel <tom@moertel.com>
-- 2005-03-26

module MiniRuby where

import Control.Monad.Cont
import Control.Monad.Reader
import Control.Monad.State
import Data.List
import Data.Maybe

-- ========================================================
-- DATA TYPES

-- To keep things simple, there is only value type: string.

type Identifier = String
type Value = String

-- Evaluation occurs with in the continuation monad (which is used to
-- handle control flow), wrapped around a reader monad (which keeps
-- track of the calling context), wrapped around a state monad (which
-- keeps track of the evaluator's variables.)

type Env = [(Identifier, Value)] -- identifiers => values (strings)
type RubyEvalCxt a = ContT a (ReaderT FcallCxt (State Env)) a
data FcallCxt = FC { retryCall :: Exp
                   , blockCont :: Value -> Exp }

-- An expression is just something that can be evaluated to a value
-- within a ruby evaluation context.

type Exp = RubyEvalCxt Value


-- ========================================================
-- EVALUATOR

-- The following function evaluates an expression within a given
-- evaluation context.   The result is a value.

eval :: Env -> FcallCxt -> Exp -> Value
eval env fc =
    (`evalState` env) . (`runReaderT` fc) . (`runContT` return)

-- This version evaluates an expression at the "top level."

evalTop = eval [] $
    FC { retryCall = return "TOPLEVEL RETRY"
       , blockCont = const $ return "TOPLEVEL BLOCK" }

-- A function call takes a function, a list of variable bindings, and
-- corresponding block.  It then creates a new execution context,
-- binds the variables, and evaluates the function body.

fcall :: Exp -> [(Identifier, Exp)] -> Exp -> Exp
fcall fn args blk = callCC evalFn
  where
    evalFn fnCont = (`local` do { bindArgs; fn }) $ \fc ->
        fc { retryCall = evalFn fnCont >>= fnCont
           , blockCont = const blk }
    bindArgs = mapM_ (uncurry (=:=)) args

-- Yield remembers the current continuation and passes control to the
-- associated block (which we get from the calling context).  For
-- extra spice, this version differs from Ruby's in that upon yielding
-- it makes the yielding function seem like a block to the block to
-- which it yields.  This lets the function and block yield back and
-- forth to each other, passing values along the way.

yield_ = yield "YIELD" -- yield w/o value

yield :: Value -> Exp
yield value = callCC $ \k -> do
    bc <- asks blockCont
    local (\fc -> fc { blockCont = k }) (bc value)

-- Retry restarts the current computation w/in the calling context's
-- continuation.

retry :: Exp
retry = do
    k <- asks retryCall
    k


-- ========================================================
-- VARIABLES

-- Bind associates a value with an identifier

bind :: Identifier -> Value -> Exp
bind i v = do
    modify ((i,v) :)
    return v

-- More convenient syntax (=:=) for binding

infixr 1 =:=
class Bindable v where (=:=) :: Identifier -> v -> Exp
instance Bindable Value where (=:=) = bind
instance Bindable Exp where i =:= e = bind i =<< e

-- Lookup the value associated with an identifier

val :: Identifier -> Exp
val i = gets $ fromMaybe (i ++ "=UNDEFINED") . lookup i


-- ========================================================
-- SAMPLE CODE

-- This code shows how "retry" works.  It is equivalent
-- to the following Ruby code:
--
--   def my_while(cond)
--     if cond
--       yield
--       retry
--     end
--   end
--
--   i = 0
--   my_while i < 10 do
--     i += 1
--   end
--
--   i

test1 = do
    "i" =:= "0"
    my_while [("cond", condExp)] $
        "i" += 1 -- block, passed to my_while
    val "i"
  where
    my_while = fcall $ do
        cond <- val "cond"
        if cond == "true"
           then do { yield_; retry }
           else return cond

    -- Ruby's += operator
    a += b = a =:= (liftM $ show . (b+) . read) (val a)

    -- the expression "i < 10"
    condExp = do
        i <- val "i"
        return $ if (read i) < 10 then "true" else "false"


-- This example tests out yield.  It is equivalent to the following
-- Ruby code:
--
--   def f
--     @i = "I"
--     @k = "K"
--     yield
--   end
--
--   f do
--     @j = "J"
--     @l = "L"
--   end
--
--   [@i,@j,@k,@l].join(" ")

test2 = do
    f [] $ do
        "j" =:= "J"
        "l" =:= "L"
    mapM val (words "i j k l") >>= return . unwords
  where
    f = fcall $ do
        "i" =:= "I"
        "k" =:= "K"
        yield_

-- This sample is somewhat trickier.  It uses the evaluator's
-- extended yield semantics to do what this Ruby code would
-- do if it were legal:
--
-- def f
--   @i = "I"
--   @j = yield
--   @k = "K"
--   @l = yield "Right-back-atcha!"
--   @m = yield
-- end
--
-- f do
--   rba = yield "J-via-yield"  # not Ruby: yields *back* to f's body
--   yield rba
--   yield "M-via-yield"
-- end
--
-- [@i,@j,@k,@l,@m].join(" ")

test3 = do
    f [] $ do
        rba <- yield "J-via-yield"
        yield rba
        yield "M-via-yield"
    mapM val (words "i j k l m") >>= return . unwords
  where
    f = fcall $ do
        "i" =:= "I"
        "j" =:= yield_
        "k" =:= "K"
        "l" =:= yield "Right-back-atcha!"
        "m" =:= yield_

Here’s what the code does when executed:

> evalTop test1
"10"

> evalTop test2
"I J K L"

> evalTop test3
"I J-via-yield K Right-back-atcha! M-via-yield"

I must say that I really like Ruby’s semantics. So far, I find Ruby to be a seriously cool programming language.