Posted by Tom Moertel
Tue, 31 Oct 2006 19:44:00 GMT
Last Tuesday, my friend Casey and I were hanging
out at Aldo Coffee. We planned on enjoying
some espresso, doing some work, and then heading over to the Pittsburgh Coding
Dojo, where we could hang out with
other geekly folks.
We ended up
not having enough time to go to the meeting, but we decided to hack
on the challenge problem anyway, using Aldo’s ever-handy free
wireless to access the Internet.
The Dojo problem was PragDave’s Kata Eleven – Sorting it
Out. (It’s short;
read it now.) We decided to use Haskell for our implementation
language.
In this post, I’ll walk through our coding session and explain how our
solution evolved. To better fit the session into a blog post, I
have removed a lot of back-and-forth micro iterations, and I have
edited some of the code for clarity.
The first part of the problem
The first part of the problem was “Sorting Balls.” The story: You
need to implement a “rack” to hold the balls drawn at random (without
replacement) from a bin containing sixty balls, numbered 0 to 59.
Regardless of the order in which the balls are added to the rack, you
need to present them in sorted order whenever you’re asked for them.
Upon reading this part of the challenge, a couple of thoughts sprang to mind:
- Because the range of balls is so small, the problem was begging for a solution based on a counting sort.
- Because the balls are uniquely numbered and drawn without replacement, we could even use a bit vector to represent counts.
Nevertheless, we decided to ignore these thoughts and implement a
more-general solution that would work for any (orderable) values,
not just small ranges of integers.
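For comparison, a minimal sketch of the bit-vector idea could look like the following. It assumes, as the problem does, that the balls are the integers 0 to 59, drawn without replacement. The names BitRack, emptyBitRack, addBit, and ballsBit are hypothetical and are not part of the solution described in this post.

```haskell
import Data.Bits (setBit, testBit)

-- Hypothetical bit-vector rack: bit n is set exactly when ball n is in the rack.
-- Assumes balls are the integers 0..59 and each ball is added at most once.
newtype BitRack = BitRack Integer

emptyBitRack :: BitRack
emptyBitRack = BitRack 0

-- Adding a ball just sets its bit; no comparisons are needed.
addBit :: Int -> BitRack -> BitRack
addBit n (BitRack bits) = BitRack (setBit bits n)

-- Scanning the bit positions in ascending order yields the balls already
-- sorted: a counting sort in which every count is 0 or 1.
ballsBit :: BitRack -> [Int]
ballsBit (BitRack bits) = [n | n <- [0 .. 59], testBit bits n]
```

For example, ballsBit (foldr addBit emptyBitRack [20, 5, 42]) evaluates to [5,20,42]. Reading the balls back out always scans all 60 positions, and the price of this speed is generality: it only works for a small, fixed range of integers.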
Sketching the interface
The first step, then, was to sketch out an interface. Our
interface mirrored the one from the problem statement but
was tweaked for Haskell:
mkRack :: Rack a
add :: Ord a => a -> Rack a -> Rack a
balls :: Rack a -> [a]
The function mkRack makes a new rack to hold values (“balls”) of
type a. It’s equivalent to Rack.new in Ruby.
The add function adds a ball to a rack. You give it a ball and a
rack, and it returns a new rack that is the same as the original rack
but also contains the ball. (If you’re accustomed to stateful
programming, this may seem weird. Why return a new rack instead of
modifying the original rack? Because, in Haskell, you can’t change
values: you can only create new values. At first, this constraint may
seem limiting, but after you get used to it, you’ll find it
empowering.)
Note: the Ord a qualification on the type signature of
add says that it will work for any type a whose values can be
ordered. The qualification is necessary because values of some types,
like IO actions, cannot be compared to see which are less than the
others.
The balls function is an “observer”: it lets you observe the balls
in a rack by returning them as an ordered list.
And that’s the interface.
With the interface sketched, we gave it meaning by defining its
properties.
Giving our interface meaning: defining properties using QuickCheck
QuickCheck is a
powerful, easy-to-use testing tool. Instead of checking hand-written test cases,
it checks properties – statements about what your code ought to do
in general. It tests each property against many randomly generated inputs.
The great thing about QuickCheck properties is that they are
testable documentation. They tell the world what your code
is supposed to do,
and they do so in a concise, formal language that just happens to be
easily readable by humans and automatically testable by computers.
To specify the desired properties of our Rack interface, we first had
to import QuickCheck, along with Data.List for sort:
import Test.QuickCheck
import Data.List (sort)
Then, we defined our first property. It said, simply, that a new rack
must be empty when observed:
prop_New =
    balls mkRack =~ []
Our second property said that, when you add a ball x to
a rack, the resulting rack must contain the same
balls as the original rack plus x:
prop_AddAddsElement rack x =
    balls (add x rack) =~ (x : balls rack)
Both of the properties above rely upon a special, order-insensitive
equality test that we defined for lists of Int values:
(=~) :: [Int] -> [Int] -> Bool
xs =~ ys = sort xs == sort ys
Note that under this test, [1,2] “equals”
both [1,2] and [2,1], but it does not “equal”
any other values.
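Because =~ compares sorted copies of its arguments, it is multiset equality rather than set equality: the number of copies of each element matters. The sketch below generalizes the type from [Int] to any Ord type; that generalization is an assumption made here for illustration, since the post's version is specific to Int.

```haskell
import Data.List (sort)

-- Order-insensitive ("multiset") equality: two lists are equal under (=~)
-- when they contain the same elements with the same multiplicities.
-- Generalized here from [Int] to any Ord type.
(=~) :: Ord a => [a] -> [a] -> Bool
xs =~ ys = sort xs == sort ys
```

With this definition, [1,2] =~ [2,1] is True, but [1,2] =~ [1,2,2] and [1,1,2] =~ [1,2,2] are both False, even though each of those pairs contains the same distinct values. That distinction is what lets prop_AddAddsElement require add to insert exactly one copy of x.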
The reason we defined this operator was to help us specify the two
essential properties of add separately: (1) it must insert a ball
into a rack, and (2) the new ball’s position, when observed, must
preserve the rack’s ordering invariant. The previous property
definition used the =~ operator to specify the first of
these two properties. The next property we defined specified the
second:
prop_AddPreservesOrdering rack x =
    isOrdered (balls rack) ==> isOrdered (balls (add x rack))
This definition specifies that, for all racks rack and all balls
x, if the balls in rack are ordered, the balls in the rack that
results from adding x to rack must also be ordered. If you
are familiar with proof by
induction, you’ll
know why we went this route. In short, if we can prove that this
property holds (and, trivially, that an empty rack is ordered), then, by
induction on the number of calls to add, every rack built from mkRack
is ordered: add can never break the ordering invariant.
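Spelled out, the induction is over the number of calls to add used to build a rack. Writing ord(r) as shorthand for isOrdered (balls r) (notation introduced here, not in the code), the argument sketches out as:

```latex
\begin{align*}
\text{Base case:}\quad & \mathit{ord}(\mathit{mkRack}), \text{ since prop\_New forces } \mathit{balls}\ \mathit{mkRack} = [\,]\\
\text{Inductive step:}\quad & \forall r.\ \forall x.\ \mathit{ord}(r) \Rightarrow \mathit{ord}(\mathit{add}\ x\ r) \quad \text{(prop\_AddPreservesOrdering)}\\
\text{Conclusion:}\quad & \forall n.\ \forall x_1, \ldots, x_n.\ \mathit{ord}\bigl(\mathit{add}\ x_n\ (\cdots (\mathit{add}\ x_1\ \mathit{mkRack}) \cdots)\bigr)
\end{align*}
```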
To round out the property definition, we needed to define the isOrdered test:
isOrdered :: [Int] -> Bool
isOrdered xs = xs == sort xs
And those are the properties we needed to check the correctness
of our implementation. Of course, we still needed to write our
implementation, and we turned to that task next.
A simple, list-based Rack implementation
For our first implementation, we decided upon a drop-dead-simple
list-based representation. We would keep the elements of the list
in sorted order by inserting them into the correct positions when
add was called.
Here, then, was our code:
type Rack a = [a]
mkRack = []
add x xs = insertList x xs
balls = id
insertList :: Ord a => a -> [a] -> [a]
insertList x [] = [x]
insertList x (y:ys)
    | x < y = x : y : ys
    | otherwise = y : insertList x ys
That’s it.
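A note on cost: building a rack this way is insertion sort, since each add may walk the whole list, so adding n balls takes O(n²) comparisons in the worst case. insertList also behaves like insert from Data.List; the only difference is that insertList puts x after any elements equal to it, while Data.List.insert puts it before them, which makes no observable difference for Int balls. A small sketch (insertionSort and agreesWithLibrary are hypothetical names) makes both points concrete:

```haskell
import Data.List (insert)

-- The post's insertion function for the list-based rack.
insertList :: Ord a => a -> [a] -> [a]
insertList x [] = [x]
insertList x (y:ys)
  | x < y     = x : y : ys
  | otherwise = y : insertList x ys

-- Adding balls one at a time with insertList is insertion sort. Each
-- insertion may walk the whole list, so n balls cost O(n^2) comparisons.
insertionSort :: Ord a => [a] -> [a]
insertionSort = foldr insertList []

-- For Int balls, insertList and Data.List.insert build identical racks.
agreesWithLibrary :: [Int] -> Bool
agreesWithLibrary xs = foldr insertList [] xs == foldr insert [] xs
```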
We took our new implementation for a spin in GHCi:
*Rack> balls mkRack
[]
*Rack> balls (add 3 mkRack)
[3]
*Rack> balls (add 4 (add 3 mkRack))
[3,4]
*Rack> balls (add 1 (add 4 (add 3 mkRack)))
[1,3,4]
*Rack> balls (foldr add mkRack [4,2,6,3,-9,0,33,9])
[-9,0,2,3,4,6,9,33]
To really test our implementation, we asked QuickCheck to check its
properties:
*Rack> quickCheck prop_New
OK, passed 100 tests.
*Rack> quickCheck prop_AddAddsElement
OK, passed 100 tests.
*Rack> quickCheck prop_AddPreservesOrdering
OK, passed 100 tests.
I should point out that QuickCheck did not prove that our properties
held. Rather, it gathered evidence that we could use to argue that
our properties held. The evidence was that each of our properties’
claims was subjected to 100 randomly generated tests, and none of
the tests was able to disprove a claim.
Was this evidence sufficient for us to rest satisfied that our
implementation was correct? Given how simple our implementation
was, I felt that the evidence was sufficient. Casey agreed, and we moved on.
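A complementary kind of evidence, sketched here as an illustration rather than something from our session, is to check the properties exhaustively for every input up to a small size instead of sampling at random. The sketch restates the list-based rack and the three properties as plain Boolean functions. QuickCheck's ==> becomes ordinary implication, so unordered racks count as passing rather than being discarded. smallLists and exhaustive are hypothetical helper names.

```haskell
import Data.List (sort)

-- The list-based rack from the post, restated so this sketch is self-contained.
type Rack a = [a]

mkRack :: Rack a
mkRack = []

add :: Ord a => a -> Rack a -> Rack a
add x [] = [x]
add x (y:ys)
  | x < y     = x : y : ys
  | otherwise = y : add x ys

balls :: Rack a -> [a]
balls = id

(=~) :: [Int] -> [Int] -> Bool
xs =~ ys = sort xs == sort ys

isOrdered :: [Int] -> Bool
isOrdered xs = xs == sort xs

-- The three properties as plain Booleans; QuickCheck's (==>) becomes
-- ordinary implication, so unordered racks pass instead of being discarded.
prop_New :: Bool
prop_New = balls mkRack =~ []

prop_AddAddsElement :: Rack Int -> Int -> Bool
prop_AddAddsElement rack x = balls (add x rack) =~ (x : balls rack)

prop_AddPreservesOrdering :: Rack Int -> Int -> Bool
prop_AddPreservesOrdering rack x =
  not (isOrdered (balls rack)) || isOrdered (balls (add x rack))

-- Hypothetical helper: every list of length 0..n with elements drawn from 0..k.
smallLists :: Int -> Int -> [[Int]]
smallLists n k = concat [sequence (replicate len [0 .. k]) | len <- [0 .. n]]

-- Check every property on every small rack and every ball in 0..k.
exhaustive :: Int -> Int -> Bool
exhaustive n k =
  prop_New && and [ prop_AddAddsElement r x && prop_AddPreservesOrdering r x
                  | r <- smallLists n k, x <- [0 .. k] ]
```

For instance, exhaustive 4 3 tries all 341 racks of length at most 4 over the balls 0 to 3, each against all 4 balls, for 1,364 cases in total. Exhaustive checking covers only small inputs, while random testing reaches larger ones, so the two approaches complement each other.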
With the first implementation done, we decided to try a more-sophisticated
implementation.
Generalizing the interface
Since we were about to have multiple implementations, it made sense
for us to define a generalized interface that any “Rack-like”
implementation could use. For that, Haskell’s type classes were
perfect:
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}
class Racklike a ra | ra -> a where
    mkRack :: ra
    add :: Ord a => a -> ra -> ra
    balls :: ra -> [a]
The interface was essentially the same as before, except that the data
type behind the rack implementation was not given by a specific type
Rack a but rather by the type variable ra, which represents some
type of rack container for balls of type a.
Note that ra determines a. If, for example, you know that
the container type ra equals “a list of Int values,”
you know that a must equal Int. (To represent this
relationship, we used functional
dependencies,
a popular extension to the Haskell 98 standard.)
With the Racklike type class in place, we made our list-based
implementation an instance of it:
type ListRack a = [a]
instance Racklike a (ListRack a) where
    mkRack = []
    add = insertList
    balls = id
Next, we modified our QuickCheck property definitions. Where before
it was fine to assume that we would be testing our single, list-based
implementation, now we needed to allow for testing other
implementation types. We did this by adding a rackType parameter to
our property definitions. We used the type, not the value, of this
parameter to determine the type of rack to test:
prop_New rackType =
    balls (mkRack `asTypeOf` rackType) =~ []
prop_AddAddsElement rackType ballList x =
    balls (add x rack) =~ (x : balls rack)
  where
    rack = rackFromList ballList `asTypeOf` rackType
prop_AddPreservesOrdering rackType ballList x =
    isOrdered (balls rack) ==> isOrdered (balls (add x rack))
  where
    rack = rackFromList ballList `asTypeOf` rackType
Because we could no longer assume the rack would be represented
as a list of integers, we wrote rackFromList to convert such
a list into a rack:
rackFromList xs = foldr add mkRack xs
With these modifications in place, we re-ran our tests, specifying
(via type annotations) that we wanted to run them for the ListRack
implementation:
*Rack> quickCheck $ prop_New (undefined :: ListRack Int)
OK, passed 100 tests.
*Rack> quickCheck $ prop_AddAddsElement (undefined :: ListRack Int)
OK, passed 100 tests.
*Rack> quickCheck $ prop_AddPreservesOrdering (undefined :: ListRack Int)
OK, passed 100 tests.
A tree-based Rack implementation
Now that we were free to add additional implementation types,
we created one based on binary trees. We started by defining
the tree data type:
data Tree a
    = Empty
    | Root (Tree a) a (Tree a)
    deriving (Ord, Eq, Show)
This definition says that a tree can be either empty or a root node.
A root node has a single value and left and right sub-trees.
Further, root nodes must satisfy an ordering invariant: if a root
node’s value is x, all of the values in its left subtree must be
less than x, and all of the values in its right subtree must be
greater than or equal to x. The data type doesn’t enforce this
invariant, so we would need to enforce it in our implementation.
Next, we wrote the basic functions for creating, adding elements to,
and observing our trees.
We needed to be able to create empty trees:
emptyTree = Empty
Inserting an element into a tree requires us to walk the tree and
append the element as a new leaf node in the correct location, being
mindful of our ordering invariant. Because our data structure is
inherently recursive, a recursive implementation was straightforward
to code:
insertTree x Empty = Root Empty x Empty
insertTree x (Root left y right)
    | x < y = Root (insertTree x left) y right
    | otherwise = Root left y (insertTree x right)
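Note that we don’t try to ensure that the tree is balanced. The
problem statement says that the balls are randomly selected, and thus
we can expect our trees to be reasonably balanced: when balls arrive in
random order, a tree’s expected height grows only logarithmically with
the number of balls.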
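How much the random-order assumption matters is easy to see with a small sketch. It repeats the Tree type and insertTree from above and adds hypothetical helpers (height, fromListInOrder, and medianFirst, none of which are part of our rack code) to compare a worst-case insertion order with a best-case one:

```haskell
-- The tree type and insertion function from above.
data Tree a
  = Empty
  | Root (Tree a) a (Tree a)
  deriving (Ord, Eq, Show)

insertTree :: Ord a => a -> Tree a -> Tree a
insertTree x Empty = Root Empty x Empty
insertTree x (Root left y right)
  | x < y     = Root (insertTree x left) y right
  | otherwise = Root left y (insertTree x right)

-- Hypothetical helper: number of nodes on the longest root-to-leaf path.
height :: Tree a -> Int
height Empty = 0
height (Root left _ right) = 1 + max (height left) (height right)

-- Insert the balls in list order, so the first ball becomes the root.
fromListInOrder :: Ord a => [a] -> Tree a
fromListInOrder = foldl (flip insertTree) Empty

-- Hypothetical helper: reorder a sorted list so each subtree's median
-- is inserted before the rest of that subtree, giving a balanced tree.
medianFirst :: [a] -> [a]
medianFirst [] = []
medianFirst xs = m : medianFirst before ++ medianFirst after
  where
    (before, m : after) = splitAt (length xs `div` 2) xs
```

Here height (fromListInOrder [0 .. 59]) is 60, because balls added in increasing order each become the right child of the previous one, while height (fromListInOrder (medianFirst [0 .. 59])) is 6, the minimum for 60 nodes. A random order falls in between, typically much closer to the balanced case.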
Next, we wrote the code to observe the elements of a tree.
We used a functional-programming idiom
for efficiently flattening a tree into a list:
elemsTree rx =
    elemsTree' rx []
elemsTree' Empty = id
elemsTree' (Root left x right) =
    elemsTree' left . (x :) . elemsTree' right
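The idiom is a difference list: elemsTree' t returns a function that prepends t's elements to whatever list it is given, so each element is consed exactly once and flattening takes time linear in the size of the tree. For comparison, a naive version built on (++) (elemsNaive, a hypothetical name) produces the same list but can take quadratic time on a left-leaning tree, because each (++) re-copies its left operand:

```haskell
data Tree a = Empty | Root (Tree a) a (Tree a)

-- Difference-list flattening, as in the post: elemsTree' t is a function
-- that prepends t's elements to a list, so each element is consed once: O(n).
elemsTree :: Tree a -> [a]
elemsTree rx = elemsTree' rx []

elemsTree' :: Tree a -> [a] -> [a]
elemsTree' Empty = id
elemsTree' (Root left x right) =
  elemsTree' left . (x :) . elemsTree' right

-- Hypothetical naive version: same result, but on a left-leaning tree the
-- (++) calls nest to the left and re-copy ever-longer prefixes: O(n^2).
elemsNaive :: Tree a -> [a]
elemsNaive Empty = []
elemsNaive (Root left x right) = elemsNaive left ++ [x] ++ elemsNaive right

-- Hypothetical test tree: a left-leaning chain holding 1..n, the shape
-- insertTree produces when balls arrive in decreasing order.
leftSpine :: Int -> Tree Int
leftSpine n = foldl (\t x -> Root t x Empty) Empty [1 .. n]
```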
Finally, we defined a new tree-based rack type and declared
it to be an instance of the Racklike type class:
type TreeRack a = Tree a
instance Racklike a (TreeRack a) where
    mkRack = emptyTree
    add = insertTree
    balls = elemsTree
With the implementation done, we took it for a test drive:
*Rack> add 1 mkRack :: TreeRack Int
Root Empty 1 Empty
*Rack> add 3 (add 1 mkRack) :: TreeRack Int
Root Empty 1 (Root Empty 3 Empty)
*Rack> balls (add 3 (add 1 mkRack) :: TreeRack Int)
[1,3]
Then, for the real test, we checked that our properties held for
TreeRacks:
*Rack> quickCheck $ prop_New (undefined :: TreeRack Int)
OK, passed 100 tests.
*Rack> quickCheck $ prop_AddAddsElement (undefined :: TreeRack Int)
OK, passed 100 tests.
*Rack> quickCheck $ prop_AddPreservesOrdering (undefined :: TreeRack Int)
OK, passed 100 tests.
Satisfied with these results, we moved on to part two of the problem.
The second part of the problem
The second part of the problem was about sorting the letters within a
block of text, ignoring white space and punctuation, and converting
upper case letters into lower case: “Are there any ways to
perform this sort cheaply, and without using built-in libraries?”
Again, a counting sort seemed like the obvious solution, but
we decided to recycle our existing code since we had to leave soon.
Because our Rack implementations were generic, they would work on
letters just as well as on numbers or other kinds of balls:
*Rack> balls (rackFromList "this is a test" :: TreeRack Char)
"   aehiisssttt"
With our existing code already doing the hard work
for us, it was trivial to code up the letter-sorting function:
import Data.Char (isAlpha, toLower)
sortLetters xs =
    balls (rackFromList letters :: TreeRack Char)
  where
    letters = [toLower x | x <- xs, isAlpha x]
(Note: Because of the nature of the problem, I interpreted the
question’s “without using built-in libraries” to mean “without
built-in sorting libraries.”)
We took the new function for a test drive, and it worked
as expected:
*Rack> sortLetters "This is a test, pal."
"aaehiilpsssttt"
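For contrast, the counting sort we passed over might look like the sketch below. It is hypothetical (the name countingSortLetters is made up), it assumes only the 26 ASCII letters matter, so non-ASCII letters are dropped where sortLetters would keep them, and it uses accumArray from Data.Array to tally the letters.

```haskell
import Data.Array (accumArray, assocs)
import Data.Char (isAlpha, isAscii, toLower)

-- Hypothetical counting-sort version of sortLetters. Assumes only the 26
-- ASCII letters matter: non-ASCII letters are dropped, unlike sortLetters.
countingSortLetters :: String -> String
countingSortLetters xs =
  concat [replicate n c | (c, n) <- assocs counts]
  where
    -- One pass to tally each letter into a 26-slot array indexed 'a'..'z';
    -- assocs then lists the tallies in alphabetical order.
    counts = accumArray (+) 0 ('a', 'z')
      [(toLower x, 1 :: Int) | x <- xs, isAscii x, isAlpha x]
```

On the same input, countingSortLetters "This is a test, pal." returns "aaehiilpsssttt". It never compares two letters with each other, at the cost of working only for a small, fixed alphabet.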
And that ended our coding session.
Update: Tweaked the revised definition of the AddAddsElement
property for greater parallelism with the original.
Update 2007-03-03: Minor edits for clarity.
Posted in programming, functional programming, haskell, testing
Tags haskell, kata, quickcheck, sorting, testing

Posted by Tom Moertel
Sat, 26 Mar 2005 00:06:00 GMT
In the last few days I have been learning
Ruby, something I have had on my
to-do list for a long time. Luckily, I now have a project for which Ruby on Rails
is a perfect fit, so now is the ideal time to dig deeper into Ruby.
Naturally, I am making heavy use of the second edition of “The Pickaxe,” (Pragmatic) Dave Thomas’s book
Programming Ruby (the first edition of which is available online). Overall, it is a great book: good organization, lively writing, and superb examples.
But I must say I have one source of frustration. I am a computer-language guy, and I frequently find myself thinking, that’s great, but what exactly does this mean? I’ll give you an example, which I found quite surprising.
Using the following Ruby code, you can create what is in effect your own while-loop construct:
def my_while(cond)
  break unless cond
  yield
  retry
end
i = 0
my_while i < 10 do
  print i
  i += 1
end
At first, this blew my mind. Why? Because while reading the book, I
was building up an operational semantics for the Ruby language in my
head, and my understanding of what it means to call a function
(iterator) with a block was wrong. In my semantics, calling a
function results in evaluating its arguments in the caller’s
evaluation context, entering the evaluation context of the function,
binding the argument values, passing in the associated block, and then
evaluating the function body. When retry is called, I thus reasoned
mistakenly, the evaluation stops and begins anew at the beginning of
the function body, in this case, back at the break expression.
But, clearly, that cannot be what is happening. If it
were, the loop would never terminate. The condition
i < 10 would be evaluated only once – when the
my_while function was called – and thus true would be forever bound to cond
within the evaluation context of the function’s body.
At this point, my brain went into hyper-curiosity mode. Why –
and more to the point, how – does this thing work? I
started looking for the calling semantics of Ruby. (No luck finding
them, btw.) Are arguments passed as thunks that get reevaluated upon
each access? No, that seemed too wasteful and bizarre.
This is where the Pickaxe II let me down. It said, “retry will
reevaluate any arguments to the iterator before restarting it.” Yes,
clearly, that is what is happening. But how is it happening and
what exactly does that simple English statement really mean?
So, after thinking about it, I concluded that a function call in Ruby
works like this. Given a function f,
a block b, and arguments xs, the call f(xs){b}
means this:
- let k be the current continuation (i.e., just before the call)
- evaluate xs and bind the resulting values to f’s formal arguments
- bind b internally to the current block
- evaluate the body of f
Now, if inside f’s body we encounter a retry, the evaluator
basically calls k (with a nil argument, I expect). This jumps
back to the second step (evaluating xs), from which evaluation continues. Any side effects up
to this point are retained (so we could have previously incremented
i, for example), which is what eventually allows the code within the function
body to choose an execution path that does not contain a
retry expression, and thus avoid looping forever.
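The four steps above can be modeled directly with first-class continuations. The sketch below is a hypothetical Haskell rendering of the my_while example, not Ruby's actual implementation: label captures the continuation k just before the argument is evaluated, and jumping back to it re-evaluates i &lt; 10 against the current value of i, which the block has changed in the meantime. It assumes ContT from the transformers package and an IORef for the mutable variable; the names label and myWhileDemo are made up for this illustration.

```haskell
import Control.Monad (when)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Cont (ContT (..), callCC)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Capture the current continuation and return an action that jumps back to
-- it. Running the returned action resumes execution right after 'label'.
label :: ContT r IO (ContT r IO a)
label = callCC $ \k -> let jump = k jump in return jump

-- Hypothetical model of:
--   i = 0
--   my_while i < 10 do print i; i += 1 end
-- The values the block would print are collected in a list instead.
myWhileDemo :: IO [Int]
myWhileDemo = do
  i   <- newIORef (0 :: Int)
  out <- newIORef []
  let call = do
        k    <- label                            -- step 1: k is the continuation just before the call
        cond <- liftIO ((< 10) <$> readIORef i)  -- step 2: (re)evaluate the argument i < 10
        when cond $ do                           -- my_while body: break unless cond
          liftIO (readIORef i >>= \v -> modifyIORef out (v :))  -- the block "prints" i ...
          liftIO (modifyIORef i (+ 1))           -- ... and then does i += 1
          k                                      -- retry: jump back to k, i.e. to step 2
  runContT call return
  reverse <$> readIORef out
```

Running myWhileDemo yields [0,1,2,3,4,5,6,7,8,9]. The loop terminates for the reason given above: the side effect on i survives each jump, so the re-evaluated condition eventually becomes false.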
Just to make sure I really had the semantics down, I wrote an
evaluator for a mini-Ruby in Haskell. (I find that I understand something
better after I build it from the ground up.)
{-# LANGUAGE FlexibleInstances #-}
module MiniRuby where
import Control.Monad (liftM)
import Control.Monad.Cont
import Control.Monad.Reader
import Control.Monad.State
import Data.List
import Data.Maybe
type Identifier = String
type Value = String
type Env = [(Identifier, Value)]
type RubyEvalCxt a = ContT a (ReaderT FcallCxt (State Env)) a
data FcallCxt = FC { retryCall :: Exp
                   , blockCont :: Value -> Exp }
type Exp = RubyEvalCxt Value
eval :: Env -> FcallCxt -> Exp -> Value
eval env fc =
    (`evalState` env) . (`runReaderT` fc) . (`runContT` return)
evalTop = eval [] $
    FC { retryCall = return "TOPLEVEL RETRY"
       , blockCont = const $ return "TOPLEVEL BLOCK" }
fcall :: Exp -> [(Identifier, Exp)] -> Exp -> Exp
fcall fn args blk = callCC evalFn
  where
    evalFn fnCont = (`local` do { bindArgs; fn }) $ \fc ->
        fc { retryCall = evalFn fnCont >>= fnCont
           , blockCont = const blk }
    bindArgs = mapM_ (uncurry (=:=)) args
yield_ = yield "YIELD"
yield :: Value -> Exp
yield value = callCC $ \k -> do
    bc <- asks blockCont
    local (\fc -> fc { blockCont = k }) (bc value)
retry :: Exp
retry = do
    k <- asks retryCall
    k
bind :: Identifier -> Value -> Exp
bind i v = do
    modify ((i,v) :)
    return v
infixr 1 =:=
class Bindable v where (=:=) :: Identifier -> v -> Exp
instance Bindable Value where (=:=) = bind
instance Bindable Exp where i =:= e = bind i =<< e
val :: Identifier -> Exp
val i = gets $ fromMaybe (i ++ "=UNDEFINED") . lookup i
test1 = do
    "i" =:= "0"
    my_while [("cond", condExp)] $
        "i" += 1
    val "i"
  where
    my_while = fcall $ do
        cond <- val "cond"
        if cond == "true"
            then do { yield_; retry }
            else return cond
    a += b = a =:= (liftM $ show . (b+) . read) (val a)
    condExp = do
        i <- val "i"
        return $ if (read i) < 10 then "true" else "false"
test2 = do
    f [] $ do
        "j" =:= "J"
        "l" =:= "L"
    mapM val (words "i j k l") >>= return . unwords
  where
    f = fcall $ do
        "i" =:= "I"
        "k" =:= "K"
        yield_
test3 = do
    f [] $ do
        rba <- yield "J-via-yield"
        yield rba
        yield "M-via-yield"
    mapM val (words "i j k l m") >>= return . unwords
  where
    f = fcall $ do
        "i" =:= "I"
        "j" =:= yield_
        "k" =:= "K"
        "l" =:= yield "Right-back-atcha!"
        "m" =:= yield_
Here’s what the code does when executed:
> evalTop test1
"10"
> evalTop test2
"I J K L"
> evalTop test3
"I J-via-yield K Right-back-atcha! M-via-yield"
I must say that I really like Ruby’s semantics. So far, I find
Ruby to be a seriously cool programming language.
Posted in functional programming, programming languages, haskell, ruby
Tags evaluators, haskell, ruby
