Thinking algebraically: a functional-programming dividend that pays during your imperative-programming day job

Posted by Tom Moertel Thu, 21 Aug 2008 01:50:00 GMT

Although at work I code mostly in Python – a language from which lambda and map were nearly removed – I still find that functional-programming experience has its benefits. One of the “functional-programming dividends” I notice most often is insight gained from considering problems from an algebraic perspective.

Recently, for example, I had a small parsing problem. I had to write code that, given a simple grammar specification as input, emits a regular expression that matches the language generated by the grammar.

Here’s a simplified version of the problem. A grammar specification is limited to a series of one or more atoms. For example, “a b c” represents the atom “a”, followed by the atom “b”, followed by the atom “c”. To generate the grammar, the series of atoms is interpreted such that each atom (except the last) generates a production rule of the following form:

atom_rule ::=
  <the literal atom> (SPACE <the next rule> | NOTHING)

(SPACE represents literal white space and NOTHING represents an empty string.) The grammar as a whole is rooted in the first atom’s rule.

Thus the specification “a b c” represents the following grammar:

grammar ::= a_rule
a_rule  ::= "a" (SPACE b_rule | NOTHING)
b_rule  ::= "b" (SPACE c_rule | NOTHING)
c_rule  ::= "c" 

Note that the final atom’s production matches only the literal atom itself: it has no following rule on which to chain.

The grammar, in turn, generates the following language:

a
a b
a b c

Thus, given the grammar specification “a b c”, my code had to produce a regular expression that would match “a”, “a b”, or “a b c”.

At this point, please stop for a moment and think about this little programming exercise. Do any solutions leap to mind? How would you approach the problem? Form your opinions now, because I’m going to ask you about them later. (If you’re feeling especially caffeinated, try coding a solution before reading on.)

Read more...

Posted in
Tags , , ,
20 comments
no trackbacks
Reddit Delicious

Oleg's great way of explaining delimited continuations

Posted by Tom Moertel Mon, 05 May 2008 13:58:00 GMT

There’s a great way to explain delimited continuations in the notes of Oleg’s Continuation Fest talk on using delimited continuations for CGI programming. Just so it doesn’t get overlooked, here it is:

I’m obsessed in pointing out that every programmer already knows and understands the delimited continuations; they might not know that word though. Everyone knows that when a process executes a system call like read, it gets suspended. When the disk delivers the data, the process is resumed. That suspension of a process is its continuation. It is delimited: it is not the check-point of the whole OS, it is the check-point of a process only, from the invocation of main() up to the point main() returns. Normally these suspensions are resumed only once, but can be zero times (exit) or twice (fork).

I especially like the final part about exit and fork, which drives home the notion that something more subtle than returning from a typical function call is going on. If anybody is confused over what suspended means, that last part ought to clear things up.

The next time I need to explain delimited continuations, I know how I’m going to do it.

Posted in
Tags , ,
no comments
no trackbacks
Reddit Delicious

A simple directory-tree printer in Haskell

Posted by Tom Moertel Thu, 22 Feb 2007 21:30:00 GMT

At a recent gathering, my friend Casey mentioned that he was learning a new programming language and, as a learning exercise, had written a directory-tree printer. That’s a program that recursively scans directory hierarchies and prints out a tree representation for each:

$ tlist cheating-hangman

cheating-hangman
|-- CVS
|   |-- Entries
|   |-- Repository
|   `-- Root
|-- Makefile
|-- cheating-hangman.lhs
`-- cheating-hangman.pl

I thought the problem sounded like fun, and so I wrote a small solution in Haskell. Let’s take a look.

Read more...

Posted in , ,
Tags , , ,
10 comments
no trackbacks
Reddit Delicious

Introductory Haskell: Solving the Sorting-It-Out Kata

Posted by Tom Moertel Tue, 31 Oct 2006 19:44:00 GMT

Last Tuesday, my friend Casey and I were hanging out at Aldo Coffee. We planned on enjoying some espresso, doing some work, and then heading over to the Pittsburgh Coding Dojo, where we could hang out with other geekly folks. We ended up not having enough time to go to the meeting, but we decided to hack on the challenge problem anyway, using Aldo’s ever-handy free wireless to access the Internet.

The Dojo problem was PragDave’s Kata Eleven – Sorting it Out. (It’s short; read it now.) We decided to use Haskell for our implementation language.

In this post, I’ll walk through our coding session and explain how our solution evolved. To better fit the session into a blog post, I have removed a lot of back-and-forth micro iterations, and I have edited some of the code for clarity.

The first part of the problem

The first part of the problem was “Sorting Balls.” The story: You need to implement a “rack” to hold the balls drawn at random (without replacement) from a bin containing sixty balls, numbered 0 to 59. Regardless of the order in which the balls are added to the rack, you need to present them in sorted order whenever you’re asked for them.

Upon reading this part of the challenge, a couple of thoughts sprung to mind:

  • Because the range of balls is so small, the problem was begging for a solution based on a counting sort.
  • Because the balls are uniquely numbered and drawn without replacement, we could even use a bit vector to represent counts.

Nevertheless, we decided to ignore these thoughts and implement a more-general solution that would work for any (orderable) values, not just small ranges of integers.

Sketching the interface

The first step, then, was to sketch out an interface. Our interface mirrored the one from the problem statement but was tweaked for Haskell:

mkRack :: Rack a
add    :: Ord a => a -> Rack a -> Rack a
balls  :: Rack a -> [a]

The function mkRack makes a new rack to hold values (“balls”) of type a. It’s equivalent to Rack.new in Ruby.

The add function adds a ball to a rack. You give it a ball and a rack, and it returns a new rack that is the same as the original rack but also contains the ball. (If you’re accustomed to stateful programming, this may seem weird. Why return a new rack instead of modifying the original rack? Because, in Haskell, you can’t change values: you can only create new values. At first, this constraint may seem limiting, but after you get used to it, you’ll find it empowering.)

Note: the Ord a qualification on the type signature of add says that it will work for any type a whose values can be ordered. The qualification is necessary because values of some types, like IO actions, cannot be compared to see which are less than the others.

The balls function is an “observer”: it lets you observe the balls in a rack by returning them as an ordered list.

And that’s the interface.

With the interface sketched, we gave it meaning by defining its properties.

Giving our interface meaning: defining properties using QuickCheck

QuickCheck is a powerful, easy-to-use testing tool. Instead of checking test cases, it checks properties – statements about what your code ought to do in general.

The great thing about QuickCheck properties is that they are testable documentation. They tell the world what your code is supposed to do, and they do so in a concise, formal language that just happens to be easily readable by humans and automatically testable by computers.

To specify the desired properties of our Rack interface, we first had to import QuickCheck:

import Test.QuickCheck

Then, we defined our first property. It said, simply, that a new rack must be empty when observed:

prop_New =
    balls mkRack =~ []

Our second property said that, when you add a ball x to a rack, the resulting rack must contain the same balls as the original rack plus x:

prop_AddAddsElement rack x =
    balls (add x rack) =~ (x : balls rack)

Both of the properties above rely upon a special, order-insensitive equality test that we defined for lists of Int values:

(=~) :: [Int] -> [Int] -> Bool
xs =~ ys = sort xs == sort ys

Note that under this test, [1,2] “equals” both [1,2] and [2,1], but it does not “equal” any other values.

The reason we defined this operator was to help us specify the two essential properties of add separately: (1) it must insert a ball into a rack, and (2) the new ball’s position, when observed, must preserve the rack’s ordering invariant. The previous property definition used the =~ operator to specify the first of these two properties. The next property we defined specified the second:

prop_AddPreservesOrdering rack x =
    isOrdered (balls rack) ==> isOrdered (balls (add x rack))

This definition specifies that, for all racks rack and all balls x, if the balls in rack are ordered, the balls in the rack that results from adding x to rack must also be ordered. If you are familiar with proof by induction, you’ll know why we went this route. In short, if we can prove that this property holds (and, trivially, that an empty rack is ordered), we can prove that add preserves the ordering invariant.

To round out the property definition, we needed to define the isOrdered test:

isOrdered :: [Int] -> Bool
isOrdered xs = xs == sort xs

And those are the properties we needed to check the correctness of our implementation. Of course, we still needed to write our implementation, and we turned to that task next.

A simple, list-based Rack implementation

For our first implementation, we decided upon a drop-dead-simple list-based representation. We would keep the elements of the list in sorted order by inserting them into the correct positions when add was called.

Here, then, was our code:

-- Our list-based implementation of a Rack

type Rack a = [a]

mkRack   = []
add x xs = insertList x xs
balls    = id

insertList :: Ord a => a -> [a] -> [a]
insertList x []     = [x]
insertList x (y:ys)
    | x < y         = x : y : ys
    | otherwise     = y : insertList x ys

That’s it.

We took our new implementation for a spin in GHCi:

*Rack> balls mkRack
[]

*Rack> balls (add 3 mkRack)
[3]

*Rack> balls (add 4 (add 3 mkRack))
[3,4]

*Rack> balls (add 1 (add 4 (add 3 mkRack)))
[1,3,4]

*Rack> balls (foldr add mkRack [4,2,6,3,-9,0,33,9])
[-9,0,2,3,4,6,9,33]

To really test our implementation, we asked QuickCheck to check its properties:

*Rack> quickCheck prop_New
OK, passed 100 tests.

*Rack> quickCheck prop_AddAddsElement
OK, passed 100 tests.

*Rack> quickCheck prop_AddPreservesOrdering
OK, passed 100 tests.

I should point out that QuickCheck did not prove that our properties held. Rather, it gathered evidence that we could use to argue that our properties held. The evidence was that each of our properties’ claims was subjected to 100 randomly generated tests, and none of the tests was able to disprove a claim.

Was this evidence sufficient for us to rest satisfied that our implementation was correct? Given how simple our implementation was, I felt that the evidence was sufficient. Casey agreed, and we moved on.

With the first implementation done, we decided to try a more-sophisticated implementation.

Generalizing the interface

Since we were about to have multiple implementations, it made sense for us to define a generalized interface that any “Rack-like” implementation could use. For that, Haskell’s type classes were perfect:

-- Our interface for "Rack-like" data types

class Racklike a ra | ra -> a where
    mkRack :: ra
    add    :: Ord a => a -> ra -> ra
    balls  :: ra -> [a]

The interface was essentially the same as before, except that the data type behind the rack implementation was not given by a specific type Rack a but rather by the type variable ra, which represents some type of rack container for balls of type a.

Note that ra determines a. If, for example, you know that the container type ra equals “a list of Int values,” you know that a must equal Int. (To represent this relationship, we used functional dependencies, a popular extension to the Haskell 98 standard.)

With the Racklike type class in place, we moved our list-based implementation inside of the interface:

-- Our list-based implementation of a Rack

type ListRack a = [a]

instance Racklike a (ListRack a) where
    mkRack = []
    add    = insertList
    balls  = id

Next, we modified our QuickCheck property definitions. Where before it was fine to assume that we would be testing our single, list-based implementation, now we needed to allow for testing other implementation types. We did this by adding a rackType parameter to our property definitions. We used the type, not the value, of this parameter to determine the type of rack to test:

prop_New rackType =
    balls (mkRack `asTypeOf` rackType) =~ []

prop_AddAddsElement rackType ballList x =
    balls (add x rack) =~ (x : balls rack)
  where
    rack = rackFromList ballList `asTypeOf` rackType

prop_AddPreservesOrdering rackType ballList x =
    isOrdered (balls rack) ==> isOrdered (balls (add x rack))
  where
    rack = rackFromList ballList `asTypeOf` rackType

Because we could no longer assume the rack would be represented as a list of integers, we wrote rackFromList to convert such a list into a rack:

rackFromList xs = foldr add mkRack xs

With these modifications in place, we re-ran our tests, specifying (via type annotations) that we wanted to run them for the ListRack implementation:

*Rack> quickCheck $ prop_New (undefined :: ListRack Int)
OK, passed 100 tests.

*Rack> quickCheck $ prop_AddAddsElement (undefined :: ListRack Int)
OK, passed 100 tests.

*Rack> quickCheck $ prop_AddPreservesOrdering (undefined :: ListRack Int)
OK, passed 100 tests.

A tree-based Rack implementation

Now that we were free to add additional implementation types, we created one based on binary trees. We started by defining the tree data type:

data Tree a
    = Empty
    | Root (Tree a) a (Tree a)
    deriving (Ord, Eq, Show)

This definition says that a tree can be either empty or a root node. A root node has a single value and left and right sub-trees.

Further, root nodes must satisfy an ordering invariant: if a root node’s value is x, all of the values in its left subtree must be less than x, and all of the values in its right subtree must be greater than or equal to x. The data type doesn’t enforce this invariant, so we would need to enforce it in our implementation.

Next, we wrote the basic functions for creating, adding elements to, and observing our trees.

We needed to be able to create empty trees:

emptyTree =
    Empty

Inserting an element into a tree requires us to walk the tree and append the element as a new leaf node in the correct location, being mindful of our ordering invariant. Because our data structure is inherently recursive, a recursive implementation was straightforward to code:

insertTree x Empty  = Root Empty x Empty
insertTree x (Root left y right)
    | x < y         = Root (insertTree x left) y right
    | otherwise     = Root left y (insertTree x right)

Note that we don’t try to ensure that the tree is balanced. The problem statement says that the balls are randomly selected, and thus we can expect our trees, on average, to be balanced naturally.

Next, we wrote the code to observe the elements of a tree. We used a functional-programming idiom for efficiently flattening a tree into a list:

elemsTree rx =
    elemsTree' rx []

elemsTree' Empty               = id
elemsTree' (Root left x right) =
    elemsTree' left . (x :) . elemsTree' right

Finally, we defined a new tree-based rack type and declared it to be an instance of the Racklike type class:

type TreeRack a = Tree a

instance Racklike a (TreeRack a) where
    mkRack = emptyTree
    add    = insertTree
    balls  = elemsTree

With the implementation done, we took it for a test drive:

*Rack> add 1 mkRack :: TreeRack Int
Root Empty 1 Empty

*Rack> add 3 (add 1 mkRack) :: TreeRack Int
Root Empty 1 (Root Empty 3 Empty)

*Rack> balls (add 3 (add 1 mkRack) :: TreeRack Int)
[1,3]

Then, for the real test, we checked that our properties held for TreeRacks:

*Rack> quickCheck $ prop_New (undefined :: TreeRack Int)
OK, passed 100 tests.

*Rack> quickCheck $ prop_AddAddsElement (undefined :: TreeRack Int)
OK, passed 100 tests.

quickCheck $ prop_AddPreservesOrdering (undefined :: TreeRack Int)
OK, passed 100 tests.

Satisfied with these results, we moved on to part two of the problem.

The second part of the problem

The second part of the problem was about sorting the letters within a block of text, ignoring white space and punctuation, and converting upper case letters into lower case: “Are there any ways to perform this sort cheaply, and without using built-in libraries?”

Again, a counting sort seemed like an obvious ideal solution, but we decided to recycle our existing code since we had to leave soon. Because our Rack implementations were generic, they would work on letters just as well as on numbers or other kinds of balls:

*Rack> balls (rackFromList "this is a test" :: TreeRack Char)
"   aehiisssttt" 

With our existing code already doing the hard work for us, it was trivial to code up the letter-sorting function:

sortLetters xs =
    balls (rackFromList letters :: TreeRack Char)
  where
    letters = [toLower x | x <- xs, isAlpha x]

(Note: Because of the nature of the problem, I interpreted the question’s “without using built-in libraries” to mean “without built-in sorting libraries.”)

We took the new function for a test drive, and it worked as expected:

*Rack> sortLetters "This is a test, pal." 
"aaehiilpsssttt" 

And that ended our coding session.

Update: Tweaked the revised definition of the AddAddsElement property for greater parallelism with the original.

Update 2007-03-03: Minor edits for clarity.

Posted in , , ,
Tags , , , ,
7 comments
no trackbacks
Reddit Delicious

My code is "so clever as to be stupid"

Posted by Tom Moertel Fri, 16 Jun 2006 12:21:00 GMT

In Port Scanning Shootout, author “cavedave” provides a mini-review of my Concurrent port scanner in Haskell. The thing is, I am not sure what to make of it:

50 lines of indentation balancing monadic grappling goodness. Considering I started this quest after seeing this code and thinking it was good, in retrospect it seems very big and not very clever, or at least so clever as to be stupid.

I am struck by the last statement. It seems the author is channeling the timeless David St. Hubbins, who said, “It’s such a fine line between stupid, and clever.”

Words to live by, if you ask me.   ;-)

Posted in , , ,
no comments
no trackbacks
Reddit Delicious

Composing functions in Ruby

Posted by Tom Moertel Fri, 07 Apr 2006 15:55:00 GMT

One of the things I miss when coding in Ruby is inexpensive function composition. In Haskell, for example, I can compose functions using the dot (.) operator:

inc        = (+1)
twice      = (*2)
twiceOfInc = twice . inc
Because of Ruby’s open classes, however, I can easily add the feature to the language. In the code below, I introduce Proc.compose and overload the star (*) operator for the purpose:
# func_composition.rb
class Proc
  def self.compose(f, g)
    lambda { |*args| f[g[*args]] }
  end
  def *(g)
    Proc.compose(self, g)
  end
end

And that’s all it takes:

$ irb --simple-prompt -r func_composition.rb

>> inc = lambda { |x| x + 1 }
=> #<Proc:0x00002aaaaaad7810@(irb):1>

>> twice = lambda { |x| x * 2 }
=> #<Proc:0x00002aaaaabd2d18@(irb):2>

>> inc[1]
=> 2

>> twice[2]
=> 4

>> twice_of_inc = twice * inc
=> #<Proc:0x00002aaaaab32458@./func_composition.rb:3>

>> twice_of_inc[1]
=> 4

>> twice_of_inc[2]
=> 6

Now, isn’t that refreshing?

Update: Vincent Foley pointed out on comp.lang.ruby that Ruby Facets has a nearly identical implementation that also uses the star operator for composition. (Its version of compose, however, is an instance method whereas my version is a class method.)

Posted in ,
Tags , ,
2 comments
no trackbacks
Reddit Delicious

Wow! A Haskell-based first-person shooter!

Posted by Tom Moertel Tue, 22 Nov 2005 22:26:00 GMT

As seen in Haskell Weekly News, Mon Hon Cheong announced Frag, a first-person shooter written in – wait for it – Haskell. It uses HOpenGL for its OpenGL binding and Yampa for reactive game elements.

Cool!

Posted in ,
Tags , , ,
2 comments
no trackbacks
Reddit Delicious

Scope herding with delimited continuations

Posted by Tom Moertel Tue, 13 Sep 2005 14:24:00 GMT

Recently I took advantage of delimited continuations to create a more natural Haskell-based kernel for GIML. The amazing scope-herding abilities of reset and shift were the magic that made it possible.

Ready to get wild? Grab an espresso and read on.

Read more...

Posted in , ,
Tags , ,
2 comments
no trackbacks
Reddit Delicious

Closures and the professional programmer

Posted by Tom Moertel Tue, 30 Aug 2005 18:56:00 GMT

I came across Tim Bray’s thoughts on Ruby via the ever-delightful Lambda the Ultimate and found the following bit fascinating:

I’ve had access to languages with closures and continuations and suchlike constructs for years and years, and I’ve never ever written one. While I’m impressed by how natural this stuff is in Ruby, I’m still unconvinced that these are a necessary part of the professional programmer’s arsenal. [Emphasis mine.]

While Tim Bray may be unconvinced, I am a true believer.

Read more...

Posted in , , ,
Tags , ,
12 comments
1 trackback
Reddit Delicious

Power parsing with Haskell and Parsec

Posted by Tom Moertel Sat, 27 Aug 2005 21:12:00 GMT

One of the projects I’m working on is a language to help researchers manipulate genetics information. Despite all the well-publicized advances in genetics, researchers still spend about a third of their time writing shell, awk, and Perl scripts to manipulate their data. If researchers can get some of this time back, they can use it to think about more interesting problems, like curing cancer and stuff like that.

Read more...

Posted in ,
Tags , ,
4 comments
no trackbacks
Reddit Delicious

Older posts: 1 2