TMR 9!

Posted by Tom Moertel Mon, 19 Nov 2007 18:07:00 GMT

Issue 9 of The Monad.Reader is hot off the presses! The issue focuses on three Google-Summer-of-Code projects for Haskell: Cabal configurations, Darcs’s Patch Theory, and the typechecker-framework TaiChi. Good stuff.

I know what I’ll be reading for lunch today.

Posted in
Tags ,
no comments
no trackbacks
Reddit Delicious

Updated cabal2rpm helps you make RPM packages from Haskell Cabal packages

Posted by Tom Moertel Sat, 08 Sep 2007 00:19:00 GMT

I just released an updated version of cabal2rpm, a small program (written in Perl) that creates RPM spec files from Cabal package descriptions. RPM is the software-packaging format used by several popular Linux distributions, including Red Hat and Fedora. Cabal is the packaging format used by the Haskell community to distribute software written in Haskell.

Bryan O’Sullivan’s cabal-rpm also creates spec files from Cabal packages. Unlike cabal2rpm, it is written in Haskell and directly interfaces with the Cabal libraries. Long term, it is the way to go. For now, however, cabal2rpm may be more convenient because it works out of the box. (To use cabal-rpm, you’ll first need to install the just-tagged Cabal 1.2.0 library, not yet in wide distribution.)

Posted in
Tags , ,
no comments
no trackbacks
Reddit Delicious

Directory-tree printing in Haskell, part three: lazy I/O

Posted by Tom Moertel Wed, 28 Mar 2007 19:40:00 GMT

This article is part three in a series on introductory Haskell programming. In the first article, we wrote a small program to recursively scan file-system directories and print their contents as ASCII-art trees. In the second article, we refactored the program to make its logic more reusable by separating the directory-scanning logic from the tree-printing logic. In this article, we will address a shortcoming of the refactored version: It must scan directory hierarchies completely before printing their trees, i.e., it must scan and then print, when doing both simultaneously is both more efficient and more user friendly.

Recall from the previous article that our directory-printing program is factored into three pieces of logic:

  1. fsTraverse, which traverses a file-system hierarchy and returns a tree data structure;
  2. showTree, which converts a tree into lovingly crafted ASCII art; and
  3. traverseAndPrint, which prints the tree for a file-system hierarchy by using the first two pieces of logic.

The types of the functions are as follows:

fsTraverse       :: Path -> DentName -> IO DirTree
showTree         :: Tree String -> String
traverseAndPrint :: Path -> IO ()

Note that showTree is a pure function, but the other two return IO actions that may have side effects.

Within traverseAndPrint, fsTraverse and showTree are combined into a composite IO action by the =<< combinator:

putStr . showTree =<< fsTraverse root path

The sequencing semantics of Haskell’s IO monad forces all of the effects of fsTraverse to complete before any following effects can begin. To better understand these sequencing semantics, let’s consider a simple example.

The IO-monad code,

a >> b

can loosely be interpreted as running the action a, which forces its side effects to occur, and then running the action b, which forces its side effects to occur.

In reality, a and b are not actions. They are functions. Like all Haskell functions, they are pure and have no side effects. It’s just that a and b return values that represent actions, and those actions may have side effects, and the semantics of the IO monad guarantee the ordering of those effects (should the actions end up being connected to the runtime’s top-level IO action and executed). If you think that’s weird, hold that thought. For now, all that’s important is that, if the composite action represented by the expression (a >> b) is executed, the effects of a, regardless of how complex, will be executed before the effects of b.

Thus if a represents building a tree by recursively scanning a file-system hierarchy, the entire tree must be built before b ever gets a chance to do its thing. For our particular application, however, that particular sequencing is suboptimal. We know from our earlier, monolithic implementation that the file-system hierarchy can be scanned and printed simultaneously, which is more efficient. Ideally, then, our refactored implementation should be just as efficient.

In this article, we will look at one way to maintain the clean, logical separation of the a part from the b part while allowing the parts’ effects to be interleaved for efficiency. We will use an extension to the Haskell language to make the directory-scanning action lazy so that it builds the tree as the tree is consumed.

Ready? Let’s dive in.

Read more...

Posted in
Tags , , , , ,
6 comments
no trackbacks
Reddit Delicious

Directory-tree printing in Haskell, part two: refactoring

Posted by Tom Moertel Wed, 07 Mar 2007 21:04:00 GMT

In my previous article on writing a simple directory-tree printer in Haskell, I wrote a small program to recursively scan file-system directories and print their contents as ASCII-art trees. The program made for an approachable example of how to use Haskell for “imperative” tasks, but it has a few problems.

First, the directory-scanning logic and tree-printing logic are intertwined. Neither is reusable. Second, both bits of logic are rigid, specialized for this particular task. Even if you could reuse them, you wouldn’t want to.

In this article, the second in a series, we will explore ways to make our original code more reusable. We will separate the directory scanning from the tree printing, harness the power of some old friends from Haskell’s libraries, and think about the costs and benefits of our changes.

The plan

Recall our original directory tree–listing solution, the core of which I will reprint below:

tlist path =
    visit (if "/" `isPrefixOf` path then "" else ".") "" "" "" path

visit path leader tie arm node = do
    putStrLn (leader ++ arm ++ tie ++ node)
    visitChildren (path ++ "/" ++ node) (leader ++ extension)
  where
    extension = case arm of ""  -> ""; "`" -> "    "; _   -> "|   "

visitChildren path leader =
    whenM (doesDirectoryExist path) $ do
        contents <- getDirectoryContents path
            `catch` (\e -> return [show e])
        let visibles = sort . filter (`notElem` [".", ".."]) $ contents
            arms = replicate (length visibles - 1) "|" ++ ["`"]
        zipWithM_ (visit path leader "-- ") arms visibles

The tlist function kicks off the process for a particular file-system path, handing off to visit which recursively descends the directory tree from the root node. The visit function calls visitChildren to expand the subtree, if any, for each node visited. The visitChildren function, in turn, calls back to visit to repeat the process for each child in the subtree. In effect, we are traversing the tree rooted at path, printing each node in passing.

To separate the traversal part from the printing part, we will introduce a tree data structure. The file system–traversal code will emit a tree, and the tree-showing code will consume a tree. We will rewrite our old tlist function, which we might as well rename to the more descriptive traverseAndPrint, to glue the two pieces together with the tree serving as glue:

traverseAndPrint :: Path -> IO ()
traverseAndPrint path = do
    tree <- fsTraverse root path
    putStrLn (showTree tree)
  where
    root = if "/" `isPrefixOf` path then "" else "."

That’s the plan. Now let’s carry it out.

Read more...

Posted in
Tags , , , ,
7 comments
no trackbacks
Reddit Delicious

A simple directory-tree printer in Haskell

Posted by Tom Moertel Thu, 22 Feb 2007 21:30:00 GMT

At a recent gathering, my friend Casey mentioned that he was learning a new programming language and, as a learning exercise, had written a directory-tree printer. That’s a program that recursively scans directory hierarchies and prints out a tree representation for each:

$ tlist cheating-hangman

cheating-hangman
|-- CVS
|   |-- Entries
|   |-- Repository
|   `-- Root
|-- Makefile
|-- cheating-hangman.lhs
`-- cheating-hangman.pl

I thought the problem sounded like fun, and so I wrote a small solution in Haskell. Let’s take a look.

Read more...

Posted in , ,
Tags , , ,
7 comments
no trackbacks
Reddit Delicious

Adding Haskell syntax highlighting to the Typo blogging system

Posted by Tom Moertel Wed, 01 Nov 2006 22:01:00 GMT

Last night on #haskell, Don Stewart asked if I had seen HsColour for rendering syntax-highlighted Haskell in HTML. He had used it recently, he noted in passing, to add syntax highlighting to planet.haskell.org.

Now, I can’t be certain about this, but I suspect that Don’s question was cleverly designed to instill in me a subtle case of syntax-highlighting envy. For on my blog, Haskell code snippets were rendered in dreadfully boring uncolored text. But on his blog, the snippets dance in joyous polychromatic splendor.

Thus I was compelled to add Haskell syntax-highlighting to my blog.

Adding Haskell syntax-highlighting to Typo

My blog runs on the Ruby-on-Rails-powered Typo system, which allows for plug-in text filters. One of the included filters, in fact, is a syntax-highlighting filter for snippets of Ruby, XML, and YAML code. This filter is built upon the Ruby Syntax module, which wasn’t exactly designed for Haskell syntax analysis. So I set out to create a new plug-in filter based upon HsColour.

This task turned out to be easy. All I did was duplicate Typo’s existing syntax-highlighting filter and swap out its filtering code for the following:

IO.popen("HsColour -css", "r+") do |f|
  pid = fork { f.write text; f.close; exit! 0 }
  f.close_write
  text = f.read
  Process.waitpid pid
end

I also tweaked the post-processing regular expressions so that they would whittle away the HTML filler before and after the syntax-highlighted output of HsColour:

text.gsub!(/.*<p()re>/m, ...)
text.gsub!(/<\/pre>.*/m, ...)

A few more tweaks and I was done.

Now I can wrap my Haskell code in <typo:haskell> tags and it, too, will dance in joyous polychromatic splendor:

constructTable tspecs = do
    ecolspecs <- during "argument evaluation" $ do
        toNvps . concat =<< mapM splice tspecs
    let names = map fst ecolspecs
    let evecs = map snd ecolspecs
    vecs <- argof nm $ mapM evalVector evecs
    let vlens = map vlen vecs
    if length (group vlens) == 1
        then return . VTable $ mkTable (zip names vecs)
        else throwError $
             "table columns must be non-empty vectors of equal length"
  where
    nm = "table(...) constructor"
    splice (TCol envp)  = return [envp]
    splice (TSplice e)  = do
        val <- eval e
        case val of
            VTable t ->
                return $ zipWith mkNVP (tcnames t) (elems (tvecs t))
            VList gl ->
                liftM (zipWith mkNVP (map name . elems $ glnames gl)) $
                mapM asVectorNull (elems $ glvals gl)
            _ -> throwError $
                "can't construct table columns from (" ++
                show val ++ ")"
    mkNVP n vec = NVP n (mkNoPosExpr . EVal $ VVector vec)
    name ""     = "NA"
    name n      = n

If you want the filter code, here it is: haskell_controller.rb. Just drop it into components/plugins/textfilters and restart Typo. The corresponding CSS styles can be found in my user-styles.css.

Posted in , ,
Tags , , ,
no comments
no trackbacks
Reddit Delicious

Introductory Haskell: Solving the Sorting-It-Out Kata

Posted by Tom Moertel Tue, 31 Oct 2006 19:44:00 GMT

Last Tuesday, my friend Casey and I were hanging out at Aldo Coffee. We planned on enjoying some espresso, doing some work, and then heading over to the Pittsburgh Coding Dojo, where we could hang out with other geekly folks. We ended up not having enough time to go to the meeting, but we decided to hack on the challenge problem anyway, using Aldo’s ever-handy free wireless to access the Internet.

The Dojo problem was PragDave’s Kata Eleven – Sorting it Out. (It’s short; read it now.) We decided to use Haskell for our implementation language.

In this post, I’ll walk through our coding session and explain how our solution evolved. To better fit the session into a blog post, I have removed a lot of back-and-forth micro iterations, and I have edited some of the code for clarity.

The first part of the problem

The first part of the problem was “Sorting Balls.” The story: You need to implement a “rack” to hold the balls drawn at random (without replacement) from a bin containing sixty balls, numbered 0 to 59. Regardless of the order in which the balls are added to the rack, you need to present them in sorted order whenever you’re asked for them.

Upon reading this part of the challenge, a couple of thoughts sprung to mind:

  • Because the range of balls is so small, the problem was begging for a solution based on a counting sort.
  • Because the balls are uniquely numbered and drawn without replacement, we could even use a bit vector to represent counts.

Nevertheless, we decided to ignore these thoughts and implement a more-general solution that would work for any (orderable) values, not just small ranges of integers.

Sketching the interface

The first step, then, was to sketch out an interface. Our interface mirrored the one from the problem statement but was tweaked for Haskell:

mkRack :: Rack a
add    :: Ord a => a -> Rack a -> Rack a
balls  :: Rack a -> [a]

The function mkRack makes a new rack to hold values (“balls”) of type a. It’s equivalent to Rack.new in Ruby.

The add function adds a ball to a rack. You give it a ball and a rack, and it returns a new rack that is the same as the original rack but also contains the ball. (If you’re accustomed to stateful programming, this may seem weird. Why return a new rack instead of modifying the original rack? Because, in Haskell, you can’t change values: you can only create new values. At first, this constraint may seem limiting, but after you get used to it, you’ll find it empowering.)

Note: the Ord a qualification on the type signature of add says that it will work for any type a whose values can be ordered. The qualification is necessary because values of some types, like IO actions, cannot be compared to see which are less than the others.

The balls function is an “observer”: it lets you observe the balls in a rack by returning them as an ordered list.

And that’s the interface.

With the interface sketched, we gave it meaning by defining its properties.

Giving our interface meaning: defining properties using QuickCheck

QuickCheck is a powerful, easy-to-use testing tool. Instead of checking test cases, it checks properties – statements about what your code ought to do in general.

The great thing about QuickCheck properties is that they are testable documentation. They tell the world what your code is supposed to do, and they do so in a concise, formal language that just happens to be easily readable by humans and automatically testable by computers.

To specify the desired properties of our Rack interface, we first had to import QuickCheck:

import Test.QuickCheck

Then, we defined our first property. It said, simply, that a new rack must be empty when observed:

prop_New =
    balls mkRack =~ []

Our second property said that, when you add a ball x to a rack, the resulting rack must contain the same balls as the original rack plus x:

prop_AddAddsElement rack x =
    balls (add x rack) =~ (x : balls rack)

Both of the properties above rely upon a special, order-insensitive equality test that we defined for lists of Int values:

(=~) :: [Int] -> [Int] -> Bool
xs =~ ys = sort xs == sort ys

Note that under this test, [1,2] “equals” both [1,2] and [2,1], but it does not “equal” any other values.

The reason we defined this operator was to help us specify the two essential properties of add separately: (1) it must insert a ball into a rack, and (2) the new ball’s position, when observed, must preserve the rack’s ordering invariant. The previous property definition used the =~ operator to specify the first of these two properties. The next property we defined specified the second:

prop_AddPreservesOrdering rack x =
    isOrdered (balls rack) ==> isOrdered (balls (add x rack))

This definition specifies that, for all racks rack and all balls x, if the balls in rack are ordered, the balls in the rack that results from adding x to rack must also be ordered. If you are familiar with proof by induction, you’ll know why we went this route. In short, if we can prove that this property holds (and, trivially, that an empty rack is ordered), we can prove that add preserves the ordering invariant.

To round out the property definition, we needed to define the isOrdered test:

isOrdered :: [Int] -> Bool
isOrdered xs = xs == sort xs

And those are the properties we needed to check the correctness of our implementation. Of course, we still needed to write our implementation, and we turned to that task next.

A simple, list-based Rack implementation

For our first implementation, we decided upon a drop-dead-simple list-based representation. We would keep the elements of the list in sorted order by inserting them into the correct positions when add was called.

Here, then, was our code:

-- Our list-based implementation of a Rack

type Rack a = [a]

mkRack   = []
add x xs = insertList x xs
balls    = id

insertList :: Ord a => a -> [a] -> [a]
insertList x []     = [x]
insertList x (y:ys)
    | x < y         = x : y : ys
    | otherwise     = y : insertList x ys

That’s it.

We took our new implementation for a spin in GHCi:

*Rack> balls mkRack
[]

*Rack> balls (add 3 mkRack)
[3]

*Rack> balls (add 4 (add 3 mkRack))
[3,4]

*Rack> balls (add 1 (add 4 (add 3 mkRack)))
[1,3,4]

*Rack> balls (foldr add mkRack [4,2,6,3,-9,0,33,9])
[-9,0,2,3,4,6,9,33]

To really test our implementation, we asked QuickCheck to check its properties:

*Rack> quickCheck prop_New
OK, passed 100 tests.

*Rack> quickCheck prop_AddAddsElement
OK, passed 100 tests.

*Rack> quickCheck prop_AddPreservesOrdering
OK, passed 100 tests.

I should point out that QuickCheck did not prove that our properties held. Rather, it gathered evidence that we could use to argue that our properties held. The evidence was that each of our properties’ claims was subjected to 100 randomly generated tests, and none of the tests was able to disprove a claim.

Was this evidence sufficient for us to rest satisfied that our implementation was correct? Given how simple our implementation was, I felt that the evidence was sufficient. Casey agreed, and we moved on.

With the first implementation done, we decided to try a more-sophisticated implementation.

Generalizing the interface

Since we were about to have multiple implementations, it made sense for us to define a generalized interface that any “Rack-like” implementation could use. For that, Haskell’s type classes were perfect:

-- Our interface for "Rack-like" data types

class Racklike a ra | ra -> a where
    mkRack :: ra
    add    :: Ord a => a -> ra -> ra
    balls  :: ra -> [a]

The interface was essentially the same as before, except that the data type behind the rack implementation was not given by a specific type Rack a but rather by the type variable ra, which represents some type of rack container for balls of type a.

Note that ra determines a. If, for example, you know that the container type ra equals “a list of Int values,” you know that a must equal Int. (To represent this relationship, we used functional dependencies, a popular extension to the Haskell 98 standard.)

With the Racklike type class in place, we moved our list-based implementation inside of the interface:

-- Our list-based implementation of a Rack

type ListRack a = [a]

instance Racklike a (ListRack a) where
    mkRack = []
    add    = insertList
    balls  = id

Next, we modified our QuickCheck property definitions. Where before it was fine to assume that we would be testing our single, list-based implementation, now we needed to allow for testing other implementation types. We did this by adding a rackType parameter to our property definitions. We used the type, not the value, of this parameter to determine the type of rack to test:

prop_New rackType =
    balls (mkRack `asTypeOf` rackType) =~ []

prop_AddAddsElement rackType ballList x =
    balls (add x rack) =~ (x : balls rack)
  where
    rack = rackFromList ballList `asTypeOf` rackType

prop_AddPreservesOrdering rackType ballList x =
    isOrdered (balls rack) ==> isOrdered (balls (add x rack))
  where
    rack = rackFromList ballList `asTypeOf` rackType

Because we could no longer assume the rack would be represented as a list of integers, we wrote rackFromList to convert such a list into a rack:

rackFromList xs = foldr add mkRack xs

With these modifications in place, we re-ran our tests, specifying (via type annotations) that we wanted to run them for the ListRack implementation:

*Rack> quickCheck $ prop_New (undefined :: ListRack Int)
OK, passed 100 tests.

*Rack> quickCheck $ prop_AddAddsElement (undefined :: ListRack Int)
OK, passed 100 tests.

*Rack> quickCheck $ prop_AddPreservesOrdering (undefined :: ListRack Int)
OK, passed 100 tests.

A tree-based Rack implementation

Now that we were free to add additional implementation types, we created one based on binary trees. We started by defining the tree data type:

data Tree a
    = Empty
    | Root (Tree a) a (Tree a)
    deriving (Ord, Eq, Show)

This definition says that a tree can be either empty or a root node. A root node has a single value and left and right sub-trees.

Further, root nodes must satisfy an ordering invariant: if a root node’s value is x, all of the values in its left subtree must be less than x, and all of the values in its right subtree must be greater than or equal to x. The data type doesn’t enforce this invariant, so we would need to enforce it in our implementation.

Next, we wrote the basic functions for creating, adding elements to, and observing our trees.

We needed to be able to create empty trees:

emptyTree =
    Empty

Inserting an element into a tree requires us to walk the tree and append the element as a new leaf node in the correct location, being mindful of our ordering invariant. Because our data structure is inherently recursive, a recursive implementation was straightforward to code:

insertTree x Empty  = Root Empty x Empty
insertTree x (Root left y right)
    | x < y         = Root (insertTree x left) y right
    | otherwise     = Root left y (insertTree x right)

Note that we don’t try to ensure that the tree is balanced. The problem statement says that the balls are randomly selected, and thus we can expect our trees, on average, to be balanced naturally.

Next, we wrote the code to observe the elements of a tree. We used a functional-programming idiom for efficiently flattening a tree into a list:

elemsTree rx =
    elemsTree' rx []

elemsTree' Empty               = id
elemsTree' (Root left x right) =
    elemsTree' left . (x :) . elemsTree' right

Finally, we defined a new tree-based rack type and declared it to be an instance of the Racklike type class:

type TreeRack a = Tree a

instance Racklike a (TreeRack a) where
    mkRack = emptyTree
    add    = insertTree
    balls  = elemsTree

With the implementation done, we took it for a test drive:

*Rack> add 1 mkRack :: TreeRack Int
Root Empty 1 Empty

*Rack> add 3 (add 1 mkRack) :: TreeRack Int
Root Empty 1 (Root Empty 3 Empty)

*Rack> balls (add 3 (add 1 mkRack) :: TreeRack Int)
[1,3]

Then, for the real test, we checked that our properties held for TreeRacks:

*Rack> quickCheck $ prop_New (undefined :: TreeRack Int)
OK, passed 100 tests.

*Rack> quickCheck $ prop_AddAddsElement (undefined :: TreeRack Int)
OK, passed 100 tests.

quickCheck $ prop_AddPreservesOrdering (undefined :: TreeRack Int)
OK, passed 100 tests.

Satisfied with these results, we moved on to part two of the problem.

The second part of the problem

The second part of the problem was about sorting the letters within a block of text, ignoring white space and punctuation, and converting upper case letters into lower case: “Are there any ways to perform this sort cheaply, and without using built-in libraries?”

Again, a counting sort seemed like an obvious ideal solution, but we decided to recycle our existing code since we had to leave soon. Because our Rack implementations were generic, they would work on letters just as well as on numbers or other kinds of balls:

*Rack> balls (rackFromList "this is a test" :: TreeRack Char)
"   aehiisssttt" 

With our existing code already doing the hard work for us, it was trivial to code up the letter-sorting function:

sortLetters xs =
    balls (rackFromList letters :: TreeRack Char)
  where
    letters = [toLower x | x <- xs, isAlpha x]

(Note: Because of the nature of the problem, I interpreted the question’s “without using built-in libraries” to mean “without built-in sorting libraries.”)

We took the new function for a test drive, and it worked as expected:

*Rack> sortLetters "This is a test, pal." 
"aaehiilpsssttt" 

And that ended our coding session.

Update: Tweaked the revised definition of the AddAddsElement property for greater parallelism with the original.

Update 2007-03-03: Minor edits for clarity.

Posted in , , ,
Tags , , , ,
7 comments
no trackbacks
Reddit Delicious

A type-based solution to the "strings problem": a fitting end to XSS and SQL-injection holes?

Posted by Tom Moertel Thu, 19 Oct 2006 01:40:00 GMT

Even skilled programmers have a hard time keeping their web applications free of XSS and SQL-injection vulnerabilities. And it shows: a sobering portion of web sites are open to some scary security threats.

Why are so many sites vulnerable to these well-known holes? Probably because it’s insanely hard for programmers to solve the fundamental “strings problem” at the heart of these vulnerabilities. The problem itself is easy to understand, but we humans aren’t equipped to carry out the solution. Simply put, we just plain suck at keeping a bazillion different strings straight in our heads, let alone consistently and reliably rendering their interactions safe whenever they cross paths in a modern web application. It’s easy to say, “just escape the little buggers,” but it’s hard to get it right, every single time.

Computers, on the other hand, are pretty good at keeping track of details by the bucket-full. Wouldn’t it be nice, then, if our programming languages gave us the power to delegate this nasty “strings problem” to our computers, which could then devote their unwavering mechanical precision to grinding the problem out of existence? Isn’t that the kind of thing modern programming languages are supposed to be good at?

I’d like to think the answer to that question is a big, you betcha.

So let’s grab a modern programming language and solve the strings problem.

Let’s solve the strings problem in Haskell

In this article, we will look at one way (among many) to solve the strings problem: by adding Ruby-style string templates to Haskell. These templates support “interpolation” via the usual, convenient #{var} syntax, but here interpolation is type safe. Haskell’s type system will prevent us from inadvertently mixing incompatible string types, and it will detect mistakes at compile time, before they can become live XSS or SQL-injection holes. Further, our solution will offer us these benefits without making us jump through hoops or pay some onerous syntax penalty.

To be more specific, the system offers the following benefits:

  • It provides a string-management kernel that lets you create “safe strings” by certifying a regular string as representing either text or a fragment of a known language.
  • It allows you to conveniently define new language types for any string-based language that you can provide an escaping rule for (e.g., XML, URLs, SQL, untrusted user input).
  • It provides compile-time syntactic sugar (via Template Haskell) that makes working with safe strings as convenient as working with string interpolation in languages like Ruby and Perl.
  • It catches and reports (at compile time) the following commonly made programming errors:
    • failing to escape a plain-old-text string before mixing it into a string that represents a language fragment
    • mixing strings that represent fragments of incompatible languages
    • mixing strings that represent fragments of compatible languages in an ambiguous way (the system will force you to disambiguate)

(This is a long one, so grab an espresso, lean back, and read on in style. Also, if you have a smoking jacket, you might want to get it now.)

Read more...

Posted in , , , , , ,
Tags , , , ,
37 comments
no trackbacks
Reddit Delicious

GHC 6.6 is out!

Posted by Tom Moertel Wed, 11 Oct 2006 16:21:00 GMT

Hooray! The Glasgow Haskell Compiler, version 6.6, has been released. Among the many new features in GHC 6.6 are a few that I’m looking forward to using right away:

  • SMP support to keep dual-core processors happy.
  • Bang patterns to strictify function arguments automatically, saving many calls to seq.
  • GADTs can now use record syntax.
  • GHCi now allows tab completion of in-scope names and modules on platforms that use readline (i.e. not Windows).

Congratulations and a big thank-you to the GHC team!

Posted in
Tags ,
no comments
no trackbacks
Reddit Delicious

Solving the Google Code Jam "countPaths" problem in Haskell

Posted by Tom Moertel Tue, 15 Aug 2006 21:01:00 GMT

Via the article on this year’s Google Code Jam on Slashdot earlier today, I found Hareesh Nagarajan’s write-up of a previous year’s Code-Jam problem. Since Google often comes up with interesting problems, I decided to give this one a go.

The problem: count the ways to find a word by walking on a grid

You are given a rectangular grid of letters and a word to find. You must compute the number of ways to find the word within the grid using the following rules:

  • start at any cell within the grid
  • from there, move to any of the cell’s eight neighboring cells
  • continue moving from that neighbor to its neighbors, and so on, until you have spelled out the word
  • you may visit cells more than once, but you cannot visit the same cell twice in a row (i.e., you must move for each turn)

For instance, consider the following grid, taken from the examples in the problem statement:

ABC
FED
GAI

If you were asked to find the word “AEA” on this grid, you could do it in four ways:

Way  --Move---
     1   2   3

1:  *BC ABC *BC
    FED F*D FED
    GAI GAI GAI

2:  *BC ABC ABC
    FED F*D FED
    GAI GAI G*I

3:  ABC ABC *BC
    FED F*D FED
    G*I GAI GAI

4:  ABC ABC ABC
    FED F*D FED
    G*I GAI G*I

If you were asked to find “ABCD”, you could do it in only one way:

Way  --Move-------- 
     1   2   3   4 

1:  *BC A*C AB* ABC
    FED FED FED FE*
    GAI GAI GAI GAI

If you were asked to find “AAB”, you could not: there are no “A” cells on the grid that have other “A” cells as neighbors.

The tricksy nature of the problem

As you might expect from Google, this puzzle was designed to see whether your solution can scale. A simple search will quickly bog down because each step in the search can expand into vastly more possibilities, as searching for “AAAA” on a seemingly harmless 2×2 grid of all “A” cells shows – there are 108 solutions.

The problem statement says that the grid may be up to 50×50 in size and the word to find may be up to 50 letters long. Imagine, then, that you are asked to find a word composed of 50 “A” letters within a 50×50 grid of “A” cells. All of the cells will be valid starting points, and each will have, on average, slightly less than 8 valid neighbors. Thus there will be about 50 × 50 × 8^49 = 4.5e47 ways to find the word1. Tracing them all would take forever.

The trick is figuring out a more efficient way to solve the problem. Since that’s the fun part of this problem, I won’t spoil it for you by telling you how I did it. (If you truly want spoilers, you can study my code.)

My solution

Here is what I came up with. I’ll present the code first and then discuss how to use it.

Note: The code below is out of date but printed here for continuity. See Update 5 for the most-recent revision.

{-

Tom Moertel <tom@moertel.com>
2006-08-15

Haskell-based solution to the Google Code Jam problem "countPaths";
see http://www.cs.uic.edu/~hnagaraj/articles/code-jam/ for more.

-}

module Main (main) where

import Control.Monad
import Data.Array
import qualified Data.Map as M

main = do
    word:gridspec <- liftM words getContents
    print $ (countPaths word (toGridArray gridspec) :: Integer)

countPaths word@(p:_) gridArray =
    sum . M.elems $ foldl step state0 (zip word (tail word))
  where
    state0 = M.fromList [(cell, 1) | (cell, q) <- assocs gridArray, p == q]
    neighbors = toNeighborMap gridArray
    step state fromto = M.fromListWith (+) $ do
        steps <- M.lookup fromto neighbors
        (start, count) <- M.assocs state
        cells <- M.lookup start steps
        cell <- cells
        return (cell, count)

toGridArray gridspec@(l1:_) =
    listArray ((1,1), (length gridspec, length l1)) (concat gridspec)

toNeighborMap gridArray =
    M.fromListWith (M.unionWith (flip (++))) $ do
        (cell, p) <- assocs gridArray
        cell' <- neighbors8 cell
        guard $ inRange (bounds gridArray) cell'
        return ((p, gridArray!cell'), M.singleton cell [cell'])

neighbors8 (r,c) =
    [(r+h, c+v) | h <- [-1..1], v <- [-1..1], h /= 0 || v /= 0]

-- Local Variables:  ***
-- compile-command: "ghc -O2 -o wordpath --make WordPath.hs" ***
-- End: ***

My solution generalizes upon the problem statement in a few ways:

  • the grid can be any size and the word any length
  • the grid and word can be composed of any comparable data type, not just A–Z letters (if you use the stdin interface, the code will use Unicode characters)
  • the code will compute exact counts instead of returning -1 for counts greater than 1e9

You can enter problems from the command line. Enter the word first and then the grid, each row separated by whitespace. For example:

$ ./wordpath
AAAAAAAAAAA

AAAAA
AAAAA
AAAAA
AAAAA
AAAAA
^D

2745564336

Give it a try

This was a fun problem to solve. If you have a little spare time, give it a try. I would love to compare results and talk about strategies.

Update: Fixed typo: Finding “AAAA” – not “AA” – on a 2×2 grid of all “A” letters results in a count of 108. Thanks to Joshua Volz for pointing out my mistake.

Update 2: Here’s a dynamic-programming-based implementation of countPaths that is about six times faster than my original implementation when solving the maximum-size, all-the-same-letter problem:

countPaths word gridArray =
    sum [counts ! (length word, cell) | cell <- cells]
  where
    counts = listArray ((1, (1, 1)), (length word, gridSize)) $
             [countFrom i cell | i <- [1..length word], cell <- cells]

    countFrom i cell
        | i == 1 && match = 1
        | match           = sum [counts!((i-1),n) | n <- neighbors!cell]
        | otherwise       = 0
      where
        match = rword ! i == gridArray ! cell

    neighbors = listArray (bounds gridArray) $
        [filter (inRange (bounds gridArray)) (neighbors8 cell)
            | cell <- cells ]

    rword    = listArray (1, length word) (reverse word)
    cells    = indices gridArray
    gridSize = snd (bounds gridArray)

See the thread started by ‘psykotic’ on reddit.com for more.

Update 3: Ivan Peev has solved the problem in Python: Solving the Google Code Jam ‘countPaths’ problem in Python. Because his implementation uses the same algorithm that my implementation in Update 2 does, it makes a good vehicle for Haskell-versus-Python speed comparisons, an interesting topic in light of the warning Google provides about using Python in the Google Code Jam:

NOTE: All submissions have a maximum of 2 seconds of runtime per test case. This limit is used in harder problems to force submissions to be of a certain complexity. Because of the inherent speed differences between Python and the other offered languages is large, some problems may require extra optimization or not be solvable using the Python language.

Ivan reports that his Python implementation solves the maximum-size, all-the-same-letter problem in about 8 seconds on an old 1-GHz AMD Athlon. The Haskell version comes in somewhat faster at 0.9 second on a 1.8-GHz AMD Opteron. (On the same Opteron, Ivan’s code clocks in at 2.8 seconds, which is impressive.)

Update 4: I have added a Ruby implementation and a Perl implementation and timings, too. On the the maximum-size, all-the-same-letter problem, Ruby clocks in at 4.2 seconds; Perl in 1.7 seconds. See the Perl implementation for a summary table of the timings.

Update 5: As I promised reader Kartik in a comment, here is a further-simplified, yet 25-percent-faster, version of my implementation in Update 2. This version eliminates the cache in favor of a current-state array that is folded through the successive letters of the target word. The result of the fold operation is the final state array, whose elements are summed to yield the final result. Here’s the complete code:

{-

Tom Moertel <tom@moertel.com>
2006-08-15 (revised 2006-09-01)

Haskell-based solution to the Google Code Jam problem "countPaths" 
See http://www.cs.uic.edu/~hnagaraj/articles/code-jam/ for more.

This implementation is based on the dynamic-programming strategy
mentioned by reddit.com user "psykotic":
http://programming.reddit.com/info/dni1/comments/cdp59.

-}

module Main (main) where

import Control.Monad
import Data.Array

main = do
    word:gridspec <- liftM words getContents
    print $ (countPaths word (toGridArray gridspec) :: Integer)

countPaths word grid =
    sum . elems $ foldl move counts0 (tail (reverse word))
  where
    move counts c  = step c $ sum . map (counts!) . neighbors
    counts0        = step (last word) (const 1)
    step c f       = listArray (bounds grid) $ map (match c f) cells
    match c f cell = if c == grid!cell then f cell else 0
    neighbors cell = filter (inRange (bounds grid)) (neighbors8 cell)
    cells          = indices grid

toGridArray gridspec@(l1:_) =
    listArray ((1,1), (length gridspec, length l1)) (concat gridspec)

neighbors8 (r,c) =
    [(h, v) | h <- [r-1..r+1], v <- [c-1..c+1], h /= r || v /= c]

-- Local Variables:  ***
-- compile-command: "ghc -O2 -o wordpathdp --make WordPathDP.hs" ***
-- End: ***

1 I believe that the exact count is 303 835 410 591 851 117 616 135 618 108 340 196 903 254 429 200 (approx. 3.04e47). It takes about six seconds 0.75 second to compute on a 1.8-GHz AMD64 box running Linux.

Posted in , ,
Tags , , , , , ,
7 comments
no trackbacks
Reddit Delicious

Older posts: 1 2