Posted by Tom Moertel
Wed, 28 Mar 2007 19:40:00 GMT
This article is part three in a series on introductory Haskell
programming. In the first
article,
we wrote a small program to recursively scan file-system directories
and print their contents as ASCII-art trees. In the second
article,
we refactored the program to make its logic more reusable by separating
the directory-scanning logic from the tree-printing logic. In this
article, we will address a shortcoming of the refactored version: It
must scan directory hierarchies completely before printing their
trees, i.e., it must scan and then print,
when doing both simultaneously is both more efficient and
more user friendly.
Recall from the previous article that our directory-printing program
is factored into three pieces of logic:
- fsTraverse, which traverses a file-system hierarchy and returns a tree data structure;
- showTree, which converts a tree into lovingly crafted ASCII art; and
- traverseAndPrint, which prints the tree for a file-system hierarchy by
using the first two pieces of logic.
The types of the functions are as follows:
fsTraverse :: Path -> DentName -> IO DirTree
showTree :: Tree String -> String
traverseAndPrint :: Path -> IO ()
Note that showTree is a pure function, but the other two return
IO actions that may have side effects.
Within traverseAndPrint, fsTraverse and showTree are combined
into a composite IO action by the =<< combinator:
putStr . showTree =<< fsTraverse root path
The sequencing semantics of Haskell’s IO monad forces all of the
effects of fsTraverse to complete before any following
effects can begin. To better understand these sequencing semantics,
let’s consider a simple example.
The IO-monad code,
can loosely be interpreted as running the action a, which forces
its side effects to occur, and then running the action b, which forces
its side effects to occur.
In reality, a and b are not actions. They are
functions. Like all Haskell functions, they are pure and have no side
effects. It’s just that a and b return values that
represent actions, and those actions may have side effects, and the
semantics of the IO monad guarantee the ordering of those effects
(should the actions end up being connected to the runtime’s
top-level IO action and executed). If you think that’s weird, hold
that thought. For now, all that’s important is that, if the composite
action represented by the expression
(a >> b) is executed, the
effects of a, regardless of how complex, will be executed
before the effects of b.
Thus if a represents building a tree by recursively scanning
a file-system hierarchy, the entire tree must be built before
b ever gets a chance to do its thing. For our particular
application, however, that particular sequencing is suboptimal. We
know from our earlier, monolithic implementation that the
file-system hierarchy can be scanned and printed simultaneously, which
is more efficient. Ideally, then, our refactored
implementation should be just as efficient.
In this article, we will look at one way to maintain the clean,
logical separation of the a part from the b part
while allowing the parts’ effects to be interleaved for efficiency.
We will use an extension to the Haskell language to make the
directory-scanning action lazy so that it builds the tree as the tree
is consumed.
Ready? Let’s dive in.
Read more...
Posted in haskell
Tags directory_tree_series, haskell, io, laziness, lazy, trees
6 comments
no trackbacks

Posted by Tom Moertel
Wed, 07 Mar 2007 21:04:00 GMT
In my previous article on writing a simple directory-tree printer in
Haskell,
I wrote a small program to recursively scan file-system
directories and print their contents as ASCII-art trees. The
program made for an approachable example of how to use Haskell for
“imperative” tasks, but it has a few problems.
First, the directory-scanning logic and tree-printing logic are
intertwined. Neither is reusable. Second, both bits of logic are
rigid, specialized for this particular task. Even if you could
reuse them, you wouldn’t want to.
In this article, the second in a series, we will explore ways to make
our original code more reusable. We will separate the directory
scanning from the tree printing, harness the power of some old
friends from Haskell’s libraries, and think about the costs
and benefits of our changes.
The plan
Recall our original directory tree–listing solution, the
core of which I will reprint below:
tlist path =
visit (if "/" `isPrefixOf` path then "" else ".") "" "" "" path
visit path leader tie arm node = do
putStrLn (leader ++ arm ++ tie ++ node)
visitChildren (path ++ "/" ++ node) (leader ++ extension)
where
extension = case arm of "" -> ""; "`" -> " "; _ -> "| "
visitChildren path leader =
whenM (doesDirectoryExist path) $ do
contents <- getDirectoryContents path
`catch` (\e -> return [show e])
let visibles = sort . filter (`notElem` [".", ".."]) $ contents
arms = replicate (length visibles 1) "|" ++ ["`"]
zipWithM_ (visit path leader "-- ") arms visibles
The tlist function kicks off the process for a particular
file-system path, handing off to visit which recursively descends the
directory tree from the root node. The visit function calls
visitChildren to expand the subtree, if any, for each node visited.
The visitChildren function, in turn, calls back to visit to
repeat the process for each child in the subtree.
In effect, we are traversing the tree rooted at path, printing each
node in passing.
To separate the traversal part from the printing part, we will
introduce a tree data structure. The file system–traversal code
will emit a tree, and the tree-showing code will consume a tree. We
will rewrite our old tlist function, which we might as well rename to
the more descriptive traverseAndPrint, to glue the two pieces
together with the tree serving as glue:
traverseAndPrint :: Path -> IO ()
traverseAndPrint path = do
tree <- fsTraverse root path
putStrLn (showTree tree)
where
root = if "/" `isPrefixOf` path then "" else "."
That’s the plan. Now let’s carry it out.
Read more...
Posted in haskell
Tags directory_tree_series, haskell, io, refactoring, trees
7 comments
no trackbacks

Posted by Tom Moertel
Wed, 01 Nov 2006 22:01:00 GMT
Last night on #haskell, Don
Stewart asked if I had seen
HsColour
for rendering syntax-highlighted Haskell in HTML. He had
used it recently, he noted in passing, to add syntax highlighting to planet.haskell.org.
Now, I can’t be certain about this, but I suspect that Don’s question
was cleverly designed to instill in me a subtle case of
syntax-highlighting envy. For on my blog, Haskell code snippets
were rendered in dreadfully boring uncolored text.
But on his blog, the
snippets dance in joyous polychromatic splendor.
Thus I was compelled to add Haskell syntax-highlighting to my blog.
Adding Haskell syntax-highlighting to Typo
My blog runs on the Ruby-on-Rails-powered Typo
system, which allows for plug-in text filters. One of the included filters, in fact, is a syntax-highlighting filter for snippets of Ruby, XML, and YAML code. This filter is built upon the Ruby Syntax module, which wasn’t exactly designed for Haskell syntax analysis. So I set out to create a new plug-in filter based upon HsColour.
This task turned out to be easy. All I did was duplicate
Typo’s existing syntax-highlighting filter and swap out its filtering
code for the following:
IO.popen("HsColour -css", "r+") do |f|
pid = fork { f.write text; f.close; exit! 0 }
f.close_write
text = f.read
Process.waitpid pid
end
I also tweaked the post-processing regular expressions so that they
would whittle away the HTML filler before and after the
syntax-highlighted output of HsColour:
text.gsub!(/.*<p()re>/m, ...)
text.gsub!(/<\/pre>.*/m, ...)
A few more tweaks and I was done.
Now I can wrap my Haskell code in <typo:haskell> tags and it, too, will
dance in joyous polychromatic splendor:
constructTable tspecs = do
ecolspecs <- during "argument evaluation" $ do
toNvps . concat =<< mapM splice tspecs
let names = map fst ecolspecs
let evecs = map snd ecolspecs
vecs <- argof nm $ mapM evalVector evecs
let vlens = map vlen vecs
if length (group vlens) == 1
then return . VTable $ mkTable (zip names vecs)
else throwError $
"table columns must be non-empty vectors of equal length"
where
nm = "table(...) constructor"
splice (TCol envp) = return [envp]
splice (TSplice e) = do
val <- eval e
case val of
VTable t ->
return $ zipWith mkNVP (tcnames t) (elems (tvecs t))
VList gl ->
liftM (zipWith mkNVP (map name . elems $ glnames gl)) $
mapM asVectorNull (elems $ glvals gl)
_ -> throwError $
"can't construct table columns from (" ++
show val ++ ")"
mkNVP n vec = NVP n (mkNoPosExpr . EVal $ VVector vec)
name "" = "NA"
name n = n
If you want the filter code, here it is: haskell_controller.rb. Just drop it into components/plugins/textfilters and restart Typo. The corresponding CSS styles can be found in my user-styles.css.
Posted in haskell, ruby, typo
Tags haskell, hscolour, ruby, typo
no comments
no trackbacks

Posted by Tom Moertel
Tue, 31 Oct 2006 19:44:00 GMT
Last Tuesday, my friend Casey and I were hanging
out at Aldo Coffee. We planned on enjoying
some espresso, doing some work, and then heading over to the Pittsburgh Coding
Dojo, where we could hang out with
other geekly folks.
We ended up
not having enough time to go to the meeting, but we decided to hack
on the challenge problem anyway, using Aldo’s ever-handy free
wireless to access the Internet.
The Dojo problem was PragDave’s Kata Eleven – Sorting it
Out. (It’s short;
read it now.) We decided to use Haskell for our implementation
language.
In this post, I’ll walk through our coding session and explain how our
solution evolved. To better fit the session into a blog post, I
have removed a lot of back-and-forth micro iterations, and I have
edited some of the code for clarity.
The first part of the problem
The first part of the problem was “Sorting Balls.” The story: You
need to implement a “rack” to hold the balls drawn at random (without
replacement) from a bin containing sixty balls, numbered 0 to 59.
Regardless of the order in which the balls are added to the rack, you
need to present them in sorted order whenever you’re asked for them.
Upon reading this part of the challenge, a couple of thoughts sprung to mind:
- Because the range of balls is so small, the problem was begging for a solution based on a counting sort.
- Because the balls are uniquely numbered and drawn without replacement, we could even use a bit vector to represent counts.
Nevertheless, we decided to ignore these thoughts and implement a
more-general solution that would work for any (orderable) values,
not just small ranges of integers.
Sketching the interface
The first step, then, was to sketch out an interface. Our
interface mirrored the one from the problem statement but
was tweaked for Haskell:
mkRack :: Rack a
add :: Ord a => a -> Rack a -> Rack a
balls :: Rack a -> [a]
The function mkRack makes a new rack to hold values (“balls”) of
type a. It’s equivalent to Rack.new in Ruby.
The add function adds a ball to a rack. You give it a ball and a
rack, and it returns a new rack that is the same as the original rack
but also contains the ball. (If you’re accustomed to stateful
programming, this may seem weird. Why return a new rack instead of
modifying the original rack? Because, in Haskell, you can’t change
values: you can only create new values. At first, this constraint may
seem limiting, but after you get used to it, you’ll find it
empowering.)
Note: the Ord a qualification on the type signature of
add says that it will work for any type a whose values can be
ordered. The qualification is necessary because values of some types,
like IO actions, cannot be compared to see which are less than the
others.
The balls function is an “observer”: it lets you observe the balls
in a rack by returning them as an ordered list.
And that’s the interface.
With the interface sketched, we gave it meaning by defining its
properties.
Giving our interface meaning: defining properties using QuickCheck
QuickCheck is a
powerful, easy-to-use testing tool. Instead of checking test cases,
it checks properties – statements about what your code ought to do
in general.
The great thing about QuickCheck properties is that they are
testable documentation. They tell the world what your code
is supposed to do,
and they do so in a concise, formal language that just happens to be
easily readable by humans and automatically testable by computers.
To specify the desired properties of our Rack interface, we first had
to import QuickCheck:
Then, we defined our first property. It said, simply, that a new rack
must be empty when observed:
prop_New =
balls mkRack =~ []
Our second property said that, when you add a ball x to
a rack, the resulting rack must contain the same
balls as the original rack plus x:
prop_AddAddsElement rack x =
balls (add x rack) =~ (x : balls rack)
Both of the properties above rely upon a special, order-insensitive
equality test that we defined for lists of Int values:
(=~) :: [Int] -> [Int] -> Bool
xs =~ ys = sort xs == sort ys
Note that under this test, [1,2] “equals”
both [1,2] and [2,1], but it does not “equal”
any other values.
The reason we defined this operator was to help us specify the two
essential properties of add separately: (1) it must insert a ball
into a rack, and (2) the new ball’s position, when observed, must
preserve the rack’s ordering invariant. The previous property
definition used the =~ operator to specify the first of
these two properties. The next property we defined specified the
second:
prop_AddPreservesOrdering rack x =
isOrdered (balls rack) ==> isOrdered (balls (add x rack))
This definition specifies that, for all racks rack and all balls
x, if the balls in rack are ordered, the balls in the rack that
results from adding x to rack must also be ordered. If you
are familiar with proof by
induction, you’ll
know why we went this route. In short, if we can prove that this
property holds (and, trivially, that an empty rack is ordered), we can
prove that add preserves the ordering invariant.
To round out the property definition, we needed to define the isOrdered test:
isOrdered :: [Int] -> Bool
isOrdered xs = xs == sort xs
And those are the properties we needed to check the correctness
of our implementation. Of course, we still needed to write our
implementation, and we turned to that task next.
A simple, list-based Rack implementation
For our first implementation, we decided upon a drop-dead-simple
list-based representation. We would keep the elements of the list
in sorted order by inserting them into the correct positions when
add was called.
Here, then, was our code:
type Rack a = [a]
mkRack = []
add x xs = insertList x xs
balls = id
insertList :: Ord a => a -> [a] -> [a]
insertList x [] = [x]
insertList x (y:ys)
| x < y = x : y : ys
| otherwise = y : insertList x ys
That’s it.
We took our new implementation for a spin in GHCi:
*Rack> balls mkRack
[]
*Rack> balls (add 3 mkRack)
[3]
*Rack> balls (add 4 (add 3 mkRack))
[3,4]
*Rack> balls (add 1 (add 4 (add 3 mkRack)))
[1,3,4]
*Rack> balls (foldr add mkRack [4,2,6,3,-9,0,33,9])
[-9,0,2,3,4,6,9,33]
To really test our implementation, we asked QuickCheck to check its
properties:
*Rack> quickCheck prop_New
OK, passed 100 tests.
*Rack> quickCheck prop_AddAddsElement
OK, passed 100 tests.
*Rack> quickCheck prop_AddPreservesOrdering
OK, passed 100 tests.
I should point out that QuickCheck did not prove that our properties
held. Rather, it gathered evidence that we could use to argue that
our properties held. The evidence was that each of our properties’
claims was subjected to 100 randomly generated tests, and none of
the tests was able to disprove a claim.
Was this evidence sufficient for us to rest satisfied that our
implementation was correct? Given how simple our implementation
was, I felt that the evidence was sufficient. Casey agreed, and we moved on.
With the first implementation done, we decided to try a more-sophisticated
implementation.
Generalizing the interface
Since we were about to have multiple implementations, it made sense
for us to define a generalized interface that any “Rack-like”
implementation could use. For that, Haskell’s type classes were
perfect:
class Racklike a ra | ra -> a where
mkRack :: ra
add :: Ord a => a -> ra -> ra
balls :: ra -> [a]
The interface was essentially the same as before, except that the data
type behind the rack implementation was not given by a specific type
Rack a but rather by the type variable ra, which represents some
type of rack container for balls of type a.
Note that ra determines a. If, for example, you know that
the container type ra equals “a list of Int values,”
you know that a must equal Int. (To represent this
relationship, we used functional
dependencies,
a popular extension to the Haskell 98 standard.)
With the Racklike type class in place, we moved our list-based
implementation inside of the interface:
type ListRack a = [a]
instance Racklike a (ListRack a) where
mkRack = []
add = insertList
balls = id
Next, we modified our QuickCheck property definitions. Where before
it was fine to assume that we would be testing our single, list-based
implementation, now we needed to allow for testing other
implementation types. We did this by adding a rackType parameter to
our property definitions. We used the type, not the value, of this
parameter to determine the type of rack to test:
prop_New rackType =
balls (mkRack `asTypeOf` rackType) =~ []
prop_AddAddsElement rackType ballList x =
balls (add x rack) =~ (x : balls rack)
where
rack = rackFromList ballList `asTypeOf` rackType
prop_AddPreservesOrdering rackType ballList x =
isOrdered (balls rack) ==> isOrdered (balls (add x rack))
where
rack = rackFromList ballList `asTypeOf` rackType
Because we could no longer assume the rack would be represented
as a list of integers, we wrote rackFromList to convert such
a list into a rack:
rackFromList xs = foldr add mkRack xs
With these modifications in place, we re-ran our tests, specifying
(via type annotations) that we wanted to run them for the ListRack
implementation:
*Rack> quickCheck $ prop_New (undefined :: ListRack Int)
OK, passed 100 tests.
*Rack> quickCheck $ prop_AddAddsElement (undefined :: ListRack Int)
OK, passed 100 tests.
*Rack> quickCheck $ prop_AddPreservesOrdering (undefined :: ListRack Int)
OK, passed 100 tests.
A tree-based Rack implementation
Now that we were free to add additional implementation types,
we created one based on binary trees. We started by defining
the tree data type:
data Tree a
= Empty
| Root (Tree a) a (Tree a)
deriving (Ord, Eq, Show)
This definition says that a tree can be either empty or a root node.
A root node has a single value and left and right sub-trees.
Further, root nodes must satisfy an ordering invariant: if a root
node’s value is x, all of the values in its left subtree must be
less than x, and all of the values in its right subtree must be
greater than or equal to x. The data type doesn’t enforce this
invariant, so we would need to enforce it in our implementation.
Next, we wrote the basic functions for creating, adding elements to,
and observing our trees.
We needed to be able to create empty trees:
Inserting an element into a tree requires us to walk the tree and
append the element as a new leaf node in the correct location, being
mindful of our ordering invariant. Because our data structure is
inherently recursive, a recursive implementation was straightforward
to code:
insertTree x Empty = Root Empty x Empty
insertTree x (Root left y right)
| x < y = Root (insertTree x left) y right
| otherwise = Root left y (insertTree x right)
Note that we don’t try to ensure that the tree is balanced. The
problem statement says that the balls are randomly selected, and thus
we can expect our trees, on average, to be balanced naturally.
Next, we wrote the code to observe the elements of a tree.
We used a functional-programming idiom
for efficiently flattening a tree into a list:
elemsTree rx =
elemsTree' rx []
elemsTree' Empty = id
elemsTree' (Root left x right) =
elemsTree' left . (x :) . elemsTree' right
Finally, we defined a new tree-based rack type and declared
it to be an instance of the Racklike type class:
type TreeRack a = Tree a
instance Racklike a (TreeRack a) where
mkRack = emptyTree
add = insertTree
balls = elemsTree
With the implementation done, we took it for a test drive:
*Rack> add 1 mkRack :: TreeRack Int
Root Empty 1 Empty
*Rack> add 3 (add 1 mkRack) :: TreeRack Int
Root Empty 1 (Root Empty 3 Empty)
*Rack> balls (add 3 (add 1 mkRack) :: TreeRack Int)
[1,3]
Then, for the real test, we checked that our properties held for
TreeRacks:
*Rack> quickCheck $ prop_New (undefined :: TreeRack Int)
OK, passed 100 tests.
*Rack> quickCheck $ prop_AddAddsElement (undefined :: TreeRack Int)
OK, passed 100 tests.
quickCheck $ prop_AddPreservesOrdering (undefined :: TreeRack Int)
OK, passed 100 tests.
Satisfied with these results, we moved on to part two of the problem.
The second part of the problem
The second part of the problem was about sorting the letters within a
block of text, ignoring white space and punctuation, and converting
upper case letters into lower case: “Are there any ways to
perform this sort cheaply, and without using built-in libraries?”
Again, a counting sort seemed like an obvious ideal solution, but
we decided to recycle our existing code since we had to leave soon.
Because our Rack implementations were generic, they would work on
letters just as well as on numbers or other kinds of balls:
*Rack> balls (rackFromList "this is a test" :: TreeRack Char)
" aehiisssttt"
With our existing code already doing the hard work
for us, it was trivial to code up the letter-sorting function:
sortLetters xs =
balls (rackFromList letters :: TreeRack Char)
where
letters = [toLower x | x <- xs, isAlpha x]
(Note: Because of the nature of the problem, I interpreted the
question’s “without using built-in libraries” to mean “without
built-in sorting libraries.”)
We took the new function for a test drive, and it worked
as expected:
*Rack> sortLetters "This is a test, pal."
"aaehiilpsssttt"
And that ended our coding session.
Update: Tweaked the revised definition of the AddAddsElement
property for greater parallelism with the original.
Update 2007-03-03: Minor edits for clarity.
Posted in programming, functional programming, haskell, testing
Tags haskell, kata, quickcheck, sorting, testing
7 comments
no trackbacks

Posted by Tom Moertel
Thu, 19 Oct 2006 01:40:00 GMT
Even skilled programmers have a hard time keeping their web
applications free of XSS and SQL-injection vulnerabilities. And it
shows: a sobering portion of web sites are open to some scary security threats.
Why are so many sites vulnerable to these well-known holes? Probably
because it’s insanely hard for programmers to solve the fundamental
“strings problem” at the heart of these vulnerabilities. The problem
itself is easy to understand, but we humans aren’t equipped to carry
out the solution. Simply put, we just plain suck at keeping a
bazillion different strings straight in our heads, let alone
consistently and reliably rendering their interactions safe whenever they
cross paths in a modern web application. It’s easy to say, “just
escape the little buggers,” but it’s hard to get it right, every single time.
Computers, on the other hand, are pretty good at keeping track of
details by the bucket-full. Wouldn’t it be nice, then,
if our programming languages gave us the power to delegate this nasty “strings
problem” to our computers, which could then devote their unwavering mechanical precision to grinding the problem out of existence? Isn’t that the kind of thing modern programming languages are supposed to be good at?
I’d like to think the answer to that question is a big, you betcha.
So let’s grab a modern programming language and solve the strings problem.
Let’s solve the strings problem in Haskell
In this article, we will look at one way (among many) to solve the strings
problem: by adding Ruby-style string templates to Haskell. These
templates support “interpolation” via the usual, convenient #{var}
syntax, but here interpolation is type safe. Haskell’s type system
will prevent us from inadvertently mixing incompatible string types,
and it will detect mistakes at compile time, before they can become
live XSS or SQL-injection holes. Further, our solution will offer
us these benefits without making us jump through hoops or pay some
onerous syntax penalty.
To be more specific, the system offers the following benefits:
- It provides a string-management kernel that lets you create “safe strings” by certifying a regular string as representing either text or a fragment of a known language.
- It allows you to conveniently define new language types for any string-based language that you can provide an escaping rule for (e.g., XML, URLs, SQL, untrusted user input).
- It provides compile-time syntactic sugar (via Template Haskell) that makes working with safe strings as convenient as working with string interpolation in languages like Ruby and Perl.
- It catches and reports (at compile time) the following commonly made programming errors:
- failing to escape a plain-old-text string before mixing it into a string that represents a language fragment
- mixing strings that represent fragments of incompatible languages
- mixing strings that represent fragments of compatible languages in an ambiguous way (the system will force you to disambiguate)
(This is a long one, so grab an espresso, lean back, and read on in
style. Also, if you have a smoking jacket, you might want to get it now.)
Read more...
Posted in programming, programming languages, haskell, ruby, web development, testing, rails
Tags haskell, ruby, strings, testing, types
37 comments
no trackbacks

Posted by Tom Moertel
Tue, 15 Aug 2006 21:01:00 GMT
Via the article on this year’s Google Code
Jam on
Slashdot earlier today, I found Hareesh Nagarajan’s write-up of a previous
year’s Code-Jam
problem. Since
Google often comes up with interesting problems, I decided to give
this one a go.
The problem: count the ways to find a word by walking on a grid
You are given a rectangular grid of letters and a word to find.
You must compute the number of ways to find the word within the grid
using the following rules:
- start at any cell within the grid
- from there, move to any of the cell’s eight neighboring cells
- continue moving from that neighbor to its neighbors, and so on,
until you have spelled out the word
- you may visit cells more than once, but you cannot visit
the same cell twice in a row (i.e., you must move for each turn)
For instance, consider the following grid, taken from the examples in
the problem statement:
ABC
FED
GAI
If you were asked to find the word “AEA” on this grid, you could do it in
four ways:
Way --Move---
1 2 3
1: *BC ABC *BC
FED F*D FED
GAI GAI GAI
2: *BC ABC ABC
FED F*D FED
GAI GAI G*I
3: ABC ABC *BC
FED F*D FED
G*I GAI GAI
4: ABC ABC ABC
FED F*D FED
G*I GAI G*I
If you were asked to find “ABCD”, you could do it in only one way:
Way --Move--------
1 2 3 4
1: *BC A*C AB* ABC
FED FED FED FE*
GAI GAI GAI GAI
If you were asked to find “AAB”, you could not:
there are no “A” cells on the grid that have other “A” cells
as neighbors.
The tricksy nature of the problem
As you might expect from Google, this puzzle was designed to see
whether your solution can scale. A simple search will quickly bog
down because each step in the search can expand into vastly more
possibilities, as searching for “AAAA” on a seemingly harmless 2×2
grid of all “A” cells shows – there are 108 solutions.
The problem statement says that the grid may be up to 50×50 in
size and the word to find may be up to 50 letters long. Imagine,
then, that you are asked to find a word composed of 50 “A” letters
within a 50×50 grid of “A” cells. All of the cells will be valid
starting points, and each will have, on average, slightly less than 8
valid neighbors. Thus there will be about
50 × 50 × 8^49 = 4.5e47 ways to find
the word1. Tracing them all would take forever.
The trick is figuring out a more efficient way to solve the problem.
Since that’s the fun part of this problem, I won’t spoil it for you
by telling you how I did it. (If you truly want spoilers, you can study
my code.)
My solution
Here is what I came up with. I’ll present the code first and then
discuss how to use it.
Note: The code below is out of date but printed here for
continuity. See Update 5 for the most-recent revision.
{-
Tom Moertel <tom@moertel.com>
2006-08-15
Haskell-based solution to the Google Code Jam problem "countPaths";
see http://www.cs.uic.edu/~hnagaraj/articles/code-jam/ for more.
-}
module Main (main) where
import Control.Monad
import Data.Array
import qualified Data.Map as M
main = do
word:gridspec <- liftM words getContents
print $ (countPaths word (toGridArray gridspec) :: Integer)
countPaths word@(p:_) gridArray =
sum . M.elems $ foldl step state0 (zip word (tail word))
where
state0 = M.fromList [(cell, 1) | (cell, q) <- assocs gridArray, p == q]
neighbors = toNeighborMap gridArray
step state fromto = M.fromListWith (+) $ do
steps <- M.lookup fromto neighbors
(start, count) <- M.assocs state
cells <- M.lookup start steps
cell <- cells
return (cell, count)
toGridArray gridspec@(l1:_) =
listArray ((1,1), (length gridspec, length l1)) (concat gridspec)
toNeighborMap gridArray =
M.fromListWith (M.unionWith (flip (++))) $ do
(cell, p) <- assocs gridArray
cell' <- neighbors8 cell
guard $ inRange (bounds gridArray) cell'
return ((p, gridArray!cell'), M.singleton cell [cell'])
neighbors8 (r,c) =
[(r+h, c+v) | h <- [-1..1], v <- [-1..1], h /= 0 || v /= 0]
-- Local Variables: ***
-- compile-command: "ghc -O2 -o wordpath --make WordPath.hs" ***
-- End: ***
My solution generalizes upon the problem statement in a few ways:
- the grid can be any size and the word any length
- the grid and word can be composed of any comparable data type, not just A–Z letters (if you use the stdin interface, the code will use Unicode characters)
- the code will compute exact counts instead of returning -1 for counts greater than 1e9
You can enter problems from the command line. Enter the word first
and then the grid, each row separated by whitespace. For example:
$ ./wordpath
AAAAAAAAAAA
AAAAA
AAAAA
AAAAA
AAAAA
AAAAA
^D
2745564336
Give it a try
This was a fun problem to solve. If you have a little spare time,
give it a try. I would love to compare results and talk about
strategies.
Update: Fixed typo: Finding “AAAA” – not “AA” – on
a 2×2 grid of all “A” letters results in a count of 108. Thanks to Joshua Volz for pointing out my mistake.
Update 2: Here’s a dynamic-programming-based implementation of countPaths that is
about six times faster than my original implementation when solving the
maximum-size, all-the-same-letter problem:
countPaths word gridArray =
sum [counts ! (length word, cell) | cell <- cells]
where
counts = listArray ((1, (1, 1)), (length word, gridSize)) $
[countFrom i cell | i <- [1..length word], cell <- cells]
countFrom i cell
| i == 1 && match = 1
| match = sum [counts!((i-1),n) | n <- neighbors!cell]
| otherwise = 0
where
match = rword ! i == gridArray ! cell
neighbors = listArray (bounds gridArray) $
[filter (inRange (bounds gridArray)) (neighbors8 cell)
| cell <- cells ]
rword = listArray (1, length word) (reverse word)
cells = indices gridArray
gridSize = snd (bounds gridArray)
See the thread started by ‘psykotic’ on reddit.com for more.
Update 3: Ivan Peev has solved the problem in Python: Solving the Google Code Jam ‘countPaths’ problem in Python. Because his implementation uses the same algorithm that my implementation in Update 2 does, it makes a good vehicle for Haskell-versus-Python speed comparisons, an interesting topic in light of the warning Google provides about using Python in the Google Code Jam:
NOTE: All submissions have a maximum of 2 seconds of runtime
per test case. This limit is used in harder problems to
force submissions to be of a certain complexity. Because of
the inherent speed differences between Python and the other
offered languages is large, some problems may require extra
optimization or not be solvable using the Python language.
Ivan reports that his Python implementation solves the maximum-size, all-the-same-letter problem in about 8 seconds on an old 1-GHz AMD Athlon. The Haskell version comes in somewhat faster at 0.9 second on a 1.8-GHz AMD Opteron. (On the same Opteron, Ivan’s code clocks in at 2.8 seconds, which is impressive.)
Update 4: I have added a Ruby implementation and a Perl implementation and timings, too. On the the maximum-size, all-the-same-letter problem, Ruby clocks in at 4.2 seconds; Perl in 1.7 seconds. See the Perl implementation for a summary table of the timings.
Update 5: As I promised reader Kartik in a comment, here is a
further-simplified, yet 25-percent-faster, version of my
implementation in Update 2. This version eliminates the cache in
favor of a current-state array that is folded through the successive
letters of the target word. The result of the fold operation is the
final state array, whose elements are summed to yield the final
result. Here’s the complete code:
{-
Tom Moertel <tom@moertel.com>
2006-08-15 (revised 2006-09-01)
Haskell-based solution to the Google Code Jam problem "countPaths"
See http://www.cs.uic.edu/~hnagaraj/articles/code-jam/ for more.
This implementation is based on the dynamic-programming strategy
mentioned by reddit.com user "psykotic":
http://programming.reddit.com/info/dni1/comments/cdp59.
-}
module Main (main) where
import Control.Monad
import Data.Array
main = do
word:gridspec <- liftM words getContents
print $ (countPaths word (toGridArray gridspec) :: Integer)
countPaths word grid =
sum . elems $ foldl move counts0 (tail (reverse word))
where
move counts c = step c $ sum . map (counts!) . neighbors
counts0 = step (last word) (const 1)
step c f = listArray (bounds grid) $ map (match c f) cells
match c f cell = if c == grid!cell then f cell else 0
neighbors cell = filter (inRange (bounds grid)) (neighbors8 cell)
cells = indices grid
toGridArray gridspec@(l1:_) =
listArray ((1,1), (length gridspec, length l1)) (concat gridspec)
neighbors8 (r,c) =
[(h, v) | h <- [r-1..r+1], v <- [c-1..c+1], h /= r || v /= c]
-- Local Variables: ***
-- compile-command: "ghc -O2 -o wordpathdp --make WordPathDP.hs" ***
-- End: ***
1 I believe that the exact count is
303 835 410 591 851 117 616 135 618 108 340 196 903 254 429 200 (approx. 3.04e47). It takes about six seconds 0.75 second to compute on a 1.8-GHz AMD64 box running Linux.
Posted in programming, haskell, fun stuff
Tags code, countpaths, google, haskell, jam, puzzles, wordpaths
7 comments
no trackbacks
