Solving the Google Code Jam "countPaths" problem in Haskell
Posted by Tom Moertel Tue, 15 Aug 2006 21:01:00 GMT
Via the article on this year’s Google Code Jam on Slashdot earlier today, I found Hareesh Nagarajan’s write-up of a previous year’s Code-Jam problem. Since Google often comes up with interesting problems, I decided to give this one a go.
The problem: count the ways to find a word by walking on a grid
You are given a rectangular grid of letters and a word to find. You must compute the number of ways to find the word within the grid using the following rules:
- start at any cell within the grid
- from there, move to any of the cell’s eight neighboring cells
- continue moving from that neighbor to its neighbors, and so on, until you have spelled out the word
- you may visit cells more than once, but you cannot visit the same cell twice in a row (i.e., you must move for each turn)
For instance, consider the following grid, taken from the examples in the problem statement:
ABC FED GAI
If you were asked to find the word “AEA” on this grid, you could do it in four ways:
Way --Move---
1 2 3
1: *BC ABC *BC
FED F*D FED
GAI GAI GAI
2: *BC ABC ABC
FED F*D FED
GAI GAI G*I
3: ABC ABC *BC
FED F*D FED
G*I GAI GAI
4: ABC ABC ABC
FED F*D FED
G*I GAI G*I
If you were asked to find “ABCD”, you could do it in only one way:
Way --Move--------
1 2 3 4
1: *BC A*C AB* ABC
FED FED FED FE*
GAI GAI GAI GAI
If you were asked to find “AAB”, you could not: there are no “A” cells on the grid that have other “A” cells as neighbors.
The tricksy nature of the problem
As you might expect from Google, this puzzle was designed to see whether your solution can scale. A simple search will quickly bog down because each step in the search can expand into vastly more possibilities, as searching for “AAAA” on a seemingly harmless 2×2 grid of all “A” cells shows – there are 108 solutions.
The problem statement says that the grid may be up to 50×50 in size and the word to find may be up to 50 letters long. Imagine, then, that you are asked to find a word composed of 50 “A” letters within a 50×50 grid of “A” cells. All of the cells will be valid starting points, and each will have, on average, slightly less than 8 valid neighbors. Thus there will be about 50 × 50 × 8^49 = 4.5e47 ways to find the word1. Tracing them all would take forever.
The trick is figuring out a more efficient way to solve the problem. Since that’s the fun part of this problem, I won’t spoil it for you by telling you how I did it. (If you truly want spoilers, you can study my code.)
My solution
Here is what I came up with. I’ll present the code first and then discuss how to use it.
Note: The code below is out of date but printed here for continuity. See Update 5 for the most-recent revision.
{-
Tom Moertel <tom@moertel.com>
2006-08-15
Haskell-based solution to the Google Code Jam problem "countPaths";
see http://www.cs.uic.edu/~hnagaraj/articles/code-jam/ for more.
-}
module Main (main) where
import Control.Monad
import Data.Array
import qualified Data.Map as M
main = do
word:gridspec <- liftM words getContents
print $ (countPaths word (toGridArray gridspec) :: Integer)
countPaths word@(p:_) gridArray =
sum . M.elems $ foldl step state0 (zip word (tail word))
where
state0 = M.fromList [(cell, 1) | (cell, q) <- assocs gridArray, p == q]
neighbors = toNeighborMap gridArray
step state fromto = M.fromListWith (+) $ do
steps <- M.lookup fromto neighbors
(start, count) <- M.assocs state
cells <- M.lookup start steps
cell <- cells
return (cell, count)
toGridArray gridspec@(l1:_) =
listArray ((1,1), (length gridspec, length l1)) (concat gridspec)
toNeighborMap gridArray =
M.fromListWith (M.unionWith (flip (++))) $ do
(cell, p) <- assocs gridArray
cell' <- neighbors8 cell
guard $ inRange (bounds gridArray) cell'
return ((p, gridArray!cell'), M.singleton cell [cell'])
neighbors8 (r,c) =
[(r+h, c+v) | h <- [-1..1], v <- [-1..1], h /= 0 || v /= 0]
-- Local Variables: ***
-- compile-command: "ghc -O2 -o wordpath --make WordPath.hs" ***
-- End: ***
My solution generalizes upon the problem statement in a few ways:
- the grid can be any size and the word any length
- the grid and word can be composed of any comparable data type, not just A–Z letters (if you use the stdin interface, the code will use Unicode characters)
- the code will compute exact counts instead of returning -1 for counts greater than 1e9
You can enter problems from the command line. Enter the word first and then the grid, each row separated by whitespace. For example:
$ ./wordpath
AAAAAAAAAAA
AAAAA
AAAAA
AAAAA
AAAAA
AAAAA
^D
2745564336
Give it a try
This was a fun problem to solve. If you have a little spare time, give it a try. I would love to compare results and talk about strategies.
Update: Fixed typo: Finding “AAAA” – not “AA” – on a 2×2 grid of all “A” letters results in a count of 108. Thanks to Joshua Volz for pointing out my mistake.
Update 2: Here’s a dynamic-programming-based implementation of countPaths that is about six times faster than my original implementation when solving the maximum-size, all-the-same-letter problem:
countPaths word gridArray =
sum [counts ! (length word, cell) | cell <- cells]
where
counts = listArray ((1, (1, 1)), (length word, gridSize)) $
[countFrom i cell | i <- [1..length word], cell <- cells]
countFrom i cell
| i == 1 && match = 1
| match = sum [counts!((i-1),n) | n <- neighbors!cell]
| otherwise = 0
where
match = rword ! i == gridArray ! cell
neighbors = listArray (bounds gridArray) $
[filter (inRange (bounds gridArray)) (neighbors8 cell)
| cell <- cells ]
rword = listArray (1, length word) (reverse word)
cells = indices gridArray
gridSize = snd (bounds gridArray)
See the thread started by ‘psykotic’ on reddit.com for more.
Update 3: Ivan Peev has solved the problem in Python: Solving the Google Code Jam ‘countPaths’ problem in Python. Because his implementation uses the same algorithm that my implementation in Update 2 does, it makes a good vehicle for Haskell-versus-Python speed comparisons, an interesting topic in light of the warning Google provides about using Python in the Google Code Jam:
NOTE: All submissions have a maximum of 2 seconds of runtime per test case. This limit is used in harder problems to force submissions to be of a certain complexity. Because of the inherent speed differences between Python and the other offered languages is large, some problems may require extra optimization or not be solvable using the Python language.
Ivan reports that his Python implementation solves the maximum-size, all-the-same-letter problem in about 8 seconds on an old 1-GHz AMD Athlon. The Haskell version comes in somewhat faster at 0.9 second on a 1.8-GHz AMD Opteron. (On the same Opteron, Ivan’s code clocks in at 2.8 seconds, which is impressive.)
Update 4: I have added a Ruby implementation and a Perl implementation and timings, too. On the the maximum-size, all-the-same-letter problem, Ruby clocks in at 4.2 seconds; Perl in 1.7 seconds. See the Perl implementation for a summary table of the timings.
Update 5: As I promised reader Kartik in a comment, here is a further-simplified, yet 25-percent-faster, version of my implementation in Update 2. This version eliminates the cache in favor of a current-state array that is folded through the successive letters of the target word. The result of the fold operation is the final state array, whose elements are summed to yield the final result. Here’s the complete code:
{-
Tom Moertel <tom@moertel.com>
2006-08-15 (revised 2006-09-01)
Haskell-based solution to the Google Code Jam problem "countPaths"
See http://www.cs.uic.edu/~hnagaraj/articles/code-jam/ for more.
This implementation is based on the dynamic-programming strategy
mentioned by reddit.com user "psykotic":
http://programming.reddit.com/info/dni1/comments/cdp59.
-}
module Main (main) where
import Control.Monad
import Data.Array
main = do
word:gridspec <- liftM words getContents
print $ (countPaths word (toGridArray gridspec) :: Integer)
countPaths word grid =
sum . elems $ foldl move counts0 (tail (reverse word))
where
move counts c = step c $ sum . map (counts!) . neighbors
counts0 = step (last word) (const 1)
step c f = listArray (bounds grid) $ map (match c f) cells
match c f cell = if c == grid!cell then f cell else 0
neighbors cell = filter (inRange (bounds grid)) (neighbors8 cell)
cells = indices grid
toGridArray gridspec@(l1:_) =
listArray ((1,1), (length gridspec, length l1)) (concat gridspec)
neighbors8 (r,c) =
[(h, v) | h <- [r-1..r+1], v <- [c-1..c+1], h /= r || v /= c]
-- Local Variables: ***
-- compile-command: "ghc -O2 -o wordpathdp --make WordPathDP.hs" ***
-- End: ***
1 I believe that the exact count is
303 835 410 591 851 117 616 135 618 108 340 196 903 254 429 200 (approx. 3.04e47). It takes about six seconds 0.75 second to compute on a 1.8-GHz AMD64 box running Linux.
readers


Hi, I didn’t quite solve the problem itself, but I’ve been trying to make sense of your solution with my incomplete haskell skills. I’ve documented my progress in a tiddlywiki
It’s still incomplete; I’ve focussed so far on toNeighborMap which first caught my attention by being totally opaque ;)
Two lessons I’ve learnt so far have been how easy a pure functional language is to refactor, and how to do blackbox probing in haskell. I refactored toNeighborMap into 2 new functions, and the description for one of them contains a nice IMO account of blackbox probing.
I suspect as I go further I will eliminate toNeighborMap entirely and create a new name for the main do block in countPaths.
Kartik, the function toNeighborMap in my original implementation converts the grid array into a “neighbor map” that serves as a look-up table to speed the main search. The neighbor map relates a given letter-transition pair to the set of all cells on the grid that participate in that transition. For example, if you looked up the transition
('A','B')in the neighbor map, the result would be the set of cells that contain'A'and have'B'cells as neighbors.If you look at my revised implementation in Update 2, however, you’ll see that the underlying dynamic-programming algorithm makes the neighbor map unnecessary.
Further, even the Update-2 implementation can be improved. Access to the cache array counts proceeds in an orderly march, from small to large paths, that requires only two levels of the array at any given time. Thus we could replace counts with an array of lesser dimension that is folded through the path’s letters to result in the final maximum-path array, which is all that is needed to compute the result. If I have a spare moment, I’ll code it up.
here’s my java solution – I think it does the trick…
I tinkered with the problem again this morning and have come up with a mush better solution – it’s about 8 times faster now
This C++ solution took me 2.5 hours, to hit the anser. Found 6 bugs in the code. Shame on me. If I don’t have the answers given in the problem statement, I probably can’t get it right. So, it is not easy to work in Google!
My C++ solution is generalised but for the fact that it follows the original specification of 50×50 grid and 50 character long word. It can count and display correctly any digited numbers being limited only by RAM availibility. However the major drawback is its preformance. Its damn slow.Guess recursion is taking its toll.
the code ;
output ;
Here’s a different algo , a rather optimized one and i clocks less than a sec for the full 50×50 size of 50 words search.. !
#include
#include
#include
#include
void main()
{
// defining variables
register int i,j,r,c=0,wc,l,ROWMAX=50,COLMAX=50,WORDLEN,element,d,p,e;
register char WORD[50],GRID50[50],a,b,ch;
register long double sum,no_of_paths,RM2[2500];
// accepting and displaying the grid and the word to serach for
// setting the ROWMAX COLMAX and WORDLEN
cout<<”\n Use test case for maximum length ? (y/n) : ” ;
cin>>ch;
if(!(ch==’y’ || ch==’Y’ ))
{
}
l=0;
while(l<=1)
{
}
if(c==1)
{cout<<”\nEnter the eord to search for : ”;
gets(WORD);
}
else
{
for(i=0;i
}
WORDLEN=strlen(WORD);
e=0;
// The real calculation
for (wc=WORDLEN-1;wc>=0;wc—)
{
}
no_of_paths=0;
for(i=0;i<=element;i++)
{
no_of_paths+=RM[d][i];
}
cout<<”\nThe no of paths in which the word can be spelled is ”<
getch();
}