Posted by Tom Moertel
Sat, 01 Sep 2007 19:39:00 GMT
Via Reddit I found Mark Nelson’s post about a recent word puzzle from NPR’s
Weekend Edition:
Take the names of two U.S. States, mix them all together, then rearrange the letters to form the names of two other U.S. States. What states are these?
The puzzle is fairly straightforward to solve by hand (think about
it), but let’s write a program to solve it. That will give us a convenient
excuse to discuss a super-handy function I use all the time:
clusterBy. In Haskell, it looks like this:
import Control.Arrow ((&&&))
import qualified Data.Map as M
clusterBy :: Ord b => (a -> b) -> [a] -> [[a]]
clusterBy f = M.elems . M.map reverse . M.fromListWith (++)
. map (f &&& return)
What clusterBy does is group a list of values by their signatures,
as computed by a given signature function f, and returns
the groups in order of ascending signature. For example, we
can cluster the words “the tan ant gets some fat” by length, by
first letter, or by last letter just by changing the
signature function we give to clusterBy:
*Main> let antwords = words "the tan ant gets some fat"
*Main> clusterBy length antwords
[["the","tan","ant","fat"],["gets","some"]]
*Main> clusterBy head antwords
[["ant"],["fat"],["gets"],["some"],["the","tan"]]
*Main> clusterBy last antwords
[["the","some"],["tan"],["gets"],["ant","fat"]]
If we use sort as the signature function, we can find anagrams:
*Main> clusterBy sort antwords
[["fat"],["tan","ant"],["gets"],["the"],["some"]]
And that brings us back to the original puzzle. To find the solution,
we must consider each unique pair of state names to form a “word” and
find the anagrams among a list of such “words.”
Assuming we are given
a list of state names on standard input, one state per line, we can
write the shell of our solution as follows:
main = mapM_ print . solve . lines =<< getContents
The shell delegates the real work to solve. It’s job is to
compute the unique, 2-state combinations from the original
list of states, and then find the anagrams among these combinations.
As before, finding the anagrams is simply a matter of calling
clusterBy with the right signature function. We also filter
out the trivial results, which are not valid solutions:
solve = filter ((>1) . length) . clusterBy signature . ucombos
ucombos xs = [[x,y] | x <- xs, y <- xs, x < y]
signature = sort . filter isAlpha . concat
That’s it. Now we can solve the puzzle by feeding our program a list of states:
$ runhaskell anagrams2.hs < states.txt
[["NORTH CAROLINA","SOUTH DAKOTA"],
["NORTH DAKOTA","SOUTH CAROLINA"]]
What a handy little function, that clusterBy.
Posted in programming
Tags clusterby, functions, haskell, hof, puzzles
11 comments
no trackbacks

Posted by Tom Moertel
Thu, 17 Aug 2006 06:21:00 GMT
As promised, here’s a Perl version of a dynamic-programming-based solver
for the Google Code Jam “countPaths” problem. It is a straight
translation of my improved Ruby implementation.
As you might expect, the Perl version was pretty fast. It proved faster than the
other scripting-language implementations I tried (in this rather unscientific benchmark, not to be taken seriously):
All timings were taken while solving the maximum-size, all-the-same-letter
problem on my 1.8-GHz Opteron box.
Here’s the Perl implementation:
#!/usr/bin/perl
# Tom Moertel <tom@moertel.com>
# 2006-08-16
#
# Perl-based solution to the Google Code Jam problem "countPaths".
# See http://www.cs.uic.edu/~hnagaraj/articles/code-jam/ for more.
use strict;
use warnings;
use List::Util 'sum';
use Math::BigInt;
sub count_paths {
my ($grid, $word) = @_;
my $rword = reverse $word;
my $rowmax = $#$grid;
my $colmax = length($grid->[0]);
my ($slab, $sum);
for my $i (0 .. length($rword) - 1) {
my $char = substr $rword, $i, 1;
($slab, my $previous_slab) = ([], $slab);
for my $r (0 .. $rowmax) {
my ($row, $line) = ($grid->[$r], $slab->[$r] ||= []);
for my $c (0 .. $colmax) {
$line->[$c] = $char ne substr($row,$c,1) ? 0 : $i == 0 ? 1 : do {
$sum = 0;
my $clo = $c > 0 ? $c - 1 : $c;
my $chi = $c < $colmax ? $c + 1 : $c;
for my $nr (($r>0 ? $r-1 : $r) .. ($r<$rowmax ? $r+1 : $r)) {
for my $nc ($clo .. $chi) {
$sum += $previous_slab->[$nr][$nc]
if $nr != $r || $nc != $c;
}
}
$sum;
}
}
}
}
sum map @$_, @$slab;
}
print count_paths([("A"x50)x50], "A"x50), $/;
# 3.03835410591851e+47
Update: I simplified the code a whisper by
removing an unnecessary variable
$counts. Here’s a diff
if you’re curious about what’s changed:
--- countpaths.pl.orig 2006-08-18 00:16:56.000000000 -0400
+++ /countpaths.pl 2006-08-18 00:19:30.000000000 -0400
@@ -19,11 +19,11 @@
my $rword = reverse $word;
my $rowmax = $#$grid;
my $colmax = length($grid->[0]);
- my ($counts, $slab, $sum);
+ my ($slab, $sum);
for my $i (0 .. length($rword) - 1) {
my $char = substr $rword, $i, 1;
- ($slab, my $previous_slab) = ($counts->[$i] ||= [], $slab);
+ ($slab, my $previous_slab) = ([], $slab);
for my $r (0 .. $rowmax) {
my ($row, $line) = ($grid->[$r], $slab->[$r] ||= []);
for my $c (0 .. $colmax) {
Update 2: Augmented the introductory paragraph with a parenthetical
comment that reminds readers that these single-fuzzy-data-point-style
timings should not be taken seriously. Also removed the word
“bested,” which might suggest that there is an optimization
contest in play. Please, no wagering.
Update 3: Stripped another variable ($j), which was
completely unused and leftover from previous implementation. (See
why you shouldn’t code late at night?)
Posted in programming, perl, fun stuff
Tags code, countpaths, google, jam, perl, puzzles, wordpaths
2 comments
no trackbacks

Posted by Tom Moertel
Tue, 15 Aug 2006 21:01:00 GMT
Via the article on this year’s Google Code
Jam on
Slashdot earlier today, I found Hareesh Nagarajan’s write-up of a previous
year’s Code-Jam
problem. Since
Google often comes up with interesting problems, I decided to give
this one a go.
The problem: count the ways to find a word by walking on a grid
You are given a rectangular grid of letters and a word to find.
You must compute the number of ways to find the word within the grid
using the following rules:
- start at any cell within the grid
- from there, move to any of the cell’s eight neighboring cells
- continue moving from that neighbor to its neighbors, and so on,
until you have spelled out the word
- you may visit cells more than once, but you cannot visit
the same cell twice in a row (i.e., you must move for each turn)
For instance, consider the following grid, taken from the examples in
the problem statement:
ABC
FED
GAI
If you were asked to find the word “AEA” on this grid, you could do it in
four ways:
Way --Move---
1 2 3
1: *BC ABC *BC
FED F*D FED
GAI GAI GAI
2: *BC ABC ABC
FED F*D FED
GAI GAI G*I
3: ABC ABC *BC
FED F*D FED
G*I GAI GAI
4: ABC ABC ABC
FED F*D FED
G*I GAI G*I
If you were asked to find “ABCD”, you could do it in only one way:
Way --Move--------
1 2 3 4
1: *BC A*C AB* ABC
FED FED FED FE*
GAI GAI GAI GAI
If you were asked to find “AAB”, you could not:
there are no “A” cells on the grid that have other “A” cells
as neighbors.
The tricksy nature of the problem
As you might expect from Google, this puzzle was designed to see
whether your solution can scale. A simple search will quickly bog
down because each step in the search can expand into vastly more
possibilities, as searching for “AAAA” on a seemingly harmless 2×2
grid of all “A” cells shows – there are 108 solutions.
The problem statement says that the grid may be up to 50×50 in
size and the word to find may be up to 50 letters long. Imagine,
then, that you are asked to find a word composed of 50 “A” letters
within a 50×50 grid of “A” cells. All of the cells will be valid
starting points, and each will have, on average, slightly less than 8
valid neighbors. Thus there will be about
50 × 50 × 8^49 = 4.5e47 ways to find
the word1. Tracing them all would take forever.
The trick is figuring out a more efficient way to solve the problem.
Since that’s the fun part of this problem, I won’t spoil it for you
by telling you how I did it. (If you truly want spoilers, you can study
my code.)
My solution
Here is what I came up with. I’ll present the code first and then
discuss how to use it.
Note: The code below is out of date but printed here for
continuity. See Update 5 for the most-recent revision.
{-
Tom Moertel <tom@moertel.com>
2006-08-15
Haskell-based solution to the Google Code Jam problem "countPaths";
see http://www.cs.uic.edu/~hnagaraj/articles/code-jam/ for more.
-}
module Main (main) where
import Control.Monad
import Data.Array
import qualified Data.Map as M
main = do
word:gridspec <- liftM words getContents
print $ (countPaths word (toGridArray gridspec) :: Integer)
countPaths word@(p:_) gridArray =
sum . M.elems $ foldl step state0 (zip word (tail word))
where
state0 = M.fromList [(cell, 1) | (cell, q) <- assocs gridArray, p == q]
neighbors = toNeighborMap gridArray
step state fromto = M.fromListWith (+) $ do
steps <- M.lookup fromto neighbors
(start, count) <- M.assocs state
cells <- M.lookup start steps
cell <- cells
return (cell, count)
toGridArray gridspec@(l1:_) =
listArray ((1,1), (length gridspec, length l1)) (concat gridspec)
toNeighborMap gridArray =
M.fromListWith (M.unionWith (flip (++))) $ do
(cell, p) <- assocs gridArray
cell' <- neighbors8 cell
guard $ inRange (bounds gridArray) cell'
return ((p, gridArray!cell'), M.singleton cell [cell'])
neighbors8 (r,c) =
[(r+h, c+v) | h <- [-1..1], v <- [-1..1], h /= 0 || v /= 0]
-- Local Variables: ***
-- compile-command: "ghc -O2 -o wordpath --make WordPath.hs" ***
-- End: ***
My solution generalizes upon the problem statement in a few ways:
- the grid can be any size and the word any length
- the grid and word can be composed of any comparable data type, not just A–Z letters (if you use the stdin interface, the code will use Unicode characters)
- the code will compute exact counts instead of returning -1 for counts greater than 1e9
You can enter problems from the command line. Enter the word first
and then the grid, each row separated by whitespace. For example:
$ ./wordpath
AAAAAAAAAAA
AAAAA
AAAAA
AAAAA
AAAAA
AAAAA
^D
2745564336
Give it a try
This was a fun problem to solve. If you have a little spare time,
give it a try. I would love to compare results and talk about
strategies.
Update: Fixed typo: Finding “AAAA” – not “AA” – on
a 2×2 grid of all “A” letters results in a count of 108. Thanks to Joshua Volz for pointing out my mistake.
Update 2: Here’s a dynamic-programming-based implementation of countPaths that is
about six times faster than my original implementation when solving the
maximum-size, all-the-same-letter problem:
countPaths word gridArray =
sum [counts ! (length word, cell) | cell <- cells]
where
counts = listArray ((1, (1, 1)), (length word, gridSize)) $
[countFrom i cell | i <- [1..length word], cell <- cells]
countFrom i cell
| i == 1 && match = 1
| match = sum [counts!((i-1),n) | n <- neighbors!cell]
| otherwise = 0
where
match = rword ! i == gridArray ! cell
neighbors = listArray (bounds gridArray) $
[filter (inRange (bounds gridArray)) (neighbors8 cell)
| cell <- cells ]
rword = listArray (1, length word) (reverse word)
cells = indices gridArray
gridSize = snd (bounds gridArray)
See the thread started by ‘psykotic’ on reddit.com for more.
Update 3: Ivan Peev has solved the problem in Python: Solving the Google Code Jam ‘countPaths’ problem in Python. Because his implementation uses the same algorithm that my implementation in Update 2 does, it makes a good vehicle for Haskell-versus-Python speed comparisons, an interesting topic in light of the warning Google provides about using Python in the Google Code Jam:
NOTE: All submissions have a maximum of 2 seconds of runtime
per test case. This limit is used in harder problems to
force submissions to be of a certain complexity. Because of
the inherent speed differences between Python and the other
offered languages is large, some problems may require extra
optimization or not be solvable using the Python language.
Ivan reports that his Python implementation solves the maximum-size, all-the-same-letter problem in about 8 seconds on an old 1-GHz AMD Athlon. The Haskell version comes in somewhat faster at 0.9 second on a 1.8-GHz AMD Opteron. (On the same Opteron, Ivan’s code clocks in at 2.8 seconds, which is impressive.)
Update 4: I have added a Ruby implementation and a Perl implementation and timings, too. On the the maximum-size, all-the-same-letter problem, Ruby clocks in at 4.2 seconds; Perl in 1.7 seconds. See the Perl implementation for a summary table of the timings.
Update 5: As I promised reader Kartik in a comment, here is a
further-simplified, yet 25-percent-faster, version of my
implementation in Update 2. This version eliminates the cache in
favor of a current-state array that is folded through the successive
letters of the target word. The result of the fold operation is the
final state array, whose elements are summed to yield the final
result. Here’s the complete code:
{-
Tom Moertel <tom@moertel.com>
2006-08-15 (revised 2006-09-01)
Haskell-based solution to the Google Code Jam problem "countPaths"
See http://www.cs.uic.edu/~hnagaraj/articles/code-jam/ for more.
This implementation is based on the dynamic-programming strategy
mentioned by reddit.com user "psykotic":
http://programming.reddit.com/info/dni1/comments/cdp59.
-}
module Main (main) where
import Control.Monad
import Data.Array
main = do
word:gridspec <- liftM words getContents
print $ (countPaths word (toGridArray gridspec) :: Integer)
countPaths word grid =
sum . elems $ foldl move counts0 (tail (reverse word))
where
move counts c = step c $ sum . map (counts!) . neighbors
counts0 = step (last word) (const 1)
step c f = listArray (bounds grid) $ map (match c f) cells
match c f cell = if c == grid!cell then f cell else 0
neighbors cell = filter (inRange (bounds grid)) (neighbors8 cell)
cells = indices grid
toGridArray gridspec@(l1:_) =
listArray ((1,1), (length gridspec, length l1)) (concat gridspec)
neighbors8 (r,c) =
[(h, v) | h <- [r-1..r+1], v <- [c-1..c+1], h /= r || v /= c]
-- Local Variables: ***
-- compile-command: "ghc -O2 -o wordpathdp --make WordPathDP.hs" ***
-- End: ***
1 I believe that the exact count is
303 835 410 591 851 117 616 135 618 108 340 196 903 254 429 200 (approx. 3.04e47). It takes about six seconds 0.75 second to compute on a 1.8-GHz AMD64 box running Linux.
Posted in programming, haskell, fun stuff
Tags code, countpaths, google, haskell, jam, puzzles, wordpaths
7 comments
no trackbacks

Posted by Tom Moertel
Thu, 23 Mar 2006 21:51:00 GMT
I ran across a fun programming puzzle (via
Raganwald):
Given a deck of n unique cards, cut the deck i cards from top
and perform a perfect shuffle. A perfect shuffle begins by putting
down the bottom card from the top portion of the deck followed by
the bottom card from the bottom portion of the deck followed by
the next card from the top portion, etc., alternating cards until
one portion is used up. The remaining cards go on top. The problem
is to find the number of perfect shuffles required to return the
deck to its original order. Your function should be declared as:
static long shuffles(int nCards,int iCut);
Please send the result of shuffles(1002,101) along with your
program and your resume to ‘resume’ at nextag.com.
It’s a fun problem, so give it a try before reading on.
Warning: small spoilers ahead
Read more...
Posted in programming, haskell, fun stuff
Tags haskell, puzzles, shuffles
3 comments
no trackbacks
