The inner beauty of tree traversals

Posted by Tom Moertel Fri, 27 Jan 2012 04:24:00 GMT

This has been done a million times before, but if you haven’t seen it, it’s pretty neat. Let’s say you have a simple recursive data structure like a binary tree:

module Tree where

data Tree a = Empty
            | Node a (Tree a) (Tree a)
            deriving (Eq, Show)

-- some binary trees

t0 = Empty
t1 = Node 1 Empty Empty
t3 = Node 2 (Node 1 Empty Empty) (Node 3 Empty Empty)

Now say you want to traverse the nodes, in “preorder.” So you write a traversal function. It folds a binary operator f through the values in the tree, starting with an initial accumulator value of z. First, it visits the current node, then the left subtree, and finally the right subtree, combining each value it encounters with the current accumulator value to get the next accumulator value. The final accumulator value is the result.

Spelling out the steps, it looks like this:

preorder_traversal f z tree = go tree z
  where
    go Empty        z = z
    go (Node v l r) z = let z'   = f v z     -- current node
                            z''  = go l z'   -- left subtree
                            z''' = go r z''  -- right subtree
                        in z'''

But if you wanted an “inorder” traversal instead, you’d write a slightly different function, visiting the left subtree before the current node:

inorder_traversal f z tree = go tree z
  where
    go Empty        z = z
    go (Node v l r) z = let z'   = go l z    -- left subtree
                            z''  = f v z'    -- current node
                            z''' = go r z''  -- right subtree
                        in z'''

To see how the two traversal functions work, let’s use both to flatten the 3-node tree t3 we defined earlier.

flatten traversal = reverse . traversal (:) []

test0i = flatten inorder_traversal t3   -- [1,2,3]
test0p = flatten preorder_traversal t3  -- [2,1,3]

Great.

But now you start thinking about post-order traversal. Do you want to write a third traversal function that’s almost the same as the first two?

So you go back to the inorder traversal and start staring real hard at that let statement. Can you rewrite it? Yes:

inorder_traversal2 f z tree = go tree z
  where
    go Empty        z = z
    go (Node v l r) z = go r . f v . go l $ z

And now you’ve got it. Just by reordering the applications of (go l), (f v), and (go r), you can control the traversal.

Which leads to a general traversal function that lets some other function control that ordering:

traverse step f z tree = go tree z
  where
    go Empty        z = z
    go (Node v l r) z = step (f v) (go l) (go r) z

Doesn’t that look beautiful?

Now you can just spell out your desired traversals:

preorder   = traverse $ \n l r -> r . l . n
inorder    = traverse $ \n l r -> r . n . l
postorder  = traverse $ \n l r -> n . r . l

A test drive:

test1p = flatten preorder t3   -- [2,1,3]
test1i = flatten inorder t3    -- [1,2,3]
test1o = flatten postorder t3  -- [1,3,2]

And you’re happy.

But can you go further? What if your trees are binary search trees and you just want to find the minimum or maximum value? Can you just traverse to the left or right?

Yes:

leftorder  = traverse $ \n l r -> l . n
rightorder = traverse $ \n l r -> r . n

treemin = leftorder min maxBound
treemax = rightorder max minBound

test2l = treemin t3 :: Int  -- 1
test2r = treemax t3 :: Int  -- 3

Isn’t that neat?

Posted in
Tags , , , ,
5 comments
no trackbacks
Reddit Delicious

Writing a simple plagiarism detector in Haskell

Posted by Tom Moertel Thu, 16 Jun 2011 04:13:00 GMT

Recently, I wrote a simple plagiarism detector in Haskell using n-grams. The problem it solves is this: You have a document, the suspect, which you believe may be derived from other documents, the sources. You want to find regions in the suspect that are identical to those in the sources.

One simple solution is to use n-grams, which are just the n-word neighborhoods taken from some sequence of words. For example, the 4-grams of the sequence a b c d e f are a b c d, b c d e, and c d e f.

The idea for plagiarism detection with n-grams is that, as you increase n, the probability that a given n-gram will be shared between two arbitrary documents rapidly vanishes. For example, Google says, right now, that the 2-gram “rapidly vanishes” appears in about 11,000 documents, but the 5-gram “two different documents rapidly vanishes” occurs in no documents. (Once Google indexes this post, however, that will change.)

But when one document is copied partly from another, they will probably share many long sequences of words. So, the sharing of longer n-grams is evidence of copying.

It’s not proof, however. If two papers about U.S. history both contain the 5-gram “The United States of America,” that coincidence is not surprising. But when lots of unlikely sharing occurs, it’s strong evidence of some form of common ancestry.

Fortunately, when copying occurs, it’s often easy to see, once you know where to look. So my detector just annotates the suspect document to indicate which portions of it share n-grams with the source documents. Shared regions are converted to ALL CAPS. For example, looking for shared 4-grams, my detector finds ample evidence that the following paragraph was copied from the text above:

$ ./Plag 4 suspect.txt blog-post-plagiarism-detector.txt

But it's not rock solid. If two documents ABOUT U.S. HISTORY
BOTH CONTAIN THE 5-GRAM "THE UNITED STATES OF AMERICA," 
that's not shocking. BUT WHEN LOTS OF sharing occurs, that's
pretty good EVIDENCE OF SOME FORM OF COMMON ANCESTRY.

In fact, it was copied: I duplicated an earlier paragraph and then changed a few words.

Using n-grams to detect sharing is not a perfect system, but it’s a simple system, and, because its output is easily interpreted, it’s an effective system. When your eyeballs see a lot of ALL CAPS TEXT, it’s easy to judge what it represents.

Anyway, here’s the code.

{-

Simple plagiarism detector that detects n-gram duplication.

Usage: ./Plag N suspect.txt source.txt... > annotated-suspect.txt
(where N is the n-gram size; try 3, 4, and 5).

Tom Moertel <tmoertel@gmail.com>
2011-06-11

-}

module Main where

import Control.Applicative ((<$>))
import Data.Char (toLower, toUpper, isAlphaNum, isSpace)
import Data.List (tails)
import qualified Data.Set as S
import System (getArgs)

main = do
  n:suspect:sources <- getArgs
  putStrLn =<< findMatches (read n) suspect sources

findMatches n suspect sources = do
  db <- S.fromList . concat  <$> mapM (loadNgrams n) sources
  stxt <- readFile suspect
  return $ concat (match n db (nGrams n stxt) (whiteWords stxt))

match :: Int -> S.Set [String] -> [[String]] -> [String] -> [String]
match n db ngs wws = mapAt (map toUpper) matchlocs wws
  where
    matchlocs = uniq . concat $ zipWith isMatch [0..] ngs
    isMatch loc ng | ng `S.member` db = [loc .. loc + n - 1]
                   | otherwise        = []

mapAt :: (a -> a) -> [Int] -> [a] -> [a]
mapAt f locs ws = go 0 locs ws
  where
    go _   []     ws               = ws
    go _   _      []               = []
    go i (l:ls) (w:ws) | i < l     = w : go (i + 1) (l:ls) ws
    go i (l:ls) (w:ws) | otherwise = f w :  go (i + 1) ls ws

uniq = S.toList . S.fromList

loadNgrams n file = nGrams n <$> readFile file

nGrams :: Int -> String -> [[String]]
nGrams n = takeWhile ((n ==) . length) . map (take n) .
           tails . map normalize . words

normalize = map toLower . filter isAlphaNum

-- returns words like 'words' but preserves leading whitespace for each

whiteWords :: String -> [String]
whiteWords s = case span isSpace s of
  ("", "") -> []
  (s1, "") -> [s1]
  (s1, s') -> (s1 ++ w) : whiteWords s''
    where
      (w, s'') = break isSpace s'

Posted in
Tags , ,
9 comments
no trackbacks
Reddit Delicious

The best-kept secret in programming conferences, especially in a down economy

Posted by Tom Moertel Fri, 22 May 2009 05:59:00 GMT

I know, the economy sucks, and everything is expensive these days. It’s even worse for you, a polyglot programmer with a serious programming-language obsession. You prowl Proggit, lounge at LtU, and occasionally step on over to Stack Overflow. But it’s just not enough. You need more. You need to hang out in meatspace with other fascinating programmers, diving into modern object systems, getting mechanical with crazy VMs, hacking on code like the wild code-hacking beast that you are.

Sure, it’s a nice dream and all, but how are you going to make it happen? And even if you could in theory make it happen, how could you afford to do it now, in this down economy?

Well, my friend, let me share a secret: You can make it happen. And you can afford it. Here’s how: Just be at the 10th Anniversary Yet Another Perl Conference. It’s day upon day upon day of jam-packed programming-language goodness of all sorts, not “just” Perl – and this year it’s the one conference you can afford.

Seriously, I did a little price-checking, and YAPC is about the most underpriced programming-fest on the planet:

Conference Price
JavaOne $1,995
RailsConf 895
PyCon 450
RubyConf 200
YAPC 125


Wait, you’re not into Perl? No problem. The Perl community has always embraced diversity, and there’s a lot more than just Perl at YAPC. Check out the tag cloud for talks and you’ll see what I’m saying. At YAPC, the good stuff comes in enormous buckets, plenty for programming aficionados of all stripes. Here’s a taste:

See, YAPC is for you.

Am I trying to persuade you to join us at YAPC? Yes. But I’m only doing it because I care about you. YAPC is a fascinating conference, packed with hackers from around the world, all eager to share interesting things, things many you would find delightful, if only you knew about them. So I’m letting you know about them, right now, so you don’t miss out.

Do yourself a favor. If you can figure out how to get your brain to Pittsburgh in the 4th week of June 2009 – yes, only 4 weeks away – then by all means register now for YAPC|10. It’s a great conference at a great price, and it’s something no discriminating hacker ought to be denied.

I hope to see you at YAPC|10.

Update: If any Haskellers are reading this and want to meet up at YAPC, let me know. I’m trying to put together a BOF session.

Posted in
Tags , , , , , , , ,
14 comments
no trackbacks
Reddit Delicious

PXSL Tools now on Hackage and GitHub

Posted by Tom Moertel Mon, 25 Aug 2008 02:56:00 GMT

I finally got around to releasing PXSL Tools on Hackage. The package contains pxslcc, a preprocessor that converts Parsimonious XML Shorthand Language into XML, and supporting documentation.

If you want to hack on the Haskell sources, I’ve put the project on GitHub, too. See the pxsl-tools project page to browse the code, or just clone the repo and hack away:

$ git clone git://github.com/tmoertel/pxsl-tools.git

Tags , ,
no comments
no trackbacks
Reddit Delicious

Thinking algebraically: a functional-programming dividend that pays during your imperative-programming day job

Posted by Tom Moertel Thu, 21 Aug 2008 01:50:00 GMT

Although at work I code mostly in Python – a language from which lambda and map were nearly removed – I still find that functional-programming experience has its benefits. One of the “functional-programming dividends” I notice most often is insight gained from considering problems from an algebraic perspective.

Recently, for example, I had a small parsing problem. I had to write code that, given a simple grammar specification as input, emits a regular expression that matches the language generated by the grammar.

Here’s a simplified version of the problem. A grammar specification is limited to a series of one or more atoms. For example, “a b c” represents the atom “a”, followed by the atom “b”, followed by the atom “c”. To generate the grammar, the series of atoms is interpreted such that each atom (except the last) generates a production rule of the following form:

atom_rule ::=
  <the literal atom> (SPACE <the next rule> | NOTHING)

(SPACE represents literal white space and NOTHING represents an empty string.) The grammar as a whole is rooted in the first atom’s rule.

Thus the specification “a b c” represents the following grammar:

grammar ::= a_rule
a_rule  ::= "a" (SPACE b_rule | NOTHING)
b_rule  ::= "b" (SPACE c_rule | NOTHING)
c_rule  ::= "c" 

Note that the final atom’s production matches only the literal atom itself: it has no following rule on which to chain.

The grammar, in turn, generates the following language:

a
a b
a b c

Thus, given the grammar specification “a b c”, my code had to produce a regular expression that would match “a”, “a b”, or “a b c”.

At this point, please stop for a moment and think about this little programming exercise. Do any solutions leap to mind? How would you approach the problem? Form your opinions now, because I’m going to ask you about them later. (If you’re feeling especially caffeinated, try coding a solution before reading on.)

Read more...

Posted in
Tags , , ,
20 comments
no trackbacks
Reddit Delicious

PXSL Tools 1.0: Your ticket out of XML Hell

Posted by Tom Moertel Tue, 18 Dec 2007 03:33:00 GMT

XML is fine for representing document-like things, but when it’s twisted to represent build recipes, configuration files, and little programming languages, it opens the gates to XML Hell. Once the gates are opened, the demons of cargo-cult thinking are loosed upon the world, where they are free to trick innocent programmers into working with grotesquely twisted XML documents – something no human mind was designed to comprehend. Ensnared, these programmers are slowly drawn into the depths of XML Hell, from which their lamentations echo across the universe.

When the demons of cargo-cult thinking come for you, don’t be ensnared! Instead, be prepared – with PXSL – the Parsimonious XML Shorthand Language (pronounced “pixel”).

What’s PXSL? It’s a luxurious, thermonuclear smoking jacket that you can slip on using a convenient preprocessor. Use it whenever you see grotesque XML on the horizon. Within PXSL’s plush (and stylish) protection, you can create all the nasty, twisted XML that may be demanded of you, but you need not descend into XML Hell to do it. Instead, you can work from the comfort of a well-stocked lounge, where clarity and conciseness are always on tap.

For example, here’s a snippet from an XSLT stylesheet, in the original XML:

<xsl:template match="/">
  <xsl:for-each select="//*/@src|//*/@href">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
</xsl:template>

And here’s the same snippet, written in PXSL:

template /
  for-each //*/@src|//*/@href
    value-of .
    text <<&#10;>>

Isn’t that refreshing?

Why PXSL?

There are lots of XML shorthands available. (The PXSL FAQ lists about ten of them.) So why choose PXSL? Here’s why:

Also, PXSL is battle tested. It was first released in 2003 and has been saving people from XML Hell since. People who try it seem to like it:

  • I think PXSL could do wonders for soothing my irrational hatred for all things XML.kowey
  • Impressive… I converted some of my files from XML to PXSL and the readability was much improved.chris
  • Quite aside from the fact that XSLT is finally somewhat readable, the fact that you’ve added a serious macro system means that some serious scripting of XML can occur. I’m very impressed.invisible

The next time you’re headed for XML Hell, why not give the venerable PXSL a try? You might just find that you like it, too.


This public service announcement was brought to you in celebration of the 1.0 release of the pxsl-tools package. The PXSL-to-XML compiler pxslcc is written in Haskell and uses the cross-platform Haskell Cabal build/package system to let you use PXSL just about anywhere.

Posted in
Tags , , ,
13 comments
no trackbacks
Reddit Delicious

How I stopped missing Darcs and started loving Git

Posted by Tom Moertel Mon, 10 Dec 2007 21:52:00 GMT

About three years ago, I switched to Darcs as my primary source-code management system. It was simple, intuitive, and powerful, and it made managing my projects more fun and less frustrating than any centralized VCS ever had. That it was written in Haskell, one of my favorite programming languages, made it even better. I was hooked.

Since then, the distributed SCM landscape has changed. Darcs hasn’t improved much, but its competitors have made long strides, especially Git and Mercurial. Both are crazy fast, vigorously developed, and widely used on large, highly active real-world projects, such as the Linux kernel and Mozilla 2. In comparison, Darcs has stagnated.

When I started working for a new company recently, I had to consider whether to advocate Darcs or something else. In the end, I decided that Darcs would be a hard sell. Nobody else at the company uses Haskell, and having to explain how to avoid the occasional corner case seemed liked a losing proposition.

After researching and playing around with Git and Mercurial, I settled on Git. I like Git’s underlying hashed-blobs model better than Mercurial’s revlogs, and Git seems to have slightly more development momentum. Still, it was a close call. Either choice would have been completely reasonable.

Missing Darcs

When I started using Git on real projects, the one thing I really missed was the ability to easily amend earlier patches, something Darcs made trivial. Let me explain. The typical development workflow goes something like this:

  1. Checkout copy of upstream code base.
  2. Implement feature X.
  3. Commit.
  4. Implement independent feature Y.
  5. Commit.
  6. Implement independent feature Z.
  7. Commit.
  8. Push new features back upstream.

Now, what really happens is that when I’m implementing Y or Z, I’ll realize that I made a mistake in X. The trick is then fixing X so that my fix is part of the changeset/patch for X that ultimately gets pushed upstream in the last step. That way, the upstream folks will see only a single, clean patch for feature X – not a mishmash of patches that together represent X.

In Darcs, amending the original patch is easy because its patch theory lets me tweak the patch for X independently of the other patches. Darcs will simply ask me which patch I want to amend, and I’ll select the orignal patch for X:

$ emacs               # fix X
$ darcs amend-record  # amend original patch for X

Mon Dec 10 14:43:13 EST 2007  Tom Moertel <tom@moertel.com>
  * Implemented Z
Shall I amend this patch? [yNvpq], or ? for help: n

Mon Dec 10 14:42:12 EST 2007  Tom Moertel <tom@moertel.com>
  * Implemented Y
Shall I amend this patch? [yNvpq], or ? for help: n

Mon Dec 10 14:41:46 EST 2007  Tom Moertel <tom@moertel.com>
  * Implemented X
Shall I amend this patch? [yNvpq], or ? for help: y
hunk ./x 1
-X1
+X2
Shall I add this change? (1/?)  [ynWsfqadjkc], or ? for help: y
Finished amending patch:
Mon Dec 10 14:43:25 EST 2007  Tom Moertel <tom@moertel.com>
  * Implemented X

That’s it. The exact same process will work regardless of when I realize I need to fix X: before I start Y, while I’m implementing Y, after I’ve committed Y, while I’m working on Z, or after I’ve committed Z.

Learning to love Git

With Git, however, I can amend a commit only if I haven’t committed anything else before making my fix. In Git’s mind, Y depends on X, and Z depends on Y, even if they really are independent of one another.

So if I commit the original patch for X and then immediately realize I need to make a fix, before I start working on Y or Z, it’s easy:

$ emacs               # implement X
$ git commit -m 'Implemented X'

# discover problem in X

$ emacs               # fix X
$ git commit --amend  # amend original patch

More typically, it’s only while I’m working on Y that I’ll realize I need to fix X. Then it’s more complicated to amend the original commit:

$ emacs               # implement X
$ git commit -m 'Implemented X'
$ emacs               # start working on Y

# discover problem in X

$ git stash           # stash away half-completed work on Y
$ emacs               # fix X
$ git commit --amend  # amend original patch for X
$ git stash apply     # restore work on Y
$ emacs               # continue working on Y

While not as convenient as Darcs’s workflow, it’s perfectly workable.

Now let’s consider another fairly typical case: I commit X and Y and then start working on Z before I notice the problem in X. I used to think that Git couldn’t handle this case, but it can, thanks to git rebase --interactive:
$ emacs               # implement X
$ git commit -m 'Implemented X'
$ emacs               # implement Y
$ git commit -m 'Implemented Y'
$ emacs               # start working on Z

# discover problem in X

$ git stash           # stash away half-completed work on Z
$ emacs               # fix X
$ git commit -m 'Fixed X'
$ git rebase --interactive HEAD~3  # see comments below
$ git stash apply     # restore work on Z
$ emacs               # continue working on Z
The git rebase --interactive command is powerful. What the command does, as called in the snippet above, is invoke my editor of choice on a text file describing the last 3 commits (that’s the HEAD~3 part):
# Rebasing 3ad99a7..b9a8405 onto 3ad99a7
#
# Commands:
#  pick = use commit
#  edit = use commit, but stop for amending
#  squash = use commit, but meld into previous commit
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
pick 0885540 Implemented X
pick 320b115 Implemented Y
pick b9a8405 Fixed X

I can then edit the file to reorder, merge (squash), and/or remove the commits. In this example, I want to merge the fix for X into the original commit that implemented X. So I edit the file like so:

pick 0885540 Implemented X
squash b9a8405 Fixed X
pick 320b115 Implemented Y

Then I save the file, at which point Git takes over and makes the requested changes, merging the fix for X into the original commit for X. Now the log shows the original implementation and fix as one commit:

$ git log
commit f387d650976246c0854d028b040cca40e542be56
Author: Tom Moertel <tom@moertel.com>
Date:   Mon Dec 10 15:11:26 2007 -0500

    Implemented Y

commit 82a1c849ffd1bd688d5bc9d99be0e63548a89c4c
Author: Tom Moertel <tom@moertel.com>
Date:   Mon Dec 10 15:13:03 2007 -0500

    Implemented X

    Fixed X

commit 3ad99a7ef537b7ae99e435e0d2b4b0d03de92c65
Author: Tom Moertel <tom@moertel.com>
Date:   Mon Dec 10 15:11:14 2007 -0500

    Initial checkin

Once I figured out how to use git rebase --interactive, I stopped missing Darcs and started loving Git.

Posted in
Tags , , , ,
42 comments
no trackbacks
Reddit Delicious

TMR 9!

Posted by Tom Moertel Mon, 19 Nov 2007 18:07:00 GMT

Issue 9 of The Monad.Reader is hot off the presses! The issue focuses on three Google-Summer-of-Code projects for Haskell: Cabal configurations, Darcs’s Patch Theory, and the typechecker-framework TaiChi. Good stuff.

I know what I’ll be reading for lunch today.

Posted in
Tags ,
no comments
no trackbacks
Reddit Delicious

Updated cabal2rpm helps you make RPM packages from Haskell Cabal packages

Posted by Tom Moertel Sat, 08 Sep 2007 00:19:00 GMT

I just released an updated version of cabal2rpm, a small program (written in Perl) that creates RPM spec files from Cabal package descriptions. RPM is the software-packaging format used by several popular Linux distributions, including Red Hat and Fedora. Cabal is the packaging format used by the Haskell community to distribute software written in Haskell.

Bryan O’Sullivan’s cabal-rpm also creates spec files from Cabal packages. Unlike cabal2rpm, it is written in Haskell and directly interfaces with the Cabal libraries. Long term, it is the way to go. For now, however, cabal2rpm may be more convenient because it works out of the box. (To use cabal-rpm, you’ll first need to install the just-tagged Cabal 1.2.0 library, not yet in wide distribution.)

Posted in
Tags , ,
no comments
no trackbacks
Reddit Delicious

ClusterBy: a handy little function for the toolbox

Posted by Tom Moertel Sat, 01 Sep 2007 19:39:00 GMT

Via Reddit I found Mark Nelson’s post about a recent word puzzle from NPR’s Weekend Edition:

Take the names of two U.S. States, mix them all together, then rearrange the letters to form the names of two other U.S. States. What states are these?

The puzzle is fairly straightforward to solve by hand (think about it), but let’s write a program to solve it. That will give us a convenient excuse to discuss a super-handy function I use all the time: clusterBy. In Haskell, it looks like this:

import Control.Arrow ((&&&))
import qualified Data.Map as M

clusterBy :: Ord b => (a -> b) -> [a] -> [[a]]
clusterBy f = M.elems . M.map reverse . M.fromListWith (++)
            . map (f &&& return)

What clusterBy does is group a list of values by their signatures, as computed by a given signature function f, and returns the groups in order of ascending signature. For example, we can cluster the words “the tan ant gets some fat” by length, by first letter, or by last letter just by changing the signature function we give to clusterBy:

*Main> let antwords = words "the tan ant gets some fat"

*Main> clusterBy length antwords
[["the","tan","ant","fat"],["gets","some"]]

*Main> clusterBy head antwords
[["ant"],["fat"],["gets"],["some"],["the","tan"]]

*Main> clusterBy last antwords
[["the","some"],["tan"],["gets"],["ant","fat"]]

If we use sort as the signature function, we can find anagrams:

*Main> clusterBy sort antwords
[["fat"],["tan","ant"],["gets"],["the"],["some"]]

And that brings us back to the original puzzle. To find the solution, we must consider each unique pair of state names to form a “word” and find the anagrams among a list of such “words.”

Assuming we are given a list of state names on standard input, one state per line, we can write the shell of our solution as follows:

main = mapM_ print . solve . lines =<< getContents

The shell delegates the real work to solve. It’s job is to compute the unique, 2-state combinations from the original list of states, and then find the anagrams among these combinations. As before, finding the anagrams is simply a matter of calling clusterBy with the right signature function. We also filter out the trivial results, which are not valid solutions:

solve = filter ((>1) . length) . clusterBy signature . ucombos
ucombos xs = [[x,y] | x <- xs, y <- xs, x < y]
signature = sort . filter isAlpha . concat   -- sort letters

That’s it. Now we can solve the puzzle by feeding our program a list of states:

$ runhaskell anagrams2.hs < states.txt
[["NORTH CAROLINA","SOUTH DAKOTA"],
 ["NORTH DAKOTA","SOUTH CAROLINA"]]

What a handy little function, that clusterBy.

Update: made clear that clusterBy returns clusters in order of ascending signature.

Update 2007-10-31: For more interesting discussion of clusterBy and the original puzzle from NPR, see Anders Pearson’s blog: A Simple Programming Puzzle Seen Through Three Different Lenses.

Posted in
Tags , , , ,
12 comments
no trackbacks
Reddit Delicious

Older posts: 1 2 3