<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/stylesheets/rss.css" type="text/css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Tom Moertel's Weblog: Tag clusterby</title>
    <link>http://blog.moertel.com/articles/tag/clusterby?tag=clusterby</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Quality rants on programming theory and stuff geeks like</description>
    <item>
      <title>ClusterBy: a handy little function for the toolbox</title>
      <description>&lt;p&gt;Via Reddit I found &lt;a href="http://marknelson.us/2007/04/01/puzzling/"&gt;Mark Nelson&amp;#8217;s post about a recent word puzzle&lt;/a&gt; from &lt;span class="caps"&gt;NPR&lt;/span&gt;&amp;#8217;s
Weekend Edition:&lt;/p&gt;


	&lt;blockquote&gt;
		&lt;p&gt;Take the names of two U.S. States, mix them all together, then rearrange the letters to form the names of two other U.S. States. What states are these?&lt;/p&gt;
	&lt;/blockquote&gt;


	&lt;p&gt;The puzzle is fairly straightforward to solve by hand (think about
it), but let&amp;#8217;s write a program to solve it. That will give us a convenient
excuse to discuss a super-handy function I use all the time:
&lt;em&gt;clusterBy&lt;/em&gt;.  In Haskell, it looks like this:&lt;/p&gt;


&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;&lt;span class='keyword'&gt;import&lt;/span&gt; &lt;span class='conid'&gt;Control&lt;/span&gt;&lt;span class='varop'&gt;.&lt;/span&gt;&lt;span class='conid'&gt;Arrow&lt;/span&gt; &lt;span class='layout'&gt;(&lt;/span&gt;&lt;span class='layout'&gt;(&lt;/span&gt;&lt;span class='varop'&gt;&amp;amp;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class='layout'&gt;)&lt;/span&gt;&lt;span class='layout'&gt;)&lt;/span&gt;
&lt;span class='keyword'&gt;import&lt;/span&gt; &lt;span class='varid'&gt;qualified&lt;/span&gt; &lt;span class='conid'&gt;Data&lt;/span&gt;&lt;span class='varop'&gt;.&lt;/span&gt;&lt;span class='conid'&gt;Map&lt;/span&gt; &lt;span class='keyword'&gt;as&lt;/span&gt; &lt;span class='conid'&gt;M&lt;/span&gt;

&lt;span class='varid'&gt;clusterBy&lt;/span&gt; &lt;span class='keyglyph'&gt;::&lt;/span&gt; &lt;span class='conid'&gt;Ord&lt;/span&gt; &lt;span class='varid'&gt;b&lt;/span&gt; &lt;span class='keyglyph'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='layout'&gt;(&lt;/span&gt;&lt;span class='varid'&gt;a&lt;/span&gt; &lt;span class='keyglyph'&gt;-&amp;gt;&lt;/span&gt; &lt;span class='varid'&gt;b&lt;/span&gt;&lt;span class='layout'&gt;)&lt;/span&gt; &lt;span class='keyglyph'&gt;-&amp;gt;&lt;/span&gt; &lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='varid'&gt;a&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt; &lt;span class='keyglyph'&gt;-&amp;gt;&lt;/span&gt; &lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='varid'&gt;a&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;
&lt;span class='varid'&gt;clusterBy&lt;/span&gt; &lt;span class='varid'&gt;f&lt;/span&gt; &lt;span class='keyglyph'&gt;=&lt;/span&gt; &lt;span class='conid'&gt;M&lt;/span&gt;&lt;span class='varop'&gt;.&lt;/span&gt;&lt;span class='varid'&gt;elems&lt;/span&gt; &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='conid'&gt;M&lt;/span&gt;&lt;span class='varop'&gt;.&lt;/span&gt;&lt;span class='varid'&gt;map&lt;/span&gt; &lt;span class='varid'&gt;reverse&lt;/span&gt; &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='conid'&gt;M&lt;/span&gt;&lt;span class='varop'&gt;.&lt;/span&gt;&lt;span class='varid'&gt;fromListWith&lt;/span&gt; &lt;span class='layout'&gt;(&lt;/span&gt;&lt;span class='varop'&gt;++&lt;/span&gt;&lt;span class='layout'&gt;)&lt;/span&gt;
            &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='varid'&gt;map&lt;/span&gt; &lt;span class='layout'&gt;(&lt;/span&gt;&lt;span class='varid'&gt;f&lt;/span&gt; &lt;span class='varop'&gt;&amp;amp;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class='varid'&gt;return&lt;/span&gt;&lt;span class='layout'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

	&lt;p&gt;What &lt;em&gt;clusterBy&lt;/em&gt; does is group a list of values by their signatures,
as computed by a given signature function &lt;em&gt;f&lt;/em&gt;, and returns
the groups in order of ascending signature.  For example, we
can cluster the words &amp;#8220;the tan ant gets some fat&amp;#8221; by length, by
first letter, or by last letter just by changing the
signature function we give to &lt;em&gt;clusterBy&lt;/em&gt;:&lt;/p&gt;


&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;&lt;span class='varop'&gt;*&lt;/span&gt;&lt;span class='conid'&gt;Main&lt;/span&gt;&lt;span class='varop'&gt;&amp;gt;&lt;/span&gt; &lt;span class='keyword'&gt;let&lt;/span&gt; &lt;span class='varid'&gt;antwords&lt;/span&gt; &lt;span class='keyglyph'&gt;=&lt;/span&gt; &lt;span class='varid'&gt;words&lt;/span&gt; &lt;span class='str'&gt;"the tan ant gets some fat"&lt;/span&gt;

&lt;span class='varop'&gt;*&lt;/span&gt;&lt;span class='conid'&gt;Main&lt;/span&gt;&lt;span class='varop'&gt;&amp;gt;&lt;/span&gt; &lt;span class='varid'&gt;clusterBy&lt;/span&gt; &lt;span class='varid'&gt;length&lt;/span&gt; &lt;span class='varid'&gt;antwords&lt;/span&gt;
&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"the"&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='str'&gt;"tan"&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='str'&gt;"ant"&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='str'&gt;"fat"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"gets"&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='str'&gt;"some"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;

&lt;span class='varop'&gt;*&lt;/span&gt;&lt;span class='conid'&gt;Main&lt;/span&gt;&lt;span class='varop'&gt;&amp;gt;&lt;/span&gt; &lt;span class='varid'&gt;clusterBy&lt;/span&gt; &lt;span class='varid'&gt;head&lt;/span&gt; &lt;span class='varid'&gt;antwords&lt;/span&gt;
&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"ant"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"fat"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"gets"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"some"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"the"&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='str'&gt;"tan"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;

&lt;span class='varop'&gt;*&lt;/span&gt;&lt;span class='conid'&gt;Main&lt;/span&gt;&lt;span class='varop'&gt;&amp;gt;&lt;/span&gt; &lt;span class='varid'&gt;clusterBy&lt;/span&gt; &lt;span class='varid'&gt;last&lt;/span&gt; &lt;span class='varid'&gt;antwords&lt;/span&gt;
&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"the"&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='str'&gt;"some"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"tan"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"gets"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"ant"&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='str'&gt;"fat"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

	&lt;p&gt;If we use &lt;em&gt;sort&lt;/em&gt; as the signature function, we can find anagrams:&lt;/p&gt;


&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;&lt;span class='varop'&gt;*&lt;/span&gt;&lt;span class='conid'&gt;Main&lt;/span&gt;&lt;span class='varop'&gt;&amp;gt;&lt;/span&gt; &lt;span class='varid'&gt;clusterBy&lt;/span&gt; &lt;span class='varid'&gt;sort&lt;/span&gt; &lt;span class='varid'&gt;antwords&lt;/span&gt;
&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"fat"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"tan"&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='str'&gt;"ant"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"gets"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"the"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"some"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

	&lt;p&gt;And that brings us back to the original puzzle.  To find the solution,
we must consider each unique pair of state names to form a &amp;#8220;word&amp;#8221; and
find the anagrams among a list of such &amp;#8220;words.&amp;#8221;&lt;/p&gt;


	&lt;p&gt;Assuming we are given
a list of state names on standard input, one state per line, we can
write the shell of our solution as follows:&lt;/p&gt;


&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;&lt;span class='varid'&gt;main&lt;/span&gt; &lt;span class='keyglyph'&gt;=&lt;/span&gt; &lt;span class='varid'&gt;mapM_&lt;/span&gt; &lt;span class='varid'&gt;print&lt;/span&gt; &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='varid'&gt;solve&lt;/span&gt; &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='varid'&gt;lines&lt;/span&gt; &lt;span class='varop'&gt;=&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class='varid'&gt;getContents&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

	&lt;p&gt;The shell delegates the real work to &lt;em&gt;solve&lt;/em&gt;.  It&amp;#8217;s job is to
compute the unique, 2-state combinations from the original
list of states, and then find the anagrams among these combinations.
As before, finding the anagrams is simply a matter of calling
&lt;em&gt;clusterBy&lt;/em&gt; with the right signature function.  We also filter
out the trivial results, which are not valid solutions:&lt;/p&gt;


&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;&lt;span class='varid'&gt;solve&lt;/span&gt; &lt;span class='keyglyph'&gt;=&lt;/span&gt; &lt;span class='varid'&gt;filter&lt;/span&gt; &lt;span class='layout'&gt;(&lt;/span&gt;&lt;span class='layout'&gt;(&lt;/span&gt;&lt;span class='varop'&gt;&amp;gt;&lt;/span&gt;&lt;span class='num'&gt;1&lt;/span&gt;&lt;span class='layout'&gt;)&lt;/span&gt; &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='varid'&gt;length&lt;/span&gt;&lt;span class='layout'&gt;)&lt;/span&gt; &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='varid'&gt;clusterBy&lt;/span&gt; &lt;span class='varid'&gt;signature&lt;/span&gt; &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='varid'&gt;ucombos&lt;/span&gt;
&lt;span class='varid'&gt;ucombos&lt;/span&gt; &lt;span class='varid'&gt;xs&lt;/span&gt; &lt;span class='keyglyph'&gt;=&lt;/span&gt; &lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='varid'&gt;x&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='varid'&gt;y&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt; &lt;span class='keyglyph'&gt;|&lt;/span&gt; &lt;span class='varid'&gt;x&lt;/span&gt; &lt;span class='keyglyph'&gt;&amp;lt;-&lt;/span&gt; &lt;span class='varid'&gt;xs&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt; &lt;span class='varid'&gt;y&lt;/span&gt; &lt;span class='keyglyph'&gt;&amp;lt;-&lt;/span&gt; &lt;span class='varid'&gt;xs&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt; &lt;span class='varid'&gt;x&lt;/span&gt; &lt;span class='varop'&gt;&amp;lt;&lt;/span&gt; &lt;span class='varid'&gt;y&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;
&lt;span class='varid'&gt;signature&lt;/span&gt; &lt;span class='keyglyph'&gt;=&lt;/span&gt; &lt;span class='varid'&gt;sort&lt;/span&gt; &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='varid'&gt;filter&lt;/span&gt; &lt;span class='varid'&gt;isAlpha&lt;/span&gt; &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='varid'&gt;concat&lt;/span&gt;   &lt;span class='comment'&gt;-- sort letters&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

	&lt;p&gt;That&amp;#8217;s it.  Now we can solve the puzzle by feeding our program a list of states:&lt;/p&gt;


&lt;div class="typedin"&gt;
&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;$ runhaskell anagrams2.hs &amp;lt; states.txt&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;[[&amp;quot;NORTH CAROLINA&amp;quot;,&amp;quot;SOUTH DAKOTA&amp;quot;],
 [&amp;quot;NORTH DAKOTA&amp;quot;,&amp;quot;SOUTH CAROLINA&amp;quot;]]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

	&lt;p&gt;What a handy little function, that &lt;em&gt;clusterBy&lt;/em&gt;.&lt;/p&gt;


&lt;div class="update"&gt;

	&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; made clear that &lt;em&gt;clusterBy&lt;/em&gt; returns clusters in order
of ascending signature.&lt;/p&gt;


	&lt;p&gt;&lt;strong&gt;Update 2007-10-31:&lt;/strong&gt;  For more interesting discussion of &lt;em&gt;clusterBy&lt;/em&gt;
and the original puzzle from &lt;span class="caps"&gt;NPR&lt;/span&gt;, see Anders Pearson&amp;#8217;s blog: &lt;a href="http://thraxil.org/users/anders/posts/2007/10/30/A-Simple-Programming-Puzzle-Seen-Through-Three-Different-Lenses/"&gt;A Simple Programming Puzzle Seen Through Three Different Lenses&lt;/a&gt;.&lt;/p&gt;


&lt;/div&gt;</description>
      <pubDate>Sat, 01 Sep 2007 15:39:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:46727e92-f04a-4fab-90d8-7cefc6caee77</guid>
      <author>Tom Moertel</author>
      <link>http://blog.moertel.com/articles/2007/09/01/clusterby-a-handy-little-function-for-the-toolbox</link>
      <category>programming</category>
      <category>haskell</category>
      <category>puzzles</category>
      <category>clusterby</category>
      <category>hof</category>
      <category>functions</category>
      <trackback:ping>http://blog.moertel.com/articles/trackback/562</trackback:ping>
    </item>
  </channel>
</rss>
