<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/stylesheets/rss.css" type="text/css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Tom Moertel's Weblog: ClusterBy: a handy little function for the toolbox</title>
    <link>http://blog.moertel.com/articles/2007/09/01/clusterby-a-handy-little-function-for-the-toolbox</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Quality rants on programming theory and stuff geeks like</description>
    <item>
      <title>ClusterBy: a handy little function for the toolbox</title>
      <description>&lt;p&gt;Via Reddit I found &lt;a href="http://marknelson.us/2007/04/01/puzzling/"&gt;Mark Nelson&amp;#8217;s post about a recent word puzzle&lt;/a&gt; from &lt;span class="caps"&gt;NPR&lt;/span&gt;&amp;#8217;s
Weekend Edition:&lt;/p&gt;


	&lt;blockquote&gt;
		&lt;p&gt;Take the names of two U.S. States, mix them all together, then rearrange the letters to form the names of two other U.S. States. What states are these?&lt;/p&gt;
	&lt;/blockquote&gt;


	&lt;p&gt;The puzzle is fairly straightforward to solve by hand (think about
it), but let&amp;#8217;s write a program to solve it. That will give us a convenient
excuse to discuss a super-handy function I use all the time:
&lt;em&gt;clusterBy&lt;/em&gt;.  In Haskell, it looks like this:&lt;/p&gt;


&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;&lt;span class='keyword'&gt;import&lt;/span&gt; &lt;span class='conid'&gt;Control&lt;/span&gt;&lt;span class='varop'&gt;.&lt;/span&gt;&lt;span class='conid'&gt;Arrow&lt;/span&gt; &lt;span class='layout'&gt;(&lt;/span&gt;&lt;span class='layout'&gt;(&lt;/span&gt;&lt;span class='varop'&gt;&amp;amp;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class='layout'&gt;)&lt;/span&gt;&lt;span class='layout'&gt;)&lt;/span&gt;
&lt;span class='keyword'&gt;import&lt;/span&gt; &lt;span class='varid'&gt;qualified&lt;/span&gt; &lt;span class='conid'&gt;Data&lt;/span&gt;&lt;span class='varop'&gt;.&lt;/span&gt;&lt;span class='conid'&gt;Map&lt;/span&gt; &lt;span class='keyword'&gt;as&lt;/span&gt; &lt;span class='conid'&gt;M&lt;/span&gt;
&lt;span class='keyword'&gt;import&lt;/span&gt; &lt;span class='conid'&gt;Data&lt;/span&gt;&lt;span class='varop'&gt;.&lt;/span&gt;&lt;span class='conid'&gt;Char&lt;/span&gt; &lt;span class='layout'&gt;(&lt;/span&gt;&lt;span class='varid'&gt;isAlpha&lt;/span&gt;&lt;span class='layout'&gt;)&lt;/span&gt;
&lt;span class='keyword'&gt;import&lt;/span&gt; &lt;span class='conid'&gt;Data&lt;/span&gt;&lt;span class='varop'&gt;.&lt;/span&gt;&lt;span class='conid'&gt;List&lt;/span&gt; &lt;span class='layout'&gt;(&lt;/span&gt;&lt;span class='varid'&gt;sort&lt;/span&gt;&lt;span class='layout'&gt;)&lt;/span&gt;

&lt;span class='varid'&gt;clusterBy&lt;/span&gt; &lt;span class='keyglyph'&gt;::&lt;/span&gt; &lt;span class='conid'&gt;Ord&lt;/span&gt; &lt;span class='varid'&gt;b&lt;/span&gt; &lt;span class='keyglyph'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='layout'&gt;(&lt;/span&gt;&lt;span class='varid'&gt;a&lt;/span&gt; &lt;span class='keyglyph'&gt;-&amp;gt;&lt;/span&gt; &lt;span class='varid'&gt;b&lt;/span&gt;&lt;span class='layout'&gt;)&lt;/span&gt; &lt;span class='keyglyph'&gt;-&amp;gt;&lt;/span&gt; &lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='varid'&gt;a&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt; &lt;span class='keyglyph'&gt;-&amp;gt;&lt;/span&gt; &lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='varid'&gt;a&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;
&lt;span class='varid'&gt;clusterBy&lt;/span&gt; &lt;span class='varid'&gt;f&lt;/span&gt; &lt;span class='keyglyph'&gt;=&lt;/span&gt; &lt;span class='conid'&gt;M&lt;/span&gt;&lt;span class='varop'&gt;.&lt;/span&gt;&lt;span class='varid'&gt;elems&lt;/span&gt; &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='conid'&gt;M&lt;/span&gt;&lt;span class='varop'&gt;.&lt;/span&gt;&lt;span class='varid'&gt;map&lt;/span&gt; &lt;span class='varid'&gt;reverse&lt;/span&gt; &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='conid'&gt;M&lt;/span&gt;&lt;span class='varop'&gt;.&lt;/span&gt;&lt;span class='varid'&gt;fromListWith&lt;/span&gt; &lt;span class='layout'&gt;(&lt;/span&gt;&lt;span class='varop'&gt;++&lt;/span&gt;&lt;span class='layout'&gt;)&lt;/span&gt;
            &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='varid'&gt;map&lt;/span&gt; &lt;span class='layout'&gt;(&lt;/span&gt;&lt;span class='varid'&gt;f&lt;/span&gt; &lt;span class='varop'&gt;&amp;amp;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class='varid'&gt;return&lt;/span&gt;&lt;span class='layout'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

	&lt;p&gt;&lt;em&gt;clusterBy&lt;/em&gt; groups a list of values by their signatures,
as computed by a given signature function &lt;em&gt;f&lt;/em&gt;, and returns
the groups in order of ascending signature; within each group, values keep their original order.  For example, we
can cluster the words &amp;#8220;the tan ant gets some fat&amp;#8221; by length, by
first letter, or by last letter just by changing the
signature function we give to &lt;em&gt;clusterBy&lt;/em&gt;:&lt;/p&gt;


&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;&lt;span class='varop'&gt;*&lt;/span&gt;&lt;span class='conid'&gt;Main&lt;/span&gt;&lt;span class='varop'&gt;&amp;gt;&lt;/span&gt; &lt;span class='keyword'&gt;let&lt;/span&gt; &lt;span class='varid'&gt;antwords&lt;/span&gt; &lt;span class='keyglyph'&gt;=&lt;/span&gt; &lt;span class='varid'&gt;words&lt;/span&gt; &lt;span class='str'&gt;"the tan ant gets some fat"&lt;/span&gt;

&lt;span class='varop'&gt;*&lt;/span&gt;&lt;span class='conid'&gt;Main&lt;/span&gt;&lt;span class='varop'&gt;&amp;gt;&lt;/span&gt; &lt;span class='varid'&gt;clusterBy&lt;/span&gt; &lt;span class='varid'&gt;length&lt;/span&gt; &lt;span class='varid'&gt;antwords&lt;/span&gt;
&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"the"&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='str'&gt;"tan"&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='str'&gt;"ant"&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='str'&gt;"fat"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"gets"&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='str'&gt;"some"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;

&lt;span class='varop'&gt;*&lt;/span&gt;&lt;span class='conid'&gt;Main&lt;/span&gt;&lt;span class='varop'&gt;&amp;gt;&lt;/span&gt; &lt;span class='varid'&gt;clusterBy&lt;/span&gt; &lt;span class='varid'&gt;head&lt;/span&gt; &lt;span class='varid'&gt;antwords&lt;/span&gt;
&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"ant"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"fat"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"gets"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"some"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"the"&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='str'&gt;"tan"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;

&lt;span class='varop'&gt;*&lt;/span&gt;&lt;span class='conid'&gt;Main&lt;/span&gt;&lt;span class='varop'&gt;&amp;gt;&lt;/span&gt; &lt;span class='varid'&gt;clusterBy&lt;/span&gt; &lt;span class='varid'&gt;last&lt;/span&gt; &lt;span class='varid'&gt;antwords&lt;/span&gt;
&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"the"&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='str'&gt;"some"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"tan"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"gets"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"ant"&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='str'&gt;"fat"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
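
	&lt;p&gt;To see where these results come from, it can help to unroll the pipeline one stage at a time.  The following is only a sketch and not part of the original program: the names &lt;em&gt;tagged&lt;/em&gt; and &lt;em&gt;merged&lt;/em&gt; are made up for illustration, and &lt;code&gt;(: [])&lt;/code&gt; stands in for &lt;em&gt;return&lt;/em&gt; specialized to lists so that the intermediate values can be printed without type annotations.&lt;/p&gt;


&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;import Control.Arrow ((&amp;amp;&amp;amp;&amp;amp;))
import qualified Data.Map as M

main :: IO ()
main = do
  let ws     = words "the tan ant gets some fat"
      -- pair each word with its signature (here, its length)
      tagged = map (length &amp;amp;&amp;amp;&amp;amp; (: [])) ws
      -- merge pairs by key; each new word lands at the head of its cluster
      merged = M.fromListWith (++) tagged
  print tagged
  -- [(3,["the"]),(3,["tan"]),(3,["ant"]),(4,["gets"]),(4,["some"]),(3,["fat"])]
  print (M.toList merged)
  -- [(3,["fat","ant","tan","the"]),(4,["some","gets"])]
  print (M.elems (M.map reverse merged))
  -- [["the","tan","ant","fat"],["gets","some"]]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

	&lt;p&gt;Because &lt;em&gt;fromListWith&lt;/em&gt; passes the new value as the first argument to the combining function, each cluster is built back to front; the final &lt;em&gt;reverse&lt;/em&gt; restores the original order.&lt;/p&gt;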

	&lt;p&gt;If we use &lt;em&gt;sort&lt;/em&gt; as the signature function, we can find anagrams:&lt;/p&gt;


&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;&lt;span class='varop'&gt;*&lt;/span&gt;&lt;span class='conid'&gt;Main&lt;/span&gt;&lt;span class='varop'&gt;&amp;gt;&lt;/span&gt; &lt;span class='varid'&gt;clusterBy&lt;/span&gt; &lt;span class='varid'&gt;sort&lt;/span&gt; &lt;span class='varid'&gt;antwords&lt;/span&gt;
&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"fat"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"tan"&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='str'&gt;"ant"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"gets"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"the"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='str'&gt;"some"&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

	&lt;p&gt;And that brings us back to the original puzzle.  To find the solution,
we treat each unique pair of state names as a single &amp;#8220;word&amp;#8221; and
look for anagrams among the resulting &amp;#8220;words.&amp;#8221;&lt;/p&gt;


	&lt;p&gt;Assuming we are given
a list of state names on standard input, one state per line, we can
write the shell of our solution as follows:&lt;/p&gt;


&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;&lt;span class='varid'&gt;main&lt;/span&gt; &lt;span class='keyglyph'&gt;=&lt;/span&gt; &lt;span class='varid'&gt;mapM_&lt;/span&gt; &lt;span class='varid'&gt;print&lt;/span&gt; &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='varid'&gt;solve&lt;/span&gt; &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='varid'&gt;lines&lt;/span&gt; &lt;span class='varop'&gt;=&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class='varid'&gt;getContents&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

	&lt;p&gt;The shell delegates the real work to &lt;em&gt;solve&lt;/em&gt;.  Its job is to
compute the unique two-state combinations from the original
list of states, and then find the anagrams among these combinations.
As before, finding the anagrams is simply a matter of calling
&lt;em&gt;clusterBy&lt;/em&gt; with the right signature function.  We also filter
out the trivial clusters of just one combination, which have no anagram partner and so are not valid solutions:&lt;/p&gt;


&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;&lt;span class='varid'&gt;solve&lt;/span&gt; &lt;span class='keyglyph'&gt;=&lt;/span&gt; &lt;span class='varid'&gt;filter&lt;/span&gt; &lt;span class='layout'&gt;(&lt;/span&gt;&lt;span class='layout'&gt;(&lt;/span&gt;&lt;span class='varop'&gt;&amp;gt;&lt;/span&gt;&lt;span class='num'&gt;1&lt;/span&gt;&lt;span class='layout'&gt;)&lt;/span&gt; &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='varid'&gt;length&lt;/span&gt;&lt;span class='layout'&gt;)&lt;/span&gt; &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='varid'&gt;clusterBy&lt;/span&gt; &lt;span class='varid'&gt;signature&lt;/span&gt; &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='varid'&gt;ucombos&lt;/span&gt;
&lt;span class='varid'&gt;ucombos&lt;/span&gt; &lt;span class='varid'&gt;xs&lt;/span&gt; &lt;span class='keyglyph'&gt;=&lt;/span&gt; &lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='keyglyph'&gt;[&lt;/span&gt;&lt;span class='varid'&gt;x&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt;&lt;span class='varid'&gt;y&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt; &lt;span class='keyglyph'&gt;|&lt;/span&gt; &lt;span class='varid'&gt;x&lt;/span&gt; &lt;span class='keyglyph'&gt;&amp;lt;-&lt;/span&gt; &lt;span class='varid'&gt;xs&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt; &lt;span class='varid'&gt;y&lt;/span&gt; &lt;span class='keyglyph'&gt;&amp;lt;-&lt;/span&gt; &lt;span class='varid'&gt;xs&lt;/span&gt;&lt;span class='layout'&gt;,&lt;/span&gt; &lt;span class='varid'&gt;x&lt;/span&gt; &lt;span class='varop'&gt;&amp;lt;&lt;/span&gt; &lt;span class='varid'&gt;y&lt;/span&gt;&lt;span class='keyglyph'&gt;]&lt;/span&gt;
&lt;span class='varid'&gt;signature&lt;/span&gt; &lt;span class='keyglyph'&gt;=&lt;/span&gt; &lt;span class='varid'&gt;sort&lt;/span&gt; &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='varid'&gt;filter&lt;/span&gt; &lt;span class='varid'&gt;isAlpha&lt;/span&gt; &lt;span class='varop'&gt;.&lt;/span&gt; &lt;span class='varid'&gt;concat&lt;/span&gt;   &lt;span class='comment'&gt;-- sort letters&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
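
	&lt;p&gt;Here is a small, self-contained sketch of what the two helpers compute.  The three-state input list is made up for illustration; the definitions of &lt;em&gt;ucombos&lt;/em&gt; and &lt;em&gt;signature&lt;/em&gt; are the ones above, with type signatures added.&lt;/p&gt;


&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;import Data.Char (isAlpha)
import Data.List (sort)

-- every unordered pair of distinct elements, exactly once
ucombos :: Ord a =&amp;gt; [a] -&amp;gt; [[a]]
ucombos xs = [[x,y] | x &amp;lt;- xs, y &amp;lt;- xs, x &amp;lt; y]

-- the sorted letters of all the names in a combination
signature :: [String] -&amp;gt; String
signature = sort . filter isAlpha . concat

main :: IO ()
main = do
  print (ucombos ["IOWA","OHIO","UTAH"])
  -- [["IOWA","OHIO"],["IOWA","UTAH"],["OHIO","UTAH"]]
  print (signature ["NORTH CAROLINA","SOUTH DAKOTA"]
           == signature ["NORTH DAKOTA","SOUTH CAROLINA"])
  -- True
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

	&lt;p&gt;The guard &lt;em&gt;x &amp;lt; y&lt;/em&gt; keeps each pair once, in alphabetical order, and drops a state paired with itself; &lt;em&gt;isAlpha&lt;/em&gt; discards the spaces in two-word names.&lt;/p&gt;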

	&lt;p&gt;That&amp;#8217;s it.  Now we can solve the puzzle by feeding our program a list of states:&lt;/p&gt;


&lt;div class="typedin"&gt;
&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;$ runhaskell anagrams2.hs &amp;lt; states.txt&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;[[&amp;quot;NORTH CAROLINA&amp;quot;,&amp;quot;SOUTH DAKOTA&amp;quot;],
 [&amp;quot;NORTH DAKOTA&amp;quot;,&amp;quot;SOUTH CAROLINA&amp;quot;]]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

	&lt;p&gt;What a handy little function, that &lt;em&gt;clusterBy&lt;/em&gt;.&lt;/p&gt;


&lt;div class="update"&gt;

	&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; made clear that &lt;em&gt;clusterBy&lt;/em&gt; returns clusters in order
of ascending signature.&lt;/p&gt;


	&lt;p&gt;&lt;strong&gt;Update 2007-10-31:&lt;/strong&gt;  For more interesting discussion of &lt;em&gt;clusterBy&lt;/em&gt;
and the original puzzle from &lt;span class="caps"&gt;NPR&lt;/span&gt;, see Anders Pearson&amp;#8217;s blog: &lt;a href="http://thraxil.org/users/anders/posts/2007/10/30/A-Simple-Programming-Puzzle-Seen-Through-Three-Different-Lenses/"&gt;A Simple Programming Puzzle Seen Through Three Different Lenses&lt;/a&gt;.&lt;/p&gt;


&lt;/div&gt;</description>
      <pubDate>Sat, 01 Sep 2007 15:39:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:46727e92-f04a-4fab-90d8-7cefc6caee77</guid>
      <author>Tom Moertel</author>
      <link>http://blog.moertel.com/articles/2007/09/01/clusterby-a-handy-little-function-for-the-toolbox</link>
      <category>programming</category>
      <category>haskell</category>
      <category>puzzles</category>
      <category>clusterby</category>
      <category>hof</category>
      <category>functions</category>
      <trackback:ping>http://blog.moertel.com/articles/trackback/562</trackback:ping>
    </item>
    <item>
      <title>"ClusterBy: a handy little function for the toolbox" by Tom Moertel</title>
      <description>&lt;p&gt;Nick, the problem with using &lt;em&gt;flip&lt;/em&gt; &lt;code&gt;(++)&lt;/code&gt; as the combining function for &lt;em&gt;Map.fromListWith&lt;/em&gt; is that it results in quadratic run-time w.r.t. cluster size.  By making sure the new element (which is passed as the first argument to the combining function) is always added to the head of the list, we are able to implement cluster extension with a single cons operation&amp;#8212;O(1).&lt;/p&gt;


	&lt;p&gt;If you wanted to keep the low cost and eliminate the &lt;em&gt;reverse&lt;/em&gt; step, you could use &lt;em&gt;Data.Sequence&lt;/em&gt; values instead of lists to represent clusters.  For clarity, however, I just used lists.&lt;/p&gt;
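

	&lt;p&gt;A minimal sketch of that variant, assuming the &lt;em&gt;containers&lt;/em&gt; package is available.  The name &lt;em&gt;clusterBySeq&lt;/em&gt; is made up here, and this is not the version used in the article:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;import Data.Foldable (toList)
import qualified Data.Map as M
import qualified Data.Sequence as S

clusterBySeq :: Ord b =&amp;gt; (a -&amp;gt; b) -&amp;gt; [a] -&amp;gt; [[a]]
clusterBySeq f = map toList . M.elems . M.fromListWith (flip (S.&amp;gt;&amp;lt;))
               . map (\x -&amp;gt; (f x, S.singleton x))

main :: IO ()
main = print (clusterBySeq length (words "the tan ant gets some fat"))
-- [["the","tan","ant","fat"],["gets","some"]]
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Here &lt;em&gt;flip&lt;/em&gt; puts the existing cluster on the left, so each new one-element sequence is appended at the end and no &lt;em&gt;reverse&lt;/em&gt; is needed.&lt;/p&gt;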


	&lt;p&gt;Cheers,&lt;br /&gt;
Tom&lt;/p&gt;</description>
      <pubDate>Thu, 01 Nov 2007 15:45:38 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:27a9c0f0-36d3-407a-95c7-44d968e72a6e</guid>
      <link>http://blog.moertel.com/articles/2007/09/01/clusterby-a-handy-little-function-for-the-toolbox#comment-614</link>
    </item>
    <item>
      <title>"ClusterBy: a handy little function for the toolbox" by Nick</title>
      <description>&lt;p&gt;If you &lt;code&gt;fromListWith (flip (++))&lt;/code&gt;, you shouldn&amp;#8217;t have to &lt;code&gt;map reverse&lt;/code&gt;, iiuc.&lt;/p&gt;</description>
      <pubDate>Thu, 01 Nov 2007 01:53:13 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:b7043e4d-b0a0-400f-81de-ba403f5d1960</guid>
      <link>http://blog.moertel.com/articles/2007/09/01/clusterby-a-handy-little-function-for-the-toolbox#comment-612</link>
    </item>
    <item>
      <title>"ClusterBy: a handy little function for the toolbox" by Slobodan</title>
      <description>&lt;p&gt;ClusterBy is very nice. Here is my Common Lisp solution:
&lt;a href="http://paste.lisp.org/display/50081"&gt;http://paste.lisp.org/display/50081&lt;/a&gt;&lt;/p&gt;


	&lt;p&gt;cheers
Slobodan Blazeski&lt;/p&gt;</description>
      <pubDate>Wed, 31 Oct 2007 21:23:07 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:fd9da7a0-d093-4448-ae0d-06123f6e29cb</guid>
      <link>http://blog.moertel.com/articles/2007/09/01/clusterby-a-handy-little-function-for-the-toolbox#comment-611</link>
    </item>
    <item>
      <title>"ClusterBy: a handy little function for the toolbox" by Porges</title>
      <description>&lt;p&gt;Whoops, that was a later version. The earlier one had just (++) instead of that lambda.&lt;/p&gt;</description>
      <pubDate>Wed, 31 Oct 2007 02:11:35 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:3f89d3be-5e69-479a-bf5f-fbf7b17c6ec8</guid>
      <link>http://blog.moertel.com/articles/2007/09/01/clusterby-a-handy-little-function-for-the-toolbox#comment-610</link>
    </item>
    <item>
      <title>"ClusterBy: a handy little function for the toolbox" by Porges</title>
      <description>&lt;p&gt;Strangely enough, I wrote something similar the day before yesterday&amp;#8230; I was making a map from Morse values (lists of dit|dah) to strings.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;createDict :: [String] -&amp;gt; Map.Map [Morse] String
createDict xs = Map.fromListWith (\ a b -&amp;gt; a ++ "/" ++ b)
                                 (map (wordToMorse &amp;#38;&amp;#38;&amp;#38; id) xs)
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Seems like clusterBy could be quite a common method!&lt;/p&gt;</description>
      <pubDate>Wed, 31 Oct 2007 02:10:52 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:0767cc0d-0d57-4d40-a273-63a2d9863613</guid>
      <link>http://blog.moertel.com/articles/2007/09/01/clusterby-a-handy-little-function-for-the-toolbox#comment-609</link>
    </item>
    <item>
      <title>"ClusterBy: a handy little function for the toolbox" by George</title>
      <description>&lt;p&gt;Oops. What I wrote isn&amp;#8217;t true &amp;#8211; groupBy won&amp;#8217;t put a pair of elements that satisfies the predicate into the same group if they are separated by an element that belongs in a different group.&lt;/p&gt;
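

	&lt;p&gt;A small sketch of the difference, using the example words from the article.  The name &lt;em&gt;clusterBySort&lt;/em&gt; is made up for illustration; it assumes that sorting by signature first (with the stable &lt;em&gt;sortBy&lt;/em&gt;) is an acceptable way to bring equal signatures together before grouping:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;import Data.Function (on)
import Data.List (groupBy, sortBy)
import Data.Ord (comparing)

-- sort by signature first so that equal signatures become adjacent
clusterBySort :: Ord b =&amp;gt; (a -&amp;gt; b) -&amp;gt; [a] -&amp;gt; [[a]]
clusterBySort f = groupBy ((==) `on` f) . sortBy (comparing f)

main :: IO ()
main = do
  let ws = words "the tan ant gets some fat"
  -- groupBy alone only merges neighbours, so "fat" is left on its own
  print (groupBy ((==) `on` length) ws)
  -- [["the","tan","ant"],["gets","some"],["fat"]]
  print (clusterBySort length ws)
  -- [["the","tan","ant","fat"],["gets","some"]]
&lt;/code&gt;&lt;/pre&gt;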


	&lt;p&gt;In other words, order matters to groupBy but not to clusterBy.&lt;/p&gt;</description>
      <pubDate>Mon, 03 Sep 2007 11:30:47 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:5c05f62a-e256-4722-bc17-b056a150665e</guid>
      <link>http://blog.moertel.com/articles/2007/09/01/clusterby-a-handy-little-function-for-the-toolbox#comment-569</link>
    </item>
    <item>
      <title>"ClusterBy: a handy little function for the toolbox" by George</title>
      <description>&lt;p&gt;The same semantics are in turn implementable in terms of two other functions that I continually find helpful:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;equalBy f x y = (f x) == (f y)
clusterBy f = groupBy (equalBy f)
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;I suspect this is less efficient, though.&lt;/p&gt;</description>
      <pubDate>Mon, 03 Sep 2007 09:06:54 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:86dbc288-e2c6-47e4-9ce4-e0e27e56f675</guid>
      <link>http://blog.moertel.com/articles/2007/09/01/clusterby-a-handy-little-function-for-the-toolbox#comment-568</link>
    </item>
    <item>
      <title>"ClusterBy: a handy little function for the toolbox" by augustss</title>
      <description>You can also do
&lt;pre&gt;&lt;code&gt;ucombos xs = [[x,y] | x:ys &amp;lt;- tails xs, y &amp;lt;- ys]
&lt;/code&gt;&lt;/pre&gt;</description>
      <pubDate>Mon, 03 Sep 2007 08:37:23 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:c21841e0-dc2f-4022-b164-5350e5cb6f44</guid>
      <link>http://blog.moertel.com/articles/2007/09/01/clusterby-a-handy-little-function-for-the-toolbox#comment-567</link>
    </item>
    <item>
      <title>"ClusterBy: a handy little function for the toolbox" by Tom Moertel</title>
      <description>&lt;p&gt;Paul, thanks for the Ruby implementation. I was able to shorten it by having new Hash slots default to empty arrays:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;module Enumerable
  def cluster_by(&amp;amp;sig_fn)
    h = Hash.new { |h,k| h[k] = [] }
    self.each { |x| h[sig_fn[x]] &amp;lt;&amp;lt; x }
    h.values_at(*h.keys.sort!)
  end
end&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Regarding your question:&lt;/p&gt;


	&lt;blockquote&gt;
		&lt;p&gt;[H]ow does [&lt;em&gt;clusterBy&lt;/em&gt;] define organizing “ascending” with arbitrary types of the signature?&lt;/p&gt;
	&lt;/blockquote&gt;


	&lt;p&gt;It uses the natural ordering of the signature values.&lt;/p&gt;


	&lt;p&gt;If you look at the type of &lt;em&gt;clusterBy&lt;/em&gt;, you&amp;#8217;ll see that the result of the signature function &lt;em&gt;f&lt;/em&gt; has the type &lt;em&gt;b&lt;/em&gt; and that &lt;em&gt;b&lt;/em&gt; is qualified to be a member of the &lt;a href="http://www.haskell.org/onlinereport/basic.html#sect6.3.2"&gt;&lt;em&gt;Ord&lt;/em&gt; type class&lt;/a&gt;.  That means that any signature function you provide to &lt;em&gt;clusterBy&lt;/em&gt; must produce results that can be ordered (and hence sorted).  The sorting itself happens implicitly:  &lt;a href="http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Map.html"&gt;&lt;em&gt;Data.Map&lt;/em&gt;&lt;/a&gt; stores keys in an ordered tree and its &lt;a href="http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Map.html#v%3Aelems"&gt;&lt;em&gt;elems&lt;/em&gt;&lt;/a&gt; function is defined to &amp;#8220;return all elements of the map in the ascending order of their keys.&amp;#8221;&lt;/p&gt;
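

	&lt;p&gt;As a small illustration of that point, wrapping the signature in the &lt;em&gt;Down&lt;/em&gt; newtype from &lt;em&gt;Data.Ord&lt;/em&gt; reverses the ordering of the keys, so the clusters come back in descending order.  This is only a sketch: &lt;em&gt;clusterBy&lt;/em&gt; is repeated from the article so the example runs on its own, and the use of &lt;em&gt;Down&lt;/em&gt; is an addition, not something the original program does.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;import Control.Arrow ((&amp;amp;&amp;amp;&amp;amp;))
import Data.Ord (Down (..))
import qualified Data.Map as M

clusterBy :: Ord b =&amp;gt; (a -&amp;gt; b) -&amp;gt; [a] -&amp;gt; [[a]]
clusterBy f = M.elems . M.map reverse . M.fromListWith (++)
            . map (f &amp;amp;&amp;amp;&amp;amp; return)

main :: IO ()
main = print (clusterBy (Down . length) (words "the tan ant gets some fat"))
-- [["gets","some"],["the","tan","ant","fat"]]
&lt;/code&gt;&lt;/pre&gt;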


	&lt;p&gt;Thanks for your comment!&lt;/p&gt;


	&lt;p&gt;Cheers,&lt;br /&gt;Tom&lt;/p&gt;</description>
      <pubDate>Sun, 02 Sep 2007 18:56:36 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:44ae37d7-7267-4738-b584-ada62142c0c3</guid>
      <link>http://blog.moertel.com/articles/2007/09/01/clusterby-a-handy-little-function-for-the-toolbox#comment-566</link>
    </item>
    <item>
      <title>"ClusterBy: a handy little function for the toolbox" by Paul Betts</title>
      <description>&lt;p&gt;I tried to implement this in Ruby &amp;#8211; of course it&amp;#8217;s not as good as the Haskell version, and there&amp;#8217;s probably a much more elegant way to do it, but here it is:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;module Enumerable
  def cluster_by(&amp;amp;b)
    buf = {}; self.each { |x|
      a = yield x;
      buf[a] ? buf[a] &amp;lt;&amp;lt; x : buf[a] = [x]
    }
    buf.keys.sort!.collect {|x| buf[x]}
  end
end&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;[&lt;em&gt;I tweaked the code to fit into the narrow column width. &amp;#8212;Tom&lt;/em&gt;]&lt;/p&gt;


	&lt;p&gt;My question is in regard to the &amp;#8220;signature&amp;#8221;, how does collectTo define organizing &amp;#8220;ascending&amp;#8221; with arbitrary types of the signature? I suppose it just calls &amp;#8220;sort&amp;#8221;, which itself can support arbitrary types. Thoughts?&lt;/p&gt;</description>
      <pubDate>Sun, 02 Sep 2007 17:49:50 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:c1f55adc-68ff-4c1f-9e5d-3812d727effb</guid>
      <link>http://blog.moertel.com/articles/2007/09/01/clusterby-a-handy-little-function-for-the-toolbox#comment-565</link>
    </item>
    <item>
      <title>"ClusterBy: a handy little function for the toolbox" by pied</title>
      <description>&lt;p&gt;That&amp;#8217;s a neat function!&lt;/p&gt;


	&lt;p&gt;P!&lt;/p&gt;</description>
      <pubDate>Sat, 01 Sep 2007 22:49:33 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:fcb98ae7-109a-48bf-afa0-4aae3febef2f</guid>
      <link>http://blog.moertel.com/articles/2007/09/01/clusterby-a-handy-little-function-for-the-toolbox#comment-563</link>
    </item>
  </channel>
</rss>
