Posted by Tom Moertel
Tue, 18 Dec 2007 03:33:00 GMT
XML is fine for representing document-like things, but when it’s
twisted to represent build recipes, configuration files, and little
programming languages, it opens the gates to XML Hell. Once the
gates are opened, the demons of cargo-cult thinking are loosed upon
the world, where they are free to trick innocent programmers into
working with grotesquely twisted XML documents – something no human
mind was designed to comprehend. Ensnared, these programmers are
slowly drawn into the depths of XML Hell, from which their
lamentations echo across the
universe.
When the demons of cargo-cult thinking come for you, don’t be
ensnared! Instead, be prepared – with PXSL – the Parsimonious XML
Shorthand Language
(pronounced “pixel”).
What’s PXSL? It’s a luxurious, thermonuclear smoking jacket that you
can slip on using a convenient preprocessor. Use it whenever you see
grotesque XML on the horizon. Within PXSL’s plush (and stylish)
protection, you can create all the nasty, twisted XML that may be
demanded of you, but you need not descend into XML Hell to do it.
Instead, you can work from the comfort of a well-stocked lounge, where
clarity and conciseness are always on tap.
For example, here’s a snippet from an XSLT stylesheet, in the
original XML:
<xsl:template match="/">
<xsl:for-each select="//*/@src|//*/@href">
<xsl:value-of select="."/>
<xsl:text> </xsl:text>
</xsl:for-each>
</xsl:template>
And here’s the same snippet, written in PXSL:
template /
for-each //*/@src|//*/@href
value-of .
text << >>
Isn’t that refreshing?
Why PXSL?
There are lots of XML shorthands available. (The PXSL FAQ lists about ten of them.) So why choose
PXSL? Here’s why:
Also, PXSL is battle tested. It was first released in 2003 and has
been saving people from XML Hell since. People who try it seem to like it:
- I think PXSL could do wonders for soothing my irrational hatred for all things XML. —kowey
- Impressive… I converted some of my files from XML to PXSL and the readability was much improved. —chris
- Quite aside from the fact that XSLT is finally somewhat readable, the fact that you’ve added a serious macro system means that some serious scripting of XML can occur. I’m very impressed. —invisible
The next time you’re headed for XML Hell, why not give the venerable PXSL a try? You might just find that you like it, too.
This public service announcement was brought to you in celebration of
the 1.0 release of the pxsl-tools package. The PXSL-to-XML compiler
pxslcc is written in Haskell and uses the
cross-platform Haskell Cabal
build/package system to let you use PXSL just about anywhere.
Posted in programming
Tags haskell, pxsl, xml, xslt
6 comments
no trackbacks

Posted by Tom Moertel
Mon, 10 Dec 2007 21:52:00 GMT
About three years ago, I switched to
Darcs
as my primary source-code management system. It was simple,
intuitive, and powerful, and it made managing my projects more fun and
less frustrating than any centralized VCS ever had. That it was
written in Haskell, one of my favorite programming languages, made
it even better. I was hooked.
Since then, the distributed SCM landscape has changed. Darcs hasn’t
improved much, but its competitors have made long strides, especially
Git and
Mercurial. Both
are crazy fast, vigorously developed, and widely used on large, highly
active real-world projects, such as the Linux kernel and Mozilla 2.
In comparison, Darcs has
stagnated.
When I started working for a new company recently, I had to consider
whether to advocate Darcs or something else. In the end, I decided
that Darcs would be a hard sell. Nobody else at the company uses
Haskell, and having to explain how to avoid the occasional corner
case
seemed liked a losing proposition.
After researching and playing around with Git and Mercurial, I settled
on Git. I like Git’s underlying hashed-blobs model better than
Mercurial’s revlogs, and Git seems to have slightly more development
momentum. Still, it was a close call. Either choice would have been
completely reasonable.
Missing Darcs
When I started using Git on real projects, the one thing I really
missed was the ability to easily amend earlier patches, something
Darcs made trivial. Let me
explain. The typical development workflow goes something like this:
- Checkout copy of upstream code base.
- Implement feature X.
- Commit.
- Implement independent feature Y.
- Commit.
- Implement independent feature Z.
- Commit.
- Push new features back upstream.
Now, what really happens is that when I’m implementing Y or Z,
I’ll realize that I made a mistake in X. The trick is then
fixing X so that my fix is part of the changeset/patch for X that
ultimately gets pushed upstream in the last step. That way, the
upstream folks will see only a single, clean patch for feature X – not
a mishmash of patches that together represent X.
In Darcs, amending the original patch is easy because its patch theory
lets me tweak the patch for X independently of the other patches.
Darcs will simply ask me which patch I want to amend, and I’ll select
the orignal patch for X:
$ emacs # fix X
$ darcs amend-record # amend original patch for X
Mon Dec 10 14:43:13 EST 2007 Tom Moertel <tom@moertel.com>
* Implemented Z
Shall I amend this patch? [yNvpq], or ? for help: n
Mon Dec 10 14:42:12 EST 2007 Tom Moertel <tom@moertel.com>
* Implemented Y
Shall I amend this patch? [yNvpq], or ? for help: n
Mon Dec 10 14:41:46 EST 2007 Tom Moertel <tom@moertel.com>
* Implemented X
Shall I amend this patch? [yNvpq], or ? for help: y
hunk ./x 1
-X1
+X2
Shall I add this change? (1/?) [ynWsfqadjkc], or ? for help: y
Finished amending patch:
Mon Dec 10 14:43:25 EST 2007 Tom Moertel <tom@moertel.com>
* Implemented X
That’s it. The exact same process will work regardless of when I
realize I need to fix X: before I start Y, while I’m implementing Y,
after I’ve committed Y, while I’m working on Z, or after I’ve committed
Z.
Learning to love Git
With Git, however, I can amend a commit only if I haven’t committed anything else before making my fix. In Git’s mind, Y depends on X, and Z
depends on Y, even if they really are independent of one another.
So if I commit the original patch for X and then immediately realize I
need to make a fix, before I start working on Y or Z, it’s easy:
$ emacs # implement X
$ git commit -m 'Implemented X'
# discover problem in X
$ emacs # fix X
$ git commit --amend # amend original patch
More typically, it’s only while I’m working on Y that I’ll
realize I need to fix X. Then it’s more complicated
to amend the original commit:
$ emacs # implement X
$ git commit -m 'Implemented X'
$ emacs # start working on Y
# discover problem in X
$ git stash # stash away half-completed work on Y
$ emacs # fix X
$ git commit --amend # amend original patch for X
$ git stash apply # restore work on Y
$ emacs # continue working on Y
While not as convenient as Darcs’s workflow, it’s perfectly workable.
Now let’s consider another fairly typical case: I commit X and Y and
then start working on Z before I notice the problem in X. I used to
think that Git couldn’t handle this case, but it can, thanks to
git rebase --interactive:
$ emacs # implement X
$ git commit -m 'Implemented X'
$ emacs # implement Y
$ git commit -m 'Implemented Y'
$ emacs # start working on Z
# discover problem in X
$ git stash # stash away half-completed work on Z
$ emacs # fix X
$ git commit -m 'Fixed X'
$ git rebase --interactive HEAD~3 # see comments below
$ git stash apply # restore work on Z
$ emacs # continue working on Z
The
git rebase --interactive command is
powerful. What the
command does, as called in the snippet above, is invoke my editor of
choice on a text file describing the last 3 commits (that’s the
HEAD~3 part):
# Rebasing 3ad99a7..b9a8405 onto 3ad99a7
#
# Commands:
# pick = use commit
# edit = use commit, but stop for amending
# squash = use commit, but meld into previous commit
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
pick 0885540 Implemented X
pick 320b115 Implemented Y
pick b9a8405 Fixed X
I can then edit the file to reorder, merge (squash), and/or remove
the commits. In this example, I want to merge the fix for X into
the original commit that implemented X. So I edit the file like so:
pick 0885540 Implemented X
squash b9a8405 Fixed X
pick 320b115 Implemented Y
Then I save the file, at which point Git takes over and makes the
requested changes, merging the fix for X into the
original commit for X. Now the log shows the original implementation
and fix as one commit:
$ git log
commit f387d650976246c0854d028b040cca40e542be56
Author: Tom Moertel <tom@moertel.com>
Date: Mon Dec 10 15:11:26 2007 -0500
Implemented Y
commit 82a1c849ffd1bd688d5bc9d99be0e63548a89c4c
Author: Tom Moertel <tom@moertel.com>
Date: Mon Dec 10 15:13:03 2007 -0500
Implemented X
Fixed X
commit 3ad99a7ef537b7ae99e435e0d2b4b0d03de92c65
Author: Tom Moertel <tom@moertel.com>
Date: Mon Dec 10 15:11:14 2007 -0500
Initial checkin
Once I figured out how to use git rebase --interactive, I stopped
missing Darcs and started loving Git.
Posted in programming
Tags darcs, dvcs, git, haskell, scm
19 comments
no trackbacks

Posted by Tom Moertel
Mon, 19 Nov 2007 18:07:00 GMT
Issue 9 of The Monad.Reader is hot off the presses! The issue focuses on three Google-Summer-of-Code projects for Haskell: Cabal configurations, Darcs’s Patch Theory, and the typechecker-framework TaiChi. Good stuff.
I know what I’ll be reading for lunch today.
Posted in haskell
Tags haskell, tmr
no comments
no trackbacks

Posted by Tom Moertel
Sat, 08 Sep 2007 00:19:00 GMT
I just released an updated version of cabal2rpm, a small program (written in Perl) that creates RPM spec files from Cabal package descriptions. RPM is the software-packaging format used by several popular Linux distributions, including Red Hat and Fedora. Cabal is the packaging format used by the Haskell community to distribute software written in Haskell.
Bryan O’Sullivan’s cabal-rpm also creates spec files from Cabal packages. Unlike cabal2rpm, it is written in Haskell and directly interfaces with the Cabal libraries. Long term, it is the way to go. For now, however, cabal2rpm may be more convenient because it works out of the box. (To use cabal-rpm, you’ll first need to install the just-tagged Cabal 1.2.0 library, not yet in wide distribution.)
Posted in haskell
Tags cabal, cabal2rpm, haskell
no comments
no trackbacks

Posted by Tom Moertel
Sat, 01 Sep 2007 19:39:00 GMT
Via Reddit I found Mark Nelson’s post about a recent word puzzle from NPR’s
Weekend Edition:
Take the names of two U.S. States, mix them all together, then rearrange the letters to form the names of two other U.S. States. What states are these?
The puzzle is fairly straightforward to solve by hand (think about
it), but let’s write a program to solve it. That will give us a convenient
excuse to discuss a super-handy function I use all the time:
clusterBy. In Haskell, it looks like this:
import Control.Arrow ((&&&))
import qualified Data.Map as M
clusterBy :: Ord b => (a -> b) -> [a] -> [[a]]
clusterBy f = M.elems . M.map reverse . M.fromListWith (++)
. map (f &&& return)
What clusterBy does is group a list of values by their signatures,
as computed by a given signature function f, and returns
the groups in order of ascending signature. For example, we
can cluster the words “the tan ant gets some fat” by length, by
first letter, or by last letter just by changing the
signature function we give to clusterBy:
*Main> let antwords = words "the tan ant gets some fat"
*Main> clusterBy length antwords
[["the","tan","ant","fat"],["gets","some"]]
*Main> clusterBy head antwords
[["ant"],["fat"],["gets"],["some"],["the","tan"]]
*Main> clusterBy last antwords
[["the","some"],["tan"],["gets"],["ant","fat"]]
If we use sort as the signature function, we can find anagrams:
*Main> clusterBy sort antwords
[["fat"],["tan","ant"],["gets"],["the"],["some"]]
And that brings us back to the original puzzle. To find the solution,
we must consider each unique pair of state names to form a “word” and
find the anagrams among a list of such “words.”
Assuming we are given
a list of state names on standard input, one state per line, we can
write the shell of our solution as follows:
main = mapM_ print . solve . lines =<< getContents
The shell delegates the real work to solve. It’s job is to
compute the unique, 2-state combinations from the original
list of states, and then find the anagrams among these combinations.
As before, finding the anagrams is simply a matter of calling
clusterBy with the right signature function. We also filter
out the trivial results, which are not valid solutions:
solve = filter ((>1) . length) . clusterBy signature . ucombos
ucombos xs = [[x,y] | x <- xs, y <- xs, x < y]
signature = sort . filter isAlpha . concat
That’s it. Now we can solve the puzzle by feeding our program a list of states:
$ runhaskell anagrams2.hs < states.txt
[["NORTH CAROLINA","SOUTH DAKOTA"],
["NORTH DAKOTA","SOUTH CAROLINA"]]
What a handy little function, that clusterBy.
Posted in programming
Tags clusterby, functions, haskell, hof, puzzles
11 comments
no trackbacks

Posted by Tom Moertel
Wed, 28 Mar 2007 19:40:00 GMT
This article is part three in a series on introductory Haskell
programming. In the first
article,
we wrote a small program to recursively scan file-system directories
and print their contents as ASCII-art trees. In the second
article,
we refactored the program to make its logic more reusable by separating
the directory-scanning logic from the tree-printing logic. In this
article, we will address a shortcoming of the refactored version: It
must scan directory hierarchies completely before printing their
trees, i.e., it must scan and then print,
when doing both simultaneously is both more efficient and
more user friendly.
Recall from the previous article that our directory-printing program
is factored into three pieces of logic:
- fsTraverse, which traverses a file-system hierarchy and returns a tree data structure;
- showTree, which converts a tree into lovingly crafted ASCII art; and
- traverseAndPrint, which prints the tree for a file-system hierarchy by
using the first two pieces of logic.
The types of the functions are as follows:
fsTraverse :: Path -> DentName -> IO DirTree
showTree :: Tree String -> String
traverseAndPrint :: Path -> IO ()
Note that showTree is a pure function, but the other two return
IO actions that may have side effects.
Within traverseAndPrint, fsTraverse and showTree are combined
into a composite IO action by the =<< combinator:
putStr . showTree =<< fsTraverse root path
The sequencing semantics of Haskell’s IO monad forces all of the
effects of fsTraverse to complete before any following
effects can begin. To better understand these sequencing semantics,
let’s consider a simple example.
The IO-monad code,
can loosely be interpreted as running the action a, which forces
its side effects to occur, and then running the action b, which forces
its side effects to occur.
In reality, a and b are not actions. They are
functions. Like all Haskell functions, they are pure and have no side
effects. It’s just that a and b return values that
represent actions, and those actions may have side effects, and the
semantics of the IO monad guarantee the ordering of those effects
(should the actions end up being connected to the runtime’s
top-level IO action and executed). If you think that’s weird, hold
that thought. For now, all that’s important is that, if the composite
action represented by the expression
(a >> b) is executed, the
effects of a, regardless of how complex, will be executed
before the effects of b.
Thus if a represents building a tree by recursively scanning
a file-system hierarchy, the entire tree must be built before
b ever gets a chance to do its thing. For our particular
application, however, that particular sequencing is suboptimal. We
know from our earlier, monolithic implementation that the
file-system hierarchy can be scanned and printed simultaneously, which
is more efficient. Ideally, then, our refactored
implementation should be just as efficient.
In this article, we will look at one way to maintain the clean,
logical separation of the a part from the b part
while allowing the parts’ effects to be interleaved for efficiency.
We will use an extension to the Haskell language to make the
directory-scanning action lazy so that it builds the tree as the tree
is consumed.
Ready? Let’s dive in.
Read more...
Posted in haskell
Tags directory_tree_series, haskell, io, laziness, lazy, trees
6 comments
no trackbacks

Posted by Tom Moertel
Wed, 07 Mar 2007 21:04:00 GMT
In my previous article on writing a simple directory-tree printer in
Haskell,
I wrote a small program to recursively scan file-system
directories and print their contents as ASCII-art trees. The
program made for an approachable example of how to use Haskell for
“imperative” tasks, but it has a few problems.
First, the directory-scanning logic and tree-printing logic are
intertwined. Neither is reusable. Second, both bits of logic are
rigid, specialized for this particular task. Even if you could
reuse them, you wouldn’t want to.
In this article, the second in a series, we will explore ways to make
our original code more reusable. We will separate the directory
scanning from the tree printing, harness the power of some old
friends from Haskell’s libraries, and think about the costs
and benefits of our changes.
The plan
Recall our original directory tree–listing solution, the
core of which I will reprint below:
tlist path =
visit (if "/" `isPrefixOf` path then "" else ".") "" "" "" path
visit path leader tie arm node = do
putStrLn (leader ++ arm ++ tie ++ node)
visitChildren (path ++ "/" ++ node) (leader ++ extension)
where
extension = case arm of "" -> ""; "`" -> " "; _ -> "| "
visitChildren path leader =
whenM (doesDirectoryExist path) $ do
contents <- getDirectoryContents path
`catch` (\e -> return [show e])
let visibles = sort . filter (`notElem` [".", ".."]) $ contents
arms = replicate (length visibles 1) "|" ++ ["`"]
zipWithM_ (visit path leader "-- ") arms visibles
The tlist function kicks off the process for a particular
file-system path, handing off to visit which recursively descends the
directory tree from the root node. The visit function calls
visitChildren to expand the subtree, if any, for each node visited.
The visitChildren function, in turn, calls back to visit to
repeat the process for each child in the subtree.
In effect, we are traversing the tree rooted at path, printing each
node in passing.
To separate the traversal part from the printing part, we will
introduce a tree data structure. The file system–traversal code
will emit a tree, and the tree-showing code will consume a tree. We
will rewrite our old tlist function, which we might as well rename to
the more descriptive traverseAndPrint, to glue the two pieces
together with the tree serving as glue:
traverseAndPrint :: Path -> IO ()
traverseAndPrint path = do
tree <- fsTraverse root path
putStrLn (showTree tree)
where
root = if "/" `isPrefixOf` path then "" else "."
That’s the plan. Now let’s carry it out.
Read more...
Posted in haskell
Tags directory_tree_series, haskell, io, refactoring, trees
7 comments
no trackbacks

Posted by Tom Moertel
Thu, 22 Feb 2007 21:30:00 GMT
At a recent gathering, my friend Casey
mentioned that he was learning a new programming language and, as a
learning exercise, had written a directory-tree printer. That’s a program that recursively scans directory hierarchies and
prints out a tree representation for each:
$ tlist cheating-hangman
cheating-hangman
|-- CVS
| |-- Entries
| |-- Repository
| `-- Root
|-- Makefile
|-- cheating-hangman.lhs
`-- cheating-hangman.pl
I thought the problem sounded like fun, and so I wrote a small
solution in Haskell. Let’s take a look.
Read more...
Posted in programming, functional programming, haskell
Tags directory_tree_series, haskell, io, trees
7 comments
no trackbacks

Posted by Tom Moertel
Wed, 01 Nov 2006 22:01:00 GMT
Last night on #haskell, Don
Stewart asked if I had seen
HsColour
for rendering syntax-highlighted Haskell in HTML. He had
used it recently, he noted in passing, to add syntax highlighting to planet.haskell.org.
Now, I can’t be certain about this, but I suspect that Don’s question
was cleverly designed to instill in me a subtle case of
syntax-highlighting envy. For on my blog, Haskell code snippets
were rendered in dreadfully boring uncolored text.
But on his blog, the
snippets dance in joyous polychromatic splendor.
Thus I was compelled to add Haskell syntax-highlighting to my blog.
Adding Haskell syntax-highlighting to Typo
My blog runs on the Ruby-on-Rails-powered Typo
system, which allows for plug-in text filters. One of the included filters, in fact, is a syntax-highlighting filter for snippets of Ruby, XML, and YAML code. This filter is built upon the Ruby Syntax module, which wasn’t exactly designed for Haskell syntax analysis. So I set out to create a new plug-in filter based upon HsColour.
This task turned out to be easy. All I did was duplicate
Typo’s existing syntax-highlighting filter and swap out its filtering
code for the following:
IO.popen("HsColour -css", "r+") do |f|
pid = fork { f.write text; f.close; exit! 0 }
f.close_write
text = f.read
Process.waitpid pid
end
I also tweaked the post-processing regular expressions so that they
would whittle away the HTML filler before and after the
syntax-highlighted output of HsColour:
text.gsub!(/.*<p()re>/m, ...)
text.gsub!(/<\/pre>.*/m, ...)
A few more tweaks and I was done.
Now I can wrap my Haskell code in <typo:haskell> tags and it, too, will
dance in joyous polychromatic splendor:
constructTable tspecs = do
ecolspecs <- during "argument evaluation" $ do
toNvps . concat =<< mapM splice tspecs
let names = map fst ecolspecs
let evecs = map snd ecolspecs
vecs <- argof nm $ mapM evalVector evecs
let vlens = map vlen vecs
if length (group vlens) == 1
then return . VTable $ mkTable (zip names vecs)
else throwError $
"table columns must be non-empty vectors of equal length"
where
nm = "table(...) constructor"
splice (TCol envp) = return [envp]
splice (TSplice e) = do
val <- eval e
case val of
VTable t ->
return $ zipWith mkNVP (tcnames t) (elems (tvecs t))
VList gl ->
liftM (zipWith mkNVP (map name . elems $ glnames gl)) $
mapM asVectorNull (elems $ glvals gl)
_ -> throwError $
"can't construct table columns from (" ++
show val ++ ")"
mkNVP n vec = NVP n (mkNoPosExpr . EVal $ VVector vec)
name "" = "NA"
name n = n
If you want the filter code, here it is: haskell_controller.rb. Just drop it into components/plugins/textfilters and restart Typo. The corresponding CSS styles can be found in my user-styles.css.
Posted in haskell, ruby, typo
Tags haskell, hscolour, ruby, typo
no comments
no trackbacks

Posted by Tom Moertel
Tue, 31 Oct 2006 19:44:00 GMT
Last Tuesday, my friend Casey and I were hanging
out at Aldo Coffee. We planned on enjoying
some espresso, doing some work, and then heading over to the Pittsburgh Coding
Dojo, where we could hang out with
other geekly folks.
We ended up
not having enough time to go to the meeting, but we decided to hack
on the challenge problem anyway, using Aldo’s ever-handy free
wireless to access the Internet.
The Dojo problem was PragDave’s Kata Eleven – Sorting it
Out. (It’s short;
read it now.) We decided to use Haskell for our implementation
language.
In this post, I’ll walk through our coding session and explain how our
solution evolved. To better fit the session into a blog post, I
have removed a lot of back-and-forth micro iterations, and I have
edited some of the code for clarity.
The first part of the problem
The first part of the problem was “Sorting Balls.” The story: You
need to implement a “rack” to hold the balls drawn at random (without
replacement) from a bin containing sixty balls, numbered 0 to 59.
Regardless of the order in which the balls are added to the rack, you
need to present them in sorted order whenever you’re asked for them.
Upon reading this part of the challenge, a couple of thoughts sprung to mind:
- Because the range of balls is so small, the problem was begging for a solution based on a counting sort.
- Because the balls are uniquely numbered and drawn without replacement, we could even use a bit vector to represent counts.
Nevertheless, we decided to ignore these thoughts and implement a
more-general solution that would work for any (orderable) values,
not just small ranges of integers.
Sketching the interface
The first step, then, was to sketch out an interface. Our
interface mirrored the one from the problem statement but
was tweaked for Haskell:
mkRack :: Rack a
add :: Ord a => a -> Rack a -> Rack a
balls :: Rack a -> [a]
The function mkRack makes a new rack to hold values (“balls”) of
type a. It’s equivalent to Rack.new in Ruby.
The add function adds a ball to a rack. You give it a ball and a
rack, and it returns a new rack that is the same as the original rack
but also contains the ball. (If you’re accustomed to stateful
programming, this may seem weird. Why return a new rack instead of
modifying the original rack? Because, in Haskell, you can’t change
values: you can only create new values. At first, this constraint may
seem limiting, but after you get used to it, you’ll find it
empowering.)
Note: the Ord a qualification on the type signature of
add says that it will work for any type a whose values can be
ordered. The qualification is necessary because values of some types,
like IO actions, cannot be compared to see which are less than the
others.
The balls function is an “observer”: it lets you observe the balls
in a rack by returning them as an ordered list.
And that’s the interface.
With the interface sketched, we gave it meaning by defining its
properties.
Giving our interface meaning: defining properties using QuickCheck
QuickCheck is a
powerful, easy-to-use testing tool. Instead of checking test cases,
it checks properties – statements about what your code ought to do
in general.
The great thing about QuickCheck properties is that they are
testable documentation. They tell the world what your code
is supposed to do,
and they do so in a concise, formal language that just happens to be
easily readable by humans and automatically testable by computers.
To specify the desired properties of our Rack interface, we first had
to import QuickCheck:
Then, we defined our first property. It said, simply, that a new rack
must be empty when observed:
prop_New =
balls mkRack =~ []
Our second property said that, when you add a ball x to
a rack, the resulting rack must contain the same
balls as the original rack plus x:
prop_AddAddsElement rack x =
balls (add x rack) =~ (x : balls rack)
Both of the properties above rely upon a special, order-insensitive
equality test that we defined for lists of Int values:
(=~) :: [Int] -> [Int] -> Bool
xs =~ ys = sort xs == sort ys
Note that under this test, [1,2] “equals”
both [1,2] and [2,1], but it does not “equal”
any other values.
The reason we defined this operator was to help us specify the two
essential properties of add separately: (1) it must insert a ball
into a rack, and (2) the new ball’s position, when observed, must
preserve the rack’s ordering invariant. The previous property
definition used the =~ operator to specify the first of
these two properties. The next property we defined specified the
second:
prop_AddPreservesOrdering rack x =
isOrdered (balls rack) ==> isOrdered (balls (add x rack))
This definition specifies that, for all racks rack and all balls
x, if the balls in rack are ordered, the balls in the rack that
results from adding x to rack must also be ordered. If you
are familiar with proof by
induction, you’ll
know why we went this route. In short, if we can prove that this
property holds (and, trivially, that an empty rack is ordered), we can
prove that add preserves the ordering invariant.
To round out the property definition, we needed to define the isOrdered test:
isOrdered :: [Int] -> Bool
isOrdered xs = xs == sort xs
And those are the properties we needed to check the correctness
of our implementation. Of course, we still needed to write our
implementation, and we turned to that task next.
A simple, list-based Rack implementation
For our first implementation, we decided upon a drop-dead-simple
list-based representation. We would keep the elements of the list
in sorted order by inserting them into the correct positions when
add was called.
Here, then, was our code:
type Rack a = [a]
mkRack = []
add x xs = insertList x xs
balls = id
insertList :: Ord a => a -> [a] -> [a]
insertList x [] = [x]
insertList x (y:ys)
| x < y = x : y : ys
| otherwise = y : insertList x ys
That’s it.
We took our new implementation for a spin in GHCi:
*Rack> balls mkRack
[]
*Rack> balls (add 3 mkRack)
[3]
*Rack> balls (add 4 (add 3 mkRack))
[3,4]
*Rack> balls (add 1 (add 4 (add 3 mkRack)))
[1,3,4]
*Rack> balls (foldr add mkRack [4,2,6,3,-9,0,33,9])
[-9,0,2,3,4,6,9,33]
To really test our implementation, we asked QuickCheck to check its
properties:
*Rack> quickCheck prop_New
OK, passed 100 tests.
*Rack> quickCheck prop_AddAddsElement
OK, passed 100 tests.
*Rack> quickCheck prop_AddPreservesOrdering
OK, passed 100 tests.
I should point out that QuickCheck did not prove that our properties
held. Rather, it gathered evidence that we could use to argue that
our properties held. The evidence was that each of our properties’
claims was subjected to 100 randomly generated tests, and none of
the tests was able to disprove a claim.
Was this evidence sufficient for us to rest satisfied that our
implementation was correct? Given how simple our implementation
was, I felt that the evidence was sufficient. Casey agreed, and we moved on.
With the first implementation done, we decided to try a more-sophisticated
implementation.
Generalizing the interface
Since we were about to have multiple implementations, it made sense
for us to define a generalized interface that any “Rack-like”
implementation could use. For that, Haskell’s type classes were
perfect:
class Racklike a ra | ra -> a where
mkRack :: ra
add :: Ord a => a -> ra -> ra
balls :: ra -> [a]
The interface was essentially the same as before, except that the data
type behind the rack implementation was not given by a specific type
Rack a but rather by the type variable ra, which represents some
type of rack container for balls of type a.
Note that ra determines a. If, for example, you know that
the container type ra equals “a list of Int values,”
you know that a must equal Int. (To represent this
relationship, we used functional
dependencies,
a popular extension to the Haskell 98 standard.)
With the Racklike type class in place, we moved our list-based
implementation inside of the interface:
type ListRack a = [a]
instance Racklike a (ListRack a) where
mkRack = []
add = insertList
balls = id
Next, we modified our QuickCheck property definitions. Where before
it was fine to assume that we would be testing our single, list-based
implementation, now we needed to allow for testing other
implementation types. We did this by adding a rackType parameter to
our property definitions. We used the type, not the value, of this
parameter to determine the type of rack to test:
prop_New rackType =
balls (mkRack `asTypeOf` rackType) =~ []
prop_AddAddsElement rackType ballList x =
balls (add x rack) =~ (x : balls rack)
where
rack = rackFromList ballList `asTypeOf` rackType
prop_AddPreservesOrdering rackType ballList x =
isOrdered (balls rack) ==> isOrdered (balls (add x rack))
where
rack = rackFromList ballList `asTypeOf` rackType
Because we could no longer assume the rack would be represented
as a list of integers, we wrote rackFromList to convert such
a list into a rack:
rackFromList xs = foldr add mkRack xs
With these modifications in place, we re-ran our tests, specifying
(via type annotations) that we wanted to run them for the ListRack
implementation:
*Rack> quickCheck $ prop_New (undefined :: ListRack Int)
OK, passed 100 tests.
*Rack> quickCheck $ prop_AddAddsElement (undefined :: ListRack Int)
OK, passed 100 tests.
*Rack> quickCheck $ prop_AddPreservesOrdering (undefined :: ListRack Int)
OK, passed 100 tests.
A tree-based Rack implementation
Now that we were free to add additional implementation types,
we created one based on binary trees. We started by defining
the tree data type:
data Tree a
= Empty
| Root (Tree a) a (Tree a)
deriving (Ord, Eq, Show)
This definition says that a tree can be either empty or a root node.
A root node has a single value and left and right sub-trees.
Further, root nodes must satisfy an ordering invariant: if a root
node’s value is x, all of the values in its left subtree must be
less than x, and all of the values in its right subtree must be
greater than or equal to x. The data type doesn’t enforce this
invariant, so we would need to enforce it in our implementation.
Next, we wrote the basic functions for creating, adding elements to,
and observing our trees.
We needed to be able to create empty trees:
Inserting an element into a tree requires us to walk the tree and
append the element as a new leaf node in the correct location, being
mindful of our ordering invariant. Because our data structure is
inherently recursive, a recursive implementation was straightforward
to code:
insertTree x Empty = Root Empty x Empty
insertTree x (Root left y right)
| x < y = Root (insertTree x left) y right
| otherwise = Root left y (insertTree x right)
Note that we don’t try to ensure that the tree is balanced. The
problem statement says that the balls are randomly selected, and thus
we can expect our trees, on average, to be balanced naturally.
Next, we wrote the code to observe the elements of a tree.
We used a functional-programming idiom
for efficiently flattening a tree into a list:
elemsTree rx =
elemsTree' rx []
elemsTree' Empty = id
elemsTree' (Root left x right) =
elemsTree' left . (x :) . elemsTree' right
Finally, we defined a new tree-based rack type and declared
it to be an instance of the Racklike type class:
type TreeRack a = Tree a
instance Racklike a (TreeRack a) where
mkRack = emptyTree
add = insertTree
balls = elemsTree
With the implementation done, we took it for a test drive:
*Rack> add 1 mkRack :: TreeRack Int
Root Empty 1 Empty
*Rack> add 3 (add 1 mkRack) :: TreeRack Int
Root Empty 1 (Root Empty 3 Empty)
*Rack> balls (add 3 (add 1 mkRack) :: TreeRack Int)
[1,3]
Then, for the real test, we checked that our properties held for
TreeRacks:
*Rack> quickCheck $ prop_New (undefined :: TreeRack Int)
OK, passed 100 tests.
*Rack> quickCheck $ prop_AddAddsElement (undefined :: TreeRack Int)
OK, passed 100 tests.
quickCheck $ prop_AddPreservesOrdering (undefined :: TreeRack Int)
OK, passed 100 tests.
Satisfied with these results, we moved on to part two of the problem.
The second part of the problem
The second part of the problem was about sorting the letters within a
block of text, ignoring white space and punctuation, and converting
upper case letters into lower case: “Are there any ways to
perform this sort cheaply, and without using built-in libraries?”
Again, a counting sort seemed like an obvious ideal solution, but
we decided to recycle our existing code since we had to leave soon.
Because our Rack implementations were generic, they would work on
letters just as well as on numbers or other kinds of balls:
*Rack> balls (rackFromList "this is a test" :: TreeRack Char)
" aehiisssttt"
With our existing code already doing the hard work
for us, it was trivial to code up the letter-sorting function:
sortLetters xs =
balls (rackFromList letters :: TreeRack Char)
where
letters = [toLower x | x <- xs, isAlpha x]
(Note: Because of the nature of the problem, I interpreted the
question’s “without using built-in libraries” to mean “without
built-in sorting libraries.”)
We took the new function for a test drive, and it worked
as expected:
*Rack> sortLetters "This is a test, pal."
"aaehiilpsssttt"
And that ended our coding session.
Update: Tweaked the revised definition of the AddAddsElement
property for greater parallelism with the original.
Update 2007-03-03: Minor edits for clarity.
Posted in programming, functional programming, haskell, testing
Tags haskell, kata, quickcheck, sorting, testing
7 comments
no trackbacks
