A simple directory-tree printer in Haskell
Posted by Tom Moertel Thu, 22 Feb 2007 21:30:00 GMT
At a recent gathering, my friend Casey mentioned that he was learning a new programming language and, as a learning exercise, had written a directory-tree printer. That’s a program that recursively scans directory hierarchies and prints out a tree representation for each:
$ tlist cheating-hangman
cheating-hangman
|-- CVS
| |-- Entries
| |-- Repository
| `-- Root
|-- Makefile
|-- cheating-hangman.lhs
`-- cheating-hangman.pl
I thought the problem sounded like fun, and so I wrote a small solution in Haskell. Let’s take a look.
First, some documentation by way of comments.
-- Tree list for directories
-- Tom Moertel <tom@moertel.com>
-- 2007-02-20
--
-- Compile: ghc -O -o tlist --make tlist.hs
-- Usage: ./tlist [dir...]
Next, we’ll start a new module and import a few libraries that will make our job easier.
module Main (main) where
import Control.Monad
import Data.List
import System.Directory
import System.Environment
Now we arrive at the first real code, the main function:
main = do
args <- getArgs
mapM_ tlist (if null args then ["."] else args)
IO action, and thus we must use a monadic flavor of map,
mapM_, to sequence its effects.
Next, tlist visits the file-system path it receives as an argument, handing off to the visit function, to which it passes some sensible initial values. We’ll examine what these values mean in a moment.
tlist path =
visit (if "/" `isPrefixOf` path then "" else ".") "" "" "" path
The visit function, in turn, “visits” a node, printing its little portion of the tree. Before we look at that function, however, let’s think about drawing a single node.
Say we’re visiting the “Repository” node in the hierarchy:
cheating-hangman
|-- CVS
| |-- Entries
| |-- Repository <--- consider just this line
^ ^
A B
To connect the node to the tree, we need to draw three things:
- the leader, which connects the tree structure way above the node;
- the arm, which connects the node’s siblings; and
- the tie, which connects the node to the arm.
In this particular case, the leader is everything to the left of point
A. Here it is "| ", but it will get
longer as we descend deeper into the hierarchy. The arm is the single
character at point A. It’s usually a vertical bar,
but for the final sibling it’s rounded off into an elbow (a backquote
in ASCII art). Finally the tie, which connects the name of the
current node to the arm, is everything between points
A and B. It’s always "--
" except at the tree’s root, which needs no tie.
With the tree’s anatomy firmly in mind, we’re ready to visit a node and draw its contribution to the tree:
visit path leader tie arm node = do
putStrLn (leader ++ arm ++ tie ++ node)
visitChildren (path ++ "/" ++ node) (leader ++ extension)
where
extension = case arm of "" -> ""; "`" -> " "; _ -> "| "
Let’s examine the arguments. Path is the path to the directory we’re listing, and node is the item we’re visiting within that directory. The remaining arguments are the tree-drawing parts necessary to connect this node to the tree.
The first line of the function’s body prints the node’s addition to the tree. The second line visits the node’s children, if any, updating the path and leader accordingly. The path is updated to point to the current node. The leader is extended to ensure that the current node’s unvisited siblings, if any, will be connected to the tree by the current node’s children, if any. In effect, the current node’s arm is projected into the children’s leader to make the connection.
Visiting the children is a little more complicated:
visitChildren path leader =
whenM (doesDirectoryExist path) $ do
contents <- getDirectoryContents path
`catch` (\e -> return [show e])
let visibles = sort . filter (`notElem` [".", ".."]) $ contents
arms = replicate (length visibles - 1) "|" ++ ["`"]
zipWithM_ (visit path leader "-- ") arms visibles
The whenM line tests to see if the path represents a directory. If it’s not, the remaining code is ignored. (Only directories have children to visit.)
The remaining code, if it does run, gets the directory’s contents
(catching and reporting errors, if any), filters out the
"." and ".." directory entries, and stores
the sorted, remaining entries in visibles. Then it prepares a list
of arms: straight arms for all but the last entry, which gets an
elbow. Finally, zipWithM_ “zips” the prepared arms and
visible entries into one big, effectful computation that calls
(recursively) to the visit function to zip each arm with its
corresponding entry. And that completes our algorithm.
One final tidbit: the handy whenM, which lamentably is not part of the Haskell standard libraries (yet), is defined like so:
whenM mtest ma = mtest >>= flip when ma
The complete code
Here’s the code, without the commentary:
module Main (main) where
import Control.Monad
import Data.List
import System.Directory
import System.Environment
main = do
args <- getArgs
mapM_ tlist (if null args then ["."] else args)
tlist path =
visit (if "/" `isPrefixOf` path then "" else ".") "" "" "" path
visit path leader tie arm node = do
putStrLn (leader ++ arm ++ tie ++ node)
visitChildren (path ++ "/" ++ node) (leader ++ extension)
where
extension = case arm of "" -> ""; "`" -> " "; _ -> "| "
visitChildren path leader =
whenM (doesDirectoryExist path) $ do
contents <- getDirectoryContents path
`catch` (\e -> return [show e])
let visibles = sort . filter (`notElem` [".", ".."]) $ contents
arms = replicate (length visibles - 1) "|" ++ ["`"]
zipWithM_ (visit path leader "-- ") arms visibles
whenM mtest ma = mtest >>= flip when ma
Next time: a refactoring
Our program does two things: it scans directory hierarchies and it prints trees. You may have noticed, however, that those two things are intertwined in the code. There are times when we might want to scan a directory hierarchy or print a tree, but our code can’t be reused for those purposes. It can only scan and print as one indivisible action.
Wouldn’t it be better to separate the directory-scanning code from the tree-printing code so that each formed a reusable module? Next time, we’ll do just that. Haskell’s lazy magic makes it easy and efficient.
See you then.
dir1
|-- file1
|-- subdir1
| |-- file11
| |-- file12
| `-- subdir13
| `-- file131
`-- zfile
readers
Great, real-world, simple example that really puts across the power of Haskell.
Tom I love your Haskell articles!! I’ve started to learn Haskell and really appreciate your tutorials as they pack so much information in them! Keep them coming!
Good article, I like it. Although I wonder how you will treat the laziness being an obstacle. I presume it will be intimate massage with the profiler and mumba-youmba dances around !, seq and $! :-)
Nice article, and nice little puzzle. The version I wrote is structured like your refactoring suggestion: a data structure for the tree, and a function that recurses the data structure to create output.
One minor quibble: your version doesn’t check if the command line arguments are existing files before displaying them.
Hi Tom! Thanks for this great article! I’m just getting into Haskell myself, and I’m very impressed by its elegance.
Oh, just a quick question… I was wondering what license the code in the article uses. Is it public-domain? - Andy
Andy, unless otherwise specified, my code on this site is licensed under the GNU GPL.
Cheers,
Tom
Hi –
- Andy
Hello Tom, Thanks for explaining the concepts! Great article and great code!
One problem: On my Linux computer (Kubuntu), in both Firefox and Konqueror the code display gets cut off, (i.e. $ contents is not seen) as it gets overwritten by sidebar with blog categories, etc.
So, instead of seeing the line: let visibles = sort . filter (`notElem` [”.”, ”..”]) $ contents
I see let visibles = sort . filter (`notElem` [”.”, ”..”])
So, this left me puzzling as to accuracy of the code until I did view source.
People who might have the same problem as me, i.e. if you see code that is cut off, please use this trick: Select segment of page including the code, copy and paste into a plain text editor such as, Kate (or Notepad) and you will see the whole code. If your browser can display as plain-text that might be useful also.
Please include types of all functions for a better understanding. Thanks, Bastl.