At a recent gathering, my friend Casey mentioned that he was learning a new programming language and, as a learning exercise, had written a directory-tree printer. That’s a program that recursively scans directory hierarchies and prints out a tree representation for each:
$ tlist cheating-hangman
cheating-hangman
|-- CVS
| |-- Entries
| |-- Repository
| `-- Root
|-- Makefile
|-- cheating-hangman.lhs
`-- cheating-hangman.pl
I thought the problem sounded like fun, and so I wrote a small solution in Haskell. Let’s take a look.
First, some documentation by way of comments.
-- Tree list for directories
-- Tom Moertel <tom@moertel.com>
-- 2007-02-20
--
-- Compile: ghc -O -o tlist --make tlist.hs
-- Usage: ./tlist [dir...]
Next, we’ll start a new module and import a few libraries that will make our job easier.
module Main (main) where
import Control.Monad
import Data.List
import System.Directory
import System.Environment
Now we arrive at the first real code, the main function:
= do
main <- getArgs
args mapM_ tlist (if null args then ["."] else args)
The code gets the arguments from the command line, each representing a
file
or directory. Then it maps the tlist function over the list of
arguments or, if the list is empty, over the default argument, which
represents the current working directory. Because tlist has side
effects (e.g., directory scanning and tree printing), it returns an
IO
action, and thus we must use a monadic flavor of map,
mapM, to sequence its effects.
Next, tlist visits the file-system path it receives as an argument, handing off to the visit function, to which it passes some sensible initial values. We’ll examine what these values mean in a moment.
=
tlist path if "/" `isPrefixOf` path then "" else ".") "" "" "" path visit (
The visit function, in turn, “visits” a node, printing its little portion of the tree. Before we look at that function, however, let’s think about drawing a single node.
Say we’re visiting the “Repository” node in the hierarchy:
cheating-hangman
|-- CVS
| |-- Entries
| |-- Repository <--- consider just this line
^ ^
A B
To connect the node to the tree, we need to draw three things:
- the leader, which connects the tree structure way above the node;
- the arm, which connects the node’s siblings; and
- the tie, which connects the node to the arm.
In this particular case, the leader is everything to the left of point
A. Here it is “|”
, but it will get
longer as we descend deeper into the hierarchy. The arm is the single
character at point A. It’s usually a vertical bar,
but for the final sibling it’s rounded off into an elbow (a backquote
in ASCII art). Finally the tie, which connects the name of the
current node to the arm, is everything between points
A and B. It’s always
“—”
except at the tree’s root, which needs no tie.
With the tree’s anatomy firmly in mind, we’re ready to visit a node and draw its contribution to the tree:
= do
visit path leader tie arm node putStrLn (leader ++ arm ++ tie ++ node)
++ "/" ++ node) (leader ++ extension)
visitChildren (path where
= case arm of "" -> ""; "`" -> " "; _ -> "| " extension
Let’s examine the arguments. Path is the path to the directory we’re listing, and node is the item we’re visiting within that directory. The remaining arguments are the tree-drawing parts necessary to connect this node to the tree.
The first line of the function’s body prints the node’s addition to the tree. The second line visits the node’s children, if any, updating the path and leader accordingly. The path is updated to point to the current node. The leader is extended to ensure that the current node’s unvisited siblings, if any, will be connected to the tree by the current node’s children, if any. In effect, the current node’s arm is projected into the children’s leader to make the connection.
Visiting the children is a little more complicated:
=
visitChildren path leader $ do
whenM (doesDirectoryExist path) <- getDirectoryContents path
contents `catch` (\e -> return [show e])
let visibles = sort . filter (`notElem` [".", ".."]) $ contents
= replicate (length visibles - 1) "|" ++ ["`"]
arms "-- ") arms visibles zipWithM_ (visit path leader
The whenM line tests to see if the path represents a directory. If it’s not, the remaining code is ignored. (Only directories have children to visit.)
The remaining code, if it does run, gets the directory’s contents
(catching and reporting errors, if any), filters out the
“.”
and “..”
directory entries, and stores
the sorted, remaining entries in visibles. Then it prepares a list
of arms: straight arms for all but the last entry, which gets an
elbow. Finally, zipWithM “zips” the prepared arms and
visible entries into one big, effectful computation that calls
to the visit function to zip each arm with its
corresponding entry. And that completes our algorithm.
One final tidbit: the handy whenM, which lamentably is not part of the Haskell standard libraries (yet), is defined like so:
= mtest >>= flip when ma whenM mtest ma
The complete code
Here’s the code, without the commentary:
module Main (main) where
import Control.Monad
import Data.List
import System.Directory
import System.Environment
= do
main <- getArgs
args mapM_ tlist (if null args then ["."] else args)
=
tlist path if "/" `isPrefixOf` path then "" else ".") "" "" "" path
visit (
= do
visit path leader tie arm node putStrLn (leader ++ arm ++ tie ++ node)
++ "/" ++ node) (leader ++ extension)
visitChildren (path where
= case arm of "" -> ""; "`" -> " "; _ -> "| "
extension
=
visitChildren path leader $ do
whenM (doesDirectoryExist path) <- getDirectoryContents path
contents `catch` (\e -> return [show e])
let visibles = sort . filter (`notElem` [".", ".."]) $ contents
= replicate (length visibles - 1) "|" ++ ["`"]
arms "-- ") arms visibles
zipWithM_ (visit path leader
= mtest >>= flip when ma whenM mtest ma
Next time: a refactoring
Our program does two things: it scans directory hierarchies and it prints trees. You may have noticed, however, that those two things are intertwined in the code. There are times when we might want to scan a directory hierarchy or print a tree, but our code can’t be reused for those purposes. It can only scan and print as one indivisible action.
Wouldn’t it be better to separate the directory-scanning code from the tree-printing code so that each formed a reusable module? Next time, we’ll do just that. Haskell’s lazy magic makes it easy and efficient.
See you then.
Update 2007-02-23: If you’re interested in coding up an implementation, please don’t forget to round off corners and prune unnecessary connecting lines. For example:
dir1
|-- file1
|-- subdir1
| |-- file11
| |-- file12
| `-- subdir13
| `-- file131
`-- zfile