A simple directory-tree printer in Haskell

Posted by Tom Moertel Thu, 22 Feb 2007 21:30:00 GMT

At a recent gathering, my friend Casey mentioned that he was learning a new programming language and, as a learning exercise, had written a directory-tree printer. That’s a program that recursively scans directory hierarchies and prints out a tree representation for each:

$ tlist cheating-hangman

cheating-hangman
|-- CVS
|   |-- Entries
|   |-- Repository
|   `-- Root
|-- Makefile
|-- cheating-hangman.lhs
`-- cheating-hangman.pl

I thought the problem sounded like fun, and so I wrote a small solution in Haskell. Let’s take a look.

First, some documentation by way of comments.

-- Tree list for directories
-- Tom Moertel <tom@moertel.com>
-- 2007-02-20
--
-- Compile:  ghc -O -o tlist --make tlist.hs
-- Usage:    ./tlist [dir...]

Next, we’ll start a new module and import a few libraries that will make our job easier.

module Main (main) where

import Control.Monad
import Data.List
import System.Directory
import System.Environment

Now we arrive at the first real code, the main function:

main = do
    args <- getArgs
    mapM_ tlist (if null args then ["."] else args)
The code gets the arguments from the command line, each representing a file or directory. Then it maps the tlist function over the list of arguments or, if the list is empty, over the default argument, which represents the current working directory. Because tlist has side effects (e.g., directory scanning and tree printing), it returns an IO action, and thus we must use a monadic flavor of map, mapM_, to sequence its effects.

Next, tlist visits the file-system path it receives as an argument, handing off to the visit function, to which it passes some sensible initial values. We’ll examine what these values mean in a moment.

tlist path =
    visit (if "/" `isPrefixOf` path then "" else ".") "" "" "" path

The visit function, in turn, “visits” a node, printing its little portion of the tree. Before we look at that function, however, let’s think about drawing a single node.

Say we’re visiting the “Repository” node in the hierarchy:

cheating-hangman
|-- CVS
|   |-- Entries
|   |-- Repository    <--- consider just this line

    ^   ^
    A   B

To connect the node to the tree, we need to draw three things:

  1. the leader, which connects the tree structure way above the node;
  2. the arm, which connects the node’s siblings; and
  3. the tie, which connects the node to the arm.

In this particular case, the leader is everything to the left of point A. Here it is "| ", but it will get longer as we descend deeper into the hierarchy. The arm is the single character at point A. It’s usually a vertical bar, but for the final sibling it’s rounded off into an elbow (a backquote in ASCII art). Finally the tie, which connects the name of the current node to the arm, is everything between points A and B. It’s always "-- " except at the tree’s root, which needs no tie.

With the tree’s anatomy firmly in mind, we’re ready to visit a node and draw its contribution to the tree:

visit path leader tie arm node = do
    putStrLn (leader ++ arm ++ tie ++ node)
    visitChildren (path ++ "/" ++ node) (leader ++ extension)
  where
    extension = case arm of ""  -> ""; "`" -> "    "; _   -> "|   "

Let’s examine the arguments. Path is the path to the directory we’re listing, and node is the item we’re visiting within that directory. The remaining arguments are the tree-drawing parts necessary to connect this node to the tree.

The first line of the function’s body prints the node’s addition to the tree. The second line visits the node’s children, if any, updating the path and leader accordingly. The path is updated to point to the current node. The leader is extended to ensure that the current node’s unvisited siblings, if any, will be connected to the tree by the current node’s children, if any. In effect, the current node’s arm is projected into the children’s leader to make the connection.

Visiting the children is a little more complicated:

visitChildren path leader =
    whenM (doesDirectoryExist path) $ do
        contents <- getDirectoryContents path
            `catch` (\e -> return [show e])
        let visibles = sort . filter (`notElem` [".", ".."]) $ contents
            arms = replicate (length visibles - 1) "|" ++ ["`"]
        zipWithM_ (visit path leader "-- ") arms visibles

The whenM line tests to see if the path represents a directory. If it’s not, the remaining code is ignored. (Only directories have children to visit.)

The remaining code, if it does run, gets the directory’s contents (catching and reporting errors, if any), filters out the "." and ".." directory entries, and stores the sorted, remaining entries in visibles. Then it prepares a list of arms: straight arms for all but the last entry, which gets an elbow. Finally, zipWithM_ “zips” the prepared arms and visible entries into one big, effectful computation that calls (recursively) to the visit function to zip each arm with its corresponding entry. And that completes our algorithm.

One final tidbit: the handy whenM, which lamentably is not part of the Haskell standard libraries (yet), is defined like so:

whenM mtest ma = mtest >>= flip when ma

The complete code

Here’s the code, without the commentary:

module Main (main) where

import Control.Monad
import Data.List
import System.Directory
import System.Environment

main = do
    args <- getArgs
    mapM_ tlist (if null args then ["."] else args)

tlist path =
    visit (if "/" `isPrefixOf` path then "" else ".") "" "" "" path

visit path leader tie arm node = do
    putStrLn (leader ++ arm ++ tie ++ node)
    visitChildren (path ++ "/" ++ node) (leader ++ extension)
  where
    extension = case arm of ""  -> ""; "`" -> "    "; _   -> "|   "

visitChildren path leader =
    whenM (doesDirectoryExist path) $ do
        contents <- getDirectoryContents path
            `catch` (\e -> return [show e])
        let visibles = sort . filter (`notElem` [".", ".."]) $ contents
            arms = replicate (length visibles - 1) "|" ++ ["`"]
        zipWithM_ (visit path leader "-- ") arms visibles

whenM mtest ma = mtest >>= flip when ma

Next time: a refactoring

Our program does two things: it scans directory hierarchies and it prints trees. You may have noticed, however, that those two things are intertwined in the code. There are times when we might want to scan a directory hierarchy or print a tree, but our code can’t be reused for those purposes. It can only scan and print as one indivisible action.

Wouldn’t it be better to separate the directory-scanning code from the tree-printing code so that each formed a reusable module? Next time, we’ll do just that. Haskell’s lazy magic makes it easy and efficient.

See you then.

Update 2007-02-23: If you’re interested in coding up an implementation, please don’t forget to round off corners and prune unnecessary connecting lines. For example:
dir1
|-- file1
|-- subdir1
|   |-- file11
|   |-- file12
|   `-- subdir13
|       `-- file131
`-- zfile

Posted in , ,
Tags , , ,
7 comments
no trackbacks
Reddit Delicious

Comments

  1. Dru Nelson said 1 day later:

    Great, real-world, simple example that really puts across the power of Haskell.

  2. petekaz said 1 day later:

    Tom I love your Haskell articles!! I’ve started to learn Haskell and really appreciate your tutorials as they pack so much information in them! Keep them coming!

  3. Nick said 4 days later:

    Good article, I like it. Although I wonder how you will treat the laziness being an obstacle. I presume it will be intimate massage with the profiler and mumba-youmba dances around !, seq and $! :-)

  4. Kurt Hutchinson said 6 days later:

    Nice article, and nice little puzzle. The version I wrote is structured like your refactoring suggestion: a data structure for the tree, and a function that recurses the data structure to create output.

    One minor quibble: your version doesn’t check if the command line arguments are existing files before displaying them.

  5. Andy said 116 days later:

    Hi Tom! Thanks for this great article! I’m just getting into Haskell myself, and I’m very impressed by its elegance.

    Oh, just a quick question… I was wondering what license the code in the article uses. Is it public-domain? - Andy

  6. Tom Moertel said 116 days later:

    Andy, unless otherwise specified, my code on this site is licensed under the GNU GPL.

    Cheers,
    Tom

  7. Andy said 117 days later:

    Hi –

    Ok - thanks!

    - Andy

Trackbacks

Use the following link to trackback from your own site:
http://blog.moertel.com/articles/trackback/387

(leave url/email »)

   Comment Markup Help Preview comment