A simple directory-tree printer in Haskell

Posted on February 22, 2007

Tags: haskell, trees, io, directory-tree-series

At a recent gathering, my friend Casey mentioned that he was learning a new programming language and, as a learning exercise, had written a directory-tree printer. That’s a program that recursively scans directory hierarchies and prints out a tree representation for each:

$ tlist cheating-hangman

cheating-hangman
|-- CVS
|   |-- Entries
|   |-- Repository
|   `-- Root
|-- Makefile
|-- cheating-hangman.lhs
`-- cheating-hangman.pl

I thought the problem sounded like fun, and so I wrote a small solution in Haskell. Let’s take a look.

First, some documentation by way of comments.

-- Tree list for directories
-- Tom Moertel <tom@moertel.com>
-- 2007-02-20
--
-- Compile:  ghc -O -o tlist --make tlist.hs
-- Usage:    ./tlist [dir...]

Next, we’ll start a new module and import a few libraries that will make our job easier.

module Main (main) where

import Control.Monad
import Data.List
import System.Directory
import System.Environment

Now we arrive at the first real code, the main function:

main = do
    args <- getArgs
    mapM_ tlist (if null args then ["."] else args)

The code gets the arguments from the command line, each representing a file or directory. Then it maps the tlist function over the list of arguments or, if the list is empty, over the default argument, which represents the current working directory. Because tlist has side effects (e.g., directory scanning and tree printing), it returns an IO action, and thus we must use a monadic flavor of map, mapM, to sequence its effects.

Next, tlist visits the file-system path it receives as an argument, handing off to the visit function, to which it passes some sensible initial values. We’ll examine what these values mean in a moment.

tlist path =
    visit (if "/" `isPrefixOf` path then "" else ".") "" "" "" path

The visit function, in turn, “visits” a node, printing its little portion of the tree. Before we look at that function, however, let’s think about drawing a single node.

Say we’re visiting the “Repository” node in the hierarchy:

cheating-hangman
|-- CVS
|   |-- Entries
|   |-- Repository    <--- consider just this line

    ^   ^
    A   B

To connect the node to the tree, we need to draw three things:

the leader, which connects the tree structure way above the node;
the arm, which connects the node’s siblings; and
the tie, which connects the node to the arm.

In this particular case, the leader is everything to the left of point A. Here it is “|”, but it will get longer as we descend deeper into the hierarchy. The arm is the single character at point A. It’s usually a vertical bar, but for the final sibling it’s rounded off into an elbow (a backquote in ASCII art). Finally the tie, which connects the name of the current node to the arm, is everything between points A and B. It’s always “—” except at the tree’s root, which needs no tie.

With the tree’s anatomy firmly in mind, we’re ready to visit a node and draw its contribution to the tree:

visit path leader tie arm node = do
    putStrLn (leader ++ arm ++ tie ++ node)
    visitChildren (path ++ "/" ++ node) (leader ++ extension)
  where
    extension = case arm of ""  -> ""; "`" -> "    "; _   -> "|   "

Let’s examine the arguments. Path is the path to the directory we’re listing, and node is the item we’re visiting within that directory. The remaining arguments are the tree-drawing parts necessary to connect this node to the tree.

The first line of the function’s body prints the node’s addition to the tree. The second line visits the node’s children, if any, updating the path and leader accordingly. The path is updated to point to the current node. The leader is extended to ensure that the current node’s unvisited siblings, if any, will be connected to the tree by the current node’s children, if any. In effect, the current node’s arm is projected into the children’s leader to make the connection.

Visiting the children is a little more complicated:

visitChildren path leader =
    whenM (doesDirectoryExist path) $ do
        contents <- getDirectoryContents path
            `catch` (\e -> return [show e])
        let visibles = sort . filter (`notElem` [".", ".."]) $ contents
            arms = replicate (length visibles - 1) "|" ++ ["`"]
        zipWithM_ (visit path leader "-- ") arms visibles

The whenM line tests to see if the path represents a directory. If it’s not, the remaining code is ignored. (Only directories have children to visit.)

The remaining code, if it does run, gets the directory’s contents (catching and reporting errors, if any), filters out the “.” and “..” directory entries, and stores the sorted, remaining entries in visibles. Then it prepares a list of arms: straight arms for all but the last entry, which gets an elbow. Finally, zipWithM “zips” the prepared arms and visible entries into one big, effectful computation that calls to the visit function to zip each arm with its corresponding entry. And that completes our algorithm.

One final tidbit: the handy whenM, which lamentably is not part of the Haskell standard libraries (yet), is defined like so:

whenM mtest ma = mtest >>= flip when ma

The complete code

Here’s the code, without the commentary:

module Main (main) where

import Control.Monad
import Data.List
import System.Directory
import System.Environment

main = do
    args <- getArgs
    mapM_ tlist (if null args then ["."] else args)

tlist path =
    visit (if "/" `isPrefixOf` path then "" else ".") "" "" "" path

visit path leader tie arm node = do
    putStrLn (leader ++ arm ++ tie ++ node)
    visitChildren (path ++ "/" ++ node) (leader ++ extension)
  where
    extension = case arm of ""  -> ""; "`" -> "    "; _   -> "|   "

visitChildren path leader =
    whenM (doesDirectoryExist path) $ do
        contents <- getDirectoryContents path
            `catch` (\e -> return [show e])
        let visibles = sort . filter (`notElem` [".", ".."]) $ contents
            arms = replicate (length visibles - 1) "|" ++ ["`"]
        zipWithM_ (visit path leader "-- ") arms visibles

whenM mtest ma = mtest >>= flip when ma

Next time: a refactoring

Our program does two things: it scans directory hierarchies and it prints trees. You may have noticed, however, that those two things are intertwined in the code. There are times when we might want to scan a directory hierarchy or print a tree, but our code can’t be reused for those purposes. It can only scan and print as one indivisible action.

Wouldn’t it be better to separate the directory-scanning code from the tree-printing code so that each formed a reusable module? Next time, we’ll do just that. Haskell’s lazy magic makes it easy and efficient.

See you then.

Update 2007-02-23: If you’re interested in coding up an implementation, please don’t forget to round off corners and prune unnecessary connecting lines. For example:

dir1
|-- file1
|-- subdir1
|   |-- file11
|   |-- file12
|   `-- subdir13
|       `-- file131
`-- zfile