Directory-tree printing in Haskell, part three: lazy I/O

Posted by Tom Moertel Wed, 28 Mar 2007 19:40:00 GMT

This article is part three in a series on introductory Haskell programming. In the first article, we wrote a small program to recursively scan file-system directories and print their contents as ASCII-art trees. In the second article, we refactored the program to make its logic more reusable by separating the directory-scanning logic from the tree-printing logic. In this article, we will address a shortcoming of the refactored version: It must scan directory hierarchies completely before printing their trees, i.e., it must scan and then print, when doing both simultaneously is both more efficient and more user friendly.

Recall from the previous article that our directory-printing program is factored into three pieces of logic:

  1. fsTraverse, which traverses a file-system hierarchy and returns a tree data structure;
  2. showTree, which converts a tree into lovingly crafted ASCII art; and
  3. traverseAndPrint, which prints the tree for a file-system hierarchy by using the first two pieces of logic.

The types of the functions are as follows:

fsTraverse       :: Path -> DentName -> IO DirTree
showTree         :: Tree String -> String
traverseAndPrint :: Path -> IO ()

Note that showTree is a pure function, but the other two return IO actions that may have side effects.

Within traverseAndPrint, fsTraverse and showTree are combined into a composite IO action by the =<< combinator:

putStr . showTree =<< fsTraverse root path

The sequencing semantics of Haskell’s IO monad forces all of the effects of fsTraverse to complete before any following effects can begin. To better understand these sequencing semantics, let’s consider a simple example.

The IO-monad code,

a >> b

can loosely be interpreted as running the action a, which forces its side effects to occur, and then running the action b, which forces its side effects to occur.

In reality, a and b are not actions. They are functions. Like all Haskell functions, they are pure and have no side effects. It’s just that a and b return values that represent actions, and those actions may have side effects, and the semantics of the IO monad guarantee the ordering of those effects (should the actions end up being connected to the runtime’s top-level IO action and executed). If you think that’s weird, hold that thought. For now, all that’s important is that, if the composite action represented by the expression (a >> b) is executed, the effects of a, regardless of how complex, will be executed before the effects of b.

Thus if a represents building a tree by recursively scanning a file-system hierarchy, the entire tree must be built before b ever gets a chance to do its thing. For our particular application, however, that particular sequencing is suboptimal. We know from our earlier, monolithic implementation that the file-system hierarchy can be scanned and printed simultaneously, which is more efficient. Ideally, then, our refactored implementation should be just as efficient.

In this article, we will look at one way to maintain the clean, logical separation of the a part from the b part while allowing the parts’ effects to be interleaved for efficiency. We will use an extension to the Haskell language to make the directory-scanning action lazy so that it builds the tree as the tree is consumed.

Ready? Let’s dive in.

Read more...

Posted in
Tags , , , , ,
9 comments
no trackbacks
Reddit Delicious

Directory-tree printing in Haskell, part two: refactoring

Posted by Tom Moertel Wed, 07 Mar 2007 21:04:00 GMT

In my previous article on writing a simple directory-tree printer in Haskell, I wrote a small program to recursively scan file-system directories and print their contents as ASCII-art trees. The program made for an approachable example of how to use Haskell for “imperative” tasks, but it has a few problems.

First, the directory-scanning logic and tree-printing logic are intertwined. Neither is reusable. Second, both bits of logic are rigid, specialized for this particular task. Even if you could reuse them, you wouldn’t want to.

In this article, the second in a series, we will explore ways to make our original code more reusable. We will separate the directory scanning from the tree printing, harness the power of some old friends from Haskell’s libraries, and think about the costs and benefits of our changes.

The plan

Recall our original directory tree–listing solution, the core of which I will reprint below:

tlist path =
    visit (if "/" `isPrefixOf` path then "" else ".") "" "" "" path

visit path leader tie arm node = do
    putStrLn (leader ++ arm ++ tie ++ node)
    visitChildren (path ++ "/" ++ node) (leader ++ extension)
  where
    extension = case arm of ""  -> ""; "`" -> "    "; _   -> "|   "

visitChildren path leader =
    whenM (doesDirectoryExist path) $ do
        contents <- getDirectoryContents path
            `catch` (\e -> return [show e])
        let visibles = sort . filter (`notElem` [".", ".."]) $ contents
            arms = replicate (length visibles - 1) "|" ++ ["`"]
        zipWithM_ (visit path leader "-- ") arms visibles

The tlist function kicks off the process for a particular file-system path, handing off to visit which recursively descends the directory tree from the root node. The visit function calls visitChildren to expand the subtree, if any, for each node visited. The visitChildren function, in turn, calls back to visit to repeat the process for each child in the subtree. In effect, we are traversing the tree rooted at path, printing each node in passing.

To separate the traversal part from the printing part, we will introduce a tree data structure. The file system–traversal code will emit a tree, and the tree-showing code will consume a tree. We will rewrite our old tlist function, which we might as well rename to the more descriptive traverseAndPrint, to glue the two pieces together with the tree serving as glue:

traverseAndPrint :: Path -> IO ()
traverseAndPrint path = do
    tree <- fsTraverse root path
    putStrLn (showTree tree)
  where
    root = if "/" `isPrefixOf` path then "" else "."

That’s the plan. Now let’s carry it out.

Read more...

Posted in
Tags , , , ,
8 comments
no trackbacks
Reddit Delicious

A simple directory-tree printer in Haskell

Posted by Tom Moertel Thu, 22 Feb 2007 21:30:00 GMT

At a recent gathering, my friend Casey mentioned that he was learning a new programming language and, as a learning exercise, had written a directory-tree printer. That’s a program that recursively scans directory hierarchies and prints out a tree representation for each:

$ tlist cheating-hangman

cheating-hangman
|-- CVS
|   |-- Entries
|   |-- Repository
|   `-- Root
|-- Makefile
|-- cheating-hangman.lhs
`-- cheating-hangman.pl

I thought the problem sounded like fun, and so I wrote a small solution in Haskell. Let’s take a look.

Read more...

Posted in , ,
Tags , , ,
10 comments
no trackbacks
Reddit Delicious