PPW 2007: a twenty-ton can of programming whoop-ass

Posted by Tom Moertel Tue, 25 Sep 2007 23:04:00 GMT

I am on the planning committee for the Pittsburgh Perl Workshop. So far, it’s been an interesting ride. Last year was the first PPW, and it went surprisingly well. In the post-conference surveys, 94 percent of respondents said they wanted to come back for another PPW, so we committed ourselves to repeating the grueling conference-planning process for 2007. (Fact: making big commitments like this is much more likely to happen if you’re drinking beer at the time.)

Now a year has gone by, and PPW 2007 is only three weeks away. This year’s conference is 100% larger – two full days – and offers a new, much-asked-for option: a one-day introductory course to give programmers new to Perl a quick dose of the language so they can dive into the rest of the conference. This year’s conference also offers a full-length Hackathon for those who feel the urge to code at the conference.

The main attraction, however, is the conference’s wide array of technical talks. We have retained the same mix of industry and academic speakers that attendees said they liked so much last year. Indeed, our speaker list includes some of last year’s most fascinating speakers, as well as many new speakers drawn from the world of Perl. No matter what your interests are, you’ll find talks for you at PPW 2007. (I’m particularly interested in the talks on continuation-based web applications, the cool new stuff in Perl 5.10, and the Moose object system.)

All of this is to say: Do not miss PPW 2007! Where else are you going to find so many interesting people, so many fascinating talks, and so many opportunities to have fun and make friends while learning useful stuff, all for so little expense? (Regular admission is only $70, and students get a big discount.) Get your ticket now because over half of the seats are already gone.

I hope to see you there.


Bloglines doesn't handle inter-element white space properly

Posted by Tom Moertel Tue, 11 Sep 2007 16:51:00 GMT

If you’re reading my blog via Bloglines, you may have noticed that some of my posts look terrible, especially when they contain code snippets. I am sorry for that, but it’s not my fault. Bloglines doesn’t handle white space properly.

Here’s the more detailed explanation. When you request one of my feeds in, say, Atom format, you get back a bunch of XML that contains the most-recent posts from my blog. Each post is represented as lovingly crafted HTML, escaped per the Atom specs. When Bloglines gets its hands on this very same HTML, it attempts to scrub it nice and clean – get rid of any naughty bits, you know. And there’s nothing wrong with that. Except when the scrubbing goes horribly, horribly wrong. Which is exactly what happens when Bloglines encounters perfectly legitimate markup that represents syntax-highlighted code snippets.

What does Bloglines do then? It strips out all of the significant white space, turning each block of code into a single, mile-long, unbreakable line of NoSpaceText that forces your web browser to expand the page until it is wide enough to enshroud a small solar system. Then you are forced to scroll forever to read each line of the text column. Ugg.

More specifically, each syntax-highlighted code block is represented in HTML as a preformatted (PRE) text block. Each word in that block is wrapped in a SPAN element whose class attribute indicates the word’s role in the original source code. Keywords get one class, identifiers another, and so on. For example, the code “import List” might be represented as follows:

<span class="kwd">import</span> <span class="name">List</span>

But when Bloglines gets its hands on that markup, it strips out the white space between the SPAN elements:

<span class="kwd">import</span><span class="name">List</span>

Thus the markup renders as “importList” when it hits your web browser. Now imagine the same space-denuding bad behavior applied to all of the inter-element white space in a full-length block of code. That’s right, what you end up with is a single, insanely long LineOfUnbreakableText that your web browser chokes on. Again: Ugg.
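The effect is easy to reproduce with a toy model. The following is only a sketch: the Node type, the snippet value, and the dropBlankText scrubber are hypothetical names invented for illustration, and the scrubber models the symptom described above, not Bloglines's actual code.

```haskell
import Data.Char (isSpace)

-- A tiny, hypothetical model of markup: either a tag or a run of text.
data Node = Tag String | Text String
  deriving (Eq, Show)

-- The example from above: two SPAN elements separated by one space.
snippet :: [Node]
snippet =
  [ Tag "<span class=\"kwd\">", Text "import", Tag "</span>"
  , Text " "
  , Tag "<span class=\"name\">", Text "List", Tag "</span>"
  ]

-- A scrubber that discards whitespace-only text nodes. This is assumed
-- behavior, modeled on the symptom; it is not Bloglines's real scrubber.
dropBlankText :: [Node] -> [Node]
dropBlankText = filter (not . isBlank)
  where
    isBlank (Text s) = all isSpace s
    isBlank _        = False

-- What the reader ends up seeing: the text content with the tags removed.
visibleText :: [Node] -> String
visibleText nodes = concat [s | Text s <- nodes]
```

In this model, visibleText snippet yields "import List", while visibleText (dropBlankText snippet) yields "importList". Inside a PRE block the white space between elements is part of the content, so a scrubber would need to leave those text nodes alone.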

The folks at Bloglines have had similar problems in the past, most of which have been fixed. I hope they fix this particular problem soon, too.

Until that time, however, you might want to consider other feed readers.


Updated cabal2rpm helps you make RPM packages from Haskell Cabal packages

Posted by Tom Moertel Sat, 08 Sep 2007 00:19:00 GMT

I just released an updated version of cabal2rpm, a small program (written in Perl) that creates RPM spec files from Cabal package descriptions. RPM is the software-packaging format used by several popular Linux distributions, including Red Hat and Fedora. Cabal is the packaging format used by the Haskell community to distribute software written in Haskell.

Bryan O’Sullivan’s cabal-rpm also creates spec files from Cabal packages. Unlike cabal2rpm, it is written in Haskell and directly interfaces with the Cabal libraries. Long term, it is the way to go. For now, however, cabal2rpm may be more convenient because it works out of the box. (To use cabal-rpm, you’ll first need to install the just-tagged Cabal 1.2.0 library, not yet in wide distribution.)


ClusterBy: a handy little function for the toolbox

Posted by Tom Moertel Sat, 01 Sep 2007 19:39:00 GMT

Via Reddit I found Mark Nelson’s post about a recent word puzzle from NPR’s Weekend Edition:

Take the names of two U.S. States, mix them all together, then rearrange the letters to form the names of two other U.S. States. What states are these?

The puzzle is fairly straightforward to solve by hand (think about it), but let’s write a program to solve it. That will give us a convenient excuse to discuss a super-handy function I use all the time: clusterBy. In Haskell, it looks like this:

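import Control.Arrow ((&&&))
import Data.Char (isAlpha)   -- used by signature, below
import Data.List (sort)      -- used as a signature function, below
import qualified Data.Map as M

-- Group values by signature; groups come back in ascending signature order.
clusterBy :: Ord b => (a -> b) -> [a] -> [[a]]
clusterBy f = M.elems . M.map reverse . M.fromListWith (++)
            . map (f &&& return)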

clusterBy groups a list of values by their signatures, as computed by a given signature function f, and returns the groups in order of ascending signature. For example, we can cluster the words “the tan ant gets some fat” by length, by first letter, or by last letter just by changing the signature function we give to clusterBy:

*Main> let antwords = words "the tan ant gets some fat"

*Main> clusterBy length antwords
[["the","tan","ant","fat"],["gets","some"]]

*Main> clusterBy head antwords
[["ant"],["fat"],["gets"],["some"],["the","tan"]]

*Main> clusterBy last antwords
[["the","some"],["tan"],["gets"],["ant","fat"]]

If we use sort as the signature function, we can find anagrams:

*Main> clusterBy sort antwords
[["fat"],["tan","ant"],["gets"],["the"],["some"]]
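The point-free definition of clusterBy packs several steps into one pipeline. As a reading aid, here is the same idea spelled out with named intermediate values. The name clusterBySteps and the local names are made up for this sketch; the behavior is intended to match clusterBy.

```haskell
import qualified Data.Map as M

-- The same grouping as clusterBy, written step by step.
clusterBySteps :: Ord b => (a -> b) -> [a] -> [[a]]
clusterBySteps f xs = M.elems inOrder
  where
    -- 1. Pair each value with its signature, wrapping the value in a
    --    singleton list so that groups can be merged with (++).
    tagged = [(f x, [x]) | x <- xs]
    -- 2. Merge entries that share a signature. fromListWith applies the
    --    combining function as (new ++ old), so each group accumulates
    --    in reverse order of appearance.
    merged = M.fromListWith (++) tagged
    -- 3. Restore the original order within each group. M.elems then
    --    returns the groups in ascending order of their keys.
    inOrder = M.map reverse merged
```

Step 2 is why the reverse is there: merging [("e",["the"]), ("e",["some"])] leaves the key "e" holding ["some","the"], because the newer singleton is put in front of the older group. Prepending a singleton is cheap, so the reversal at the end is a small price for keeping each merge constant-time.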

And that brings us back to the original puzzle. To find the solution, we must treat each unique pair of state names as a single “word” and then find the anagrams among the list of such “words.”

Assuming we are given a list of state names on standard input, one state per line, we can write the shell of our solution as follows:

main = mapM_ print . solve . lines =<< getContents

The shell delegates the real work to solve. Its job is to compute the unique two-state combinations from the original list of states, and then find the anagrams among these combinations. As before, finding the anagrams is simply a matter of calling clusterBy with the right signature function. We also filter out the trivial results (clusters containing only one combination), which are not valid solutions:

solve = filter ((>1) . length) . clusterBy signature . ucombos
ucombos xs = [[x,y] | x <- xs, y <- xs, x < y]
signature = sort . filter isAlpha . concat   -- sort letters

That’s it. Now we can solve the puzzle by feeding our program a list of states:

$ runhaskell anagrams2.hs < states.txt
[["NORTH CAROLINA","SOUTH DAKOTA"],
 ["NORTH DAKOTA","SOUTH CAROLINA"]]
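To poke at the pieces without a states.txt file, the definitions can be loaded into GHCi along with a small built-in list. This is a sketch under assumptions: sampleStates is a hypothetical, hand-picked subset rather than the full list of states, and the type signatures are additions that the original code leaves to inference.

```haskell
import Control.Arrow ((&&&))
import Data.Char (isAlpha)
import Data.List (sort)
import qualified Data.Map as M

clusterBy :: Ord b => (a -> b) -> [a] -> [[a]]
clusterBy f = M.elems . M.map reverse . M.fromListWith (++)
            . map (f &&& return)

-- Every unordered pair of distinct names, each listed once (x before y).
ucombos :: Ord a => [a] -> [[a]]
ucombos xs = [[x, y] | x <- xs, y <- xs, x < y]

-- The sorted letters of a combination; spaces are dropped.
signature :: [String] -> String
signature = sort . filter isAlpha . concat

-- Keep only the clusters that hold more than one combination.
solve :: [String] -> [[[String]]]
solve = filter ((> 1) . length) . clusterBy signature . ucombos

-- Hypothetical sample input: a hand-picked subset, not the full list.
sampleStates :: [String]
sampleStates =
  [ "NORTH CAROLINA", "NORTH DAKOTA", "OHIO"
  , "SOUTH CAROLINA", "SOUTH DAKOTA" ]
```

Here ucombos ["A","B","C"] gives [["A","B"],["A","C"],["B","C"]], and signature ["OHIO","IOWA"] gives "AHIIOOOW". Running solve sampleStates returns a single cluster holding the same two combinations shown above; the pairs involving OHIO each land in a cluster of their own and are filtered out.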

What a handy little function, that clusterBy.

Update: made clear that clusterBy returns clusters in order of ascending signature.

Update 2007-10-31: For more interesting discussion of clusterBy and the original puzzle from NPR, see Anders Pearson’s blog: A Simple Programming Puzzle Seen Through Three Different Lenses.
