R tips and tricks: Producing smooth bitmap plots

Posted by Tom Moertel Sun, 26 Aug 2007 01:56:00 GMT

The R statistics system can produce first-class data visualizations, commonly known as plots. Internally, plots are represented in an abstract graphics format that can be rendered on any of R’s wide range of graphics “devices” to produce concrete output – windows, bitmap files, PostScript files, PDF files, and others.

The bitmap formats, such as PNG, are preferred for posting plots online because of their widespread support by web browsers. The default bitmap-rendering devices in R, unfortunately, produce graphics that look a little too “bitmapped” for modern web tastes. Here, for example, is a plot rendered by R’s “png” device:

[Figure: plot rendered via R's PNG device]

There’s nothing technically wrong with the plot, but it looks out of place on a web page. That’s because modern web browsers use font-smoothing and anti-aliasing techniques to render just about everything else on the page. Against this clean, un-jagged backdrop, the oh-so-bitmapped plot looks like a throwback to a previous era.

Happily, we can produce clean, anti-aliased R plots with a little help. Here’s the earlier plot, anti-aliased:

[Figure: plot rendered via R's PDF device, then post-processed]

To produce the anti-aliased plot, I used R to produce a PDF file. Then I rendered the PDF file into a PNG image at 300 dpi using Ghostscript. Finally, I scaled the 300-dpi image down to screen resolution, producing a high-quality, anti-aliased result.

Here’s the recipe in detail.

First, I define an R function called pdfit that takes an abstract graphics object and makes a PDF-file rendering of it, using my preferred graphics-device settings:

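library(lattice)

# Render a lattice plot object to a PDF file using the white-background theme.
# Extra arguments (file, width, height, ...) are passed through to pdf().
pdfit <- function(f, ...) {
  trellis.device(device = pdf, theme = "col.whitebg", ...)
  print(f)   # lattice objects are drawn when printed
  dev.off()
}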

Then, when I create a plot I want to publish, I use pdfit to render it into a PDF file:

P.img <- xyplot( subs.low + subs.high ~ date, ... )

pdfit(P.img, file="image-downloads.pdf")  # render plot into PDF file
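If you want to try pdfit without my subscriber data, here is a minimal sketch that should run as-is. The data frame demo, its columns, and the output file name are made up for illustration. Only the pdfit helper follows the definition above.

```r
library(lattice)

# Same helper as above: open a PDF device with the white-background
# lattice theme, draw the plot object, and close the device.
pdfit <- function(f, ...) {
  trellis.device(device = pdf, theme = "col.whitebg", ...)
  print(f)
  invisible(dev.off())
}

# Hypothetical stand-in data: 30 days of made-up low/high estimates.
set.seed(1)
demo <- data.frame(
  date = as.Date("2007-07-01") + 0:29,
  low  = 100 + cumsum(rnorm(30)),
  high = 120 + cumsum(rnorm(30))
)

# xyplot() returns a "trellis" object; nothing is drawn until it is printed.
p <- xyplot(low + high ~ date, data = demo, type = "l", auto.key = TRUE)

# Write the PDF into the session's temporary directory.
out <- file.path(tempdir(), "demo-plot.pdf")
pdfit(p, file = out, width = 7, height = 5)
```

Because the plot is an ordinary object, the same p can be handed to any other device later without rebuilding it.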

Finally, I use Ghostscript and ImageMagick to convert the PDF file into a high-quality, anti-aliased PNG file. (I keep both formats: the PDF file is best for publishing in printed papers, and the PNG file is best for posting online.) I use a simple Makefile to automate the process of converting the PDF files into PNG files:

# Makefile (GNU make)
# Note: recipe lines must begin with a tab character, not spaces.

pdfs := $(wildcard *.pdf)
pngs := $(pdfs:.pdf=.png)

all: $(pngs)
.PHONY: all

# Render each PDF at 300 dpi with anti-aliasing, then scale to 500 px wide.
%.png: %.pdf
	gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m \
	   -dGraphicsAlphaBits=4 -dTextAlphaBits=4 -r300 \
	   -dBackgroundColor='16#ffffff' \
	   -sOutputFile=$@ $< > /dev/null && \
	mogrify -resize 500 $@

With this Makefile in my graphics directory, just a single “make” command is all it takes to convert my PDF images into anti-aliased PNG files, ready to post online.
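If make is not handy, the same two steps can probably be driven from R itself. The following is only a sketch: gs_args and pdf_to_png are hypothetical helper names, the Ghostscript flags are copied from the Makefile rule above, and it assumes that executables named gs and mogrify are on the PATH (on some systems Ghostscript is installed under a different name).

```r
# Build the Ghostscript arguments used in the Makefile rule above.
gs_args <- function(pdf, png, dpi = 300) {
  c("-dSAFER", "-dBATCH", "-dNOPAUSE", "-sDEVICE=png16m",
    "-dGraphicsAlphaBits=4", "-dTextAlphaBits=4",
    paste0("-r", dpi),
    "-dBackgroundColor=16#ffffff",
    paste0("-sOutputFile=", png),
    pdf)
}

# Hypothetical helper: render one PDF to PNG at high resolution,
# then shrink the PNG to `width` pixels wide.
pdf_to_png <- function(pdf, width = 500) {
  png <- sub("\\.pdf$", ".png", pdf)
  if (Sys.which("gs") == "" || Sys.which("mogrify") == "") {
    stop("this sketch needs both 'gs' and 'mogrify' on the PATH")
  }
  # system2() does not quote its arguments, so quote them here.
  status <- system2("gs", shQuote(gs_args(pdf, png)), stdout = FALSE)
  if (status != 0) stop("gs failed for ", pdf)
  status <- system2("mogrify", c("-resize", width, shQuote(png)))
  if (status != 0) stop("mogrify failed for ", png)
  invisible(png)
}
```

Calling pdf_to_png("image-downloads.pdf") would then write image-downloads.png next to the PDF, provided both tools are installed.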

And that’s it.

Do you have any tips or tricks for making good-looking graphics with R? If so, please do share.

Update: There is one downside to the sexy, anti-aliased plots: they are not as compressible as the old-style jagged plots. For the images above, for instance, the anti-aliased PNG file weighs in at 45 KB, but the original PNG file is a feathery 4.7 KB. So, if bandwidth is precious to you – or you’re planning on getting Slashdotted – you might want to stick with the jaggies.


Fun with statistics: estimating blog readership (a do-it-yourself recipe)

Posted by Tom Moertel Thu, 23 Aug 2007 01:34:00 GMT

As everybody knows, statistics is fun. Is there anything cooler than crushing a heap of seemingly uninteresting numbers into gleaming jewels of meaning? Of course not! Models, data-visualization plots, and fat data sets are way cool. So, let’s find an excuse to play with them.

Here’s an excuse – I mean, an important and highly relevant question that many of us share: How many people actually read our blogs? To answer the question, we will need to use statistics, data, and cool plots. Further, if you’ve got the raw data for your blog, you can follow along with your own analysis. Even more fun!

We’ll start with a simple inspection of common web-log data, using command-line tools. After developing a rough understanding of what useful information we can extract, we’ll analyze the raw data using a series of successively more sophisticated techniques. In the end, we will derive a simple formula for estimating readership from easily obtainable data.

Sound good? Then let’s get rocking.

But first, a preemptive strike on would-be pooh-poohers: I know all about FeedBurner. I know they will track my blog’s subscribers and use their mystical powers to infer the number of “real” subscribers I have. I know it’s all so easy. But easy isn’t the point. I want to understand what’s going on. Just taking somebody’s word for it isn’t nearly as satisfying as figuring it out yourself – nor as fun.

OK. For real this time, let’s get rocking.



Pittsburgh Perl Workshop 2007: Don't miss your chance to speak!

Posted by Tom Moertel Mon, 20 Aug 2007 23:18:00 GMT

This year’s Pittsburgh Perl Workshop is shaping up to be uber-techno-awesome: two big days of lively technical talks and full-force Perl festiveness. Come October, programmers of all stripes will gather in Pittsburgh over the weekend of the 13th to grab a slice of the fun. A big slice. And you – yes you, my friend – should be there.

Lots of interesting talks are flowing in, but it’s not too late to grab a speaking slot. If you have anything interesting to say about Perl, now is your time. 20- and 50-minute slots are available. To claim one, just go to pghpw.org and submit a talk proposal. It’s easy. But act now, before it’s too late!

If you have any interest in Perl, you’ll want to be at PPW 2007, and if you have anything to say about Perl, you’ll definitely want to speak at PPW 2007.

Don’t miss your opportunity. Seize the day!


Seven signs YOU may have created a Gratuitous Domain Specific Language

Posted by Tom Moertel Sat, 18 Aug 2007 17:01:00 GMT

Like chromatic, I have watched the recent irrational exuberance for domain-specific languages (DSLs) with bewilderment. In certain quarters of the programming universe, it seems that creating DSLs is nearly a rite of passage. The problem is, more and more of these DSLs appear to have been created mainly because, well, DSLs are cool these days, even if less “novel” solutions probably would have been more sensible.

Whereas chromatic unhesitatingly confronted the madness head-on, I have so far managed to avoid the fray. Sure, I’ve asked the occasional probing question of the DSL enthusiast, but mostly my reaction has been limited to standing back and staring in mute amazement at the runaway Domain-Specific Fun-Time Language Train, screaming down the tracks, destined for its inevitable high-speed derailment into what I can only expect will be a bridge abutment. But I’m starting to get the feeling that some of the train’s passengers are aboard because they think it’s the Right Thing To Do Train, so maybe it’s time to say something.

To set the record straight, I don’t have anything against DSLs, embedded or otherwise. (I have created my fair share, some of which are actually useful.) No, my concern is limited strictly to the rise of the Gratuitous DSL. So let’s talk about it.

The reason – the right reason – for creating a DSL is because it ultimately lowers the cost of solving problems. If, then, you create a DSL and the cost of solving your problems does not go down, why did you create it? Think about it. Creating a DSL is an expensive proposition. Making people learn your DSL’s syntax, semantics, and underlying domain is a lot to ask – it’s costly. If you do ask, if you do make the imposition, you had better be sure your DSL pays its bills.

But what if your DSL turns out to be a deadbeat? What if using your DSL doesn’t lower the cost of solving problems? Well, guess what? You have created a Gratuitous Domain Specific Language.

Still unsure of whether you’re on the DSL Train for the wrong reason? No problem. Just take this simple, seven-step test:

Seven signs you may have created a Gratuitous Domain Specific Language (GDSL)

  1. You can’t actually explain what a DSL is.
  2. For your DSL, you can’t explain what the domain is.
  3. You have a hard time explaining the DSL’s syntax and semantics.
  4. You have a hard time explaining how the DSL interacts with the language it is embedded in. (For embedded DSLs only.)
  5. A vanilla library API would have captured the domain’s semantics without awkwardness.
  6. It’s easier to express complex domain concepts in general-purpose code than in your DSL.
  7. Your colleagues have a hard time writing things in your DSL.

Did more than a few of the statements ring true? If so, take a bow. You are the proud creator of a Gratuitous DSL! [1]

Even so, it’s not too late. You can always hop off the DSL Train at the next stop.


1. Rationale for the Seven Signs. Signs 1–4 suggest that your DSL may not even be a DSL. Signs 5–7 suggest that, though your DSL may be real, it may not be paying the bills.

Update: minor edit for clarity.

Update 2008-03-22: edits for clarity.


A bright future: security and modern type systems

Posted by Tom Moertel Wed, 15 Aug 2007 20:07:00 GMT

The recent defacement of the United Nations web site is a prime example of why we programmers shouldn’t trust ourselves to write secure code – at least not without our computers’ help. The U.N. web site, according to Slashdot’s coverage of the incident, was defaced by way of a common, well-known attack: SQL injection. What’s interesting is that programmers can render this attack harmless by employing simple, readily available programming tools such as placeholders and prepared statements. Why, then, are so many web sites, apparently including the U.N. site, still vulnerable?

Some say it’s because the programmers of these sites are incompetent, but that argument ignores that programmers are human, while the security tools we give them offer meaningful protection only if wielded with inhuman perfection. Having the tools to plug security holes, even if the tools are simple to use and readily available, is not enough to ensure that every single security hole will be identified, let alone plugged. Even the most experienced programmer can be expected to overlook a hole now and then. Unfortunately, one hole is all it takes.

That’s because security is not like other software-quality challenges: its costs are fundamentally asymmetric. For the attacker, the bad guy, the challenge is to find just a single exploitable hole. For us, the good guys, the challenge is to achieve perfection: to plug all of the holes in our code, every single one. Nothing less will do, since attackers, unlike regular users, can be expected to probe our code until they find a hole to exploit.

How then do we ensure that we have plugged every single hole in our code? Testing isn’t sufficient: we can easily overlook holes when writing tests – a perfectly human error. We could supplement testing with code reviews, painstakingly searching for remaining holes while enforcing the use of hole-preventing best practices, but reviews are expensive and, again, subject to human error. A better approach, both less costly and more reliable, is to delegate this burden to our computers, which can do the job correctly, every single time.

This kind of delegation is possible today with modern static type systems. For example, in A Type-Based Solution to the Strings Problem, I offered a tiny “safe strings” library for the Haskell programming language. The library takes advantage of Haskell’s powerful type system to detect unsafe string interactions at compile time. If we faithfully build our code on top of the library, and our code compiles without error, we can be assured that our code is free – completely free – of SQL-injection (and XSS) holes.

While this result is indeed quite beautiful, it certainly isn’t novel. Researchers have been proving interesting properties via type systems for a long time. As Oleg Kiselyov and Chung-chieh Shan pointed out in a comment on my earlier article, the foundational idea is over three decades old.

More recently, Kiselyov and Shan have extended the idea to guarantee more-interesting properties using a trusted kernel and types that represent lightweight static capabilities. The kernel, which is small enough to be reasoned about and formally verified, carefully hands out capabilities to untrusted application code. The untrusted code, in turn, presents the capabilities back to the kernel to invoke operations, which, thanks to the kernel’s trustworthiness, are guaranteed to be safe. (My safe-string library can be seen as a trivial implementation of this programming style.)

When static type systems are used in this way, they don’t merely catch typos and bugs that good testing would have caught as a matter of course, but offer programmers guarantees that would have been impractical to obtain any other way. [1] If you consider security important, you might bear this fact in mind when choosing languages for your next project.

Going further, the security benefits of rich static type systems are only now starting to trickle into mainstream industry. As libraries like “safe strings” and idioms like static capabilities become more familiar and get woven into future generations of development frameworks, we can expect marked improvements in the security and robustness of our applications.

In the not-too-distant future, perhaps, we might look back in amazement at the days when important security properties were neither free nor guaranteed but expensive and uncertain, underwritten only by the heroic efforts of individual programmers, struggling against impossible odds to achieve inhuman perfection.

Then again, it sure took garbage collection a long time to catch on.


1. How, for example, could you eliminate the possibility of SQL-injection and XSS holes via testing?

I suppose you could do it if you worked at it hard enough. You could augment your string data structures with run-time information about what they represent: this string represents SQL, this string represents plain-old text, and so on. Then you could redefine your string operations and template interpolation systems to assert that their string inputs were compatible. Of course, if these assertions ever failed, they would do so only at run time, when it would be too late to do anything but die rather ungracefully. So you would be forced to augment your code-coverage tools to ensure that every string-path was covered during testing. That way you could catch all potential run-time string failures – indicating holes – during testing and eliminate the holes (and the subsequent need to fail at run time) before you deployed your application for real.

So, yes, you could do it. But to do so would require you, in effect, to write a crude, single-purpose type system that checks types at test time. That says something, doesn’t it?
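To make the footnote’s thought experiment concrete, here is a rough sketch of run-time string tagging in R, a dynamically typed language where such checks can happen only when the code runs. Every name in it (tagged, kind_of, safe_concat, text_to_sql) is hypothetical, and the quote-doubling escape is a deliberate simplification; real code should use placeholders and prepared statements, as noted above.

```r
# Tag a character string with what it represents, e.g. "sql" or "text".
tagged <- function(x, kind) structure(x, class = c(kind, "tagged"))

# The kind is stored as the first class name.
kind_of <- function(x) class(x)[1]

# Concatenation that asserts, at run time, that all inputs carry the same tag.
safe_concat <- function(...) {
  parts <- list(...)
  kinds <- vapply(parts, kind_of, character(1))
  if (length(unique(kinds)) != 1) {
    stop("incompatible string kinds: ", paste(kinds, collapse = ", "))
  }
  tagged(paste0(vapply(parts, unclass, character(1)), collapse = ""), kinds[1])
}

# The only sanctioned way to turn plain text into SQL: quote it as a literal.
text_to_sql <- function(x) {
  stopifnot(inherits(x, "text"))
  tagged(paste0("'", gsub("'", "''", unclass(x), fixed = TRUE), "'"), "sql")
}

query <- tagged("SELECT * FROM users WHERE name = ", "sql")
input <- tagged("Robert'); DROP TABLE students;--", "text")

# Converting the text first satisfies the check.
ok <- safe_concat(query, text_to_sql(input))

# Mixing SQL with raw text is caught, but only when this line actually runs.
bad <- try(safe_concat(query, input), silent = TRUE)
```

The failing call is detected only if some test happens to execute it, which is exactly the coverage burden the footnote describes. A static type system rejects the same mistake before the program ever runs.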


My photo of British soldier lichen is now published!

Posted by Tom Moertel Tue, 14 Aug 2007 05:05:00 GMT

I wrote in an earlier post that one of my photos of British soldier lichen was going to be published in an upcoming issue of a popular Pennsylvania magazine. Now that the issue is out, I can reveal that the periodical is none other than Milford Magazine. My photo appears in the August 2007 issue, at the top of the second page of the lichen feature, “The Secret Lives of Lichen” (PDF).

Do check it out. This may be your only opportunity to read an article that celebrates lichen in interview form.

I kid you not.
