Posted by Tom Moertel
Sun, 26 Aug 2007 01:56:00 GMT
The R statistics system can produce
first-class data visualizations, commonly known as plots. Internally,
plots are represented in an abstract graphics format that can be
rendered on any of R’s wide range of graphics “devices” to produce
concrete output – windows, bitmap files, PostScript files, PDF files,
and others.
The bitmap formats, such as PNG, are preferred for posting
plots online because of their widespread support by web browsers. The
default bitmap-rendering devices in R, unfortunately, produce graphics
that look a little too “bitmapped” for modern web tastes. Here, for example,
is a plot rendered by R’s “png” device:
There’s nothing technically wrong with the plot, but it looks out
of place on a web page. That’s because modern web
browsers use font-smoothing and anti-aliasing techniques to render
just about everything else on the page. Against this clean, un-jagged
backdrop, the oh-so-bitmapped plot looks like a throwback to
a previous era.
Happily, we can produce clean, anti-aliased R plots with a little
help. Here’s the earlier plot, anti-aliased:
To produce the anti-aliased plot, I used R to produce a PDF file. Then I
rendered the PDF file into a PNG image at 300 dpi using Ghostscript.
Finally, I scaled the 300-dpi image down to screen resolution,
producing a high-quality, anti-aliased result.
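Why does scaling a high-resolution render down give smooth edges? Here is a minimal sketch in Python, separate from the recipe itself. It assumes a plain block average over a tiny grayscale image whose sides divide evenly by the scale factor; the `downsample` function and the sample image are hypothetical, and ImageMagick's real resize filters are more sophisticated than this.

```python
def downsample(pixels, factor):
    """Shrink a grayscale image by averaging each factor-by-factor block.

    Assumes the image's height and width are exact multiples of `factor`.
    """
    height = len(pixels)
    width = len(pixels[0])
    out = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block = [pixels[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# A hard-edged diagonal at "high resolution": 1 = ink, 0 = paper.
hi_res = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]

lo_res = downsample(hi_res, 2)
print(lo_res)  # [[0.75, 0.0], [1.0, 0.75]]
```

The pixels that straddle the edge come out as intermediate grays (0.75) rather than pure ink or pure paper, and those in-between shades are what make the edge look smooth at screen resolution.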
Here’s the recipe in detail.
First, I define an R function called pdfit that takes an
abstract graphics object and makes a PDF-file rendering of it, using
my preferred graphics-device settings:
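require("lattice")
pdfit <- function(f, ...) {
    # open a PDF device with lattice's white-background theme;
    # extra arguments (e.g., file=) are passed along to the device
    trellis.device(dev=pdf, theme="col.whitebg", ...)
    print(f)   # lattice plots are drawn when printed
    dev.off()  # close the device, writing the PDF file to disk
}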
Then, when I create a plot I want to publish, I use pdfit to render
it into a PDF file:
P.img <- xyplot( subs.low + subs.high ~ date, ... )
pdfit(P.img, file="image-downloads.pdf") # render plot into PDF file
Finally, I use Ghostscript and
ImageMagick to convert the PDF file into
a high-quality, anti-aliased PNG file. (I keep both formats: the PDF
file is best for publishing in printed papers, and the PNG file is
best for posting online.) I use a simple Makefile to automate the
process of converting the PDF files into PNG files:
# Makefile (GNU make)
pdfs := $(wildcard *.pdf)
pngs := $(pdfs:.pdf=.png)
all: $(pngs)
.PHONY: all
# render each PDF at 300 dpi with anti-aliasing, then scale to 500 pixels wide
%.png: %.pdf
	gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m \
	   -dGraphicsAlphaBits=4 -dTextAlphaBits=4 -r300 \
	   -dBackgroundColor='16#ffffff' \
	   -sOutputFile=$@ $< > /dev/null && \
	mogrify -resize 500 $@
With this Makefile in my graphics directory, a single “make”
command is all it takes to convert my PDF images into
anti-aliased PNG files, ready to post online.
And that’s it.
Do you have any tips or tricks for making good-looking graphics with
R? If so, please do share.
Update: There is one downside to the sexy, anti-aliased plots: they
are not as compressible as the old-style jagged plots. For the
images above, for instance, the anti-aliased PNG file weighs in
at 45 KB, but the original PNG file is a feathery 4.7 KB.
So, if bandwidth is precious to you – or you’re planning on getting
Slashdotted – you might want to stick
with the jaggies.
Posted in statistics
Tags graphics, plots, R, statistics, tips, tricks

Posted by Tom Moertel
Thu, 23 Aug 2007 01:34:00 GMT
As everybody knows, statistics is fun. Is there
anything cooler than crushing a heap of seemingly uninteresting
numbers into gleaming jewels of meaning? Of course not! Models,
data-visualization plots, and fat data sets are way cool.
So, let’s find an excuse to play with them.
Here’s an excuse –
I mean, an important and highly relevant question that many of us share:
How many people actually read our blogs? To answer the
question, we will need to use statistics, data, and cool plots.
Further, if you’ve got the raw data for your blog, you can follow
along with your own analysis. Even more fun!
We’ll start with a simple inspection of common web-log data, using
command-line tools. After developing a rough understanding of what
useful information we can extract, we’ll analyze the raw data using a
series of successively more sophisticated techniques. In the end, we
will derive a simple formula for estimating readership from easily
obtainable data.
Sound good? Then let’s get rocking.
But first, a preemptive strike on would-be pooh-poohers: I know all about
FeedBurner. I know they will track my blog’s subscribers and use
their mystical powers to infer the number of “real” subscribers I
have. I know it’s all so easy. But easy isn’t the point. I want to
understand what’s going on. Just taking somebody’s word for it isn’t
nearly as satisfying as figuring it out yourself – nor as fun.
OK. For real this time, let’s get rocking.
Posted in statistics
Tags blog, fun, modeling, R, statistics

Posted by Tom Moertel
Mon, 20 Aug 2007 23:18:00 GMT
This year’s Pittsburgh Perl Workshop is shaping up
to be uber-techno-awesome. It’s two big days of
lively technical talks and full-force Perl festiveness. Yes, come
October, programmers of all stripes will gather in Pittsburgh over the
weekend of the 13th to grab a slice of the fun. A big slice. And you – yes
you, my friend – should be there.
Lots of interesting talks are flowing in, but it’s not too late
to grab a speaking slot. If you have
anything interesting to say about Perl, now is your time. 20-
and 50-minute slots are available. To claim one, just go to
pghpw.org and submit a talk
proposal. It’s easy. But
act now, before it’s too late!
If you have any interest in Perl, you’ll want to be at PPW
2007, and if you have anything to say about
Perl, you’ll definitely want to speak at PPW
2007.
Don’t miss your opportunity. Seize the day!
Posted in perl
Tags perl, pghpw, ppw2007, speaking

Posted by Tom Moertel
Sat, 18 Aug 2007 17:01:00 GMT
Like chromatic, I have
watched the recent irrational exuberance for domain-specific languages
(DSLs) with bewilderment. In certain quarters of the programming
universe, it seems that creating DSLs is nearly a rite of passage.
The problem is, more and more of these DSLs appear to have been
created mainly because, well, DSLs are cool these days, even if less
“novel” solutions probably would have been more sensible.
Whereas chromatic unhesitatingly confronted the madness
head-on,
I have so far managed to avoid the fray. Sure, I’ve asked the
occasional probing question of the DSL
enthusiast,
but mostly my reaction has been limited to standing back and staring
in mute amazement at the runaway Domain-Specific Fun-Time Language
Train, screaming down the tracks, destined for its inevitable high-speed
derailment into what I can only expect will be a bridge abutment.
But I’m starting to get the feeling that some of the train’s passengers are
aboard because they think it’s the Right Thing To Do Train,
so maybe it’s time to throw in my two cents.
To set the record straight, I don’t have anything against DSLs,
embedded or otherwise. (I have created my fair
share,
some of which are actually
useful.) No, my concern is
limited strictly to the rise of the Gratuitous DSL. So let’s talk
about it.
The reason – the right reason – for creating a DSL is that it ultimately lowers the cost
of solving problems. If, then, you create a DSL and the cost of
solving your problems does not go down, why did you create
it? Think about it. Creating a DSL is an expensive proposition. Making
people learn your DSL’s syntax,
semantics, and underlying domain is a lot to ask – it’s costly. If you do ask, if you do make
the imposition, you had better be sure your DSL pays its bills.
But what if your DSL turns out to be a deadbeat? What if using your DSL doesn’t lower the cost of solving problems? Well, guess what? You have
created a Gratuitous Domain Specific Language.
Still unsure of whether you’re on the DSL Train for the wrong reason? No problem. Just take
this simple, seven-step test:
Seven signs you may have created a Gratuitous Domain Specific Language (GDSL)
- You can’t actually explain what a DSL is.
- For your DSL, you can’t explain what the domain is.
- You have a hard time explaining the DSL’s syntax and semantics.
- You have a hard time explaining how the DSL interacts with the language it is embedded in. (For embedded DSLs only.)
- A vanilla library API would have captured the domain’s semantics without awkwardness.
- It’s easier to express complex domain concepts in general-purpose code than in your DSL.
- Your colleagues have a hard time writing things in your DSL.
Did more than a few of the statements ring true? If so, take a bow.
You are the proud creator of a Gratuitous
DSL!
Even so, it’s not too late. You can always hop off the DSL Train at the next stop.
(Note for the humor-impaired: This post is meant to be interpreted in
tongue-in-cheek fashion.)
Update: minor edit for clarity.
Update 2008-03-22: edits for clarity.
Posted in programming, humor
Tags cargocult, coding, culture, dsl, edsl, humor

Posted by Tom Moertel
Wed, 15 Aug 2007 20:07:00 GMT
The recent defacement of the United Nations web
site is a prime
example of why we programmers shouldn’t trust ourselves to write
secure code – at least not without our computers’ help. The U.N. web
site, according to Slashdot’s coverage of the incident, was defaced by
way of a common, well-known attack: SQL
injection. What’s
interesting is that programmers can render this attack harmless by
employing simple, readily available programming tools such as
placeholders and prepared statements. Why, then, are so many web sites,
including, apparently, the U.N. site, still vulnerable?
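For readers who have not seen the attack or the remedy, here is a minimal sketch using Python's standard `sqlite3` module. The table, the data, and the hostile input are made up for illustration; nothing here is taken from the U.N. site's actual code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

# Hostile input: closes the string literal and adds an always-true condition.
hostile = "nobody' OR '1'='1"

# Unsafe: the input is pasted into the SQL text and becomes part of the query.
unsafe_sql = "SELECT secret FROM users WHERE name = '" + hostile + "'"
leaked = conn.execute(unsafe_sql).fetchall()

# Safe: the placeholder sends the input as data, never as SQL.
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (hostile,)
).fetchall()

print(len(leaked))  # 2: every row matched
print(safe)         # []: nobody has that literal name
```

The fix is a one-line change, which is exactly the point: the tool is simple, but it protects only the queries where somebody remembered to use it.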
Some say it’s because the programmers of these sites are incompetent,
but that argument ignores that programmers are
human, while the security tools we give them offer meaningful
protection only if wielded with inhuman perfection. Having the tools
to plug security holes, even if the tools are simple to use and
readily available, is not enough to ensure that every single security
hole will be identified, let alone plugged. Even the most experienced
programmer can be expected to overlook a hole now and then.
Unfortunately, one hole is all it takes.
That’s because security is not like other software-quality challenges:
its costs are fundamentally asymmetric. For the attacker, the bad
guy, the challenge is to find just a single exploitable hole. For
us, the good guys, the challenge is to achieve perfection: to plug
all of the holes in our code, every single one. That’s because
attackers, unlike regular users, can be expected to probe our code
until they find a hole to exploit.
How then do we ensure that we have plugged every single hole in our
code? Testing isn’t sufficient: we can easily overlook holes
when writing tests – a perfectly human error. We could supplement
testing with code reviews, painstakingly searching for remaining holes
while enforcing the use of hole-preventing best practices, but reviews
are expensive and, again, subject to human error. A better approach,
both less costly and more reliable, is to delegate this burden to our
computers, which can do the job correctly, every single time.
This kind of delegation is possible today with modern static type systems.
For example, in A Type-Based Solution to the Strings
Problem,
I offered a tiny “safe strings” library for the Haskell programming
language. The library takes advantage of
Haskell’s powerful type system to detect unsafe string interactions at
compile time. If we faithfully build our code on top of the library, and our
code compiles without error, we can be assured that our code is
free – completely free – of SQL-injection (and XSS) holes.
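To give a feel for the idea, here is a rough sketch in Python. It is not the Haskell library's API: the names `Sql`, `literal`, and `quote` are hypothetical, the escaping is deliberately simplistic, and plain Python catches the mistake only at run time, whereas the Haskell library rejects it at compile time. (A static checker such as mypy could flag the same mistake from the type annotations.)

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sql:
    """A fragment of SQL text that is known to be safe to execute."""
    text: str

    def __add__(self, other: "Sql") -> "Sql":
        # Only Sql may be joined to Sql; raw strings are refused.
        if not isinstance(other, Sql):
            raise TypeError("cannot splice a raw string into SQL; use quote()")
        return Sql(self.text + other.text)


def literal(text: str) -> Sql:
    """Wrap trusted SQL written by the programmer."""
    return Sql(text)


def quote(untrusted: str) -> Sql:
    """Turn untrusted text into a SQL string literal by doubling single quotes.

    Simplified for illustration; real databases have more escaping rules.
    """
    return Sql("'" + untrusted.replace("'", "''") + "'")


hostile = "nobody' OR '1'='1"
query = literal("SELECT secret FROM users WHERE name = ") + quote(hostile)
print(query.text)
# SELECT secret FROM users WHERE name = 'nobody'' OR ''1''=''1'
```

Writing `literal("...") + hostile` instead raises a `TypeError`, so the only way to get untrusted text into a query is through `quote`. The programmer no longer has to remember to escape; forgetting simply does not work.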
While this result is indeed quite beautiful, it certainly isn’t novel.
Researchers have been proving interesting
properties via type systems for a long time. As Oleg Kiselyov and Chung-chieh Shan pointed out in a comment on my earlier article, the foundational idea is over three decades old.
More recently, Kiselyov and Shan have extended the
idea to guarantee more-interesting properties using a trusted kernel and types that represent
lightweight static
capabilities.
The kernel, which is small enough to be reasoned about and formally
verified, carefully hands out capabilities to untrusted application
code. The untrusted code, in turn, presents the capabilities back to
the kernel to invoke operations, which, thanks to the kernel’s
trustworthiness, are guaranteed to be safe. (My safe-string library
can be seen as a trivial implementation of this programming style.)
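The shape of that style can be sketched in a few lines of Python. This is only an analogy under loud assumptions: the bounds-checking example and the names `InBounds`, `check_index`, and `fetch` are my own illustrations, and Python enforces the rules at run time by convention, whereas the static-capabilities approach has the type checker enforce them before the program ever runs.

```python
_KEY = object()  # private to the kernel; application code is assumed never to use it


class InBounds:
    """Capability: evidence that `index` is a valid position in `items`."""

    def __init__(self, key, items, index):
        if key is not _KEY:
            raise PermissionError("only the kernel may mint capabilities")
        self.items = items
        self.index = index


def check_index(items, index):
    """Kernel: validate once; return a capability, or None if out of bounds."""
    items = tuple(items)  # immutable snapshot, so the evidence cannot go stale
    if 0 <= index < len(items):
        return InBounds(_KEY, items, index)
    return None


def fetch(cap):
    """Kernel: perform the access, trusting the capability instead of re-checking."""
    if not isinstance(cap, InBounds):
        raise TypeError("fetch requires an InBounds capability")
    return cap.items[cap.index]


# Untrusted application code: it reaches elements only through capabilities.
cap = check_index(["a", "b", "c"], 1)
value = fetch(cap) if cap is not None else None
rejected = check_index(["a", "b", "c"], 7)

print(value)     # b
print(rejected)  # None
```

All of the reasoning about safety is confined to the three small kernel definitions; the application code cannot perform an unchecked access because it has no way to manufacture the evidence that `fetch` demands.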
When static type systems are used in this way, they don’t merely catch
typos and bugs that good testing would have caught as a matter of
course, but offer programmers guarantees that would have been
impractical to obtain any other way. If
you consider security important, you might bear this fact in mind when
choosing languages for your next project.
Going further, the security benefits of rich static type systems are only now
starting to trickle into mainstream industry. As libraries like “safe
strings” and idioms like static capabilities become more familiar and
get woven into future generations of development frameworks, we can
expect marked improvements in the security and robustness of our
applications.
In the not-too-distant future, perhaps, we might look
back in amazement at the days when important security properties were
neither free nor guaranteed but expensive and uncertain, underwritten
only by the heroic efforts of individual programmers, struggling
against impossible odds to achieve inhuman perfection.
Then again, it sure took garbage collection a long time to catch on.
Posted in programming, web development, security
Tags capabilities, safestrings, security, sqlinjection, types, xss

Posted by Tom Moertel
Tue, 14 Aug 2007 05:05:00 GMT
I wrote in an earlier post that one of my photos of British soldier lichen was going to be published in an upcoming issue of a popular Pennsylvania magazine. Now that the issue is out, I can reveal that the periodical is none other than Milford Magazine. My photo appears in the August 2007 issue, at the top of the second page of the lichen feature, “The Secret Lives of Lichen” (PDF).
Do check it out. This may be your only opportunity to read an article that celebrates lichen in interview form.
I kid you not.
Posted in photography
Tags lichen, milfordmagazine, photography
