Open-source statistics: R and ESS

Posted on August 27, 2004

Tags: r, ess, statistics, mathematica, oss, math

Recently, I needed to perform some statistical work. But I didn’t want to use my previous tool-of-choice, Mathematica, because I decided after my switch to Linux not to rely on proprietary software when viable open-source alternatives existed. And thus I embarked on a short search for open-source statistics software.

R

My search was fruitful, leading me immediately to the delightfully GPL-licensed R Project for Statistical Computing: “R is a language and environment for statistical computing and graphics.” (The R system and language are similar to S, developed at Bell Labs.) The R language has functional-programming semantics (which I love) and supports (among others) the object-oriented style of programming, which is used extensively for R’s statistical interface. Most results in R are delivered in terms of objects, such as tables, vectors, and linear models, whose properties you can inspect and manipulate as you would expect. The underlying classes provide specialized methods for common operations so that the objects do the right things in response to generic commands.
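To make the "objects respond to generic commands" idea concrete, here is a minimal sketch. It assumes only a stock R installation and uses the built-in cars data set; the variable names (fit, coefs, s) are made up for illustration.

```r
# Fit a linear model on the built-in 'cars' data set
# (stopping distance as a function of speed).
fit <- lm(dist ~ speed, data = cars)

# The result is an ordinary object with a class attached.
class(fit)

# Its pieces can be pulled out and manipulated like any other value.
coefs <- coef(fit)
names(coefs)

# Generic commands dispatch on the object's class: print() and summary()
# pick the linear-model methods without being told to.
print(fit)
s <- summary(fit)
class(s)
```

The same print() and summary() calls work on tables, data frames, and most other results; each class supplies its own method behind the generic name.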

Immediately, I was hooked on R. Despite its steep initial learning curve, R is straightforward to use. Once you get the lay of the land, you can reliably guess what functions and their arguments mean. The help facility is good, too, and can integrate with your web browser if you desire.

And the graphics! Graphs and charts are often the first, best way to size up data sets. R makes it easy to create publication-quality graphs and charts, drawing on any number of supported “graphical devices.” Among the stock devices are postscript, pdf, LaTeX, png, xfig, postscript-rendered bitmaps, and X11 (windows). For a tiny example of R’s graphics, see my posts on Mining gold from the Internet Movie Database.
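As a rough sketch of how a graphical device is used, the snippet below opens the pdf device, draws a histogram of the built-in cars data, and closes the device so the file is written. The output path and plot labels are placeholders chosen for illustration, not anything from the posts mentioned above.

```r
# Hypothetical output location: a file in R's per-session temp directory.
out <- file.path(tempdir(), "cars-hist.pdf")

# Open the PDF device; plotting commands now draw into the file.
pdf(out, width = 6, height = 4)

# Any ordinary plotting call works the same way regardless of device.
hist(cars$speed, main = "Car speeds", xlab = "Speed (mph)")

# Closing the device flushes the plot to disk.
dev.off()

file.exists(out)
```

Swapping pdf() for postscript() (or another device available in your build) should send the same plot to a different output format without changing the plotting code.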

To make the already-attractive R downright irresistible, the R community offers the Comprehensive R Archive Network (CRAN), the R equivalent of Perl’s CPAN. (One of the CRAN mirrors is hosted by Pittsburgh’s own pair networks.) CRAN provides packages for esoteric methods of analysis, database integration, genetics, time series analysis, HTTP (!), map projections, vegetation science, and myriad others. Additionally, CRAN provides numerous sample data sets, many corresponding to examples and problem sets from popular statistics textbooks. (I should note that R, out of the box, comes loaded with tools and sample data. CRAN isn’t in any way remedial but rather expands R’s initial richness to mind-blowing proportions.)

ESS

Once I started to use R frequently, I grew tired of the command-line interface. That’s where Emacs Speaks Statistics (ESS) comes in. It’s an add-on to Emacs that provides a seamless, rich interface to R (and other statistics packages). Since I live in Emacs, ESS was a natural fit for my working style. Highly recommended. (If you’re interested, I have made a Fedora/RedHat RPM package for ESS. Get it in the RPMs section of the site.)

Summary

If you’re looking for a good statistics system, get R. Now. And if you use Emacs, too, by all means get ESS. (If you just need a few bare-bones tools, however, you might want to check out my tiny statistics tools in Tom’s Perl code on the Community Projects site.)