Engauge Digitizer: a handy tool for extracting data from charts

Tags: statistics, charts, plots, data, fedora, rpms, tools

Today I wanted to extract the data that were visualized in a chart I saw on Seth Roberts’s blog. That is, I had a picture of a data set, and I wanted the numbers behind the picture.

This task turned out to be surprisingly easy – once I found Engauge Digitizer, an open-source (GPL) tool made for this very task. After I launched Engauge, the digitization process was straightforward:

  1. I established the chart’s coordinate system by clicking in the corners and entering the associated coordinates.
  2. Then I had Engauge identify data points. With the mouse, I selected a data point by hand, teaching Engauge what a point looks like. Engauge then identified spots on the chart that looked like data points and locked on to them. I was able to step through the points and tell Engauge to skip the few it had misidentified.
  3. I manually selected a few more data points that were scrunched into blobs and had eluded Engauge’s point-detection heuristics.
  4. Finally, I exported the data set in CSV format.
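The key idea behind step 1 is that a few clicked reference points fix a mapping from pixel positions to data values. Below is a minimal sketch of that idea, not Engauge's actual implementation. It assumes linear (not logarithmic) axes and uses three reference points to solve for a general affine map, so a slightly rotated or skewed scan still works. The helper names `make_pixel_to_data` and `export_csv` are hypothetical, chosen for illustration only.

```python
import csv
import io


def _det3(a):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def _solve3(m, v):
    """Solve the 3x3 linear system m * s = v using Cramer's rule."""
    d = _det3(m)
    if d == 0:
        raise ValueError("reference points must not be collinear")
    solution = []
    for col in range(3):
        # Replace one column of m with v and take the ratio of determinants.
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        solution.append(_det3(mc) / d)
    return solution


def make_pixel_to_data(refs):
    """Build a pixel -> data mapping from three ((px, py), (x, y)) pairs.

    Hypothetical helper; assumes linear axes, so each data coordinate is
    an affine function of the pixel coordinates:
        x = a*px + b*py + c,   y = d*px + e*py + f
    """
    m = [[float(px), float(py), 1.0] for (px, py), _ in refs]
    ax = _solve3(m, [float(x) for _, (x, _y) in refs])
    ay = _solve3(m, [float(y) for _, (_x, y) in refs])

    def to_data(px, py):
        return (ax[0] * px + ax[1] * py + ax[2],
                ay[0] * px + ay[1] * py + ay[2])

    return to_data


def export_csv(points):
    """Write (x, y) pairs as CSV text with a header row, as in step 4."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["x", "y"])
    writer.writerows(points)
    return buf.getvalue()


# Example: pixel (50, 400) is the origin, (450, 400) is x = 10 on the
# x-axis, and (50, 100) is y = 30 on the y-axis (pixel y grows downward).
to_data = make_pixel_to_data([((50, 400), (0, 0)),
                              ((450, 400), (10, 0)),
                              ((50, 100), (0, 30))])
print(export_csv([to_data(250, 250)]))
```

With these made-up reference clicks, the pixel at (250, 250) maps to roughly x = 5, y = 15 (up to floating-point rounding). Using three points instead of two per axis costs little and tolerates a scan that is not perfectly aligned.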

If you ever need to extract the data behind a chart, do check out Engauge Digitizer. (If you use Fedora Linux, you'll be happy to know that I have packaged Engauge for you. Get it at the RPMs section of the community site.)