Today I wanted to extract the data that were visualized in a chart I saw on Seth Roberts’s blog. That is, I had a picture of a data set, and I wanted the numbers behind the picture.
This task turned out to be surprisingly easy – once I found Engauge Digitizer, an open-source (GPL) tool made for this very task. After I launched Engauge, the digitization process was straightforward:
- I established the chart’s coordinate system by clicking in the corners and entering the associated coordinates.
- Then I had Engauge identify data points. With the mouse, I selected a data point by hand, teaching Engauge what a point looks like. Engauge then identified spots on the chart that looked like data points and locked on to them. I stepped through the points and told Engauge to skip the few it had misidentified.
- I manually selected a few more data points that were scrunched into blobs and had eluded Engauge’s point-detection heuristics.
- Finally, I exported the data set in CSV format.
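To make the coordinate-system step concrete, here is a minimal Python sketch of the underlying idea: once you know the data value at two pixel positions on each axis, every other pixel can be converted by linear interpolation. This is not Engauge's code. The function names (`make_axis_mapper`, `digitize`, `to_csv`) and the sample pixel values are made up for illustration, and the sketch assumes linear (not logarithmic) axes and an image that is not rotated.

```python
import csv
import io


def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Return a function that maps a pixel coordinate to a data value.

    Assumes a linear axis calibrated by two known (pixel, value) pairs.
    """
    if pixel_a == pixel_b:
        raise ValueError("calibration pixels must differ")
    scale = (value_b - value_a) / (pixel_b - pixel_a)
    return lambda pixel: value_a + (pixel - pixel_a) * scale


def digitize(points_px, x_cal, y_cal):
    """Convert (pixel_x, pixel_y) pairs to data coordinates."""
    to_x = make_axis_mapper(*x_cal)
    to_y = make_axis_mapper(*y_cal)
    return [(to_x(px), to_y(py)) for px, py in points_px]


def to_csv(points):
    """Serialize data points as CSV text with an x,y header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["x", "y"])
    writer.writerows(points)
    return buf.getvalue()


# Hypothetical calibration: pixel column 100 is x=0, column 500 is x=10;
# pixel row 400 is y=0, row 50 is y=70 (image rows grow downward).
x_cal = (100, 0.0, 500, 10.0)
y_cal = (400, 0.0, 50, 70.0)

# Hypothetical pixel positions of points picked off the chart.
picked = [(100, 400), (300, 225), (500, 50)]
data = digitize(picked, x_cal, y_cal)
csv_text = to_csv(data)
```

The y-axis calibration runs "backwards" because image rows count down from the top of the picture, while chart values usually grow upward. The linear mapping handles that without any special casing.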
If you ever need to extract the data behind a chart, do check out Engauge Digitizer. (If you use Fedora Linux, you’ll be happy to know that I have packaged Engauge for you. Get it at the RPMs section of the community site.)