<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/stylesheets/rss.css" type="text/css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Tom Moertel's Weblog: Tag fun</title>
    <link>http://blog.moertel.com/articles/tag/fun?tag=fun</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Quality rants on programming theory and stuff geeks like</description>
    <item>
      <title>Fun with statistics:  estimating blog readership (a do-it-yourself recipe)</title>
      <description>&lt;p&gt;As everybody knows, &lt;em&gt;statistics is fun&lt;/em&gt;.  Is there
anything cooler than crushing a heap of seemingly uninteresting
numbers into gleaming jewels of meaning?  Of course not!  Models,
data-visualization plots, and fat data sets are &lt;em&gt;way cool&lt;/em&gt;.
So, let&amp;#8217;s find an excuse to play with them.&lt;/p&gt;


	&lt;p&gt;Here&amp;#8217;s &lt;span style="text-decoration: line-through"&gt;an excuse&lt;/span&gt; &amp;#8211;
I mean, an important and highly relevant question that many of us share:
&lt;em&gt;How many people actually read our blogs&lt;/em&gt;?  To answer the
question, we will need to use statistics, data, and cool plots.
Further, if you&amp;#8217;ve got the raw data for your blog, you can follow
along with your own analysis.  Even more fun!&lt;/p&gt;


	&lt;p&gt;We&amp;#8217;ll start with a simple inspection of common web-log data, using
command-line tools.  After developing a rough understanding of what
useful information we can extract, we&amp;#8217;ll analyze the raw data using a
series of successively more sophisticated techniques.  In the end, we
will derive a simple formula for estimating readership from easily
obtainable data.&lt;/p&gt;


	&lt;p&gt;Sound good?  Then let&amp;#8217;s get rocking.&lt;/p&gt;


	&lt;p&gt;But first, a preemptive strike on would-be pooh-poohers: I know all about
FeedBurner.  I know they will track my blog&amp;#8217;s subscribers and use
their mystical powers to infer the number of &amp;#8220;real&amp;#8221; subscribers I
have.  I know it&amp;#8217;s &lt;em&gt;all so easy&lt;/em&gt;.  But easy isn&amp;#8217;t the point.  I want to
&lt;em&gt;understand&lt;/em&gt; what&amp;#8217;s going on.  Just taking somebody&amp;#8217;s word for it isn&amp;#8217;t
nearly as satisfying as figuring it out yourself &amp;#8211; nor as fun.&lt;/p&gt;


	&lt;p&gt;OK.  For real this time, &lt;em&gt;let&amp;#8217;s get rocking.&lt;/em&gt;&lt;/p&gt;&lt;h3&gt; The goal&lt;/h3&gt;


	&lt;p&gt;We want to know how many people read my blog regularly.  By regularly,
I mean that if I post something today, we want to count the people who
will read it within a week&amp;#8217;s time.  That way we&amp;#8217;ll count the weekend
readers but not the one-time readers who will trickle in from search
engines over the months ahead.&lt;/p&gt;


	&lt;p&gt;We can&amp;#8217;t just look at my web-log stats to determine my blog&amp;#8217;s
readership, however.  That&amp;#8217;s because a lot of people read my blog
through online feed aggregators, such as Bloglines and Google Reader,
and never actually &amp;#8220;hit&amp;#8221; my blog when they read it.  (My blog is so
ugly, in fact, that I would expect &lt;em&gt;lots&lt;/em&gt; of my readers to use a feed
aggregator just to protect themselves from my design &amp;#8220;skills.&amp;#8221;)&lt;/p&gt;


	&lt;p&gt;So the goal is to figure out how to count my readers using the data
we can actually get our hands on.&lt;/p&gt;


	&lt;h3&gt; The data&lt;/h3&gt;


	&lt;p&gt;Here&amp;#8217;s what we have: my &lt;span class="caps"&gt;HTTP&lt;/span&gt; server&amp;#8217;s log.  That&amp;#8217;s it.  Can we squeeze
the good stuff from it?  Let&amp;#8217;s find out.&lt;/p&gt;


	&lt;p&gt;Each entry in the log represents a single request for something on my
site.  A typical entry looks like this (split over multiple
lines for your reading pleasure):&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;72.14.199.81 - - [19/Aug/2007:19:31:43 -0400]
"GET /xml/atom/article/472/feed.xml HTTP/1.1" 200 1959 "-" 
"Feedfetcher-Google; (+http://www.google.com/...; 1 subscribers; ...)" 
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;There&amp;#8217;s a lot of potentially useful information in there:&lt;/p&gt;


	&lt;ul&gt;
	&lt;li&gt;the IP address of the host that made the request&lt;/li&gt;
		&lt;li&gt;the date and time that the request was received&lt;/li&gt;
		&lt;li&gt;a summary of the request (e.g., &amp;#8220;GET /xml/atom/article/472/feed.xml &lt;span class="caps"&gt;HTTP&lt;/span&gt;/1.1&amp;#8221;)&lt;/li&gt;
		&lt;li&gt;the response code, typically 200 for a successful response&lt;/li&gt;
		&lt;li&gt;the string sent by the requester&amp;#8217;s user agent to identify itself (e.g., &amp;#8220;Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 1 subscribers; ...)&amp;#8221;)&lt;/li&gt;
	&lt;/ul&gt;


	&lt;p&gt;Note that this particular request was made by Google&amp;#8217;s Feedfetcher
for an Atom feed.  Also note that Feedfetcher told us,
via its user-agent identification string, how many of its users have
subscribed to this particular feed.  That&amp;#8217;s good stuff we can use.&lt;/p&gt;


	&lt;p&gt;My blog&amp;#8217;s main Atom feed is at /xml/atom10/feed.xml.  There are other
&amp;#8220;main&amp;#8221; feeds as well (e.g., &lt;span class="caps"&gt;RSS&lt;/span&gt;), but let&amp;#8217;s focus on this one for
now.  Let&amp;#8217;s see who&amp;#8217;s been asking for it recently.  First, I&amp;#8217;ll create a
bash-shell function to grab the subset of the log corresponding to 19
August:&lt;/p&gt;


&lt;pre&gt;&lt;code class="typedin"&gt;$ get_subset() {
    fgrep "GET /xml/atom10/feed.xml" blog_log |
    fgrep 19/Aug/2007;
  }
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Then I&amp;#8217;ll summarize the user-agent part of that subset&amp;#8217;s log entries:&lt;/p&gt;


&lt;pre&gt;&lt;code class="typedin"&gt;$ get_subset |
  perl -lne 'print $1 if /"([^";(]+)[^"]*"$/' |
  sort | uniq -c | sort -rn
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;78 NewsGatorOnline/2.0
47 Vienna/2.1.3.2111
38 Mozilla/5.0
27 YandexBlog/0.99.101
21 NewsFire/69
20 Planet Haskell +http://planet.haskell.org/ ...
19 Feedfetcher-Google
19 AppleSyndication/54
14 Zhuaxia.com 1 Subscribers
14 NetNewsWire/2.1b33
13 RssFwd
13 Bloglines/3.1
11 livedoor FeedFetcher/0.01
10 Feeds2.0
 8 RssBandit/1.5.0.10
 8 Akregator/1.2.6
 7 Eldono
 6 Netvibes
 4 NetNewsWire/3.0
 2 trawlr.com
 2 Opera/9.21
 2 NetNewsWire/3.1b5
 2 NetNewsWire/2.1
 2 Mozilla/3.0
 1 Vienna/2.2.0.2206
 1 Vienna/2.1.0.2107
 1 NetNewsWire/2.1.1
 1 Liferea/1.2.10
 1 JetBrains Omea Reader 2.2
 1 FeedTools/0.2.26 +http://www.sporkmonger.com/projects/feedtools/
 1 Feedshow/2.0
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Of the user agents that fetched my feed, only some, such as
Bloglines and Google Reader, aggregate on behalf of other users, and
only some of those mass aggregators reported how many people have
subscribed through them:&lt;/p&gt;


&lt;pre&gt;&lt;code class="typedin"&gt;$ get_subset |
  perl -lne 'print $1 if /"([^"]*?\d+ subscribers?)/i' |
  sort | uniq -c | sort -rn
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;78 NewsGatorOnline/2.0 (... 22 subscribers
19 Feedfetcher-Google; (... 102 subscribers
14 Zhuaxia.com 1 Subscribers
13 RssFwd (... 1 subscribers
13 Bloglines/3.1 (http://www.bloglines.com; 82 subscribers
11 livedoor FeedFetcher/0.01 (... 1 subscriber
10 Mozilla/5.0 (Rojo 1.0; ... 4 subscriber
 7 Eldono (http://www.eldono.de; 1 subscribers
 6 Netvibes (http://www.netvibes.com/; 12 subscribers
 2 trawlr.com (+http://www.trawlr.com; 4 subscribers
 1 Feedshow/2.0 (http://www.feedshow.com; 1 subscriber
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Of the user agents that don&amp;#8217;t report subscriber counts, most are
single-user feed readers.  The 47 requests from the Vienna/2.1.3.2111
reader, for example, came from 5 distinct IP addresses (which I&amp;#8217;ve
obscured to protect my innocent readers&amp;#8217; identities):&lt;/p&gt;


&lt;pre&gt;&lt;code class="typedin"&gt;$ get_subset |
  perl -lane 'print $F[0] if m{"Vienna/2.1.3.2111}' |
  sort | uniq -c | sort -rn
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;22 121.44.xxx.xxx
20 208.120.xxx.xxx
 3 69.154.xxx.xxx
 1 84.163.xxx.xxx
 1 202.89.xxx.xxx
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Does that mean I have only 5 distinct readers using Vienna 2.1.3.2111?
Not necessarily.  The first IP address, for example, could represent a
firewall that serves several people from a single corporate
campus.  So there could, indeed, be more than 5 users lurking behind
those addresses, but it&amp;#8217;s hard to know for sure.&lt;/p&gt;
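
	&lt;p&gt;To repeat that distinct-address count for every user agent at once, something like the following sketch might do.  It assumes each log line follows the combined format shown earlier, so that the user-agent string is the sixth quote-delimited field; the function name is made up for illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```shell
# Count distinct client addresses per user agent.
# Reads combined-format log lines on stdin (an assumption about the log layout).
distinct_ips_per_agent() {
  awk -F'"' '{
    split($1, head, " ")      # head[1] is the client IP address
    agent = $6                # sixth quote-delimited field: user-agent string
    sub(/[;(].*/, "", agent)  # keep only the leading product name
    sub(/ +$/, "", agent)     # trim trailing spaces
    print agent "\t" head[1]
  }' |
  sort -u |                   # one line per (agent, address) pair
  cut -f1 |
  sort | uniq -c | sort -rn   # number of distinct addresses per agent
}
```
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Piping the earlier subset into it (get_subset | distinct_ips_per_agent) lists each agent with its number of distinct requesting addresses, which is still only a rough proxy for readers.&lt;/p&gt;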


	&lt;p&gt;Thus we can&amp;#8217;t count on feed-fetching statistics to
determine the number of readers reliably.  The mass aggregators don&amp;#8217;t
all report their subscriber counts, and the stand-alone aggregators&amp;#8217;
fetching habits are not readily interpreted.  And, even if we could
draw reliable inferences from the fetches, they would tell us only how many
people fetched my blog&amp;#8217;s feeds.  We want to know how many people &lt;em&gt;read&lt;/em&gt;
my blog &amp;#8211; actually look at the articles.&lt;/p&gt;


	&lt;p&gt;To do that, we&amp;#8217;ll need a more-sophisticated approach.&lt;/p&gt;


	&lt;h3&gt; A different approach: counting image downloads&lt;/h3&gt;


	&lt;p&gt;Every once in a while, I&amp;#8217;ll post an article that contains photos or
graphs of something I&amp;#8217;m trying to explain.  Since images like that are
included by reference, they are not actually part of the article
itself.  So when a feed fetcher grabs a syndicated copy of the
article, it won&amp;#8217;t bother to fetch the images. There&amp;#8217;s no need to use
the bandwidth unless the person on the other side of the feed actually
reads the article, at which time the person&amp;#8217;s feed reader can download
the images on demand.&lt;/p&gt;


	&lt;p&gt;Thus we can use the number of image downloads as an estimate of the
number of people who actually read my blog.  For each article that has
images, we can count how many times each image was downloaded during
the article&amp;#8217;s first week online and take the average of the counts as
an estimate of the number of people who read the article.  (Marketing
weasels use this technique, too, to track your reading habits.  The
only difference is that they will often insert gratuitous, personally
identifying images &amp;#8211; &lt;a href="http://en.wikipedia.org/wiki/Web_bug"&gt;web bugs&lt;/a&gt;
&amp;#8211; into their documents to track you specifically.)&lt;/p&gt;


	&lt;p&gt;The image-counting technique isn&amp;#8217;t foolproof, however.  Requests from
people behind caching proxy servers may be answered from the proxy&amp;#8217;s cache and never
actually make it to my server to be counted, leading to under-counting.  Also, some web crawlers
fetch images, which may artificially inflate the count of &amp;#8220;readers.&amp;#8221; 
Examining the logs, I didn&amp;#8217;t see many image requests from crawlers,
so our primary concern is under-counting.  Since I&amp;#8217;m OK with a
conservative count, under-counting is acceptable.&lt;/p&gt;


	&lt;p&gt;Let&amp;#8217;s give image-counting a try.  On 15 July 2007, I posted &lt;a href="http://blog.moertel.com/articles/2007/07/15/hailstorm"&gt;a story
about a nasty
hailstorm&lt;/a&gt; that
hit my neighborhood.  The story included some photos of the storm and
its aftermath.  Let&amp;#8217;s count how many times the second photo in the
story was requested on the day the story was posted:&lt;/p&gt;


&lt;pre&gt;&lt;code class="typedin"&gt;$ fgrep "webcam-2007-07-13--153421.jpg" mc_log |
  fgrep 15/Jul/2007 | wc -l
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;884
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;884 times.  Many of those downloads, however, were made by just a few
requesting hosts.  Here are the top ten downloaders:&lt;/p&gt;


&lt;pre&gt;&lt;code class="typedin"&gt;$ fgrep "webcam-2007-07-13--153421.jpg" mc_log |
  fgrep 15/Jul/2007 |
  perl -lane 'print $F[0]' |
  sort | uniq -c | sort -rn | head
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;
 42 84.45.xxx.xxx
 27 192.168.xxx.xxx
 10 75.182.xxx.xxx
  7 83.132.xxx.xxx
  7 213.203.xxx.xxx
  6 72.173.xxx.xxx
  6 67.180.xxx.xxx
  6 65.214.xxx.xxx
  5 89.98.xxx.xxx
  5 85.104.xxx.xxx
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;How do we interpret these duplicate requests? One way would be to say
that each request, duplicate or not, represents a unique reader.  It&amp;#8217;s
plausible.  When many readers share a gateway
firewall, say in a corporate setting, they will all end up making
requests from the same IP address(es).  Thus, if we want to count
all such readers, we should count all of the requests.&lt;/p&gt;


	&lt;p&gt;The more conservative interpretation is that all of the requests from
the same IP address represent only a single reader.  All of the
duplicate requests might be reloads or, perhaps, the work of an
overzealous user-agent working (inefficiently) on behalf of that
user.  Let&amp;#8217;s recount using this conservative assumption:&lt;/p&gt;


&lt;pre&gt;&lt;code class="typedin"&gt;$ fgrep "webcam-2007-07-13--153421.jpg" mc_log |
  fgrep 15/Jul/2007 |
  perl -lane 'print $F[0]' | sort -u | wc -l
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;635
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;So what&amp;#8217;s the real count, 635 or 884?  The truth probably lies
somewhere in between.  To make sure we capture the truth, then, let&amp;#8217;s
use both interpretations in our ongoing analysis.  We will develop low
and high estimates from now on.&lt;/p&gt;
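
	&lt;p&gt;Both interpretations can be computed in one pass.  The sketch below assumes, as in the pipelines above, that the client address is the first whitespace-separated field of each log line; the function name is hypothetical.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```shell
# Print low and high reader estimates for log lines read on stdin.
# low  = number of distinct client addresses (one reader per address)
# high = total number of requests (one reader per request)
low_high_counts() {
  awk '{
    high++
    if (!($1 in seen)) { seen[$1] = 1; low++ }
  } END {
    print low + 0, high + 0
  }'
}
```
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Fed the same filtered lines as the two counts above, it prints the low estimate followed by the high estimate on a single line.&lt;/p&gt;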


	&lt;p&gt;If you have sharp eyes, you may have noticed that the second IP
address in the list above was from a private network.  That address,
in fact, belongs to my workstation.  When I write articles, I
frequently reload the drafts, and reloading causes the images within
the drafts to be re-fetched.  We&amp;#8217;ll need to filter out my addresses
during our later analyses.&lt;/p&gt;


	&lt;p&gt;There&amp;#8217;s one more thing to consider.  We still need to count the image
downloads for the rest of the week.  So far, we have only counted
those for the article&amp;#8217;s first day online.  So, let&amp;#8217;s re-do our
conservative count, only this time for the whole week. Let&amp;#8217;s also
filter out my private addresses and ignore all but &lt;span class="caps"&gt;HTTP 200&lt;/span&gt;
&amp;#8220;OK&amp;#8221; responses:&lt;/p&gt;


&lt;pre&gt;&lt;code class="typedin"&gt;$ fgrep "webcam-2007-07-13--153421.jpg" mc_log |
  fgrep " 200 " |  # only count full downloads (status code = 200)
  grep -P '(1[56789]|2[01])/Jul/2007' |  # Jul 15 thru 21 (7 days)
  perl -lane 'print $F[0] unless $F[0] =~ /^192\.168\./' |
  sort -u | wc -l
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;1601
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;So, we estimate conservatively that my article on the hailstorm was
read by about 1600 people in its first week.  Since the article was
published on 15 July 2007, we can conservatively estimate that my
blog&amp;#8217;s regular readership was about 1600 at that time, too.&lt;/p&gt;
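
	&lt;p&gt;The filter above drops only 192.168.x.x addresses.  If requests from the other private IPv4 ranges could turn up in a log, a broader filter might look like this sketch, which again assumes the client address is the first field (the function name is made up for illustration):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```shell
# Drop log lines whose client address (first field) falls in a private
# IPv4 range: 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16.
drop_private_addresses() {
  awk '
    $1 ~ /^10\./ { next }
    $1 ~ /^172\.(1[6-9]|2[0-9]|3[01])\./ { next }
    $1 ~ /^192\.168\./ { next }
    { print }
  '
}
```
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;It can stand in for the Perl step in the pipeline, placed just before the final de-duplication.&lt;/p&gt;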


	&lt;p&gt;But that&amp;#8217;s just a single point estimate.  We&amp;#8217;ll need more data
if we&amp;#8217;re to draw reliable conclusions.&lt;/p&gt;


	&lt;h3&gt; Compiling the image data&lt;/h3&gt;


	&lt;p&gt;To compile enough data for meaningful inferences, I have whipped up a
small script (in Perl) to extract and summarize image-download
statistics, given an &lt;span class="caps"&gt;HTTP&lt;/span&gt;-server log.  Running the script on my blog&amp;#8217;s
log, here&amp;#8217;s what we get:&lt;/p&gt;


&lt;div style="font-size: smaller; line-height: 1em; margin-bottom: 1em;"&gt;

	&lt;table&gt;
		&lt;tr&gt;
			&lt;th&gt; Date       &lt;/th&gt;
			&lt;th style="text-align:right;"&gt;Hits low &lt;/th&gt;
			&lt;th style="text-align:right;"&gt;Hits high  &lt;/th&gt;
			&lt;th&gt; Image &lt;/th&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2006-06-18   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    158   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     192   &lt;/td&gt;
			&lt;td&gt;  lady-beetle-larva-upside-down-small.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2006-06-18   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    157   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     191   &lt;/td&gt;
			&lt;td&gt;  lady-beetle-larva-upside-down-close.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2006-07-06   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    163   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     200   &lt;/td&gt;
			&lt;td&gt;  lectro-shirt-before-and-after-wash-small &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2006-07-06   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    163   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     203   &lt;/td&gt;
			&lt;td&gt;  lectro-shirt-before-wash-300dpi.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2006-07-06   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    168   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     206   &lt;/td&gt;
			&lt;td&gt;  lectro-shirt-before-wash-small.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2006-07-07   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    155   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     194   &lt;/td&gt;
			&lt;td&gt;  Cladonia-cristatella-close.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2006-07-07   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    155   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     194   &lt;/td&gt;
			&lt;td&gt;  Cladonia-cristatella.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2006-08-03   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    147   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     188   &lt;/td&gt;
			&lt;td&gt;  annies-mixup-0003.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2006-08-03   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    146   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     188   &lt;/td&gt;
			&lt;td&gt;  annies-mixup-0002.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2006-08-24   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    173   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     217   &lt;/td&gt;
			&lt;td&gt;  blog-fd-usage-vs-time.png &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2006-09-12   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    271   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     328   &lt;/td&gt;
			&lt;td&gt;  perl-at-work-sign.png &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2006-10-18   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;   1448   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    1582   &lt;/td&gt;
			&lt;td&gt;  safe-strings.png * &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2006-11-04   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;   1005   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    1351   &lt;/td&gt;
			&lt;td&gt;  old-web-site-3.png &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2006-11-04   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;   1011   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    1364   &lt;/td&gt;
			&lt;td&gt;  old-web-site.png &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2006-11-14   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;   1265   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    1747   &lt;/td&gt;
			&lt;td&gt;  toms-apple-pie.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2007-05-25   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;   1567   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    2406   &lt;/td&gt;
			&lt;td&gt;  problem-close.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2007-05-25   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;   1563   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    2400   &lt;/td&gt;
			&lt;td&gt;  receiver-insides.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2007-05-25   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;   1551   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    2383   &lt;/td&gt;
			&lt;td&gt;  repair.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2007-06-21   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;   2290   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    3024   &lt;/td&gt;
			&lt;td&gt;  perl-and-r.png * &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2007-07-15   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;   1574   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    2360   &lt;/td&gt;
			&lt;td&gt;  webcam-2007-07-13--153751.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2007-07-15   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;   1562   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    2379   &lt;/td&gt;
			&lt;td&gt;  backyard-ice.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2007-07-15   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;   1553   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    2364   &lt;/td&gt;
			&lt;td&gt;  shredded.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2007-07-15   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;   1567   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    2346   &lt;/td&gt;
			&lt;td&gt;  webcam-2007-07-13--153757.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2007-07-15   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;   1561   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    2355   &lt;/td&gt;
			&lt;td&gt;  webcam-2007-07-13--153808.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2007-07-15   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;   1612   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    2469   &lt;/td&gt;
			&lt;td&gt;  hailstorm2.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2007-07-15   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;   1592   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    2382   &lt;/td&gt;
			&lt;td&gt;  webcam-2007-07-13--153726.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2007-07-15   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;   1586   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    2381   &lt;/td&gt;
			&lt;td&gt;  webcam-2007-07-13--153747.jpg &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;  2007-07-15   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;   1601   &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    2404   &lt;/td&gt;
			&lt;td&gt;  webcam-2007-07-13--153421.jpg &lt;/td&gt;
		&lt;/tr&gt;
	&lt;/table&gt;




&lt;/div&gt;

	&lt;p&gt;Like most data sets, this one looks better in graphical form:&lt;/p&gt;


&lt;div class="photo"&gt;
&lt;img src="http://community.moertel.com/~thor/blog/pix-20070821/image-downloads.png" title="Image downloads by date" alt="Image downloads by date" /&gt;
&lt;/div&gt;

	&lt;p&gt;The circles represent our conservative readership estimates, and the
pluses represent our liberal readership estimates.  To
interpret the overall readership trend, focus on one set of estimates,
either circles or pluses.&lt;/p&gt;


	&lt;p&gt;What do we see?  First, it looks like the quantity of downloads has
increased steadily, from a few hundred in July 2006 to the low
thousands by July 2007.  That&amp;#8217;s nice.&lt;/p&gt;


	&lt;p&gt;Second, the data are sparse.  I don&amp;#8217;t post images often, so we don&amp;#8217;t
have much data to go on.&lt;/p&gt;


	&lt;p&gt;Third, it looks like we have some outliers.  If you look at the points
near October 2006 and June 2007, you&amp;#8217;ll see that they jump up from the
surrounding points.  (In the lower-bound series, I have marked these
outliers with short, vertical orange line segments.) If these jumps
truly represented a sudden increase in readership, we would expect
them to be permanent, reflected in later readership data.  What we
see, however, is that these gains are only temporary.&lt;/p&gt;


	&lt;p&gt;Thus it seems reasonable to conclude that something else is going
on for these images.  If you look back at the data table, I
have marked the pair of curious images with asterisks.  As it
turns out, both of these images were part of stories that were
featured on Reddit.  So, what these data reflect is the normal
readership &lt;em&gt;plus&lt;/em&gt; the Reddit effect.  To avoid throwing off our
inferences, let&amp;#8217;s discard the data for these two images.&lt;/p&gt;


	&lt;p&gt;In the end, we have a pretty good means of estimating my blog&amp;#8217;s
readership on the dates when I posted articles that contained images
(provided those wily Redditors didn&amp;#8217;t pile on the articles).  The
problem is, I would like to know what my readership is all the time,
not just on those rare occasions I post images.  I certainly don&amp;#8217;t
want to resort to using web bugs.  Hey, I&amp;#8217;m no marketing weasel.&lt;/p&gt;


	&lt;p&gt;It&amp;#8217;s time to add yet another layer of sophistication to our analysis.&lt;/p&gt;


	&lt;h3&gt; A combined model: reported subscribers &lt;em&gt;with&lt;/em&gt; image downloads&lt;/h3&gt;


	&lt;p&gt;Let&amp;#8217;s go back to the subscriber numbers reported by online aggregators
such as Bloglines and Google Reader.  If we assume that those
aggregators represent a decent slice of my readers, and that the size
of that slice as a proportion of the whole universe of readers doesn&amp;#8217;t
change much over time, we can model actual readership (as gathered
from image downloads) in terms of reported subscriber numbers.
Then, we can use that model to predict actual readership for the
dates when no image-download data are available.&lt;/p&gt;


	&lt;p&gt;That&amp;#8217;s the plan.  So, let&amp;#8217;s get going.&lt;/p&gt;


	&lt;h4&gt; Gathering subscriber data&lt;/h4&gt;


	&lt;p&gt;First, we need those subscriber numbers.  Again, I&amp;#8217;ve whipped up
a Perl script to gather the data.  Here&amp;#8217;s what the script does.  It &amp;#8211;&lt;/p&gt;


	&lt;ul&gt;
	&lt;li&gt;scans my blog&amp;#8217;s &lt;span class="caps"&gt;HTTP&lt;/span&gt; server log&lt;/li&gt;
		&lt;li&gt;ignores requests from private networks&lt;/li&gt;
		&lt;li&gt;ignores requests that don&amp;#8217;t report a subscriber count&lt;/li&gt;
		&lt;li&gt;emits one subscriber count for each day of data in the log, computed as the sum, over every feed and every aggregator, of the subscriber count that aggregator reported for that feed (if an aggregator fetches a feed more than once in a day, all but the final request are ignored)&lt;/li&gt;
	&lt;/ul&gt;
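
	&lt;p&gt;The steps in the list above can be sketched with awk.  This is not the Perl script itself, just an approximation under stated assumptions: log lines are in the combined format and in chronological order, the private network is 192.168.x.x, and an aggregator is identified by the part of its user-agent string before the first semicolon or parenthesis.  The function name is hypothetical.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```shell
# Emit "day total" lines: for each day, the sum of the last subscriber
# count reported by each (feed, aggregator) pair on that day.
# Assumes combined-format log lines in chronological order on stdin.
daily_subscriber_totals() {
  awk -F'"' '{
    split($1, head, " ")                 # head[1] = IP, head[4] = "[dd/Mon/yyyy:..."
    if (head[1] ~ /^192\.168\./) next    # ignore requests from the private network
    agent = tolower($6)
    if (!match(agent, /[0-9]+ subscribers?/)) next   # no subscriber count reported
    count = substr(agent, RSTART, RLENGTH) + 0       # leading digits become the number
    name = agent
    sub(/[;(].*/, "", name)                          # aggregator name: text before ; or (
    sub(/ *[0-9]+ subscribers?.*/, "", name)         # drop a count embedded in the name
    split($2, req, " ")                  # req[2] = requested path (the feed)
    day = substr(head[4], 2, 11)         # dd/Mon/yyyy
    last[day SUBSEP req[2] SUBSEP name] = count      # later requests overwrite earlier ones
  } END {
    for (key in last) {
      split(key, part, SUBSEP)
      total[part[1]] += last[key]
    }
    for (day in total) print day, total[day]
  }' | sort
}
```
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;The output is sorted as plain text, not by calendar date, so a log spanning several months would need its dates reformatted before plotting.&lt;/p&gt;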


	&lt;p&gt;Running the script on my server log, I got a large data set.  It&amp;#8217;s
so large that I&amp;#8217;ll go straight to the plot:&lt;/p&gt;


&lt;div class="photo"&gt;
&lt;img src="http://community.moertel.com/~thor/blog/pix-20070821/agg-reported-subs.png" title="Subscriber counts, as reported by online aggregators" alt="Subscriber counts, as reported by online aggregators" /&gt;
&lt;/div&gt;

	&lt;p&gt;As you would expect, these subscriber counts are less than the
corresponding reader counts we gathered from image downloads.  Not
everybody uses an online feed reader, after all.&lt;/p&gt;


	&lt;p&gt;One thing that leaps out is the discontinuity around February 2007.
What happened back then?  As it turns out, that is when Google finally
started reporting its subscriber counts.  Since Google has a large
share of the online aggregator market, that one little change resulted
in a big increase in the total of reported counts.&lt;/p&gt;


	&lt;p&gt;Still, that jump is going to make our analysis a bit more difficult.
When we relate subscriber counts to actual readers, we will need to
account for the &amp;#8220;Google effect.&amp;#8221;&lt;/p&gt;


	&lt;p&gt;Likewise, there are a few other sets of outliers &amp;#8211; points that look
like bogus data &amp;#8211; we should keep in mind.  To see whether any of our
image-download data coincide with these outliers, let&amp;#8217;s highlight our
subscriber data for the days when we also have image data:&lt;/p&gt;


&lt;div class="photo"&gt;
&lt;img src="http://community.moertel.com/~thor/blog/pix-20070821/subs-and-dls.png" title="Subscriber counts, highlighted if corresponding image-download data are available" alt="Subscriber counts, highlighted if corresponding image-download data are available" /&gt;
&lt;/div&gt;

	&lt;p&gt;Sure enough, some of our early download data coincide with an outlier
group in July 2006.  Let&amp;#8217;s remove those download data from our analysis
set, too.&lt;/p&gt;


	&lt;p&gt;Our data cleaned, let&amp;#8217;s move on.&lt;/p&gt;


	&lt;h4&gt; The model&lt;/h4&gt;


	&lt;p&gt;Now we are ready to relate subscribers to
readers (as determined by downloads).  Here&amp;#8217;s our model:&lt;/p&gt;


&lt;div style="text-align: center"&gt;
&lt;em&gt;y&lt;sub&gt;i&lt;/sub&gt;&lt;/em&gt; = &lt;em&gt;a&amp;#160;&amp;#xB7;&amp;#160;g&lt;sub&gt;i&lt;/sub&gt;&amp;#160;&amp;#xB7;&amp;#160;x&lt;sub&gt;i&lt;/sub&gt;&lt;/em&gt; + &lt;em&gt;e&lt;sub&gt;i&lt;/sub&gt;&lt;/em&gt;
&lt;/div&gt;

	&lt;p&gt;Where:&lt;/p&gt;


	&lt;ul&gt;
	&lt;li&gt;&lt;em&gt;y&lt;/em&gt; represents actual readers (as estimated from image downloads)&lt;/li&gt;
		&lt;li&gt;&lt;em&gt;x&lt;/em&gt; represents subscribers as reported by online aggregators&lt;/li&gt;
		&lt;li&gt;&lt;em&gt;i&lt;/em&gt; ranges over 1&amp;#8211;&lt;em&gt;N&lt;/em&gt; for our &lt;em&gt;N&lt;/em&gt; data points&lt;/li&gt;
		&lt;li&gt;&lt;em&gt;a&lt;/em&gt; is the coefficient that relates readers to subscribers&lt;/li&gt;
		&lt;li&gt;&lt;em&gt;g&lt;/em&gt; is a true/false factor to indicate whether &lt;em&gt;x&lt;/em&gt; includes Google Reader users&lt;/li&gt;
		&lt;li&gt;&lt;em&gt;e&lt;/em&gt; is the model&amp;#8217;s error term&lt;/li&gt;
	&lt;/ul&gt;


	&lt;p&gt;What the model says is that readership (&lt;em&gt;y&lt;/em&gt;) varies linearly
with subscriber counts (&lt;em&gt;x&lt;/em&gt;) and that the rate at which it
varies is given by &lt;em&gt;a&amp;#160;&amp;#xB7;&amp;#160;g&lt;sub&gt;i&lt;/sub&gt;&lt;/em&gt;; because &lt;em&gt;g&lt;/em&gt; is a true/false factor, that
amounts to one slope for each value of &lt;em&gt;g&lt;/em&gt;.  (Model aficionados may
note that this is a varying-slope model.) The model does not include a
constant term; this fixes the &lt;em&gt;y&lt;/em&gt;-intercept at 0
because when we have no actual readers, we cannot have any subscribers,
either.  Thus we know the point (0,0) must be part of the fitted model.&lt;/p&gt;


	&lt;p&gt;Here&amp;#8217;s the data set we will use to fit our model:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;        date y.low y.high   x     g
1 2006-06-18   158    192  53 FALSE
2 2006-08-03   146    188  68 FALSE
3 2006-08-24   173    217  89 FALSE
4 2006-09-12   271    328  97 FALSE
5 2006-11-04  1008   1358 112 FALSE
6 2006-11-14  1265   1747 114 FALSE
7 2007-05-25  1560   2396 385  TRUE
8 2007-07-15  1579   2382 401  TRUE
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;This data set combines a summarized version of our image-download data
set with the corresponding data from our aggregator-reported subscriber
set (the red points in the previous plot).&lt;/p&gt;


	&lt;p&gt;The low and high &lt;em&gt;y&lt;/em&gt; values represent our conservative and
liberal interpretations of readership, which we discussed earlier.
You&amp;#8217;ll also note that where multiple images were available for any
particular date, I have averaged their download counts to give a
centralized readership estimate for that date.  (Exercise: For this
model, why shouldn&amp;#8217;t we include multiple images for a single date?)&lt;/p&gt;


	&lt;p&gt;Let&amp;#8217;s plot this data set (just the &lt;em&gt;y.low&lt;/em&gt; part):&lt;/p&gt;


&lt;div class="photo"&gt;
&lt;img src="http://community.moertel.com/~thor/blog/pix-20070821/model-fitting-data.png" title="Data set for model fitting" alt="Data set for model fitting" /&gt;
&lt;/div&gt;

	&lt;p&gt;There aren&amp;#8217;t many points to go on, but because our model is so simple,
there are probably enough.  That means it&amp;#8217;s time to fit our model to
our data.&lt;/p&gt;


	&lt;p&gt;To fit our linear model, I&amp;#8217;ll use the &lt;em&gt;lm&lt;/em&gt; function from the amazingly
cool &lt;a href="http://www.r-project.org/"&gt;R statistics system&lt;/a&gt; (which I&amp;#8217;ve also
been using for our plots).  To summarize the results,
I&amp;#8217;ll use the &lt;em&gt;display&lt;/em&gt; function from the
&lt;a href="http://cran.r-project.org/src/contrib/Descriptions/arm.html"&gt;&amp;#8220;arm&amp;#8221; 
&lt;span class="caps"&gt;CRAN&lt;/span&gt; package&lt;/a&gt;, which accompanies Andrew Gelman and Jennifer Hill&amp;#8217;s
wonderful book &lt;a href="http://www.amazon.com/exec/obidos/ASIN/052168689X/ref=nosim/tommoertesweb-20"&gt;&lt;em&gt;Data Analysis Using Regression and
Multilevel/Hierarchical
Models&lt;/em&gt;&lt;/a&gt;.
(BTW, &lt;a href="http://www.stat.columbia.edu/~gelman/blog/"&gt;Gelman&amp;#8217;s blog&lt;/a&gt; is
fascinating.  It&amp;#8217;s one of my favorite reads.)  If you are following
along and don&amp;#8217;t have the &amp;#8220;arm&amp;#8221; package installed, you can use the
&lt;em&gt;summary&lt;/em&gt; function instead of &lt;em&gt;display&lt;/em&gt;.&lt;/p&gt;


	&lt;p&gt;First, let&amp;#8217;s fit the model to the conservative
data:&lt;/p&gt;


&lt;pre&gt;&lt;code class="typedin"&gt;library(arm)  # provides display(); with base R, use summary() instead
M1.low &amp;lt;- lm(y.low ~ g:x + 0, data = subs.readers)
display(M1.low)
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;lm(formula = y.low ~ g:x + 0, data = subs.readers)
         coef.est coef.se
gFALSE:x 6.31     1.59
gTRUE:x  3.99     0.64
  n = 8, k = 2
  residual sd = 356.94, R-Squared = 0.90
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;That&amp;#8217;s a pretty good fit.  Both of our model parameters are
significant (even at the 1-percent level): each estimate sits about four
or more standard errors away from zero.  The resulting model says
that each subscriber represents about 4 actual readers (or 6.3 readers
if the subscriber count doesn&amp;#8217;t include Google Reader users).&lt;/p&gt;


	&lt;p&gt;Let&amp;#8217;s visualize the model, now fit to our data:&lt;/p&gt;


&lt;div class="photo"&gt;
&lt;img src="http://community.moertel.com/~thor/blog/pix-20070821/m1-fit.png" title="Our model, fit to our data" alt="Our model, fit to our data" /&gt;
&lt;/div&gt;

	&lt;p&gt;The gray line segments represent our fitted model&amp;#8217;s predictions.
Thus, for example, when we have &lt;em&gt;x&lt;/em&gt;&amp;#160;=&amp;#160;100 reported
subscribers, the model predicts that we have about
&lt;em&gt;y&lt;/em&gt;&amp;#160;=&amp;#160;630 actual readers.  Likewise, when we have
400 subscribers, the model predicts that we have about 1600 actual
readers.&lt;/p&gt;


	&lt;p&gt;The two line segments show how our model accommodates the &amp;#8220;Google
effect.&amp;#8221; On the left, we have the pre-Google slope; on the right, the
post-Google slope.  In effect, our model combines two simpler models
and chooses between them based on the Boolean factor &lt;em&gt;g&lt;/em&gt;.&lt;/p&gt;
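

	&lt;p&gt;One way to convince yourself of this is to fit the two simpler models
by hand.  A least-squares line forced through the origin has slope
sum(&lt;em&gt;x&lt;/em&gt;&amp;#160;&amp;#xB7;&amp;#160;&lt;em&gt;y&lt;/em&gt;)&amp;#160;/&amp;#160;sum(&lt;em&gt;x&lt;/em&gt;&lt;sup&gt;2&lt;/sup&gt;),
so applying that formula separately to the pre-Google and post-Google
points should reproduce both coefficients.  The sketch below retypes the
conservative data from the table above; the helper name
&lt;em&gt;origin.slope&lt;/em&gt; is made up for illustration.  Expect slopes near
6.3 and 4.0, with small differences in the last digit possible if the
printed table is rounded.&lt;/p&gt;


&lt;pre&gt;&lt;code class="typedin"&gt;# Conservative data retyped from the table shown earlier.
x     &amp;lt;- c(53, 68, 89, 97, 112, 114, 385, 401)
y.low &amp;lt;- c(158, 146, 173, 271, 1008, 1265, 1560, 1579)
g     &amp;lt;- c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)

# Least-squares slope of a line forced through the origin (hypothetical helper).
origin.slope &amp;lt;- function(x, y) sum(x * y) / sum(x^2)

slope.pre  &amp;lt;- origin.slope(x[!g], y.low[!g])  # pre-Google points only
slope.post &amp;lt;- origin.slope(x[g],  y.low[g])   # post-Google points only

# The combined varying-slope model should give the same two slopes.
M &amp;lt;- lm(y.low ~ g:x + 0)
print(c(slope.pre, slope.post))
print(coef(M))
&lt;/code&gt;&lt;/pre&gt;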


	&lt;p&gt;And that&amp;#8217;s all there is to the fitting process.
Let&amp;#8217;s repeat the process for the liberal-interpretation data.&lt;/p&gt;


&lt;pre&gt;&lt;code class="typedin"&gt;M1.high &amp;lt;- lm(y.high ~ g:x + 0, data = subs.readers)
display(M1.high)
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;lm(formula = y.high ~ g:x + 0, data = subs.readers)
         coef.est coef.se
gFALSE:x 8.46     2.25
gTRUE:x  6.08     0.91
  n = 8, k = 2
  residual sd = 504.22, R-Squared = 0.91
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Under this model, each subscriber represents about 6 actual readers
(or 8.5 if our subscriber count doesn&amp;#8217;t include Google Reader users).&lt;/p&gt;


	&lt;p&gt;Now that we have our models, let&amp;#8217;s use them to
predict actual readership.&lt;/p&gt;


	&lt;h3&gt;Using our models for prediction&lt;/h3&gt;


	&lt;p&gt;Models ready, we can now predict my blog&amp;#8217;s readership for any day, not
just those days on which I happened to include images in my
postings.&lt;/p&gt;


	&lt;p&gt;I have subscriber data in an R data frame called, unsurprisingly,
&lt;em&gt;subscriber.data&lt;/em&gt;.  It provides, for each day I have subscriber
statistics, values for &lt;em&gt;x&lt;/em&gt; and &lt;em&gt;g&lt;/em&gt;.  (This is the same
data set visualized in the earlier plot &amp;#8220;Aggregator-reported
subscribers to blog.moertel.com.&amp;#8221;) We can tell R to plug these values
into our model to predict the actual number of readers for those days.
Let&amp;#8217;s make both conservative and liberal predictions, storing them in
a new data frame called &lt;em&gt;predicted.readers&lt;/em&gt;:&lt;/p&gt;


&lt;pre&gt;&lt;code class="typedin"&gt;predicted.readers &amp;lt;-
  transform(subscriber.data,
            readers.low  = predict(M1.low, subscriber.data),
            readers.high = predict(M1.high, subscriber.data))
&lt;/code&gt;&lt;/pre&gt;
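
	&lt;p&gt;To see what &lt;em&gt;predict&lt;/em&gt; does with new data, here is a small
self-contained sketch.  It refits the conservative model to the table
shown earlier and then predicts for two made-up days; the data frame
&lt;em&gt;new.days&lt;/em&gt; is a hypothetical stand-in for
&lt;em&gt;subscriber.data&lt;/em&gt;, whose real contents are not listed here.  The
two predictions should land near the 630 and 1600 readers read off the
fitted lines earlier.&lt;/p&gt;


&lt;pre&gt;&lt;code class="typedin"&gt;# Conservative data retyped from the table shown earlier.
subs.readers &amp;lt;- data.frame(
  y.low = c(158, 146, 173, 271, 1008, 1265, 1560, 1579),
  x     = c(53, 68, 89, 97, 112, 114, 385, 401),
  g     = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)
)
M1.low &amp;lt;- lm(y.low ~ g:x + 0, data = subs.readers)

# Hypothetical stand-in for subscriber.data: two made-up days.
new.days &amp;lt;- data.frame(x = c(100, 400), g = c(FALSE, TRUE))

# predict() looks up predictors by name, so new.days needs columns x and g.
transform(new.days, readers.low = predict(M1.low, new.days))
&lt;/code&gt;&lt;/pre&gt;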

	&lt;p&gt;Now let&amp;#8217;s plot our predictions.  First the plot code, just
so you can see how it&amp;#8217;s done in R:&lt;/p&gt;


&lt;pre&gt;&lt;code class="typedin"&gt;library(lattice)  # provides xyplot
xyplot(readers.low + readers.high ~ date,
       data = predicted.readers,
       main = "Predicted actual readers of blog.moertel.com",
       ylab = "Readers",
       xlab = "Date",
       auto.key = list(x = .35, y = .9, corner = c(0,0),
                       text = c("conservative estimate",
                                "liberal estimate"),
                       reverse.rows = TRUE, between = -19))
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;And the resulting plot:&lt;/p&gt;


&lt;div class="photo"&gt;
&lt;img src="http://community.moertel.com/~thor/blog/pix-20070821/predicted-readers.png" title="Readership of blog.moertel.com, low and high predictions" alt="Readership of blog.moertel.com, low and high predictions" /&gt;
&lt;/div&gt;

	&lt;h3&gt;The bottom line&lt;/h3&gt;


	&lt;p&gt;We have distilled a ton of raw data into a simple formula for
predicting my blog&amp;#8217;s actual readership from readily available
subscriber counts.  Just take the total
subscriber count and multiply by 4 and 6, respectively, for low and
high estimates of readership.&lt;/p&gt;


	&lt;p&gt;So, to answer our original question, how many readers does my blog
have?  Only a few days ago, on 18 August, the online aggregators reported
that they were serving my feeds to 442 subscribers.  Multiplying by 4 and 6
and rounding, we can predict that, right now, my blog has roughly 1750 to
2650 readers.&lt;/p&gt;
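

	&lt;p&gt;As a quick sanity check on that arithmetic, the rule of thumb fits in
a tiny R function.  The name &lt;em&gt;estimate.readers&lt;/em&gt; is invented for
illustration, and the multipliers 4 and 6 apply only when the subscriber
count includes Google Reader users, as in the fits above.  For 442
subscribers it returns 1768 and 2652, which round to the range quoted
above.&lt;/p&gt;


&lt;pre&gt;&lt;code class="typedin"&gt;# Rule of thumb: multiply total subscribers by 4 (low) and 6 (high).
# estimate.readers is a hypothetical helper, not part of any package.
estimate.readers &amp;lt;- function(subscribers) {
  c(low = 4 * subscribers, high = 6 * subscribers)
}

estimate.readers(442)
&lt;/code&gt;&lt;/pre&gt;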


	&lt;p&gt;We have our answer.  Getting it took some doing, but the doing was
fun, so all&amp;#8217;s good.&lt;/p&gt;


	&lt;p&gt;Certainly, we could go on.  There are many interesting questions left
to be answered.  What, for example, is the growth trend of my
readership?  What is Google Reader&amp;#8217;s market share? For now, however,
it&amp;#8217;s time to take a break.&lt;/p&gt;


	&lt;p&gt;I hope you had fun following along.  If you have your own data, I&amp;#8217;d be
interested in hearing about your analytical explorations.  (And, if you
haven&amp;#8217;t installed &lt;a href="http://www.r-project.org/"&gt;R&lt;/a&gt; on your computer yet,
&lt;em&gt;do it now&lt;/em&gt;.  R is seriously cool and comes with great
documentation, examples, and sample data.  If you&amp;#8217;re not using R,
you&amp;#8217;re not having all the fun you deserve.)&lt;/p&gt;


&lt;div class="update"&gt;

	&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; minor editing tweaks for clarity.&lt;/p&gt;


&lt;/div&gt;</description>
      <pubDate>Wed, 22 Aug 2007 21:34:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:01ec2aa2-2a63-4f48-8ab6-a7c1b6af4c20</guid>
      <author>Tom Moertel</author>
      <link>http://blog.moertel.com/articles/2007/08/22/fun-with-statistics-estiating-blog-readership</link>
      <category>statistics</category>
      <category>R</category>
      <category>blog</category>
      <category>fun</category>
      <category>modeling</category>
      <trackback:ping>http://blog.moertel.com/articles/trackback/544</trackback:ping>
    </item>
    <item>
      <title>I'm going to be a published photographer!</title>
      <description>&lt;p&gt;Earlier today I received an email from the editor of a
Pennsylvania-based magazine.  (I won&amp;#8217;t mention the name of the magazine
in case what I&amp;#8217;m about to write next amounts to a spoiler.) He asked
if I would allow the magazine to publish one of my &lt;a href="http://blog.moertel.com/articles/2006/07/07/interesting-stuff-matchstick-moss-british-soldier-lichen"&gt;photographs of
British soldier
lichen&lt;/a&gt;
in an upcoming issue.&lt;/p&gt;


	&lt;p&gt;Of course, I said yes. (I&amp;#8217;m always looking for ways to spread the word
about British soldier lichen.)&lt;/p&gt;


	&lt;p&gt;My fee? I asked for a free issue of the magazine when it goes to
press.  They said it will be in my mailbox.&lt;/p&gt;


	&lt;p&gt;Cool.&lt;/p&gt;</description>
      <pubDate>Thu, 31 May 2007 00:01:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:8cd3b0e6-fe35-4370-9976-941beaab592a</guid>
      <author>Tom Moertel</author>
      <link>http://blog.moertel.com/articles/2007/05/31/im-going-to-be-a-published-photographer</link>
      <category>photography</category>
      <category>lichen</category>
      <category>fun</category>
      <category>me</category>
      <trackback:ping>http://blog.moertel.com/articles/trackback/469</trackback:ping>
    </item>
  </channel>
</rss>
