Posted by Tom Moertel
Tue, 28 Dec 2010 19:02:00 GMT
What did people read on my blog in 2010? Mostly, it was older content. Here are the ten most-popular pages, ordered by unique page views relative to those of the home page (1.0):
1. A Coder’s Guide to Coffee (2002, popularity = 5.30). This oldie continues to be popular mainly because coders still drink coffee – and because the Guide gets rediscovered every few months and posted to Reddit or Hacker News. This year it got an additional boost from being the cover story of Hacker Monthly #4.
2. Never store passwords in a database! (2006, popularity = 3.18). Despite being 4 years old, this article gets a steady flow of readers because lots of programmers are still storing passwords in databases. And getting owned.
3. Ruby 1.9 gets handy new method Object#tap (2007, popularity = 1.37). I’m not sure why this article keeps getting the hits, but it does. People just love Object#tap, I guess.
4. Wondrous oddities: R’s function-call semantics (2006, popularity = 1.22). This article’s popularity is easy to explain: R continues to steamroll just about everything else in statistical computing and has a continuous influx of new, curious users who want to know more about R’s inner workings.
5. Verizon FiOS fiber-optic Internet service: a first look (2005, popularity = 1.05). I think this article is popular because I was an early adopter of FiOS and had one of the first hands-on reviews. It gets lots of search hits.
6. A couple of tips for writing Puppet manifests (2007, popularity = 1.02). I’m not sure either of these tips is still relevant. Still, this article brings in readers.
7. How I stopped missing Darcs and started loving Git (2007, popularity = 1.01). Programmers love to talk about DVCSs, Git and Darcs especially. Plus, if you search on “darcs git”, this article is one of the first results.
8. A type-based solution to the ‘strings problem’: a fitting end to XSS and SQL-injection holes? (2006, popularity = 1.00). This article remains popular because it gets readers from two sources: from religious wars over typing systems and from discussions of what to do about XSS vulnerabilities.
9. Don’t let password recovery keep you from protecting your users (2007, popularity = 0.93). This article is a follow-up to Never store passwords…! and tends to pick up a share of its sibling’s traffic.
10. On the evidence of a single coin toss (2010, popularity = 0.78). This short article raises a simple question: If I hand you a coin and claim that it always comes up heads, and you toss the coin and it does come up heads, how much more should you believe my claim compared to before the coin toss? This kind of question is irresistible to anyone even remotely Bayesian, so it ended up on Hacker News and got a lot of traffic in a few days. (The follow-up article is also popular, but didn’t make the top-ten list.)
So, once again, it looks like the old content dominates. Only one article from 2010 made the top ten, and just barely at that.
Posted in site news
Tags blog, content, popularity, statistics

Posted by Tom Moertel
Mon, 27 Dec 2010 00:14:00 GMT
After much neglect, my poor blog is finally getting some much-needed care.
The first improvement I wanted to make was to add support for TeX mathematics. Last week’s article on how to update your beliefs after observing a coin toss contained enough painstakingly entered mathematical notation to provide the necessary motivation. The solution I preferred was MathJax, a JavaScript library that runs in the browser to render TeX markup into mathematical notation after a page is loaded. But that solution created a new problem.
MathJax, you see, has a rather large footprint. And my blog runs on a decrepit server that is already overtaxed. So, to use MathJax, I first had to put a reasonably tuned cache in front of my blog to offload the byte-slinging duties soon to be imposed.
Varnish
Enter Varnish, an efficient, highly tunable, caching HTTP proxy. I set it up on a front-end server and told it to cache anything mostly static on the blog’s server:
# allow caching of mostly static resources
sub vcl_fetch {
    if (req.url ~ "\.(ico|png|gif|jpg|swf|css|js)$" ||
        req.url ~ "^/xml/.*\.xml$" ||
        req.url ~ "^/$") {
        set obj.ttl = 600s;
    }
    if (req.url ~ "/javascripts/MathJax/") {
        set obj.ttl = 3600s;
    }
    if (req.url ~ "\.(ico|png|gif|jpg|swf|css|js)\?[0-9]+$") {
        set obj.ttl = 1d;
    }
}
Basically, that bit of Varnish Configuration Language says that after the proxy fetches a resource from the back-end blog server, if it’s an image, script, stylesheet, feed, or MathJax resource (or the home page itself), it should be given some reasonable amount of time to live in the cache. Once in the cache, Varnish will serve it up until its time to live expires, when Varnish will finally ask the old blog server for another copy.
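To make the rule order concrete, here is a rough Python sketch of the same matching logic. It is only an illustration, not part of the Varnish setup: the names `TTL_RULES` and `ttl_for` and the sample URLs are made up, and Python's `re.search` stands in for VCL's `~` operator, which should behave the same for these simple patterns.

```python
import re

# Rules mirror the three `if` blocks in the VCL above, in the same order.
# Each entry is (compiled pattern, time to live in seconds).
TTL_RULES = [
    (re.compile(r"\.(ico|png|gif|jpg|swf|css|js)$|^/xml/.*\.xml$|^/$"), 600),
    (re.compile(r"/javascripts/MathJax/"), 3600),
    (re.compile(r"\.(ico|png|gif|jpg|swf|css|js)\?[0-9]+$"), 86400),  # 1d
]


def ttl_for(url):
    """Return the TTL (seconds) these rules would set, or None if none match.

    As with the sequential `if` blocks in the VCL, a later matching rule
    overrides an earlier one.
    """
    ttl = None
    for pattern, seconds in TTL_RULES:
        if pattern.search(url):
            ttl = seconds
    return ttl


if __name__ == "__main__":
    # Hypothetical URLs, chosen only to exercise each rule.
    samples = [
        "/",
        "/images/logo.png",
        "/javascripts/MathJax/MathJax.js",
        "/stylesheets/site.css?1293400000",
        "/articles/some-post",
    ]
    for url in samples:
        print(url, "->", ttl_for(url))
```

Note that a MathJax script matches both the first and second rules; because the second assignment comes later, the longer TTL wins. A URL that matches nothing gets `None` here, meaning the rules above leave its TTL untouched.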
This little change made a big difference in my blog’s responsiveness. It feels much snappier now. (Let me know if you agree.)
MathJax
The front-end cache done, I moved on to installing MathJax. Basically, I downloaded a couple of Zip archives, decompressed them, and dropped the resulting files onto my blog’s server. Then I tweaked the blog’s default page template to load the root MathJax JavaScript file. That’s it.
Now I can have fun with TeX-markup mathematical formulas on the blog: $$1 + x_1 + x_2 + \cdots$$
The only downside to using a client-side library like MathJax is that it will probably not go so well for readers using Instapaper and e-readers. (If you’re one of them, let me know how it goes for you.)
Posted in site news
Tags blog, caching, mathjax, performance, tex, varnish

Posted by Tom Moertel
Mon, 20 Dec 2010 14:35:00 GMT
Recently, I asked how much evidence was contained in a single coin toss:
After seeing the outcome of this single coin toss [which came up heads], how much more should you believe my claim that the coin always comes up heads, compared to what you believed before the coin toss?
Many people submitted answers here on the blog and also on Hacker News, where the question led to an interesting discussion. Before I get to my answer, however, let’s talk about the question.
I like this question because it’s simple yet offers ample opportunity to explore something valuable but often unappreciated: weak evidence. Here we have the evidence of a single coin toss that comes up heads. That’s not much to go on. But it is something, and we would be wrong to ignore it.
Nevertheless, I’ve witnessed many experts ignore weak evidence, doctors in particular. The problem with ignoring weak evidence is that it’s abundant. Think of it as “long-tail” evidence: there’s so much of it that even if each piece is worth only a tiny bit, as a whole it’s worth a ton. So, if you don’t know how to mine it, you’re leaving a ton of potential knowledge buried within that long tail.
Interpreting evidence (weak or otherwise)
So, let’s talk about the evidence of our coin toss. My question was how much your prior beliefs about my claim (that the coin always comes up heads) should be swayed by the outcome of that single coin toss. I’m not asking about the coin, but about your state of knowledge about the coin and, more specifically, how that state should change in light of the coin toss.
There are many ways to approach the question, but to start, let’s define some notation. We’ll let P(X) denote our degree of belief in the proposition X, some statement that can be either true or false. Let P(X) = 0 represent our absolute conviction that X is false, and P(X) = 1 our absolute conviction that X is true. When P(X) = 1/2, it represents that we have no reason to believe that X is more likely to be true than false. If we know nothing about X, then, our default value for P(X) must be 1/2.
Let’s be clear that X is either true or false, regardless of what we think. Our X represents some real property of the universe, and the universe doesn’t alter itself just because our thoughts about it change or because we do a mathematical calculation that we think describes it in some way. That’s why we write P(X): the P notation represents that we’re not talking about X itself but rather our belief in X. The P(·) can be read as “the probability of” (or “the plausibility of”), so P(X) represents “the probability of X.”
Instead of some placeholder X, let’s define some real propositions that relate to our coin toss:
- S: the coin is a special coin that always comes up heads when tossed
- H: we observe the coin to come up heads in a coin toss
- T: we observe the coin to come up tails in a coin toss
- K: our prior knowledge about the coin, the universe, and everything
That last one, K, is important. It’s a massive proposition, the logical conjunction of many smaller propositions that represent everything we already know – that the Earth is approximately spherical, that gravity pulls things toward one another, that the author of this blog post is exceedingly handsome, and so on. This massive proposition is often left out of probability calculations with the understanding that it’s implied, but I’m going to include it because it makes our assumptions more explicit.
Now, the probabilities we’re interested in:
- P(S|K): our belief that the coin is a special heads-always coin, in light of our prior knowledge
- P(S|H∧K): our belief that the coin is a special heads-always coin, in light of our prior knowledge and the knowledge that we observed the coin to come up heads in a coin toss
I’ve introduced some new notation. The vertical bar (|) is read as “given” and can be interpreted to mean “in light of the following.” The ∧ operator is new, too. It represents logical conjunction and can be read as “and.” For instance, A∧B represents the proposition that both propositions A and B are true; and P(S|H∧K) represents the probability that S is true, in light of both H and K being true.
The first probability, P(S|K), is sometimes called our prior probability because it represents how much we believe S before considering new evidence, when we have only our prior knowledge K to go on. The second, P(S|H∧K), is sometimes called our posterior probability because it represents how much we believe S after considering the new evidence H, too.
Now, how do we update our prior beliefs about the coin to arrive at our posterior beliefs, in light of having witnessed the coin toss come up heads? Let’s think about this updating process for a moment.
Our new beliefs about the plausibility of some proposition X, in light of new evidence E, ought to be the same as our prior beliefs about X, but adjusted to account for observing the new evidence. The adjustment factor, according to Bayes’ rule (and justified by Cox’s theorem), is given by a quotient: the plausibility of observing the new evidence, given that X is true, divided by the plausibility of observing the new evidence in any case. (And, of course, all of these adjustments occur in light of our prior knowledge K about the universe in general.)
As a pseudo-English equation, Bayes’ rule is surprisingly intuitive:
(new plausibility) = (old plausibility) × (evidence adjustment),
or, equivalently, using our probability notation:
P(X|E∧K) = P(X|K) × [ P(E|X∧K) / P(E|K) ].
The evidence adjustment itself may not seem so intuitive, but it does make sense. It is the quotient of two plausibilities: that of observing the evidence E given that the proposition X is true, and that of observing E regardless. You can think of the adjustment as quantifying how well the proposition uniquely explains the evidence.
For example, if the proposition being true is the only reasonable explanation for the evidence, observing the evidence ought to provide strong support for the proposition. If rain is the only way that every house in the neighborhood gets wet at the same time, knowing that every house in the neighborhood is currently getting wet provides strong support to the proposition that it is raining. On the other hand, knowing that somebody is carrying an umbrella provides weaker support because things besides rain can also explain that evidence: the anticipation of rain, for one.
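Here is a small Python sketch of that update rule applied to the rain example. The numbers are invented purely for illustration (a prior of 1/5 for rain, and guessed likelihoods for wet houses and umbrellas), and the helper names `bayes_update` and `p_evidence` are my own. The denominator P(E|K) is expanded over X and ¬X using the standard sum and product rules.

```python
from fractions import Fraction


def p_evidence(prior, p_e_given_x, p_e_given_not_x):
    """P(E|K), expanded over the two cases X and not-X."""
    return prior * p_e_given_x + (1 - prior) * p_e_given_not_x


def bayes_update(prior, p_e_given_x, p_e):
    """Return P(X|E∧K) = P(X|K) × [ P(E|X∧K) / P(E|K) ]."""
    if p_e == 0:
        raise ValueError("P(E|K) must be nonzero if E was actually observed")
    return prior * p_e_given_x / p_e


if __name__ == "__main__":
    prior_rain = Fraction(1, 5)  # hypothetical prior belief in rain

    # Hypothetical likelihoods: (P(E|rain), P(E|no rain)).
    evidence = {
        "every house is wet": (Fraction(1), Fraction(1, 100)),
        "someone carries an umbrella": (Fraction(4, 5), Fraction(1, 5)),
    }
    for name, (given_rain, given_dry) in evidence.items():
        p_e = p_evidence(prior_rain, given_rain, given_dry)
        adjustment = given_rain / p_e
        posterior = bayes_update(prior_rain, given_rain, p_e)
        print(f"{name}: adjustment = {adjustment}, posterior = {posterior}")
```

With these made-up numbers the wet houses carry a larger adjustment factor than the umbrella does, because almost nothing besides rain explains them.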
Getting back to my original question, I asked how much more you should believe my claim S (that the coin always comes up heads) after observing the evidence H (that the coin did come up heads when you tossed it). That is, I’m asking you to characterize the new plausibility in light of the old. The relative change between the two is given as follows:
[ (new plausibility) / (old plausibility) ] – 1
This quantity, we can see from Bayes’ rule, is merely our evidence adjustment less one. But to calculate this value, we’ll first need the probabilities the calculation is likely to require. Let’s see, what do we already know?
Representing our knowledge
First, our prior knowledge K informs us that a coin toss is understood to have only two potential outcomes: heads and tails. A coin toss is considered invalid, for example, if the coin stands on edge or is tossed into a chasm. Therefore, a coin toss must result in heads or tails:
P(H∨T|K) = 1,
and getting tails is the same as not getting heads:
P(T|K) = P(¬H|K).
More notation: we use ¬ to denote “not” and ∨ to denote logical disjunction, read “or.”
Next, we know that if the coin is special, it will come up heads when tossed:
P(H|S∧K) = 1.
But what if the coin is not special? In that case, do we have any reason to believe it is more likely to come up heads than tails, or vice versa? No. So, we must consider each proposition equally likely:
P(H|¬S∧K) = P(¬H|¬S∧K).
Further, because there are no other possibilities – the coin must come up heads or tails – their total probability must be one:
P(H|¬S∧K) + P(¬H|¬S∧K) = 1.
If the two probabilities are equal and must sum to one, each must be one half:
P(H|¬S∧K) = P(¬H|¬S∧K) = 1/2.
At this point, you may be tempted to object that our beliefs, being overly subjective, have led us to an unjustified conclusion. Even if the coin isn’t special, how can we say it has an even chance of coming up heads (or tails), in other words, that it’s fair? What justifies this claim?
In truth, we can’t justify it. But we didn’t make it, either.
Remember, we are not making any claims about the coin. Our equations make claims only about our knowledge of the coin. If the coin isn’t special, maybe it is still biased somehow. Even so, we have no reason to believe it is more likely to be biased one way or the other. Therefore, by symmetry, we can assign only one degree of belief to either proposition H or ¬H, and that is 1/2.
The evidence-adjustment factor
With our prior beliefs represented as probability equations, let’s get back to computing that evidence adjustment.
(evidence adjustment) = P(H|S∧K) / P(H|K)
The numerator on the right-hand side we already know: P(H|S∧K) = 1.
The denominator, P(H|K), we do not. We must find some way to break it into terms that we do know.
The nice thing about propositions, like H, is that we can use Boolean logic to manipulate them. So, let’s break H into pieces that are more likely to be useful:
H = H∧(S ∨ ¬S) = (H∧S) ∨ (H∧¬S).
What I did was split the proposition that the coin comes up heads into a disjunction of two mutually exclusive propositions: that the coin comes up heads and is special, or that the coin comes up heads and is not special. That first term of the disjunction, however, is redundant: if a coin is special, our prior knowledge already tells us that it must come up heads; therefore, we can simplify H∧S to S. Now we have,
H = S ∨ (H∧¬S), given K,
and, therefore,
P(H|K) = P((S ∨ (H∧¬S))|K).
We can break up the disjunction on the right-hand side using the sum rule for probabilities, which is given as:
P(A∨B) = P(A) + P(B) – P(A∧B).
Since our disjunction is of mutually exclusive propositions, the final term of the sum-rule expansion drops out; therefore,
P(H|K) = P(S|K) + P(H∧¬S|K).
Now let’s crack that new final term, P(H∧¬S|K). To do so, we’ll use the product rule for probabilities:
P(A∧B) = P(A|B) P(B) = P(B|A) P(A).
So:
P(H∧¬S|K) = P(H|¬S∧K) P(¬S|K).
And, already knowing that P(H|¬S∧K) = 1/2, we can simplify the right-hand side:
P(H∧¬S|K) = P(¬S|K)/2.
And substituting this reduction back into the equation for P(H|K) gives,
P(H|K) = P(S|K) + P(¬S|K)/2.
We can further simplify the equation by noting that P(S|K) + P(¬S|K) must equal 1 and, therefore, that the ¬S term can be rewritten in terms of S to give,
P(H|K)
= P(S|K) + (1 – P(S|K))/2
= (1 + P(S|K)) / 2.
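As a quick sanity check on the algebra, the following Python sketch computes P(H|K) straight from the sum-and-product expansion and compares it with the simplified form. The function name `p_heads` and the sample priors are illustrative choices, not anything from the derivation itself.

```python
from fractions import Fraction


def p_heads(p_special, p_heads_if_ordinary=Fraction(1, 2)):
    """P(H|K) = P(S|K) + P(H|¬S∧K) × P(¬S|K).

    The default of 1/2 for P(H|¬S∧K) is the symmetry assignment from
    the text; P(¬S|K) is written as 1 - P(S|K).
    """
    return p_special + p_heads_if_ordinary * (1 - p_special)


if __name__ == "__main__":
    for prior in [Fraction(0), Fraction(1, 10), Fraction(1, 2), Fraction(1)]:
        expanded = p_heads(prior)
        simplified = (1 + prior) / 2
        assert expanded == simplified
        print(f"P(S|K) = {prior}: P(H|K) = {expanded}")
```

Exact fractions are used so that the two forms can be compared with `==` rather than with a floating-point tolerance.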
Now, to bring it all home, let’s plug these values into our evidence-adjustment formula:
(evidence adjustment)
= P(H|S∧K) / P(H|K)
= 1 / P(H|K)
= 1 / [(1 + P(S|K)) / 2]
= 2 / (1 + P(S|K)).
And that’s our evidence-adjustment factor. Now, what does it do?
Adjusting our beliefs in light of the new evidence
To better understand what the evidence adjustment does, let’s recall the original belief-adjustment equation:
(new plausibility) = (old plausibility) × (evidence adjustment)
So the adjustment factor nudges our initial degree of belief, whatever it may be, one way or the other, depending on the evidence. To see the effect of this nudge for various initial degrees of belief, consider the following plot:
[Plot: posterior probability P(S|H∧K) as a function of prior probability P(S|K)]
Looking at the plot, let’s see if the nudge agrees with our intuition. First, if we were absolutely convinced that the coin is (or is not) special, no amount of evidence should sway our beliefs. Looking at the plot, we see that when our prior probability is 0 or 1, so is our adjusted (posterior) probability, exactly what we expected.
But what if our initial knowledge is complete ignorance about the coin being special? In that case, upon seeing the coin toss, our prior probability of 1/2 gets nudged to the posterior probability of 2/3 – toward the belief that the coin is indeed special. Again, it’s what we would expect.
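The nudge from 1/2 to 2/3 can also be checked by brute force. The Python sketch below is a rough Monte Carlo simulation under one extra assumption that the argument itself does not need: it treats a non-special coin as physically fair, whereas the text only claims we have no reason to favor heads or tails. The function name, trial count, and seed are arbitrary.

```python
import random


def simulate_posterior(prior, trials=200_000, seed=0):
    """Estimate P(S|H∧K) by simulation.

    Each trial draws a coin that is special with probability `prior`
    and tosses it once. Only trials that come up heads are kept, and
    we report the fraction of those in which the coin was special.
    """
    rng = random.Random(seed)
    heads = 0
    special_and_heads = 0
    for _ in range(trials):
        special = rng.random() < prior
        # A special coin always lands heads; otherwise assume a fair toss.
        came_up_heads = special or rng.random() < 0.5
        if came_up_heads:
            heads += 1
            special_and_heads += special
    return special_and_heads / heads


if __name__ == "__main__":
    estimate = simulate_posterior(0.5)
    print(f"simulated posterior: {estimate:.3f} (formula gives 2/3)")
```

The estimate should land close to 2/3, though the exact digits depend on the seed and the number of trials.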
In fact, the evidence adjustment is always going to push us toward confirming the belief that the coin is special because the evidence supports that belief. The force of that push, however, depends on how surprising we find the evidence, that is, how much it challenges our prior beliefs. The following plot shows this relationship:
[Plot: evidence adjustment 2 / (1 + P(S|K)) as a function of prior probability P(S|K)]
Note that the evidence provides the strongest push – a factor of 2 – when our prior knowledge makes us doubt most strongly that the coin is special. On the other extreme, when we are already convinced that the coin is special, observing that the coin comes up heads when tossed isn’t surprising at all, and correspondingly the evidential push of that observation is nothing: an adjustment factor of unity.
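The two relationships just described can be tabulated with a few lines of Python. This is only a sketch built on the formulas derived above; the function names and the choice of sample priors are mine.

```python
from fractions import Fraction


def evidence_adjustment(prior):
    """Evidence adjustment for one observed head: 2 / (1 + P(S|K))."""
    return Fraction(2) / (1 + Fraction(prior))


def posterior(prior):
    """P(S|H∧K) = P(S|K) × (evidence adjustment)."""
    return Fraction(prior) * evidence_adjustment(prior)


if __name__ == "__main__":
    print("prior  adjustment  posterior")
    for k in range(0, 11, 2):
        p = Fraction(k, 10)
        adj = float(evidence_adjustment(p))
        post = float(posterior(p))
        print(f"{float(p):5.2f}  {adj:10.3f}  {post:9.3f}")
```

The adjustment falls from 2 at a prior of 0 to 1 at a prior of 1, while the posterior stays pinned at 0 and 1 at the two extremes.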
Answering the original question
Finally, with our evidence adjustment well in hand, we can answer the original question: After seeing the outcome of this single coin toss, how much more should you believe my claim that the coin always comes up heads, compared to what you believed before the coin toss?
The answer, we reasoned earlier, is the evidence adjustment less one:
(relative plausibility increase)
= [evidence adjustment] – 1
= [2 / (1 + P(S|K))] – 1
= [2 / (1 + (prior plausibility))] – 1
= (1 – (prior plausibility)) / (1 + (prior plausibility)).
So, if we let p represent our prior degree of belief that the coin is a special, heads-always coin, we should be
100% × (1 – p) / (1 + p)
more confident in our belief after seeing the coin come up heads when tossed.
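For anyone who wants to plug in their own prior, here is a minimal Python sketch of that final formula. The name `relative_increase` and the sample priors are illustrative; the assertion simply confirms that the closed form agrees with "evidence adjustment less one."

```python
from fractions import Fraction


def relative_increase(p):
    """Relative increase in belief after one observed head: (1 - p) / (1 + p)."""
    p = Fraction(p)
    return (1 - p) / (1 + p)


if __name__ == "__main__":
    for p in [Fraction(1, 100), Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]:
        increase = relative_increase(p)
        # Same quantity computed as (evidence adjustment) - 1.
        assert increase == Fraction(2) / (1 + p) - 1
        print(f"prior {float(p):.2f}: {float(increase):.1%} more confident")
```

For a prior of 1/2, the function returns 1/3, which matches the move from 1/2 to 2/3 seen earlier.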
And that’s the answer.
But there are other ways of arriving at it. One of the more convenient is to use odds instead of probabilities. But let’s save that discussion for next time.
Posted in interesting stuff
Tags bayesian, coin, evidence, odds, probability, reasoning, toss
