Odds and the evidence of a single coin toss

Tags: probability, odds, bayesian, coin-toss-problem, reasoning, evidence

A while ago I wrote about how to update your beliefs in light of the evidence contained within a single coin toss. My article ended with the tantalizing prospect of an easier way to do such updates using odds. That prospect is what we’ll explore in this article.

First, we’ll be using the same probability notation as before. If you’re unfamiliar with it, go back and review the earlier article.

Second, if you’re reading this article from a feed, there’s a good chance that the math is going to look screwy because MathML doesn’t travel well. If so, just click through to the original article for a proper rendering.

Now, to the good stuff.

Introducing odds

The odds on a proposition \(X\), written \(O(X)\), are just the ratio of the probability that \(X\) is true to the probability that it is false: \[O(X) = \frac{P(X)}{P(\neg X)} = \frac{P(X)}{1 - P(X)}\]

Thus given a probability \(p\) we can compute the equivalent odds \(o = p/(1-p)\). This formula maps (monotonically) the intuitive probability scale of \(p \in [0, 1]\) to the seemingly bizarre odds scale of \(o \in [0, \infty]\). Why would we want to do this? The short answer: odds are often easier to work with.

But before we see how odds can make calculations easier, let’s try to gain some intuition about what they represent. Here are some common degrees of belief encoded as both probabilities and odds:

| degree of belief | probability | odds |
|---|---|---|
| I’m certain it’s false | \(p=0\) | \(o=0\) |
| I’m certain it’s true | \(p=1\) | \(o=\infty\) |
| I have no idea | \(p=1/2\) | \(o=1\) |

For historical reasons, you’ll often see odds written as a ratio \(n:m\). These ratios can be normalized to the form \(o:1\) where \(o = n/m\). When \(n=m\), the odds are said to be “even” or “one-to-one” or “fifty-fifty”; all such ratios normalize to \(o=1\).
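To make the two scales concrete, here is a minimal sketch of the conversions in Python. The function names (`prob_to_odds`, `odds_to_prob`, `normalize_ratio`) are my own inventions for illustration, and I'm using exact fractions so the round trips come out clean; treat it as one way to write this down, not the canonical one.

```python
from fractions import Fraction
import math

def prob_to_odds(p):
    """Map a probability in [0, 1] to odds in [0, inf] via o = p / (1 - p)."""
    if not 0 <= p <= 1:
        raise ValueError("probability must lie in [0, 1]")
    if p == 1:
        return math.inf  # "I'm certain it's true" has infinite odds
    return p / (1 - p)

def odds_to_prob(o):
    """Inverse map: p = o / (1 + o), with infinite odds mapping back to 1."""
    if o < 0:
        raise ValueError("odds must be non-negative")
    if o == math.inf:
        return 1
    return o / (1 + o)

def normalize_ratio(n, m):
    """Normalize odds written as n:m (with m nonzero) to the number o = n/m."""
    return Fraction(n, m)

# The three rows of the table above.
certain_false = prob_to_odds(0)             # odds 0
certain_true = prob_to_odds(1)              # odds inf
no_idea = prob_to_odds(Fraction(1, 2))      # odds 1

# "Fifty-fifty" and "one-to-one" normalize to the same odds.
even_odds = normalize_ratio(50, 50)         # odds 1
```

Because the map is monotonic, converting to odds and back returns the probability you started with; for instance, `odds_to_prob(prob_to_odds(Fraction(3, 4)))` gives back `Fraction(3, 4)`.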

Updating beliefs using odds

If you’ll recall, Bayes’ rule tells us that the plausibility of a proposition \(X\) in light of some new evidence \(E\) is just the proposition’s prior plausibility multiplied by an adjustment factor for the new evidence:

\[(\mbox{new plausibility}) = (\mbox{old plausibility}) \times (\mbox{evidence adjustment})\]

In terms of probabilities, it looks like this:

\[P(X \mid E) = P(X) \times {P(E \mid X)}/{P(E)}\]

What about in terms of odds? Since odds relate the probabilities of \(X\) and \(\neg X\), let’s consider what happens to the probability of \(\neg X\) when we learn about new evidence \(E\). Just as with \(X\), it’s a straightforward application of Bayes’ rule:

\[P(\neg X \mid E) = P(\neg X) \times {P(E \mid \neg X)}/{P(E)}\]

Now let’s divide the first equation by the second:

\[\frac{P(X \mid E)}{P(\neg X \mid E)} = \frac{P(X) \times {P(E \mid X)}/{P(E)}}{P(\neg X) \times {P(E \mid \neg X)}/{P(E)}}\]

Breaking up the terms gives us pieces we can begin to recognize:

\[\frac{P(X \mid E)}{P(\neg X \mid E)} = \frac{P(X)}{P(\neg X)} \times \frac{P(E \mid X)}{P(E \mid \neg X)} \times \frac{1/P(E)}{1/P(E)}\]

The term on the left side and the initial term on the right side are now easily recognized as odds. The final term on the right side, involving our prior beliefs about \(E\), helpfully drops out altogether. (This is a big win, as we’ll see later.) What we’re left with is simple and easy to interpret:

\[O(X \mid E) = O(X) \times \frac{P(E \mid X)}{P(E \mid \neg X)}\]

In other words, it’s our familiar belief-updating rule, just in terms of odds:

\[(\mbox{new odds}) = (\mbox{old odds}) \times (\mbox{evidence adjustment})\]

But now the evidence adjustment is a ratio of two likelihoods (and is often called a “likelihood ratio”):

| adjustment for probabilities | adjustment for odds |
|---|---|
| \(\frac{P(E \mid X)}{P(E)}\) | \(\frac{P(E \mid X)}{P(E \mid \neg X)}\) |

Note the difference in denominators. It’s often easier to come by \(P(E \mid \neg X)\) than \(P(E)\). As a case in point, let’s return to our earlier coin-toss problem.
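Before we do, it's worth convincing ourselves that the two forms of the rule really do agree. The sketch below is just a sanity check with made-up numbers (the prior of \(1/5\) and the likelihoods \(9/10\) and \(3/10\) are hypothetical, as are the function names). Notice that the probability form has to assemble \(P(E)\) from the prior and both likelihoods, while the odds form never touches it.

```python
from fractions import Fraction

def update_probability(prior, like_x, like_not_x):
    """Probability form: P(X|E) = P(X) * P(E|X) / P(E)."""
    # P(E) is not given directly; expand it over X and not-X.
    p_e = like_x * prior + like_not_x * (1 - prior)
    return prior * like_x / p_e

def update_odds(prior_odds, like_x, like_not_x):
    """Odds form: O(X|E) = O(X) * P(E|X) / P(E|not X)."""
    return prior_odds * like_x / like_not_x

# Hypothetical numbers, chosen only for illustration.
prior = Fraction(1, 5)        # P(X)
like_x = Fraction(9, 10)      # P(E | X)
like_not_x = Fraction(3, 10)  # P(E | not X)

posterior = update_probability(prior, like_x, like_not_x)
posterior_odds = update_odds(prior / (1 - prior), like_x, like_not_x)

# Converting the posterior odds back to a probability gives the same answer.
assert posterior_odds / (1 + posterior_odds) == posterior
```

With these numbers the prior odds of \(1/4\) get multiplied by a likelihood ratio of \(3\), giving posterior odds of \(3/4\), which is the same belief as a posterior probability of \(3/7\).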

Using odds to update our beliefs in the coin-toss problem

If you’ll recall, I had claimed \(S\), that I had a special coin that always comes up heads. You then observed evidence \(H\), that when you flipped the coin it did indeed come up heads. My question to you was this: How much more should you believe \(S\) in light of \(H\)?

This “how much more” is exactly what our evidence adjustment represents. In terms of odds, it’s just the likelihood ratio

\[\frac{P(H \mid S)}{P(H \mid \neg S)}.\]

If the coin is special, we know it will come up heads; thus the numerator is \(1\). But if it’s not special, we have no reason to believe it’s more likely to come up heads than tails; thus the denominator is \(1/2\). And that’s all we need:

\[\frac{P(H \mid S)}{P(H \mid \neg S)} = \frac{1}{1/2} = 2.\]

Problem solved: In light of the coin coming up heads, we should double our odds on the coin being special.
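To see what doubling the odds does to an actual belief, here is a small sketch. The prior probability of \(1/100\) is purely an assumption for illustration; the likelihood ratio of \(2\) is the one we just computed.

```python
from fractions import Fraction

# Likelihood ratio from the text: P(H|S) / P(H|not S) = 1 / (1/2) = 2.
LIKELIHOOD_RATIO = Fraction(1) / Fraction(1, 2)

# Assumed prior: a 1-in-100 chance that the coin is special (hypothetical).
prior_prob = Fraction(1, 100)

prior_odds = prior_prob / (1 - prior_prob)              # 1/99
posterior_odds = prior_odds * LIKELIHOOD_RATIO          # 2/99
posterior_prob = posterior_odds / (1 + posterior_odds)  # 2/101
```

So under that assumed prior, one head moves us from odds of \(1/99\) to \(2/99\). The odds exactly double, while the probability goes from \(1/100\) to \(2/101\), which is a little less than double.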

Chaining updates

Notice that in this case the evidence adjustment is a constant multiplier of \(2\). In other words, regardless of our prior odds that the coin is special, when we see the coin come up heads, we should double our odds.

Exercise: What if you flip the coin three times and get three heads? How should this outcome affect your beliefs about \(S\)?

Solution: This three-flip experiment is just the one-flip experiment repeated three times. And each time we know what to do: double our odds on \(S\). Thus observing the three-heads outcome should cause us to scale our prior odds by \(2\times 2\times 2 = 2^3 = 8\).

Using odds, our belief updates become trivial. If the coin continues to come up heads in \(100\) follow-up experiments, we just scale the current odds by \(2^{100}\).
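Chaining is simple enough to fit in a one-line function. In this sketch the name `odds_after_heads` and the skeptical prior of one-in-a-million odds are assumptions of mine, picked just to show how quickly repeated doubling adds up.

```python
from fractions import Fraction

def odds_after_heads(prior_odds, n_heads):
    """Each observed head doubles the odds on S, so n heads scale them by 2**n."""
    return prior_odds * 2 ** n_heads

# Hypothetical skeptical prior: odds of 1 to 999,999 that the coin is special.
skeptical = Fraction(1, 999_999)

after_three = odds_after_heads(skeptical, 3)      # prior odds scaled by 8
after_nineteen = odds_after_heads(skeptical, 19)  # still below even odds
after_twenty = odds_after_heads(skeptical, 20)    # now above even odds
```

Since \(2^{20} = 1{,}048{,}576\), even this very skeptical prior crosses even odds after twenty straight heads.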

In contrast, recall that when we had solved the problem using probabilities, the evidence adjustment was not constant; it depended upon our prior probability of \(S\). If our prior probability was \(p\), our evidence adjustment was \(2/(1 + p)\). This dependence makes it harder to answer questions about repeated experiments because each experiment affects our current probability \(p\) and thus alters our evidence adjustment. No more easy chaining.
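Here is the same three-heads experiment done the hard way, as a sketch. Again the prior of \(1/100\) is only an assumed starting point, and `adjustment` is my name for the \(2/(1 + p)\) factor above.

```python
from fractions import Fraction

def adjustment(p):
    """Evidence adjustment for probabilities in the coin problem: 2 / (1 + p)."""
    return 2 / (1 + p)

p = Fraction(1, 100)  # assumed prior probability that the coin is special
factors = []
for _ in range(3):    # three heads in a row
    factor = adjustment(p)  # must be recomputed from the current p each time
    factors.append(factor)
    p = p * factor

# The same answer via odds: scale the prior odds by 2**3 and convert back.
odds = Fraction(1, 99) * 2 ** 3
assert p == odds / (1 + odds)
```

The three factors come out as \(200/101\), \(202/103\), and \(206/107\): each a bit smaller than the last, and none of them the tidy constant \(2\) we get with odds. Both routes end at the same belief, \(8/107\).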

So using odds in this case is a big win, as it often is when testing binary hypotheses.

The point

Odds and probabilities are two ways of encoding degrees of belief. They are equivalent but have different effects on calculation. Using one or the other will often make your life easier, so know how to use both.

P.S. If you find this stuff interesting, the best formal treatment I’ve found is Probability Theory: The Logic of Science, a delightful book by E. T. Jaynes. You can read the first three chapters online. The first chapter is especially rewarding; in it Jaynes builds the logic of “plausible reasoning” from the ground up.