The recent spat between Google and Microsoft over Bing’s search results is entertaining as it is, but what it could really use is some completely uninformed speculation. Like this.
The story so far is that Google accused Microsoft of using Google’s search results for Bing. To gather evidence for this claim, Google seeded its search index with nonsense terms that appeared nowhere else on the Internet. For each of these terms, Google provided one search result, pointing to some legitimate page on the Internet. Then some Googlers did Google searches on those terms using new Windows laptops with IE8 and the Bing Toolbar enabled. Later, when they searched Bing for those same nonsense terms, Bing returned the same search results, results that could have come only from Google’s search engine.
Or could there be another explanation? Another possibility, one that Google doesn’t seem to have considered (more on this later), is that Bing uses IE8 and the Toolbar to observe user behavior and, from this observational evidence, infer page relevance in general, not just when users are surfing Google’s search results. That is, if Microsoft sees you reading a page and notes that you typed “potato chips” into a form a bit earlier, it could take the relationship as a tiny piece of evidence that the current page is relevant to potato chips.
Now, I don’t know if Bing actually infers page relevance from user behavior, but if you’re watching what users are doing as they surf, it seems obvious that you could mine those observations for relevance evidence. Google uses its celebrated PageRank algorithm to mine relevance evidence from the web’s link graph, but Bing needs some other way. Why not watch users and see what they think is relevant?
If Bing does mine user behavior for relevance evidence, that puts Google’s sting operation in a new light.
In Google’s experiment, the only evidence of what was relevant to its nonsense search terms was the Googlers’ own surfing behavior. When they clicked on their own fake search results and surfed the pages those results pointed to, they sent Microsoft evidence that those pages were relevant to the earlier nonsense terms. And Bing recorded this evidence.
But, because those terms were nonsense, nobody on the Internet was searching for them, let alone visiting pages that might be “relevant” to them – except for the Googlers. Therefore, the only thing Bing had learned about those terms was what the Googlers had told it through their surfing behavior. It was denied all other evidence.
So, when somebody searched Bing for those terms, Bing tried to figure out what results were relevant and found those tiny pieces of evidence gathered from the Googlers’ surfing. And, in the absence of other evidence, those tiny pieces won out, and Bing coughed up the most relevant pages, which just happened to be the fake results from Google’s own search engine.
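To make that mechanism concrete, here is a minimal sketch of a toy relevance index built purely from observed behavior. Everything in it is an assumption made for illustration: the `BehaviorIndex` class, the weights, the made-up nonsense term, and the example URLs are all hypothetical, and nothing here is a claim about how Bing actually works.

```python
from collections import defaultdict


class BehaviorIndex:
    """Toy model (hypothetical): relevance evidence mined from user behavior."""

    def __init__(self):
        # evidence[query][page] -> accumulated weight of observations
        self.evidence = defaultdict(lambda: defaultdict(float))

    def observe(self, query, page, weight=1.0):
        """Record that a user typed `query` and later read `page`."""
        self.evidence[query][page] += weight

    def rank(self, query):
        """Return pages for `query`, strongest evidence first."""
        pages = self.evidence.get(query, {})
        return sorted(pages, key=pages.get, reverse=True)


index = BehaviorIndex()

# A common term: lots of users, lots of competing evidence.
for _ in range(500):
    index.observe("potato chips", "chips.example/reviews")
for _ in range(120):
    index.observe("potato chips", "snacks.example/history")

# A nonsense term: the only evidence is one tester's click on a seeded result.
index.observe("xqzzyplonk", "unrelated.example/some-page")

common_results = index.rank("potato chips")
nonsense_results = index.rank("xqzzyplonk")
```

For the common term, the single observation a tester might add is drowned out by everyone else’s. For the nonsense term, that single observation is the whole ranking, so the seeded page comes back as the top (and only) result.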
The point is that Google’s sting operation would have had the same outcome, regardless of whether Bing was trying to borrow Google’s search results specifically or trying to mine user behavior in general. To put it in Bayesian terms, the following likelihood ratio is pretty close to one:
$$
\frac{P(\text{Google’s sting “busts” Bing} \mid \text{Bing tries to reuse Google results})}{P(\text{Google’s sting “busts” Bing} \mid \text{Bing mines user behavior in general})}
$$
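Here is a small worked example of why a likelihood ratio near one matters. In odds form, Bayes’ rule says posterior odds equal prior odds times the likelihood ratio. The probabilities below are numbers I made up purely for illustration, and `posterior_odds` is a hypothetical helper name, not anything from Google or Microsoft.

```python
def posterior_odds(prior_odds, p_evidence_given_h1, p_evidence_given_h2):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    likelihood_ratio = p_evidence_given_h1 / p_evidence_given_h2
    return prior_odds * likelihood_ratio


# H1: Bing tries to reuse Google results.
# H2: Bing mines user behavior in general.
prior = 1.0               # assumed: no initial preference between H1 and H2
p_bust_if_copying = 0.95  # assumed: the sting almost surely "busts" Bing under H1
p_bust_if_mining = 0.90   # assumed: and almost as surely under H2

odds = posterior_odds(prior, p_bust_if_copying, p_bust_if_mining)
```

With these assumed numbers the odds move from 1.0 to roughly 1.06, which is to say they barely move at all. Evidence that is about equally likely under both hypotheses can’t tell you much about which one is true.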
But that’s not the interesting thing.
The interesting thing is that there are folks at Google who know how to use Bayes’ rule. And my guess is that they pointed out to their fellow Googlers that the sting, as evidence of Bing deliberately trying to use Google results, wasn’t persuasive. But Google went public anyway.
Here comes the completely uninformed speculation I promised.
Maybe Google is worried that Bing’s relevance evidence, if it comes directly from users, might in time become more reliable than Google’s own PageRank-derived evidence, now that SEO and content farms have thrown so much noise into the calculations.
So – more speculation – maybe what’s really going on is that Google is trying to focus the public’s attention on how Bing is getting its relevance evidence. If it turns out that Bing is watching over people’s shoulders as they surf, I can imagine a lot of people, including citizens’ groups, raising a fuss over it. Maybe some politicians take notice. You see where I’m going?
If monitoring user behavior lets Bing do an end-run around PageRank, Google might want to shut that play down by appealing to the referees. But Google can’t be seen as trying to take a competitive advantage away from Microsoft. So, what if Google put Microsoft into a position where it had to defend Bing’s results, and the only way it could make a credible defense was by admitting, in the public spotlight, that it was watching what its users do?
Anyway, it’s just completely uninformed speculation. What else have you got?