The Google-Microsoft squabble over Bing results: some completely uninformed speculation
Posted by Tom Moertel Wed, 02 Feb 2011 15:08:00 GMT
The recent spat between Google and Microsoft over Bing’s search results is entertaining as it is, but what it could really use is some completely uninformed speculation. Like this.
The story so far is that Google accused Microsoft of using Google’s search results for Bing. To gather evidence for this claim, Google seeded its search index with nonsense terms that appeared nowhere else on the Internet. For each of these terms, Google provided one search result, pointing to some legitimate page on the Internet. Then some Googlers did Google searches on those terms using new Windows laptops with IE8 and the Bing Toolbar enabled. Later, when they searched Bing for those same nonsense terms, Bing returned the same search results, results that could have come only from Google’s search engine.
Busted!
Or could there be another explanation? Another possibility, one that Google doesn’t seem to have considered (more on this later), is that Bing uses IE8 and the Toolbar to observe user behavior and, from this observational evidence, infer page relevance in general, not just when users are surfing Google’s search results. That is, if Microsoft sees you reading a page and notes that you typed “potato chips” into a form a bit earlier, it could take the relationship as a tiny piece of evidence that the current page is relevant to potato chips.
Now, I don’t know if Bing actually infers page relevance from user behavior, but if you’re watching what users are doing as they surf, it seems obvious that you could mine those observations for relevance evidence. Google uses its celebrated PageRank algorithm to mine relevance evidence from the web’s link graph, but Bing needs some other way. Why not watch users and see what they think is relevant?
If Bing does mine user behavior for relevance evidence, that puts Google’s sting operation in a new light.
In Google’s experiment, the only evidence of what was relevant to its nonsense search terms was the Googlers’ own surfing behavior. When they clicked on their own fake search results and surfed the pages those results pointed to, they sent Microsoft evidence that those pages were relevant to the earlier nonsense terms. And Bing recorded this evidence.
But, because those terms were nonsense, nobody on the Internet was searching for them, let alone visiting pages that might be “relevant” to them – except for the Googlers. Therefore, the only thing Bing had learned about those terms was what the Googlers had told it through their surfing behavior. It was denied all other evidence.
So, when somebody searched Bing for those terms, Bing tried to figure out what results were relevant and found those tiny pieces of evidence gathered from the Googlers’ surfing. And, in the absence of other evidence, those tiny pieces won out, and Bing coughed up the most relevant pages, which just happened to be the fake results from Google’s own search engine.
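To make the mechanism concrete, here is a minimal sketch of what “mining user behavior in general” might look like. It is pure guesswork about the shape of such a system, not a description of anything Bing actually does: the event format, the function names, the URLs, and the nonsense term are all made up for illustration.

```python
from collections import defaultdict

def build_evidence(sessions):
    """Count (query, page) co-occurrences across observed browsing sessions.

    Each session is a list of events, either ("query", text) or ("visit", url).
    A visit is credited to the most recent query typed in the same session.
    This is a hypothetical model, not a known implementation.
    """
    evidence = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        last_query = None
        for kind, value in session:
            if kind == "query":
                last_query = value
            elif kind == "visit" and last_query is not None:
                evidence[last_query][value] += 1
    return evidence

def rank(evidence, query):
    """Return pages for a query, strongest evidence first (ties by URL)."""
    pages = evidence.get(query, {})
    return sorted(pages, key=lambda page: (-pages[page], page))

# Made-up sessions: several ordinary users, plus one experimenter who
# typed a nonsense term and then visited a single page.
sessions = [
    [("query", "potato chips"), ("visit", "chips.example/reviews")],
    [("query", "potato chips"), ("visit", "chips.example/reviews")],
    [("query", "potato chips"), ("visit", "snacks.example/history")],
    [("query", "xqzzv-nonsense"), ("visit", "unrelated.example/page")],
]
evidence = build_evidence(sessions)

print(rank(evidence, "potato chips"))
print(rank(evidence, "xqzzv-nonsense"))
```

Note that nothing in the sketch knows or cares which site the query was typed into. For the nonsense term there is exactly one observation, so that single page is the whole ranking.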
The point is that Google’s sting operation would have had the same outcome, regardless of whether Bing was trying to borrow Google’s search results specifically or trying to mine user behavior in general. To put it in Bayesian terms, the following likelihood ratio is pretty close to one:
P(Google’s sting “busts” Bing | Bing borrows Google’s search results specifically)
divided by
P(Google’s sting “busts” Bing | Bing mines user behavior in general)
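Why a likelihood ratio near one matters: in the odds form of Bayes’ rule, posterior odds equal prior odds times the likelihood ratio, so evidence that is about equally likely under both hypotheses barely moves your beliefs. Here is a small sketch; the probabilities are invented purely for illustration and are not estimates of anything.

```python
def posterior_odds(prior_odds, p_evidence_given_h1, p_evidence_given_h2):
    """Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio."""
    likelihood_ratio = p_evidence_given_h1 / p_evidence_given_h2
    return prior_odds * likelihood_ratio

# H1: Bing borrows Google's results specifically.
# H2: Bing mines user behavior in general.
# Assumed, illustrative numbers: the sting "busts" Bing with high
# probability under either hypothesis.
prior = 1.0  # even odds before seeing the sting's outcome
after = posterior_odds(prior, p_evidence_given_h1=0.95, p_evidence_given_h2=0.90)

print(f"odds of H1 versus H2 after the sting: {after:.3f}")
```

With numbers like these, the odds of “borrowing specifically” versus “mining in general” stay within a few percent of where they started.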
But that’s not the interesting thing.
The interesting thing is that there are folks at Google who know how to use Bayes’ rule. And my guess is that they pointed out to their fellow Googlers that the sting, as evidence of Bing deliberately trying to use Google results, wasn’t persuasive. But Google went public anyway.
Why?
Here comes the completely uninformed speculation I promised.
Maybe Google is worried that Bing’s relevance evidence, if it comes directly from users, might in time become more reliable than Google’s own PageRank-derived evidence, now that SEO and content farms have thrown so much noise into the calculations.
So – more speculation – maybe what’s really going on is that Google is trying to focus the public’s attention on how Bing is getting its relevance evidence. If it turns out that Bing is watching over people’s shoulders as they surf, I can imagine a lot of people, including citizens’ groups, raising a fuss over it. Maybe some politicians take notice. You see where I’m going?
If monitoring user behavior lets Bing do an end-run around PageRank, Google might want to shut that play down by appealing to the referees. But Google can’t be seen as trying to take a competitive advantage away from Microsoft. So, what if Google put Microsoft into a position where it had to defend Bing’s results, and the only way it could make a credible defense was by admitting, in the public spotlight, it was spying on its users?
Anyway, it’s just completely uninformed speculation. What else have you got?

Bing is vindicated, as it did give results linking that gibberish word to the supposedly fake search results. If a search engine is supposed to infer patterns from user behavior, it did a good job, as it discovered that someone had connected the term to a result (it’s irrelevant that the purpose was a sting). If the search result could also show the purpose of the connection next to the result, that would be awesome.
A relevant post: http://willwhim.wordpress.com/2011/02/02/is-bing-cheating-at-search/
If what you said were the case, the bogus search term should have been connected to the actual Google search results… but no, they took Google out and inserted the final site it pointed to.
So now they’re basically using a competitor’s search results to improve their own! And you think it’s strange that Google doesn’t like this?
Improving your OWN search results by detecting what YOUR clients click on when they visit YOUR pages is okay, but doing it by looking at the pages they visit on your competitor’s site is just not done IMHO.
I still don’t understand the reasoning behind why Google would want to vilify Bing over this. Don’t they already have their own toolbar, presumably with its own clickstream? If they are out to crucify Bing for mining user data, then wouldn’t they be shooting themselves in the foot?
Quintesse, what you’re missing is that the Googlers who conducted the experiment actually clicked on Google’s own fake search results and visited the pages those results pointed to. So, if Bing wasn’t stealing Google’s results but only watching the pages users were visiting, those pages (what you call the “final site”) would have ended up in Bing’s index. And, again, if Bing wasn’t stealing Google’s results but only looking back in the observed user history to see what users may have typed into query forms before visiting those pages, Bing would have found the association between the nonsense search terms and the final sites without needing to use the intermediate Google-provided search results.
Chris, if Google is trying to vilify Bing over user-monitoring – and we’re just speculating, here – my guess would be it’s to exploit an asymmetry.
Microsoft has a monopoly on the desktop and could exploit it to capture huge amounts of user-generated evidence. If this evidence turns out to be highly effective for making relevance inferences (and I suspect it is), it could dominate other sources of evidence and diminish any competitive advantage those sources provide.
So, if Google thinks it beats Bing when using those other sources, it may wish to protect those sources from becoming dominated. One way it can protect them is by depriving Bing (but also itself) of user-generated evidence. That way, Bing must compete on grounds where Google has historically prevailed.