Chapter One, Part IV

Recently I decided I needed—this minute!—the exact text of Bill Murray’s Caddyshack riff about toting the Dalai Lama’s golf bag. The punch line of the riff is “So I got that going for me, which is nice” and the Dalai Lama, in Murray’s telling, likes to say “Gunga galunga.” So I went to Google, the Internet search engine, typed in “going for me” and “gunga,” and hit the search button. A list of 695 Web pages came back. First on the list was an article from GolfOnline, which included the second half of the riff. That was okay, but third on the list was a Web site for something called the Penn State Soccer Club. The goalie, a guy named David Heist, had posted the entire monologue. The search took 0.18 seconds.

Then I needed to check out the Mulherin paper on the Challenger that I discuss above. I couldn’t remember the author’s name, so I typed in “‘stock market’ challenger reaction”: 2,370 pages came back. The first one was an article by Slate’s Daniel Gross about the Mulherin paper. The third was Mulherin’s own Web site, with a link to his paper. That search—which, remember, did not include Mulherin’s name—took 0.10 seconds. A few minutes later my search for the lyrics to a Ramones song about Ronald Reagan visiting the Bitburg cemetery took 0.23 seconds, and the first item on the list had what I needed.

If you use the Internet regularly, these examples of Google’s performance will not surprise you. This is what we have come to expect from Google: instantaneous responses with the exact page we need up high in the rankings. But if possible, it’s worth letting yourself be a little amazed at what happened during those routine searches. Each time, Google surveyed billions of Web pages and picked exactly the pages that I would find most useful. The cumulative time for all the searches: about a minute and a half.

Google started in 1998, at a time when Yahoo! seemed to have a stranglehold on the search business—and if Yahoo! stumbled, then AltaVista or Lycos looked certain to be the last man standing. But within a couple of years, Google had become the default search engine for anyone who used the Internet regularly, simply because it was able to do a better job of finding the right page quickly. And the way it does that—and does it while surveying three billion Web pages—is built on the wisdom of crowds.

Google keeps the details of its technology to itself, but the core of the Google system is the PageRank algorithm, which was first defined by the company’s founders, Sergey Brin and Lawrence Page, in a now-legendary 1998 paper called “The Anatomy of a Large-Scale Hypertextual Web Search Engine.” PageRank is an algorithm—a calculating method—that attempts to let all the Web pages on the Internet decide which pages are most relevant to a particular search. Here’s how Google puts it:


PageRank capitalizes on the uniquely democratic characteristic of the web by using its vast link structure as an organizational tool. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. Google assesses a page’s importance by the votes it receives. But Google looks at more than sheer volume of votes, or links; it also analyzes the page that casts the vote. Votes cast by pages that are themselves “important” weigh more heavily and help to make other pages “important.”

In that 0.12 seconds, what Google is doing is asking the entire Web to decide which page contains the most useful information, and the page that gets the most votes goes first on the list. And that page, or the one immediately beneath it, more often than not is in fact the one with the most useful information.

Now, Google is a republic, not a perfect democracy. As the description says, the more people that have linked to a page, the more influence that page has on the final decision. The final vote is a “weighted average”—just as a stock price or an NFL point spread is—rather than a simple average like the ox-weighers’ estimate. Nonetheless, the big sites that have more influence over the crowd’s final verdict have that influence only because of all the votes that smaller sites have given them. If the smaller sites were giving the wrong sites too much influence, Google’s search results would not be accurate. In the end, the crowd still rules. To be smart at the top, the system has to be smart all the way through.
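
To make the weighted voting concrete, here is a minimal sketch of a PageRank-style calculation in Python. It is an illustration only, not Google's actual system, whose details the company keeps to itself. The function name, the toy set of pages, the fixed number of passes, and the even spreading of rank from pages with no outgoing links are all assumptions made for this example. The damping value of 0.85 is the figure Brin and Page mention as typical in their paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance from links, treating each link as a vote.

    links maps each page name to the list of pages it links to.
    Returns a dict of page name -> score; the scores sum to 1.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Assumption: pages with no outgoing links spread their rank evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page splits its current rank among the pages it links to,
        # so a vote from a highly ranked page is worth more.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank

    return rank


# Hypothetical toy web: three small sites vote for one big site.
# The big site casts a single vote for target_a; target_b gets
# two votes, but only from small sites.
toy_web = {
    "big_site": ["target_a"],
    "small_1": ["big_site", "target_b"],
    "small_2": ["big_site", "target_b"],
    "small_3": ["big_site"],
    "target_a": [],
    "target_b": [],
}

ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

In this toy web, target_a ends up ranked above target_b even though it receives fewer links, because its one vote comes from a page that the smaller sites have themselves voted up. The big site's influence is borrowed entirely from the small ones, which is the sense in which the crowd still rules.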
