I’ve been working on a search engine, and one of the issues that I had to consider was how many bytes to allocate to page indexing. I decided to consult the master. My search engine isn’t scanning the whole internet, so it won’t come anywhere near the number of pages that Google is indexing, but Google’s figure gives me a reference point.
And I discovered something interesting. The copyright line at the bottom of Google’s front page states: “©2004 Google—Searching 4,285,199,774 web pages”.
2^32, the number of separate items that can be indexed by a 32-bit binary value, is 4,294,967,296. Google is very close to this, which leads me to suspect that their index has a 32-bit ceiling. The remaining 9.8 million or so pages may sound like a lot, but that is only about 0.2% of the total.
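For anyone who wants to check the arithmetic, here is a minimal Python sketch. The function names (`id_capacity`, `headroom`, `min_bytes_for`) are my own, made up for illustration, and the only input is the page count quoted from Google's front page. It says nothing about how Google actually stores its index; it only shows how much room a fixed-width page number leaves.

```python
# Page count quoted from Google's front page in 2004.
INDEXED_PAGES = 4_285_199_774


def id_capacity(bits: int) -> int:
    """Number of distinct values a `bits`-wide unsigned integer can hold."""
    return 2 ** bits


def headroom(bits: int, used: int) -> int:
    """How many more items fit before a `bits`-wide index runs out."""
    return id_capacity(bits) - used


def min_bytes_for(count: int) -> int:
    """Smallest whole number of bytes that can number `count` distinct items
    (numbering starts at 0, so the largest value needed is count - 1)."""
    if count <= 1:
        return 1
    bits_needed = (count - 1).bit_length()
    return (bits_needed + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    print("32-bit capacity: ", id_capacity(32))
    print("pages left:      ", headroom(32, INDEXED_PAGES))
    print("fraction used:   ", INDEXED_PAGES / id_capacity(32))
    print("bytes per page id:", min_bytes_for(INDEXED_PAGES))
```

The same `min_bytes_for` call answers my original question for a smaller index: plug in the number of pages you expect to hold and it gives the byte width needed for each page number.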
I wonder whether there is a hard limit on Google’s indexing capabilities, and what they are going to do about it, if anything.