The Lurker


posted by ajf on 2002-10-17 at 10:34 pm

Daniel Brandt doesn't like Google's PageRank:

As Google explains, "Votes cast by pages that are themselves 'important' weigh more heavily and help to make other pages 'important.'" In other words, the rich get richer, and the poor hardly count at all. This is not "uniquely democratic," but rather it's uniquely tyrannical. It's corporate America's dream machine, a search engine where big business can crush the little guy. This alone makes PageRank more closely related to the "pay for placement" schemes frowned on by the Federal Trade Commission, than it is related to those "impartial and objective ranking criteria" that the FTC exempts from labeling.

How is it anything like "pay for placement"? Google doesn't get paid! Pay-for-placement is about selective favouritism. If a site is considered trustworthy by Google's algorithm, it is because the PageRank algorithm deems it so, not because of a special exception made by Google employees. Brandt seems to believe that there is something wrong with selectively favouring the "opinion" of highly respected sites, but in reality some measure of trust is vital both to determining the relevance of a page and to attack resistance. That an algorithm based largely on trusted link popularity has been so successful for four years now in determining web page relevance, while limiting the amount of trust an attacker can acquire, shows the absurdity of the argument. The dominance he fears is a result of satisfied consumers, who have no reason not to use other search engines (Teoma, AllTheWeb, Altavista) if Google's results are inadequate. Brandt says that there are several search engines that have made interesting advances in content analysis and even visualization, but Google is not one of them, as though a potentially better measure of relevance is more valuable than one which is effective today.
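For anyone who hasn't seen how "votes from important pages count for more" works in practice, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not Google's production algorithm, which isn't public. The function name, the toy link graph, the 0.85 damping factor and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (illustrative sketch only).

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with every page equal

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of inbound links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over everyone.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "hub" is linked from three pages, "d" from none.
toy_web = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["x"], "d": ["y"]}
ranks = pagerank(toy_web)
```

In this toy graph "x" and "y" each have exactly one inbound link, but "x" ends up ranked higher because its link comes from the well-linked "hub". Nobody at the search engine picked "x" by hand; the weighting falls out of the link structure.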

Brandt's remedies are even more absurd than his charges. He claims that Google ought to replace all mention of PageRank in their own public relations documentation, in favor of general phrases about how link popularity is one factor among many in their ranking algorithms, but provides no reason to do so. I fail to see how the use of a trademark affects anything of consequence. Removing the term "PageRank" from their web site will not affect search results in any way.

There's more wrong with Google than PageRank, apparently. Using GET to send search terms is evil too:

There are two methods for collecting your search terms, GET and POST. Search engines use GET because you can bookmark the search, link the search, and pass data inside the link. However, your search terms end up on the same line as your IP address in standard web logs all over the world with the GET method. This is "referrer" information, which is available to the distant webmaster every time you click on a link from a search results page. The webmaster knows that someone at your IP address accessed his page, and also knows what you were thinking from your search terms. That is why we use the POST method in our search proxy. You cannot bookmark the search, but neither can anyone see your terms. The search engine sees them, but all they know is that someone used those terms on our proxy at a certain time.

Only two items are shown below, the remote address and the "referrer" information. Not shown are these pieces of information, which would be on the same log line:

  • the date and time stamp
  • the name of the file that was accessed on our site
  • the status code (whether the request succeeded)
  • the number of bytes transferred
  • the type of browser used

The status code? The number of bytes transferred?! These are not privacy concerns; they're included here solely to make this list longer. The status code basically indicates whether or not the URL pointed to a valid web page, and the number of bytes transferred is the size of the web page, which is determined by what the web site decides to do with the request.
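To make the GET point concrete, here is a rough sketch of what a webmaster could pull out of one log line. It assumes the server writes the common Apache "combined" log format and that the search engine puts the terms in a `q` query parameter. The sample lines, the host names and the function name are invented for illustration.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Apache "combined" log format (assumed here):
# ip ident user [time] "request" status bytes "referrer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def search_terms_from_log(line, param="q"):
    """Return (ip, search terms or None) for one combined-format log line."""
    match = COMBINED.match(line)
    if match is None:
        return None
    # With GET, the search terms are part of the referring URL's query string.
    query = urlsplit(match.group("referrer")).query
    values = parse_qs(query).get(param, [])
    terms = values[0] if values else None
    return (match.group("ip"), terms)


# Invented sample lines: the first visitor arrived from a GET search, the second from a POST search.
get_line = ('192.0.2.1 - - [17/Oct/2002:12:00:00 +0000] "GET /page.html HTTP/1.0" '
            '200 5120 "http://search.example/search?q=embarrassing+rash" "Mozilla/4.0"')
post_line = ('192.0.2.1 - - [17/Oct/2002:12:00:05 +0000] "GET /page.html HTTP/1.0" '
             '200 5120 "http://search.example/search" "Mozilla/4.0"')
```

The first line gives up both the address and the search terms. The second gives up only the address, because a POSTed search leaves nothing in the referring URL. The status code and byte count are the same in both and say nothing about the visitor.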

Apparently the cookie Google uses to store preferences is suspect too:

The CIA had to stop using a comparatively innocent log-analysis cookie that expired in 10 years, and their document search site isn't even used by many people. Google handles 150 million searches per day, and their cookie expires in 2038. One of Google's leading software engineers, Matt Cutts, has a top-secret clearance and used to work for the National Security Agency. Google doesn't even feel the need to defend their cookie policy; they merely laugh off anyone who inquires about it.

I agree that the 2038 cookie expiry date is unnecessary and that, as Brandt argues in his letter to Google on the subject, a less extreme expiration period would be sufficient. (Though he suggests a 30 day expiration only reluctantly, as an alternative to a session cookie, which would expire when you shut down your browser and so be completely useless for storing preferences for your next visit. That suggests he doesn't really know what he's talking about.) But his comparison to the CIA and his utterly ridiculous and irrelevant reference to the NSA reek of a "Google is worse than Hitler!" argument. If somebody asked me a question about cookies while rambling about the CIA and the NSA, I'd ignore them too.
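For the record, the difference between a session cookie and a persistent one is a single attribute on the header. Here is a small sketch using Python's standard `http.cookies` module. The cookie name `prefs`, its value and the helper function are made up, and I've used `Max-Age` rather than an explicit expiry date purely to keep the example short.

```python
from http.cookies import SimpleCookie

THIRTY_DAYS = 30 * 24 * 60 * 60  # lifetime in seconds


def preference_cookie(value, max_age=None):
    """Build the value of a Set-Cookie header for a hypothetical 'prefs' cookie.

    With max_age=None the cookie has no lifetime attribute, so the browser
    treats it as a session cookie and drops it when it shuts down.
    """
    cookie = SimpleCookie()
    cookie["prefs"] = value
    if max_age is not None:
        cookie["prefs"]["max-age"] = max_age
    return cookie["prefs"].OutputString()


session_header = preference_cookie("en")
persistent_header = preference_cookie("en", max_age=THIRTY_DAYS)
```

Only the second header survives a browser restart, which is the whole point of a preferences cookie. Whether the lifetime is thirty days or thirty-odd years changes how long the setting sticks, not what the cookie contains.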

Related topics: Rants Mindless Link Propagation

All timestamps are Melbourne time.