The Lurker

posted by ajf on 2006-08-17 at 11:59 pm

Is URL.hashCode() Busted?

The java.net.URL javadoc says what I'd expect: "Creates an integer suitable for hash table indexing." So I tried this:

        URL url1 = new URL("http://postsecret.blogspot.com");
        URL url2 = new URL("http://dorion.blogspot.com");
        System.out.println(url1.hashCode() + " " + url1);
        System.out.println(url2.hashCode() + " " + url2); 

and got this:

1117198397 http://postsecret.blogspot.com
1117198397 http://dorion.blogspot.com

I was expecting different hash codes. Either java.net.URL is busted or I'm blowing it and my understanding of the contract with java.lang.Object and its hashCode() method is busted.

I have no idea how I stumbled across this (probably browsing Javablogs). I was tempted to reply, but the blog didn't accept comments, so I decided to write about it on my own blog, six months later. Then I noticed that somebody else happened to describe the underlying bug just the other day.

The problem isn't with hashCode(): having the same hash code doesn't prevent two objects from being contained in the same Set (or being used as keys in the same Map). The problem is with the equals() method which, as Havoc Pennington pointed out, determines equality by resolving the host names and comparing the IP addresses.

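As a result, every RSS feed URL on blogspot.com (for example) compares equal to every other one with the same path, because all of those host names resolve to the same IP address.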

Oops.
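
To see why a shared hash code is harmless on its own, here is a minimal sketch using plain strings rather than URLs (so it needs no network access). The class name HashCollisionDemo and the helper distinctCount are made up for illustration; "Aa" and "BB" are simply a well-known pair of strings whose hash codes collide.

```java
import java.util.HashSet;
import java.util.Set;

public class HashCollisionDemo {
    // Hypothetical helper: adds every key to a HashSet and reports how many
    // distinct elements the set kept.
    static int distinctCount(String... keys) {
        Set<String> set = new HashSet<>();
        for (String key : keys) {
            set.add(key);
        }
        return set.size();
    }

    public static void main(String[] args) {
        // The two strings hash to the same value...
        System.out.println("Aa".hashCode() == "BB".hashCode());
        // ...but equals() still tells them apart...
        System.out.println("Aa".equals("BB"));
        // ...so the set keeps both of them.
        System.out.println(distinctCount("Aa", "BB"));
    }
}
```

A collision only sends both objects to the same bucket; it is equals() that decides whether the second one counts as a duplicate. That is why an equals() that treats different hosts as the same thing is the real problem.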

There is never a reason to use java.net.URL. This is just one example of its many bugs and design flaws. (I'm too lazy to look up the bug report about creating a URL for a filename containing spaces, where the space doesn't get escaped, which was resolved "won't fix"!) Use commons-httpclient if you need to retrieve content (or some other third-party library for non-HTTP protocols). Otherwise, just use String if you're treating the URLs as opaque identifiers, or java.net.URI if you need to manipulate them (to resolve relative URLs, for example).
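
For the java.net.URI route, a rough sketch might look like the following. The class UriDemo and its two helpers are hypothetical names, and the example.com address is a placeholder rather than anything from the original posts. URI compares its parsed components and never touches DNS, so the two blogspot addresses stay distinct.

```java
import java.net.URI;

public class UriDemo {
    // Hypothetical helper: compares two addresses as URIs, which looks only at
    // their parsed components and performs no host name lookup.
    static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    // Hypothetical helper: resolves a relative reference against a base URI.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // Different hosts, so the URIs are different, whatever they resolve to.
        System.out.println(sameUri("http://postsecret.blogspot.com",
                                   "http://dorion.blogspot.com"));
        // Placeholder base URL, used only to show relative resolution.
        System.out.println(resolve("http://example.com/blog/archive/index.html",
                                   "../feed.xml"));
    }
}
```

If the URLs are only ever used as map keys or set members, the String option is simpler still, since String equality is exactly what it looks like.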

All timestamps are Melbourne time.