The Lurker


posted by ajf on 2006-08-17 at 11:59 pm

Is URL.hashCode() Busted?

The javadoc says what I'd expect: "Creates an integer suitable for hash table indexing." So I tried this:

        URL url1 = new URL("");
        URL url2 = new URL("");
        System.out.println(url1.hashCode() + " " + url1);
        System.out.println(url2.hashCode() + " " + url2); 

and got this


I was expecting different hash codes. Either it is busted, or I'm blowing it and my understanding of the contract with java.lang.Object and its hashCode() method is busted.

I have no idea how I stumbled across this (probably browsing Javablogs). I was tempted to reply, then noticed that the blog didn't accept comments, so I decided to write about it in my own blog, six months later... and then I noticed that somebody else happens to have described the underlying bug just the other day.

The problem isn't with hashCode(): having the same hash code doesn't prevent two objects from being contained in the same Set (or being used as keys in the same Map). The problem is with the equals() method, which, as Havoc Pennington pointed out, determines equality by resolving the domain names and comparing the IP addresses.

every RSS feed URL on (for example) compares equal.
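To make the hash code point concrete, here is a minimal sketch that uses plain strings rather than URLs, so it needs no network access. The class and method names are mine, purely for illustration. "Aa" and "BB" happen to share a hash code, yet a HashSet keeps both, because equals() is what decides whether two elements are the same.

```java
import java.util.HashSet;
import java.util.Set;

public class SameHashSketch {
    // True when the two strings collide on hashCode().
    static boolean sameHash(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    // Number of elements a HashSet keeps after adding all the values.
    static int distinctCount(String... values) {
        Set<String> set = new HashSet<>();
        for (String v : values) {
            set.add(v);
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sameHash("Aa", "BB"));      // prints true
        System.out.println(distinctCount("Aa", "BB")); // prints 2
    }
}
```

A colliding hash code only costs a little lookup time. An equals() that returns true for two different things is what makes a Set or Map silently drop entries.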


There is no reason to use ever. This is just one example of its many bugs and bad design. (I'm too lazy to look up the bug report in which creating a URL for a filename containing spaces doesn't escape the space — which was resolved "won't fix"!) Use commons-httpclient if you need to retrieve content (or some other third party library for non-HTTP protocols). Otherwise just use String if you're using the URLs simply as opaque identifiers, or if you need to manipulate them (to resolve relative URLs, for example).
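For the "manipulate them" case, one option in the standard library is java.net.URI. That choice is my assumption, not something the text above names, and the example.com addresses and helper names are made up for illustration. As far as I know, URI resolves relative references and compares values purely by their textual components, without any DNS lookup. A rough sketch:

```java
import java.net.URI;

public class UriSketch {
    // Resolve a relative reference against a base; no network access involved.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    // URI.equals() compares the parsed components, not IP addresses.
    static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        // prints http://example.com/blog/feed.rss
        System.out.println(resolve("http://example.com/blog/2006/post.html", "../feed.rss"));
        // prints false: different host names are different URIs, whatever they resolve to
        System.out.println(sameUri("http://example.com/feed", "http://example.org/feed"));
    }
}
```

If the URLs really are just opaque identifiers, the String version is simpler still: plain String.equals() and String.hashCode() behave exactly as you'd expect in a Set or Map.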

Related topics: Rants Java Web Mindless Link Propagation

All timestamps are Melbourne time.