Wednesday, August 3, 2011

Reminder: hashcode is not a key

I just wanted to share a reminder not to treat Object.hashCode() as a key. This seems very basic, but I have seen experienced developers get tripped up over this method. A number of people seem to think that hashCode can be relied on to return unique integers for each object value, especially for strings. Oops.

For example, I saw someone write code that logically works like this:

   1: Foo fooObject = new Foo();



   2: ...



   3: Map<String, Foo> mymap = new HashMap<String, Foo>();



   4: mymap.put(fooObject.toString().hashCode(), fooObject);




Even of Foo.toString() is guaranteed to be unique for each distinct value of Foo, it does not follow that you can map Foo objects to the string's hashcode like this. In fact, collisions are quite possible.



String.hashCode() does not guarantee a unique value for each string. This is mathematically impossible, if you think about it. After all, hashCode returns an int. What is tricky is that String.hashCode() generates values that are just different enough for strings that code like this can pass preliminary testing, and only fail later with real data that results in hash collisions.



Another example: at a developer conference I attended, a Java guru and conference keynote speaker described the new Java 7 String in switch feature as implemented by a switch on the string's hashcode. If that were true, the string in switch feature would have been practically useless. Fortunately, this is not true and we set him straight (but having an argument with the keynote speaker is not really a great way to start a developer conference).



The thing to remember about hashCode() is that it is really a performance aid for hashtables. Hash functions strive to, but need not return, distinct integer hash codes for different object values. If you read the Javadoc for Object.hashCode(), you will find that the following hash code implementation will satisfy the hashCode contract for any object type:





   1: public int hashCode() { return 0; }




Such a hash function will yield terrible performance when used for hashtables, but technically it is correct. Never rely on the value of hashCode for identity: that's what equals() is for.

No comments:

Post a Comment