Thursday, December 6, 2007

Can you watch this without smiling?

If so... I don't know. You might be a replicant or something.

Why does Set.contains() take an Object, not an E?

Virtually everyone learning Java generics is initially puzzled by this. Why should code like the following compile?

Set<Long> set = new HashSet<Long>();
set.add(10L);
if (set.contains(10)) {
// we won't get here!
}

We're asking if the set contains the Integer ten; it's an "obvious" bug, but the compiler won't catch it because Set.contains() accepts Object. Isn't this stupid and evil?

A popular myth is that it is stupid and evil, but it was necessary because of backward compatibility. But the compatibility argument is irrelevant; the API is correct whether you consider compatibility or not. Here's the real reason why.

Let's say you have a method that wants to read from a Set of Foos:

public void doSomeReading(Set<Foo> foos) { ... }

The problem with this signature is it won't allow a Set<SubFoo> to be passed in (where SubFoo is, of course, a subtype of Foo).

To preserve the substitutability principle, any method that wants to read from a set of Foos should be equally able to read from a set of SubFoos, so let's tweak our signature:

public void doSomeReading(Set<? extends Foo> foos) { ... }

Perfect!

But here's the catch: if Set.contains() accepted type E instead of type Object, it would now be rendered completely unusable to you inside this method body!

That signature tells the compiler, "don't let anyone ask about containment of an object unless you are damn sure that it's of the exact right type." But the compiler doesn't know the type -- it could be a Foo, or SubFoo, or SubSubFoo, or who knows what? Thus the compiler would have to forbid everything -- the only safe parameter to a method like this is null.

This is the behavior you want for a method like Set.add() -- if you can't make damn sure of the type, don't allow it. And that's why add() accepts only type E while contains() accepts anything.

So the distinction I'm making is between read methods and write methods, right? No, not exactly -- notice that Set.remove() also accepts Object, and it's a write method. The real difference is that add() can cause "damage" to the collection when called with the wrong type, and contains() and remove() cannot.

Uniformly, methods of the Java Collections Framework (and the Google Collections Library too) never restrict the types of their parameters except when it's necessary to prevent the collection from getting broken.

So what to do about this vexing source of bugs, as illustrated at top? Well, when I typed that code into IntelliJ, it flagged a warning for me right away. This let me know to either fix the problem or add an annotation/comment to suppress it. Problem solved.

Static analysis plays an extremely important role in the construction of bug-free software. And the very best kind of static analysis is the kind that pops up in your face the second you write something questionable.

The moral of the story: if you're not coding in a good, modern IDE, you're coding with one hand tied behind your back!