Monday, March 1, 2010

The little optimization that couldn't

Let's say you represent two groups of people. If I collect one penny from each member of Group A, then, from the unbounded kindness of my heart, give one dollar to each member of Group B, has the total wealth of the groups combined become greater or less?

If you answered "greater", this article's for you! You see, there's a question you forgot to ask -- and it's a question that we as programmers forget to ask constantly.

A nicely optimized method?

In our internal core Java libraries at Google, we have a method called StringUtil.repeat(). It returns a string consisting of n copies of a base string -- so StringUtil.repeat("hey", 3) produces "heyheyhey". When I first came across it, and cleaned it up a bit, its implementation looked a little bit like this:

public static String repeat(String base, int count) {
if (base.equals("") || count == 0) {
return "";
}
if (count == 1) {
return base;
}
StringBuilder sb = new StringBuilder(base.length() * count);
for (int i = 0; i < count; i++) {
sb.append(base);
}
return sb.toString();
}

What's going on here? Well, there are four basic cases.

Case one: base is the empty string, so the result should always be the empty string. We don't want to loop and all that, so this optimization returns an empty string straight away.

Case two: count is zero. Here again, why do any actual work? We should return the empty string here and now.

Case three: count is one. We can avoid instantiating a new string by returning the original string directly!

Case four: aww nuts. In this case, we really do have to loop and create the new string. Well, at least we optimized out as much ass we could first!

In each case, we're very carefully doing the bare minimum amount of work we can! With me so far? Sounds good?

When I found this, I proceeded to make the method run even faster. Any guesses how I did it?

That's right, I simply removed all of these so-called optimizations.

Remember that to optimize a special case, you must check for that special case... in every case! That small extra cost goes to every single user of the method. And what of the benefits? Hah! Notice that the "optimized" special cases are the cases that are the fastest to compute anyway!

What's more, surprisingly enough, it doesn't really happen that often that users call a repeat() method passing zero or one as the count. Why would they? Commonly, that is. So we're "optimizing" a case that hardly even exists, at the expense of all the cases that do exist. The special-case checks were a net loss, and better off removed.

It's not like my removing them radically improved the performance of anyone's application. However, the experience is useful as a lesson. You'll encounter this same situation many times in many guises, and it will often be tempting to think about the benefits to the few rather than the overall aggregate cost to society as a whole (hmm... remind you of any politicians?)

If the Group A of my opening scenario has thousands of times more members than Group B -- not to mention that dollar really turns out to be a dime -- it's a bad deal. Just say no!