Monday, June 14, 2010

Guava release 05!

Hello guavateers,

I'm happy to (finally) inform you that Guava release 05 was quietly posted two weeks ago! It is also in the central Maven repository as com.google.guava:guava:r05 (as all future releases will be, so you may stop asking :-)).

Here is a report of the changes between r04 and r05. For that matter, here are the changes from r03 to r04 as well. There have been a few very nice new additions, as I hope you'll agree -- including the humble beginnings of a brand new package, com.google.common.net.

Remember that most newly-added classes and methods are marked @Beta, and so are still subject to change at any time. Note: this is particularly true of the cool new InternetDomainName class, several methods of which I already need to rename soon (sorry).

About the "Google Collections Library":

Everyone! It's time to stop using the library called the "Google Collections Library"! Guava represents a fully-compatible proper superset of that library. It also contains six months' worth of important bug fixes and improvements to performance and documentation.

Continuing to use Google Collections may lead to trouble when an application ends up with both it and Guava on the classpath at the same time. If this happens, and Guava comes later in the classpath, unpredictable breakages could result! (And if Guava comes earlier, then the google-collect JAR will never even be seen; either way, the situation is senseless.)

Tools like Maven seek to manage your dependencies in an intelligent way, but they cannot tell that Guava represents a newer version of Google Collections, so as long as the latter is still in use, they won't know the right thing to do with it.

Yes, the Guava JAR file is about 60% larger than the Google Collections one, but if this is an issue for you, we strongly recommend you address this using a JAR shrinking tool such as the much-loved ProGuard, and please share your experiences in doing so with the rest of us on this list.

Publicity?

With this release, I now see Guava as truly ready to be evangelized to the corners of the globe. Would you be able to help us spread the word? Blogs, comments, Twitter, podcasts, company discussion forums, skywriting, whatever you please. It's especially important to me that we convey the message that Guava is the new Google Collections, and no one should use google-collect-1.0.jar anymore.

Any questions?

Thanks everyone!

Monday, March 1, 2010

The little optimization that couldn't

Let's say you represent two groups of people. If I collect one penny from each member of Group A, then, from the unbounded kindness of my heart, give one dollar to each member of Group B, has the total wealth of the groups combined become greater or less?

If you answered "greater", this article's for you! You see, there's a question you forgot to ask -- and it's a question that we as programmers forget to ask constantly.

A nicely optimized method?

In our internal core Java libraries at Google, we have a method called StringUtil.repeat(). It returns a string consisting of n copies of a base string -- so StringUtil.repeat("hey", 3) produces "heyheyhey". When I first came across it, and cleaned it up a bit, its implementation looked a little bit like this:

    public static String repeat(String base, int count) {
      if (base.equals("") || count == 0) {
        return "";
      }
      if (count == 1) {
        return base;
      }
      StringBuilder sb = new StringBuilder(base.length() * count);
      for (int i = 0; i < count; i++) {
        sb.append(base);
      }
      return sb.toString();
    }

What's going on here? Well, there are four basic cases.

Case one: base is the empty string, so the result should always be the empty string. We don't want to loop and all that, so this optimization returns an empty string straight away.

Case two: count is zero. Here again, why do any actual work? We should return the empty string here and now.

Case three: count is one. We can avoid instantiating a new string by returning the original string directly!

Case four: aww nuts. In this case, we really do have to loop and create the new string. Well, at least we optimized out as much as we could first!

In each case, we're very carefully doing the bare minimum amount of work we can! With me so far? Sounds good?

When I found this, I proceeded to make the method run even faster. Any guesses how I did it?

That's right, I simply removed all of these so-called optimizations.
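For illustration, here is a minimal sketch of what the method might look like with the special-case checks stripped out. This is a reconstruction from the snippet above, not the actual Google-internal code, and the class name RepeatSketch is made up for the example. Like the earlier version, it ignores negative counts and integer overflow in the capacity computation.

```java
// Hypothetical holder class; only the repeat() body mirrors the earlier snippet.
final class RepeatSketch {
    private RepeatSketch() {}

    /** Returns a string consisting of count copies of base. */
    static String repeat(String base, int count) {
        // No special cases: an empty base, count == 0 and count == 1
        // all go through the same loop as everything else.
        StringBuilder sb = new StringBuilder(base.length() * count);
        for (int i = 0; i < count; i++) {
            sb.append(base);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(repeat("hey", 3));           // prints "heyheyhey"
        System.out.println(repeat("hey", 0).isEmpty()); // prints "true"
        System.out.println(repeat("", 5).isEmpty());    // prints "true"
    }
}
```

One visible difference: with a count of one, this version returns a new string equal to base rather than the same instance, which should not matter to any caller that compares strings with equals().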

Remember that to optimize a special case, you must check for that special case... in every case! That small extra cost goes to every single user of the method. And what of the benefits? Hah! Notice that the "optimized" special cases are the cases that are the fastest to compute anyway!

What's more, surprisingly enough, users don't really call a repeat() method with zero or one as the count all that often. Why would they? So we're "optimizing" a case that hardly even exists, at the expense of all the cases that do exist. The special-case checks were a net loss, and better off removed.

It's not like my removing them radically improved the performance of anyone's application. However, the experience is useful as a lesson. You'll encounter this same situation many times in many guises, and it will often be tempting to think about the benefits to the few rather than the aggregate cost to society as a whole (hmm... remind you of any politicians?).

If the Group A of my opening scenario has thousands of times more members than Group B -- not to mention if that dollar really turns out to be a dime -- it's a bad deal. Just say no!
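To put rough numbers on the opening scenario, here is a small back-of-the-envelope sketch. The group sizes are invented purely for illustration (the scenario never states them), and the class and method names are hypothetical.

```java
// Hypothetical helper for the penny-and-dollar scenario; all figures are made up.
final class NetBenefitSketch {
    private NetBenefitSketch() {}

    /** Net change in combined wealth, in cents: what Group B gains minus what Group A pays. */
    static long netChangeInCents(long groupASize, long centsTakenEach,
                                 long groupBSize, long centsGivenEach) {
        return groupBSize * centsGivenEach - groupASize * centsTakenEach;
    }

    public static void main(String[] args) {
        // Equal-sized groups: a penny from each of 10, a dollar to each of 10.
        System.out.println(netChangeInCents(10, 1, 10, 100));  // prints 990
        // Group A is 5000 times larger than Group B.
        System.out.println(netChangeInCents(5000, 1, 1, 100)); // prints -4900
        // Same sizes, but the dollar turns out to be a dime.
        System.out.println(netChangeInCents(5000, 1, 1, 10));  // prints -4990
    }
}
```

The answer is "greater" only when the groups are of comparable size. Once the paying group is large enough, the sign flips, which is the question about group sizes that is so easy to forget to ask.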