Tuesday, April 29, 2008

JavaOne approacheth

Just got my sessions all scheduled. As usual, I chose them more for the speakers than for the topics; there are certain individuals who I just know can speak well and tend to talk about topics I like -- people like Brian Goetz, Bill Pugh, Cliff Click, Joshua Bloch, etc. Whatever they talk about, I go.

If you're attending this year, and you might like to meet me and chat about Java, collections, Guice, working at Google, the smallwig or whatever, well gosh, I'd like that too. What you can do is stop by the Google booth in the pavilion at one of these times:

  • Tuesday May 6, 2:00-3:00 pm
  • Wednesday May 7, 12:30-1:30 pm
  • Thursday May 8, 12:30-1:30 pm
And then look for the guy wearing a Google t-shirt who looks like this but needs a haircut. Say hello and we can talk about whatever. That would be cool.

Also, if you're a Guice user, please see if you can come to BOF-6400, The Future Of Guice, which is on Thursday at 7:30 pm. Bob, Jesse and I will all be there for an informal fireside chat about forthcoming Guice goodness.

Hope to see you!

And for those not coming to JavaOne, what conferences will you be attending over the next year, if any?

I'm a twit

For those who care about such things, I'm on twitter now.

Friday, April 25, 2008

Interesting Stuff I Read

I've added a link to my Google Reader shared items in my sidebar. You can view that, or subscribe to its feed, or whatever! (You don't have to be using Google Reader... though you should). :-)

Thursday, April 24, 2008

I get to break awesome news

I asked Josh if I could have the pleasure of breaking this news on my li'l blog here, and unbelievably, he actually said "sure."

Lucky me! Here's the news:

Effective Java, Second Edition by Joshua Bloch has gone to press and copies will be available at JavaOne in two weeks.

Hooray! We've been waiting for this for a long time.

Having read it (again, lucky), I'll quickly tell you my opinion (personal opinion only, not an endorsement by my employer, and feel free to disregard it as biased if you like).

You probably all know how valuable the first edition is already. The new edition really takes it a step further. It's vastly improved and has entire new sections on generics, enums, annotations, and other recent Java developments. The concurrency chapter was completely redone to reflect the "java.util.concurrent" new world order. There's a wealth of new information about serialization pitfalls and patterns, and the list goes on.

It is not just the Effective Java you know with a few extra chapters tacked on! Josh has painstakingly revisited every single line of every single page. I believe it shows.

This book will certainly replace its predecessor as the bible of our craft. Many of the code reviews I do for Java library code at Google basically end up with me spouting chapter and verse from EJ, and I can't wait for everyone to get the new edition so I can start doing the same with it!

(Not linking to amazon because I'm peeved at them; they let you click "see inside the book" but then they just show you the insides of the first edition, leading you to think that nothing has changed.)

Pure functions

To its detriment and yours, the Java language makes no distinction between a pure function and any plain old subroutine. Even in the core libraries, the two are freely intermingled, with no obvious distinguishing characteristic. Yet we can all benefit from striving to make this distinction clear in our own code.

By "pure function" I mean a function in the mathematical sense: it performs a calculation with no observable side-effects, and its result depends only on its arguments. Invoke it again on the same instance (or Class if static), and with the same arguments in the same states, and you must always get the same answer.
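To make the definition concrete, here is a minimal sketch. The class and method names are made up for illustration; only System.currentTimeMillis() and List.add() are real library calls.

```java
import java.util.ArrayList;
import java.util.List;

final class PurityDemo {
    // Pure: the result depends only on the arguments, and nothing outside is read or changed.
    static int sumOfSquares(int a, int b) {
        return a * a + b * b;
    }

    // Impure: reads the system clock, so two calls with the same (empty) argument list can differ.
    static long millisSinceEpoch() {
        return System.currentTimeMillis();
    }

    // Impure: mutates its argument, which is an observable side effect.
    static void appendGreeting(List<String> out) {
        out.add("hello");
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(3, 4)); // prints 25, every time

        List<String> greetings = new ArrayList<>();
        appendGreeting(greetings);
        appendGreeting(greetings);
        System.out.println(greetings.size()); // prints 2: the calls left a trace behind
    }
}
```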

What are some advantages of pure functions?

  • They're testable
  • They're thread-safe (though not necessarily "thread-correct", more on this later)
  • They're deterministic
  • They never need to be mocked out*
  • They're easier to understand and reason about
  • They're "referentially transparent," so they can be "memoized" (more on this later)

They're the easy kind of functions to work with, just like immutables are the easy variety of data objects.
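As a rough illustration of the memoization point, here is a sketch of a wrapper that caches results. Memoizer and its misses counter are hypothetical names invented for this example, and the class is deliberately not thread-safe. The caching is only sound because the wrapped function is assumed to be pure; wrap something that reads the clock and you will serve stale answers forever.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: caches the results of a function that is assumed to be pure.
final class Memoizer<A, R> implements Function<A, R> {
    private final Function<A, R> pure;
    private final Map<A, R> cache = new HashMap<>();
    int misses = 0; // counts how often the underlying function really ran

    Memoizer(Function<A, R> pure) {
        this.pure = pure;
    }

    @Override
    public R apply(A arg) {
        if (cache.containsKey(arg)) {
            return cache.get(arg); // safe only because the same argument always gives the same answer
        }
        misses++;
        R result = pure.apply(arg);
        cache.put(arg, result);
        return result;
    }
}

final class MemoizerDemo {
    public static void main(String[] args) {
        Memoizer<Integer, Integer> square = new Memoizer<>(x -> x * x);
        System.out.println(square.apply(7)); // prints 49
        System.out.println(square.apply(7)); // prints 49 again, this time from the cache
        System.out.println(square.misses);   // prints 1
    }
}
```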

(*About this particular claim. Have you ever felt compelled to test how your class behaves if the implementation of integer addition were to change? I doubt it, unless you're just plain batshit crazy, or a mathematician (but I repeat myself). In rare cases, if a pure function is very expensive, you may want to mock it anyway just to make your test runs faster. But you didn't "need" to do it.)

When is a function pure?

All its dependencies must be pure functions themselves (or constants, which are basically just pure functions that have no arguments). Impurity, just like it sounds, is a contaminant. If your method calls eight other methods, and just one of those calls a method which sometimes calls a method which uses System.currentTimeMillis(), kaboom: your function is not pure.

So a method which invokes new Random(5) may still be pure (as guaranteed by that class's specification), while one that invokes new Random() certainly is not. Collections.shuffle(), the two-argument form that takes an explicit Random, is pure, while Collections.shuffle(), the one-argument form, is not. (Wait, duh, neither is pure, because they mutate the passed-in list! But maybe you see the point anyway?) Now you see the "intermingling" I was bemoaning before!
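Here is a small sketch of the seeded case. shuffledCopy is a hypothetical helper, not a library method; it shuffles a private copy with an explicitly seeded Random, so the caller's list is untouched and the same input and seed give the same order on every call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class SeededShuffleDemo {
    // Pure from the caller's point of view: no argument is mutated, and the
    // result depends only on the input list and the seed.
    static List<Integer> shuffledCopy(List<Integer> input, long seed) {
        List<Integer> copy = new ArrayList<>(input);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6);

        // Same seed, same order.
        System.out.println(shuffledCopy(input, 5).equals(shuffledCopy(input, 5))); // prints true

        // Two generators with the same seed produce the same sequence.
        System.out.println(new Random(5).nextInt() == new Random(5).nextInt()); // prints true

        // The input itself was never touched.
        System.out.println(input); // prints [1, 2, 3, 4, 5, 6]
    }
}
```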

What are the most common sources of impurity in my code?

Some I can think of:
  • mutable state
  • the system clock
  • I/O

I'm sure there are more. Help me out here: what others can you think of?

Are impure functions evil?

No, of course not. If they were, I would never be able to write any, as it would be against company policy. They're simply very different from their pure cousins, and more challenging to work with and to test. Keeping your functions pure, like keeping your value objects immutable, just gives you less to worry about. (Remember that hit song "Mo' Mutatin', Mo' Problems?" Toootally analogous to that. Listen to Biggie, he knew.)

How to deal with impurity?

I've told you that the system clock is a contaminant that makes everything it touches impure. But, of course, some of your business logic probably needs to know the current time. Are you just hopelessly contaminated as well?

No! You have at your disposal a chlorine tablet called dependency injection! (You just knew it would come to that, didn't you?)

Before:

  public class SignUtils {
    public static String getCurrentMessage() {
      Instant now = new Instant(); // automatically set to now
      return someCalculation(now) ? "OPEN" : "CLOSED";
    }
  }

After (simplified):

  public class SignController {
    @Inject Clock clock;

    // An instance method now: a static method could not read the injected clock field.
    public String getCurrentMessage() {
      Instant now = clock.now();
      return someCalculation(now) ? "OPEN" : "CLOSED";
    }
  }

The result is a function which can be either pure or impure depending on what dependencies are provided for it. In "real life", you need it to be impure, and return a different result at 9:01 than it did at 8:59. But this nondeterminism has now been walled off behind an interface. Because the result of getCurrentMessage() itself now depends only on the states of its arguments (none) and the state of its instance, it will always be just as pure as its provided clock instance is. Now the code is testable, because we properly isolated the impurity.
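To show what that buys you in a test, here is a self-contained sketch under a few assumptions. It uses java.time.Instant and constructor injection in place of the @Inject field so that it runs without any framework or extra library; the Clock interface is a hypothetical one-method type, and isOpenAt is an invented stand-in for someCalculation (open from 09:00 to 17:00 UTC).

```java
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical one-method clock; production code would back it with the real system time.
interface Clock {
    Instant now();
}

final class SignController {
    private final Clock clock;

    SignController(Clock clock) {
        this.clock = clock;
    }

    String getCurrentMessage() {
        Instant now = clock.now();
        return isOpenAt(now) ? "OPEN" : "CLOSED";
    }

    // Invented stand-in for someCalculation: open from 09:00 (inclusive) to 17:00 (exclusive) UTC.
    private static boolean isOpenAt(Instant t) {
        int hour = t.atZone(ZoneOffset.UTC).getHour();
        return hour >= 9 && hour < 17;
    }
}

final class SignControllerDemo {
    public static void main(String[] args) {
        // Fixed clocks are pure, so the controller is pure too and the results never vary.
        Clock justBeforeOpening = () -> Instant.parse("2008-04-24T08:59:00Z");
        Clock justAfterOpening = () -> Instant.parse("2008-04-24T09:01:00Z");

        System.out.println(new SignController(justBeforeOpening).getCurrentMessage()); // prints CLOSED
        System.out.println(new SignController(justAfterOpening).getCurrentMessage());  // prints OPEN
    }
}
```

In production you would hand the controller a clock such as `() -> Instant.now()`, and the impurity comes back, but only through that one seam.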

In summary:
  • Pay attention to the difference between your pure and impure functions.
  • Use dependency injection to limit the damage radius of impure functions.
  • If you're designing the Next Great Language, ferchrissakes handle these two things differently. Don't make the system time available via a simple static method call.

Thanks for reading. Let me know if this kind of post is helpful to you!

Tuesday, April 22, 2008

fun with IdentityHashMap

What does this program print? (Eliding the generics so you can read it.)

public static void main(String[] args) {
  Map m = new IdentityHashMap();
  m.put("a", 1);
  m.put("a", 2);
  m.put("a", 3);
  System.out.println(new HashSet(m.entrySet()).size());
}


When you've got the answer, scroll down...






























The answer is 1. Even though this is an identity-based HashMap, String literals are interned, so after the first entry is created, it is overwritten two times leaving a map of size one. This single entry will then be placed into the HashSet, so the HashSet has size one.
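If the interning part seems like a technicality, this little sketch (the class and method names are mine) contrasts a repeated literal with a freshly allocated String that has the same contents. Only the keys' identities matter to the map.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class InternDemo {
    // Both keys are the same interned literal, so the second put overwrites the first.
    static int sizeWithLiterals() {
        Map<String, Integer> m = new IdentityHashMap<>();
        m.put("a", 1);
        m.put("a", 2);
        return m.size();
    }

    // new String("a") is equal to the literal but is a distinct object,
    // so an identity-based map treats it as a second key.
    static int sizeWithFreshString() {
        Map<String, Integer> m = new IdentityHashMap<>();
        m.put("a", 1);
        m.put(new String("a"), 2);
        return m.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeWithLiterals());    // prints 1
        System.out.println(sizeWithFreshString()); // prints 2
    }
}
```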

If you got it right, congrats. Now let's make a small change.

public static void main(String[] args) {
  Map m = new IdentityHashMap();
  m.put("a", 1);
  m.put("b", 2);
  m.put("c", 3);
  System.out.println(new HashSet(m.entrySet()).size());
}


Now what does it print?

Once you've decided on your answer, compile and run the code (sorry about the warnings). Were you right?

Update: Ok, this isn't doing the same thing for y'all that it was doing for me. And now it's not doing it for me either. :) Ok look. Try this: remove the call to .size(). Just print out the entry set itself. Guess what it's going to be first. Then see. It'll be worth it, really!

Monday, April 21, 2008

"   "

Who among my readers believes that he/she has a firm grasp on the meaning of the term "whitespace" as it applies to modern Java development?

Anyone?

Whitespace! How much simpler could anything be than that?

Yeah. Well guess what. I've found, so far, six conflicting definitions worth knowing about. They are summarized in this table for your viewing enjoyment. I daresay you will be surprised at how bad the situation is.
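For a quick taste of the disagreement, here is a sketch comparing four definitions that ship with the JDK. These are simply ones I can vouch for, not necessarily the same six as in the table, and the classify helper is a made-up name. Watch what happens to the non-breaking space.

```java
final class WhitespaceDemo {
    // Returns, in order: Character.isWhitespace, Character.isSpaceChar,
    // whether String.trim() removes the character, and whether regex \s matches it.
    static boolean[] classify(char c) {
        String s = String.valueOf(c);
        return new boolean[] {
            Character.isWhitespace(c),
            Character.isSpaceChar(c),
            s.trim().isEmpty(),
            s.matches("\\s")
        };
    }

    static void print(String name, char c) {
        boolean[] r = classify(c);
        System.out.println(name + ": isWhitespace=" + r[0] + " isSpaceChar=" + r[1]
                + " trimmed=" + r[2] + " regexS=" + r[3]);
    }

    public static void main(String[] args) {
        print("space", ' ');      // all four agree: true
        print("tab", '\t');       // isSpaceChar alone says false
        print("nbsp", '\u00A0');  // isSpaceChar alone says true
    }
}
```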

Wednesday, April 16, 2008

The real difference between List<Object> and List, illuminated at last

Suppose you're a store clerk, and a customer asks you, "what kinds of credit cards do you accept?"

The difference between List<Object> and List is basically the difference between answering this question "we accept all kinds," and answering it, "duuuuhhhhhhh?"
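In code, the "duuuuhhhhhhh?" looks something like the sketch below (the names are mine, for illustration). A List<Object> really does promise to hold anything, and the compiler holds it to that. A raw List promises nothing, so it can be aliased to a List<String> and quietly stuffed with an Integer; the failure only surfaces later, far from the culprit.

```java
import java.util.ArrayList;
import java.util.List;

final class RawListDemo {
    // "We accept all kinds": a checked promise.
    static int objectListAcceptsAnything() {
        List<Object> objects = new ArrayList<>();
        objects.add("text");
        objects.add(42);
        // List<Object> bad = new ArrayList<String>(); // would not compile
        return objects.size();
    }

    // "Duuuuhhhhhhh?": the raw type opts out of checking altogether.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rawListCorruptsStrings() {
        List<String> strings = new ArrayList<>();
        List raw = strings; // allowed, with a warning
        raw.add(42);        // allowed, with an unchecked warning
        try {
            int length = strings.get(0).length(); // the hidden cast to String fails here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(objectListAcceptsAnything()); // prints 2
        System.out.println(rawListCorruptsStrings());    // prints true
    }
}
```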

The smallwig theory of optimization

There are three kinds of optimization.
  1. Optimization by using a more sensible overall approach.
  2. Optimization by making the code less weird.
  3. Optimization by making the code more weird.
You've probably heard, and maybe even spouted yourself, the phrase "premature optimization is the root of all evil." It's exclusively "Type 3 optimization" that this aphorism applies to. Types 1 and 2 are quite fine to engage in pre-emptively.

To make a type 3 optimization, your burdens are six:
  1. Thou shalt have excellent, comprehensive unit tests.
  2. Thou shalt have a reliable benchmark, based on representative inputs.
  3. Thou shalt demonstrate that your change improves the benchmark.
  4. Thou shalt successfully argue that this improvement really matters.
  5. Thou shalt comment the code.
  6. In nontrivial cases, thou shalt also preserve the clear-but-slow implementation, to use in parity tests with your optimized implementation.
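Burden 6 can be sketched roughly like this. The bit-counting example and all the names are invented for illustration, and I make no claim that the "weird" version is actually faster (that is what burdens 2 and 3 are for). The point is only the shape: keep the obvious implementation around and check the clever one against it on lots of inputs.

```java
import java.util.Random;

final class BitCountParity {
    // Clear-but-slow reference: test each of the 32 bits in turn.
    static int slowBitCount(int x) {
        int count = 0;
        for (int i = 0; i < 32; i++) {
            if ((x & (1 << i)) != 0) {
                count++;
            }
        }
        return count;
    }

    // The "more weird" version: x & (x - 1) clears the lowest set bit,
    // so the loop runs once per set bit rather than once per bit position.
    static int fastBitCount(int x) {
        int count = 0;
        while (x != 0) {
            x &= x - 1;
            count++;
        }
        return count;
    }

    // Parity test: both implementations must agree on edge cases and on seeded random inputs.
    static boolean parityHolds(long seed, int trials) {
        int[] edgeCases = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : edgeCases) {
            if (slowBitCount(x) != fastBitCount(x)) {
                return false;
            }
        }
        Random random = new Random(seed); // seeded, so a failure is reproducible
        for (int i = 0; i < trials; i++) {
            int x = random.nextInt();
            if (slowBitCount(x) != fastBitCount(x)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(parityHolds(42L, 100_000)); // prints true
    }
}
```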
In all things, remember these truths:
  1. Your brain is a terrible profiler.
  2. Hotspot will outsmart you.
  3. It just doesn't matter, until it matters.
If you believe this post, please spread the word!