Thursday, October 25, 2007

Understanding CLS

This post is about all of us helping each other to understand CLS. CLS is an acronym for "Class Loader Stuff."

Man, I just really don't understand Class Loader Stuff.

Doing what I do, it happens fairly often that someone busts into my office or starts an IM and says, "hey, you know a lot about Java, right? I need some help."

And I always get really excited whenever this happens. Because I just loooove answering questions about things I know! Who doesn't?

"Sure, sure!" I say, "make it quick though, cause I gotta run over to Neal's building and help him with some generics problems he's having" (this is what passes for humor in my cubicle).

And then it all goes downhill, because five seconds of the way in, I realize that they're asking about Class Loader Stuff.

Aw shit. Now I have to be all "um, well, remember how you were, um, asking if I was the guy who knew a lot about Java? yeah? Well, hehe, I was just kiddin, see. Pretty funny, huh?"

Nope, I don't understand class loaders and I don't understand all the fucked up problems that they seem to cause.

So here's what I'm going to do. In fact, I'm going to try to do this every time I become flummoxed by some major Java thing that I just don't get:

I'm going to come to this blog and explain the thing I don't understand to you.

How does that sound? It's really quite a novel privilege to be able to learn from someone who has no idea what they're talking about, I mean since graduation anyway, so I hope you'll enjoy this as much as I will!

Understanding CLS (Class Loader Stuff)

This post assumes that you understand, going in:


  • What a class is; sort of, at least


Because if you don't, then you're in even worse shape than me, man.

The easiest way to understand what a class loader is all about would be to understand the one and only one purpose that it has. Unfortunately, it has something like three purposes, so that's just not going to work.

Plan B. Let's just start with the simplest thing it does first. At the most basic level, a class loader is a thing that you can tell it a name and will give you some bytecodez.


Definition
Class loader. n. It's a thing that you can tell it a name and it will give you some bytecodez.


It's a function from class name to class bytecode. You give it a String containing a valid Java class name, and it will use some mechanism to find and return you the bytecode to use for that name. So one of the things that makes different Class Loader implementations different is that they may each use different mechanics for how to come up with that bytecode.


  • Some of them might look at your classpath, read and unpack a JAR file, and get the bytes of a file ending in .class outta there
  • Some of them might make an HTTP request to the porn site you're currently frequenting and retrieve the class files from there (BoobieCam.class, etc.)
  • Some of them might just make some shit up and give it to you, then laugh at you with their friends later
  • Some of them want to abuse you
  • Some of them want to be abused


So that was the easy part. Of course, there's more.

The class loader doesn't just come up with this byte[] containing the bytecode for the class; it also provides these bytes to the JVM in an act known as defining the class. This just means -- well, the instant before it does this, that class does not exist in memory, and the instant after, it does.

Every single class your JRE has in memory was placed there by some class loader. The class loader is an obstetrician, delivering new baby classes into the world.

Quick: what was the name of the person who delivered you when you were born? You probably don't even know, do you? But the weird thing about classes is: they know. Whenever a class loader defines a new class, that class contains an immutable reference back to the class loader instance that defined it. You can ask it yourself: clazz.getClassLoader().
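Here's a rough sketch of the two jobs described so far, plus the "classes know who delivered them" bit. Everything named here (ObstetricianDemo, Baby, ResourceLoader) is made up for illustration, and it assumes the compiled .class file can be read as a resource from the classpath. Treat it as one possible way to do it, not the way.

```java
import java.io.IOException;
import java.io.InputStream;

class ObstetricianDemo {
    /** The "baby" class we will define by hand. */
    public static class Baby {}

    /** Hypothetical loader: finds the bytes for a name, then defines the class. */
    static class ResourceLoader extends ClassLoader {
        ResourceLoader() {
            super(null); // parent is the bootstrap loader, which cannot find Baby
        }

        /** Job 1: name in, bytecode out. Here the bytes come from the classpath. */
        byte[] findBytes(String name) throws IOException {
            String resource = name.replace('.', '/') + ".class";
            ClassLoader source = ObstetricianDemo.class.getClassLoader();
            try (InputStream in = source.getResourceAsStream(resource)) {
                return in == null ? null : in.readAllBytes();
            }
        }

        /** Job 2: hand the bytes to the JVM, which brings the class into existence. */
        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            try {
                byte[] bytes = findBytes(name);
                if (bytes == null) {
                    throw new ClassNotFoundException(name);
                }
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Returns true if the freshly defined class points back at the loader that defined it. */
    static boolean babyKnowsItsLoader() {
        ResourceLoader loader = new ResourceLoader();
        try {
            Class<?> baby = loader.loadClass(Baby.class.getName());
            return baby.getClassLoader() == loader;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(babyKnowsItsLoader());
        // Classes defined by the bootstrap loader report null here.
        System.out.println(String.class.getClassLoader());
    }
}
```

One wrinkle worth knowing: for classes that came from the bootstrap loader, such as String, getClassLoader() returns null instead of a loader object.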

But now we're finally getting to the interesting part: every class in memory in your runtime environment can be uniquely identified by the pair of

(a) its full name
(b) the class loader that loaded it

These two things together form, in database terms, the "unique key" of that class within this JVM process. Another way to say it is that each class loader gets its own independent namespace in which it can define classes.

It's as if you asked me my name and I said, "I'm Kevin Dr. Bob Farquar" and you said, "nice to meet you, I'm Kevin Dr. Fenton Pulsifer" -- each of us known by the combination of our own name with the name of our obstetrician who delivered us.

When necessary, I will refer to "the class foo.Bar[A]" as a shorthand for "the unique class with the name foo.Bar which was loaded by class loader A".

You may have heard someone explain, or you may have explained yourself, "see, you can't cast a foo.Bar to a foo.Bar here even though it's the same class, because they came from different class loaders, so there's funny class loader hoodoo going on there." (My explanations are not usually that eloquent and cogent, but I try.)

Or you may have said, "Right, this class is a singleton, but that actually doesn't mean you have only one instance of the class per VM, it means you have one instance of this class per class loader."

But both of these explanations are incorrect. In the situations described, these are multiple different classes that have been loaded. They are not two different "versions" of the same class, they're just two different classes.

The class named foo.Bar defined by class loader A (shorthand: 'foo.Bar[A]') and the class named foo.Bar defined by class loader B ('foo.Bar[B]') have essentially nothing in common with each other. Just a name, and that's just coincidence, really. Having the same name as each other makes it perhaps more likely that they have the same bytecode as well, but this is irrelevant; they very easily may not have the same bytecode at all.

So when "the class foo.Bar" appears to have multiple different static states at the same time, or you sometimes see a ClassCastException for trying to cast a Bar to a Bar -- what's going on is not as mysterious as it first seems. You simply have two classes both using the same name.
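If you want to see this with your own eyes, something like the following sketch should do it. The names (SameNameDemo, Counter, IsolatingLoader) are invented for the example, and it assumes the .class file for Counter is readable as a classpath resource. Two loader instances each define their own class from the very same bytes; the two classes share a name and nothing else.

```java
import java.io.IOException;
import java.io.InputStream;

class SameNameDemo {
    /** The class we will define twice, once per loader. */
    public static class Counter {
        public static int count;
    }

    /** Hypothetical loader that defines classes itself rather than sharing anyone else's copy. */
    static class IsolatingLoader extends ClassLoader {
        IsolatingLoader() {
            super(null); // bootstrap parent only, so Counter always reaches findClass
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            ClassLoader source = SameNameDemo.class.getClassLoader();
            try (InputStream in = source.getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] bytes = in.readAllBytes();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    static Class<?> load(ClassLoader loader) {
        try {
            return loader.loadClass(Counter.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sets Counter[A].count, then reads Counter[B].count: the static state is separate. */
    static int countSeenByB(int valueSetOnA) {
        Class<?> a = load(new IsolatingLoader());
        Class<?> b = load(new IsolatingLoader());
        try {
            a.getField("count").setInt(null, valueSetOnA);
            return b.getField("count").getInt(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Tries to cast an instance of Counter[A] to Counter[B]. */
    static boolean castAcrossLoadersFails() {
        Class<?> a = load(new IsolatingLoader());
        Class<?> b = load(new IsolatingLoader());
        try {
            Object fromA = a.getDeclaredConstructor().newInstance();
            b.cast(fromA);
            return false;
        } catch (ClassCastException expected) {
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Setting the count on one class leaves the other untouched, and the cast blows up, which is exactly what you'd expect from two unrelated classes.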

So far, I've mentioned two things the class loader does: it (a) gets the bytecode somehow and (b) performs the actual action of defining the class in the VM. These two functions are quite separate. It happens often that class loader B will want to use the same mechanics for obtaining the bytecode as class loader A does, so it will delegate to class loader A for that part. Then once that's done, class loader B will be the one to define the class, so the class will live in class loader B's namespace, and class loader A's noble contribution to this whole affair is just forgotten by everyone.

So. You have this big old soup of classes in memory, some of them have the same names as other ones, but the pair of (name, class loader) is always unique. And there are no hard barriers between these groups of classes; that is, the class Foo[A] can extend, implement, or in any other way refer to the class Bar[B] which comes from a different class loader. There is nothing weird about that.

Except how can that even happen? If I'm loading class Foo, and it extends class Bar, isn't that going to automatically trigger the loading of class Bar, and by the same class loader that's currently loading Foo? Well, yes, it is -- but the class loader can be crafty!

When you ask class loader A to load class Foo, it can say "okay," then when this triggers a request for it to also load class Bar, it can say, "no way, I'M not loading THAT piece of tripe", and it can delegate that operation over to class loader B to carry out. (This is different from the example I gave earlier, where one class loader cruelly exploits another just to get the bytes, but still defines the new class itself. Here, the class loader lets the delegate define the new class itself.)

So now you have class Foo[A] and class Bar[B], and all the references from Foo to Bar will be interpreted as references to Bar[B], not Bar[Z] or whatever other Bars were sitting around.
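A minimal sketch of that crafty hand-off, under the same assumptions as before: the names (DelegationDemo, PickyLoader) are hypothetical, and the class files are assumed to be readable as classpath resources. Loader A defines Foo itself but passes the request for Bar along to loader B.

```java
import java.io.IOException;
import java.io.InputStream;

class DelegationDemo {
    public static class Bar {}
    public static class Foo extends Bar {}

    /** Hypothetical loader that defines classes itself, except one name it hands to a delegate. */
    static class PickyLoader extends ClassLoader {
        private final String refusedName;   // may be null, meaning "refuse nothing"
        private final ClassLoader delegate;

        PickyLoader(String refusedName, ClassLoader delegate) {
            super(null); // bootstrap parent only, so Foo and Bar reach findClass
            this.refusedName = refusedName;
            this.delegate = delegate;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (name.equals(refusedName)) {
                return delegate.loadClass(name); // the delegate defines it, not us
            }
            String resource = name.replace('.', '/') + ".class";
            ClassLoader source = DelegationDemo.class.getClassLoader();
            try (InputStream in = source.getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] bytes = in.readAllBytes();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads Foo through A and reports whether we ended up with Foo[A] extending Bar[B]. */
    static boolean fooInAExtendsBarInB() {
        ClassLoader b = new PickyLoader(null, null);
        ClassLoader a = new PickyLoader(Bar.class.getName(), b);
        try {
            Class<?> foo = a.loadClass(Foo.class.getName());
            return foo.getClassLoader() == a
                && foo.getSuperclass().getClassLoader() == b;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Defining Foo is what triggers the request for its superclass, and that request goes to A first; A just chooses not to answer it personally.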

I'll stop here for now, but if time permits, I will come back and explain how much I don't understand about:


  • the bootstrap class loader
  • the system class loader
  • the extension class loader
  • the application class loader
  • the context class loader

Value objects WTF!!?!!2!

You might recall that here at the smallwig we recently, geologically speaking, discussed the interesting and important topic of how to model a simple "value object" in the Java[TM] Technology Platform Language Technology[TM[TM]]. (note: not its exact name. I can never remember exactly how we're supposed to refer to J*va. F*ckin' J*va.)

Now, in that post, we considered a simple example -- a class with no special behavior, only two simple attributes.

We'd like this class to be simple, straightforward, well-behaved, idiomatic and correct.

Here's a look at how the code came out -- actually no, let's make it immutable this time, because immutable is simpler and easier. Here goes:


import com.google.common.base.Objects;
import java.io.Serializable;

public final class Foo implements Serializable {
  private final String text;
  private final Integer number;

  public Foo(String text, Integer number) {
    this.text = text;
    this.number = number;
  }

  public String getText() {
    return text;
  }

  public Integer getNumber() {
    return number;
  }

  @Override public boolean equals(Object object) {
    if (object instanceof Foo) {
      Foo that = (Foo) object;
      // oops! I cheated and used a helper class from the
      // Google Collections Library!
      return Objects.equal(this.text, that.text)
          && Objects.equal(this.number, that.number);
    }
    return false;
  }

  @Override public int hashCode() {
    // oops! I did it again! ha ha!
    return Objects.hashCode(text, number);
  }

  @Override public String toString() {
    return String.format("[Foo: %s, %s]", text, number);
  }

  private static final long serialVersionUID = 0xB0B15C00L;
}


(Incidentally, this is the point in the previous post at which I proceeded to engage in the professionally dubious activity of laying down a few good old-fashioned "F-Bombs". Please note that it is generally considered inadvisable to spew "foul language" in a "technology blog" which you dream will become "respected" one "day." However, in some circumstances this approach is actually appropriate, for the basic reason that THIS IS A ****ING LOT OF CODE FOR SOMETHING SO MOTHER****ING SIMPLE WHAT THE ****.)

So anyway!

We have a problem here. Here's the problem:


  • A value object is a commonly-needed thing.
  • This is too much code to have to write for such a commonly-needed thing.
  • It's easy to get some of the subtle details wrong.
  • If we write tests for these idiotic classes, we're wasting time; if we don't write tests for these idiotic classes, we find out later that they're buggy because, say, we forgot to use a null-safe equality check for a nullable field, or something.
  • Any special behavior you want to add to the class just gets lost in the sea of boilerplate.
  • Uh oh -- now you want your value object to be Comparable too, say by a lexical comparison of its fields. More code to write and rewrite.
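To make the "subtle details" point concrete, here is a small sketch of the null-safety mistake mentioned above. The class names are invented, and it uses java.util.Objects from later JDKs as a stand-in for the Google Collections helper used in the Foo example.

```java
import java.util.Objects;

class NullSafeEqualsDemo {
    /** Buggy: calls equals() on a field that is allowed to be null. */
    static final class NaiveValue {
        private final String text;

        NaiveValue(String text) {
            this.text = text;
        }

        @Override public boolean equals(Object object) {
            if (object instanceof NaiveValue) {
                NaiveValue that = (NaiveValue) object;
                return this.text.equals(that.text); // throws if this.text is null
            }
            return false;
        }

        @Override public int hashCode() {
            return Objects.hashCode(text);
        }
    }

    /** Fixed: the helper treats two nulls as equal and never dereferences null. */
    static final class SafeValue {
        private final String text;

        SafeValue(String text) {
            this.text = text;
        }

        @Override public boolean equals(Object object) {
            if (object instanceof SafeValue) {
                SafeValue that = (SafeValue) object;
                return Objects.equals(this.text, that.text);
            }
            return false;
        }

        @Override public int hashCode() {
            return Objects.hashCode(text);
        }
    }

    /** Returns true if the naive version throws when its field is null. */
    static boolean naiveThrowsOnNull() {
        try {
            new NaiveValue(null).equals(new NaiveValue(null));
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```

The two versions differ by one expression, which is precisely why the bug is so easy to write and so easy to miss in review.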


Now, what should we do?

Solution 1: Do nothing?

But this answer makes no sense. We've all learned time and time again the perils of code duplication. And this is egregious code duplication. Why should we tolerate it? We shouldn't.

Solution 2: IDE Templates to the rescue?

But wait, you say, I don't have to write this stuff, I just click-click-click-click in my IDE and it generates all of that for me! Problem solved!

The last thing I'll do is argue against this because "not everyone uses an IDE." I'll be totally honest with you: forget people who don't use an IDE. I'm sorry, you know, I believe in "to each his own" and all that, it's just that if "your own" is to "run away from tools that are there to help you and work really well", then I just can't save you from yourself. You know what I mean?

No, that's not it. Look: IDE-generated code is copy-and-paste code. That's all it is. The IDE has a template, it copies it, it pastes it, it changes stuff around. So why people who vehemently detest copy-and-paste coders would then go and have their IDE generate equals() and hashCode() and toString() and compareTo() and clone() methods for them I don't know.

Sure I've alt-Enter'd my way through the creation of many a class. I like generating constructors and automatically extracting fields. But I like it because it's a faster way to write the code that I could have written myself, and would have written the same way anyway.

But no, the equals() and hashCode() methods I've seen IDEs generate are hideously ugly. Which brings me to my other point: IDE templates are not a solution because they only address a small part of the problem. They make classes faster to write the first time, but they do nothing at all towards making your code easier to read or maintain.

Solution 3: Pair! Triplet! Quadruplet! .... McCaugheys?

Don't laugh (all right, laugh). A lot of people really are doing this. They're getting their equals() and all that for free by subclassing their objects from classes like Pair and Triplet and.... well, God, I really hope they just stop there. This brings up all kinds of subtle trouble. For example, you don't want someone's FooPair("a", 1) to be considered as equals() to someone else's BarPair("a", 1), but they kind of have to be, since Pair is a useful (if degenerate) collection class in its own right which demands the customary equals() behavior and subclasses deviating from that breaks "substitutability" and blah blah blah blah.
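A tiny sketch of that trouble, with a hypothetical Pair that has the customary fieldwise equals() and two unrelated subclasses built on it. None of these classes come from a real library; they are only here to show the shape of the problem.

```java
import java.util.Objects;

class PairDemo {
    /** Hypothetical general-purpose pair with ordinary fieldwise equality. */
    static class Pair<A, B> {
        final A first;
        final B second;

        Pair(A first, B second) {
            this.first = first;
            this.second = second;
        }

        @Override public boolean equals(Object object) {
            if (object instanceof Pair) {
                Pair<?, ?> that = (Pair<?, ?>) object;
                return Objects.equals(first, that.first)
                    && Objects.equals(second, that.second);
            }
            return false;
        }

        @Override public int hashCode() {
            return Objects.hash(first, second);
        }
    }

    /** Two "value objects" that get equals() for free by extending Pair. */
    static final class FooPair extends Pair<String, Integer> {
        FooPair(String text, Integer number) {
            super(text, number);
        }
    }

    static final class BarPair extends Pair<String, Integer> {
        BarPair(String text, Integer number) {
            super(text, number);
        }
    }

    /** Compares a FooPair with a BarPair holding the same contents. */
    static boolean fooEqualsBar() {
        return new FooPair("a", 1).equals(new BarPair("a", 1));
    }
}
```

With this Pair, a FooPair and a BarPair holding the same contents compare as equal, even though they are meant to be different kinds of thing.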

It's even worse when they don't bother with even this much, and have Pair showing up in their public API and all kinds of garbage.

Anyway, this is totally unscalable, so it's a non-solution and I don't think we should spend another column inch talking about this one.

Solution 4: We need a language change!

I once had this friend, a female friend, who was one of those rare people who had brains and beauty and a fun personality and wasn't a stark raving bitch, etc. But she had this problem that some people have, where she was incapable of ever falling in love with a guy unless that guy was somehow completely unavailable to her. She'd be smitten with him as long as he was married, or gay, or her faculty advisor, or her psychiatrist, or a minor, etc. But she kept never noticing the people who were right in front of her who she could have had any time she wanted. It was really sad.

Huh? Where was I? Oh yeah...

Solution 5: Reflectoporn

Here the idea is that you implement equals(), hashCode(), toString(), and compareTo() like this:


@Override public boolean equals(Object obj) {
  return Reflectomatic.equals(this, obj);
}


And these libraries would use reflection to look at all the fields of your class and do the expected fieldwise thing. If you had a field you didn't want to be considered for these purposes, perhaps you could annotate it to that effect.
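For the curious, here is one way such a helper might look. This is purely a sketch: Reflectomatic is a made-up name, this implementation is my guess rather than any real library's, and ReflectiveFoo is just a test subject. It only looks at fields declared directly on the object's own class, and it skips static fields.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; one possible implementation among many. */
final class Reflectomatic {
    private Reflectomatic() {}

    /** Collects the values of the instance fields declared directly on the object's class. */
    private static List<Object> fieldValues(Object object) {
        List<Object> values = new ArrayList<>();
        try {
            for (Field field : object.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue; // skip constants such as serialVersionUID
                }
                field.setAccessible(true);
                values.add(field.get(object));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return values;
    }

    /** Fieldwise equality for two objects of exactly the same class. */
    static boolean equals(Object self, Object other) {
        if (self == other) {
            return true;
        }
        if (other == null || self.getClass() != other.getClass()) {
            return false;
        }
        return fieldValues(self).equals(fieldValues(other));
    }

    /** Hash code derived from the same field values, so it stays consistent with equals. */
    static int hashCode(Object self) {
        return fieldValues(self).hashCode();
    }
}

/** Example value class that leans on the helper. */
final class ReflectiveFoo {
    private final String text;
    private final Integer number;

    ReflectiveFoo(String text, Integer number) {
        this.text = text;
        this.number = number;
    }

    @Override public boolean equals(Object obj) {
        return Reflectomatic.equals(this, obj);
    }

    @Override public int hashCode() {
        return Reflectomatic.hashCode(this);
    }
}
```

Comparing lists of field values gets null-safety for free, at the cost of allocating a list on every call, which is part of why people grumble about the speed of this approach.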

And in fact, these methods could be defined in an abstract base class which you could extend so you'd have to write even less code. Our example above might turn into:


public final class Foo extends AbstractValueObject {
  private final String text;
  private final Integer number;

  public Foo(String text, Integer number) {
    this.text = text;
    this.number = number;
  }

  public String getText() {
    return text;
  }

  public Integer getNumber() {
    return number;
  }

  private static final long serialVersionUID = 0xC11FF15C00L;
}


Hrm. Well.... this isn't tooo bad, actually. We're supposed to abhor reflection, though, aren't we? Demonized as being slow, isn't it? Doesn't it just feel like cheating?

Let's bookmark this idea and just go ahead and see if we can do any better.

Oh noe! This post is another goddamn teaser again!

Tune in next time when I discuss "classic" code generation, bytecode generation... and an idea which, unlike all the rest of these, might possibly be new to you! You can expect that post in... oh, certainly if not this year then definitely in the next one, I'm sure.

Wednesday, October 24, 2007

Monday, October 22, 2007

Collections update: I've integrated more stuff out to subversion (including a big ol' rewrite of the Preconditions class) and I've built and posted a downloadable zip containing jar, source, and javadocs. I hope you'll check it out.

Friday, October 19, 2007

I got a kick out of this observation from Jesse:


Here's my two problems with using it [the assert keyword] for business logic:


  • it can be turned off, which leads to bugs
  • it can be turned on, which leads to performance problems



AWESOME! Only two problems! So if we can just figure out how to plug those two, it will really work sweet!

I love that guy.

Thursday, October 18, 2007

Boy, I sure am getting in some trouble for leaving that teaser post on value objects and never coming back to it! That should teach me a lesson!

The worst part is that I've been avoiding blogging anything at all, because I keep thinking that I have to finish that topic up before I can say much else.

So: you'll get the conclusion to that when you get it. :-) I hope you'll find it was worth waiting for.

So what's up right now? Man, I am tired! I've really been busting ass here trying to get more and more of our collectiony goodness polished and pushed out to the open-source project. (Sometimes I do those two things in one order, sometimes in the other.)

A couple of things just went out. First, I've finished a major reworking of crazybob's ReferenceMap. (Ever wonder how Bob and I collaborate, on things like this, and Guice? One, he writes amazing, amazing shit. Two, I take his shit and I fuck with it. Big time. That's about how it goes.)

I have to tell you, I am really, really proud of ReferenceMap -- the kind of proud you are of something you know you only helped across the last mile, but still. I really think this thing is a beauty.

So what is it? It's the complete generalization of the concept in WeakHashMap; you tell it whether you want to use strong, soft or weak references for keys, and whether you want strong, soft or weak references for values. It does the rest; all nine combinations.

Was Bob the first person ever to think of this? Probably not. I think Apache Commons made theirs around the same time he first made this. But regardless, and I mean no offense to my Apache friends, I'm pretty sure this is the best implementation of this concept you're gonna find. It's fully concurrent -- it implements the ConcurrentMap interface and is backed by a ConcurrentHashMap. Reclaiming of entries happens concurrently as the garbage collector gets to them -- no extra cost to your application threads. And, of course, it's fully generified -- expect no less from anything in our library.

So try it out. Read the source if you're into that kind of thing. Let us know what you think. We're working on more related stuff... to come.

Next up, Cliff Biffle and I (again, mostly Cliff!) wrote a concurrent implementation of Multiset. We like to call it, ConcurrentMultiset. I won't urge you to run out and read it over, though, since I'm knee-deep in the middle of a huge ground-up revision to the Multiset API and documentation which will have ripple effects throughout all the Multiset implementations. When I'm done, you'll know, cause I'll show up here one day talking about how great multisets are all of a sudden.

I feel great these days about where the Google Collections are and where they're heading. I think when we get to 1.0 we're going to have a product you won't want to live without. I hope so, anyway -- and I guess it's that hope that keeps me at work past 11 on nights like this!

Friday, October 12, 2007

Of course, everyone keeps asking me how much I decided to pay for In Rainbows. They ask me since, well, I love Radiohead enough that, from the perspective of most of my friends, I appear to be the biggest Radiohead fan who ever lived. (I'm not; there are people who FAR outclass me, but yes, I love this band.)

And it was hard for me to figure out how much I should pay. I know that their cut on each of their previous albums was probably about a pound at most per copy sold, and this time without a record label or physical CD production or anything, they probably have only a small list of people they had to pay. So I could probably have paid 3 pounds and felt perfectly good about it.

But then, I just kinda felt like I owe them, you know?

It got me thinking. What if some malevolent supervillain developed the power to actually _unmake_ side one of OK Computer, to snuff it out of existence? It could never be heard again, and if I tried to remember how any of it went, I'd just draw a blank. How much ransom would I be willing to pay to stop that from happening?

The answer is probably an amount so high it would shock you. After all, it's JUST MONEY. WTF is money, anyway? And what I'd be saving, I just can't put a dollar amount on. I can hardly quantify the value of what they've given me, even in just those six songs, and then when you add in the Bends, and the National Anthem and Talk Show Host and all of that?

So anyway, no cheaping out for me. I went up to like 8 pounds or something like that.

Man, I tell you this: if I'm ever to be executed, let them shoot me through the head while I'm listening to Paranoid Android. Loud. I mean REALLY fucking loud. At the very end, when it all hits maximum intensity -- just kill me right then. I won't even know it. And who knows, maybe my consciousness would just somehow stay trapped for eternity in that moment. Is that what Heaven is?

Next post is sure to be something about Java, don't worry.