Posted by Anders
Sun, 26 Oct 2008 16:26:00 GMT
Last week I attended a small workshop on building with Maven, which quickly turned into a Buildr workshop once we realized that most of the attendees simply didn’t like Maven very much. It inspired me to write down some of the things I don’t like about Maven:
Verbosity
Maybe the problem is spelled “XML”.
Buildr shows that you can express essentially the same things as Maven’s pom files in an order of magnitude fewer lines and still be more readable. Good old Ant has the excuse of being created in a time when XML was hyped, but there’s no excuse for the masses of XML that Maven forces you to create and maintain.
Working With Legacy Code
Have you ever tried to introduce Maven to build a large legacy system? Because of Maven’s inflexibility, you can’t easily do this in small steps, incrementally moving closer to (Maven’s idea of) an ideal file and dependency structure. Instead you’re looking at doing it all in a single huge, messy, risky step.
Transitive Dependencies
Nice in theory. In practice this means Maven will download half the internet for you, while making you sensitive to every minor slip-up in any pom of any n-level indirect dependency you have.
Even if your code never invokes the rarely used parts of commons-something that actually depends on commons-kitchen-sink, you’re going to get commons-kitchen-sink. Or maintain the screenful of XML to override it.
Flexibility and Plugins
Most non-trivial projects have at least one detail that just doesn’t fit Maven’s model. In any other build tool that can be easily remedied with (often just a single line of) scripting. In Maven, the solution is plugins.
So your build ends up depending on abandoned third-party plugins just to accomplish the most trivial scripting tasks. So much for dependency management.
Unit Testing
When unit testing, you are only interested in two things: Are my tests at 100% and, if not, what is failing? (And, for bragging rights, how many tests do I have?) Just like Ant before it, Maven fails badly at this.
The (mostly useless) stdout from the tested code is puked to the terminal, drowning Maven’s own test summaries. The causes of failures, on the other hand, are hidden away in report files in some other directory.
I can see the use for “test reports” in some scenarios and tool sets, but as a default behavior for a unit test runner they’re absurd.
Build Output
Why is so much useless output written to the terminal during a build? Are they trying to impress Linux Kernel hackers?
Repositories
It’s good practice to keep the things you depend on under version control, among other things for repeatability. Maven’s take on this is to keep dependency meta-data under version control, and download the dependencies themselves from public repositories.
So now you’re depending on someone else’s public repository to be available (and correctly configured and maintained). If it’s not, you won’t be able to build on a freshly installed machine.
I’m told the recommended solution is to have your own local caching repo (which also allows you to depend on private or proprietary libraries). So now you have yet another server to maintain. Wouldn’t it be both less work and lower risk to just keep the jars under local version control?
(Also, the only caching repo I have tried, Artifactory, kept corrupting data. It was probably a problem with its embedded Jackrabbit content repository, but still).
File Structure
I’m forced to structure my files in a way the Maven developers, a group whose judgement and taste I usually disagree with, find ideal.
I could go on all day…
Posted in Java, Programming | Tags buildr, maven, rant | 15 comments
Posted by Anders
Mon, 25 Aug 2008 19:06:00 GMT
or “Hot Potato Exception Handling”
This is a common use of try-catch:
public void foo() {
    // Catch any exception so that the call to super is done anyway
    try {
        //...
    } catch (Exception e) {
        // Log
        // ...
    }
    // Call super last
    super.foo();
}
You could think that the purpose of the try-catch is to enable logging of the exception. But the first comment (taken from actual example code) suggests that the logging is just incidental. The purpose is to make sure that something is run no matter what. The logging is just a case of not knowing anything better to do with the exception once it’s caught.
Instead, why not use the simple try-finally:
public void foo() {
    try {
        //...
    } finally {
        super.foo();
    }
}
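We’re not “handling” the exception, but that’s probably good: it simply propagates to the caller once super.foo() has run. Unlike catch (Exception e), the finally block also runs for non-Exception throwables, such as Errors. I think people just forget that you can have a try-finally without the catch.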
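A minimal sketch of the difference, assuming a body that throws an Error rather than an Exception. The classes Base, Work, CatchingChild and FinallyChild are made up for illustration and are not from the original example code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes, for illustration only.
class Base {
    final List<String> log = new ArrayList<>();

    void foo() {
        log.add("Base.foo");
    }
}

class Work {
    // Stands in for the elided body; throws an Error, which is not an Exception.
    static void risky() {
        throw new Error("boom");
    }
}

class CatchingChild extends Base {
    @Override
    void foo() {
        try {
            Work.risky();
        } catch (Exception e) {
            log.add("caught");   // never runs here: an Error is not an Exception
        }
        super.foo();             // skipped, because the Error propagates
    }
}

class FinallyChild extends Base {
    @Override
    void foo() {
        try {
            Work.risky();
        } finally {
            super.foo();         // runs even though the Error keeps propagating
        }
    }
}
```

In both variants the Error still reaches the caller. Only the try-finally version gets the call to super.foo() done first.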
Posted in Java, Programming | 1 comment
Posted by Anders
Tue, 24 Jun 2008 22:00:00 GMT
...or some other provocative title.
The good old toString() method, with us since Java 1.0, has at least two different meanings:
- Displaying: How the object should appear to the user, in the GUI, on a web page, etc.
- Inspection: How the object should appear in debug output, logs, debugger tools etc.
Both are in some way “a string representation of the object”. The default implementation in java.lang.Object suggests inspection, e.g. “java.lang.Object@c37f31”, but many APIs, like AWT/Swing, use it for displaying the object to the user.
Problems
- It’s hard to tell which usage is intended when reading the code.
- Debuggers will use toString(), which can cause confusing side-effects.
- Since every object has a toString(), the IDE’s usage search becomes unusable.
- It’s hard to tell if a toString() method is dead code or not.
In addition, a lot of code implements it when it has a more specific meaning. For instance, generating HTML is better done as toHTML() than as toString().
What Others Do
Ruby solves this differently from Java. There are two methods, inspect() and to_s(), and Object provides a default implementation of both. This separates the two intentions, but still has to_s() available on every object.
We can’t do much about java.lang.Object, but we still have options.
Suggestion
Use toString() only for logging and debug output.
If the method has a more specific meaning, communicate that instead, e.g. title() or name().
If the value to display has a specific format you can communicate that instead, e.g. toHTML() or asLeetSpeak().
If the value to display is nothing other than a string, still avoid toString(). Call it something like displayString(), or maybe even asString() to avoid problems.
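As a rough sketch of the suggestion, here is a made-up Customer class that keeps the two meanings apart. The class, its fields and the exact output formats are assumptions for illustration only:

```java
// Hypothetical domain class, for illustration only.
class Customer {
    private final long id;
    private final String firstName;
    private final String lastName;

    Customer(long id, String firstName, String lastName) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
    }

    /** Inspection: for logs and debuggers only. Side-effect free. */
    @Override
    public String toString() {
        return "Customer[id=" + id + ", lastName=" + lastName + "]";
    }

    /** Displaying: what the user should see in the GUI or on a web page. */
    public String displayString() {
        return firstName + " " + lastName;
    }
}
```

A usage search for displayString() now finds only the places where the customer is shown to a user, and changing the toString() format can't break the GUI.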
Posted in Java, Programming | 6 comments
Posted by Anders
Wed, 04 Jun 2008 17:47:00 GMT
Notes on java.math.BigDecimal’s performance (in Java 1.5):
Sorting
BigDecimal’s compareTo method relies on both of the BigDecimals being in the same internal form. Internally BigDecimal uses either a BigInteger or, when possible, a native integer to represent its value. To compare two BigDecimals they’re both normalized (“inflated”) to the BigInteger form. This means that simply sorting a list of BigDecimals can cause memory use to increase. Not what you’d expect.
Serialization
Serialization of BigDecimal is surprisingly slow. Not only do they inflate their internal representation, just like when comparing, but they also use a lot of CPU for some reason. When serializing large graphs of objects of a lot of different classes, the BigDecimals stood out like a sore thumb in the CPU profile. Dumping them as String representations instead was quicker and didn’t use as much memory.
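One way to dump a BigDecimal as a String during serialization is to mark the field transient and write it by hand. This is only a sketch under assumptions: Price and SerializationDemo are made-up names, and whether it still pays off on a current JVM is something to measure rather than assume.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.math.BigDecimal;

// Hypothetical value class, for illustration only.
class Price implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: excluded from default serialization, written as a String below
    private transient BigDecimal amount;

    Price(BigDecimal amount) {
        this.amount = amount;
    }

    BigDecimal amount() {
        return amount;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeUTF(amount.toString());       // toString() keeps value and scale
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        amount = new BigDecimal(in.readUTF()); // rebuild from the String form
    }
}

class SerializationDemo {
    // Serializes and deserializes in memory, to show the round trip works.
    static Price roundTrip(Price original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Price) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```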
Posted in Java, Programming | Tags performance | no comments
Posted by Anders
Thu, 20 Dec 2007 09:27:00 GMT
Charles Miller nicely summarizes my opinions on Maven:
Paradoxically, by trying to make dependency management easy, maven makes it incredibly hard. It becomes dangerously easy for a project to accumulate dependency cruft – at best unnecessary, at worst conflicting – and excruciatingly painful to remove them.
Managing transitive dependencies by automatically traversing the entire dependency tree, the basic strategy of Maven, must be an anti-pattern. Managing them “manually” is a little more work, but will probably save time in the end and definitely lower risk.
Posted in Java, Programming | Tags links, maven | no comments
Posted by Anders
Fri, 14 Sep 2007 15:11:00 GMT
When JUnit 4.1 was released last year, they added a nice feature that has gone mostly unnoticed.
RSpec envy
Consider this archetypical RSpec example in Ruby. One class, Stack, being tested in two different scenarios (empty and non-empty):
describe Stack, " (empty)" do
  before(:each) do
    @stack = Stack.new
  end

  it "should have zero size" do
    @stack.size.should == 0
  end
  # ...
end

describe Stack, " (non-empty)" do
  before(:each) do
    @stack = Stack.new
    @stack.push 'x'
  end

  it "should have size greater than zero" do
    @stack.size.should > 0
  end
  # ...
end
Doing the same thing in JUnit would require us either to create two different classes, which makes our tests hard to follow, or to abandon the “before”-methods and initialize at the start of each test method. Other tools, like JDave, solve this by having an inner class for each scenario, but for various reasons JDave isn’t the solution for me.
Enclosed to the rescue
When browsing the JUnit source code, my colleague Rickard stumbled on the org.junit.runners.Enclosed class. Apparently it’s been part of JUnit since 4.1, released in 2006. Enclosed is a test runner that runs all the inner classes of a class as tests. It works perfectly within IntelliJ and other tools. Now you can have RSpec-style testing in JUnit, almost. Behold its goodness:
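import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;

// Stack is the class under test
@RunWith(org.junit.runners.Enclosed.class)
public class StackTest {

    // One static inner class per scenario, each with its own @Before
    public static class EmptyStack {
        private Stack stack;

        @Before
        public void before() {
            stack = new Stack();
        }

        @Test
        public void shouldHaveZeroSize() {
            assertEquals(0, stack.size());
        }
        // ...
    }

    public static class NonEmptyStack {
        private Stack stack;

        @Before
        public void before() {
            stack = new Stack();
            stack.push("x");
        }

        @Test
        public void shouldHaveSizeGreaterThanZero() {
            assertTrue(stack.size() > 0);
        }
        // ...
    }
}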
Posted in Java, Programming | Tags bdd, jdave, junit, rspec, testing | 2 comments
Posted by Anders
Fri, 07 Sep 2007 20:58:00 GMT
Open Source, System Architecture and Centralization
Thoughts on how Open Source software affects system architecture. Or, to be more precise, how the lack of per-server licensing affects it.
Last year, as an example, I worked on a project using JMS messaging. Originally the plan was to send messages to a local message broker application on the sending system (2-3 servers), through central message brokers in two data-centers (2-4 servers) and possibly brokers on every receiving system (lots and lots). Obviously we were planning to use an Open Source message broker, because if we had considered using a commercial product, we would never have come up with an architecture with so many brokers.
When the project was underway we had to switch to a commercial JMS broker. These products are big, “enterprisey” and expensive things, so suddenly we could only practically have four central servers, and they still would cost us a fortune. The result? Among other things worse performance, since the round-trip time to the central servers was longer than talking to a local broker. Even though the commercial software had much better high-availability support, the system was in practice more vulnerable because of its centralization. (Fortunately availability never became an issue, since the project failed and never reached production…)
Another, much better, example is Google. What would Microsoft charge them to run Windows on half a million servers? The kind of massively parallel architecture that Google uses did exist early on, but it wasn’t until Linux became freely available that it became popular, e.g. Beowulf clustering. Suddenly you could build a supercomputer out of scrap hardware, something you wouldn’t have dreamt of earlier.
When you can design your systems free of the licensing constraint, they tend to become more decentralized. Even though the total cost of a decentralized architecture may not be prohibitive, I think you subconsciously lean towards the centralized solutions when per-server licensing is in the picture. For such a major design constraint, it’s rarely put into numbers or even mentioned.
Posted in Java, Programming | Tags architecture, jms, oss | no comments
Posted by Anders
Sun, 08 Jul 2007 19:50:00 GMT
I’m currently developing a clustered back-end system for a website using Terracotta. Not yet in production, but here are some random observations so far:
Transparency
The same old “transparency” that every new framework claims? Yes and no.
You can’t ignore that you’re writing a clustered app. There are performance concerns, memory usage, some more obscure parts of the Java API that are unsupported, etc. There are details that you must know, so you really have to read the documentation. You also have to configure Terracotta for your application.
But after you’ve grasped the basic idea of how the clustering works, it’s pretty straightforward. When you have a Terracotta-friendly overall architecture going, the day-to-day work with business logic details isn’t much affected by it.
Testing
Since Terracotta doesn’t have an API, unit testing the application is very simple. There’s nothing that requires mocking and things work the same with or without clustering. You can unit test as if Terracotta didn’t exist, which is much better than your average framework. I think the ease of unit testing is one of the strongest points of Terracotta.
Since you’re unit testing without the cluster, there’s still a lot of functional testing required to make sure things really work. It does happen that things break when you run them clustered. But most problems we’ve seen are easily fixed configuration problems, or threading issues that we would have even without clustering.
Threading
Since Terracotta follows Java’s “memory model”, you just have to write a correctly threaded Java application for it to work. Unfortunately it’s very tricky to write such a thing.
The kind of sloppy thread programming you can often get away with, without any observable bugs, will not work. Fortunately Terracotta will catch a lot of problems directly, but you still have to have a good insight into Java threads to get things right. This ain’t Erlang.
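For illustration, here is what “correctly threaded” means for the simplest possible shared state, in plain Java. SharedCounter and CounterDemo are made-up names, and the Terracotta configuration mentioned above is left out entirely. Nothing in this sketch is Terracotta-specific.

```java
// Hypothetical shared object, for illustration only.
class SharedCounter {
    private long value;

    // Both the write and the read go through the same lock, so every
    // thread sees a consistent value. Dropping "synchronized" here is the
    // kind of sloppiness that often appears to work on a single JVM.
    synchronized void increment() {
        value++;
    }

    synchronized long get() {
        return value;
    }
}

class CounterDemo {
    // Starts the given number of threads, each incrementing the same counter.
    static long runThreads(int threads, int incrementsPerThread) {
        final SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();   // wait for all increments to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

With the synchronized methods in place, CounterDemo.runThreads(4, 10000) returns 40000 every time. Without them the result can come out lower, and not reproducibly so.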
Performance
The throughput and transaction rate look good; you can make a lot of changes per second. For our application, with its big data set, we’ve instead found the “virtual heap” speed to be what limits us. Paging in data from the server to the cluster nodes takes time. You can add new objects to the shared heap faster than you can page in old ones. This means we have to optimize our application for localized memory access. No real surprise.
Other
We’ve run into a few problems, small and big, but they’ve been quickly resolved with the help of the mailing list.
Posted in Java, Programming | Tags terracotta | 5 comments
Posted by Anders
Sun, 01 Jul 2007 20:20:00 GMT
About intentions, two stages in the history of programming and the corresponding stages in the individual programmer’s skills.
Stage A – How do I get it to do what I want?
In the beginning of a programmer’s learning there is a struggle to simply get the machine to do something, anything. Much like when you try to communicate in a spoken language you’ve just started learning, you can’t express some concepts, invent awkward statements instead of words you don’t know, and get things plain wrong.
You have a pretty clear idea what you want to say, but you struggle to communicate it.
“I need a lift in your el truck-o to the next town-o!”—Brad Pitt, The Mexican
In the early programming languages, there wasn’t much more to it than this. The language was very limited and even though it could theoretically express everything, you couldn’t communicate it. The higher-level concepts and ideas were not expressed in the language, but in the documentation or, more likely, remained in the programmer’s head. Even if you learned the language to perfection, it made little difference to what you could express.
Stage B – What do I want?
“I want a helmet. A cheese helmet. A helmet full of cheese. You just pop it on your head and eat all day.”—Denis Leary
Once you master a language, the focus shifts away from the language itself. If you know what you want to say, you can often express it directly, less limited by the language. Ever higher levels of abstraction and expressiveness in languages make this increasingly simple. (Learning the language in the first place isn’t necessarily any simpler.) Constructs like classes allow you to express very abstract concepts and ideas in just a few statements.
The original purpose of a language, communication, can now be achieved. The problem is now to find out what you want, your intention, so that you can express it.
Understanding and Documentation
While most of what programmers do these days is firmly in “B”, there are a lot of ideas with roots in “A” that still linger in our minds. Our ideas about how and when to document our code often have their origins in a time when the languages couldn’t serve the purpose of communication. There was no option but to annotate everything with natural language if you wanted to communicate. When we left machine code for procedural languages this need was reduced a lot (to merely procedure-level comments), and object-oriented languages reduced it even further (maybe to just class-level comments). This is all under the assumption that the language’s abilities are actually used.
The intensive documentation required in “A” also sometimes created the belief that the documentation should be extensive enough for someone, with no prior knowledge of the system, to understand it directly.
But just like in any natural language, the individual statements don’t make sense unless you know their context. In a program the context can be very big, ranging from knowledge about the problem domain to the technical solutions preferred in the programming team, and it cannot be expected to be learned from low-level documentation within the code.
The purpose of the code is communication with the other programmers working on the system (and with the compiler). Like any natural language, you use people’s shared context to make the communication more efficient, at the expense of the outsider’s understanding of it.
Getting to B
“I CAN WRITE COBOL IN ANY LANGUAGE”—Unknown
While a beginner may naturally be in “A”, there is no guarantee that experienced programmers will be in “B”. While it is a requirement to know the language well to do “B”-style programming, it isn’t enough. Just using a modern language doesn’t make our code more communicative. We can still write long methods with three-letter variable names in Java, as we did in C. We can still organize and name our Ruby classes entirely without connection to our problem domain.
We have to strive to keep accidental complexity and technicalities from creeping into how we write and organize our code. Using good naming on everything from variables up to modules, we have to keep the code close to the intentions it originates from. When the intentions change, we have to use Refactoring to make the code reflect the new conditions. If we don’t use these new opportunities, we’re also stuck with the need for heavy documentation and other practices of “A”. Even if we don’t write code for the “outsider” audience, we still need to communicate as clearly as possible.
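A small, made-up illustration of the naming point: the same calculation written “A”-style and “B”-style. Cryptic, Order and the discount rule are assumptions invented for this sketch.

```java
// "A"-style: it works, but the intention stays in the programmer's head.
class Cryptic {
    static double calc(double amt, int cnt, double pct) {
        double tot = amt * cnt;
        return tot - tot * pct / 100;
    }
}

// "B"-style: the same arithmetic, named after the (hypothetical) problem domain.
class Order {
    private final double unitPrice;
    private final int quantity;

    Order(double unitPrice, int quantity) {
        this.unitPrice = unitPrice;
        this.quantity = quantity;
    }

    double subtotal() {
        return unitPrice * quantity;
    }

    double totalAfterDiscount(double discountPercent) {
        double discount = subtotal() * discountPercent / 100;
        return subtotal() - discount;
    }
}
```

Both versions compute the same number. Only the second one tells the reader that it is an order total with a percentage discount, without needing a comment.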
“Getting the words right”
The big challenge is to know what your intentions really are. Unless they are truly clear in our minds, they can’t be clear in the code. This is where design tools like Test-Driven Development (TDD) are a big help. By forcing us to reason about and express what we really want our code to do, TDD makes it easier to write code that also communicates this intention.
The more recent Behavior-Driven Development (BDD) concept emphasizes the TDD idea that the tests themselves should be used for communication. BDD tools typically encourage you to state your intentions in natural language, together with code that verifies each intention (paradoxically similar to the early documentation practices). Unlike other forms of documentation, it is harder for this one to become out-of-date and misleading.
The TDD/BDD ideas seem like a parallel, but orthogonal, development to that of the modern languages. They add more ways of communicating, while at the same time helping us use the existing ways better.
Thus
There are a lot of opportunities in “B”, as long as you actually take advantage of them. For novices, there is the need to master their programming language, to gain the ability to express themselves in it. For experienced programmers, it’s important to really use the abilities to communicate that modern languages provide, supported by the available tools.
Posted in Ruby, Java, Programming | Tags language, rambling | no comments
Posted by Anders
Fri, 18 May 2007 23:20:00 GMT
On a common trap that a lot of us have walked into as our Java skills deepened:
You’re solving some programming problem involving multiple types/classes when you realize that you can make great use of Java’s method overloading. Just implement one method for each of the classes, and you’re done! In this example, two subclasses of Person:
void doStuff(Worker worker) {
    ...
}

void doStuff(Capitalist capitalist) {
    ...
}
Now if you just call doStuff like this:
Person person = ...
x.doStuff(person);
it should all work, right? It should call the correct method, depending on which class person is. You probably feel a slight rush of pride when you look at how simple your solution is.
But of course it doesn’t work. Java chooses between overloaded methods at compile-time, based on the declared types of the arguments [1]. You need to cast person to one of the subclasses; otherwise the compiler will complain that there is no doStuff method that accepts a Person. You mean “either”, but the compiler can’t help you.
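A minimal sketch of what is going on. To make it compile, a doStuff(Person) overload has been added here, which the example above does not have. The role() method and the Dispatcher class are also made up for illustration.

```java
// Hypothetical class hierarchy, for illustration only.
class Person {
    String role() {
        return "person";
    }
}

class Worker extends Person {
    @Override
    String role() {
        return "worker";
    }
}

class Capitalist extends Person {
    @Override
    String role() {
        return "capitalist";
    }
}

class Dispatcher {
    // Added for this sketch so that a call with a Person argument compiles.
    String doStuff(Person person) {
        return "doStuff(Person)";
    }

    String doStuff(Worker worker) {
        return "doStuff(Worker)";
    }

    String doStuff(Capitalist capitalist) {
        return "doStuff(Capitalist)";
    }
}
```

With `Person person = new Worker();`, the call `new Dispatcher().doStuff(person)` picks doStuff(Person), because the overload is selected from the declared type of the variable. The call `person.role()`, on the other hand, returns "worker", because overridden methods are looked up at run-time on the actual object. Moving the behaviour into an overridden method on the classes themselves is one way to get the dispatch that the overloads seemed to promise.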
At this point there is often bitter disappointment and anger, usually directed at the compiler or language (“jävla skitspråk!”, Swedish for roughly “damn crappy language!”). Once that subsides, you know you have learned something. Even programmers with good knowledge of Java seem to walk into this trap, which is what made me notice this phenomenon. Even though you know all the theory of how Java works, you can still get surprised by the practical implications.
For me, it was around 1999 at my first job. I can’t remember the problem I was trying to solve, but I do remember the surprise and disappointment. I last witnessed it just a few weeks ago. When I mentioned the phenomenon to a friend, he recalled having gone through it himself, and also having witnessed it recently.
Does everyone go through this when they’re learning Java?
1 This doesn’t happen with dynamically typed languages, since they typically bind their methods at run-time. On the other hand most dynamically typed languages don’t do method dispatching on types at all. Some functional languages with fancier method dispatchers can do this stuff, maybe at the expense of slower method calls. Erlang’s pattern matching of messages takes this feature to the extreme (though it’s not exactly method calls). You could argue that Prolog is the very extreme of this, where everything is fancy method dispatching.
Posted in Java, Programming | Tags languages, learning | 3 comments