Posted by Anders
Sun, 26 Oct 2008 16:26:00 GMT
Last week I attended a small workshop on building with Maven, which quickly turned into a Buildr workshop once we realized that most attendees simply didn’t like Maven very much. It inspired me to write down some of the things I don’t like about Maven:
Verbosity
Maybe the problem is spelled “XML”.
Buildr shows that you can express essentially the same things as Maven’s pom files in an order of magnitude fewer lines and still be more readable. Good old Ant has the excuse of being created in a time when XML was hyped, but there’s no excuse for the masses of XML that Maven forces you to create and maintain.
Working With Legacy Code
Have you ever tried to introduce Maven to build a large legacy system? Because of Maven’s inflexibility, you can’t easily do this in small steps, incrementally moving closer to (Maven’s idea of) an ideal file and dependency structure. Instead you’re looking at doing it all in a single huge, messy, risky step.
Transitive Dependencies
Nice in theory. In practice this means Maven will download half the internet for you, while at the same time making you sensitive to every minor slip-up in any pom of any n-level indirect dependency you have.
Even if your code never invokes the rarely used parts of commons-something that actually depends on commons-kitchen-sink, you’re going to get commons-kitchen-sink. Or maintain the screenful of XML to override it.
Flexibility and Plugins
Most non-trivial projects have at least one detail that just doesn’t fit Maven’s model. In any other build tool that can be easily remedied with (often just a single line of) scripting. In Maven, the solution is plugins.
So your build ends up depending on abandoned third-party plugins just to accomplish the most trivial scripting tasks. So much for dependency management.
Unit Testing
When unit testing, you are only interested in two things: Are my tests at 100% and, if not, what is failing? (And, for bragging rights, how many tests do I have?) Just like Ant before it, Maven fails badly at this.
The (mostly useless) stdout from the tested code is puked to the terminal, drowning Maven’s own test summaries. The causes of failures, on the other hand, are hidden away in report files in some other directory.
I can see the use for “test reports” in some scenarios and tool sets, but as a default behavior for a unit test runner they’re absurd.
Build Output
Why is so much useless output written to the terminal during a build? Are they trying to impress Linux Kernel hackers?
Repositories
It’s good practice to keep the things you depend on under version control, among other things for repeatability. Maven’s take on this is to keep dependency meta-data under version control, and download the dependencies themselves from public repositories.
So now you’re depending on someone else’s public repository to be available (and correctly configured and maintained). If it’s not, you won’t be able to build on a freshly installed machine.
I’m told the recommended solution is to have your own local caching repo (which also allows you to depend on private or proprietary libraries). So now you have yet another server to maintain. Wouldn’t it be both less work and lower risk to just keep the jars under local version control?
(Also, the only caching repo I have tried, Artifactory, kept corrupting data. It was probably a problem with its embedded Jackrabbit content repository, but still).
File Structure
I’m forced to structure my files in a way the Maven developers, a group whose judgement and taste I usually disagree with, find ideal.
I could go on all day…
Posted in Java, Programming | Tags buildr, maven, rant | 15 comments
Posted by Anders
Tue, 07 Oct 2008 15:31:00 GMT
Guess which CMS I’m working with…
210 cd
81 ls
29 more
28 ant
25 bin/polopoly
15 svn
11 sudo
9 pwd
8 irb
7 fgrep
Posted in Computers, Programming | no comments
Posted by Anders
Mon, 25 Aug 2008 19:06:00 GMT
or “Hot Potato Exception Handling”
This is a common use of try-catch:
public void foo() {
    // Catch any exception so that the call to super is done anyway
    try {
        //...
    } catch (Exception e) {
        // Log
        // ...
    }
    // Call super last
    super.foo();
}
You could think that the purpose of the try-catch is to enable logging of the exception. But the first comment (taken from actual example code) suggests that the logging is just incidental. The purpose is to make sure that something is run no matter what. The logging is just a case of not knowing anything better to do with the exception once it’s caught.
Instead, why not use the simple try-finally:
public void foo() {
    try {
        //...
    } finally {
        super.foo();
    }
}
We’re not “handling” the exception, but that’s probably good. We are handling non-Exception throwables though. I think people just forget that you can have a try-finally without the catch.
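To see the difference in action, here is a minimal sketch. The classes Base, Child and TryFinallyDemo are made up for illustration and are not from the example code quoted above. The body of Child.foo() can be made to throw either nothing or an arbitrary unchecked throwable, and the list of calls shows whether super.foo() still ran.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes for illustration only.
class Base {
    final List<String> calls = new ArrayList<>();

    void foo() {
        calls.add("Base.foo");
    }
}

class Child extends Base {
    private final Throwable failure; // what the body should throw, or null for nothing

    Child(Throwable failure) {
        this.failure = failure;
    }

    @Override
    void foo() {
        try {
            calls.add("Child.foo");
            if (failure instanceof RuntimeException) throw (RuntimeException) failure;
            if (failure instanceof Error) throw (Error) failure;
        } finally {
            // Runs whether the body completed normally or threw anything at all
            super.foo();
        }
    }
}

class TryFinallyDemo {
    // Calls foo() and reports the simple class name of whatever escaped, or "none"
    static String run(Child child) {
        try {
            child.foo();
            return "none";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        Child ok = new Child(null);
        System.out.println(run(ok) + " " + ok.calls);

        Child broken = new Child(new StackOverflowError());
        System.out.println(run(broken) + " " + broken.calls);
    }
}
```

In both runs Base.foo ends up in the call list. In the second run the Error still reaches the caller, which a catch (Exception e) block would not have seen at all.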
Posted in Java, Programming | 1 comment
Posted by Anders
Tue, 24 Jun 2008 22:00:00 GMT
...or some other provocative title.
The good old toString() method, with us since Java 1.0, has at least two different meanings:
- Displaying: How the object should appear to the user, in the GUI, on a web page, etc.
- Inspection: How the object should appear in debug output, logs, debugger tools etc.
Both are in some way “a string representation of the object”. The default implementation in java.lang.Object suggests inspection, e.g. “java.lang.Object@c37f31”, but many APIs, like AWT/Swing, use it for displaying the object to the user.
Problems
- It’s hard to tell which usage is intended when reading the code.
- Debuggers will use toString(), which can cause confusing side-effects.
- Since every object has a toString(), the IDE’s usage search becomes unusable.
- It’s hard to tell if a toString() method is dead code or not.
In addition, a lot of code implements it when it has a more specific meaning. For instance, generating HTML is better done as toHTML() than as toString().
What Others Do
Ruby solves this differently than Java. There are two methods, inspect() and to_s(), where the default implementation of inspect() in Object falls back on to_s(). This separates the two intentions, but still has to_s() available on every object.
We can’t do much about java.lang.Object, but we still have options.
Suggestion
Use toString() only for logging and debug output.
If the method has a more specific meaning, communicate that instead, e.g. title() or name().
If the value to display has a specific format you can communicate that instead, e.g. toHTML() or asLeetSpeak().
If the value to display is nothing other than a string, still avoid toString(). Call it something like displayString(), or maybe even asString() to avoid problems.
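As a rough sketch of what this could look like in practice: the class Article and its fields are invented for illustration, and the HTML escaping is deliberately minimal. The point is only that each method name says what kind of string comes back, and that toString() is left for inspection.

```java
// Hypothetical domain class; Article, id and title are illustrative names only.
final class Article {
    private final long id;
    private final String title;

    Article(long id, String title) {
        this.id = id;
        this.title = title;
    }

    // Specific meaning: callers ask for the title, not for "a string"
    String title() {
        return title;
    }

    // Specific format: the name tells the reader what kind of text comes back
    String toHTML() {
        return "<h1>" + escape(title) + "</h1>";
    }

    // Inspection only: meant for logs and debuggers, never shown to users
    @Override
    public String toString() {
        return "Article[id=" + id + ", title=" + title + "]";
    }

    // Minimal escaping for the sketch; real code would use a proper library
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

A usage search for title() or toHTML() now finds only the places that really display the article, while toString() can change freely without affecting what users see.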
Posted in Java, Programming | 6 comments
Posted by Anders
Wed, 04 Jun 2008 17:47:00 GMT
Notes on java.math.BigDecimal’s performance (in Java 1.5):
Sorting
BigDecimal’s compareTo method relies on both of the BigDecimals being in the same internal form. Internally BigDecimal uses either a BigInteger or, when possible, a native integer to represent its value. To compare two BigDecimals they’re both normalized (“inflated”) to the BigInteger form. This means that simply sorting a list of BigDecimals can cause memory use to increase. Not what you’d expect.
Serialization
Serialization of BigDecimal is surprisingly slow. Not only does it inflate the internal representation, just like comparing does, but it also uses a lot of CPU for some reason. When serializing large graphs of objects of a lot of different classes, the BigDecimals stood out like a sore thumb in the CPU profile. Dumping them as String representations instead was quicker and didn’t use as much memory.
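One way to dump a BigDecimal as a String during serialization is a transient field plus custom writeObject/readObject methods. The sketch below is an assumption about how that could be done, not the code from the profiled application. The class Price is made up, the amount is assumed to be non-null, and nothing here has been measured.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.math.BigDecimal;

// Hypothetical value holder; Price is not from the original application.
class Price implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: skipped by default serialization and written by hand below
    private transient BigDecimal amount;

    Price(BigDecimal amount) {
        this.amount = amount; // assumed non-null in this sketch
    }

    BigDecimal amount() {
        return amount;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        // toString() preserves both value and scale; writeUTF is limited to
        // 65535 encoded bytes, which is assumed to be plenty here
        out.writeUTF(amount.toString());
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        amount = new BigDecimal(in.readUTF());
    }

    // Round-trips a Price through Java serialization, for demonstration
    static Price roundTrip(Price p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Price) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Whether this is actually faster on a given JVM is something to verify with a profiler; the observations above were made on Java 1.5.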
Posted in Java, Programming | Tags performance | no comments
Posted by Anders
Thu, 20 Dec 2007 09:27:00 GMT
Charles Miller nicely summarizes my opinions on Maven:
Paradoxically, by trying to make dependency management easy, maven makes it incredibly hard. It becomes dangerously easy for a project to accumulate dependency cruft – at best unnecessary, at worst conflicting – and excruciatingly painful to remove them.
Managing transitive dependencies by automatically traversing the entire dependency tree, the basic strategy of Maven, must be an anti-pattern. Managing them “manually” is a little more work, but will probably save time in the end and definitely lower risk.
Posted in Java, Programming | Tags links, maven | no comments
Posted by Anders
Fri, 14 Sep 2007 15:11:00 GMT
When JUnit 4.1 was released last year, they added a nice feature that has gone mostly unnoticed.
RSpec envy
Consider this archetypical RSpec example in Ruby. One class, Stack, being tested in two different scenarios (empty and non-empty):
describe Stack, " (empty)" do
  before(:each) do
    @stack = Stack.new
  end
  it "should have zero size" do
    @stack.size.should == 0
  end
  # ...
end
describe Stack, " (non-empty)" do
  before(:each) do
    @stack = Stack.new
    @stack.push 'x'
  end
  it "should have size greater than zero" do
    @stack.size.should > 0
  end
  # ...
end
Doing the same thing in JUnit would require us either to create two different classes, which makes our tests hard to follow, or to abandon the “before” methods and initialize at the start of each test method. Other tools, like JDave, solve this by having an inner class for each scenario, but for various reasons JDave isn’t the solution for me.
Enclosed to the rescue
When browsing the JUnit source code, my colleague Rickard stumbled on the org.junit.runners.Enclosed class. Apparently it’s been part of JUnit since 4.1, released in 2006. Enclosed is a test runner that runs all the inner classes of a class as tests. It works perfectly within IntelliJ and other tools. Now you can have RSpec-style testing in JUnit, almost. Behold its goodness:
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;

@RunWith(org.junit.runners.Enclosed.class)
public class StackTest {
    public static class EmptyStack {
        private Stack stack;

        @Before
        public void before() {
            stack = new Stack();
        }

        @Test
        public void shouldHaveZeroSize() {
            assertEquals(0, stack.size());
        }
        // ...
    }

    public static class NonEmptyStack {
        private Stack stack;

        @Before
        public void before() {
            stack = new Stack();
            stack.push("x");
        }

        @Test
        public void shouldHaveSizeGreaterThanZero() {
            assertTrue(stack.size() > 0);
        }
        // ...
    }
}
Posted in Java, Programming | Tags bdd, jdave, junit, rspec, testing | 2 comments
Posted by Anders
Fri, 07 Sep 2007 20:58:00 GMT
Open Source, System Architecture and Centralization
Thoughts on how Open Source software affects system architecture. Or, to be more precise, how the lack of per-server licensing affects it.
Last year, as an example, I worked on a project using JMS messaging. Originally the plan was to send messages to a local message broker application on the sending system (2-3 servers), through central message brokers in two data-centers (2-4 servers) and possibly brokers on every receiving system (lots and lots). Obviously we were planning to use an Open Source message broker, because if we had considered using a commercial product, we would never have come up with an architecture with so many brokers.
When the project was underway we had to switch to a commercial JMS broker. These products are big, “enterprisey” and expensive things, so suddenly we could only practically have four central servers, and they still would cost us a fortune. The result? Among other things worse performance, since the round-trip time to the central servers was longer than talking to a local broker. Even though the commercial software had much better high-availability support, the system was in practice more vulnerable because of its centralization. (Fortunately availability never became an issue, since the project failed and never reached production…)
Another, much better, example is Google. What would Microsoft charge them to run Windows on half a million servers? The kind of massively parallel architecture that Google uses did exist earlier, but it wasn’t until Linux became freely available that it became popular, e.g. Beowulf clustering. Suddenly you could build a supercomputer out of scrap hardware, something you wouldn’t have dreamed of earlier.
When you can design your systems free of the licensing constraint, they tend to become more decentralized. Even though the total cost of a decentralized architecture may not be prohibitive, I think you subconsciously lean towards the centralized solutions when per-server licensing is in the picture. For such a major design constraint, it’s rarely put into numbers or even mentioned.
Posted in Java, Programming | Tags architecture, jms, oss | no comments
Posted by Anders
Sun, 26 Aug 2007 07:55:00 GMT
Just so I don’t have to search for them again the next time I need someone to read them:
The Pragmatic Programmers show in simple words how to do good OO design, as they explain the Law of Demeter.
Robert C. Martin on how to manage dependencies between classes and layers. (Don’t confuse it with Dependency Injection).
Posted in Programming | Tags links | no comments
Posted by Anders
Sun, 08 Jul 2007 19:50:00 GMT
I’m currently developing a clustered back-end system for a website using Terracotta. Not yet in production, but here are some random observations so far:
Transparency
The same old “transparency” that every new framework claims? Yes and no.
You can’t ignore that you’re writing a clustered app. There are performance concerns, memory usage, some more obscure parts of the Java API that are unsupported, etc. There are details that you must know, so you really have to read the documentation. You also have to configure Terracotta for your application.
But after you’ve grasped the basic idea of how the clustering works, it’s pretty straightforward. When you have a Terracotta-friendly overall architecture going, the day-to-day work with business logic details isn’t much affected by it.
Testing
Since Terracotta doesn’t have an API, unit testing the application is very simple. There’s nothing that requires mocking and things work the same with or without clustering. You can unit test as if Terracotta didn’t exist, which is much better than your average framework. I think the ease of unit testing is one of the strongest points of Terracotta.
Since you’re unit testing without the cluster, there’s still a lot of functional testing required to make sure things really work. It does happen that things break when you run them clustered. But most problems we’ve seen are easily fixed configuration problems, or threading issues that we would have even without clustering.
Threading
Since Terracotta follows Java’s “memory model”, you just have to write a correctly threaded Java application for it to work. Unfortunately it’s very tricky to write such a thing.
The kind of sloppy thread programming you can often get away with, without any observable bugs, will not work. Fortunately Terracotta will catch a lot of problems directly, but you still have to have a good insight into Java threads to get things right. This ain’t Erlang.
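For reference, this is the kind of discipline I mean, shown as plain Java with nothing Terracotta-specific in it. The class VisitCounter is made up for illustration. Every read and write of the shared field happens while holding the same lock, which is what the Java memory model requires for the final count to be reliable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared object; VisitCounter is not from the actual system.
class VisitCounter {
    private long visits; // only touched while holding this object's monitor

    synchronized void record() {
        visits++;
    }

    // Reads are synchronized too, so they see the latest write
    synchronized long visits() {
        return visits;
    }

    // Hammers one counter from several threads and returns the final count
    static long runConcurrently(int threads, int perThread) {
        VisitCounter counter = new VisitCounter();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.record();
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.visits();
    }
}
```

Drop either synchronized keyword and the code may still appear to work on a single JVM most of the time, which is exactly the sloppiness that stops being forgiven.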
Performance
The throughput and transaction rate look good; you can make a lot of changes per second. For our application, with a big data-set, we’ve instead found the “virtual heap” speed to be what limits us. Paging in data from the server to the cluster nodes takes time. You can add new objects to the shared heap faster than you can page in old ones. This means we have to optimize our application for localized memory access. No real surprise.
Other
We’ve run into a few problems, small and big, but they’ve been quickly resolved with the help of the mailing list.
Posted in Java, Programming | Tags terracotta | 5 comments