The Things That Are Wrong With Maven

Posted by Anders Sun, 26 Oct 2008 16:26:00 GMT

Last week I attended a small workshop on building with Maven, which quickly turned into a Buildr workshop once we realized that most of the attendees simply didn’t like Maven very much. It inspired me to write down some of the things I don’t like about Maven:

Verbosity

Maybe the problem is spelled “XML”.

Buildr shows that you can express essentially the same things as Maven’s pom files in an order of magnitude fewer lines and still be more readable. Good old Ant has the excuse of being created in a time when XML was hyped, but there’s no excuse for the masses of XML that Maven forces you to create and maintain.

Working With Legacy Code

Have you ever tried to introduce Maven to build a large legacy system? Because of Maven’s inflexibility, you can’t easily do this in small steps, incrementally moving closer to (Maven’s idea of) an ideal file and dependency structure. Instead you’re looking at doing it all in a single huge, messy, risky step.

Transitive Dependencies

Nice in theory. In practice this means Maven will download half the internet for you while making you sensitive to every minor slip-up in any pom of any n-level indirect dependency you have. Even if your code never invokes the rarely used parts of commons-something that actually depend on commons-kitchen-sink, you’re going to get commons-kitchen-sink. Or you maintain the screenful of XML needed to override it.

Flexibility and Plugins

Most non-trivial projects have at least one detail that just doesn’t fit Maven’s model. In any other build tool that can be easily remedied with (often just a single line of) scripting. In Maven, the solution is plugins. So your build ends up depending on abandoned third-party plugins just to accomplish the most trivial scripting tasks. So much for dependency management.

Unit Testing

When unit testing, you are only interested in two things: Are my tests at 100% and, if not, what is failing? (And, for bragging rights, how many tests do I have?) Just like Ant before it, Maven fails badly at this. The (mostly useless) stdout from the tested code is puked to the terminal, drowning Maven’s own test summaries. The causes of failures, on the other hand, are hidden away in report files in some other directory.

I can see the use for “test reports” in some scenarios and tool sets, but as a default behavior for a unit test runner they’re absurd.

Build Output

Why is so much useless output written to the terminal during a build? Are they trying to impress Linux Kernel hackers?

Repositories

It’s good practice to keep the things you depend on under version control, among other things for repeatability. Maven’s take on this is to keep dependency meta-data under version control, and download the dependencies themselves from public repositories. So now you’re depending on someone else’s public repository to be available (and correctly configured and maintained). If it’s not, you won’t be able to build on a freshly installed machine. I’m told the recommended solution is to have your own local caching repo (which also allows you to depend on private or proprietary libraries). So now you have yet another server to maintain. Wouldn’t it be both less work and lower risk to just keep the jars under local version control?

(Also, the only caching repo I have tried, Artifactory, kept corrupting data. It was probably a problem with its embedded Jackrabbit content repository, but still.)

File Structure

I’m forced to structure my files in a way that the Maven developers, a group whose judgement and taste I usually disagree with, find ideal.

I could go on all day…


Command-Line Top Ten

Posted by Anders Tue, 07 Oct 2008 15:31:00 GMT

Guess which CMS I’m working with…

  210    cd
   81    ls
   29    more
   28    ant
   25    bin/polopoly
   15    svn
   11    sudo
    9    pwd
    8    irb
    7    fgrep


Forgetting try-finally

Posted by Anders Mon, 25 Aug 2008 19:06:00 GMT

or “Hot Potato Exception Handling”

This is a common use of try-catch:

public void foo() { 
    // Catch any exception so that the call to super is done anyway 
    try { 
        //... 
    } catch (Exception e) { 
        // Log
        // ...
    } 
    // Call super last 
    super.foo(); 
}

You could think that the purpose of the try-catch is to enable logging of the exception. But the first comment (taken from actual example code) suggests that the logging is just incidental. The purpose is to make sure that something is run no matter what. The logging is just a case of not knowing anything better to do with the exception once it’s caught.

Instead, why not use the simple try-finally:

public void foo() { 
    try { 
        //... 
    } finally {
        super.foo(); 
    }
} 

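We’re not “handling” the exception, but that’s probably good: it propagates to a caller that may know what to do with it. The finally block also runs for non-Exception throwables such as Errors, which catch (Exception e) would let slip past. I think people just forget that you can have a try-finally without the catch.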
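To make the difference concrete, here is a minimal sketch that runs both variants against work that throws an Error. All names in it (TryFinallyDemo, Base, CatchingChild, FinallyChild, failingWork, run) are made up for the illustration and are not from the example code mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo; none of these names come from the original example code.
class TryFinallyDemo {
    // Records what happened so the order of events can be inspected.
    static final List<String> events = new ArrayList<>();

    static class Base {
        void foo() {
            events.add("super.foo");
        }
    }

    // Stand-in for "//..." in the post: work that fails with a non-Exception throwable.
    static void failingWork() {
        throw new Error("boom");
    }

    // The try-catch variant: catch (Exception e) does not match an Error.
    static class CatchingChild extends Base {
        @Override
        void foo() {
            try {
                failingWork();
            } catch (Exception e) {
                events.add("logged");
            }
            super.foo(); // never reached here, because the Error escapes the try-catch
        }
    }

    // The try-finally variant: super.foo() runs, then the Error keeps propagating.
    static class FinallyChild extends Base {
        @Override
        void foo() {
            try {
                failingWork();
            } finally {
                super.foo();
            }
        }
    }

    // Calls foo() and returns the recorded events, including what reached the caller.
    static List<String> run(Base b) {
        events.clear();
        try {
            b.foo();
        } catch (Throwable t) {
            events.add("propagated " + t.getClass().getSimpleName());
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(run(new CatchingChild())); // [propagated Error]
        System.out.println(run(new FinallyChild()));  // [super.foo, propagated Error]
    }
}
```

In both cases the caller still sees the Error, but only the try-finally variant manages to call super.foo() first.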


toString() is Evil

Posted by Anders Tue, 24 Jun 2008 22:00:00 GMT

...or some other provocative title.

The good old toString() method, with us since Java 1.0, has at least two different meanings:

  • Displaying: How the object should appear to the user, in the GUI, on a web page, etc.
  • Inspection: How the object should appear in debug output, logs, debugger tools etc.

Both are in some way “a string representation of the object”. The default implementation in java.lang.Object suggests inspection, e.g. “java.lang.Object@c37f31”, but many APIs, like AWT/Swing, use it for displaying the object to the user.

Problems

  • It’s hard to tell which usage is intended when reading the code.
  • Debuggers will use toString(), which can cause confusing side-effects.
  • Since every object has a toString(), the IDE’s usage search becomes unusable.
  • It’s hard to tell if a toString() method is dead code or not.

In addition, a lot of code implements it when it has a more specific meaning. For instance, generating HTML is better done as toHTML() than as toString().

What Others Do

Ruby solves this differently than Java. There are two methods, inspect() and to_s(), where (in Ruby 1.8) the default implementation of inspect() in Object falls back on to_s(). This separates the two intentions, but still has to_s() available on every object.

We can’t do much about java.lang.Object, but we still have options.

Suggestion

Use toString() only for logging and debug output.

If the method has a more specific meaning, communicate that instead, e.g. title() or name().

If the value to display has a specific format you can communicate that instead, e.g. toHTML() or asLeetSpeak().

If the value to display is nothing other than a string, still avoid toString(). Call it something like displayString(), or maybe even asString() to avoid problems.
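As a minimal sketch of the suggestion, here is a made-up Customer class. The class and its fields are hypothetical; name() and displayString() simply follow the naming ideas above.

```java
// Hypothetical example class, only here to illustrate the naming.
final class Customer {
    private final long id;
    private final String name;

    Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }

    // A value with a specific meaning gets a name that says so.
    String name() {
        return name;
    }

    // What the user sees in the GUI or on a web page.
    String displayString() {
        return name + " (customer no. " + id + ")";
    }

    // Logging and debug output only; no side effects, so debuggers can call it safely.
    @Override
    public String toString() {
        return "Customer[id=" + id + ", name=" + name + "]";
    }

    public static void main(String[] args) {
        Customer c = new Customer(42, "Ada");
        System.out.println(c.displayString()); // Ada (customer no. 42)
        System.out.println("debug: " + c);     // debug: Customer[id=42, name=Ada]
    }
}
```

A usage search for displayString() now finds exactly the places where a Customer is shown to the user, which a search for toString() never could.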


BigDecimal performance notes

Posted by Anders Wed, 04 Jun 2008 17:47:00 GMT

Notes on java.math.BigDecimal’s performance (in Java 1.5):

Sorting

BigDecimal’s compareTo method relies on both of the BigDecimals being in the same internal form. Internally BigDecimal uses either a BigInteger or, when possible, a native integer to represent its value. To compare two BigDecimals they’re both normalized (“inflated”) to the BigInteger form. This means that simply sorting a list of BigDecimals can cause memory use to increase. Not what you’d expect.

Serialization

Serialization of BigDecimal is surprisingly slow. Not only do they inflate their internal representation, just like when comparing, but they also use a lot of CPU for some reason. When serializing large graphs of objects of a lot of different classes, the BigDecimals stood out like a sore thumb in the CPU profile. Dumping them as String representations instead was quicker and didn’t use as much memory.
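One way to dump a BigDecimal as a String is with the standard writeObject/readObject serialization hooks. This is only a sketch, not the code we used: Price is a hypothetical holder class, and whether it actually helps on your JVM is something you would have to profile.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.math.BigDecimal;

// Hypothetical holder class; only the String form of the BigDecimal is serialized.
final class Price implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: skipped by default serialization, restored in readObject().
    private transient BigDecimal amount;

    Price(BigDecimal amount) {
        this.amount = amount;
    }

    BigDecimal amount() {
        return amount;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeUTF(amount.toString()); // toString() keeps the scale, e.g. "19.90"
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        amount = new BigDecimal(in.readUTF());
    }

    // Serializes to a byte array and back, for demonstration purposes.
    static Price roundTrip(Price p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Price) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Price copy = roundTrip(new Price(new BigDecimal("19.90")));
        System.out.println(copy.amount()); // 19.90
    }
}
```

The sketch assumes amount is never null; a real class would need to handle that case.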


Charles Miller on Maven

Posted by Anders Thu, 20 Dec 2007 09:27:00 GMT

Charles Miller nicely summarizes my opinions on Maven:

Paradoxically, by trying to make dependency management easy, maven makes it incredibly hard. It becomes dangerously easy for a project to accumulate dependency cruft – at best unnecessary, at worst conflicting – and excruciatingly painful to remove them.

Managing transitive dependencies by automatically traversing the entire dependency tree, the basic strategy of Maven, must be an anti-pattern. Managing them “manually” is a little more work, but will probably save time in the end and definitely lower risk.


JUnit Hidden Feature: Enclosed

Posted by Anders Fri, 14 Sep 2007 15:11:00 GMT

When JUnit 4.1 was released last year, they added a nice feature that has gone mostly unnoticed.

RSpec envy

Consider this archetypical RSpec example in Ruby. One class, Stack, being tested in two different scenarios (empty and non-empty):

  describe Stack, " (empty)" do
    before(:each) do
      @stack = Stack.new
    end

    it "should have zero size" do
      @stack.size.should == 0
    end

    # ...
  end

  describe Stack, " (non-empty)" do
    before(:each) do
      @stack = Stack.new
      @stack.push 'x'
    end

    it "should have size greater than zero" do
      @stack.size.should > 0
    end

    # ...
  end

Doing the same thing in JUnit would require us to either create two different classes, which makes our tests hard to follow, or abandon the “before” methods and initialize at the start of each test method. Other tools, like JDave, solve this by having an inner class for each scenario, but for various reasons JDave isn’t the solution for me.

Enclosed to the rescue

When browsing the JUnit source code, my colleague Rickard stumbled on the org.junit.runners.Enclosed class. Apparently it’s been part of JUnit since 4.1, released in 2006. Enclosed is a test runner that runs all the inner classes of a class as tests. It works perfectly within IntelliJ and other tools. Now you can have RSpec-style testing in JUnit, almost. Behold its goodness:


@RunWith(org.junit.runners.Enclosed.class)
public class StackTest {

    public static class EmptyStack {
        private Stack stack;

        @Before
        public void before() {
            stack = new Stack();
        }

        @Test
        public void shouldHaveZeroSize() {
            assertEquals(0, stack.size());
        }

        // ...
    }

    public static class NonEmptyStack {
        private Stack stack;

        @Before
        public void before() {
            stack = new Stack();
            stack.push("x");
        }

        @Test
        public void shouldHaveSizeGreaterThanZero() {
            assertTrue(stack.size() > 0);
        }

        // ...
    }
}


The Commercial Constraint

Posted by Anders Fri, 07 Sep 2007 20:58:00 GMT

Open Source, System Architecture and Centralization

Thoughts on how Open Source software affects system architecture. Or, to be more precise, how the lack of per-server licensing affects it.

Last year, as an example, I worked on a project using JMS messaging. Originally the plan was to send messages to a local message broker application on the sending system (2-3 servers), through central message brokers in two data-centers (2-4 servers) and possibly brokers on every receiving system (lots and lots). Obviously we were planning to use an Open Source message broker, because if we had considered using a commercial product, we would never have come up with an architecture with so many brokers.

When the project was underway we had to switch to a commercial JMS broker. These products are big, “enterprisey” and expensive things, so suddenly we could only practically have four central servers, and they would still cost us a fortune. The result? Among other things, worse performance, since the round-trip time to the central servers was longer than talking to a local broker. Even though the commercial software had much better high-availability support, the system was in practice more vulnerable because of its centralization. (Fortunately availability never became an issue, since the project failed and never reached production…)

Another, much better, example is Google. What would Microsoft charge them to run Windows on half a million servers? The kind of massively parallel architecture that Google uses did exist early on, but it wasn’t until Linux became freely available that it became popular, e.g. in Beowulf clustering. Suddenly you could build a supercomputer out of scrap hardware, something you wouldn’t have dreamed of earlier.

When you can design your systems free of the licensing constraint, they tend to become more decentralized. Even though the total cost of a decentralized architecture may not be prohibitive, I think you subconsciously lean towards the centralized solutions when per-server licensing is in the picture. For such a major design constraint, it’s rarely put into numbers or even mentioned.


Links: Great Programming Articles

Posted by Anders Sun, 26 Aug 2007 07:55:00 GMT

Just so I don’t have to search for them again the next time I need someone to read them:

Tell, Don’t Ask

The Pragmatic Programmers show in simple words how to do good OO design, as they explain the Law of Demeter.

The Dependency Inversion Principle

Robert C. Martin on how to manage dependencies between classes and layers. (Don’t confuse it with Dependency Injection).


Terracotta Clustering Observations

Posted by Anders Sun, 08 Jul 2007 19:50:00 GMT

I’m currently developing a clustered back-end system for a website using Terracotta. Not yet in production, but here are some random observations so far:

Transparency

The same old “transparency” that every new framework claims? Yes and no.

You can’t ignore that you’re writing a clustered app. There are performance concerns, memory usage, some of the more obscure parts of the Java API that are unsupported, etc. There are details that you must know, so you really have to read the documentation. You also have to configure Terracotta for your application.

But after you’ve grasped the basic idea of how the clustering works, it’s pretty straightforward. When you have a Terracotta-friendly overall architecture going, the day-to-day work with business logic details isn’t much affected by it.

Testing

Since Terracotta doesn’t have an API, unit testing the application is very simple. There’s nothing that requires mocking and things work the same with or without clustering. You can unit test as if Terracotta didn’t exist, which is much better than your average framework. I think the ease of unit testing is one of the strongest points of Terracotta.

Since you’re unit testing without the cluster, there’s still a lot of functional testing required to make sure things really work. It does happen that things break when you run them clustered. But most problems we’ve seen are easily fixed configuration problems, or threading issues that we would have even without clustering.

Threading

Since Terracotta follows Java’s “memory model”, you just have to write a correctly threaded Java application for it to work. Unfortunately it’s very tricky to write such a thing. The kind of sloppy thread programming you can often get away with, without any observable bugs, will not work. Fortunately Terracotta will catch a lot of problems directly, but you still have to have a good insight into Java threads to get things right. This ain’t Erlang.
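As a reminder of what “correctly threaded” means in plain Java, here is a minimal sketch in which every access to shared state, reads included, goes through the same lock. SharedCounter is a hypothetical class and nothing in it is Terracotta-specific; how locks are declared for the cluster is a matter of Terracotta configuration and isn’t shown.

```java
// Hypothetical example class; plain Java, no Terracotta API involved.
final class SharedCounter {
    private final Object lock = new Object();
    private long value; // guarded by lock

    void increment() {
        synchronized (lock) {
            value++;
        }
    }

    // Reads take the lock too, otherwise another thread's writes may not be visible.
    long get() {
        synchronized (lock) {
            return value;
        }
    }

    // Starts the given number of threads, each incrementing the same counter.
    static long countWithThreads(int threads, int incrementsPerThread) {
        SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10_000)); // 40000
    }
}
```

Drop the synchronized blocks and the program will usually still appear to work on a single JVM, which is exactly the kind of sloppiness I mean.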

Performance

The throughput and transaction rate look good; you can make a lot of changes per second. For our application, with a big data-set, we’ve instead found the “virtual heap” speed to be what limits us. Paging in data from the server to the cluster nodes takes time. You can add new objects to the shared heap faster than you can page in old ones. This means we have to optimize our application for localized memory access. No real surprise.

Other

We’ve run into a few problems, small and big, but they’ve been quickly resolved with the help of the mailing list.

