Terracotta Clustering Observations
Posted by Anders Sun, 08 Jul 2007 19:50:00 GMT
I’m currently developing a clustered back-end system for a website using Terracotta. Not yet in production, but here are some random observations so far:
Transparency
The same old “transparency” that every new framework claims? Yes and no.
You can’t ignore that you’re writing a clustered app. There are performance concerns, memory usage, some of the more obscure parts of the Java API that are unsupported, etc. There are details that you must know, so you really have to read the documentation. You also have to configure Terracotta for your application.
But after you’ve grasped the basic idea of how the clustering works, it’s pretty straightforward. Once you have a Terracotta-friendly overall architecture going, the day-to-day work on business logic details isn’t much affected by it.
Testing
Since Terracotta doesn’t have an API, unit testing the application is very simple. There’s nothing that requires mocking and things work the same with or without clustering. You can unit test as if Terracotta didn’t exist, which is much better than your average framework. I think the ease of unit testing is one of the strongest points of Terracotta.
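To make that concrete, here is a minimal sketch of what such a test can look like. The VisitCounter class and its methods are hypothetical names invented for this example, not code from our application, and a real project would probably use JUnit rather than a hand-rolled check method. The point is only that nothing in the class or the test refers to Terracotta.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical domain class: plain Java, no Terracotta imports or annotations.
// Whether an instance ends up shared in the cluster is decided by configuration, not here.
class VisitCounter {
    private final Map<String, Integer> visitsByPage = new HashMap<String, Integer>();

    // synchronized so the class is correct under ordinary Java threading rules.
    public synchronized int recordVisit(String page) {
        Integer current = visitsByPage.get(page);
        int updated = (current == null) ? 1 : current + 1;
        visitsByPage.put(page, updated);
        return updated;
    }

    public synchronized int visitsFor(String page) {
        Integer current = visitsByPage.get(page);
        return (current == null) ? 0 : current;
    }
}

// A plain test: no mocks, no cluster, no server process.
class VisitCounterTest {
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        VisitCounter counter = new VisitCounter();
        check(counter.visitsFor("/home") == 0, "unseen page starts at zero");
        counter.recordVisit("/home");
        counter.recordVisit("/home");
        check(counter.visitsFor("/home") == 2, "two visits recorded");
        check(counter.visitsFor("/about") == 0, "other pages unaffected");
        System.out.println("all checks passed");
    }
}
```

The same class should then behave the same way when it is clustered, which is exactly what the functional tests have to confirm.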
Since you’re unit testing without the cluster, there’s still a lot of functional testing required to make sure things really work. It does happen that things break when you run them clustered. But most problems we’ve seen are easily fixed configuration problems, or threading issues that we would have even without clustering.
Threading
Since Terracotta follows Java’s “memory model”, you just have to write a correctly threaded Java application for it to work. Unfortunately, that is very tricky to do. The kind of sloppy thread programming you can often get away with on a single JVM, without any observable bugs, will not work. Fortunately Terracotta will catch a lot of problems directly, but you still need a good understanding of Java threads to get things right. This ain’t Erlang.
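As an illustration of what “correctly threaded” means here, the sketch below guards every read and write of shared state with the same lock. JobQueue and JobQueueDemo are hypothetical names made up for this example. My understanding, stated as an assumption rather than a reference, is that with locking configured Terracotta treats synchronized blocks on shared objects as cluster-wide locks, so code like this carries over, while an unsynchronized add() that happens to work on one JVM would not.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared structure. Every access to 'jobs' goes through the same lock.
class JobQueue {
    private final List<String> jobs = new ArrayList<String>();

    void add(String job) {
        synchronized (jobs) {
            jobs.add(job);
        }
    }

    // Returns the oldest job, or null when the queue is empty.
    String poll() {
        synchronized (jobs) {
            return jobs.isEmpty() ? null : jobs.remove(0);
        }
    }

    int size() {
        synchronized (jobs) {
            return jobs.size();
        }
    }
}

class JobQueueDemo {
    // Starts threadCount workers that each add jobsPerThread jobs, waits for them,
    // and returns the resulting queue size.
    static int fill(final JobQueue queue, int threadCount, final int jobsPerThread) {
        List<Thread> threads = new ArrayList<Thread>();
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            Thread thread = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < jobsPerThread; i++) {
                        queue.add("job-" + id + "-" + i);
                    }
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return queue.size();
    }

    public static void main(String[] args) {
        // With correct locking no additions are lost: prints 4000.
        System.out.println(fill(new JobQueue(), 4, 1000));
    }
}
```

Remove the synchronized blocks and the count may come out short or the list may throw, even without any clustering involved.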
Performance
The throughput and transaction rate look good; you can make a lot of changes per second. For our application, with a big data set, we’ve instead found the “virtual heap” speed to be what limits us. Paging in data from the server to the cluster nodes takes time. You can add new objects to the shared heap faster than you can page in old ones. This means we have to optimize our application for localized memory access. No real surprise.
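One possible way to get localized access is to send all work for a given key to the same node, so that node keeps touching objects it has already paged in. The sketch below is only an illustration of that idea: KeyAffinityRouter is a hypothetical class, not something from Terracotta or from our application, and it assumes a fixed, known number of nodes.

```java
// Hypothetical router: maps each key to one node index so the same node
// keeps working on the same part of the shared data.
class KeyAffinityRouter {
    private final int nodeCount;

    KeyAffinityRouter(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        this.nodeCount = nodeCount;
    }

    // Returns an index in the range 0..nodeCount-1, stable for a given key.
    int nodeFor(String key) {
        // Mask the sign bit instead of using Math.abs, which stays negative
        // for Integer.MIN_VALUE.
        return (key.hashCode() & 0x7fffffff) % nodeCount;
    }
}

class KeyAffinityRouterDemo {
    public static void main(String[] args) {
        KeyAffinityRouter router = new KeyAffinityRouter(4);
        String[] keys = {"user-1", "user-2", "user-3", "user-1"};
        for (String key : keys) {
            // The same key always lands on the same node index.
            System.out.println(key + " -> node " + router.nodeFor(key));
        }
    }
}
```

A simple modulo scheme like this reshuffles most keys when the node count changes, so it is a starting point for thinking about locality rather than a finished design.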
Other
We’ve run into a few problems, small and big, but they’ve been quickly resolved with the help of the mailing list.
