September 05, 2004

Financial Cryptography v. The Enterprise

People often say you should be using XXX, where XXX includes, for today's discussion, J2EE and various other alphabet-soup systems, also known hilariously as "solutions". (I kid you not, there is a Java Alphabet Soup!) I've been working on one area of this jungle for the last many months, for a website frontend to payments systems - mostly because when it comes to websites, simple solutions don't cut it any more.

I came across this one post by Cameron Purdy which struck me as a very clear example of why FC cannot use stuff like J2EE application servers for the backends [1]. The problem is simple. What happens when it crashes? As everything of importance is about transactional thinking, the wise FCer (and there are many out there) builds only for one circumstance: the crash.

Why do we care so much? It's straight economics. Every transaction has the potential to go horribly wrong. Yet, almost all transactions will earn about a penny if they go right. This means that the only economic equation of import in this world is: how many support calls per 1000 transactions, and how much revenue per 1000 transactions? If the answer to the first question is anything different to zero, worry. The worst part of the worry is that it will all seem ok until well after you think you are successful... You won't see it coming, and neither did a half dozen examples I can think of right now!
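To put rough numbers on that equation, here is a back-of-the-envelope sketch in Python. The penny of revenue per transaction comes from the text above; the cost of a support call and the call rates tried are purely illustrative assumptions, not measured figures.

```python
# Back-of-the-envelope support economics per 1000 transactions.
# REVENUE_PER_TX is the "about a penny" figure from the text;
# COST_PER_SUPPORT_CALL is an assumed, illustrative number.

REVENUE_PER_TX = 0.01          # dollars earned when a transaction goes right
COST_PER_SUPPORT_CALL = 20.00  # dollars per call (assumption)


def margin_per_1000(support_calls_per_1000: float) -> float:
    """Revenue minus support cost for a batch of 1000 transactions."""
    revenue = 1000 * REVENUE_PER_TX
    support_cost = support_calls_per_1000 * COST_PER_SUPPORT_CALL
    return revenue - support_cost


def break_even_calls_per_1000() -> float:
    """Support-call rate at which a batch of 1000 earns exactly nothing."""
    return 1000 * REVENUE_PER_TX / COST_PER_SUPPORT_CALL


if __name__ == "__main__":
    for calls in (0, 0.5, 1, 2):
        print(f"{calls} calls per 1000 tx -> margin ${margin_per_1000(calls):.2f}")
    print(f"break-even at {break_even_calls_per_1000()} calls per 1000 tx")
```

Under these assumed numbers, a single support call in every 2000 transactions is enough to wipe out all the revenue, which is why anything other than zero is a worry.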

So every transaction has to be perfect and flawless. To do that, you have to be able to answer the question, what happens if it crashes? And the answer has to be, restart and it carries on. You lose time, but you never have to manually bury a transaction.
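One way to get "restart and it carries on" is to make an append-only journal the source of truth and rebuild state from it after every start. The following is a minimal sketch of that idea in Python, not a description of any particular payment system: the `Journal` class, the record fields and the `replay` function are all hypothetical names, and it assumes each appended line reaches the disk either whole or not at all.

```python
import json
import os
import tempfile


class Journal:
    """Append-only journal; a record counts only once it is on disk."""

    def __init__(self, path):
        self.path = path

    def append(self, record):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record to stable storage

    def records(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


def replay(journal, opening_balances):
    """Rebuild balances from the journal; safe to run after every restart."""
    balances = dict(opening_balances)
    seen = set()
    for rec in journal.records():
        if rec["tx"] in seen:  # a retried submission must not pay twice
            continue
        seen.add(rec["tx"])
        balances[rec["src"]] -= rec["amount"]
        balances[rec["dst"]] += rec["amount"]
    return balances


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "journal.log")
        Journal(path).append({"tx": "t1", "src": "alice", "dst": "bob", "amount": 30})
        # Simulated crash and restart: a fresh object reads the same file.
        print(replay(Journal(path), {"alice": 100, "bob": 0}))
```

A crash before the append means the payment never happened and the client retries; a crash after it means the replay picks the payment up. Either way nobody has to bury a transaction by hand.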

And here's the rub: as soon as you pull in one of these enterprise tools (see below for a definition of enterprise!) you can't answer the question. For JBoss, the open source application server under question below, it's "maybe." For all those big selling solutions like Oracle, IBM, SAP etc etc, the answer is: "Oh, yes, we can do that," and the how is what Cameron's post describes. Put very briefly, it's a maybe, and it's more expensive [2].

Assuming that you can do all that, and you really do know the issues to address, and you pick the right solution, then the cost to take a fully capable transaction system, and show that it is "right" is probably about the same cost as writing the darn thing from scratch, and making it right. The only difference is that the first solution is more expensive.

That's all IMHO - but I've been there and done that about a half-dozen times now, and I know others who've made the same soup out of the same staple ingredients. Unfortunately, the whole software market place has no time for anything but an expensive tin of jumbled letters, as you can't sell something that the customer does himself.

[1] What is the recovery log for in a J2EE engine? Cameron Purdy answers: Recovery Log.
[2] Scroll down a bit to message 136589 if you're unsure of this. Ignore the brand and keep the question firmly in mind.


Recovery log

Posted By: Cameron Purdy on September 01, 2004 @ 08:58 AM in response to Message #136449

> Does it have a recovery log now? The ability to survive a crash is important.

> ??? I thought recovery logs was a feature of DB engines. I've never seen that in a J2EE application server. Did you mean a JTA transactions recovery service, used to automatically switch the started JTA transactions to another server in the cluster in case of failure?

It is for transactions that contain more than one transactional resource, or "Resource Manager" (RM) in OTS parlance. For example, let's say I have MQ Series and an Oracle database, and one of my transactions processes a message from MQ Series and does updates to Oracle. If I fail to commit, then I don't want my partial changes being made to Oracle, which is what a simple JDBC transaction does for me. Even more importantly, if I fail to commit, then I want someone else to get the message from MQ Series, as if I had never seen it.

This is called "recoverable two-phase commit with multiple resource managers," and implies that a recoverable transaction manager log the various steps that it goes through when it commits a transaction. In this case, the "transaction" is a virtual construct with two "branches" - one going to an Oracle transaction and one going to MQ Series. Let's consider what can happen:

1) Something screws up and we roll back the transaction. In this case both branches are rolled back, and everyone is happy. Since we never tried to commit (i.e. we never "prepared" the transactions,) each of the branches would know to automatically roll back if a certain time period passed without notification. This is called "assumed rollback before prepare."

2) Nothing screws up and we commit the transaction. This commit in turn prepares both branches, and if both succeed, then it does a commit on both branches. Everyone is happy.

3) The problem areas exist only from the moment the first branch is requested to prepare until all branches have been either committed or rolled back. For example, both branches are prepared, but then the server calling "prepare" and "commit" dies before it gets both "commit" commands out. In this case, the transaction is left hanging and has to be either manually "rolled forward" or "rolled back," or the transaction log needs to be recovered.
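The three cases above can be sketched as a toy simulation in Python. This is an illustration only, not part of the original post and not how any real transaction manager is written: `Branch`, `two_phase_commit`, `CoordinatorCrash` and the `crash_after_commits` switch are hypothetical names, and the branches are in-memory stand-ins for Oracle and MQ Series.

```python
class Branch:
    """One resource manager's share of the transaction (in-memory stand-in)."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "active"

    def prepare(self):
        if not self.can_prepare:
            return False
        self.state = "prepared"  # the RM promises it is able to commit
        return True

    def commit(self):
        assert self.state == "prepared", "commit requires a prior prepare"
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


class CoordinatorCrash(Exception):
    """Stands in for the server dying part-way through the protocol."""


def two_phase_commit(branches, crash_after_commits=None):
    # Phase 1: ask every branch to prepare; any refusal rolls everything back.
    for b in branches:
        if not b.prepare():
            for other in branches:
                other.rollback()
            return "rolled_back"
    # Phase 2: commit each prepared branch in turn.
    for done, b in enumerate(branches):
        if crash_after_commits is not None and done == crash_after_commits:
            raise CoordinatorCrash(f"died after {done} commit(s)")
        b.commit()
    return "committed"


if __name__ == "__main__":
    oracle, mq = Branch("oracle"), Branch("mq")
    try:
        two_phase_commit([oracle, mq], crash_after_commits=1)
    except CoordinatorCrash as exc:
        print(exc, "->", oracle.name, oracle.state, "/", mq.name, mq.state)
```

In the crash case one branch is committed and the other is still sitting in "prepared": exactly the hanging transaction of case 3, which only a recovery log or a human can resolve.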

This "transaction log" thingie is basically a record of all the potential problem points (prepares, commits, post-prepare rollbacks) that the server managing the "virtual transaction" has encountered. The problem is that when the server restarts and reads its log, it doesn't have the old JDBC connection object that it was using to manage the Oracle transaction, and it doesn't have whatever JMS (etc.) objects that it was using to manage MQ Series. So now it has to somehow contact Oracle and MQ Series and figure out "what's up." The way that it keeps track of the transactions that it no longer has "references" to is to create identifiers for the transactions and each branch of the transaction. These are "transaction IDs", or "XIDs" since "X" is often used to abbreviate "transaction." These XIDs are logged in a special kind of file (called a transaction log) that is supposed to be safely flushed to disk at each stage (note that I gloss over the details because entire books are written on this tiny subject) so that when the server comes back up, it is sure to be able to find out everything important that happened before it died.
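As a rough illustration of what such a log buys you, here is a minimal Python sketch (not part of the original post). The record layout, the step names and the functions `log_step` and `recovery_plan` are all assumptions made up for this example, and the XIDs are plain strings rather than real XA identifiers. The recovery rule used is also an assumption: a logged commit decision means unfinished branches are rolled forward, and prepared branches with no logged decision are rolled back.

```python
import json
import os
import tempfile


def log_step(path, xid, branch, step):
    """Append one step to the transaction log and force it to disk."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"xid": xid, "branch": branch, "step": step}) + "\n")
        f.flush()
        os.fsync(f.fileno())


def recovery_plan(path):
    """After a restart, work out what each unfinished branch must be told."""
    prepared, finished, decisions = {}, set(), {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            key = (rec["xid"], rec["branch"])
            if rec["step"] == "prepared":
                prepared[key] = None  # dict keeps the order branches were seen
            elif rec["step"] in ("commit", "rollback"):  # global decision
                decisions[rec["xid"]] = rec["step"]
            elif rec["step"] in ("committed", "rolled_back"):  # branch done
                finished.add(key)
    plan = []
    for xid, branch in prepared:
        if (xid, branch) not in finished:
            # No logged decision means nobody was told to commit: roll back.
            plan.append((xid, branch, decisions.get(xid, "rollback")))
    return plan


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "tx.log")
        log_step(path, "tx-1", "oracle", "prepared")
        log_step(path, "tx-1", "mq", "prepared")
        log_step(path, "tx-1", None, "commit")       # decision is durable
        log_step(path, "tx-1", "oracle", "committed")
        # The server dies here, before MQ Series is told to commit.
        print(recovery_plan(path))
```

The plan only says what to tell each branch; the hard part the post describes, reconnecting to Oracle and MQ Series and presenting the XID so they recognise the branch, is left out of this sketch entirely.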

Now, getting back to the JBoss question, it used to have a non-recoverable implementation, meaning that if the server died during 2PC processing, those transactions would not be recoverable by JBoss. I haven't looked lately, so it could have been fixed already .. there are several open source projects that they could have glued in to resolve it with minimal effort. (JBoss is 90% "other projects" anyway .. which is one of the benefits of being able to make new open source stuff out of existing open source stuff.)

Whether it is an important feature is largely irrelevant to most "J2EE" applications, since they have exactly one transactional resource -- the database. However, "enterprise" applications often have more than one RM, and in fact that is what qualifies the applications as being "enterprise" -- the fact that they have to glue together lots of ugly crap from previous "enterprise" applications that in turn glued together ugly crap from previous generations and so on. (For some reason, some people think that "enterprise applications" are just apps that have lots of users. If that were the case, Yahoo! would be an "enterprise application." ;-)

The funny thing about this particular article is that it's written by a guy who gets paid to sell you open source solutions, so he's writing an article that says it's OK for you to pay him to sell you on JBoss. That doesn't mean that he's right or wrong, it just means that the article is about as objective as a marketing user story .. but at least it was submitted to TSS by the US Chamber of Commerce. ;-)

To answer the question, is JBoss ready for enterprise deployment .. that's a tough one. I know what Bill Burke would say, and if you had Bill Burke working for you full time, then you would probably be comfortable using JBoss for some enterprise deployments. I think that the best way to ascertain the applicability of JBoss (or any product) to a particular problem is to find users who were braver than you and already tried it to solve a similar problem. Further, it's more than just asking "Does it work?" but also finding out "How does it react when things go wrong?" For example, without a recovery log, JBoss will work fine with 2PC transactions .. until a JBoss server crashes. How does it react when you reboot the server? How do you deal with heuristically mixed transactional outcomes? Is it a manual process? Do you know how to resolve the problem? How much does it cost to get support from JBoss to answer the questions?

JBoss is fine for 90% of "J2EE" applications. In fact, it's probably overkill for the 75% of those that should just use Caucho Resin. ;-) The question remains: is it fine for "enterprise deployments"? I'm not convinced, but it's only a matter of time until it (or Apache Geronimo) gets there.

Peace,

Cameron Purdy
Tangosol, Inc.
Coherence: Shared Memories for J2EE Clusters

Posted by iang at September 5, 2004 07:04 AM
Comments

Personally, if I were a good technologist (not a code monkey or a manager) _and_ I knew exactly what I am after, I would start with good but basic infra like Caucho Resin and build the rest myself. J2EE is so layered, it's very hard to know what's going on inside unless you are a J2EE maven, which kind of defeats its stated purpose (namely, to allow non-specialists to build "enterprise-class" web apps with relative ease).

Why use Java anyway? Have you tried playing with Erlang?

Posted by: Olivier at September 5, 2004 08:10 PM

Is J2EE's stated purpose to allow non-specialists to build "enterprise-class" web apps with relative ease? I think the post by Cameron Purdy puts paid to that - his very definition indicates that this is not a non-specialist need:

> However, "enterprise" applications often have more than one RM, and in
> fact that is what qualifies the applications as being "enterprise" -- the
> fact that they have to glue together lots of ugly crap from previous
> "enterprise" applications that in turn glued together ugly crap from
> previous generations and so-on.

I haven't looked at Caucho Resin, how does it compare (to JBoss) ?

The reason for not looking at Erlang is that it hasn't got wide-spread support. When it comes to writing big systems over many years, you do have to choose something that's got legs. GH chose Java for us in 1995 and it was a good choice; there are few competitors that support big systems, are easy to program and have wide support.

Posted by: Iang at September 6, 2004 03:57 AM

>Is J2EE's stated purpose to allow non-specialists to build
> "enterprise-class" web apps with relative ease?


That's how I read it. You can implement the recoverability features described by Purdy's post with a home-grown scheme but that may give you gray hairs.

However my comment on building it yourself assumed you have a relatively simple application and no legacy bits. The point is that as long as it doesn't take you longer to implement something yourself, building is always better than buying if the penalty for failure is severe because, first, when you buy usually you don't understand what is going on inside the software and, second, the way software licensing contracts are written, you'll have no recourse for damages incurred when using the 3rd-party software, even if it was buggy.


>> I haven't looked at Caucho Resin, how does it compare (to JBoss) ?


It's just a servlet processor. It seems I had misunderstood your application needs: for db stuff, Resin will not help you.


>> The reason for not looking at Erlang is that it hasn't got widespread
>> support.


I think that's a fantasy. Erlang is used at Ericsson for mission-critical systems and you can buy support from Ericsson and various consultancies. True, it is nowhere near as widespread as Java but it has a growing community of real-world users and it is not an academic fringe language.

-- O.L.

Posted by: Olivier at September 6, 2004 12:58 PM

The benefit of so-called widespread support is vastly overrated. I have a decent amount of experience with J2EE and Java in general and still prefer to build systems with Smalltalk. Smalltalk does not have as widespread support as Java does, but it is modern and powerful, has been quite successful in delivering enterprise solutions for quite a number of years, and has a strong community of real users. The idea that Java is the end-all only benefits big companies like Sun and IBM. There are a number of viable technologies out there that have some real advantages over so-called "widely" supported technologies such as J2EE.

Posted by: Charles Monteiro at September 13, 2004 10:42 AM