Thursday, February 24, 2011

The pain of enterprise Java development

Java enterprise development can be painful. Now, it's true that Java is not as concise as some other languages. It's also true that the Java ecosystem is full of famously complex tools and frameworks such as old-style EJBs, JSF and Maven. Still, I believe that for us Java enterprise developers, much of the pain is self-inflicted. Let me suggest a few things we do to ourselves, often unnecessarily. These are generally justified as a Best Practice (spoken in a deep, authoritative voice) or as something that decouples something or other. I think it says something about our mindset, and I don't necessarily mean that in a bad way.



Pain 1: lots of modules

One day, I started importing a new Maven-based project into Eclipse, and watched in horror as some 30 Eclipse projects poured into my project list. The sensation was similar to watching a toilet overflow. This was a green-field Java project that had only 3 engineers working for about that many months. And we already had 30 Eclipse projects (and no shipping product). Why?

I know the arguments in favor of modularity. It allows developers to work separately and not step on each other's toes. Each module can be developed and unit-tested independently. Modules allow us to control complexity by partitioning the pieces and controlling dependencies. This is all well and good. The pain comes from going module-crazy from the start. If you are starting from scratch, you are not going to have a code base large enough to justify a highly modular system. Worse, you won't know enough about what your project will look like in the future to make good decisions about where the module boundaries should be.

If you have too little code going into too many buckets, there is going to be pain. Fine-grained modularity means you might break up code that really belongs together, and get all tangled up managing dependencies. You do not have a working product yet, so the danger is that you go dark and work happily in your own little silo, oblivious to integration problems until much further down the road. When I work in a highly modularized project, it's not unusual for my code commits to span half a dozen or more Eclipse projects, which makes the value of this modularization questionable.

For green-field projects, my own inclination is to get an integrated, working application going as soon as possible. If you don't have an integrated, working application, you won't always know if you have broken the overall application. In the meantime, there is too little code to partition meaningfully. At this stage, Java packages should be all you need.

Pain 2: middle management

Not you, boss. I have blogged about Java middle management before: that practice of inserting useless code so that you have to dig through 5-6 layers of classes and interfaces to accomplish something that might normally take just 1 line of code. Every implementation class gets its own interface. It's defensive coding, designed to isolate us from change. It's not that Java interfaces are a bad thing. But reflexively slapping an interface on everything that moves just because it is a Best Practice seems like so much unnecessary pain. Come on, we have some of the finest refactoring tools in existence. We can add interfaces later if we really need them. I believe the main justification behind this piling on of interfaces is to "decouple" code. "Decouple A from B" is a professional-sounding way of saying "put useless crap between A and B". The end result is that not only do you have to maintain A and B, but now you have to maintain the crap too.
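A common objection is that a class without an interface cannot be stubbed in tests. Here is a minimal sketch of one way around that: the caller depends on a concrete class, and the test stubs it by subclassing. The names TaxCalculator and InvoiceService, and the flat 8% rule, are made up for illustration.

```java
// A concrete collaborator with no accompanying interface.
class TaxCalculator {
    // Hypothetical rule for illustration: a flat 8% tax, in cents.
    long taxFor(long amountCents) {
        return amountCents * 8 / 100;
    }
}

// The caller depends on the concrete class directly.
class InvoiceService {
    private final TaxCalculator taxCalculator;

    InvoiceService(TaxCalculator taxCalculator) {
        this.taxCalculator = taxCalculator;
    }

    long total(long amountCents) {
        return amountCents + taxCalculator.taxFor(amountCents);
    }
}

class InvoiceServiceDemo {
    public static void main(String[] args) {
        // Production wiring: the real calculator.
        System.out.println(new InvoiceService(new TaxCalculator()).total(10_000)); // prints 10800

        // Test wiring: stub the collaborator by subclassing it; no interface required.
        TaxCalculator noTax = new TaxCalculator() {
            @Override
            long taxFor(long amountCents) {
                return 0;
            }
        };
        System.out.println(new InvoiceService(noTax).total(10_000)); // prints 10000
    }
}
```

If a second real implementation ever shows up, an "extract interface" refactoring can introduce the interface at that point.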

Pain 3: framework frenzy

Enterprise developers just love our technology frameworks. I think it is a sign of the richness of the Java platform that we have so many viable choices. Sometimes, the pain comes from not being able to say "no" to these choices. A team might lack discipline in introducing libraries, so an application might end up with 4 logging APIs, both Google Collections and Commons Collections, and so on. A team might be so afraid of change that it ends up chasing abstractions that isolate it from having to choose. But I think the biggest pain comes not from lack of discipline nor timidity, but from ambition. We want to do it "right", from the very beginning.

The frameworks of ambition are the all-encompassing ones such as ESBs, EIP frameworks (Camel, Spring Integration), OSGi and workflow engines (jBPM). You don't just use these frameworks: they use you. You basically build your application around the framework. These are all powerful technologies, but they all carry a price in terms of development complexity, maintenance, deployment and migration. And you may not really need them.

Part of this phenomenon, I suspect, comes from a variation of what Fred Brooks, in The Mythical Man-Month, calls the second-system effect. Developers, while building an earlier system, think of frill after frill that could have been added, and mentally file each one away for "next time". When "next time" comes around and it's time to build a new system, the cumulative wish list of an experienced team gets piled onto the project. That is why Brooks says this second system "is the most dangerous system a man ever designs".

Pain 4: extreme generality

We want our applications to be flexible. But generality has to be built. We could parameterize everything, and make them configurable. Going further, we could decree that those parameters be stored in an SQL database so they can be maintained programmatically. Now we have properties to maintain and track, and additional SQL scripts that need to be deployed, and more places that can go kablooie if the database is not right. And during development, we now have to execute SQL each time we want to modify a parameter instead of just editing a file.
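For comparison, here is a rough sketch of the "just edit a file" end of the spectrum, using the JDK's java.util.Properties. The class name AppSettings and the keys retry.count and batch.size are hypothetical. The demo reads from a string to stay self-contained; a real application would pass a FileReader on its properties file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical settings holder backed by a plain properties file.
class AppSettings {
    private final Properties props = new Properties();

    AppSettings(Reader source) {
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the configured value, or the default when the key is absent.
    int getInt(String key, int defaultValue) {
        String raw = props.getProperty(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }
}

class AppSettingsDemo {
    public static void main(String[] args) {
        // Stand-in for the contents of a file such as app.properties.
        AppSettings settings = new AppSettings(new StringReader("retry.count=3\n"));
        System.out.println(settings.getInt("retry.count", 1)); // prints 3
        System.out.println(settings.getInt("batch.size", 50)); // prints 50 (default)
    }
}
```

Changing a parameter during development is then a one-line edit to a text file, with no SQL script to write or deploy.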

Another manifestation of excessive generality is where we decide not to "hard-code" business process. That's when the BPM engine (e.g., jBPM) gets rolled in, configured and deployed. You'll need a separate database schema, of course. And you'll have to design that workflow separately, either with a ton of XML or a custom workflow designer tool.

Flexibility generally comes from externalizing stuff that is normally embedded in code. Often, this is desirable and even necessary. The pain comes from doing this externalizing -- of parameters, processes, or delegating to an external application -- even where the flexibility is not yet needed. Externals are exposed, therefore we end up with more moving parts to maintain.

Pain 5: hand-coding your own DDL

This, I don't get at all. Hibernate is perfectly capable of automatically generating DDL that is perfectly in sync with your mappings. But apparently, there is a school of thought that DB initialization scripts should be maintained by hand. This is a Best Practice, meant to decouple (there's that word again) the database from Hibernate. My thinking is that Hibernate and your database schema are unavoidably coupled: Hibernate will barf if they are out of sync. You'd only be kidding yourself by not letting Hibernate generate your DDL for you. All that means is that you have to maintain that coupling manually.

That is not to say that standalone DDL has no place. If you are working with a legacy schema, or a database that your application does not own, then Hibernate has no business driving the schema. You adapt Hibernate to the database, not the reverse. But if your application owns the schema, then why not let Hibernate own it?
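As a sketch of that distinction: Hibernate's schema handling is controlled by the hibernate.hbm2ddl.auto setting. The helper below is hypothetical, and mapping the two cases to "update" and "validate" is my own assumption about a sensible default, not a rule.

```java
import java.util.Properties;

// Hypothetical helper that picks Hibernate's schema-generation mode.
class SchemaSettings {
    static Properties forSchema(boolean applicationOwnsSchema) {
        Properties props = new Properties();
        // "update" lets Hibernate create missing tables and columns from the mappings;
        // "validate" only checks that the existing schema matches the mappings.
        props.setProperty("hibernate.hbm2ddl.auto",
                applicationOwnsSchema ? "update" : "validate");
        return props;
    }
}

class SchemaSettingsDemo {
    public static void main(String[] args) {
        // These properties would be handed to Hibernate's Configuration at startup.
        System.out.println(SchemaSettings.forSchema(true).getProperty("hibernate.hbm2ddl.auto"));  // prints update
        System.out.println(SchemaSettings.forSchema(false).getProperty("hibernate.hbm2ddl.auto")); // prints validate
    }
}
```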


Pain 6: Hibernate named queries

Java developers think SQL will give them cooties. That's the best reason I have for why we are so reluctant to keep SQL alongside Java. As far as I am concerned, SQL is code and belongs with code. This is all the more true with Hibernate's HQL, which has object semantics. I think there is no need to decouple (here we go again) HQL/SQL from Java. So I have never understood the fondness many developers have for tucking away their HQL in XML files, using them as named queries. There are a few ways that this hurts:
  • Now information is in 2 places. When you use the query in Java, you want to know what exactly that query fetches, what entity references it initializes, what named parameters it uses. Guess what? That information is in the HQL, which you exiled into the XML ghetto.
  • Named queries are referenced by name. You're just throwing strings around, and you have to make sure those magic strings are always in sync in the invocation and the XML. That's more arbitrary stuff to maintain.
  • Effort is doubled. Each new query means hunting down the file that holds named queries, adding that query there, then writing the Java that uses that query in another file. If you need to debug something, you'd have to read the code that invokes it and look up the query in the other file that holds the named queries. The latter will get pretty crowded as queries multiply.
I have heard people argue that the queries should be centralized for easier maintenance. Except that they aren't. Remember pain #1? You probably have those named queries tucked into 30 XML files, which are in turn zipped into 30 jars. And why would query centralization make maintenance easier anyway, when you no longer have the context in which they are used? By contrast, if you keep the queries within the context in which they are used, then you have all the logic -- Java and HQL -- for each query in the same place. And if you diligently maintain a discrete persistence layer, DB query logic should now be relatively isolated and easy to maintain.
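Here is a rough sketch of what keeping the query with its caller looks like. HqlRunner is a hypothetical stand-in for a Hibernate Session so that the example compiles without Hibernate on the classpath; with the real API this would be session.createQuery(hql).setParameter("customerId", customerId).list(). The Order entity and its properties are also invented for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Hibernate Session (see the note above).
interface HqlRunner {
    List<?> list(String hql, Map<String, Object> namedParams);
}

class OrderDao {
    private final HqlRunner runner;

    OrderDao(HqlRunner runner) {
        this.runner = runner;
    }

    // The HQL sits next to the code that binds its parameter and uses its result:
    // what is fetched, what is initialized, and which parameters are needed
    // are all visible in one place.
    List<?> findOpenOrders(long customerId) {
        String hql =
              "select o from Order o "
            + "join fetch o.lineItems "
            + "where o.customer.id = :customerId "
            + "and o.status = 'OPEN'";
        // The value is bound as a named parameter, never concatenated into the query.
        return runner.list(hql,
                Collections.<String, Object>singletonMap("customerId", customerId));
    }
}

class OrderDaoDemo {
    public static void main(String[] args) {
        // Fake runner that just echoes what it was asked to execute.
        HqlRunner echo = (hql, params) -> {
            System.out.println(hql + " " + params);
            return Collections.emptyList();
        };
        new OrderDao(echo).findOpenOrders(42L);
    }
}
```

There is no query name to keep in sync with an XML file, and the only string literal involved is the query itself.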

The complexity dragon

I've described a few ways we enterprise Java developers make our lives miserable. We don't do this because we want to be miserable. The fact is, we are ambitious. We want to build big, complex and long-lived code. That kind of code breeds dragons. (The source code for XTerm famously had a comment "There be serious and nasty dragons here".) As a code base grows, its complexity -- the number of possible interactions between components -- grows as the square of the code's size. If the complexity dragon is untamed, the code base becomes unusable: each new feature or bug fix takes longer to add as each little change breaks something elsewhere.
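To put rough numbers on that square law: if we assume every pair of components can potentially interact, then n components have n * (n - 1) / 2 possible pairs. The sketch below just evaluates that formula; counting only unordered pairs is a simplifying assumption.

```java
class InteractionCount {
    // Number of distinct unordered pairs among the given number of components.
    static long pairs(long components) {
        return components * (components - 1) / 2;
    }

    public static void main(String[] args) {
        for (long n : new long[] {10, 20, 40}) {
            System.out.println(n + " components -> " + pairs(n) + " possible pairs");
        }
        // prints:
        // 10 components -> 45 possible pairs
        // 20 components -> 190 possible pairs
        // 40 components -> 780 possible pairs
    }
}
```

Each doubling of the component count roughly quadruples the number of pairs that can go wrong.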

So we tell ourselves, "let's do it right from the start". We break the project up into modules. We use architectural layers. We decouple code by using interfaces. We write highly generalizable code, because the world will realize the awesomeness of our code and want to use it. We adopt complex frameworks with more capability than we need in order to future-proof our product. We want to pay the cost up front, rather than have to rearchitect a big code base later on.

The trouble with that thinking is that the complexity dragon is not just a cost: it is a tax. Once you start paying it, you will continue paying it: every day, every hour, every minute, for the rest of your product's life. That complexity you introduce up front will be with you always. The complexity dragon you grow will be constantly on your back. Do you want it so soon? Some decisions you might make for a project in its first month may be perfectly good or even necessary for a project in its fifth or tenth year. But if you accept that cost now, your project may never make it to year 5, becoming a casualty of faster competition or cost/schedule overruns. Perhaps in many cases the answer to the question "should we adopt Best Practice X?" should be "not yet".

10 comments:

  1. You are not wrong in what you say. I can relate to all those things as both a victim and perpetrator!!

    It reminds me of a situation I am in right now. We have many, many Selenium-IDE-scripted tests in my app. They are an increasing maintenance burden, and a rewrite to use the Selenium Java API is unlikely.

    In the beginning, the Java API is a hard sell and 'record-and-playback' is a lot easier. At some point, though, you reach a tipping point where you regret that decision.

    So if I start a new project tomorrow, I will use the Selenium Java API and the page objects pattern. Yes, it will seem like harder work than necessary, and I can't tell whether I will ever reach that tipping point. But if I do, I may not be able to rewrite the tests in Java.

    Have I just fallen into the trap you mention?

  2. As both a victim and oppressor I love this.

  3. 1) Not creating modules (sensibly) prevents reuse.
    2) Not using interfaces removes testability (and reuse).
    3) People avoiding frameworks regularly end up writing their own: incomplete, undocumented and unproven.
    4) Depends on the use case.
    5) I'm OK with.
    6) Inline queries are generally unformatted and hard to read. Worse, people end up using string concatenation and open themselves up to injection attacks.
    Also, JPA will tell you when a named query is not available or does not compile.
    The persistence.xml will let you know where the files are.

  4. 5) Hibernate/EclipseLink/etc. will generate great table definitions. But, sorry to say, you still need at least some SQL to define indices and other goodies.

  5. Good article. I feel your pain.

  6. Thanks for commenting, Ingo. Unfortunately, your comments are vague, the sort of hand-wavy platitudes that I no longer find convincing after years in the industry. I'd need more elaboration to be convinced.

    1) I was talking about a project's early stages. I'd say that in the first year or two you won't have written enough code to meaningfully justify modules. Just "reuse" it as one jar.

    2) Ah yes, "testability". I've already dealt with that in the linked blog posting, but the bottom line is that interfaces are not necessary for testing. Want to stub a class? Use Mockito, or just subclass it. Also, the vast majority of classes have only one implementation, so preemptively slapping an interface on them is mostly useless.

    3) The framework question is not a buy vs build dichotomy. The question is whether you actually need such a framework (like an ESB), not whether you need to build your own.

    6) If you can't trust a developer to use inline queries, maybe you shouldn't trust a developer to write your code. I always format my inline queries, and programmatic queries should be constructed using the Criteria API (which you can't externalize anyway) rather than string concatenation.

  7. I can imagine the pain you are going through. A reader of my blog asked me what she should learn to get a job in Java, and as I started explaining these things to her, she got overwhelmed by the number of frameworks and tools she would need to learn, each of which requires a lot of study and hands-on experience.
