Friday, October 29, 2010

Beware the magic flush

It often surprises me what a good Java profiler can tell me about Hibernate performance issues. Recently I was studying a performance issue that seemed straightforward. There were lots of objects, lots of writes and lots of queries. Preliminary profiling showed one particular Hibernate query dominating the operation. That's not surprising, I thought: it's a frequently made query, and it's probably slow. I'll just find a way to speed it up, maybe with an index.

As it turns out, it was a simple, quick query, and there was little hope of speeding it up much. But further profiling showed that very little time was spent actually doing the DB query. So what was Query.list() doing that was taking so long? It was the implicit flush. The Hibernate session keeps track of the objects you have loaded, checks them for modifications, and writes the modified ones to the database when it flushes. A final flush happens upon transaction commit. Additionally, by default Hibernate flushes dirty objects to the database before making a query, to ensure that the query runs against data that reflects all updates made up to that point. If you have lots of persistent objects loaded, each flush can be expensive, because with each query (see the sketch after this list):
  • Hibernate has to check all persistent objects in the session for changes 
  • Hibernate will write any updated objects to the database. This can be wasteful: 
    • The updates may not be complete yet, so these intermediate writes are unnecessary
    • Spreading the DB writes across the transaction reduces Hibernate's opportunities to batch SQL statements, compared to writing everything out at the end of the transaction.
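To make this concrete, here is a minimal sketch of the interleaved pattern, using a hypothetical Order entity and loop rather than the actual code from this incident. With the default AUTO flush mode, the query inside the loop forces a dirty check, and usually a round of UPDATE statements, on every iteration:

    Session session = sessionFactory.openSession();   // org.hibernate.Session
    Transaction tx = session.beginTransaction();
    try {
        for (Long id : orderIds) {
            // Load and modify a persistent object: the session is now dirty.
            Order order = (Order) session.get(Order.class, id);
            order.setStatus("PROCESSED");

            // With FlushMode.AUTO (the default), Hibernate dirty-checks the
            // objects in the session and writes any pending UPDATEs before
            // running this query -- on every pass through the loop.
            List pending = session.createQuery(
                    "from Order o where o.status = :status")
                .setParameter("status", "PENDING")
                .list();
            // ... act on the query results ...
        }
        tx.commit();   // the final flush would happen here anyway
    } finally {
        session.close();
    }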
The fact that flushes can happen during queries is significant. If you realize that an operation interleaves object modifications and queries, you might consider the potential for significant speedup just by preventing that interleaving. For example, you could make all the queries up-front, before adding, modifying or deleting any persistent objects, so that the only flush is the final pre-commit flush.
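Restructured that way, the sketch above might look something like this (hypothetical entities again; the point is the ordering, not the specifics). All the queries run while the session is still clean, so nothing is flushed until commit:

    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    try {
        // Phase 1: all the queries, while the session has no unsaved changes,
        // so the automatic pre-query flush finds nothing to do.
        List pending = session.createQuery(
                "from Order o where o.status = :status")
            .setParameter("status", "PENDING")
            .list();

        // Phase 2: all the modifications. No query runs after this point, so
        // the only flush is the pre-commit one, which writes every UPDATE in
        // one batchable burst.
        for (Long id : orderIds) {
            Order order = (Order) session.get(Order.class, id);
            order.setStatus("PROCESSED");
        }

        tx.commit();
    } finally {
        session.close();
    }

(Loading by identifier with session.get() does not trigger the automatic pre-query flush the way an HQL query does, so the second loop doesn't reintroduce the problem.)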

2 comments:

  1. Perhaps the problem is that Hibernate is encouraging you to create objects you don't need? Every created object is a disk read, no? Then it decides when to write them out for you? So you have too much disk activity and you are not in control of it? If I've read too much into this, just ignore it, but if I'm correct, then you cannot optimize what you cannot control.

    Maybe a little direct SQL to make the required updates and inserts would do the trick. But I would have to know more to be sure.

  2. Actually, that is not really the problem. I needed all the objects I loaded, and Hibernate only writes objects that I modify in memory. The problem was that the code was interleaving object updates and queries, so Hibernate kept doing its dirty check and writing before every query. This was unnecessarily expensive: batched DB updates are much faster, and dirty checking is expensive when you have a lot of objects loaded.

    But fear not, this story has a happy ending. Hibernate gives you plenty of options for optimization, and I mentioned a couple already. You do have substantial control over writes to the DB. The approach I ultimately took was to make the necessary queries first, before updating any DB objects. This improved performance by an order of magnitude. Other possible solutions include turning off the automatic flush, or deferring the actual updates to the very end. The last is especially feasible when you are creating, rather than updating, objects: Hibernate won't know that you need to persist a new object until you call save or saveOrUpdate on it, so you can hold the new objects in memory until it's time to write them out, roughly as sketched below.
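    To make those last two options concrete, here is a rough sketch (hypothetical entities, not the code from this incident): FlushMode.COMMIT turns off the flush-before-query behavior for a session, and holding new objects in a plain Java collection keeps the session clean until you are ready to save them.

        // Option 1: flush only at commit, never before queries. Queries won't
        // see in-memory changes, so only do this when they don't need to.
        session.setFlushMode(FlushMode.COMMIT);   // org.hibernate.FlushMode

        // Option 2: build the new objects up-front and save them at the end.
        // Hibernate ignores a new object until save()/saveOrUpdate() is called,
        // so the session stays clean while the queries run, and all the
        // INSERTs are flushed together at commit.
        List<Order> newOrders = new ArrayList<Order>();
        for (String name : customerNames) {
            List existing = session.createQuery(
                    "from Order o where o.customerName = :name")
                .setParameter("name", name)
                .list();
            if (existing.isEmpty()) {
                newOrders.add(new Order(name));   // plain object, not yet persistent
            }
        }
        for (Order order : newOrders) {
            session.save(order);
        }
        tx.commit();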
