As it turns out, it was a simple, quick query, and there was little hope of speeding it up much. But further profiling showed that very little time was spent actually running the DB query. So what was Query.list() doing that took so long? It was the implicit flush. The Hibernate session keeps track of the objects you have loaded, looks for modified ones, and writes those modified objects to the database when it is flushed. A final flush happens when the transaction commits. In addition, by default Hibernate flushes dirty objects to the database before running a query, so that the query sees data reflecting every update made up to that point. If you have lots of persistent objects loaded, each of these flushes can be expensive, because before every query:
- Hibernate has to check every persistent object in the session for changes.
- Hibernate writes any updated objects to the database. This can be wasteful:
  - The updates may not be complete yet, so these intermediate writes are unnecessary.
  - Spreading the DB writes across many flushes reduces Hibernate's opportunities to batch SQL statements, compared to writing everything at the end of the transaction.
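To make the first point concrete, here is a deliberately simplified toy model in Java. It is not Hibernate code: the class `FlushCost`, the `FlushResult` record, and the per-entity boolean "dirty" flag are all hypothetical. Real Hibernate instead compares each entity's current state with a copy of the state it had when loaded. The sketch only shows that one flush examines every loaded object, even when just one of them was modified.

```java
/**
 * Hypothetical toy model of a single flush (not Hibernate's implementation).
 * A boolean flag per entity stands in for comparing the entity with the
 * snapshot of its state taken at load time.
 */
public class FlushCost {

    /** Outcome of one simulated flush. */
    record FlushResult(int entitiesChecked, int rowsWritten) {}

    /** Examine every loaded entity; "write" (and mark clean) the dirty ones. */
    static FlushResult flush(boolean[] dirty) {
        int checked = 0;
        int written = 0;
        for (int i = 0; i < dirty.length; i++) {
            checked++;              // every entity in the session is examined...
            if (dirty[i]) {
                written++;          // ...but only modified ones are written
                dirty[i] = false;   // after the write the entity is clean again
            }
        }
        return new FlushResult(checked, written);
    }

    public static void main(String[] args) {
        for (int loaded : new int[] {100, 10_000, 1_000_000}) {
            boolean[] session = new boolean[loaded];
            session[0] = true; // modify a single object before the next query
            System.out.println(loaded + " loaded -> " + flush(session));
        }
    }
}
```

When a flush runs before every query, as in the interleaved case described here, that scan is repeated each time, so the total work grows roughly with the number of loaded objects times the number of queries.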
Perhaps the problem is that Hibernate is encouraging you to create objects you don't need? Every created object is a disk read, no? Then it decides when to write them out for you? So you have too much disk activity and you are not in control of it? If I've read too much into this, just ignore it, but if I'm correct, then you cannot optimize what you cannot control.
Maybe a little direct SQL to make the required updates and inserts would do the trick. But I would have to know more to be sure.
Actually, that is not really the problem. I needed all the objects I loaded, and Hibernate only writes objects that I modify in memory. The problem was that the code interleaved object updates and queries, so Hibernate kept doing its dirty check and writing before every query. This was unnecessarily expensive: batched DB updates are much faster, and dirty checking is expensive when a lot of objects are loaded.
But fear not, this story has a happy ending. Hibernate does give you lots of options for optimization, and I mentioned a couple already. You do have substantial control over writes to the DB. The approach I ultimately took was to make the necessary queries first, before updating any DB objects. This improved performance by an order of magnitude. Other possible solutions include turning off the automatic flush, or deferring the actual updates to the very end. The last option is especially feasible when you are creating, rather than updating, objects: Hibernate won't know that you need to persist a new object until you call save or saveOrUpdate on it, so you can hold the new objects in memory until it's time to write them out.
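The remedies can be compared in a similar toy model. Again, this is a hypothetical sketch rather than Hibernate code: `FlushStrategies`, `run`, and its parameters are invented for illustration, and `autoFlush = false` stands in for a setting such as Hibernate's `FlushMode.COMMIT`, which defers flushing to commit at the cost of queries not seeing unflushed changes. The model assumes that every auto-flush scans the whole session even when nothing is dirty. Under that assumption, reordering the queries helps only by turning many small write batches into one, while switching off the automatic flush also removes the per-query scans.

```java
/**
 * Hypothetical toy model comparing flush strategies (not Hibernate code).
 * "flush" scans every loaded entity and writes the dirty ones in one batch;
 * autoFlush=false stands in for flushing only at commit.
 */
public class FlushStrategies {

    /** Counters collected over one simulated transaction. */
    record Stats(long dirtyChecks, long writeFlushes, long rowsWritten) {}

    private final boolean[] dirty;  // one "modified" flag per loaded entity
    private long dirtyChecks;       // entities examined across all flushes
    private long writeFlushes;      // flushes that actually sent statements
    private long rowsWritten;       // UPDATE statements issued in total

    private FlushStrategies(int loaded) {
        dirty = new boolean[loaded];
    }

    /** Scan every loaded entity and write the dirty ones as one batch. */
    private void flush() {
        long rowsThisFlush = 0;
        for (int i = 0; i < dirty.length; i++) {
            dirtyChecks++;
            if (dirty[i]) {
                rowsThisFlush++;
                dirty[i] = false;
            }
        }
        if (rowsThisFlush > 0) {
            writeFlushes++;
        }
        rowsWritten += rowsThisFlush;
    }

    /** With auto-flush on, pending changes are flushed first; the query itself is not modelled. */
    private void query(boolean autoFlush) {
        if (autoFlush) {
            flush();
        }
    }

    /**
     * Simulates one transaction over `loaded` entities with `steps` units of
     * work, each needing one query and one update (requires steps <= loaded).
     */
    static Stats run(int loaded, int steps, boolean queriesFirst, boolean autoFlush) {
        FlushStrategies s = new FlushStrategies(loaded);
        if (queriesFirst) {
            for (int i = 0; i < steps; i++) s.query(autoFlush); // all queries up front
            for (int i = 0; i < steps; i++) s.dirty[i] = true;  // then all updates
        } else {
            for (int i = 0; i < steps; i++) {
                s.dirty[i] = true;   // update an object...
                s.query(autoFlush);  // ...then run the next query
            }
        }
        s.flush(); // final flush on commit
        return new Stats(s.dirtyChecks, s.writeFlushes, s.rowsWritten);
    }

    public static void main(String[] args) {
        int loaded = 10_000;
        int steps = 100;
        System.out.println("interleaved:   " + run(loaded, steps, false, true));
        System.out.println("queries first: " + run(loaded, steps, true, true));
        System.out.println("no auto-flush: " + run(loaded, steps, false, false));
    }
}
```

With 10,000 loaded objects and 100 query/update pairs, the interleaved run in this model performs 100 separate write flushes, while both alternatives write all 100 rows in a single flush.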