Wednesday, December 9, 2009

Hibernate formulas do not like subselects

Hibernate formulas and the subselect fetch strategy are currently incompatible. There is already a bug associated with it: HHH-1400. I am linking the bug at the beginning because blog posts like this have a limited shelf life. If you came across this post from Google, it is possible that this note is already obsoleted by a bug fix. I have little hope of this happening very soon right now, though, seeing that HHH-1400 was opened almost 4 years ago.

What's a formula? A formula is a Hibernate feature that lets you map a property to an SQL expression rather than a table column. Hibernate would automatically compute the value of the property for you each time you load an entity. A formula in annotation form looks like the following:

@Formula("(select count(*) from foo where ...)")
public Integer getFooCount() { return fooCount; }

I think it is an elegant feature. Unfortunately,  this feature seems to interfere with the subselect fetch strategy.

Why do subselects matter? The subselect fetch mode is an important tool to deal with the n+1 select problem when loading entities with one-to-many relations. Suppose we had an entity called Parent with a collection of Child entities. Suppose also we load n Parents and eventually want to iterate over each element in Parent.children. The typical Hibernate-generated SQL with a lazy-loaded collection would look something like:

select * from Parent where ...;
select * from Child c where c.parent_id=?;
select * from Child c where c.parent_id=?;
select * from Child c where c.parent_id=?;
...

In other words, we would have n+1 select statements, which is bad if n is large. We could do a join between Parent and Child to eagerly load the collection, but this only works with one collection. If Parent has multiple collections, a join-fetch of multiple collections would lead to a Cartesian product, retrieving a prohibitively large number of rows. This is where the subselect fetch strategy comes in. If mapped to use the SUBSELECT fetch strategy, we would only need two select statements, with the first one's where clause reused in the second:

select * from Parent where ...;
select * from Child c where c.parent_id in 
   (select id from Parent where ...);


Unfortunately, if you had annotated a property with a @Formula, Hibernate will mangle the SQL, leading to an SQL parse error:

select * from Parent where ...;
select * from Child c where c.parent_id in 
   (select ...) as formula0_1_, ...;


If you really need to use subselect fetches, you'll have to find an alternative to @Formula for all its elegance. Possible solutions:
  • Compute the properties with separate queries only as needed. This is tricky if you frequently need those computed properties.
  • Sort of "denormalize" (I'm sure I'm misusing that term here) by persisting the computed values in the table. This works if the computed values rarely change, but is a problem otherwise.
  • Have the DB manage the computed values using views. Eyal Lupu has a great blog entry on how to do this. Obviously, this only works if your database supports views, and requires DB-specific SQL.

6 comments:

  1. Thanks for posting this, I ran into this problem and was baffled by the mangled SQL Hibernate was generating.

    ReplyDelete
  2. Thank you! I have the same problem with Hibernate 3.6!

    ReplyDelete
  3. If you capitalize the word "from" the problem goes away. For example:
    @Formula("(select count(*) FROM foo where ...)")

    The subselect generator (still) has the following line:

    int fromIndex = queryString.indexOf( " from " );

    which fortunately is case sensitive

    ReplyDelete
  4. OK, here it is, 10 years later from this original report, and this is STILL A BUG?!? This is still a bug on Hibernate 5.4.3-Final. Holy Cow! If you have a correlated sub-query for an @Formula, it tries to preface everything, including the table name of the sub-query table, with the table alias that hibernate generated for the outer query. That is just nuts.

    ReplyDelete
    Replies
    1. UPDATE: I found that the current bug seems only to exist if you try to quote the identifiers in the sub-query withing the formula. For example, this will not work:

      @Formula( "( select x.\"FOO\" from \"FOOBAR\" x where x.\"OOF\" = \"SPOOF\" )" )

      The above will result in a query that looks like:

      select ..., (select x._details_001."FOO" from _details_001."FOOBAR" x where x._details_001."OOF" = _details_001."SPOOF") from ... as _details_001 ... where ...

      However, this works correctly:

      @Formula( "( select x.FOO from FOOBAR x where x.OOF = SPOOF )" )

      and results in:


      select ..., (select x.FOO from FOOBAR x where x.OOF = _details_001.SPOOF) from ... as _details_001 ... where ...

      Who'd a thunk it?!?

      Delete