Hibernate Tip – Session Closing for Large Imports

We recently moved our Hibernate usage from the poor session-per-SQL-statement approach to the open-session-in-view approach, complete with thread-local handling of the session. (Yes, I know I could use Spring.) We also have some large legacy import jobs that we have been migrating, and they were opening the session at the beginning, importing 40,000 records, and then closing the session. At least we were trying.

As best I can figure it, Hibernate’s session cache isn’t limited in size or in number of objects (at least by default). So the import would start strong, slow down, slow to a crawl, and then pretty much kill the server, requiring a reboot (of RedHat). By never closing the session, we were causing the session cache to grow until it occupied most of the system’s memory.

Since the cache really wasn’t doing us any good, I decided to try closing the session after each record was processed. This changed the performance dramatically. The import went from taking days to run and crashing about two-thirds of the way through to completing in a couple of hours. While this discovery isn’t earth-shattering, I thought that I’d put this tip out there in case someone else runs into similar problems.
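
Here is a minimal sketch of that change, not our actual import code. It opens a fresh session for each record and closes it in a `finally` block, so nothing accumulates in the session cache. `ImportSession` and `SessionSource` are hypothetical stand-ins I made up so the example compiles without Hibernate on the classpath; in real code they would correspond to Hibernate’s `Session` (its `save`, `flush`, and `close` methods) and to `SessionFactory.openSession()`. Transaction handling is left out to keep it short.

```java
import java.util.List;

// Hypothetical stand-in for the few org.hibernate.Session methods used here.
interface ImportSession {
    void save(Object entity);
    void flush();
    void close();
}

// Hypothetical stand-in for SessionFactory.openSession().
interface SessionSource {
    ImportSession open();
}

final class PerRecordImporter {
    private final SessionSource sessions;

    PerRecordImporter(SessionSource sessions) {
        this.sessions = sessions;
    }

    // Saves each record in its own short-lived session; returns how many were saved.
    int importAll(List<?> records) {
        int imported = 0;
        for (Object record : records) {
            ImportSession session = sessions.open();
            try {
                session.save(record);
                session.flush();   // send the pending INSERT to the database now
                imported++;
            } finally {
                session.close();   // drop this session and everything it cached
            }
        }
        return imported;
    }
}
```

The point of the `finally` is that a failing record still releases its session, so at most one session is open at any moment.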

For most tasks, allowing several queries’ worth of data to accumulate in the session cache is likely harmless and probably helpful, but be aware of its impact on large jobs.


8 thoughts on “Hibernate Tip – Session Closing for Large Imports”

  1. Yes, we found this problem. The solution to the problem – switch to OJB. Our problems with the pointless “cache everything until you commit” feature went away.

    The Hibernate folks keep saying: why hit the database when you don’t have to? I think the opposite: if you are writing data to the database, why cache it up? You are going to have to write it at some point, so how does caching a write help? A database is extremely good at handling transactions, so why replicate that in Hibernate and leave users with problems like this?

    Of course, the JBoss people are very good at implementing very shitty things and not listening to their users about fixing them. Like the JBoss classloader and the multi-cast clustering.

    Switch to OJB and it will run even faster.

  2. Already had to dump OJB.

    We gave OJB the first shot. The problem we ran into had to do with OJB/JBoss and some silly eager-release attribute. Memory isn’t perfect on this one, but essentially if it was false, it led to some resource consumption error. If true, then it would only load the first element of a collection, the 2nd element would load on the 2nd call to that collection, and so on.

    In short, no matter which way we set the stupid configuration parameter, there was going to be an unacceptable consequence. Though inconvenient, Hibernate is working for us.

  3. Have you tried to flush() your Hibernate session? It’s significantly cheaper than reopening a session and you can do it in smartly sized batches.

    My experience has shown that Castor ‘JDO’ and OJB are slower than Hibernate if your code is properly tuned.

  4. Yes, if you have a large batch of data, it is probably worthwhile to call Session.flush() every once in a while. This will execute the SQL to insert the data into the database, but won’t remove the object from session cache and your session will still grow. So you’ll also want to remove the object from session cache by calling Session.evict(Object) after

    Of course this manual flushing and evicting is only necessary when you have a huge amount of data (like your 40,000 records) to insert in batch.

  5. The comments about OJB are incredibly ill-informed.

    The reason for the “cache everything unless explicitly told not to” policy is to enable automatic dirty checking, a feature that OJB does not have in its PersistenceBroker API.

    You can *at any time* flush() then clear() the session cache. It’s funny. You will need to explicitly call flush()/clear() much, much, much less often in your application than you would need to explicitly call update() in OJB PersistenceBroker.

    So you guys are so wrong it hurts.

  6. In addition to the Hibernate tips above, I’ll add that in my experience building large scale batch processing frameworks, it’s usually a good idea to commit every x records so as not to keep a transaction open while you process 100,000+ records. It does make batch recovery a bit more tricky though, since it’s not all one transaction, but you could create a temp table or something and do a “select into”…

  7. Of course, any time Hibernate doesn’t work right, it’s the user’s fault. The authors love to point that out every time they can.

    Don’t you love it when a vendor tells its users that they are wrong so much?

  8. Nobody showed that Hibernate “doesn’t work right”: on the contrary, someone pointed out that there is a reason why Hibernate works this way, and that it’s a (very useful, indeed) feature. Anyway, for anyone who has used Hibernate for more than 10 minutes and has spent a couple of hours reading the docs, the whole point of this blog entry and the OJB comments is totally screwed, and the “jboss is evil” comments are incredibly lame. Note: I am NOT on the Hibernate/JBoss team, only a happy Hibernate user who respects Gavin’s work a lot: try taking a look at the inner Hibernate architecture before calling it “shitty”.
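
For anyone who wants to try the advice in comments 3 through 6 instead of closing the session per record, here is a rough sketch of how flush(), clear(), and a commit every x records might fit together. `BatchSession` is a hypothetical stand-in so the example compiles without Hibernate: its `save`, `flush`, and `clear` methods mirror the Hibernate `Session` methods of the same names, while `commit` is my own shorthand for committing the current transaction and starting the next one. Treat it as an illustration of the idea, not as tested import code.

```java
import java.util.List;

// Hypothetical stand-in mirroring the Session calls named in the comments.
interface BatchSession {
    void save(Object entity);
    void flush();   // execute the pending SQL
    void clear();   // empty the session cache
    void commit();  // shorthand: commit this transaction, begin the next
}

final class BatchImporter {
    private final BatchSession session;
    private final int batchSize;

    BatchImporter(BatchSession session, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.session = session;
        this.batchSize = batchSize;
    }

    // Saves all records on one session; returns the number of batches finished.
    int importAll(List<?> records) {
        int batches = 0;
        int pending = 0;
        for (Object record : records) {
            session.save(record);
            pending++;
            if (pending == batchSize) {
                finishBatch();
                batches++;
                pending = 0;
            }
        }
        if (pending > 0) {   // leftover records that did not fill a whole batch
            finishBatch();
            batches++;
        }
        return batches;
    }

    private void finishBatch() {
        session.flush();    // write the batch to the database
        session.clear();    // stop the cache from growing past one batch
        session.commit();   // keep each transaction to one batch
    }
}
```

With an illustrative batch size of 50, the 40,000-record import would go through 800 flush/clear/commit cycles, and the session cache would never hold more than 50 objects. As comment 6 notes, committing per batch means a failed run leaves earlier batches committed, so recovery needs its own plan.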
