
I need to do a massive insert using EJB 3, Hibernate, Spring Data and Oracle. Originally, I was using Spring Data, and the code is below:

talaoAITDAO.save(taloes);

Here talaoAITDAO is a Spring Data JpaRepository subinterface and taloes is a Collection of TalaoAIT entities. In this entity, the ID is mapped like this:

@Id
@Column(name = "ID_TALAO_AIT")
@SequenceGenerator(name = "SQ_TALAO_AIT", sequenceName = "SQ_TALAO_AIT", allocationSize = 1000)
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "SQ_TALAO_AIT")
private Long id;

Also, this entity has no related entities that would require cascading inserts.
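For context, the DAO presumably looks something like this minimal sketch (the exact declaration is not in the original post, so take the name and generics as assumptions):

import org.springframework.data.jpa.repository.JpaRepository;

// Spring Data derives the implementation at runtime; save(Iterable) persists the whole collection.
public interface TalaoAITDAO extends JpaRepository<TalaoAIT, Long> {
}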

My problem here is that all entities are inserted individually (such as INSERT INTO TABLE(col1, col2) VALUES (val1, val2)). Occasionally this causes a timeout, and all insertions are rolled back. I would like to convert these individual inserts into batch inserts (such as INSERT INTO TABLE(col1, col2) VALUES (val11, val12), (val21, val22), (val31, val32), ...).

Studying alternatives to improve performance, I found this page in the Hibernate documentation, as well as Hibernate batch size confusion and this other page. Based on them, I wrote this code:

Session session = super.getEntityManager().unwrap(Session.class);
int batchSize = 1000;
for (int i = 0; i < taloes.size(); i++) {
    TalaoAIT talaoAIT = taloes.get(i);
    session.save(talaoAIT);
    // flush and clear the session every batchSize entities
    if (i % batchSize == 0) {
        session.flush();
        session.clear();
    }
}
session.flush();
session.clear();

Also, in persistence.xml, I added these properties:

<property name="hibernate.jdbc.batch_size" value="1000" />
<property name="order_inserts" value="true" />

However, although in my tests I perceived a subtle difference (mainly with big collections and big batch sizes), it was not as big as desired. In the logging console, I saw that Hibernate continued to do individual inserts rather than replacing them with a massive insert. Since my entity uses a sequence generator, I believe that is not the problem (according to the Hibernate documentation, I would have a problem if I were using an identity generator).

So, my question is: what could be missing here? Some configuration? Some method I am not using?

Thanks,

Rafael Afonso.

  • Why are you unwrapping the `session`? You can do `flush()` and `clear()` on the `entityManager` directly. BUT: Usually doing big inserts with java is the wrong way; dumping everything to file, transferring these to the destination server and bulk loading usually works better. That said, occasionally you actually need to do some transformation in code; this is perhaps one of those cases. – beerbajay Nov 29 '13 at 12:09
  • I would first trim down the batch size to something more reasonable (like 50). Then enable DEBUG logging for Hibernate and see what is happening. Also make sure that you have a database (and JDBC driver) that supports batch updates. Which database and Hibernate version are you using? – M. Deinum Nov 29 '13 at 12:36
  • If I remember correctly, you will find single insert statements for each entity in the log even if bulk inserts are used. If you enable it, you should see some additional information about batch updates, something like "[AbstractBatcher] Executing batch size: 5" and "[Expectations] success of batch update unknown: 0" – treeno Nov 29 '13 at 12:42
  • M. Deinum: I am using Hibernate 4.1.9 with Oracle 11.2. About the sequence: as I defined allocationSize in SequenceGenerator as 1000, it will only be called once every 1000 inserts. – Rafael Afonso Nov 29 '13 at 12:48
  • Hi, any update on this so far? I have the same issue: I have updated each configuration correctly, but Spring Data JPA is still generating multiple insert statements. – utsav anand Apr 19 '18 at 09:00
  • You can always fallback to HQL, see my example [here](https://stackoverflow.com/a/63357298/320761). – Lukasz Frankowski Aug 11 '20 at 11:38

3 Answers


A couple of things.

First, your configuration properties are wrong: order_inserts must be hibernate.order_inserts. Currently your setting is ignored, and you haven't actually changed anything.

Next, use the EntityManager instead of doing all that nasty Hibernate-specific stuff. The EntityManager also has flush and clear methods. This should at least clean up your method. Even without the insert ordering, this helps a little by cleaning up the persistence context and preventing dirty checks on all the objects in there.

EntityManager em = getEntityManager();
int batchSize = 1000;
for (int i = 0; i < taloes.size(); i++) {
    TalaoAIT talaoAIT = taloes.get(i);
    em.persist(talaoAIT);
    // flush the pending inserts and clear the persistence context every batchSize entities
    if (i % batchSize == 0) {
        em.flush();
        em.clear();
    }
}
// flush and clear whatever is left after the loop
em.flush();
em.clear();

Next, you shouldn't make your batches too large, as that can cause memory problems; start with something like 50 and test what performs best. There is a point at which dirty checking takes more time than flushing and clearing to the database. You want to find that sweet spot.
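For those asking in the comments how to obtain the EntityManager: below is a rough sketch of how this loop could be wired into a Spring-managed bean (the class and method names are made up). The flush() calls need an active transaction, hence the @Transactional.

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class TalaoAITBatchWriter {

    @PersistenceContext
    private EntityManager em;

    // flush()/clear() only work inside an active transaction
    @Transactional
    public void saveInBatches(List<TalaoAIT> taloes, int batchSize) {
        for (int i = 0; i < taloes.size(); i++) {
            em.persist(taloes.get(i));
            if (i > 0 && i % batchSize == 0) {
                em.flush();   // push the pending inserts as a JDBC batch
                em.clear();   // detach the flushed entities to keep memory usage flat
            }
        }
        em.flush();
        em.clear();
    }
}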

M. Deinum
  • Indeed, write a loop with a batch size between 20 and 50, and do a flush and a clear in that loop. Also, the hibernate.jdbc.batch_size property should be set to the same batch size. – K.C. Nov 29 '13 at 13:03
  • How do you get your entityManager? I keep getting a javax.persistence.TransactionRequiredException error when I try to flush – obesechicken13 Nov 23 '15 at 19:21
  • So at first I tried using @PersistenceContext EntityManager entityManager to wire up the entity manager, but then I wasn't allowed to flush using a shared entity manager. So I followed a solution here http://stackoverflow.com/questions/26606608/how-to-manually-start-a-transaction-on-a-shared-entitymanager-in-spring and it just hangs. – obesechicken13 Nov 23 '15 at 20:49

The solution posted by M. Deinum worked great for me, provided I set the following Hibernate properties in my JPA persistence.xml file:

<property name="hibernate.jdbc.batch_size" value="50" />
<property name="hibernate.jdbc.batch_versioned_data" value="true" />
<property name="hibernate.order_inserts" value="true" />
<property name="hibernate.order_updates" value="true" />
<property name="hibernate.cache.use_second_level_cache" value="false" />
<property name="hibernate.connection.autocommit" value="false" />

I am using an Oracle database, so I also have this one defined:

<property name="hibernate.dialect" value="org.hibernate.dialect.Oracle10gDialect" />
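If you configure JPA with Spring's Java config instead of persistence.xml, the same properties can be supplied programmatically. A sketch under that assumption (the package name and data source wiring are placeholders):

import java.util.Properties;
import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean;
import org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter;

@Configuration
public class JpaConfig {

    @Bean
    public LocalContainerEntityManagerFactoryBean entityManagerFactory(DataSource dataSource) {
        LocalContainerEntityManagerFactoryBean emf = new LocalContainerEntityManagerFactoryBean();
        emf.setDataSource(dataSource);
        emf.setPackagesToScan("com.example.domain"); // placeholder package
        emf.setJpaVendorAdapter(new HibernateJpaVendorAdapter());

        // Same batching-related properties as above, passed as JPA properties
        Properties props = new Properties();
        props.put("hibernate.jdbc.batch_size", "50");
        props.put("hibernate.jdbc.batch_versioned_data", "true");
        props.put("hibernate.order_inserts", "true");
        props.put("hibernate.order_updates", "true");
        props.put("hibernate.dialect", "org.hibernate.dialect.Oracle10gDialect");
        emf.setJpaProperties(props);
        return emf;
    }
}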
Jim Tough

I recently found a promising small library for batching inserts with Hibernate and PostgreSQL. It is called pedal-dialect and uses the PostgreSQL COPY command, which many people claim to be much faster than batched inserts (references: the PostgreSQL manual, PostgreSQL Insert Strategies - Performance Test, and How does COPY work and why is it so much faster than INSERT?). pedal-dialect lets you use COPY without fully losing the ease of use of Hibernate: you still get automatic mapping between entities and rows and don't have to implement it on your own.
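For a sense of what COPY looks like at the JDBC level, here is a minimal sketch using the plain PostgreSQL driver's CopyManager; this is not pedal-dialect's own API, and the table, columns and CSV layout are made up for illustration:

import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {
            // COPY streams all rows in a single operation instead of one INSERT per row.
            CopyManager copyManager = conn.unwrap(PGConnection.class).getCopyAPI();
            String csv = "1,first row\n2,second row\n";
            long rows = copyManager.copyIn(
                    "COPY talao_ait (id_talao_ait, description) FROM STDIN WITH (FORMAT csv)",
                    new StringReader(csv));
            System.out.println("Copied " + rows + " rows");
        }
    }
}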

mm759