I am trying to import a large amount of data from a CSV file into Neo4j using the neo4j-rest Java API. To avoid out-of-memory exceptions, I am using periodic commit, so a sample of the Java code would be:

    // just to let you know what classes I am using
    import org.neo4j.rest.graphdb.query.CypherTransaction;
    import org.neo4j.rest.graphdb.query.CypherTransaction.Statement;
    import org.neo4j.rest.graphdb.query.CypherTransaction.Result;
    import org.neo4j.rest.graphdb.query.CypherTransaction.ResultType;

    private static final String CREATE_USER =
            "USING PERIODIC COMMIT 10000 " +
            "LOAD CSV WITH HEADERS FROM \"URL\" AS line WITH line " +
            "CREATE (u:USER {id: toInt(line.customer_key)})";

    // create the USER nodes
    Statement userStatement = new Statement(CREATE_USER, null, ResultType.rest, false);

    CypherTransaction periodicCommitTransaction =
            new CypherTransaction(dbPath, CypherTransaction.ResultType.rest);
    periodicCommitTransaction.addAll(userStatement);
    periodicCommitTransaction.commit();

Now my question is: how should I handle transaction rollbacks with periodic commits? I know that periodic commit statements cannot be run inside an open transaction and are committed as soon as the request is sent, so there is no way to roll back if something goes wrong. I guess this is a common problem with batch insertions, so how should such rollbacks be handled? Should I drop my Neo4j database and start the whole process again from the beginning? Any thoughts?

Lina

1 Answer


Correct, PERIODIC COMMIT commits after every x rows (10000 in your statement).

The only thing you can do is mark your "in-flight" nodes with a certain label, e.g. :Importing, remove that label if your import was successful, or delete all of those nodes and their relationships if something failed. You have to batch that cleanup, though:

    MATCH (n:Importing)
    WITH n LIMIT 10000
    DETACH DELETE n
    RETURN count(*);
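
For instance, the import statement from the question could attach the marker label up front, and a successful import could then strip it off again in batches. This is only a sketch: the :Importing label and the 10000 batch size are examples, and "URL" is the same placeholder as in the question.

    // import: create each node with the extra :Importing marker label
    USING PERIODIC COMMIT 10000
    LOAD CSV WITH HEADERS FROM "URL" AS line
    CREATE (u:USER:Importing {id: toInt(line.customer_key)});

    // on success: remove the marker label one batch at a time,
    // re-running the statement until it reports 0 rows
    MATCH (n:Importing)
    WITH n LIMIT 10000
    REMOVE n:Importing
    RETURN count(*);
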
Michael Hunger
  • Thanks Michael, well, I need to drop everything! In that case, I think I do not really need to mark them with any label. I am still using Neo4j 2.2.5, so DETACH does not work here yet. I will use: MATCH n WITH n LIMIT 10000 OPTIONAL MATCH n-[r]-() DELETE n, r. One more question: will {WITH n LIMIT 10000} work as batches? I mean, if I have 1000000 nodes, will this execute 100 times? – Lina Apr 01 '16 at 07:16
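
Regarding the batching question in that comment: as far as I know, a single WITH n LIMIT 10000 only processes one batch per execution, so the delete statement has to be re-run by the client until it returns 0. A sketch of the 2.2.5-compatible variant from the comment:

    // deletes at most one batch of 10000 nodes plus their relationships;
    // re-run the statement until count(*) comes back as 0
    MATCH (n)
    WITH n LIMIT 10000
    OPTIONAL MATCH (n)-[r]-()
    DELETE n, r
    RETURN count(*);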