
I have successfully migrated the dblp dataset into a Neo4j database, and I use neo4j-shell for running Cypher queries. The database has millions of nodes and relationships between publications and authors. When I run a query against it, it processes for 10 to 12 hours and then fails with this error:

Error occurred in server thread; nested exception is : java.lang.OutOfMemoryError: Java heap space

I am using Neo4j Community Edition 2.2.3 and JDK 1.7 on a machine with 8 GB of memory and a Core i7 processor.

Query:

neo4j-sh (?)$ MATCH (p:`publication`)-[:`publishedby`]->(a:`author`)
RETURN p.year, p.type, a.id, count(*) ORDER BY a.id DESC LIMIT 25;

Experts, please advise me on any way out of this exception.

Muhammad Adnan
  • Sounds like Neo4J isn't a good fit. This would be a trivial problem for a relational database. Object databases make sense for deep object graphs. That doesn't sound like the case here. – duffymo Aug 20 '15 at 10:40
  • 1
    Duplicate of [How to set the maximum memory usage for JVM?](http://stackoverflow.com/questions/1493913/how-to-set-the-maximum-memory-usage-for-jvm) – l4mpi Aug 20 '15 at 10:48
  • Try to rewrite your query using a `WITH` clause. Example: `MATCH (a:author) WITH a LIMIT 25 MATCH (p:publication)-[:publishedby]->(a) RETURN p.year, p.type, a.id ORDER BY a.id DESC`. – FylmTM Aug 20 '15 at 10:49
  • Possible duplicate of http://stackoverflow.com/questions/24510188/what-is-an-outofmemoryerror-and-how-do-i-debug-and-fix-it – Raedwald Aug 20 '15 at 12:25
  • @Raedwald this is not a duplicate of the question you suggested; I am getting this exception on a Neo4j database. I am fully aware of JVM memory issues, but my question is how to resolve this by modifying the query or the Neo4j configuration. – Muhammad Adnan Aug 20 '15 at 13:12
  • @FylmTM Thanks for the reply, I will try this query. – Muhammad Adnan Aug 20 '15 at 13:15

2 Answers


You should probably allow your Java process more maximum memory. A Java process only uses up to the configured maximum heap, which by default is often only 256 MB. Use the -Xmx parameter to raise it; read [How to set the maximum memory usage for JVM?](http://stackoverflow.com/questions/1493913/how-to-set-the-maximum-memory-usage-for-jvm) for a more detailed explanation.

Be aware that you must use a 64-bit JDK and a 64-bit OS to set -Xmx to more than 4 GB.
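
For a standalone Java process, that looks like the following (a generic sketch; yourapp.jar is a placeholder, and Neo4j itself is configured through its config files, as the other answer shows):

# start the JVM with a 4 GB initial and maximum heap (requires a 64-bit JDK and OS)
java -Xms4g -Xmx4g -jar yourapp.jar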

Ricardo Vila
  • 1
    Please flag the question as a duplicate instead of answering it with a link to another SO question... – l4mpi Aug 20 '15 at 10:49
  • I'm pointing to a solution and adding some advice too. I don't think my answer is bad. Perhaps I should mark the question as a duplicate, but there are also solutions on the Neo4j side, so it's not exactly a duplicate. Please reconsider your downvote. – Ricardo Vila Aug 20 '15 at 10:52
  • IMO your advice, while valid, should be a comment and not an answer, combined with a duplicate flag. And you're right that the OP's Neo4j query is probably far from optimized, but as your answer doesn't deal with that, it's not relevant. I'm very much against spoon-feeding people who apparently can't be bothered to search, such as the OP (there are countless resources on SO and outside of it describing what an OutOfMemoryError is and how it can be dealt with), and your answer doesn't add anything of importance which can't be found on SO already, hence the downvote. – l4mpi Aug 20 '15 at 11:16
  • @Ricardo Thanks for the reply; my actual issue is with the Neo4j database while running a query on a billion-node dataset. That's what I want to resolve. – Muhammad Adnan Aug 20 '15 at 13:14
  • @l4mpi I am fully aware of java.lang.OutOfMemoryError in Java, but please consider that my question is more about Neo4j query optimization. I did a lot of research before posting my question here, so rest assured, you are not spoon-feeding anybody... any help will be appreciated, as we are all learners :) – Muhammad Adnan Aug 20 '15 at 13:23

Since your dataset is public, it would be very helpful if you could share your database.

In general, you are computing many millions or billions of paths and aggregating them after the fact, which simply takes a while. Combined with probably too little memory and a slow disk, it takes a long time to load the data from disk.
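
To get a feel for how much work that is, you can first count the paths the pattern matches (a quick sanity check, not part of the original answer):

MATCH (:publication)-[:publishedby]->(:author)
RETURN count(*);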

This is a global graph query; you can see that if you run it prefixed with PROFILE.
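
For example, this is just your original query with the PROFILE keyword in front:

PROFILE MATCH (p:publication)-[:publishedby]->(a:author)
RETURN p.year, p.type, a.id, count(*)
ORDER BY a.id DESC LIMIT 25;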

Make sure your id property is numeric!
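
A quick sanity check you could run first (a sketch, assuming toInt(), which exists in Cypher 2.x and returns null for values it cannot parse):

// count authors whose id is not stored as a number
MATCH (a:author)
WHERE toInt(a.id) IS NULL OR NOT a.id = toInt(a.id)
RETURN count(a);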

I would change the query like this:

// this is the expensive operation: ordering millions of authors by id
// still, do it and take the top 25
MATCH (a:author) WITH a ORDER BY a.id DESC LIMIT 25
// find publications for the top 25 authors
MATCH (a)<-[:publishedby]-(p)
// return the aggregation
RETURN a.id, p.year, p.type, count(*)
LIMIT 25;

To start neo4j-shell with sensible memory settings:

  • stop the server
  • edit conf/neo4j-wrapper.conf and set the min and max memory to 4000 (see the exact lines below)
  • edit conf/neo4j.properties and set dbms.pagecache.memory=3G
  • start the server and run bin/neo4j-shell
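
The edits would look like this (a sketch assuming the standard Neo4j 2.2 property names; the wrapper values are in MB):

# conf/neo4j-wrapper.conf
wrapper.java.initmemory=4000
wrapper.java.maxmemory=4000

# conf/neo4j.properties
dbms.pagecache.memory=3G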

If you run neo4j-shell in standalone mode, stop the server and use this instead:

export JAVA_OPTS="-Xmx4000M -Xms4000M -Xmn1000M" 
bin/neo4j-shell -path data/graph.db -config conf/neo4j.properties
Michael Hunger