
In my program I do a lot of serialization and deserialization of Jena (2.13.0) DatasetGraphs through Thrift and RDFDataMgr, but at a certain point I get an OutOfMemoryError. Could someone help me find the problem?
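The round trip is essentially the following (a simplified sketch of the pattern rather than the exact production code; Lang.RDFTHRIFT and the in-memory dataset factory stand in for whatever the real jobs pass around):

// Write an in-memory DatasetGraph out as RDF Thrift bytes ...
DatasetGraph dsg = DatasetGraphFactory.createMem();
// ... (dsg is filled with one group of quads) ...
ByteArrayOutputStream out = new ByteArrayOutputStream();
RDFDataMgr.write(out, dsg, Lang.RDFTHRIFT);
byte[] bytes = out.toByteArray();

// ... and later parse those bytes back into a fresh in-memory dataset.
DatasetGraph copy = DatasetGraphFactory.createMem();
RDFDataMgr.read(copy, new ByteArrayInputStream(bytes), Lang.RDFTHRIFT);

It fails on both the write side and the read side: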

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:76)
    at org.apache.jena.riot.thrift.TRDF.protocol(TRDF.java:72)
    at org.apache.jena.riot.thrift.StreamRDF2Thrift.<init>(StreamRDF2Thrift.java:55)
    at org.apache.jena.riot.thrift.BinRDF.streamToOutputStream(BinRDF.java:103)
    at org.apache.jena.riot.thrift.WriterDatasetThrift.write(WriterDatasetThrift.java:53)
    at org.apache.jena.riot.RDFDataMgr.write$(RDFDataMgr.java:1331)
    at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:1205)
    at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:1195)

and

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:558)
    at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:222)
    at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:213)
    at org.apache.thrift.TUnion.read(TUnion.java:138)
    at org.apache.jena.riot.thrift.wire.RDF_Quad$RDF_QuadStandardScheme.read(RDF_Quad.java:582)
    at org.apache.jena.riot.thrift.wire.RDF_Quad$RDF_QuadStandardScheme.read(RDF_Quad.java:549)
    at org.apache.jena.riot.thrift.wire.RDF_Quad.read(RDF_Quad.java:464)
    at org.apache.jena.riot.thrift.wire.RDF_StreamRow.standardSchemeReadValue(RDF_StreamRow.java:203)
    at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:224)
    at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:213)
    at org.apache.thrift.TUnion.read(TUnion.java:138)
    at org.apache.jena.riot.thrift.BinRDF.apply(BinRDF.java:187)
    at org.apache.jena.riot.thrift.BinRDF.applyVisitor(BinRDF.java:176)
    at org.apache.jena.riot.thrift.BinRDF.protocolToStream(BinRDF.java:164)
    at org.apache.jena.riot.thrift.BinRDF.inputStreamToStream(BinRDF.java:149)
    at org.apache.jena.riot.RDFParserRegistry$ReaderRDFThrift.read(RDFParserRegistry.java:221)
    at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:906)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:577)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:554)
  • Did you check these links? http://stackoverflow.com/questions/5839359/java-lang-outofmemoryerror-gc-overhead-limit-exceeded and http://stackoverflow.com/questions/1393486/error-java-lang-outofmemoryerror-gc-overhead-limit-exceeded – erhun Apr 15 '15 at 21:08
  • Yes, I increased -Xmx from 6 to 10 and added GC options to the VM (along the lines of the sketch after these comments), but the problem is still there. With less serialization and the same amount of data, everything worked. – user2539645 Apr 16 '15 at 06:44
  • How large are the RDF structures you are reading and serializing? Which collector are you using? – David Soroko Apr 16 '15 at 09:03
  • Without seeing a minimal example of your actual code that exhibits the problem, it is impossible for anyone to give a real answer. All you can possibly get with your question as it stands is pointers to tuning JVM parameters (as you already got), which you already state doesn't really help in your use case. – RobV Apr 16 '15 at 14:07
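For reference, the kind of JVM settings discussed in these comments would look roughly like this (purely an illustration; the jar name is made up, and the actual heap size, collector and options used are not shown in the question):

    java -Xmx10g -XX:+UseG1GC -XX:-UseGCOverheadLimit -jar my-job.jar

Note that -XX:-UseGCOverheadLimit only turns the "GC overhead limit exceeded" error into a plain heap OutOfMemoryError; it does not free any memory.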

1 Answer


Actually, I'm using Spark and Flink to run complex MapReduce-style jobs, and I serialize a lot of groups of quads using the Thrift serialization of DatasetGraph. The methods I use most are:

// Despite its name, this parses the serialized bytes in b back into the
// supplied DatasetGraph (the read direction), then closes the dataset.
public static void ser(DatasetGraph dsg, byte[] b, Lang l) {
    InputStream is = new ByteArrayInputStream(b);
    RDFDataMgr.read(dsg, is, l);
    closeStream(is);
    dsg.close();
}

and

// Parses the serialized bytes into a fresh in-memory DatasetGraph and returns it.
public static DatasetGraph deser(byte[] b, Lang l) {
    DatasetGraph ret = DatasetGraphFactory.createMem();
    InputStream is = new ByteArrayInputStream(b);
    RDFDataMgr.read(ret, is, l);
    closeStream(is);
    return ret;
}
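
The write side (what produces those byte arrays in the first place) is the mirror image. A minimal sketch of that direction, with the method name made up here for illustration:

// Serialize a DatasetGraph to an in-memory buffer via RDFDataMgr and return the raw bytes.
public static byte[] serToBytes(DatasetGraph dsg, Lang l) {
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    RDFDataMgr.write(os, dsg, l);
    return os.toByteArray();
}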