
I would like to read a large file (too big to fit in the heap as a single object). I have to read it line by line, process each line, and then append the result to a new file.

I have finished the first step (loading and processing), and for now I print the output to the console. I don't build an in-memory object from the data. I think I must write the output on the fly, but I don't know which libraries might help.

What is more, I would like to add XML or CSV serialization.
Do you know any libraries that might solve this problem?

for (String line; (line = bufferedReader.readLine()) != null; ) {
     // process one line at a time so the whole file never sits in memory
     String processedNewLine = processLine(line);
     // and I would like to serialize to XML (append)
     XMLSerializer.serialize(processedNewLine, xmlTemp.getPath());
}
bolec_kolec
  • Your question has been answered before; take a look here: http://stackoverflow.com/questions/14037404/java-read-large-text-file-with-70million-line-of-text – Ashouri Aug 22 '16 at 12:49
  • @M.RAshouri The question you linked only covers CSV files. It is not useful for XML, because reading it row by row doesn't solve the problem, and sometimes the whole XML document is saved on a single line to save space. – Davide Lorenzo MARINO Aug 22 '16 at 12:54
  • You want to serialize every processed line as an XML file? Why? – user207421 Aug 24 '16 at 00:29
  • Each processed line should be an XML part of one big XML file: `name...` – bolec_kolec Aug 25 '16 at 10:38

3 Answers


If you use .csv files, you simply need to read them line by line. A special library is not really necessary, and this approach also works with very large files without problems.
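A minimal sketch of that line-by-line approach, using only the JDK. The class name `CsvLineStreamer`, the two-column output layout (line number, processed text), and the body of `processLine` are assumptions made for illustration; only one line is held in memory at a time, and the output file is opened in append mode.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Locale;

final class CsvLineStreamer {

    // Hypothetical stand-in for the question's processLine(...).
    static String processLine(String line) {
        return line.trim().toUpperCase(Locale.ROOT);
    }

    // Quote a field if it contains a comma, a double quote, or a line break;
    // embedded double quotes are doubled.
    static String escapeCsv(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Reads the input one line at a time and appends one CSV row per line.
    // Returns the number of lines processed.
    static long stream(Path input, Path output) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8,
                     StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String line; (line = reader.readLine()) != null; ) {
                count++;
                writer.write(count + "," + escapeCsv(processLine(line)));
                writer.newLine();
            }
        }
        return count;
    }
}
```

Calling `CsvLineStreamer.stream(inputPath, outputPath)` keeps memory use roughly constant regardless of file size, because both the reader and the writer are buffered streams rather than in-memory collections.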

If you use .xml files, you need a SAX parser. A SAX parser operates on events (such as an opening or closing tag) instead of building the whole structure in memory, as a DOM parser does.
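To illustrate the event-based model, here is a small sketch that uses the SAX parser shipped with the JDK to count elements without building a tree. The class name `SaxElementCounter` and the idea of counting by element name are assumptions chosen for the example; a real handler would do its per-element processing inside `startElement`, `characters`, and `endElement`.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class SaxElementCounter {

    // Streams through the XML and counts start tags whose qualified name
    // matches elementName. No document tree is ever built.
    static long countElements(InputStream xml, String elementName)
            throws ParserConfigurationException, SAXException, IOException {
        long[] count = {0}; // array so the anonymous handler can mutate it
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if (qName.equals(elementName)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        return count[0];
    }
}
```

Because the parser pushes events as it reads, this also copes with the case mentioned in the comments where the whole XML document sits on a single line.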

Davide Lorenzo MARINO
  • CSV was easy, but XML is more complicated. In the end I treat the XML as a random access file and append new lines to the end. It's not clean, but it is very fast. – bolec_kolec Aug 25 '16 at 10:41
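
The random-access approach described in the comment above might look roughly like the following sketch. It is not the commenter's actual code: the class name `XmlTailAppender`, the `<lines>`/`<line>` element names, and the requirement that the file ends with exactly the closing root tag (no trailing whitespace) are all assumptions. The idea is to overwrite the closing tag with the new element and then write the closing tag again.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Arrays;

final class XmlTailAppender {

    // Hypothetical root element; the file is assumed to end with exactly this tag.
    static final String CLOSING_TAG = "</lines>";

    // Escape the characters that are not allowed in XML text content.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Inserts one <line> element just before the closing root tag.
    static void appendLine(Path xmlFile, String processedLine) throws IOException {
        byte[] closing = CLOSING_TAG.getBytes(StandardCharsets.UTF_8);
        byte[] element = ("<line>" + escapeXml(processedLine) + "</line>\n")
                .getBytes(StandardCharsets.UTF_8);
        try (RandomAccessFile file = new RandomAccessFile(xmlFile.toFile(), "rw")) {
            long insertAt = file.length() - closing.length;
            if (insertAt < 0) {
                throw new IOException("File is shorter than the closing tag");
            }
            // Verify the assumption before overwriting anything.
            byte[] tail = new byte[closing.length];
            file.seek(insertAt);
            file.readFully(tail);
            if (!Arrays.equals(tail, closing)) {
                throw new IOException("File does not end with " + CLOSING_TAG);
            }
            // Overwrite the old closing tag with the new element, then re-close the root.
            file.seek(insertAt);
            file.write(element);
            file.write(closing);
        }
    }
}
```

This sketch reopens the file on every call to keep the example short; inside a processing loop it would be cheaper to keep one `RandomAccessFile` open and write the closing tag only once at the end.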

If you are looking for an alternative to the available XML serialization libraries, take a look at Protocol Buffers from Google.

Tutorial

Git source

Gautam Jose

You should look at Kryo, one of the fastest serialization libraries.

Guy Sela