I have a very large HashMap<String, List<String>>, and I want to serialize it using a BufferedOutputStream because I think it will be more efficient than a regular OutputStream.

But how do I divide the HashMap into chunks the size of the buffer? Should I just iterate through the HashMap?

Karol Dowbecki
S.D
  • 1
    Where are you serializing the object into: SSD, RAM, S3? You might not get any performance benefit with `BufferedOutputStream` if it's RAM. – Karol Dowbecki Nov 18 '19 at 14:21
  • A file on an HDD. It's a file that may be read multiple times. Does that mean I should approach it differently? – S.D Nov 18 '19 at 14:22
  • 3
    You don't need to divide the HashMap into chunks. Just wrap your buffered stream in an ObjectOutputStream, and use writeObject() to write your HashMap. The BufferedOutputStream will, all by itself, buffer the multiple writes that writeObject() might make and write to the underlying stream when its internal buffer is full. – JB Nizet Nov 18 '19 at 14:24
  • Thanks, @JBNizet . Do you mean something like this? FileOutputStream fout = new FileOutputStream(outputfile);BufferedOutputStream bout = ObjectOutputStream(new BufferedOutputStream(fout));? – S.D Nov 18 '19 at 14:33
  • That wouldn't compile. An ObjectOutputStream is not a BufferedOutputStream. You first wrap the OutputStream in a BufferedOutputStream, and then wrap the BufferedOutputStream in an ObjectOutputStream. – JB Nizet Nov 18 '19 at 14:34

1 Answer

If you plan to write to a local file, you need to chain FileOutputStream, BufferedOutputStream and ObjectOutputStream. With the setup below, BufferedOutputStream should minimize direct writes to the file system by using its default buffer of 8192 bytes.

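import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

Map<String, List<String>> data = new HashMap<>();
data.put("myKey", List.of("A", "B", "C"));

// Chain: file <- buffer <- object serialization
File outFile = new File("out.bin");
try (FileOutputStream fos = new FileOutputStream(outFile);
     BufferedOutputStream bos = new BufferedOutputStream(fos);
     ObjectOutputStream oos = new ObjectOutputStream(bos)) {
    oos.writeObject(data);
    oos.flush(); // closing the stream also flushes; the explicit call is harmless
}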

Unless the output file is too big, there is no need for further chunking.
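If writing the whole map in a single writeObject() call does become a problem, one possible reading of "chunking" is to write the map one entry at a time and let the buffered stream batch the small writes. The following is only a minimal sketch under that assumption, not a definitive implementation: the class name EntryStreamSketch, the file name entries.bin and the record layout (entry count, then key, value count and values for each entry) are made up for illustration, and it uses DataOutputStream instead of Java serialization.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; the name and the record layout are assumptions.
class EntryStreamSketch {

    // Writes the map one entry at a time; BufferedOutputStream batches the small writes.
    static void write(Map<String, List<String>> data, File file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeInt(data.size());
            for (Map.Entry<String, List<String>> entry : data.entrySet()) {
                // writeUTF is limited to strings whose encoded form fits in 65535 bytes.
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue().size());
                for (String value : entry.getValue()) {
                    out.writeUTF(value);
                }
            }
        }
    }

    // Reads the entries back in the same order they were written.
    static Map<String, List<String>> read(File file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            int entryCount = in.readInt();
            Map<String, List<String>> data = new HashMap<>();
            for (int i = 0; i < entryCount; i++) {
                String key = in.readUTF();
                int valueCount = in.readInt();
                List<String> values = new ArrayList<>(valueCount);
                for (int j = 0; j < valueCount; j++) {
                    values.add(in.readUTF());
                }
                data.put(key, values);
            }
            return data;
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, List<String>> data = new HashMap<>();
        data.put("myKey", List.of("A", "B", "C"));
        File file = new File("entries.bin");
        write(data, file);
        System.out.println(read(file).equals(data));
    }
}
```

Because each entry is written independently, the writer never needs more than one entry's worth of extra state, and no class metadata ends up in the file.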

Karol Dowbecki
  • Thanks. May I ask what is more efficient if I don't want to store the information about the class and just the String and List so that I can quickly read through the file and find exactly the key? – S.D Nov 18 '19 at 15:26
  • 1
    Instead of using Java serialization, which is very verbose, use a format such as Protobuf or Thrift, or write your own. Quickly finding an entry by key in a file is generally not possible unless you have ordered the entries somehow; for example, read up on how a database builds an index for a column. – Karol Dowbecki Nov 18 '19 at 16:06
  • My idea was to have a hashmap where the String is the key and the Integer is the number of bytes/characters from the beginning of the file until the beginning of the values of that key (basically, the position in the file). Do you think that could work? – S.D Nov 18 '19 at 16:21
  • @KarolDowbecki Could you please show a chunking example in your answer for the case where the output file is too large? Thanks – ronan Dec 29 '20 at 19:33
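
The offset-index idea from the comments above (a map from each key to the byte position where its values begin) can be sketched roughly as follows. This is an illustration under stated assumptions, not the answerer's method: the class name OffsetIndexSketch, the file name out.idx.bin and the length-prefixed UTF-8 record layout are all hypothetical, and the index is kept in memory only, so a real program would also have to persist it somewhere.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; the name and the record layout are assumptions.
class OffsetIndexSketch {

    // Writes a length-prefixed UTF-8 string and returns the number of bytes written.
    static long writeString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
        return 4L + bytes.length;
    }

    // Writes every entry and returns key -> byte offset of that key's value list.
    static Map<String, Long> write(Map<String, List<String>> data, File file) throws IOException {
        Map<String, Long> index = new HashMap<>();
        long offset = 0;
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            for (Map.Entry<String, List<String>> entry : data.entrySet()) {
                offset += writeString(out, entry.getKey());
                index.put(entry.getKey(), offset); // the values of this key start here
                out.writeInt(entry.getValue().size());
                offset += 4;
                for (String value : entry.getValue()) {
                    offset += writeString(out, value);
                }
            }
        }
        return index;
    }

    // Seeks straight to a stored offset and reads one value list.
    static List<String> lookup(File file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);
            int count = raf.readInt();
            List<String> values = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                byte[] bytes = new byte[raf.readInt()];
                raf.readFully(bytes);
                values.add(new String(bytes, StandardCharsets.UTF_8));
            }
            return values;
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, List<String>> data = new HashMap<>();
        data.put("myKey", List.of("A", "B", "C"));
        data.put("other", List.of("x"));
        File file = new File("out.idx.bin");
        Map<String, Long> index = write(data, file);
        System.out.println(lookup(file, index.get("myKey")));
    }
}
```

The offsets are tracked as long values by hand so the sketch is not tied to the int counter inside DataOutputStream, and a lookup only reads the bytes of the one list it needs rather than the whole file.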