6

I have a huge file with a list of objects written by ObjectOutputStream, one after another.

for (Object obj : currentList){
    oos.writeUnshared(obj);
}

Now I want to read this file using ObjectInputStream. I need to read multiple files at the same time, so I can't load an entire file into memory. However, reading with ObjectInputStream causes an OutOfMemoryError (Java heap space). From what I have read, this happens because ObjectInputStream has a memory leak: it maintains references to the objects it has read even after returning them.

How can I ask ObjectInputStream not to maintain a reference to whatever it's reading?

Mathias G.
copperhead
  • I closed the stream while writing it. While reading it, I have around 100 files that I'm reading simultaneously and all those streams are open together. I only close them once the respective files have been read. However, the memory leak is definitely happening in ObjectInputStream since there should be only 100 objects stored in memory at a given time. – copperhead Feb 25 '14 at 08:30
  • Why don't you try reading the files in batches? E.g. for 100 files, you can make 4 batches of 25 files each: batch1 is read first, then batch2, and so on. – Gaurav Gupta Feb 25 '14 at 08:39

2 Answers

7

A possible solution is to call reset() on your ObjectOutputStream. From the Java documentation:

“Reset will disregard the state of any objects already written to the stream. The state is reset to be the same as a new ObjectOutputStream. The current point in the stream is marked as reset so the corresponding ObjectInputStream will be reset at the same point.”

So a reset on the ObjectOutputStream also resets the state of the ObjectInputStream that later reads the data, at the same point in the stream.

I assume you also control your ObjectOutputStreams?
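As a rough illustration of the writing side, here is a minimal sketch that calls reset() every so often while writing. The class name ResettingWriter, the method writeAll and the RESET_INTERVAL value are assumptions made for this example, not part of the original program. Resetting after every object would bound the reader's memory most tightly but, as the dedek comment on this answer reports, frequent resets make the file noticeably larger, so the interval is something to tune.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.Arrays;

class ResettingWriter {
  // Hypothetical interval: smaller values free reader memory sooner
  // but repeat class descriptors more often, which grows the file.
  static final int RESET_INTERVAL = 1000;

  // Writes every object unshared and inserts a reset marker periodically.
  // Closing the ObjectOutputStream also closes the given output stream.
  static void writeAll(Iterable<?> objects, OutputStream out) throws IOException {
    try (ObjectOutputStream oos =
             new ObjectOutputStream(new BufferedOutputStream(out))) {
      int count = 0;
      for (Object obj : objects) {
        oos.writeUnshared(obj);
        if (++count % RESET_INTERVAL == 0) {
          // Marks this point in the stream; the ObjectInputStream reading
          // the data later resets its own state when it reaches the marker.
          oos.reset();
        }
      }
    }
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    writeAll(Arrays.asList("a", "b", "c"), bos);
    System.out.println("bytes written: " + bos.size());
  }
}
```

The reading program needs no change for this: the reset markers are consumed by ObjectInputStream itself while it reads the next object.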

Mathias G.
  • Yes, I can control the ObjectOutputStreams, but I wrote the files beforehand using ObjectOutputStreams in a different program, and it's not a socket connection. So will doing a reset on the ObjectOutputStream still affect the ObjectInputStream while reading? – copperhead Feb 25 '14 at 08:32
  • @copperhead The object stream neither knows nor cares what it is writing to. I suggest you try it. – user207421 Feb 25 '14 at 08:39
  • Wow, it really works! I would never have expected that the change must be made on the output side. I also noticed that calling **reset() too frequently** significantly **increases the size** of the resulting file. – dedek May 21 '14 at 09:13
2

When you use writeUnshared on the writing side, you have already done half of the job. If you also use readUnshared instead of readObject on the input side, the ObjectInputStream will not maintain references to the objects it returns.

You can use the following program to verify the behavior:

package lib.io;

import java.awt.Button;
import java.io.*;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

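public class ObjectInputStreamReferences {
  public static void main(String[] args)
  throws IOException, ClassNotFoundException {
    final int numObjects=1000;
    Serializable s=new Button();
    // Serialize the same object numObjects times into an in-memory buffer;
    // writeUnshared writes it as a new object each time
    ByteArrayOutputStream os=new ByteArrayOutputStream();
    try( ObjectOutputStream oos=new ObjectOutputStream(os) ) {
      for(int i=0; i<numObjects; i++) oos.writeUnshared(s);
    }
    // The map keeps the WeakReference instances themselves reachable
    // so that they can be enqueued once their referent is collected
    final ConcurrentHashMap<WeakReference<?>, Object> map
                                                  =new ConcurrentHashMap<>();
    final ReferenceQueue<Object> q=new ReferenceQueue<>();
    // Background thread that reports every collected object
    new Thread(new Runnable() {
      public void run() {
        reportCollections(map, q);
      }
    }).start();
    try(ObjectInputStream ois=
        new ObjectInputStream(new ByteArrayInputStream(os.toByteArray()))) {
      for(int i=0; i<numObjects; i++) {
        Object o=ois.readUnshared();
        map.put(new WeakReference<>(o,q), "");
        // Drop our own strong reference; if the stream held another one,
        // the object could not be collected while the stream is open
        o=null;
        // Only a hint to the JVM; acceptable for a test, not for production
        System.gc();Thread.yield();
      }
    }
    // The reporting thread loops forever, so terminate the JVM explicitly
    System.exit(0);
  }

  static void reportCollections(
      ConcurrentHashMap<WeakReference<?>, Object> map, ReferenceQueue<?> q) {
    for(;;) try {
      // Blocks until the garbage collector enqueues a cleared reference
      Reference<?> removed = q.remove();
      System.out.println("one object collected");
      map.remove(removed);
    } catch(InterruptedException ex){}
  }
}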
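For the use case in the question, reading a large file object by object, the input side could look like the following minimal sketch. The class name UnsharedReader, the method readEach and the handler callback are assumptions made for this example. It relies on EOFException to detect the end of the stream, since the file contains no object count. As far as I can tell, readUnshared only affects the returned top-level object, so objects referenced from it may still be tracked by the stream until a reset.

```java
import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.util.function.Consumer;

class UnsharedReader {
  // Reads objects one at a time, passes each to the (hypothetical) handler,
  // and returns how many objects were read. An input without a valid
  // stream header makes the ObjectInputStream constructor throw.
  static int readEach(InputStream in, Consumer<Object> handler)
      throws IOException, ClassNotFoundException {
    int count = 0;
    try (ObjectInputStream ois =
             new ObjectInputStream(new BufferedInputStream(in))) {
      while (true) {
        Object obj;
        try {
          // Counterpart of writeUnshared on the writing side
          obj = ois.readUnshared();
        } catch (EOFException endOfStream) {
          break; // no more objects in the stream
        }
        handler.accept(obj);
        count++;
      }
    }
    return count;
  }
}
```

A caller would pass something like a FileInputStream for each file and process every object inside the handler without storing it, so that it becomes eligible for garbage collection afterwards.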
Holger
  • System.gc() does practically nothing, and Thread.yield() even less. – user207421 Feb 25 '14 at 08:55
  • @EJP thanks for the comment, but on my machine it makes the difference between getting one hundred reports of garbage collection or no report at all. I know that there is no guaranteed behavior, but at least with Oracle’s current JVM version 7 it works, and that’s what it is all about: a piece of code for testing, not for production. – Holger Feb 25 '14 at 08:57
  • Which of those two methods are you talking about? – user207421 Feb 25 '14 at 09:07
  • @EJP: I didn’t test them individually. I tested with that line and without. But I suppose that `Thread.yield()` is indeed obsolete on today’s systems, though it doesn’t hurt anyway. As said, I wouldn’t use either of them in production code. – Holger Feb 25 '14 at 09:17