
I am reading a very large file and extracting some small portions of text from each line. However, at the end of the operation I am left with very little memory to work with. It seems that the garbage collector fails to free the memory after the file has been read.

My question is: Is there any way to free this memory? Or is this a JVM bug?

I created an SSCCE to demonstrate this. It reads in a 1 mb file (2 mb in Java, since each char takes 16 bits) and extracts one character from each line (~4000 lines, so the result should be about 8 kb). At the end of the test, the full 2 mb is still in use!

The initial memory usage:

Allocated: 93847.55 kb
Free: 93357.23 kb

Immediately after reading in the file (before any manual garbage collection):

Allocated: 93847.55 kb
Free: 77613.45 kb (~16mb used)

This is to be expected since the program is using a lot of resources to read in the file.

However, when I then garbage collect, not all of the memory is freed:

Allocated: 93847.55 kb
Free: 91214.78 kb (~2 mb used! That's the entire file!)

I know that manually calling the garbage collector doesn't give you any guarantees (in some cases it is lazy). However, this was happening in my larger application, where the file eats up almost all available memory and the rest of the program then runs out of memory just when it needs it. This example confirms my suspicion that the excess data read from the file is not freed.

Here is the SSCCE to generate the test:

import java.io.*;
import java.util.*;

public class Test {
    public static void main(String[] args) throws Throwable {
        Runtime rt = Runtime.getRuntime();

        double alloc = rt.totalMemory()/1000.0;
        double free = rt.freeMemory()/1000.0;

        System.out.printf("Allocated: %.2f kb\nFree: %.2f kb\n\n",alloc,free);

        Scanner in = new Scanner(new File("my_file.txt"));
        ArrayList<String> al = new ArrayList<String>();

        while(in.hasNextLine()) {
            String s = in.nextLine();
            al.add(s.substring(0,1)); // extracts first 1 character
        }

        alloc = rt.totalMemory()/1000.0;
        free = rt.freeMemory()/1000.0;
        System.out.printf("Allocated: %.2f kb\nFree: %.2f kb\n\n",alloc,free);

        in.close();
        System.gc();

        alloc = rt.totalMemory()/1000.0;
        free = rt.freeMemory()/1000.0;
        System.out.printf("Allocated: %.2f kb\nFree: %.2f kb\n\n",alloc,free);
    }
}
tskuzzy
  • Unless you're doing something very unusual that nobody else is likely to be doing, "jvm bug" shouldn't be your first assumption. – Paul Tomblin Jun 08 '12 at 15:34
  • How do you expect System.gc() to free up all the memory? You're still using the strings in al, so they can't be freed. – Paul Tomblin Jun 08 '12 at 15:36
  • @PaulTomblin: I've been researching this problem for a while now but nothing came up. And I don't see any good reason for why this should be happening. – tskuzzy Jun 08 '12 at 15:36
  • @PaulTomblin: Yes, I am using the Strings in `al`, however they should only take 2kb of memory (since I'm only storing small substrings of each line) whereas it's taking a whole 2 mb (the size of the entire file). – tskuzzy Jun 08 '12 at 15:37
  • @dystroy's answer hits it on the head. substring isn't making a totally new string, it's holding a reference to the original string. – Paul Tomblin Jun 08 '12 at 15:40
  • `substring` does keep the original character data. – Has QUIT--Anony-Mousse Jun 08 '12 at 15:40

3 Answers


When you take a substring, the result keeps a reference to the char array of the original string (this optimization makes handling many substrings of one string very fast). So, as long as you keep your substrings in the al list, you're keeping your whole file in memory. To avoid this, create a new String using the constructor that takes a String as its argument.

So basically I'd suggest you do

    while(in.hasNextLine()) {
        String s = in.nextLine();
        al.add(new String(s.substring(0,1))); // extracts first 1 character
    }
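
For reference, a self-contained version of this loop might look like the sketch below. FirstChars and firstChars are hypothetical names, and skipping empty lines is an assumption added here (substring(0,1) throws on an empty string). A BufferedReader is used in place of Scanner only so that the example can run on an in-memory string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class FirstChars {
    // Hypothetical helper: collects the first character of every non-empty line.
    public static List<String> firstChars(Reader source) throws IOException {
        List<String> result = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // assumption: empty lines are skipped rather than failing
            }
            // new String(...) forces a copy on old JVMs where substring shares
            // the parent's char[]; on newer JVMs it is redundant but harmless.
            result.add(new String(line.substring(0, 1)));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        List<String> chars = firstChars(new StringReader("alpha\n\nbeta\ngamma"));
        System.out.println(chars); // prints [a, b, g]
    }
}
```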

The source code of the String(String) constructor explicitly states that its purpose is to trim "the baggage":

    public String(String original) {
        int size = original.count;
        char[] originalValue = original.value;
        char[] v;
        if (originalValue.length > size) {
            // The array representing the String is bigger than the new
            // String itself.  Perhaps this constructor is being called
            // in order to trim the baggage, so make a copy of the array.
            int off = original.offset;
            v = Arrays.copyOfRange(originalValue, off, off+size);
        } else {
            // The array representing the String is the same
            // size as the String, so no point in making a copy.
            v = originalValue;
        }
        this.offset = 0;
        this.count = size;
        this.value = v;
    }

Update: this problem is gone as of OpenJDK 7, Update 6. People on that version or a more recent one don't have the problem.
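
To picture what happens on the older JVMs, here is a toy model of the sharing. It is not the JDK's String class: SharedSlice and its methods are hypothetical names, and only the three fields mirror the value, offset and count fields visible in the constructor source above.

```java
import java.util.Arrays;

public class SharedSlice {
    private final char[] value; // backing array, possibly shared with a parent
    private final int offset;   // index of the first visible char in value
    private final int count;    // number of visible chars

    public SharedSlice(String text) {
        this(text.toCharArray(), 0, text.length());
    }

    private SharedSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // O(1): reuses the parent's array, so the whole array stays reachable.
    public SharedSlice substring(int begin, int end) {
        return new SharedSlice(value, offset + begin, end - begin);
    }

    // Copies only the visible chars, which is what trimming "the baggage" means.
    public SharedSlice trimmed() {
        return new SharedSlice(Arrays.copyOfRange(value, offset, offset + count), 0, count);
    }

    // Length of the array this slice keeps alive.
    public int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedSlice line = new SharedSlice("a fairly long line of text");
        SharedSlice first = line.substring(0, 1);
        System.out.println(first + " retains " + first.retainedChars() + " chars");
        System.out.println(first.trimmed() + " retains " + first.trimmed().retainedChars() + " chars");
    }
}
```

In this model, a one-character slice keeps the full line's array alive until trimmed() is called, which is the effect seen in the question.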

Denys Séguret
  • Hm... Interesting. That's a strange optimization that substring does. But it explains what's going on. Also there appears to be a bug report about it: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4513622 – tskuzzy Jun 08 '12 at 15:52
  • As I recall, this was present in the first versions of Java (1.0.2) and at the time it was seen as a smart optimization. The problem is that it makes garbage collection more complex. – Denys Séguret Jun 08 '12 at 15:55
  • I can see the reasoning behind it since it reduces substring to an `O(1)` operation. But this almost seems like a memory leak to me. – tskuzzy Jun 08 '12 at 15:56
  • @assylias look at the source code of the String(String) constructor: you'll see it was created especially for this. – Denys Séguret Jun 08 '12 at 16:01

Make sure to not keep references you don't need any more.

You still have references to al and in.

Try adding al = null; in = null; before calling the garbage collector.
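
A rough way to watch the effect of dropping a reference is sketched below. ReleaseReferences and usedKb are hypothetical names, the list contents are made up for illustration, and the numbers printed will vary between runs and JVMs because System.gc() is only a request.

```java
import java.util.ArrayList;
import java.util.List;

public class ReleaseReferences {
    // Heap currently in use, in kb (1 kb = 1000 bytes, matching the question's output).
    public static double usedKb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / 1000.0;
    }

    public static void main(String[] args) {
        List<String> al = new ArrayList<String>();
        for (int i = 0; i < 100000; i++) {
            al.add("line " + i); // illustrative data only
        }
        System.out.printf("With list reachable: %.2f kb%n", usedKb());

        al = null;   // drop the last strong reference to the list
        System.gc(); // a request; the JVM may or may not collect here
        System.out.printf("After al = null and gc: %.2f kb%n", usedKb());
    }
}
```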

Also, you need to realize how substring is implemented. substring keeps the original string's data, and just uses a different offset and length into the same char[] array.

al.add(new String(s.substring(0,1)));

Not sure if there is a more elegant way of copying a substring. Maybe s.getChars() is more useful for you, too.
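
Two other ways of getting an independent copy are sketched here, assuming you only need a single character or a small range. CopyFirstChar, viaCharAt and viaGetChars are hypothetical names; String.valueOf(char), String.getChars and the String(char[]) constructor are standard library calls.

```java
public class CopyFirstChar {
    // Builds a one-character String from the char itself, with no shared array.
    public static String viaCharAt(String s) {
        return String.valueOf(s.charAt(0));
    }

    // Copies the range [begin, end) into a fresh char[] and wraps it in a new String.
    public static String viaGetChars(String s, int begin, int end) {
        char[] buf = new char[end - begin];
        s.getChars(begin, end, buf, 0);
        return new String(buf);
    }

    public static void main(String[] args) {
        String line = "hello world";
        System.out.println(viaCharAt(line));         // prints h
        System.out.println(viaGetChars(line, 6, 11)); // prints world
    }
}
```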

As of Java 8, substring copies the characters. You can verify for yourself that the constructor calls Arrays.copyOfRange.

Has QUIT--Anony-Mousse

System.gc() does not guarantee that the JVM will garbage collect; it is only advice to the JVM that it may try to. As there is a lot of memory already available, the JVM may ignore the advice and keep running until it feels the need to collect.

Read more in the documentation: http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#gc()

Another question that talks about it is "When does System.gc() do anything".

sangupta