-2

I have a couple of questions regarding a problem I am having. I read some other questions and discovered that the problem is that I am generating hundreds of thousands of small Maps (Map<mutableObject, String>).

  1. What I did not get from those questions, though, is why this happens. I would like to understand what is happening behind the scenes, so if anyone has a pointer it would be greatly appreciated.

  2. The second question is about what would be a good alternative to HashMap. Each Map is different from the others, but most of the elements are repeated many times. I am generating permutations of the Strings and storing them in maps. Does anyone have a good solution and, if possible, a pointer to help me understand why I should do it that way?

Thanks in advance

Altober

OK, here is some code for those who requested it. I am reading an XML file; I tried to reduce the code to the minimum so you can get an idea of what is happening.

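String s = null;            // sense attribute; assumed to be set in code omitted here
String wd = null;
SomeObject word = null;     // current token; assumed to be created in code omitted here
Map<SomeObject, String> senses = null;
List<Sentence> sentences = new ArrayList<Sentence>();
Sentence sentence = null;
String line = null;
while (in.hasNextLine()) {
    line = in.nextLine().trim();
    if (line.startsWith("<s>")) {
        // start of a sentence: new Sentence and a new map for its senses
        sentence = new Sentence();
        senses = new HashMap<SomeObject, String>();
    } else if (line.endsWith("</f>")) {
        // token line: extract the text between '>' and the closing tag
        int begin = line.indexOf('>');
        wd = line.substring(begin + 1, line.length() - 5);
        word.word = wd;
        if (s != null)
            senses.put(word, s);
    } else if (line.endsWith("</s>")) {
        // end of a sentence: the map stays reachable through the Sentence
        sentence.setSenses(senses);
        sentences.add(sentence);
    }
}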

SOLVED (here the solution)

After using a memory analyser I figured out that the problem was not the maps but an enormous number of strings that I was generating in the loop (each token had multiple attributes). Since the file was the result of string permutations, there were a lot of repeated strings. I just called .intern() on every string, as suggested in another post, and then it worked smoothly.
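For readers who want to see the effect, here is a minimal sketch of what interning does, assuming tokens are cut out of each line with substring as in the loop above. The class name InternDemo and the helpers canonical and distinctInstances are made up for illustration; only String.intern() itself comes from the fix described here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class InternDemo {
    // Returns the pooled instance for this string's content (hypothetical helper).
    static String canonical(String s) {
        return s == null ? null : s.intern();
    }

    // Counts String objects by identity (==), not by equals().
    static int distinctInstances(List<String> strings) {
        Set<String> seen =
            Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        seen.addAll(strings);
        return seen.size();
    }

    public static void main(String[] args) {
        String line = "<f>bank</f>";
        List<String> raw = new ArrayList<String>();
        List<String> interned = new ArrayList<String>();
        for (int i = 0; i < 1000; i++) {
            // substring allocates a new String object on every iteration
            String token = line.substring(3, line.length() - 4);
            raw.add(token);
            interned.add(canonical(token)); // all equal tokens share one object
        }
        System.out.println("raw instances:      " + distinctInstances(raw));
        System.out.println("interned instances: " + distinctInstances(interned));
    }
}
```

The raw list holds one String object per iteration, while the interned list keeps pointing at the same pooled object, which is why heavily repeated tokens stop filling the heap.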

Altober
  • so your code isn't creating those maps ? try to examine data that it holds and try to map it to the logical block of code – jmj Dec 17 '13 at 17:38
  • sorry I cannot follow. My code creates the maps but after hundreds of thousands of iterations it gives the GC error. – Altober Dec 17 '13 at 17:39
  • post the related code, looks like your code holds the references of Map where it should forget about it – jmj Dec 17 '13 at 17:41
  • What are "small" Maps? Why do you need them? Where do you store all those Maps? – Ingo Dec 17 '13 at 17:43
  • GC holds onto anything that is "reachable". If those Maps are clogging storage it's because you somehow have references to them. This is what's mistakenly called a "storage leak" in Java. – Hot Licks Dec 17 '13 at 17:45
  • @Ingo - Obviously he stores them in GC heap. – Hot Licks Dec 17 '13 at 17:45
  • I'm probably not the only person who would want to see a code sample to better see what you are talking about and to confirm that there are no other obvious problems with it. – user1445967 Dec 17 '13 at 17:46
  • @HotLicks good catch - the question should be where he stores the references to those HashMaps. Another possibility is that he misuses HashMaps when he needs Tuples, or something like that. – Ingo Dec 17 '13 at 17:47
  • @Ingo - If he's using HashMaps for tuples he's got a lot of tuples. – Hot Licks Dec 17 '13 at 17:57
  • So how long is your XML file? – Hot Licks Dec 17 '13 at 17:59
  • @HotLicks 32MB, no problem there. – Altober Dec 17 '13 at 18:00
  • That code makes no sense. You throw away the hash map with every line you read. And, besides that, it won't compile. – Hot Licks Dec 17 '13 at 18:01
  • @Altober note that your code does not make sense (sic!), since `senses` ought to get garbage collected after each iteration. So I think you're withholding the part where you somehow keep a reference to those HashMaps. – Ingo Dec 17 '13 at 18:02
  • Yeah, show us something vaguely resembling the real code. – Hot Licks Dec 17 '13 at 18:05
  • I added this. At some point I collect the reference to the Map. See the end. Sorry but the code is huge I tried to make it minimal. – Altober Dec 17 '13 at 18:06
  • @Altober We acknowledge your effort to make a small example, but it is actually too small, as it doesn't reproduce the behaviour. – Ingo Dec 17 '13 at 18:07
  • Ok, I think now it should be OK, do not worry about the syntax. Take it as pseudo code. – Altober Dec 17 '13 at 18:11
  • It still doesn't make sense. For example, "if(s != null)" s will always be null because you never set it to anything else. – bcorso Dec 17 '13 at 18:32
  • If you are having trouble reducing the code you might be better off telling us in words what you are trying to achieve. – bcorso Dec 17 '13 at 18:35
  • if you are reading an xml file, why are you not using an xml parser like dom4j or something else.... the code you have there will have a lot of problems.... – Javier Neyra Dec 17 '13 at 18:48
  • Also, we can't answer your second question, because it is totally unclear why you store the permutations in HashMaps (What is the key, btw? Or are the permutations the key? What is the value then?) and how you later use those maps. – Ingo Dec 17 '13 at 18:49
  • @bcorso just assume that from time to time s is not null. The code parses a sentence, gets some attributes and generates an object of type Sentence where it stores them. The map is stored in the sentence when the sentence is over. – Altober Dec 17 '13 at 22:22
  • @Javier Neyra because the file format is flawed, so I had to do it myself. Let's assume it is a "semi-XML" :-) – Altober Dec 17 '13 at 22:27

2 Answers

1

I am generating hundred of thousands of small Maps (Map<mutableObject, String>).

  1. I did not get in the questions though, why does this happen? I would like to understand what is happening behind the scenes. So If anyone has a pointer here it would be greatly appreciated.

It's not really happening behind the scenes. It's right there in your code:

while(in.hasNextLine()) {
...
    if (line.startsWith("<s>")) {
        sentence = new Sentence();
        senses = new HashMap<SomeObject, String>(); // <-- generating a new HashMap
...
}

Also, to be precise, it's generating one for every occurrence of <s>, not necessarily hundreds of thousands.

bcorso
  • But since "senses" is not stored anywhere, the Map is immediately GCed. A lot of wasted motion, but the heap will not fill up. – Hot Licks Dec 17 '13 at 20:16
  • Thanks a lot for the answer. The map (at least that is what I intended) is stored in the sentence object once the sentence is finished. I said hundreds of thousands because there are a bit less than 1 million sentences and I generate one map per sentence. Regarding the first part of your answer, I store the map in a sentence object (as shown in the code), isn't that correct? – Altober Dec 17 '13 at 22:26
  • So a hash map is stored, at max, once per every other line. And it's stored in a "Sentence", which is of unknown size. So if you have a 32MB file and sentences are 32 chars long on average, you'll have at max 500,000 HashMaps and 500,000 "Sentences" -- a million objects, except that each HashMap is a minimum of 4 objects, so closer to 2.5 million objects. Each object is a minimum of 32 bytes in size. So we're up to about 80 million bytes, maybe twice that, but your max heap size, unless you muck with it, should be at least 500 million bytes. – Hot Licks Dec 18 '13 at 03:51
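
The back-of-the-envelope estimate in the last comment can be reproduced with a few lines of Java. This is only a sketch of that arithmetic: it assumes one byte per character and a decimal 32 MB (32,000,000 bytes), and the figures of 4 objects per HashMap and 32 bytes per object are the comment's rough numbers, not measurements. The class and method names are made up for illustration.

```java
public class HeapEstimate {
    // Rough heap estimate following the comment's reasoning (all inputs are assumptions).
    static long estimateBytes(long fileBytes, long avgLineChars,
                              long objectsPerMap, long bytesPerObject) {
        long lines = fileBytes / avgLineChars; // assumes one byte per character
        long maps = lines / 2;                 // at most one map per every other line
        long sentences = maps;                 // one Sentence per map
        long objects = maps * objectsPerMap + sentences;
        return objects * bytesPerObject;
    }

    public static void main(String[] args) {
        // 32 MB file, 32-char lines, 4 objects per HashMap, 32 bytes per object
        System.out.println(estimateBytes(32_000_000L, 32, 4, 32) + " bytes");
    }
}
```

With those inputs the result is 80,000,000 bytes, matching the "about 80 million bytes" in the comment; the strings held by the maps are not counted, which is where the real problem turned out to be.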
0

How many permutations are you generating? Millions? Billions?

If there are, say, 30,000 possible permutations, your problem is something else. However, once the number gets big enough, it would be impossible for a 'normal' desktop computer to store all of the results, even if you are doing this efficiently.

If you are doing this as part of a school assignment, often one of the points of the assignment is to get you to realize this :)

Google 'combinatorial explosion' for more - this is from the wikipedia page:

http://en.wikipedia.org/wiki/Combinatorial_explosion

For example, 100! = 9.33262154 × 10^157, a number so large that it cannot be displayed on most calculators, and vastly larger than the estimated number of fundamental particles in the Universe.
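
If you want to check that figure yourself, a small sketch with java.math.BigInteger is enough. The class name FactorialDemo and the factorial helper are just illustrative names.

```java
import java.math.BigInteger;

public class FactorialDemo {
    // Computes n! exactly using arbitrary-precision integers.
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        String digits = factorial(100).toString();
        // 158 digits means the value is on the order of 10^157
        System.out.println(digits.length() + " digits, starting with "
                + digits.substring(0, 9));
    }
}
```

The number of permutations of n items grows like n!, so even modest inputs can go far beyond anything that fits in memory.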

If you think you are -almost- there in terms of successfully running your program, you can try changing JVM options and increasing the available heap size. However it is often the case that when trying to solve problems of this sort, you don't need twice as much memory.... rather you need something like 100-10,000,000 times as much memory.

Are you able to run your program successfully with a smaller version of your problem? If so, you should study at what input size your program starts to fail.

user1445967
  • thanks for the answer but the permutations are not a problem. I am reading a file of 32MB only. – Altober Dec 17 '13 at 17:58