
I am trying to read multiple lines at a time in my mapper. For that I started using the NLineInputFormat class. While using it, I am getting a GC overhead limit error. For reference, the error output is:

16/02/21 01:37:13 INFO mapreduce.Job:  map 0% reduce 0%
16/02/21 01:37:38 WARN mapred.LocalJobRunner: job_local726191039_0001
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1019)
at java.util.concurrent.ConcurrentHashMap.putAll(ConcurrentHashMap.java:1084)
at java.util.concurrent.ConcurrentHashMap.<init>(ConcurrentHashMap.java:852)
at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:713)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:442)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.<init>(LocalJobRunner.java:217)
at org.apache.hadoop.mapred.LocalJobRunner$Job.getMapTaskRunnables(LocalJobRunner.java:272)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:517)
16/02/21 01:37:39 INFO mapreduce.Job: Job job_local726191039_0001 failed with state FAILED due to: NA

For reference, please find the code snippet below.

public class JobLauncher {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "TestDemo");
        job.setJarByClass(JobLauncher.class);

        job.setMapperClass(CSVMapper.class);
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(NullWritable.class);

        conf.setInt(NLineInputFormat.LINES_PER_MAP, 3);
        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.addInputPath(job, new Path(args[0]));

        job.setNumReduceTasks(0);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

I just have a simple CSVMapper. Why am I getting this error? Please help me resolve it.

Thanks in advance.

Santhosh Tangudu

2 Answers


Why am I getting this error?

In general, the most likely explanations for an OOME are that you have run out of memory because

  • your code has a memory leak, or
  • you do not have enough memory for what you are trying to do / the way you are trying to do it.

(With this particular "flavour" of OOME, you haven't completely run out of memory. However, in all likelihood you are close to running out, and that has caused the GC CPU utilization to spike, exceeding the "GC overhead" threshold. This detail doesn't change the way you should try to solve your problem.)

In your case, it looks like the error is occurring while you are loading input from a file into a map (or collection of maps). The inference is therefore that you have told Hadoop to load more data than is going to fit in memory at one time.

Please help me resolve this error.

Solutions:

  • Reduce the input file size, e.g. by breaking your problem down into smaller problems.
  • Increase the memory size (specifically, the Java heap size) for the affected JVM(s); see the sketch after this list.
  • Change your application so that the job streams the data from the file (or from HDFS) itself ... rather than loading the CSV into a map.
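
For the second option, here is a minimal sketch of what raising the mapper heap might look like in your driver, assuming Hadoop 2.x property names (the values are purely illustrative, not recommendations; with the LocalJobRunner shown in your stack trace, the heap that actually matters is the client JVM's, which you would raise via HADOOP_CLIENT_OPTS or -Xmx on the launching JVM):

Configuration conf = new Configuration();
// Illustrative values only: give each map container 2 GB and its JVM roughly 1.6 GB of heap.
conf.set("mapreduce.map.memory.mb", "2048");
conf.set("mapreduce.map.java.opts", "-Xmx1638m");
// Set these on conf before Job.getInstance(conf, ...), which copies the configuration,
// or set them afterwards via job.getConfiguration().
Job job = Job.getInstance(conf, "TestDemo");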

If you need a more specific answer, you will need to provide more details.

Stephen C
  • Thanks @stephen. But I want to know why I am getting this error in this situation. I already provided the code snippet etc. Please let me know what details you want to know. – Santhosh Tangudu Feb 21 '16 at 18:49
  • 1) That has been explained; see above. I think either your input file is too big, your heap is too small, or you should not preload the entire CSV. 2) Tell me / us how big the input file is, and provide details so we can understand why you need to preload the entire CSV. – Stephen C Feb 21 '16 at 22:27
  • The input file size is 30MB. I increased the heap size to 4GB, but I am still getting this error. – Santhosh Tangudu Feb 22 '16 at 18:41
  • And the other details? – Stephen C Feb 22 '16 at 22:36
  • I am not preloading the file. This is a Hadoop program; I kept the file in HDFS. – Santhosh Tangudu Feb 23 '16 at 03:41
  • 1
    Erm ... the stacktrace makes it clear that *something* is loading *something* into a map while initializing a `JobConf` object, and that is when the OOME is occurring. I am inferring that it is the contents of your CSV file ... because you are using a CSVMapper. I am also inferring that *that* is the root cause, because your CSV file is really big. – Stephen C Feb 23 '16 at 07:54

Adding to Stephen C's answer, which lists possible solutions:

From the Oracle documentation:

Exception in thread thread_name: java.lang.OutOfMemoryError: GC Overhead limit exceeded

Cause: The detail message "GC overhead limit exceeded" indicates that the garbage collector is running all the time and the Java program is making very slow progress. After a garbage collection, if the Java process is spending more than approximately 98% of its time doing garbage collection and if it is recovering less than 2% of the heap and has been doing so for the last 5 (compile time constant) consecutive garbage collections, then a java.lang.OutOfMemoryError is thrown.

This exception is typically thrown because the amount of live data barely fits into the Java heap having little free space for new allocations.

Action: Increase the heap size. The java.lang.OutOfMemoryError exception for GC Overhead limit exceeded can be turned off with the command line flag -XX:-UseGCOverheadLimit.
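
If you do want to experiment with that flag in a MapReduce job, one way (a sketch, assuming Hadoop 2.x property names; note the flag only suppresses the overhead check and does nothing about the underlying memory pressure) is to pass it through the map task JVM options:

Configuration conf = new Configuration();
// -XX:-UseGCOverheadLimit disables the GC-overhead check; the -Xmx value here is illustrative.
conf.set("mapreduce.map.java.opts", "-Xmx2048m -XX:-UseGCOverheadLimit");
Job job = Job.getInstance(conf, "TestDemo");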

Have a look at this SE question for better handling of this error:

java.lang.OutOfMemoryError: GC overhead limit exceeded

Ravindra babu