1

I am omitting the input, output, mapper, and reducer classes; only my main function is shown below. I am using Hadoop 1.0.4 to run this code. It works fine until I try to compress the output from the reducer. The code and the compilation error follow:

public static void main(String[] args) throws Exception
{
    Configuration conf = new Configuration();

    conf.set("xmlinput.start", "<page>");
    conf.set("xmlinput.end", "</page>");
    Job job = new Job(conf);  //configure the job, submit it, control its execution, and query the state
    job.setJarByClass(XmlParser11.class); //set jar by finding where the class came from
    job.setOutputKeyClass(Text.class); //Set the key class for the job output data
    job.setOutputValueClass(Text.class);

    //job.setCompressMapOutput(true);
    //job.setMapOutputCompressorClass(GzipCodec.class);

    //job.setCompressOutput(job, true);
    //job.setClass("mapred.output.compression.codec", GzipCodec.class,CompressionCodec.class);
    job.setMapperClass(XmlParser11.Map.class);
    job.setReducerClass(XmlParser11.Reduce.class);

    job.setInputFormatClass(XmlInputFormat1.class);  //Set the InputFormat for the job
    job.setOutputFormatClass(TextOutputFormat.class); //Set the OutputFormat for the job
    FileOutputFormat.setCompressOutput(job,true);
    FileOutputFormat.setOutputCompressorClass(job,GzipCodec.class);
    FileInputFormat.addInputPath(job, new Path(args[0])); //the job for which the input path should be modified
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.waitForCompletion(true);       
}

[ravisg@topsail-sn ~]$ javac -classpath /var/hadoop/hadoop-core-1.0.4.jar -d stopWords/ XmlParser11.java
 XmlParser11.java:306: error: cannot find symbol
        FileOutputFormat.setOutputCompressorClass(job,GzipCodec.class);
                                                      ^
 symbol:   class GzipCodec
 location: class XmlParser11

Can you tell me how to compress the output from my reducer, or point out what I am doing incorrectly? I tried the different styles of compression suggested on Stack Overflow, but I always get a similar error.

Sergey Brunov
user2623946
2 Answers

1

Sorry, I just had to use the fully qualified class name

FileOutputFormat.setOutputCompressorClass(job, org.apache.hadoop.io.compress.GzipCodec.class);

instead of

FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
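
An alternative to spelling out the package at the call site is to import the class, since the `cannot find symbol` error suggests that `GzipCodec` was simply not imported in `XmlParser11.java`. Below is a minimal sketch, assuming the new `org.apache.hadoop.mapreduce` API used in the question and the Hadoop core jar on the compile classpath. The names `OutputCompression` and `enableGzipOutput` are hypothetical and only serve the illustration:

```java
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical helper, not part of the original XmlParser11 code.
public final class OutputCompression {
    private OutputCompression() {}

    // Turns on gzip compression for the job's final (reducer) output.
    public static void enableGzipOutput(Job job) {
        FileOutputFormat.setCompressOutput(job, true);
        // The short name resolves because of the import above.
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
    }
}
```

With the import in place, the original short form `GzipCodec.class` should compile unchanged.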
Sergey Brunov
user2623946
0

You need to add hadoop-common*jar from your Hadoop distribution to your classpath when compiling the code. The jar in question contains the GzipCodec class.

ybodnar
  • I am already adding hadoop-core-1.0.4.jar to the classpath. Can I add hadoop-common*jar as a second classpath entry while compiling the code? – user2623946 Oct 07 '13 at 04:02
  • You need to list the jars in one classpath, separated by : on Linux or by ; if you're compiling on Windows. Here's a Q&A on that: http://stackoverflow.com/questions/219585/setting-multiple-jars-in-java-classpath – ybodnar Oct 07 '13 at 13:29
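
To illustrate the separator rule from the last comment, here is a small sketch that builds a `-classpath` value with `File.pathSeparator`, which is `:` on Linux and `;` on Windows. The class name `ClasspathJoin` and the second jar path are hypothetical placeholders, not paths from the question:

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that joins classpath entries with the platform separator.
final class ClasspathJoin {
    private ClasspathJoin() {}

    // Joins entries with ':' on Linux/macOS or ';' on Windows.
    static String joinClasspath(List<String> entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList(
                "/var/hadoop/hadoop-core-1.0.4.jar",
                "/path/to/another.jar"); // placeholder for any additional jar
        // Prints a javac command line modeled on the one in the question.
        System.out.println("javac -classpath " + joinClasspath(jars)
                + " -d stopWords/ XmlParser11.java");
    }
}
```

The printed command has the same shape as the one in the question, with every jar listed in a single `-classpath` argument.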