I want to create a Pig UDF that uses Apache Tika to process an image stored in HDFS.
Below is my code, but I'm getting a ClassNotFoundException:
public String exec(Tuple input) throws ExecException, IOException {
    try {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
    } catch (ExecException ex) {
        Logger.getLogger(Check.class.getName()).log(Level.SEVERE, null, ex);
    }
    String s = "";
    ByteArrayInputStream b = (ByteArrayInputStream) input.get(0);
    ContentHandler contenthandler = new BodyContentHandler();
    Metadata metadata = new Metadata();
    Parser parser = new AutoDetectParser();
    ParseContext parseCtx = new ParseContext();
    try {
        parser.parse(b, contenthandler, metadata, parseCtx);
    } catch (SAXException ex) {
        Logger.getLogger(Check.class.getName()).log(Level.SEVERE, null, ex);
    } catch (TikaException ex) {
        Logger.getLogger(Check.class.getName()).log(Level.SEVERE, null, ex);
    }
    System.out.println("Mime: " + metadata.get(Metadata.CONTENT_TYPE));
    return metadata.get(Metadata.CONTENT_TYPE);
}
Input: an image file stored in HDFS, in an unknown format.
Output: I need the type of the file. Instead, the code above fails with a java.lang.ClassNotFoundException for org.apache.tika.exception.TikaException.
Error
2014-11-21 12:00:56,417 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-11-21 12:00:56,483 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias f. Backend error : java.lang.ClassNotFoundException: org.apache.tika.exception.TikaException
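To narrow this down, here is a minimal sketch (not part of the UDF above) of how one could check whether the Tika class named in the error is visible to a given JVM. ClasspathCheck and isLoadable are hypothetical names introduced for illustration; only standard Java calls are used. Since the log reports a backend error, the check is only meaningful when run with the same classpath as the failing task, which is an assumption about the setup.

```java
// Hypothetical helper (not part of the original UDF): reports whether a class
// can be found by the current thread's class loader.
public class ClasspathCheck {

    /** Returns true if the named class can be located, false otherwise. */
    public static boolean isLoadable(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the loader that loaded this helper class.
            loader = ClasspathCheck.class.getClassLoader();
        }
        try {
            // initialize=false: locate the class without running its static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the error message in the log.
        String name = "org.apache.tika.exception.TikaException";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

If this reports false in the environment where the UDF runs, the Tika jars are probably not being shipped with the job. Pig's REGISTER statement is the usual way to add a UDF's dependency jars, though whether that is the cause here is an assumption.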
PigScript
a = load '/image.jpeg' as x;
b = group a all;
f = foreach b generate package.check(a);
If anyone knows the solution to the above problem, please guide me as soon as possible.