
I want to create a UDF in Pig that uses Tika to process an image stored in HDFS.

Below is my code, but I'm getting a ClassNotFoundException.

    public String exec(Tuple input) throws ExecException, IOException {
        try {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
        } catch (ExecException ex) {
            Logger.getLogger(Check.class.getName()).log(Level.SEVERE, null, ex);
        }

        ByteArrayInputStream b = (ByteArrayInputStream) input.get(0);
        ContentHandler contenthandler = new BodyContentHandler();
        Metadata metadata = new Metadata();
        Parser parser = new AutoDetectParser();
        ParseContext parseCtx = new ParseContext();
        try {
            parser.parse(b, contenthandler, metadata, parseCtx);
        } catch (SAXException ex) {
            Logger.getLogger(Check.class.getName()).log(Level.SEVERE, null, ex);
        } catch (TikaException ex) {
            Logger.getLogger(Check.class.getName()).log(Level.SEVERE, null, ex);
        }

        System.out.println("Mime: " + metadata.get(Metadata.CONTENT_TYPE));
        return metadata.get(Metadata.CONTENT_TYPE);
    }

The input is an image file stored in HDFS in an unknown format.

The output I need is the type of the file, but I am getting a TikaException and a Java ClassNotFoundException for the above code.
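One thing worth checking in the code above, separately from the class-loading error: Pig delivers a `bytearray` field as an `org.apache.pig.data.DataByteArray` rather than as a stream, so the cast to `ByteArrayInputStream` is likely to fail at runtime. A minimal sketch of the wrapping step follows, assuming the raw bytes are already available (in a UDF they would come from `DataByteArray.get()`). To stay runnable without extra jars, it uses the JDK's `URLConnection.guessContentTypeFromStream` as a stand-in for Tika's `AutoDetectParser`; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;

// Hypothetical sketch: wrap raw bytes in a stream instead of casting to one.
final class MimeSniffSketch {

    // Returns a MIME type guessed from the leading bytes, or null if unknown.
    // Stand-in for Tika: the JDK sniffer knows far fewer formats.
    static String detectType(byte[] data) {
        if (data == null || data.length == 0) {
            return null;
        }
        // ByteArrayInputStream supports mark/reset, which the JDK sniffer needs.
        try (InputStream in = new ByteArrayInputStream(data)) {
            return URLConnection.guessContentTypeFromStream(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The 8-byte PNG signature is enough for the sniffer to recognize the format.
        byte[] pngHeader = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        System.out.println("Mime: " + detectType(pngHeader));
    }
}
```

In the real UDF the same stream could be handed to `parser.parse(...)` in place of the cast value.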

Error

2014-11-21 12:00:56,417 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-11-21 12:00:56,483 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias f. Backend error : java.lang.ClassNotFoundException: org.apache.tika.exception.TikaException

Pig script

a = load '/image.jpeg' as x;
b = group a all;
f = foreach b generate package.check(a);

If anyone knows the solution to the above problem, please guide me.

Mallieswari
  • Please specify the input value you pass and the output you receive, with the full stack trace. – Dinesh Kumar P Nov 20 '14 at 11:04
  • I have edited the question; please see above. – Mallieswari Nov 20 '14 at 11:20
  • What about the stack trace? – Dinesh Kumar P Nov 21 '14 at 05:49
  • I have added the error message for my source code. Please help me solve this issue. – Mallieswari Nov 21 '14 at 06:38
  • Please try to start with a completely trivial UDF and see if that works. Afterwards build gradually towards your actual UDF to figure out where the problem is encountered. -- For people who found this post when looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution) here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – Dennis Jaheruddin Dec 28 '15 at 14:45

1 Answer

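The ClassNotFoundException for org.apache.tika.exception.TikaException means the Tika classes are not on the classpath of the backend job. Please check that the Apache Tika jar is registered in your Pig script, so that it is available while the script executes.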

For example:

REGISTER '/home/user/pig/udfrepository/projectUDF.jar';
REGISTER '/home/user/thirdpartyjars/xyz.jar';
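To confirm whether a given classpath actually contains the Tika classes, a small check along these lines may help. This is only a sketch: `ClasspathCheck` and `isClassAvailable` are hypothetical names, not part of Pig or Tika, and running it locally tells you about the local JVM's classpath only, not about the Hadoop task JVMs where the backend error was raised.

```java
// Hypothetical helper: reports whether a class can be located on the classpath.
final class ClasspathCheck {

    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the error message from the question.
        String name = args.length > 0 ? args[0] : "org.apache.tika.exception.TikaException";
        System.out.println(name + " available: " + isClassAvailable(name));
    }
}
```

Running it with the same jars on the classpath as the UDF should report the class as available; if it does not, the jar passed to REGISTER is probably missing the Tika classes or one of their dependencies.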
Manjunath Ballur
Chetan Naik