
I am experimenting with Pig UDFs. I was able to get a simple UDF like Upper Case working, so I tried to write my own. I want to process each line of an input file that contains 3 integers. If the 3 integers satisfy the criteria for the sides of a right-angled triangle, the hypotenuse is returned; otherwise null is returned.

But I am getting the following error: ERROR 1066: Unable to open iterator for alias B

Here is the Pig script:

-- rat.pig - A Pig script to test right angle triangle
REGISTER /Users/admin/Programming/PigUDF/bin/myudfs/myudfs.jar;
A = LOAD '/Users/admin/Programming/pigdata/triangle.csv' AS (sides: tuple(side_0:int, side_1:int, side_2:int));
B = FILTER A BY (myudfs.RAT(A.sides)!= 0);
DUMP B; 

The UDF looks like this:

package myudfs;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class RAT extends EvalFunc<Integer>{
    public Integer exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;
        }
        try {
            int num_0 = (Integer)input.get(0);
            int num_1 = (Integer)input.get(1);
            int num_2 = (Integer)input.get(2);

            if ((num_0 * num_0) + (num_1 * num_1) == num_2 * num_2) 
                return Integer.valueOf(num_2);
            else if ((num_0 * num_0) + (num_2 * num_2) == num_1 * num_1)
                return Integer.valueOf(num_1);
            else if ((num_1 * num_1) + (num_2 * num_2) == num_0 * num_0)
                return Integer.valueOf(num_0);
            else {
                return null;
            }
        } catch (Exception e) {
            throw new IOException(" Caught exception processing input row", e);
        }
    }

}

I was wondering what I am doing wrong here. Any pointers are appreciated. Thanks.

Sumod
  • For people who found this post when looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution) here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – Dennis Jaheruddin Dec 28 '15 at 15:33

1 Answer


Sumod,

There are a couple of changes that you need to make.

Your LOAD statement doesn't seem to generate a proper tuple, and the function also needs to be changed slightly.

Please see the code modifications I have made. If you have any questions, please let me know.

REGISTER PIGTrnFilter.jar;
A = LOAD '/home/hadoop/lab/examples/PigTrnTest.txt' AS (side_0:int, side_1:int, side_2:int);
B = FILTER A BY (inverika.training.examples.RAT(TOTUPLE(side_0, side_1, side_2)) != 0);
DUMP B;

The filter function is below.

package inverika.training.examples;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class RAT extends EvalFunc<Integer>{
    public Integer exec(Tuple TT) throws IOException {
        if (TT == null || TT.size() == 0) {
            return null;
        }
        try {           
            Object tupleObject = TT.get(0);

            Tuple input = (Tuple) tupleObject;

            Object object0 = input.get(0);
            Object object1 = input.get(1);
            Object object2 = input.get(2);

            int num_0 = (Integer) object0;
            int num_1 = (Integer) object1;
            int num_2 = (Integer) object2;

            if ((num_0 * num_0) + (num_1 * num_1) == num_2 * num_2) 
                return Integer.valueOf(num_2);
            else if ((num_0 * num_0) + (num_2 * num_2) == num_1 * num_1)
                return Integer.valueOf(num_1);
            else if ((num_1 * num_1) + (num_2 * num_2) == num_0 * num_0)
                return Integer.valueOf(num_0);
            else {
                return Integer.valueOf(0);
            }
        } catch (Exception e) {
            throw new IOException(" Caught exception processing input row", e);
        }
    }
}

Please note that I have used tab-separated data rather than a CSV. If you have a CSV, you need to load it with the PigStorage function and a comma delimiter, for example USING PigStorage(',').

1   2   3
2   5   2
2   2   2
1   3   7
7   2   10
3   4   5

I have made minor modifications, which I guess you can follow. Look at the relation schema to understand the changes I made. You could also use a FilterFunc, which returns a Boolean, rather than an EvalFunc. Hope this helps.
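
As a rough sketch of the FilterFunc idea, the right-angle check can be written as a boolean predicate. The class below is plain Java so that it runs without the Pig jar; the names `RightTriangleCheck` and `isRightTriangle` are hypothetical, and in a real UDF the same test would presumably sit inside the `exec(Tuple)` method of a class extending `org.apache.pig.FilterFunc`. It sorts the sides first so that only one comparison is needed, and it squares in `long` to avoid `int` overflow. Rejecting non-positive sides is an added assumption, not something the original UDF does.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Pig: the boolean test a FilterFunc could delegate to.
public class RightTriangleCheck {

    // Returns true when the three sides satisfy a^2 + b^2 == c^2,
    // where c is the largest of the three values.
    public static boolean isRightTriangle(int s0, int s1, int s2) {
        long[] sides = {s0, s1, s2};
        Arrays.sort(sides);
        if (sides[0] <= 0) {
            return false; // assumption: zero or negative lengths are not valid sides
        }
        return sides[0] * sides[0] + sides[1] * sides[1] == sides[2] * sides[2];
    }

    public static void main(String[] args) {
        // Same rows as the sample data in this answer.
        int[][] rows = { {1, 2, 3}, {2, 5, 2}, {2, 2, 2}, {1, 3, 7}, {7, 2, 10}, {3, 4, 5} };
        for (int[] r : rows) {
            System.out.println(Arrays.toString(r) + " -> " + isRightTriangle(r[0], r[1], r[2]));
        }
    }
}
```

With a boolean predicate the script would not need the `!= 0` comparison, because the filter condition is the function result itself.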

Rags
  • Provided that the input is formatted correctly, the `(sides: tuple(side_0:int, side_1:int, side_2:int));` is actually correct. The input needs to be formatted (123,234,345) for example. – Frederic Mar 27 '13 at 15:32
  • Fred, I tried the above syntax. It creates the schema correctly, but DUMP gives me empty tuples. Can you share your sample code and data? – Rags Mar 28 '13 at 03:58
  • Thanks Rags and Fred. I tried the approach suggested by Rags. I changed the input format from (num1,num2,num3) to num1 num2 num3. Then I made some modifications in the Java code. Now here is what is happening: I do not get any errors while loading and filtering, but when I run "illustrate B", I get the error - org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. – Sumod Apr 07 '13 at 09:29
  • Sumod, just describe your schema and you will get a better understanding. This could be because the outer tuple contains the actual tuple, i.e. a tuple within a tuple. Look at this part of my code, where I am getting the inner tuple: Object tupleObject = TT.get(0); Tuple input = (Tuple) tupleObject; – Rags Apr 08 '13 at 10:19
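
The tuple-within-a-tuple shape described in the last comment can be modeled without Pig. In this sketch `List<Object>` stands in for `org.apache.pig.data.Tuple`, and `NestedTupleDemo` and `unwrapSides` are hypothetical names. It only illustrates why the UDF reads field 0 first and then reads the three integers from the inner tuple.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in model: List<Object> plays the role of a Pig Tuple (an assumption for illustration).
public class NestedTupleDemo {

    // The UDF argument is modeled as an outer tuple with one field:
    // the inner tuple that holds the three sides.
    public static int[] unwrapSides(List<Object> outer) {
        @SuppressWarnings("unchecked")
        List<Object> inner = (List<Object>) outer.get(0); // mirrors (Tuple) TT.get(0)
        int[] sides = new int[3];
        for (int i = 0; i < 3; i++) {
            sides[i] = (Integer) inner.get(i); // mirrors (Integer) input.get(i)
        }
        return sides;
    }

    public static void main(String[] args) {
        List<Object> inner = Arrays.<Object>asList(3, 4, 5);
        List<Object> outer = Arrays.<Object>asList(inner);
        System.out.println(Arrays.toString(unwrapSides(outer)));
    }
}
```

Casting `outer.get(0)` straight to `Integer` would fail with a `ClassCastException` in this model, because field 0 holds the inner tuple rather than a number.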