I'm following the PigUnit testing example on the Apache Pig documentation page. I tried to run the code example in Eclipse using a Maven project. I have already added the Pig and PigUnit dependencies to pom.xml, and tried both versions 0.14 and 0.15.

Here's the PigUnit test code taken from the Apache Pig page (I enclosed it in a class, of course):

  @Test
  public void testTop2Queries() throws Exception {
    String[] args = {
        "n=2",
        };

    PigTest test = new PigTest("top_queries.pig", args);

    String[] input = {
        "yahoo",
        "yahoo",
        "yahoo",
        "twitter",
        "facebook",
        "facebook",
        "linkedin",
    };

    String[] output = {
        "(yahoo,3)",
        "(facebook,2)",
    };

    test.assertOutput("data", input, "queries_limit", output);
  }

and the Pig script, also copied:

data = LOAD 'input' AS (query:CHARARRAY);
queries_group = GROUP data BY query;
queries_count = FOREACH queries_group GENERATE group AS query, COUNT(data) AS total;
queries_ordered = ORDER queries_count BY total DESC, query;
queries_limit = LIMIT queries_ordered 2;
STORE queries_limit INTO 'output';

However, I get the following error when I run it with Run As > JUnit Test:

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias queries_limit
    at org.apache.pig.PigServer.openIterator(PigServer.java:935)
    ...[truncated]
Caused by: java.io.IOException: Couldn't retrieve job.
    at org.apache.pig.PigServer.store(PigServer.java:999)
    at org.apache.pig.PigServer.openIterator(PigServer.java:910)
    ... 28 more

This is the console output I'm getting:

STORE queries_limit INTO 'output';
--> none
data: {query: chararray}
data = LOAD 'input' AS (query:CHARARRAY);
--> data = LOAD 'file:/tmp/temp-820202225/tmp-1722948946' USING PigStorage('\t') AS (
    query: chararray
);
STORE queries_limit INTO 'output';
--> none

It looks like the Pig script is trying to load data for 'input' from the local file system instead of using the Java String[] variable 'input'.

Can anyone help with this?

oikonomiyaki
  • Not sure if it is sufficient for this question, but for people who found this post when looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution) here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – Dennis Jaheruddin Dec 28 '15 at 15:23

1 Answer


Before getting into the solution, a note on why the Pig script appears to load from local disk. When PigUnit overrides a statement with mock data that you supply, it writes that data to a file on local disk and loads that file instead. That's why you see that file being loaded. If you open the file, you should see the data you supplied in the string array `input`.
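
The mocking behaviour described above can be illustrated in plain Java. This is only a simplified sketch of the idea, not PigUnit's actual implementation: the class `MockInputSketch` and its methods `writeMockInput` and `overrideLoad` are hypothetical names made up for this example. It writes the rows to a temporary file and builds a LOAD statement of the same shape as the one shown in the console output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Illustration only: mimics the mocking idea in plain Java. Not PigUnit code. */
public class MockInputSketch {

    /** Writes each mocked row to a temporary file, one row per line. */
    public static Path writeMockInput(String[] rows) throws IOException {
        Path file = Files.createTempFile("pigunit-sketch-", ".txt");
        Files.write(file, Arrays.asList(rows), StandardCharsets.UTF_8);
        return file;
    }

    /** Builds a LOAD statement that points the alias at the temporary file. */
    public static String overrideLoad(String alias, Path file, String schema) {
        return alias + " = LOAD '" + file.toUri()
                + "' USING PigStorage('\\t') AS (" + schema + ");";
    }

    public static void main(String[] args) throws IOException {
        String[] input = {"yahoo", "yahoo", "twitter"};
        Path file = writeMockInput(input);
        // Prints a statement similar in shape to the "-->" line in the console output.
        System.out.println(overrideLoad("data", file, "query: chararray"));
        // Prints the rows read back from the temporary file.
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

Seen this way, a `file:/tmp/...` path in the rewritten LOAD statement is expected and is not the cause of the error.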

For anyone still looking for a solution, the following is what worked for me, using Pig 0.15 and Hadoop 2.7.1. It seems you have to specify which Pig artifact you need by adding a classifier:

    <dependency>
        <groupId>org.apache.pig</groupId>
        <artifactId>pigunit</artifactId>
        <version>${pig.version}</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.pig</groupId>
        <artifactId>pig</artifactId>
        <version>${pig.version}</version>
        <classifier>h2</classifier>
        <!-- NOTE: It is very important to have this classifier. Unit tests will
        break if this doesn't exist. This gets the pig jars for Hadoop v2. -->
    </dependency>
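
To make the effect of the classifier concrete: under Maven's standard naming convention, an artifact's file name is `artifactId-version[-classifier].jar`, so adding `h2` selects a different jar from the same Pig release. The small sketch below just reproduces that naming rule; the class `ClassifierSketch` and method `jarName` are hypothetical names for illustration and are not part of Maven or Pig.

```java
/** Illustration only: reproduces Maven's default artifact file naming. */
public class ClassifierSketch {

    /** Returns artifactId-version[-classifier].jar; a null or empty classifier is omitted. */
    public static String jarName(String artifactId, String version, String classifier) {
        String suffix = (classifier == null || classifier.isEmpty()) ? "" : "-" + classifier;
        return artifactId + "-" + version + suffix + ".jar";
    }

    public static void main(String[] args) {
        System.out.println(jarName("pig", "0.14.0", "h2")); // pig-0.14.0-h2.jar
        System.out.println(jarName("pig", "0.14.0", null)); // pig-0.14.0.jar
    }
}
```

Checking which of these two jars ends up in the project's Maven dependencies is a quick way to confirm the classifier was picked up.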

Here are some very helpful classes in the Pig GitHub repository:

PigTest Implementation (Good for reading API docs): https://github.com/apache/pig/blob/trunk/test/org/apache/pig/pigunit/PigTest.java

PigUnit examples: https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/pigunit/TestPigTest.java

JJ Meyer
  • I was missing the classifier. I am using pig-0.14. After adding the classifier, it downloads pig-0.14.0-h2.jar. But honestly I don't get what it's actually doing. Why is h2 needed? When running Pig scripts I did not specify it and it still worked. – MANISH ZOPE Dec 12 '16 at 11:09
  • @MANISHZOPE I believe the classifier tells Maven to grab the Pig dependency compiled for Hadoop 2. I don't believe this is actually documented anywhere. There is a Jira open to update the docs, though (https://issues.apache.org/jira/browse/PIG-3738). – JJ Meyer Dec 28 '16 at 02:24
  • `pig--h2` appears to be available in Maven Central repository for 0.14, 0.15 and 0.16, but not for 0.17. – user1808924 Mar 18 '18 at 11:51