I'm following the PigUnit testing example on the Apache Pig documentation page. I tried to run the code example in Eclipse using a Maven project. I have already added the Pig and PigUnit dependencies to pom.xml, and I tried both versions 0.14 and 0.15.
Here's the PigUnit test code taken from the Apache Pig page (enclosed in a class, of course):
@Test
public void testTop2Queries() {
    String[] args = {
        "n=2",
    };
    PigTest test = new PigTest("top_queries.pig", args);
    String[] input = {
        "yahoo",
        "yahoo",
        "yahoo",
        "twitter",
        "facebook",
        "facebook",
        "linkedin",
    };
    String[] output = {
        "(yahoo,3)",
        "(facebook,2)",
    };
    test.assertOutput("data", input, "queries_limit", output);
}
and the Pig script, also copied:
data = LOAD 'input' AS (query:CHARARRAY);
queries_group = GROUP data BY query;
queries_count = FOREACH queries_group GENERATE group AS query, COUNT(data) AS total;
queries_ordered = ORDER queries_count BY total DESC, query;
queries_limit = LIMIT queries_ordered 2;
STORE queries_limit INTO 'output';
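To make clear what I expect the script to compute for the input above, here is a plain-Java sketch of the same GROUP, COUNT, ORDER, and LIMIT steps. It does not use Pig or PigUnit at all. The class name TopQueriesSketch and the method topQueries are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: mimics the Pig script's logic in plain Java.
public class TopQueriesSketch {

    static List<String> topQueries(String[] input, int n) {
        // GROUP data BY query; COUNT(data)
        Map<String, Long> counts = new HashMap<>();
        for (String query : input) {
            counts.merge(query, 1L, Long::sum);
        }

        // ORDER queries_count BY total DESC, query
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byTotal = Long.compare(b.getValue(), a.getValue());
            return byTotal != 0 ? byTotal : a.getKey().compareTo(b.getKey());
        });

        // LIMIT queries_ordered n, formatted like Pig tuples
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries.subList(0, Math.min(n, entries.size()))) {
            result.add("(" + e.getKey() + "," + e.getValue() + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        String[] input = {
            "yahoo", "yahoo", "yahoo", "twitter", "facebook", "facebook", "linkedin",
        };
        // prints [(yahoo,3), (facebook,2)]
        System.out.println(topQueries(input, 2));
    }
}
```

The two tuples it prints are the same ones listed in the output array of the test.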
However, when I run it with Run As > JUnit Test, I get this error:
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias queries_limit
    at org.apache.pig.PigServer.openIterator(PigServer.java:935)
    ...[truncated]
Caused by: java.io.IOException: Couldn't retrieve job.
    at org.apache.pig.PigServer.store(PigServer.java:999)
    at org.apache.pig.PigServer.openIterator(PigServer.java:910)
    ... 28 more
This is the console output I'm getting:
STORE queries_limit INTO 'output';
--> none
data: {query: chararray}
data = LOAD 'input' AS (query:CHARARRAY);
--> data = LOAD 'file:/tmp/temp-820202225/tmp-1722948946' USING PigStorage('\t') AS (
query: chararray
);
STORE queries_limit INTO 'output';
--> none
It looks like the Pig script is trying to load the data for 'input' from the local file system instead of using the Java String[] variable 'input'.
Can anyone help with this?