I'm running the following Pig script, but I'm getting ERROR 1066: Unable to open iterator for alias H.

A = LOAD 'hdfs:/home/ubuntu/pigtest/Master.csv' USING PigStorage(',');
B = FOREACH A GENERATE $0 AS id, $13 AS first, $14 AS last, $18 AS bats, $2 AS birthMonth,$7 AS deathYear;
C = FILTER B BY birthMonth==10 and deathYear==2011 and bats=='R';
D = LOAD 'hdfs:/home/ubuntu/pigtest/Batting.csv' USING PigStorage(',');
E = FOREACH D GENERATE $0 AS id, $7 AS hits;
F = JOIN E BY id, C BY id;
G = GROUP F BY E.id;
H = FOREACH G GENERATE $0, SUM($1.hits);
DUMP H;

When I describe G, I get:

G: {group: bytearray,F: {(E::id: bytearray,E::hits: int,C::id:bytearray,
    C::first: bytearray,C::last: bytearray,C::bats:bytearray,
    C::birthMonth: bytearray,C::deathYear: bytearray)}}

I've tried a ton of things inside of the SUM() function: F:hits, F.hits, F.E.hits, E.hits, E:hits but I don't know how I'm supposed to reference the tuple within the bag.

Thanks for ideas.

sisdog

2 Answers

I suggest you try this (I haven't tested it):

A = LOAD 'hdfs:/home/ubuntu/pigtest/Master.csv' USING PigStorage(',');
B = FOREACH A GENERATE $0 AS id, $13 AS first, $14 AS last, $18 AS bats, $2 AS birthMonth,$7 AS deathYear;
C = FILTER B BY birthMonth==10 and deathYear==2011 and bats=='R';
D = LOAD 'hdfs:/home/ubuntu/pigtest/Batting.csv' USING PigStorage(',');
E = FOREACH D GENERATE $0 AS id, $7 AS hits;
F = JOIN E BY id, C BY id; 
-- Generate only the columns you need, then DUMP F1 to check the output
F1 = FOREACH F GENERATE E::id  as id, E::hits as hits;
G = GROUP F1 BY id;
H = FOREACH G GENERATE FLATTEN(group) as ID , SUM(F1.hits);
DUMP H;

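Note the line H = FOREACH G GENERATE FLATTEN(group) as ID , SUM(F1.hits); that is where the error in your code is. Once F1 carries plain field names, the bag inside G is named after the grouped relation, so the field is referenced as F1.hits.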
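If you would rather not add the extra projection, here is a sketch of the same idea applied directly to F. It assumes F is the JOIN output from the script above, uses the `::` prefix that DESCRIBE shows for the joined fields, and is untested; total_hits is only an illustrative alias.

```pig
-- Assumes F = JOIN E BY id, C BY id; from the script above.
-- Group on the joined field by its disambiguated name (E::id, not E.id).
G = GROUP F BY E::id;
-- The bag inside G is named F; project the field with its :: prefix.
H = FOREACH G GENERATE group AS id, SUM(F.E::hits) AS total_hits;
DUMP H;
```

Running DESCRIBE F before the GROUP should confirm the exact field names to use.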

San
  • Thanks @Sandesh. I knew that FLATTEN function must have been good for something. And I see the 1 reputation point... welcome to SO :> – sisdog Mar 24 '17 at 04:58

There could be a couple of reasons for this to happen:

a) The Pig version being run may need to be changed; see the question "ERROR 1066: Unable to open iterator for alias - Pig".
b) The test data might contain null or empty values. To handle this, try adapting your script along the lines of the one below:

values = FOREACH test1 GENERATE (A==''?'null':(A is null?'null':A)) as A,(B==''?'null':(B is null?'null':B)) as B,(C==''?'null':(C is null?'null':C)) as C;

This could possibly resolve the issue.

Keshav Pradeep Ramanath