
Does anybody else have problems with Pig filters not working properly and generally acting goofy?

As an example, I have some logs that look like this.

a1

(2013-12-25 02:55:08,000085594,15468,80365991,1387940111723)
(2013-12-25 02:55:08,000085594,63943,80365991,1387940111723)
(2013-12-25 02:55:08,000085594,64014,80365991,1387940111723)

describe a1

a1: {time:chararray, id:chararray, buckets::bucket: int, chararray, chararray}

If I try to filter on $2, I get an error.

a2 = filter a2 by ($2 == 64034);

I get the following error: ERROR 1066: Unable to open iterator for alias a2.

I messed around with this for quite a bit and couldn't figure it out. So I wrote a Python UDF that returns either "Yes" or "No" depending on whether the number matches.

@outputSchema('y:chararray')
def bucket(bucket):
    if bucket == '64034':
        return "Yes"
    else:
        return "No"

a3 = foreach a1 generate time, id, myfuncs.bucket($2), $3, $4;

describe a3
a3: {time:chararray, id:chararray, y:chararray, chararray, chararray}

Now when I filter on this, it works.

a4 = filter a3 by ($2 == 'Yes');

This produces the desired result. However, I need to run a couple more transformations on the data with other UDFs. These UDFs don't do anything to column $2; they only look at columns $1 and $3. Both UDFs work when I use them before the filter. However, I get an "Unable to open alias" error if I try to apply them after the filter. Also, if I apply the additional UDFs before the filter and then apply the filter, the filter stops working and I get the same "Unable to open alias" error. Again, these UDFs don't alter the schema of $2 at all.

So, what could possibly be going on here? One, why doesn't the filter work in the first place? Two, why do certain UDFs work and then stop working with seemingly no logic? Any sort of troubleshooting direction would be helpful.

cloud36
  • For people who found this post when looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution) here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – Dennis Jaheruddin Dec 28 '15 at 15:17

1 Answer


It seems that a2 has not been defined yet at the point where the filter reads from it. Most likely, the code should read as follows:

a2 = filter a1 by ($2 == 64034);
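
If the UDF workaround is kept, one more thing may be worth checking: describe reports bucket as an int, and in Python an int never compares equal to a string, so a test like bucket == '64034' would return "No" for an int input. Below is a minimal sketch, not the asker's actual code, of a version that accepts either type. The outputSchema stand-in and the TARGET_BUCKET name are assumptions added here so the file can also be run outside Pig; when the script is registered with Pig's Jython engine, the real decorator is expected to be available already.

```python
# Stand-in for Pig's decorator so this file also runs outside Pig.
# Assumption: inside Pig (script registered "using jython") the real
# outputSchema already exists, so this fallback is skipped.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema  # keep the schema string for inspection
            return func
        return wrap

TARGET_BUCKET = 64034  # hypothetical constant; the value comes from the question


@outputSchema('y:chararray')
def bucket(value):
    """Return "Yes" if value equals TARGET_BUCKET, whether it arrives as int or string."""
    if value is None:
        return "No"
    try:
        # int() normalizes both 64034 and '64034' to the same number
        if int(value) == TARGET_BUCKET:
            return "Yes"
        return "No"
    except (TypeError, ValueError):
        # Anything that is not a number cannot match
        return "No"
```

With this version, it should not matter whether Pig hands the field to the UDF as an int or as a chararray, which removes one possible source of confusion while debugging the filter itself.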
vicsana1