
I'm trying to get a count of all users in an alias. Each row contains a map of users.

The rows look like this: ([user_name/454543#Paul Kison]) ([user_name/43433#Josiel's iPhone,user_name/34343434#Jose's iPAD,user_name/3434645655#Josiel's])

When I call SIZE() on the entire alias, I get this error: ERROR 1066: Unable to open iterator for alias user_count. Backend error : Scalar has more than one row in the output.

users = LOAD 'hbase://group'
   USING org.apache.pig.backend.hadoop.hbase.HBaseStorage ('n:user_display_name*', '-limit 10')
   as(display_name);

user_count = FOREACH users GENERATE SIZE(users.display_name);

The idea was to take the size of each row's map and then sum those sizes to get the total count.

Merch
  • For people who found this post when looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution) here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – Dennis Jaheruddin Dec 28 '15 at 15:18

1 Answer


I had to explicitly declare the type of the display_name column as map[] and pass just the column name, rather than users.display_name, as the argument to SIZE().

users = LOAD 'hbase://group'
   USING org.apache.pig.backend.hadoop.hbase.HBaseStorage ('n:user_display_name*', '-limit 10')
   as(display_name:MAP[]);

user_count = FOREACH users GENERATE SIZE(display_name);
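The fix works because of how Pig resolves `users.display_name`. Inside `FOREACH users`, a reference of the form `relation.field` is treated as a scalar projection of the whole `users` relation, which Pig only accepts when that relation produces a single row. With several rows it fails at runtime with the error quoted in the question. Plain `display_name` instead refers to the field of the current row. The sketch below models that rule in plain Python (it is not Pig code). The `scalar` helper and the dict-based rows are hypothetical stand-ins built from the sample rows in the question.

```python
from typing import Any, Dict, List

# Hypothetical model: a relation is a list of rows, each row a dict of fields.
Relation = List[Dict[str, Any]]


def scalar(relation: Relation, field: str) -> Any:
    """Illustrative model of Pig's `relation.field` scalar projection."""
    # Pig only accepts a relation as a scalar if it yields a single row.
    # The empty-relation case is not modeled here (it raises IndexError).
    if len(relation) > 1:
        raise ValueError("Scalar has more than one row in the output.")
    return relation[0][field]


# The two sample rows from the question; each display_name map is a dict.
users: Relation = [
    {"display_name": {"user_name/454543": "Paul Kison"}},
    {"display_name": {
        "user_name/43433": "Josiel's iPhone",
        "user_name/34343434": "Jose's iPAD",
        "user_name/3434645655": "Josiel's",
    }},
]

# Question's form, SIZE(users.display_name): a scalar lookup on a 2-row relation.
try:
    scalar(users, "display_name")
except ValueError as err:
    print(err)  # Scalar has more than one row in the output.

# Answer's form, SIZE(display_name): evaluated against each row in turn.
print([len(row["display_name"]) for row in users])  # [1, 3]
```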

After that I summed the result like this:

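-- GROUP ... ALL collects every row of user_count into a single bag
users_group = GROUP user_count ALL;
-- SUM adds up the per-row map sizes held in that one bag
total = FOREACH users_group GENERATE SUM(user_count);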
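To check the arithmetic of the two-step approach (size per row, then one global sum), here is a plain-Python model of the same dataflow. It is not Pig code. The functions `user_count` and `total` are hypothetical names that mirror the aliases above, and each row's map is represented as a dict built from Pig's `key#value` notation in the sample rows from the question.

```python
from typing import Dict, List

# Each row's display_name map, modeled as a dict (Pig prints maps as [key#value,...]).
Row = Dict[str, str]


def user_count(rows: List[Row]) -> List[int]:
    """Mirrors FOREACH users GENERATE SIZE(display_name): one count per row."""
    # SIZE of a map is its number of key/value pairs.
    return [len(display_name) for display_name in rows]


def total(counts: List[int]) -> int:
    """Mirrors GROUP user_count ALL followed by SUM over the single bag."""
    return sum(counts)


# The two sample rows from the question.
sample: List[Row] = [
    {"user_name/454543": "Paul Kison"},
    {
        "user_name/43433": "Josiel's iPhone",
        "user_name/34343434": "Jose's iPAD",
        "user_name/3434645655": "Josiel's",
    },
]

counts = user_count(sample)
print(counts)         # [1, 3]
print(total(counts))  # 4
```

With the two sample rows, the per-row sizes are 1 and 3, so the total is 4.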