
I'm getting an error in the Cloudera QuickStart VM I downloaded from http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html.

I am trying a toy example from Tom White's Hadoop: The Definitive Guide book called max_temp.pig, which "finds the maximum temperature by year".

I created a file called temps.txt that contains (year, temperature, quality) entries on each line:

1950 0 1
1950 22 1
1950 -11 1
1949 111 1

Using the example code in the book, I typed the following Pig code into the Grunt terminal:

records = LOAD '/home/cloudera/Desktop/temps.txt'
  AS (year:chararray, temperature:int, quality:int);
DUMP records;

After I typed DUMP records;, I got the error:

2014-05-22 11:33:34,286 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias records. Backend error : org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1400775973236_0006' doesn't exist in RM.

Details at logfile: /home/cloudera/Desktop/pig_1400782722689.log

I attempted to find out what was causing the error through a Google search: https://www.google.com/search?q=%22application+with+id%22+%22doesn%27t+exist+in+RM%22.

The results there weren't helpful. For example, http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-troubleshoot-error-vpc.html mentioned this bug and said "To solve this problem, you must configure a VPC that includes a DHCP Options Set whose parameters are set to the following values..."

Amazon's suggested fix doesn't seem to apply here because I'm not using AWS.

EDIT:

I think the HDFS file path is correct.

[cloudera@localhost Desktop]$ ls
Eclipse.desktop  gnome-terminal.desktop  max_temp.pig  temps.txt
[cloudera@localhost Desktop]$ pwd
/home/cloudera/Desktop
user3662937
  • Here is a paste of the full log of the terminal input/output in Grunt: http://www.pastebin.com/BBex8MYx – user3662937 May 22 '14 at 23:49
  • For people who found this post when looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution) here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – Dennis Jaheruddin Dec 28 '15 at 15:07

2 Answers


There's another exception before your error:

org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://localhost.localdomain:8020/home/cloudera/Desktop/temps.txt
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)

Is your file in HDFS? Have you checked the file path?
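If the file only exists on the local disk, one way to check and fix that is sketched below. The local path comes from the question; the HDFS directory /user/cloudera and the `run` helper are assumptions for illustration. The script only prints the commands unless DRY_RUN=0 is set, so it is safe to run before touching the cluster.

```shell
#!/bin/sh
# Local path taken from the question; HDFS_DIR is an assumed target directory.
LOCAL_FILE=/home/cloudera/Desktop/temps.txt
HDFS_DIR=/user/cloudera

# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

run hadoop fs -mkdir -p "$HDFS_DIR"            # create the target directory if missing
run hadoop fs -put "$LOCAL_FILE" "$HDFS_DIR/"  # copy the local file into HDFS
run hadoop fs -ls "$HDFS_DIR"                  # confirm the file is now visible
```

After the copy, the LOAD statement in Grunt would need to point at the HDFS location (for example '/user/cloudera/temps.txt' under the assumption above) rather than the Desktop path.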

adrean

I was able to solve this problem by starting the Grunt interpreter with pig -x local instead of plain pig.

I should have used local mode because I did not have access to a Hadoop cluster.

Running plain pig (the default mapreduce mode) gave me these errors:

2014-05-22 11:33:34,286 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias records. Backend error : org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1400775973236_0006' doesn't exist in RM.

2014-05-22 11:33:28,799 [JobControl] WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://localhost.localdomain:8020/home/cloudera/Desktop/temps.txt

From http://pig.apache.org/docs/r0.9.1/start.html:

Pig has two execution modes or exectypes:

Local Mode - To run Pig in local mode, you need access to a single machine; all files are installed and run using your local host and file system. Specify local mode using the -x flag (pig -x local).

Mapreduce Mode - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation. Mapreduce mode is the default mode; you can, but don't need to, specify it using the -x flag (pig OR pig -x mapreduce).

You can run Pig in either mode using the "pig" command (the bin/pig Perl script) or the "java" command (java -cp pig.jar ...).

Running the toy example from Tom White's Hadoop: The Definitive Guide book:

-- max_temp.pig: Finds the maximum temperature by year
records = LOAD 'temps.txt' AS (year:chararray, temperature:int, quality:int);
filtered_records = FILTER records BY temperature != 9999 AND
  (quality == 0 OR quality == 1 OR quality == 4 OR quality == 5 OR quality == 9);
grouped_records = GROUP filtered_records BY year;
max_temp = FOREACH grouped_records GENERATE group,
  MAX(filtered_records.temperature);
DUMP max_temp;

against the following data set in temps.txt (remember that Pig's default input is tab-delimited files):

1950   0      1
1950   22     1
1950   -11    1
1949   111    1

gives this:

[cloudera@localhost Desktop]$ pig -x local -f max_temp.pig 2>log
(1949,111)
(1950,22)
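Because the file in the question was typed with spaces while Pig's default loader expects tabs, a quick way to build a tab-delimited copy and cross-check the expected result without Pig is sketched below. The file names temps_spaces.txt and temps_tabs.txt are made up for this example, and the awk filter only mirrors the FILTER, GROUP and MAX steps of max_temp.pig; it is not how Pig computes the result.

```shell
#!/bin/sh
# Write the four sample rows with single spaces, as typed in the question.
printf '%s\n' '1950 0 1' '1950 22 1' '1950 -11 1' '1949 111 1' > temps_spaces.txt

# Convert runs of spaces to a single tab so each row has three tab-separated fields.
tr -s ' ' '\t' < temps_spaces.txt > temps_tabs.txt

# Mirror the Pig script: drop 9999 readings and bad quality codes,
# then keep the maximum temperature per year.
awk -F'\t' '$2 != 9999 && ($3 == 0 || $3 == 1 || $3 == 4 || $3 == 5 || $3 == 9) {
    if (!($1 in max) || $2 + 0 > max[$1]) max[$1] = $2 + 0
}
END { for (y in max) printf "(%s,%d)\n", y, max[y] }' temps_tabs.txt | sort
# prints:
# (1949,111)
# (1950,22)
```

If the awk output matches what Pig prints, the input file is delimited the way the LOAD statement expects.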

user3662937