
I'm running Pig 0.13.0 and Hadoop 2.5.1, both installed from the Apache distributions; they're not packages from Hortonworks or Cloudera.

I'm working through a tutorial and it runs fine when I run Pig locally ($> ./pig -x local), but when I try to run it against the Hadoop instance I get an error that I'm having a hard time researching.

This command:

movies = LOAD '/home/hduser/pig-tutorial-master/movies_data.csv' USING PigStorage(',') as (id,name,year,rating,duration);
DUMP movies;

Works fine when run locally. When I run it in Hadoop/MR mode, the first line appears to succeed:

grunt> movies = LOAD '/home/hduser/pig-tutorial-master/movies_data.csv' USING PigStorage(',') as (id,name,year,rating,duration);
2014-10-29 18:16:26,281 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-10-29 18:16:26,281 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

But when I try to DUMP movies it gives me this trace:

grunt> dump movies
2014-10-29 18:17:15,419 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-10-29 18:17:15,420 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-10-29 18:17:15,445 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2014-10-29 18:17:15,469 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2081: Unable to setup the load function.
Details at logfile: /usr/local/pig/pig_1414606194436.log

The ERROR 2081 is what I'm trying to diagnose, but I can't find anything that points me in the right direction. Any ideas of where to start? I assume it's something to do with my Hadoop installation rather than Pig, but I don't know. Any suggestions would be helpful.

Thanks,

Mark

EDIT: Here is the full log output:

ERROR 2081: Unable to setup the load function.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias movies
    at org.apache.pig.PigServer.openIterator(PigServer.java:912)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:752)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:228)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:203)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
    at org.apache.pig.Main.run(Main.java:542)
    at org.apache.pig.Main.main(Main.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias movies
    at org.apache.pig.PigServer.storeEx(PigServer.java:1015)
    at org.apache.pig.PigServer.store(PigServer.java:974)
    at org.apache.pig.PigServer.openIterator(PigServer.java:887)
    ... 12 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: movies: Store(hdfs://localhost:54310/tmp/temp-1276361014/tmp-2000190966:org.apache.pig.impl.io.InterStorage) - scope-1 Operator Key: scope-1): org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNextTuple(POStore.java:143)
    at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.runPipeline(FetchLauncher.java:160)
    at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.launchPig(FetchLauncher.java:81)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:275)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1367)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1352)
    at org.apache.pig.PigServer.storeEx(PigServer.java:1011)
    ... 14 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:127)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:281)
    ... 21 more
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:54310/home/hduser/pig-tutorial-master/movies_data.csv
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:190)
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:146)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:95)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:123)
    ... 22 more
  • For people who found this post when looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution) here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – Dennis Jaheruddin Dec 28 '15 at 14:59
  • After searching for the solution for ERROR 2081, I started looking at the errors in the log file more closely. It was an issue of trying to access local files from MR mode. I hadn't noticed anything in the documentation about how to access data in MR vs. Local, but that was the issue. If running in MR, you must access the files via hdfs://hostname:54310. Locally you can access them with the path. This S.O. question was my solution: https://stackoverflow.com/questions/9491888/how-to-load-files-on-hadoop-cluster-using-apache-pig. – Mark Oct 29 '14 at 19:17
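
A minimal sketch of that fix (the hdfs://localhost:54310 address comes from the trace above; the /user/hduser target directory is an assumption):

# copy the local CSV into HDFS first
$> hdfs dfs -mkdir -p /user/hduser/pig-tutorial-master
$> hdfs dfs -put /home/hduser/pig-tutorial-master/movies_data.csv /user/hduser/pig-tutorial-master/

-- then load it by its HDFS URI in grunt and DUMP as before
grunt> movies = LOAD 'hdfs://localhost:54310/user/hduser/pig-tutorial-master/movies_data.csv' USING PigStorage(',') as (id,name,year,rating,duration);
grunt> DUMP movies;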

2 Answers


If you are running the Pig commands from the grunt shell on a Hadoop cluster, set the property: set opt.fetch false;

With this property set, DUMP will run in MapReduce mode; by default the property is set to true.
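
For example, disable it for the current grunt session before the DUMP (a sketch; movies is the alias defined in the question):

-- disable direct fetch so the DUMP runs as a regular MapReduce job
grunt> set opt.fetch false;
grunt> DUMP movies;

The same set statement can also go at the top of a Pig script if you are not working interactively.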

Kiran Thati

If you are working with Hadoop 2.6.0 and Pig 0.14, downgrading Pig to 0.13 may help. This worked for me.

Shibashis