
I'm running Pig 0.13.0 and Hadoop 2.5.1, both installed from the Apache distributions; they're not packages from Hortonworks or Cloudera.

I'm working through a tutorial and it runs fine when I run Pig locally ($> ./pig -x local), but when I try to run it against the Hadoop instance I get an error that I'm having a hard time researching.

This command:

movies = LOAD '/home/hduser/pig-tutorial-master/movies_data.csv' USING PigStorage(',') as (id,name,year,rating,duration);
DUMP movies;

Works fine when run locally. When I run it in Hadoop/MR mode, the first line appears to succeed:

grunt> movies = LOAD '/home/hduser/pig-tutorial-master/movies_data.csv' USING PigStorage(',') as (id,name,year,rating,duration);
2014-10-29 18:16:26,281 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-10-29 18:16:26,281 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

But when I try to DUMP movies it gives me this trace:

grunt> dump movies
2014-10-29 18:17:15,419 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-10-29 18:17:15,420 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-10-29 18:17:15,445 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2014-10-29 18:17:15,469 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2081: Unable to setup the load function.
Details at logfile: /usr/local/pig/pig_1414606194436.log

The ERROR 2081 is what I'm trying to diagnose, but I can't find anything that points me in the right direction. Any ideas of where to start? I assume it's something to do with my Hadoop installation rather than Pig, but I don't know. Any suggestions would be helpful.

Thanks,

Mark

EDIT: Here is the full log output:

ERROR 2081: Unable to setup the load function.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias movies
    at org.apache.pig.PigServer.openIterator(PigServer.java:912)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:752)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:228)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:203)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
    at org.apache.pig.Main.run(Main.java:542)
    at org.apache.pig.Main.main(Main.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias movies
    at org.apache.pig.PigServer.storeEx(PigServer.java:1015)
    at org.apache.pig.PigServer.store(PigServer.java:974)
    at org.apache.pig.PigServer.openIterator(PigServer.java:887)
    ... 12 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: movies: Store(hdfs://localhost:54310/tmp/temp-1276361014/tmp-2000190966:org.apache.pig.impl.io.InterStorage) - scope-1 Operator Key: scope-1): org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNextTuple(POStore.java:143)
    at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.runPipeline(FetchLauncher.java:160)
    at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.launchPig(FetchLauncher.java:81)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:275)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1367)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1352)
    at org.apache.pig.PigServer.storeEx(PigServer.java:1011)
    ... 14 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:127)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:281)
    ... 21 more
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:54310/home/hduser/pig-tutorial-master/movies_data.csv
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:190)
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:146)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:95)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:123)
    ... 22 more
  • For people who found this post when looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution) here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – Dennis Jaheruddin Dec 28 '15 at 14:59
  • After searching for the solution for ERROR 2081, I started looking at the errors in the log file more closely. It was an issue of trying to access local files from MR mode. I hadn't noticed anything in the documentation about how to access data in MR vs. Local, but that was the issue. If running in MR, you must access the files via hdfs://hostname:54310. Locally you can access them with the path. This S.O. question was my solution: https://stackoverflow.com/questions/9491888/how-to-load-files-on-hadoop-cluster-using-apache-pig. – Mark Oct 29 '14 at 19:17
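
A minimal sketch of that fix (the hdfs://localhost:54310 address comes from the trace above; the /user/hduser target directory is an assumption):

# copy the local CSV into HDFS first
$> hdfs dfs -mkdir -p /user/hduser/pig-tutorial-master
$> hdfs dfs -put /home/hduser/pig-tutorial-master/movies_data.csv /user/hduser/pig-tutorial-master/

-- then load it by its HDFS URI in grunt and DUMP as before
grunt> movies = LOAD 'hdfs://localhost:54310/user/hduser/pig-tutorial-master/movies_data.csv' USING PigStorage(',') as (id,name,year,rating,duration);
grunt> DUMP movies;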

2 Answers


If you are running the Pig commands from the grunt shell on a Hadoop cluster, set the property: set opt.fetch false;

With this property set, DUMP will run in MapReduce mode; by default the property is set to true.
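
For example, disable it for the current grunt session before the DUMP (a sketch; movies is the alias defined in the question):

-- disable direct fetch so the DUMP runs as a regular MapReduce job
grunt> set opt.fetch false;
grunt> DUMP movies;

The same set statement can also go at the top of a Pig script if you are not working interactively.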

Kiran Thati

If you are working with Hadoop 2.6.0 and Pig 0.14, downgrading Pig to 0.13 may help. This worked for me.

Shibashis