
Let's suppose there is a tab-delimited text file (datetemp.txt). I want to load this text file into Pig for processing, but when I run the lines below it gives me the following error:

grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage() As (EventID: chararray,eventdate: chararray,count:int);

grunt> dump inputfile;

2014-09-06 08:41:23,527 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-09-06 08:41:23,544 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-09-06 08:41:23,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-09-06 08:41:23,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-09-06 08:41:23,551 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-09-06 08:41:23,551 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-09-06 08:41:23,552 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job2739171785773930333.jar
2014-09-06 08:42:39,608 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job2739171785773930333.jar created
2014-09-06 08:42:39,612 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-09-06 08:42:39,619 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-09-06 08:42:39,630 [Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2014-09-06 08:42:39,891 [Thread-12] INFO org.apache.hadoop.mapred.JobClient - Cleaning up the staging area hdfs://192.168.195.130:8020/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/training/.staging/job_201408292336_0009
2014-09-06 08:42:39,891 [Thread-12] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:training (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
2014-09-06 08:42:40,119 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-09-06 08:42:40,125 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job null has failed! Stop running all dependent jobs
2014-09-06 08:42:40,125 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-09-06 08:42:40,131 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:285)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1014)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1031)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:318)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:238)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:269)
    at java.lang.Thread.run(Thread.java:662)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
    ... 15 more

2014-09-06 08:42:40,131 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-09-06 08:42:40,135 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion    PigVersion       UserId    StartedAt            FinishedAt           Features
2.0.0-cdh4.1.1   0.10.0-cdh4.1.1  training  2014-09-06 08:41:23  2014-09-06 08:42:40  UNKNOWN

Failed!

Failed Jobs:
JobId   Alias       Feature     Message     Outputs
N/A     inputfile   MAP_ONLY    Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:285)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1014)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1031)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:318)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:238)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:269)
    at java.lang.Thread.run(Thread.java:662)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
    ... 15 more
hdfs://192.168.195.130:8020/tmp/temp-1004538676/tmp1582688785,

Input(s): Failed to read data from "/training/pig/datetemp.txt"

Output(s): Failed to produce result in "hdfs://192.168.195.130:8020/tmp/temp-1004538676/tmp1582688785"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG: null

2014-09-06 08:42:40,135 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-09-06 08:42:40,142 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias inputfile
Details at logfile: /home/training/pig_1410006833865.log

Please help me here!

Prix
  • For people who found this post when looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution) here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – Dennis Jaheruddin Dec 28 '15 at 15:06

7 Answers


PigStorage is case sensitive. Use PigStorage and not pigstorage.
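For example, a minimal sketch using the same path and schema as in the question; only the capitalization of the loader name differs between the two lines:

grunt> inputfile = LOAD '/training/pig/datetemp.txt' USING PigStorage() AS (EventID:chararray, eventdate:chararray, count:int); -- resolves: the built-in loader is spelled PigStorage
grunt> inputfile = LOAD '/training/pig/datetemp.txt' USING pigstorage() AS (EventID:chararray, eventdate:chararray, count:int); -- would fail: Pig cannot resolve a loader named pigstorage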

Gaurav Phapale

Your question title said you were trying to load a CSV file. For that, I've had good luck using org.apache.pig.piggybank.storage.CSVExcelStorage() in my LOAD statements, as demonstrated at https://martin.atlassian.net/wiki/x/WYBmAQ.
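A minimal sketch of that approach; the jar location and the CSV file name below are assumptions and will differ per installation:

grunt> REGISTER /usr/lib/pig/piggybank.jar; -- assumed location of the piggybank jar; adjust for your cluster
grunt> csvdata = LOAD '/training/pig/somedata.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage() AS (EventID:chararray, eventdate:chararray, count:int); -- hypothetical CSV file
grunt> dump csvdata;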


Why don't you write PigStorage('\t') instead of PigStorage(), since you have already mentioned that your file is tab-delimited?

The code you mentioned:

grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage() As (EventID: chararray,eventdate: chararray,count:int);
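With the tab delimiter spelled out, the line would read (a sketch of the suggestion above):

grunt> inputfile = LOAD '/training/pig/datetemp.txt' USING PigStorage('\t') AS (EventID:chararray, eventdate:chararray, count:int);
grunt> dump inputfile;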

Maybe this will solve your problem.

Let me know if it is something else.

Indrajeet Gour
hdfs://192.168.195.130:8020/training/pig/datetemp.txt 

The file was not found in your HDFS. Make sure the input file is placed at the above location.
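A sketch of one way to put it there from the OS shell, assuming datetemp.txt sits in your current local directory:

hadoop fs -mkdir -p /training/pig    # create the target directory if it does not exist (-p may be unneeded on older releases)
hadoop fs -put datetemp.txt /training/pig/datetemp.txt
hadoop fs -ls /training/pig/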

karthik

Have you checked whether the input path exists?

Try:

fs -ls /training/pig/ in the Grunt shell.

If it lists datetemp.txt, the load will work; otherwise, provide the correct input path.
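A sketch of that check in the Grunt shell, plus copying the file up if it is missing (the local path /home/training/datetemp.txt is an assumption):

grunt> fs -ls /training/pig/
grunt> fs -copyFromLocal /home/training/datetemp.txt /training/pig/datetemp.txt
grunt> fs -ls /training/pig/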

Naga

The log states the error clearly:

org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt

Can you check whether the file exists in HDFS? You should also check whether Pig is running in MapReduce mode or in local mode.
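As a sketch, the execution mode is chosen when the Grunt shell is started; in local mode the LOAD path refers to the local filesystem rather than HDFS:

pig -x mapreduce    # default mode: paths such as /training/pig/datetemp.txt resolve against HDFS
pig -x local        # local mode: the same path would be read from the local filesystem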

Naga
Narasimha

You can specify ',' in the PigStorage class to read a CSV file.

The query looks like:

grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage(',') As (EventID: chararray,eventdate: chararray,count:int);

grunt> dump inputfile;

And make sure that the file '/training/pig/datetemp.txt' exists on HDFS. To test, run: hadoop fs -ls /training/pig/datetemp.txt
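Once the file is confirmed to be there, a quick sanity check before the full dump is describe, which only prints the schema of the relation (a sketch of the same session):

grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage(',') As (EventID: chararray,eventdate: chararray,count:int);
grunt> describe inputfile;
grunt> dump inputfile;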

pradeep