I'm trying to set up a Binary Pig cluster (see https://github.com/endgameinc/binarypig for more information) to analyze malware binaries with Hadoop and Pig. I installed Hadoop and Pig using Cloudera CDH.

My Pig script is as follows:

SET debug 'on';

register '/home/myuser/binarypig-1.0-SNAPSHOT-jar-with-dependencies.jar';

SET mapred.cache.files /tmp/scripts#scripts;
SET mapred.create.symlink yes;

%default INPUT 'hdfs://namenode1:8020/bla/test/malware.archive.seq'
%default TIMEOUT_MS '180000'
%default USE_DEVSHM 'true'

data = load '$INPUT' using com.endgame.binarypig.loaders.ExecutingTextLoader('scripts/strings.sh',   '$TIMEOUT_MS', '$USE_DEVSHM');
DUMP data;

The bash script strings.sh just runs the Unix "strings" command to collect all the strings of each file within the malware.archive.seq container. I run the script on my namenode with:

pig -f strings.pig

For some reason the job always fails with the following error messages:

Failed!

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_1440074864855_0058  data    MAP_ONLY    Message: Job failed!        hdfs://namenode1:8020/tmp/temp-362821719/tmp-171792164,

Input(s):
Failed to read data from "hdfs://namenode1:8020/bla/test/malware.zip.seq"

Output(s):
Failed to produce result in "hdfs://namenode1:8020/tmp/temp-362821719/tmp-171792164"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1440074864855_0058

2015-08-25 17:07:21,616 [main] INFO    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher -  Failed!
2015-08-25 17:07:21,616 [main] DEBUG org.apache.pig.impl.io.InterStorage -  Pig Internal storage in use
2015-08-25 17:07:21,622 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias data

The file hdfs://namenode1:8020/bla/test/malware.zip.seq does exist, and its permissions are set to 777 to rule out permission errors.

My guess is that the problem is related to the load command in the Pig script, so here are the debug messages for that command:

2015-08-25 17:07:06,639 [main] DEBUG org.apache.pig.parser.QueryParserDriver - Original macro AST:
 (QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig  . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))

2015-08-25 17:07:06,640 [main] DEBUG org.apache.pig.parser.QueryParserDriver - macro AST after import:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))

2015-08-25 17:07:06,640 [main] DEBUG org.apache.pig.parser.QueryParserDriver - Resulting macro AST:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))

2015-08-25 17:07:06,961 [main] DEBUG org.apache.pig.parser.QueryParserDriver - Original macro AST:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))

2015-08-25 17:07:06,961 [main] DEBUG org.apache.pig.parser.QueryParserDriver - macro AST after import:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))

2015-08-25 17:07:06,961 [main] DEBUG org.apache.pig.parser.QueryParserDriver - Resulting macro AST:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))

Does anyone have an idea how to fix this, or at least how to debug it?

Edit (pig_log added):

Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias data

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias data
    at org.apache.pig.PigServer.openIterator(PigServer.java:892)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:478)
    at org.apache.pig.Main.main(Main.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
    at org.apache.pig.PigServer.openIterator(PigServer.java:884)
    ... 13 more
     ================================================================================
  • The script must have generated a log file, like `pig_1847371234.log` in the directory where you ran it. It should have more information about the error. Could you edit your post to add it, please? – Balduz Aug 26 '15 at 07:52
  • Not sure if it is sufficient, but you could try the steps mentioned in [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution); there is also a [generic solution](http://stackoverflow.com/a/34495086/983722). – Dennis Jaheruddin Dec 28 '15 at 15:05
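
The Pig stack trace above only reports that the job ended with status FAILED, so the underlying error is probably in the task logs on the cluster rather than in the client-side log. As a minimal sketch of how to look there, assuming the cluster runs on YARN with log aggregation enabled (neither is confirmed in the question), the job ID from the "Failed Jobs" table can be turned into the matching application ID and passed to `yarn logs`:

```shell
# MapReduce job ID copied from the "Failed Jobs" table in the question.
job_id="job_1440074864855_0058"

# On YARN the application ID carries the same numeric suffix as the job ID;
# only the prefix differs, so strip "job_" and prepend "application_".
app_id="application_${job_id#job_}"
echo "$app_id"

# Fetch the aggregated container logs if the yarn CLI is on the PATH.
# Assumption: log aggregation (yarn.log-aggregation-enable) is turned on.
if command -v yarn >/dev/null 2>&1; then
    yarn logs -applicationId "$app_id" || echo "could not fetch logs for $app_id" >&2
fi
```

If the map task failed while running scripts/strings.sh (for example because the script was not found or not executable in the distributed cache), the exception should appear in that output.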
