I am having issues running Pig streaming. When I start up an interactive Pig session (on the master node of an interactive Pig AWS EMR cluster, via SSH/PuTTY) with only one machine, my Pig streaming works perfectly (it also works on my Windows Cloudera VM image). However, when I switch to a cluster with more than one machine, it simply stops working and gives various errors.
Note that:
- I am able to run Pig scripts that don't have any stream commands with no problem on a multi-machine cluster.
- all my Pig work is being done in Pig MapReduce mode rather than -x local mode.
- my Python script (stream1.py) has this shebang on its first line: #!/usr/bin/env python
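Since the contents of stream1.py are not shown, here is a minimal sketch of what a Pig streaming script with that shebang typically looks like; the `process` function and its echo-back logic are placeholders, not the actual stream1.py:

```python
#!/usr/bin/env python
import sys

def process(line):
    """Placeholder transform: split the tuple on tabs and echo the fields back."""
    fields = line.rstrip('\n').split('\t')
    return '\t'.join(fields)

if __name__ == '__main__':
    # Pig streams each input tuple to stdin as one tab-separated line;
    # every line printed to stdout becomes an output tuple.
    for line in sys.stdin:
        print(process(line))
```

Exit status 127 from a streaming task usually means the task node could not find or execute the command at all, so the script body often never even runs on the worker nodes.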
Below is a small sample of the options I have tried so far (all of the commands below are run in the grunt shell on the master node, which I am accessing via SSH/PuTTY):
This is how I get the Python file onto the master node so it can be used:
cp s3n://darin.emr-logs/stream1.py stream1.py
copyToLocal stream1.py /home/hadoop/stream1.py
chmod 755 stream1.py
These are my various stream attempts:
cooc = stream ct_pag_ph through `stream1.py`;
dump cooc;
ERROR 2090: Received Error while processing the reduce plan: 'stream1.py ' failed with exit status: 127
cooc = stream ct_pag_ph through `python stream1.py`;
dump cooc;
ERROR 2090: Received Error while processing the reduce plan: 'python stream1.py ' failed with exit status: 2
DEFINE X `stream1.py`;
cooc = stream ct_bag_ph through X;
dump cooc;
ERROR 2090: Received Error while processing the reduce plan: 'stream1.py ' failed with exit status: 127
DEFINE X `stream1.py`;
cooc = stream ct_bag_ph through `python X`;
dump cooc;
ERROR 2090: Received Error while processing the reduce plan: 'python X ' failed with exit status: 2
DEFINE X `stream1.py` SHIP('stream1.py');
cooc = STREAM ct_bag_ph THROUGH X;
dump cooc;
ERROR 2017: Internal error creating job configuration.
DEFINE X `stream1.py` SHIP('/stream1.p');
cooc = STREAM ct_bag_ph THROUGH X;
dump cooc;
DEFINE X `stream1.py` SHIP('stream1.py') CACHE('stream1.py');
cooc = STREAM ct_bag_ph THROUGH X;
ERROR 2017: Internal error creating job configuration.
define X 'python /home/hadoop/stream1.py' SHIP('/home/hadoop/stream1.py');
cooc = STREAM ct_bag_ph THROUGH X;