Questions tagged [mrjob]

Mrjob is a Python 2.5+ package that assists the creation and running of Hadoop Streaming jobs

Mrjob fully supports Amazon’s Elastic MapReduce (EMR) service, which allows one to buy time on a Hadoop cluster on an hourly basis. It also works with personal Hadoop clusters.

Mrjob can be installed with pip:

pip install mrjob
307 questions
-1
votes
2 answers

Python command line loop

I'm running an mrjob Python script, and on the command line I can pass the number of cores for the system to use: python example_script.py --num-cores 5. I'm looking to run the script for n cores as a benchmarking performance test. I.e.: I…
F.D
  • 565
  • 1
  • 7
  • 21
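
One way to approach the loop described in the question above is to drive the script from a small Python wrapper. This is only a sketch under stated assumptions: the script name example_script.py and the --num-cores flag are taken from the question, while the helper names build_command and benchmark are hypothetical and the timing method is a plain wall-clock measurement, not anything mrjob provides.

```python
import subprocess
import sys
import time


def build_command(num_cores, script="example_script.py"):
    """Return the argv list for one run with the given core count."""
    # --num-cores is the flag quoted in the question; it belongs to the
    # asker's script, not to this wrapper.
    return [sys.executable, script, "--num-cores", str(num_cores)]


def benchmark(max_cores, run=subprocess.run):
    """Run the script once per core count from 1 to max_cores.

    Returns a list of (num_cores, elapsed_seconds) tuples. The `run`
    parameter exists so the loop can be exercised without launching
    real processes.
    """
    timings = []
    for n in range(1, max_cores + 1):
        start = time.perf_counter()
        run(build_command(n), check=True)  # raises if the script fails
        timings.append((n, time.perf_counter() - start))
    return timings
```

Calling benchmark(8) from the directory that holds the script would run it eight times, once for each core count from 1 to 8, and return the elapsed time of each run.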
-1
votes
1 answer

How to use mrjob.cat to auto-decompress inputs?

I want to use MrJob to analyze a dataset without decompressing it on disk beforehand (it is 18 GB compressed but >3 TB uncompressed). How can I use mrjob.cat to auto-decompress the file and stream it to my mapper? There aren't any code samples.
crypdick
  • 4,829
  • 3
  • 31
  • 50
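
The general idea behind the question above, reading a compressed file line by line without ever writing the uncompressed data to disk, can be sketched with the standard library alone. This is not mrjob.cat and makes no claim about its API; it assumes the input is gzip-compressed (the question does not state the format), and the function name iter_gzip_lines is hypothetical.

```python
import gzip


def iter_gzip_lines(path, encoding="utf-8"):
    """Yield decoded lines from a gzip file, decompressing as it reads.

    Only a small buffer is held in memory at a time, so the full
    uncompressed content is never materialized on disk or in RAM.
    """
    with gzip.open(path, mode="rt", encoding=encoding) as handle:
        for line in handle:
            yield line.rstrip("\n")
```

Because the function is a generator, a consumer can process each line as it arrives, for example by feeding it to whatever per-line logic the mapper would otherwise apply.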
-1
votes
1 answer

How to integrate data with python code before running python program on command line

I have downloaded the MovieLens dataset from that hyperlink, ml-100k.zip (it is a movie and user information dataset, found under the older datasets tab), and I have written simple MapReduce code like below: from mrjob.job import MrJob class…
pcpcne
  • 43
  • 1
  • 9
-1
votes
2 answers

Performing a mapreduce function in Python

I'm trying to learn a little bit of MapReduce in combination with Python. I have the following code running from a tutorial I'm following: from mrjob.job import MRJob class SpendByCustomer(MRJob): def mapper(self, _, line): …
John Dwyer
  • 171
  • 1
  • 11
-1
votes
1 answer

MRJob using a different Python interpreter for local vs. hadoop

I'm using MRJob on machine A to launch MapReduce jobs on machines B_0 through B_10. The job has dependencies that require it to be run not with the default /bin/python (i.e. the output of which python on machine A) but with /path/to/weird/python, which…
Eli Rose
  • 5,633
  • 7
  • 30
  • 49
-1
votes
3 answers

How can I run mrjob with no input file?

I have an mrjob program that just gets its data from an SQL database, so I don't need to read a local file or any other input file. However, mrjob forces me into 'reading from STDIN', so I just create an empty file as the input file. It's really ugly; is there a way to run…
-4
votes
1 answer

Python - Mapreduce - PermissionError: [WinError 5] Access is denied

I am getting this error. I tried admin rights, opening as admin, and turning UAC off, but I still have the same problem. Can anyone tell me what the problem is? I am passing 2 files, movies2.csv and ratings2.csv, from the terminal: from mrjob.job import MRJob from mrjob.step import…