I want to execute a shell script as a step on EMR that loads a tarball, unzips it and runs the script inside. I chose this setup to stay as vendor-agnostic as possible. My script is
#!/bin/sh
# Fetch the deployment tarball from S3 into the current working directory.
aws s3 cp s3://path_to_my_bucket/name_of.tar.gz .
# Unpack it in place.
tar -xzf name_of.tar.gz
# Source the entry-point script that ships inside the tarball.
. main_script.sh
Where main_script.sh is part of the tarball, along with a number of other packages, scripts, and config files.
If I run this script as the Hadoop user on the master node, everything works as intended. When I add it as a step via command-runner.jar, I get errors no matter what I try.
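For context, this is roughly how I understand such a step is added with the AWS CLI (a minimal sketch, assuming I recall the shorthand syntax correctly; the cluster ID and step name are placeholders, the same three commands are simply passed inline through bash -c, and the console's "Add step" form should be equivalent):

aws emr add-steps \
  --cluster-id j-XXXXXXXXXXXXX \
  --steps 'Type=CUSTOM_JAR,Name=LoadAndRunTarball,ActionOnFailure=CONTINUE,Jar=command-runner.jar,Args=[bash,-c,"aws s3 cp s3://path_to_my_bucket/name_of.tar.gz . && tar -xzf name_of.tar.gz && . main_script.sh"]'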
What I have tried so far (and the errors I got):
- running the script as above (file not found: main_script.sh)
- hardcoding the path to the Hadoop user's home directory (permission denied on main_script.sh)
- dynamically resolving the directory the wrapper script lives in (using this), passing that path to the tar -C option, and invoking main_script.sh explicitly from that path (another permission denied on main_script.sh); see the sketch after this list
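The last attempt looked roughly like this (a sketch only; the path-resolution one-liner stands in for whatever the linked answer suggests, and my actual invocation may have differed slightly):

#!/bin/sh
# Resolve the directory this wrapper script lives in.
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
# Download and unpack the tarball into that directory instead of the step's working directory.
aws s3 cp s3://path_to_my_bucket/name_of.tar.gz "$SCRIPT_DIR"
tar -xzf "$SCRIPT_DIR/name_of.tar.gz" -C "$SCRIPT_DIR"
# Invoke the entry point explicitly from that path -- this is where the permission denied shows up.
"$SCRIPT_DIR/main_script.sh"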
What is the proper way of loading a bash script into the master node and executing it?
As a bonus, I am wondering why command-runner.jar is set up so differently from a Spark step, which runs as the Hadoop user in the Hadoop user's home directory.