
I want to execute a shell script as an EMR step that downloads a tarball, unpacks it, and runs a script contained inside. I chose this setup to stay as vendor-agnostic as possible. My script is:

#!/bin/sh
# Fetch the tarball from S3, unpack it, and source the entry script
aws s3 cp s3://path_to_my_bucket/name_of.tar.gz .
tar -xzf name_of.tar.gz
. main_script.sh

Here main_script.sh is part of the tarball, along with a number of other packages, scripts and config files.

If I run this script as the hadoop user on the master node, everything works as intended. When I add it as a step via command-runner.jar, I get errors, no matter what I try.

What I have tried so far (and the resulting errors):

  • running the script exactly as above (file not found "main_script.sh")
  • hardcoding the path to the Hadoop user's home directory (permission denied on main_script.sh)
  • dynamically determining the directory where the wrapper script lives (using this), passing that path to the tar -C option, and invoking main_script.sh explicitly from that path (another permission denied on main_script.sh); a sketch of this variant follows the list
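For illustration, the third variant looked roughly like this (the directory-resolution line is only a sketch of the linked approach; the exact code may have differed):

#!/bin/sh
# Sketch: resolve the directory this wrapper was copied to, then work from there
DIR="$( cd "$( dirname "$0" )" && pwd )"
aws s3 cp s3://path_to_my_bucket/name_of.tar.gz "$DIR/"
tar -xzf "$DIR/name_of.tar.gz" -C "$DIR"
. "$DIR/main_script.sh"

Even with absolute paths like this, the step fails with permission denied when launched through command-runner.jar.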

What is the proper way of loading a bash script onto the master node and executing it?

As a bonus, I am wondering why command-runner.jar is set up so differently from a Spark step, which runs as the hadoop user in that user's home directory.


1 Answer


You can use script-runner.jar, referencing the copy hosted in your cluster's region:

JAR location: s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar

Arguments: s3://your_bucket/your_shell_script.sh

Refer to the link below for more info: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-script.html
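For example, the step can be added from the AWS CLI roughly as follows (the cluster ID, step name and bucket are placeholders):

# Add a step that runs a shell script stored in S3 via script-runner.jar
aws emr add-steps \
  --cluster-id j-XXXXXXXXXXXXX \
  --steps 'Type=CUSTOM_JAR,Name=RunShellScript,ActionOnFailure=CONTINUE,Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[s3://your_bucket/your_shell_script.sh]'

script-runner.jar fetches the script from S3 and executes it on the master node, so you do not have to copy it onto the node yourself.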
