I'm running a very simple Spark job on AWS EMR and can't seem to get any log output from my script.
I've tried printing to stderr:
from pyspark import SparkContext
import sys

if __name__ == '__main__':
    sc = SparkContext(appName="HelloWorld")
    print('Hello, world!', file=sys.stderr)
    sc.stop()
And using the Spark logger as shown here:
from pyspark import SparkContext

if __name__ == '__main__':
    sc = SparkContext(appName="HelloWorld")
    log4jLogger = sc._jvm.org.apache.log4j
    logger = log4jLogger.LogManager.getLogger(__name__)
    logger.error('Hello, world!')
    sc.stop()
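I also tried Python's standard logging module pointed at stderr, in case the Log4j bridge was the problem. This is just a sketch; the handler configuration is my own guess at what should show up in the container logs:

```python
import logging
import sys

# Send everything to stderr; my assumption is that in cluster mode this
# output should end up in the driver container's stderr log.
logging.basicConfig(
    stream=sys.stderr,
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(name)s: %(message)s',
)
logger = logging.getLogger('HelloWorld')
logger.error('Hello, world!')
```

This ran without errors too, but the message never showed up anywhere I looked.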
EMR gives me two log files after the job runs: controller and stderr. Neither log contains the "Hello, world!" string. It's my understanding that stdout is redirected to stderr in Spark. The stderr log shows that the job is accepted, run, and completed successfully.
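Since the step is submitted with --deploy-mode cluster, I'm guessing the driver actually runs inside a YARN container on a worker node, so I also tried pulling the aggregated container logs after SSHing to the master node. The application ID below is a placeholder; I don't know if this is the right place to look:

```shell
# List finished applications to find the ID (placeholder below),
# then fetch the aggregated container logs for it.
yarn application -list -appStates ALL
yarn logs -applicationId <application-id>
```

These commands are cluster-specific, so I can only run them on the EMR master node itself.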
So my question is, where can I view my script's log output? Or what should I change in my script to log correctly?
Edit: I used this command to submit the step:
aws emr add-steps --region us-west-2 --cluster-id x-XXXXXXXXXXXXX --steps Type=spark,Name=HelloWorld,Args=[--deploy-mode,cluster,--master,yarn,--conf,spark.yarn.submit.waitAppCompletion=true,s3a://path/to/simplejob.py],ActionOnFailure=CONTINUE