15

I am running an AWS EMR cluster with Spark (1.3.1) installed via the EMR console dropdown. Spark is currently running and processing data, but I am trying to find which port has been assigned to the WebUI. I've tried port forwarding both 4040 and 8080, with no connection. I'm forwarding like so:

ssh -i ~/KEY.pem -L 8080:localhost:8080 hadoop@EMR_DNS

1) How do I find out what the Spark WebUI's assigned port is? 2) How do I verify the Spark WebUI is running?

gallamine

5 Answers

11

Spark on EMR is configured for YARN, so the Spark UI is reached through the application URL provided by the YARN Resource Manager (http://spark.apache.org/docs/latest/monitoring.html). The easiest way to get to it is to set up your browser to use a SOCKS proxy on a port opened by SSH, then open Resource Manager from the EMR console and click the Application Master URL shown to the right of the running application. The Spark History Server is available at the default port 18080.

For an example of SOCKS with EMR, see http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-web-interfaces.html
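A minimal sketch of the SSH side of that setup, assuming the key file and master DNS placeholders from the question. The helper name `emr_socks_cmd` and the local port 8157 are illustrative choices, not something EMR requires. The function only prints the command so you can inspect it before running it; the browser still has to be pointed at the SOCKS port separately.

```shell
# Print (do not run) the ssh command that opens a SOCKS proxy to the EMR master.
# Arguments: key file, master public DNS, local SOCKS port (defaults to 8157,
# an arbitrary unused port chosen for this sketch).
emr_socks_cmd() {
    key=$1
    master=$2
    port=${3:-8157}
    # -N: do not run a remote command; -D: dynamic (SOCKS) forwarding on the local port
    printf 'ssh -i %s -N -D %s hadoop@%s\n' "$key" "$port" "$master"
}

# Placeholders taken from the question; replace with your own values.
emr_socks_cmd ~/KEY.pem EMR_DNS
```

Run the printed command in a terminal and leave it open while you browse the Resource Manager through the proxy.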

ChristopherB
  • Hi, I'm able to access the Hadoop ResourceManager on `http://master-public-dns-name:8088/`, but none of the links to Application Master URLs work. Have I set up my proxy wrong, or should I be using the YARN ResourceManager - how can I access the YARN ResourceManager? Lastly, the `ResourceManager` link on the EMR console isn't available to me, only `Enable Web Connection` is a clickable link. Any idea why this is? – Rory Byrne Aug 26 '15 at 11:56
  • 3
    It sounds like you are using EMR release 4.0.0. The Hadoop ResourceManager at port 8088 is the YARN ResourceManager. Check that the URL patterns on your SOCKS proxy include the hosts that the Application Master URL is showing (the domain is likely different from the one you expected). Finally, the Enable Web Connection link on the AWS EMR console is a bug that will be fixed soon. – ChristopherB Aug 27 '15 at 12:56
  • I also have the same problem, I can open 8088 in my browser, but I cannot open 9026 and 9101. Besides, I click the "Enable Web Connection" and never see the list of links – soulmachine Oct 05 '15 at 19:24
  • @soulmachine is your ssh tunnel and socks proxy enabled? http://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-ssh-tunnel.html – ChristopherB Oct 06 '15 at 13:08
  • 1
    I can open 8088(ResourceManager) 8888(Hue) and 11000(Oozie) via SSH tunnel, I'm sure my socks proxy is working – soulmachine Oct 07 '15 at 21:25
  • @soulmachine the 9026 and 9101 have changed with EMR release 4.x. See http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-release-differences.html – ChristopherB Oct 08 '15 at 00:27
  • I was able to access the yarn resource manager at 8088, and the spark job was accessible at 20888. If for some reason, one is not able to access the same, then the port is either closed or blocked. Checking the same using `netstat -an | grep port_number | grep -i listen` would be a good idea. – Pramit Aug 30 '16 at 03:51
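The `netstat` check from the last comment can be wrapped in a small helper. This is only a sketch: `is_listening` is a hypothetical name, and it assumes `netstat -an` prints addresses as `host:port` (or `host.port` on BSD-style systems) followed by a space.

```shell
# Succeeds if `netstat -an` output on stdin shows a listener on the given port.
# is_listening is a hypothetical helper name used for this sketch.
is_listening() {
    # Keep only LISTEN rows, then match ":<port> " (or ".<port> " on BSD-style output)
    grep -i listen | grep -q "[:.]$1 "
}

# On the master node (20888 is the port reported in the comment above):
#   netstat -an | is_listening 20888 && echo "port 20888 is listening"
```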
9

Here is an alternative if you don't want to deal with the SOCKS browser setup suggested in the EMR docs.

  1. Open an SSH tunnel to the master node, with port forwarding to the machine running the Spark UI

    ssh -i path/to/aws.pem  -L 4040:SPARK_UI_NODE_URL:4040 hadoop@MASTER_URL
    

    MASTER_URL (EMR_DNS in the question) is the public DNS name of the master node, which you can get from the cluster's page in the EMR Management Console.

    SPARK_UI_NODE_URL is the address of the node running the Spark UI. It can be seen near the top of the stderr log; the log line will look something like:

    16/04/28 21:24:46 INFO SparkUI: Started SparkUI at http://10.2.5.197:4040
    
  2. Point your browser to localhost:4040

Tried this on EMR 4.6 running Spark 1.6.1
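The two steps above can be sketched as a pair of shell helpers, assuming the log line has the format shown in step 1. The function names `spark_ui_addr` and `spark_ui_tunnel_cmd` are made up for this example; the second one only prints the ssh command rather than running it.

```shell
# Extract "host:port" of the Spark UI from driver log text on stdin.
# If the line appears more than once, the last occurrence wins.
spark_ui_addr() {
    sed -n 's|.*Started SparkUI at http://\([^/ ]*\).*|\1|p' | tail -n 1
}

# Print the ssh tunnel command for a given key file, master DNS and UI address.
spark_ui_tunnel_cmd() {
    key=$1
    master=$2
    addr=$3
    port=${addr##*:}   # everything after the last colon
    printf 'ssh -i %s -L %s:%s hadoop@%s\n' "$key" "$port" "$addr" "$master"
}

# Sample log line copied from the answer above; MASTER_URL is a placeholder.
line='16/04/28 21:24:46 INFO SparkUI: Started SparkUI at http://10.2.5.197:4040'
addr=$(printf '%s\n' "$line" | spark_ui_addr)
spark_ui_tunnel_cmd path/to/aws.pem MASTER_URL "$addr"
```

Reading the port from the log instead of hard-coding 4040 matters when several applications run on the same node, since Spark then binds the next free port.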

Mogsdad
ud3sh
  • 2
    Where is the stderr log? If I start pyspark on the same node as MASTER_URL, should MASTER_URL and SPARK_UI_NODE_URL be the same? – sgu Feb 23 '17 at 18:13
5

Glad to announce that this feature is finally available on AWS. You won't need to run any special commands or configure an SSH tunnel.

By clicking the link to the Spark history server UI in the EMR console, you can see the logs of old applications or access the UI of a running Spark job.

For more details: https://docs.aws.amazon.com/emr/latest/ManagementGuide/app-history-spark-UI.html

I hope it helps!

mahmoud mehdi
2

Just run the following command:

ssh -i /your-path/aws.pem -N -L 20888:ip-172-31-42-70.your-region.compute.internal:20888 hadoop@ec2-xxx.compute.amazonaws.com.cn

There are 3 places you need to change:

  1. the path to your .pem file
  2. the internal address of your master node (its private DNS name or IP)
  3. the public DNS name of your master node.

Finally, in the YARN UI, click the Tracking URL of your Spark application, then replace the internal host in the URL with localhost:

"http://your-internal-ip:20888/proxy/application_1558059200084_0002/" 

->

"http://localhost:20888/proxy/application_1558059200084_0002/"

It worked for EMR 5.x
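The URL rewrite can also be done with a one-line `sed` substitution. This is a small sketch that assumes the tracking URL uses plain `http://`; the function name `localize_tracking_url` is invented for the example.

```shell
# Replace the host part of an http:// URL with localhost, keeping port and path.
localize_tracking_url() {
    printf '%s\n' "$1" | sed 's|^http://[^/:]*|http://localhost|'
}

# Example URL copied from the answer above.
localize_tracking_url 'http://your-internal-ip:20888/proxy/application_1558059200084_0002/'
```

The port is left untouched, so this only works while the tunnel forwards the same local port (20888 here).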

Miedena
DennisLi
-1

Simply use an SSH tunnel. On your local machine, run:

ssh -i /path/to/pem -L 3000:ec2-xxxxcompute-1.amazonaws.com:8088 hadoop@ec2-xxxxcompute-1.amazonaws.com

Then, in a browser on your local machine, open:

localhost:3000

Jimmy