
I'm a newbie to Spark and I ran into an issue while submitting an application. I set up a Spark master node with two slaves, a single ZooKeeper node, and a single Kafka node. I want to launch a modified version of the Kafka wordcount example using Spark Streaming in Python.

To submit the application, I ssh into the Spark master node and run <path to spark home>/bin/spark-submit. If I specify the master with its IP address, everything works: the application correctly consumes messages from Kafka, and I can see in the Spark UI that it is running on both slaves:

./bin/spark-submit --master spark://<spark master ip>:7077 --jars ./external/spark-streaming-kafka-assembly_2.10-1.3.1.jar ./examples/src/main/python/streaming/kafka_wordcount.py <zookeeper ip>:2181 test

But if I specify the master with its hostname instead:

./bin/spark-submit --master spark://spark-master01:7077 --jars ./external/spark-streaming-kafka-assembly_2.10-1.3.1.jar ./examples/src/main/python/streaming/kafka_wordcount.py zookeeper01:2181 test

then it hangs with these logs:

15/05/27 02:01:58 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@spark-master01:7077/user/Master...
15/05/27 02:02:18 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@spark-master01:7077/user/Master...
15/05/27 02:02:38 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@spark-master01:7077/user/Master...
15/05/27 02:02:58 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/05/27 02:02:58 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
15/05/27 02:02:58 WARN SparkDeploySchedulerBackend: Application ID is not initialized yet.

My /etc/hosts file looks like this:

<spark master ip> spark-master01
127.0.0.1 localhost

::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
<spark slave-01 ip> spark-slave01
<spark slave-02 ip> spark-slave02
<kafka01 ip> kafka01
<zookeeper ip> zookeeper01

Update

Here's the first part of the output of netstat -n -a:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address           State
tcp        0      0 0.0.0.0:22              0.0.0.0:*                 LISTEN
tcp        0      0 <spark master ip>:22    <my laptop ip>:60113      ESTABLISHED
tcp        0    260 <spark master ip>:22    <my laptop ip>:60617      ESTABLISHED
tcp6       0      0 :::22                   :::*                      LISTEN
tcp6       0      0 <spark master ip>:7077  :::*                      LISTEN
tcp6       0      0 :::8080                 :::*                      LISTEN
tcp6       0      0 <spark master ip>:6066  :::*                      LISTEN
tcp6       0      0 127.0.0.1:60105         127.0.0.1:44436           TIME_WAIT
tcp6       0      0 <spark master ip>:43874 <spark master ip>:7077    TIME_WAIT
tcp6       0      0 127.0.0.1:51220         127.0.0.1:55029           TIME_WAIT
tcp6       0      0 <spark master ip>:7077  <spark slave 01 ip>:37061 ESTABLISHED
tcp6       0      0 <spark master ip>:7077  <spark slave 02 ip>:47516 ESTABLISHED
tcp6       0      0 127.0.0.1:51220         127.0.0.1:55026           TIME_WAIT
se7entyse7en

2 Answers


You are using a hostname instead of an IP address, so that hostname has to be resolvable from every node. Add the entry to each node's `/etc/hosts` file and it should work.
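As a rough sketch, every node (master, both slaves, and the machine running the driver) would carry the same set of entries; the placeholders below stand in for your real addresses:

<spark master ip> spark-master01
<spark slave-01 ip> spark-slave01
<spark slave-02 ip> spark-slave02
<kafka01 ip> kafka01
<zookeeper ip> zookeeper01
127.0.0.1 localhost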

Kaushal
  • The `/etc/hosts` file is the same on all nodes. – se7entyse7en May 27 '15 at 19:23
  • Are you sure? Do you have the same `hosts` file, with all machines' hostnames, on all workers, on the master, and on the node where you run your application (the Spark driver)? – Kaushal May 28 '15 at 06:24
  • Yup, I'm sure. The `/etc/hosts` file is generated by an Ansible task that runs on all hosts. Anyway, I also checked by ssh-ing into each node. – se7entyse7en May 28 '15 at 11:50
  • Are you executing your application from a different machine that is not mentioned in your hosts file? – Kaushal May 28 '15 at 12:16
  • Nope, I ssh into the Spark master and run the command mentioned above. – se7entyse7en May 28 '15 at 20:54

You can first try `ping spark-master01` to see which IP `spark-master01` resolves to. Then run `netstat -n -a` to check whether your Spark master's port 7077 is actually bound to that IP on the master node.
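As a rough sketch, the checks could look like this on the node where you run spark-submit (the grep is only there to narrow the output to port 7077):

ping -c 3 spark-master01
netstat -n -a | grep 7077

If the LISTEN line for port 7077 shows a different address than the one ping reports (for example 127.0.0.1 or 127.0.1.1), the driver may not be able to reach the master by that hostname.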

Wesley Miao
  • Please see the update containing the output of `netstat -n -a`. Pinging `spark-master01` pings the right IP. – se7entyse7en May 27 '15 at 19:30
  • Could you try submitting your app on one of your slaves to see if it works? – Wesley Miao May 28 '15 at 00:53
  • What do you mean? As I said, submitting the application with the IP address works as expected and it runs across the two slaves. But I would like to use the hostname instead. – se7entyse7en May 28 '15 at 05:07