I'm new to Spark and I ran into an issue while submitting an application. My setup is a Spark master node with two slaves, a single ZooKeeper node, and a single Kafka node. I want to launch a modified version of the Kafka wordcount example using Spark Streaming in Python.
To submit the application, I ssh into the Spark master node and run <path to spark home>/bin/spark-submit. If I specify the master by its IP, everything works: the application correctly consumes messages from Kafka, and I can see in the Spark UI that it is running on both slaves:
./bin/spark-submit --master spark://<spark master ip>:7077 --jars ./external/spark-streaming-kafka-assembly_2.10-1.3.1.jar ./examples/src/main/python/streaming/kafka_wordcount.py <zookeeper ip>:2181 test
But if I specify the master node by its hostname:
./bin/spark-submit --master spark://spark-master01:7077 --jars ./external/spark-streaming-kafka-assembly_2.10-1.3.1.jar ./examples/src/main/python/streaming/kafka_wordcount.py zookeeper01:2181 test
then it hangs with these logs:
15/05/27 02:01:58 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@spark-master01:7077/user/Master...
15/05/27 02:02:18 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@spark-master01:7077/user/Master...
15/05/27 02:02:38 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@spark-master01:7077/user/Master...
15/05/27 02:02:58 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/05/27 02:02:58 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
15/05/27 02:02:58 WARN SparkDeploySchedulerBackend: Application ID is not initialized yet.
My /etc/hosts file looks like this:
<spark master ip> spark-master01
127.0.0.1 localhost
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
<spark slave-01 ip> spark-slave01
<spark slave-02 ip> spark-slave02
<kafka01 ip> kafka01
<zookeeper ip> zookeeper01
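Since the failure only happens with the hostname, I tried to rule out name resolution itself. These are the checks I can run from the node where I invoke spark-submit (assuming getent and nc are installed):

```shell
# Resolve the hostname the same way the OS does (this consults /etc/hosts)
getent hosts spark-master01

# Check that port 7077 is reachable when addressed via the hostname
nc -zv spark-master01 7077
```

Both the lookup and the port check seem like they should succeed given the /etc/hosts entries above, yet spark-submit still hangs.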
Update
Here's the first part of the output of netstat -n -a:
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 <spark master ip>:22 <my laptop ip>:60113 ESTABLISHED
tcp 0 260 <spark master ip>:22 <my laptop ip>:60617 ESTABLISHED
tcp6 0 0 :::22 :::* LISTEN
tcp6 0 0 <spark master ip>:7077 :::* LISTEN
tcp6 0 0 :::8080 :::* LISTEN
tcp6 0 0 <spark master ip>:6066 :::* LISTEN
tcp6 0 0 127.0.0.1:60105 127.0.0.1:44436 TIME_WAIT
tcp6 0 0 <spark master ip>:43874 <spark master ip>:7077 TIME_WAIT
tcp6 0 0 127.0.0.1:51220 127.0.0.1:55029 TIME_WAIT
tcp6 0 0 <spark master ip>:7077 <spark slave 01 ip>:37061 ESTABLISHED
tcp6 0 0 <spark master ip>:7077 <spark slave 02 ip>:47516 ESTABLISHED
tcp6 0 0 127.0.0.1:51220 127.0.0.1:55026 TIME_WAIT
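Looking at this output, the master is listening on <spark master ip>:7077, i.e. bound to the IP rather than the hostname. My understanding is that the Akka-based master in Spark 1.x requires the spark:// URL to match exactly the address the master registered with, so I wonder whether I need to make the master bind and advertise itself under the hostname instead. A sketch of what I mean (assuming the standalone deployment scripts), in conf/spark-env.sh on the master node before restarting it:

```shell
# conf/spark-env.sh (Spark 1.x standalone mode)
# Make the master register and advertise itself under the hostname,
# so that --master spark://spark-master01:7077 matches exactly.
export SPARK_MASTER_IP=spark-master01
```

Is this the right way to fix it, or is something else going on?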