
I'm working with Tableau, Spark 1.2, and Cassandra 2.1.2, and I already have a number of pieces working.

My major gap at this point: how do I properly configure the Spark 1.2 ThriftServer to talk to my Cassandra instance? The ultimate goal is running Spark SQL through Tableau, which requires the ThriftServer. I can start the ThriftServer without an issue (mostly), connect with beeline as in the examples, and run a "show tables" call, but as you can see below, it returns an empty list of tables.

beeline> !connect jdbc:hive2://192.168.56.115:10000
scan complete in 2ms
Connecting to jdbc:hive2://192.168.56.115:10000
Enter username for jdbc:hive2://192.168.56.115:10000: 
Enter password for jdbc:hive2://192.168.56.115:10000: 
log4j:WARN No appenders could be found for logger (org.apache.thrift.transport.TSaslTransport).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 1.2.0)
Driver: null (version null)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://192.168.56.115:10000> show tables;
+---------+
| result  |
+---------+
+---------+
No rows selected (1.755 seconds)
0: jdbc:hive2://192.168.56.115:10000>
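
For reference, I start the ThriftServer with the bundled script, roughly along these lines (the master URL/port here are placeholders for my setup):

./sbin/start-thriftserver.sh \
  --master spark://192.168.56.115:7077 \
  --hiveconf hive.server2.thrift.bind.host=192.168.56.115 \
  --hiveconf hive.server2.thrift.port=10000
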
  • Do I need the datastax connector? I have to assume the answer to that is "yes".
  • Do I need to declare a hive-site.xml even though I'm not leveraging Hive in the least?
  • Can I run this setup without Hive/Metastore? Or is that a requirement of the ThriftServer in Spark 1.2?
  • I'm assuming my existing Spark master/worker setup is correct, but I could be wrong there.

Help! :)

chris.guethle

1 Answer


You could create a global temporary view of a Cassandra table; you'll then be able to access it via the JDBC Thrift server. For example:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.cassandra._                          // adds .cassandraFormat to DataFrameReader
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Build a session with Hive support so the Thrift server has a catalog to serve
val spark = SparkSession
  .builder()
  .enableHiveSupport()
  .getOrCreate()

// Load the Cassandra table as a DataFrame, with predicate pushdown enabled
val cassandraTable = spark.read
  .cassandraFormat("mytable", "mykeyspace", pushdownEnable = true)
  .load()

// Register it as a global temporary view (kept in the global_temp database)
cassandraTable.createGlobalTempView("mytable")

// Start the Thrift/JDBC server on port 10000 using this session's context
spark.sqlContext.setConf("hive.server2.thrift.port", "10000")
HiveThriftServer2.startWithContext(spark.sqlContext)
println("Server is running")
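
Note that global temporary views are registered under Spark's global_temp database, so from beeline (or Tableau) you'd query it with something like:

SELECT * FROM global_temp.mytable;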
Justin Cameron