So I'm working with Tableau, Spark 1.2, and Cassandra 2.1.2, and I have successfully done a number of things:
- Connected from the Spark shell to the Cassandra instance via https://github.com/datastax/spark-cassandra-connector.
- Made SparkSQL queries through the previously mentioned connector to the Cassandra instance.
- Used Tableau (with the newest CQL3-compatible Simba ODBC Driver for Cassandra: http://www.simba.com/connectors/apache-cassandra-odbc) to run queries and visualizations against the Cassandra instance.
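For reference, the spark-shell SparkSQL step in the second bullet looks roughly like this (a sketch, not my exact code; the host, keyspace `test`, and table `users` are placeholders for my setup):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.cassandra.CassandraSQLContext

// Point the connector at the Cassandra node (placeholder address).
// In spark-shell an `sc` already exists, so the SparkContext setup
// below is only needed in a standalone app.
val conf = new SparkConf()
  .setAppName("cassandra-sql-test")
  .set("spark.cassandra.connection.host", "192.168.56.115")
val sc = new SparkContext(conf)

// CassandraSQLContext (from the connector) exposes Cassandra tables
// to SparkSQL under keyspace.table names.
val csc = new CassandraSQLContext(sc)
val rows = csc.sql("SELECT * FROM test.users LIMIT 10")
rows.collect().foreach(println)
```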
My major gap at this point is: how do I properly configure the Spark 1.2 ThriftServer so it can talk to my Cassandra instance? The ultimate goal is running SparkSQL through Tableau (which requires the ThriftServer). I am able to start the ThriftServer without an issue (mostly), to the point where I can connect with beeline as in the examples and run a "show tables" call. But as you can see below, that call returns a zero-length list of tables.
beeline> !connect jdbc:hive2://192.168.56.115:10000
scan complete in 2ms
Connecting to jdbc:hive2://192.168.56.115:10000
Enter username for jdbc:hive2://192.168.56.115:10000:
Enter password for jdbc:hive2://192.168.56.115:10000:
log4j:WARN No appenders could be found for logger (org.apache.thrift.transport.TSaslTransport).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 1.2.0)
Driver: null (version null)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://192.168.56.115:10000> show tables;
+---------+
| result |
+---------+
+---------+
No rows selected (1.755 seconds)
0: jdbc:hive2://192.168.56.115:10000>
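In case it matters, I'm launching the ThriftServer along these lines (a sketch; the master URL and the connector assembly jar path are placeholders for my build):

```shell
# Start the Spark 1.2 ThriftServer with the Cassandra connector
# assembly on the classpath, and tell it where Cassandra lives.
# Extra args are passed through to spark-submit.
./sbin/start-thriftserver.sh \
  --master spark://192.168.56.115:7077 \
  --jars /path/to/spark-cassandra-connector-assembly-1.2.0.jar \
  --conf spark.cassandra.connection.host=192.168.56.115
```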
- Do I need the DataStax connector? I have to assume the answer is "yes".
- Do I need to declare a hive-site.xml even though I'm not leveraging Hive in the least?
- Can I run this setup without Hive/Metastore? Or is that a requirement of the ThriftServer in Spark 1.2?
- I'm assuming my existing Spark Master/Worker setup is correct, but I could be wrong there.
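On the hive-site.xml question: my understanding (which could be wrong) is that the ThriftServer still wants a metastore even if I never touch Hive, and that an embedded Derby one would suffice. Something like this unverified sketch, with placeholder paths, in Spark's conf/ directory:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Embedded Derby metastore; created on first use (placeholder path). -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/tmp/metastore_db;create=true</value>
  </property>
  <!-- Warehouse dir for managed tables (placeholder path). -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/tmp/hive-warehouse</value>
  </property>
</configuration>
```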
Help! :)