So I'm working with Tableau, Spark 1.2, and Cassandra 2.1.2, and I have successfully done a number of things:
- Connected from the Spark shell to the Cassandra instance via https://github.com/datastax/spark-cassandra-connector.
- Made SparkSQL queries through the previously mentioned connector to the Cassandra instance.
- Used Tableau (with the newest CQL3-compatible Simba ODBC Driver for Cassandra: http://www.simba.com/connectors/apache-cassandra-odbc) to run queries and visualizations against the Cassandra instance.
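For reference, the spark-shell SparkSQL step in the second bullet looks roughly like this (a sketch, not my exact code; the host, keyspace `test`, and table `users` are placeholders for my setup):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.cassandra.CassandraSQLContext

// Point the connector at the Cassandra node (placeholder address).
// In spark-shell an `sc` already exists, so the SparkContext setup
// below is only needed in a standalone app.
val conf = new SparkConf()
  .setAppName("cassandra-sql-test")
  .set("spark.cassandra.connection.host", "192.168.56.115")
val sc = new SparkContext(conf)

// CassandraSQLContext (from the connector) exposes Cassandra tables
// to SparkSQL under keyspace.table names.
val csc = new CassandraSQLContext(sc)
val rows = csc.sql("SELECT * FROM test.users LIMIT 10")
rows.collect().foreach(println)
```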
My major gap at this point is: how do I properly configure the Spark 1.2 ThriftServer so it can talk to my Cassandra instance? The ultimate goal is running SparkSQL through Tableau (which requires the ThriftServer). I am able to start the ThriftServer without an issue (mostly), to the point where I can connect with beeline as in the examples and run a "show tables" call. But as you can see below, that call returns a zero-length list of tables.
beeline> !connect jdbc:hive2://192.168.56.115:10000
scan complete in 2ms
Connecting to jdbc:hive2://192.168.56.115:10000
Enter username for jdbc:hive2://192.168.56.115:10000:
Enter password for jdbc:hive2://192.168.56.115:10000:
log4j:WARN No appenders could be found for logger (org.apache.thrift.transport.TSaslTransport).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 1.2.0)
Driver: null (version null)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://192.168.56.115:10000> show tables;
+---------+
| result |
+---------+
+---------+
No rows selected (1.755 seconds)
0: jdbc:hive2://192.168.56.115:10000>
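In case it matters, I'm launching the ThriftServer along these lines (a sketch; the master URL and the connector assembly jar path are placeholders for my build):

```shell
# Start the Spark 1.2 ThriftServer with the Cassandra connector
# assembly on the classpath, and tell it where Cassandra lives.
# Extra args are passed through to spark-submit.
./sbin/start-thriftserver.sh \
  --master spark://192.168.56.115:7077 \
  --jars /path/to/spark-cassandra-connector-assembly-1.2.0.jar \
  --conf spark.cassandra.connection.host=192.168.56.115
```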
- Do I need the DataStax connector? I have to assume the answer is "yes".
- Do I need to declare a hive-site.xml even though I'm not leveraging Hive in the least?
- Can I run this setup without Hive/Metastore? Or is that a requirement of the ThriftServer in Spark 1.2?
- I'm assuming my existing Spark Master/Worker setup is correct, but I could be wrong there.
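On the hive-site.xml question: my understanding (which could be wrong) is that the ThriftServer still wants a metastore even if I never touch Hive, and that an embedded Derby one would suffice. Something like this unverified sketch, with placeholder paths, in Spark's conf/ directory:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Embedded Derby metastore; created on first use (placeholder path). -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/tmp/metastore_db;create=true</value>
  </property>
  <!-- Warehouse dir for managed tables (placeholder path). -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/tmp/hive-warehouse</value>
  </property>
</configuration>
```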
Help! :)