How do I get independent service Zeppelin to see Hive?

Question

I am using HDP-2.6.0.3 but I need Zeppelin 0.8, so I have installed it as an independent service. When I run:

%sql 
show tables

I get nothing back, and Spark2 SQL commands fail with 'table not found'. The same tables are visible in the Zeppelin 0.7 that ships with HDP.

Can anyone tell me what I am missing, for Zeppelin/Spark to see Hive?

The steps I performed to build Zeppelin 0.8 are as follows:

mvn clean package -DskipTests -Pspark-2.1 -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Ppyspark -Psparkr -Pr -Pscala-2.11

Copied zeppelin-site.xml and shiro.ini from /usr/hdp/2.6.0.3-8/zeppelin/conf to /home/ed/zeppelin/conf.

Created /home/ed/zeppelin/conf/zeppelin-env.sh, in which I put the following:

export JAVA_HOME=/usr/jdk64/jdk1.8.0_112
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.6.0.3-8"

Copied /etc/hive/conf/hive-site.xml to /home/ed/zeppelin/conf

EDIT: I have also tried:

import org.apache.spark.sql.SparkSession
val spark = SparkSession
          .builder()
          .appName("interfacing spark sql to hive metastore without configuration file")
          .config("hive.metastore.uris", "thrift://s2.royble.co.uk:9083") // replace with your hivemetastore service's thrift url
          .config("url", "jdbc:hive2://s2.royble.co.uk:10000/default")
          .config("UID", "admin")
          .config("PWD", "admin")
          .config("driver", "org.apache.hive.jdbc.HiveDriver")
          .enableHiveSupport() // don't forget to enable hive support
          .getOrCreate()

This gives the same result. I also tried:

import java.sql.{DriverManager, Connection, Statement, ResultSet}
val url = "jdbc:hive2://"
val driver = "org.apache.hive.jdbc.HiveDriver"
val user = "admin"
val password = "admin"
Class.forName(driver).newInstance
val conn: Connection = DriverManager.getConnection(url, user, password)

which gives:

 java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
ERROR XSDB6: Another instance of Derby may have already booted the database /home/ed/metastore_db

I fixed that error by giving the full URL:

val url = "jdbc:hive2://s2.royble.co.uk:10000"

but still no tables :(

schoon · Answer 1 · 2017-10-20T11:06:12.847

This works:

import java.sql.{DriverManager, Connection, Statement, ResultSet}
val url = "jdbc:hive2://s2.royble.co.uk:10000"
val driver = "org.apache.hive.jdbc.HiveDriver"
val user = "admin"
val password = "admin"
Class.forName(driver).newInstance
val conn: Connection = DriverManager.getConnection(url, user, password)
val r: ResultSet = conn.createStatement.executeQuery("SELECT * FROM tweetsorc0")

but then I have the pain of converting the ResultSet to a DataFrame. I'd rather SparkSession worked and gave me a DataFrame directly, so I will add a bounty later today.

Michail N · Answer 2 · 2017-12-18T16:30:18.437

I had a similar problem on Cloudera Hadoop. In my case Spark SQL did not see my Hive metastore, so when I used my SparkSession object for Spark SQL I could not see my previously created tables. I solved it by adding the following to zeppelin-env.sh:

export SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2
export HADOOP_HOME=/opt/cloudera/parcels/CDH
export SPARK_CONF_DIR=/etc/spark/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf

(I assume these paths are different on Hortonworks.) I also changed spark.master from local[*] to yarn-client in the Interpreter UI. Most importantly, I manually copied hive-site.xml into /etc/spark/conf/, because I thought it was strange that it was not in that directory, and that solved my problem.

So my advice is to check whether hive-site.xml exists in your SPARK_CONF_DIR and, if not, add it manually. I also found a guide for Hortonworks and Zeppelin in case this does not work.
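
The hive-site.xml check can be sketched as a small shell helper. This is only a sketch under assumptions: the function name ensure_hive_site is hypothetical, and the default directories (/etc/hive/conf and /etc/spark/conf) are simply the ones mentioned in this thread, so they may well differ on your cluster.

```shell
#!/bin/sh
# Hypothetical helper: make sure hive-site.xml is present in Spark's conf dir.
# $1 = Hive conf dir  (assumed default: /etc/hive/conf)
# $2 = Spark conf dir (assumed default: $SPARK_CONF_DIR, else /etc/spark/conf)
ensure_hive_site() {
    hive_conf_dir="${1:-/etc/hive/conf}"
    spark_conf_dir="${2:-${SPARK_CONF_DIR:-/etc/spark/conf}}"

    if [ -f "$spark_conf_dir/hive-site.xml" ]; then
        # Nothing to do: Spark can already find the metastore settings.
        echo "hive-site.xml already present in $spark_conf_dir"
    elif [ -f "$hive_conf_dir/hive-site.xml" ]; then
        # Copy the Hive client configuration next to Spark's own config.
        cp "$hive_conf_dir/hive-site.xml" "$spark_conf_dir/" &&
            echo "copied hive-site.xml into $spark_conf_dir"
    else
        echo "no hive-site.xml found in $hive_conf_dir" >&2
        return 1
    fi
}
```

Calling ensure_hive_site with no arguments uses the defaults above; you will probably need to restart the Spark interpreter in Zeppelin afterwards so that it picks up the new file.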
