
I'm trying to ingest data through Spark and push it to Elasticsearch.

I've followed the basic tutorial to ingest from Oracle, load it into memory with Spark, and push it to Elasticsearch.

The error occurs when I call `JavaEsSparkSQL.saveToEs(jdbcDF, "spark/test")`, where `jdbcDF` is the Dataset loaded from Oracle. I'm simply reading from my Oracle DB and calling `saveToEs`.

I am getting the following error:

java.lang.NoClassDefFoundError: org/elasticsearch/spark/sql/api/java/JavaEsSparkSQL

I read that this could be caused by a Spark version mismatch, since the elasticsearch-hadoop library bundles its own Spark support. Here are my dependencies:

        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch-hadoop</artifactId>
            <version>7.3.2</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>2.4.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>2.4.3</version>
        </dependency>
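
One suggestion I've seen is that the `<scope>provided</scope>` on elasticsearch-hadoop means the connector is compiled against but never packaged into my application jar, so the class is missing on the workers at runtime. If that's right, the dependency would instead look like this (a sketch; I'm not sure this is the correct fix):

```xml
<!-- Possible fix: drop the provided scope so Maven bundles the
     connector with the application instead of expecting the
     cluster to supply it on the classpath. -->
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-hadoop</artifactId>
    <version>7.3.2</version>
</dependency>
```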
AzureWorld
  • It doesn't seem that the elasticsearch-hadoop package is installed on the workers... Remove the `provided` scope and use `--packages` to include it (this is one solution; there are others). – eliasah Sep 14 '19 at 06:32
  • Could you explain more about the `--packages` option? I'm completely new to Spark. From what I understand, I need to make the jars available to the whole Spark cluster. I've read various articles, and all of them use the Spark shell to make the JAR available to the Spark classpath. – AzureWorld Sep 16 '19 at 01:04
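
For anyone landing here, the suggestion in the comment above maps to something like the following submit command (a sketch; the coordinates assume the 7.3.2 artifact from the pom, and the main class and jar name are placeholders for your own):

```shell
# --packages resolves elasticsearch-hadoop from Maven Central and places it
# on both the driver and executor classpaths, so no manual jar copying is
# needed. com.example.OracleToEs and target/my-app.jar are hypothetical names.
spark-submit \
  --packages org.elasticsearch:elasticsearch-hadoop:7.3.2 \
  --class com.example.OracleToEs \
  target/my-app.jar
```

An alternative is `--jars /path/to/elasticsearch-hadoop-7.3.2.jar` if the jar is already available locally rather than fetched from a repository.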

0 Answers