The Scala Spark object runs fine from IntelliJ, but after building an artifact and running it as a jar, I get the error below.

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/types/DataType

How can I fix this? I'd appreciate any input.

IntelliJ IDEA:

The jar file was generated via File > Project Structure > Project Settings > Artifacts > + > JAR > From modules with dependencies, with the "Include in project build" check box selected, then Apply > OK. After that: Build > Build Artifacts > poc:jar > Build.

name := "poc"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.11" % "2.4.3",
  "org.apache.spark" % "spark-sql_2.11" % "2.4.3",
  "com.datastax.spark" % "spark-cassandra-connector_2.11" % "2.4.1",
  "org.apache.hadoop" % "hadoop-aws" % "2.7.1"
)

import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

object dataload {
  def main(args: Array[String]): Unit = {
    // Positional command-line arguments
    val awsAccessKeyId: String     = args(0)
    val awsSecretAccessKey: String = args(1)
    val csvFilePath: String        = args(2)
    val host: String               = args(3)
    val username: String           = args(4)
    val password: String           = args(5)
    val keyspace: String           = args(6)

    println("length args: " + args.length)

    val Conf = new SparkConf().setAppName("Imp_DataMigration").setMaster("local[2]")
      .set("fs.s3n.awsAccessKeyId", awsAccessKeyId)
      .set("fs.s3n.awsSecretAccessKey", awsSecretAccessKey)
      .set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
      .set("spark.cassandra.connection.host", host)
      .set("spark.cassandra.auth.username", username)
      .set("spark.cassandra.auth.password", password)

    val sc = new SparkContext(Conf)
    val spark = SparkSession.builder.config(sc.getConf).getOrCreate()

    val schemaHdr = StructType(
      StructField("a2z_name", StringType) ::
        StructField("a2z_key", StringType) ::
        StructField("a2z_id", IntegerType) :: Nil)

    val df = spark.read.format(source = "csv")
      .option("header", "true")
      .option("delimiter", "\t")
      .option("quote", "\"")
      .load(path = "s3n://at-spring/a2z.csv")

    // The write to Cassandra is cut off in the original post; only the format call survives.
    df.write
      .format(source = "org.apache.spark.sql.cassandra")
      // ...
  }
}



Prakash Raj
Spark applications are typically submitted via the spark-submit script. It is possible to submit jobs using java -jar ..., but you will have a much more difficult time dealing with classpath issues, as you seem to be experiencing right now.

Relatedly, you will want to mark your Spark/Hadoop dependencies as "provided", e.g. "org.apache.spark" % "spark-core_2.11" % "2.4.3" % "provided", as spark-submit will locate and add the necessary .jar files to the classpath from your local install.
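As a rough sketch of that suggestion, the build.sbt from the question could be adjusted as follows. Which libraries to leave unmarked is an assumption here: the Cassandra connector and hadoop-aws are normally not part of a stock Spark install, so they would still have to reach the classpath some other way (bundled into the jar, or passed with --packages or --jars).

```scala
// build.sbt (sketch based on the question's build; scoping choices are assumptions)
name := "poc"
version := "0.1"
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // Supplied by the local Spark install at runtime when launched through spark-submit
  "org.apache.spark" % "spark-core_2.11" % "2.4.3" % "provided",
  "org.apache.spark" % "spark-sql_2.11"  % "2.4.3" % "provided",
  // Assumed NOT to ship with Spark, so left in the default (compile) scope
  "com.datastax.spark" % "spark-cassandra-connector_2.11" % "2.4.1",
  "org.apache.hadoop"  % "hadoop-aws" % "2.7.1"
)
```

With a build like this, the job would be launched with something along the lines of spark-submit --class dataload target/scala-2.11/poc_2.11-0.1.jar followed by the seven program arguments; the jar path assumes the default output location of sbt package.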

Charlie Flowers
  • I marked the dependencies as "provided" and ran with spark-submit, but still get the same error :( – Prakash Raj Jul 22 '19 at 01:08
  • spark-submit giving this error **Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found** – Prakash Raj Jul 22 '19 at 01:20

I fixed this issue by building a fat jar using sbt assembly.

This post helped me:

How to build an Uber JAR (Fat JAR) using SBT within IntelliJ IDEA?
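For reference, a minimal sbt-assembly setup might look like the sketch below. The plugin version, the main class setting, and the merge strategy are assumptions for illustration, not the exact configuration used here; the merge rules in particular usually need tuning for each project.

```scala
// project/plugins.sbt (plugin version is an assumption; use one matching your sbt release)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt (additions next to the existing name/version/libraryDependencies settings)
// Entry point written into the jar manifest; "dataload" is the object from the question
mainClass in assembly := Some("dataload")

// Resolve duplicate files that appear when many dependency jars are merged into one
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", "services", _*) => MergeStrategy.concat  // keep service registrations
  case PathList("META-INF", _*)             => MergeStrategy.discard // drop manifests and signatures
  case _                                    => MergeStrategy.first   // otherwise keep the first copy
}
```

Running sbt assembly should then produce a single jar, by default target/scala-2.11/poc-assembly-0.1.jar, that bundles every dependency not marked "provided".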

Prakash Raj
