
I'm running a Spark program that works locally but not remotely. My program has these components (containers):

  • My application, which is based on Spring (for REST calls), initiates a driver (a SparkSession created with getOrCreate) and contains all the transformers that I built.
  • A Spark master based on the Bitnami image.
  • A Spark worker based on the Bitnami image, which also has all of my application's dependencies (i.e. all jars under the /dependencies dir).

Locally everything works great, but remotely I get the error below when running transformers with UDFs (the rest of the transformers, i.e. those without UDFs, work fine):

Caused by: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2350)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
    at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:490)
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    ...
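
To give a sense of what I mean by "transformers with UDFs", here is a minimal sketch of the kind of code that triggers this (the column and function names are only illustrative, not my actual code):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.udf

// Illustrative only: a transformer that wraps a plain Scala function in a UDF.
// Transformers that do not use UDFs run fine against the remote master.
val normalize = udf((s: String) => if (s == null) null else s.trim.toLowerCase)

def transform(df: DataFrame): DataFrame =
  df.withColumn("name_normalized", normalize(df("name")))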

This is my spark session code:

val sparkConf = new SparkConf()
  .setMaster("spark://spark-master:7077")
  .setAppName("My-App")
  .set("spark.executor.extraClassPath", "/dependencies/*")

val spark = SparkSession.builder().config(sparkConf).getOrCreate()

So, jobs with external dependencies work fine, but UDFs produce the error above. I also tried adding my application jar (which contains the driver and the Spring code, along with all the other dependencies that already exist on the worker) to the worker's /dependencies folder, but the error is still produced. I also tried placing it on the worker in the same location as on the driver and adding its location to sparkConf via "spark.jars", but without success. Any suggestions?

  • Does this helps https://stackoverflow.com/questions/39953245/how-to-fix-java-lang-classcastexception-cannot-assign-instance-of-scala-collect? – koiralo Jan 11 '21 at 12:15
  • No, I wrote it at the bottom of the question. Tried to add my app jar to the worker image using "spark.jars" but without success – ChopChop Jan 11 '21 at 12:17

1 Answer


After a lot of googling I came across the solution for integrating Spring Boot and Spark: I needed to change my pom to build an uber-jar with the Maven Shade plugin. So I replaced this:

        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
            <configuration>
                <fork>true</fork>
                <executable>true</executable>
            </configuration>
            <executions>
                <execution>
                    <goals>
                        <goal>repackage</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>

with:

        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.4</version>
            <dependencies>
                <dependency>
                    <groupId>org.springframework.boot</groupId>
                    <artifactId>spring-boot-maven-plugin</artifactId>
                    <version>${spring-boot.version}</version>
                </dependency>
            </dependencies>
            <configuration>
                <keepDependenciesWithProvidedScope>false</keepDependenciesWithProvidedScope>
                <createDependencyReducedPom>false</createDependencyReducedPom>
                <filters>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>module-info.class</exclude>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                        </excludes>
                    </filter>
                </filters>
                <transformers>
                    <transformer
                            implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>META-INF/spring.handlers</resource>
                    </transformer>
                    <transformer
                            implementation="org.springframework.boot.maven.PropertiesMergingResourceTransformer">
                        <resource>META-INF/spring.factories</resource>
                    </transformer>
                    <transformer
                            implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>META-INF/spring.schemas</resource>
                    </transformer>
                    <transformer
                            implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
                    <transformer
                            implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass>${start-class}</mainClass>
                    </transformer>
                </transformers>
            </configuration>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>

Then I added the project jar to every worker and added these configurations to the Spark session:

    "spark.executor.extraClassPath", "/path/app.jar",
    "spark.driver.extraClassPath", "/path/app.jar",
    "spark.jars", "/path/app.jar",