I'm running a Spark program that works locally but not remotely. My program has these components (containers):
- My application, which is based on Spring (for REST calls), initiates a driver (a SparkSession via getOrCreate) and contains all the transformers I built.
- A Spark master based on the Bitnami image.
- A Spark worker based on the Bitnami image, which also has all of my application's dependencies (i.e. all jars under the /dependencies dir).
Locally everything works great, but remotely I get the error below when running transformers with UDFs (the remaining transformers, i.e. those without UDFs, work fine):
Caused by: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2350)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
    at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:490)
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    ...
This is my SparkSession code:
val sparkConf = new SparkConf()
  .setMaster("spark://spark-master:7077")
  .setAppName("My-App")
  .set("spark.executor.extraClassPath", "/dependencies/*")

val spark = SparkSession.builder().config(sparkConf).getOrCreate()
So jobs with external dependencies work fine, but UDFs produce the error above. I also tried adding my application jar (which contains the driver and the Spring code, along with all the other dependencies that already exist on the worker) to the worker's /dependencies folder, but the error still occurs. I also tried placing it on the worker at the same location as on the driver and adding its location to sparkConf via "spark.jars", but without success. Any suggestions?
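For reference, the "spark.jars" attempt looked roughly like the sketch below. The jar path is a placeholder, not the actual location in my containers:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Same config as above, plus the application jar so it is shipped to executors.
// "/app/my-app.jar" is a hypothetical path; the real jar sat at the same
// location on both the driver and the worker.
val sparkConf = new SparkConf()
  .setMaster("spark://spark-master:7077")
  .setAppName("My-App")
  .set("spark.executor.extraClassPath", "/dependencies/*")
  .set("spark.jars", "/app/my-app.jar")

val spark = SparkSession.builder().config(sparkConf).getOrCreate()
```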