Why does pattern matching in Spark not work the same as in Scala? See the example below: f()
pattern matches on the case classes, which works in the Scala REPL but fails in Spark, where every element falls through to "???". f2()
is a workaround that gets the desired result in Spark using isInstanceOf,
but I understand that to be bad form in Scala.
Any help on the correct way to pattern match in this scenario in Spark would be greatly appreciated.
abstract class a extends Serializable { val a: Int }
case class b(a: Int) extends a
case class bNull(a: Int = 0) extends a
val x: List[a] = List(b(0), b(1), bNull())
val xRdd = sc.parallelize(x)
Attempt at pattern matching, which works in the Scala REPL but fails in Spark:
def f(x: a) = x match {
  case b(n)     => "b"
  case bNull(n) => "bnull"
  case _        => "???"
}
Workaround that works in Spark, but is (I think) bad form:
def f2(x: a) = {
  if (x.isInstanceOf[b]) {
    "b"
  } else if (x.isInstanceOf[bNull]) {
    "bnull"
  } else {
    "???"
  }
}
View the results:
xRdd.map(f).collect  // does not work in Spark
// result: Array("???", "???", "???")
xRdd.map(f2).collect // works in Spark
// result: Array("b", "b", "bnull")
x.map(f(_)) // works in Scala REPL
// result: List("b", "b", "bnull")
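Would a typed-pattern variant like the following avoid the issue? Typed patterns match on the runtime class directly rather than going through the case-class extractor (unapply), so they might sidestep whatever breaks in spark-shell. This is only a sketch (class definitions repeated from above for completeness); I have not confirmed its behavior in Spark:

```scala
abstract class a extends Serializable { val a: Int }
case class b(a: Int) extends a
case class bNull(a: Int = 0) extends a

// Match on the runtime type instead of destructuring with the extractor.
def f3(x: a): String = x match {
  case _: b     => "b"
  case _: bNull => "bnull"
  case _        => "???"
}
```

In the plain Scala REPL this gives the same results as f(); whether it behaves differently from f() inside an RDD transformation is exactly what I am unsure about.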
Versions used: the Spark results were run in spark-shell (Spark 1.6 on AWS EMR-4.3); the Scala REPL results in SBT 0.13.9 (Scala 2.10.5).