I'm trying to implement a function that returns the intersection of two RDDs by comparing a given property.
def intersect[T](left: RDD[Article], right: RDD[Article])(by: Article => (T, Article)) = {
  val a: RDD[(T, Article)] = left.map(by)
  val b: RDD[(T, Article)] = right.map(by)
  a.join(b).map { case (attr, (leftItem, rightItem)) => leftItem }
}
However, during compilation, sbt reports the following error:
Error:(128, 7) value join is not a member of org.apache.spark.rdd.RDD[(T, org.example.Article)]
a.join(b).map { case (attr, (leftItem, rightItem)) => leftItem }
^
If I hardcode a concrete type in place of T, everything compiles fine. Any idea why I get this error?
UPDATE
It seems that Scala is not able to apply the implicit conversion from RDD[(T, Article)] to PairRDDFunctions[T, Article], which is what provides join, but I have no idea why.
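To illustrate the mechanism I suspect, here is a minimal Spark-free sketch. Box and PairOps are hypothetical stand-ins for RDD and PairRDDFunctions, not Spark classes. The only thing carried over from Spark is that the implicit conversion asks for ClassTags for the key and value types, as rddToPairRDDFunctions does. Under that assumption, an unconstrained type parameter T makes the conversion inapplicable, and the compiler then reports the method as missing.

```scala
import scala.language.implicitConversions
import scala.reflect.ClassTag

// Hypothetical stand-in for RDD: just wraps a sequence.
final case class Box[A](items: Seq[A])

// Hypothetical stand-in for PairRDDFunctions: adds join to boxes of pairs.
final class PairOps[K, V](self: Box[(K, V)]) {
  // Inner join on the key: one output element per matching pair.
  def join[W](other: Box[(K, W)]): Box[(K, (V, W))] =
    Box(for {
      (k, v)  <- self.items
      (k2, w) <- other.items
      if k == k2
    } yield (k, (v, w)))
}

object Box {
  // This conversion only applies when ClassTags for K and V
  // can be found implicitly at the call site.
  implicit def toPairOps[K: ClassTag, V: ClassTag](b: Box[(K, V)]): PairOps[K, V] =
    new PairOps(b)
}

object Demo {
  // Does not compile: there is no ClassTag[T] in scope, so the conversion
  // is rejected and `join` is reported as not being a member of Box.
  // def broken[T](a: Box[(T, String)], b: Box[(T, String)]) = a.join(b)

  // Compiles: the context bound supplies the ClassTag[T] the conversion needs.
  def fixed[T: ClassTag](a: Box[(T, String)], b: Box[(T, String)]): Box[(T, (String, String))] =
    a.join(b)
}
```

With a concrete key type such as Int, the compiler can materialize the ClassTag on its own, which would match the observation that hardcoding the type works.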
UPDATE
If I modify the code like this:
def intersect[T](left: RDD[Article], right: RDD[Article])(by: Article => (T, Article)) = {
  val a: PairRDDFunctions[T, Article] = left.map(by)
  val b: RDD[(T, Article)] = right.map(by)
  a.join(b).map { case (attr, (leftItem, rightItem)) => leftItem }
}
I get another error:
[error] No ClassTag available for T
[error] val a: PairRDDFunctions[T, Article] = left.map(by)
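This second error suggests that the conversion to PairRDDFunctions needs an implicit ClassTag for T. A minimal sketch of what might work, assuming that is the only missing piece, is to add a ClassTag context bound on T. The Article case class and its fields below are hypothetical placeholders for org.example.Article, and the wrapping Intersection object is only there to make the snippet self-contained.

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD
// On Spark releases before 1.3 the pair-RDD implicits also need:
// import org.apache.spark.SparkContext._

// Hypothetical stand-in for org.example.Article; the real fields are not shown.
case class Article(id: Long, title: String)

object Intersection {
  // `T: ClassTag` puts an implicit ClassTag[T] in scope, which the implicit
  // conversion from RDD[(T, Article)] to PairRDDFunctions[T, Article] requires.
  def intersect[T: ClassTag](left: RDD[Article], right: RDD[Article])(
      by: Article => (T, Article)): RDD[Article] = {
    val a: RDD[(T, Article)] = left.map(by)
    val b: RDD[(T, Article)] = right.map(by)
    // Keep the left-hand item of every pair that shares a key.
    a.join(b).map { case (_, (leftItem, _)) => leftItem }
  }
}
```

A call would then look like Intersection.intersect(left, right)(article => (article.title, article)), where the key type T is inferred as String and its ClassTag is supplied by the compiler.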