What is the best practice for iterating over an RDD in Spark while accessing both the previous and the current element? I want something like the reduce function, but returning an RDD instead of a single value.
For instance, given:
val rdd = spark.sparkContext.textFile("date_values.txt")
  .map(_.split(","))  // assumption: comma-separated fields; adjust the delimiter to match the file
  .map {
    case Array(val1, val2, val3) =>
      Element(DateTime.parse(val1), val2.toDouble)
  }
The output should be a new RDD with the differences in val2 attributes:
Diff(date, current.val2 - previous.val2)
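To make the desired semantics concrete, here is a minimal sketch on a plain local Scala collection rather than an RDD. The Element and Diff case classes are hypothetical stand-ins for my real types, and java.time.LocalDate replaces DateTime only to keep the snippet self-contained. Note that sliding is a method of Scala collections, not of the core RDD API, so this shows the result I am after rather than a Spark solution.

```scala
import java.time.LocalDate

// Hypothetical stand-ins for the types used in the question
case class Element(date: LocalDate, val2: Double)
case class Diff(date: LocalDate, delta: Double)

// Sample data, assumed to be already ordered by date
val elements = Seq(
  Element(LocalDate.parse("2020-01-01"), 10.0),
  Element(LocalDate.parse("2020-01-02"), 12.5),
  Element(LocalDate.parse("2020-01-03"), 11.0)
)

// sliding(2) yields consecutive (previous, current) windows;
// collect skips a window that has fewer than two elements
val diffs = elements.sliding(2).collect {
  case Seq(previous, current) =>
    Diff(current.date, current.val2 - previous.val2)
}.toList
```

The result has one entry fewer than the input, because the first element has no predecessor.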
With the map function I can only access the current element, and with the reduce function I can only return a single element, not an RDD.
I could use the foreach function and keep the previous value in a temporary variable, but I don't think that would follow Scala and Spark guidelines.
What do you think is the most appropriate way to handle this?