I'm trying to append an entry to an existing RDD on each iteration of a loop. My code so far is:
var newY = sc.emptyRDD[MatrixEntry]
for (j <- 0 until 8000) {
  val arrTmp = Array(MatrixEntry(j, j, 1))
  val rddTmp = sc.parallelize(arrTmp)
  newY = newY.union(rddTmp)
}
After these 8000 iterations I get an error when I call take(10) on the resulting RDD, but with a smaller number of iterations everything works fine.
The error is:

Exception in thread "main" java.lang.StackOverflowError
at scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
at scala.collection.immutable.List.map(List.scala:296)
at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:84)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
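For context, judging by the trace (the overflow happens inside UnionRDD.getPartitions), the problem seems to be the 8000-deep chain of UnionRDDs rather than the data itself. A sketch of how I imagine the same RDD could be built with a flat lineage — assuming MatrixEntry is org.apache.spark.mllib.linalg.distributed.MatrixEntry and sc is a live SparkContext, as in my code above:

```scala
import org.apache.spark.mllib.linalg.distributed.MatrixEntry

// Build all diagonal entries in a local collection first,
// then hand them to Spark in a single parallelize call.
// This yields one RDD with a flat lineage instead of a chain
// of 8000 nested UnionRDDs for getPartitions to recurse through.
val entries = (0 until 8000).map(j => MatrixEntry(j, j, 1.0))
val newY = sc.parallelize(entries)
```

But my real loop computes each entry from the previous iteration, so I can't always precompute the whole collection up front — is unioning one small RDD per iteration fundamentally the wrong approach?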
How can I fix this?