
I just started programming with Spark 0.7.2 and Scala 2.9.3. I am testing a machine learning algorithm on a standalone machine, and the last step of the algorithm requires calculating the MSE (mean squared error) between two matrices, i.e. ||A - M||^2, computed as an element-wise subtraction between the two matrices. Since A is potentially extremely large and sparse, we store the matrix as (key, value) pairs, where the key is the coordinate (i, j) and the value is a tuple of the corresponding elements of A and M, i.e. (A_ij, M_ij). The overall ML algorithm is gradient descent, so for each iteration we calculate the MSE and test it against a certain threshold. The whole program runs normally if the per-iteration MSE calculation is skipped. Here is what the program looks like:

val ITERATIONS = 100
for (i <- 1 to ITERATIONS) {
  ... // calculate M for each iteration
  val mse = A.map{ x => 
    val A_ij = x._2(0) 
    val M_ij = x._2(1)
    (A_ij - M_ij) * (A_ij - M_ij)
  }.reduce(_+_)
  ...
}
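
For reference, here is a minimal sketch of the sparse (key, value) layout described above; the coordinates and values are made up, and sc stands for the SparkContext created elsewhere in the program:

// A is an RDD[((Double, Double), Array[Double])]:
//   key   = the (i, j) coordinate of a stored entry
//   value = Array(A_ij, M_ij)
val entries = Seq(
  ((0.0, 0.0), Array(1.0, 0.9)),
  ((0.0, 2.0), Array(3.0, 2.8)),
  ((1.0, 1.0), Array(2.0, 2.1))
)
val A = sc.parallelize(entries)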

The program runs for only about 45 iterations before crashing with the following Spark exception:

[error] (run-main) spark.SparkException: Job failed: ShuffleMapTask(764, 0) failed: ExceptionFailure(java.lang.StackOverflowError)
spark.SparkException: Job failed: ShuffleMapTask(764, 0) failed: ExceptionFailure(java.lang.StackOverflowError)
    at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:642)
    at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:640)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:640)
    at spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:601)
    at spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:300)
    at spark.scheduler.DAGScheduler.spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:364)
    at spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:107)
java.lang.RuntimeException: Nonzero exit code: 1
    at scala.sys.package$.error(package.scala:27)

Another observation is that the runtime increases by around 5% with each iteration. Also, without the "reduce(_ + _)", there is no StackOverflowError. I have tried increasing the parallelism to the total number of available physical threads, but that doesn't help.

I would really appreciate it if anyone could point me in a direction that helps me figure out the root cause of the stack overflow error.

Edit:

  1. The type of A is spark.RDD[((Double, Double), Array[Double])]
  2. The StackOverflowError trace, which repeats from "at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)" 61 times:

    13/06/26 00:44:41 ERROR LocalScheduler: Exception in task 0
    java.lang.StackOverflowError
        at java.lang.Exception.<init>(Exception.java:77)
        at java.lang.reflect.InvocationTargetException.<init>(InvocationTargetException.java:54)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1849)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1947)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1947)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
        at scala.collection.immutable.$colon$colon.readObject(List.scala:435)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
    
  3. Main iteration code (the utility functions it calls are listed after the loop):

while (i <= ITERATION && err >= THRESHOLD) {      
  // AW: group by row, then create key by col
  // split A by row
  // (col, (A_w_M_element, W_row_vector, (row, col)))
  AW = A.map(x =>
    (x._1._1, (x._1, x._2))
  ).cogroup(W).flatMap( x => {
    val wt_i = x._2._2(0)
    val A_i_by_j = x._2._1
    A_i_by_j.map( j => (j._1._2, (j._2, wt_i, j._1)))
  })

  // calculate the X = Wt*A
  X_i_by_j = AW.map( k => 
    (k._1, k._2._2.map(_*k._2._1(0)))
  ).reduceByKey(op_two_arrays(_, _, add))

  // Y = Wt*M = Wt*WH at the same time  
  Y_i_by_j = AW.map( k => 
    (k._1, k._2._2.map(_*k._2._1(2)))
  ).reduceByKey(op_two_arrays(_, _, add))

  // X ./ Y
  X_divide_Y = X_i_by_j.join(Y_i_by_j).map(x => 
    (x._1, op_two_arrays(x._2._1, x._2._2, divide))
  )

  // H = H .* X_divide_Y
  H = H.join(X_divide_Y).map(x => 
    (x._1, op_two_arrays(x._2._1, x._2._2, multiple))
  )

  // Update M = WH
  // M = matrix_multi_local(AW, H)
  A = AW.join(H).map( x => {
    val orig_AwM = x._2._1._1
    val W_row = x._2._1._2
    val cord = x._2._1._3
    val H_col = x._2._2
    // notice that we include original A here as well
    (cord, Array(orig_AwM(0), orig_AwM(1), dot_product_local(W_row, H_col)))
  })

  // split M into two intermediate matrix (one by row, and the other by col)

  /*val M_by_i = M.map(x =>
    (x._1._1, (x._1, x._2))
  )
  val M_by_j = M.map(x =>
    (x._1._2, (x._1, x._2))
  )*/

  // AH: group by col, then create key by row
  // Divide A by row first
  // val AH = matrix_join_local(M_by_j, H)
  AH = A.map(x =>
    (x._1._2, (x._1, x._2))
  ).cogroup(H).flatMap( x => {
    val H_col = x._2._2(0)
    val AM_j_by_i = x._2._1
    AM_j_by_i.map( i => (i._1._1, (i._2, H_col, i._1)))
  })

  // calculate V = At*H
  V_j_by_i = AH.map( k => 
    (k._1, k._2._2.map(_*k._2._1(0)))
  ).reduceByKey(op_two_arrays(_, _, add))

  // calculate U = Mt*H
  U_j_by_i = AH.map( k => 
    (k._1, k._2._2.map(_*k._2._1(2)))
  ).reduceByKey(op_two_arrays(_, _, add))

  // V / U
  V_divide_U = V_j_by_i.join(U_j_by_i).map(x => 
    (x._1, op_two_arrays(x._2._1, x._2._2, divide))
  )

  // W = W .* V_divide_U
  W = W.join(V_divide_U).map(x => 
    (x._1, op_two_arrays(x._2._1, x._2._2, multiple))
  )
  // M = W*H
  A = AH.join(W).map( x => {
    val orig_AwM = x._2._1._1
    val H_col = x._2._1._2
    val cord = x._2._1._3
    val W_row = x._2._2
    // notice that we include original A here as well
    (cord, Array(orig_AwM(0), orig_AwM(1), dot_product_local(W_row, H_col)))
  })  

  // Calculate the error
  // calculate the sequre of difference
  err = A.map( x => (x._2(0) - x._2(2))*(x._2(0) - x._2(2))/A_len).reduce(_+_)
  println("At round " + i + ": MSE is " + err)
}
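
For reference, the loop above assumes that W is an RDD keyed by row index i (each value being a row vector of W) and H is an RDD keyed by column index j (each value being a column vector of H), with keys stored as Doubles to match the coordinates of A. A minimal sketch of how they might be initialized; numRows, numCols, and the rank r are placeholder names, and the random initialization is only illustrative:

// Hypothetical initialization of the factor matrices used in the loop above
// W: RDD[(Double, Array[Double])], keyed by row index i
// H: RDD[(Double, Array[Double])], keyed by column index j
val r = 10              // latent rank (placeholder)
val numRows = 1000      // number of rows of A (placeholder)
val numCols = 1000      // number of columns of A (placeholder)
var W = sc.parallelize(0 until numRows).map { i =>
  (i.toDouble, Array.fill(r)(scala.util.Random.nextDouble))
}
var H = sc.parallelize(0 until numCols).map { j =>
  (j.toDouble, Array.fill(r)(scala.util.Random.nextDouble))
}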

Some utility functions that are used:

// Applies f element-wise over two equal-length arrays, writing the result into array1 (in place) and returning it
def op_two_arrays (array1: Array[Double], array2: Array[Double], f: (Double, Double) => Double) : Array[Double] = {
  val len1 = array1.length
  val len2 = array2.length
  if (len1 != len2) {
    return null
  }
  // val new_array : Array[Double] = new Array[Double](len1)
  for (i <- 0 to len1 - 1) {
    array1(i) = f(array1(i), array2(i))
  }
  return array1
}

// element-wise operation
def add (a: Double, b: Double): Double = { a + b }

def multiple (a: Double, b: Double): Double = { a * b }

def divide (a: Double, b: Double): Double = {
  try {
    return a / b
  } catch {
    case x: ArithmeticException => {
      println("ArithmeticException: detect divide by zero")
      return Double.NaN
    }
  }
}

def array_sum (array: Array[Double]) : Double = {
  var sum: Double = 0.0
  for (i <- array) {
    sum += i
  }
  return sum
}

def dot_product (vec1: Array[Double], vec2: Array[Double]) : Double = {
  array_sum(op_two_arrays(vec1, vec2, multiple))
}
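
A quick usage example of the helpers above (the values are made up). Note that op_two_arrays writes its result into its first argument, so clone the array first if the original still needs to be read afterwards:

// Example usage of the utilities; values are for illustration only
val v1 = Array(1.0, 2.0, 3.0)
val v2 = Array(4.0, 5.0, 6.0)
val sumVec = op_two_arrays(v1.clone, v2, add)   // Array(5.0, 7.0, 9.0)
val dp = dot_product(v1.clone, v2)              // 1*4 + 2*5 + 3*6 = 32.0
// dot_product also mutates its first argument, because it calls op_two_arrays internally
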
  • Can you please add the following information: 1) Stacktrace of the stack overflow exception (you only posted the stack trace for the Spark Exception, the stack overflow exception is embedded in that) and 2) The datatypes of the variables you use (e.g. what type is "A"). – stefan.schwetschke Jun 26 '13 at 07:34
  • Have you taken a look at the spark examples? They do a logistic regression which is very similar as they have an implementation of gradient decent. If I had to guess, you're probably not using the spark context to generate an RDD which allows for distributed mapreduce, and you're actually using scala collections reduce. Take a look at their example: https://github.com/mesos/spark/blob/master/examples/src/main/scala/spark/examples/SparkLR.scala – Noah Jun 26 '13 at 12:56
  • @stefan.schwetschke I added the information at the end of the post. – teamwork523 Jun 26 '13 at 17:17
  • Hi @Noah, I have used the Spark context in my code `val sc = new SparkContext(args(0), "nmf", System.getenv("SPARK_HOME"), Seq(projJAR))`. There is one difference between my code and that example. I created a "util.scala", and I created local instance for the utility methods, like `val add_local = util.add _` according to slide 32 of [link](http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-part-1-amp-camp-2012-spark-intro.pdf). If I continuously reuse them in my loop, will that be a concern for the stack overflow issue? – teamwork523 Jun 26 '13 at 17:23
  • First, if you don't call `reduce` then the RDD won't do any computation, so you don't get the stack overflow. Second, Spark sets the default memory limit to ram available on your machine minus 1gb, so if you're running locally it's very possible that you're running out of memory and might need to increase the default to use more than available ram (though performance will suffer quite a bit). Have you tried doing fewer iterations? You might want to look at the Spark tuning guide as well http://spark-project.org/docs/latest/tuning.html – Noah Jun 26 '13 at 17:44
  • @Noah, I noticed that in the example it uses spark.util.Vector. What is the advantage of that compared with native Scala Array? – teamwork523 Jun 26 '13 at 17:49
  • Not much, it's just a wrapper over Array[Double] with some convenience functions. – Noah Jun 26 '13 at 17:57
  • Could you post more of your code? I suspect that there's a recursive function and the recursion depth is growing with each iteration, causing the ~5% per-iteration slowdown followed by the StackOverflow once the recursion becomes deep enough. Some of the other suggestions in this discussion might apply if you were seeing OutOfMemoryError, but not to the StackOverflow exception. – Josh Rosen Jun 26 '13 at 17:58
  • @Noah, I had tried to increase the memory to 32G by using `env JAVA_OPTS="-Xmx32g" sbt run` but still got the overflow problem. I will try to use the local function declaration and spark.util.Vector to see whether those could fix or not. – teamwork523 Jun 26 '13 at 18:02
  • @JoshRosen makes a good point, you're running out of stack space not heap space. – Noah Jun 26 '13 at 18:07
  • Side note, you could always try `-Xss512m` or some variant to have some ridiculous stack size... – Noah Jun 26 '13 at 18:34
  • What version of Scala is this? – Richard Sitze Jun 26 '13 at 18:51
  • Is this possibly related to [this issue](http://oldfashionedsoftware.com/2009/07/10/scala-code-review-foldleft-and-foldright/)? There, a stack overflow occurs because of using foldRight instead of foldLeft. – ptikobj Jun 26 '13 at 19:08
  • @Noah, I have tried `-Xss32g`, but still overflow at exactly the same place – teamwork523 Jun 26 '13 at 20:51
  • @JoshRosen, I have posted the code. I didn't use any recursion, or at least I didn't mean to do that. Basically, it is all spark RDD operations, i.e. `join`, `map`, `reduceByKey` and etc. – teamwork523 Jun 26 '13 at 20:53
  • @RichardSitze, I am using Scala 2.9.3 – teamwork523 Jun 26 '13 at 20:53
  • @ptikobj, I took a look at the post, but I didn't call `foldRight` function. Thanks for pointing that out though. – teamwork523 Jun 26 '13 at 21:33

1 Answer


I tried increasing the stack size, localizing the utility functions, and using spark.util.Vector, but unfortunately none of them solved the problem. I then downgraded Spark from 0.7.2 to 0.6.3 (https://github.com/mesos/spark/tree/branch-0.6), and it works: no more stack overflow, even for a 10,000 by 10,000 matrix. I don't know exactly why this fixes it, so I'm posting the difference between the two versions' reduce function in RDD.scala:

--- spark-0.6.3/core/src/main/scala/spark/RDD.scala 2013-06-27 11:31:12.628017194 -0700
+++ spark-0.7.2/core/src/main/scala/spark/RDD.scala 2013-06-27 13:42:22.844686240 -0700
@@ -316,39 +468,93 @@
   def reduce(f: (T, T) => T): T = {
     val cleanF = sc.clean(f)
+    // println("RDD.reduce: after sc.clean")
     val reducePartition: Iterator[T] => Option[T] = iter => {
       if (iter.hasNext) {
         Some(iter.reduceLeft(cleanF))
-      }else {
+      } else {
         None
       }
     }
-    val options = sc.runJob(this, reducePartition)
-    val results = new ArrayBuffer[T]
-    for (opt <- options; elem <- opt) {
-      results += elem
-    }
-    if (results.size == 0) {
-      throw new UnsupportedOperationException("empty collection")
-    } else {
-      return results.reduceLeft(cleanF)
+    // println("RDD.reduce: after reducePartition")
+    var jobResult: Option[T] = None
+    val mergeResult = (index: Int, taskResult: Option[T]) => {
+      if (taskResult != None) {
+        jobResult = jobResult match {
+          case Some(value) => Some(f(value, taskResult.get))
+          case None => taskResult
+        }
+      }
     }
+    // println("RDD.reduce: after jobResult")
+    sc.runJob(this, reducePartition, mergeResult)
+    // println("RDD.reduce: after sc.runJob")
+    // Get the final result out of our Option, or throw an exception if the RDD was empty
+    jobResult.getOrElse(throw new UnsupportedOperationException("empty collection"))
+    // println("RDD.reduce: finished")
   }

   /**
    * Aggregate the elements of each partition, and then the results for all the partitions, using a
-   * given associative function and a neutral "zero value". The function op(t1, t2) is allowed to 
+   * given associative function and a neutral "zero value". The function op(t1, t2) is allowed to
    * modify t1 and return it as its result value to avoid object allocation; however, it should not
    * modify t2.
    */
   def fold(zeroValue: T)(op: (T, T) => T): T = {
+    // Clone the zero value since we will also be serializing it as part of tasks
+    var jobResult = Utils.clone(zeroValue, sc.env.closureSerializer.newInstance())
     val cleanOp = sc.clean(op)
-    val results = sc.runJob(this, (iter: Iterator[T]) => iter.fold(zeroValue)(cleanOp))
-    return results.fold(zeroValue)(cleanOp)
+    val foldPartition = (iter: Iterator[T]) => iter.fold(zeroValue)(cleanOp)
+    val mergeResult = (index: Int, taskResult: T) => jobResult = op(jobResult, taskResult)
+    sc.runJob(this, foldPartition, mergeResult)
+    jobResult
   }
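
In essence, the 0.6.3 version (the `-` lines) gathers every partition's result on the driver and then reduces the collected buffer, while 0.7.2 (the `+` lines) folds each task's result into a single running value through the mergeResult callback passed to runJob. Below is a stripped-down sketch of the two merge strategies with all Spark internals removed, just to make the difference visible; this is illustration only, not actual Spark code:

// 0.6.3 style: collect all partition results, then reduce them in one pass on the driver
def mergeResultsOldStyle[T](partitionResults: Seq[Option[T]], f: (T, T) => T): T = {
  val collected = partitionResults.flatten
  if (collected.isEmpty) throw new UnsupportedOperationException("empty collection")
  collected.reduceLeft(f)
}

// 0.7.2 style: merge each task's result into a running value as it arrives
def mergeResultsNewStyle[T](partitionResults: Seq[Option[T]], f: (T, T) => T): T = {
  var jobResult: Option[T] = None
  for (taskResult <- partitionResults; value <- taskResult) {
    jobResult = jobResult match {
      case Some(prev) => Some(f(prev, value))
      case None       => Some(value)
    }
  }
  jobResult.getOrElse(throw new UnsupportedOperationException("empty collection"))
}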