14

I'm seeing what seems to be a very obvious bug with scalacheck, such that if it's really there I can't see how people use it for recursive data structures.

This program fails with a StackOverflowError before scalacheck takes over, while constructing the Arbitrary value. Note that the Tree type and the generator for Trees is taken verbatim from this scalacheck tutorial.

package treegen

import org.scalacheck._
import Prop._

class TreeProperties extends Properties("Tree") {

  trait Tree
  case class Node(left: Tree, right: Tree) extends Tree
  case class Leaf(x: Int) extends Tree

  val ints = Gen.choose(-100, 100)

  def leafs: Gen[Leaf] = for {
    x <- ints
  } yield Leaf(x)

  def nodes: Gen[Node] = for {
    left <- trees
    right <- trees
  } yield Node(left, right)

  def trees: Gen[Tree] = Gen.oneOf(leafs, nodes)

  implicit lazy val arbTree: Arbitrary[Tree] = Arbitrary(trees)

  property("vacuous") = forAll { t: Tree => true }
}

object Main extends App {
  (new TreeProperties).check
}

What's stranger is that changes that shouldn't affect anything seem to alter the program so that it works. For example, if you change the definition of trees to this, it passes without any problem:

  def trees: Gen[Tree] = for {
    x <- Gen.oneOf(0, 1)
    t <- if (x == 0) {leafs} else {nodes}
  } yield t

Even stranger, if you alter the binary tree structure so that the value is stored on Nodes and not on Leafs, and alter the leafs and nodes definition to be:

  def leafs: Gen[Leaf] = Gen.value(Leaf())

  def nodes: Gen[Node] = for {
    x <- ints     // Note: be sure to ask for x first, or it'll StackOverflow later, inside scalacheck code!
    left <- trees
    right <- trees
  } yield Node(left, right, x)

It also then works fine.

What's going on here? Why is constructing the Arbitrary value initially causing a stack overflow? Why does it seem that scalacheck generators are so sensitive to minor changes that shouldn't affect the control flow of the generators?

Why isn't my expression above with the oneOf(0, 1) exactly equivalent to the original oneOf(leafs, nodes) ?

Daniel Martin
  • 21,725
  • 6
  • 46
  • 64
  • Speculation: could it be that your first `trees` always evaluates both arguments while the one using `if` doesn't? – pedrofurla Nov 07 '13 at 06:42
  • It's a shame, `oneOf` is indeed strict: http://scalacheck.googlecode.com/svn/artifacts/1.7/doc/api/org/scalacheck/Gen$.html – pedrofurla Nov 07 '13 at 06:48

3 Answers3

12

Even though following Rickard Nilsson's answer above got rid of the constant StackOverflowError on program startup, I'd still hit a StackOverflowError about one time out of three once I actually asked scalacheck to check the properties. (I changed Main above to run .check 40 times, and would see it succeed twice, then fail with a stack overflow, then succeed twice, etc.)

Eventually I had to put in a hard block to the depth of the recursion and this is what I guess I'll be doing when using scalacheck on recursive data structures in the future:

  def leafs: Gen[Leaf] = for {
    x <- ints
  } yield Leaf(x)

  def genNode(level: Int): Gen[Node] = for {
    left <- genTree(level)
    right <- genTree(level)
  } yield Node(left, right)

  def genTree(level: Int): Gen[Tree] = if (level >= 100) {leafs}
                                       else {leafs | genNode(level + 1)}
  lazy val trees: Gen[Tree] = genTree(0)

With this change, scalacheck never runs into a StackOverflowError.

Community
  • 1
  • 1
Daniel Martin
  • 21,725
  • 6
  • 46
  • 64
11

The problem is that when Scala evaluates trees, it ends up in an endless recursion since trees is defined in terms of itself (via nodes). However, when you put some other expression than trees as the first part of your for-expression in nodes, Scala will delay the evaluation of the rest of the for-expression (wrapped up in chains of map and flatMap calls), and the infinite recursion will not happen.

Just as pedrofurla says, if oneOf was non-strict this would probably not happen (since Scala wouldn't evaluate the arguments immediately). However you can use Gen.lzy to be explicit about the lazyness. lzy takes any generator and delays the evaluation of that generator until it is really used. So the following change solves your problem:

def trees: Gen[Tree] = Gen.lzy(Gen.oneOf(leafs, nodes))
rickynils
  • 1,203
  • 9
  • 9
  • This still fails about one third of the time inside the scalacheck code, but it does answer my question as asked and does move the `StackOverflowError` out of the creation of the `Arbitrary` object and into where it's being evaluated. Accepting this answer, but I'll post what I eventually had to do separately to help people who find this by web search. – Daniel Martin Nov 07 '13 at 14:34
2

A slight generalization of approach in Daniel Martin's own answer is using sized. Something like (untested):

def genTree() = Gen.sized { size => genTree0(size) }

def genTree0(maxDepth: Int) = 
  if (maxDepth == 0) leafs else Gen.oneOf(leafs, genNode(maxDepth))

def genNode(maxDepth: Int) = for {
  depthL <- Gen.choose(0, maxDepth - 1)
  depthR <- Gen.choose(0, maxDepth - 1)
  left <- genTree0(depthL)
  right <- genTree0(depthR)
} yield Node(left, right)

def leafs = for {
  x <- ints
} yield Leaf(x)
Alexey Romanov
  • 154,018
  • 31
  • 276
  • 433