
I want to create a generator in ScalaCheck that generates numbers between, say, 1 and 100, but with a bell-like bias towards numbers closer to 1.

Gen.choose() distributes numbers uniformly at random between the min and max values:

scala> (1 to 10).flatMap(_ => Gen.choose(1,100).sample).toList.sorted
res14: List[Int] = List(7, 21, 30, 46, 52, 64, 66, 68, 86, 86)

And Gen.chooseNum() has an added bias for the upper and lower bounds:

scala> (1 to 10).flatMap(_ => Gen.chooseNum(1,100).sample).toList.sorted
res15: List[Int] = List(1, 1, 1, 61, 85, 86, 91, 92, 100, 100)

I'd like a choose() function that would give me a result that looks something like this:

scala> (1 to 10).flatMap(_ => choose(1,100).sample).toList.sorted
res15: List[Int] = List(1, 1, 1, 2, 5, 11, 18, 35, 49, 100)

I see that choose() and chooseNum() take an implicit Choose instance (a type class) as an argument. Should I use that?
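One way to get a skewed `Int` without writing a custom `Choose` instance is to draw a uniform `Double` and map it through a transform. The sketch below is an illustration, not something from the question or answers. The helper name `SkewedInt.fromUniform`, the power-law transform `u^skew`, and the default `skew = 3.0` are all assumptions. Larger `skew` values put more weight near `min`, but the result decreases steadily from `min` rather than forming a true bell.

```scala
object SkewedInt {
  // Hypothetical helper: maps a uniform draw u in [0, 1] to an Int in [min, max].
  // Raising u to a power > 1 pushes most draws towards 0, so most results land
  // close to min. Larger skew values give a stronger bias.
  def fromUniform(u: Double, min: Int, max: Int, skew: Double = 3.0): Int = {
    require(min <= max, "min must not exceed max")
    require(u >= 0.0 && u <= 1.0, "u must be in [0, 1]")
    val span = max - min + 1                                  // number of possible results
    val offset = math.floor(span * math.pow(u, skew)).toInt
    min + math.min(offset, span - 1)                          // u = 1.0 would give span; clamp it to max
  }
}
```

For example, `u = 0.5` maps to 13, so with `skew = 3.0` roughly half of all draws land between 1 and 13. In ScalaCheck, something like `Gen.choose(0.0, 1.0).map(u => SkewedInt.fromUniform(u, 1, 100))` should plug it in; an inclusive upper bound of 1.0 is clamped to `max`.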

jjst
  • If by bell-like you mean Gaussian, then it's impossible, because the Gaussian (normal) distribution is symmetric around the mean (which you want to be 1), so you'd get a lot of negative values. (You can take the absolute value, and then it's probably what you want.) There are two ways to achieve this: the first is to convert the uniform distribution (the one `Gen.choose` uses by default) to a normal distribution; the second is to use a random generator that supports the Gaussian distribution. Anyway, the answer given by @LaloInDublin should cover your requirement. – Nader Ghanbari Feb 16 '16 at 05:06
  • Not an answer, but it's worth considering why you really want or need to do this. The bias in `chooseNum`, etc. is designed to catch corner cases; it's less clear why you'd want the kind of distribution you describe. – Travis Brown Feb 16 '16 at 17:17
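The first option mentioned in the comments, converting uniform draws into normal ones, can be sketched with the Box-Muller transform. This is an illustration under stated assumptions: the names `HalfNormal`, `boxMuller` and `fromUniforms` are hypothetical, the normal is folded at 0 with an absolute value as the comment suggests, and values above `max` are rejected (returned as `None`) rather than clamped, so the tail doesn't pile up at `max`.

```scala
object HalfNormal {
  // Box-Muller transform: turns two independent uniform draws into one
  // standard normal draw. u1 must be in (0, 1] so that log(u1) is finite.
  def boxMuller(u1: Double, u2: Double): Double = {
    require(u1 > 0.0 && u1 <= 1.0, "u1 must be in (0, 1]")
    math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.Pi * u2)
  }

  // Folds the normal at 0 with abs (so the peak sits at min), scales it by stdDev,
  // rounds to an Int, and rejects anything above max instead of clamping it.
  def fromUniforms(u1: Double, u2: Double, min: Int, max: Int, stdDev: Double): Option[Int] = {
    val z = math.abs(boxMuller(u1, u2))
    val value = min + math.round(z * stdDev).toInt
    if (value <= max) Some(value) else None
  }
}
```

To use it from ScalaCheck you would draw `u1` and `u2` with `Gen.choose`, keeping `u1` away from 0 (for example `Gen.choose(Double.MinPositiveValue, 1.0)`), and discard the `None` results, e.g. with `suchThat(_.isDefined)`.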

2 Answers


You could use Gen.frequency():

 import org.scalacheck.Gen

 val frequencies = List(
   (50000, Gen.choose(0, 9)),
   (38209, Gen.choose(10, 19)),
   (27425, Gen.choose(20, 29)),
   (18406, Gen.choose(30, 39)),
   (11507, Gen.choose(40, 49)),
   ( 6681, Gen.choose(50, 59)),
   ( 3593, Gen.choose(60, 69)),
   ( 1786, Gen.choose(70, 79)),
   (  820, Gen.choose(80, 89)),
   (  347, Gen.choose(90, 100))
 )

 (1 to 10).flatMap(_ => Gen.frequency(frequencies: _*).sample).toList
 // res209: List[Int] = List(27, 21, 31, 1, 21, 18, 9, 29, 69, 29)
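To make the role of the weights concrete, here is a stdlib-only sketch of weighted bucket selection in the spirit of `Gen.frequency`; it is an illustration, not ScalaCheck's actual implementation. `WeightedBuckets` and its methods are hypothetical names, and the weights and bucket bounds are copied from the list above. With these weights the total is 158,774, so the 0 to 9 bucket is picked about 31% of the time (50,000 / 158,774) and the 90 to 100 bucket about 0.2%.

```scala
object WeightedBuckets {
  // Weights copied from the `frequencies` list, in bucket order (0-9, 10-19, ..., 90-100).
  val weights: Vector[Int] =
    Vector(50000, 38209, 27425, 18406, 11507, 6681, 3593, 1786, 820, 347)

  val total: Int = weights.sum

  // Running upper bounds: bucket i owns the tickets in [upperBounds(i-1), upperBounds(i)).
  val upperBounds: Vector[Int] = weights.scanLeft(0)(_ + _).tail

  // Given a ticket in [0, total), return the index of the bucket it falls into.
  def pickBucket(ticket: Int): Int = {
    require(ticket >= 0 && ticket < total, s"ticket must be in [0, $total)")
    upperBounds.indexWhere(ticket < _)
  }

  // Inclusive value range of bucket i; the last bucket also covers 100.
  def bucketRange(i: Int): (Int, Int) =
    if (i == weights.size - 1) (i * 10, 100) else (i * 10, i * 10 + 9)

  // Draw one value: pick a bucket by weight, then a uniform value inside it.
  def sample(rng: scala.util.Random): Int = {
    val (lo, hi) = bucketRange(pickBucket(rng.nextInt(total)))
    lo + rng.nextInt(hi - lo + 1)
  }
}
```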

I got the frequencies from https://en.wikipedia.org/wiki/Standard_normal_table#Complementary_cumulative, scaled by 100,000. The code uses only every third row of the table (z = 0.0, 0.3, 0.6, ..., 2.7), but I think you get the idea.
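If you'd rather regenerate the weights than copy them from the table, the complementary cumulative values Q(z) = 1 - Φ(z) can be computed numerically. This sketch rests on a few assumptions: it integrates the standard normal density with Simpson's rule (Scala's standard library has no `erfc`), and `NormalTail`, `q` and `weights` are hypothetical names. `weights(10)` should reproduce the list above to within ±1; a smaller `step` makes the weights fall off more gently.

```scala
object NormalTail {
  // Standard normal density.
  def phi(x: Double): Double = math.exp(-x * x / 2) / math.sqrt(2 * math.Pi)

  // Upper-tail probability Q(z) = P(Z > z) for z >= 0: 0.5 minus the integral of
  // phi over [0, z], approximated with composite Simpson's rule on n subintervals.
  def q(z: Double, n: Int = 1000): Double = {
    require(z >= 0, "this sketch only handles z >= 0")
    require(n > 0 && n % 2 == 0, "Simpson's rule needs an even, positive n")
    val h = z / n
    val interior = (1 until n).map { k =>
      val coefficient = if (k % 2 == 1) 4.0 else 2.0
      coefficient * phi(k * h)
    }.sum
    0.5 - h / 3 * (phi(0) + interior + phi(z))
  }

  // Gen.frequency-style weights: round(100000 * Q(step * i)) for i = 0 until buckets.
  def weights(buckets: Int, step: Double = 0.3): List[Int] =
    (0 until buckets).map(i => math.round(100000 * q(step * i)).toInt).toList
}
```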

Onilton Maciel
  • Thanks. I ended up using a simpler version of this, which gives me a sufficient approximation for my testing needs. – jjst Feb 20 '16 at 22:20

I can't take much credit for this, and will point you to this excellent page: http://www.javamex.com/tutorials/random_numbers/gaussian_distribution_2.shtml

A lot of this depends on what you mean by "bell-like". Your example doesn't show any negative numbers, but 1 can't be at the center of the bell without producing negative numbers, unless it's a very, very narrow bell!
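To put a number on that: if the bell is a normal distribution with mean μ = 1 and standard deviation σ, the fraction of negative draws is Φ(-μ/σ), where Φ is the standard normal CDF. With σ = 34 (the value the code that follows ends up using), nearly half of the draws are negative, which is why the code takes the absolute value:

```latex
P(X < 0) = \Phi\!\left(\frac{0 - \mu}{\sigma}\right)
         = \Phi\!\left(-\frac{1}{34}\right)
         \approx \Phi(-0.0294)
         \approx 0.5 - 0.0294 \times 0.3989
         \approx 0.488
```

The last step uses the small-argument approximation Φ(-x) ≈ 0.5 - x·φ(0), with φ(0) = 1/√(2π) ≈ 0.3989.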

Forgive the mutable loop, but I sometimes use one when I have to reject values while building a collection:

object Test_Stack extends App {

  val r = new java.util.Random()

  val maxBellAttempt = 102
  val stdv = maxBellAttempt / 3  // integer division gives 34; about 99.7% of draws fall within 3 * stdv of the mean

  val collectSize = 100000

  val l = scala.collection.mutable.Buffer[Int]()

  // See the article above: "What are the minimum and maximum values with nextGaussian()?"

  while (l.size < collectSize) {

    val temp = (r.nextGaussian() * stdv + 1).abs.round.toInt // the +1 is the mean (average) offset; it can be whatever you like
    // the abs folds the curve in half at 0; you could remove it, but then you'd need to move the mean further from 0

    if (temp <= maxBellAttempt) l += temp // reject values above the maximum, so accepted values lie in 0 to maxBellAttempt

  }

  val res = l.toList // immutable copy of the accepted values
  //println(res.mkString("\n"))
}
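If you'd rather avoid the mutable loop, the same rejection step can be written with an `Iterator`. This is a sketch of an alternative, not the code behind the chart; `BellSample.sample` is a hypothetical name, and its parameters mirror the values above (mean offset 1, `stdv` 34, maximum 102). Because `filter` runs lazily before `take`, the iterator keeps drawing until it has `n` accepted values, just like the `while` loop.

```scala
object BellSample {
  // Immutable take on the rejection loop: shift a Gaussian by `mean`, fold it at 0
  // with abs, round to an Int, and drop anything above `max`.
  // Iterator is lazy, so it keeps drawing until `n` values have been accepted.
  def sample(n: Int, max: Int, stdDev: Double, mean: Double, rng: java.util.Random): List[Int] =
    Iterator
      .continually((rng.nextGaussian() * stdDev + mean).abs.round.toInt)
      .filter(_ <= max)
      .take(n)
      .toList
}
```

`BellSample.sample(100000, 102, 34.0, 1.0, new java.util.Random())` corresponds to the loop above.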

Here's the distribution; I pasted the output into Excel and used COUNTIF to count the frequency of each value:
[Image: frequency of each generated value]
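The COUNTIF step can also be done in Scala before charting. A minimal sketch, assuming hypothetical helpers `Histogram.counts` and `Histogram.render`; `scale` (how many occurrences one `#` stands for) is an arbitrary parameter you'd tune to the sample size, e.g. 1000 for the 100,000 values above.

```scala
object Histogram {
  // Counts how often each value occurs, like one COUNTIF per distinct value.
  def counts(values: Seq[Int]): Map[Int, Int] =
    values.groupBy(identity).map { case (v, occurrences) => v -> occurrences.size }

  // Renders a crude text histogram: one row per value in ascending order,
  // with one '#' per `scale` occurrences (integer division, so small counts may show no bar).
  def render(values: Seq[Int], scale: Int): String = {
    require(scale > 0, "scale must be positive")
    counts(values).toSeq.sorted
      .map { case (v, c) =>
        val bar = "#" * (c / scale)
        f"$v%3d | $bar"
      }
      .mkString("\n")
  }
}
```

Inside `Test_Stack`, `println(Histogram.render(res, 1000))` would print the shape of the curve directly.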

LaloInDublin