
Because of the pigeonhole principle, you can't just use

output = min + (rand() % (int)(max - min + 1))

to generate an unbiased, uniform result; for example, with `rand()` producing 0..7 and a range of size 6, the outputs `min` and `min + 1` occur twice as often as the others. This answer to a similar question provides one solution, but it is highly wasteful in terms of the random bits consumed.

For example, if the random range of the source is small, the chance of having to draw a second value from the source can be quite high. Alternatively, using a larger source range is inherently wasteful too.

While I'm sure an optimal source range size could be derived, that doesn't address the possibility that a better algorithm exists, rather than just an optimization of this one.

[EDIT] My idea below has been shown in the answers to produce biased results.

An approach that occurs to me is:

  1. Consume the minimum number of bits necessary to cover the desired range.
  2. If that value is outside the desired range, throw away only one bit and consume one more.
  3. Repeat as necessary (sketched below).
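
In rough Python, the idea looks something like this. It's only a sketch: `next_bit` is a placeholder for the expensive bit source, and I'm reading "throw away one bit" as discarding the oldest bit.

def draw(next_bit, min_value, max_value):
    # next_bit() is assumed to return one random bit (0 or 1) from the source
    span = max_value - min_value + 1
    nbits = (span - 1).bit_length()    # minimum bits covering the range
    value = 0
    for _ in range(nbits):
        value = (value << 1) | next_bit()
    while value >= span:
        # out of range: throw away the oldest bit, consume one fresh one
        value = ((value << 1) | next_bit()) & ((1 << nbits) - 1)
    return min_value + value

The retry reuses nbits - 1 old bits instead of drawing nbits fresh ones, which is where the saving (and, as it turns out, the bias) comes from.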
Eoin
  • The man page for `rand()` suggests something along these lines: `min + (int) (((double) rand() / (RAND_MAX + 1.0)) * (max - min + 1))`. The idea is to scale the output from `rand()` into a fraction in the range `[0, 1)`, then multiply by your range and add your minimum to get a number in the range `[min, max)`. There are probably ways to do it without resorting to slower floating point operations, but whether you want to use them will depend on your performance expectations and how complex you want to get... – twalberg Dec 27 '13 at 17:38
  • 1
    Your approach would lead to a non-uniform distribution of outputs. – Henry Dec 27 '13 at 18:30
  • `rand()` returns a fixed number of random bits, are you suggesting you would only use some of them and save the rest for later? That seems much more cumbersome than necessary, since the bits are "free". If you are worried about exhausting the bit stream (which really means you have started to repeat the "random" sequence) then you should consider a better (real?) random source. – Dwayne Towell Dec 27 '13 at 18:38
  • @twalberg, if you had read the question and the links therein you would realize the flaw in that approach. It works when a small bias in your random numbers is acceptable, which covers many practical applications. It's not universal. – Mark Ransom Dec 27 '13 at 19:02
  • @DwayneTowell For my use case the random bits are relatively expensive to produce. Think along the lines of a source like random.org. – Eoin Dec 28 '13 at 15:25
  • possible duplicate of [How to efficiently convert a few bytes into an integer between a range?](http://stackoverflow.com/questions/13323790/how-to-efficiently-convert-a-few-bytes-into-an-integer-between-a-range) – Lior Kogan Dec 28 '13 at 16:30
  • @LiorKogan, not a duplicate. See my answer for how this question differs from the others. – Mark Ransom Dec 28 '13 at 20:26
  • https://stackoverflow.com/a/17749722/2417578 contains an example bit-saving algorithm and a link to a paper analysing different saving algorithms against different costs of RNG source. Should have everything you need. – sh1 Aug 09 '19 at 03:28
  • Does this answer your question? [How to generate a random integer in the range \[0,n\] from a stream of random bits without wasting bits?](https://stackoverflow.com/questions/6046918/how-to-generate-a-random-integer-in-the-range-0-n-from-a-stream-of-random-bits) – Peter O. Jul 17 '20 at 05:07

2 Answers


The common approach to eliminating bias is to throw away numbers that fall outside the desired range. As noted, this is wasteful. It's possible to minimize the waste by starting with a larger number of bits and generating multiple random numbers at the same time; that way you can achieve a better match between the range of inputs and the range of outputs.

For example, take the roll of a die. The output has 6 possibilities. The naive approach takes 3 random bits for each random number produced. The first example demonstrates the pigeonhole problem.

import random

def pigeon_die(total_bit_count):
    # map the 8 possible 3-bit values onto 6 faces; 8 is not divisible
    # by 6, so faces 1 and 4 come up twice as often as the others
    for i in xrange(total_bit_count // 3):
        bits = random.getrandbits(3)
        yield 1 + bits * 6 // 8

1 : 832855
2 : 417835
3 : 416012
4 : 833888
5 : 416189
6 : 416554
total 3333333
max/min 2.00448063998
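
The tallies here and below were produced with a small harness along these lines (a bit budget of 10**7 matches the totals):

from collections import Counter

def show(die):
    counts = Counter(die)
    for face in sorted(counts):
        print face, ':', counts[face]
    print 'total', sum(counts.values())
    print 'max/min', float(max(counts.values())) / min(counts.values())

show(pigeon_die(10**7))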

The second example is the wasteful approach commonly used. You can see that it generates fewer random numbers from the same number of random bits, but the bias is eliminated.

def wasteful_die(total_bit_count):
    for i in xrange(total_bit_count // 3):
        bits = random.getrandbits(3)
        if bits < 6:        # reject 6 and 7, throwing away all 3 bits
            yield 1 + bits

1 : 417043
2 : 415812
3 : 417835
4 : 416012
5 : 416645
6 : 417243
total 2500590
max/min 1.00486517946

The final example takes 13 bits at a time and generates 5 random numbers from each batch. This generates even more numbers than the naive approach!

def optimized_die(total_bit_count):
    # 2**13 = 8192 barely exceeds 6**5 = 7776, so a 13-bit value is
    # rarely rejected, and an accepted one yields five die rolls
    for i in xrange(total_bit_count // 13):
        bits = random.getrandbits(13)
        if bits < 6**5:
            for j in range(5):
                yield 1 + bits % 6
                bits //= 6

1 : 608776
2 : 608849
3 : 608387
4 : 608119
5 : 607855
6 : 608559
total 3650545
max/min 1.00163525841

The choice of 13 bits was made by taking the logarithm base 6 of powers of 2 and choosing the one that exceeded a whole number by the smallest amount, i.e. wasted the smallest fraction of its bits.

import math

def waste_list(n):
    # potential is how many base-n outputs 2**bit values can encode;
    # the fractional part left over after flooring is the waste
    for bit in range(1, 31):
        potential = math.log(2**bit, n)
        count = int(potential)
        if count > 0:
            waste = potential - count
            yield waste, bit, count

for waste, bit, count in sorted(waste_list(6)):
    print bit, count, waste
    if bit == 3:
        break

13 5 0.029086494049
26 10 0.0581729880981
8 3 0.0948224578763
21 8 0.123908951925
3 1 0.160558421704

As you can see, there are 4 choices better than the simple 3 bits. At 13 bits for 5 rolls you are spending 2.6 bits per roll, close to the theoretical minimum of log2(6) ≈ 2.585.

Mark Ransom
  • Sorry for the late reply, but this seems like an excellent approach as I do need to generate multiple values. – Eoin Jan 16 '14 at 12:07

I am afraid that your suggested approach is biased.

Suppose the random number generator made numbers from 0 to 255, but you wanted a random number in the range 0 to 254.

If the random number generator produced 255 (1111_1111 in binary), your approach would throw away a single bit and consume one more, repeating until you eventually ended up with the number 254 (1111_1110 in binary). (If I have understood your approach correctly.)

Therefore your generated numbers would each have a probability of 1/256, except for 254, which would have a probability of 1/128: the retry path contributes an extra 1/512 + 1/1024 + ... = 1/256 on top of the direct 1/256.
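
A quick simulation shows the effect. Here is a sketch (mine, scaled down to a 3-bit generator and the range 0 to 6, so only the value 7 is ever out of range, using my reading of the approach):

import random
from collections import Counter

def proposed_draw():
    value = random.getrandbits(3)    # source gives 0..7, we want 0..6
    while value == 7:
        # throw away one bit and consume one more, as proposed
        value = ((value << 1) | random.getrandbits(1)) & 7
    return value

counts = Counter(proposed_draw() for _ in xrange(800000))
for v in sorted(counts):
    print v, ':', counts[v]

Each of 0 to 5 comes out near 100000, while 6 comes out near 200000, twice as often.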

Peter de Rivaz
  • You seem to be completely correct. I still wonder if an alternative algorithm exists. – Eoin Dec 28 '13 at 00:21
  • @Eoin, I believe the gold standard is the [Dr Jacques](http://mathforum.org/library/drmath/view/65653.html) method. I spent time stretching my number theory (which is poor) to try to optimise it further, and eventually got in touch with the author to discuss my findings, but together we could not prove any tangible gains attributable to what I'd come up with. Just maximise `N` to maximise efficiency. – sh1 Aug 09 '19 at 03:36