7

I was working on a simulation of the St Petersburg Paradox when I realized my coin-flipping code never recorded any streaks of greater than 15 heads in a row. I ran the simulation 100,000,000 times, which should have resulted in an average of 1526 streaks of heads 16 long.

(0.5^16) x 100,000,000 = 1526

Clearly, something is wrong.

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

int main(int argc, char const *argv[])
{
srand(time(0));

int i, lim = 100000000, streak = 0, maxstreak = 0;
for (i = 0; i < lim; ++i)
{
    if (rand()%2) {
        streak++;
        if (streak > maxstreak) maxstreak = streak;
    }
    else streak = 0;
}

printf("Ran %d times, longest streak of %d\n", lim, maxstreak);
return 0;
}

Returns the following every time:

Ran 100000000 times, longest streak of 15

Thanks for your help!

Edit: running GCC version 4.6.2 on Windows 7 x64. Bit new to programming in general.

Edit 2: thanks for everyone's help! Anyone sticking around, I wonder what about the current implementation would give a limit of 15 heads? How would the rand() function be so interestingly broken to produce this problem?

tpixel
  • 75
  • 6
  • 2
    I ran your program on different machines, and I usually got the result of 25, 26, 27 or so. – Yu Hao Nov 06 '13 at 05:06
  • 3
    Ran and received 24, 26, 24. Suspect a bias in your `rand()`. – chux - Reinstate Monica Nov 06 '13 at 05:10
  • 3
    Ran 4 times, and got 25, 26, 23, 28. What platform and compiler are you using? – WiSaGaN Nov 06 '13 at 05:11
  • 1
    There are many types of RNGs and some are more "random" than others. – technosaurus Nov 06 '13 at 05:16
  • 1
    Curiously RAND_MAX, by definition must be at least 32767. 15 one bits Hmmm. Wonder what OP would get with ` if ((rand()%2) == 0)`? – chux - Reinstate Monica Nov 06 '13 at 05:20
  • On a side note, this is definitely a good question from a new comer. It is interesting, and is presented clearly. Why not give some upvotes? – WiSaGaN Nov 06 '13 at 05:31
  • Perhaps you've enough results already but I tried your code repeatedly with a Mersenne Twister and got 25, 24, 30, 25. Then with a cryptographic RNG I got 27, 24, 26, 24. – Boann Nov 06 '13 at 05:32
  • 1
    @chux: It's possible that the generator only has about 15 bits of internal state, so over 100,000,000 trials you just see the same output sequence repeated many times. – caf Nov 06 '13 at 05:47

3 Answers3

4

Your code is fine - it's your C library's rand() implementation that is apparently sub-par. Possibly there is a correlation among the low-order bits of the output, or possibly the internal state is very small (so that your 100,000,000 trials are actually covering the entire output sequence of the generator many times over).

In the first case (correlated output bits), you could postprocess the generator's output to "whiten" it, but in the second case you will need to plug in a better implementation, for example the Mersenne Twister.

caf
  • 216,678
  • 34
  • 284
  • 434
4

Try choosing different seed values for your random number generator. Though rand() is a pretty good random number generator, it really is a pseudo-random number generator. You might want to read the man pages for rand (man -s3 rand), which clearly state that you should (for some implementations) use higher-order bits than the lower-order bits...

NOTES
   The versions of rand() and srand() in the Linux C Library use the  same
   random number generator as random(3) and srandom(3), so the lower-order
   bits should be as random as the higher-order bits.  However,  on  older
   rand()  implementations,  and  on  current implementations on different
   systems, the lower-order bits are much less  random  than  the  higher-
   order  bits.   Do  not use this function in applications intended to be
   portable when good randomness is needed.  (Use random(3) instead.)

Without knowing more about the system you are running your program on, we cannot know whether that is your problem. But try changing your code to use a different bit than the 2^0 bit.

Running your version works for me,

/coinflipsim 
Ran 100000000 times
head 50006650, streak 27
tail 49993350, streak 25

Here is code that works for me, using a different bit than bit 0,

int main(int argc, char const *argv[])
{
    srand(time(0));

    int i, lim = 100000000;
    int head=0, tail=0;
    int hstreak=0, tstreak=0;
    int hstreakmax=0, tstreakmax=0;
    for (i = 0; i < lim; ++i)
    {
        //if (rand()%2)
        if( rand() & (1<<13) ) //pick a bit, try different bits
        {
            head++;
            if( ++hstreak>hstreakmax) hstreakmax=hstreak;
            tstreak=0;
        }
        else {
            tail++;
            if( ++tstreak>tstreakmax) tstreakmax=tstreak;
            hstreak=0;
        }
    }
    printf("Ran %d times\n",lim);
    printf("head %d, streak %d\n",head,hstreakmax);
    printf("tail %d, streak %d\n",tail,tstreakmax);
    return 0;
}

Changing the rand()%2 line to this and rerun,

        if( rand() & (1<<13) ) //pick a bit, try different bits

Different results,

./coinflipsim 
Ran 100000000 times
head 50001852, streak 25
tail 49998148, streak 28
ChuckCottrill
  • 3,999
  • 2
  • 22
  • 34
  • 1
    Thank you! I'm not so clear on using bit shifts yet, but I'll mess with it and see how it works, cheers. I'm more than willing to go find out how this part works, but of course, any elaboration here would be appreciated. `rand() & (1<<13)` – tpixel Nov 06 '13 at 05:52
  • 2
    The expression (1<<13) sets bit 13 (counting from 0, ordinal 14th bit). The bitwise and '&' picks out only that bit from your int. Try a couple of different bits, and see if you get different results on your system. – ChuckCottrill Nov 06 '13 at 05:54
2

Let X(i) be the event that there is a head on the ith flip. Let E(i) = union { X(j) | i <= j < i + 16 } be the event that a streak of 16 heads starts at i.

Your analysis assumes that the events E(i) are independent. This is not correct. If E(i) does not happen, that greatly decreases the probability of the directly preceding E(i-1), E(i-2), etc from happening.

It is correct to say that E(i) and E(j) are independent iff |i - j| >= 16.

It could be your random number generator isn't great. Since rand() does ultimately produce a deterministic pattern, it's possible for a random generator to produce a pattern that never gives you 16 even (or odd) numbers in a row.

Timothy Shields
  • 65,578
  • 17
  • 107
  • 158