I have a very simple simulation program; the sequence is:

  • Create an array of 400k elements
  • Use a PRNG to pick an index, and mark the element (repeat 400k times)
  • Count the number of marked elements.

An element may be picked more than once, but counted as only one "marked element".
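For reference, here is a minimal Python sketch of the procedure described above. It assumes Python's built-in `random` module (a Mersenne Twister) stands in for the PRNG; the function name `fraction_marked` and the seed value are illustrative, not taken from the original program.

```python
import random


def fraction_marked(n, draws, seed=None):
    """Pick `draws` random indices in [0, n) and return the fraction of elements marked."""
    rng = random.Random(seed)            # seeded PRNG (assumed stand-in for the original one)
    marked = [False] * n                 # the array of n elements, all unmarked at first
    for _ in range(draws):
        marked[rng.randrange(n)] = True  # an element picked twice is still marked only once
    return sum(marked) / n               # sum() counts each True as 1


if __name__ == "__main__":
    n = 400_000
    print(f"fraction marked: {fraction_marked(n, n, seed=1):.4f}")
```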

The PRNG is properly seeded. No matter how many times I run the simulation, I always end up getting around 63% (252k) marked elements.

What is the math behind this? Or is there a fault in my PRNG?

  • If $N$ is the total number of entries in your array, then the probability for a given entry to come up in a single trial is $1/N$. So the probability of not coming up in a single trial is $1-(1/N)$. Therefore the probability of it not coming up in $N$ independent trials is $$\left(1-\frac1N\right)^N\approx\frac1e$$ as stated in Peter's (+1) answer. Conclusion: If you got something other than this about 63%, THEN you would have reason to suspect the PRNG. Looks like it passed this test :-) – Jyrki Lahtonen Jul 21 '14 at 16:02
  • @JyrkiLahtonen Answers should be posted as answers, not comments! – David Richerby Jul 21 '14 at 20:32
  • @DavidRicherby, I don't think my explanation added much to Peter's answer. – Jyrki Lahtonen Jul 21 '14 at 20:37
  • If you are concerned about the quality of your PRNG, there are test suites you could run to check whether it behaves like a true RNG. – Thorbjørn Ravn Andersen Jul 22 '14 at 12:07
  • @JyrkiLahtonen Does it approach $1/e$ as $N\to\infty$? – Cruncher Jul 22 '14 at 17:26
  • While I agree that your results are within established parameters, I would suggest running the test several times (thousands?) while keeping a running total for each 'bin.' Then look at the distribution of the count values obtained. The point would be to ensure that your test is not always selecting the same 252k 'bins.' I would expect you to see some bins that were picked only once or a few times. If I understand probability correctly, if you run your test enough times, every outcome should, in theory, occur at least once. If not, it's possible your PRNG is faulty. – MrWonderful Jul 22 '14 at 20:42
  • Similar question to [this](http://math.stackexchange.com/questions/637664/why-does-this-not-seem-to-be-random). –  Jul 04 '16 at 10:52
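Jyrki Lahtonen's comment and Cruncher's question can both be checked numerically. This small Python sketch (the helper name `prob_never_picked` is illustrative) evaluates $(1-1/N)^N$ for growing $N$; the values increase toward $1/e\approx 0.3679$, so the unmarked fraction does approach $1/e$ as $N\to\infty$.

```python
import math


def prob_never_picked(n):
    """Probability that one fixed element is missed by all n draws: (1 - 1/n)^n."""
    return (1 - 1 / n) ** n


for n in (10, 100, 1_000, 400_000):
    p = prob_never_picked(n)
    print(f"N={n:>7}  unmarked={p:.6f}  marked={1 - p:.6f}")
print(f"limit      unmarked={1 / math.e:.6f}  marked={1 - 1 / math.e:.6f}")
```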

3 Answers


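No, your program is correct. The probability that a particular element is not marked at all is about $\frac{1}{e}$. The number of times a given element is picked follows a binomial distribution with 400k trials and success probability $\frac{1}{400k}$, and for large samples (400k is very large) this is very well approximated by a Poisson distribution with mean $1$. The Poisson probability of zero picks is $e^{-1}$, so about $1-\frac{1}{e}\approx 0.632$ of the elements are marked.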
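To see how good the Poisson approximation is here, a short Python sketch can compare the exact binomial probabilities of a given element being picked $k$ times (400k draws, success probability $1/400k$) with the Poisson probabilities for mean $1$. The helper names are illustrative; `math.comb` requires Python 3.8 or later.

```python
import math


def binomial_pmf(k, n, p):
    """Exact probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability of exactly k events when the mean is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


n = 400_000
for k in range(4):  # number of times one fixed element is picked
    print(f"k={k}  binomial={binomial_pmf(k, n, 1 / n):.7f}  poisson={poisson_pmf(k, 1.0):.7f}")
```

The two columns agree to about six decimal places, and `1 - poisson_pmf(0, 1.0)` is the marked fraction $1-e^{-1}$.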

  • Because the sample is very large, the actual outcome has little room to vary: it will always be very near $0.63$. – Peter Jul 21 '14 at 15:59
  • Thanks very much, guys. After reading more material on the Poisson distribution, I think I got the idea. – Howard Jul 21 '14 at 16:04
  • It does not follow rigorously from the fact that any particular element has a probability $\exp(-1)$ of not being marked that the fraction of marked elements is $1-\exp(-1)$, since the probabilities considered are not independent (as an extreme case, if no element other than $x$ is marked, then it is certain that $x$ is marked). I'm not saying the conclusion is wrong (the dependencies are probably quite weak, and the experiment seems to confirm it), but the argument is not quite complete. – Marc van Leeuwen Jul 21 '14 at 21:31
  • @Marc This answer actually claims more than just P(unmarked). It talks about the entire distribution of the number of marks. So, it actually answers more than the question asks, but without proof. – PA6OTA Jul 22 '14 at 14:10
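Marc van Leeuwen's point about dependence, and Peter's remark that the outcome is always very near $0.63$, can be made concrete by computing the variance of the number $U$ of unmarked elements with the dependencies included. A given element is missed by all $N$ draws with probability $(1-\frac1N)^N$, and two distinct elements are both missed with probability $(1-\frac2N)^N$, so $\mathrm{Var}(U) = N(1-\frac1N)^N + N(N-1)(1-\frac2N)^N - N^2(1-\frac1N)^{2N}$. This Python sketch (function name illustrative) evaluates that formula; it assumes $N\ge 3$ so that `math.log1p` stays in its domain.

```python
import math


def unmarked_mean_and_sd(n):
    """Exact mean and standard deviation of the number of unmarked elements after n draws (n >= 3)."""
    q1 = math.exp(n * math.log1p(-1 / n))  # P(a given element is never picked)
    q2 = math.exp(n * math.log1p(-2 / n))  # P(two given elements are both never picked)
    mean = n * q1
    second_moment = n * q1 + n * (n - 1) * q2  # E(U^2): sum of P(both unmarked) over all ordered pairs
    return mean, math.sqrt(second_moment - mean**2)


n = 400_000
mean, sd = unmarked_mean_and_sd(n)
print(f"unmarked: mean={mean:.1f}, sd={sd:.1f}, sd as a fraction of N={sd / n:.5f}")
```

For $N=400k$ the standard deviation comes out near $197$, about $0.0005N$, so repeated runs should land within a small fraction of a percent of $63.2\%$, consistent with the observations in the question.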

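Let $X_k\in \{0, 1\}$ indicate whether entry $k$ is unmarked (in which case $X_k=1$). Each of the $N$ draws misses entry $k$ with probability $1-\frac{1}{N}$, independently of the other draws, so $\mathbb{E}(X_k)=\Pr(X_k=1)=\left(1-\frac{1}{N}\right)^N$. By linearity of expectation, the expected number of unmarked items $X$ in an array of $N$ is then $$\mathbb{E}(X) = \mathbb{E}\left(\sum_{k=1}^N X_k\right) = \sum_{k=1}^N\mathbb{E}(X_k) = N \, \left(1-\frac{1}{N}\right)^N \approx N \, e^{-1}.$$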

The expected number of marked items is therefore approximately $N \, (1-e^{-1})$, or $N \cdot 0.63212\cdots$ (about $252{,}848$ for $N=400{,}000$), which matches your observations quite well. To make the approximation more precise, one can show that $$ N\,e^{-1} -\frac{1}{2e(1-1/N)}< N\left(1-\frac{1}{N}\right)^N < N\,e^{-1} -\frac{1}{2e}$$ for all $N\geq 2$.
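Both the expectation formula and the bounds above are easy to spot-check numerically. This Python sketch (helper names are illustrative; Python's `random` module is assumed as the PRNG) compares $N(1-1/N)^N$ with the average number of unmarked entries over many simulated runs for a small $N$, and tests the two inequalities for $2\le N\le 2000$. A finite check like this is evidence, not a proof.

```python
import math
import random


def expected_unmarked(n):
    """N * (1 - 1/N)^N: the sum over the N entries of P(entry is never picked)."""
    return n * (1 - 1 / n) ** n


def simulated_unmarked(n, trials, seed=0):
    """Average number of entries never hit when n random indices are drawn, over many trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        hit = {rng.randrange(n) for _ in range(n)}  # distinct indices picked in this trial
        total += n - len(hit)
    return total / trials


def bounds_hold(n):
    """Check N/e - 1/(2e(1 - 1/N)) < N(1 - 1/N)^N < N/e - 1/(2e)."""
    e = math.e
    lower = n / e - 1 / (2 * e * (1 - 1 / n))
    upper = n / e - 1 / (2 * e)
    return lower < expected_unmarked(n) < upper


print(expected_unmarked(10), simulated_unmarked(10, 20_000))
print(all(bounds_hold(n) for n in range(2, 2001)))
```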

  • This is a much more satisfactory answer to me as it provides a nice calculation of the expectation. – heropup Jul 21 '14 at 16:59
  • A minor but possibly interesting question to ask: is this result related in any way to the limiting probability that a randomly selected permutation on $n$ elements is not a derangement? – heropup Jul 21 '14 at 20:01
  • It might be helpful to add another couple of steps to your derivation. The probability of a given slot being marked by one item is 1/N, so the probability of it not being marked by that item is 1-(1/N). The probability of the slot not being marked by any of N independent items is thus (1-(1/N))^N, and since there are N slots with that probability, the expected number of unmarked slots is N * (1-(1/N))^N. – supercat Jul 21 '14 at 22:20
  • I agree with the conclusion but not the calculation: these are *not* independent probabilities! So you can't just add them this way. If $X_1=1$, it (very slightly) increases the chance that $X_2=0$ (and similarly for any other pair). – Richard Rast Jul 22 '14 at 19:48
  • @RichardRast Expectation is [linear](http://en.m.wikipedia.org/wiki/Expected_value#Linearity) even for dependent random variables. – WimC Jul 22 '14 at 20:10
  • Well, that's news to me. Kind of embarrassing actually. – Richard Rast Jul 22 '14 at 23:43
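WimC's point can be checked exactly for a tiny case. This Python sketch (function name illustrative) enumerates all $3^3=27$ equally likely draw sequences for $N=3$ using exact fractions. The indicators for the first two entries are dependent, since both are unmarked with probability $1/27$ while the product of their individual probabilities is $(8/27)^2=64/729$; yet $\mathbb{E}(X)$ still equals $\sum_k \mathbb{E}(X_k)=8/9$.

```python
from fractions import Fraction
from itertools import product


def check_linearity(n):
    """Exact E(X), sum of E(X_k), and a dependence check for n draws into n entries."""
    outcomes = list(product(range(n), repeat=n))  # all n**n equally likely draw sequences
    w = Fraction(1, len(outcomes))                 # probability of each sequence
    e_x = sum(w * (n - len(set(seq))) for seq in outcomes)                # E(number of unmarked)
    e_sum = sum(w for k in range(n) for seq in outcomes if k not in seq)  # sum_k P(X_k = 1)
    p_first = sum(w for seq in outcomes if 0 not in seq)                  # P(first entry unmarked)
    p_both = sum(w for seq in outcomes if 0 not in seq and 1 not in seq)  # P(first two unmarked)
    return e_x, e_sum, p_both, p_first * p_first


e_x, e_sum, p_both, p_product = check_linearity(3)
print(e_x, e_sum, p_both, p_product)  # prints: 8/9 8/9 1/27 64/729
```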

This problem was recently solved in a slightly more general form using the concept of throwing $m$ balls into $n$ boxes:

We throwing $m$ balls to $n$ cells....

Consider $n$ boxes (the fixed list in our case). We now select $m$ items from the list at random (with replacement), which amounts to throwing $m$ balls into $n$ boxes. That problem found the expected fraction of non-empty (marked) boxes to be $1-(1-1/n)^m$, using the same approach as @WimC. With $m=n$, as in the question, this gives $1-(1-1/n)^n\approx 1-e^{-1}$. If we instead made $m=800k$ choices from the list of $n=400k$ items, the expected fraction of the $n$ items that are marked would be $1-\left(1-\frac {m/n}{m}\right)^m\approx 1-e^{-m/n}=1-e^{-2}\approx 0.865.$
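The balls-into-boxes formula can be checked the same way. This Python sketch (helper names illustrative, Python's `random` module assumed as the PRNG) compares $1-(1-1/n)^m$ with a simulated fraction for $n=400k$ and $m=800k$; both should be close to $1-e^{-2}\approx 0.8647$.

```python
import math
import random


def expected_marked_fraction(n, m):
    """Expected fraction of the n boxes hit at least once by m uniformly thrown balls."""
    return 1 - (1 - 1 / n) ** m


def simulated_marked_fraction(n, m, seed=0):
    """Throw m balls into n boxes with a seeded PRNG and return the fraction of non-empty boxes."""
    rng = random.Random(seed)
    return len({rng.randrange(n) for _ in range(m)}) / n


n, m = 400_000, 800_000
print(f"formula:   {expected_marked_fraction(n, m):.4f}")
print(f"simulated: {simulated_marked_fraction(n, m):.4f}")
print(f"1 - e^-2:  {1 - math.exp(-2):.4f}")
```

Setting `m = n` in the same functions recovers the question's case, with both values near $0.632$.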
