36

I am writing a computer program that involves generating 4 random numbers, a, b, c, and d, the sum of which should equal 100.

Here is the method I first came up with to achieve that goal, in pseudocode:

Generate a random number out of 100. (Let's say it generates 16).
Assign this value as the first number, so a = 16.
Take away a from 100, which gives 84.

Generate a random number out of 84. (Let's say it generates 21).
Assign this value as the second number, so b = 21.
Take away b from 84, which gives 63.

Generate a random number out of 63. (Let's say it generates 40).
Assign this value as the third number, so c = 40.
Take away c from 63, which gives 23.

Assign the remainder as the fourth number, so d = 23.
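
For reference, the steps above can be sketched in Python roughly as follows. This is only an illustration: the function name `sequential_split` is made up here, and it assumes "a random number out of N" means a uniform integer from 0 to N inclusive.

```python
import random

def sequential_split(total=100, parts=4, rng=random):
    """Split `total` by drawing each part uniformly from what is left."""
    remaining = total
    values = []
    for _ in range(parts - 1):
        x = rng.randint(0, remaining)  # randint is inclusive on both ends
        values.append(x)
        remaining -= x
    values.append(remaining)  # the last part is whatever is left over
    return values

a, b, c, d = sequential_split()
print(a, b, c, d, a + b + c + d)
```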

However, for some reason I have a funny feeling about this method. Am I truly generating four random numbers that sum to 100 here? Would this be equivalent to me generating four random numbers out of 100 over and over again, and only accepting when the sum is 100? Or am I creating some sort of bias by picking a random number out of 100, and then a random number out of the remainder, and so on? Thanks.

ajq88
  • I guess we are speaking of non-negative (zero included?) integer numbers, no? And "a random number out of 100" means a number in [0...100] with uniform probability? – leonbloy May 10 '15 at 21:12
  • Yes, non negative numbers, from 0 to 100, with uniform probability – ajq88 May 10 '15 at 21:13
  • If each of the four numbers have to be uniformly random on the range $[0..100]$, then the sum will be $4 \cdot 50$ on average. But $4 \cdot 50 \neq 100$. So I would say the desired distribution is not well defined. Choosing four integers independently and uniformly random from $[0..100]$ and repeating until the sum is 100 will produce a well defined distribution, the four numbers will be identically distributed, but not uniformly from $[0..100]$. Is that the distribution you are looking for? You did not specifically state that the numbers had to be integers. – kasperd May 11 '15 at 15:08
  • I believe the method suggested by ajq88 would also work and give the same impartiality as the method by @Thomas as long as care is taken to take a random permutation of the resultant quad of 4 numbers (assuming the order matters, bcoz then the concerns about "the window of choice being widest for the first number and progressively reducing for the others thereby indicating bias" are valid) – vharihar Dec 12 '21 at 01:47
  • In posting an Answer to an old Question, you should highlight what new information you are contributing to *answer the original problem*. A careful reading will show you that you haven't answered the problem posed by the OP, namely whether *their proposed algorithm* is unbiased. – hardmath Dec 12 '21 at 02:20

7 Answers

37

No, this is not a good approach: half the time, the first element will be $50$ or more, which is way too often. Essentially, the odds that the first element is $100$ should not be the same as the odds that the first element is $10$. There is only one way for $a=100$, but there are loads of ways for $a=10$.

The number of such sums $100=a+b+c+d$ with $a,b,c,d\geq 0$ integers is $\binom{100+3}{3} = 176851 = 103\cdot 1717$. If the number of equally likely outcomes your algorithm draws from is not a multiple of $103$, you can't get an even probability.
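
To put numbers on this, here is a small Python sketch (the helper name `uniform_marginal` is just for illustration). It counts the tuples and compares the probability that $a$ takes a given value under the uniform distribution over tuples with the flat $\frac{1}{101}$ that the sequential method assigns to every value of $a$.

```python
from math import comb

TOTAL = 100
n_tuples = comb(TOTAL + 3, 3)  # stars-and-bars count of tuples (a, b, c, d)

def uniform_marginal(a, total=TOTAL):
    """P(first number == a) when every tuple is equally likely."""
    # (b, c, d) must sum to total - a: comb(total - a + 2, 2) ways
    return comb(total - a + 2, 2) / comb(total + 3, 3)

print(n_tuples)                 # 176851
print(uniform_marginal(10))     # many ways to complete a = 10
print(uniform_marginal(100))    # exactly one way to complete a = 100
print(1 / 101)                  # what the sequential method gives for any a
```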

An ideal approach: pick a number $x_1$ from $1$ to $103$. Then pick a different number $x_2\neq x_1$ from $1$ to $103$, then pick a third number $x_3\neq x_1,x_2$ from $1$ to $103$.

Then sort these values, so that $x_1<x_2<x_3$. Then set $$a=x_1-1, b=x_2-x_1-1, c=x_3-x_2-1, d=103-x_3.$$
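
A minimal Python sketch of this construction, assuming the standard library's `random.sample` is an acceptable way to pick three distinct values (the function name `stars_and_bars_sample` is made up for the example):

```python
import random

def stars_and_bars_sample(total=100, rng=random):
    """Return (a, b, c, d), non-negative integers summing to `total`."""
    # Three distinct "bar" positions among 1..total+3, in increasing order.
    x1, x2, x3 = sorted(rng.sample(range(1, total + 4), 3))
    # The gaps between consecutive bars are the four numbers.
    return (x1 - 1, x2 - x1 - 1, x3 - x2 - 1, total + 3 - x3)

a, b, c, d = stars_and_bars_sample()
print(a, b, c, d, a + b + c + d)
```

Every unordered triple of positions maps to a different tuple, so each tuple is as likely as any other, provided the triple itself is drawn uniformly.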

Thomas Andrews
  • This method would select, say, 23, 24, 26, 27 twenty-four times; once for each permutation of the four numbers? – DJohnM May 11 '15 at 02:09
  • Yes. Actually, it will select any specific $a,b,c,d$ exactly $6$ times. Say you want $(a,b,c,d)=(24,27,26,23)$. Then you need $\{x_1,x_2,x_3\}=\{25,53,80\}$. – Thomas Andrews May 11 '15 at 02:17
  • 1
    I'm not convinced this is "exactly" right, though it is close. I would think about dividing a closed curve of length 100 units into four parts, at randomly chosen points. Let $x_1, x_2, x_3, x_4$ be uniformly distributed integers such that $0 \le x_i < 100$. Sort them so that $x_1 \le x_2 \le x_3 \le x_4$. Then find $a = x_2-x_1$, $b = x_3-x_2$, $c=x_4-x_3$, $d=100+x_1-x_4$. @Thomas Andrews' method is almost the same, except it always chooses 0 as one of the numbers. That feels wrong to me, though I can't "prove" it. – alephzero May 11 '15 at 02:26
  • 1
    It depends on whether you wanted to allow zero. If you want to allow zero, then this is exactly right. I've been using this algorithm for years. Your approach does not return a multiple of $103$ different results, so it cannot give each possibility equally. @alephzero – Thomas Andrews May 11 '15 at 02:28
  • in your case, the probability that $a=0$ is $\frac{1}{100}$. But $100$ is not a divisor of the number of possible tuples $(a,b,c,d)$. So again, that can't be right. @alephzero – Thomas Andrews May 11 '15 at 02:32
  • 3
    I suggest you look at the "stars and bars" argument for why there are $\binom{103}{3}$ tuples $(a,b,c,d)$ such that $a+b+c+d=100$. @alephzero http://en.wikipedia.org/wiki/Stars_and_bars_%28combinatorics%29 – Thomas Andrews May 11 '15 at 02:33
  • $(a,b,c,d)$ comes from $\{x_1,x_2,x_3\}=\{a+1,a+b+2,a+b+c+3\}$. So you get every $(a,b,c,d)$ from an unordered triple exactly once. @alephzero – Thomas Andrews May 11 '15 at 02:38
  • @Thomas What if I sample 'a' between 0 and (103,3) uniformly to make probability of a 1/(103,3). Then I pick the 3 numbers between (103,3) and (103,3)+103 and do your thing – Gaurav Fotedar May 16 '19 at 22:52
8

Generate four random numbers between $0$ and $1$

Add these four numbers; then divide each of the four numbers by the sum, multiply by $100$, and round to the nearest integer.

Check that the four integers add to $100$ (they will, two thirds of the time). If they don't (rounding errors), try again...
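
As a rough Python sketch of these three steps (the name `normalize_and_round` is hypothetical, and Python's built-in `round` is assumed to be an acceptable "round to the nearest integer"):

```python
import random

def normalize_and_round(total=100, parts=4, rng=random):
    """Scale random fractions to `total`, round, retry on a bad sum."""
    while True:
        raw = [rng.random() for _ in range(parts)]
        s = sum(raw)
        if s == 0:
            continue  # practically impossible, but avoids dividing by zero
        rounded = [round(total * r / s) for r in raw]
        if sum(rounded) == total:
            return rounded

print(normalize_and_round())
```

The sketch only guarantees that the result sums to 100; it says nothing about whether every tuple is equally likely.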

DJohnM
  • 2
    My question is not about alternative solutions to the problem but about the behaviour of the original method. – ajq88 May 11 '15 at 03:20
  • 3
    I don't think this method produces uniform results. My calculations on a simplified example with 2 numbers that have to sum to 2 give me these probabilities using your method: $P((0, 2)) = \frac 1 4$, $P((2, 0)) = \frac 1 4$, $P((1, 1)) = \frac 1 3$, $P(retry) = \frac 1 6$. – kasperd May 11 '15 at 14:49
  • @kasperd: How do you get the retry? If two numbers add to $2$, the rounded integers also add to $2$, because the change in rounding happens at the same point for both numbers, at $(0.5,1.5)$ or $(1.5,0.5)$. – joriki Dec 21 '19 at 10:25
3

Your question mentions an inefficient algorithm: generating four independent and uniformly distributed numbers among the integers from 0 to 100 and repeating until their sum is 100. I'll assume you are satisfied with the distribution generated by that algorithm, but you are not satisfied with the performance.

Before looking into how to produce the distribution more efficiently, one first has to understand what the distribution looks like.

By construction it is easy to see that each of $a$, $b$, $c$, and $d$ are identically distributed. It is also easy to see that they are not independent due to their sum being constant. What we already know about their distribution is that it has minimum value 0, maximum value 100, and average value 25. The average follows from the fact that their sum has to be 100 on average.

This rules out a uniform distribution of the individual numbers (and in fact it rules out every symmetrical distribution). This means your more efficient algorithm, which generates $a$ uniformly, will produce a different distribution.

Towards an efficient algorithm

If we define $X = a+b$ and ask what the distribution of $X$ looks like, we will find something interesting. The distribution clearly doesn't depend on which pair of the four numbers we summed. So all six possible pairs are identically distributed, but not independent. This distribution has minimum 0, maximum 100, and average 50. And the distribution has to be symmetrical because $X$ and $100-X$ are identically distributed.

It is not immediately obvious if the distribution of $X$ is uniform across the integers from 0 to 100. However, if the distribution of $X$ can be generated efficiently, then the distribution of all four numbers can be generated efficiently as follows:

  • Generate $X$
  • Choose $a$ uniformly random in the range $0$ to $X$
  • Let $b := X-a$
  • Choose $c$ uniformly random in the range $0$ to $100-X$
  • Let $d := 100-X-c$

The distribution of X

The original algorithm would produce $X$ as the sum of two uniformly random numbers in the range $0$ to $100$, but discard any results where the overall sum was different from $100$.

A different algorithm could generate $X$ and $Y$ according to this distribution and discard the result if $X+Y \neq 100$. This is useful because the generation of $X$ and $Y$ can be simplified.

If $X$ is larger than 100, it can be discarded immediately. It is easy to work out the distribution of $X$ after this step, before the sum of $X$ and $Y$ is checked. The initial probability of an outcome $x \in [0;100]$ is $\frac{1+x}{101^2} = \frac{1+x}{10201}$, since $1+x$ of the $101^2$ equally likely pairs sum to $x$. Once values larger than 100 are discarded, the probability becomes $\frac{1+x}{5151}$, because $\sum_{x=0}^{100}(1+x) = 5151$.

The probability of immediately generating $X=x$ and $Y=100-x$ can then be computed as $\frac{1+x}{5151} \cdot \frac{1+(100-x)}{5151} = \frac{(1+x)(101-x)}{5151^2}$. The probability $P(X=x \wedge Y=100-x)$, given that the sum is $100$, can then be computed by simply scaling the denominator such that the probabilities sum to $1$.

At this point it is clear that $X$ isn't uniformly distributed. But it also gives us a way to construct $X$ directly.

In order to generate the distribution of $X$ directly, we need a formula for $P(X \leq x)$. This formula will be:

$$P(X \leq x) = \frac{\sum_{i=0}^x (1+i)(101-i)}{k} = \frac{-2x^3 + 297x^2 + 905x + 606}{6k}$$

Because we know that $P(X \leq 100) = 1$, we can deduce that $k=176851$.
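
The closed form and the value of $k$ can be checked numerically. The short Python sketch below reads the summand as $(1+i)(101-i)$, with $i$ the summation index; the helper names are made up for the check.

```python
def partial_sum(x):
    """Sum of (1 + i)(101 - i) for i = 0..x, computed term by term."""
    return sum((1 + i) * (101 - i) for i in range(x + 1))

def closed_form(x):
    """The cubic from the formula above, divided by 6."""
    return (-2 * x**3 + 297 * x**2 + 905 * x + 606) // 6

# The two expressions agree for every x in 0..100.
assert all(partial_sum(x) == closed_form(x) for x in range(101))

k = partial_sum(100)
print(k)  # 176851
```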

With this the algorithm becomes:

  • Choose $r$ uniformly random from the integers $[0;176850]$
  • Take the smallest $x$ such that $\sum_{i=0}^x (1+i)(101-i) > r$
  • Choose $a$ uniformly random in the range $0$ to $x$
  • Let $b := x-a$
  • Choose $c$ uniformly random in the range $0$ to $100-x$
  • Let $d := 100-x-c$
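
A possible Python sketch of this algorithm follows. Two choices in it are assumptions of the sketch rather than part of the list as written: the comparison against $r$ is strict, so that each $x$ receives exactly $(1+x)(101-x)$ of the 176851 values of $r$, and $d$ is computed as $100-x-c$, so that $c+d=100-x$. The names `CUMULATIVE` and `pair_sum_sample` are made up for the example.

```python
import bisect
import random
from itertools import accumulate

# CUMULATIVE[x] = sum of (1 + i)(101 - i) for i = 0..x
CUMULATIVE = list(accumulate((1 + i) * (101 - i) for i in range(101)))

def pair_sum_sample(rng=random):
    """Return (a, b, c, d) summing to 100 via the pair sum x = a + b."""
    r = rng.randrange(CUMULATIVE[-1])       # uniform integer in 0..176850
    x = bisect.bisect_right(CUMULATIVE, r)  # smallest x with CUMULATIVE[x] > r
    a = rng.randint(0, x)                   # split x into a + b
    c = rng.randint(0, 100 - x)             # split 100 - x into c + d
    return (a, x - a, c, 100 - x - c)

print(pair_sum_sample())
```

With these choices a given $x$ has probability $\frac{(1+x)(101-x)}{176851}$, and each of the $(1+x)(101-x)$ ways to split it is equally likely, so every tuple comes out with probability $\frac{1}{176851}$.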
kasperd
0

there may be a need for slightly more precise specification of what kind of sample you want. but to begin with you may feel less uneasy if you sample by picking three numbers at random in $[0,100] \cap \mathbb{Z}$, let us call them $a,b,c$ supposing you have ordered them so that $0 \le a \le b \le c \le 100$

now set: $$ x_1 = a \\ x_2 = b-a \\ x_3 = c-b \\ x_4 = 100-c \\ $$ now you have $$ \sum_{k=1}^4 x_k = 100-c+c-b+b-a+a =100 $$

David Holden
  • 4
    This method produces a slightly uneven distribution for integers: e.g. $0+0+0+100$ is less likely than $0+0+100+0$ when they should be just as likely. Or try it for four non-negative integers adding up to $1$ to illustrate the issue. This method would work if we were dealing with real numbers in $[0,100]$ – Henry May 10 '15 at 22:39
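
The small case mentioned in this comment is easy to enumerate. The Python sketch below (function name `sorted_cuts_counts` is hypothetical) goes through every ordered choice of three cut points in $[0, \text{total}]$, sorts them, and counts how often each tuple $(x_1,x_2,x_3,x_4)$ comes out.

```python
from collections import Counter
from itertools import product

def sorted_cuts_counts(total):
    """Count how many ordered triples of cuts lead to each 4-tuple."""
    counts = Counter()
    for picks in product(range(total + 1), repeat=3):
        a, b, c = sorted(picks)
        counts[(a, b - a, c - b, total - c)] += 1
    return counts

for parts, n in sorted(sorted_cuts_counts(1).items()):
    print(parts, n)
```

For a total of $1$, the tuples $(0,0,0,1)$ and $(1,0,0,0)$ each arise from one triple, while $(0,0,1,0)$ and $(0,1,0,0)$ each arise from three, so the four outcomes are not equally likely.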
0

Are computational constraints really an issue? Do you intend to scale this up to higher numbers? Does this method need to be achieved using physical dice, a dice rolling program, Excel, or a programming language?

As Thomas Andrews points out, using that method will bias towards something like 50/25/12/12 compared to 25/25/25/25.

Why not just roll 4 dice between 0 and 100, and check if they sum to 100? If they do, keep it, otherwise roll again. The Java code below was tested with 4 numbers up to 1,000,000 and returned results within about 3 seconds. For larger numbers, you will need to be smarter.

public class RollerMain {

    public static void main(String[] args) {
        while (true) {
            int firstNumber = (int)((Math.random())*101);
            int secondNumber = (int)((Math.random())*101);
            int thirdNumber= (int)((Math.random())*101);
            int fourthNumber= (int)((Math.random())*101);
            if (firstNumber + secondNumber + thirdNumber + fourthNumber == 100){
                System.out.println("first number = "+firstNumber);
                System.out.println("second number = "+secondNumber );
                System.out.println("third number = "+thirdNumber );
                System.out.println("fourth number = "+fourthNumber);
                break;
            }
        }
    }
}
Scott
  • 1
    I described this method in the original description. The program is a genetic algorithm where yes, efficiency is important. However my question is regarding the behaviour of that particular method as it really got me thinking, rather than seeking an alternative solution :) – ajq88 May 11 '15 at 03:17
0

The problem is that $x_2$ is dependent on $x_1$, then $x_3$ is dependent on $x_2,x_1$ and so on. Right from the start, once you generate the first number, the rest will "often" have a far smaller window of random range. Half of the time $x_1$ is going to be over 50, then half of the time $x_2$ is going to take more than half of what remains, and so on. I think generating all of them from 0 to 100 and then normalizing them would work fairly well.

William Ambrose
Gabeee
  • 1
  • 1
    Welcome to MSE. For some basic information about writing mathematics at this site see, *e.g.*, [basic help on mathjax notation](/help/notation), [mathjax tutorial and quick reference](//math.meta.stackexchange.com/q/5020), [main meta site math tutorial](//meta.stackexchange.com/a/70559) and [equation editing how-to](//math.meta.stackexchange.com/q/1773). – José Carlos Santos Dec 21 '19 at 09:37
0

You can do this as follows:

import numpy as np
def rd(n, total_sum):
    nums = np.random.rand(n)
    return nums/np.sum(nums)*total_sum

n = rd(4,100)
print(n)
print(n.sum())

sample output:

[31.4994136  32.28805096  2.94863839 33.26389705]
100.0