116

How can I easily generate random numbers following a normal distribution in C or C++?

I don't want to use Boost.

I know that Knuth talks about this at length but I don't have his books at hand right now.

Damien
  • Duplicate of one or the other of http://stackoverflow.com/questions/75677/converting-a-uniform-distribution-to-a-normal-distribution and http://stackoverflow.com/questions/1109446/c-generate-gaussian-distribution – dmckee --- ex-moderator kitten Feb 24 '10 at 17:48

18 Answers

95

There are many methods to generate Gaussian-distributed numbers from a regular RNG.

The Box-Muller transform is commonly used. It correctly produces values with a normal distribution. The math is easy: you generate two (uniform) random numbers, and by applying a formula to them, you get two normally distributed random numbers. Return one, and save the other for the next request for a random number.
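
For illustration, here is a minimal sketch of the basic (trigonometric) Box-Muller transform; the function name boxMuller and the use of rand() are choices made for this example, not part of the original answer:

#include <cmath>
#include <cstdlib>
#include <utility>

// Box-Muller transform: two independent uniforms in (0, 1] become
// two independent standard normal variates.
std::pair<double, double> boxMuller()
{
    const double pi = 3.14159265358979323846;
    double u1 = (std::rand() + 1.0) / (RAND_MAX + 1.0); // in (0, 1], so log(u1) is defined
    double u2 = (std::rand() + 1.0) / (RAND_MAX + 1.0);
    double r = std::sqrt(-2.0 * std::log(u1));
    return { r * std::cos(2.0 * pi * u2), r * std::sin(2.0 * pi * u2) };
}

As the answer suggests, return one value of the pair and cache the other for the next call.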

S.Lott
  • If you need speed, then the polar method is faster, though. And the Ziggurat algorithm even more (albeit much more complex to write). – Joey Feb 24 '10 at 12:15
  • found an implementation of the Ziggurat here http://people.sc.fsu.edu/~jburkardt/c_src/ziggurat/ziggurat.html It's quite complete. – dwbrito May 19 '13 at 16:54
  • Note, C++11 adds [`std::normal_distribution`](http://en.cppreference.com/w/cpp/numeric/random/normal_distribution) which does exactly what you ask for without delving into mathematical details. –  Aug 28 '13 at 09:50
  • std::normal_distribution is not guaranteed to be consistent across all platforms. I'm doing the tests now, and MSVC provides a different set of values from, for example, Clang. The C++11 engines seem to generate the same sequences (given the same seed), but the C++11 distributions seem to be implemented using different algorithms on different platforms. – Arno Duvenhage Jan 20 '16 at 13:36
51

C++11

C++11 offers std::normal_distribution, which is the way I would go today.

C or older C++

Here are some solutions in order of ascending complexity:

  1. Add 12 uniform random numbers from 0 to 1 and subtract 6. This matches the mean and standard deviation of a standard normal variable. An obvious drawback is that the range is limited to ±6 – unlike a true normal distribution. (A short sketch of this appears after the footnote below.)

  2. The Box-Muller transform. This is listed above and is relatively simple to implement. If you need very precise samples, however, be aware that the Box-Muller transform combined with some uniform generators suffers from an anomaly called the Neave effect [1].

  3. For best precision, I suggest drawing uniforms and applying the inverse cumulative normal distribution to arrive at normally distributed variates. Here is a very good algorithm for inverse cumulative normal distributions.

[1] H. R. Neave, “On using the Box-Muller transformation with multiplicative congruential pseudorandom number generators,” Applied Statistics, 22, 92-97, 1973.
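
A minimal sketch of option 1, assuming rand() as the uniform source (the function name is mine): twelve uniforms on [0, 1] have total mean 6 and total variance 12 * 1/12 = 1, so subtracting 6 gives mean 0 and standard deviation 1.

#include <cstdlib>

// Approximate standard normal variate: sum of 12 uniforms on [0, 1] minus 6.
// Mean 0 and standard deviation 1, but the output is hard-limited to [-6, 6].
double approxStdNormal()
{
    double sum = 0.0;
    for (int i = 0; i < 12; ++i)
        sum += static_cast<double>(std::rand()) / RAND_MAX;
    return sum - 6.0;
}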

Peter G.
  • by any chance would you have another link to the pdf on the Neave effect? or the original journal article reference? thank you – pyCthon Nov 02 '11 at 13:30
  • @stonybrooknick The original reference is added. Cool remark: while googling "box muller neave" to find the reference, this very stackoverflow question came up on the first result page! – Peter G. Nov 02 '11 at 17:18
  • yeah, it's not very well known outside certain small communities and interest groups – pyCthon Nov 04 '11 at 18:55
  • @Peter G. Why would anyone downvote your answer? Possibly the same person who downvoted my comment below, which I'm fine with, but I thought your answer was very good. It would be good if SO made downvotes force a real comment... I suspect most downvotes of old topics are just frivolous and trolly. – Pete855217 Nov 18 '13 at 23:35
  • "Add 12 uniform numbers from 0-1 and subtract 6." – will the distribution of this variable really be normal? Can you provide a link with a derivation? In the derivation of the central limit theorem, the assumption n->+inf is essential. – bruziuz Jan 10 '16 at 22:05
  • @bruzias Adding 12 uniform numbers from 0 to 1 and subtracting 6 isn't normally distributed; it only results in the same mean (i.e. 0) and standard deviation (i.e. sqrt(12 * 1/12) = 1, because the variance of a uniform distribution from 0 to 1 is 1/12 and variances of independent variables add). – Peter G. Jan 11 '16 at 14:59
  • @Peter What do you think about the method that I described above (the one with the "-3" score)? – bruziuz Oct 18 '18 at 11:36
  • The method you described in 2016 is, as far as I see it, the Box-Muller transform. The Box-Muller transform was given as an answer already in 2010. So I guess people downvoted because, 6 years later, you answered a question which had already been answered with your solution. – Peter G. Oct 18 '18 at 11:54
31

A quick and easy method is just to sum a number of uniformly distributed random numbers and take their average. See the Central Limit Theorem for a full explanation of why this works.
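
As a rough sketch of how this can be parametrized (a question raised in the comments below; the function name and the default N = 16 are mine): the average of N uniforms on [0, 1] has mean 0.5 and variance 1/(12N), so it can be standardized and then shifted and scaled to any target mean and standard deviation.

#include <cmath>
#include <cstdlib>

// Approximate normal variate with the requested mean and standard deviation,
// built from the average of N uniform samples on [0, 1].
double approxNormal(double mean, double stddev, int N = 16)
{
    double avg = 0.0;
    for (int i = 0; i < N; ++i)
        avg += static_cast<double>(std::rand()) / RAND_MAX;
    avg /= N;
    // avg has mean 0.5 and standard deviation 1 / sqrt(12 * N); standardize it.
    double z = (avg - 0.5) * std::sqrt(12.0 * N);
    return mean + stddev * z;
}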

Paul R
  • +1 Very interesting approach. Is it verified to really give normally distributed sub ensembles for smaller groups? – Morlock Feb 24 '10 at 12:53
  • @Morlock The larger the number of samples you average the closer you get to a Gaussian distribution. If your application has strict requirements for the accuracy of the distribution then you might be better off using something more rigorous, like Box-Muller, but for many applications, e.g. generating white noise for audio applications, you can get away with a fairly small number of averaged samples (e.g. 16). – Paul R Feb 24 '10 at 13:25
  • Plus, how do you parametrize this to get a certain amount of variance, say you want a mean of 10 with a standard deviation of 1? – Morlock Feb 24 '10 at 13:26
  • @Morlock: To get a given mean and SD you scale your underlying samples properly. Generally, the underlying samples, *u*, are uniform over 0 to 1. Make them uniform over -1 to +1 (compute 2 *u* -1). You can then add the desired mean and multiply to get the desired standard deviation. – S.Lott Feb 24 '10 at 13:39
  • @Morlock If you scale your PRNG to give random numbers in the range -1.0 to +1.0 then you get a mean of 0.0 and a variance of 0.5 when you take a sufficiently large average. You can just linearly shift and scale either the input numbers or the output average to get whatever mean and variance you need. – Paul R Feb 24 '10 at 13:39
  • @S.Lott I can easily see how to scale a distribution given a known SD and mean, as you suggest, but it looks a lot less straight forward to do so with a distribution generated using the central limit theorem, given that 3 values will affect the level of stochasticity: the number of 1) original random numbers; 2) values used in the averages; 3) averages done. I guess you must do quite a number of trials (1000s) to parametrize the whole thing before you can be comfortable to scale it appropriately :) Not that it would necessarily be very heavy on computation time. Cheers! – Morlock Feb 24 '10 at 22:19
  • This is a very inefficient method of generating samples from a normal distribution. I would definitely not call it "quick." – Petter Aug 15 '12 at 21:54
  • @Ben: it depends on how many averages you need to take - if you only need approximately normal distribution and can use say N = 16 averaged numbers then it is much more efficient, if less accurate, than methods that use transcendental functions, – Paul R Aug 16 '12 at 19:35
  • @Paul: I would still not call this "quick." "Easy," yes. There are correct algorithms that would beat this even for N = 16. – Petter Sep 11 '12 at 03:23
  • @Ben: could you point me at an efficient algo for this ? I've only ever used the averaging technique for generating approximately Gaussian noise for audio and image processing with real-time constraints - if there's a way of achieving this in fewer clock cycles then that could be very useful. – Paul R Sep 11 '12 at 04:58
  • @PaulR: The answers below (std::normal_distribution) are generally much better. – Petter Sep 11 '14 at 19:01
  • @Petter: you're probably right in the general case, for floating point values. There are still application areas like audio though, where you want fast integer (or fixed point) gaussian noise, and accuracy is not too important, where the simple averaging method is more efficient and useful (especially for embedded applications, where there may not even be hardware floating point support). – Paul R Sep 11 '14 at 22:01
  • Would adding more points to the sample decrease standard deviation? You're more likely to pull towards median with more samples. True bell curves don't have upper and lower limits like the random number generator will do, so there are some discrepancies at the very ends of the curve here. – Denise Skidmore Mar 27 '18 at 16:11
  • @DeniseSkidmore: yes, see comments above - this method is far from perfect but it’s still useful for certain cases where the approximation is good enough, e.g. generating noise for audio applications. – Paul R Mar 28 '18 at 06:41
25

I created an open source C++ project that benchmarks normally distributed random number generation.

It compares several algorithms, including

  • Central limit theorem method
  • Box-Muller transform
  • Marsaglia polar method
  • Ziggurat algorithm
  • Inverse transform sampling method
  • cpp11random, which uses C++11 std::normal_distribution with std::minstd_rand (it is actually the Box-Muller transform in clang)

The results of the single-precision (float) version on an iMac Core i5-3330S @ 2.70GHz, clang 6.1, 64-bit:

[benchmark chart: normaldistf]

For correctness, the program verifies the mean, standard deviation, skewness, and kurtosis of the samples. It was found that the CLT method of summing 4, 8, or 16 uniform numbers does not have kurtosis as good as the other methods.

The ziggurat algorithm has better performance than the others. However, it is not suitable for SIMD parallelism, as it needs table lookups and branches. Box-Muller with the SSE2/AVX instruction sets is much faster (1.79x, 2.99x) than the non-SIMD version of the ziggurat algorithm.

Therefore, I suggest using Box-Muller on architectures with SIMD instruction sets, and maybe the ziggurat otherwise.


P.S. The benchmark uses the simplest LCG PRNG to generate uniformly distributed random numbers, so it may not be sufficient for some applications. But the performance comparison should be fair, because all implementations use the same PRNG; the benchmark mainly tests the performance of the transformation.
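
For reference, a uniform source of the kind described might look like the following minimal LCG. This is only an illustrative sketch with textbook constants (from Numerical Recipes), not the generator actually used in the benchmark:

#include <cstdint>

// Minimal linear congruential generator producing uniforms in [0, 1).
struct Lcg {
    std::uint32_t state;
    explicit Lcg(std::uint32_t seed = 1u) : state(seed) {}
    double next()
    {
        state = 1664525u * state + 1013904223u; // modulus 2^32 via unsigned wraparound
        return state / 4294967296.0;            // divide by 2^32
    }
};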

Milo Yip
  • "But the performance comparison should be fair because all implementations uses the same PRNG" .. Except that BM uses one input RN per output, whereas CLT uses many more, etc... so the time to generate a uniform random # matters. – greggo Nov 14 '17 at 19:53
14

Here's a C++ example, based on some of the references. This is quick and dirty; you are better off not re-inventing the wheel and using the Boost library instead.

#include "math.h" // for RAND, and rand
double sampleNormal() {
    double u = ((double) rand() / (RAND_MAX)) * 2 - 1;
    double v = ((double) rand() / (RAND_MAX)) * 2 - 1;
    double r = u * u + v * v;
    if (r == 0 || r > 1) return sampleNormal();
    double c = sqrt(-2 * log(r) / r);
    return u * c;
}

You can use a Q-Q plot to examine the results and see how well they approximate a real normal distribution (rank your samples 1..n, turn the ranks into proportions of the total count n, get the corresponding z-values, and plot them; an upward straight line is the desired result).

Pete855217
  • What is sampleNormalManual()? – solvingPuzzles Jul 22 '12 at 00:17
  • @solvingPuzzles - sorry, corrected the code. It's a recursive call. – Pete855217 Jul 26 '12 at 09:16
  • This is bound to crash at some rare event (showcasing application to your boss rings a bell?). This should be implemented using a loop, not using recursion. The method looks unfamiliar. What is the source / how is it called? – the swine Nov 18 '13 at 17:00
  • Box-Muller transcribed from a java implementation. As I said, it's quick and dirty, feel free to fix it. – Pete855217 Nov 18 '13 at 23:28
  • Seems perfectly reasonable and gives good results. I implemented this in javascript (and tested the distribution). – Octopus Nov 23 '15 at 18:50
  • I found it worked well too Octopus, validating its results with a Q-Q plot. Just be aware it involves a recursive call, the number of calls being subject to rand(), so theoretically it runs for an 'unknown' number of iterations, though I've never had any problems. A better solution is of course to use a library like boost, but if you must have a home-grown, or quick method, it works fine. – Pete855217 Dec 06 '15 at 05:49
  • FWIW, many compilers will be able to turn that particular recursive call into a 'jump to top of function'. Question is whether you want to count on it :-) Also, probability that it takes > 10 iterations is 1 in 4.8 million. p(>20) is the square of that, etc. – greggo Nov 14 '17 at 20:05
12

This is how you generate the samples with a modern C++ compiler.

#include <iostream>
#include <random>
...
std::mt19937 generator;   // Mersenne Twister engine
double mean = 0.0;
double stddev = 1.0;
std::normal_distribution<double> normal(mean, stddev);
std::cerr << "Normal: " << normal(generator) << std::endl;
Petter
12

Use std::tr1::normal_distribution.

The std::tr1 namespace is not a part of Boost. It's the namespace that contains the library additions from the C++ Technical Report 1, and it is available in up-to-date Microsoft compilers and gcc, independently of Boost.

JoeG
5

You can use the GSL (GNU Scientific Library). Some complete examples are given in its documentation to demonstrate how to use it.
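
A minimal sketch of that approach, assuming GSL is installed and the program is linked with -lgsl -lgslcblas; gsl_ran_gaussian draws zero-mean Gaussian variates with the given standard deviation:

#include <cstdio>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_rng.h>

int main()
{
    gsl_rng_env_setup();                               // honor GSL_RNG_TYPE / GSL_RNG_SEED
    gsl_rng *r = gsl_rng_alloc(gsl_rng_default);       // default generator
    for (int i = 0; i < 10; ++i)
        std::printf("%g\n", gsl_ran_gaussian(r, 1.0)); // sigma = 1.0, mean 0
    gsl_rng_free(r);
    return 0;
}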

Denis Arnaud
4

Have a look at: http://www.cplusplus.com/reference/random/normal_distribution/. It's the simplest way to produce normal distributions.

telcom
4

If you're using C++11, you can use std::normal_distribution:

#include <random>

std::default_random_engine generator;
std::normal_distribution<double> distribution(/*mean=*/0.0, /*stddev=*/1.0);

double randomNumber = distribution(generator);

There are many other distributions you can use to transform the output of the random number engine.

Drew Noakes
  • That's already been mentioned by Ben (http://stackoverflow.com/a/11977979/635608) – Mat Apr 21 '13 at 12:14
3

I've followed the definition of the PDF given in http://www.mathworks.com/help/stats/normal-distribution.html and came up with this:

#include <cfloat>  // for DBL_EPSILON
#include <cmath>   // for log() and pow()
#include <cstdlib> // for rand() and RAND_MAX

const double DBL_EPS_COMP = 1 - DBL_EPSILON;
// uniform random number nudged away from zero so that log() below stays defined
inline double RandU() {
    return DBL_EPSILON + ((double) rand() / RAND_MAX);
}
// mu plus a random sign times sigma * sqrt(-log(DBL_EPS_COMP * RandU()))
inline double RandN2(double mu, double sigma) {
    return mu + (rand() % 2 ? -1.0 : 1.0) * sigma * pow(-log(DBL_EPS_COMP * RandU()), 0.5);
}
inline double RandN() {
    return RandN2(0, 1.0);
}

It is maybe not the best approach, but it's quite simple.

MJVC
  • -1 Not working for e.g. RANDN2(0.0, d + 1.0). Macros are notorious for this. – Petter Oct 11 '13 at 11:28
  • The macro will fail if `rand()` of `RANDU` returns a zero, since Ln(0) is undefined. – interDist Dec 20 '13 at 19:37
  • Have you actually tried this code? It looks like you have created a function that generates numbers which are [Rayleigh distributed](https://en.wikipedia.org/wiki/Rayleigh_distribution). Compare to the [Box–Muller transform](https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform), where they multiply with `cos(2*pi*rand/RAND_MAX)`, whereas you multiply with `(rand()%2 ? -1.0 : 1.0)`. – HelloGoodbye Mar 04 '14 at 15:01
1

The comp.lang.c FAQ list shares three different ways to easily generate random numbers with a Gaussian distribution.

You may take a look at it: http://c-faq.com/lib/gaussian.html

Delgan
1

There exist various algorithms for the inverse cumulative normal distribution. The most popular in quantitative finance are tested on http://chasethedevil.github.io/post/monte-carlo-inverse-cumulative-normal-distribution/

In my opinion, there is not much incentive to use anything other than algorithm AS241 from Wichura: it is machine-precision accurate, reliable, and fast. Bottlenecks are rarely in the Gaussian random number generation.

The top answer here advocates Box-Muller; you should be aware that it has known deficiencies. I quote https://www.sciencedirect.com/science/article/pii/S0895717710005935:

“in the literature, Box–Muller is sometimes regarded as slightly inferior, mainly for two reasons. First, if one applies the Box–Muller method to numbers from a bad linear congruential generator, the transformed numbers provide an extremely poor coverage of the space. Plots of transformed numbers with spiraling tails can be found in many books, most notably in the classic book of Ripley, who was probably the first to make this observation.”

jherek
1

Box-Muller implementation:

#include <cstdlib>
#include <cmath>
#include <ctime>
#include <iostream>
using namespace std;

// return a uniformly distributed random number in (0, 1]
double RandomGenerator()
{
  return ( (double)(rand()) + 1. ) / ( (double)(RAND_MAX) + 1. );
}

// return a normally distributed random number (Box-Muller transform)
double normalRandom()
{
  const double pi = 3.14159265358979323846;
  double y1 = RandomGenerator();
  double y2 = RandomGenerator();
  return cos(2 * pi * y2) * sqrt(-2. * log(y1));
}

int main()
{
  srand((unsigned)time(0));                 // seed the PRNG
  double sigma = 82.;
  double Mi = 40.;
  for (int i = 0; i < 100; i++) {
    double x = normalRandom() * sigma + Mi; // scale and shift to mean Mi, std dev sigma
    cout << " x = " << x << endl;
  }
  return 0;
}
Sysadmin
0

1) A graphically intuitive way to generate Gaussian random numbers is to use something similar to the Monte Carlo method. You generate a random point in a box around the Gaussian curve using your pseudo-random number generator in C, and then use the equation of the distribution to calculate whether that point lies underneath the Gaussian curve. If it does, the x value of the point is your Gaussian random number.

This method isn't perfect, because technically the Gaussian curve goes on towards infinity, and you couldn't create a box that approaches infinity in the x dimension. But the Gaussian curve approaches 0 in the y dimension pretty fast, so I wouldn't worry about that. The size constraints of your variables in C may be more of a limiting factor for your accuracy.
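
A rough sketch of the box idea above (the function name and the ±6 cutoff are my own choices): pick a point uniformly inside a box that encloses the bell curve, and accept its x coordinate only when the point falls under the curve.

#include <cmath>
#include <cstdlib>

// Rejection sampling: draw (x, y) uniformly in the box [-X, X] x [0, peak],
// where peak is the height of the standard normal density, and accept x
// whenever the point lies under the curve. The result is a normal variate
// truncated to [-X, X].
double rejectionSampleNormal(double X = 6.0)
{
    const double pi = 3.14159265358979323846;
    const double peak = 1.0 / std::sqrt(2.0 * pi);
    for (;;) {
        double x = (2.0 * std::rand() / RAND_MAX - 1.0) * X;           // uniform in [-X, X]
        double y = static_cast<double>(std::rand()) / RAND_MAX * peak; // uniform in [0, peak]
        if (y <= peak * std::exp(-0.5 * x * x))                        // under the density curve?
            return x;
    }
}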

2) Another way would be to use the Central Limit Theorem, which states that when independent random variables are added, they form a normal distribution. Keeping this theorem in mind, you can approximate a Gaussian random number by adding a large number of independent random variables.

These methods aren't the most practical, but that is to be expected when you don't want to use a preexisting library. Keep in mind this answer is coming from someone with little or no calculus or statistics experience.

dan dan
0

Monte Carlo method

The most intuitive way to do this would be to use a Monte Carlo method. Take a suitable range -X, +X. Larger values of X will result in a more accurate normal distribution, but take longer to converge.

  a. Choose a random number z between -X and X.
  b. Keep it with a probability of N(z, mean, variance), where N is the Gaussian distribution; otherwise drop it and go back to step (a).

Jagat
-1

Take a look at what I found.

This library uses the Ziggurat algorithm.

dwbrito
-3

A computer is a deterministic device. There is no randomness in its calculations. Moreover, the arithmetic unit in a CPU can only evaluate sums over some finite set of integers (performing evaluation in a finite field) and over a finite set of rational numbers, plus bitwise operations. Mathematics, on the other hand, deals with larger sets such as [0.0, 1.0], which has an infinite number of points.

You could listen to some wire inside the computer with some controller, but would it have a uniform distribution? I don't know. However, if you assume that its signal is the result of accumulating the values of a huge number of independent random variables, then you will receive an approximately normally distributed random variable (this is proved in probability theory).

There exist algorithms called pseudo-random generators. As I see it, the purpose of a pseudo-random generator is to emulate randomness. The criteria of goodness are: the empirical distribution converges (in some sense – pointwise, uniform, L2) to the theoretical one, and the values you receive from the random generator appear to be independent. Of course, that is not true from a 'real point of view', but we assume it is true.

One of the popular methods is to sum 12 independent random variables with uniform distributions... But to be honest, during the derivation of the Central Limit Theorem with the help of the Fourier transform and Taylor series, the assumption n->+inf is needed a couple of times. So, theoretically speaking, I personally don't understand how people justify summing just 12 i.r.v. with a uniform distribution.

I had probability theory at university, and particularly for me this is just a math question. At university I saw the following model:


// The model below assumes a uniform generator "uniformGen", a small constant
// kEps, and kPi defined elsewhere; it builds a normal variate from a
// Rayleigh-distributed radius ("Relei") and a uniform angle, i.e. the
// Box-Muller transform in polar form.

double generateUniform(double a, double b)
{
  return uniformGen.generateReal(a, b);
}

// Rayleigh-distributed radius with scale sigma
double generateRelei(double sigma)
{
  return sigma * sqrt(-2 * log(1.0 - uniformGen.generateReal(0.0, 1.0 - kEps)));
}

// Normal variate with mean m and standard deviation sigma
double generateNorm(double m, double sigma)
{
  double y2 = generateUniform(0.0, 2 * kPi);
  double y1 = generateRelei(1.0);
  double x1 = y1 * cos(y2);
  return sigma * x1 + m;
}

This is just one example of how to do it; I guess there are other ways to implement it.

A proof that this is correct can be found in the book "XVI Probability Theory" by Alexander Petrovich Krishchenko (Moscow, BMSTU, 2004), Example 6.12, pp. 246-247, ISBN 5-7038-2485-0.

Unfortunately, I don't know whether a translation of this book into English exists.

bruziuz
  • I have several downvotes. Let me know what is bad here? – bruziuz Aug 31 '17 at 11:41
  • The question is how to generate pseudo random numbers in the computer (I know, language is loose here), it is not a question of mathematical existence. – user2820579 Feb 26 '18 at 19:39
  • Yes, you're right. And the answer is how to generate a pseudo-random number with a normal distribution based on a generator which has a uniform distribution. Source code has been provided; you can rewrite it in any language. – bruziuz Feb 26 '18 at 22:22
  • Sure, I think the guy is looking for e.g. "Numerical Recipes in C/C++". By the way, just to complement our discussion, the authors of that book give interesting references for a couple of pseudo-random generators that fulfill the standards for being "decent" generators. – user2820579 Feb 27 '18 at 16:50
  • It's time to save this post before it gets deleted from StackOverflow – bruziuz Mar 29 '18 at 13:29
  • I made a backup here: https://sites.google.com/site/burlachenkok/download – bruziuz Mar 29 '18 at 13:35