76

Abraham de Moivre, when he came up with this formula, had to ensure that the points of inflection were exactly one standard deviation away from the center (so that the curve was bell-shaped), and also that the area under the curve was exactly equal to one.

And somehow he came up with the standard normal distribution, which is as follows:

$$\displaystyle\phi(x) = \frac{1}{\sqrt{2\pi}}e^{-\dfrac{1}{2}x^2}$$

And even cooler, he found the distribution for the case when the mean is not $0$ and the standard deviation is not $1$:

$$\displaystyle f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\dfrac{(x - \mu)^2}{2\sigma^2}}$$

And so what I ask is: how? How does one come up with an equation that fits all the aforementioned criteria? Moreover, how do the numbers $\pi$ and $e$ come into this?
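(As a quick sanity check that the formula really does satisfy these criteria, here is a small symbolic verification of my own, assuming `sympy` is available: the area is one and the inflection points sit at $x = \pm 1$.)

```python
import sympy as sp

x = sp.symbols('x', real=True)
phi = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)   # standard normal density

# total area under the curve is exactly 1
print(sp.integrate(phi, (x, -sp.oo, sp.oo)))   # 1

# inflection points: the second derivative vanishes at x = -1 and x = +1
print(sp.solve(sp.diff(phi, x, 2), x))         # [-1, 1]
```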

A.D
  • 5,914
  • 1
  • 16
  • 42
Cisplatin
  • 4,297
  • 7
  • 31
  • 54
  • 8
    For your last question: $\pi$ appears as a normalizing factor. It's a typical exercise in multi-variable calculus to integrate $e^{-x^2}$ over the whole real line, and this integral is similar. As well, the $e$ is not surprising: if we want our function to have a finite integral over the real line (yet not have compact support!), exponential decay is the easiest way for this to occur. If you use an exponential, it is almost always $e$. – awwalker May 07 '13 at 21:04
  • 4
    https://en.wikipedia.org/wiki/Normal_distribution#History and references – yoyo May 07 '13 at 21:12
  • It's not difficult to make the integral be $1$ once you have a positive function such that $\int_{\mathbb R} f(x)dx<+\infty$ – Marco Disce Mar 16 '16 at 15:23
  • take a look at this link https://www.sonoma.edu/users/w/wilsonst/papers/Normal/default.html – Clock Slave Jul 18 '16 at 19:21
  • 2
    I really like this article's take on Gauss' proof: https://www.maa.org/sites/default/files/pdf/upload_library/22/Allendoerfer/stahl96.pdf – Chris Aug 11 '16 at 14:10
  • @ClockSlave I'm worried that the integral in that document is incorrect. They seem to get a different result than [Wolfram Alpha](http://www.wolframalpha.com/input/?i=df(x)+%2Fdx+%3D+-k+(x-m)+f(x)) which worries me. – Pro Q Jun 11 '18 at 16:39
  • @Chris I think a summary of the proof would make for a great answer. Thank you for the link! – Pro Q Jun 11 '18 at 20:48

6 Answers

71

Suppose I throw a dart into a dartboard. I aim at the centre of the board $(0,0)$ but I'm not all that good with darts so the dart lands in a random position $(X,Y)$ which has a joint density function $f:\mathbb R^2\to\mathbb R^+$.

Let's make two assumptions about the way I play darts.

1.$\qquad$ The density is rotationally invariant so the distribution of where my dart lands only depends on the distance of the dart to the centre.

2.$\qquad$ The random variables $X$ and $Y$ are independent: how much I miss left and right makes no difference to the distribution of how much I miss up and down.

So by assumption one and Pythagoras I must be able to express the density $$f(x,y) = g(x^2 + y^2).$$

Now as the random variables $X$ and $Y$ are independent and identically distributed I must be able to express $$f(x,y) \propto f(x,0) f(0,y)$$ Combining these assumptions we get that for every pair $(x,y)$ we have $$g(x^2 + y^2) \propto g(x^2)g(y^2).$$

This means that $g$ must be an exponential function (up to constants, the exponentials are the only continuous functions that turn sums into products in this way): $$g(t) = A e^{-Bt}.$$

So $A$ will be some normalising constant, and $B$ reflects the units I'm measuring in (since $t$ is a squared distance, if I measure distances in cm then $B$ will be $100$ times as big as if I measured in mm). $B$ must be positive because the density should be a decreasing function of distance (I'm not that bad at darts).

So to work out $A$ I need to integrate $f(\cdot,\cdot)$ over $\mathbb R^2$. A quick change to polar coordinates gives $$\iint_{\mathbb R^2} f(x,y)\, dx\, dy = 2\pi\int_0^\infty r\, g(r^2)\, dr = \frac{\pi A}{B}. $$

So we should set $A = \frac{B}{\pi}$. It's convenient to choose $B$ in terms of the standard deviation, so we set $B = \frac 1{2\sigma^2}$ and $A = \frac{1}{2\pi\sigma^2}$.

So if I set $\tilde f(x) = \frac 1{\sqrt{2\pi}\,\sigma} e^{-\frac{x^2}{2\sigma^2}}$ then $f(x,y) = \tilde f(x)\, \tilde f(y)$.
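(A quick numerical sanity check of these constants, my own addition using numpy/scipy with an arbitrary $\sigma$: the joint density integrates to one over the plane and factorises into the two marginals.)

```python
import numpy as np
from scipy import integrate

sigma = 1.7                      # any positive value will do
B = 1 / (2 * sigma**2)
A = B / np.pi                    # = 1 / (2*pi*sigma^2)

f = lambda x, y: A * np.exp(-B * (x**2 + y**2))     # joint density g(x^2 + y^2)
f_tilde = lambda x: np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

total, _ = integrate.dblquad(f, -np.inf, np.inf, -np.inf, np.inf)
print(total)                                                   # ~1.0: the density is normalised
print(np.isclose(f(0.8, -1.3), f_tilde(0.8) * f_tilde(-1.3)))  # True: f factorises
```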

So the $e$ comes from the fact that I wanted my $X$ and $Y$ coordinates to be independent, and the $\pi$ comes from the fact that I wanted rotational invariance, so I'm integrating over a circle.

The interesting thing happens if I throw two darts. Suppose I throw my first dart aiming at $(0,0)$ which lands at $(X_1,Y_1)$, I aim my next dart at the first dart, so this one lands at $(X_2,Y_2)$ with $X_2 = X_1 + X$ and $Y_2 = Y_1 + Y$.

So the position of the second dart is the sum of the two errors. But the sum is still rotationally invariant and the variables $X_2$ and $Y_2$ are still independent, so $(X_2,Y_2)$ satisfies my two assumptions.

That means that when I add independent normal distributions together I get another normal distribution.

It's this property that makes the normal distribution so useful: if I take the average of a very long sequence of random variables I should get something that's the same shape no matter how long my sequence is, because taking a sequence twice as long is like adding the two halves together.
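(A small simulation of the two-dart experiment, my own sketch with numpy/scipy: the error of the second dart is again normal, with twice the variance.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, sigma = 100_000, 1.0

x1 = rng.normal(0, sigma, n)            # first dart, aimed at the origin
x2 = x1 + rng.normal(0, sigma, n)       # second dart, aimed at wherever the first landed

print(np.var(x2))                       # ~2 * sigma**2: the variances add
# Kolmogorov-Smirnov p-value; typically large, i.e. normality is not rejected
print(stats.kstest(x2, 'norm', args=(0, np.sqrt(2) * sigma)).pvalue)
```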

PS a factor of two seems to be wrong in my derivation but I have to go to the airport now.

Tim
  • 5,299
  • 20
  • 31
  • 1
    Shouldn't B be $1/2\sigma^2$? – Mateen Ulhaq Jul 16 '15 at 08:49
  • 17
    For the first assumption, why do we not get $f(x,y) = g(\sqrt[]{x^2 + y^2})$? – lsy Jun 29 '16 at 15:53
  • Possibly also of historical interest, Herschel's derivation https://books.google.com/books?id=bCP8DRF3eVYC&lpg=PA10&ots=8KpuULNkLd&dq=herschel%20error%20function%20square&pg=PA10#v=onepage&q=herschel%20error%20function%20square&f=false and how it plausibly inspired Maxwell's first take on the speed distribution of molecules http://www.clerkmaxwellfoundation.org/Herschel23_10_2002.pdf – Ron Burk Jan 08 '17 at 08:47
  • 3
    "So the $e$ comes from the fact I wanted my $X$ and $Y$ coordinates to be independent" Actually that does not forces us to use the number $e$, any base would do: $e^{-Bt} = \alpha^{-Ct}$ with $C = B/\log \alpha$ . To choose $e$ is a mere matter of convenience. – leonbloy Dec 18 '18 at 20:16
  • "A factor of two seems to be wrong in my derivation but I have to go to the airport now." - hey, if it's good enough for the physicists... – Andrew Jan 15 '19 at 14:22
  • 6
    "This means that g must be an exponential function" Why? The exponential has this characteristic but why is it the only one to have it? – Roland Aug 20 '19 at 08:27
  • @Roland search for fundamental multiplicative identity – Roland Aug 23 '19 at 08:09
  • 1
    There is an empirical proof of such derivation, in the form of a little game: https://www.antoniorinaldi.it/en/a-playful-generator-of-the-normal-distribution/ – glassy Sep 20 '19 at 06:56
  • I have thought a lot about this "derivation" but finally I realized that, for me, it is not a derivation of the normal distribution at all. What was shown is that, given a distribution which is circularly invariant and splits into X,Y (axes independence), it follows an e^-x² distribution. This "proof" has been around for a long time and I have seen it outside of mathematics, e.g. in physics. Maybe I have missed the point of the derivation: for me, the normal distribution is the class of distributions which is invariant under + and * by a scalar. Can one derive circular invariance from this? – Machinato Jan 21 '20 at 13:00
  • The suggestion that B is chosen just by convenience isn't quite a good way to say it. Instead, we can derive B by solving for the variance of f(x) (with A's value added in) using integration by parts. – Learning stats by example Mar 04 '22 at 19:16
16

The Normal distribution came about from approximations of the binomial distribution (de Moivre), from linear regression (Gauss), and from the central limit theorem. The derivation given by Tim relates most closely to the linear regression derivation, where the error is modelled by a Normal distribution under the assumptions that errors are symmetric about a mean and decrease away from it. I used Tim's answer and made it a little more formal.


Theorem: Two identically distributed independent random variables follow a distribution, called the normal distribution, given that their probability density functions (PDFs) are known to be continuous and differentiable, symmetric about a mean, and decrease towards zero away from the mean.

Proof: Let $X$ and $Y$ be identically distributed independent random variables with continuous and differentiable PDFs. It is assumed that the PDFs are even functions, i.e. $f_X(x) = f_X(-x)$, and that $f_X(x) \rightarrow 0 \text{ as } x\rightarrow \pm \infty$.

Their joint PDF, because of their independence, is $f_{XY}(x,y) = f_X(x)f_Y(y)$. Because they are identically distributed and symmetric, we take the joint density to depend only on the magnitude of the point $(x,y)$: interchanging $x$ and $y$, or reflecting either one, has no effect on the probability, so the level curves are circles rather than unequally proportioned ovals. Therefore, there must exist a function $g(r)$ such that $$ f_{XY}(x,y) = g(\sqrt{x^2 + y^2}), $$ which, because $g$ is not yet determined, can just as well be written $$ f_{XY}(x,y) = g(x^2 + y^2). $$

From the definition of the joint distribution, $$ f_X(x)f_Y(y) = g(x^2 + y^2), $$ which, for $y=0$, gives $$ f_X(x) \propto g(x^2), $$ since $f_Y(0)$ is a constant. A similar argument gives $$ f_Y(y) \propto g(y^2). $$ These last two results are significant, because substitution shows that the product $g(x^2)g(y^2)$ is proportional to $g$ evaluated at the sum $x^2 + y^2$: $$ g(x^2)g(y^2) \propto g(x^2 + y^2). $$ And it is known that the only continuous functions that turn sums into products in this way are the exponential functions.

This is to say, $g(r)$ will be some type of exponential, $$ g(r) = Ae^{Br} ,$$ where $A$ and $B$ are constants yet to be determined. We now assume that the probability of error decreases away from the expected value: the chance of error should be greatest near the expected value and fall to zero far from it. Another way of saying this is that the mean is the maximum of $g$, and that $g(r)$ must approach $0$ as $r\rightarrow \infty$. In any case, we need the argument of the exponential to be negative: $$ g(r) = Ae^{-Br}, \qquad B > 0. $$

Now we return to our joint PDF, $$ f_{XY}(x,y) = f_X(x)f_Y(y) = g(x^2 + y^2). $$ Here again, we can investigate $f_X(x)$ alone by setting $y=0$: $$ f_X(x) \propto g(x^2) = A e^{-Bx^2} .$$ Note that the mean of this distribution is $0$. To give it a mean of $\mu$, the distribution becomes $$ f_X(x) \propto A e^{-B(x-\mu)^2} .$$

The constants are determined from the fact that the integral of the PDF $f_{X}(x)$ over the entire domain must be equal to 1; equivalently, the cumulative distribution function (CDF) must approach 1 at its upper limit (probability cannot exceed 100%). $$ \int_{-\infty}^{\infty} f_{X}(x)\, dx = 1 $$ The integral of $e^{-Bx^2}$ over the real line is $\sqrt{\frac{\pi}{B}}$, thus $$ A \int_{-\infty}^{\infty} e^{-Bx^2}\, dx = A \sqrt{\frac{\pi}{B}} = 1$$ $$ A = \sqrt{\frac{B}{\pi}} $$ The constant $B$ is, for convenience, replaced via $\sigma^2 = \frac{1}{2B}$, so that $A = \frac{1}{\sqrt{2\pi\sigma^2}}$ and $$ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} ,$$ which is, of course, the common form of what is known as the Normal distribution. Note that the proportionality became an equality, which is required because $X$ is a random variable, and every PDF integrates to $1$. This proves the theorem.
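(As a sanity check on these constants, here is a small symbolic verification of my own with sympy: the density with $A=\sqrt{B/\pi}$ integrates to one, and its variance is $1/(2B)$, i.e. $\sigma^2$.)

```python
import sympy as sp

x, mu = sp.symbols('x mu', real=True)
B = sp.symbols('B', positive=True)

A = sp.sqrt(B / sp.pi)
f = A * sp.exp(-B * (x - mu)**2)

print(sp.integrate(f, (x, -sp.oo, sp.oo)))                               # 1
print(sp.simplify(sp.integrate((x - mu)**2 * f, (x, -sp.oo, sp.oo))))    # 1/(2*B)
```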

One will find that $\sigma^2$ is called the variance and $\sigma$ the standard deviation. The parameters $\sigma^2$ and $\mu$, that is, the variance and the mean, may be chosen arbitrarily, and they uniquely determine the distribution.


It's a beautiful thing - without knowing much about the random variables, other than the fact that they are independent, and come from the same symmetric distribution, we can logically determine the distribution they come from.

To write this answer, I followed the references supplied in comments of the question, the description given by Tim, and various online resources such as Wikipedia. Therefore, the proof may not be rigorous, and may contain errors. The errors are certainly mine alone, due to misinterpretations and/or miscalculations.

Sam Gallagher
  • 491
  • 3
  • 10
  • I don't understand the step that re-interprets $g(\sqrt{x^2+y^2})$ as equivalent to $g(x^2+y^2)$ "because g is not yet determined". Doesn't the square root still have to be a part of g? – Nate Glenn Mar 02 '21 at 21:35
  • @NateGlenn g() is an undetermined function, so we can 'push' the square root into g() instead of keeping it in the argument. It's still there, but moved into the undetermined function. We could call it g'() such that g(sqrt(z)) = g'(z), but it's the same to just relabel 'g'. – Sam Gallagher Mar 04 '21 at 18:29
  • How does from the independency of fX(x) and fY(y) normal follow that fX * fY is circularly symmetric in the first place? – Machinato Mar 04 '21 at 23:03
  • @Machinato Theyre not only independent, but identically distributed, by hypothesis. – Sam Gallagher Mar 05 '21 at 03:28
  • @Sam Gallagher Sure, but this does not explain the circular symmetry either. – Machinato Mar 05 '21 at 15:12
  • @Machinato Maybe you could elaborate more about what's confusing you? If two random variables are identically distributed, then in a plot of X vs Y the curves of equal probability are circles. Is the issue with this statement, or do you think I made an unjustified assumption or conclusion? (Which is perfectly possible) – Sam Gallagher Mar 06 '21 at 03:12
  • @Sam Gallagher Yes, with this statement. The problem is that in the „proof“ is implicitly assumed the property of exponentials that e^(x^2)*e^(y^2) = e^(x^2+y^2). Of course it turns out that the normal distribution is (1) circularly symmetric and (2) the joint probability is independent (i.e. it is given as a product). These two assumptions are however true **only** for **gaussians**, if you for example take two independent identically distributed variables with Cauchy distribution (with 1/(1+x^2)), the equal probability curves will no longer be circles. – Machinato Mar 07 '21 at 07:26
  • @Machinato that's interesting, I didn't know about that sort of counterexample. My thinking is, if X and Y are i.i.d., then $f_X(x) = f_Y(x)$ for all $x$, and $f_{X,Y}(x,y) = f_X(x)*f_Y(y)$. So (my thinking goes) the product is a symmetric function in x and y, and if the distribution wasn't circularly symmetric then we could find a point x,y such that the point y,x has a different probability. But now I wonder if this is sufficient, or even true. I'll look into it. – Sam Gallagher Mar 07 '21 at 11:58
  • I wonder if the fact that the joint pdf is circularly symmetric must be an assumption for the derivation to remain valid, because clearly just because we have f(x)*f(y) for the same function f() does not mean the surface has circular symmetry. But assuming the joint pdf is circularly symmetric, the rest follows. It's been a while since I wrote this so I need to check my sources and see what's usually done – Sam Gallagher Mar 07 '21 at 13:18
  • @SamGallagher If the square root has been pushed into the `g()`, why isn't it there in the end? Shouldn't `g()` be an exponential with a square root instead of just an exponential? – Nate Glenn Mar 08 '21 at 15:40
  • @NateGlenn It's worth noting first that the function $g$ doesn't matter, it can be anything as long as it obeys the requirements we set. It's used to derive the function $f_X(x)$, and I've been free to change it in the derivation. If you're hung up on it though, you're right, but look at the exponential's argument: $-\frac{(x-\mu)^2}{2\sigma^2}$. The $\frac{1}{2}$ term is there, and you could pull it out as a square root, but that's not a useful representation. – Sam Gallagher Mar 09 '21 at 15:33
5

This is a very old question, but there is a very interesting link where you can find the derivation of the density function of the Normal distribution. It should help in understanding the construction of the probability density function of the Normal distribution in a more lucid way.

Pro Q
  • 785
  • 6
  • 17
Pankaj
  • 193
  • 1
  • 4
  • Answers are best when they contain at least a summary of the information in the link; enough so that, should the link ever become broken, there is still useful information. This would probably be better as a comment -- even though I realize you do not have sufficient reputation to comment. – pjs36 Apr 21 '16 at 19:08
  • 4
    Dear pjs36, Due to low reputation I could not comment. I had to answer. – Pankaj Apr 23 '16 at 09:02
  • note the link is also now dead – tdc May 24 '18 at 17:21
  • 1
    I edited to link back to the Wayback Machine version, but it looks like the math symbols on that version are messed up. It might help for intuition, or if someone feels like decoding the symbols. – Pro Q Jun 11 '18 at 21:28
  • Link and symbols are working fine. Very useful to have a full PDF of the derivation – Mehdi LAMRANI Aug 22 '19 at 15:12
5

Let me present a derivation which does not use the circularity assumption of the "dart" proof and relies only on the Central limit theorem (CLT), i.e. that for any suitable set of i.i.d. variables $\{ X_1,X_2,\ldots,X_n \}$ with $\mathrm{E}\left[X\right]=\mu$ and $\mathrm{Var}\left[X\right]=\sigma^2$, we have

$$Z_n := \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \overset{n\rightarrow\infty}{\longrightarrow} Z \sim N(0,1)$$

where $N(0,1)$ is the so-called normal distribution (of the variable $Z$). The Central limit theorem provides universality: no matter which set we choose, we always get $Z\sim N(0,1)$, the same distribution. From this universality alone we will find the formula for the density function $f(z)$ of the standard normally distributed $Z$. Any other normal variable $X \sim N(\mu,\sigma)$ is obtained from $Z$ by shifting and scaling, and its density function must be $f\left((x-\mu)/\sigma\right)/\sigma$.
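(A quick simulation of this universality, my own sketch with numpy/scipy, using the same $\mathrm{Po}(1)$ variables that are chosen below: the standardized averages have mean $\approx 0$, variance $\approx 1$, and normal-looking tail probabilities.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, trials = 1_000, 10_000

# standardized averages Z_n of n i.i.d. Po(1) samples (mu = sigma = 1)
samples = rng.poisson(1, size=(trials, n))
z = (samples.mean(axis=1) - 1) * np.sqrt(n)

print(z.mean(), z.var())                          # ~0 and ~1
print(np.mean(z <= 1.0), stats.norm.cdf(1.0))     # both roughly 0.84
```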

1) Stirling approximation

Without proof (can be found elsewhere) we state that for large $n$ we have

$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n$$
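(A quick numeric check of the approximation, my own addition using only Python's math module: the ratio of $n!$ to the Stirling expression tends to $1$.)

```python
import math

for n in (10, 50, 100):
    stirling = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
    print(n, math.factorial(n) / stirling)   # ~1.0084, ~1.0017, ~1.0008: the ratio tends to 1
```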

2) Poisson distribution

The key to deriving the density function of the normal distribution is to choose some particular set of i.i.d. variables for which we can find the distribution of their sum (the average $\bar{X}$ is that sum divided by $n$), i.e. to choose $X_1$. There are multiple possibilities; in many derivations of the normal distribution it is common to choose $X_1 \sim \mathrm{Ber}(p)$ Bernoulli, so that the sum $S_n = X_1 + X_2 + \ldots + X_n \sim \mathrm{Bin}(n,p)$ is Binomial. In our approach, we set $X_1 \sim \mathrm{Po}(1)$. A random variable $X$ with a Poisson distribution $\mathrm{Po}(\lambda)$ has the well-known probability mass function:

$$\mathbb{P}\left[X = k\right]=\frac{\lambda^k}{k!}e^{-\lambda}, \qquad k=0,1,2,\ldots$$

Moreover $\mathrm{E}\left[X\right] = \mathrm{Var}\left[X\right] = \lambda$. For $X_1 \sim \mathrm{Po}(1)$ we have $\mu = \lambda = 1$ and $\sigma = \sqrt{\lambda} = 1$.

The Poisson distribution is the distribution of the number $X$ of uniformly distributed events occurring in a given finite time interval, where $\lambda = \mathrm{E}\left[X\right]$ is the average number of events.

We have chosen the Poisson distribution because of one special property: additivity. One can think of the sum $X_1 + X_2$ of two identical (same $\lambda$) independent Poissonian variables as the number of the same events in an interval with twice the duration of the previous one. Since

$$\mathrm{E}\left[X_1+X_2\right]=\mathrm{E}\left[X_1\right]+\mathrm{E}\left[X_2\right]=1 + 1 = 2$$

we immediately get (using the "events" description of Poisson distribution), that also

$$S_n = X_1 + X_2 + \ldots + X_n \sim \mathrm{Po}(n)$$
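(A quick simulation of this additivity, my own sketch with numpy/scipy: the empirical pmf of a sum of $n$ independent $\mathrm{Po}(1)$ variables matches the $\mathrm{Po}(n)$ pmf.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, trials = 30, 100_000

s_n = rng.poisson(1, size=(trials, n)).sum(axis=1)   # sum of n i.i.d. Po(1) variables

for k in (25, 30, 35):                               # compare empirical pmf with Po(n) pmf
    print(k, np.mean(s_n == k), stats.poisson.pmf(k, n))
```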

This distribution is discrete, again with step equal to $1$. However, on the left-hand side of the CLT we get

$$Z_n = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} = \frac{S_n - n}{\sqrt{n}}$$

Clearly, $S_n$ is shifted by $n$ and then divided by $\sqrt{n}$. This rescaling makes $\mathrm{Po}(n)$ effectively continuous in the limit $n\rightarrow\infty$, since the step is now $1/\sqrt{n}$ and the attainable values eventually cover all the real numbers.

Consider now the limit of large $n$. Let $z$ be a real number. We can write $z=(k-n)/\sqrt{n}$ for large natural $k$ and $n$ (in the sense of being close to the original real $z$). Then, using the definition of a (continuous) density function,

$$f(z)\approx\frac{1}{1/\sqrt{n}}\left(\mathbb{P}\left[Z_n < z + \frac{0.5}{\sqrt{n}}\right]-\mathbb{P}\left[Z_n < z - \frac{0.5}{\sqrt{n}}\right]\right)=\sqrt{n}\,\mathbb{P}\left[Z_n = z\right]$$

since there is only one attainable value of $Z_n$ in the interval $(z-0.5/\sqrt{n},\,z+0.5/\sqrt{n})$. But since

$$\mathbb{P}\left[Z_n = z\right] = \mathbb{P}\left[\frac{S_n - n}{\sqrt{n}} = \frac{k - n}{\sqrt{n}}\right]=\mathbb{P}\left[S_n = k\right]=\frac{n^k}{k!}e^{-n}$$

In the limit $n\rightarrow\infty$ with $k = n + z\sqrt{n}$ we then have explicitly

$$f(z) = \lim_{n\rightarrow \infty} \sqrt{n}\frac{n^{n + z\sqrt{n}}}{(n + z\sqrt{n})!}e^{-n}$$

Using Stirling's approximation

$$f(z) = \lim_{n\rightarrow \infty} \sqrt{n}\frac{e^{n + z\sqrt{n}}}{\sqrt{2\pi (n + z\sqrt{n})}}\left(\frac{n + z\sqrt{n}}{n}\right)^{-n - z\sqrt{n}}e^{-n} = \frac{1}{\sqrt{2\pi}}\lim_{n\rightarrow \infty} e^{z\sqrt{n}}\left(1+\frac{z}{\sqrt{n}}\right)^{-n - z\sqrt{n}}$$

3) Exponential function limit

This can be further simplified using the well-known limit

$$e^x = \lim_{n\rightarrow \infty}\left(1+\frac{x}{n}\right)^n$$

i.e. for example

$$\lim_{n\rightarrow \infty} \left(1+\frac{z}{\sqrt{n}}\right)^{- z\sqrt{n}} = e^{-z^2}$$

therefore

$$f(z) = \frac{e^{-z^2}}{\sqrt{2\pi}}\lim_{n\rightarrow \infty} e^{z\sqrt{n}}\left(1+\frac{z}{\sqrt{n}}\right)^{-n}$$

The last limit is evaluated by a Taylor expansion of the logarithm, since

$$\lim_{n\rightarrow \infty} e^{z\sqrt{n}}\left(1+\frac{z}{\sqrt{n}}\right)^{-n} = \exp\left(\lim_{n\rightarrow \infty} z\sqrt{n}-n\ln\left(1+\frac{z}{\sqrt{n}}\right)\right) = \exp\left(\lim_{n\rightarrow \infty} z\sqrt{n}-z\sqrt{n}+\frac{z^2}{2}\right)$$

immediately recovering

$$f(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}$$
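(One can check this limit numerically, my own sketch with scipy: already for moderate $n$, the quantity $\sqrt{n}\,\mathbb{P}\left[S_n = k\right]$ with $k = n + z\sqrt{n}$ is close to $e^{-z^2/2}/\sqrt{2\pi}$.)

```python
import numpy as np
from scipy import stats

z = 1.3
for n in (100, 10_000, 1_000_000):
    k = int(round(n + z * np.sqrt(n)))
    print(n, np.sqrt(n) * stats.poisson.pmf(k, n))   # approaches f(z)

print(np.exp(-z**2 / 2) / np.sqrt(2 * np.pi))        # f(1.3) ~ 0.1714
```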

Machinato
  • 2,328
  • 17
  • 29
5

The Probability Mass Function of the binomial distribution is given by $$ P(X=x)=\binom{n}{x}p^xq^{n-x}=\frac{n!}{(n-x)!.x!}.p^xq^{n-x} $$

$$ \dfrac{P(X=x+1)}{P(X=x)}=\frac{x!\,(n-x)!}{(x+1)!\,(n-x-1)!}\cdot\frac{p}{q}=\frac{(n-x)}{(x+1)}\cdot\frac{p}{q} $$ Set $t=\dfrac{x-np}{\sqrt{npq}}\implies x=np+t\sqrt{npq}$: $$ \dfrac{P(X=np+t\sqrt{npq}+1)}{P(X=np+t\sqrt{npq})}=\frac{(n-np-t\sqrt{npq})}{(np+t\sqrt{npq}+1)}\cdot\frac{p}{q} $$ $$ LHS=\frac{P\Big(\frac{X-np}{\sqrt{npq}}=t+\frac{1}{\sqrt{npq}}\Big)}{P\Big(\frac{X-np}{\sqrt{npq}}=t\Big)}=\frac{P(T=t+\delta_n)}{P(T=t)}, \qquad T=\frac{X-np}{\sqrt{npq}},\quad \delta_n=\frac{1}{\sqrt{npq}} $$ As $n\to\infty$, $\delta_n\to0$. $$ RHS=\frac{(n-np-t\sqrt{npq})}{(np+t\sqrt{npq}+1)}\cdot\frac{p}{q}=\frac{q-t\sqrt{\frac{pq}{n}}}{p+t\sqrt{\frac{pq}{n}}+\frac{1}{n}}\cdot\frac{p}{q}=\frac{1-t\sqrt{\frac{p}{nq}}}{1+t\sqrt{\frac{q}{np}}+\frac{1}{np}}=\frac{1-tp\delta_n}{1+tq\delta_n+q\delta_n^2} \\\implies \frac{P(T=t+\delta_n)}{P(T=t)}=\frac{1-tp\delta_n}{1+tq\delta_n+q\delta_n^2} $$ Assume the existence of a sufficiently smooth probability density function $f(t)$ such that, for large $n$, the probability $P(T=t)$ can be approximated by the differential $f(t)\,dt$, i.e. $P(T=t)\approx f(t)dt$ and $P(T=t+\delta_n)\approx f(t+\delta_n)dt\implies$ $\dfrac{P(T=t+\delta_n)}{P(T=t)}\approx\dfrac{f(t+\delta_n)}{f(t)}$. Hence $$ \frac{f(t+\delta_n)}{f(t)}=\frac{1-tp\delta_n}{1+tq\delta_n+q\delta_n^2} $$

$$ \log f(t+\delta_n)-\log f(t)=\log(1-tp\delta_n)-\log(1+tq\delta_n+q\delta_n^2)\\ \lim_{n\to\infty\\\delta_n\to 0}\Big[\frac{\log f(t+\delta_n)}{\delta_n}-\frac{\log f(t)}{\delta_n}\Big]\approx\lim_{n\to\infty\\\delta_n\to 0}\Big[\frac{\log(1-tp\delta_n)}{\delta_n}-\frac{\log(1+tq\delta_n+q\delta_n^2)}{\delta_n}\Big]\\ \frac{d}{dt}\log f(t)=\lim_{\delta_n\to 0}\Big[\frac{-tp}{1-tp\delta_n}\Big]-\lim_{\delta_n\to 0}\Big[\frac{tq+2q\delta_n}{1+tq\delta_n+q\delta_n^2}\Big]=-tp-tq=-t(p+q)=-t\\\log f(t)=\frac{-t^2}{2}+C\implies \boxed{f(t)=\mathcal{N}e^{{-t^2}/{2}}} $$ Normalizing the above function gives the normal/Gaussian density function.
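(A quick numerical check of this de Moivre-Laplace limit, my own sketch with scipy: $\sqrt{npq}\,P(X = np + t\sqrt{npq})$ approaches $e^{-t^2/2}/\sqrt{2\pi}$.)

```python
import numpy as np
from scipy import stats

p, t = 0.3, 0.8
q = 1 - p
for n in (100, 10_000, 1_000_000):
    x = int(round(n * p + t * np.sqrt(n * p * q)))
    print(n, np.sqrt(n * p * q) * stats.binom.pmf(x, n, p))   # approaches the limit below

print(np.exp(-t**2 / 2) / np.sqrt(2 * np.pi))                 # ~0.2897
```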

Sooraj S
  • 6,813
  • 3
  • 34
  • 68
2

There is an alternative (not purely mathematical) derivation of the Gaussian PDF which uses information-theoretic arguments. The idea, briefly, is this: let X be a continuous r.v.; the concept of differential entropy in some sense quantifies the "randomness" of a continuous r.v. and depends only on its PDF. Is there then a PDF for X which has the maximum amount of randomness possible? Among all PDFs with a given variance, the answer turns out to be the Gaussian PDF! The proof details are in the following link: https://en.wikipedia.org/wiki/Differential_entropy
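(A quick illustration of this, my own sketch with scipy: among a few densities normalised to unit variance, the Gaussian has the largest differential entropy.)

```python
import numpy as np
from scipy import stats

# differential entropies (in nats) of some unit-variance densities
print(stats.norm(scale=1).entropy())                                  # ~1.419  (largest)
print(stats.laplace(scale=1/np.sqrt(2)).entropy())                    # ~1.347
print(stats.uniform(loc=-np.sqrt(3), scale=2*np.sqrt(3)).entropy())   # ~1.242
```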

Also, if you have the time, look up "Digital Communications" by Simon Haykin, pages 43 and 44.

This is one of the reasons why communication engineers are fond of the Gaussian r.v.: designing systems with the Gaussian PDF model yields the "worst-case" performance.