$\newcommand{\erf}{\operatorname{erf}}$ This may be a very naïve question, but here goes.

The error function $\erf$ is defined by $$\erf(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}dt.$$ Of course, it is closely related to the normal cdf $$\Phi(x) = P(N < x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-t^2/2}dt$$ (where $N \sim N(0,1)$ is a standard normal) by the expression $\erf(x) = 2\Phi(x \sqrt{2})-1$.
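Since the conversion is the part that is hard to remember, here is the single substitution behind it, using $\Phi(0)=\tfrac12$ (by symmetry) and $t=\sqrt2\,u$:

$$\begin{aligned}
\Phi(x) &= \frac12 + \frac{1}{\sqrt{2\pi}}\int_0^x e^{-t^2/2}\,dt \\
&= \frac12 + \frac{1}{\sqrt{\pi}}\int_0^{x/\sqrt2} e^{-u^2}\,du \qquad (t=\sqrt2\,u,\ dt=\sqrt2\,du) \\
&= \frac12\left(1+\operatorname{erf}\!\left(\frac{x}{\sqrt2}\right)\right),
\end{aligned}$$

and replacing $x$ by $x\sqrt2$ gives back $\operatorname{erf}(x) = 2\Phi(x\sqrt2)-1$.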

My question is:

Why is it natural or useful to define $\erf$ normalized in this way?

I may be biased: as a probabilist, I think much more naturally in terms of $\Phi$. However, anytime I want to compute something, I find that my calculator or math library only provides $\erf$, and I have to go check a textbook or Wikipedia to remember where all the $1$s and $2$s go. Being charitable, I have to assume that $\erf$ was invented for some reason other than to cause me annoyance, so I would like to know what it is. If nothing else, it might help me remember the definition.
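For the practical problem of only having $\erf$ available, a minimal Python sketch that hard-codes the conversion once, so the $1$s and $2$s only have to be looked up one time. The function names `std_normal_cdf` and `std_normal_sf` are just illustrative; the code relies only on `math.erf` and `math.erfc` from the standard library (Python 3.8+ also ships `statistics.NormalDist().cdf`, which computes the same quantity).

```python
import math

def std_normal_cdf(x: float) -> float:
    """Phi(x) = P(N < x) for N ~ N(0, 1), via Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_normal_sf(x: float) -> float:
    """Upper tail P(N > x) = erfc(x / sqrt(2)) / 2.

    Using erfc directly avoids computing 1 - Phi(x), which loses all precision
    once Phi(x) rounds to 1.0 in floating point.
    """
    return 0.5 * math.erfc(x / math.sqrt(2.0))

if __name__ == "__main__":
    print(std_normal_cdf(1.96))  # approximately 0.975
    print(std_normal_sf(10.0))   # tiny but nonzero, whereas 1 - std_normal_cdf(10.0) is 0.0
```

The separate tail function is a design choice rather than a necessity: for moderate $x$ both routes agree, but far in the tail only the `erfc` route keeps any significant digits.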

Wikipedia says:

The standard normal cdf is used more often in probability and statistics, and the error function is used more often in other branches of mathematics.

So perhaps a practitioner of one of these mysterious "other branches of mathematics" would care to enlighten me.

The most reasonable expression I've found is that $$P(|N| < x) = \erf(x/\sqrt{2}).$$ This at least gets rid of all but one of the apparently spurious constants, but still has a peculiar $\sqrt{2}$ floating around.
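The identity follows from symmetry: $P(|N|<x) = \Phi(x)-\Phi(-x) = 2\Phi(x)-1 = \operatorname{erf}(x/\sqrt2)$. A minimal Python sketch (the name `prob_within` is hypothetical) that uses it to reproduce the familiar one-, two- and three-sigma probabilities:

```python
import math

def prob_within(x: float) -> float:
    """P(|N| < x) for N ~ N(0, 1), using P(|N| < x) = erf(x / sqrt(2))."""
    if x <= 0:
        return 0.0
    return math.erf(x / math.sqrt(2.0))

if __name__ == "__main__":
    for k in (1, 2, 3):
        # roughly 0.683, 0.954, 0.997: the "68-95-99.7 rule"
        print(k, prob_within(k))
```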

Nate Eldredge
  • I had assumed it was because you can expand both $\erf(x)$ and $\erf^{-1}(x)$ in a Taylor series about $0$, while you can't with $\Phi^{-1}$. I'm not sure about the scaling with $\sqrt{2}$, though. – Mike Spivey May 08 '11 at 21:03
  • What about symmetry: $\text{erf}(x)$ is an odd function... – Fabian May 08 '11 at 21:34
  • @Fabian: But this doesn't explain the strange constant $2/\sqrt{\pi}$ which seems to be the main point of contention. – t.b. May 08 '11 at 21:36
  • I find $\mathrm{erf}$ being an odd function a convenient property myself; the $2/\sqrt{\pi}$ to have $\lim\limits_{z\to\infty}\mathrm{erf}(z)=1$ is a bit of a nuisance I suppose... I guess I'm in the reverse situation with Nate; I have to dig up Abramowitz and Stegun to remember how the normal distribution CDF is expressed in terms of $\mathrm{erf}$. – J. M. ain't a mathematician May 08 '11 at 21:54
  • I think that it'd actually have been better if the normalizing constant were $\frac{1}{\sqrt{\pi}}$, so that the infinite limits are $\pm \frac{1}{2}$, instead of $\pm 1$. In that way, $\mathrm{erf}(b) - \mathrm{erf}(a)$ would directly give you the probability of $[a, b]$ under the unit normal with no weird scaling - note $\frac{1}{2}$ is the probability of half a normal. – The_Sympathizer Jul 21 '19 at 10:33
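Mike Spivey's and Fabian's comments can be made concrete: integrating the series of $e^{-t^2}$ term by term gives a Maclaurin series for $\operatorname{erf}$ containing only odd powers of $x$,
$$\operatorname{erf}(x) = \frac{2}{\sqrt\pi}\sum_{n=0}^{\infty}\frac{(-1)^n x^{2n+1}}{n!\,(2n+1)},$$
which makes the oddness visible. A short Python sketch that sums it (`erf_taylor` is a hypothetical helper, not a library function) and can be compared against `math.erf`:

```python
import math

def erf_taylor(x: float, terms: int = 40) -> float:
    """Partial sum of erf(x) = (2/sqrt(pi)) * sum_n (-1)^n x^(2n+1) / (n! (2n+1)).

    Only odd powers of x appear, so every partial sum is an odd function of x.
    The series converges for all x but is only practical for small |x|: the
    alternating terms grow before they shrink, so cancellation sets in.
    """
    total = 0.0
    term = x  # holds (-1)^n x^(2n+1) / n!, starting at n = 0
    for n in range(terms):
        total += term / (2 * n + 1)
        term *= -x * x / (n + 1)  # advance to (-1)^(n+1) x^(2n+3) / (n+1)!
    return 2.0 / math.sqrt(math.pi) * total

if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0):
        print(x, erf_taylor(x), math.erf(x))
```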

2 Answers


Some paper chasing netted this short article by George Marsaglia, in which he also quotes the article by James Glaisher where the error function was given a name and notation (but with a different normalization). Here's the relevant section of the paper:

In 1871, J.W. Glaisher published an article on definite integrals in which he comments that while there is scarcely a function that cannot be put in the form of a definite integral, for the evaluation of those that cannot be put in the form of a tolerable series we are limited to combinations of algebraic, circular, logarithmic and exponential—the elementary or primary functions. ... He writes:

The chief point of importance, therefore, is the choice of the elementary functions; and this is a work of some difficulty. One function however, viz. the integral $\int_x^\infty e^{-x^2}\mathrm dx$, well known for its use in physics, is so obviously suitable for the purpose, that, with the exception of receiving a name and a fixed notation, it may almost be said to have already become primary... As it is necessary that the function should have a name, and as I do not know that any has been suggested, I propose to call it the Error-function, on account of its earliest and still most important use being in connexion with the theory of Probability, and notably with the theory of Errors, and to write

$$\int_x^\infty e^{-x^2}\mathrm dx=\mathrm{Erf}(x)$$

Glaisher goes on to demonstrate use of $\mathrm{Erf}$ in the evaluation of a variety of definite integrals. We still use "error function" and $\mathrm{Erf}$, but $\mathrm{Erf}$ has become $\mathrm{erf}$, with a change of limits and a normalizing factor: $\mathrm{erf}(x)=\frac2{\sqrt{\pi}}\int_0^x e^{-t^2}\mathrm dt$ while Glaisher’s original $\mathrm{Erf}$ has become $\mathrm{erfc}(x)=\frac2{\sqrt{\pi}}\int_x^\infty e^{-t^2}\mathrm dt$. The normalizing factor $\frac2{\sqrt{\pi}}$ that makes $\mathrm{erfc}(0)=1$ was not used in early editions of the famous “A Course in Modern Analysis” by Whittaker and Watson. Both were students and later colleagues of Glaisher, as were other eminences from Cambridge mathematics/physics: Maxwell, Thomson (Lord Kelvin), Rayleigh, Littlewood, Jeans, Whitehead and Russell. Glaisher had a long and distinguished career at Cambridge and was editor of The Quarterly Journal of Mathematics for fifty years, from 1878 until his death in 1928.
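Marsaglia's remark can be checked directly: Glaisher's $\mathrm{Erf}(x)=\int_x^\infty e^{-t^2}\,\mathrm dt$ equals $\frac{\sqrt\pi}{2}\,\mathrm{erfc}(x)$, so $\frac{\sqrt\pi}{2}$ is exactly the factor the modern normalization divides out. A minimal Python sketch (both function names are hypothetical) comparing that formula with a plain composite Simpson's-rule quadrature of the original integral:

```python
import math

def glaisher_erf(x: float) -> float:
    """Glaisher's 1871 Erf(x), the integral of exp(-t^2) from x to infinity,
    written via the modern complementary error function: (sqrt(pi)/2) * erfc(x)."""
    return 0.5 * math.sqrt(math.pi) * math.erfc(x)

def tail_integral_simpson(x: float, width: float = 10.0, n: int = 2000) -> float:
    """Composite Simpson's rule for the integral of exp(-t^2) over [x, x + width].

    For x >= 0 the integrand beyond x + width is below exp(-100), so this
    approximates the infinite tail. n must be even.
    """
    h = width / n
    s = math.exp(-x * x) + math.exp(-(x + width) ** 2)
    for i in range(1, n):
        t = x + i * h
        s += (4 if i % 2 else 2) * math.exp(-t * t)  # Simpson weights 4, 2, 4, ...
    return s * h / 3.0

if __name__ == "__main__":
    for x in (0.0, 1.0, 2.0):
        print(x, glaisher_erf(x), tail_integral_simpson(x))
```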

It is unfortunate that changes from Glaisher’s original $\mathrm{Erf}$: the switch of limits, names and the standardizing factor, did not apply to what Glaisher acknowledged was its most important application: the normal distribution function, and thus $\frac1{\sqrt{2\pi}}\int e^{-\frac12t^2}\mathrm dt$ did not become the basic integral form. So those of us interested in its most important application are stuck with conversions...

...A search of the Internet will show many applications of what we now call $\mathrm{erf}$ or $\mathrm{erfc}$ to problems of the type that seemed of more interest to Glaisher and his famous colleagues: integral solutions of differential equations. These include the telegrapher’s equation, studied by Lord Kelvin in connection with the Atlantic cable, and Kelvin’s estimate of the age of the earth (25 million years), based on the solution of a heat equation for a molten sphere (it was far off because of then unknown contributions from radioactive decay). More recent Internet mentions of the use of $\mathrm{erf}$ or $\mathrm{erfc}$ for solving differential equations include short-circuit power dissipation in electrical engineering, current as a function of time in a switching diode, thermal spreading of impedance in electrical components, diffusion of a unidirectional magnetic field, recovery times of junction diodes and the Mars Orbiter Laser Altimeter.
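As one illustration of these "integral solutions of differential equations" (the standard textbook example, not a reconstruction of Kelvin's actual calculation): on the half-line $x>0$, the heat equation $u_t=\kappa u_{xx}$ with $u(x,0)=0$ and $u(0,t)=1$ is solved by $u(x,t)=\operatorname{erfc}\!\big(x/(2\sqrt{\kappa t})\big)$, and the $\frac2{\sqrt\pi}$ normalization is what makes the boundary value come out as exactly $1$. The Python sketch below (function names hypothetical) checks the PDE numerically with central finite differences.

```python
import math

def half_space_solution(x: float, t: float, kappa: float) -> float:
    """u(x, t) = erfc(x / (2 sqrt(kappa t))), which solves u_t = kappa u_xx for
    x > 0, t > 0 with u(0, t) = 1 and u(x, t) -> 0 as t -> 0+ (for fixed x > 0)."""
    return math.erfc(x / (2.0 * math.sqrt(kappa * t)))

def heat_residual(x: float, t: float, kappa: float, h: float = 1e-3) -> float:
    """Central-difference estimate of u_t - kappa * u_xx at (x, t); should be ~0.

    Requires t > h so that the backward time step stays at positive time.
    """
    def u(xx: float, tt: float) -> float:
        return half_space_solution(xx, tt, kappa)

    u_t = (u(x, t + h) - u(x, t - h)) / (2.0 * h)
    u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / (h * h)
    return u_t - kappa * u_xx

if __name__ == "__main__":
    print(heat_residual(1.0, 1.0, 0.5))  # small, on the order of the O(h^2) truncation error
```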

On the other hand, for the applications where the error function is to be evaluated at complex values (spectroscopy, for instance), probably the more "natural" function to consider is Faddeeva's (or Voigt's) function:
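
$$w(z)=e^{-z^2}\operatorname{erfc}(-iz)=\frac{i}{\pi}\int_{-\infty}^{\infty}\frac{e^{-t^2}}{z-t}\,\mathrm dt\qquad(\operatorname{Im}z>0);$$
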
there, the normalization factor simplifies most of the formulae in which it is used. In short, I suppose the choice of whether you use the error function or the normal distribution CDF $\Phi$ or the Faddeeva function in your applications is a matter of convenience.

J. M. ain't a mathematician
  • Interesting that in the Marsaglia article, the notation $x$ is used as both the dummy variable and limit of integration in the first two integrals, which I see you've transcribed verbatim. I don't have access to the Glaisher article. Is that really the notation *he* uses? – cardinal Sep 04 '11 at 13:37
  • @cardinal: I didn't want to edit a quote (within a quote). ;) I will have to admit I haven't peered at Glaisher's paper yet. I'll see if I can get to it, and will ping you if I have something... – J. M. ain't a mathematician Sep 04 '11 at 13:42
  • Indeed, on page 296 of the Glaisher article, $x$ is used for both purposes. In fact, he uses this in the rest of the article as well. – cardinal Sep 04 '11 at 14:05
  • I see, thanks @cardinal! If it's not too much trouble, could you send me a copy? My e-mail address is at my profile. – J. M. ain't a mathematician Sep 04 '11 at 14:06
  • Done. – cardinal Sep 04 '11 at 21:28
  • I suspect one reason may be that integrating $e^{-x^2}$, from a mathematical point of view, has slightly smaller symbolic complexity (namely, there is no quotient or constant in the exponent). Thus, as an extension of the set of operations for solving integrals, which seems to be its original point, it would make more sense this way, even if it makes less sense from the point of view of the specific and very common application to the normal, i.e. the so-called "law of errors" from which it derives its name. – The_Sympathizer Dec 03 '18 at 05:29

I think the normalization in $x$ is easy to account for: it's natural to write down the integral $\int_0^x e^{-t^2} \, dt$ as an integral even if it's not actually the most natural probabilistic quantity. So it remains to explain the normalization in $y$, and as far as I can tell this is so that $\lim_{x \to \infty} \text{erf}(x) = 1$.
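Concretely, the constant comes from the Gaussian integral $\int_{-\infty}^{\infty} e^{-t^2}\,dt=\sqrt\pi$, halved by symmetry:

$$\int_0^\infty e^{-t^2}\,dt=\frac{\sqrt\pi}{2}\quad\Longrightarrow\quad\lim_{x\to\infty}\operatorname{erf}(x)=\frac{2}{\sqrt\pi}\cdot\frac{\sqrt\pi}{2}=1,$$

and since $\operatorname{erf}$ is odd, $\lim_{x\to-\infty}\operatorname{erf}(x)=-1$.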

Beyond that, the normalization's probably stuck more for historical reasons than anything else.

Qiaochu Yuan