The Error function

$\mathrm{erf}(x)=\frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt$

shows up in many contexts, but can't be represented using elementary functions.

I compared it with another function $f$ which also starts out linearly, satisfies $f(0)=0$, and converges quickly to the constant value 1, namely

$\tanh{(x)} = \frac {e^x - e^{-x}} {e^x + e^{-x}}$.

Astonishingly (to me), I found that they never differ by more than $|\Delta f|=0.0812$ and converge toward each other exponentially fast!

I consider $\tanh{(x)}$ to be the somewhat prettier function, and so I wanted to approximate $\text{erf}$ by a short expression in "nice" functions. I "naturally" tried

$f(x)=A\cdot\tanh(k\cdot x^a-d)$

Changing $A=1$ or $d=0$ on its own makes the approximation worse, and the exponent $a$ is a bit difficult to deal with. However, I found that for $k=\sqrt{\pi}\log{(2)}$ the situation gets "better". I obtained that value of $k$ from the requirement that the "norm"

$$N(k)=\int_0^\infty\Big[\operatorname{erf}(x)-\tanh(kx)\Big]\,dx,$$

i.e. the difference of the two functions' areas, should vanish. With this value, the maximal value difference even falls below $|\Delta f| = 0.03$. And however you choose the integration bounds for an interval, the area difference is no more than $0.017$.


Numerically speaking and relative to a unit scale, the functions $\text{erf}$ and $\tanh{(\sqrt{\pi}\log{(2)}x)}$ are essentially the same.

My question is whether I can find, or whether there are known, substitutes for this non-elementary function in terms of elementary ones, in the sense above: the approximation should be compact and memorable while the values are even better, from a numerical point of view.

The purpose being, for example, that if I see somewhere that for a computation I have to integrate erf, I can think to myself: "oh, that may be complicated, but within bounds of $10^{-3}$, using e.g. $\tanh(k\cdot x)$ is an incredibly accurate approximation."
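For anyone who wants to reproduce these numbers, here is a quick sketch in Python (not part of the original question; it uses the standard library's `math.erf` as the reference):

```python
import math

K = math.sqrt(math.pi) * math.log(2)  # k = sqrt(pi) * log(2) from the question

# Scan a fine grid for the maximum pointwise difference; both functions
# are odd, so x >= 0 suffices.
xs = [i / 1000 for i in range(10001)]  # x in [0, 10]
err_plain = max(abs(math.erf(x) - math.tanh(x)) for x in xs)
err_tuned = max(abs(math.erf(x) - math.tanh(K * x)) for x in xs)

print(f"max |erf(x) - tanh(x)|  = {err_plain:.4f}")   # about 0.0812
print(f"max |erf(x) - tanh(kx)| = {err_tuned:.4f}")   # below 0.03
```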

    Related article: [*A handy approximation for the error function and its inverse*](https://1e47a410-a-62cb3a1a-s-sites.googlegroups.com/site/winitzki/sergei-winitzkis-files/erf-approx.pdf?attachauth=ANoY7cpNSBkV5DaEl8yw203bs9kzVWMxUqytohD1_-3k_MsH6nCBUV6s7_-DpZ50YUglnmDxCbrJtMMyqhWeMXV79CwdzS_BgF_emXzHac3gL6NS5XMLvflwbLPCZnfFD6OQeemOXEE0MGWDnXydDZIr797BcovvBMy-xhr1yJxzsKuMxFi7kG76t4bcJlFEOywtfo9No1XV4kBkyiVNMfaHHXcX-99f2hS_ybSk9MMLw5Pu6aXK4Sc%3D&attredirects=0). – Nikolaj-K Jun 10 '14 at 19:10
  • This is a very handy approximation for which inversion is also quite handy. Thank you! – prime Aug 15 '17 at 06:03
  • A related Stack Exchange question that I found enlightening: https://math.stackexchange.com/questions/1892553/why-the-error-function-is-so-similar-to-the-hyperbolic-tangent – Corey Levinson Nov 13 '19 at 18:46

12 Answers


It depends on how much accuracy you need and over what interval. It seems that you are happy with a few percent. There is an approximation in Abramowitz & Stegun that gives $\text{erf}$ in terms of a rational polynomial times a Gaussian over $[0,\infty)$ to $\sim 10^{-5}$ accuracy.
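For reference, the best known of these Hastings-type fits (listed in A&S as formula 7.1.26, if memory serves; the coefficients below are quoted from memory and should be checked against the book) is short enough to code directly:

```python
import math

def erf_as(x: float) -> float:
    """Rational-polynomial-times-Gaussian fit for erf(x), x >= 0.

    Advertised accuracy of the A&S/Hastings fit: |error| <= 1.5e-7.
    Coefficients quoted from memory -- verify against the handbook.
    """
    t = 1.0 / (1.0 + 0.3275911 * x)
    poly = t * (0.254829592
           + t * (-0.284496736
           + t * (1.421413741
           + t * (-1.453152027
           + t * 1.061405429))))
    return 1.0 - poly * math.exp(-x * x)

worst = max(abs(erf_as(i / 100) - math.erf(i / 100)) for i in range(1001))
print(f"max |erf_as - erf| on [0, 10]: {worst:.2e}")  # around 1e-7
```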

In case you care, in the next column, there is a series for erf of a complex number that is accurate to $10^{-16}$ relative error! I have used this in my work and got incredible accuracy with just one term in the sum.

Ron Gordon
  • Do you happen to know what the integrals of those approximations are (from negative to positive infinity)? I'm asking for the cases where we need to avoid letting the total area go over 1. – user541686 Jan 24 '14 at 05:06
  • Hello. Following your link to [Abramowitz & Stegun](http://people.math.sfu.ca/~cbm/aands/page_299.htm), one can read that they borrowed those approximations from Hastings, *Approximations for Digital Computers*, but neither Hastings nor A&S explains how to obtain those approximations. Do you happen to know how to do that, or where this has been done? Thank you. – Antoine Jul 10 '15 at 18:05
  • Hi, I'm a bit confused as, following the link, I am led to the homepage of a Prof. MacDonald, with no mention of a paper by Abromowitz & Stegun. Has the link changed, and if so could you provide an updated one? – YiFan Nov 03 '21 at 05:19
  • @YiFan There is a link to the book in the linked-to webpage – Ron Gordon Nov 15 '21 at 20:35
  • @RonGordon thanks. No idea how I missed it the first time! – YiFan Nov 16 '21 at 00:04

A logistic distribution $F$ -- which can be expressed as a rescaled hyperbolic tangent -- can closely approximate the normal distribution function $\Phi$. Likewise, its inverse function -- the "logit" function $F^{-1}$ -- can be rescaled to approximate the inverse normal CDF -- the "probit" function $\Phi^{-1}$.

In comparison, the logistic distribution has fatter tails (which may be desirable). Whereas the normal distribution's CDF and inverse CDF ("probit") cannot be expressed using elementary functions, closed-form expressions for the logistic distribution's CDF and its inverse are easily derived and involve only elementary functions.

The logistic distribution arises from the differential equation $\frac{d}{dx}f(x) = f(x)(1-f(x))$. Intuitively, this function is typically used to model a growth process in which the rate behaves like a bell curve.

In comparison, the normal distribution arises from the following differential equation: $\frac{d \,f(x)}{dx}=f(x)\frac{(\mu-x)}{\sigma^2}$. The normal distribution is commonly used to model diffusion processes. E.g., a Wiener process is a stochastic process with independent, normally distributed increments with mean $\mu$ and variance $\sigma^2$; in the limit, this is Brownian motion.

Interestingly, the logistic distribution arises from a physical process analogous to Brownian motion: it is the "limit distribution of a finite-velocity damped random motion described by a telegraph process in which the random times between consecutive velocity changes have independent exponential distributions with linearly increasing parameters."

Note that the CDF of the logistic distribution $F$ can be expressed using the hyperbolic tangent function:

$F(x;\mu ,s)={\frac {1}{1+e^{{-{\frac {x-\mu }{s}}}}}}={\frac 12}+{\frac 12}\;\operatorname {Tanh}\!\left({\frac {x-\mu }{2s}}\right)$

Given that the distribution's variance is ${\tfrac {s^{2}\pi ^{2}}{3}}$, the logistic distribution can be scaled to approximate the standard normal distribution by choosing $s^2=\frac{3}{\pi ^2}$, so that its variance equals $1$. The resulting approximation has the same first and second moments as the normal distribution, but is fatter tailed (i.e., "platykurtic").

Also, $\Phi$ is related to the error function (and its complement) by: $\Phi (x)={\frac {1}{2}}+{\frac {1}{2}}\operatorname {erf} \left(x/{\sqrt {2}}\right)={\frac {1}{2}}\operatorname {erfc} \left(-x/{\sqrt {2}}\right)$

The chief advantage to approximating normal with the logistic distribution is that the CDF and Inverse CDF can be easily expressed using elementary functions. Several fields of applied science utilize this approximation.

The main disadvantage, however, is the estimation error. The maximum absolute error between the scaled logistic function and the normal CDF is $0.0226628$, attained at $X = \pm 0.682761$. Furthermore, the maximum error between the inverse logistic function (logit) and the probit function over $p\in[0.01,\,0.99]$ is $0.0802364$, attained at $p = 0.841941$; outside that range the error grows without bound. It is important to note that these functions behave very differently in "the tails".

Thus, for a standard normal distribution with $\mu =0$ and $\sigma =1$: $$\operatorname{erf}(\frac{x}{\sqrt{2}}) \approx \operatorname{Tanh}\left(\frac{x \, \pi}{2 \sqrt{3}} \right) \equiv \frac{e^{\frac{\pi\,x}{\sqrt{3}}}-1}{e^{\frac{\pi\,x}{\sqrt{3}}}+1} $$

$$\operatorname{erf}(x) \approx \operatorname{Tanh}\left(\frac{x \, \pi}{ \sqrt{6}} \right) \equiv \frac{e^{\pi\,x\sqrt{2/3}}-1}{e^{\pi\,x\sqrt{2/3}}+1} $$

$$\Phi \left( x \right) \approx \frac{1}{2} + \frac{1}{2} \operatorname{Tanh} \left( \frac{\pi \, x}{2 \sqrt{3}} \right) $$

And, inverting easily: $$\Phi^{-1}\left(p\right) \approx -\frac{2\sqrt{3}\operatorname{ArcTanh}\left( 1-2p \right)}{\pi}$$
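The maximum-error figure quoted above can be cross-checked quickly in plain Python (a sketch using only the standard library's `math.erf`; `phi_logistic` is just the scaled-tanh formula above):

```python
import math

def phi(x):
    # exact standard normal CDF via erf
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def phi_logistic(x):
    # scaled-tanh (logistic) approximation from above
    return 0.5 + 0.5 * math.tanh(math.pi * x / (2 * math.sqrt(3)))

# both CDFs are symmetric about 0.5, so scanning x >= 0 suffices
xs = [i / 1000 for i in range(5001)]  # x in [0, 5]
x_star, err = max(((x, abs(phi(x) - phi_logistic(x))) for x in xs),
                  key=lambda t: t[1])
print(f"max |Phi - approx| = {err:.6f} at x = {x_star:.3f}")  # ~0.02266 near x ~ 0.683
```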

Mathematica to analyze errors:


normsdistApprox[X_] = (1/2 + 1/2 Tanh[(\[Pi] X)/(2 Sqrt[3])]) 
normsinvApprox[p_] = X /. Solve[ p == normsdistApprox[X]  , X][[1]]
normalPDFApprox = D[normsdistApprox[X], X]


Plot[{CDF[NormalDistribution[], X], normsdistApprox[X]}, {X, -5, 5}, 
 PlotLabel -> "Logistic Approximation to Normal CDF", 
 PlotLegends -> "Expressions", ImageSize -> 600]

Plot[{Abs[CDF[NormalDistribution[], X] - normsdistApprox[X] ]}, {X, 0,
   5}, PlotLabel -> 
  "Error of the Logistic Approximation to the Normal CDF", 
 ImageSize -> 600]

Plot[{InverseCDF[NormalDistribution[0, 1], p], normsinvApprox[p]}, {p,
   0, 1}, PlotLabel -> 
  "Logistic Approximation to the Inverse Normal CDF (Probit \
Function)", PlotLegends -> "Expressions", ImageSize -> 600]

Plot[{InverseCDF[NormalDistribution[0, 1], p] - 
   normsinvApprox[p]}, {p, 0, 1}, 
 PlotLabel -> 
  "Error of the Logit Approximation to the Inverse Normal CDF (Probit \
Function)", PlotLegends -> "Expressions", ImageSize -> 600]

(Plots: the logistic approximation to the normal CDF and its error, and the logit approximation to the inverse normal CDF (probit function) and its error, as produced by the code above.)

Lastly, the maximum errors:

FindMaximum[Abs[CDF[NormalDistribution[], X] - normsdistApprox[X]], X]

FindMaximum[{Abs[
   InverseCDF[NormalDistribution[0, 1], p] - normsinvApprox[p]], 
  p >= 1*10^-2, p <= 1 - 1*10^-2}, p]

FindMaximum[{Abs[
   InverseCDF[NormalDistribution[0, 1], p] - normsinvApprox[p]], 
  p >= 1*10^-16, p <= 1*10^-2}, p]


{0.0226628, {X -> 0.682761}}

{0.0802364, {p -> 0.841941}}

{12.032, {p -> 1.*10^-16}}

I suspect the reason the $\tanh x$ solution "works" so well is that it happens to be the second-order Padé approximant in $e^x$. Unfortunately, higher-order Padé approximants don't seem to work as well. One more thing you could do is approximate $\text{erf}(x)$ only on $(-3,3)$, and take it to be $\pm 1$ everywhere else.
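A one-line check (in Python, using the stdlib) of how cheap that clamp is: the worst error introduced by clamping to $\pm1$ outside $(-3,3)$ is $1-\operatorname{erf}(3)=\operatorname{erfc}(3)$, which is already tiny:

```python
import math

# worst-case error of clamping erf to +/-1 outside (-3, 3)
clamp_err = math.erfc(3.0)
print(f"erfc(3) = {clamp_err:.3e}")  # about 2.2e-5
```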


I pointed out this close correspondence in Section 2.4 of L. Ingber, "Statistical mechanics of neocortical interactions. I. Basic formulation," *Physica D* 5, 83-107 (1982). [URL: http://www.ingber.com/smni82_basic.pdf]

Lester Ingber

In addition to the answers above, there are two things to note which may be of importance. Both relate to the fact that, depending on your application, the approximation may not be as good as it looks. This might lead you to choose other approximations, like the ones already mentioned.

First, the tail behaviour of the $\mathrm{erf}$ and $\tanh$ functions is very different. Asymptotically, $1-\mathrm{erf}(x)$ decays like $e^{-x^2}$, whereas $1-\tanh(x)$ decays only like $e^{-2x}$. Roughly speaking, a one-in-100-years event in the normal distribution becomes a one-in-10-years event in the $\tanh$ approximation; this might matter.

Something else to note is that values of $\mathrm{erf}$ usually represent probabilities. But differences of probabilities are not meaningful quantities. Hence, depending on your application, another similarity measure may be more appropriate, for example the Kullback-Leibler divergence $-\left[p\log{q}+(1-p)\log(1-q)\right]$, where $p=\mathrm{erf}(\dots)$ and $q=\tanh(\dots)$.

As a sidenote: you can get similarly good approximations with fewer operations by stitching together two exponentials, e.g. $f(x)=\operatorname{sgn}(x)\left(1-e^{-\alpha|x|}\right)$.
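A quick numerical sketch of the tail divergence (Python, stdlib `math.erf`; the scaled-tanh CDF $\tfrac12+\tfrac12\tanh\!\big(\tfrac{\pi x}{2\sqrt3}\big)$ is the one from the other answers):

```python
import math

K = math.pi / (2 * math.sqrt(3))  # scale used in the scaled-tanh CDF approximation

def normal_tail(x):
    # P(X > x) for a standard normal, via erfc
    return 0.5 * math.erfc(x / math.sqrt(2))

def tanh_tail(x):
    # tail probability of the tanh/logistic approximation
    return 0.5 * (1 - math.tanh(K * x))

# ratio of approximate to true tail probability grows quickly with x
ratios = {x: tanh_tail(x) / normal_tail(x) for x in (2, 3, 4)}
print(ratios)
```

The ratio is still close to 1 at $x=2$ but exceeds an order of magnitude by $x=4$, which is exactly the "1-in-100 becomes 1-in-10" effect described above.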


Transforming a non-elementary function with elementary functions into a form that is well approximated with a rational polynomial seems to be a good approach.

Notice that $$ 1-\operatorname{erf}\left(x\right)^{2} \approx e^{-x^{2}} $$ Now there are two approaches you can take to turn this into an approximately rational polynomial. Taking the log of both sides yields $$ \ln\left(1-\operatorname{erf}\left(x\right)^{2}\right) \approx -x^{2} $$ Substituting $x\rightarrow \sqrt{\ln\left(x\right)}$ yields $$ 1-\operatorname{erf}\left(\sqrt{\ln\left(x\right)}\right)^{2} \approx \frac{1}{x} $$ Both of these transformations of $\operatorname{erf}$ are invertible. Finding the Padé approximant for either of these will yield an astoundingly good approximation with very few terms. In the case of the first one, $$ \ln\left(1-\operatorname{erf}\left(x\right)^{2}\right) \approx x^{2}\,\frac{-1.27324-0.074647x^{2}}{1+0.0886745x^{2}} $$ and so for $x \ge 0$ $$ \operatorname{erf}\left(x\right)\approx \sqrt{1-e^{x^{2}\frac{-1.27324-0.074647x^{2}}{1+0.0886745x^{2}}}} $$ whose accuracy is about 4 decimal places at worst, and which converges at $0$ and infinity.

Unfortunately this is not invertible.
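A quick Python check of this fit's accuracy (stdlib `math.erf`; `erf_pade` is just the expression above, valid for $x\ge0$):

```python
import math

def erf_pade(x):
    # sqrt(1 - exp(rational fit of ln(1 - erf^2))) from above, x >= 0
    u = x * x
    r = u * (-1.27324 - 0.074647 * u) / (1 + 0.0886745 * u)
    return math.sqrt(1 - math.exp(r))

worst = max(abs(erf_pade(i / 100) - math.erf(i / 100)) for i in range(1001))
print(f"worst error on [0, 10]: {worst:.1e}")  # on the order of 1e-4
```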

Trey Reynolds

I found my own very compact and nice, but most importantly readily reversible, approximation: $\operatorname{erf}(x)\approx \frac2\pi\arctan\left[2x(1+x^4)\right]$ (error below 2%).


I like the tanh approximation given, but I noticed a possible correction you could use. If you look at the plot of the error of the approximation against $\operatorname{erf}$, it is almost a damped sine wave of the form $-e^{-\lambda x} \sin({\alpha \pi x})$. My tweaked version of the approximation is $\operatorname{erf}(x)\approx \tanh(kx)-Ce^{-\lambda x} \sin({\alpha \pi x})$, with the parameters $k=\sqrt\pi \ln{2}$, $C=0.01$, $\lambda = 0.25$, $\alpha = \frac{0.975}{0.1+\sqrt[3] {|x|}}$. (The addition in the denominator is there to prevent division by zero; it can be made larger or smaller to adjust the fit, and the root can likewise be changed to adjust the fit. However, it should be the root of the absolute value of $x$, to avoid using imaginary numbers where they're not needed and to maintain the sign of $\pi x$.)

Unfortunately, my fit is still far from perfect: I simply eyeballed the additional term, and I hope someone will adjust the parameters for a better fit. Probably at least an additional $C' e^{-\lambda' x} \sin({\beta \pi x})$ term will be necessary to smooth the fit further.

I hope my input was helpful, although I'm pretty late to the party!

  • Thanks for the answer. Btw. both tanh and erf should also be TeX functions. In any case, it's been over 5 years since I was interested in this, so lala :) – Nikolaj-K May 02 '19 at 22:03

Too long for a comment.

The term $k_0=\sqrt \pi \log(2)$ is really elegant.

If we consider the norm $$\Phi(k)=\int_0^\infty \Big[\text{erf}(x)-\tanh (k x)\Big]^2\,dx,$$ which corresponds to a least-squares regression based on an infinite number of data points, we get a slightly different optimum value which, rationalized, is $k_1=\frac{605}{503}$ (a nontrivial value). $$\Phi(k_0)=5.28 \times 10^{-4} \qquad \qquad \qquad \Phi(k_1)=4.44 \times 10^{-4}$$ In terms of maximum absolute error, $k_0$ leads to $0.027$ while $k_1$ leads to $0.019$.
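These values can be reproduced with a rough numerical quadrature in Python (a sketch: simple trapezoidal rule, stdlib `math.erf`; the integrand is negligible beyond $x=10$):

```python
import math

def Phi_norm(k, upper=10.0, n=20000):
    # trapezoidal approximation of the squared-difference norm on [0, upper]
    h = upper / n
    total = 0.0
    for i in range(n + 1):
        x = i * h
        d = (math.erf(x) - math.tanh(k * x)) ** 2
        total += d / 2 if i in (0, n) else d
    return total * h

k0 = math.sqrt(math.pi) * math.log(2)  # ~1.22868
k1 = 605 / 503                         # ~1.20278
print(Phi_norm(k0))  # ~5.28e-4
print(Phi_norm(k1))  # ~4.44e-4
```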

Claude Leibovici

Nice! I found one very efficient approximation with a maximum difference of $0.01747$. It contains a grand total of 1 division, 1 addition, 1 power, and 2 absolute values:

$$ \operatorname{aerf}(x)=\frac x {|x|+0.187^{|x|}} $$

For reference: $$ d(x)=\operatorname{aerf}(x)-\operatorname{erf}(x) $$

And you can reverse it, using more than elementary functions (courtesy of njuffa), but that's hard.

The three largest error peaks (the error falls off on both sides of each) are:

  1. at $x = 0.5982$: $0.01747$ above
  2. at $x = 1.5582$: $0.01741$ below
  3. at $x = 0.11439$: $0.00681$ below

It is $0.01251$ below at $x = 2$, $0.000045$ below at $x = 5$, and $5.22\times10^{-9}$ below at $x = 10$.

As $x$ approaches infinity or $0$, $d(x)$ approaches $0$:

$$ \frac{0}{0+0.187^0}=\frac{0}{1}=0 $$ $$ \lim_{x\rightarrow\infty}\frac{x}{|x|+0.187^{|x|}}=1 $$

Re: njuffa's comment below: $\frac{289}{1545}$ is $5.501\times10^{-5}$ more than $0.187$, and it is indeed more accurate, but only by about $0.00004$ at the greatest difference, so not by much.

If you don't know, $\operatorname{erf}'(x)$ is a scaled normal distribution (a multiple of $e^{-x^2}$); in particular $$\operatorname{erf}'(\sqrt{\pi}\,x)\,\frac{\sqrt{\pi}}2=e^{-\pi x^2}$$

If you want to approximate the normal distribution the same way, you would use $$\operatorname{anorm}(x) = \frac{d}{dx}\,\frac{x}{|x|+0.368^{|x|}}$$
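A quick Python scan reproduces the quoted peak (a sketch using the stdlib's `math.erf`; `aerf` is just the formula above):

```python
import math

def aerf(x):
    # x / (|x| + 0.187^|x|); at x = 0 the denominator is 0.187^0 = 1, so aerf(0) = 0
    return x / (abs(x) + 0.187 ** abs(x))

# scan (0, 5] for the largest deviation from erf
pts = ((i / 10000, abs(aerf(i / 10000) - math.erf(i / 10000)))
       for i in range(1, 50001))
x_star, max_err = max(pts, key=lambda t: t[1])
print(f"max |aerf - erf| = {max_err:.5f} at x = {x_star:.4f}")  # ~0.01747 near x ~ 0.598
```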

  • Slightly more accurate: $(\frac{289}{1545})^{|x|}$. FWIW, I don't think this is reversible using only elementary math functions. – njuffa Nov 03 '21 at 09:29
  • i didn't really want to go for A B S O L U T E P R E C I S I O N so i stuck with 3 decimal places – brazilian egg Nov 04 '21 at 18:41

There is this series expansion using the regularized gamma function $Q(a,z)$, valid for $n\in\Bbb N$ and $z\ge0$:

$$1-Q\left(n+\frac12,z^2\right)=\text{erf}(z)-ze^{-z^2}\frac{(-1)^n}{\Gamma\left(n+\frac12\right)}\sum_{k=0}^{n-1}\left(\frac12-n\right)_{n-k-1}(-1)^k z^{2k}$$

Here $\text{Error of Approximation}=\text{erf}(z)-\text{Approximation}$.

You can invert to solve within the range of the approximation error:

$$\text{Error of Approximation}(z)=x\implies z=\sqrt{Q^{-1}\left(n+\frac12,1-x\right)}$$

An application of this inverse regularized gamma function $Q^{-1}(a,z)$ formula is to find the value of $z$ where the error of the approximation of $\text{erf}(z)$ is $0\le x\le1$. In other words, you can find the values of $z$ at which the approximation of the error function is off by a given amount. Please correct me and give me feedback!

Tyma Gaidash

And another one, not so nice but reversible (via $\operatorname{artanh}$, i.e. a logarithm, and a depressed quartic solvable by Ferrari's formulas) and more accurate: $\operatorname{erf}(x)\approx \operatorname{sgn}(x) \tanh\!\left(1.152|x| + 0.064|x|^4\right)$ (error below 0.5%).
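A quick Python check of the quoted bound (a sketch, stdlib `math.erf`; `erf_quartic` is just the formula above):

```python
import math

def erf_quartic(x):
    # sgn(x) * tanh(1.152|x| + 0.064|x|^4)
    a = abs(x)
    return math.copysign(math.tanh(1.152 * a + 0.064 * a ** 4), x)

worst = max(abs(erf_quartic(i / 1000) - math.erf(i / 1000)) for i in range(6001))
print(f"worst error on [0, 6]: {worst:.4f}")  # stays below the quoted 0.005
```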