This is kind of a signal processing/programming/mathematics crossover question. At the moment it seems more math-related to me, but if the moderators feel it belongs elsewhere please feel free to migrate it.

I'm working on a project where I have limited computational power and need a speedy approximation of the hyperbolic tangent function over a fairly large range of input arguments. Assuming the numbers are stored in fixed point with an 8-bit fractional part, the approximation to $\tanh(x)$ should work to the limit implied by that resolution, i.e. for arguments up to $\tanh^{-1}(\pm[1 - \frac{1}{2^8}]) \approx \pm3.1$. I know that a Taylor series will not converge fast enough over this range, and while I haven't fully looked into it, I don't believe a Chebyshev polynomial approximation will converge rapidly enough either. For various reasons a table lookup is also not possible.

$\tanh(x)$ is algebraically equivalent to $\mathrm{sgn}(x)(1 - \frac{2}{e^{2|x|} + 1})$. This looks promising; a series expansion of $e^x$ converges faster than one for $\tanh(x)$. For a rapid numerical calculation on limited hardware, however, the "+1" in the denominator throws a wrench into the works: it forces a division, and since many simple processors don't have a hardware divide, the division has to be done in software through repeated shifts and subtractions, which is achingly slow.

To get around this I'm considering the following approach. Instead of using $\mathrm{sgn}(x)(1 - \frac{2}{e^{2|x|} + 1})$ to represent $\tanh(x)$ exactly, define the following similar type of function: $\mathrm{sgn}(x)\,A(1 - e^{-B|x|})$ for unknown constants $A$ and $B$. Then define a loss function:

$$E(A,B) = \int_0^{3.1}\left[\tanh(x) - A(1 - e^{-Bx})\right]^2\,dx$$

Then use a numerical method (gradient descent?) to find the values of A and B that minimize the loss. The problem of approximating $\tanh(x)$ rapidly is then converted into approximating $e^{-x}$ rapidly plus an extra subtraction and multiplication.

$e^{-x} = \dfrac{1}{e^{x}} = \dfrac{1}{2^{x \log_2 e}}$. Writing $x \log_2 e = y + f$, where $y$ is the integer part and $f$ the fractional part, gives $e^{-x} = \dfrac{2^{-f}}{2^{y}}$. Expressing $e^{-x}$ this way lets me reduce the range of the input argument; dividing an unsigned number by $2^y$ where $y$ is a positive integer can be accomplished by a logical shift. For arguments between 0 and $-1$, just eyeballing a Taylor series expansion of $2^x$ looks pretty good with just 3 terms.
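A minimal fixed-point sketch of this range reduction, assuming Q8 inputs (8 fractional bits). The quadratic for $2^{-f}$ and its constants 369, 172, 44 (Q8 renditions of $\log_2 e$, 0.6716, 0.1716, from interpolating $2^{-f}$ at $f = 0, \tfrac12, 1$) are my own choices for illustration:

```c
#include <stdint.h>

/* Sketch of exp(-x) in Q8 fixed point, x >= 0.
   Write x*log2(e) = y + f with integer y and fractional f in [0,1);
   then exp(-x) = 2^(-f) / 2^y, and the division by 2^y is a shift.
   2^(-f) is approximated by 1 - 0.6716 f + 0.1716 f^2 (max err ~0.002). */
static uint16_t exp_neg_q8(uint16_t x_q8) {
    uint32_t t = (uint32_t)x_q8 * 369;   /* Q16: x * log2(e), 369 = log2(e) in Q8 */
    uint32_t y = t >> 16;                /* integer part                          */
    uint32_t f = (t >> 8) & 0xFF;        /* fractional part in Q8                 */
    /* quadratic for 2^(-f), all in Q8: 256 - 172 f + 44 f^2 */
    uint32_t p = 256 - ((172 * f) >> 8) + ((44 * f * f) >> 16);
    return (uint16_t)(p >> y);           /* divide by 2^y via logical shift */
}
```

On the target hardware the two multiplies are the only non-trivial operations; everything else is shifts and adds.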

Does this approach seem reasonable, from a mathematical perspective? Pitfalls I haven't considered? Thanks for any advice.

Edit: I should add that in this application (audio processing) I'm willing to trade off absolute accuracy of the calculated value in exchange for speed of processing and a function that works "reasonably" well over the entire range of input arguments - the return value for every argument in the approximation doesn't have to be accurate to the $2^{-8}$ limit implied by the resolution.

  • If the high school method of division is too slow to compute the quotient, how about a divide-and-conquer implementation of division? Or Newton's method to compute an inverse? –  Jan 08 '13 at 17:57

10 Answers


You are certainly aware that the hyperbolic tangent has an asymptote; since no polynomial has a horizontal asymptote, it stands to reason that a polynomial will always poorly capture the qualitative behavior of the hyperbolic tangent.

One viable approach would be to consider the rational functions that come from truncations of the continued fraction for the hyperbolic tangent:
$$\tanh z = \cfrac{z}{1 + \cfrac{z^2}{3 + \cfrac{z^2}{5 + \cfrac{z^2}{7 + \ddots}}}}$$
Here for instance are plots comparing $\tanh\,z$ and the convergent
$$R(z) = \frac{z\,(z^4 + 105z^2 + 945)}{15z^4 + 420z^2 + 945}$$
[Figure: comparison of hyperbolic tangent and CF convergent]

The left plot shows that the rational approximant and the actual function are almost visually indistinguishable, while the right plot depicts the function $\tanh\,z-R(z)$.

One other possibility you can use in conjunction with rational function approximation is the use of argument reduction; in particular, the identity
$$\tanh 2z = \frac{2\tanh z}{1 + \tanh^2 z}$$
is very useful here. To sketch a possible algorithm: scale $z$ by $2^{-k}$, where $k$ is appropriately chosen such that $\dfrac{z}{2^k}$ is "small enough", evaluate the truncated continued fraction at this reduced argument, and then repeatedly apply the double argument identity $k$ times.
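A floating-point sketch of the scheme just described, using the low-order truncation $z(15+z^2)/(15+6z^2)$ of the continued fraction for the reduced argument; the choice $k = 3$ is my own arbitrary one:

```c
#include <math.h>

/* Evaluate a low-order continued-fraction truncation at z/2^k, then
   apply the double-argument identity tanh(2t) = 2t/(1+t^2) k times. */
static double tanh_cf_reduced(double z) {
    const int k = 3;
    double t = z / 8.0;                        /* z / 2^k (here 2^3 = 8)  */
    double t2 = t * t;
    t = t * (15.0 + t2) / (15.0 + 6.0 * t2);   /* CF truncation, small t  */
    for (int i = 0; i < k; ++i)
        t = 2.0 * t / (1.0 + t * t);           /* double the argument     */
    return t;
}
```

Note that this illustration still performs divisions; on the asker's hardware each doubling step would need its own division-free workaround.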

J. M. ain't a mathematician
  • Thanks for your reply! I will definitely investigate the continued fraction representation as an alternate possibility. The fly in the ointment is the hardware restriction on division I have. With the processor I am using 8 bit subtraction, multiplication, and addition are either 1 or 2 clock cycle operations. Division is not done in "hardware", and any division that cannot be accomplished by a shift might take upwards of 90 clock cycles. That means that the continued fraction representation will have to be expanded, and I don't think using the argument reduction formula will be possible. – Bitrex Feb 09 '12 at 02:24
  • That's alright. As a reminder, if you do take the route of turning the continued fraction convergent into a ratio of two polynomials, you should write the numerator and denominator polynomials in Horner form so that the arithmetic operations are minimized. – J. M. ain't a mathematician Feb 09 '12 at 02:31
  • You should actually get upper and lower bounds via such a truncation. Can you provide a plot of the absolute relative error, instead, so we can better see the relative performance of this approximation? – cardinal Feb 09 '12 at 02:42
  • @cardinal: the magnitudes don't really vary that wildly, so the relative error plot is not too different from the absolute error plot. [Here is a plot of $1-\dfrac{R(z)}{\tanh\,z}$](http://i.stack.imgur.com/PnrsO.png), for instance. – J. M. ain't a mathematician Feb 09 '12 at 02:48
  • I was most interested in seeing the behavior near zero since the denominator of the relative error goes to zero. – cardinal Feb 09 '12 at 02:58
  • @cardinal: [As you wish](http://i.stack.imgur.com/WTCOO.png). You should know that since these truncations are in fact intimately related to Padé approximants, the good behavior of these truncations near the origin is pretty much expected. – J. M. ain't a mathematician Feb 09 '12 at 03:04
  • Yes, but there are (famous!) examples of quite bad behavior at zero too. And, I'm typing from a phone, so was selfishly imposing on you since I could not check myself. Thanks for humoring me. :) – cardinal Feb 09 '12 at 03:08
  • @cardinal: It's fine for me to sometimes be kept on my toes ;) ; I do know those examples you speak of, but luckily enough for the OP, most of the elementary functions are quite amenable to Padé approximation. – J. M. ain't a mathematician Feb 09 '12 at 03:11
  • I find it somewhat amusing that the main issue was the speed of division, yet the accepted answer is the one using continued fractions :-) – Simply Beautiful Art Mar 01 '20 at 01:22

Again, in the interest of showing that there's always more than one way to skin a cat, I present a different set of rational functions that can be used instead of the continued fraction convergents I presented in my previous answer.

The idea is based on the Padé approximants of $\exp\,z$. More concretely, let the $(n,n)$ Padé approximant of $\exp\,z$ be represented by
$$\exp z \approx \frac{P_n(z)}{P_n(-z)}, \qquad P_n(z) = \sum_{k=0}^{n} \frac{(2n-k)!\,n!}{(2n)!\,k!\,(n-k)!}\,z^k$$
From this, we find that we can approximate $\tanh\,z=\dfrac{\exp\,2z-1}{\exp\,2z+1}$ with a rational function like so:
$$\tanh z \approx \mathcal{T}_n(z) = \frac{P_n(z)^2 - P_n(-z)^2}{P_n(z)^2 + P_n(-z)^2}, \qquad \text{using } \exp 2z = (\exp z)^2 \approx \frac{P_n(z)^2}{P_n(-z)^2}.$$
For example,
$$\mathcal{T}_3(z) = \frac{600z + 70z^3 + z^5}{600 + 270z^2 + 11z^4 + \frac{z^6}{24}}$$
Here are comparison plots for $\tanh\,z$ and $\mathcal{T}_3(z)$:

[Figure: comparison plots of tanh(z) and Padé-derived approximant]

The left plot shows $\tanh\,z$ and $\mathcal{T}_3(z)$ together, while the right plot depicts the relative error function $1-\dfrac{\mathcal{T}_3(z)}{\tanh\,z}$. Note that the error is slightly smaller here than in the previous answer. One might still be able to improve on this approximant with a rational function with simple enough coefficients, but this is a start.
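Taking $\mathcal{T}_3(z) = \dfrac{600z + 70z^3 + z^5}{600 + 270z^2 + 11z^4 + z^6/24}$ (my own working-out of the $n=3$ case of the construction, so treat it as an assumption), a Horner-form evaluation costs a handful of multiply-adds and a single division:

```c
#include <math.h>

/* (3,3) Pade-derived approximant to tanh, numerator and denominator
   in Horner form; one division total. */
static double tanh_pade3(double z) {
    double z2 = z * z;
    double num = z * (600.0 + z2 * (70.0 + z2));
    double den = 600.0 + z2 * (270.0 + z2 * (11.0 + z2 / 24.0));
    return num / den;
}
```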

J. M. ain't a mathematician

The best rational approximation to $\tanh(x)$ with numerator and denominator of degree 3 on the interval $[0, 3.1]$ (as provided by Maple's minimax function) is


This (call it $f(x)$) has maximum error $\approx 2.736 \times 10^{-5}$, which is considerably less than $2^{-8}$.
On the interval $[-3.1, 3.1]$, use $\text{sgn}(x) f(|x|)$.

Robert Israel
  • This approximation has the nice benefit that all of the coefficients but one (the smallest in magnitude, no less!) are positive. However, the error stated is not guaranteed to be achieved once the coefficients are truncated to the storage format described by the OP. – cardinal Feb 09 '12 at 02:37

$\tanh(x)$ is the solution to the differential equation $y'=1-y^2$ with initial condition $y(0)=0$. There is an abundance of very fast methods for approximating solutions to autonomous differential equations like this one. The most famous is classical fourth-order Runge-Kutta (RK4). I feel I should emphasize: this method generally provides very good approximations in very little computing time.
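A floating-point sketch of this idea (the step size $h = 1/32$ is my own arbitrary choice, and this ignores the fixed-point precision concerns discussed in the comments):

```c
#include <math.h>

/* Approximate tanh(x) by integrating y' = 1 - y^2, y(0) = 0, with
   classical RK4.  The equation is autonomous, so the stages depend
   only on y, not on t. */
static double tanh_rk4(double x) {
    double sign = (x < 0.0) ? -1.0 : 1.0;
    x = fabs(x);
    const double h = 1.0 / 32.0;
    double y = 0.0, t = 0.0;
    while (t < x) {
        double step = (x - t < h) ? (x - t) : h;   /* last partial step */
        double k1 = 1.0 - y * y;
        double y2 = y + 0.5 * step * k1;
        double k2 = 1.0 - y2 * y2;
        double y3 = y + 0.5 * step * k2;
        double k3 = 1.0 - y3 * y3;
        double y4 = y + step * k3;
        double k4 = 1.0 - y4 * y4;
        y += step * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0;
        t += step;
    }
    return sign * y;
}
```

This trades per-call cost (a loop over steps) for very simple per-step arithmetic: no division and no transcendental calls at all.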

2'5 9'2
  • This is a good approach, with a few caveats, like possibly using two or more Runge-Kutta methods of different orders or running the same RK method with different step sizes, to ensure accuracy. – J. M. ain't a mathematician Feb 09 '12 at 02:38
  • I suppose your step size will be limited to $2^{-8}$. With a step size of $2^{-8}$, RK4 gives approximations to $\tanh(x)$ with error less than $2^{-8}$ for $|x|<46\cdot2^{-8}\approx0.179688$, not counting any accumulated $8$-bit rounding errors. By the time we get to $|x|=3.1$, the error is now about $2^{-1}$ which is far from acceptable. I still wonder what other RK methods or other differential equations-based methods would produce, given step sizes limited to $2^{-8}$. – 2'5 9'2 Feb 09 '12 at 02:56
  • Additionally: if storage is at a premium, one might do well to use [Gill's version of the Runge-Kutta method](http://dx.doi.org/10.1017/S0305004100026414) instead of the classical one. – J. M. ain't a mathematician Feb 09 '12 at 02:58
  • @alex.jordan While playing around with my approximation, I noticed that it seems like the relative error actually starts out quite poor but then _improves_ with larger arguments. Perhaps I could use some kind of range-switching arrangement? – Bitrex Feb 09 '12 at 05:11
  • One could probably use Richardson extrapolation to improve approximations. Take the result from a size-$h$ step, and the result from two steps of size $h/2$. An appropriately weighted linear combination of these two results yields a result that is often more accurate than either of the two entities it was derived from. Of course, this adds to the computational burden... – J. M. ain't a mathematician Feb 10 '12 at 00:08

On this page Varosound derives a computationally simpler equation (one division, many adds and multiplies) from the continued fraction discussed here by J.M.:
$$\tanh x \approx \frac{x\,(135135 + 17325x^2 + 378x^4 + x^6)}{135135 + 62370x^2 + 3150x^4 + 28x^6}$$
In C, it would look like this:

float fast_tanh(float x) {
  float x2 = x * x;
  /* numerator and denominator evaluated in Horner form */
  float a = x * (135135.0f + x2 * (17325.0f + x2 * (378.0f + x2)));
  float b = 135135.0f + x2 * (62370.0f + x2 * (3150.0f + x2 * 28.0f));
  return a / b;
}
I should add that if x might be large (not relevant for the OP), you would want to clamp the output to +/-1 for x beyond about +/-4.97.


I suggest the approach by Robert Israel. I have recently done something similar to approximate this function, so here are my two cents. I was actually looking for an approximation of the logistic function, which is just a scaling and translation of this one, but the hyperbolic tangent is easier to work with because it is symmetric around 0. My model calculates tanh(x/2).

In my case I found a higher-degree approximation that kept the maximum absolute error below 2e-7. I would like to know whether Maple can handle higher-order models, because there is the challenge of avoiding poles on the real line. In fact, the model Robert suggests has a pole at approximately $-3.8$; that may not be a problem in this application, but it is not great to have...

In my models the numerator is an odd polynomial, and the denominator is even, so all poles and zeros are imaginary pairs, with an extra zero at the origin. Here is the big model I found, for an error below 2e-7:

y =  94.9339451088 * x * ((( x**2 + 1.06315869e+03 ) * x**2 + 1.88748783e+05 ) * x**2 + 5.86237309e+06 ) / ((( ( x**2 + 4.03183926e+03 ) * x**2 + 1.64253046e+06 ) * x**2 + 1.28592857e+08 ) * x**2 + 1.11307745e+09 )

A lower-order model with ~2**-8 error can certainly be found with the same approach. I used my code to search for this simpler model, and this is what I got:

y = 25.2075466924 * x * ( x**2 + 6.96321678**2 ) / (( x**2 + 3.17572786**2 ) * ( x**2 + 15.57611343**2 ))

This approximates tanh(x/2) with an error slightly above 2.2e-4 for |x| < 7.6. The derivative of the approximated function at 7.6 also equals 0, so it is safe to clamp the function there (make it sgn(x)), and the resulting model remains monotonically increasing, a nice property to have.
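The small model quoted just above translates directly into C (the coefficients are taken verbatim from the formula; only the layout is mine):

```c
#include <math.h>

/* The low-order pole/zero model for tanh(x/2): an odd numerator with a
   purely imaginary zero pair, and an even denominator with two purely
   imaginary pole pairs. */
static double tanh_half(double x) {
    double x2 = x * x;
    double num = 25.2075466924 * x * (x2 + 6.96321678 * 6.96321678);
    double den = (x2 + 3.17572786 * 3.17572786)
               * (x2 + 15.57611343 * 15.57611343);
    return num / den;
}
```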

I could not get a good model for an even smaller order, but I don't believe it can get much simpler than that.

PS: The continued fraction x / (1+x^2/(3+x^2/(5+...))) also produces a very similar rational function, but the error is just very flat around 0, different from the Chebyshev-style fit I did. I would like to know if we could use that continued fraction to come up with a good initial estimate for the non-linear regression...


Surely fast and accurate methods for computing $\ln(x)$ are known. Since $\tanh^{-1}(x)=\frac12\ln\left(\frac{1+x}{1-x}\right)$, you could compute this instead, and then execute a binary search to find $\tanh(x)$. You would need to be able to accurately compute $\ln(Y)$ all the way up to $Y=\frac{1+(1-2^{-8})}{1-(1-2^{-8})}=2^9-1$. The binary search would only have $8$ repetitions.

Added later: This Wikipedia article describes a fast method for computing $\ln(Y)$ using the arithmetic-geometric mean.

2'5 9'2
  • OP said nothing about the availability of square root, so it might be a deal-breaker. On the subject of AGM-type methods for $\ln\,z$, there is also [Borchardt's method](http://math.stackexchange.com/a/75398). – J. M. ain't a mathematician Feb 10 '12 at 06:34
  • @J.M. That's true, but given the $2^{-8}$ precision in this question, it occurs to me that square root is quickly computable _again_ using a binary search on the inverse function. As I understand the method for computing $\ln(Y)$, square root would be applied to numbers no greater than $4$. At precision $2^{-8}$, that makes a binary search where $x^2$ is computed only 10 times. – 2'5 9'2 Feb 10 '12 at 09:22
  • +1, this method also generally looks promising! There still may be a problem with my "division requirement" though. – Bitrex Feb 11 '12 at 00:13
  • Instead of ln, if you had a fast way to compute log2, you could do that instead and just multiply by a different constant. – Timothy Miller Oct 24 '14 at 12:57

Have you looked at CORDIC? (alas, I didn't see the no table lookup, but I'll leave it here anyway)


I wrote this one. It's about 4× faster than the built-in tanh(), with only 0.0001% error over the whole range, and easy to convert to vector instructions like AVX2.

float tanh_c3(float v) {
    const float c1 = 0.03138777F;
    const float c2 = 0.276281267F;
    const float c_log2f = 1.442695022F;
    v *= c_log2f;
    int intPart = (int)v;
    float x = v - intPart;
    float xx = x * x;
    float v1 = c_log2f + c2 * xx;
    float v2 = x + xx * c1 * x;
    float v3 = v2 + v1;
    /* fold the integer part into v3's exponent field; note this
       type-punning trick formally violates strict aliasing */
    *((int*)&v3) += intPart << 24;
    float v4 = v2 - v1;
    return (v3 + v4) / (v3 - v4);
}

As you say,
$$\tanh(x) = \mathrm{sgn}(x)\left(1 - \frac{2}{e^{2|x|} + 1}\right).$$
By geometric expansion we can rewrite this as
$$\tanh(x) = \mathrm{sgn}(x)\left(1 + 2\sum_{n=1}^{\infty}(-1)^n e^{-2n|x|}\right).$$
This expansion converges rapidly, with an error of order $e^{-2N|x|}$ after $N$ terms, which is small when $x$ is large. It avoids division entirely, and all of the exponentials reduce to powers of a single value, since $e^{-2n|x|}=(e^{-2|x|})^n$.

Below we show a plot of the partial sums including terms up to $\pm2e^{-2n|x|}$ for $n$ up to $8$:

[Figure: partial-sum approximations for n = 1 to 8]

All that's left to do is to approximate it near the origin, which can easily be done via Taylor approximations.

If this is still unsatisfactory, an alternative approximation may be desired for $x$ between, say, 0.5 and 1.

Simply Beautiful Art