This is kind of a signal processing/programming/mathematics crossover question. At the moment it seems more math-related to me, but if the moderators feel it belongs elsewhere please feel free to migrate it.

I'm working on a project where I have limited computational power and need to make a speedy approximation of the hyperbolic tangent function over a fairly large range of input arguments. Assuming the numbers are stored in fixed point with an 8 bit fractional part then the approximation to $\tanh(x)$ should work to the limit implied by the resolution, or for arguments $\tanh^{-1}(\pm[1 - \frac{1}{2^8}]) \approx \pm3.1$. I know that Taylor series will not converge fast enough over this range, and I haven't fully looked into it but I don't believe that a Chebyshev polynomial approximation will converge rapidly enough either. For various reasons a table lookup is also not possible.

$\tanh(x)$ is algebraically equivalent to $\mathrm{sgn}(x)(1 - \frac{2}{e^{2|x|} + 1})$. This looks promising; a series expansion of $e^x$ converges better than $\tanh(x)$. For a rapid numerical calculation on limited hardware the " + 1" in the denominator throws a wrench into the works - it means a division has to be carried out, and since many simple processors don't have "hardware divide" the division has to be done in software through repeated shifts and subtractions, which is achingly slow.

To get around this I'm considering the following approach. Instead of using $\mathrm{sgn}(x)(1 - \frac{2}{e^{2|x|} + 1})$ to exactly represent tanh(x), define the following similar type of function: $\mathrm{sgn}(x)A(1 - e^{-B|x|})$ for unknown constants A and B. Then define a loss function:

$f(x) = \int_0^{3.1}[\tanh(x) - A(1 - e^{-Bx})]^2dx$

Then use a numerical method (gradient descent?) to find the values of A and B that minimize the loss. The problem of approximating $\tanh(x)$ rapidly is then converted into approximating $e^{-x}$ rapidly plus an extra subtraction and multiplication.

$e^{-x} = \dfrac{1}{e^x} = \dfrac{1}{2^{x \log_2(e)}} = \dfrac{2^{-x}}{2^y}$, where y is the integer part and x is the fractional part. Expressing $e^{-x}$ this way lets me reduce the range of the input argument; dividing an unsigned number by $2^y$ where y is a positive integer can be accomplished by logical shifting. For arguments between 0 and -1 just eyeballing a Taylor series expansion of $2^x$ looks pretty good for just 3 terms.

Does this approach seem reasonable, from a mathematical perspective? Pitfalls I haven't considered? Thanks for any advice.

Edit: I should add that in this application (audio processing) I'm willing to trade off absolute accuracy of the calculated value in exchange for speed of processing and a function that works "reasonably" well over the entire range of input arguments - the return value for every argument in the approximation doesn't have to be accurate to the $2^{-8}$ limit implied by the resolution.