I think I understand how to use Taylor polynomials to approximate sine. For instance, if $$ \sin x \approx ax^2+bx+c $$ and we want the approximation to be particularly accurate when $x$ is close to $0$, then we could adopt the following approach. When $x=0$, $\sin x = 0$, and so $ax^2+bx+c=0$, meaning that $c=0$. Therefore we get $$ \sin x \approx ax^2+bx $$ If we want the first derivatives to match at $x=0$, then $\frac{d}{dx}(ax^2+bx)$ should equal $\cos 0 = 1$ there. Therefore, $b=1$: $$ \sin x \approx ax^2+x $$ Finally, if we want the second derivatives to match at $x=0$, then $\frac{d^2}{dx^2}(ax^2+x)$ should equal $-\sin 0 = 0$, and so $a=0$. The small angle approximation for sine is $$ \sin x \approx x $$

All of this makes sense to me. What I don't understand is when people try to put this on rigorous footing. I have often heard people say 'this shows that $x$ is the best quadratic approximation of $\sin x$ when $x$ is near $0$'. But what is meant by 'best', and 'near'? If the approximation suddenly became terrible when $x=0.5$, then would this be considered close enough to $0$ for there to be a problem? It seems that there are formal definitions for these terms, but I don't know what they are.
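To get a feel for this numerically (just my own Python sketch, not anything rigorous), I checked how fast the error of $\sin x \approx x$ shrinks:

```python
import math

# How fast does the error of sin(x) ~ x shrink as x -> 0?
for x in [0.5, 0.1, 0.01]:
    err = abs(math.sin(x) - x)
    print(f"x = {x}: error = {err:.3e}, error/x^3 = {err / x**3:.5f}")
```

Even at $x=0.5$ the error is only about $0.02$, and the ratio error$/x^3$ settles near $1/6$, so at least in this example the approximation doesn't 'suddenly become terrible'.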

Good question. [This](https://math.stackexchange.com/questions/298837/to-what-extent-is-the-taylor-polynomial-the-best-polynomial-approximation) may help. – saulspatz Sep 26 '20 at 20:42

I think Spivak's book Calculus covers this pretty well. If $g$ is the best quadratic approximation to a smooth function $f$ near $a$, then the error term $r(x) = f(x) - g(x)$ is small near $a$, even when compared with $(x-a)^2$, in the sense that $r(x)/(x-a)^2$ approaches $0$ as $x$ approaches $a$. And $g$ is the only polynomial function that has this property. – littleO Sep 27 '20 at 03:24
6 Answers
Given a function $f$, polynomials $p_1$ and $p_2$, and some $x_0$, we can define "better" as meaning that there is a neighborhood in which it is a better approximation. That is, if there exists $\epsilon$ such that $(|x-x_0|<\epsilon) \rightarrow (|p_1(x)-f(x)|<|p_2(x)-f(x)|)$, then near $x_0$, $p_1$ is a better approximation to $f$ than $p_2$ is.
So Taylor polynomials can, with this definition, be said to be better than any other polynomial with the same order. That is, if $f$ is analytic and $T_n$ is the $n$th order Taylor polynomial of $f$, then for all $n$th order polynomials $g$, there exists a neighborhood around $x_0$ such that $T_n$ is better than $g$.
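As a concrete illustration (a quick Python sketch with one arbitrarily chosen rival, not a proof): take $f = \sin$, $x_0 = 0$, the first-order Taylor polynomial $T_1(x) = x$, and a rival line.

```python
import math

def taylor(x):   # 1st-order Taylor polynomial of sin at 0
    return x

def rival(x):    # an arbitrary competing polynomial of degree <= 1
    return 0.99 * x + 0.001

# Close enough to 0, the Taylor polynomial approximates sin better.
for x in [0.05, 0.01, 0.001]:
    t_err = abs(taylor(x) - math.sin(x))
    r_err = abs(rival(x) - math.sin(x))
    print(f"x = {x}: Taylor error {t_err:.2e} < rival error {r_err:.2e}: {t_err < r_err}")
```

How small the neighborhood must be depends on the rival: the closer the rival's coefficients are to the Taylor coefficients, the smaller the $\epsilon$.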

Can I please have a reference to a proof of this fact? > Taylor polynomials can, with this definition, be said to be better than any other polynomial with the same order – Siddharth Bhat Oct 03 '20 at 07:50

@Accumulation Thank you for this answer. If possible, I too would benefit from seeing the reference that Siddharth has asked for. – Joe Oct 03 '20 at 18:45
I think that, in the area of function approximation, we have to distinguish two cases:
- around a point
- over a range
In the case of $\sin(x)$, if we take into account that it is an odd function, then for sure around $x=0$ only odd terms will be used, and the best quadratic approximation will be of the form $kx$; if we want to match the slope at $x=0$, we shall have $k=1$.
Now, forget about the properties of $\sin(x)$ and say that you want the best quadratic approximation between $x=0$ and $x=\frac \pi 6$. So, consider the norm $$\Phi=\int_0^{\frac \pi 6} \Big[a+b x+c x^2-\sin(x)\Big]^2\,dx$$ I shall skip the intermediate calculations; the minimum of $\Phi$ will be obtained for $$a=-\frac{9 \left(1440-720 \sqrt{3}-48 \pi -6 \pi ^2+\sqrt{3} \pi ^2\right)}{\pi ^3}\approx -0.00116136$$ $$b=\frac{432 \left(1080-540 \sqrt{3}-42 \pi -3 \pi ^2+\sqrt{3} \pi ^2\right)}{\pi ^4}\approx 1.02675$$ $$c=-\frac{3240 \left(864-432 \sqrt{3}-36 \pi -2 \pi ^2+\sqrt{3} \pi ^2\right)}{\pi ^5}\approx -0.128777$$ This would give $\Phi=9.91 \times 10^{-8}$, while setting $b=1$ and $a=c=0$ over that range would give the norm $\Phi=4.19 \times 10^{-5}$.
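The minimization above is easy to reproduce numerically. A minimal Python sketch, using the normal equations of the continuous least-squares problem (all the integrals are done in closed form):

```python
import math

# Continuous least-squares fit of a + b*x + c*x^2 to sin(x) on [0, pi/6],
# via the normal equations.
L = math.pi / 6

# Moment matrix: M[i][j] = integral of x^(i+j) dx = L^(i+j+1)/(i+j+1)
M = [[L**(i + j + 1) / (i + j + 1) for j in range(3)] for i in range(3)]

# Right-hand side: integrals of x^k * sin(x) on [0, L], k = 0, 1, 2
rhs = [
    1 - math.cos(L),
    math.sin(L) - L * math.cos(L),
    2 * L * math.sin(L) + (2 - L**2) * math.cos(L) - 2,
]

def solve3(A, v):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    A = [row[:] for row in A]
    v = v[:]
    n = 3
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            v[r] -= f * v[col]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (v[r] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x

a, b, c = solve3(M, rhs)
print(f"a = {a:.6f}, b = {b:.5f}, c = {c:.6f}")
# a ≈ -0.001161, b ≈ 1.02675, c ≈ -0.128777
```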

@JoãoMendes. Thanks for pointing out the typo. You are right. Cheers – Claude Leibovici Sep 28 '20 at 11:59
Try this. Graph $y = e^x - (x + 1).$ You'll get what appears to be a parabola near $x=0.$ Of course, it's not really a parabola. Indeed, it would be absolutely amazing if the transcendental function $e^x - x - 1$ had the exact geometrical focus and directrix property that a true parabola has. But it seems fairly clear (this is not a proof, of course, since we're just looking at a picture) that the tangent at $x=0$ of this graph is horizontal. Assuming this, that means the tangent to the graph of $y = e^x - (x + 1) + 0.01x$ will be $y = 0.01x.$ Why? When we're very close to $x=0,$ we're essentially adding the graph of the $x$-axis to the graph of $y = 0.01x.$ And sure enough, if you look at a graph of $y = e^x - (x + 1) + 0.01x$, then you'll see that near $x=0$ the graph is linear and not horizontal (this much you can tell without trying to determine whether it's actually $y = 0.01x$ instead of possibly some other non-horizontal line), which means that changes in $y$ are proportional (by a nonzero constant) to changes in $x,$ something that was NOT true for the graph of $y = e^x - (x+1).$
It will help to investigate, for yourself, the graphs of $y = e^x - (x + 1) + ax$ for various values of $a \neq 0.$ In all such cases you should find that the graph crosses the $x$-axis at a nonzero angle, although when $a$ is close to $0$ you might have to zoom in a bit to see this.
This investigation suggests that, among all possible linear functions (by "linear", I mean "algebraic of degree at most $1$"), the one that BEST approximates $e^x$ in the vicinity of $x=0$ is $Ax + B$ for $A = 1+a$ and $B = 1,$ where $a = 0.$ [We actually haven't looked at what happens if $B \neq 1.$ It should be easy to see what happens if $B \neq 1,$ regardless of how we might try to vary the coefficient of $x$ to fix things.]
Usually the next step, when students are presented with an investigation such as this, is to consider what quadratic term we might add to get a better approximation. But before doing that, let's look at an intermediate adjustment to the approximation $1 + x,$ one of the form $1 + x + a|x|^{1.3},$ for various values of $a.$ The reason I'm using $|x|$ is to avoid issues with computer algebra systems trying to interpret everything for complex numbers. You'll find that near $x=0$ it doesn't matter what the value of $a$ is. Consider, for instance, the graph of (1) $y = e^x - (1 + x) + 2|x|^{1.3}$ and the graph of (2) $y = e^x - (1 + x) + 42|x|^{1.3}$. There seems to be no qualitative distinction between (1) the differences of the values of $e^x$ and the values of $1 + x + 2|x|^{1.3}$ and (2) the differences of the values of $e^x$ and the values of $1 + x + 42|x|^{1.3}.$ Of course, to be more convincing (still not a proof, however), you'll want to zoom in closer to $x=0$ to see whether this apparent similarity between (1) and (2) continues to hold. Also, if you try negative values of $a,$ then you'll find that the graph is below the $x$-axis, but the qualitative features are the same. Being below the $x$-axis for negative values of $a$ just means that when $a < 0$ and we're close to $x=0,$ the values of $e^x$ are less than the values of $1 + x + a|x|^{1.3}.$
Let's review things a bit. First, $1 + x$ is the best linear approximation to $e^x$ in the sense that, as $x$ approaches $0,$ the errors are smaller than $a|x|$ for any $a \neq 0.$ The previous paragraph appears to show that we don't get any substantial benefit by considering adjustments of the form $a|x|^{1.3},$ since the effect of adjusting $1 + x$ by adding $a|x|^{1.3}$ appears to produce graphs that look like $a|x|^{1.3}.$
If you repeat the above investigation for other possibilities of the form $1 + x + a|x|^{b},$ where $1 < b < 2,$ then you'll find that essentially the same thing happens: there is no unique BEST approximation for these exponents, in the sense that there is NOT a unique value of $a$ (for any previously specified $b)$ that gives a qualitatively better approximation than all other values of $a.$
The situation changes abruptly if we use $b=2.$ Consider the graphs of $y = e^x - (1 + x - 2x^2)$ and $y = e^x - (1 + x + 5x^2)$. In each case the graphs appear quadratic near the origin, which suggests that the errors are proportional to $x^2,$ which is not qualitatively different than simply using the approximation $1 + x.$ If you experiment by changing the coefficient of the quadratic, you'll find the same thing until by chance you happen to try the magic value $1/2.$ The graph of $y = e^x - (1 + x + \frac{1}{2}x^2)$ appears to be cubic near $x=0.$
To bring this to a close, because this is getting much longer than I really had time for (it started out as a comment), $1 + x + \frac{1}{2}x^2$ is the best quadratic approximation to $e^x$ in the sense that, for functions of the form $ax^2 + bx + c,$ you won't get the errors to be smaller than quadratic (in $x)$ as $x$ approaches $0$ unless you choose $a = \frac{1}{2}$ and $b = 1$ and $c = 1.$ If you don't have $c=1,$ then there will be a "zeroth order" error (i.e. the errors will be proportional to $x^0$ in the limit as $x \rightarrow 0).$ And if $c=1,$ but you don't have $b = 1,$ then there will be a "first order" error (i.e. the errors will be proportional to $x^1$ in the limit as $x \rightarrow 0).$ And finally, if $c=1$ and $b=1,$ but you don't have $a = \frac{1}{2},$ then there will be a "second order" error (i.e. the errors will be proportional to $x^2$ in the limit as $x \rightarrow 0).$ However, if $c=1$ and $b=1$ and $a=\frac{1}{2},$ then the error will be proportional to $x^3$ (and not just to some intermediate order of smallness, like $|x|^{2.3}$ or $|x|^{2.87}$) in the limit as $x \rightarrow 0.$
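To put rough numbers on these error orders (a quick Python check, in the same spirit as the graphs above, not a proof):

```python
import math

# Error of e^x minus a candidate quadratic c + b*x + a*x^2, divided by x^power,
# evaluated at a small x. The surviving ratio reveals the order of the error.
def scaled_error(a, b, c, power, x=1e-3):
    return (math.exp(x) - (c + b * x + a * x**2)) / x**power

print(scaled_error(0.5, 1, 1, 3))    # ≈ 1/6: the best quadratic has cubic-order error
print(scaled_error(0.3, 1, 1, 2))    # ≈ 0.2: wrong quadratic coefficient -> second-order error
print(scaled_error(0.5, 0.9, 1, 1))  # ≈ 0.1: wrong linear coefficient -> first-order error
```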
As for the situation with $\sin x,$ what happens is that not only is $x$ the best linear approximation, but in fact $x$ is also the best quadratic approximation. That is, the best quadratic approximation is $x + 0x^2.$ And if you look at the graph of $\sin x - x$, you'll see that it resembles $-x^3$ near $x=0.$
I'll end with this question. How does it happen that, once we stumble upon the best quadratic approximation, the only way to get a better approximation is to consider cubic adjustments? That is, why don't we have best $|x|^b$ approximations for noninteger values of $b$? Or to put it another way, could it be possible that the error between a function and its best quadratic approximation is NOT proportional to $x^3$ as $x \rightarrow 0,$ but instead a bit larger, for example proportional to $|x|^{2.71}$ as $x \rightarrow 0$? In short, what is behind these exponent jumps, which one might see as analogous to quantum jumps in electron energy in atoms? (Answer: It has to do with the $C^n$ smoothness assumptions in Taylor's theorem.)
This sounds like an asymptotic approximation question where we are looking for $$ f(x)=a_0+a_1x+a_2x^2+\cdots+a_nx^n+o\!\left(x^n\right)\tag1 $$ as $x\to0$. $(1)$ means that we have $$ \lim_{x\to0}\frac{f(x)-\left(a_0+a_1x+a_2x^2+\cdots+a_nx^n\right)}{x^n}=0\tag2 $$ If $f$ is nicely behaved near $0$ (e.g. real analytic), the Taylor polynomial of degree $n$ satisfies $(2)$.
For your example of $\sin(x)\approx x$, we have $$ \lim_{x\to0}\frac{\sin(x)-x}{x^2}=0 $$ but $$ \lim_{x\to0}\frac{\sin(x)-x}{x^3}=-\frac16 $$ Thus, $\sin(x)\approx x$ is only good to the second order. It is the closest polynomial of degree $2$ or less when $x$ is near $0$.
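Both limits are easy to see numerically (a small Python sketch):

```python
import math

# (sin x - x)/x^2 -> 0, while (sin x - x)/x^3 -> -1/6 as x -> 0.
for x in [0.1, 0.01, 0.001]:
    print(x, (math.sin(x) - x) / x**2, (math.sin(x) - x) / x**3)
```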
$$ \cos x = 1 - \frac {x^2} 2 + \text{higher-degree terms}. $$ To say that this gives the best approximation to $\cos x$ by a quadratic polynomial near $x=0$ means that for every other quadratic polynomial, there is some open interval containing $0$ within which this polynomial gives better approximations to the cosine. How small that open interval needs to be depends on what the other polynomial is.
For example, let $a= (1-\cos(0.05))/0.05^2.$ Then $1 - a\cdot0.05^2$ is exactly equal to $\cos(0.05),$ but if $-0.03<x<0.03$ you get better approximations to $\cos x$ by using $1 - \tfrac 1 2 x^2$ than by using $1-ax^2.$
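This comparison is easy to check numerically (a small Python sketch):

```python
import math

# Rival quadratic 1 - a*x^2, with a chosen to be exact at x = 0.05
a = (1 - math.cos(0.05)) / 0.05**2

for x in [0.05, 0.03, 0.02, 0.01]:
    taylor_err = abs(math.cos(x) - (1 - x**2 / 2))
    rival_err = abs(math.cos(x) - (1 - a * x**2))
    winner = "Taylor" if taylor_err < rival_err else "rival"
    print(f"x = {x}: {winner} wins")
```

The rival wins at $x=0.05$ (where it is exact by construction), but the Taylor quadratic wins once $x$ is small enough.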

It is not true that the Taylor polynomial gives the best quadratic (nor any other degree) approximation over an interval. – Martin Argerami Sep 27 '20 at 09:58

@MartinArgerami That's for a fixed interval. But if $T$ is the $n^{\text{th}}$-order Taylor polynomial of $f$ (about $0$, say) and $P \neq T$ is any polynomial of degree $\leqslant n$, then as Michael states there is a $\delta > 0$ such that $\lvert T(x) - f(x)\rvert \leqslant \lvert P(x) - f(x)\rvert$ for all $\lvert x\rvert \leqslant \delta$, and the inequality is strict for $0 < \lvert x\rvert \leqslant \delta$. However, $\delta$ can be quite small, and, again as stated by Michael, $\delta$ depends on $P$. – Daniel Fischer Sep 27 '20 at 18:30

@MartinArgerami : You are right if you mean there is no one interval on which the Taylor polynomial gives better approximations than all others, and my answer already said that. Read carefully. Rather, for every other polynomial, there is _some_ open interval about the center within which the Taylor polynomial is better than that polynomial. Which interval it is depends on which other polynomial it is. – Michael Hardy Sep 27 '20 at 19:01
You are correct. The word "best" is ambiguous. There are many metrics used to measure how close one function is to another. Each metric creates its own definition of best approximation. Generally a textbook will say something like "$\tilde f$ is the best approximation to $f$" and then defend the claim using its own definition of best. Once you get used to this, it becomes a forgivable case of putting the cart before the horse.
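For instance, here is a minimal Python sketch of two such metrics applied to approximating $\sin$ by a line $kx$ on $[0,1]$:

```python
import math

# Metric 1: match sin at 0 to first order (Taylor) -> slope k = cos(0) = 1.
k_taylor = 1.0

# Metric 2: minimize the integral of (k*x - sin x)^2 over [0, 1].
# Setting the derivative with respect to k to zero gives
# k = (integral of x*sin x dx) / (integral of x^2 dx) = 3*(sin 1 - cos 1).
k_l2 = 3 * (math.sin(1) - math.cos(1))

print(k_taylor, k_l2)  # 1.0 vs ≈ 0.9035: two metrics, two different "best" lines
```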