I (sort of) understand what Taylor series do: they approximate a function that is infinitely differentiable. Well, first of all, what does "infinitely differentiable" mean? Does it mean that the function has no point where the derivative is constant? Can someone explain that to me intuitively?

Anyway, so the function is infinitely differentiable, and the Taylor polynomial keeps adding terms which make the polynomial equal to the function at some point, then make the derivative of the polynomial equal to the derivative of the function at that point, then the second derivative, and so on.

Why does making the derivative, second derivative, ..., every higher derivative of a polynomial and a function agree at some point ensure that the polynomial will match the function exactly?

mr real lyfe
  • "what does infinitely differentiable mean?" - if $f(x)$ is "infinitely differentiable", this means that if I differentiate $f(x)$ to obtain a new function $f^\prime(x)$, then I can differentiate $f^\prime(x)$ to obtain a new function $f^{\prime\prime}(x)$, which I can differentiate again to obtain... well, you get the drift. Additionally, all those derivatives should evaluate to finite values at the point of expansion. – J. M. ain't a mathematician Aug 15 '12 at 14:25
    Well how do I tell if a function is infinitely differentiable or not? – mr real lyfe Aug 15 '12 at 14:28
    Do you have a calculator? (I guess WolframAlpha would work for this too.) Try playing around with taking the sine, cosine, and exponential (I mean $e^x$) of some very small numbers, ideally small powers of ten like $0.01, 0.001, 0.0001, ...$. You should notice some patterns. That's the Taylor series popping out at you. – Qiaochu Yuan Aug 15 '12 at 17:07
    @ordinary: Usually, you tell because you can express the function in terms of other functions you already know are infinitely differentiable and constructions you already know produce infinitely differentiable functions. For example, the sum of two infinitely differentiable functions is infinitely differentiable. –  Aug 15 '12 at 17:10

6 Answers


Say I want to approximate the function $f(x)$ at a point. Let's make it $0$ for simplicity. Since the function is continuous, I can just take the constant function $y=f(0)$ as a start. But of course, this is an awful approximation. A better one would not only take the same value, but have the same rate of change as well! So we do that:

$$y'=f'(0)$$ $$y=f'(0)x+C$$

Matching the function's value at $x=0$ forces $C=f(0)$, so our new approximation is

$$y=f'(0)x+f(0)$$

But wait! An even better one would have the same second derivative as well! That way it can even start to curve like the function!

$$y''=f''(0)$$ $$y'=f''(0)x+f'(0)$$ $$y=\frac{f''(0)x^2} 2+f'(0)x+f(0)$$

An even better one would also match the third derivative, so it can wiggle around zero if need be, and we get:

$$y=\frac{f'''(0)x^3} 6 + \frac{f''(0)x^2} 2+f'(0)x+f(0)$$

The general pattern is easy to see: our better and better approximations are adding the term: $$\frac{f^{(n)}(0)x^n}{n!}$$

onto our previous guess. So we then say that the best infinite polynomial approximation is just taking all of these together, which is going to be:

$$\sum_{n=0}^{\infty} \frac{f^{(n)}(0)x^n}{n!}$$

This is the Taylor series at $x=0$. As others have pointed out, it doesn't always work, but if you're going to start your approximation by requiring all derivatives to be equal, this is what you come up with.
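As a quick numerical sanity check (a Python sketch, not part of the original answer), take $f(x)=e^x$, where every derivative at $0$ equals $1$, so the series is $\sum x^n/n!$:

```python
import math

def taylor_exp(x, n_terms):
    """Partial sum of the Taylor series of e^x at 0: every derivative
    of e^x at 0 equals 1, so the n-th term is x^n / n!."""
    return sum(x**k / math.factorial(k) for k in range(n_terms))

# Each extra term pulls the partial sum closer to the true value:
for n in (1, 2, 4, 8):
    print(n, abs(taylor_exp(0.5, n) - math.exp(0.5)))
```

The printed errors shrink rapidly as terms are added, which is exactly the "better and better approximations" picture above.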

Robert Mastragostino
  • I'm focusing on the transition from first derivative approximation to second derivative approximation to make sure that I understand the process. Am I correct to assume that, one way to see the process is that the function is first approximated with a linear approximation, and the result expression is then improved by fine-tuning the first derivative (with a linear approximation of the first derivative, resulting in an overall quadratic expression), and so on? – ensbana May 03 '20 at 09:36
If so I have two questions: 1) where does the denominator "2" in the quadratic expression come from? I tried substituting $f'(0)$ in $y = f'(0)x + f(0)$ with $y' = f''(0)x + f'(0)$, but the "2" doesn't show. 2) how can we be sure that the quadratic expression is a better approximation than the linear one? Following up on my interpretation above, the second derivative is indeed a linear approximation of the first derivative. But with respect to the original function, how do we know that the contribution from this approximation to the overall expression of $f(x)$ goes in the "right" direction? – ensbana May 03 '20 at 09:44

As the other answerers have said, you do need to strengthen the condition on your function from smooth to analytic. Once you've done that, here's a stab at the intuition behind Taylor polynomials as successive approximations.

The zeroth Taylor polynomial for $f$ at $x_0$ is simply the constant function $f(x_0)$. Of course it's a very bad approximation for interesting functions, but it does happen to approximate a one-parameter class of functions perfectly: the constant functions $y=a_0$. The next best thing is to take a derivative, which we know gives the best linear approximation to $f$ near $x_0$. Notice we don't lose anything, since if $f$ was originally constant, then our linear approximation is $y=a_1x+a_0$ with $a_1=0$ and the "approximation" is still perfect, while for everything else, with a nonzero derivative, we've gotten closer to the actual function near $x_0$. Taking higher-degree polynomials is a direct generalization of this process. With the second derivative I can find the very closest quadratic to $f$ near $x_0,$ and since everything linear is a special case of a quadratic, I can certainly only get a better approximation.

By taking this process to the limit of "infinite-degree polynomials," i.e. power series, we might expect that some nice functions actually equal the limit of their expansion in this way. Proving this direction requires some fiddling with difference quotients, but in fact for the most important analytic function there's nothing to prove: $e^t$ is defined, at least in one version, as its power series expansion. We can even get $\sin$ and $\cos$ and thus all trig functions in this way if we start out over $\mathbb{C}$, though this would be pretty perverse.
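The "closest quadratic" claim can be checked numerically (a hypothetical Python sketch, using $\cos$ as the example function): the second Taylor polynomial's error vanishes faster than $x^2$, which is what it means for no other quadratic to do better near $0$.

```python
import math

def p2(x):
    """Second Taylor polynomial of cos at 0: same value, first and
    second derivative as cos (namely 1, 0, -1)."""
    return 1 - x**2 / 2

# (cos x - p2(x)) / x^2 -> 0 as x -> 0: the error dies off faster
# than x^2, so no quadratic discrepancy survives in the limit.
for x in (0.1, 0.01, 0.001):
    print(x, (math.cos(x) - p2(x)) / x**2)
```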

Kevin Arlin
  • Thanks. This made a lot of sense. I see why the first derivative gives the best linear approximation, as it is the slope of the line tangent to the function at f(c). But why does the second derivative give the best quadratic approximation?? I know this is something simple, this part just still kind of confuses me though – mr real lyfe Aug 15 '12 at 15:45
  • @ordinary: The tangent line has the same value and derivative as $f$ does at $c$, right? So maybe the best second order approximation should have the same value, derivative, and *second* derivative.... –  Aug 15 '12 at 17:07
As @Hurkyl said. We get as close as we can without allowing any derivatives higher than the second to be nonzero. To prove it precisely you could write down the difference $f(x+h)-f(x)-f'(x)h-\frac{1}{2}f''(x)h^2$. Since the limit of that over $h^2$ is zero by the definitions, it must go to zero faster than any constant times $h^2,$ so we're not going to get any closer with a different quadratic approximation. – Kevin Arlin Aug 16 '12 at 04:45
Continuing along @KevinCarlson 's line of thought, if we let $p_2(x)$ be the second Maclaurin polynomial of the function $f(x)$, then $p_2(x)$ is the unique degree-$2$ polynomial with the property that $\lim_{x \to 0} \frac{f(x) - p_2(x)}{f(x) - q_2(x)} = 0$ for all degree-$2$ polynomials $q_2(x)$ different from $p_2(x)$; it is in this sense that the degree-$2$ Maclaurin polynomial of $f(x)$ is the best quadratic approximation to $f(x)$. The proof requires L'Hôpital's Rule, but is otherwise straightforward. The same statement holds for the degree-$n$ Maclaurin polynomial of $f(x)$. – Jeffrey Rolland Aug 11 '17 at 02:29

Being infinitely differentiable just means you can keep taking derivatives. Having one derivative is no guarantee of having two; having two is no guarantee of having three. Consider $f_n(x)=\begin {cases} 0 & x \lt 0 \\ x^n & x \ge 0 \end {cases}$
It has $n-1$ derivatives everywhere, but fails to have an $n$-th derivative at $x=0$. There is no problem with derivatives being constant: all of these functions have constant derivatives of $0$ for $x \lt 0$.

Adding terms of the Taylor series does match successive derivatives to those of the function. If the function is analytic, this makes the approximation better and better. If you have terms up to the $n$th in your series, the error term will be proportional to $x^{n+1}$ near the expansion point. For small $x$ this will likely be very small, and smaller as $n$ increases.
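The $x^{n+1}$ error scaling is easy to see numerically (a Python sketch, my own illustration rather than part of this answer): for $\sin$ truncated after the $x^3$ term, the first omitted term is $x^5/120$, so halving $x$ should divide the error by about $2^5=32$.

```python
import math

def sin_p3(x):
    """Taylor polynomial of sin at 0 with terms up to degree 3."""
    return x - x**3 / 6

# The first omitted term is x^5/120, so halving x should shrink the
# error by roughly a factor of 2^5 = 32.
e_big = abs(math.sin(0.2) - sin_p3(0.2))
e_small = abs(math.sin(0.1) - sin_p3(0.1))
print(e_big / e_small)  # close to 32
```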

It doesn't ensure that it will match exactly. Consider the standard example $g(x)=\begin {cases} 0 & x= 0 \\ \exp(-\frac 1{x^2}) & x \ne 0 \end {cases}$

This is infinitely differentiable, and at $x=0$ all its derivatives are $0$, so its Taylor series is identically zero and does not equal the function.
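To see the failure concretely (a small Python sketch of this standard counterexample, added for illustration):

```python
import math

def g(x):
    """The standard flat function: exp(-1/x^2) for x != 0, g(0) = 0.
    Smooth everywhere, but all its derivatives at 0 vanish."""
    return 0.0 if x == 0 else math.exp(-1 / x**2)

# The Taylor series of g at 0 is identically zero, yet g is not:
for x in (0.5, 0.2, 0.1):
    print(x, g(x))  # positive values the zero series cannot capture

# g vanishes to infinite order: near 0 it is smaller than any power of x.
print(g(0.1) / 0.1**10)
```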

Ross Millikan
Okay, so how does it work for analytic functions then? What is the intuition behind why the successive terms in the polynomial create better and better approximations of a function (that meets all of the criteria)? – mr real lyfe Aug 15 '12 at 14:32
  • @ordinary: Analytic functions are effectively defined to be functions where the Taylor approximation works. So the Taylor approximation works for them because otherwise they would not be analytic. However there's one non-trivial part of it: Namely that such functions exist (apart from polynomials, for which it is trivial). – celtschk Aug 15 '12 at 16:37

No one seems to have said so, but the Taylor series is a power series of the form $f(x) = \sum_{n=0}^{\infty} b_{n}(x-a)^{n}.$ It is a consequence of a theorem of Weierstrass that such a series has a radius of convergence. This can be $0$, some positive real number $r$, or $\infty$: the power series converges only for $x = a$, converges for $|x-a| < r,$ or converges for all $x,$ respectively. The case that the radius of convergence is zero is useless in practice. The case that the radius of convergence is infinite is ideal when it happens, but can fail even for familiar functions.

If the radius of convergence is $r > 0$ (possibly $r = \infty$), the function defined by the power series is infinitely differentiable in the interval $(a-r,a+r).$ Furthermore, all derivatives have a power series expansion, and that expansion is what you get by differentiating the power series term by term the appropriate number of times. Evaluating the higher derivatives at $a,$ we see easily that $f^{(k)}(a) = k!\,b_{k},$ since the $k$-th derivative of $(x-a)^{j}$ is $0$ for $j < k$ and vanishes at $a$ for $j > k.$ Hence the power series, if it converges on an interval of positive length, has to be equal to its Taylor series. The miracle is that repeated use of the mean value theorem leads to a remainder term which can often be shown to tend to zero, and that many familiar functions have power series representations given by the Taylor series.
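The identity $f^{(k)}(a) = k!\,b_k$ can be checked mechanically (a Python sketch I've added, representing a truncated power series at $a=0$ by its coefficient list and differentiating term by term):

```python
import math

def diff_coeffs(coeffs):
    """Differentiate the polynomial sum(coeffs[n] * x**n) term by term."""
    return [n * c for n, c in enumerate(coeffs)][1:]

# Take any coefficient list b_n; here b_n = 1 (the geometric series).
b = [1] * 10
for k in range(5):
    c = b
    for _ in range(k):
        c = diff_coeffs(c)
    # The k-th derivative evaluated at 0 is the constant term, k! * b_k.
    assert c[0] == math.factorial(k) * b[k]
print("f^(k)(0) = k! * b_k holds for k = 0..4")
```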

Geoff Robinson

It doesn't. In fact, there are smooth functions (you can take infinitely many derivatives at the point in question) which are not analytic (represented by their Taylor series near the point in question). A usual example is: for $x \neq 0$ $$f(x) = e^{-1/x^2} $$ and $f(0)=0$. It can be shown that the Taylor polynomials for $f$ at zero are all trivial, hence the Taylor series is the zero series. Yet, $f(x) \neq 0$ for $x \neq 0$, hence the approximating series fails to capture $f$ near zero.

In short, the set of smooth functions is larger than the set of analytic functions.

James S. Cook

The answer I am going to give is more intuitive than technical.

  1. What does it mean that a function is infinitely differentiable? Well, think about what a derivative means. It's the instantaneous rate of change: the limit of the rate of change between two points as they are brought arbitrarily close together, $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$. For that limit to exist, the function mustn't have any 'tears' or sharp corners around that point (there are continuous functions whose graphs do have such corners). Now if a function is infinitely differentiable, that means you can keep taking its derivative and it will still exist; any 'tear' in the graph would ruin the first derivative, a 'tear' in the first derivative would ruin the second, and so on. So being infinitely differentiable just means that the graph is smooth, all the way down the chain of derivatives.

  2. We know that the derivative only tells us the slope, and doesn't care about the actual value (hence $\frac{d}{dx}(c + f) = \frac{df}{dx}$, so there are infinitely many functions with the same derivative). The derivatives also control the shape of the graph: integrating the derivative recovers the function up to a constant, so the more higher-order derivatives two functions share, the more alike their graphs look. That is essentially what the Taylor sum does: build a function that looks the same. The constant term (whose derivative is $0$) just moves the graph up and down so the values match up. And if an infinite number of higher-order derivatives agree, then the two functions have exactly the same shape.

Because the derivatives encode the shape of the graph, the Taylor sum to infinity simply reconstructs the function in different terms, so that the shapes line up. Here's a visualization: http://en.wikipedia.org/wiki/File:Sintay.svg. You'll see that by matching more and more derivatives, the polynomial and the function become more and more alike in value.
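The linked picture can be reproduced numerically (a Python sketch I've added, not part of the original answer): raising the degree of the Taylor polynomial of $\sin$ widens the interval on which it tracks the function.

```python
import math

def sin_taylor(x, degree):
    """Taylor polynomial of sin at 0, keeping terms up to an odd degree."""
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1)
               for k in range(degree // 2 + 1))

# At x = 2 (far from the expansion point), low degrees miss badly,
# but higher degrees close the gap, as in the linked animation.
for d in (1, 3, 7, 11):
    print(d, abs(math.sin(2.0) - sin_taylor(2.0, d)))
```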
