I have for some time been trawling through the Internet looking for an aesthetic proof of Taylor's theorem.

By which I mean this: there are plenty of proofs that introduce some arbitrary construct, with no mention of whence this beast came, and then hack away logically, line by line, until the thing is solved. But this kind of proof is ugly. A beautiful proof should rise naturally from the ground.

I've seen one proof claiming to do it from the fundamental theorem of calculus. It looked messy.

I've seen several attempts to use integration by parts repeatedly. But surely it would be tidier to do this without bringing in all of that extra machinery.

The nicest two approaches seem to involve the mean value theorem and Rolle's theorem, but I can't find a lucid presentation of either.

Maybe my brain is unusually stupid, and the approaches on Wikipedia etc. are perfectly good enough for everyone else.

Does anyone have a crystal clear understanding of this phenomenon? Or a web-link to such an understanding?

*EDIT*: Eventually a Cambridge mathematician explained it to me in a way that I could understand, and I have written up the proof here. To my mind it is the most instructional proof I have encountered, yet when I posted it as an answer it received mostly downvotes. It seems strange to me that no one else seems to concur. But it should be up to the keenest mathematical minds to choose which answer should be accepted. It shouldn't be up to me. Therefore I will bow to the wisdom of the community, and accept the currently most-upvoted answer. I have learned from Machine Learning that a "Committee of Experts" outperforms any one expert, and I am certainly no expert.

P i
  • I find the respective [Wikipedia page](https://en.wikipedia.org/wiki/Taylor%27s_theorem) quite informative. Can you say what you got from it (or any other source) so far? What did you understand and what didn't you? Where did you get stuck? This may get you more suitable answers. – JMCF125 Sep 01 '13 at 22:04
  • I actually like the integration-by-parts approach because with a little modification it yields the Euler-Maclaurin summation formula as well. I find that aesthetic, though artificially "cooked". – ccorn Sep 01 '13 at 22:10
  • The key of the proof: induction + integration by parts. – Sep 01 '13 at 22:11
  • I agree with JMCF125's comment. If the OP can't enunciate specifically what is unsatisfactory about the standard proofs (ideally with direct reference to at least one standard proof), then the question doesn't seem to be much more than "Please give me proofs of Taylor's theorem until I find one that I like." – Pete L. Clark Sep 02 '13 at 00:14

13 Answers


Here is an approach that seems rather natural, based on applying the fundamental theorem of calculus successively to $f(x)$, $f'(t_1)$, $f''(t_2)$, etc.: \begin{eqnarray*} && f(x)=f(a)+\int_a^x f'(t_1)\,dt_1 \\&& = f(a)+ \int_a^x f'(a)\,dt_1 + \int_a^x \!\! \int_a^{t_1} f''(t_2)\,dt_2\,dt_1 \\&& = f(a)+ \int_a^x f'(a)\,dt_1 + \int_a^x \!\! \int_a^{t_1} f''(a) \,dt_2\,dt_1 +\int_a^x \!\! \int_a^{t_1} \!\! \int_a^{t_2} f'''(t_3) \,dt_3\,dt_2\,dt_1 \end{eqnarray*} Notice that $$ \int_a^x f'(a)\,dt_1=f'(a)\int_a^x dt_1=f'(a)(x-a), $$$$ \int_a^x \!\! \int_a^{t_1} f''(a)\,dt_2\,dt_1 = f''(a)\int_a^x (t_1-a)\,dt_1 = f''(a)\frac{(x-a)^2}{2}, $$$$ \int_a^x \!\! \int_a^{t_1} \!\! \int_a^{t_2} f'''(a)\,dt_3\,dt_2\,dt_1 = f'''(a)\int_a^x \frac{(t_1-a)^2}{2}\,dt_1 = f'''(a)\frac{(x-a)^3}{3!}, $$ and in general $$ \int_a^x \!\! \int_a^{t_1} \!\ldots \int_a^{t_{n-1}} f^{(n)}(a)\,dt_n\ldots\,dt_2\,dt_1 = f^{(n)}(a)\frac{(x-a)^{n}}{n!}. $$

By induction, then, one proves $$ f(x) = P_n(x)+ R_n(x) $$ where $P_n$ is the Taylor polynomial $$ P_n(x) = f(a)+f'(a)(x-a)+f''(a)\frac{(x-a)^2}{2}+\ldots+ f^{(n)}(a) \frac{(x-a)^n}{n!}, $$ and the remainder $R_n(x)$ is represented by nested integrals as $$R_n (x) = \int_a^x \!\! \int_a^{t_1} \!\ldots \int_a^{t_{n}} f^{(n+1)}(t_{n+1}) \,dt_{n+1}\ldots\,dt_2\,dt_1. $$
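A quick numerical sketch of this identity (an illustration I am adding, with the hypothetical choices $f = \exp$, $a = 0$, $n = 3$): the nested remainder $R_n(x)$ is just the cumulative integral $t \mapsto \int_a^t$ applied $n+1$ times to $f^{(n+1)}$, so we can approximate it with a repeated trapezoid rule and check that $P_n(x) + R_n(x)$ reproduces $f(x)$.

```python
import math

# Check f(x) = P_n(x) + R_n(x) for f = exp, a = 0
# (every derivative of exp at 0 equals 1).
a, x, n, steps = 0.0, 1.0, 3, 1000
h = (x - a) / steps
ts = [a + i * h for i in range(steps + 1)]

# Innermost integrand of the nested remainder: f^{(n+1)}(t) = e^t.
vals = [math.exp(t) for t in ts]

# Apply the cumulative integral t -> \int_a^t ... (trapezoid rule) n+1 times,
# which is exactly the nested-integral structure of R_n.
for _ in range(n + 1):
    cum = [0.0]
    for i in range(steps):
        cum.append(cum[-1] + 0.5 * (vals[i] + vals[i + 1]) * h)
    vals = cum
R = vals[-1]                       # R_n(x)

# Taylor polynomial P_n(x) about a = 0.
P = sum((x - a) ** k / math.factorial(k) for k in range(n + 1))

print(P + R - math.exp(x))         # should be very close to 0
```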

We can establish the Lagrange form of the remainder by applying the intermediate and extreme value theorems, using simple comparisons as follows. Consider the case $x>a$ first. Let $m$ be the minimum value of $f^{(n+1)}$ on $[a,x]$, and $M$ the maximum value. Then since $$ m\le f^{(n+1)}(t_{n+1}) \le M $$ for all $t_{n+1}$ in $[a,x]$, after $n+1$ repeated integrations one finds $$ m \frac{(x-a)^{n+1}}{(n+1)!} \le R_n(x) \le M \frac{(x-a)^{n+1}}{(n+1)!}. $$ But now, notice that the function $$ t\mapsto f^{(n+1)}(t) \frac{(x-a)^{n+1}}{(n+1)!} $$ attains the extreme values $$ m \frac{(x-a)^{n+1}}{(n+1)!} \quad\mbox{and} \quad M \frac{(x-a)^{n+1}}{(n+1)!} $$ at some points in $[a,x]$. By the intermediate value theorem, there must be some point $t$ between these two points (so $t\in[a,x]$) such that $$ R_n(x) = f^{(n+1)}(t) \frac{(x-a)^{n+1}}{(n+1)!}. $$ This is the Lagrange form of the remainder. If $x<a$ and $n$ is odd, the same proof works. If $x<a$ and $n$ is even, $(x-a)^{n+1}<0$ and the same proof works after reversing some inequalities.

One can motivate this whole approach in a couple of different ways. E.g., one can argue that $ {(x-a)^n}/{n!} $ becomes small for large $n$, so the remainders $R_n(x)$ will become small if the derivatives of $f$ stay bounded, say.

Or, one can reason loosely as follows: $f(x)\approx f(a)$ for $x$ near $a$. Ask, what is the remainder exactly? Apply the fundamental theorem as above, then approximate the first remainder using the approximation $f'(t_1)\approx f'(a)$. Repeating, one produces the Taylor polynomials by the pattern of the argument above.

Bob Pego
  • It's great and simple. – Felix Marin Sep 13 '13 at 01:45
  • This is such a great explanation, thank you. – littleO Mar 07 '15 at 03:52
  • With a minimum of effort you can arrive at the integral formulation of the remainder term: just apply your formula after "and in general..." to solve the nested integral for $R_n$. I think that the approach of your answer (with the integral formulation of the remainder) is very nice, among other things, because it generalizes to every situation in which one has an integral and the integration by parts formula. It applies equally well to complex-, vector-, and even Banach space-valued functions, which is quite useful in practice. – Giuseppe Negro Mar 23 '16 at 15:01
  • Why is $f^{(n+1)}(t_{n+1})$ bounded between $m$ and $M$? $f^{(n+1)}$ doesn't need to be continuous, right? – layman Oct 11 '16 at 17:08
  • I guess we're assuming $f^{(n+1)}(t)$ is continuous, since we use the intermediate value theorem. – layman Oct 11 '16 at 20:37
  • $f^{(n+1)}$ is bounded on some interval $I$ because, by Taylor's hypotheses, we let $f$ be a function whose $(n+1)$th derivative exists on some interval $I$. – Michael Levy Nov 23 '17 at 14:17
  • @FelixMarin "... great and simple." – Then you give this proof in Calculus I? – nilo de roock Feb 22 '21 at 09:54
  • Nice answer. Btw the second part of the answer can itself be modified into a full proof that doesn't explicitly use integration! (Lax and Terrell's Calculus book does that.) – Venkata Karthik Bandaru Apr 13 '21 at 13:21
  • Thank you for posting this. This proof generalizes to other notions of derivative and integral on more general spaces; it helped me out! – Carson James Nov 13 '21 at 16:00

The clearest proof one can find, in my opinion, is the following. Note it is just a generalized mean value theorem!

THM Let $f,g$ be functions defined on a closed interval $[a,b]$ that admit finite $n$-th derivatives on $(a,b)$ and continuous $(n-1)$-th derivatives on $[a,b]$. Suppose $c\in [a,b]$. Then for each $x\neq c$ in $[a,b]$ there exists $x_1$ in the segment joining $c$ and $x$ such that $$\left(f(x)-\sum_{k=0}^{n-1}\frac{f^{(k)}(c)}{k!}(x-c)^k\right) g^{(n)}(x_1)=f^{(n)}(x_1)\left(g(x)-\sum_{k=0}^{n-1}\frac{g^{(k)}(c)}{k!}(x-c)^k\right)$$

PROOF For simplicity assume $c<b$ and $x>c$. Keep $x$ fixed and consider $$F(t)=f(t)+\sum_{k=1}^{n-1}\frac{f^{(k)}(t)}{k!}(x-t)^k$$ $$G(t)=g(t)+\sum_{k=1}^{n-1}\frac{g^{(k)}(t)}{k!}(x-t)^k$$

for each $t\in[c,x]$. Then $F,G$ are continuous on $[c,x]$ and admit finite derivatives on $(c,x)$. By the Cauchy (generalized) mean value theorem we may write $$F'(x_1)[G(x)-G(c)]=G'(x_1)[F(x)-F(c)]$$

for $x_1\in (c,x)$. This gives that $$F'(x_1)[g(x)-G(c)]=G'(x_1)[f(x)-F(c)]$$ since $F(x)=f(x),G(x)=g(x)$. But we see, by cancelling terms with opposite signs, that $$F'(t)=\frac{(x-t)^{n-1}}{(n-1)!}f^{(n)}(t)$$ $$G'(t)=\frac{(x-t)^{n-1}}{(n-1)!}g^{(n)}(t)$$ which gives the desired formula when plugging $t=x_1$.

COR We get Taylor's theorem with $g(x)=(x-c)^n$, namely, for some $x_1$ we have $$\left( {f(x) - \sum\limits_{k = 0}^{n - 1} {\frac{{{f^{(k)}}(c)}}{{k!}}} {{(x - c)}^k}} \right)n! = {f^{(n)}}({x_1}){\left( {x - c} \right)^n}$$ or $$f(x) = \sum\limits_{k = 0}^{n - 1} {\frac{{{f^{(k)}}(c)}}{{k!}}} {(x - c)^k} + \frac{{{f^{(n)}}({x_1})}}{{n!}}{\left( {x - c} \right)^n}$$ Note that $g^{(k)}(c)=0$ for $k=0,1,2,\ldots,n-1$, and $g^{(n)}=n!$.
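A numerical sanity check of the corollary (my illustrative choices: $f = \sin$, $c = 0$, $x = 1$, $n = 3$, so $f^{(3)} = -\cos$): solving $f^{(3)}(x_1) = \alpha$, where $\alpha$ is $n!$ times the normalized remainder, should produce an $x_1$ strictly between $c$ and $x$.

```python
import math

# alpha := n! * (f(x) - sum_{k<n} f^{(k)}(c) (x-c)^k / k!) / (x-c)^n
# should equal f^{(n)}(x_1) = -cos(x_1) for some x_1 in (0, 1).
c, x, n = 0.0, 1.0, 3
partial = 0.0 + 1.0 * x + 0.0          # sin, cos, -sin at 0 are 0, 1, 0
alpha = math.factorial(n) * (math.sin(x) - partial) / (x - c) ** n

# f^{(3)} = -cos, so solve -cos(x_1) = alpha:
x1 = math.acos(-alpha)
print(x1)                              # lies strictly between c and x
```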

  • This is from Apostol's Mathematical Analysis, 2e, pp. 113-114. – RitterSport Jul 27 '16 at 21:00
  • @RitterSport That is correct. =) – Pedro Jul 27 '16 at 21:01
  • "there exists $x_1$ in the segment joining $c$ and $x_1$ such that" should read "there exists $x_1$ in the segment joining $c$ and $x$ such that" – PJ_Finnegan Sep 28 '16 at 14:02
  • I believe this requires Cauchy's mean value theorem? Otherwise the first use of the MVT wouldn't necessarily follow from the usually stated version (of course the extended version is not a whole lot more). – nimish Apr 04 '20 at 23:33

Let us try and approximate a function by a polynomial in such a way that they coincide closely at the origin. To achieve this, we will require the same value, the same slope, the same curvature and the same higher order derivatives at $0$.

WLOG we use a cubic polynomial and we start from $$f(x)=p(x)+e(x)=a+bx+cx^2+dx^3+e(x),$$ where $e$ is an error term.

Imposing our conditions, we need as many equations as there are unknown coefficients:

$$f(0)=a+e(0),\\ f'(0)=b+e'(0),\\ f''(0)=2c+e''(0),\\ f'''(0)=3!\,d+e'''(0).$$

To achieve a small error, we ensure $e(0)=e'(0)=e''(0)=e'''(0)=0$, and set $a=f(0)$, $b=f'(0)$, $2c=f''(0)$, $3!\,d=f'''(0)$. This gives us the Taylor coefficients. We now have to bound the error term.

Assuming that $|f''''(x)|=|e''''(x)|\le M$ in the range $[0,h]$, by integration

$$|e'''(x)|=\left|\int_0^x e''''(x)\,dx+e'''(0)\right|\le Mx,\\ |e''(x)|=\left|\int_0^x e'''(x)\,dx+e''(0)\right|\le\left|\int_0^x Mx\,dx\right|=\frac{Mx^2}2,\\ |e'(x)|=\left|\int_0^x e''(x)\,dx+e'(0)\right|\le\left|\int_0^x \frac{Mx^2}2\,dx\right|=\frac{Mx^3}{3!},\\ |e(x)|=\left|\int_0^x e'(x)\,dx+e(0)\right|\le\left|\int_0^x \frac{Mx^3}{3!}\,dx\right|=\frac{Mx^4}{4!}.$$

To summarize, for $x\in[0,h]$,

$$\left|f(x)-f(0)-f'(0)x-f''(0)\frac{x^2}2-f'''(0)\frac{x^3}{3!}\right|\le M\frac{h^4}{4!},$$ where $|f''''(x)|\le M$.
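A quick check of this bound with the illustrative choice $f = \sin$ on $[0, h]$, $h = 0.5$, where $|f''''| \le M = 1$:

```python
import math

# Check |f(x) - cubic Taylor polynomial| <= M h^4 / 4! for f = sin on [0, h].
h, M = 0.5, 1.0
bound = M * h ** 4 / math.factorial(4)

worst = 0.0
for i in range(1001):
    x = h * i / 1000
    cubic = x - x ** 3 / math.factorial(3)   # f(0)=0, f'(0)=1, f''(0)=0, f'''(0)=-1
    worst = max(worst, abs(math.sin(x) - cubic))

print(worst, "<=", bound)
```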


My personal favorite is the proof which uses L'Hopital's rule. It is without a doubt one of the lightest proofs for it, and in my own view one of the more elegant. This proof below is quoted straight out of the related Wikipedia page:


$h_k(x) = \begin{cases} \frac{f(x) - P(x)}{(x-a)^k} & x\not=a\\ 0 & x=a \end{cases}$

where, as in the statement of Taylor's theorem, $P(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots + \frac{f^{(k)}(a)}{k!}(x-a)^k$

It is sufficient to show that $\lim_{x\to a} h_k(x) = 0$. The proof here is based on repeated application of L'Hôpital's rule.

Note that, for each $j = 0,1,\ldots,k-1$, $f^{(j)}(a)=P^{(j)}(a)$.

Hence each of the first $k−1$ derivatives of the numerator in $h_k(x)$ vanishes at $x=a$, and the same is true of the denominator. Also, since the condition that the function $f$ be $k$ times differentiable at a point requires differentiability up to order $k−1$ in a neighborhood of said point (this is true, because differentiability requires a function to be defined in a whole neighborhood of a point), the numerator and its $k-2$ derivatives are differentiable in a neighborhood of $a$. Clearly, the denominator also satisfies said condition, and additionally, doesn't vanish unless $x=a$, therefore all conditions necessary for L'Hopital's rule are fulfilled, and its use is justified. So

$\begin{align} \lim_{x\to a} \frac{f(x) - P(x)}{(x-a)^k} &= \lim_{x\to a} \frac{\frac{d}{dx}(f(x) - P(x))}{\frac{d}{dx}(x-a)^k} = \cdots = \lim_{x\to a} \frac{\frac{d^{k-1}}{dx^{k-1}}(f(x) - P(x))}{\frac{d^{k-1}}{dx^{k-1}}(x-a)^k}\\ &=\frac{1}{k!}\lim_{x\to a} \frac{f^{(k-1)}(x) - P^{(k-1)}(x)}{x-a}\\ &=\frac{1}{k!}(f^{(k)}(a) - f^{(k)}(a)) = 0 \end{align}$

where the second to last equality follows by the definition of the derivative at $x = a$.
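One can watch $h_k(x) \to 0$ numerically. A sketch with the illustrative choice $f = \cos$, $a = 0$, $k = 4$ (so $P(x) = 1 - x^2/2 + x^4/24$):

```python
import math

# Peano-remainder check: h_k(x) = (f(x) - P(x)) / (x - a)^k -> 0 as x -> a.
def h4(x):
    P = 1 - x ** 2 / 2 + x ** 4 / 24     # degree-4 Taylor polynomial of cos at 0
    return (math.cos(x) - P) / x ** 4

for x in [0.5, 0.1, 0.02]:
    print(x, h4(x))                      # the ratio shrinks toward 0
```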


This is the best proof I've seen: *(posted as an image, not reproduced here)*

It's all about smoothness of the functions.

A continuous function is such that it can be accurately approximated by a constant in the neighborhood of a point:

$$f(x)=f(x_0)+r(x;x_0)$$ where $r$ is a "remainder" function, which tends to zero at $x_0$.

A smooth function is such that it is differentiable, and its derivatives are continuous. (The more derivatives, the smoother.) For the sake of the example, consider the third order:

$$f'''(x)=f'''(x_0)+r_0(x;x_0)$$

Then integrating from $x_0$ to $x$ three times,

$$f''(x)=f''(x_0)+f'''(x_0)(x-x_0)+r_1(x;x_0)$$

$$f'(x)=f'(x_0)+f''(x_0)(x-x_0)+f'''(x_0)\frac{(x-x_0)^2}{2}+r_2(x;x_0)$$

$$f(x)=f(x_0)+f'(x_0)(x-x_0)+f''(x_0)\frac{(x-x_0)^2}{2}+f'''(x_0)\frac{(x-x_0)^3}{3!}+r_3(x;x_0)$$

In the above, the remainders $r_k$ are antiderivatives of each other, and one can show that $r_k$ belongs to $o((x-x_0)^k)$.


You can find a nice proof of Taylor's Thm at: http://www.math.csusb.edu/faculty/pmclough/SPTT.pdf

  • Link-only answers are frowned upon because links often go dead, while answers here are expected to be permanent. If you could add some explanation of the proof to your answer, leaving the link for users who want additional details, it would greatly improve the answer. – Alex Becker May 02 '14 at 18:57

Here is a nice summary and proof from Stewart's Calculus: *(posted as an image, not reproduced here)*


John Molokach

The following proof is in Bartle's Elements of Real Analysis. Its goal is to exploit Rolle's Theorem, just as the proof of the more elementary Mean Value Theorem does. To this end, it incorporates a clever use of the product rule.

So, suppose that $f$ denotes a function on $[a,b]$ such that $f$ is $n$-times differentiable on $[a,b]$ and such that $f$ is $n+1$ times differentiable on $(a,b)$. For every pair of distinct points $x$ and $y$ in $[a,b]$ we show there is a point $\xi$ strictly between $x$ and $y$ such that $$f(y)=\sum_{k=0}^n\frac{f^{(k)}(x)}{k!}(y-x)^k+\frac{f^{(n+1)}(\xi)}{(n+1)!}(y-x)^{n+1}\,.$$

To prove this, let $\alpha$ denote the real number which satisfies $$\frac{(y-x)^{n+1}}{(n+1)!}\alpha=f(y)-\sum_{k=0}^n\frac{f^{(k)}(x)}{k!}(y-x)^k\,.$$

And now define the function $\varphi$ on $[a,b]$ by $$\varphi(t)=f(y)-\left\{\sum_{k=0}^n\frac{f^{(k)}(t)}{k!}(y-t)^k +\frac{\alpha}{(n+1)!}(y-t)^{n+1}\right\}\,.$$

We clearly have that $\varphi(y)=0$ and, by the definition of $\alpha$, we have $\varphi(x)=0$. Thus, Rolle's Theorem implies there is a $\xi$ strictly between $x$ and $y$ such that $$\varphi'(\xi)=0\,.$$ This is where the clever use of the product rule comes in. For when we use the definition of $\varphi$ and differentiate at $\xi$, we obtain a telescoping series which, upon simplification, leaves us with $$\varphi'(\xi)=\frac{\alpha-f^{(n+1)}(\xi)}{n!}(y-\xi)^n.$$ This shows that $\alpha=f^{(n+1)}(\xi)$ as desired.
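The telescoping claim is easy to check numerically. A sketch with the illustrative choices $f = \exp$, $x = 0$, $y = 1$, $n = 2$, comparing a central finite difference of $\varphi$ against the closed form $(\alpha - f^{(n+1)}(t))(y-t)^n/n!$:

```python
import math

# Set up alpha and phi as in the proof, for f = exp (all derivatives are exp).
x, y, n = 0.0, 1.0, 2
P = sum(math.exp(x) * (y - x) ** k / math.factorial(k) for k in range(n + 1))
alpha = math.factorial(n + 1) * (math.exp(y) - P) / (y - x) ** (n + 1)

def phi(t):
    s = sum(math.exp(t) * (y - t) ** k / math.factorial(k) for k in range(n + 1))
    return math.exp(y) - s - alpha * (y - t) ** (n + 1) / math.factorial(n + 1)

t, eps = 0.4, 1e-6
numeric = (phi(t + eps) - phi(t - eps)) / (2 * eps)        # central difference
claimed = (alpha - math.exp(t)) * (y - t) ** n / math.factorial(n)
print(numeric, claimed)                                    # should agree closely
```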


The following isn't a rigorous proof, but I think it's "aesthetic", and "rise[s] naturally from the ground", as the original question asked for.

In searching for intuition for Taylor Series, I've developed a perspective involving Pascal's Triangle, which arises from recursively applied Riemann Sum approximations to the function.

I found @Bob Pego's answer really helpful and it's how I started developing this.

The end result involves coefficients based on rows of Pascal's Triangle, and the sequence of approximations (sequence of rows) looks like this

"Pascal" approximations for sin(x)

And they're much less efficient approximations than the plain finite Taylor polynomials:

Taylor approximations for sin(x)

I'll explain the derivation, but the essence of it is that the recursive Riemann Sum procedure produces binomial coefficients -- rows of Pascal's Triangle -- which are also simplex numbers. Simplex numbers converge to factorial fractions of hypercubes. The nth triangle number approaches $n^2 / (2!)$, the nth tetrahedral number approaches $n^3 / (3!)$, and so on.

A regular Riemann Sum approximation of f(x) of "resolution" 4 would be

$$ f(x) \approx f(0) + f'(0) \cdot \frac x4 + f'(x/4) \cdot \frac x4 + f'(2 \cdot x/4) \cdot \frac x4 + f'(3 \cdot x/4) \cdot \frac x4 \\ = f(0) + \frac x4 \cdot (f'(0) + f'(x/4) + f'(2 \cdot x/4) + f'(3 \cdot x/4)) $$

After each discrete step, we update the slope by setting it to the true slope of the function -- what the 1st derivative is at that point we've stepped to along x. This is the idea of a Riemann Sum.

But since we're interested in Taylor Series (about 0) here, let's pretend that we can't update to $f'(x/4)$ directly, and can only use the values of all derivatives evaluated at 0, not at $x/4$ or anywhere else.

So instead of updating to the actual slope, we'll use a recursive approximation to get an approximate slope update. We can now recurse and approximate each of the terms that have a non-0 x value. For example,

$$ f'(3x/4) \approx f'(0) + \frac x4 \cdot (f^{(2)}(0) + f^{(2)}(x/4) + f^{(2)}(2 \cdot x/4)) $$

There are still some terms with $f$ evaluated elsewhere than 0, so we recursively approximate terms until all terms are derivatives of $f$ evaluated at 0.

For resolution 4, you'll end up with

$$ f(x) \approx 1 \cdot (\frac x4)^0 \cdot f(0) + 4 \cdot (\frac x4)^1 \cdot f'(0) + 6 \cdot (\frac x4)^2 \cdot f^{(2)}(0) + 4 \cdot (\frac x4)^3 \cdot f^{(3)}(0) + 1 \cdot (\frac x4)^4 \cdot f^{(4)}(0) $$

Note the appearance of the Pascal row $1, 4, 6, 4, 1$.

In general, for resolution n, that will be

$$ f(x) \approx \sum_{k=0}^n {n \choose k} \frac {x^k} {n^k} f^{(k)}(0) $$

But I prefer to focus on the simplex perspective. Equivalently, that's

$$ f(x) \approx f(0) + \frac {\mathrm{natural}_n}{n} f'(0) x + \frac {\mathrm{triang}_{n - 1}}{n^2} f^{(2)}(0) x^2 + \frac {\mathrm{tetra}_{n - 2}}{n^3} f^{(3)}(0) x^3 + \frac {\mathrm{penta}_{n - 3}}{n^4} f^{(4)}(0) x^4 + \ldots $$

Where e.g. $penta_{n - 3}$ is the $(n-3)$th pentatope number, indexing the simplex numbers from $1$ to infinity. A few examples:

$$ \color{blue}{triang}_{\color{red}{1}} = {(\color{blue}{2} - 1) + \color{red}{1} \choose \color{blue}{2}} = 1, \color{blue}{triang}_{\color{red}{4}} = {(\color{blue}{2} - 1) + \color{red}{4} \choose \color{blue}{2}} = 10 $$

$$ \color{blue}{tetra}_\color{red}{2} = {(\color{blue}{3} - 1) + \color{red}{2} \choose \color{blue}{3}} = 4, \color{blue}{tetra}_\color{red}{5} = {(\color{blue}{3} - 1) + \color{red}{5} \choose \color{blue}{3}} = 35 $$


Check the Pascal's Triangle wikipedia page if you're not following that.

Simplex numbers approaching factorial fractions of hypercubes

$$ \frac {tetra_{n - 2}} {n^3} = \frac {{(3 - 1) + (n - 2) \choose 3}} {n^3} = \frac {n \choose 3} {n^3} = \frac {\frac {n (n - 1) (n - 2)} {3!}} {n^3} = \frac {n (n - 1) (n - 2)} {n^3} \cdot \frac {1} {3!} $$


$$ \lim_{n \to \infty} \frac {n (n - 1) (n - 2)} {n^3} \cdot \frac {1} {3!} = \frac {1} {3!} $$

Taking $n$ to $\infty$ corresponds to increasing the "resolution" of your Riemann Sum, and approaching continuous integration, thus approaching the Taylor Series.

Just like this "triangle" *(image not reproduced)* is a low-resolution version of an actual right isosceles triangle.

This may seem really roundabout given the concise alternative of the ${n \choose k}$ binomial coefficient notation, but I think simplexes are a nice way to visualize the "lagged" effect of higher order derivatives. If you begin traveling with constant acceleration of 1, then after 1 unit of time, your displacement will be the area of the right triangle in a unit square, $1/2! = 1/2$. If you begin traveling with a constant jerk of 1, then after 1 unit of time, your displacement will be the area of a tetrahedron in the corner of a unit cube, $1/3!$ = $1/6$.


There is also a natural and well-known proof using integration by parts.

Let $ f : I \to \mathbb{R} $ be a $ C^n $ function on open interval $ I $, and $ a,b \in I $. The goal is to relate $ f(b) $ to $ f(a) $ and $ f^{(j)}(a) $s.

$ f(b) = f(a) + \int_{a}^{b} f'(t) dt $

Using integration by parts on $ \int_{a}^{b} f'(t) dt $ will make higher derivative terms appear.
One thought is to write $ \int_{a}^{b} f'(t) dt = f'(t)t \bigr|_{a}^{b} - \int_{a}^{b} f''(t)t \, dt $, but $ f'(b) $ appears here.

To avoid this, we can instead do $ \int_{a}^{b} f'(t) dt = f'(t)(t-b) \bigr|_{a}^{b} - \int_{a}^{b} f''(t)(t-b)dt$.
So continuing this way,
$\begin{align} f(b) &= f(a) + \int_{a}^{b} f'(t) dt \\ &= f(a) + f'(t) (t-b) \bigr|_{a}^{b} - \int_{a}^{b} f^{(2)}(t) (t-b) dt \\ &= f(a) + f'(a) (b-a) - \left(f^{(2)}(t) \frac{(t-b)^2}{2} \Bigr|_{a}^{b} - \int_{a}^{b} f^{(3)}(t) \frac{(t-b)^2}{2} dt \right) \\ &= f(a) + f'(a) (b-a) + \frac{f^{(2)}(a)}{2} (b-a)^2 + \int_{a}^{b} f^{(3)}(t) \frac{(t-b)^2}{2} dt \\ &\vdots \\ &= f(a) + f'(a) (b-a) + \frac{f^{(2)}(a)}{2!} (b-a)^2 + \ldots + \frac{f^{(n-1)}(a)}{(n-1)!} (b-a)^{n-1} + (-1)^{n-1} \int_{a}^{b} f^{(n)}(t) \frac{(t-b)^{n-1}}{(n-1)!}dt, \end{align}$

the remainder term being $ \int_{a}^{b} f^{(n)}(t) \frac{(b-t)^{n-1}}{(n-1)!} dt $.
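A numerical check of this integral form (illustrative choices $f = \exp$, $a = 0$, $b = 1$, $n = 3$, so every derivative of $f$ at $0$ is $1$):

```python
import math

# Check f(b) = sum_{k<n} f^{(k)}(a)(b-a)^k/k!
#            + \int_a^b f^{(n)}(t) (b-t)^{n-1}/(n-1)! dt   for f = exp.
a, b, n, steps = 0.0, 1.0, 3, 2000
P = sum((b - a) ** k / math.factorial(k) for k in range(n))

h = (b - a) / steps
def g(t):
    return math.exp(t) * (b - t) ** (n - 1) / math.factorial(n - 1)
integral = sum(0.5 * (g(a + i * h) + g(a + (i + 1) * h)) * h for i in range(steps))

print(P + integral - math.exp(b))   # close to 0
```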

As in Bob Pego's answer, this can be expressed (by the mean value theorem for integrals) as $ \frac{f^{(n)}(c)}{n!} (b-a)^n $ for some $ c \in [a,b] $.


Regarding the initial answer to the posted question (which is as straightforward an approach to a proof of Taylor's Theorem as possible), I find the following the easiest way to explain why the last term on the RHS of the equation (the nested integrals) approaches $0$ as the number of iterations $n$ becomes arbitrarily large:

There are two cases: (1) $f(x)$ is finitely differentiable, or (2) $f(x)$ is infinitely differentiable.

(1) If $f(x)$ is finitely differentiable, then there exists a value of $n$ such that all derivatives of order $n+1$ or greater are $0$. The innermost integral of the nested integral is then $0$, which renders the whole nested integral equal to $0$, giving us the aforementioned Taylor polynomial of finite order $n$ with no remainder.

(2) If $f(x)$ is infinitely differentiable then, as the number of iterations $n$ approaches infinity, because we require by definition of the nested integrals that $a < t_n < t_{n-1} < t_{n-2} < \cdots < t_2 < t_1 < x$, we see that $t_n \to a$ as $n \to \infty$. As a result we have, as is true in case (1), that the innermost integral of the nested integral approaches $0$, giving us a remainder term of $0$ in the limit, and hence the infinite series expression for the Taylor series of the function $f(x)$.

Authors of most books will not be so kind to illustrate a proof in this manner, though. It's upsetting, I know.


Let $ f $ be infinitely differentiable (we'll weaken this hypothesis later), on an open interval containing $ [a, a+h] $ (so $ h $ is $ > 0 $ for now, for simplicity).

Let's try to approximate $ f $, over $ [a, a+h] $, with a polynomial of degree $ \leq n $ :

$$ f(a+t) = a_0 + a_1 t + \dots + a_n t^n + \varepsilon(t), \text { for } t \in [0, h] $$

We didn't yet fix our approximating polynomial $ a_0 + a_1 t + \dots + a_n t^n $. We'll first fix it by picking some intuitively plausible $ a_0, \dots, a_n $, and then study the resulting error function $ \varepsilon(t) $.

Fixing an approximation : Intuitively, we want our approximation $ a_0 + a_1 t + \dots + a_n t^n $ to be such that $ \varepsilon(t) = f(a+t) - ( a_0 + a_1 t + \dots + a_n t^n ) $ is "as flat and close to the $ 0 $-function on $ [0, h] $ as possible". So we can try to make $ \{ \varepsilon(0) = 0; \varepsilon^{(1)}(0) = 0, \dots, \varepsilon^{(n-1)}(0) = 0; \varepsilon(h) = 0 \} $ ($ n + 1 $ constraints, to fix $ n+1 $ coefficients).
Since $ \varepsilon^{(k)}(0) = f^{(k)}(a) - k!a_k $, setting $ a_0 = f(a), $ $ a_1 = \dfrac{f^{(1)}(a)}{1!}, $ $ \dots, a_{n-1} = \dfrac{f^{(n-1)}(a)}{(n-1)!} $, $ a_n = \dfrac{ f(a+h) - f(a) - f'(a) h - \dots - \dfrac{f^{(n-1)}(a)}{(n-1)!}h^{n-1} }{h^n} $ would do the job.

On the resulting error function : $ \varepsilon(0) = \varepsilon(h) = 0 $ gives (by Rolle's theorem) $ \varepsilon'(h_1) = 0 $ for some $ 0 < h_1 < h $. Now $ \varepsilon'(0) = \varepsilon'(h_1) = 0 $ gives $ \varepsilon^{(2)}(h_2) = 0 $ for some $ 0 < h_2 < h_1 $. Now $ \varepsilon^{(2)}(0) = \varepsilon^{(2)}(h_2) = 0 $ gives $ \dots $ , so on. At the end, we get $ \varepsilon^{(n)}(h_n) = 0 $ for some $ 0 < h_n < h $, that is $ f^{(n)} (a+h_n) - n! a_n = 0 $ for some $ 0 < h_n < h $.

Finally, substituting the explicit value of $ a_n $ gives

$$ f(a+h) = f(a) + f'(a)h + \dots + \dfrac{f^{(n-1)}(a)}{(n-1)!} h^{n-1} + \dfrac{f^{(n)}(a+h_n)}{n!} h^n \text{ for some } 0 < h_n < h, $$ as needed.

[Looking back at the proof, we could have taken "$f$ is $n$ times differentiable..." instead of "$f$ is infinitely differentiable..." to begin with. Also, the same idea works for $ h < 0 $ too.]
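A small numerical sketch of the construction (illustrative choices $f = \sin$, $a = 0$, $h = 0.5$, $n = 4$): fix $a_0, \dots, a_{n-1}$ from the derivatives at $a$, fix $a_n$ by interpolation at $h$, and then recover an $h_n$ strictly between $0$ and $h$ from $n!\,a_n = f^{(n)}(a + h_n)$.

```python
import math

# Build the approximating polynomial as in the construction above, for f = sin.
a, h, n = 0.0, 0.5, 4
derivs = [0.0, 1.0, 0.0, -1.0, 0.0]          # sin, cos, -sin, -cos, sin at 0

coeffs = [derivs[k] / math.factorial(k) for k in range(n)]
# a_n is fixed by requiring epsilon(h) = 0, i.e. exact interpolation at t = h:
partial = sum(c * h ** k for k, c in enumerate(coeffs))
coeffs.append((math.sin(a + h) - partial) / h ** n)

# epsilon(h) = 0 by construction:
poly_at_h = sum(c * h ** k for k, c in enumerate(coeffs))
print(poly_at_h - math.sin(a + h))           # essentially 0

# n! * a_n should equal f^{(4)}(a + h_n) = sin(a + h_n) for some 0 < h_n < h:
h_n = math.asin(math.factorial(n) * coeffs[-1])
print(h_n)                                   # strictly between 0 and h
```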