Given a smooth real function $f$, we can approximate it as a sum of polynomials as $$f(x+h)=f(x)+h f'(x) + \frac{h^2}{2!} f''(x)+ \dotsb = \sum_{k=0}^n \frac{h^k}{k!} f^{(k)}(x) + h^n R_n(h),$$ where $\lim_{h\to0} R_n(h)=0$.

There are multiple ways to derive this result. Some are already discussed in the answers to the question "Where do the factorials come from in the taylor series?".

An easy way to see why the $1/k!$ factors must be there is to observe that computing $\partial_h^k f(x+h)\rvert_{h=0}$, we need the $1/k!$ factors to balance the $k!$ factors arising from $\partial_h^k h^k=k!$ in order to get a consistent result on the left- and right-hand sides.

However, even though algebraically it is very clear why we need these factorials, I don't have any intuition as to why they should be there. Is there any geometrical (or similarly intuitive) argument to see where they come from?

  • 5,543
  • 2
  • 24
  • 46
  • 9
    Check out 3Blue1Brown's [video](https://www.youtube.com/watch?v=3d6DsjIBzJ4) on Taylor series. – Toby Mak Jul 12 '20 at 04:21
  • 7
    Answering this question would take some serious [balls](http://en.wikipedia.org/wiki/Volume_of_an_n-ball). – Lucian Jul 12 '20 at 17:53
  • 1
    The question specifically highlights a "geometric" intuition (or other similarly intuitive explanation, but...). For my money, there should be a really stellar *picture* which gives this intuition. I don't see it myself, but I imagine that someone else has the idea. I would very much like to see such a figure. – Xander Henderson Jul 14 '20 at 03:47
  • 2
    @XanderHenderson I think a very nice one can be obtained via the FTC at [17:01 of 3b1b's video](https://youtu.be/3d6DsjIBzJ4?t=1021). I think the same idea can be pushed to 3D to also see the $1/6$ factor arising [with something like this](https://i.stack.imgur.com/X0qzG.png) (green volume is the linear correction, purple the second order correction with factor $\frac12$, red the third order one with factor $\frac16$). These are essentially visual representations of the ideas in [the currently most upvoted answer](https://math.stackexchange.com/a/3753600/173147) – glS Jul 14 '20 at 06:32
  • @glS Indeed. The point of my comment was to encourage an answerer to include such a figure or representation here. Math SE is meant to be relatively self-contained, thus off-site links aren't as good as inline images here. – Xander Henderson Jul 14 '20 at 12:31
  • Xander brought this question up in [the Pearl Dive](https://chat.stackexchange.com/rooms/102837/pearl-dive). – Jyrki Lahtonen Jul 31 '20 at 14:23

7 Answers7


Yes. There is a geometric explanation. For simplicity, let me take $x=0$ and $h=1$. By the Fundamental Theorem of Calculus (FTC), $$ f(1)=f(0)+\int_{0}^{1}dt_1\ f'(t_1)\ . $$ Now use the FTC for the $f'(t_1)$ inside the integral, which gives $$ f'(t_1)=f'(0)+\int_{0}^{t_1}dt_2\ f''(t_2)\ , $$ and insert this in the previous equation. We then get $$ f(1)=f(0)+f'(0)+\int_{0}^{1}dt_1\int_{0}^{t_1}dt_2 f''(t_2)\ . $$ Keep iterating this, using the FTC to rewrite the last integrand, each time invoking a new variable $t_k$. At the end of the day, one obtains $$ f(1)=\sum_{k=0}^{n}\int_{\Delta_k} dt_1\cdots dt_k\ f^{(k)}(0)\ +\ {\rm remainder} $$ where $\Delta_k$ is the simplex $$ \{(t_1,\ldots,t_k)\in\mathbb{R}^k\ |\ 1>t_1>\cdots>t_k>0\}\ . $$ For example $\Delta_{2}$ is a triangle in the plane, and $\Delta_3$ is a tetrahedron in 3D, etc. The $\frac{1}{k!}$ is just the volume of $\Delta_k$. Indeed, by a simple change of variables (renaming), the volume is the same for all $k!$ simplices of the form $$ \{(t_1,\ldots,t_k)\in\mathbb{R}^k\ |\ 1>t_{\sigma(1)}>\cdots>t_{\sigma(k)}>0\} $$ where $\sigma$ is a permutation of $\{1,2,\ldots,k\}$. Putting all these simplices together essentially reproduces the cube $[0,1]^k$ which of course has volume $1$.

Exercise: Recover the usual formula for the integral remainder using the above method.

Remark 1: As Sangchul said in the comment, the method is related to the notion of ordered exponential. In a basic course on ODEs, one usually sees the notion of fundamental solution $\Phi(t)$ of a linear system of differential equations $X'(t)=A(t)X(t)$. One can rewrite the equation for $\Phi(t)$ in integral form and do the same iteration as in the above method with the result $$ \Phi(s)=\sum_{k=0}^{\infty}\int_{s\Delta_k} dt_1\cdots dt_k\ A(t_1)\cdots A(t_k)\ . $$ It is only when the matrices $A(t)$ for different times commute, that one can use the above permutation and cube reconstruction, in order to write the above series as an exponential. This happens in one dimension and also when $A(t)$ is time independent, i.e., for the two textbook examples where one has explicit formulas.

Remark 2: The method I used for the Taylor expansion is related to how Newton approached the question using divided differences. The relation between Newton's iterated divided differences and the iterated integrals I used is provide by the Hermite-Genocchi formula.

Remark 3: These iterated integrals are also useful in proving some combinatorial identities, see this MO answer:


They were also used by K.T. Chen in topology, and they also feature in the theory of rough paths developed by Terry Lyons.

The best I can do, as far as (hopefully) nice figures, is the following.

The simplex $\Delta_1$

enter image description here

has one-dimensional volume, i.e., length $=1=\frac{1}{1!}$.

The simplex $\Delta_2$

enter image description here

has two-dimensional volume, i.e., area $=\frac{1}{2}=\frac{1}{2!}$.

The simplex $\Delta_3$

enter image description here

has three-dimensional volume, i.e., just volume $=\frac{1}{6}=\frac{1}{3!}$.

For obvious reasons, I will stop here.

Abdelmalek Abdesselam
  • 3,860
  • 12
  • 24
  • 2
    Nice rendition using ordered exponential! – Sangchul Lee Jul 12 '20 at 01:03
  • I think this would be much clearer if you didn't take $h=1$ but left it general – Bananach Jul 13 '20 at 00:26
  • 2
    I have put a bounty on this question in hopes of getting some nice *figures*. I find it hard to believe that a geometrical interpretation is not amenable to nice graphics. If you have the inclination, I think that some of the ideas you present would make for nice visualizations. – Xander Henderson Jul 27 '20 at 17:31
  • @XanderHenderson: Not that easy to explain this visually, but I gave it a try. – Abdelmalek Abdesselam Jul 30 '20 at 22:34
  • 2
    I was just about to add a picture showing how a cube is split into six identical simplices by the planes $x=y$, $y=z$ and $z=x$. But then you had already added pictures, so I guess that would be pointless. Anyway, thanks for a nice answer as well as doing your best visualizing it :-) – Jyrki Lahtonen Jul 31 '20 at 14:22
  • @JyrkiLahtonen: Thanks. What would be best would be an animation where the six identical pieces would be moved around and recombined into the unit cube. However, this is way beyond my current skills. – Abdelmalek Abdesselam Jul 31 '20 at 17:44
  • I think one half of them are mirror images of the other half, so they cannot be shown as rotated copies of the same piece. I am practicing animations with Mathematica (for fun as well as for educational material), so I might give it a try. – Jyrki Lahtonen Jul 31 '20 at 18:29
  • Too late an hour. FWIW the next to last picture [here](https://math.stackexchange.com/a/3665362/11619) shows all eight octants split into six simplices like here. – Jyrki Lahtonen Jul 31 '20 at 21:30
  • @JyrkiLahtonen: You are correct of course. One needs three identical pieces and three mirror images, which reflects the sign of the six permutations. – Abdelmalek Abdesselam Aug 03 '20 at 17:32

Here is a heuristic argument which I believe naturally explains why we expect the factor $\frac{1}{k!}$.

Assume that $f$ is a "nice" function. Then by linear approximation,

$$ f(x+h) \approx f(x) + f'(x)h. \tag{1} $$

Formally, if we write $D = \frac{\mathrm{d}}{\mathrm{d}x}$, then the above may be recast as $f(x+h) \approx (1 + hD)f(x)$. Now applying this twice, we also have

\begin{align*} f(x+2h) &\approx f(x+h) + f'(x+h) h \\ &\approx \left( f(x) + f'(x)h \right) + \left( f'(x) + f''(x)h \right)h \\ &= f(x) + 2f'(x)h + f''(x)h^2. \tag{2} \end{align*}

If we drop the $f''(x)h^2$ term, which amounts to replacing $f'(x+h)$ by $f'(x)$, $\text{(2)}$ reduces to $\text{(1)}$ with $h$ replaced by $2h$. So $\text{(2)}$ may be regarded as a better approximation to $f(x+2h)$, with the extra term $f''(x)h^2$ accounting for the effect of the curvature of the graph. We also note that, using $D$, we may formally express $\text{(2)}$ as $f(x+2h) \approx (1+hD)^2 f(x)$.

Continuing in this way, we would get

$$ f(x+nh) \approx (1+hD)^n f(x) = \sum_{k=0}^{n} \binom{n}{k} f^{(k)}(x) h^k. \tag{3} $$

So by replacing $h$ by $h/n$,

$$ f(x+h) \approx \left(1 + \frac{hD}{n}\right)^n f(x) = \sum_{k=0}^{n} \frac{1}{n^k} \binom{n}{k} f^{(k)}(x) h^k. \tag{4} $$

Now, since $f$ is "nice", we may hope that the error between both sides of $\text{(4)}$ will vanish as $n\to\infty$. In such case, either by using $\lim_{n\to\infty} \frac{1}{n^k}\binom{n}{k} = \frac{1}{k!}$ or $\lim_{n\to\infty} \left(1 + \frac{a}{n}\right)^n = e^a = \sum_{k=0}^{\infty} \frac{a^k}{k!} $,

$$ f(x+h) = e^{hD} f(x) = \sum_{k=0}^{\infty} \frac{1}{k!} f^{(k)}(x) h^k . \tag{5} $$

Although this above heuristic leading to $\text{(5)}$ relies on massive hand-waving, the formal relation in $\text{(5)}$ is justified in the context of functional analysis and tells that $D$ is the infinitesimal generator of the translation semigroup.

Sangchul Lee
  • 141,630
  • 16
  • 242
  • 385
  • this is a very nice way to see it. A few notes: there is an $h$ missing in the second row of (2) I think. Also, if you rewrite (2) with $h\to h/2$, isn't the last factor off? You would get $f(x+h)=f(x)+f'(x)h + \frac14 f''(x) h^2$ – glS Jul 11 '20 at 14:19
  • @glS, Glad it helped! It was also quite appealing to me when I first heard this approach. Also thank you for pointing out the typos. I fixed them now. – Sangchul Lee Jul 11 '20 at 14:25
  • thanks. Isn't the factor still off though? Or am I missing something? – glS Jul 11 '20 at 14:27
  • @glS, That is a valid objection. Answering to that question, another half of the second-order term is buried in the error of that hand-waving relation $\approx$. Thankfully, the missing part is eventually retrieved as $n \to \infty$, although analyzing what exactly happens in that process now requires more than heuristics. – Sangchul Lee Jul 11 '20 at 14:28
  • ah, I see. So to get the correct $1/2$ term we actually need to apply infinitely many times the linear approximation: $\left(1+ \frac{h}{n}D\right)^n=1+hD+ \binom{n}{2}\frac{h^2}{n^2}D^2+(...)$ and thus only for $n\to \infty$ we have $\binom{n}{2}\frac1{n^2}\to\frac12$. – glS Jul 11 '20 at 14:49
  • @glS, That is correct :) – Sangchul Lee Jul 11 '20 at 15:00
  • (+1) This is a very clear heuristic explanation. Well done. – Mark Viola Jul 14 '20 at 18:28

The polynomials


have two remarkable properties:

  • they are derivatives of each other, $p_{k+1}'(h)=p_k(h)$,

  • their $n^{th}$ derivative at $h=0$ is $\delta_{kn}$ (i.e. $1$ iff $n=k$, $0$ otherwise).

For this reason, they form a natural basis to express a function in terms of the derivatives at a given point: if you form a linear combination with coefficients $c_k$, evaluating the linear combination at $h=0$ as well as the derivatives of this linear combination at $h=0$, you will retrieve exactly the coefficients $c_k$. The denominators $k!$ ensure sufficient damping of the fast growing functions $h^k$ for the unit condition to hold and act as normalization factors.

$$\begin{pmatrix}f(x)\\f'(x)\\f''(x)\\f'''(x)\\f''''(x)\\\cdots\end{pmatrix}= \begin{pmatrix}1&h&\frac{h^2}2&\frac{h^3}{3!}&\frac{h^4}{4!}&\cdots \\0&1&h&\frac{h^2}2&\frac{h^3}{3!}&\cdots \\0&0&1&h&\frac{h^2}2&\cdots \\0&0&0&1&h&\cdots \\0&0&0&0&1&\cdots \\&&&\cdots \end{pmatrix} \begin{pmatrix}f(0)\\f'(0)\\f''(0)\\f'''(0)\\f''''(0)\\\cdots\end{pmatrix}$$

$$\begin{pmatrix}f(0)\\f'(0)\\f''(0)\\f'''(0)\\f''''(0)\\\cdots\end{pmatrix}= \begin{pmatrix}1&0&0&0&0&\cdots \\0&1&0&0&0&\cdots \\0&0&1&0&0&\cdots \\0&0&0&1&0&\cdots \\0&0&0&0&1&\cdots \\&&&\cdots \end{pmatrix} \begin{pmatrix}f(0)\\f'(0)\\f''(0)\\f'''(0)\\f''''(0)\\\cdots\end{pmatrix}$$

  • Good answer, but it feels a bit incomplete without explaining why $c_{k} = f^{(k)}(x)$. Something like $c_{k} = k f^{(k)}(x)$ could seem plausible to someone who didn't know better. Although I guess a quick look at the expansion of the exponential function would clear that up. – Jacob Maibach Jul 12 '20 at 03:41
  • @JacobMaibach: it's all in $\delta_{kn}$. Feel free to provide your own answer. –  Jul 12 '20 at 08:51
  • 1
    @JacobMaibach I think it's quite clear why $c_k = f^{(k)}(x)$ are the coefficients, simply because differentiating $k$ times and evaluating at $h=0$ yields exactly that relation. – Alex Jones Jul 12 '20 at 11:06

The simplest way to look it is that the second derivative of $x^2$ is $2x= 2= 2!$, the third derivative of $x^3$ is $6= 3!$, and, in general, the $n$'th derivative of $x^n$ is $n!$. That's where the factorials in the Taylor series come from.

If $$f(x)= \frac{a_0}{0!} + \frac{a_1}{1!}(x- q)+ \frac{a_2}{2!}(x- q)^2+ \frac{a_3}{3!}(x- q)^3+ \cdots$$ then $$f(q)= a_0,$$ $$f'(q)= a_1,$$ $$f''(q)= a_2,$$ $$f'''(q)= a_3,$$ and in general the $n$'th derivative $f^{(n)}(q) = a_n$.

The point of the factorial in the denominator is to make those derivatives come out right.

  • 1,051
  • 1
  • 7
  • 12
  • 18,028
  • 2
  • 10
  • 20

I claim that if you understand why $\exp(x)$ has the form it does, the Taylor expansion makes more sense. Usually, when thinking about the Taylor expansion, we imagine that we are representing a function $f(x)$ as a 'mix' of polynomial terms, which is reasonable, but we could also think of it as a mix of exponential terms.

Why is this? Well, recall that $\exp(x)$ satisfies $\frac{d}{dx} \exp(x) = \exp(x)$. In fact, if we try to find functions $g(x)$ such that $\frac{dg}{dx} = g(x)$, we find that $g(x) = A \exp(x)$ for some constant $A$. This is a critial property of the exponential, and in fact if we try to solve $\frac{d^{k}}{dx^{k}}g(x) = g(x)$ for other values of $k$, we find that again $A \exp(x)$ is a solution. For some $k$ (e.g. $k=4$), there are other solutions, but the functions $A \exp(x)$ are the only ones that work for all $k$.

This allows us to 'store' information about the derivatives of $f(x)$ in the exponential. That is, we want to build a function with the same $k^{th}$ derivative as $f$ at particular value of $x$, we can. In fact, we can do so for all $k$ at the same time. And we do it by patching togther exponential functions in a clever, although kinda opaque, way.

To be precise, it comes out as

$$f(x+h) = f(x) \exp(h) + (f'(x) - f(x)) (\exp(h) - 1) + (f''(x) - f'(x)) (\exp(h) - 1 - h) \ + \ ...$$

which does not seem particularly illuminating (there is a nice explanation in terms of what are called generalized eigenvectors). But regardless, it is possible, which is important.

As for why does the exponential has the form it does, I would point you to other questions on MSE such as this that dig into it. What I will say is that the form $x^{k}/k!$ is deeply linked to the exponential, so anytime you see something like that it makes sense to think of the exponential. It certainly shows up in other places in math, even in something like combinatorics which is far from calculus (see exponential generating functions).

[This answer was partially a response to the answer by @YvesDaoust, but I'm not sure it really succeeds in making things clearer.]

Jacob Maibach
  • 2,025
  • 2
  • 12
  • 20


  1. The iterated integral of $x\to 1$ $n$ times :

$$\cases{I_1(t) = \displaystyle \int_0^t 1 d\tau\\ I_n(t) = \displaystyle \int_0^{t} I_{n-1}(\tau) d\tau}$$

gives us $$\left\{1,x,\frac{x^2}{2!},\frac{x^3}{3!},\cdots,\frac{x^k}{k!},\cdots\right\}$$

together with

  1. Linearity of differentiation : $\frac{\partial \{af(x)+bg(x)\}}{\partial x} = \frac{\partial \{af(x)\}}{\partial x} + \frac{\partial \{bg(x)\}}{\partial x}$

  2. Area-under-curve interpretation of integral of non-negative functions.

gives a geometric interpretation for this. It is basically an iterated area-under-curve calculation for each monomial term in the Taylor expansion.

  • 24,082
  • 9
  • 33
  • 83

I think it can't be done using only simple geometric rules of inference. A geometric proof is a proof using only very simple geometric rules of inference. A geometric proof can tell you that $\forall x \in \mathbb{R}\sin'(x) = \cos(x) \text{and} \cos'(x) = -\sin(x)$. I think geometry doesn't define what the graph $y = x^3$ is. We can however decide to informally say it to mean something else.

I suppose given that two points are the points (0, 0) and (1, 0), you could define (0.5, 0) to be the point such that a translation that moves (0, 0) to that point also moves that point to (1, 0) and can define all the ordered pairs of rational numbers in a similar way. This is too hard for me to figure out for sure but I guess we could also add axioms for distance and which points each Cauchy sequence of points approaches or something like that. Then given any real number $x$ in binary notation, we could compute the binary notation of $x^3$ and then find a way to approach the point $(x, x^3)$ using a geometrical argument.

For what the Taylor series of $\sin$ and $\cos$ are, you pretty much have to prove it through the method of checking that one Taylor series is a derivative of the other and the other is the negative of the derivative of it. Then using geometry, given any real number $x$, you can sum up all the terms to find the point $(\sum_{i = 0}^\infty(-1)^i\frac{x^{2i}}{(2i)!}, \sum_{i = 0}^\infty(-1)^i\frac{x^{2i + 1}}{(2i + 1)!})$. Then using a lot of rigour, you can show that it will always be the same point as the point $(\cos(x), \sin(x))$ which is gotten by very simple means in geometry itself. It be be very simple to just construct the actual point in space $(\cos(x), \sin(x))$ from the binary notation of the real number $x$. However, computing that that point in fact has the coordinates $(\cos(x), \sin(x))$ is a bit more complicated.

  • 741
  • 7
  • 16