Suppose that we approximate a function $f(x)$ for $x$ near $0$ by a polynomial of degree $n$: $$f(x)\approx P_n(x)=C_0+C_1x+C_2x^2 + \dots + C_{n-1}x^{n-1} +C_nx^n$$ We need to find the values of the constants $C_0,C_1,C_2,\dots , C_n$. To do this, we require that the function $f(x)$ and each of its first $n$ derivatives agree with those of the polynomial $P_n(x)$ at the point $x=0$. In general, the more derivatives that agree at $x=0$, the larger the interval on which the function and the polynomial remain close to each other.
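
For reference, here is how I understand the matching condition to determine the constants (a sketch, assuming $f$ has $n$ derivatives at $0$). Differentiating $P_n$ exactly $k$ times and setting $x=0$ removes every term except the one with $C_k$, so

$$P_n^{(k)}(0)=k!\,C_k=f^{(k)}(0)\quad\Longrightarrow\quad C_k=\frac{f^{(k)}(0)}{k!},\qquad k=0,1,\dots,n.$$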

My questions are:

Why is it required that each of the first $n$ derivatives of $f(x)$ agree with those of $P_n(x)$? Would dropping this requirement make $P_n(x)$ a worse approximation? How can we show this? My feeling is that this is simply an extension of the observation that for $P_1(x)$, $P_1(0)=f(0)$ and $P_1'(0)=f'(0)$. So, to get an even better approximation, we just add more terms, where the $k$-th additional term is built from the $k$-th derivative of $f(x)$ at $0$. Is there more to this than what I just said?

Suppose $P_1(x)$ is already a good approximation to $f(x)$. When we create $P_2(x)$, we add an $x^2$ term. Since $x^2>x$, wouldn't $P_2(x)$ differ from $f(x)$ by more than $P_1(x)$ does? That is, wouldn't the approximation get worse as the degree of the polynomial increases?
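
One way to test this worry numerically is the sketch below. It assumes $f(x)=e^x$ as a test function (my own choice, not from the text), for which every derivative at $0$ equals $1$. The helper names `taylor_exp` and `max_error` are hypothetical, introduced only for illustration. It compares the largest error of $P_1$, $P_2$ and $P_3$ on the interval $[-0.5, 0.5]$.

```python
import math


def taylor_exp(x, n):
    """Degree-n polynomial for e^x about 0: sum of x^k / k! for k = 0..n."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))


def max_error(n, radius, samples=201):
    """Largest |e^x - P_n(x)| over evenly spaced points in [-radius, radius]."""
    xs = [-radius + 2 * radius * i / (samples - 1) for i in range(samples)]
    return max(abs(math.exp(x) - taylor_exp(x, n)) for x in xs)


if __name__ == "__main__":
    # Print the worst-case error on [-0.5, 0.5] for degrees 1, 2 and 3.
    for n in (1, 2, 3):
        print(n, max_error(n, 0.5))
```

In this one example the printed error shrinks as the degree goes up, which is not a proof but suggests that for $|x|<1$ the $x^2$ term is smaller than the $x$ term, not larger.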

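The last sentence says "the more derivatives that agree at $x=0$, the larger the interval on which the function and the polynomial remain close to each other". How can we prove that?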

Why use a power series to approximate a function? Are there other kinds of series, besides power series, that can be used to approximate a function?