As some people on this site might be aware, I don't always take downvotes well. So here's my attempt to provide more context to my answer for whoever decided to downvote.

Note that I will confine my discussion to functions $f: D\subseteq \Bbb R \to \Bbb R$ and to ideas that should be simple enough for anyone who's taken a course in scalar calculus to understand. Let me know if I haven't succeeded in some way.

First, it'll be convenient for us to define a new notation. It's called "little oh" notation.

**Definition**: A function $f$ is called little oh of $g$ as $x\to a$, denoted $f\in o(g)$ as $x\to a$, if

$$\lim_{x\to a}\frac {f(x)}{g(x)}=0$$

Intuitively this means that, near $a$, $f(x)$ is negligible compared to $g(x)$. In particular, when both tend to $0$ as $x\to a$, $f$ does so "faster" than $g$ does.

Here are some examples:

- $x\in o(1)$ as $x\to 0$
- $x^2 \in o(x)$ as $x\to 0$
- $x\in o(x^2)$ as $x\to \infty$
- $x-\sin(x)\in o(x)$ as $x\to 0$
- $x-\sin(x)\in o(x^2)$ as $x\to 0$
- $x-\sin(x)\not\in o(x^3)$ as $x\to 0$
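If you'd like to see the last three examples numerically, here is a minimal Python sketch. The helper name `ratio` and the sample points are my own choices for illustration, and a table of numbers is of course only suggestive, not a proof of the limits.

```python
import math

def ratio(f, g, x):
    """Return f(x)/g(x), the quotient whose limit defines little oh."""
    return f(x) / g(x)

def f(x):
    return x - math.sin(x)

# Evaluate the quotient at points approaching 0.
for x in (1e-1, 1e-2, 1e-3):
    print(x,
          ratio(f, lambda t: t, x),      # shrinks toward 0
          ratio(f, lambda t: t**2, x),   # shrinks toward 0
          ratio(f, lambda t: t**3, x))   # stays near 1/6, so not o(x^3)
```

The first two columns of quotients shrink as $x$ shrinks, while the third settles near $1/6$, which matches the three bullet points about $x-\sin(x)$.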

Now what is an affine approximation? (Note: I prefer to call it affine rather than linear -- if you've taken linear algebra then you'll know why.) It is simply a function $T(x) = A + Bx$ that *approximates* the function in question.

Intuitively it should be clear which affine function should best approximate the function $f$ very near $a$. It should be $$L(x) = f(a) + f'(a)(x-a).$$ Why? Well, consider that any affine function really only carries two pieces of information: its slope and some point on the line. The function $L$ as I've defined it has the properties $L(a)=f(a)$ and $L'(a)=f'(a)$. Thus $L$ is the unique line which passes through the point $(a,f(a))$ and has the slope $f'(a)$.

But we can be a little more rigorous. Below I give a lemma and a theorem that tell us that $L(x) = f(a) + f'(a)(x-a)$ is the **best affine approximation** of the function $f$ at $a$.

**Lemma**: If a differentiable function $f$ can be written, for all $x$ in some neighborhood of $a$, as $$f(x) = A + B\cdot(x-a) + R(x-a)$$ where $A, B$ are constants and $R\in o(x-a)$, then $A=f(a)$ and $B=f'(a)$.

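**Proof**: First notice that because $f$, $A$, and $B\cdot(x-a)$ are continuous at $x=a$, $R$ must be too. Moreover, since $R\in o(x-a)$, we have $R(x-a) = (x-a)\cdot\frac{R(x-a)}{x-a}\to 0$ as $x\to a$, so continuity gives $R(0)=0$. Then setting $x=a$ we immediately see that $f(a)=A$.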

Then, rearranging the equation we get (for all $x\ne a$)

$$\frac{f(x)-f(a)}{x-a} = \frac{f(x)-A}{x-a} = \frac{B\cdot (x-a)+R(x-a)}{x-a} = B + \frac{R(x-a)}{x-a}$$

Then taking the limit as $x\to a$, the left side tends to $f'(a)$ and, because $R\in o(x-a)$, the right side tends to $B$. Hence $B=f'(a)$. $\ \ \ \square$

**Theorem**: A function $f$ is differentiable at $a$ iff, for all $x$ in some neighborhood of $a$, $f(x)$ can be written as
$$f(x) = f(a) + B\cdot(x-a) + R(x-a)$$ where $B \in \Bbb R$ and $R\in o(x-a)$.

**Proof**: "$\implies$": If $f$ is differentiable then $f'(a) = \lim_{x\to a} \frac{f(x)-f(a)}{x-a}$ exists. This can alternatively be written $$f'(a) = \frac{f(x)-f(a)}{x-a} + r(x-a)$$ where the "remainder function" $r$ has the property $\lim_{x \to a} r(x-a)=0$. Rearranging this equation we get $$f(x) = f(a) + f'(a)(x-a) -r(x-a)(x-a).$$ Let $R(x-a):= -r(x-a)(x-a)$. Then clearly $R\in o(x-a)$ (confirm this for yourself). So $$f(x) = f(a) + f'(a)(x-a) + R(x-a)$$ as required.

"$\impliedby$": Simple rearrangement of this equation yields

$$B + \frac{R(x-a)}{x-a}= \frac{f(x)-f(a)}{x-a}$$ for all $x\ne a$ in that neighborhood. Because $R\in o(x-a)$, the limit as $x\to a$ of the LHS exists (it equals $B$), and thus the limit also exists for the RHS. This implies $f$ is differentiable at $a$ by the standard definition of differentiability. $\ \ \ \square$
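As a quick sanity check of the theorem, here is a worked example. The choice $f(x)=x^2$ is mine, purely for illustration. Expanding around an arbitrary point $a$ gives

$$x^2 = a^2 + 2a\cdot(x-a) + (x-a)^2.$$

This has exactly the form in the theorem with $f(a)=a^2$, $B=2a$, and $R(h)=h^2$. Since $\frac{R(h)}{h}=h\to 0$ as $h\to 0$, we have $R\in o(x-a)$, and indeed $B=2a=f'(a)$.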

Taken together the above lemma and theorem tell us that not only is $L(x) = f(a) + f'(a)(x-a)$ the only affine function whose remainder tends to $0$ as $x\to a$ **faster** than $x-a$ itself (this is the sense in which this approximation is the *best*), but also that we can even **define the concept of differentiability** by the existence of this best affine approximation.
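To make "best" a bit more concrete, here is a small numerical sketch in Python. Everything specific in it is an assumption of mine for illustration: the function $f(x)=e^x$, the point $a=0$, the competing slope $1.5$, and the helper name `remainder_ratio`. It only suggests the behaviour of the limit; the lemma is what actually proves it.

```python
import math

def remainder_ratio(f, a, A, B, h):
    """Return R(h)/h, where R(h) = f(a + h) - (A + B*h) is the error of the
    affine candidate A + B*(x - a) evaluated at x = a + h."""
    return (f(a + h) - (A + B * h)) / h

f, a = math.exp, 0.0            # illustrative choice: f(x) = e^x at a = 0
tangent = (f(a), math.exp(a))   # A = f(a), B = f'(a) = e^a
other = (f(a), 1.5)             # same point, hypothetical "wrong" slope

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(h,
          remainder_ratio(f, a, *tangent, h),  # tends to 0
          remainder_ratio(f, a, *other, h))    # tends to f'(a) - 1.5 = -0.5
```

For the tangent line the quotient $R(h)/h$ shrinks toward $0$ as $h$ shrinks, whereas for the line with the other slope it settles near $-0.5$, so that remainder is not in $o(x-a)$.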