$\lambda f.\lambda x. x$ is a function, not a number. But the point is that we can use it as a *model* of a number.

In modern mathematics, we understand things not as what they *are*, but as what they *do*. We don't spend a lot of time talking about the intrinsic essence of the number 1, or asking "but what *is* the number 1?" That's a question for philosophers, maybe.

Instead, we ask if something *behaves* the way we want the number 1 to behave.

What do we need to say that some collection of objects *behaves like* numbers? Well, at the very least, we would like there to be an infinite collection of the objects, all distinguishable, and we would like to be able to do addition and multiplication and the like. If we take the object that we identify as being the number 1, and add it to itself, we should get the object that we identify as being the number 2. If it doesn't do that, then it isn't very much like numbers.

You may want to consider the way computers handle numbers. The number 1 inside the computer is represented as some pattern of electron flow through some chunk of silicon. What does that have to do with the number 1? Is it really a number 1? Hard to say! But it does *act like* a number 1 in the following sense: we can ask the computer to add 1+1 and it will do some magic thing, and the result will be the pattern of electrons that has been designated as representing the number 2. But what do the electrons have to do with numbers? Are they *really* numbers? Is there 1 electron somewhere, and we add to it another 1 electron to get 2 electrons? (No.) Is the magic thing *really* addition? It's hard to see how, but for most purposes, we don't care.

The Church numerals are similar. Whether or not they *are* numbers isn't the question. The question is whether they *act like* numbers, or rather whether they can be made to act like numbers. For example, if you define
$$\begin{align}
\operatorname{plus} & = \lambda m n f x . (m f (n f x))\\
1 & = \lambda f x . f x\\
2 & = \lambda f x . f (f x)
\end{align}$$

you get a plus function: $\operatorname{plus} 1\, 1$ is in fact the same function as $2$. And that is because this thing we are calling “$2$” is essentially the operation that applies some function twice to an argument, the “$1$” is the operation that applies a function once to an argument, and $plus$ takes two numerals $m$ and $n$, and then a function and an argument, and applies the function first $n$ times to the argument and then applies the function $m$ times to the result, for a total of $m+n$ applications. So $\operatorname{plus} 1\, 1$ does the same thing as $2$ when applied to any arguments $f$ and $x$ (namely, it computes $f(f(x))$) and is therefore the same function; in the $\lambda$-calculus, the two objects are indistinguishable.

There's nothing special about this definition; there are many different ways one could define numbers in $\lambda$-calculus. For example:$\def\pair{\operatorname{pair}}$

$$
\begin{align}
0 & = \pair true \;true \\
1 & = \pair false \;0 \\
2 & = \pair false \;1 \\
3 & = \pair false \;2 \\
& \cdots
\end{align}
$$

This represents each number $n$ as a linked list of length $n$ falses followed by a true. We can test if a number is zero by examining the head of the list; it's $true$ if the number is zero and false otherwise. We can easily define a successor (“$+1$”) operation: $$\operatorname{successor} = \lambda n . \pair false\; n$$ and with enough ingenuity we can define addition and multiplication functions. Is this somehow less legitimate than the Church numerals? No; it's just less convenient. Do we prefer the Church numerals because they better capture the true essence of numbers? No, we only prefer the Church numerals because definition of $plus$ is simpler.

And again: What does it mean for something to be true or false? In this context we are not concerned with abstract truth or falsity. We just want something that behaves in a certain way: We want true and false to be different, and distinguishable, and we want to have some sort of if-then-else construction so that $\mathbf{if}\; b\; \mathbf{then}\; p \;\mathbf{else}\; q$ evaluates to $p$ when $b$ is true, and to $q$ when $b$ is false. There is nothing more to it than that. The definitions

$$
\begin{align}
\mathbf{true} & = \lambda x y . x \\
\mathbf{false} & = \lambda x y . y\\
\mathbf{if}\; b\; \mathbf{then}\; p \;\mathbf{else}\; q & = \lambda b p q . b p q
\end{align}
$$

do exactly that, and that's all we require of them. We could just as easily have defined them this way:

$$
\begin{align}
\mathbf{true} & = \lambda x y . y \\
\mathbf{false} & = \lambda x y . x\\
\mathbf{if}\; b\; \mathbf{then}\; p \;\mathbf{else}\; q & = \lambda b p q . b q p
\end{align}
$$

Notice that I *switched the definitions* of $\mathbf{true}$ and $\mathbf{false}$ here—$\mathbf{true}$ now has the definition that usually belongs to $\mathbf{false}$, and vice versa—but I also changed the definition of $\mathbf{if}$ to match, so that $\mathbf{if}\; b\; \mathbf{then}\; p \;\mathbf{else}\; q$ will still be $p$ when $b$ is $\mathbf{true}$. Is this somehow saying that truth and falsity are interchangeable? No, it says nothing of the sort. It says that *in our model* they are interchangeable. But nobody ever said that the model was supposed to capture all the properties of truth, like that it is true that snow is white. It's only intended to capture the one property of truth that we intended to model, which is that $\mathbf{if}\; b\; \mathbf{then}\; p \;\mathbf{else}\; q$ is $p$ whenever $b$ is true, and $q$ whenever $b$ is false.