Though the title seems clear enough, I'd like to start with a discussion of how I personally came to derive the Jordan Normal Form, because my question is very specific to the details of my derivation.

## Notation

To start, let $X$ be a finite dimensional vector space, $L(X)$ be the space of linear operators on $X$, and $A\in L(X)$. Let $\sigma(A) = \{\lambda_1,\ \cdots,\ \lambda_k\}$ be the spectrum of $A$. Now, we define

- $d(\lambda)$ to be the **geometric** multiplicity of $\lambda$
- $m(\lambda)$ to be the **algebraic** multiplicity of $\lambda$

Next, we denote the $k$th generalized eigenspace of $\lambda$ by
$$
\text{N}_k(\lambda) = \text{Ker}(A-\lambda I)^k
$$
and finally, we let
$$
\text{N}(\lambda) = \text{N}_{n(\lambda)}(\lambda),\qquad n(\lambda)=\min\{k\in\mathbb{N}\ |\ \text{N}_k(\lambda)=\text{N}_{k+1}(\lambda)\}
$$
We note that it can be shown that $n(\lambda) = m(\lambda)$, and so the notation $n(\lambda)$ won't really be used.
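
To make sure I am reading my own notation correctly, here is a minimal Python sketch that computes $\dim \text{N}_k(\lambda)$ for $k = 1, 2, \ldots$ until the chain of kernels stops growing. The helper names `rank`, `matmul`, and `kernel_dims`, as well as the example matrix, are my own illustrative choices and not part of any library; it uses exact rational arithmetic and rank-nullity, and is only meant as a sanity check.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows) via Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        # Look for a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def matmul(P, Q):
    """Product of two square matrices given as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def kernel_dims(A, lam):
    """Return [dim N_1(lam), dim N_2(lam), ...] up to the first k with N_k = N_{k+1}."""
    n = len(A)
    B = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    dims = []
    power = B  # holds (A - lam I)^k for the current k
    while True:
        dims.append(n - rank(power))  # rank-nullity: dim Ker = n - rank
        if len(dims) >= 2 and dims[-1] == dims[-2]:
            dims.pop()  # drop the repeated value: the chain has stabilized
            return dims
        power = matmul(power, B)


# Illustrative example (my choice): a single 3x3 Jordan block with eigenvalue 2.
A = [[2, 1, 0],
     [0, 2, 1],
     [0, 0, 2]]
dims = kernel_dims(A, 2)
print(dims)  # [1, 2, 3]
```

For this particular matrix the length of the returned list is the stabilization index $n(\lambda)$, and its last entry is $\dim \text{N}(\lambda)$.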

We will also let $\sum_\lambda$, $\prod_\lambda$, etc. represent the sum/product/etc. over distinct eigenvalues of $A$.

## Fundamentals

First off, it is known that we can decompose $X$ as $$ X = \text{N}(\lambda_1)\oplus\cdots\oplus\text{N}(\lambda_k) $$ Hence $\sum_{\lambda} \dim\ \text{N}(\lambda) = \dim X$. Also, from the characteristic polynomial of $A$, the sum of the algebraic multiplicities of the eigenvalues must equal the degree of the polynomial, which is $\dim X$. Thus $$ \sum_\lambda\dim\ \text{N}(\lambda) = \sum_\lambda m(\lambda) = \dim X $$ Going in a different direction, we present the following theorem:

**Theorem:** If $B\in L(X)$ is nilpotent of order $n$, and $S\subset X\backslash\text{Ker} B^{n-1}$ is linearly independent, then $$ \bigcup_{x\in S}\{x,\ Bx,\ B^2x,\ \cdots,\ B^{n-1}x\} $$ is linearly independent.

**Proof:** We will show the case for $|S|=2$, and the general case follows the same format. Suppose $S = \{x_1,\ x_2\}$, and
$$
\sum_{k=0}^{n-1} a_k B^kx_1 + \sum_{k=0}^{n-1}b_k B^kx_2 = 0
$$
applying $B^{n-1}$ to both sides gives
$$
B^{n-1}\left(\sum_{k=0}^{n-1}a_kB^kx_1+b_kB^kx_2\right) = a_0B^{n-1}x_1+b_0B^{n-1}x_2 = B^{n-1}(a_0x_1+b_0x_2) = 0
$$
so $a_0x_1 + b_0x_2\in\text{Ker}B^{n-1}$. However, since $\text{Ker}B^{n-1}$ is a subspace of $X$, we can decompose $X$ as $X = \text{Ker}B^{n-1}\oplus Z$ for some vector space $Z$, for which $\{x_1,\ x_2\}\subset Z\backslash\{0\}$. Since $Z$ is a subspace, $a_0x_1+b_0x_2\in Z$. To say that $a_0x_1+b_0x_2\in \text{Ker}B^{n-1}\cap Z$ is equivalent to saying $a_0x_1+b_0x_2 = 0$. By linear independence of $S$, $a_0=b_0=0$. This process can be repeated to get $a_j=b_j=0$ for all $j$. $\blacksquare$

Now, take $x\in \text{N}(\lambda)\backslash \text{N}_{m(\lambda)-1}(\lambda)$. Note that $B_\lambda = (A - \lambda I)|_{\text{N}(\lambda)}$ (that is, $A - \lambda I$ restricted to $\text{N}(\lambda)$) is nilpotent of order $m(\lambda)$. Hence $\{x,\ B_\lambda x,\ \cdots,\ B_\lambda^{m(\lambda)-1}x\}$ is linearly independent, and its span is a subspace of $\text{N}(\lambda)$. Hence $\dim \text{N}(\lambda) \ge m(\lambda)$.

If we suppose that $\dim\text{N}(\lambda) > m(\lambda)$ for at least one $\lambda\in\sigma(A)$, then we contradict the fact that $\sum_\lambda\dim\text{N}(\lambda) = \dim X$, and so we conclude that $m(\lambda) = \dim\text{N}(\lambda)$.

Alright, so far so good I hope...

### Jordan Normal Form

By the above arguments, we conclude that $\text{Span}\{x,\ \cdots,\ B_\lambda^{m(\lambda)-1}x\} = \text{N}(\lambda)$. Hence, if we let $e_0(\lambda)\in \text{N}(\lambda)\backslash \text{N}_{m(\lambda)-1}(\lambda)$, and $e_k(\lambda)=(A-\lambda I)^k e_0(\lambda)$, then $$ \text{Span}\left(\bigcup_{\lambda}\bigcup_{k=0}^{m(\lambda)-1}\{e_k(\lambda)\}\right) = X $$

Since $X = \text{N}(\lambda_1)\oplus\cdots\oplus\text{N}(\lambda_k)$, and each $\text{N}(\lambda_i)$ is $A$-invariant (that is, $A(\text{N}(\lambda_i))\subseteq \text{N}(\lambda_i)$), it follows that if we have bases for each $\text{N}(\lambda_i)$, then we can get the following matrix representation of $A$ wrt the union of these bases:
$$
A = \left[\begin{matrix} A|_{\text{N}(\lambda_1)} & O & \cdots & O \\ O & A|_{\text{N}(\lambda_2)} & \cdots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \cdots & A|_{\text{N}(\lambda_k)} \end{matrix}\right]
$$
where $A|_{\text{N}(\lambda_i)}$ is the matrix representation of $A$ restricted to $\text{N}(\lambda_i)$ wrt the basis of $\text{N}(\lambda_i)$.

Above, we demonstrated that $\{e_{m(\lambda)-1}(\lambda),\ \cdots,\ e_0(\lambda)\}$ is a basis for $\text{N}(\lambda)$. We can find a matrix representation for $A|_{\text{N}(\lambda)}$ by noting that, for $0\le k<m(\lambda)-1$,
$$
\begin{aligned}
Ae_k(\lambda) &= A(A-\lambda I)^ke_0(\lambda) = (A-\lambda I)^{k+1}e_0(\lambda) + \lambda(A-\lambda I)^ke_0(\lambda) \\
&= e_{k+1}(\lambda)+\lambda e_k(\lambda),
\end{aligned}
$$
while $Ae_{m(\lambda)-1}(\lambda) = \lambda e_{m(\lambda)-1}(\lambda)$ because $(A-\lambda I)^{m(\lambda)}e_0(\lambda)=0$, and so
$$
A|_{\text{N}(\lambda)} = \left[\begin{matrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ 0 & 0 & \lambda & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & 1 \\ 0 & 0 & 0 & \cdots & \lambda \end{matrix}\right]
$$
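
To make the pattern concrete, here is the case $m(\lambda)=3$ written out. Nothing new is assumed beyond fixing $m(\lambda)=3$ and using $e_k(\lambda)=(A-\lambda I)^ke_0(\lambda)$ as defined earlier:
$$
\begin{aligned}
Ae_0(\lambda) &= e_1(\lambda) + \lambda e_0(\lambda) \\
Ae_1(\lambda) &= e_2(\lambda) + \lambda e_1(\lambda) \\
Ae_2(\lambda) &= \lambda e_2(\lambda)
\end{aligned}
$$
Taking the ordered basis $(e_2(\lambda),\ e_1(\lambda),\ e_0(\lambda))$ and writing the coordinates of $Ae_2(\lambda)$, $Ae_1(\lambda)$, $Ae_0(\lambda)$ as the columns gives
$$
\left[\begin{matrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{matrix}\right]
$$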

These $A|_{N(\lambda)}$ are the **Jordan Blocks**, and the matrix representation of $A$ above is the **Jordan Normal Form**.

### Main Question

I'm pretty content with this derivation: nothing seems confusing, out of place, contradictory, or nonrigorous, at least at a surface level. I would not be asking this question had I not gone to the Wikipedia page on the Jordan Normal Form and seen this line:

> The number of Jordan blocks corresponding to $\lambda$ of size at least $j$ is $\dim \text{Ker}(A - \lambda I)^j - \dim \text{Ker}(A - \lambda I)^{j-1}$.
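
To check that I am reading this line correctly, here is a small example of my own (the matrix is an illustrative choice, not taken from the Wikipedia page). Take
$$
A = \left[\begin{matrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{matrix}\right],\qquad \sigma(A)=\{2\}
$$
Then $A-2I$ has rank $1$ and $(A-2I)^2=O$, so
$$
\dim\text{Ker}(A-2I)^0 = 0,\qquad \dim\text{Ker}(A-2I)^1 = 2,\qquad \dim\text{Ker}(A-2I)^2 = \dim\text{Ker}(A-2I)^3 = 3
$$
and the formula gives
$$
\begin{aligned}
\#\{\text{blocks of size}\ge 1\} &= 2 - 0 = 2 \\
\#\{\text{blocks of size}\ge 2\} &= 3 - 2 = 1 \\
\#\{\text{blocks of size}\ge 3\} &= 3 - 3 = 0
\end{aligned}
$$
That is, one block of size $2$ and one block of size $1$, both for the eigenvalue $2$, which matches the block structure visible in $A$ itself.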

My "derivation" doesn't account for the fact that there can be **multiple Jordan Blocks corresponding to the same eigenvalue**. So, in the broadest sense possible, why? What don't I account for?

My *idea* was that I "assumed" that $\text{Span}\{x,\ \cdots,\ B_\lambda^{m(\lambda)-1}x\} = \text{N}(\lambda)$. If there are more elements in the basis for $\text{N}(\lambda)$ than this, then there are more Jordan blocks. But if $\dim\text{N}(\lambda)>m(\lambda)$, then the decomposition of $X$ into the direct sum of generalized eigenspaces fails, since the dimensions don't add up. My only other guess is that $\{x,\ \cdots,\ B_\lambda^{m(\lambda)-1}x\}$ can be "broken down" in some sense into the union of smaller bases which *then* produce more Jordan blocks, but I can't quite see where to go with that.

Any help would be appreciated. Thank you for your time!