When I was first learning abstract algebra, the professor gave the usual sequence of results for polynomials over a field: the Division Algorithm, the Remainder Theorem, and the Factor Theorem, followed by the Corollary that if $D$ is an integral domain, and $E$ is any integral domain that contains $D$, then a polynomial of degree $n$ with coefficients in $D$ has at most $n$ distinct roots in $E$.

He then challenged us, as a homework, to go over the proof of the Factor Theorem and to point out exactly which, where, and how the axioms of a field used in the proof.

Every single one of us missed the fact that commutativity is used.

Here's the issue: the division algorithm (on either side), *does* hold in $\mathbb{H}[x]$ (in fact, over any ring, commutative or not, in which the leading coefficient of the divisor is a unit). So given a polynomial $p(x)$ with coefficients in $\mathbb{H}$, and a nonzero $a(x)\in\mathbb{H}[x]$, there exist unique $q(x)$ and $r(x)$ in $\mathbb{H}[x]$ such that $p(x) = q(x)a(x) + r(x)$, and $r(x)=0$ or $\deg(r)\lt\deg(a)$. (There also exist unique $q'(x)$ and $s(x)$ such that $p(x) = a(x)q'(x) + s(x)$ and $s(x)=0$ or $\deg(s)\lt\deg(a)$.

The usual argument runs as follows: given $a\in\mathbb{H}$ and $p(x)$, divide $p(x)$ by $x-a$ to get $p(x) = q(x)(x-a) + r$, with $r$ constant. *Evaluating at $a$ we get $p(a) = q(a)(a-a)+r = r$, so $r=p(a)$. Hence $a$ is a root if and only if $(x-a)$ divides $p(x)$.*

If $b$ is a root of $p(x)$, $b\neq a$, then *evaluating at $b$ we have $0=p(b) = q(b)(b-a)$; since $b-a\neq 0$, then $q(b)=0$, so $b$ must be a root of $q$; since $\deg(q)=\deg(p)-1$, an inductive hypothesis tells us that $q(x)$ has at most $\deg(p)-1$ distinct roots, so $p$ has at most $\deg(p)$ roots.*

And *that* is where we are using commutativity: to go from $p(x) = q(x)(x-a)$ to $p(b) = q(b)(b-a)$.

Let $R$ be a ring, and let $a\in R$. Then $a$ induces a set-theoretic map from $R[x]$ to $R$, "evaluation at $a$", $\varepsilon_a\colon R[x]\to R$ by evaluation:
$$\varepsilon_a(b_0+b_1x+\cdots + b_nx^n) = b_0 + b_1a + \cdots + b_na^n.$$
This map is a group homomorphism, and *if* $a$ is central, also a ring homomorphism; if $a$ is *not* central, then it is *not* a ring homomorphism: given $b\in R$ such that $ab\neq ba$, then we have $bx = xb$ in $R[x]$, but $\varepsilon_a(x)\varepsilon_a(b) = ab\neq ba = \varepsilon_a(xb)$.

The "evaluation" map also induces a set theoretic map from $R[x]$ to $R^R$, the ring of all $R$-valued functions in $R$, with the pointwise addition and multiplication ($(f+g)(a) = f(a)+g(a)$, $(fg)(a) = f(a)g(a)$); the map sends $p(x)$ to the function $\mathfrak{p}\colon R\to R$ given by $\mathfrak{p}(a) = \varepsilon_a(p(x))$. This map is a group homomorphism, but it is *not* a ring homomorphism *unless* $R$ is commutative.

This means that from $p(x) = q(x)(x-a) + r(x)$ we *cannot* in general conclude that $p(c) = q(c)(c-a) +r(c)$ *unless* $c$ commutes in $R$ with $a$. ~~So the Remainder Theorem may fail to hold (if the coefficients involved do not commute with $a$ in $R$), which in turn means that the Factor Theorem may fail to hold~~ So one has to be careful in the statements (see Marc van Leeuwen's answer). And even when both of them hold for the particular $a$ in question, the inductive argument will fail if $b$ does not commute with $a$, because we cannot go from $p(x) = q(x)(x-a)$ to $p(b)=q(b)(b-a)$.

This is exactly what happens with, say, $p(x) = x^2+1$ in $\mathbb{H}[x]$. We are fine as far as showing that, say, $x-i$ is a factor of $p(x)$, because it so happens that when we divide by $x-i$, all coefficients involved centralize $i$ (we just get $(x+i)(x-i)$). But when we try to argue that any root different from $i$ must be a root of $x+i$, we run into the problem that we cannot guarantee that $b^2+1$ equals $(b+i)(b-i)$ *unless* we know that $b$ centralizes $i$. As it happens, the centralizer of $i$ in $\mathbb{H}$ is $\mathbb{R}[i]$, so we only conclude that the only other *complex* root is $-i$. But this leaves the possibility open that there may be some roots of $x^2+1$ that do *not* centralize $i$, and that is exactly what occurs: $j$, and $k$, and all numbers of the form $ai+bj+ck$ with $a^2+b^2+c^2=1$ are roots, and if either $b$ or $c$ are nonzero, then they don't centralize $i$, so we cannot go from $x^2+1 = (x+i)(x-i)$ to "$(ai+bj+ck)^2+1 = (ai+bj+ck+i)(ai+bj+ck-i)$".

And *that* is what goes wrong, and *there* is where commutativity is hiding.