What are some examples of theorems, whose first proof was quite hard and sophisticated, perhaps using some other deep theorems of some theory, before years later surprisingly a quite elementary, direct, perhaps even short proof has been found?

A related question is MO/24913, which deals with hard theorems whose proofs were simplified by the development of more sophisticated theories. But I would like to see examples where this wasn't necessary, and where the theory instead turned out to be superfluous for the proof of the theorem. I expect that this didn't happen so often. [OK, after reading all the answers, it obviously happened all the time!]

Martin Brandenburg
  • You may find some gems in [Proofs from The Book](http://en.wikipedia.org/wiki/Proofs_from_THE_BOOK). – Austin Mohr May 28 '13 at 01:28
  • It's certainly the case that calculations which are tedious and lengthy can often collapse through the introduction of an elegant notation. I don't think this is what you're after, correct? – James S. Cook May 28 '13 at 01:40
  • Would proofs of theorems in formal logic qualify? – Doug Spoonwood May 28 '13 at 01:54
  • I don't want to write a long answer giving details, but the prime number theorem is a prime example, and perhaps Hilbert's basis theorem as well. – Ragib Zaman May 28 '13 at 13:32
  • @James S. Cook: Yes, please read the second paragraph of my question. – Martin Brandenburg May 28 '13 at 22:55
  • @Austin Mohr: You are right! Feel free to collect some of your favorite proofs into an answer (providing some background, in particular that the proof really replaced a more sophisticated one). I own this book, and I love it. – Martin Brandenburg May 28 '13 at 23:10
  • I'm not sure Apery's proof of the irrationality of $\zeta(3)$ counts. It by itself was a hard but elementary result; Beukers' proof using integrals of Legendre functions was much shorter, though less elementary. – Eric Jablow May 29 '13 at 01:44

10 Answers


Gödel's original completeness/compactness proofs in logic are very hard and technical. Modern versions of the proofs are considerably simpler and do not use any sophisticated new theory.

Most proofs of the fundamental theorem of algebra use some topological/homotopical deep(ish) results or some deep(ish) results of complex analysis. In fact, the theorem can be proved using only the definition of complex numbers and absolute value, using very elementary properties of complex numbers, and the entire proof is about half a page long (such a proof is actually a minimum modulus argument but applied to a polynomial so that the computation can be done directly without appeal to the general minimum modulus principle). The proof is extremely elementary.

The proof that Euclid's fifth postulate is independent of the other axioms of Euclidean geometry by exhibiting models (i.e., the sphere and the hyperbolic plane) where the postulate fails (in different ways) is completely elementary. Certainly all those who tried to prove/disprove the fifth postulate did so while walking on a counterexample. However, the numerous attacks on the problem prior to its settlement in the 19th century were quite sophisticated and laborious. The barrier was conceptual - the understanding of the importance of models - not technical.

Brouwer's topological proof of his fixed point theorem used (I believe) homology and/or the fundamental group. A perfectly elementary proof using Sperner's Lemma was later discovered (I'm not entirely sure about the chronological order here, so I may be mistaken).

Initial proofs that the higher homotopy groups are all abelian were greatly simplified by the Eckmann-Hilton argument (which is completely elementary).

The Mac Lane coherence theorem in category theory is rather technical and is enormously simplified by considering the Yoneda embedding in the 2-categorical setting. All of the machinery was already present, but it required some re-assembly to figure out a rather elementary proof.

Ittay Weiss
  • The proof of the FTA that uses only the maximum value theorem for continuous functions on compact sets is indeed more elementary than the winding number proofs (or the Galois theory proof, which still needs the intermediate value theorem), but I would contend the winding number argument (due, I believe, to Gauss) gives more insight and leads to elegant generalizations. Just sayin' .... – Ted Shifrin May 28 '13 at 03:06
  • @TedShifrin oh, no argument about the added insight of clever proofs that use deeper results. – Ittay Weiss May 28 '13 at 03:30
  • I once wrote a [post](http://mixedmath.wordpress.com/2011/09/21/a-month-you-say) on my blog about an elementary proof of FTA by Oswaldo de Oliveira, using very little along the way. – davidlowryduda May 28 '13 at 06:12
  • I'm not completely convinced that Mac Lane's coherence theorem can be reduced to Yoneda embeddings – that just shifts the problem from coherence of composition to coherence of pseudofunctors. – Zhen Lin May 28 '13 at 07:11
  • @ZhenLin perhaps this is then just a conceptually simpler proof. – Ittay Weiss May 28 '13 at 07:23
  • Can you please explain a bit about the part "the computation can be done directly without appeal to the general minimum modulus principle"? This interests me very much! (I didn't do that proof in my lecture because I didn't have the general minimum modulus principle available.) – Hendrik Vogt May 28 '13 at 11:12
  • @HendrikVogt http://arxiv.org/pdf/1109.1459v1.pdf – Ittay Weiss May 29 '13 at 00:36
  • @IttayWeiss could you give a pointer for the last point? Proving coherence using the Yoneda embedding in 2-categorical setting. – Aleš Bizjak Jun 20 '13 at 06:18
  • @AlešBizjak In a nutshell: The 2-Yoneda embedding maps a given 2-category into a 2-category of functors in such a way that the 2-category is equivalent to its image under the Yoneda 2-functor. But composition of functors is strictly associative, so that image is a strict 2-category. – Ittay Weiss Jun 23 '13 at 05:46

Gauss's initial proof of quadratic reciprocity in Disquisitiones was exceptionally long, spanning I believe 30+ pages, and in my opinion it is incredibly nasty. Gauss used induction and then reduced to something like eight cases. While not necessarily relying on big theorems, this proof seems overly complicated, especially when compared to some of the proofs that followed.

For example, Eisenstein's proof is quite short and has a very nice geometric component, and it does not require a great deal more machinery than Gauss's original.

There are in fact many other nice proofs of quadratic reciprocity. Here's a nice list of some of them.


Chebyshev was the first to prove Bertrand's postulate, which states that for any integer $n > 3$ there always exists at least one prime number $p$ with $n < p < 2n - 2$. A weaker but more elegant formulation is: for every $n > 1$ there is always at least one prime $p$ such that $n < p < 2n$.

Later Paul Erdős gave an elementary yet elegant proof of this. Surprisingly, he was only 20 years old when he gave this proof.
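Neither proof is reproduced here, but both formulations of the statement are easy to sanity-check numerically. The following is only a sketch: the helper names and the cutoff of 2000 are arbitrary choices for illustration, and a finite check is of course no substitute for a proof.

```python
def is_prime(k: int) -> bool:
    """Trial division; adequate for the small range checked here."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def has_prime_between(lo: int, hi: int) -> bool:
    """True if some prime p satisfies lo < p < hi (strict on both sides)."""
    return any(is_prime(p) for p in range(lo + 1, hi))


def check_bertrand(limit: int) -> bool:
    """Check both formulations for every n up to `limit` (an arbitrary cutoff)."""
    weak = all(has_prime_between(n, 2 * n) for n in range(2, limit + 1))
    strong = all(has_prime_between(n, 2 * n - 2) for n in range(4, limit + 1))
    return weak and strong


print(check_bertrand(2000))
```

The restriction $n > 3$ in the stronger form is needed: for $n = 3$ there is no prime strictly between $3$ and $4$, which `has_prime_between(3, 4)` confirms.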


Perhaps the most exciting and dramatic of the difficult inequalities is Arhangel'skii's theorem that $|X|\le \exp(L(X)\chi(X))$ for every Hausdorff space $X$. The countable version of this result, namely that every Lindelöf, first countable, Hausdorff space has cardinality at most $\mathfrak c$, answers the following fifty-year-old question of Alexandroff and Urysohn: does there exist a compact, first countable space having cardinality greater than the continuum?

As one might guess, Arhangel'skii's original proof was quite difficult. The argument given in Set theoretic Topology, page 19, is due to Pol. It is not difficult to understand. The countable version of this proof should be within the reach of any first-year graduate student in mathematics. The theorem is sufficiently important to be included in any introductory graduate course in set-theoretic topology, and provides exposure to modern topology at an early level of mathematical training.

Martin Sleziak
  • I don't understand the claim. Is $\chi(X)$ the Euler characteristic? – Qiaochu Yuan May 28 '13 at 02:50
  • It is topological notation. It denotes the character of a space. For example, the character of a first countable space is $\aleph_0$. – Paul May 28 '13 at 02:53
  • @Qiaochu: For $x\in X$, $\chi(x,X)$ is the minimum cardinality of a local base at $x$; $\chi(X)=\sup\{\chi(x,X):x\in X\}+\omega$. Arkhangel'skiĭ’s proof was a rather complicated ramification argument; Pol’s is a much simpler ‘linear’ construction. – Brian M. Scott May 28 '13 at 05:34
  • Which book do you mean by "Set theoretic Topology"? Also, can you give a definition of $L(X)$? And more references? – Martin Brandenburg May 28 '13 at 23:12
  • @Martin Brandenburg: It is from the book: Kunen K., Vaughan J. (eds.), Handbook of Set-Theoretic Topology, 1984. Moreover, $L(X)$ is the Lindelöf degree, i.e., $L(X)=\min\{\kappa: \text{ every open cover of } X \text{ has a subcover of cardinality }\le \kappa\} + \omega$. – Paul May 29 '13 at 00:25

Too long for a comment.

I think Bill Johnson's answer (Lomonosov 1973) on MO applies to your question as well. Some superfluous theory that was used for the first proof of a weaker result (Bernstein-Robinson 1966) was nonstandard analysis. It was Halmos who immediately removed the nonstandard analysis from the argument.

These things were certainly not easy to figure out, because von Neumann himself only proved the existence of nontrivial invariant subspaces for compact operators on Hilbert space ($\dim \geq 2$). Lomonosov's theorem yields hyperinvariant ones on general Banach spaces.

See this Wikipedia page on the invariant subspace problem for statements and a chronology of related results. Maybe that's a good opportunity to recall that it is still not known whether every bounded operator on an infinite-dimensional separable Hilbert space admits a nontrivial invariant subspace.


In 1904 the group theorist William Burnside proved his famous $pq$-Theorem:

Let $p$ and $q$ be primes and $G$ a group of order $p^aq^b$. Then $G$ is solvable.

Burnside used linear representations over $\mathbb{C}$, in particular character theory. Although his proof was not extremely difficult, in the late 1960s people began to look for a less sophisticated, that is, character-free proof, relying on new developments in group theory. In fact, David Goldschmidt was able to apply techniques that had been developed by his thesis adviser John Thompson and proved “by elementary means” the case where $p$ and $q$ are both odd. Not long after, Bender and Matsuyama found proofs for the remaining case, and the combination of their arguments led to an overall simpler proof than the original one given by Goldschmidt.

Nicky Hekster
  • Thanks! Can you say something about the length or complexity of the character-free proofs, as compared to Burnside's original proof? – Martin Brandenburg May 28 '13 at 22:59
  • It depends on how you count. I estimate the purely group theoretic proof needs twice as many pages. The character theoretic proof requires some development of basic character theory, and then the proof as such is not too difficult (it includes some algebraic number theory argument as well, but also basic stuff). The character-free proof uses more to arrive at the final result. I suggest you study the books of Marty Isaacs, *Character Theory of Finite Groups* and *Finite Group Theory*, where you can find the proofs. – Nicky Hekster May 30 '13 at 10:48

In general, proofs that are very intricate and laborious (brute force, for example) give way to more elegant proofs over time. However, these newer proofs rely on an arsenal of concepts much more sophisticated than their predecessors. The more sophisticated conceptual language is offset by an economy in brute force.

I think the most beautiful example that answers your question is the story of the proofs of Sylow's theorems. See this paper for an elegant proof, for example.

The Sylow theorems were first proved by the Norwegian mathematician Ludwig Sylow in 1872. See Wikipedia for references.

The "classical proofs", which use the conjugacy class equation and counting arguments, are very laborious. A proof using the conjugacy class equation can be found in the excellent book Abstract Algebra: An Introduction by Thomas W. Hungerford.

In another excellent book by Hungerford we see a simple, elegant and sophisticated proof (see p. 93) using the concept of group action (see p. 88).

Elias Costa

Please allow me to introduce an interesting proof of the following theorem:

Theorem. Let $f:X\to Y$ be a quasi-finite projective morphism of locally Noetherian schemes. Then $f$ is affine (hence, in particular, finite).

Proof. We may assume without loss of generality that $Y$ is connected. By assumption, there exists a locally free sheaf $E$ of rank $r+1 < \infty$ on $Y$, and a closed immersion $i:X \to \mathbb{P}_Y(E)$ over $Y$. Write $L:= \mathcal{O}_{\mathbb{P}_Y(E)/Y}(1)$. Then we obtain (by pulling back the Euler sequence) the following exact sequence on $X$: $$ \require{AMScd} \begin{CD} 0 @>>> i^*\Omega_{\mathbb{P}_Y(E)/Y}\otimes L @>{k}>> f^*E @>>> L @>>> 0. \end{CD} $$ Now, the surjection $k^{\vee}:f^*E^{\vee} \to i^*(\Omega_{\mathbb{P}_Y(E)/Y}\otimes L)^{\vee}$ induces a proper morphism $p:\mathbb{P}_X(i^*(\Omega_{\mathbb{P}_Y(E)/Y}\otimes L)^{\vee}) \to \mathbb{P}_Y(E^{\vee})$ over $Y$. Write $$U := \mathbb{P}_Y(E^{\vee})\setminus \mathrm{Im}(p).$$ Then $j:U\hookrightarrow \mathbb{P}_Y(E^{\vee})$ is open, hence $g:U\to Y$ is flat and quasi-compact. Moreover, since $X$ is quasi-finite over $Y$, by considering the situation after base change to $\mathrm{Spec}(k(y))$, we conclude that $g$ is surjective. Thus $U$ is fpqc over $Y$.

Write $\varphi: g^*E^{\vee}\to j^*\mathcal{O}_{\mathbb{P}_Y(E^{\vee})/Y}(1)$ for the surjection induced by $j$. Then the surjection $\varphi^{\vee}: g^*E\to \ker(\varphi)^{\vee}$ induces a closed immersion $$\mathbb{P}_U(\ker(\varphi)^{\vee})\hookrightarrow \mathbb{P}_U(g^*E) \cong \mathbb{P}_Y(E) \times_Y U.$$ Write $V := (\mathbb{P}_Y(E) \times_Y U) \setminus \mathbb{P}_U(\ker(\varphi)^{\vee})$. Then $V$ is affine over $U$. Moreover, by our construction, $i\times_Y \mathrm{id}_U:X\times_YU \to \mathbb{P}_Y(E)\times_Y U$ factors uniquely through $V$. Since $i\times_Y \mathrm{id}_U$ is a closed immersion, this implies that $X\times_Y U$ is affine over $U$. Since $U$ is fpqc over $Y$, $X$ is also affine over $Y$. $\square$

  • What was the original proof of this fact? (From the question: "What are some examples of theorems, **whose first proof was quite hard and sophisticated** [emph. mine], perhaps using some other deep theorems of some theory, before years later surprisingly a quite elementary, direct, perhaps even short proof has been found?") – Noah Schweber Nov 04 '21 at 18:46
  • This is a variant of Zariski's main theorem due to Grothendieck (EGA III-1). In [EGA III-1], he used the theorem on formal functions, which is quite difficult and is proved only after careful preparation. In [Stacks](https://stacks.math.columbia.edu/tag/02LS/cite), we can see a proof of this theorem without the theorem on formal functions, but I think this proof is based on complicated commutative ring theory. – YJ_cat Nov 05 '21 at 03:25

The Szemerédi-Trotter incidence theorem gives an upper bound on the total number of incidences between a finite set of points and a finite set of lines in the plane. An incidence is a pair $(\ell, x)$ consisting of a line $\ell$ and a point $x \in \mathbb{R}^2$ such that $x \in \ell$.
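To make the definition of an incidence concrete, here is a minimal brute-force count. It is only a sketch: the encoding of a line as an integer triple $(a, b, c)$ standing for $ax + by = c$ and the small grid configuration are assumptions chosen for illustration, and the code counts incidences directly rather than implementing any part of the proof.

```python
from itertools import product


def count_incidences(points, lines):
    """Count pairs (line, point) such that the point lies on the line.

    Each line is a triple (a, b, c) of integers standing for a*x + b*y = c,
    so membership is tested exactly, without floating-point error.
    """
    return sum(
        1
        for (a, b, c), (x, y) in product(lines, points)
        if a * x + b * y == c
    )


# Hypothetical configuration: a 3 x 3 integer grid with its rows,
# columns and the two main diagonals.
points = [(x, y) for x in range(3) for y in range(3)]
lines = (
    [(0, 1, k) for k in range(3)]      # horizontal lines y = k
    + [(1, 0, k) for k in range(3)]    # vertical lines x = k
    + [(1, -1, 0), (1, 1, 2)]          # diagonals y = x and x + y = 2
)

print(count_incidences(points, lines))
```

In this configuration each of the 8 lines passes through exactly 3 of the 9 points, so the count is 24.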

The original proof of the theorem relied on a tricky cell-decomposition argument, of a kind that abounds in additive combinatorics. The newer slick proof comes from an argument of Székely which uses elementary graph theory techniques. The details are summarized very nicely by Terry Tao here and here. Ultimately the proof relies on Euler's formula $v - e + f = 2$, the basic inequalities one sees when studying planar graphs for the first time, such as $e \leq 3v - 6$, and a bound on the crossing number of a specific graph. The newer proof is much easier to grasp and can certainly be taught to undergraduates.

An overview of the older formulation can be found on Terry Tao's blog (specifically here and here).

  • Another good example would be Giorgis Petridis' proof of Plünnecke's inequality, which Tim Gowers summarized nicely [here](http://gowers.wordpress.com/2011/02/10/a-new-way-of-proving-sumset-estimates/). – JavaMan May 30 '13 at 03:59

One might suppose that there needs to be some profound insight or conception needed. That is, one either has to deconstruct the notation, or see something in a different light.

For example, a recent question asked whether sets of weights other than $1,3,9,27,81$ might yield all numbers from $1$ to $121$. One might be tempted to solve this by seeking the largest of the weights, as one does when filling a backpack. The actual proof involves hunting down the smallest weight.
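The claim that these five weights cover $1$ to $121$ can be checked by brute force. This is a sketch under the assumption that the question concerns a two-pan balance, where each weight may sit on either pan or be left off; the function name is made up for illustration, and the check does not address the uniqueness question itself.

```python
from itertools import product


def measurable(weights):
    """Positive totals obtainable when each weight is placed on the
    opposite pan (+1), on the same pan as the object (-1), or left off (0)."""
    totals = set()
    for signs in product((-1, 0, 1), repeat=len(weights)):
        total = sum(s * w for s, w in zip(signs, weights))
        if total > 0:
            totals.add(total)
    return totals


powers_of_three = [1, 3, 9, 27, 81]
print(measurable(powers_of_three) == set(range(1, 122)))
```

The same function can be pointed at any candidate set of five weights to see which totals it misses.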

When one devises notation, such as my Polygloss, the notation gives rise to insights that were not apparent without it.

One thing, for example, is that it's not obvious that a dot product of vectors in an oblique coordinate system can be computed relatively easily. But this can indeed be done by the use of a matrix, which normally does not appear in the process. One multiplies one of the vectors by a matrix, and takes the dot product of the result with the second vector.
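One standard way to realize this is with the Gram matrix $G_{ij} = e_i \cdot e_j$ of the oblique basis, so that the dot product of coordinate vectors $u$ and $v$ is $u \cdot (Gv)$. The sketch below assumes that this is the matrix meant; the basis (two unit axes 60 degrees apart) and the function names are hypothetical choices for illustration.

```python
import math


def gram_matrix(basis):
    """G[i][j] = e_i . e_j, with the basis vectors given in Cartesian coordinates."""
    return [[sum(a * b for a, b in zip(ei, ej)) for ej in basis] for ei in basis]


def oblique_dot(u, v, gram):
    """Dot product of two vectors given by oblique coordinates u and v: u . (G v)."""
    gv = [sum(g * vj for g, vj in zip(row, v)) for row in gram]
    return sum(ui * gvi for ui, gvi in zip(u, gv))


def to_cartesian(coords, basis):
    """Expand oblique coordinates into Cartesian components (used only as a check)."""
    dim = len(basis[0])
    return [sum(c * e[k] for c, e in zip(coords, basis)) for k in range(dim)]


# Hypothetical oblique basis: two unit-length axes 60 degrees apart.
basis = [(1.0, 0.0), (0.5, math.sqrt(3) / 2)]
G = gram_matrix(basis)

u, v = (2.0, 1.0), (-1.0, 3.0)
via_matrix = oblique_dot(u, v, G)
via_cartesian = sum(
    a * b for a, b in zip(to_cartesian(u, basis), to_cartesian(v, basis))
)
print(math.isclose(via_matrix, via_cartesian))
```

For an orthonormal basis the Gram matrix is the identity, which is why the matrix does not appear in the usual formula.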

I have had many profound insights into many different things, and the chief stumbling block is that one has to deconstruct the current notions and objections. For example, the notion that a base consists of digits $0$ to $b-1$ goes, when one deals with alternating arithmetic.
