I've always been nagged by the two definitions of conic sections, which are related in an extremely non-obvious way (i.e. it seems so mysterious/magical that somehow slices of a cone are related to degree 2 equations in 2 variables). Recently I came across the following pages/videos:

While 3B1B's video makes a lot of sense and is very beautiful from a geometric standpoint, it does not talk about any of the other conics, or discuss the relationship with "degree 2". Moreover, the 2nd 3B1B video I linked and Bhargava's lecture both highlight "degree 2" as something we understand well, compared to higher degrees (reminds me a little bit of Fermat's last theorem and the non-existence of solutions for $n>2$).

So, I suppose my questions are as follows:

  1. Why, from an intuitive standpoint, should we expect cones to be deeply related to zero-sets of degree 2 algebraic equations?

and more generally:

  2. Is there some deep reason why "$2$" is so special? I've often heard the quip that "mathematics is about turning confusing things into linear algebra" because linear algebra is "the only subject mathematicians completely understand"; but it seems we also understand a lot of nice things about quadratics as well -- we have the aforementioned relationship with cones, a complete understanding of rational points, and the Pythagorean theorem (oh! and I just thought of quadratic reciprocity). 2 is also special in all sorts of algebraic contexts, as well as being the only possible degree of a nontrivial finite extension of $\mathbb R$, leading in particular to $\mathbb C$ being 2-dimensional.

Also interesting to note that many equations in physics are related to $2$ (the second derivative, or inverse square laws), though that may be a stretch. I appreciate any ideas you share!


EDIT 3/12/21: was just thinking about variances, and least squares regression. "$2$" is extremely special in these areas: Why square the difference instead of taking the absolute value in standard deviation?, Why is it so cool to square numbers (in terms of finding the standard deviation)?, and the absolutely mindblowing animation of the physical realization of PCA with Hooke's law: Making sense of principal component analysis, eigenvectors & eigenvalues.

In these links I just listed, seems like the most popular (but still not very satisfying to me) answer is that it's convenient (smooth, easy to minimize, variances sum for independent r.v.'s, etc), a fact that may be a symptom of a deeper connection with the Hilbert-space-iness of $L^2$. Also maybe something about how dealing with squares, Pythagoras gives us that minimizing reconstruction error is the same as maximizing projection variance in PCA. Honorable mentions to Qiaochu Yuan's answer about rotation invariance, and Aaron Meyerowitz's answer about the arithmetic mean being the unique minimizer of sum of squared distances from a given point. As for the incredible alignment with our intuition in the form of the animation with springs and Hooke's law that I linked, I suppose I'll chalk that one up to coincidence, or some sort of SF ;)
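As a small numerical illustration of the minimizer fact mentioned above, here is a hedged sketch: a brute-force grid search (the helper `grid_argmin` and the sample data are made up for illustration) showing that the squared loss is minimized at the arithmetic mean, while the absolute loss is minimized at the median instead.

```python
from statistics import mean, median

def sum_sq(data, c):
    """Sum of squared distances from c to the data points."""
    return sum((x - c) ** 2 for x in data)

def sum_abs(data, c):
    """Sum of absolute distances from c to the data points."""
    return sum(abs(x - c) for x in data)

def grid_argmin(loss, data, lo, hi, steps=1000):
    """Brute-force minimizer of loss(data, c) over an evenly spaced grid (illustrative helper)."""
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda c: loss(data, c))

data = [1, 2, 3, 4, 10]  # arbitrary sample with an outlier
best_sq = grid_argmin(sum_sq, data, 0, 10)
best_abs = grid_argmin(sum_abs, data, 0, 10)

if __name__ == "__main__":
    print("squared loss minimizer:", best_sq, "mean:", mean(data))
    print("absolute loss minimizer:", best_abs, "median:", median(data))
```

This only checks one data set on a grid; it is not a proof, just a way to see the two minimizers come apart when there is an outlier.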


EDIT 2/11/22: I was thinking about Hilbert spaces, and then wondering again why they behave so nicely, namely they have the closest point lemma (leading to orthogonal decomposition $\mathcal H = \mathcal M \oplus \mathcal M^\perp$ for closed subspaces $\mathcal M$), or orthonormal bases (leading to Parseval's identity, and convergence of a series of orthogonal elements if and only if the sum of the squared lengths converges), and I came to the conclusion that the key result each time seemed to be the Pythagorean theorem (e.g. the parallelogram law is an easy corollary of Pythag). So that raises the question: why is the Pythagorean theorem so special? The linked article in the accepted answer of this question: What does the Pythagorean Theorem really prove? tells us essentially that the Pythagorean theorem boils down to the fact that right triangles can be subdivided into two triangles both similar to the original.

The fact that this subdivision is reached by projecting the vertex onto the hypotenuse (projection deeply related to inner products) is likely also significant... ahh, indeed by the "commutativity of projection", projecting a leg onto the hypotenuse is the same as projecting the hypotenuse onto the leg, but by orthogonality of the legs, the projection of the hypotenuse onto the leg is simply the leg itself! The square comes from the fact that projection scales proportionally to the scaling of each vector, and there are two vectors involved in the operation of projection.

I suppose this sort of "algebraic understanding" of the projection explains the importance of "2" more than the geometry, since just knowing about the "self-similarity of the subdivisions" of the right triangle, one then has to wonder why say tetrahedrons or other shapes in other dimensions don't have this "self-similarity of the subdivisions" property. However it is still not clear to me why projection seems to be so fundamentally "2-dimensional". Perhaps 1-dimensionally, there is the "objective" perception of the vector, and 2-dimensionally there is the "subjective" perception of one vector in the eyes of another, and there's just no good 3-dimensional perception for 3 vectors?

There might also be some connection between the importance of projection and the importance of the Riesz representation theorem (all linear "projections" onto a 1-dimensional subspace, i.e. linear functionals, are actually literal projections against a vector in the space).


EDIT 2/18/22: again touching on the degree 2 Diophantine equations I mentioned above, a classical example is the number of ways to write $k$ as the sum of $n$ squares, $r_n(k)$. There are a number of nice results for this, the most famous being Fermat's 2-square theorem and Jacobi's 4-square theorem. A key part of the proof was the use of the Poisson summation formula for the Euler/Jacobi theta function $\theta(\tau) := \sum_{n=-\infty}^\infty e^{i \pi n^2 \tau}$, which depends on/is heavily related to the fact that Gaussians are stable under the Fourier transform. I still don't understand intuitively why this is the case (see Intuitively, why is the Gaussian the Fourier transform of itself?), but there seems to be some relation to Hölder conjugates and $L^p$ spaces (or in the Gaussian case, connections to $L^2$), since those show up in generalizations of the Hardy uncertainty principle ("completing the square", again an algebraic nicety of squares, was used in the proof of Hardy, and the Hölder conjugates may have to do with the inequality $-x^p + xu \leq u^q$, which is Problem 4.1 in Stein and Shakarchi's Complex Analysis, where the LHS basically comes from computing the Fourier transform of $e^{-x^p}$). Of course why the Gaussian itself appears everywhere is another question altogether: https://mathoverflow.net/questions/40268/why-is-the-gaussian-so-pervasive-in-mathematics.
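The consequence of Poisson summation for the theta function can at least be checked numerically. On the imaginary axis $\tau = it$ with $t > 0$, the standard functional equation reads $\theta(it) = t^{-1/2}\,\theta(i/t)$. The following is a minimal sketch (the function names and the truncation at $|n| \le 50$ are choices made for this illustration) comparing both sides.

```python
import math

def theta_imag(t, terms=50):
    """theta(i*t) = sum over n of exp(-pi * n^2 * t), truncated to |n| <= terms."""
    return sum(math.exp(-math.pi * n * n * t) for n in range(-terms, terms + 1))

def functional_equation_gap(t):
    """Absolute difference between theta(i*t) and t^(-1/2) * theta(i/t)."""
    return abs(theta_imag(t) - theta_imag(1.0 / t) / math.sqrt(t))

if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 3.0):
        print(t, theta_imag(t), functional_equation_gap(t))
```

The gap should be at the level of floating-point rounding for moderate $t$; for very small $t$ the truncation would need more terms.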

This (squares leading to decent theory of $r_n(k)$, and squares leading to nice properties of the Gaussian) is probably also connected to the fact that $\int_{\mathbb R} e^{-x^2} d x$ has a nice explicit value, namely $\sqrt \pi$. I tried seeing if there was a connection between this value of $\pi$ and the value of $\pi$ one gets from calculating the area of a circle "shell-by-shell", $\frac 1{N} \sum_{k=0}^N r_2(k) \to \pi$, but I couldn't find anything: Gaussian integral using Euler/Jacobi theta function and $r_2(k)$ (number of representations as sum of 2 squares).
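The "shell-by-shell" count can be sketched directly. Summing $r_2(k)$ over $0 \le k \le R^2$ is the same as counting integer points $(x, y)$ with $x^2 + y^2 \le R^2$, and dividing by $R^2$ should approach $\pi$. The function name below is made up for this illustration.

```python
import math

def lattice_points_in_disc(radius):
    """Count integer pairs (x, y) with x^2 + y^2 <= radius^2.

    This equals the sum of r_2(k) for k = 0, ..., radius^2.
    """
    r2 = radius * radius
    count = 0
    for x in range(-radius, radius + 1):
        # For this x, y ranges over -y_max..y_max.
        y_max = math.isqrt(r2 - x * x)
        count += 2 * y_max + 1
    return count

if __name__ == "__main__":
    for radius in (10, 100, 1000):
        print(radius, lattice_points_in_disc(radius) / radius**2)
```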

  • 5
    One aspect here is presumably the connection to quadratic forms, which are bilinear i.e. linear in both arguments. That is: Suppose I have a real symmetric matrix $M$, and I define $Q(u,v)=u^\top M v$ as a binary operation on vectors. Then $$Q(au_1+bu_2,v)=a Q(u_1,v)+b Q(u_2,v)$$ and similarly for the second argument. (More generally one can talk about -multilinear- functions, with the determinant as a key example.) – Semiclassical Jan 05 '21 at 05:40
  • Dropping a comment halfway through the 3B1B video, but the argument for a hyperbola (the locus of points where the *difference* between the distances from it to two fixed points is constant) would be the same except one sphere would be inside the "top cone" and the other inside the "bottom cone". The case of the parabola would be different since there is only one sphere, but someone who has thought about this for more than five minutes might have a clearer explanation. –  Jan 05 '21 at 06:50
  • 5
    Degree 2 is just the next simplest case after degree 1 (straight lines). Curves of degree 3 (elliptic curves) and higher have been extensively studied, but classifying and understanding them becomes much more complex. – gandalf61 Jan 05 '21 at 16:06
  • 1
    Regarding the alleged oddity of the number $2$, cf. https://mathoverflow.net/q/160811/27465, https://mathoverflow.net/q/915/27465, https://math.stackexchange.com/q/1573308/96384. – Torsten Schoeneberg Jan 05 '21 at 21:02
  • The question "Why are quadratic equations the same as right circular conic sections?" appears to be based on false notions; for a computational approach I would instead recommend looking for other related questions such as [Equation of Conics](https://math.stackexchange.com/q/3852371/139123), [Confusion about the conic equation](https://math.stackexchange.com/q/2096193/139123), or [The general equation of a conic section in the plane of the conic](https://math.stackexchange.com/q/3283737/139123). – David K Jan 06 '21 at 04:32
  • Other related questions are [Is it possible to derive the formula of the conic sections such as ellipse,hyperbola etc using linear algebra?](https://math.stackexchange.com/q/3304577/139123) and even [Synthetic geometry of conic sections](https://math.stackexchange.com/q/3712428/139123). – David K Jan 06 '21 at 04:37

6 Answers


A cone itself is a quadratic! Just in three variables rather than two. More precisely, conical surfaces are "degenerate hyperboloids," such as

$$x^2 + y^2 - z^2 = 0.$$

Taking conic sections corresponds to intersecting a cone with a plane $ax + by + cz = d$, which amounts to replacing one of the three variables with a linear combination of the other two plus a constant, which produces a quadratic in two variables. The easiest one to see is that if $z$ is replaced by a constant $r$ then we get a circle $x^2 + y^2 = r^2$ (which is how you can come up with the above equation; a cone is a shape whose slice at $z = \pm r$ is a circle of radius $r$). Similarly if $x$ or $y$ is replaced by a constant we get a hyperbola.
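To make the substitution concrete, here is a minimal sketch under two stated assumptions: the plane is written as $z = ax + by + d$ (so vertical planes are not covered), and $d \neq 0$ so the plane misses the vertex and the section is non-degenerate. Substituting into $x^2 + y^2 - z^2 = 0$ gives a quadratic $Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$ whose discriminant works out to $B^2 - 4AC = 4(a^2 + b^2 - 1)$. The function names are made up for this illustration.

```python
from fractions import Fraction

def slice_cone(a, b, d):
    """Coefficients (A, B, C, D, E, F) of the plane curve obtained by
    substituting z = a*x + b*y + d into x^2 + y^2 - z^2 = 0."""
    a, b, d = Fraction(a), Fraction(b), Fraction(d)
    return (1 - a * a, -2 * a * b, 1 - b * b, -2 * a * d, -2 * b * d, -d * d)

def classify(a, b, d):
    """Classify the section by the sign of B^2 - 4AC (assumes d != 0)."""
    A, B, C, *_ = slice_cone(a, b, d)
    disc = B * B - 4 * A * C
    if disc < 0:
        return "ellipse"
    if disc == 0:
        return "parabola"
    return "hyperbola"

if __name__ == "__main__":
    print(classify(0, 0, 1))                            # horizontal plane
    print(classify(Fraction(3, 5), Fraction(4, 5), 1))  # slope equal to the cone's
    print(classify(2, 0, 1))                            # steeper than the cone
```

Exact rationals are used so that the borderline parabola case ($a^2 + b^2 = 1$) is detected without floating-point error.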

I don't know that I have a complete picture to present about why quadratics are so much easier to understand than cubics and so forth. Maybe the simplest thing to say is that quadratic forms are closely related to square (symmetric) matrices $M$, since they can be written $q(x) = x^T M x$. And we have lots of tools for understanding square matrices, all of which can then be brought to bear to understand quadratic forms, e.g. the spectral theorem. The corresponding object for a cubic form is a degree $3$ tensor, which is harder to analyze.
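In two variables the spectral theorem can be seen by hand: for $q(x, y) = Ax^2 + Bxy + Cy^2$, with $M = \begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix}$, rotating coordinates by the angle $\theta$ with $\tan 2\theta = B/(A - C)$ removes the cross term, and the two remaining coefficients are the eigenvalues of $M$. A minimal sketch (the function name is made up for this illustration):

```python
import math

def rotate_away_cross_term(A, B, C):
    """Coefficients (A2, B2, C2) of q after substituting
    x = c*u - s*v, y = s*u + c*v with tan(2*theta) = B / (A - C).
    B2 should come out as zero up to rounding."""
    theta = 0.5 * math.atan2(B, A - C)
    c, s = math.cos(theta), math.sin(theta)
    A2 = A * c * c + B * c * s + C * s * s
    B2 = B * (c * c - s * s) - 2 * (A - C) * c * s
    C2 = A * s * s - B * c * s + C * c * c
    return A2, B2, C2

if __name__ == "__main__":
    # q = x^2 + 4xy + y^2 has M = [[1, 2], [2, 1]] with eigenvalues 3 and -1.
    print(rotate_away_cross_term(1.0, 4.0, 1.0))
```

One positive and one negative diagonal coefficient in this example means the level sets of $q$ are hyperbolas, which is the kind of information the diagonal form makes readable at a glance.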

Maybe a quite silly way to say it is that $2$ is special because it's the smallest positive integer which isn't equal to $1$. So quadratics are the simplest things that aren't linear and so forth.

Qiaochu Yuan

What is a cone?

It is a solid such that every cross section perpendicular to its central axis is a circle, and the radii of these cross-section circles are proportional to the distance from the cone's vertex.

And that's it. The surface of the cone consists of the points $(x,y,z)$ where $z = h = $ the height of the cross section $= r = $ the radius of the cross section, and $(x,y)$ are the points of the circle with radius $r = h = z$.

As the equation of a circle is $\sqrt{x^2 +y^2} = r$ or $x^2 + y^2 = r^2$ the equation of a cone is $x^2 + y^2 = z^2$.

Every conic section is a matter of intersecting the cone with a plane. A plane restricts the three variables to be related by the constraint $ax + by + cz = k$, and that is a matter of expressing one of the variables as a linear combination of the other two (plus a constant).

So the intersection of a plane and a cone will be derived from the second-degree equation $x^2 + y^2 = z^2$, where one of the variables is a linear combination of the other two. In other words, a second-degree equation in two variables.

And that's all there is to it.

Of course the real question is why is the equation of a circle $x^2 + y^2 =r^2$? and why is that such an important representation of a second degree equation?

And that is entirely because of the Pythagorean theorem. If we take any point $(x,y)$ on a plane and consider the three points $(x,y), (x,0)$ and $(0,0)$, they form the three vertices of a right triangle. The legs of this triangle have lengths $|x|$ and $|y|$, and therefore by the Pythagorean theorem the hypotenuse will have length $\sqrt{x^2 + y^2}=h$, and that is the distance from $(x,y)$ to $(0,0)$.

Now a circle is the collection of points where the distance from $(x,y)$ to $(0,0)$ is the constant value $r = h$. And so it will be all the points $(x,y)$ where $\sqrt{x^2 + y^2} =r$.

And that's it. That's why: distances are related to right triangles, right triangles are related to 2nd degree equations, circles are related to distances, cones are related to circles and all of them are related to 2nd degree equations.

That's it.


The proximate reason is that cones are based on circles, and circles, in turn, are given by the quadratic equation

$$x^2 + y^2 = r^2.$$

Now, as for the reason that circles have this equation, that is because they are related to the Euclidean distance function, being the set of all points at a constant distance from a given center, here conventionally taken as the origin. In particular,

$$d(P, Q) = \sqrt{|Q_x - P_x|^2 + |Q_y - P_y|^2}$$

As for why the Euclidean metric has this form, it is useful to consider the somewhat more general form of metrics

$$d_p(P, Q) := \left(|Q_x - P_x|^p + |Q_y - P_y|^p\right)^{1/p}$$

called the $p$-metrics which, in effect, result from asking "well, what happens if we let the power not be 2?", and so are just right for answering this question.

And it turns out that $d_2$ has a very special property. It is the only one for which you can take a geometric object, declare a point on it a pivot, then take any other point on that object and tag it, measure the distance from the pivot to the tag point, and now transform that object in such a way that the pivot remains fixed, while the tag point comes to face a different direction at the same distance, and yet the whole object's overall size and shape remains unchanged. Or, to put it another way, it is the only one for which such a thing as "rotation" makes geometric sense as a rigid motion.
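A quick numerical illustration of this, as a sketch only: apply an ordinary rotation matrix to the point $(1, 0)$ and measure its distance from the origin in several $p$-metrics. Only $p = 2$ reports the same length before and after. This checks the usual Euclidean rotations against $d_p$; it does not by itself rule out other families of distance-preserving maps for $p \neq 2$. The helper names are made up for this illustration.

```python
import math

def p_norm(v, p):
    """d_p distance from the origin to the point v."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def rotate(v, angle):
    """Rotate a 2D point about the origin by the given angle (radians)."""
    x, y = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

if __name__ == "__main__":
    tag = (1.0, 0.0)
    turned = rotate(tag, math.pi / 4)
    for p in (1, 2, 3):
        print(p, p_norm(tag, p), p_norm(turned, p))
```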

So, what is the ultimate reason cones are quadratic? Because in Euclidean space, you can rotate things in any way you please without changing their size and shape.


There is a paper by David Mumford which may be hard to read depending upon your level of preparation.

The gist of that paper is that any system of polynomial equations can be replaced (by adding more variables and more equations) by a system of quadratic and linear equations.

One can probably generalise this further to show that if the polynomial system has parameters, then one can ensure that these parameters only appear in the linear equations.
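For a single polynomial in one variable, the elementary version of this trick is easy to sketch (this is only the textbook substitution, not necessarily the construction in Mumford's paper): introduce $y_1 = x$ and $y_{k+1} = x\,y_k$, which are linear and quadratic equations respectively, and then $c_0 + \sum_k c_k y_k = 0$ is a linear equation carrying all the coefficients. The function names below are made up for this illustration.

```python
def lift(x, degree):
    """Return [y_1, ..., y_degree] with y_1 = x and y_{k+1} = x * y_k."""
    ys = [x]
    for _ in range(2, degree + 1):
        ys.append(x * ys[-1])
    return ys

def system_residuals(coeffs, x, ys):
    """Residuals of the lifted system at the point (x, y_1, ..., y_n).

    coeffs[k] is the coefficient of x^k in the original polynomial.
    Quadratic equations: y_{k+1} - x * y_k = 0 (no coefficients appear).
    Linear equations:    y_1 - x = 0 and c_0 + sum_k c_k * y_k = 0.
    """
    n = len(coeffs) - 1
    quadratic = [ys[k] - x * ys[k - 1] for k in range(1, n)]
    linear = [ys[0] - x,
              coeffs[0] + sum(c * y for c, y in zip(coeffs[1:], ys))]
    return quadratic, linear

if __name__ == "__main__":
    cubic = [-6, 11, -6, 1]  # x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
    for x in (2, 4):
        print(x, system_residuals(cubic, x, lift(x, 3)))
```

At a root every residual vanishes; at a non-root the quadratic equations still hold on the lifted point and only the coefficient-carrying linear equation fails.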

The very special early case of this is the one you have mentioned.


A reason "2" is special for physics is Newton's second law, which relates force to acceleration (not velocity) and that's a second derivative. Well, there's also the role of "2" in inverse square laws.

The reason "2" is special in geometry through quadratic forms in several variables is that quadratic forms in several variables have a few nice properties.

  1. Every quadratic form can be diagonalized to remove all cross terms, so you can focus on the case of diagonal quadratic forms $a_1x_1^2 + \cdots + a_nx_n^2$. (Strictly speaking this is not true for quadratic forms over fields of characteristic $2$, but you don't get geometric intuition from characteristic $2$.) In contrast to that, cubic forms may not be able to be diagonalized, even over $\mathbf C$. For example, the cubic form $y^2z - x^3 + xz^2$ (whose zero set in dehomogenized form is given by the equation $y^2 = x^3 - x$) can't be diagonalized over $\mathbf C$: see my comments here
  2. Every nonsingular quadratic form has a large group of automorphisms thanks to the construction of reflections. It's called the orthogonal group of the quadratic form. In contrast to that, the "orthogonal group" of a higher-degree homogeneous polynomial $f(\mathbf x)$ (that means the group of linear transformations $A$ preserving the polynomial: $f(A\mathbf x) = f(\mathbf x)$) is often finite, e.g., the only isometries of $x_1^n + \cdots + x_n^n$ for $n \geq 3$ are coordinate permutations and multiplying coordinates by $n$th roots of unity.

  3. Fundamental to geometry is the concept of orthogonality, which you want to be a symmetric bilinear relation: $v \perp w$ if and only if $w \perp v$, and if $v \perp w$ and $v \perp w'$ then $v \perp (aw + a'w')$ for all scalars $a$ and $a'$. This suggests looking at bilinear forms $B(v,w)$ on a vector space and asking when the relation $B(v,w) = 0$ (an abstract version of "$v \perp w$") is symmetric. It turns out this happens if and only if $B$ is symmetric or alternating. The first case is, outside of characteristic $2$, closely related to studying the quadratic form $Q(v) = B(v,v)$.
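The reflection construction mentioned in the list can be sketched for a diagonal form $Q(v) = \sum_i a_i v_i^2$ with associated bilinear form $B(v, w) = \sum_i a_i v_i w_i$: for any $w$ with $Q(w) \neq 0$, the map $v \mapsto v - \frac{2B(v,w)}{Q(w)}\,w$ preserves $Q$. The following is a minimal illustration with exact rational arithmetic; the function names and the sample vectors are made up here.

```python
from fractions import Fraction

def bilinear(a, v, w):
    """B(v, w) = sum_i a_i * v_i * w_i for a diagonal form with coefficients a."""
    return sum(ai * vi * wi for ai, vi, wi in zip(a, v, w))

def quad(a, v):
    """Q(v) = B(v, v)."""
    return bilinear(a, v, v)

def reflect(a, v, w):
    """Reflection of v across the hyperplane B-orthogonal to w (needs Q(w) != 0)."""
    qw = quad(a, w)
    if qw == 0:
        raise ValueError("w must satisfy Q(w) != 0")
    t = Fraction(2 * bilinear(a, v, w)) / qw
    return [vi - t * wi for vi, wi in zip(v, w)]

if __name__ == "__main__":
    a = (1, 1, -1)   # the form x^2 + y^2 - z^2, whose zero set is the cone
    w = (1, 2, 1)    # Q(w) = 4
    for v in ((3, 4, 5), (1, 0, 0)):
        image = reflect(a, v, w)
        print(v, quad(a, v), image, quad(a, image))
```

With the form $x^2 + y^2 - z^2$, the point $(3, 4, 5)$ lies on the cone and so does its image, which is one way to see that these reflections map the cone to itself.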

  • I think conservative forces are far more significant than acceleration being a 2nd derivative, and inverse-square comes from the fact that our universe has 3 macroscopic spatial dimensions. – Lawnmower Man Jan 05 '21 at 23:47
  • I think the square in the inverse law comes from the fact that we live in 3 dimensions. The set of points at a given distance has 1 dimension less, i.e. they form a surface. The surface increase like the square of the distance. – Florian F Jan 06 '21 at 12:35

The index number 2 is special in connection with the way that angles can be defined from distances.

There are many possible distance functions (norms) which can be defined, but most of them do not allow angles to be defined in a consistent way. Angles are defined from an inner product (dot product) and this is only defined if the norm obeys the quadratic expression $$||u+v||^2+||u-v||^2=2||u||^2+2||v||^2$$ for any vectors $u$ and $v$.
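The parallelogram identity is easy to test numerically for the $p$-norms. As a sketch (function names made up for this illustration), the "defect" below is the left side minus the right side of the identity, evaluated on the standard basis vectors of the plane; it vanishes for $p = 2$ and not for $p = 1$ or $p = 3$.

```python
def p_norm(v, p):
    """The p-norm of a vector v."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def parallelogram_defect(u, v, p):
    """||u+v||^2 + ||u-v||^2 - 2||u||^2 - 2||v||^2 in the p-norm."""
    plus = [a + b for a, b in zip(u, v)]
    minus = [a - b for a, b in zip(u, v)]
    return (p_norm(plus, p) ** 2 + p_norm(minus, p) ** 2
            - 2 * p_norm(u, p) ** 2 - 2 * p_norm(v, p) ** 2)

if __name__ == "__main__":
    u, v = (1.0, 0.0), (0.0, 1.0)
    for p in (1, 2, 3):
        print(p, parallelogram_defect(u, v, p))
```

A single failing pair of vectors is enough to show that a norm does not come from an inner product, so one test pair suffices for the negative cases here.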

In a space with a different norm there are fewer rotations. There may be only a finite number of possible rotations of a circle or a sphere. A "cone" in 3d $(x,y,z)$ defined by $||x+y||=||z||$ can still be intersected by planes and a family of (nonquadratic) curves found.

In the usual geometry angles are defined, so there is a quadratic expression which must be satisfied by lengths.
