When solving a problem, we often look at some special cases first, then try to work our way up to the general case.

It would be interesting to see some counterexamples to this mental process, i.e. problems that become easier when you formulate them in a more general (or ambitious) form.


Recently someone asked for the solution of $a,b,c$ such that $\frac{a}{b+c} = \frac{b}{a+c} = \frac{c}{a+b} (=t).$

Someone suggested writing this down as a system of linear equations in terms of $t$ and solving for $a,b,c$. It turns out that either (i) $a=b=c$ or (ii) $a+b+c=0$.

Solution (i) is obvious from looking at the problem, but (ii) was not apparent to me until I solved the system of equations.

Then I wondered how this would generalize to more variables, and wrote the problem as: $$ \frac{x_i}{\sum x - x_i} = \frac{x_j}{\sum x - x_j} \quad \forall i,j\in\{1,2,\dots,n\} $$

Looking at this formulation, both solutions became immediately evident without the need for linear algebra (for (ii), set $\sum x=0$ so that each denominator cancels out with its numerator).
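
Both families of solutions can be checked numerically. The following is a minimal sketch with exact rational arithmetic; the helper name `ratios` and the sample values are illustrative assumptions, and all entries are taken to be nonzero in case (ii) so that no denominator vanishes.

```python
from fractions import Fraction

def ratios(xs):
    """Return x_i / (sum(xs) - x_i) for every entry, using exact arithmetic."""
    total = sum(xs)
    return [Fraction(x, total - x) for x in xs]

# Case (i): all variables equal, so every ratio is 1/(n-1).
equal = ratios([5, 5, 5, 5])

# Case (ii): the variables sum to zero, so every ratio is x_i / (-x_i) = -1.
zero_sum = ratios([3, -7, 10, -6])

print(equal)
print(zero_sum)
```

In each case all entries of the printed list coincide, which is exactly the condition in the generalized formulation.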

  • 2
    The linked Question : http://math.stackexchange.com/questions/897118/what-is-the-non-trivial-general-solution-of-these-equal-ratios – lab bhattacharjee Aug 16 '14 at 12:16
  • 7
    Well, I personally don't think this is actually simplification by generalization, but rather simplification *motivated by* generalization. Like if you write $\frac{a}{b+c}$ as $\frac{a}{(a + b + c) - a}$, you'd say that the problem becomes obvious too. – Tunococ Aug 16 '14 at 12:59
  • 1
    @Tunococ Fair enough, I'm just saying that writing it like this didn't occur *to me* until I thought of generalizing it. I understand that this is all a little subjective (hence the "soft-question" tag). – MGA Aug 16 '14 at 13:01
  • 5
    [This](http://mathoverflow.net/questions/40005/generalizing-a-problem-to-make-it-easier) is relevant. – jcg Aug 16 '14 at 13:58
  • 2
    Introducing topology makes certain proofs in real analysis clearer and more elegant, though I'm not sure whether they are necessarily easier per se. – Harry Johnston Aug 17 '14 at 00:24
  • I remember once (in an exercise about packing and covering) that to compute $\int_0^1 n (1- r)^{n-1} r^n dr$ (for fixed $n$), it is actually easier to consider $I_{a,b} := \int_0^1 (1-r)^a r^b dr$ because integration by parts yields $$I_{d-1, d} = \dfrac{d-1}{d+1} I_{d-2, d+1}.$$ – Watson Dec 24 '19 at 15:40
  • I recently saw an argument showing that $\operatorname{tr}(M) = 0 \implies \det(\exp(M))=1$ by defining $\phi(t) := \det(\exp(tM))$ and showing that $\phi'(t) = \phi(t) \phi'(0)$ and $\phi'(0) = \operatorname{tr}(M) = 0$, so $\phi' \equiv 0$ and thus $\phi \equiv 1$. In particular, $\det(\exp(M))=1$. [Of course, there are plenty of other proofs of $\det \circ \exp = \exp \circ \operatorname{tr}$ over $M_n(\Bbb C)$; the most natural one is in the context of Lie algebras, noticing that $\operatorname{tr} = D_e(\det)$ is the derivative of the determinant at the identity matrix.] – Watson Feb 27 '20 at 12:44

11 Answers


Consider the integral $\displaystyle\int_{0}^{1}\dfrac{x^7-1}{\ln x}\,dx$. All attempts at finding an antiderivative fail because the antiderivative isn't expressible in terms of elementary functions.

Now consider the more general integral $f(y) = \displaystyle\int_{0}^{1}\dfrac{x^y-1}{\ln x}\,dx$.

We can differentiate with respect to $y$ and evaluate the resulting integral as follows:

$f'(y) = \displaystyle\int_{0}^{1}\dfrac{d}{dy}\left[\dfrac{x^y-1}{\ln x}\right]\,dx = \int_{0}^{1}x^y\,dx = \left[\dfrac{x^{y+1}}{y+1}\right]_{0}^{1} = \dfrac{1}{y+1}$.

Since $f'(y) = \dfrac{1}{y+1}$, we have $f(y) = \ln(y+1)+C$ for some constant $C$.

Trivially, $f(0) = \displaystyle\int_{0}^{1}\dfrac{x^0-1}{\ln x}\,dx = \int_{0}^{1}0\,dx = 0$. Hence $C = 0$, and thus, $f(y) = \ln(y+1)$.

Therefore, our original integral is $\displaystyle\int_{0}^{1}\dfrac{x^7-1}{\ln x}\,dx = f(7) = \ln 8$.

This technique of generalizing an integral by introducing a parameter and differentiating with respect to that parameter is known as differentiation under the integral sign, often called Feynman integration or Feynman's trick.
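
As a sanity check, the closed form $f(y) = \ln(y+1)$ can be compared against a direct numerical estimate of the integral. The sketch below uses a plain midpoint rule from the standard library; the function name `f_numeric` and the number of subintervals are assumptions chosen for illustration, not part of the derivation.

```python
import math

def f_numeric(y, n=200_000):
    """Midpoint-rule estimate of the integral of (x**y - 1)/ln(x) over (0, 1)."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h  # midpoints avoid the endpoints x = 0 and x = 1
        total += (x**y - 1.0) / math.log(x)
    return total * h

# Compare the numerical estimate with ln(y + 1) for a few values of y.
for y in (0, 1, 7):
    print(y, f_numeric(y), math.log(y + 1))
```

For $y = 7$ the two printed values should agree to several decimal places, consistent with $f(7) = \ln 8$.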

Joel Reyes Noche
  • 9
    Great example! Really neat – seldon Aug 16 '14 at 19:59
  • 2
    My favorite illustration so far. Of course the others are nice too. – MGA Aug 16 '14 at 20:22
  • The special form (not only the general form) can also easily be determined by setting $\ln x=-t$ and then use [Frullani's integral](http://mathworld.wolfram.com/FrullanisIntegral.html). – Tunk-Fey Aug 17 '14 at 14:44
  • The integrand does have an anti-derivative in terms of the [upper incomplete gamma function](https://en.wikipedia.org/wiki/Incomplete_gamma_function#Definition): $\Gamma(0, -\ln x) - \Gamma(0, -8\,\ln x)$. That still doesn't help because it's undefined at both limits of integration. – Tavian Barnes Aug 17 '14 at 21:52
  • @Tunk-Fey You're right, apologies. Deleting my comment. – MGA Aug 18 '14 at 11:39
  • @MGA Please, no need to apologize. We're just fine. :) – Tunk-Fey Aug 18 '14 at 12:01
  • For a discussion of a very similar example, namely for the case $y=10,$ and for an excerpt from Feynman's 1985 book **Surely You're Joking Mr. Feynman** (probably when Feynman's name began being associated with what used to be a well-known method), see my answer to [What are some good low-prerequisite examples for the heuristic advice “If you cannot prove it, prove something stronger.”?](https://matheducators.stackexchange.com/questions/2157/what-are-some-good-low-prerequisite-examples-for-the-heuristic-advice-if-you-ca/2174#2174) This other question also has other examples relevant to the OP. – Dave L. Renfro May 13 '18 at 09:05
  • I find replacing $7$ with $\pi$ to be more elegant. – Simply Beautiful Art Aug 06 '18 at 21:48

I recall something like this coming up when evaluating certain summations. For example, consider:

$$ \sum_{n=0}^{\infty} {n \over 2^n} $$

We can generalize this by letting $f(x) = \sum_{n=0}^{\infty} nx^n$ for $|x| < 1$, so:

$$ \begin{align} {f(x) \over x} &= \sum_{n=0}^{\infty} nx^{n-1} \\ &= {d \over dx} \sum_{n=0}^{\infty} x^n \\ &= {d \over dx} {1 \over {1-x}} = {1 \over (x-1)^2} \end{align} $$


$$ f(x) = {x \over (x-1)^2} $$

The solution to the original problem is $f({1 \over 2}) = 2$.
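
A quick check of the closed form against partial sums, as a sketch using exact fractions. The helper names `partial_sum` and `closed_form` and the choice of 60 terms are illustrative assumptions.

```python
from fractions import Fraction

def partial_sum(x, terms):
    """Exact partial sum of n * x**n for n = 0 .. terms - 1."""
    return sum(n * x**n for n in range(terms))

def closed_form(x):
    """f(x) = x / (x - 1)**2, valid for |x| < 1."""
    return x / (x - 1) ** 2

x = Fraction(1, 2)
print(closed_form(x))             # exact value of the closed form at x = 1/2
print(float(partial_sum(x, 60)))  # 60-term partial sum, converted to a float
```

The partial sums approach the closed-form value because the tail of the series decays geometrically for $|x| < 1$.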

  • 5
    Interestingly, this also shows $\sum_{n=0}^\infty \frac{1}{2^n} = \sum_{n=0}^\infty \frac{n}{2^n}$, which succeeded at surprising me. :) – Keba Aug 03 '15 at 23:26

George Polya's book How to Solve It calls this phenomenon "The Inventor's Paradox": "The more ambitious plan may have more chances of success." The book gives several examples, including the following.

1) Consider the problem: "A straight line and a regular octahedron are given in position. Find a plane that passes through the given line and bisects the volume of the given octahedron." If we generalize this to "a straight line and a solid with a center of symmetry are given in position..." it becomes very easy. (The plane goes through the center of symmetry and the line.)

The book also gives other examples of the Inventor's Paradox, but "more ambitious" is not always the same as "more general." Consider: "Prove that $1^3 + 2^3 + 3^3 + \cdots + n^3$ is a perfect square." Polya shows that it is easier to prove (by mathematical induction) that "$1^3 + 2^3 + 3^3 + \cdots + n^3 = (1 + 2 + 3 + \cdots + n)^2$". This is more ambitious but is not more general.
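
The stronger statement is easier because it gives the induction something to hold on to: writing $T_n = 1 + 2 + \cdots + n$, one has $T_{n+1}^2 - T_n^2 = (T_{n+1} - T_n)(T_{n+1} + T_n) = (n+1)(n+1)^2 = (n+1)^3$. A small numerical check of both claims follows, as a sketch; the function names and the range tested are assumptions for illustration.

```python
import math

def sum_of_cubes(n):
    """1^3 + 2^3 + ... + n^3, computed directly."""
    return sum(k**3 for k in range(1, n + 1))

def is_perfect_square(m):
    r = math.isqrt(m)
    return r * r == m

for n in range(1, 200):
    s = sum_of_cubes(n)
    triangular = n * (n + 1) // 2
    assert is_perfect_square(s)  # the original, weaker claim
    assert s == triangular**2    # the more ambitious claim, which names the square
```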


The web page Generalizations in Mathematics gives many similar examples. It even gets into the difference between "more ambitious" and "more general."

Rory Daulton
  • Good point regarding generality/ambitiousness, and both are interesting. I have made a minor edit to the question to reflect this. – MGA Aug 16 '14 at 12:51
  • 9
    In the first the easy approach is "more general" in the sense that it picks out the significant property of an entity and ignores any other particular properties it may have. There are *loads* of examples like it, for example any time you're asked to prove some property of a particular group that's true of all Abelian groups, or some property of a particular function that's true of any continuous monotonic function, and so on. The problem is, "identify the useful property of this object", so the more general problem where you *only* have the useful property is always easier ;-) – Steve Jessop Aug 16 '14 at 19:26
  • The cubes and square example, termed 'more ambitious', is an example of the technique of solving a problem by the addition of an invented constraining hypothesis, and thus is actually a case of _particularization_. – Jose Brox Aug 20 '14 at 08:00

The solution to the Monty Hall problem

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, follows the fixed protocol of opening another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

becomes more obvious when you generalize it to an $N$-door problem with the host opening $N-2$ doors. For $N\gg3$ most people's intuition revolts against staying with the original choice.
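
The effect of increasing $N$ can be seen in a short simulation. This is a minimal sketch assuming the fixed protocol described above (the host always opens $N-2$ goat doors and always offers the switch); the function names, trial count, and seed are illustrative choices.

```python
import random

def play(n_doors, switch, rng):
    """Play one round; return True if the contestant wins the car."""
    car = rng.randrange(n_doors)
    pick = rng.randrange(n_doors)
    # The host opens n_doors - 2 goat doors, leaving the pick and one other door closed.
    # If the pick is wrong, the other closed door must hide the car.
    # If the pick is right, the other closed door is an arbitrary goat door.
    if pick == car:
        other = rng.choice([d for d in range(n_doors) if d != pick])
    else:
        other = car
    final = other if switch else pick
    return final == car

def win_rate(n_doors, switch, trials=20_000, seed=0):
    """Fraction of rounds won over the given number of simulated trials."""
    rng = random.Random(seed)
    return sum(play(n_doors, switch, rng) for _ in range(trials)) / trials

for n in (3, 10, 100):
    print(n, win_rate(n, switch=False), win_rate(n, switch=True))
```

Staying should win roughly $1/N$ of the time and switching roughly $(N-1)/N$, so the gap that is easy to overlook at $N=3$ becomes hard to miss at $N=100$.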

  • 2
    A more helpful approach to the standard Monty Hall is to examine what is wrong with the above formulation. The story, as told here, is consistent with a game show where the host offers a chance to switch only to contestants who have already chosen the correct door. When playing that game, you should never switch regardless of how many doors there are. In the standard problem, it is explicitly stated that the host will _always_ offer the switch. That's what makes the odds $N-1$ to $1$ in favor of the switch; and $2$-to-$1$ odds is good enough. – David K Aug 16 '14 at 15:22
  • @DavidK - the text I included is Wikipedia's description of the Monty Hall problem. Have modified it. – Johannes Aug 16 '14 at 20:20
  • 2
    The problem has been underspecified in some widely-read sources; no surprise one of those got copied to Wikipedia. I'll accept this version as implying the player knows the fixed protocol. In that case, if someone's intuition requires $N$ to be increased above $3$ before they think the switch is beneficial, they do not understand the reason why it is beneficial. They have merely been nudged from an incorrect guess to an unjustified but luckily correct guess. – David K Aug 18 '14 at 23:12
  • @DavidK: I think that would be a fair criticism of someone who claims to be game-theoretically rational, but I doubt such a person exists; would a reasonable person expect a Bayes factor of $2:1$ to come to his rescue? Even though it wouldn't be necessary to the idealised mathematical observer with the eyes of a hawk, in many contexts exaggerating the difference helps me notice there is one, whence I can see via a monotonicity argument that the effect is non-zero (if weak) for $N \gtrsim 3$. – Vandermonde Feb 16 '15 at 19:27
  • viz. http://mathoverflow.net/a/74710 – Vandermonde Feb 16 '15 at 19:43
  • @Vandermonde This problem does not require game theory. It requires only the same horse sense that a good card player or backgammon player uses, namely, know your odds. A choice that lets you double your chance of success is a _phenomenal_ opportunity compared to the choices you have in common games such as blackjack. Granted, many people (myself included!) do not play cards or backgammon very well, precisely because they lack sufficiently detailed understanding of the odds. – David K Feb 16 '15 at 20:38
  • @DavidK: I guess I meant not so much game-theoretical (it never comes into play as the only agent is the player under the crucial assumption about the protocol) as rational in the sense of homo economicus. Having thought about it for a couple of minutes, I also think I haven't been giving enough credit to an improvement of $4/3$ in the chance of winning. Such a gain would seldom be the deciding factor or make me enthusiastic about Russian roulette, but compared to even 'large' house edges in typical games, it really is godly. – Vandermonde Feb 16 '15 at 21:20
  • 1
    Heck, what I thought negligible was a raw probability of 1/6, which I ought to appreciate even then. – Vandermonde Feb 16 '15 at 21:25
  • Mental note to self: 'credit' in the foregoing would more properly read 'respect' or 'faith' – Vandermonde Feb 20 '15 at 03:13

I'm not entirely convinced that a problem being made easier by generalization is exactly what is going on here.

In the example provided in your question, what made the solution to the general problem appear easier is that it dawned on you that $$x_1+\cdots+x_{j-1}+x_{j+1}+\cdots+x_n=\sum_{i=1}^nx_i-x_j.$$ Indeed, (as Tunococ commented) had the less general problem been written as $$\frac{a}{a+b+c-a}=\frac{b}{a+b+c-b}=\frac{c}{a+b+c-c},$$ then your easier solution to the general problem applies here as well. I would argue that, if anything, the generalization helped you notice a pattern you had not seen before. Would you still have noticed this pattern had you not formulated the problem in a general way? Perhaps, perhaps not.

In my opinion, what your experience shows is that formulating a problem $Q$ in more general terms $P$ is one of many ways by which one can gain a fundamental insight that provides the key to the solution of the general problem $P$ (and thus inevitably also solves the initial special case $Q$). Sometimes, this can lead to a solution that was as yet unknown to you and that is more elegant or easier than the previous solutions. However, given that such an insight could easily have come without generalizing the problem, the fact that the solution did come from thinking about the generalization seems highly circumstantial to me.

EDIT: JimmyK4542's example (and Feynman's integration trick) seems like a spectacular demonstration of the phenomenon, however.

  • 1
    I merely meant that as an example to get the discussion started, and you're of course right that in this case one doesn't have to generalize to see the solution. But personally, I didn't *think* of writing $+a-a$ etc. because I didn't feel the *need* to. Once I considered the general case, I was compelled to write it like this. Anyway, I think that better examples have now been given on here that really illustrate the point - I'm particularly fond of the Feynman integration trick. – MGA Aug 16 '14 at 20:16
  • 2
    JimmyK4542's example (and Feynman's integration trick in general) indeed contradicts my claim that a *solution* to a general problem always applies to a special case. I'll edit this out. – user78270 Aug 16 '14 at 23:53

On this site one frequently finds under the linear-algebra tag questions of the kind: what is the determinant of a matrix $$ \begin{pmatrix}a&b&b&b\\b&a&b&b\\b&b&a&b\\b&b&b&a\end{pmatrix}? $$ (I've just posted this question, which contains a list of such questions). It turns out finding an answer to this question becomes almost trivial (see my answer to the linked question) when reformulated more generally as

What is the characteristic polynomial of a square matrix$~A$ of rank$~1$?

knowing that by specialisation the answer gives the determinant of $\lambda I-A$ for any scalar$~\lambda$.
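
To make the specialisation concrete: the matrix above is $\lambda I - A$ with $\lambda = a-b$ and $A = -bJ$, where $J$ is the all-ones matrix. Assuming the standard result that an $n\times n$ matrix $A$ of rank $1$ has characteristic polynomial $\lambda^{n-1}(\lambda - \operatorname{tr}A)$, this gives the determinant $(a-b)^{n-1}(a+(n-1)b)$. The sketch below compares that formula with a direct computation; the helper names and test values are assumptions for illustration.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by Gaussian elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero pivot in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result

def constant_offdiag(a, b, n):
    """n x n matrix with a on the diagonal and b everywhere else."""
    return [[a if i == j else b for j in range(n)] for i in range(n)]

def formula(a, b, n):
    """(a - b)**(n - 1) * (a + (n - 1) * b), from the rank-1 characteristic polynomial."""
    return (a - b) ** (n - 1) * (a + (n - 1) * b)

for a, b, n in [(2, 1, 4), (5, -3, 4), (7, 7, 3), (0, 2, 5)]:
    assert det(constant_offdiag(a, b, n)) == formula(a, b, n)
```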

Marc van Leeuwen

Famous problems sometimes have this feature, as history suggests: the transcendence of $\pi$ settles the long-standing problem of squaring the circle; analytic geometry and irrationality theory settle the problem of doubling the cube; other examples are Galois's invention of group theory and the quintic equation, Kummer's invention of ideals and Fermat's last theorem, global differential geometry and Chern's intrinsic proof of the Gauss-Bonnet theorem, and so on.


A broadly successful application of this was introduced by Richard Bellman under the phrase dynamic programming. The story of the "birth" of this now foundational topic in applied math is told largely in Bellman's own words here.

A related term gives more evidence of the connection of ideas: invariant imbedding.

A good discussion of dynamic programming references and examples came up early at StackOverflow, but was subsequently closed as off-topic.

An illustration is finding a shortest path between two specified points by "imbedding" that problem in the problem of finding all shortest paths from one point, as in Dijkstra's algorithm.
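
A minimal sketch of this imbedding: to answer the single question "how far is `t` from `s`?", the code computes the distance from `s` to every reachable node. The graph, its node names, and its weights are hypothetical and serve only as an illustration; edge weights are assumed to be nonnegative.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source to every reachable node.

    graph maps each node to a list of (neighbor, nonnegative_weight) pairs.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical toy graph, used only for illustration.
graph = {
    "s": [("a", 4), ("b", 1)],
    "b": [("a", 2), ("t", 6)],
    "a": [("t", 1)],
    "t": [],
}
distances = dijkstra(graph, "s")
print(distances["t"])
```

The answer to the original two-point question is then just one entry of the more general result.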


The most spectacular example I have seen is this one:

Suppose $A$ is an $n\times n$ matrix with eigenvalues $\lambda_1$, ..., $\lambda_n$, including each eigenvalue according to its multiplicity. Then $A^2$ has eigenvalues $\lambda_1^2$, ..., $\lambda_n^2$ including multiplicity.

Proving this directly is in fact very hard. (It's easy to show that $\lambda_1^2$, ..., $\lambda_n^2$ are all eigenvalues of $A^2$ by considering their eigenvectors, but unless the dimensions of the eigenspaces match the multiplicities you're stuck.)

However, the proof of the following statement is actually perfectly possible using elementary arguments (albeit clever arguments):

Suppose $A$ is an $n\times n$ matrix with eigenvalues $\lambda_1$, ..., $\lambda_n$, including each eigenvalue according to its multiplicity. Then for any polynomial $g(x)$, $g(A)$ has eigenvalues $g(\lambda_1)$, ..., $g(\lambda_n)$ including multiplicity.
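
For an upper-triangular matrix the general statement can be read off the diagonal: the eigenvalues with multiplicity are the diagonal entries, and $g(A)$ is again upper triangular with diagonal entries $g(a_{ii})$. Reducing a general complex matrix to triangular form is one standard route to the full result; whether that is the elementary argument meant here is not stated, so the following is only an illustrative sketch with a hypothetical matrix and polynomial.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_of_matrix(coeffs, a):
    """Evaluate g(A) by Horner's rule; coeffs are listed from highest degree down."""
    n = len(a)
    result = [[0] * n for _ in range(n)]
    for c in coeffs:
        result = mat_mul(result, a)
        for i in range(n):
            result[i][i] += c  # add c times the identity
    return result

def poly_of_scalar(coeffs, x):
    """Evaluate g(x) by Horner's rule with the same coefficient order."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

# Hypothetical upper-triangular example with a repeated eigenvalue (2, 2, 5).
A = [[2, 1, 3],
     [0, 2, 4],
     [0, 0, 5]]
g = [1, -3, 0, 7]  # g(x) = x^3 - 3x^2 + 7

gA = poly_of_matrix(g, A)
diagonal = [gA[i][i] for i in range(3)]
expected = [poly_of_scalar(g, A[i][i]) for i in range(3)]
print(diagonal, expected)
```

The two printed lists coincide, including the repeated entry, which is the multiplicity statement in this special case.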


Generalization comes up a lot when doing induction. For example,

$$\forall n ~~ \sum_{k=0}^n 2^{-k} \le 2$$

is difficult to prove directly using induction on $n$. However, if you generalize to a stronger statement:

$$\forall n ~~ \sum_{k=0}^n 2^{-k} \le 2 - 2^{-n}$$

Then induction may be used directly. The base case $n=0$ reads $1 \le 1$, and the claim for $n+1$ is equivalent to the claim for $n$:

$$\sum_{k=0}^{n+1} 2^{-k} \le 2 - 2^{-n - 1} \iff \sum_{k=0}^n 2^{-k} + 2^{-n-1} \le 2 - 2^{-n - 1} \iff \sum_{k=0}^n 2^{-k}\le 2 - 2^{-n}$$

Obviously you could see that it is a geometric series, but that is a generalization also.
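
The weaker statement resists induction because its hypothesis only yields $\sum_{k=0}^{n+1} 2^{-k} \le 2 + 2^{-n-1}$, which overshoots the target. The stronger bound carries exactly the slack that the next term uses up. A small exact check of both bounds, as a sketch; the function name and the range tested are illustrative assumptions.

```python
from fractions import Fraction

def geometric_partial_sum(n):
    """Exact value of sum_{k=0}^{n} 2**(-k)."""
    return sum(Fraction(1, 2**k) for k in range(n + 1))

for n in range(50):
    s = geometric_partial_sum(n)
    bound = 2 - Fraction(1, 2**n)
    assert s <= bound  # the strengthened statement
    assert bound < 2   # which immediately gives the original one
```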

problems that become easier when you formulate them in a more general (or ambitious) form

The potential difficulty of a generalization isn't the only disadvantage. If you disprove a generalization, then you haven't disproven the original theorem. In that respect, a generalization effectively forces you to pick sides in the investigation of a theorem.


A nice example appeared on this web site today: Every prime number $p\ge 5$ has $24\mid p^2-1$.

As posed, the problem sounds like it might be difficult. But it is very easy to show the more general result that every $n$ of the form $6k\pm 1$ has the required property.
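
The general result follows from $(6k\pm1)^2 - 1 = 12k(3k\pm1)$, where $k(3k\pm1)$ is even because $3k\pm1$ is even whenever $k$ is odd. The sketch below checks this numerically and confirms that primes $p\ge5$ fall under it; the function names and ranges are assumptions for illustration.

```python
def divides_24(n):
    """True if 24 divides n^2 - 1."""
    return (n * n - 1) % 24 == 0

def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, int(n**0.5) + 1))

# Every n of the form 6k - 1 or 6k + 1 has the property, prime or not.
assert all(divides_24(6 * k + s) for k in range(1, 1000) for s in (-1, 1))

# Every prime p >= 5 is of that form, since it is divisible by neither 2 nor 3.
assert all(p % 6 in (1, 5) for p in range(5, 2000) if is_prime(p))
```

Note that composite numbers such as $25$ and $35$ satisfy the property as well, so primality was never the relevant hypothesis.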
