There are many counter-intuitive results in mathematics, some of which are listed here. However, most of these theorems involve infinite objects, and one can argue that these results seem counter-intuitive precisely because our intuition does not work properly for infinite objects.

I am looking for examples of counter-intuitive theorems which involve only finite objects. Let me be clear about what I mean by "involving finite objects". The objects involved in the proposed examples should not contain an infinite amount of information. For example, a singleton consisting of a real number is a finite object; however, a real number simply encodes a sequence of natural numbers and hence contains an infinite amount of information. Thus the proposed examples should not mention any real numbers.

I would prefer to have statements which do not mention infinite sets at all. An example of such a counter-intuitive theorem would be the existence of non-transitive dice. On the other hand, allowing examples of the form $\forall n\ P(n)$ or $\exists n\ P(n)$ where $n$ ranges over some countable set and $P$ does not mention infinite sets would provide more flexibility to get nice answers.
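The non-transitive dice mentioned above are easy to check mechanically. A minimal sketch, using one well-known set of face values (just one example of many such sets):

```python
from itertools import product
from fractions import Fraction

# One standard set of non-transitive dice (these particular face values
# are just a well-known example):
A = (2, 2, 4, 4, 9, 9)
B = (1, 1, 6, 6, 8, 8)
C = (3, 3, 5, 5, 7, 7)

def win_prob(x, y):
    """Probability that die x rolls strictly higher than die y."""
    wins = sum(1 for a, b in product(x, y) if a > b)
    return Fraction(wins, len(x) * len(y))

# Each die beats the next with probability 5/9, in a cycle:
assert win_prob(A, B) == Fraction(5, 9)
assert win_prob(B, C) == Fraction(5, 9)
assert win_prob(C, A) == Fraction(5, 9)
```

So "beats with probability above 1/2" is not a transitive relation on dice, even though each die is a perfectly finite object.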

What are some examples of such counter-intuitive theorems?

    There are only five platonic solids. – D Wiggles Dec 02 '16 at 19:29
  • The Peano axioms require induction on $\mathbb{N}$, is it allowed ? – reuns Dec 02 '16 at 19:31
    @IBWiglin: I would call that an intriguing result rather than counter-intuitive. – Burak Dec 02 '16 at 19:31
@Burak I think it depends on how you think about it. If you start trying to build Platonic solids, then it's really easy at the beginning. Then it gets harder, but why should it be impossible to build more? – D Wiggles Dec 02 '16 at 19:33
  • @user1952009: I didn't understand your question. Are you asking about the proof of the proposed example? If so, then yes, the proof can involve infinite sets. – Burak Dec 02 '16 at 19:37
There are many at-first-counter-intuitive results in probability theory, like the so-called Boy-Girl Paradox or the Monty Hall Problem. – Levent Dec 02 '16 at 19:40
    ...and the birthday paradox. –  Dec 02 '16 at 22:38
    I would say a LOT of combinatorics is surprising. Or Boolean functions, for that matter. You only have $2^{2^n} $ of them, but... – Clement C. Dec 03 '16 at 01:37
  • This story might be of interest: http://mathoverflow.net/a/43477/93602 – polfosol Dec 03 '16 at 05:33
@IBWiglin Because each corner has to have at least three faces, so hexagons and polygons with more sides are too big. More than five triangles, three squares or three pentagons are also too big. I find it counter-intuitive that that's the only criterion, that every corner that can be made makes a solid, but I'm pretty sure with physical pieces it would be pretty obvious that there are five and only five Platonic solids. – prosfilaes Dec 03 '16 at 20:21
    by "counter-intuitive" do you mean pardoxical? or just "not obvious at first"? –  Dec 03 '16 at 22:45
    [The blue-eyed islanders puzzle](https://terrytao.wordpress.com/2008/02/05/the-blue-eyed-islanders-puzzle/) – CodesInChaos Dec 04 '16 at 00:18
  • *counter-intuitive* is very poorly defined here. – RBarryYoung Dec 04 '16 at 02:38
  • @mobileink: Counter intuitive is supposed to mean "looking paradoxical at first sight". However, since I am looking for examples of theorems, the results cannot be actually paradoxical. Results which do not seem paradoxical but are *really* surprising are also acceptable. – Burak Dec 04 '16 at 07:48
  • Related: [‘Obvious’ theorems that are actually false](//math.stackexchange.com/q/820686/121411)‪ and‪ [Widely accepted mathematical results that were later proved wrong?](//mathoverflow.net/q/35468/52965) – Scott Dec 04 '16 at 08:07
  • To the OP: you mentioned in a comment to a now-deleted answer that you don't consider Goedel's incompleteness theorem to be an example. I'm curious why; if we adopt the strengthening due to Rosser, the incompleteness theorem states "If PA is consistent, then PA is not complete." Although this requires quantification over naturals, it does not require direct reference to infinite sets, so seems to fit your more general version (or does the fact that it involves *two* quantifiers matter?). – Noah Schweber Dec 04 '16 at 23:16
  • (Cont'd) By the way, the incompleteness theorem has a one-quantifier version: unwinding the proof, there's a primitive recursive function $f$ such that if $n$ is the Goedel-number of a PA-proof of the Rosser sentence $R_{PA}$, then $f(n)$ is the Goedel-number of a PA-proof of "$0=1$". Then via the usual representation machinery, the incompleteness theorem can be stated with one quantifier: namely, saying that $f$ does what I just said it did. (It matters that $f$ be primitive recursive and not just recursive, since otherwise we'd need to state totality as well, and that takes two quantifiers.) – Noah Schweber Dec 04 '16 at 23:19
  • @NoahSchweber: Dear Noah, That's because I was thinking of the general version ("For any recursively enumerable extension of Q...") where we also quantify over theories. If you pick a specific instance for a theory, then yes you can state it with one or two quantifiers. But in this case, it is not as striking as the general version. (Imagine the scenario where the theorem held for PA, but not for some recursive extension. Then you would be blaming PA for its incompleteness rather than being amazed at that it is insufficient.) – Burak Dec 05 '16 at 07:17
  • [Pairwise independent, but not mutually independent dice sides](https://math.stackexchange.com/questions/720045/probability-of-a-tetrahedron-die-with-4-faces) – Vi0 Dec 05 '16 at 10:52
    @Burak Even the more general version can be finitized (although AFAICT more quantifiers are needed): there is an explicit primitive recursive function which, when given a primitive recursive code for any primitive recursively axiomatizable theory extending PA, outputs the Goedel number of a sentence which is unprovable in that theory unless that theory is inconsistent. This can all be expressed with a few quantifiers over $\mathbb{N}$, and no direct reference to infinite sets. Now, whether it can be gotten down to *one* quantifier is harder ... – Noah Schweber Dec 05 '16 at 15:33
  • personally I find almost all of modern mathematics astonishing. just look at groups - N under addition is the same as N+ under multiplication! who woulda thunk it? same for finite groups-zowie! Category theory is full of amazing insights into finite structures. –  Dec 05 '16 at 20:27
  • How many things are counter-intuitive depends on how poor your intuition is. – jwg Dec 06 '16 at 09:43
    @mobileink The reason that $\mathbb{N}, +$ being the same as $\mathbb{N}^{+}, \times$ is so surprising is because it isn't true. – jwg Dec 06 '16 at 15:26
  • Chess and many other games are finite w/many counterintuitive positions... – DVD Dec 08 '16 at 22:16
    A single integer can be thought of as representing infinite information. e.g. 2 is the set of all sets that have 2 elements. – Bradley Thomas Dec 09 '16 at 13:50
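The corner-counting argument for the Platonic solids sketched in the comments above can be checked mechanically: at least three faces must meet at each vertex, and their angles must sum to less than 360°. A minimal sketch:

```python
# A vertex of a regular polyhedron is q regular p-gons meeting at a point;
# it can fold up into 3D only if the face angles there sum to < 360 degrees.
# The interior angle of a regular p-gon is 180*(p-2)/p degrees.
solids = [(p, q) for p in range(3, 20) for q in range(3, 20)
          if q * 180 * (p - 2) / p < 360]

# Exactly five (p, q) pairs survive: tetrahedron, octahedron,
# icosahedron, cube, dodecahedron.
assert solids == [(3, 3), (3, 4), (3, 5), (4, 3), (5, 3)]
```

The upper range of 20 is harmless: larger polygons only have larger interior angles, so nothing beyond hexagons can ever satisfy the inequality.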

36 Answers


100 prisoners problem.

Citing Sedgewick and Flajolet, the problem reads as follows:

The director of a prison offers 100 death row prisoners, who are numbered from 1 to 100, a last chance. A room contains a cupboard with 100 drawers. The director randomly puts one prisoner's number in each closed drawer. The prisoners enter the room, one after another. Each prisoner may open and look into 50 drawers in any order. The drawers are closed again afterwards. If, during this search, every prisoner finds his number in one of the drawers, all prisoners are pardoned. If just one prisoner does not find his number, all prisoners die. Before the first prisoner enters the room, the prisoners may discuss strategy—but may not communicate once the first prisoner enters to look in the drawers. What is the prisoners' best strategy?

Surprisingly, there exists a strategy with a survival probability of more than 30%. It is connected to the fact---also non-intuitive---that a large random permutation is quite likely to contain only "small" cycles (none longer than half its length).
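A quick way to convince yourself is to simulate the standard cycle-following strategy: prisoner $k$ first opens drawer $k$, then the drawer whose number he just found, and so on. A minimal sketch:

```python
import random
from fractions import Fraction

def trial(n=100, rng=random):
    """One run of the cycle-following strategy. Everyone succeeds iff the
    random permutation of numbers into drawers has no cycle longer than n//2."""
    drawers = list(range(n))
    rng.shuffle(drawers)
    for prisoner in range(n):
        drawer = prisoner
        for _ in range(n // 2):        # each prisoner may open n//2 drawers
            if drawers[drawer] == prisoner:
                break                  # found his own number
            drawer = drawers[drawer]   # follow the cycle
        else:
            return False               # this prisoner never found his number
    return True

rng = random.Random(0)
trials = 10_000
rate = sum(trial(rng=rng) for _ in range(trials)) / trials

# Exact survival probability: 1 - (1/51 + 1/52 + ... + 1/100), about 0.3118
exact = 1 - sum(Fraction(1, k) for k in range(51, 101))
assert abs(rate - float(exact)) < 0.03
```

The simulated rate lands near 31%, astronomically better than the naive $2^{-100}$ one might expect from 100 independent coin flips.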

Peter Franek
    This is definitely counterintuitive! I would expect the best probability to be 2^-100 since it seems each prisoner would have a 50% chance of finding his number... – user541686 Dec 04 '16 at 04:24
    @Mehrdad In fact, each prisoner *has* a 50% chance. Just that those events are not necessarily independent. – Peter Franek Dec 04 '16 at 10:37
  • @PeterFranek: Yeah, I figured as much. :) – user541686 Dec 04 '16 at 11:05
    @SimpleArt: Indeed they were. Unfortunately the first prisoner failed to find the correct number, so the prisoners are no longer. – Marc van Leeuwen Dec 05 '16 at 08:07
    A variant of this was posted on puzzling stack exchange where a person before the beginning gets to swap two of the numbers in the drawers. If you allow for that the prisoners always survive. – DRF Dec 06 '16 at 11:18
  • Not smart enough to frame @SimpleArt thus avoiding the situation. – Daevin Dec 08 '16 at 14:57
    @SimpleArt Or just one very smart and very convincing one :) – htd Dec 08 '16 at 17:34
  • A nice explanation: https://youtu.be/vIdStMTgNl0 – Mahathi Vempati Dec 09 '16 at 14:11
    http://puzzling.stackexchange.com/questions/16/100-prisoners-names-in-boxes?answertab=votes#tab-top http://puzzling.stackexchange.com/questions/23150/how-to-beat-count-dracula?noredirect=1&lq=1 – kaine Dec 14 '16 at 16:17

The hydra game. Quote from the link:

A hydra is a finite tree, with a root at the bottom. The object of the game is to cut down the hydra to its root. At each step, you can cut off one of the heads, after which the hydra grows new heads according to the following rules:

If you cut off a head growing out of the root, the hydra does not grow any new heads.

Suppose you cut off a head like this:

Delete the head and its neck. Descend down by 1 from the node at which the neck was attached. Look at the subtree growing from the connection through which you just descended. Pick a natural number, say 3, and grow that many copies of that subtree, like this:

The counter-intuitive fact: you can always kill the hydra, no matter in which order you cut off its heads. The counter-intuitive meta-fact: this theorem cannot be proved in PA.
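The game is easy to simulate for small hydras. A minimal sketch, assuming the common convention that at step $n$ the hydra grows $n$ copies (any fixed choice works), and always cutting the leftmost head:

```python
import copy

def cut(node, n):
    """Cut the leftmost head in `node` (a list of subtrees; a head is []).
    Returns (new_node, status): 'child' means the cut head grew directly out
    of `node`, so the *caller* (the grandparent) must attach n copies;
    'done' means regrowth was already handled deeper in the tree."""
    for i, child in enumerate(node):
        if child == []:                       # found a head: chop it off
            return node[:i] + node[i+1:], 'child'
        new_child, status = cut(child, n)
        if status == 'child':
            # this node is the grandparent: grow n copies of the reduced
            # subtree next to it
            copies = [copy.deepcopy(new_child) for _ in range(n)]
            return node[:i] + [new_child] + copies + node[i+1:], 'done'
        return node[:i] + [new_child] + node[i+1:], 'done'
    return node, 'none'                       # empty tree: nothing to cut

def kill(hydra):
    """Cut heads until the hydra is bare. A head cut at the root ('child'
    reported at top level) regrows nothing, per the rules."""
    step = 0
    while hydra:
        step += 1
        hydra, _ = cut(hydra, step)
    return step

# A small hydra: root - neck - neck - head. It dies in finitely many steps.
assert kill([[[[]]]]) == 37
```

Even this four-node hydra briefly balloons before dying; deeper starting hydras take astronomically many steps, which is a hint of why PA cannot prove termination in general.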
    My favourite! The best part is that it is unprovable using Peano arithmetic. – The Vee Dec 03 '16 at 22:54
Try [here](http://math.andrej.com/wp-content/uploads/2008/02/Hydra/hydraApplet.html) for yourself to see how **ridiculously** fast it grows; yet you know that if you're diligent enough, you can't lose. – The Vee Dec 03 '16 at 23:03
  • @TheVee, thanks for the edit. How did you do it? Using the picture button or editing by hand? – Martín-Blas Pérez Pinilla Dec 03 '16 at 23:04
  • The main part was replacing the HTML markup by the corresponding Markdown; somehow the pictures couldn't display within the former. Then I took the template for what a picture link looks like from the button but wrote them myself for brevity. – The Vee Dec 03 '16 at 23:05
  • It seems like Markdown doesn't work inside HTML markup. Using Markdown all the way is usually the course of least resistance, but you can put images within a `blockquote` by using HTML image markup. – Scott Dec 04 '16 at 04:34
    _"You can't prove the theorem in PA."_ So is there a practical way I can prove to my non-mathematician (but still engineer/science student) friends that you can't lose this game? They do understand a lot of proofs but I don't think talking about axioms explicitly outside PA will be understandable. They definitely won't understand ordinals like $\omega^{\alpha_1}$ given in the link, but on the other hand, I think they would intuitively accept things like axiom of choice if I don't refer to them as axioms. – JiK Dec 05 '16 at 11:21
    @JiK, the idea of good order is reasonably easy... – Martín-Blas Pérez Pinilla Dec 05 '16 at 12:06
    @Martín-BlasPérezPinilla _Well_-order, you mean? – Akiva Weinberger Dec 05 '16 at 15:56
    @AkivaWeinberger, right. Bad retro-translation. – Martín-Blas Pérez Pinilla Dec 05 '16 at 18:35
    The first fact wasn't counter-intuitive to me. If you remove a head, then either you remove a level (by removing the last head on the topmost level), or you decrease the maximum size of branches on that level, or you decrease the number of branches of that size on that level, and hydras are well-ordered by those properties. (Proof sketch only) – user253751 Dec 06 '16 at 02:29
    @immibis: The problem is that you're tacitly assuming there's no [limit ordinal](https://en.wikipedia.org/wiki/Limit_ordinal), which is unprovable in PA. You could have [0, 1, 2, ..., omega, omega + 1, omega + 2, ...](https://en.wikipedia.org/wiki/Ordinal_number) and that wouldn't violate any of the PA axioms. – Kevin Dec 06 '16 at 22:08
    @Kevin I'm not thinking about ordinals at all (although I could be thinking about something isomorphic to ordinals without knowing it). Where exactly am I assuming something about ordinals? – user253751 Dec 06 '16 at 23:29
  • @Kevin Do you mean I'm assuming all of the variables involved are natural numbers? – user253751 Dec 06 '16 at 23:31
    @immibis: You are assuming that the ordinals from omega up are not in the naturals. PA has no axiom which forbids this from being the case. It *only* tells you that 0, 1, 2... are in the naturals. It never says there's nothing else in the naturals. – Kevin Dec 06 '16 at 23:33
  • @Kevin Not even with induction? If $n$ is not a limit ordinal then neither is $n+1$. – user253751 Dec 06 '16 at 23:40
    @immibis: You cannot express "is not a limit ordinal" in first-order arithmetic. Adding a "big" number is a canonical [non-standard model of arithmetic](https://en.wikipedia.org/wiki/Non-standard_model_of_arithmetic). – Kevin Dec 06 '16 at 23:43
    This is a variant of [Goodstein's Theorem,](http://math.stackexchange.com/a/625404/242) where the trees are replaced by integers (in iterated exponential notation). Such results are "counter-intuitive" only if one is not familiar with ordinals (here $\,\varepsilon_0 = \omega^{\large \omega^{\omega^{\Large\cdot^{\cdot^\cdot}}}}\!\!\! =\, \sup \{ \omega,\, \omega^{\omega}\!,\, \omega^{\large \omega^{\omega}}\!,\, \omega^{\large \omega^{\omega^\omega}}\!,\, \dots\, \} ).\,$ The prior-linked post contains links to many accessible expositions on this and related topics. – Bill Dubuque Dec 07 '16 at 00:20
    @immibis: You are also assuming that at each step a head is removed from the top level. Removing a lower head is allowed and can increase the number of "maximum size branches" on the top level. – David Hartley Dec 07 '16 at 19:20
    @BillDubuque That answer of yours perhaps it's worth reposting here! It's the first thing I thought after reading the question. I know that it is almost the same as Martín's answer, but It should be here in any case. At least up to the statement of Goodstein's Theorem :-) – Pedro Sánchez Terraf Dec 08 '16 at 14:46
    @JiK: One does **not** need ordinals to prove the Hydra game terminates. It suffices to use induction in higher-order logic. See http://matheducators.stackexchange.com/a/2449, where I gave a generalization of the Hydra game as well as a sketch of the proof. – user21820 Dec 24 '16 at 14:51
    @Kevin: See my above comment. It may be that *immibis* has in mind the same proof as I sketched. It indeed cannot be expressed in PA, but that is more an artifact of the first-order nature of PA and not related to the strength of induction at all. Full induction in higher-order arithmetic is actually very strong. – user21820 Dec 24 '16 at 14:54

Suppose $X$ is any finite set, which will represent a set of voters, and let $Y$ be another finite set, representing decisions or options that the voters can rank. For example, voting on presidential candidates, favorite ice cream, etc. For simplicity, assume that $X=\{1,\ldots, N\}$.

Call a *ranking* a linear ordering on $Y$, and a *social welfare function* a map $$F: L(Y)^N \to L(Y)$$ where $L(Y)$ is the set of all linear orderings on $Y$. $F$ essentially shows how to take the rankings of each voter and turn them into a single ranking. Each element of $L(Y)^N$ is an $N$-tuple of rankings, one ranking of $Y$ from each voter. We shall represent such a tuple by $(\leq_n)_{n=1}^N$ and its image by $F((\leq_n)_{n=1}^N)=\leq$.

Since this is to be a voting system, we probably want this to adhere to some rules which enforce the idea that $F$ accurately reflects the rankings of each voter:

  • Unanimity: If every voter ranks $a\in Y$ better than $b\in Y$, then in the output of $F$, society ranks $a$ higher than $b$. Formally, if $a\leq_n b$ for every $n$, then $a\leq b$.

  • Independence of Irrelevant Alternatives: How voters rank $a$ and $b$ should not affect how society ranks $a$ and $c\neq b$. Formally, if $(\leq_n)_{n=1}^N$ and $(\leq_n')_{n=1}^N$ are two tuples of rankings such that the ordering of $a$ and $c$ is the same for each $n$ (i.e. $a\leq_n c$ if and only if $a\leq_n' c$) then the ordering of $a$ and $c$ is the same in society's rankings (i.e. $a \leq c$ if and only if $a\leq' c$).

    Since this is a bit more involved, consider the example of a group ranking the three ice cream flavors of vanilla, chocolate, and strawberry. The group makes their choices, and $F$ says that the highest ranked flavor is chocolate. Then the group learns that strawberry is out, so they rank strawberry as last. It would be counter-intuitive, then, to suspect that all of a sudden vanilla becomes ranked highest (but there are such functions making this true).

    The intuition is the hope that the group's consensus on how it feels about two options should only depend on how each individual feels about those two options.

    Cases where this still fails are usually indicative of cases where the voting scheme can be gamed in some way, i.e. by voting against your favorite option to avoid your least favorite option or by varying how you rank the remaining options to guarantee their loss.

    A good ranked voting system should be such that you benefit most by actually saying what you really think, than attempting to game the system. The failure of Independence of Irrelevant Alternatives allows for this gaming.

This brings us to our result:

Arrow's Impossibility Theorem: For $Y$ finite and $|Y|> 2$, the only function $F: L(Y)^N \to L(Y)$ satisfying the above two properties is a dictatorship, i.e. there is a fixed $m$ (which depends only on $F$) such that $1\leq m\leq N$ and $F((\leq_n)_{n=1}^N) = \leq_m$.

One method of proof is by considering filters and using the fact that the only ultrafilters on a finite set are the principal ultrafilters.

It is important to note that Arrow's Impossibility Theorem only applies to ranking systems. There are alternative ways of voting which are not ranking systems and show more promise.

Moreover, whether the hypothesis of the Independence of Irrelevant Alternatives actually captures what we want in all cases is suspect.
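To see the failure of Independence of Irrelevant Alternatives in a familiar system, here is a sketch using the Borda count with a hypothetical five-voter profile: no voter changes their relative order of $A$ and $B$ between the two profiles, only where $C$ sits, yet society's order of $A$ and $B$ flips:

```python
def borda(profile):
    """Borda count: with m candidates, a voter's k-th choice gets m-k points.
    `profile` is a list of rankings, best candidate first."""
    scores = {}
    for ranking in profile:
        m = len(ranking)
        for place, cand in enumerate(ranking):
            scores[cand] = scores.get(cand, 0) + (m - 1 - place)
    return scores

# Five voters: three prefer A to B, two prefer B to A.
p1 = [('A', 'B', 'C')] * 3 + [('B', 'C', 'A')] * 2
s1 = borda(p1)
assert s1['B'] > s1['A']     # society ranks B above A (7 points to 6)

# The same voters move only C; every voter's A-vs-B order is unchanged.
p2 = [('A', 'C', 'B')] * 3 + [('B', 'A', 'C')] * 2
s2 = borda(p2)
assert s2['A'] > s2['B']     # ...yet now society ranks A above B (8 to 4)
```

Arrow's theorem says this is not a defect peculiar to the Borda count: every non-dictatorial ranking system with unanimity admits such reversals.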

    Why do you consider this result to be counterintuitive? It is surprising, yes, but I don't see what intuition would suggest existence of such a system. – Wojowu Dec 02 '16 at 21:16
    @Wojowu I would say it is counter-intuitive in that our intuition - perhaps more due to social upbringing rather than a mathematical intuition - makes it seem like many voting systems that are used in the world should be well-behaved (until you actually start creating example scenarios). Another aspect is because the conditions seem like common-sense. That we find it so obvious that these are properties we want usually makes it seem like they should naturally occur in some non-trivial examples. – Hayden Dec 02 '16 at 21:24
    @Wojowu because it seems like a bunch of people can get together and vote on something without someone always inviting Hitler. – djechlin Dec 03 '16 at 06:18
'Independence of irrelevant alternatives' doesn't sound obvious or common sense to me. Isn't its opposite the whole point of ranked ballots? 45% like chocolate, 15% like strawberry. But if they can't have strawberry the 15% all hate chocolate so vanilla wins? – DJClayworth Dec 04 '16 at 05:21
  • @DJClayworth I cannot evaluate your example because it isn't ranked (and I don't know what function you're using to actually evaluate what the group's ranking is), but I disagree, I don't think that's the point of ranked ballots. I think the point of a ranked ballot is that it conveys more info than casting a single vote. If no one in the group has changed how they feel about vanilla compared to chocolate, then suddenly changed how they feel about strawberry shouldn't have an effect on the group's 'consensus' on vanilla versus chocolate... – Hayden Dec 04 '16 at 05:34
    ... that all 'decent' (read: used) voting schemes admit such counter-intuitive examples is exactly what Arrow's Impossibility Theorem is saying. – Hayden Dec 04 '16 at 05:35
  • "The group makes their choices, and F says that the highest ranked flavor is chocolate. Then the group learns that strawberry is out, so they rank strawberry as last. It would be counter-intuitive, then, to suspect that all of a sudden vanilla becomes ranked highest (but there are such functions making this true)." -- That doesn't seem counter-intuitive to me. For people that choose S > V > C, switching that to V > C > S because S is unavailable, their ranking of both V and C increases, and I don't see any logical reason why the overall ranking of one cannot be affected more than the other. – hvd Dec 04 '16 at 10:42
    Isn't part of the surprise simply calling this property a *dictatorship*? If I understand correctly $m$ is a function of $(\leq_n)_{n=1}^N$, so it's not like an individual can consciously decide the outcome, it's more like by chance we can guarantee that one person got exactly what they wanted. I think that once the result is phrased this way it's not that unintuitive. I really like the answer in any case. – user347489 Dec 04 '16 at 11:16
    @user347489 Perhaps I misphrased it. $m$ is a fixed value dependent on $F$. Once $F$ is chosen, there is a single voter which *always* gets their way. I would describe that as a dictatorship. – Hayden Dec 04 '16 at 14:52
  • @hvd The issue is not that the overall ranking of one cannot be affected more than the other, only that the relative ordering of them stays the same. Things can be shuffled around and gaps could grow or get smaller, so long as society still agrees on how two things compare after the fact. It could be that chocolate was third with vanilla in first before strawberry went out and then chocolate became second with vanilla keeping its place, for example. – Hayden Dec 04 '16 at 14:57
  • @Hayden Oh! that wasn't clear at all to me. So does this mean that if after a few elections $m$ realizes they always get things their way, they could use this to gain *absolute power* ? :P – user347489 Dec 04 '16 at 18:50
  • @user347489 Yeah, I edited the statement to make it clearer, thanks for pointing out the confusion. And yeah, probably! – Hayden Dec 04 '16 at 18:52
  • @Hayden But that's my point. If changing S > V > C to V > C > S benefits V more than it does C, it doesn't surprise me that in the overall results, V might suddenly beat C. Even though the relative order of V and C for each individual person remains the same, the relative order for the voters as a whole does change. Let's use a political election as an example. (...) – hvd Dec 04 '16 at 20:53
  • @Hayden Suppose there is one candidate for party A, and two candidates for party B. Suppose the simple rule is used that each voter gets one vote. Suppose the candidate for party A gets 40% of the votes. Suppose the two candidates for party B get 35% and 25%. Now it's extremely likely that if one of the candidates for party B had dropped out, the other candidate would have received 60% of the votes. It doesn't surprise me that there is no alternative voting approach that avoids this, for the simple but non-mathematical reason that if there were, we'd be using it already. – hvd Dec 04 '16 at 20:53
  • I don't see why the unanimity requirement - *in Arrow's theorem's specific settings* is at all common-sense. If it were just two people, then sure; otherwise the positions could be switched to avoid worse bending of other relative preferences. Perhaps I could even say that about the irrelevance requirement. Also - such ranking schemes are difficult to get intuition about anyway, it seems like an invitation to weird preference-warping to me. – einpoklum Dec 04 '16 at 23:25
  • @hvd I'm still not sure how to respond to your political example, since it isn't ranked voting. The hope is that ranked voting carries with it enough information to deal with the existence of two candidates. How a ranked voting system deals with your situation obviously depends on the system, but the hope is that there would be a system that actually reflected the voters' rankings, but obviously that isn't entirely true. I've added some more discussion of the Independence of Irrelevant Alternatives to the answer though, so maybe that'll be useful. – Hayden Dec 05 '16 at 00:01
  • @einpoklum I've added some more discussion of the Independence of Irrelevant Alternatives to the answer that might be useful. Mostly, the point is that a good voting system should reflect the rankings voters give. Being able to game the system by giving things opposite of your actual feelings is a decent deviation from that idea. Additionally, it might be useful to look at some actual systems [here](https://en.wikipedia.org/wiki/Ranked_voting_system). – Hayden Dec 05 '16 at 00:03
    Note that Arrow's theorem deals with a deterministic $F$. If you relax that requirement and allow $F$ to map to a random variable, a trivial observation is that "random dictatorship" (choosing a random individual and taking their preference orer to be the group preference order) is what comes out, and it's an interesting thought experiment whether such a system would produce reasonable/acceptable results. – R.. GitHub STOP HELPING ICE Dec 05 '16 at 03:39
  • @Hayden The political example would be where the formula that translates individual rankings to collective rankings only takes into account the top ranked candidate by each individual. Whether the remaining rankings are entered and discarded, or not entered at all, doesn't matter here, the point is that even if they are entered, the problem still cannot be avoided. And I'm not talking about gaming the system, but you're right that that's a logical consequence. – hvd Dec 05 '16 at 05:23
  • @hvd Imagine you took every voter's preference list, and found that in X lists, candidate A was ranked above candidate B. In the remaining Y lists, candidate B was ranked above candidate A. Now candidate C drops out, so they get crossed out from every list. This shouldn't change the value of either X or Y, so the hope/intuition is that there should be a system where C dropping out doesn't change who wins between A and B. From your example, the hope would be that the down-list info of party B's voters all preferring both their candidates over A would somehow be usable to prevent A from winning. – Ben Aaronson Dec 06 '16 at 09:20
  • Re: "Cases where [Independence of Irrelevant Alternatives] still fails are usually indicative of cases where the voting scheme can be gamed by voting against your favorite options to somehow make that option a reality": Not exactly. The classic example of IIA failing is in plurality-wins (first-past-the-post) systems. You can game the scheme by introducing additional options that are more likely to pick up votes from your opponent from yourself; and as a voter, you may be motivated to vote "dishonestly" by restricting your vote to among the options that might actually win; *[continued]* – ruakh Dec 07 '16 at 01:42
  • *[continued]* but there's no way to game the scheme in the way you describe. Voting against your favorite option can only hurt your favorite option's chances of winning. The only motivation for voting "dishonestly" is to boost the chances of your mth-favorite compared to your nth-favorite option, where m < n. – ruakh Dec 07 '16 at 01:44
  • @ruakh Yes, I misspoke; I should have been talking about avoiding your least favorite candidate. Also, while voting against your favorite choice is always bad, strategic voting still occurs in how you rank the remaining choices, regardless of how you feel about them, as happens in Borda Count. – Hayden Dec 07 '16 at 02:27
  • Amartya Sen (in [Maskin and Sen (2014)](https://cup.columbia.edu/book/the-arrow-impossibility-theorem/9780231153287), pp. 35-37) offers the shortest, simplest, and sweetest proof I've ever come across. –  Dec 08 '16 at 04:50

Closed-form formulas (in radicals) exist for the roots of general polynomials up to degree 4, but not for degree 5 or higher.

Only 4 colors are required to color any planar map so that adjacent regions receive distinct colors.

Finite-dimensional division rings over the reals have dimension at most 4, and cannot have more.

Having more than three regular convex polytopes is a property only of dimensions up to 4.

And more number-4 things by Saharon Shelah (presented in a Twitter post by Artem Chernikov).

Daniel R. Collins
    I think the fact that some roots of some degree 5+ polynomials cannot be expressed with algebraic operations is even more interesting. – Serge Seredenko Dec 03 '16 at 05:53
    This might be interesting as well: https://en.wikipedia.org/wiki/Exotic_R4 – polfosol Dec 03 '16 at 08:28
    For polynomials you, might need to add "solutions by radicals." There might be closed form solutions for degree 5 in terms of transcendental functions, say. Also, I would not consider division algebras over the reals "finite objects." – Kimball Dec 03 '16 at 16:48
  • @SergeSeredenko Euuh... care to elaborate? That seems impossible, [per the definition of an algebraic number](https://en.wikipedia.org/wiki/Algebraic_number): _Any complex number that is a root of a non-zero polynomial in one variable with rational coefficients_. – Iwillnotexist Idonotexist Dec 04 '16 at 01:49
    @IwillnotexistIdonotexist : Not all algebraic numbers can be expressed using algebraic operations (addition, subtraction, multiplication, division, and positive integral roots) applied to rational numbers (or integers, since division makes the distinction not a difference). Only algebraic roots whose minimal polynomial has a solvable Galois group can be (i.e., only those that live in a tower of simple algebraic field extensions of $\Bbb{Q}$) "Most" fifth and higher degree polynomials do not have solvable Galois groups, so cannot be expressed with algebraic operations on rationals. – Eric Towers Dec 04 '16 at 02:20
  • I downvoted this answer since the second paragraph of the OP *explicitly* says it does not consider real numbers as finite objects. Therefore statements about division rings over the reals, whether they are finite dimensional or not, is not covered in this question. – Burak Dec 04 '16 at 05:12
    @Burak Polynomials over rational numbers, as well as radicals over rational numbers, are finite constructions. They deal with countable algebraic numbers and do not need complete reals at all. OP allows countability in the question. – Serge Seredenko Dec 04 '16 at 05:20
  • @Serge Seredenko: And how is that related to the third proposed example in this post? – Burak Dec 04 '16 at 05:27
  • 4
    @Burak: you can replace "real numbers" with "real closed field," and e.g. the real algebraic numbers (of which there are countable many) are a real closed field. So this example doesn't really have anything to do with the real numbers *per se*, and in particular is not connected to how it takes an infinite amount of information to specify an arbitrary real number (which was the OP's objection to using the real numbers as examples). – Daniel McLaury Dec 05 '16 at 14:47
  • @Daniel McLaury : It is a fact that any real closed field is infinite. Therefore any real closed field in fact contains 'infinite amount of information'. Also, it is very easy to generate very counter-intuitive facts on real closed fields since it is already very easy to generate counter-intuitive facts on any infinite set. – Levent Dec 06 '16 at 18:40
  • 2
    @Levent: Just because a set is infinite doesn't mean that it takes an infinite amount of information to specify it. For instance, I can say "the integers" and I've uniquely specified an infinite set using a finite amount of information. Moreover, each real algebraic number can be specified with a finite amount of information (namely, its minimal polynomial together with something specifying which root we're talking about). This is *completely* different from the situation of the real numbers, where there is no way to specify each real number with a finite amount of information. – Daniel McLaury Dec 06 '16 at 18:45
  • 1
    @DanielMcLaury: It seems to me that in order to say "dimension is at most 4" you have to have some existential quantifiers (quantifying over real algebraic numbers) determining the dimension; and of course you have this universal quantifier at the beginning which is supposed to quantify over division rings over real algebraic numbers. Thus, regardless of whether you consider division rings over real algebraic numbers "finite objects" or not, it does not seem to satisfy the requirements... – Burak Dec 07 '16 at 13:31
  • @DanielMcLaury saying "the integers" and having that finite amount of information point to an infinite object is a bit of smoke and mirrors, imo. Not to get too philosophical/semantic, but it's technically an example of exformation https://en.m.wikipedia.org/wiki/Exformation. Just saying "the integers" would not represent an infinite thing if you didn't already have the knowledge on what the integers are, and the fact that they are infinite. So technically you still need an infinite amount of information to specify it, you just already have that information, and point to it with "the integers" – D. W. Feb 22 '17 at 06:31

Simpson's Paradox.

Gender discrimination example. A university has only two graduate departments — Math and English. Men and women apply to both departments, with varying admit rates, as summarized in the table below.

Each department is more likely to admit women than men. But when aggregated, the university as a whole is more likely to admit men than women (and thus open to charges of gender discrimination)!

        Admitted/applied        Math             English           Overall
        Men                140/200 (70%)     10/100 (10%)     150/300 (50%)
        Women               80/100 (80%)     40/200 (20%)     120/300 (40%)

Drug testing example. We recycle the exact same numbers from before in a context where the paradox seems even more striking.

There are 300 male patients and 300 female patients suffering from some illness. 200 of the males and 100 of the females receive the new experimental drug. The recovery rates are as given in the table below.

The conclusion from this trial is that if we know the patient is male or female, we should not administer the drug. But absurdly, if we do not know whether the patient is male or female, we should!

        Recovered/patients      Drug            No drug
        Men                140/200 (70%)     80/100 (80%)
        Women               10/100 (10%)     40/200 (20%)
        Overall            150/300 (50%)    120/300 (40%)

For more, see Pearl (2014), who "safely proclaim[s] Simpson's paradox 'resolved.'"

P.S. Simpson's Paradox illustrates both the fallacy of composition ("what is true of the parts must also be true of the whole") and the fallacy of division ("what is true of the whole must also be true of the parts").
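The reversal is pure arithmetic and easy to verify. Here is a minimal sketch using cell values consistent with the totals in the text and the figures quoted in the comments (140 of 200 treated men and 10 of 100 treated women recover, versus 80 of 100 untreated men and 40 of 200 untreated women):

```python
from fractions import Fraction

def rate(recovered, total):
    """Exact recovery rate as a fraction."""
    return Fraction(recovered, total)

# (recovered, total) per subgroup; cell values consistent with the
# 300/300 patient totals and 200/100 treatment split given in the text.
drug_men,   drug_women   = (140, 200), (10, 100)
nodrug_men, nodrug_women = (80, 100),  (40, 200)

# Within each sex, patients do better WITHOUT the drug ...
assert rate(*nodrug_men)   > rate(*drug_men)    # 80% > 70%
assert rate(*nodrug_women) > rate(*drug_women)  # 20% > 10%

# ... yet aggregated over both sexes, the drug group does better.
drug   = (drug_men[0] + drug_women[0],     drug_men[1] + drug_women[1])
nodrug = (nodrug_men[0] + nodrug_women[0], nodrug_men[1] + nodrug_women[1])
assert rate(*drug) > rate(*nodrug)              # 50% > 40%
```

The reversal is driven by the unequal treatment split: most treated patients are men, who recover at high rates with or without the drug.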

  • 28
    After staring at this for ten minutes, I think I understand the electoral college. – Mike Vonn Dec 07 '16 at 20:26
  • 7
    Isn't that how the US election system works? – Tobias Kienzler Dec 08 '16 at 13:45
  • 7
    Of course they are open to charges of gender discrimination: women are *twice* as likely to get admitted to the English department. That seems like a pretty strong bias. – tomasz Dec 08 '16 at 13:58
  • 1
    @Mauser If you don't understand the electoral college, you should look up how it works. There are some nice videos out there. Here's a brief description with some pros/cons. http://www.newsmax.com/FastFeatures/Electoral-College-voting-rights/2015/07/02/id/653350/ – mbomb007 Dec 08 '16 at 21:26
  • 5
    @mbomb007 Thanks for the link. I understand how it works. I was making a little bit of a joke, but with some truth behind it, and not intending to belittle the institution. This fallacy of composition warns against aggregating dissimilar data, as information, perhaps critical information, will be destroyed upon the aggregation. This same concern was perhaps involved in the development of that process as well. – Mike Vonn Dec 08 '16 at 22:18
  • 2
    @Mauser It's hard to tell just looking at a comment. There are lots of people out there who think "Hillary lost, so it must be a bad thing." – mbomb007 Dec 08 '16 at 22:21
  • 8
    @tomasz: Didn't you know that gender discrimination can only ever be against women? =) –  Dec 10 '16 at 01:46
  • **If you norm the numbers, you get the opposite result**: From 200 men, 140 recover, so from 100 men, 70 recover and (70men+10women)/200 = 0.4 recover with treatment. At the same time if 40 from 200 women recover without treatment, then 20 from 100 recover (statistically), thus (80men+20women)/200=0.5 recover without treatment. Thus, it is not true that the Grand Total is the result for _"if we do not know whether the patient is male or female"_. To the contrary, it is the result for a group, where we know that a bigger ratio of males than females are treated. – exchange Dec 10 '16 at 16:53
  • The "Grand Total" is the result of recovered people of a group where twice as many men as women were treated. It fulfills an additional constraint that is not perceived at first and only therefore seems counter-intuitive. – exchange Dec 10 '16 at 17:07
  • 1
    I don't agree with the interpretation "if we do not know whether the patient is male or female, we should!" This is just an example of the common fact that correlation is not causation. In this case, the result is simply given by the fact that men have in general a much bigger chance of recovering then women and there are simply many more men-treatment than men-nontreatment. But (based on this), the treatment is **not** recommended, even if you don't know the sex of the patient. – Peter Franek Dec 13 '16 at 18:41
  • 2
    This is essentially the same as the principle of comparative advantage in Economics. The paradox is most easily resolved by observing that the overall acceptance rate was very low in English, which is the subject to which the vast majority of women applied. Therefore the complaint of discrimination is valid, on the basis that the University is offering more places in the subject most popular among men. – samerivertwice Nov 01 '17 at 13:07
  • @RobertFrost: I do not see the connection to the principle of comparative advantage (other than the fact that that principle is perhaps also counter-intuitive). –  Nov 03 '17 at 10:28
  • 1
    @KennyLJ the paradox in this case against women, is caused by an OVERRIDING effect of admitting fewer people from the subject which more women apply to. This is analogous to one country's OVERRIDING productivity efficiency being in conflict with the relative productivity in individual industries. – samerivertwice Nov 03 '17 at 10:43
  • Here is an excellent article on Simpson's Paradox: https://plato.stanford.edu/entries/paradox-simpson/ It says that Pearl's "resolution" mentioned above is not uncontroversial. – Max May 30 '21 at 16:09

I feel like the Monty Hall problem is counter-intuitive the first time you see it.

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door (say No. 1). Then the host, who knows what's behind all the doors, opens another door (say No. 3) that is guaranteed to have a goat. He then says to you, "Do you want to now pick door No. 2?" Is it to your advantage to switch your choice?

The answer is yes, you should ALWAYS switch your choice. The reasoning is as follows: at the beginning you have a $1$ in $3$ chance of selecting the correct door. The host can always open a door with a goat, no matter what you picked, so his reveal tells you nothing new about your door: it still hides the car with probability $1/3$. Because the probabilities of the two remaining closed doors must add to $1$, the other door is correct with a probability of $2/3$. So indeed, switching your guess is to your advantage.
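The $2/3$ advantage is easy to confirm empirically. A minimal simulation sketch, assuming the standard rules stated above (the host knows the doors and always opens an unchosen goat door):

```python
import random

def play(switch, rng):
    """One round of the game; the host always reveals a goat behind an unchosen door."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(0)
trials = 100_000
wins_switch = sum(play(True, rng) for _ in range(trials))
wins_stay = sum(play(False, rng) for _ in range(trials))
print(wins_switch / trials)  # close to 2/3
print(wins_stay / trials)    # close to 1/3
```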

  • 2,001
  • 14
  • 22
  • 1
    Bridge players should ask themselves when you first learned about restricted choice, did it seem counter-intuitive? – Airymouse Dec 03 '16 at 04:12
  • 4
    The really fun part with the Monty Hall problem is what happens when you slightly alter the rules, but the situation sounds practically identical. You modify it like this - what happens if Monty chooses one of the other two doors at random to reveal (and if it's the car, you're out of luck)... and he reveals a goat? – Glen O Dec 03 '16 at 07:15
  • 16
    There are some instances in which you'd better stay with the first choice https://xkcd.com/1282/ – Del Dec 03 '16 at 10:22
  • 15
    Under this formulation of the question, you do not know if it is better to switch. You also need to know the information that the host will never open the door with the car. Just because he knows what's behind a door does not necessarily imply he will not open a door with a car. – David Stone Dec 04 '16 at 18:38
  • @DavidStone I think you're just misinterpreting the setup because of my awkward commas. Reading as "The host opens another door which has a goat, and then asks to you..." should clear it up. – NoseKnowsAll Dec 05 '16 at 00:48
  • 13
    @NoseKnowsAll that doesn't clear it up. You still need to specify the hosts motivations. Consider an evil monty who only gives you the option to switch if you selected the car on your first pick (and immediately declares you a loser if you picked either goat). – Steve Cox Dec 05 '16 at 19:30
  • The best explanation for this paradox that I heard is: suppose that instead of 3 doors there are 1 000 003 doors, and the host opens 1 000 001 doors with goats in them. Now it's almost obvious that you should switch. I guess the reason is that the concept of "random choice" doesn't exist in our brains, each choice is deterministic, even if it was determined by apparently random and meaningless properties. – Anton Fetisov Dec 05 '16 at 19:56
  • 2
    @SteveCox. Suppose you and your brother play together. You roll dice to select a door as your common first choice. Your brother never switches. He takes a coffee break and returns after all 3 doors are opened. Clearly he wins the car 1/3 of the time. You always switch. You win the car whenever your brother doesn't, which is 2/3 of the time. – DanielWainfleet Dec 06 '16 at 00:47
  • @AntonFetisov, See my comment directed to SteveCox, which I meant to direct to you. – DanielWainfleet Dec 06 '16 at 09:10
  • 4
    I'm divided on how to react to this answer, since on the one hand I don't like to see this defective formulation of the problem propagated, but on the other hand the Monty Hall problem (correctly understood) is a really good example of a counterintuitive result. – David K Dec 07 '16 at 22:06
  • @DavidK, Well I just changed it to clarify that the host is always opening a door with a goat behind it. So hopefully that makes this answer better describe the actual problem. – NoseKnowsAll Dec 08 '16 at 18:13
  • OK, I guess the word "guaranteed" is a sufficient clue that we are describing the rules of the game and not just what happened this time. +1 – David K Dec 08 '16 at 18:38
  • 3
    To me, the most surprising thing about the Monty Hall problem is how ridiculously specific the problem description has to be so that it actually exhibits the intended counter-intuitive property. Almost all popular retellings get it wrong. – Kilian Foth Dec 12 '16 at 09:09
  • To me, the most surprising thing about the Monty Hall problem is how ridiculously hard it is to help people understand that you need to be very specific in your description for it to be the Monty Hall problem. – JiK Dec 25 '16 at 00:01

Monsky's theorem is easy enough to state for anyone to understand, yet arguably counter-intuitive.

Monsky's theorem states that it is not possible to dissect a square into an odd number of triangles of equal area. In other words, a square does not have an odd equidissection.

The problem was posed by Fred Richman in the American Mathematical Monthly in 1965, and was proved by Paul Monsky in 1970.

  • 68,892
  • 6
  • 58
  • 112
  • 26
    It seems pretty intuitive to me. That means that my intuition is either really great or spectacularly terrible. – user159517 Dec 03 '16 at 01:22
  • 6
    @user159517 Hat's off to that, it's not at all intuitive to me. Same goes for the equivalent result about polyominoes, or centrally symmetric polygons. – dxiv Dec 03 '16 at 01:50
  • 13
    @user159517: I felt the same on first glance, and then realised I’d fallen into the “spectacularly bad” camp: I’d been picturing it with just lattice-aligned right-angled triangles. – Peter LeFanu Lumsdaine Dec 03 '16 at 19:20
  • 2
    +1, although this raises the question whether "mathematics that involve only finite objects" excludes things dealing with continuity/topology... I believe Monsky's theorem deals with 2-adic numbers and their topology. – 986 Dec 04 '16 at 04:45
  • @xI_hate_math420x Thanks. Give any kid a paper square plus a pair of scissors, and tell them to cut it for you in 11 triangles of equal size (area), then watch where the "intuition" lies ;-) Back to your point, the commonly known proof of Monsky's theorem makes use of some pretty sophisticated maths, but the theorem itself has a very tangible and purely geometric formulation. In fact, I thought I had once seen a quite different line of proof longtime ago (using just number theory and continuous perturbations) but I can't locate it now, so I may well misremember. – dxiv Dec 04 '16 at 08:05
  • What exactly is a "dissection" as opposed to a "partition"? – einpoklum Dec 04 '16 at 23:33
  • @einpoklum Same thing. I'd have called it a "partition" myself, but I just copied the paragraph from wikipedia (and learned the word [equidissection](https://en.wikipedia.org/wiki/Equidissection) along the way). – dxiv Dec 04 '16 at 23:42
  • 2
    @xI_hate_math420x The proof, maybe, but not the statement of the theorem. – Akiva Weinberger Dec 05 '16 at 15:59

Stacking Books on a Table Edge

Given a rigid (non-deformable), flat, horizontal surface (e.g., a table), a rigid rectangular parallelepiped (e.g., a block, like a book or a brick) can be placed on the edge of the table so that $49.\overline9\%$ of its weight overhangs the edge:

       one book

Assume that you have a very large supply of identical books.  By moving the first book back toward the center of the table (i.e., away from the edge), you can put a second book on top of the first book so that $74.\overline9\%$ of its weight overhangs the edge:

       two books

By adding more and more books, you can get the top one completely beyond the edge of the table — in fact, as far beyond as you want:

six books

This is discussed (and illustrated much better) at Wolfram MathWorld.
Spoiler alert: it boils down to the fact that $$\sum_{k=1}^n\frac1k$$ grows without bound as $n$ increases.
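Under this scheme, the top of an $n$-book stack overhangs the edge by $\frac12\sum_{k=1}^n\frac1k$ book lengths, so any target overhang is reachable, but only very slowly. A small sketch of how many books a given overhang requires:

```python
def books_needed(overhang):
    """Smallest n such that (1/2) * (1 + 1/2 + ... + 1/n) exceeds `overhang` book lengths."""
    n, harmonic = 0, 0.0
    while harmonic / 2 <= overhang:
        n += 1
        harmonic += 1 / n
    return n

print(books_needed(1))  # 4 books suffice to get the top one fully past the edge
print(books_needed(5))  # about 12 thousand books: n grows like e^(2 * overhang)
```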

  • 1,015
  • 1
  • 13
  • 27
  • 1
    One of my favorite party tricks...also works as a fun class demo when introducing the harmonic series in calc lectures. I've found that commandeering jenga blocks works quite well. – erfink Dec 04 '16 at 08:53
  • Why is this counterintuitive? If you give somebody a couple of books and let them experiment and then ask what's the case, how sure are you they'll answer wrong? I don't think that's overwhelmingly likely, whence it's not counterintuitive. – quid Dec 04 '16 at 22:05
  • 2
    @quid: If you give somebody two books and let them experiment, they’ll never discover the result that I’m discussing, and they will probably be astonished by it. If you give them three or four books, and they try to replicate my third image, the books will fall on the floor. If the person isn’t a mathematician, he might not think to use the harmonic series. The table is the only thing whose location is fixed; the books are loose and subject to falling. I (sort of) understand this, and yet I have trouble explaining how the top books can be supported when no part of them is over the table. – Scott Dec 04 '16 at 23:33
  • 4
    It's too bad there are no rigid surfaces in the world, and they all bend somewhat, which is why we can't really try this at home. – einpoklum Dec 04 '16 at 23:35
  • "and yet I have trouble explaining how the top books can be supported when no part of them is over the table." By the weight of the other books. If you do not insist on just shifting the stack but allow different forms it becomes very easy to do this. – quid Dec 04 '16 at 23:39
  • 14
    What I find counterintuitive is that you can do a lot better. The method given here gets an overhang on the order of $\log n$, but you can achieve on the order of $n^{1/3}$. This is mentioned at the MathWorld page, but see also http://www.maa.org/sites/default/files/pdf/upload_library/22/Robbins/Patterson2.pdf – Gerry Myerson Dec 05 '16 at 08:44
  • 42
    I feel like that last image is highly misleading. While it's possible to get 1 block completely past the edge of the table, the arrangement of the other blocks below it would not look like that. – Shufflepants Dec 05 '16 at 16:41
  • To me this is just a variation on resting a very heavy plank across the table, where the majority of the plank is on the table and done portion overhangs, and resting a book on the overhang. Ultimately the majority of the mass of the "tower of books" must be before the table edge. – Kirk Broadhurst Dec 07 '16 at 04:33
  • But it's a lot trickier when the plank is cut into six-inch pieces (and you don't have any glue, etc.) – Scott Dec 07 '16 at 04:49
  • 15
    Why "$49.\overline{9}\%$" and "$74.\overline{9}\%$" instead of just 50% and 75%? – anomaly Dec 07 '16 at 16:43
  • @anomaly 50% and 75% seem like they would be unstable. – Scott Dec 07 '16 at 17:43
  • 15
    But $49.\overline9 = 50$, so you have created exactly the same unstable situation you wanted to avoid. You might instead write, "$0.5 - \delta_1,$ where $\delta_1$ is a small positive number." – David K Dec 07 '16 at 22:09
  • 2
    The first two images are not at all counter-intuitive. The third and last image is indeed counter-intuitive, but only because "it is not to scale" and "do not try this at home". In a word, it is *impossible*! If you drew it to scale, I don't think the resulting image would seem at all counter-intuitive. –  Dec 08 '16 at 01:03
  • @KennyLJ: Many of the answers here have comments saying, "That's not counter-intuitive."   I congratulate you on have a superior intuition to the majority of the readers here, and I thank you for giving me a comment along with your downvote.   But in my defense, I linked to a Wolfram page that has a to-scale drawing. – Scott Dec 08 '16 at 02:04
  • @Scott: Sorry. I should have been a bit more polite and tactful in my comment. I hope you can see that it's nothing personal. I'm just voicing my opinion and trying to contribute to this site. –  Dec 08 '16 at 03:48
  • What about "_almost_ 50%" etc. instead? – Tobias Kienzler Dec 08 '16 at 13:46
  • You mean what is on the last image will not fall? Are you kidding me? – Mikhail V Dec 10 '16 at 21:46
  • @MikhailV this will of course fall. The way to construct it is start with a tower. Move the topmost as far as possible (which is almost one half its size). Then move the top two as far as possible (keeping the initial shift in place); you can still move those two by a reasonable amount. Continue like this. Eventually the top most will be completely over the table. See the linked to image which looks a lot more *intuitive.* – quid Dec 11 '16 at 18:17
  • @quid ah, now I see, there was comment about it already. The image in link indeed is much more intuitive, so the center of mass of the whole thing simply should not protrude further than table edge. And for a given book width, one cannot move it "as far as you want". – Mikhail V Dec 11 '16 at 19:31
  • Your picture is way off, though. – samerivertwice Nov 01 '17 at 13:15
  • True.  Welcome to the club; you’re about the fourth person to comment on that fact.  Or the fifth, if you count ***me***, inasmuch as I wrote ‘‘(Not to scale)’’ in my answer from the very beginning. – Scott Nov 01 '17 at 17:21

What surprised me most was the following Arab story.

An inheritance of $35$ camels had to be distributed among $3$ brothers as follows: half for the eldest, a third for the second and a ninth for the youngest. The problem was that the brothers would then have to receive more than $17$ and less than $18$, more than $11$ and less than $12$, and more than $3$ and less than $4$ camels, respectively.

"The man who calculated" solved the problem by asking a friend to lend him a camel with which he obtained $36$ camels to distribute following the instructions of the deceased father. Then the brothers received $18$, $12$ and $4$ camels respectively, i.e. more than they had planned so they were very satisfied.

But the "man who calculated" had handed out only $34$ camels in total, so after returning the camel lent by his friend he still had one camel left for himself, as a reward for his calculations.

It was several years after I read this story that I could explain mathematically what happened.
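The arithmetic behind the trick: the will's fractions sum to $\frac12+\frac13+\frac19=\frac{17}{18}<1$, so dividing $36$ camels by the stated fractions uses only $34$ of them. A quick exact check:

```python
from fractions import Fraction

shares = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 9)]
assert sum(shares) == Fraction(17, 18)  # the will never disposes of the whole herd

herd = 36                               # 35 inherited camels plus the borrowed one
parts = [share * herd for share in shares]
assert parts == [18, 12, 4]             # whole numbers, each exceeding the
                                        # fractional share of the original 35
assert herd - sum(parts) == 2           # one camel returned, one kept as reward
```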

  • 25,368
  • 3
  • 24
  • 47
  • 2
    I read 'the man who counted' a long time ago, and this segment has remained with me the whole time. – Liam Dec 02 '16 at 22:30
  • 1
    I have heard another version. The man had 17 camels and in his will he said the first son gets half, the second son gets one-third, and the last one gets one-ninth. They were fighting until a man arrived and said let me lend you a camel! Then the sons took 9, 6 and 2 camels respectively, and left the man with his own camel :) – polfosol Dec 03 '16 at 05:47
  • 26
    $1/2 + 1/3 + 1/9 = 17/18$. I have no idea why the brothers would not accept extras if there had been no camel lent, but if they did, they wouldn't have needed to borrow anything. – John Dvorak Dec 03 '16 at 08:18
  • 6
    @JanDvorak Well then they would fight over who got the extra, of course. – Kimball Dec 03 '16 at 16:44
  • 1
    @Kimball all of them would get an extra – John Dvorak Dec 03 '16 at 16:44
  • 4
    @JanDvorak: psychologically it works because borrowing the camel makes it clear to the brothers that they are receiving the "extra" in exact proportion to their "rightful share". If you just told them "take 18/12/4" then you still have to demonstrate that you haven't unfairly given more extra to one than to the other, just to make round numbers. The number 36 likely will feature somewhere in this demonstration. You don't really need the physical camel to do that, of course, but it does make it easier, and the brothers aren't good with fractions. – Steve Jessop Dec 04 '16 at 15:49
  • 1
    So, it's a psychological puzzle rather than a math puzzle :-D – John Dvorak Dec 04 '16 at 15:51
  • @JanDvorak: I would say so, yes, since the goal is to find something that seems reasonable and that is "as close as possible" to following a set of instructions that cannot be followed precisely without access to a butcher. – Steve Jessop Dec 04 '16 at 15:53
  • This is an overly complicated version of the story. You could just do it with 11 horses and the will of the father being "To my firstborn son I bequeath half of my herd; to my second son - a third; and to my youngest - all that remains." – einpoklum Dec 04 '16 at 23:31
  • 23
    How is this an example of a counter-intuitive theorem? The fact that $1/2 + 1/3 + 1/9 \neq 1$ doesn't seem so strange to me. – JiK Dec 05 '16 at 13:36
  • 2
    @JiK I agree. It seems like it would be much more counter-intuitive if it *did* equal 1. – jwg Dec 06 '16 at 09:47
  • 4
    @jwg if it did equal one it wouldn't work.:D The whole point is you have an 1/18th to spare. Which means you have almost 2 camels left over at the beginning. – DRF Dec 06 '16 at 11:41

There's this game about paying taxes that I saw in Yale's open course on game theory. link

We have tax payers and a tax-collecting agency. Each tax payer can choose to cheat (paying $0$) or not cheat (paying $a$). The agency can choose whether to check whether a given tax payer has paid. If someone cheats and the agency finds out, he has to pay $a$ to the agency plus a fine $f$ that does not go to the agency. Checking a tax payer costs the agency $c$, where $c < a$. The payoff matrix is as follows:

                          Tax Payer
                      Cheat      Not Cheat
         Check   | (a-c, -a-f) | (a-c, -a) |
Agency           +-------------+-----------+
       Not Check |   (0, 0)    |  (a, -a)  |

One's intuition may suggest that as $f$ increases, tax payers will cheat less often. But solving for the Nash equilibrium tells us that in an equilibrium, as $f$ increases, the probability a tax payer cheats doesn't change, but the agency will check less often. Maybe the tax payer will cheat less often and the agency will check at the same probability when $f$ has just changed, but given enough time, rational tax payers and agency will play the Nash equilibrium.

The Nash equilibrium is that the tax payer cheats at a probability of $\frac c a$, and the agency checks at a probability of $\frac a {f+a}$. The tax payer's expected payoff is $-a$. The agency's expected payoff is $a-c$.
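The equilibrium follows from each side's indifference condition, and one can check directly that the cheating probability depends only on $c$ and $a$, never on $f$. A minimal sketch (the numbers $a=100$, $c=20$ are an arbitrary illustration):

```python
from fractions import Fraction as F

def equilibrium(a, c, f):
    """Mixed Nash equilibrium of the tax game: (P(cheat), P(check))."""
    p_cheat = F(c, a)      # makes the agency indifferent: a - c = (1 - p_cheat) * a
    q_check = F(a, f + a)  # makes the payer indifferent: q_check * (a + f) = a
    return p_cheat, q_check

a, c = 100, 20
for f in (10, 100, 1000):
    p_cheat, q_check = equilibrium(a, c, f)
    # Checking always pays the agency a - c; not checking pays (1 - p_cheat) * a.
    assert (1 - p_cheat) * a == a - c
    # Cheating costs the payer q_check * (a + f) in expectation; honesty costs a.
    assert q_check * (a + f) == a
    print(f, p_cheat, q_check)  # p_cheat stays 1/5 for every f; only q_check falls
```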

  • 541
  • 3
  • 4
  • 3
    So the moral of the story is that you shouldn't let the IRS keep 100% of taxes levied, they should have to hand some of it over to the treasury? ;-) – Steve Jessop Dec 04 '16 at 16:09
  • Even though `f` shouldn't affect the agency in any way whatsoever... – user253751 Dec 06 '16 at 02:35
  • @immibis - and `c` shouldn't affect the taxpayer (or tax cheat) whatsoever, either. Interesting, though, that the expected payoff for both the taxpayer and the agency is independent of the fine itself. – Glen O Dec 06 '16 at 05:54
  • 7
    Since I posted the last comment I had an epiphany of how this can be the case (intuitively; I still haven't bothered to do the maths). If the fine increases then less people want to cheat, so now the agency is wasting their money doing so many checks for so few cheaters, so they do less checks (which increases the number of cheaters again). – user253751 Dec 06 '16 at 07:53
  • 1
    This doesn't seem counter-intuitive. If the fine increases, then the agency will not need to check as much in order to enforce a level of compliance which they are happy with. – jwg Dec 08 '16 at 12:01
  • @jwg: I think you've misread the answer. The answer asserts that the equilibrium level of compliance does *not* vary with the fine. – ruakh Dec 10 '16 at 23:01
  • @ruakh I didn't misread or misunderstand it. The level of compliance stays constant. It is clear that if compliance stays fixed, the size of the fine and the probability of being caught should move in opposite directions. – jwg Dec 12 '16 at 10:05
  • Note that the agency does not receive the fine and that the tax payer has another strategy which also guarantees $-a$ to him. – Carsten S Feb 13 '17 at 00:52

Here is a consequence of the Arc sine law for last visits. Let's assume the game is played with a fair coin.

Theorem (false) In a long coin-tossing game each player will be on the winning side for about half the time, and the lead will pass not infrequently from one player to the other.

The following text is from the classic An Introduction to Probability Theory and Its Applications, volume 1, by William Feller.

  • According to widespread belief, a so-called law of averages should ensure the Theorem above. But in fact this theorem is wrong, and contrary to the usual belief the following holds:

    With probability $\frac{1}{2}$ no equalization occurred in the second half of the game regardless of the length of the game. Furthermore, the probabilities near the end point are greatest.

In fact this leads to the Arc sine law for last visits (see e.g. Vol 1, ch.3, section 4, Theorem 1).

Remarkable statements cited from Chapter III: Fluctuations in Coin Tossing and Random Walks:

  • For example, in various applications it is assumed, that observations on an individual coin-tossing game during a long time interval will yield the same statistical characteristics as the observation of the results of a huge number of independent games at one given instant. This is not so.

and later on:

  • Anyhow, it stands to reason that if even the simple coin-tossing game leads to paradoxical results that contradict our intuition, the latter cannot serve as a reliable guide in more complicated situations.

An example:

Suppose that a great many coin-tossing games are conducted simultaneously at the rate of one per second, day and night, for a whole year. On the average, in one out of ten games the last equalization will occur before $9$ days have passed, and the lead will not change during the following $356$ days. In one out of twenty cases the last equalization takes place within $2\frac{1}{4}$ days, and in one out of a hundred cases it occurs within the first $2$ hours and $10$ minutes.
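Feller's claim can be probed with a Monte Carlo sketch: simulate many fair-coin games and record the last time the two players were even. About half the time, that last equalization falls in the first half of the game, as the arc sine law predicts:

```python
import random

def last_equalization(steps, rng):
    """Last time a +/-1 random walk of the given length is back at 0 (0 if never)."""
    pos, last = 0, 0
    for t in range(1, steps + 1):
        pos += rng.choice((-1, 1))
        if pos == 0:
            last = t
    return last

rng = random.Random(1)
steps, trials = 100, 4000
in_first_half = sum(last_equalization(steps, rng) <= steps // 2
                    for _ in range(trials))
print(in_first_half / trials)  # close to 1/2: the lead is as likely to be
                               # settled for good in the first half as the second
```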

  • 94,265
  • 6
  • 88
  • 219
  • 1
    just started reading this as finally had some time on hands. looking forward to that chapter! – Mehness Dec 03 '16 at 23:34
  • 1
    I would think about this like a random walk. The walk could cross the y-axis often, but it's just as likely it's wandered off to one half or the other – Dylan Frese Dec 07 '16 at 01:08
  • @DylanFrese: I fully agree. This is also my preferred modelling of the situation. – epi163sqrt Dec 07 '16 at 05:16
  • 2
    Out of context, depending on which statistics you look at, the statement about "the results of a huge number of independent games at one given instant" seems either trivial (the lead can change during a series of games, but not during a number of simultaneous games) or untrue (if there are $N$ games in either case and $X$ is the total number of heads, how does sequence vs. simultaneous affect the moments of $X$?). I presume there was some discussion prior to that statement that explains it better. – David K Dec 07 '16 at 22:22
  • 1
    @DavidK: If I understand correctly, what it's saying is this: if you gather scores of a single long-running game after every *N* flips, you'll get a very different distribution than if you gather scores of many independent games of *N* flips. In the long-running game, you might expect that scores will tend to stay near 0 (so that scores < -*N* or > *N*, while possible, are extremely unlikely, so don't affect the overall shape of the distribution), but in fact scores far from 0 are quite likely. – ruakh Dec 10 '16 at 23:53
  • @DavidK: In English, "after every *N* flips" means "after the Nth flip, after the (2N)th flip, after the (3N)th flip, etc.". So if *N* = 1000 and you collect 100 scores, then in the long-running game you will have observed 100,000 flips, just as in the case of multiple simultaneous games. – ruakh Dec 11 '16 at 03:59
  • @DavidK: Property #2 is the one you don't have. Scores are cumulative; so in the long-running game, the first score counts the first *N* flips, the second score counts the first 2N flips, etc. (So, for example, if *N* = 3, and the results of the flips are HHH HHH HHH ..., then in the long-running game the cumulative scores are 3 6 9 ..., whereas in the simultaneous games the scores are all 3.) – ruakh Dec 11 '16 at 04:11
  • @DavidK: Well, remember, the *expected* score (the mean of the distribution of scores) is always 0. So the expected score after N flips is the same as the expected score after 10N flips. So you might naively expect the distributions to be the same. (This naive expectation is informally called the "Law of Averages".) But in fact, the standard-deviation of scores after 10N flips is far greater than the standard-deviation of scores after N flips, so the distributions are very different. If you don't find that surprising/counterintuitive, then great! But many people do. – ruakh Dec 11 '16 at 04:17
  • @DavidK: (I should make clear, by the way, that I haven't read that book. I'm just telling you how I interpret it based on this answer. But I'm pretty sure my interpretation is the right one, or nearly so.) – ruakh Dec 11 '16 at 04:19
  • @DavidK: "Same mean implies same distribution" would always be a fallacy, but it's not intuitive when phrased like that. It's only in the context of an ongoing sequence like this that it (or its consequences) can seem intuitive, and therefore be a fallacy that people actually fall into. – ruakh Dec 11 '16 at 04:43
  • @ruakh Here's the chapter in question: cbmc.it/~marchettil/LinguaggiBioinfo/Feller-chap3.pdf It's quite interesting, of course. – David K Dec 11 '16 at 05:33
  • @DavidK: I've added an example from section 4 which is quite illustrative. – epi163sqrt Dec 11 '16 at 07:44
  • Trying to parse the example: by "a great many coin-tossing games" we mean many independent _series_ of coin tosses, each series consisting of over thirty million flips (one per second, continuing for a year), right? And taking each entire _series_ as a "game," if I understand the table on page 83, in the majority of games there is no change in who is "ahead" during the last 311 days of the year. – David K Dec 11 '16 at 15:41
  • The question regarding the quote from the start of the chapter is, what does it mean by "the results of a huge number of independent games at one given instant," what _true_ fact about those results is being incorrectly assumed to hold for a single long-running game, and how easily can one show that the two situations are not comparable? (Also, by "various applications" do we mean published scientific results, and if so, which ones?) – David K Dec 11 '16 at 15:52
  • I can think of one _false_ interpretation of the Law of Large Numbers for simultaneous games that applies, namely the notion that in a million simultaneous flips the absolute difference between the number of heads and tails will be small. Because of this false notion, one may think that after a million flips one player will probably lead by only a few points and the lead could easily switch in just a few seconds. – David K Dec 11 '16 at 15:53
  • @DavidK: I agree with your interpretation regarding independent series of coin tosses. I'm also pretty sure, that Feller refers to published scientific results, but I'm not aware of these. For me it looks plausible that such wrong assumptions even done be some experts could have been a major motivation for emphasizing these aspects the way he did. – epi163sqrt Dec 11 '16 at 16:26
  • 1
    Perhaps the purpose of the statement about "various applications" was meant to be provocative, to motivate someone to read the chapter. (It worked for me, anyway!) It may be that its exact meaning was never intended to be explained. Certainly the rest of the chapter contains many very well-explained, precise statements that are also surprising or indicative of fallacies, including a questionable inference by Galton in 1876 (page 70). On page 88, "even trained statisticians expect" the lead in the sequential game to change far more often than it actually is likely to. – David K Dec 11 '16 at 17:03
  • @DavidK: Nice interpretation! Good to see that my answer is stimulating. :-) – epi163sqrt Dec 11 '16 at 17:16
  • It shows how much of what we learn we take for granted. Because I read other answers which others say are "not counterintuitive at all" to them, and I did not agree. But I read this one and it isn't counterintuitive at all to me - it's obvious. The point being... I read probability and stats at uni. – samerivertwice Nov 01 '17 at 13:19
  • @RobertFrost: Probability and stats is definitely advantageous. :-) But besides that I have also learned from the authors, that we shouldn't take things for granted only because they are plausible. We should always look for a good reasoning. – epi163sqrt Nov 01 '17 at 13:50

The fallacy of the hot hand fallacy

Suppose you'd like to detect whether a coin ever goes on hot streaks or cold streaks, so that after $k$ heads in a row, the probability of the following flip coming up heads is different from the overall probability $p$ of a heads. To test this, you'll flip the coin $n$ times, and after any streak of $k$ heads, you'll record the outcome of the following flip. Let $X$ be the percentage of your recorded flips that came up heads. For concreteness, let's set the values at $p=\frac{1}{2}$, $n=100$, and $k=3$.

Here is the surprise: for these values, $E[X]\approx0.46$ (not $\frac{1}{2}$!!!!). And, in general, for any $0<p<1$, $n\geq 3$, and $0<k\leq n-2$, $E[X]<p$, and the bias can be quite large for certain values of $n$ and $k$. (For $k=n-1$ there is only one possible recorded flip, the last one, and it is unbiased.)

This is counterintuitive enough that when Gilovich, Vallone and Tversky wrote their seminal paper Hot Hand In Basketball in 1985 measuring whether basketball players went on "hot streaks", they used the exact method above to attempt to detect hot streaks, and since the percentage after three hits in a row was not different from the overall percentage, they concluded that there was no evidence of a hot hand. But this was a mistake! If there was no hot hand, they should have observed a significantly lower percentage on shots after three hits in a row. In fact, their data do show evidence for a hot hand in many of the cases, according to a new paper last month. This mistake went unchecked for 30 years, with untold numbers of pop psychology books and articles citing the result as evidence for a "hot-hand fallacy".


Here's a demonstration in R.

f7 <- function(x) { 
  # running total of the current run length
  # stolen from http://tolstoy.newcastle.edu.au/R/e4/devel/08/04/1206.html
  tmp <- cumsum(x)
  tmp - cummax((!x)*tmp)
}

streak <- function(v, k = 3, n = length(v)) {
  # returns a vector of length n = length(v) that is TRUE when the last k
  # entries are TRUE
  c(FALSE, f7(v)[1:(n-1)] >= k)
}

random_shots <- function(n, p = 0.5) {
  # takes n random shots with probability p of success
  runif(n) < p
}

trial <- function(n, k = 3, p = 0.5) {
  s <- random_shots(n, p) 
  mean(s[streak(s, k)])
}

# do the simulation 100000 times
results <- sapply(1:100000, function(x) trial(100, 3))
summary(results)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
#  0.0000  0.3636  0.5000  0.4615  0.5714  0.8571       3 

What's going on?

One way to see it is to consider the absolute simplest case, where we flip the coin three times, and tally up the flips that follow a head (that is, a streak of length $k=1$).

Outcome   Heads   Flips   Proportion
HHH       2       2       1
HHT       1       2       1/2
HTH       0       1       0
HTT       0       1       0
THH       1       1       1
THT       0       1       0
TTH       0       0       NA
TTT       0       0       NA

The last two outcomes, of course, can't be included in our tally because the proportion of heads is undefined. Now, if we repeat this experiment many times, we'll find that of the sequences that we record, $\frac{2}{6}$ of the time we will have a proportion of $1$, while $\frac{1}{6}$ of the time we'll have $\frac{1}{2}$, for an expected proportion of $$\frac{2}{6}\times{1} + \frac{1}{6}\times\frac{1}{2} = \frac{5}{12} < \frac{1}{2}$$
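The $\frac{5}{12}$ computation above is easy to verify by brute-force enumeration. Here is a minimal sketch in Python (the answer's own demo uses R; the function name `expected_proportion` is mine):

```python
from fractions import Fraction
from itertools import product

def expected_proportion(n, k=1):
    """Exact E[X]: enumerate all 2^n equally likely flip sequences,
    record each flip that follows a run of at least k heads, and average
    the per-sequence proportion of recorded flips that are heads
    (conditioning on at least one flip being recorded)."""
    total = Fraction(0)
    count = 0  # sequences that record at least one flip
    for seq in product([0, 1], repeat=n):  # 1 = heads, 0 = tails
        heads = flips = 0
        run = 0  # length of the current run of heads before this flip
        for x in seq:
            if run >= k:       # the previous k flips were all heads
                flips += 1
                heads += x
            run = run + 1 if x else 0
        if flips:
            total += Fraction(heads, flips)
            count += 1
    return total / count

print(expected_proportion(3, 1))  # 5/12
```

Other small cases confirm the bias, e.g. `expected_proportion(4, 2)` also comes out to $\frac{5}{12}$, strictly below $\frac{1}{2}$.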

So, by inspection we can plainly see that in this case the expected proportion is less than 0.5, although at first glance this might still seem unsatisfactory. Yeah, it's less than 0.5 but... why?

I think there are a few ways to hand-wave about this. One has to do with the fact that we have two outcomes with proportion $1$, but one of them has two heads and the other only one. Sequences with more heads are weighted the same as sequences with fewer heads, so heads end up underrepresented, leading to the bias.

  • 5,319
  • 4
  • 29
  • 53
  • 8
    So gambler's fallacy is (kind of) true? How come $E[X] \approx 0.46$? – Jasper Dec 05 '16 at 12:20
  • 6
    To my understanding that's because the samples are allowed to "overlap", and to correct for it, HHHH should only count as *two* HH pairs, not three. (After seeing an H, or *k* consecutive H's), you should record the next flip *and reset your counter of consecutive H's*. – user253751 Dec 06 '16 at 02:57
  • 4
    I've figured out part of what's going on. Suppose (as written in this answer) that you flip a fair coin 100 times, and after any streak of 3 heads (with overlaps permitted), you record the outcome of the following flip. Let $A$ be the number of flips that you record, and let $H$ be the number of those which are heads. As expected, $E[H]/E[A] = 1/2$. Part of the reason that $E[H/A] \neq 1/2$ is that $A$ and $H$ are not independent variables. The flips in a heads-heavy trial run "matter less" than the flips in a tails-heavy trial run. – Tanner Swett Dec 06 '16 at 04:01
  • It's worth noting that as you increase the number of trial runs, $E[x]$ gets closer to $1/2$. While at 100 runs $E[x]\approx 0.46$, at 10,000 runs $E[x]\approx 0.499$. This is based only on experimental runs from the given R program, I didn't actually do the math – rtpax Dec 10 '16 at 22:42
  • This is related to this post, which explains the intuition in a different way: https://math.stackexchange.com/questions/2317508/the-hot-hand-and-coin-flips-after-a-sequence-of-heads – Joshua B. Miller Nov 13 '17 at 06:51

Wedderburn's little theorem: In its simplest form, it states that any finite division ring is commutative. I find it borderline magical; as if somehow the axioms do not imply the result, but rather it is forced due to some weird combinatorial coincidence. Of course, this feeling of mine says more about my flawed intuition than about the theorem - but that's true for any example of a "counterintuitive mathematical result".

Another example, even more elementary, is the fact that a matrix's rank is well defined, i.e., that for any matrix (even a non-square one), the dimension of its column space equals the dimension of its row space. Over time I came to terms with this result, but when I first encountered it, I read the proof again and again, understanding every step, yet still couldn't believe the theorem was true. It was a long time ago, but I clearly remember feeling like I was practicing voodoo.

  • 614
  • 5
  • 11
  • 2
    Arguably, the reason for the counterintuitive matrix rank is that matrices are just a not-really-optimal representation of linear mappings. It's fairly intuitive that the image of a linear mapping has the same dimension as the image of its adjoint mapping. – leftaroundabout Dec 03 '16 at 21:17
  • “Young man, in mathematics you don't understand things. You just get used to them.” -- John von Neumann – Mike Jones Dec 06 '16 at 23:33
  • @leftaroundabout: matrices are the only representation I know of linear mappings, what else did you have in mind? – Mozibur Ullah Dec 11 '16 at 06:51
  • @MoziburUllah there are plenty of representations. One representation that makes it particularly clear that the rank doesn't depend on the space dimension is the [singular value decomposition](https://en.wikipedia.org/wiki/Singular_value_decomposition) (which can be defined without ever talking about matrices, only abstract linear mappings). – leftaroundabout Dec 11 '16 at 10:09
  • @leftaroundabout: I fail to see the difference between *abstract* linear mappings and just linear mappings - can you explain the difference? – Mozibur Ullah Dec 11 '16 at 10:12
  • 1
    @MoziburUllah there is no such difference, I just said _abstract_ to emphasize that there's no need to write out the linear mapping in any particular notation (like a matrix). It's sufficient to describe it as a generic function $\varphi : V\to W$ between two vector spaces which has the linearity property $\varphi(\mu\cdot u + v) = \mu\cdot\varphi(u) + \varphi(v)$. Sure enough, any such function (if the spaces are finite-dimensional) _can_ be written in matrix notation, but my point is that this isn't always such a good idea since it obscures interesting properties (like well-defined rank). – leftaroundabout Dec 11 '16 at 10:57
  • @leftaroundabout: well, I don't see what we're arguing about in this case; I already mentioned 'linear mappings' in my comment; the use of the word abstract, in my mind, is simply an added qualifier to distinguish between matrices as a *concrete* representation of linear maps from linear maps axiomatically (ie abstractly treated) treated. – Mozibur Ullah Dec 11 '16 at 11:02
  • @leftaroundabout I don't find the use singular value decompositions very illuminating here because those only exist for real or complex matrices whereas the theorem holds for any field. – Jyrki Lahtonen Feb 14 '19 at 14:24
  • @JyrkiLahtonen I only gave SVD as one alternative lin-map representation example. The point of my comment was that linear mappings should foremostly be understood as linear functions, for which matrices are also merely one representation. – leftaroundabout Feb 14 '19 at 17:01
  • “It is my experience that proofs involving matrices can be shortened by 50% if one throws the matrices out.” -- E. Artin (Geometric Algebra, p. 14) –  Jul 26 '19 at 20:12

I've always thought Bernoulli's paradox was counter-intuitive, and the name suggests that I'm not the only one who thinks this. (Nicolas) Bernoulli offers the following gambling game: You flip an honest coin. If it comes up heads, you win two dollars and get to flip again. If it comes up heads a second time, you win four more dollars and get to flip again. In general, if it comes up heads $n$ consecutive times, on the $n$th flip you win $2^n$ more dollars and get to flip again. The first time the coin comes up tails the game is over, but you get to keep your winnings. Bernoulli asked, essentially, what the mathematical expectation of this game is. To keep things finite, let's just ask whether you should mortgage your house and pay Mr. Bernoulli $250,000 to play this game. Although I would strongly advise against doing so, the mathematics "shows" that you should be willing to put up all the money you have or can borrow to play the game.
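To see why even such a large stake is "mathematically" a good bet: the $k$th consecutive head pays $2^k$ dollars and is reached with probability $2^{-k}$, so each potential flip contributes exactly one dollar to the expectation. A quick sketch (the function name is mine):

```python
from fractions import Fraction

def expected_winnings(max_flips):
    """Expected total winnings if play is cut off after at most max_flips
    flips. Reaching flip k requires k consecutive heads (probability 2^-k),
    and that flip pays 2^k more dollars, so each flip contributes $1."""
    return sum(Fraction(1, 2**k) * 2**k for k in range(1, max_flips + 1))

print(expected_winnings(250001))  # 250001
```

Cutting the game off after $250{,}001$ flips already gives an expected payoff exceeding the $250{,}000$ stake, with no divergent sum needed.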

  • 548
  • 4
  • 9
  • 1
    For reference, you may wish to see [here](https://en.wikipedia.org/wiki/St._Petersburg_paradox) – Simply Beautiful Art Dec 03 '16 at 00:32
  • 1
    The answer to your question is "yes, if you expect to be able to flip the coin more than 250k times". If you cannot, you should expect to lose. – bers Dec 03 '16 at 13:41
  • 15
    The counterintuivity here comes from not so much from the game itself, as from tacitly conflating “should you do it?” with “is the expected value, in dollars, positive?” This conflation turns our strong (and very justifiable) gut instinct that we shouldn’t play the game into the erroneous intuition that its expected value is negative. – Peter LeFanu Lumsdaine Dec 03 '16 at 15:14
  • 8
    Also known as the St Petersburg paradox. Since it is presented as a real-life challenge, to evaluate it requires the assessment of at least two entailed real-life hazards: (1) Counterparty risk---the offerer cannot or will not pay up as promised. (2) The decreasing marginal utility of wealth---a million-dollar gain might transform your life now, but you would hardly notice it if you already had a billion. – John Bentin Dec 03 '16 at 15:17
  • 6
    @John Bentin I have comforted myself by taking Bernoulli's game as a mathematical proof of that the law of marginal utility applies even to wealth. Also buried away is the tacit assumption of economists that utility is always positive: you have to wonder if, after receiving eleven Lords a leaping, the recipient doesn't begin to thank God there aren't 5000 days of Christmas. Counterparty risk is a red herring: if ,as a gedankin experiment, you ignore it, the paradox remains. – Airymouse Dec 03 '16 at 16:11
  • 1
    @Peter LeFanu I don't think anyone has the bad intuition that the expected value is negative. We all see that the first flip is worth a dollar and that on the next flip we might win more. But you first point is spot on, and I (and probably many others) have not thought of it. As the paradox shows not all mathematically good bets should be taken, and it is also true that not all bad bets should be avoided. States raise money from the poor by selling lottery tickets with expected value less than half their cost. But it's the buyer's only chance to escape poverty, so the purchase is logical. – Airymouse Dec 03 '16 at 16:57
  • 8
    The classical St Petersburg paradox does not pose a limit to the number of flips which makes it a paradox dealing with infinity (which OP wants to exclude). Indeed, the surprising result is that the expected value is $\infty$ so any amount you bet is acceptable to you **if** you play the game enough times. You should limit the game to at most $N$ flips, which will make the expected value $N$. Any bet below $N$ (per game) is acceptable, which again can be counterintuitive if $N$ is a large number. The trick part is realising that the median and expected value are very different in this game. – Thanassis Dec 04 '16 at 02:38
  • @Airymouse: in short, the result is most counter-intuitive to classical economists and derivatives traders, since it demonstrates that the expected monetary value of a position is *not* equal to the utility of holding it ;-) – Steve Jessop Dec 04 '16 at 15:59
  • This essentially boils down to an infinite sum which diverges, as @Thanassis already said. Thus you should edit it to make it finite, otherwise -1. – Nobody Dec 04 '16 at 17:46
  • @ Nobody in particular. I did edit the game, by asking if should you be willing to mortgage your house and offer $250,000 to play, and I pointed out that I made this alteration so that the decision would not involve an infinite sum. Once we offer a fixed amount of money, in this case $250K, you have to consider only 250,001 flips to see that it's a "good" bet. – Airymouse Dec 04 '16 at 19:32
  • 1
    @Airymouse: The edit still does not prevent this question to be about a divergent infinite sum, or equivalently a probability question with an infinite sample space. On the other hand putting a limit on the number of repetitions of coin flips would make it a finite problem. Even so, it remains extremely sensitive to the objection of a fundamental limitation of utility of financial gain: once one has enough money to buy all the goods and services that humanity can produce for the rest of ones lifetime, any additional money has no utility whatsoever. – Marc van Leeuwen Dec 05 '16 at 08:25
  • I'm out on a limb: your arguments have been made by economists and mathematicians, But I don't buy in.. My question, clearly modified from Bernoulli's question, was would you mortgage your house and bet $250,000. You can't even buy an election for that amount of money, but without knowing about divergent series you can figure out that the expected value exceeds this pittance. All you need to be able to do is to understand the bet and be able to add 1+1+1 ... to 250,001. Bernoulli asked what should you be willing to pay to play the game, and all you say applies to his question, but not mine. – Airymouse Dec 05 '16 at 14:11
  • 12
    This problem becomes much more intuitive when you assume any finite bound on the amount of money the house is able to pay out. If you cap the amount you could possibly win at even a trillion dollars, the expected value drops down from infinite to about $10. – Shufflepants Dec 05 '16 at 16:46
  • Furthermore, utility of wealth doesn't apply only to millionaires. No matter your starting wealth, is winning a quadrillion dollars really a thousand times better than winning a trillion dollars? – Théophile Dec 07 '16 at 20:17
  • @Shufflepants could you elaborate on that or give a source? I'm interested to learn how that works out – rtpax Dec 10 '16 at 21:49
  • @rtpax My initial estimate was a bit off, but you can see the math here: https://en.wikipedia.org/wiki/St._Petersburg_paradox#Finite_St._Petersburg_lotteries For a casino that has 1 trillion dollars to pay out with, the expected value is: E = floor(log_2(1000000000000)) + 1000000000000/2^(floor(log_2(1000000000000))) ~= $40.82 – Shufflepants Dec 12 '16 at 15:00

The amazing speed of exponential growth. This can be explained to children, but it will always defy your intuition, take e.g. the famous wheat and chessboard problem. As Addy Pross explains here, this may play a fundamental role in the emergence of life.
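For the wheat and chessboard problem, one grain on the first square and doubling on each subsequent square totals $2^{64}-1$ grains; a one-line check:

```python
# Wheat and chessboard: 1 grain on square one, doubling across 64 squares.
total_grains = sum(2**k for k in range(64))
print(total_grains)               # 18446744073709551615
print(total_grains == 2**64 - 1)  # True: a finite geometric series
```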

Count Iblis
  • 10,078
  • 2
  • 20
  • 43
  • 2
    Whenever it does come within your understanding, I recommend looking into the Ackermann function, Knuth's up-arrow notation, Graham's number, and other related things from googology, recommendably in that order. Then your mind will truly blow up at how fast things can grow. – Simply Beautiful Art Mar 20 '17 at 23:46
  • @SimplyBeautifulArt there is something of this Googology in the long path $27$ takes to reach $1$ by iteration of the Collatz function. – samerivertwice Nov 01 '17 at 13:22
  • @RobertFrost possibly interesting. Personally I find things like TREE(3) and the Goodstein sequence more interesting :P – Simply Beautiful Art Nov 01 '17 at 13:59

That you only need 23 people in a room for there to be a 50% chance that two of them will share a birthday.
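This is quick to check directly: the probability that all $n$ birthdays are distinct is $\prod_{i=0}^{n-1}\frac{365-i}{365}$, so the chance of a shared birthday is one minus that product. A minimal sketch (ignoring leap years, as usual):

```python
from math import prod

def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming all days are equally likely and independent."""
    return 1 - prod((days - i) / days for i in range(n))

print(p_shared_birthday(22))  # ~0.476
print(p_shared_birthday(23))  # ~0.507 -- first n past 50%
```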

  • 1,744
  • 16
  • 30
  • 1,139
  • 8
  • 18
  • 4
    To me this is like the “counter-intuitive” fact about how often you have to double the thickness of a newspaper to reach the moon. It comes down to it being too easy to acquire incorrect intuitions and be surprised they are wrong – we have to learn caution. Now Dorothy Parker’s intuition was that it would not be at all surprising if all the girls at a Yale prom were laid end to end – that’s more like it. – PJTraill Dec 06 '16 at 00:21
  • Write a story about 23 people and explore the relationship between every pair of them. The birthday paradox will then no longer be a paradox. – Théophile Dec 07 '16 at 20:23

As a non-mathematician it seems pretty obvious to me that there are whole number solutions (where $n>2$) for

$$x^n + y^n = z^n$$

I'd be shocked if there weren't. There must be some, surely.

  • 743
  • 9
  • 24
  • I'm not sure if you're being sarcastic, but in 1995 this was famously proven not to have solutions. – Akiva Weinberger Dec 05 '16 at 18:55
  • 3
    @AkivaWeinberger yeah, but it is counter-intuitive that it has no solutions. – theonlygusti Dec 05 '16 at 19:05
  • 1
    @theonlygusti I was referring to the "I'd be shocked **if** they weren't" bit; it implies that OP does not know. – Akiva Weinberger Dec 05 '16 at 19:06
  • 21
    @AkivaWeinberger: That he gives this as an answer suggests quite strongly that he is aware of the result and intends to suggest he considers it counter-intuitive. His phrasing suggests British, perhaps with the matching sense of humour – his profile confirms it. – PJTraill Dec 05 '16 at 23:58
  • 4
    What's counter-intuitive about this having no solutions? IMO it's not intuitive that this has an integral solution for _any_ $n\neq 1$; in fact I recall being mildly surprised when I first read that Pythagorean triples exist – of course, just _trying_ a couple of numbers quickly gives an affirmative example, but a priori it seems just as plausible that there's a simple argument à la irrationality of $\sqrt 2$ (small enough to fit in this margin...) that it can't work. What's counter-intuitive is that it required so insanely advanced maths to settle this simple question. – leftaroundabout Dec 07 '16 at 21:21
  • The Simpsons knew of a near miss. –  Dec 10 '16 at 02:06

Langton's ant is a set of rules that are applied to change "pixels" in a grid. After a finite number of steps of applying those rules, the rather chaotic behaviour changes into a periodic one. That can be seen in the picture below.

(image: Langton's ant)

  • 1,504
  • 9
  • 21
  • Not only that, but there are some turmites that are only slightly more complex to describe than Langton's ant (e.g. 9 turmites with 3 colours where Langton's ant has 2) where it is not known whether they will ever settle down into periodic behaviour. https://github.com/GollyGang/ruletablerepository/wiki/EdPeggsBusyBeaverTurmiteChallenge and scroll down to "Unresolved Turmites". – Rosie F Jul 09 '18 at 20:05

First, a more elementary example: Given two polygons $A$ and $B$ of equal area, I can always cut one into a finite number of polygonal pieces, and rearrange those to form the other; that is, any two polygons of equal area are scissors congruent.

Hilbert's third problem was about extending this result to three dimensions. He conjectured, rightly, that it fails in three dimensions, and this was proved shortly afterwards by his student, Dehn.

Still, even though it was proved rather quickly, and Hilbert's (and others') intuition was right on the mark, I still find it incredibly counterintuitive.

Another geometric example is the refutation of the triangulation conjecture; however, I'm not sure if arbitrary manifolds count as finite objects.

Noah Schweber
  • 219,021
  • 18
  • 276
  • 502
  • This doesn't quite satisfy “only involves finite objects” though, does it? – leftaroundabout Dec 07 '16 at 21:10
  • @leftaroundabout I think the first one does - everything in question can be appropriately represented by a finite set of natural numbers (OK fine, technically we need to consider polytopes with vertices in some fixed countable set - say, all coordinates rational - but the fact still holds). Note that the OP is okay with quantifiers ranging over $\mathbb{N}$, just not direct invocation of infinite sets. The second example is much less finitary, but I thought it was worth mentioning just in case (it can be finitized, but less naturally). – Noah Schweber Dec 07 '16 at 21:16

I don't know whether it qualifies -

If we cut a Möbius strip along its centre line, instead of getting two strips, we get only one strip (longer, with two twists). When I first saw this, it was quite counter-intuitive to me.

  • 915
  • 8
  • 19
Kushal Bhuyan
  • 6,881
  • 2
  • 21
  • 66
  • 10
    That's wrong. You don't receive a new Möbius strip, but a strip that has **two** twists. That's odd. But what baffled me: When you cut *this* strip, you get *two* strips that are interwound. Further results of cutting are listed at the [Wikipedia page](https://en.wikipedia.org/wiki/M%C3%B6bius_strip#Properties), and it's really getting crazy at some point... – Marco13 Dec 08 '16 at 19:18
  • 3
    I didn't said that one'd get a new Mobius strip but I said that one'd get a long Mobius strip, without mentioning about the twists though. – Kushal Bhuyan Dec 09 '16 at 02:01
  • 6
    @KushalBhuyan But the resulting strip is not a Möbius strip. I don't understand what you are saying. – Improve Dec 10 '16 at 15:28
  • Its a great example even if the details are missing - because it is something one can straight-forwardly and practically *do* - unlike most of the other examples. – Mozibur Ullah Dec 11 '16 at 06:44
  • @ypercubeᵀᴹ Thanks – Kushal Bhuyan Dec 12 '16 at 10:11
  • @Marco13 answer corrected. – ypercubeᵀᴹ Dec 12 '16 at 10:38
  • [This answer](https://math.stackexchange.com/a/67564) might prove helpful for visualization. – J. M. ain't a mathematician Mar 11 '18 at 14:33

Recall the Busy Beaver function: $BB(n)$ is the maximal number of steps for which a Turing machine with at most $n$ states can run on the blank tape before halting, assuming it halts at all.

$\sf ZFC$ cannot decide the value of $BB(1919)$.1

Namely, if there is a contradiction in $\sf ZFC$, then a Turing machine with fewer than $2000$ states should be able to find it. Yes, $1919$ is a large number, but it's not unimaginably large. But what it means is that $BB(1919)$ is pretty much entirely unimaginable, because we cannot even give it a concrete estimation.

(See this and that on Scott Aaronson's blog.)

1. Under the usual caveat that we need to assume that $\sf ZFC$ is consistent of course.

Asaf Karagila
  • 370,314
  • 41
  • 552
  • 949

I think it is counterintuitive, at first glance, that list coloring graphs can be strictly harder than ordinary coloring.

To expand on what that means: a graph is a set of vertices, some pairs of which are adjacent. (Think of the vertices as dots, with adjacent vertices having a line drawn between them.) A proper $k$-coloring of a graph assigns each vertex a number from $1$ to $k$ such that no two adjacent vertices receive the same color. The chromatic number of a graph $G$, written $\chi(G)$, is the smallest $k$ such that $G$ has a proper $k$-coloring.

A $k$-list-assignment on a graph is a function $L$ that assigns each vertex some "palette" of $k$ different numbers. If $L$ is a $k$-list-assignment, then a proper $L$-coloring of $G$ assigns each vertex a color from its list such that, again, no two adjacent vertices receive the same color. The list chromatic number of $G$, written $\chi_l(G)$, is the smallest $k$ such that $G$ has a proper $L$-coloring for every $k$-list-assignment $L$.

Now, intuition (at least, my intuition) suggests that finding an $L$-coloring should be hardest when all the vertices have the same list $L$ -- in other words, when we're really just looking for a proper $k$-coloring. If different vertices have different lists, aren't there just fewer opportunities for collisions to happen? This suggests that maybe $\chi_l(G) = \chi(G)$ for every $G$, since the hardest list assignments "should" just be the ones that give every vertex the same list.

However, this isn't true! Consider the complete bipartite graph $K_{3,3}$, whose vertices consist of six vertices $v_1, v_2, v_3, w_1, w_2, w_3$ where any pair of vertices $v_iw_j$ is adjacent. This graph clearly has a proper $2$-coloring: color all the vertices $v_i$ with color $1$ and all the vertices $w_j$ with color $2$. But now consider the following list assignment $L$:

$L(v_1) = L(w_1) = \{1,2\}$

$L(v_2) = L(w_2) = \{1,3\}$

$L(v_3) = L(w_3) = \{2,3\}$.

Among the colors $\{1,2,3\}$ appearing in these lists, we would need to use at least two of them on the vertices $v_i$, since no color appears in all three lists, and we would need to use at least two of them on the vertices $w_j$, for the same reason. This means that some color gets used on both a $v_i$ and a $w_j$, which contradicts the assumption that the coloring is proper. So there is no proper $L$-coloring, which means $\chi_l(K_{3,3}) > 2 = \chi(K_{3,3})$. In other words, it's harder to color from these lists than it is to color when every vertex has the same list!
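The impossibility argument above can also be confirmed by exhaustively trying all ways of picking one color per list; since every $v_i$ is adjacent to every $w_j$, a proper coloring is exactly one where the two sides use disjoint color sets. A small brute-force sketch (the function name is mine):

```python
from itertools import product

def k33_has_proper_coloring(lists):
    """Brute-force search for a proper list-coloring of K_{3,3}, where
    v_i and w_i both receive palette lists[i] and every v_i is adjacent
    to every w_j (so the two sides must use disjoint color sets)."""
    for vs in product(*lists):      # color choices for v1, v2, v3
        for ws in product(*lists):  # color choices for w1, w2, w3
            if set(vs).isdisjoint(ws):
                return True
    return False

print(k33_has_proper_coloring([{1, 2}, {1, 2}, {1, 2}]))  # True: ordinary 2-coloring
print(k33_has_proper_coloring([{1, 2}, {1, 3}, {2, 3}]))  # False: the lists above fail
```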

Gregory J. Puleo
  • 6,275
  • 4
  • 21
  • 28
  • Why do you claim it's counterintuitive that colouring should be harder when you have fewer free choices of colour in list colouring? One can consider the "open" colouring to have assigned every vertex a list of $|V|$ colours, and the problem is simply to make that list as short as possible given $E$ and the definitions that apply. Consider by analogy: it is easy to assemble a Yowie toy on the table, but very difficult to fit the pieces back in the capsule, and often impossible to fit the complete toy at all. – Nij Dec 03 '16 at 06:34
  • 4
    @Nij: He doesn't have fewer choices: In both cases, he can choose one of two colours at each node. And the *total* number of colours in the second case is larger (because there are three colours in the union of all lists). – celtschk Dec 03 '16 at 11:58
  • @Nij - as celtschk said, the comparison isn't between finding the L -coloring for my specified L and finding *any proper coloring at all*, but rather, the comparison is between finding the L-coloring for the given L and finding a proper 2-coloring, which is the same as having the list {1,2} at every vertex. It seems intuitive to guess that the latter coloring problem should be harder, even though it turns out not to be. – Gregory J. Puleo Dec 03 '16 at 14:47
  • 1
    Good example. As you no doubt know, the smallest example for $\chi_l(G)\gt\chi(G)$ is the graph $G$ you get by removing two nonadjacent edges from $K_{3,3}$ or in other words $G=P_3\square P_2.$ – bof Dec 03 '16 at 19:48

For those who don't find the "harmonic overhang" of Scott's answer counterintuitive, here is a variant that Loren Larson, a retired mathematician and current creator of wooden puzzles, discovered and demonstrated some years ago: It's possible to construct a stable tower of blocks with the property that removing the top block causes the tower below it to collapse.

It's worth spending some time trying to imagine how this is possible, but once you give up, here is the secret:

Start with a standard harmonic overhang (as I recall, Loren's had a couple dozen wooden blocks of dimensions something like $8$ inches by $2$ inches by $1/2$ inch) and carefully nudge it, block by block, to form a spiral. If the tower is tall enough and the spiral is big enough -- Loren worked out the mathematical details -- the top block becomes necessary to counterbalance the portion of the spiral lower down that juts out on the opposite side. Removing it creates an imbalance that causes the lower portion to topple.

Barry Cipra
  • 78,116
  • 7
  • 74
  • 151
  • 10
    Why not just put three blocks one on the other, with middle one shifted to the side? Removing the top block will cause middle one to fall. – Abstraction Dec 07 '16 at 12:12
  • Yeah, or surely build up Scott's tower, and then build it on top of itself again going in the other direction. – theonlygusti Dec 07 '16 at 12:47
  • 2
    A tower that collapses when you remove the top block is easy to imagine; the fact that a particular tower does so might be less obvious. The difficulty here is trying to imagine what the tower actually looks like; I wasn't able to get a clear enough picture of it to say whether it is counterintuitive that the top block is required. – David K Dec 07 '16 at 22:36

(1). In the Stone Age, Og and Mog each need to make large numbers of arrow-heads and ax-heads. Their products are of equal quality. Og takes 3 units of time to make an arrow-head and 5 units of time to make an ax-head. Mog takes 4 units of time to make an arrow-head and 7 units of time to make an ax-head. Both of them consider their time to be valuable.

Since Og is faster at both, it seems that Mog could not give Og an incentive to trade.

But Mog offers to give 17 arrow-heads for 10 ax-heads, which, for Mog, is trading 68 units of time in return for 70. And for Og, this is trading 50 units of time in return for 51. So they both benefit by the trade.
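The arithmetic can be spelled out in a few lines (a sketch, using the time costs from the story):

```python
# Time cost, in units of time per item, for each craftsman
og_cost = {"arrow": 3, "ax": 5}
mog_cost = {"arrow": 4, "ax": 7}

arrows, axes = 17, 10  # Mog gives 17 arrow-heads for 10 ax-heads

# Mog spends 68 units making arrows, and saves the 70 he would have
# needed to make the axes himself
mog_spent = arrows * mog_cost["arrow"]  # 68
mog_saved = axes * mog_cost["ax"]       # 70

# Og spends 50 units making axes, and saves the 51 he would have
# needed to make the arrows himself
og_spent = axes * og_cost["ax"]         # 50
og_saved = arrows * og_cost["arrow"]    # 51

assert mog_saved > mog_spent and og_saved > og_spent  # both gain
```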

(2). This might not count as finite: In triangle $ABC$, draw the trisectors of the interior angles. Let the trisectors of angles $B$ and $C$ that are closer to the side opposite $A$ meet at $A'$. Define $B'$, $C'$ similarly. Then (Morley's Theorem) $A'B'C'$ is an equilateral triangle. Intuitively it may seem that $A'B'C'$ could have any shape whatsoever. There is a nice proof in Introduction to Geometry by Coxeter.
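The theorem can at least be checked numerically. A sketch in Python, treating points as complex numbers; the trisector adjacent to side $BC$ is taken to be the direction of $BC$ rotated by one third of the signed angle at $B$ (my own encoding of the construction, not a proof):

```python
import cmath

def cross(u, v):
    return u.real * v.imag - u.imag * v.real

def meet(p, d1, q, d2):
    # intersection of the lines p + t*d1 and q + s*d2
    t = cross(q - p, d2) / cross(d1, d2)
    return p + t * d1

def morley_vertex(a, b, c):
    # meeting point of the trisectors at b and c nearest to side bc
    ang_b = cmath.phase((a - b) / (c - b))  # signed interior angle at b
    ang_c = cmath.phase((a - c) / (b - c))  # signed interior angle at c
    return meet(b, (c - b) * cmath.exp(1j * ang_b / 3),
                c, (b - c) * cmath.exp(1j * ang_c / 3))

# a scalene test triangle
a, b, c = 0 + 0j, 4 + 0j, 1 + 3j
sides = [abs(morley_vertex(a, b, c) - morley_vertex(b, c, a)),
         abs(morley_vertex(b, c, a) - morley_vertex(c, a, b)),
         abs(morley_vertex(c, a, b) - morley_vertex(a, b, c))]
# the three side lengths agree to machine precision: equilateral
```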

(3). The only solution in positive integers to $X^3=Y^2+2$ is $X=3, Y=5$ (Pierre de Fermat). It is not at all obvious that there are no others.
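A finite search of course proves nothing, but it is easy to confirm that no other small solutions exist (a sketch in Python):

```python
import math

# Search X^3 = Y^2 + 2 over 2 <= X < 10000
solutions = []
for x in range(2, 10000):
    y2 = x ** 3 - 2
    y = math.isqrt(y2)  # exact integer square root
    if y * y == y2:
        solutions.append((x, y))

print(solutions)  # [(3, 5)]
```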

  • 53,442
  • 4
  • 26
  • 68
  • 5
    To help make the first one more intuitive: If you imagine that the timings were the same except that Mog took 1,000,000 units of time to make an ax-head, it's quickly pretty clear that he'd trade a lot of arrow-heads for them, and that that trade would be good for Og. – Ben Aaronson Dec 06 '16 at 09:03
  • 4
    (1) is yet another example from economics — the **theory of comparative advantage**. As said by Paul Samuelson, "That it [the theory of comparative advantage] is logically true need not be argued before a mathematician; that it is not trivial is attested by the thousands of important and intelligent men who have never been able to grasp the doctrine for themselves or to believe it after it was explained to them." –  Dec 07 '16 at 06:48
  • The first one doesn't seem counter-intuitive at all to me. It's a red herring that Og is faster at both; what's relevant is that they produce at different rates (i.e., time for axe vs. time for arrow). – Théophile Dec 07 '16 at 21:09
  • 1
    @Théophile. Try it out on people you know. Yesterday I showed it to a friend, a master at computer programming. He said he wouldn't have thought it possible. What is obscure to some is obvious to others, and what is counter-intuitive to some may be intuitively clear to others. And what was intuitive to S. Ramanujan, who knows? – DanielWainfleet Dec 07 '16 at 23:25
  • 1
    @Théophile: Try this. Say T-shirts and cars are the only two goods in the world. For an American worker, it takes 3 man-hours to produce a T-shirt and 5,000 man-hours to produce a car. For a Chinese worker, it takes 9 man-hours to produce a T-shirt and 20,000 man-hours to produce a car. The typical American who doesn't understand the theory of comparative advantage then wonders, "Americans are literally better than the Chinese at *everything*! So why are we shipping T-shirt jobs to China?" The answer is that surprisingly, both the US and China can gain by "shipping jobs to China". –  Dec 08 '16 at 00:52

Sporadic simple groups in the Classification of finite simple groups


I find the existence of the sporadic simple groups https://en.wikipedia.org/wiki/Sporadic_group, such as the Monster group, pretty astounding, considering how simple the definition of a group is.

To top things off, I've been told that this group has applications in String Theory through the Monstrous Moonshine, although my Mathematics is not advanced enough to appreciate that one yet ;-)

I have explained some of the most basic definitions you need to understand the statement of this theorem at: Intuition behind normal subgroups


Hmm, not sure if this counts as counterintuitive, but what about the handshaking lemma, that the number of people at a party who shake hands an odd number of times is even?
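A quick empirical illustration (the lemma holds because the degrees sum to twice the number of handshakes; this sketch just checks random "parties"):

```python
import random

random.seed(1)
all_even = True
for _ in range(1000):
    n = random.randint(2, 15)
    degree = [0] * n
    # each pair of guests shakes hands with probability 1/2
    for i in range(n):
        for j in range(i + 1, n):
            if random.random() < 0.5:
                degree[i] += 1
                degree[j] += 1
    odd = sum(1 for d in degree if d % 2 == 1)
    if odd % 2 != 0:
        all_even = False

print(all_even)  # True: the count of odd-degree guests is always even
```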

  • 577
  • 5
  • 11
  • 1
    -1. It's not counterintuitive to expect the divisors of an even number to be either even or paired with an even complement divisor. – Nij Dec 03 '16 at 00:49
  • 5
    Look I agree (even if understatement forbade me from proclaiming didactically), hence my caveat, I guess it's just a 'cute' result, the amusement owing more to linguistics than mathematics. – Mehness Dec 03 '16 at 01:30
  • Interesting, sure, but only because it uses a simple fact to develop a much deeper idea. Counterintuitive, absolutely not at all. – Nij Dec 03 '16 at 06:26
  • 5
    @Nij : I'm inclined to disregard *any* claim of "being intuitive" from anyone who has absorbed any significant quantity of math or science. "Intuitive" only to a handful of percent of the population is not intuitive. – Eric Towers Dec 04 '16 at 02:15
  • 1
    The fact that even numbers must have an even divisor is taught to eight-year-olds. The fact that factor pairs must contain all of the prime factors of the product is taught to ten-/eleven-year olds. It's intuitive to anybody who has had more math education than learning how to count and add. Defining "counterintuitive" to mean "not immediately expected by a seven-year-old" is frankly a terrible and pointless way to do it. – Nij Dec 04 '16 at 05:31
  • 1
    I'm impressed at the passion this has generated :) Separately, I do agree with some of the commentary on this thread in that if counterintuitivity is rigidly to be adhered to, I think probability / combinatorics is the most fertile hunting ground. One notable exception to this 'rule' is my favourite in Algebra which will probably always be the insolubility of the quintic which definitely had me utterly gobsmacked as a first year undergrad (pardon the deliberate digression, sure someone's mentioned it here). Happy Sunday! – Mehness Dec 04 '16 at 06:49
  • This is problem 101 in "Mathematical Quickies" by Charles W. Trigg, first published in 1967 by McGraw-Hill Book Company, New York. So I think the solution is not immediately obvious for most people. – miracle173 Dec 04 '16 at 21:59
  • 1
    it's just 'brainteaser' quickie, didn't really belong here, so I have sympathy with Nij's position, all good. It's documented to be quite a common interview question in my industry as given a moment's clear thought, it's obvious - it's reminiscent some ways of Gauss' sum to n integers so the invoking of the the 7 year old is quite apposite @Nij :) – Mehness Dec 05 '16 at 11:33
  • @Nij: This answer is less interesting than the comments on it. The question is whose intuition we are interested in and how intuition changes with experience. – PJTraill Dec 06 '16 at 00:07
  • 4
    A set that is not open is not necessarily closed. A relation that is not symmetric is not necessarily anti-symmetric. A statement that is not intuitive is not necessarily counter-intuitive. :) – Théophile Dec 07 '16 at 21:18
  • Euclidean algorithm; gcd=2 – samerivertwice Nov 01 '17 at 13:38

Determinacy of classes of finite games:

The existence of winning strategies for finite games without draws and with perfect information is counter-intuitive at first glance. It is, ultimately, very sound though.

I have been in the situation of explaining this to children and adults with the example of simple games, and most of the time people went from disbelief to recognition that it is in fact obvious.

There are three major reasons for this counter-intuitiveness:

- We have been playing those types of games since we were children, and in our experience, our strategies often ended up defeated.

- Winning strategies for even simple games are often unknown: most of the time they can't be computed because the number of possible positions is too high. So most of the time, the existence of a winning strategy for a game doesn't affect the way you play it.

- The definition of the fact $W$ of being in a winning position is somewhat complex: if you can make an immediately winning move, then $W$; and if you can make a move after which, whatever move your opponent makes, $W$ holds again, then $W$. This type of recursive definition can be hard to grasp, and when you try to unfold it, real complexity, in the sense of an alternation of quantifiers, appears: $\forall$ opponent move $\exists$ a move such that $\forall$ opponent move $\exists$ a move such that $\dots$ such that $\exists$ a winning move for you.
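That recursive definition translates directly into a program. A sketch for a toy game of my choosing (a pile of $n$ stones, each player removes 1, 2, or 3, and whoever takes the last stone wins), computing the winning positions by backward induction:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def winning(n):
    # A position is winning iff some legal move leads the opponent into a
    # losing position; with no stones left, the player to move has lost.
    return any(not winning(n - k) for k in (1, 2, 3) if k <= n)

# The pattern that emerges: you lose exactly when n is a multiple of 4
print([n for n in range(12) if not winning(n)])  # [0, 4, 8]
```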

  • 4,739
  • 13
  • 18

The smallest number bigger than $1$ whose base 2, 3, and 4 representations consist of zeros and ones is $4$. If we ask the same question for bases up to 3, the answer is $3$, and for bases up to 2, the answer is $2$. But the smallest integer bigger than $1$ whose expressions in bases 2, 3, 4, and 5 all consist entirely of zeros and ones is $82000$.

It absolutely boggles the mind, because this sits very close to the axioms, so to speak: only a few facts provable from the Peano axioms are needed to state the question, yet the answer jumps to $82000$. And it is very likely that there is no other such number for bases 2 through 5, and none at all for bases 2 through 6.
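The claim about $82000$ is easy to confirm by brute force (a sketch in Python):

```python
def only_zeros_and_ones(n, base):
    # True iff every digit of n in the given base is 0 or 1
    while n:
        if n % base > 1:
            return False
        n //= base
    return True

# Base 2 is automatic, so only bases 3, 4, 5 need checking
n = 2
while not all(only_zeros_and_ones(n, b) for b in (3, 4, 5)):
    n += 1

print(n)  # 82000
```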

Edit: if you downvote, please leave a comment so I can learn. Thanks!

  • 1,665
  • 12
  • 21
  • 1
    Didn't downvote, but this doesn't seem like an interesting result (or one that's particularly counterintuitive). It doesn't explain anything; it doesn't seem to mean anything; and it doesn't seem to have any consequences. Questions about what numbers look like in different bases are generally not very interesting ones. – anomaly Dec 07 '16 at 16:40
  • 2
    Also the condition is trivial for base 2, so "bases up to 3" is really just "base 3", and "bases up to 4" is really just "bases 3 and 4". – Ben Millwood Dec 07 '16 at 17:32
  • 2
    Of course the first few are trivial -- but the base-N notation here is not exactly necessary, you could just talk about sum of powers, it's a small convenience. Ie. 82000 is the smallest of numbers that can be written as a sum of powers of 3 and 4 and 5. That's what I find so odd, that somehow this odd number comes from the structure of the mathematical universe. The oddity here is like Ramanujan's taxi cab number. – chx Dec 07 '16 at 17:37

Pick a couple of positive integers $a,b$ and make a tower of exponents with $a$, and find the value $\bmod b$. For example:

$21^{21^{21^{21}}} \bmod 31$

Now you don't need a very tall tower before extending it makes absolutely no difference to the result. I can replace the top $21$ there with any other positive integer, or (for effect) pile on more layers of exponents, and the answer doesn't change.
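The stabilization can be computed via the generalized Euler theorem: for $e \ge \log_2 m$ we have $a^e \equiv a^{\varphi(m) + (e \bmod \varphi(m))} \pmod m$. A sketch in Python (my own helper names):

```python
def phi(n):
    # Euler's totient function via trial division
    result, p = n, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result

def tower_mod(a, height, m):
    # a^(a^(...^a)) (height copies of a, a >= 2) mod m
    if m == 1:
        return 0
    if height == 1:
        return a % m
    # the tower above the base is huge, so reduce the exponent
    # mod phi(m) and add phi(m) to stay in the "large exponent" regime
    e = tower_mod(a, height - 1, phi(m))
    return pow(a, e + phi(m), m)

print(tower_mod(21, 4, 31) == tower_mod(21, 100, 31))  # True
```

Extending the tower from height 4 to height 100 changes nothing, because the chain of iterated totients $31 \to 30 \to 8 \to 4 \to 2 \to 1$ bottoms out.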

  • 38,461
  • 5
  • 42
  • 78

Not sure whether this is the kind of thing you were expecting, but here goes:

Some statements about constructive mathematics can seem very counter-intuitive (at first, this is probably because one is misinterpreting what they mean), e.g.:

  • the induction principle holds, but on the other hand, the statement that every non-empty (or rather: inhabited) set of naturals has a smallest element is in general false
  • given a set $A$, consider the statements: (i) "there is a finite set $B$ and an injection $A\to B$", (ii) "there is a finite set $B$ and a surjection $B\to A$". Neither statement implies the other, nor does either imply that $A$ is finite
Stefan Perko
  • 11,817
  • 2
  • 23
  • 59
  • 4
    Can you explain the first one? It looks like a simple negation of the well-ordering principle. I don't see any other way to interpret it. – murgatroid99 Dec 05 '16 at 23:31
  • @murgatroid99 I don't know what you mean by "explain". In my second paragraph I meant: Even if you initially think it is surely true, you may have a faulty explanation. In the case of the well-ordering principle of integers something like: "Of course, you just increment an integer starting at $1$ until you land in the set" which doesn't work because it may be *very hard* to check that you are in this set. – Stefan Perko Dec 06 '16 at 08:27
  • 1
    I'm just trying to understand why that statement is false. Can you provide a counterexample, or a method of finding a counterexample, or some other proof that it's false. Even an explanation of why that statement is not actually equivalent to "the well ordering principle if false" would be helpful. – murgatroid99 Dec 06 '16 at 08:53
  • @murgatroid99 Note that we have $x = y$ or $x\neq y$ for any integers $x,y$. Let $p$ be a proposition, then define $A:= \{ x\in \{0,1\} : x = 1 \text{ or } (x = 0\text{ and } p) \}$. Obviously $1\in A$. Assume $A$ has a least element $m$. If $m = 0$, then $p$. If $m = 1$, then $\neg p$. In conclusion $p \text{ or } \neg p$, that is: the WOP of $\mathbb N$ implies excluded middle. - the right WOP in constructive mathematics is: "Every *decidable* inhabited subset of $\mathbb{N}$ has a smallest element". – Stefan Perko Dec 06 '16 at 12:13
  • 1
    I understand now. When I first read your post, I completely missed the part where you were operating under an unusual set of logical axioms, and that was the source of my confusion. – murgatroid99 Dec 06 '16 at 16:15
  • 1
    For the first one, do you mean "false" or "not provable"? – Theodore Norvell Dec 10 '16 at 15:09
  • @TheodoreNorvell "in general false" = "false in some models" = "not provable". – Stefan Perko Dec 10 '16 at 15:13
  • @StefanPerko I think "false" could be interpreted as "false in all models" rather than "false in some models". It seems ambiguous. – user76284 May 12 '19 at 03:12

Here are several involving the construction of regular polygons. For at least three out of the four cases we would not call it counterintuitive, but the results we know today would have thrown the Greeks for a loop.

1) The regular heptagon has no Euclidean construction. The Greeks would have preferred to be able to construct regular polygons generally by Euclid's methods, which would make their properties accessible to elementary proof. Only in modern times did we learn definitively that this is impossible.

2) The regular enneagon (nonagon) has no Euclidean construction, either. This is a special case of the old angle trisection problem: trisect the central angle of an equilateral triangle and you can make a regular enneagon. Now we know it works in reverse: we prove that a general angle trisection with Euclidean methods cannot exist by showing that the regular enneagon is not Euclidean-constructible.

3) The regular hendecagon does have a neusis construction. Unlike the other cases, this is a purely "modern" counterintuitive result. It was long thought that neusis construction was similar to conic-section construction: you can solve cubic and quartic equations with it. But we now know that some (we don't know about all) irreducible quintic equations are also solvable. Benjamin and Snyder showed that the minimal equation for $\cos((2\pi)/11)$ is one such neusis-solvable equation. See Constructing the 11-gon by splitting an angle in five.

4) After the regular pentagon, the next regular prime-sided polygon constructible by Euclid's methods is the 17-gon. It's "obvious" now, but ancient and medieval mathematicians would never have suspected it without the theories developed by Gauss.
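The modern criterion (Gauss–Wantzel) is easy to state: the regular $n$-gon is Euclidean-constructible iff $\varphi(n)$ is a power of $2$. A sketch in Python recovers all four results above:

```python
def phi(n):
    # Euler's totient function via trial division
    result, p = n, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result

def euclid_constructible(n):
    # Gauss-Wantzel: regular n-gon is ruler-and-compass constructible
    # iff phi(n) is a power of 2
    t = phi(n)
    return t & (t - 1) == 0

print([n for n in range(3, 20) if euclid_constructible(n)])
# [3, 4, 5, 6, 8, 10, 12, 15, 16, 17]: no heptagon (7), no enneagon (9),
# and after 5 the next constructible prime is 17
```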

Oscar Lanzi
  • 29,410
  • 2
  • 32
  • 75

A proper coloring of a graph is an assignment of colors to its vertices in such a way that no two adjacent vertices share the same color.

The chromatic number of a graph is the minimum number of colors for which there is a proper coloring.

Any tree with at least two vertices has chromatic number $2$ - imagine starting at a leaf and just alternating red-blue-red-blue.

The girth of a graph is the number of vertices in its smallest cycle.

A tree has infinite girth, as it contains no cycles.

The tree example may cause you to think that large girth means small chromatic number. It seems plausible enough: A graph with large girth "looks like" a tree near any particular vertex, since it will be a long time before the edges leaving that vertex wrap back around to form a cycle. We therefore should be able to alternate red-blue-red-blue locally near a vertex, then just introduce a couple new colors to fix up the places where we get stuck.

Nothing of the sort! Erdős proved in 1959 using probabilistic techniques that there are graphs with arbitrarily high girth and chromatic number. In other words, that "treelike" appearance of high girth graphs has no ultimate control over chromatic number.

Austin Mohr
  • 24,648
  • 4
  • 62
  • 115

$()()$ is not a palindrome but $())($ is.

Intuition tells us that if we can put a mirror in the centre and it reflects, then it's a palindrome. But because the mirror exchanges left brackets for right ones, our intuition deceives us in this particular instance.
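A one-liner confirms the string-level claim (plain character reversal, with no swapping of bracket types):

```python
def is_palindrome(s):
    # pure reversal; '(' and ')' are treated as unrelated symbols
    return s == s[::-1]

print(is_palindrome("()()"), is_palindrome("())("))  # False True
```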

  • 8,241
  • 2
  • 20
  • 55

One of the best (and most useful) is Benford's law, which states that the leading digit of a randomly chosen number is not uniformly distributed among the possible digits. For example, in base 10, the digit 9 appears in the highest place value position less often than any other digit - in a "naturally occurring" population of numbers.

Say you wanted to analyse the accounts of a company to identify falsified data. If you found that some set of numbers (e.g. a set of invoices) contained the digits $1-9$ in roughly equal proportions in the highest place value, this would be a red flag that the data might have been falsely generated.

This is because "naturally occurring" data tend to spread evenly across orders of magnitude, i.e. uniformly on a logarithmic scale, which makes low-valued digits such as $1$ more likely to lead: the leading digit $d$ occurs with probability $\log_{10}(1 + 1/d)$.
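Powers of $2$ are a standard example of a sequence whose leading digits follow Benford's law; a sketch comparing empirical frequencies against $\log_{10}(1 + 1/d)$:

```python
import math
from collections import Counter

N = 3000
leads = Counter()
x = 1
for _ in range(N):
    x *= 2
    leads[int(str(x)[0])] += 1  # leading decimal digit of 2^k

for d in range(1, 10):
    predicted = math.log10(1 + 1 / d)
    print(d, round(leads[d] / N, 4), round(predicted, 4))
# digit 1 leads about 30.1% of the time, digit 9 under 5%
```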

  • 8,241
  • 2
  • 20
  • 55
  • Why does place value grow logarithmically? Doesn't this depend on the distributional assumption which you make? – Epiousios Nov 01 '17 at 08:00
  • @Epiousios yes it does. Any assumption that the leading digit is equally distributed among the radix, inherently implies that the distribution of your figures is not independent of the base in which you write them. – samerivertwice Nov 01 '17 at 11:38
  • @Epiousios another way of looking at it is that the leading digit of any figure $x$ of length $n$ when written in base $b$ is conditional upon the magnitude of that figure $x$ being in the range $b^n-1\geq x\geq b^{n-1}$. So it's conditional upon a logarithmic assumption. – samerivertwice Nov 01 '17 at 11:44

Micha Perles's discovery of non-rational polytopes - combinatorial types of convex polytopes that cannot be realized with rational vertex coordinates.

Non-rational configurations, polytopes, and surfaces by Günter Ziegler

Dan Moore
  • 1,069
  • 7
  • 16

Maybe not right on the money, but worth mentioning: (subtly) faulty dissections. For example, as shown here, it is seemingly possible to dissect an $8 \times 8$ square into a $5 \times 13$ rectangle, even though the areas ($64$ and $65$) differ by $1$; the discrepancy hides in a thin sliver along the rectangle's diagonal.

Mike Jones
  • 4,310
  • 1
  • 35
  • 39