I want to teach a short course in probability and I am looking for counter-intuitive examples in probability: results that seem obviously false but are true, or vice versa.

I already found some things. For example these two videos:

In addition, I have found some weird examples of random walks. For example this amazing theorem:

For a simple random walk, the mean number of visits to point $b$ before returning to the origin is equal to $1$ for every $b \neq 0$.
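This theorem can be sanity-checked by simulation. The sketch below truncates very long excursions at a step cap (which biases the mean slightly downward, since a simple random walk's expected return time is infinite), but the average number of visits to $b = 3$ still comes out close to $1$:

```python
import random

def visits_before_return(b, max_steps=20_000):
    """Count visits to b during one excursion of a simple random walk from 0,
    truncated at max_steps (truncation biases the mean slightly downward)."""
    pos, visits = 0, 0
    for _ in range(max_steps):
        pos += random.choice((-1, 1))
        if pos == b:
            visits += 1
        if pos == 0:   # returned to the origin: excursion over
            break
    return visits

trials = 5_000
print(sum(visits_before_return(3) for _ in range(trials)) / trials)  # ≈ 1
```

The same estimate hovers around $1$ for any other choice of $b \neq 0$, which is what makes the theorem so striking.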

I have also found some advanced examples such as Do longer games favor the stronger player?

Could you please share some other examples of such problems? I am excited to read your answers.

  • 2
    A variation on Penney's game: A new family moves to your block. Since you have children you are interested in what children they have: (1) You learn that they have two children, one of which is a girl. What is the probability the other child is also a girl? (2) You learn that they have two children, the older child a girl. What is the probability the other child is also a girl? Tell why the answers are NOT the same. – user247327 Feb 16 '17 at 16:45
  • 1
    @user247327 "Tell why the answers are the SAME" is also valid here. As it all depends on the assumptions you make for this question. The way it is phrased in this comment I would say 1/2 in both is the correct answer. The problem is that you assume a distribution with 4 states. Which is just an assumption for the stated problem. https://en.wikipedia.org/wiki/Boy_or_Girl_paradox – Kami Kaze Feb 17 '17 at 10:31
  • 3
    Mildly interestingly, the random walk problem you mentioned in higher dimensions is more complicated. For $n = 2$, the same result applies: one returns to the origin a.s. In higher dimensions, the probability of returning to the origin is strictly between $0$ and $1$. – anomaly Feb 18 '17 at 04:38
  • 2
    A complementary request for *intuitive* examples in probability would lead to far less answers ... – Torsten Schoeneberg Nov 02 '19 at 06:59
  • @user247327 "You learn that they have two children, one of which is a girl." Ok, let's try to imagine how you could learn this. Maybe you see from a distance that there are two children, and then one runs close enough that you see she is a girl. Or say you learn this by overhearing the parent say, "She's one of my two kids." When you think about it, most ways that you would normally learn this information do not work for the "trick" interpretation of the question. So this is not a case of a counter-intuitive answer, but a counter-intuitive question interpretation. – Matt Dec 02 '20 at 18:58

31 Answers


The most famous counter-intuitive example in probability theory is the Monty Hall Problem.

  • In a game show, there are three doors behind which there are a car and two goats. However, which door conceals which is unknown to you, the player.
  • Your aim is to select the door behind which the car is. So, you go and stand in front of a door of your choice.
  • At this point, regardless of which door you selected, the game show host chooses and opens one of the remaining two doors. If you chose the door with the car, the host selects one of the two remaining doors at random (with equal probability) and opens that door. If you chose a door with a goat, the host selects and opens the other door with a goat.
  • You are given the option of standing where you are and switching to the other closed door.

Does switching to the other door increase your chances of winning? Or does it not matter?

The answer is that it does matter: switching wins the car with probability $\frac{2}{3}$, while staying wins with probability $\frac{1}{3}$. This is initially counter-intuitive for someone seeing the problem for the first time.
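A short Monte Carlo sketch (plain Python, nothing beyond the standard library) that follows the rules exactly as stated above:

```python
import random

def monty_hall(switch, trials=100_000):
    """Estimate the probability of winning the car for one strategy."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        choice = random.randrange(3)
        # The host opens a door that is neither the player's choice nor the car,
        # picking uniformly at random when both remaining doors hide goats.
        opened = random.choice([d for d in range(3) if d != choice and d != car])
        if switch:
            choice = next(d for d in range(3) if d != choice and d != opened)
        wins += (choice == car)
    return wins / trials

print(monty_hall(switch=False))  # ≈ 1/3
print(monty_hall(switch=True))   # ≈ 2/3
```

As the comments below stress, the simulation only matches the $\frac{1}{3}$-vs-$\frac{2}{3}$ answer because the host is guaranteed to reveal a goat; changing that one line changes the answer.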

  • If a family has two children, at least one of which is a daughter, what is the probability that both of them are daughters?
  • If a family has two children, the elder of which is a daughter, what is the probability that both of them are daughters?

A beginner in probability would expect the answers to both these questions to be the same, but they are not: under the usual uniform assumptions, the answers are $\frac{1}{3}$ and $\frac{1}{2}$ respectively.
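Under the standard modelling assumption that each child is independently a boy or a girl with probability $\frac{1}{2}$, the two answers can be checked by enumerating the four equally likely (elder, younger) pairs:

```python
from itertools import product

pairs = list(product("BG", repeat=2))  # (elder, younger): 4 equally likely pairs

# (1) condition on "at least one girl"
at_least_one = [p for p in pairs if "G" in p]
print(sum(p == ("G", "G") for p in at_least_one) / len(at_least_one))  # 1/3

# (2) condition on "the elder is a girl"
elder_girl = [p for p in pairs if p[0] == "G"]
print(sum(p == ("G", "G") for p in elder_girl) / len(elder_girl))      # 1/2
```

As the comments on the question note, the $\frac{1}{3}$ answer is sensitive to *how* you learned that one child is a girl; the enumeration above assumes the conditioning event is exactly "at least one girl".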

Math with Bad Drawings explains this paradox with a great story, as part of a seven-post series on probability theory.

Nontransitive Dice

Let three players P, Q, and R each have one of three distinct dice.

If P is more likely to win against Q, and Q is more likely to win against R, is it necessarily the case that P is more likely to win against R?

The answer, strangely, is no. One such configuration is $(\left \{2,2,4,4,9,9 \right\},\left \{ 1,1,6,6,8,8\right \},\left \{ 3,3,5,5,7,7 \right \})$: each die beats the next in the cycle with probability $\frac{5}{9}$.
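The cycle can be verified exactly by enumerating all $36$ face pairs for each matchup, using exact fractions:

```python
from itertools import product
from fractions import Fraction

dice = {
    "P": [2, 2, 4, 4, 9, 9],
    "Q": [1, 1, 6, 6, 8, 8],
    "R": [3, 3, 5, 5, 7, 7],
}

def p_beats(a, b):
    """Exact probability that die a shows a higher face than die b."""
    wins = sum(x > y for x, y in product(dice[a], dice[b]))
    return Fraction(wins, 36)

for a, b in [("P", "Q"), ("Q", "R"), ("R", "P")]:
    print(f"P({a} beats {b}) = {p_beats(a, b)}")  # each is 5/9
```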

Sleeping Beauty Paradox

(This is related to philosophy/epistemology and is more related to subjective probability/beliefs than objective interpretations of it.)

Today is Sunday. Sleeping Beauty drinks a powerful sleeping potion and falls asleep.

Her attendant tosses a fair coin and records the result.

  • If the coin lands heads: Beauty is awakened and interviewed on Monday only. Her memory is then erased and she is put back to sleep.
  • If the coin lands tails: Beauty is awakened and interviewed on Monday. Her memory is erased and she is put back to sleep. On Tuesday, she is once again awakened, interviewed, and finally put back to sleep.

In essence, the awakenings on Mondays and Tuesdays are indistinguishable to her.

The most important question she's asked in the interviews is

What is your credence (degree of belief) that the coin landed heads?

Given that Sleeping Beauty is epistemologically rational and is aware of all the rules of the experiment on Sunday, what should be her answer?

This problem seems simple on the surface, but there are serious arguments for both the answer $\frac{1}{2}$ and the answer $\frac{1}{3}$, and there is no consensus among modern epistemologists on this one.

Ellsberg Paradox

Consider the following situation:

In an urn, you have $90$ balls of $3$ colors: red, blue, and yellow. Exactly $30$ balls are known to be red. Each of the other $60$ balls is either blue or yellow, in unknown proportion.

There are two lotteries:

  • Lottery A: A random ball is chosen. You win a prize if the ball is red.
  • Lottery B: A random ball is chosen. You win a prize if the ball is blue.

Question: In which lottery would you want to participate?

  • Lottery X: A random ball is chosen. You win a prize if the ball is either red or yellow.
  • Lottery Y: A random ball is chosen. You win a prize if the ball is either blue or yellow.

Question: In which lottery would you want to participate?

If you are an average person, you'd choose Lottery A over Lottery B and Lottery Y over Lottery X.

However, it can be shown that there is no way to assign probabilities that makes this pair of preferences rational: preferring A to B means you judge red more likely than blue, while preferring Y to X means you judge blue more likely than red. One way to deal with this is to extend the concept of probability to that of imprecise probabilities.

Tanner Swett
Agnishom Chattopadhyay
  • 32
    A classic example: For this, I think it's extremely important to eventually highlight that the reason switching is better is because *monty knows* which door actually has the car. If a door was opened at random after the initial choice, and happened not to contain the car, then your odds are actually no better to switch than not switch. The point being: The seemingly benign and/or irrelevant detail of whether monty opens a door without the car or at random is actually vitally important the answer of the question. – Justin Benfield Feb 12 '17 at 07:54
  • 26
    Concerning the last so-called 'paradox', it's only paradoxical if you make a **false assumption** that rational preference of one choice to another must imply a higher expectation. Indeed, given a uniform prior, each pair has the same expectation, but A and Y have zero variance and so are obviously preferable to B and X for a risk-averse person. It is therefore totally unfair to use this 'paradox' to claim a deficiency in the concept of probability. – user21820 Feb 12 '17 at 10:38
  • 6
    I think the Ellsberg Paradox is not a good example of a counterintuitive probability example, and certainly not for a short course in probability (which the OP will teach). The Ellsberg Paradox is meant to illustrate the difference between risk and Knightian uncertainty. – wythagoras Feb 12 '17 at 11:43
  • 1
    You could add Russell's paradox to the list: https://en.m.wikipedia.org/wiki/Russell%27s_paradox – axolotl Feb 12 '17 at 18:11
  • 2
    @zakoda How does that paradox relate to probability? – Paŭlo Ebermann Feb 12 '17 at 18:21
  • 2
    @PE: Sorry, I meant Bertrand's paradox. I confused the two because "Bertrand Russell"!. Here's what I was talking about: http://math.stackexchange.com/a/2141200/278700 – axolotl Feb 12 '17 at 18:35
  • 16
    The Monty Hall problem should be edited to include that the host **always** reveals a goat. If they don't always reveal a goat, the distribution can change arbitrarily. – Chai T. Rex Feb 13 '17 at 05:01
  • 6
    Regarding the Monty Hall problem, as Chai T. Rex points out, it's critical that Monty will always reveal a goat. Which itself provides a nice way to catch out those who already know of the classic problem, while also providing a further lesson in probability - what happens if Monty randomly chooses one of the other doors to open, and if he reveals the car, you lose? If the sequence of actual events stays the same (you choose a door, Monty opens a door, there's a goat there, choose whether to switch), it'll seem like the situation is the same, but now there's no benefit to switching. – Glen O Feb 13 '17 at 07:03
  • 1
    @ChaiT.Rex I did mention that Monty reveals a goat, in the third bullet – Agnishom Chattopadhyay Feb 13 '17 at 07:56
  • 5
    @AgnishomChattopadhyay That's not enough. If the host picks the door to open randomly (from the two remaining doors after the player's pick) and happens to pick a door with a goat, the probability of winning is 50% whether the player switches or stays. You need to specify that the host was guaranteed to open a door with a goat in the first place. – kviiri Feb 13 '17 at 11:55
  • 1
    Downvoted for incorrect presentation of Monty Hall problem. Understanding paradoxes and gaining understanding of probability by studying those paradoxes absolutely requires unambiguous examples. – JiK Feb 13 '17 at 13:41
  • 1
    Another possibility in this presentation of Monty Hall is that each time the game is played, the host (who knows where the car is) will use that information to immediately open the player's chosen door if the player has chosen a goat, at which point the player loses. Therefore the fact he opened another door in _this_ iteration of the game means you have certainly already chosen the car. But this problem is an excellent one when the rules are correctly stated, and the clearer the rules are, the better. – David K Feb 13 '17 at 15:28
  • 1
    You say that he reveals a goat, but that could mean that he conditionally reveals a goat this time rather than always revealing a goat. For example, he might reveal a goat only when you've already selected the car. This would make the resultant probabilities quite different. – Chai T. Rex Feb 13 '17 at 20:38
  • 2
    Also: If a family of two children has a daughter named Ellen, what is the probability that both of the children are daughters? – The Vee Feb 13 '17 at 23:21
  • 1
    @The Vee: there is a continuum of counter-intuitive results in this direction. Example: A family of two children has a daughter born a monday. What is the probability that both children are daughter? Answer (under some hypotheses of independence and equidistribution): 13/27. – D. Thomine Feb 14 '17 at 12:28
  • @kviiri (and others) "You need to specify that the host was guaranteed to open a door with a goat in the first place". Couldn't it be argued that the host may or may not know this, and if they don't then you're no worse off by switching. So you may as well switch as you will be better off if they do? – John R Ramsden Feb 15 '17 at 11:08
  • @JohnRRamsden Yeah, but if we forgo assumptions about the host's behavior completely it's also possible they offer the switch only when the player has selected the winning, or conversely, losing door, in which case switching wins with 0% or 100% probability (respectively). The host's behavior must be clarified. – kviiri Feb 15 '17 at 11:18
  • @kviiri "also possible they offer the switch only when .." Ah, but that's a different game. My assumption was that the host had to open a door, whether their motives in relation to the contestant were helpful, neutral, or obstructive (although I agree that an obstructive host can reduce the probability, so you're right in that respect). – John R Ramsden Feb 16 '17 at 11:56
  • @JohnRRamsden Sure, it's a different game. So is the host opening a door with a goat with 100% probability instead of picking randomly from the two remaining doors. What is your point? – kviiri Feb 16 '17 at 12:03
  • Regarding Monty Hall, I especially like this NYT article that describes the problem but then brings in Monty Hall (the real person) to play the game. He basically shows that probability has nothing to do with it. He also controls the flow of the game, so the assumptions of the "academic" version are not met. He also became good at reading *individual* players and controlling the flow based on their individual behavior. I think that adds a touch of "context" to the math. http://www.nytimes.com/1991/07/21/us/behind-monty-hall-s-doors-puzzle-debate-and-answer.html?pagewanted=all – Brick Feb 16 '17 at 14:42
  • In the Ellsberg Paradox, it's not actually stated that the number of balls of other colours is random. If you make your choice and then the lottery-runner goes away and prepares the lottery with your choice in mind, they'll pick all the non-red balls to be yellow if you pick A, and all the non-red balls to be blue if you pick X. So pick B or Y. – Ben Millwood Feb 18 '17 at 08:36
  • @Justin Benfield, if Monty randomizes and you know it, you would be indifferent (which is the situation most people who refuse to switch incorrectly believe they're in), but if you're indifferent, then switching is acceptable to the rational player, so in the end __that detail was not so important__ and certainly not vitally so. But if you can find me a process by which Monty can make switching unfavourable, then of course... – PatrickT Sep 14 '17 at 16:18
  • 1
    @PatrickT: Example of Monty behavior where switching=bad idea would be that he reveals the car if you didn't pick it first (of course, this trivializes the game, if you know that he behaves this way). – Justin Benfield Sep 21 '17 at 07:14
  • @JustinBenfield, indeed! – PatrickT Sep 21 '17 at 18:47

Birthday Problem

For me this was the first example of how counter-intuitive real-world probability problems can be, owing to the inherent under- and over-estimation built into our mental shortcuts for permutations and combinations (inverse multiplication problems, in general), which form the basis of probability calculations. The question is:

How many people must be in a room so that the probability that at least two of them share a birthday is at least as high as the probability of getting heads on a toss of a fair coin (i.e., $0.5$)?

This is a good problem for students to hone their skills in estimating permutations and combinations, the basis for computing probabilities.

I still find the number of people in the answer surreal and hard to believe! (The real answer is $23$.)
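The standard computation multiplies the probabilities that each successive person misses all earlier birthdays; a few lines of Python confirm that $23$ is where the probability first crosses $\frac{1}{2}$ (ignoring leap years and assuming uniformly distributed birthdays):

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (days - i) / days  # person i+1 misses all earlier birthdays
    return 1 - p_distinct

print(p_shared_birthday(22))  # ≈ 0.476
print(p_shared_birthday(23))  # ≈ 0.507
```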

At this juncture, pupils should be shown quick-and-dirty mental shortcuts for calculating permutations and combinations, and encouraged to make a habit of such mental computation; this builds intuition about probability. It will also serve them well in higher-level problems such as the Monty Hall problem, or conditional-probability problems like the following:

Out of a population of $10$ million, $0.5\%$ of people are affected by a strange disease. A test for the disease has been developed that gives the correct result $99\%$ of the time. A randomly selected person tests positive. What is the actual probability that this person has the disease? (The answer is approximately $33\%$.)
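The $33\%$ figure is just Bayes' rule; here is the arithmetic, assuming the test is $99\%$ accurate in both directions (sensitivity and specificity):

```python
# Bayes' rule for the disease-testing example
p_disease = 0.005         # prevalence: 0.5% of the population
p_pos_given_d = 0.99      # sensitivity: P(positive | diseased)
p_pos_given_not_d = 0.01  # false-positive rate: P(positive | healthy)

p_pos = p_pos_given_d * p_disease + p_pos_given_not_d * (1 - p_disease)
p_d_given_pos = p_pos_given_d * p_disease / p_pos
print(p_d_given_pos)  # ≈ 0.33
```

The rare true positives ($0.99 \times 0.005$) are swamped roughly two-to-one by false positives from the healthy majority ($0.01 \times 0.995$), which is exactly the point of the example.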

The strange disease can be replaced by many real-world settings (HIV testing, a trading or betting strategy, screening for terrorists in a country). The example gives students a feel for why such tests are bound to produce many false positives (few real-world tests are even $99\%$ accurate) and why popular opinion is so often wrong in these cases.

This is a natural point at which to introduce some of the work of Daniel Kahneman and Amos Tversky, since no modern probability course is complete without giving pupils a sense of how fragile one's intuitions and estimates of probabilities and uncertainties are, and how to deal with that. Around $20\%$ of the course could be devoted to this aspect, and it can form one of the students' final real-world projects.

Nikhil Pandey
  • 4
    A quick intuitive explanation of the rough order of the answer to the birthday problem is to notice that for birthdays to be all distinct each pair has to be distinct, and there are $Θ(n^2)$ pairs for $n$ people. Thus roughly you expect that, if $n \in Θ(\sqrt{k})$ as $k \to \infty$, where $k$ is the number of possible birthdays, then there are $Θ(k)$ pairs each of which has probability $\frac1k$ of being equal, and so the total probability of distinct birthdays is roughly $(1-\frac1k)^{Θ(k)}$ which tends to $\exp(-c)$ for some positive constant $c$. – user21820 Feb 13 '17 at 04:17
  • In fact, this analysis can be made rigorous by using smoothing inequalities. Specifically $\frac{k-2r}{k} \le \frac{k-i}{k} \cdot \frac{k-(n-1)+i}{k} \le (\frac{k-r}{k})^2$ for any $i \in [0..(n-1)]$, where $r = \frac12(n-1)$. – user21820 Feb 13 '17 at 04:22
  • 4
    One way to interpret the "sickness" example is in terms of a signal to noise ratio. Namely, Bayes' rule gives us that for $S$ denoting sickness and $T$ denoting a positive test, $P(S \mid T)=\frac{P(T \mid S) P(S)}{P(T \mid S) P(S) + P(T \mid S^c) P(S^c)}$. Now as $P(S) \to 0$ with the conditional probabilities held constant, the ratio between this and $P(S)$ tends to $\frac{P(T \mid S)}{P(T \mid S^c)}$. – Ian Feb 15 '17 at 16:35
  • 4
    (Cont.) This means that you approximately multiply your prior probability that the person is sick by $\frac{P(T \mid S)}{P(T \mid S^c)}$, which is usually rather large. This tells us that multiplying our prior probability by a fixed factor is essentially as much as we can do...which means that testing for rare conditions is simply hard. – Ian Feb 15 '17 at 16:35
  • Yes this is exactly one of the points of "strange disease" problem is. As information to noise ratio( false positives) decreases it's very difficult to make right inference, which is supposedly still one of the least understood concept. But this is not surprising given how fragile people's intuition about grouped multiplication an division ( inverse multiplication) is in general while at the same time they have so much confidence in their estimates in this regard. It always felt so strange. – Nikhil Pandey Feb 17 '17 at 03:42

Airplane Seating

$100$ people are boarding a plane in a line and each of them is assigned to one of the $100$ seats on a plane. However, the first person in line forgot his boarding pass and as a result decided to sit down in a random seat. The second person will do the following:

  1. Sit in her seat if it is still available.
  2. If her seat is not available, choose a random seat among the seats remaining and sit there.

Each following person sits according to the same rules as the second person. What is the probability the $100^{th}$ person will be able to sit in her assigned seat?

Most people think the probability is very small, expecting only a tiny chance that the $100$th person's seat is still free after all the shuffling. But the actual probability is exactly $\frac{1}{2}$.
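A direct simulation of the boarding rules supports the answer of $\frac{1}{2}$ (a sketch; I label passengers and seats $0$ through $n-1$, with passenger $i$ assigned seat $i$):

```python
import random

def last_passenger_gets_seat(n=100):
    seats = set(range(n))  # seat i is assigned to passenger i
    # Passenger 0 lost the boarding pass and sits in a uniformly random seat.
    seats.remove(random.choice(tuple(seats)))
    for p in range(1, n - 1):
        if p in seats:
            seats.remove(p)                            # own seat is free
        else:
            seats.remove(random.choice(tuple(seats)))  # displaced: pick at random
    return (n - 1) in seats  # is the last passenger's seat still free?

trials = 20_000
print(sum(last_passenger_gets_seat() for _ in range(trials)) / trials)  # ≈ 0.5
```

Changing `n` shows the striking fact mentioned in the comments: the answer is $\frac{1}{2}$ for every $n > 1$.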

  • Not only that; it seems any number of passengers >1 gives 1/2. I see _how_ but not _why_ if that makes any sense. – JollyJoker Feb 13 '17 at 11:40
  • Is there any quick way to justify that? – MR_BD Feb 13 '17 at 15:20
  • 20
    Yes. The 100th person either gets her own seat, or the first person's (no other seat can be unoccupied when she gets there, since it would have been taken by its owner). The other one of those two seats has taken by someone else. They must have picked it at random, so it's equally likely to be either of the two. – Especially Lime Feb 13 '17 at 15:47
  • 3
    To elaborate on the previous, for anyone who has a chance of sitting in either the first or the last seat, both are available and equally likely, since picking either means everyone else (before the 100th) can take their assigned seat. – JollyJoker Feb 13 '17 at 20:06
  • 6
    @LeilaHatami the probability of winning is obviously 1/2 in the 2-seat problem (either the first passenger chose their own seat, or your seat).In the n>2 seat problem there are three possibilities. 1/n: they choose their own seat (win, without any choices being made by anyone later). 1/n: they choose your seat (lose, regardless of any choices made later). (n-2)/n: they choose the mth passenger's seat. All of the passengers up to the mth will sit in their correct seats, so we can remove them and their seats, re-label the remaining seats, and reduce to the n-m seat problem. – hobbs Feb 16 '17 at 07:37

I find that almost anything about probability is counter-intuitive to my college students on first encounter. Possibly this may depend on your audience. Here are a few examples:

$1.$ Question: "If a certain event has a $40\%$ chance of success, and we run $50$ experiments, then how many would you expect to succeed?" The most common responses I usually get are "all of them" and "none of them". This is after an hour-long lecture on the subject.

$2.$ Question: "Interpret this probability statement: There is a $30\%$ chance of rain today in the New York area." I usually only get about a $65\%$ successful response rate on this on a multiple-choice quiz, even after the hour-long lecture on the subject. Once I had a student so bamboozled by it that she called up the national meteorology service for a consultation.

$3.$ Question: "We have a hand of four cards $\{A, 2, 3, 4\}$, and pick out two at random; what is the probability we get the $A$ or $2$ ?" Common responses are $25\%$, $50\%$, and $75\%$. I've never had anyone in a class intuit the correct answer on first presentation.
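For Q3, reading "pick out two" as drawing without replacement and "the $A$ or $2$" inclusively (both of which are assumptions, as the comments below discuss), the answer can be found by listing all six equally likely hands; it is $\frac{5}{6} \approx 83\%$, which is none of the three common responses:

```python
from itertools import combinations

hands = list(combinations(["A", "2", "3", "4"], 2))  # 6 equally likely hands
hits = [h for h in hands if "A" in h or "2" in h]
print(len(hits), "/", len(hands))  # 5 / 6
```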

$4.$ Question: "If you drive to school on a given day, you either get in an accident or you don't. Are these equally likely outcomes?" At least half of any class answers "yes" on the first presentation. This can be repeated with the same result with similar follow-up questions.

Daniel R. Collins
  • 34
    I can't believe anyone gets the first one wrong. – theonlygusti Feb 13 '17 at 01:24
  • 1
    @theonlygusti: My answer is "About 10 to 30." – user21820 Feb 13 '17 at 03:57
  • 24
    @theonlygusti "How many of them would you expect to succeed?" I wouldn't expect any of them to succeed, but overall I'd expect to get 20 successes :) – Rawling Feb 13 '17 at 07:59
  • 9
    @theonlygusti: #1 does sound surprising but not completely far-fetched. The description said nothing about the events being independent. If the events are assumed to be fully correlated, then the mode is having no successes at all. So I could get where that answer came from. – Meni Rosenfeld Feb 13 '17 at 12:35
  • 2
    Re Q3 - are you sampling with replacement or without? – user3490 Feb 13 '17 at 13:55
  • 6
    @user3490 - the wording implies without replacement, as it doesn't suggest that you pick one, then pick another one, but that you "pick out two". That's a single operation, and thus there is no replacement. Of course, being more explicit is better in a situation like this, but there's a natural understanding, just as "at random" has a natural understanding in a situation like this. – Glen O Feb 13 '17 at 14:30
  • 2
    On Q2, it depends on whether you carry your umbrella or wash your car. :-) – fixer1234 Feb 13 '17 at 18:44
  • 3
    @theonlygusti: There's a particular day every semester where I go around raving my own disbelief to my friends after the results of that quiz come in. Going on 5 years now. – Daniel R. Collins Feb 14 '17 at 00:20
  • 2
    Number 4 is the weird one to me. But I've heard the stories about the guy who was concerned about the LHC because there was a 50-50 chance it would end the universe. – Brian Tung Feb 14 '17 at 00:23
  • 4
    I just added #4. It's a distressingly widespread (and sticky!) misconception that all experiments with two discernible outcomes have a 50% chance of occurring. I kind wish someone would do a mass survey on that. – Daniel R. Collins Feb 14 '17 at 00:26
  • 8
    @DanielR.Collins I'm sure about half of all people surveyed would think it's true. – Kyle Feb 14 '17 at 19:55
  • 3
    Sounds like the most counter-intuitive question is "What's the probability of a college student having good intuition about probability?" – MartianInvader Feb 14 '17 at 21:24
  • 5
    (+1) The whole concept of something being "counter-intuitive" *only works if you have built up an "intuition" first!* From a pedagogical point of view it might well be worth building up basic skills and familiarity with probability first, before letting loose the examples that don't work out as expected. Otherwise (a) they're only likely to cause confusion and prevent the initial development of intuition, (b) the sense of awe/wonder/puzzlement that these examples are so valuable for will be absent anyway, since a new learner may not yet have expectations to contradict! – Silverfish Feb 14 '17 at 22:00
  • 9
    I find this really appalling. 1 and 4 are so basic in understanding the world, let alone mathematics or statistics, that everyone *needs* to have the correct understanding. – Chappers Feb 14 '17 at 23:57
  • In the stated form the first one, could even be the most correct choice.... As the circumstances are not elaborted. – Kami Kaze Feb 17 '17 at 09:49
  • 2
    What do you count as a correct answers for Q2? I ask since it does not seem to have an easy interpretation from the frequentist approach, which is the only thing 99% of the books teach. – Michael Bächtold Oct 21 '20 at 14:28
  • Q3 could also be interpreted as "either A or 2 but not both" or "either A or 2 or both" or even as two questions. – Fax Oct 22 '20 at 14:27

A while back, the xkcd blog posted this problem, which I found fascinating. Usually when I re-tell it, I do so slightly differently from the original author:

I have selected two numbers from $\mathbb{R}$, following some unknown and not necessarily independent distribution. I have written each number in a separate envelope. By fair coin toss, I select one of these two envelopes to open, revealing that number. I then ask the question "Is the number in the other envelope larger than this one?". You win if you guess correctly.

Can you win this game with probability $>\frac{1}{2}$? Note, that is a strict inequality. Winning with probability $=\frac{1}{2}$ is obviously easy.

Now, the solution to this starts out with a double-integral, so depending on the level of the class you're teaching it may not be appropriate.
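One way to see that the answer is yes, without the calculus: draw a private random threshold $r$ from any distribution with full support on $\mathbb{R}$, and guess that the hidden number is larger exactly when the revealed number falls below $r$. A simulation sketch (the adversary's number distribution and the threshold's scale are arbitrary choices for illustration):

```python
import random

def play(trials=100_000):
    wins = 0
    for _ in range(trials):
        # Adversary's two distinct numbers; any fixed scheme works for a demo.
        a, b = random.sample(range(1000), 2)
        revealed, hidden = (a, b) if random.random() < 0.5 else (b, a)
        # Our private threshold: any full-support distribution will do.
        r = random.gauss(500, 500)
        guess_hidden_larger = revealed < r
        wins += guess_hidden_larger == (hidden > revealed)
    return wins / trials

print(play())  # strictly above 0.5
```

Whenever $r$ lands strictly between the two numbers we guess correctly for sure, and otherwise we win with probability $\frac{1}{2}$, so any positive chance of landing in between pushes the win rate above $\frac{1}{2}$.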

  • 1
    Yeah, I saw this amazing example. I've mentioned it in the question "How to Win a Guessing Game" – MR_BD Feb 12 '17 at 09:42
  • 1
    Oh, sorry @LeilaHatami! I should have clicked on your links first. I'll take a look at that video and see if the proof I'm writing is more or less neat than that, but given that it's a Numberphile video, they've probably presented a very neat and approachable answer. – ymbirtt Feb 12 '17 at 09:58
  • Your welcome dear @ymbirtt . In fact there is no proof In that video. But it gets me familiar with this problem. I thank you for your kind response. – MR_BD Feb 12 '17 at 10:01
  • 1
    You must guarantee your joint distribution is continuous with support $\mathbb{R}^2$, if not it won't work. @LeilaHatami: In fact, you don't need a double integral to solve it. You just pick $r$ from some fixed continuous distribution $R$ with support $\mathbb{R}$ (say $N(0,1)$) and compare $r$ against the revealed amount $x$, and answer "yes" iff $r > x$. Let the unknown amount be $y$. Then $P(x<y \mid r<x,y) = P(x>y \mid r>x,y) = \frac12$. But $P(x>y \mid x>r>y) = 1$ and $P(x>r>y) > 0$. – user21820 Feb 13 '17 at 03:24
  • @user21820 Yeah, Thank you – MR_BD Feb 13 '17 at 03:38
  • 7
    I don't think the solution needs calculus at all. It needs that you can find a cumulative distribution function on $\Bbb R$ that is strictly monotonically increasing, like the $\arctan$. This means you accept the higher with greater probability than you accept the lower, which is all you need. – Ross Millikan Feb 13 '17 at 04:56
  • @user21820, I'll have to sit and think about your argument for a while before I can properly internalise it, but it looks sound enough. Could you edit it into my answer? – ymbirtt Feb 13 '17 at 08:27
  • @RossMillikan, I agree that the solution is actually fairly straightforward, but proving that it's correct is something I spent a rather long time on. – ymbirtt Feb 13 '17 at 08:33
  • 1
    I could edit my comment in, but you can edit your own answer too. =) @LeilaHatami: If it helps, the point where I use the continuity of the joint distribution of the envelop amounts is to guarantee that $P(x = y \lor r = x \lor r = y) = 0$. This implies that we are justified in considering only the cases I did in my earlier comment. To express the reasoning in more intuitive terms, the game chooses $x,y$ jointly and then swaps them with probability $\frac12$. So if $r < x,y$ or $r > x,y$, then we answer correctly with probability $\frac12$. But if $r$ is in-between, we answer correctly surely. – user21820 Feb 13 '17 at 12:15
  • 1
    @LeilaHatami: And the point where I use the continuity of the distribution for $r$ (actually all you need is Ross' requirement of strictly monotonic cumulative distribution function, but we get to choose $r$ anyway so it doesn't matter at all) is to ensure that there is positive probability that $r$ is in-between $x,y$, because this is precisely the case that gives us greater than half probability of answering correctly. – user21820 Feb 13 '17 at 12:21
  • @user21820 Yeah, I understood. a great thanks... – MR_BD Feb 13 '17 at 12:46
  • I don't think continuity of the distribution of numbers in the envelope is necessary. It is necessary to know that the numbers are equal with probability zero and that the distribution of your own random number $r$ has support over all of $\mathbb R.$ We work out that for any random numbers $x, y$ the conditional probability of guessing correctly is greater than $\frac12.$ Now we just need some way to translate that into a total probability. Proving that a total probability is actually mathematically defined might be tricky, but clearly if it exists it is greater than $\frac12.$ – David K Feb 13 '17 at 16:02
  • Damn, I was going to post this one. I remember reading it years ago in rec.puzzles (if anyone remembers what Usenet is) and having my mind blown (in a good way). – Brian Tung Feb 14 '17 at 00:31
  • @DavidK: I didn't mean to imply that continuity is necessary, just that it is an easy sufficient condition. Here we have two random variables, so if you don't use continuous random variables (in the measure-theoretic sense) how do you define your random variables? If you use a CDF to do it, then the CDF must be continuous (otherwise you can find a point in the 2d-plane with positive probability). – user21820 Feb 14 '17 at 12:06
  • @user21820 The guessing strategy gives a $>\frac12$ probability of success even if the numbers in the envelope are always $17$ and $17.3.$ In one variant of the problem, they are guaranteed to be integers. – David K Feb 14 '17 at 13:56
  • @DavidK: Ah I accidentally assumed independence; point masses do not necessarily cause the strategy to fail. So do you have a sufficient **and** necessary condition for the strategy to work? – user21820 Feb 14 '17 at 14:07

I particularly like the triple-or-nothing game:

You start with $1$ sweet $^{[1]}$ in the pot. At each step, you can either choose to leave the game with all the sweets in the pot, or you can continue the game. If you continue, a fair coin is flipped, and if it comes up heads then the sweets in the pot are tripled, but if it comes up tails then the pot is emptied.

If you can play this game only once, how many sweets would you be willing to pay to play? And how should you play? (Assume that you want to get the most sweets possible.)

$^{[1]}$ Let's not be money-minded here...

The naive (and incorrect) analysis computes that at any step, if there are $x$ sweets in the pot and you continue the game, the expected number of sweets in the pot becomes $1.5x$. Thus you should never stop. But that is absurd: if you never stop, you will never get any sweets! So when should you stop?

Worse still, a correct analysis will tell you that no matter how many sweets you pay, you can play in such a way that the expected number of sweets you leave with is more than what you paid! The (silly) conclusion is that you should be willing to pay any number of sweets to play!

If you think about it carefully, you will realize that expectation is a very poor indicator of the rationality of a choice. Instead, everyone has some risk aversion, or more generally a mapping from probability distributions to favourability. One possibility is that a probability distribution is unfavourable iff its median is not positive (representing no net gain). Clearly this game is never favourable to anyone with this kind of risk aversion unless they commit to playing for exactly one step. In real life, people evaluate distributions in a much more complicated way than just checking the median.

That said, a reasonable rule of thumb is that it is not worth making a decision whose estimated benefit does not have both positive mean and positive median. A positive mean is necessary for rules of thumb, otherwise you will not benefit in the long run. A positive median prevents other foolish decisions, such as playing the triple-or-nothing game for more than one step or paying more than 1.5 sweets to play it. More risk-averse people will play for zero steps and just take the initial sweet and leave!

This rule shows (reasonably) not only that it is not worth paying even 2 sweets to play the triple-or-nothing game once, but also that it is not worth offering the game for others to play! Any application of probability to real-life decisions should be able to deal with such situations.
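The gap between mean and median is easy to see in a short simulation. Here is a minimal sketch (the "commit to stopping after $k$ flips" strategy and the trial count are illustrative choices, not part of the original game description): the sample mean grows like $1.5^k$, yet the median outcome is an empty pot.

```python
import random
import statistics

def play(k, rng):
    """Play triple-or-nothing, committing in advance to stop after k coin flips."""
    pot = 1
    for _ in range(k):
        if rng.random() < 0.5:
            pot *= 3   # heads: the pot triples
        else:
            return 0   # tails: the pot is emptied
    return pot

rng = random.Random(0)
k = 2
outcomes = [play(k, rng) for _ in range(100_000)]
print("mean:  ", statistics.mean(outcomes))    # close to 1.5**k = 2.25
print("median:", statistics.median(outcomes))  # 0 — most plays end with nothing
```

With $k = 2$ you win $9$ sweets with probability $1/4$ and nothing otherwise, so the mean exceeds the $1$-sweet stake even though three plays out of four leave you empty-handed.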

[Further remarks...]

My claim about the rule of thumb being reasonable is that it should work quite well in real life. Whether it agrees with various mathematical models of human rationality is irrelevant. Secondly, my rule of thumb is merely for determining whether a single option is worth taking or not. To compare between multiple choices of which you must pick one, you would have to extend the rule of thumb. One possible way is to define the value of each choice to be the minimum of the mean and median benefit. Then you of course pick the choice with the maximum value. Thirdly, different people will of course have different ways to evaluate a choice based on its benefit's probability distribution (assuming it can even be translated to some real number). A very risk averse person might take the minimum of the 1st percentile (roughly speaking the minimum benefit you believe you will gain in 99% of the cases) and the mean (average benefit). Someone else may combine the percentiles and mean in a different fashion, such as taking $-\infty$ as the value if the 5th percentile is below some threshold (such as representing serious hurt), but taking the mean otherwise.

    I argue the problem is not in taking expectations, but the fact that "how much I value a pile of sweets" does not have a linear relationship with "how many sweets are in the pile". In fact, I assert the rational choice is to maximize the expectation of "how much I value the pile of sweets" –  Feb 12 '17 at 19:25
  • One strange point not mentioned in your remarks is that there is no strategy which would maximise the expected number of sweets you obtain (not even a mixed-strategy). This is unusual for this type of games. – Peter Franek Feb 12 '17 at 22:40
  • @Hurkyl: I've seen that explanation given a lot of times, but it is clearly flawed. It is obvious that no matter what valuation function you pick, I can pick a faster growing function in place of $( n \to 3^n )$, and you would be still stuck with the same problem. If you have completely zero risk aversity then yes you might rely totally on the moments to make decisions, but most people are risk averse to some degree and will not want to make decisions with too high probability of intolerable outcome regardless of the potential or average benefit. – user21820 Feb 13 '17 at 02:56
    @Hurkyl: By "moments" I mean the moments of the distribution, including the mean, variance and so on. And my analogy was supposed to be so that the number of sweets is precisely how much you value the pile. The problem therefore arises independent of you valuation as long as it is unbounded. If you believe that people have bounded valuation functions then yes that's another way to avoid the 'paradox'. But in my experience people do look at percentiles to make important decisions, whether conscious of it or not. – user21820 Feb 13 '17 at 03:00
  • @PeterFranek: Yes that is an essential part of what makes this game unusual. I didn't mention it because I like to let the reader figure these kind of extra bits out. =) Basically, this game is unbounded and also suffers from the horizon effect throughout, with the infinite play having the worst payoff. The horizon effect is in fact a difficult problem in the design of computer players for well-balanced games, and there are techniques to ameliorate that such as quiescence search. – user21820 Feb 13 '17 at 03:08
    @user21820: I'm not sure whether the valuation is bounded, but I'm pretty sure it becomes *negative* for sufficiently large $n$, at which point I'd keep playing in hopes of losing everything so that I'm not responsible for dealing with a cache of billions sweets. I would say the utility goes to $-\infty$ as $n \to \infty$ (as that many sweets crush me and the whole galaxy into nothingness), but the game definitely can't go on that long. –  Feb 13 '17 at 03:37
  • @user21820: I agree that actual people look at the expected quantity, but I do not count that as making the *rational* choice (unless the utility really is proportional to quantity) –  Feb 13 '17 at 03:38
  • @Hurkyl: Hahaha. If I kick you out of the game when the pot becomes empty, then the game will almost certainly end within $30$ steps, way before the galaxy disappears. But seriously, if your evaluation function is not even monotonic, then how can you make single decisions in a uniform and meaningful way, when the total effect of multiple decisions can give a worse outcome than if you had chosen individually poorer choices? And do you have a counter-example to my 'reasonable rule of thumb' in real-life situations? – user21820 Feb 13 '17 at 03:54
  • @user21820: Your rule of thumb is impossible to apply, for example, to any binary decision where one option has a greater expected utility, but gives worse results more than 50% of the time. Whichever option you choose violates the rule of thumb as compared to the other choice. –  Feb 13 '17 at 04:20
  • @Hurkyl: I think you misunderstood my rule of thumb. It only tells you whether a choice is worth it. To compare choices, one extension is to take the minimum of the mean and median as the value, so that each choice has a specific value and you can choose the choice with the highest (less negative) value. Any counter-example to that? (Sorry I misread your comment slightly in my (now deleted) last comment.) – user21820 Feb 13 '17 at 04:49
  • This could use a mention of https://en.wikipedia.org/wiki/St._Petersburg_paradox – JollyJoker Feb 13 '17 at 08:14
    @user21820: I'm afraid your analysis is way off. It follows from some very reasonable axioms of preference that you can find a von Neumann-Morgenstern utility function, one that the agent wants to maximize the expected value of. For humans, it will usually be sublinear in things like "money" or "sweets". This sublinearity is what is often called "risk aversity". But you can't have it both ways - you can't say both "the utility grows exponentially" and "but you don't *really* want to maximize your expected utility". By definition you want to maximize your expected utility. – Meni Rosenfeld Feb 13 '17 at 09:38
  • @user21820: Also, it's impossible to generalize your "rule of thumb" in a consistent way. The correct way to do decision making is to figure out what your utility function really is and maximize its expected value. The resolution to "paradoxes" such as the one in your answer is best done by requiring that utility functions are bounded - which also models humans very well, for a sufficiently large bound. – Meni Rosenfeld Feb 13 '17 at 09:43
  • @user21820: See also https://en.wikipedia.org/wiki/Von_Neumann%E2%80%93Morgenstern_utility_theorem. I'll iterate again: "Risk aversity" is a feature of the choice of utility function. Once the utility function is determined, you don't add risk aversity on top of it. – Meni Rosenfeld Feb 13 '17 at 09:50
  • @MeniRosenfeld: I don't agree with expected utility theory, and as you might expect also disagree with those axioms. If you had read [my comment to Agnishom's answer](http://math.stackexchange.com/questions/2140493/counterintuitive-examples-in-probability/2140679?noredirect=1#comment4402832_2140502) you would have realized that axiom 3 **does not hold** for most rational people even if they choose between lotteries based on a monotonic function of mathematical expectation. – user21820 Feb 13 '17 at 11:39
  • @MeniRosenfeld: I already said to Hurkyl that if you want to **compare** choices one way is to take the minimum of mean and median as your utility function. Of course there is no "risk aversity" on top of it. I don't know where you got the idea that there was; I certainly didn't say it. I instead clearly implied that different people will have different utility functions; read [this comment for instance](http://math.stackexchange.com/questions/2140493/counterintuitive-examples-in-probability/2140679?noredirect=1#comment4405423_2140679). – user21820 Feb 13 '17 at 11:42
  • @MeniRosenfeld: So you're actually wrong in claiming that my rule of thumb cannot be generalized consistently, because you assumed that I must accept some axioms which I did not. The claim I made is that my rule of thumb works in real life, **not** in some mathematical model, so if you have a **real-life** situation in which you believe my rule fails, I would be interested to know. – user21820 Feb 13 '17 at 11:47
  • @user21820: 1. It makes no sense to me how you could disagree with these axioms. 2. It was quite clearly implied in your comments that you do expect there to be risk aversity on top of the utility function. 3. Even if for some reason you reject the axioms, your rule of thumb still is not consistent. 4. It doesn't even make sense to contrast mathematical model and real-life situation, because "mean" and "median" exist only for a model, not for "real life". 5. I maintain my position that you are misguided but I doubt further discussion will lead to our agreement, so we should stop here. – Meni Rosenfeld Feb 13 '17 at 12:31
  • @MeniRosenfeld: Sure you are free to stop here. But I have a strong background in mathematical logic and I certainly know the difference between mathematical models and "real life". Did you even read my linked comment, which explains how the Ellsberg paradox is only a paradox when you accept expected utility theory (including all those axioms)? In that very comment I made clear why rational people can and should behave contrary to that theory. Even the Wikipedia page states that observationally people don't follow expected utility theory. – user21820 Feb 13 '17 at 12:44
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/53552/discussion-between-user21820-and-meni-rosenfeld). – user21820 Feb 13 '17 at 12:44
  • Hi, Do you know why the question has been put on hold? – MR_BD Feb 13 '17 at 15:48
    @LeilaHatami: You can see the reason that the 5 people (not me) gave; namely they think your question is too broad. Well it is broad, but you tagged it as [tag:big-list] and it is an interesting list so I wouldn't close it myself. Anyway, there are already 3 people voting to re-open it; 2 more and it will be opened again! – user21820 Feb 13 '17 at 16:00
  • Thanks again. I can see their names but they didn't say anything special – MR_BD Feb 13 '17 at 16:15
  • I think this example makes more sense with money than with sweets. If I own five sweets I might well be willing to risk them all on an infinitesimal chance of becoming God-emperor of the confectionary industry, because I don't like to eat then very much and in any case I can always buy more. Risk-aversity becomes a more important factor if we play with money instead. – N. Virgo Jun 07 '18 at 01:54
  • @Nathaniel: Haha I was joking with the sweets; I meant any unit of value. It seems your objection is based on the sweets you start out with having individually less value than if you gained a trillion sweets and hence had a significant market share of confectionery. That is, of course, not intended by my example. =) – user21820 Jun 07 '18 at 02:43

Strongly related to the OP's example is this consequence of the arc sine law for last visits. Let's assume the game is played with a fair coin.

Theorem (false) In a long coin-tossing game each player will be on the winning side for about half the time, and the lead will pass not infrequently from one player to the other.

The following text is from the classic An Introduction to Probability Theory and Its Applications, volume 1, by William Feller.

  • According to widespread belief, a so-called law of averages should ensure the theorem above. But in fact the theorem is wrong, and contrary to the usual belief the following holds:

    With probability $\frac{1}{2}$, no equalization occurs in the second half of the game, regardless of its length. Furthermore, the probabilities near the end point are greatest.
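Feller's claim is easy to check empirically. A sketch (the walk length of 100 steps and the trial count are arbitrary choices): simulate a fair coin-tossing game and record how often the score never returns to zero in the second half.

```python
import random

def no_late_equalization(steps, rng):
    """True if the running score never returns to 0 after step steps//2."""
    score = 0
    equalized_late = False
    for t in range(1, steps + 1):
        score += 1 if rng.random() < 0.5 else -1
        if t > steps // 2 and score == 0:
            equalized_late = True
    return not equalized_late

rng = random.Random(1)
trials = 20_000
frac = sum(no_late_equalization(100, rng) for _ in range(trials)) / trials
print(frac)  # close to 1/2, as the arc sine law predicts
```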

    @user21820: No worries! Everything's fine. – epi163sqrt Feb 13 '17 at 07:57
    The "law of averages" is definitely a good fallacy to destroy early. – mbomb007 Feb 13 '17 at 20:09
    What a delightful read (first link): *One may summarize these results by stating that one should not get drunk in more than two dimensions.* – ojdo Feb 13 '17 at 23:34
    @ojdo: Many thanks for your nice comment. If you like this paper, I'm pretty sure you will find the chapter III, *Fluctuations of coin tossing and random walks* of Feller's book inspiring and valuable to read. – epi163sqrt Feb 14 '17 at 07:54
  • This makes me wonder, if you have a coin-toss game with some unknown bias, if after some time the two players equalize, who is more likely to win? The intuition I get from this answer is that it's the one who was previously behind – Ben Aaronson Feb 14 '17 at 13:51
  • @BenAaronson: As I stated right in the beginning here we assume a fair coin. – epi163sqrt Feb 14 '17 at 16:07
    @BenAaronson: Why? If there is unknown bias but each coin toss is independent, and at some point heads and tails equalize, then you have no information whatsoever about the bias because the data is symmetric due to the independence. – user21820 Feb 15 '17 at 04:35
    @MarkusScheuer I realise, my comment was more of an extension – Ben Aaronson Feb 15 '17 at 14:20

A famous example is the St. Petersburg paradox:

Consider a game in which you earn $2^n\:\$$ if you get $n$ consecutive heads in fair coin tosses. The fair entrance fee of this game is infinite.
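One way to see the divergence is to truncate the game. Under the reading that you win $2^n$ dollars when exactly $n$ heads appear before the first tail (my assumption about the payoff scheme; the statement above is terse), every value of $n$ contributes the same constant amount to the expectation, so the truncated fair fee grows without bound. A sketch:

```python
def truncated_expectation(N):
    """Expected payoff when the game is cut off after N possible head-runs.

    Assumed scheme: payoff 2**n with probability 2**-(n+1)
    (n heads, then a tail)."""
    return sum(2**n * 2**-(n + 1) for n in range(N))

for N in (10, 100, 1000):
    # each term contributes exactly 1/2, so the truncated expectation is N/2
    print(N, truncated_expectation(N))
```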

    I'd just want to point out that this idea can be extended to a 'more strange' triple-or-nothing game as described in [my answer](http://math.stackexchange.com/a/2140679/21820), and we do not need any notion of $\infty$ to get apparently counter-intuitive results. – user21820 Feb 12 '17 at 14:28
  • @user21820 Yeah. Thank you for mentioning it. But I tried to state it in the simplest form. –  Feb 12 '17 at 14:43
    I did not get this one .What is fair entrance fee ? –  Feb 12 '17 at 22:32
    @A---B: It's how much you have to pay to play the game such that the net expected outcome is zero. But if defined this way it would be wrong to say "fair entrance fee is $\infty$" since $\infty-\infty$ is not defined. See my answer for explanation of a related game where there is no finite fair entrance fee. If however you define "fair entrance fee" as "expected payoff" (using say measure theory for expectation) then it is indeed $\infty$ in these two games, though it does not have a clear real-world meaning this way. – user21820 Feb 13 '17 at 03:37
    You didn't say when the game *ends.* So I'll just keep on playing. – Wildcard Feb 14 '17 at 03:32
  • @Wildcard If you have an infinite source of money, DO IT. What's better than playing dummy games when you have enough(!) money –  Feb 15 '17 at 19:16
  • (Old thread, I know) @user415542 I interpret Wildcard's comment as "the game description does not mention how many coin tosses you are allowed 'per game', so there is no 'end condition'". Based on that, even if you earned log(n) dollars if you get n consecutive heads, the fair entrance fee is still not finite, since you can continue to flip coins until you get enough consecutive heads to cover your entrance fee. – Tonio Jun 07 '18 at 20:36

Bertrand's Paradox

Given two concentric circles ($S_1$, $S_2$) with radii $R_1=r$ and $R_2=\frac{r}2$, what is the probability, upon choosing a chord $c$ of the circle $S_1$ at random, that $c\:\cap\: S_2 \neq \emptyset$ ?

Simply speaking, your task is to

choose a chord of the larger circle at random and find the probability that it will intersect the smaller circle.

Surprisingly, Bertrand's Paradox offers three distinct yet valid solutions.

The same problem can also be stated as:

Given an equilateral triangle inscribed in a circle, find the probability of randomly choosing a chord of the circle greater than the length of a side of the triangle.

The counter-intuitive part is that the answer to the stated problem is $\frac12$, $\frac13$, or even $\frac14$, depending on the interpretation, and all three answers are perfectly valid.

The crucial reason why there are three solutions is that the method of selecting a chord at random is different in each case.

Here's the Wikipedia page for details on how each value is obtained and through what steps.
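The three answers can be reproduced by simulating the three selection methods. In each case the chord meets the inner circle of radius $r/2$ exactly when the chord's midpoint lies within distance $r/2$ of the centre, so it suffices to sample that midpoint distance. A sketch with $r = 1$ (sample size is an arbitrary choice):

```python
import math
import random

rng = random.Random(2)
N = 200_000

def frac(samples):
    """Fraction of sampled midpoint distances below 1/2 (i.e. chord hits S_2)."""
    return sum(d < 0.5 for d in samples) / N

# Method 1: chord through two uniformly random points on the circumference.
# The midpoint distance is |cos((t1 - t2)/2)|.
m1 = frac(abs(math.cos((rng.uniform(0, 2 * math.pi)
                        - rng.uniform(0, 2 * math.pi)) / 2))
          for _ in range(N))

# Method 2: chord perpendicular to a random radius, at a uniform distance.
m2 = frac(rng.random() for _ in range(N))

# Method 3: chord whose midpoint is uniform in the disk
# (distance of a uniform point in the disk has density 2d, i.e. sqrt(U)).
m3 = frac(math.sqrt(rng.random()) for _ in range(N))

print(m1, m2, m3)  # near 1/3, 1/2, 1/4 respectively
```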

I remember that a professor had begun my high-school level probability class using Bertrand's Paradox as an introductory example.

    This is less "counterintuitive" and more a cautionary tale about the use of the term "random". Still a good answer, as it reveals a critical detail of probability in an effective and interesting way. – Glen O Feb 13 '17 at 15:13
    The reason I used the word counter-intuitive is because one would not generally expect to have multiple valid solutions for a seemingly ordinary problem. And yes, what you say makes sense. – axolotl Feb 13 '17 at 15:16
    The seeming validity of all solutions comes from the fact that there is a vagueness in the question, though. Not unlike a question saying "Roll a dice. What is the chance of rolling a 6?" - if it's a standard dice and unweighted, then it's about 1/6, but it could be weighted, or it could fail to have a 6 on it, or have only 6s, or have more/less than 6 sides, etc. – Glen O Feb 13 '17 at 15:54
  • Interesting. How would one make the said question more specific? What condition(s) or additional premises need one include so that an ordinary mathematician would reach at exactly one solution? – axolotl Feb 13 '17 at 16:11
    If you mean the Bertrand's Paradox question, then one needs to define how the chord is randomly selected. The word "random" usually implies "uniformly random" in a situation like this, but you need to specify the domain. So you might say "choose a chord of the larger circle by randomly selecting two points on its circumference" (implying the two points are chosen uniformly), which gives 1/3 as the only valid answer. In short, the missing information is the "selection method" - with it defined, the solution is unique. – Glen O Feb 13 '17 at 16:32

Base rate fallacy

If presented with related base rate information (or generic information) and specific information (information only pertaining to a certain case), the mind tends to ignore the former and focus on the latter.

A group of police officers have breathalyzers displaying false drunkenness in 5% of the cases in which the driver is sober. However, the breathalyzers never fail to detect a truly drunk person. One in a thousand drivers is driving drunk. Suppose the police officers then stop a driver at random, and force the driver to take a breathalyzer test. It indicates that the driver is drunk. We assume you don't know anything else about him or her. How high is the probability he or she really is drunk?

An intuitive first answer might be as high as 0.95, but the correct probability is only about 0.02.

Solution: use Bayes' theorem.

The goal is to find the probability that the driver is drunk given that the breathalyzer indicated he/she is drunk, which can be represented as $${\displaystyle p(\mathrm {drunk} |D)}$$
where "D" means that the breathalyzer indicates that the driver is drunk.

Bayes's theorem tells us that

$$ {\displaystyle p(\mathrm {drunk} |D) = {\frac {p(D|\mathrm {drunk} )\, p(\mathrm {drunk} )}{p(D)}}} $$

We were told the following in the first paragraph:

$${\displaystyle p(\mathrm {drunk} )=0.001} $$ $${\displaystyle p(\mathrm {sober} )=0.999} $$ $${\displaystyle p(D|\mathrm {drunk} )=1.00} $$ $${\displaystyle p(D|\mathrm {sober} )=0.05} $$

As you can see from the formula, one needs p(D) for Bayes' theorem, which one can compute from the preceding values using
$${\displaystyle p(D)=p(D|\mathrm {drunk} )\,p(\mathrm {drunk} )+p(D|\mathrm {sober} )\,p(\mathrm {sober} )} $$

which gives $$ {\displaystyle p(D)=(1.00\times 0.001)+(0.05\times 0.999)=0.05095} $$

Plugging these numbers into Bayes' theorem, one finds that $$ {\displaystyle p(\mathrm {drunk} |D)={\frac {1.00\times 0.001}{0.05095}}=0.019627 \approx 0.02 } $$

A more intuitive explanation: on average, for every 1,000 drivers tested, 1 driver is drunk, and for that driver the test is certain to give a true positive, so there is 1 true positive result. The other 999 drivers are not drunk, and among them there are 5% false positives, giving 49.95 false positive results.
Therefore, the probability that a driver with one of the $$1 + 49.95 = 50.95$$ positive test results really is drunk is $$ {\displaystyle p(\mathrm {drunk} |D)=1/50.95\approx 0.019627} $$
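The whole computation above fits in a few lines of code; a sketch:

```python
# Prior and test characteristics, as given in the problem statement.
p_drunk, p_sober = 0.001, 0.999
p_pos_given_drunk = 1.00   # the breathalyzer never misses a drunk driver
p_pos_given_sober = 0.05   # 5% false positives on sober drivers

# Total probability of a positive test, then Bayes' theorem.
p_pos = p_pos_given_drunk * p_drunk + p_pos_given_sober * p_sober
p_drunk_given_pos = p_pos_given_drunk * p_drunk / p_pos
print(p_drunk_given_pos)  # ≈ 0.0196, i.e. about 2%
```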

    I find this fallacy easier to explain with whole numbers. For every 20,000 drivers tested, 20 are drunk (and will show drunk on a test), and 999 are not drunk but will falsely show as drunk on a breathalyzer test. So the odds that someone showing drunk is actually drunk is 20/1019...unless, of course, he or she was pulled over for swerving and running a red light. ;) – Wildcard Feb 14 '17 at 04:13
  • That's actually a more simple and better example. And yes, the randomness of choosing drivers is inherently important, otherwise another parameter which defines driver's competency while being drunk or sober comes into play :D – ABcDexter Feb 14 '17 at 05:29

I have also had a lecture on this excellent topic.

Unfortunately, my lecture notes are in Czech, but I can translate some paradoxes from there:

Monty Hall

Monty Hall is in my opinion the most famous probabilistic paradox. It is well described here and on the Wiki. So I will just give you a way to make it feel intuitive, to persuade other people that the probability really is what the computation says. It is quite impressive, almost like a magician's trick :-)

Take a pack of cards and let someone draw a random card. Tell him that the goal is to draw the ace of hearts, and that he should not look at the chosen card. Then show the audience all the remaining cards except one, none of which is the ace of hearts. There are now two hidden cards: one in your hand and one in his. Finally, he may change his first guess. Most people do, and they are most likely correct :-)
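The card version is easy to simulate. A sketch (seed and trial count are arbitrary): after 50 non-aces are revealed, switching wins exactly when the first pick was not the ace of hearts, so it succeeds with probability $51/52$.

```python
import random

rng = random.Random(3)
trials = 20_000

def switch_wins_once(rng):
    ace = rng.randrange(52)    # position of the ace of hearts in the pack
    pick = rng.randrange(52)   # the spectator's randomly drawn card
    # With 50 non-aces revealed, the other hidden card is the ace
    # unless the spectator picked it himself — so switching wins iff:
    return pick != ace

frac = sum(switch_wins_once(rng) for _ in range(trials)) / trials
print(frac)  # close to 51/52 ≈ 0.98
```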

Tennis-like game

There are two players, Alice and Bob, playing a tennis-like game. On each point, one player serves the ball, and the probability of winning the point depends on who serves. The player who first reaches, say, 11 points wins the match. Alice serves the first ball. There are three possible serving schemes:

  1. The winner of the previous ball serves the next.
  2. The loser of the previous ball serves the next.
  3. Service alternates regularly.

One would expect scheme 1 to help stronger players and scheme 2 to help weaker players. The paradox is that the match-winning probabilities do not depend on the chosen scheme.

Proof sketch: pregenerate 11 cards recording the winners of Alice's serves (pack A) and 10 cards recording the winners of Bob's serves (pack B). Each of Alice's (or Bob's) serves can then be modeled by drawing the next card from pack A (or B). One can check that these 21 cards suffice for any of the three schemes, and that the winner is determined by the cards alone: exactly one player is written on at least 11 of them.
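The scheme-independence can also be verified exactly with a small dynamic program. A sketch (the point-win probabilities 0.7 and 0.45 are arbitrary choices): compute Alice's match-win probability under each scheme and observe that the three values coincide.

```python
from functools import lru_cache

def match_win_prob(pA, pB, target, scheme):
    """Probability that Alice wins a first-to-`target` match.

    pA: P(Alice wins a point on her own serve)
    pB: P(Alice wins a point on Bob's serve)
    scheme: 'winner', 'loser' or 'alternate'; Alice serves first.
    """
    @lru_cache(maxsize=None)
    def f(a, b, server):
        if a == target:
            return 1.0
        if b == target:
            return 0.0
        p = pA if server == 'A' else pB        # chance Alice takes this point
        if scheme == 'winner':                 # point winner serves next
            return p * f(a + 1, b, 'A') + (1 - p) * f(a, b + 1, 'B')
        if scheme == 'loser':                  # point loser serves next
            return p * f(a + 1, b, 'B') + (1 - p) * f(a, b + 1, 'A')
        nxt = 'B' if (a + b) % 2 == 0 else 'A'  # alternate; Alice served point 1
        return p * f(a + 1, b, nxt) + (1 - p) * f(a, b + 1, nxt)

    return f(0, 0, 'A')

probs = [match_win_prob(0.7, 0.45, 11, s) for s in ('winner', 'loser', 'alternate')]
print(probs)  # all three values agree, as the card argument predicts
```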


Candy bag

I have a bag of candies: 123 caramel candies and 321 mint candies. Every morning I draw candies from the bag at random and eat them as long as they are all of the same kind. When I draw a candy of the other kind, I put it back and stop for the day. What is the probability that the last candy I eat is a caramel one?

Answer: 1/2. (One might expect it to be less than 1/2, since there are fewer caramel candies.)

Proof: It suffices to show that on every morning, the probability that all caramel candies get eaten equals the probability that all mint candies get eaten. Imagine that each morning the candies are randomly ordered and I draw them from left to right. I eat all the caramel candies exactly when the order is "first all caramel candies, then the mint ones"; I eat all the mint candies exactly when the order is the opposite. These two orderings are equally likely.
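A quick simulation backs up the 1/2 answer. A sketch (smaller candy counts than 123 and 321 are used to keep it fast; the claim is independent of the counts):

```python
import random

def last_eaten(caramel, mint, rng):
    """Follow the morning ritual; return the kind of the very last candy eaten."""
    bag = ['caramel'] * caramel + ['mint'] * mint
    while bag:
        rng.shuffle(bag)
        kind = bag[0]
        i = 0
        while i < len(bag) and bag[i] == kind:
            i += 1            # eat candies while they are all the same kind
        bag = bag[i:]         # the first differing candy (if any) goes back
    return kind

rng = random.Random(4)
trials = 20_000
frac = sum(last_eaten(3, 8, rng) == 'caramel' for _ in range(trials)) / trials
print(frac)  # close to 1/2, despite 3 caramel vs 8 mint
```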

Wolf on a circle

There is a wolf at one vertex of a regular n-gon and a sheep at each of the remaining vertices. At each step, the wolf moves to a randomly chosen adjacent vertex and, if there is a sheep there, eats it. The process ends when the wolf has eaten n-2 sheep (so exactly one sheep remains).

Intuitively, the sheep at the vertex opposite the wolf is in the best position. The paradox is that all sheep have the same probability of survival.

Proof: Take any sheep S. The wolf will almost surely reach a vertex adjacent to S at some point; at that first moment, the sheep at the other vertex adjacent to S has not yet been eaten. So for S to survive, the wolf must go around the whole circle, from one neighbour of S to the other, without visiting S. The probability of this does not depend on the position of S.
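A simulation sketch (n = 6 and the trial count are arbitrary small choices): every one of the n-1 sheep should survive with frequency close to 1/(n-1).

```python
import random
from collections import Counter

def survivor(n, rng):
    """Wolf random-walks on the n-gon (starting at vertex 0) until one sheep is left."""
    sheep = set(range(1, n))
    pos = 0
    while len(sheep) > 1:
        pos = (pos + rng.choice((-1, 1))) % n
        sheep.discard(pos)   # eat the sheep at this vertex, if any
    return next(iter(sheep))

rng = random.Random(5)
n, trials = 6, 25_000
counts = Counter(survivor(n, rng) for _ in range(trials))
for v in range(1, n):
    print(v, counts[v] / trials)  # each close to 1/5
```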

Simpson's Paradox

There is research on two medical cures, A and B.

200 people tried cure A: it helped 110 people (50 men, 60 women) and did not help 90 people (60 men, 30 women).

210 people tried cure B: it helped 120 people (30 men, 90 women) and did not help 90 people (40 men, 50 women).

So overall, cure B looks better, since 120:90 > 110:90.

But if you are a man, you can consider just the men's statistics: 50:60 > 30:40, so cure A is more appropriate.

And if you are a woman, you can consider just the women's statistics: 60:30 > 90:50, so cure A is again more appropriate.

Shocking, isn't it? :-)
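The reversal can be checked mechanically; a sketch using exact fractions, with the group totals read off from the counts above:

```python
from fractions import Fraction

def rate(helped, total):
    """Success rate of a cure for a given group, as an exact fraction."""
    return Fraction(helped, total)

A_men, A_women = rate(50, 50 + 60), rate(60, 60 + 30)   # cure A per group
B_men, B_women = rate(30, 30 + 40), rate(90, 90 + 50)   # cure B per group
A_all, B_all = rate(110, 200), rate(120, 210)           # aggregated rates

print(B_all > A_all)                      # True: B looks better overall
print(A_men > B_men, A_women > B_women)   # True True: A is better for each group
```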

Mirek Olšák
  • These are all very nice, and I hadn't seen #2 or #3 before, thanks! – Especially Lime Feb 16 '17 at 10:21
  • I like the Candy-chain one a lot! But your 'proof' for Wolf-on-the-circle does not seem to yield a proof of the claim of equal probability. Can you add that? – user21820 Feb 18 '17 at 09:17
  • Sufficient now? – Mirek Olšák Feb 18 '17 at 11:48
  • Yep. I think I didn't get notified of your response. What I missed earlier was simply that the wolf will almost surely get adjacent to any specified sheep at some point, and after the first time your added paragraph shows that it has a certain constant probability of getting to the other side of that sheep without eating it. Nice! I already upvoted so I can't do it again. =) – user21820 Mar 05 '17 at 11:56

The Shooting Room Paradox

A single person enters a room and two dice are rolled. If the result is double sixes, he is shot. Otherwise he leaves the room and nine new players enter. Again the dice are rolled, and if the result is double sixes, all nine are shot. If not, they leave and 90 new players enter, and so on (the number of players increasing tenfold with each round). The game continues until double sixes are rolled and a group is executed, which is certain to happen eventually (the room is infinitely large, and there's an infinite supply of players).

If you're selected to enter the room, how worried should you be? Not particularly: Your chance of dying is only 1 in 36. Later your mother learns that you entered the room. How worried should she be? Extremely: About 90 percent of the people who played this game were shot. What does your mother know that you don't? Or vice versa?
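Both figures can be computed directly. If the game ends in round $k$, the group sizes were $1, 10, \dots, 10^{k-1}$, so the executed group makes up roughly 90 percent of everyone who played, while each individual entering the room faced only a $1/36$ risk. A sketch:

```python
def fraction_shot(k):
    """Fraction of all participants shot if double sixes first occur in round k."""
    entered = sum(10**i for i in range(k))   # 1 + 10 + ... + 10**(k-1)
    return 10**(k - 1) / entered

print(1 / 36)                   # each individual's conditional risk: ~0.028
for k in (2, 5, 10):
    print(k, fraction_shot(k))  # ~0.9 no matter when the game ends
```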


I think the most stunning example is given by non-transitive dice.

Take three cubic dice with the following numbers on their sides:

  • Die $A:$ $3 \: 3 \: 3 \: 3 \: 3 \: 6$
  • Die $B:$ $2 \: 2 \: 2 \: 5 \: 5 \: 5$
  • Die $C:$ $1 \: 4 \: 4 \: 4 \: 4 \: 4$

Now I offer you the following game: you choose a die, then I choose another one, and then we both roll; the highest number wins.

No matter which die you choose, I can choose another one that wins against your choice more often than it loses.
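Brute force over all 36 equally likely outcomes confirms the cycle $A \to B \to C \to A$; a sketch:

```python
from itertools import product

A = [3, 3, 3, 3, 3, 6]
B = [2, 2, 2, 5, 5, 5]
C = [1, 4, 4, 4, 4, 4]

def win_prob(x, y):
    """Probability that die x beats die y (no ties are possible with these dice)."""
    return sum(a > b for a, b in product(x, y)) / 36

print(win_prob(A, B), win_prob(B, C), win_prob(C, A))  # 7/12, 7/12, 25/36 — all > 1/2
```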

  • Thank you. It's another version of "Penney's Game" which I've mentioned in the question – MR_BD Feb 13 '17 at 07:10
  • Oops. The Article you've linked is written by James Grime, Who is the talker of the video I've mentioned... – MR_BD Feb 13 '17 at 07:15
  • @LeilaHatami: Well, I didn't take the time to look through the videos, so I still don't know Penney's game. – celtschk Feb 13 '17 at 08:47
  • It has a Wiki page: https://en.wikipedia.org/wiki/Penney's_game – MR_BD Feb 13 '17 at 09:51
    We started with Brad Efron's non-transitive dice and made them even more paradoxical: Lake Wobegon Dice (see below). – David G. Stork Feb 14 '17 at 01:05
  • Section 14.3 of [Chapter 14 of MIT's Mathematics for Computer Science](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-042j-mathematics-for-computer-science-fall-2010/readings/MIT6_042JF10_chap14.pdf) details "even stranger dice." – Wildcard Feb 14 '17 at 04:24

The boy or girl paradox already mentioned by Agnishom has an interesting variation:

''Suppose we were told not only that Mr. Smith has two children, and one of them is a boy, but also that the boy was born on a Tuesday: does this change the previous analyses?'' (for the question ''what is the probability that both children are boys'')?

Using some elementary computations with Bayes' formula, one finds that the seemingly useless information that the boy was born on a Tuesday changes the result: the probability that both children are boys rises from $\frac13$ to $\frac{13}{27}$.

To understand the intuition behind this, consider an extreme case where you knew that one boy was born on December $30$. Then it is very unlikely that the other child was born on that date too, so one child is ''specified'' by the date information. This reduces the question to ''is the other child a boy?'' and changes the probability from $\frac13$ to approximately $\frac12$.
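Since the sample space is finite, the Tuesday version is easy to verify by counting the 196 equally likely (sex, weekday) pairs directly:

```python
from itertools import product
from fractions import Fraction

# A child is a (sex, weekday) pair: 14 equally likely types.
children = [(sex, day) for sex in "BG" for day in range(7)]
families = list(product(children, children))          # 196 equally likely families

tuesday_boy = [f for f in families if ("B", 2) in f]  # "day 2" stands for Tuesday
both_boys = [f for f in tuesday_boy if f[0][0] == "B" and f[1][0] == "B"]
prob = Fraction(len(both_boys), len(tuesday_boy))
print(prob)   # 13/27
```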

However, I do not recommend using this example for teaching, as there are many interpretations of this paradox (partially depending on language nuances of the formulation) and it can add more confusion than it clears up.

  • 5,261
  • 5
  • 41
  • 59
Peter Franek
  • 10,750
  • 4
  • 24
  • 45
  • 2
    The last couple paragraphs make it perfectly clear, actually, culminating in this sentence: *"The moral of the story is that these probabilities do not just depend on the known information, but on how that information was obtained."* As an IT professional dealing with such questions as, "How many servers out of 10000 are likely to have this issue given that these 12 servers have the issue?" and the fact that data is often gained by sometimes badly written SQL queries (that filter out too much or too little), this is a *crucial* thing to learn. Thanks for this. – Wildcard Feb 14 '17 at 04:05
  • Actually, this seems to be extremely related to "sampling bias"; is it not? – Wildcard Feb 14 '17 at 04:09
  • @Wildcard Might be. It may also be related to the philosophical question how good the concept of "probability" is well-defined at all (see the Sleeping beauty paradox mentioned somewhere above)... but I'm not an expert on this. – Peter Franek Feb 14 '17 at 09:55
  • +1 for the last paragraph. If the question is phrased as "A mother with two children tells you that at least one is a girl", you actually don't have enough information to work out the probabilities, and I'd argue that based on common-sense assumptions, 1/2 is a better answer than 1/3. I think the most clear way to phrase this is always to state it as a question and answer: "You ask a mother if either of her children is a girl, she says yes." – Ben Aaronson Feb 14 '17 at 13:55
  • @BenAaronson Ah, now I'm a bit ashamed because honestly, I don't see a difference between these two questions. I'm not a native speaker, however :) – Peter Franek Feb 14 '17 at 13:58
  • The difference is that in the first version, you don't know why the mother decided to tell you that. Sounds silly, but is actually important. For example, imagine she'd been prompted "Tell me the gender of one of your children". A neutral assumption might be that she picks a child at random and tells you their gender, or picks a gender at random and tells you that one if either child matches. Both of those answers would give an overall probability of 1/2. Only if she has some bias towards always preferring to mention girls would you get 1/3 (a similar bias towards boys would give 0) – Ben Aaronson Feb 14 '17 at 14:03
  • 1
    What this calls into question in my mind is not the concept of probability, but the human tendency to neglect the necessary information on which to compute probabilities--as a result of which we unwittingly sabotage our own questions in probability. (The same thing happened with the Monty Hall puzzle in another question.) See http://math.stackexchange.com/a/4401 for further discussion of the flaws in the above presentation of the boy-girl question. – David K Feb 14 '17 at 14:41

This is one of my favorites:

100 prisoners problem

The director of a prison offers 100 death row prisoners, who are numbered from 1 to 100, a last chance. A room contains a cupboard with 100 drawers. The director randomly puts one prisoner's number in each closed drawer. The prisoners enter the room, one after another. Each prisoner may open and look into 50 drawers in any order. The drawers are closed again afterwards. If, during this search, every prisoner finds his number in one of the drawers, all prisoners are pardoned. If just one prisoner does not find his number, all prisoners die. Before the first prisoner enters the room, the prisoners may discuss strategy—but may not communicate once the first prisoner enters to look in the drawers.

If every prisoner selects $50$ drawers at random, the probability that a single prisoner finds his number is $50\%$. Therefore, the probability that all prisoners find their numbers is the product of the single probabilities, which is $1/2^{100} \approx 0.0000000000000000000000000000008$, a vanishingly small number. The situation appears hopeless.

Up to which value can prisoners improve the probability of being pardoned using a good strategy?
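(Spoiler follows.) The well-known strategy is: prisoner $i$ first opens drawer $i$, then the drawer whose number he just found, and so on. He succeeds iff his number lies on a cycle of length at most 50 of the random permutation, and a random permutation of 100 elements has a cycle of length $k > 50$ with probability exactly $1/k$. A sketch of both the exact computation and a simulation of the full procedure:

```python
from fractions import Fraction
import random

# Exact success probability of the cycle-following strategy.
p_exact = 1 - sum(Fraction(1, k) for k in range(51, 101))
print(float(p_exact))   # ≈ 0.3118

# Monte Carlo sanity check of the full procedure.
rng = random.Random(0)

def trial(n=100):
    drawers = list(range(n))
    rng.shuffle(drawers)
    for prisoner in range(n):
        d = prisoner
        for _ in range(n // 2):
            if drawers[d] == prisoner:
                break
            d = drawers[d]
        else:
            return False   # this prisoner never found his number
    return True

p_mc = sum(trial() for _ in range(1000)) / 1000
print(p_mc)
```

So the prisoners can push their survival probability from essentially $0$ up to about $31\%$.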

  • 619
  • 3
  • 13

One that I found surprising as a beginner was that three events (or random variables) can be independent pairwise, but not jointly independent. Or to put it somewhat more strikingly, we can have $C$ independent of $A$ and independent of $B$, yet not independent of $A,B$. Wording it this way shows that it does take care to state independence assumptions carefully, and it also illustrates some non-obvious subtleties in the definition of independence (one of them being that independence of two events does not mean that they don't interact and can't be influenced by a third factor).

One example of pairwise but non-mutual independence is given on the Wikipedia page.

The example that I typically use is to take $T$ to be a uniformly random angle in $[0,2\pi)$ and then consider the events $A = \{\sin T > 0\}$, $B = \{ \cos T > 0\}$ and $C = \{ \tan T > 0 \}$ (effectively, this is just two independent $\pm 1$ variables and their product, but the trig formulation helps to visualize the events in terms of quadrants).

It's easy to see that $P(A)=P(B)=P(C) = \tfrac12$ and that $P(A\cap B) = P(A\cap C) = P(B\cap C) = \tfrac14$, but clearly $P(A\cap B\cap \overline C) = 0$, whereas joint independence would require $\tfrac18$.
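A quick sanity check by enumeration (the trig example reduces to two independent fair signs and their product, as noted above):

```python
from itertools import product
from fractions import Fraction

# s = sign(sin T), c = sign(cos T); tan T > 0 exactly when s*c > 0.
space = list(product([1, -1], repeat=2))   # four equally likely quadrants

def pr(event):
    return Fraction(sum(1 for s, c in space if event(s, c)), len(space))

A = lambda s, c: s > 0
B = lambda s, c: c > 0
C = lambda s, c: s * c > 0

print(pr(A), pr(B), pr(C))                               # 1/2 1/2 1/2
print(pr(lambda s, c: A(s, c) and C(s, c)))              # 1/4: pairwise independent
print(pr(lambda s, c: A(s, c) and B(s, c) and C(s, c)))  # 1/4, not 1/8
```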

Erick Wong
  • 24,142
  • 2
  • 32
  • 85
  • Could you add some reference to this for further reading? – Martin Ender Feb 17 '17 at 09:04
  • @MartinEnder Thanks, I added a wiki link and an explicit example. – Erick Wong Feb 17 '17 at 12:26
  • Oh I see now. I think I misunderstood your wording at first. I read it as "three events can't be jointly independent" whereas you're saying "it's *possible* for three events to be mutually independent while at the same time not being jointly independent". – Martin Ender Feb 17 '17 at 12:33
  • 4
    Simpler example, write $1$ on the heads side of a coin and $0$ on the tails, then let $A$ and $B$ be two independent tosses of this coin and $C$ be $1$ if $A$ is equal to $B$ and $0$ otherwise. Then $C$ is independent from $A$ and from $B$, but clearly not from both. – Ben Millwood Feb 18 '17 at 09:02
  • 1
    I think it's more instructive to change to a simpler example like the one @BenMillwood gave. Equivalently, and more symmetrically, just take a random 3-bit string that has even parity. Then the 3 bits are pairwise independent but every bit is totally dependent of the other two bits. Trivial to prove too. – user21820 Feb 18 '17 at 09:09

Someone mentioned non-transitive dice, and that reminded me of this one:

Suppose there are two unweighted six-sided dice that you cannot examine, but which you can direct a machine to roll and inform you of the sum. You can do this as often as you like, and the distribution of the sum is exactly what you would expect of a pair of ordinary six-sided dice.

Are they, in fact, a pair of ordinary six-sided dice?

Not necessarily.
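The classic example of such a pair is the Sicherman dice, with faces $(1,2,2,3,3,4)$ and $(1,3,4,5,6,8)$. A short check that their sum distribution matches that of two ordinary dice:

```python
from collections import Counter
from itertools import product

standard = [1, 2, 3, 4, 5, 6]
sicherman = ([1, 2, 2, 3, 3, 4], [1, 3, 4, 5, 6, 8])

def sum_dist(d1, d2):
    """Distribution of the sum of two dice, as a Counter over the 36 rolls."""
    return Counter(a + b for a, b in product(d1, d2))

print(sum_dist(standard, standard) == sum_dist(*sicherman))   # True
```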

Then someone mentioned a secretary problem, which turned out to be about derangements. I had in mind a different secretary's problem, which is also called the sultan's dowry:

You have $100$ candidates, upon which there exists a total order. The candidates have appointments with you, one at a time, in a random order. From each interview, you can tell exactly how that candidate ranks amongst those you have already examined. At that point, you may either accept or reject the candidate. Any acceptance or rejection is permanent; no candidate, once rejected, may be reconsidered. Your objective is solely to accept the best candidate. What is the strategy for maximizing your probability of doing so, and what is that probability?

As often happens in probability puzzles, the answer is $1/e$*, which many people find surprisingly high.

*Approximately, with the approximation getting better and better as the number of candidates increases without bound.
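The optimal rule ("reject the first $n/e$ candidates, then accept the first one better than everyone seen so far") is easy to validate by simulation; a sketch with my own helper names:

```python
import math
import random

def run(n, rng):
    """Reject the first n/e candidates, then accept the first record-setter."""
    ranks = list(range(n))     # rank 0 is the best candidate
    rng.shuffle(ranks)
    cutoff = round(n / math.e)
    threshold = min(ranks[:cutoff])
    for r in ranks[cutoff:]:
        if r < threshold:
            return r == 0      # accepted: did we get the best?
    return False               # reached the end without accepting anyone

rng = random.Random(1)
trials = 20_000
p_hat = sum(run(100, rng) for _ in range(trials)) / trials
print(p_hat)   # close to 1/e ≈ 0.368
```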

Brian Tung
  • 31,234
  • 3
  • 33
  • 64
  • The answer to the secretary problem is only approximately 1/e, with the approximation getting better as the number of candidates goes to infinity. – Anders Kaseorg Feb 14 '17 at 02:56
  • @AndersKaseorg: Well, yes. I was being rather imprecise, but certainly the probability is not exactly $1/e$ for finite numbers of candidates if the process is deterministic. – Brian Tung Feb 14 '17 at 07:23
  • 1
    Thanks. I love you last sentence: As often happens in probability puzzles, the answer is 1/e – MR_BD Feb 14 '17 at 07:25

There's the two envelopes paradox. Wikipedia states it as follows:

You are given two indistinguishable envelopes, each containing money, one contains twice as much as the other. You may pick one envelope and keep the money it contains. Having chosen an envelope at will, but before inspecting it, you are given the chance to switch envelopes. Should you switch?

The issue is that there is an amount $X$ of money in the envelope you have. If it's the lesser amount then switching gives you a reward of $2X$ and if you don't then you only get $X/2$. Since both cases are equally likely, it seems that you should switch, even though that is nonsensical since clearly your chance of getting the larger amount can only ever be 50%.

The Wikipedia article goes into great depth explaining various resolutions of this fallacy. It boils down to the fact that the sum of the two envelopes is the same in both cases, which means that the $X$s considered above aren't actually identical.

A related problem is the necktie paradox.

Martin Ender
  • 5,710
  • 4
  • 17
  • 28

It's not counterintuitive, but it's amazing for teaching in class.

Pick $a,b \in [n]$ uniformly at random. Then $\mathbb{P}[\gcd(a,b)=1]$ tends to $\frac{6}{\pi^2}$ as $n$ goes to infinity.

There are also other interesting problems whose answers involve $\pi$, $e$, ...
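The limit $6/\pi^2 \approx 0.608$ shows up quickly even for modest $n$; a brute-force count:

```python
from math import gcd, pi

n = 1000
coprime = sum(1 for a in range(1, n + 1)
                for b in range(1, n + 1) if gcd(a, b) == 1)
density = coprime / n**2
print(density, 6 / pi**2)   # both ≈ 0.608
```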

  • What is the meaning of $[n]$? And where would I find a proof of this? – littleO Feb 13 '17 at 07:42
  • @littleO Consider $\prod (1-\frac{1}{i^2})$ and use Euler product formula –  Feb 13 '17 at 08:08
  • Ok thanks. I'm still curious what $[n]$ means and what kind of textbook contains this type of result. – littleO Feb 13 '17 at 08:42
  • 1
    @littleO It means {1,2,...,n}. You can find it here: http://mathoverflow.net/questions/97041/what-is-the-probability-that-two-numbers-are-relatively-prime and here: http://math.stackexchange.com/questions/64498/probability-that-two-random-numbers-are-coprime I have forgot where I read it but it's a well-known fact. You probably can find it easily through the web. –  Feb 13 '17 at 08:48
  • Thanks, those links are interesting. – littleO Feb 13 '17 at 09:04
  • Whoa. My intuition would be that the answer is zero, since obviously large numbers generally have lots of small factors. 25% of pairs are both divisible by two, 1/9 both divisible by three. Ah, I see how that could work. Is the sum of 1/p^2 for all primes then 6/pi^2? – JollyJoker Feb 13 '17 at 09:07
  • @littleO Your welcome... –  Feb 13 '17 at 09:28
  • 1
    @JollyJoker Yeah, It's very interesting. But it's not the sum of $\frac{1}{p^2}$ . It's $\prod(1-\frac{1}{p^2})$ and you have to use Euler product formula to show that the inverse of the quantity is equal to $\sum \frac{1}{i^2}$ –  Feb 13 '17 at 09:30
  • @Marryamry Ah, of course. Thanks. – JollyJoker Feb 13 '17 at 09:37
  • $[n]$ is essentially a generative representation of the natural numbers leading up to (including) $n$. – axolotl Feb 13 '17 at 14:11

One I saw on Twitter recently, which is perhaps a clearer version of sex-of-children-type problems:

Three casino chips have a dot on each side:

  • on one chip the dots are both blue,
  • on the second there is a blue dot on one side and red on the other, and
  • on the third the dots are both red.

The chips are placed in a bag and, without looking, you choose one and place it on the table. When you look at the chip, you see it has a blue dot on top. What is the probability that the dot on the other side is blue?

Many people will say $1/2$ - I did before I thought properly - but...

you are blindly choosing both the chip and which side faces up. Each of the three blue dots is equally likely to be the one showing, and two of them sit on the blue-blue chip, making the chance $2/3$.
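Counting the six equally likely dot-sides directly settles it:

```python
from fractions import Fraction

# List every dot that can face up, paired with the dot on the other side.
faces = [("blue", "blue"), ("blue", "blue"),   # the blue-blue chip, either way up
         ("blue", "red"), ("red", "blue"),     # the mixed chip
         ("red", "red"), ("red", "red")]       # the red-red chip

blue_up = [back for top, back in faces if top == "blue"]
prob = Fraction(blue_up.count("blue"), len(blue_up))
print(prob)   # 2/3
```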

  • 38,461
  • 5
  • 42
  • 78

Perhaps Parrondo's Paradox would be interesting. One can combine losing propositions into a winning proposition.

Simpson's Paradox is also interesting. (And actually occurred in a court case.)
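For Simpson's paradox, the kidney-stone treatment data commonly quoted in textbooks makes the point concrete (numbers as usually reported; treat them as illustrative): treatment A beats treatment B on small stones and on large stones, yet loses overall, because A was given disproportionately to the harder large-stone cases.

```python
# (successes, patients) by stone size
A = {"small": (81, 87), "large": (192, 263)}    # treatment A
B = {"small": (234, 270), "large": (55, 80)}    # treatment B

for size in ("small", "large"):
    sa, na = A[size]
    sb, nb = B[size]
    print(size, f"A: {sa/na:.1%}  B: {sb/nb:.1%}")   # A wins both subgroups

sa, na = map(sum, zip(*A.values()))
sb, nb = map(sum, zip(*B.values()))
print("overall", f"A: {sa/na:.1%}  B: {sb/nb:.1%}")  # ...yet B wins overall
```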

  • 326
  • 4
  • 8
  • Yes, Simpson's Paradox, or the Yule-Simpson effect) is a good example: https://en.m.wikipedia.org/wiki/Simpson's_paradox. Also, the Dover paperback Counterexamples in Probability by Jordan M. Stoyanov is very useful for your purpose. – PolyaPal Feb 14 '17 at 18:33
  • I came here to include Simpson's Paradox as well. Please expand your answer to explain what it is. – BlueRaja - Danny Pflughoeft Feb 15 '17 at 19:29
  • Parrondo's paradox is very interesting, if a bit sophisticated for an introductory class. I really think this answer deserves some expansion! – Silverfish Feb 15 '17 at 21:58

Lake Wobegon Dice

Find a set of $n$ dice (each with $s$ sides, numbered appropriately), in which each die is more likely to roll above the set average on that roll than below the set average. Given $n$, find the Lake Wobegon Optimal set, in which that probability is maximum.

"Lake Wobegon Dice," by Jorge Moraleda and David G. Stork, College Mathematics Journal, 43(2):152--159 (2012)


  • We present sets of $n$ non-standard dice—Lake Wobegon dice—having the following paradoxical property: On every (random) roll of a set, each die is more likely to roll greater than the set average than less than the set average; in a specific statistical sense, then, each die is “better than the set average.”

    We define the Lake Wobegon Dominance of a die in a set as the probability the die rolls greater than the set average minus the probability the die rolls less than the set average. We further define the Lake Wobegon Dominance of the set to be the dominance of the set’s least dominant die and prove that such paradoxical dominance is bounded above by $(n-2)/n$ regardless of the number of sides $s$ on each die and the maximum number of pips $p$ on each side. A set achieving this bound is called Lake Wobegon Optimal. We give a constructive proof that Lake Wobegon Optimal sets exist for all $n \ge 3$ if one is free to choose $s$ and $p$. We also show how to construct minimal optimal sets, that is, that set that requires the smallest range in the number of pips on the faces.

    We determine the frequency of such Lake Wobegon sets in the $n = 3$ case through exhaustive computer search and find the unique optimal $n = 3$ set having minimal $s$ and $p$. We investigate symmetry properties of such sets, and present equivalence classes having identical paradoxical dominance. We construct inverse sets, in which on any roll each die is more likely to roll less than the set average than greater than the set average, and thus each die is “worse than the set average.” We show the unique extreme “worst” case, the Lake Wobegon pessimal set.


David G. Stork
  • 27,031
  • 5
  • 28
  • 53

In contract bridge, there is the principle of restricted choice. It's always seemed counterintuitive to me.


Walter Mitty
  • 151
  • 4
  • 1
    The Principle of Restricted Choice is identical to the Monty Hall Paradox. – ttw Feb 13 '17 at 19:13
  • @ttw But Restricted Choice actually occurs in the real world, while the Monty Hall Paradox doesn't actually occur with the real Monty Hall, unless he very explicitly states the rules for when he opens a door to a goat. So, in some sense, restricted choice is a validation in a real game of chance that people actually play. – Thomas Andrews Jun 06 '18 at 15:19

Optimizer's Curse: suppose you have a number of options to choose from, each with some objective true value. You don't have access to the objective true value, but instead to a noisy estimate of it, say a value sampled from a normal distribution with mean the true value and variance $1$.

You, naturally, pick the choice whose estimate of true value is the highest. When you do so, you discover what the true value really was. Call the difference between the estimate and the true value your post-decision surprise.

Now, the error of your estimate was normally distributed with mean $0$, so you might also guess that your post-decision surprise will have mean $0$ – sometimes you will be pleasantly surprised, sometimes disappointed. But in fact the post-decision surprise is usually negative, that is to say, you are usually disappointed by your choice!

In retrospect, this is perhaps not so surprising: certainly if all the true values were the same, you'd simply pick the estimate with the highest inflation due to noise, and more generally, conditional on an estimate being the leader of a pack, it's more likely to be an overestimate than an underestimate.

More interestingly, if the variances aren't the same for all your estimates, it's no longer correct to just pick the highest estimate to maximise your expected true value. If the highest estimate is also high-variance, it may lead to a lower true value in expectation than a lower estimate with better precision, and so what looks superficially like some kind of risk-aversion (discounting the value of apparently high-variance options) is actually justifiable purely on expected-value grounds.

(Notice also that "there are a bunch of options with some true values, but you only get a noisy estimate of each, and you have to decide which to choose" is a REALLY common situation to be in, so this problem is pretty pervasive in real-life optimization scenarios.)
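A minimal simulation of the equal-variance case described above (variable names are mine):

```python
import random

rng = random.Random(0)
n_options, n_trials = 10, 20_000

total_surprise = 0.0
for _ in range(n_trials):
    true_values = [rng.gauss(0, 1) for _ in range(n_options)]
    estimates = [v + rng.gauss(0, 1) for v in true_values]     # unit-variance noise
    best = max(range(n_options), key=estimates.__getitem__)    # pick highest estimate
    total_surprise += true_values[best] - estimates[best]      # negative = disappointment

mean_surprise = total_surprise / n_trials
print(mean_surprise)   # systematically negative
```

Even though each individual estimate is unbiased, conditioning on being the argmax makes the selected one an overestimate on average.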

Ben Millwood
  • 13,604
  • 3
  • 29
  • 59
  • I actually don't think this one is great for an introductory course, because it's too subtle. But it's definitely a counterintuitive result in probability, and I think it needs to be better-known, so here it is. – Ben Millwood Feb 18 '17 at 09:33
  • I like the "persuasive commonness" of such situations! But the part I find surprising (if true) is not the fact that the simplistic method yields an overestimate, nor the fact that you need to take differing variances into account, but that it suffices to discount each estimate independent of the other choices estimated values and known variances! Is it really true, and can you include the discount formula? Thanks! – user21820 Feb 18 '17 at 09:39
  • 1
    Actually, now that you call me out on it, I don't know the formula, nor am I certain there is one, so I replaced my claim with a much weaker one. There's a paper which discussed a Bayesian method of correcting for optimizer's curse, and I had naïvely interpreted it as discounting, but I haven't read the paper in enough detail to know if that's actually how it works. – Ben Millwood Feb 18 '17 at 09:50
  • Okay thanks for your clarification! =) – user21820 Feb 18 '17 at 09:56

The secretary's problem (which has other names). The secretary has $n$ letters ($0<n<\infty$) and $n$ pre-addressed envelopes but puts the letters into the envelopes randomly, one letter per envelope. What is the chance $C(n)$ that NO letter gets into the right envelope?

The answer is $C(n)=\sum_{j=0}^n(-1)^j/j!,$ which converges to $1/e$ as $n\to \infty$. I think the method of solution is instructive.

One counter-intuitive result is that $C(n)$ is not monotonic in $n.$

Also many people would be inclined to guess that $C(n)>1/2$ for large $n.$

Another version of this is to take two shuffled decks, each with $n$ playing cards, and ask for the chance that no card occupies the same position in both decks.

I first saw this in "100 Great Problems of Elementary Mathematics" by H. Dörrie.
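Both surprising features, the fast convergence to $1/e$ and the non-monotonicity, drop straight out of the formula:

```python
from fractions import Fraction
from math import e, factorial

def C(n):
    """Probability that a random permutation of n letters fixes no letter."""
    return sum(Fraction((-1) ** j, factorial(j)) for j in range(n + 1))

print([float(C(n)) for n in range(1, 7)])
# C(2) = 1/2 > C(3) = 1/3 < C(4) = 3/8: not monotonic
print(abs(float(C(20)) - 1 / e))   # already essentially 1/e
```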

  • 53,442
  • 4
  • 26
  • 68
  • "take two identical shuffled decks, each with n playing cards"... the chance that no card occupies the same position in both decks is zero, as the two decks are identical. Perhaps it should say "two shuffled decks, each with the same $n$ playing cards"? – Glen O Feb 13 '17 at 15:25
  • @GlenO. I meant identical as unordered sets. But I deleted the word. – DanielWainfleet Feb 13 '17 at 19:13
  • 4
    When I see "the secretary problem", I generally think of the secretary selection problem (also known as sultan's dowry). Also $1/e$. I have a theory that says that $1/e$ is the most likely answer to a probability question; in fact, the probability that it is the answer is... – Brian Tung Feb 14 '17 at 00:25
  • @BrianTung. I like that. – DanielWainfleet Feb 14 '17 at 00:37

I see the Monty Hall has been mentioned a couple of times, but I want to mention it again because I think the reason that it's interesting is missed in the other answers. In particular, it demonstrates not only a counter-intuitive result for a given formulation, but it also demonstrates how sensitive the correct answer is to the formulation of the problem. I especially like this NYT article as an illustration:


From a teaching point of view, this is a fun exercise because Monty Hall (the real person) is part of the article, and he plays a role both in validating the math on the "academic" version of the problem and in showing the math is meaningless in the real game, because he has controls that are not in the academic version. Moreover, after years of doing the show, he's good at reading individual contestants, so probability is not really at play in a significant way.
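For the "academic" formulation (the host always opens a goat door and always offers the switch), a simulation confirms the 2/3 advantage of switching:

```python
import random

def play(switch, rng):
    doors = [0, 1, 2]
    car, pick = rng.choice(doors), rng.choice(doors)
    # Host opens a door that hides a goat and is not the contestant's pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(2)
n = 30_000
p_stay = sum(play(False, rng) for _ in range(n)) / n
p_switch = sum(play(True, rng) for _ in range(n)) / n
print(p_stay, p_switch)   # ≈ 1/3 vs ≈ 2/3
```

Note that the host's rule is hard-coded here; change it (say, to a host who sometimes opens the car door) and the 2/3 answer evaporates, which is exactly the article's point.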

  • 1,360
  • 13
  • 23
  • What's funny is that in the end even this article fails to consider the 'Lazy Monty' scenario, where Monty always opens an empty door, but always the one closest to him. In that case, if you chose door 1, and Monty opened door 2 (which is closer to him than door 3), then it still doesn't help you to switch. – Bram28 Jun 22 '17 at 18:18
  • While the math is meaningless for a real-world Monty (unless Monty sets out rules explicitly,) the same reasoning works in the game of bridge, where it is commonly called "Restricted Choice," and is just as counter-intuitive. – Thomas Andrews Jun 06 '18 at 15:15

One of the most puzzling results in probability is that the probability of picking a rational number, when drawing uniformly at random from an interval of reals, is zero. This is nicely explained here.

The set of rational numbers, for instance in the $\Omega=[0,1]$ interval is the countable union of disjoint singletons, and each one of these singletons has a probability of zero. Here is the proof:

A singleton, $\{b\}$, is a Borel measurable set with a Lebesgue measure of zero. The proof is as follows:

$$\Pr\left(\{b\}\right)=\Pr\left(\bigcap_{n=1}^\infty\left(b-\frac{1}{n},b + \frac{1}{n}\right]\cap \Omega\right)$$

is the probability of nested decreasing sets, allowing the use of the theorem of continuity of probability measures $(*)$ to re-write it as:

$$\Pr\left(\{b\}\right)=\lim_{n\rightarrow \infty}\,\Pr\left(\left(b-\frac{1}{n},b + \frac{1}{n}\right]\cap \Omega\right)$$

The probability of each of these intervals is bounded by its length:
$$\Pr\left(\left(b-\frac{1}{n},\,b + \frac{1}{n}\right]\cap \Omega\right)\leq \frac{2}{n}\xrightarrow[n\rightarrow \infty]{}0$$

Therefore, by countable additivity of measures $(**)$, the probability for the whole set of $\mathbb Q$ is zero:

$$\Pr(\mathbb Q\;\cap \Omega) = 0$$

The apparent paradox is that despite there being infinitely many rational numbers in the $[0,1]$ interval, the probability of randomly choosing a rational is strictly zero.

The source is this great explanation here.

$(*)$ If $B_j, j = 1, 2,\cdots,$ is a sequence of events decreasing to $B$, then $\displaystyle\lim_{n\rightarrow \infty} \Pr \{B_n\} = \Pr \{B\} .$

$(**)$ For all countable collections $\{E_i\}_{i=1}^\infty$ of pairwise disjoint sets in a sigma algebra: $$\mu\left( \bigcup_{k=1}^\infty \, E_k \right)=\sum_{k=1}^\infty \mu(E_k).$$

Antoni Parellada
  • 7,767
  • 5
  • 30
  • 100
  • It's worth noting that the term for this situation is "almost never" - where the probability is zero, but the event isn't impossible. – Glen O Feb 13 '17 at 15:57
  • 5
    "even though this is not impossible in an experimental situation (throwing a dart)." — As physicist, I object. There are no darts with infinitely small dart point. And even if there were, quantum mechanics would not allow to hit an exact point with it. Not to mention that space itself might not be continuous. – celtschk Feb 13 '17 at 23:24

Consider the $d$-dimensional sphere. As $d$ goes to infinity, the uniform measure on the sphere concentrates near the equator $x_1=0$.
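This one is easy to demonstrate numerically: normalized Gaussian vectors are uniformly distributed on the sphere, and the first coordinate concentrates near $0$ as $d$ grows (helper names are mine):

```python
import math
import random

rng = random.Random(3)

def fraction_near_equator(d, eps=0.1, trials=4000):
    """Fraction of uniformly random points on the (d-1)-sphere with |x_1| < eps."""
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(d)]   # a normalized Gaussian vector
        r = math.sqrt(sum(v * v for v in x))      # is uniform on the sphere
        hits += abs(x[0] / r) < eps
    return hits / trials

for d in (3, 30, 300):
    print(d, fraction_near_equator(d))   # grows toward 1 as d increases
```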

  • 5,261
  • 5
  • 41
  • 59
  • 131
  • 8
  • 3
    It needs some preliminaries to explain this. But I think it's a counter intuitive example in geometry rather than probability... – MR_BD Feb 12 '17 at 19:17
  • 3
    I agree that to prove it you'll need some ( a lot) preliminaries . I guess that it depends on the formulation, if you say that the probability of choosing a point not on the equator goes to zero as n goes to infinity than it looks a lot more like probability – DStoln Feb 12 '17 at 19:23

I flip two coins. Given that one is heads, what's the probability the other one is heads?

Surprisingly, it's not $\frac12$.

  • 1,139
  • 8
  • 18
  • 9
    Written that way, it's prone to misunderstanding. Indeed, my *first* interpretation of "given that one is heads" was "I look at one of the coins, and see it is heads", in which case the probability that the other is also head *is* $1/2$. A better formulation would be: "Given that *any of them* is heads" as that doesn't suggest selecting a specific one. – celtschk Feb 13 '17 at 08:45
  • 2
    A lot of people disagree on how to interpret that English, though; that's part of why this is one of the infamous examples. –  Feb 13 '17 at 10:10
  • 3
    I think enough distinction can be achieved adjusting the end of the question to "what's the probability that both are heads?", as that removes the implication of "looking at one coin, then the other". – Glen O Feb 13 '17 at 15:21
  • Just to be clear, this is a variant of Bertrand's Box Paradox https://en.wikipedia.org/wiki/Bertrand's_box_paradox , only with heads/tails instead of gold/silver coins or red/white cards. – dberm22 Feb 14 '17 at 15:20
  • 2
    To be really clear, "given that at least one is heads, what's the probability that both are heads?" Same for the boy-girl problem, of course. – Ned Feb 14 '17 at 23:49

I remember being confused the first time this riddle was posed to me:

What is the probability of getting a poker hand with at least two aces, assuming that it contains at least one ace?

What if we know that it is specifically the ace of spades? Does this information change the probability?
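For reference, both conditional probabilities follow from direct counting, and (perhaps surprisingly) knowing the suit nearly doubles the answer:

```python
from math import comb

total = comb(52, 5)
no_ace = comb(48, 5)
exactly_one = 4 * comb(48, 4)

# P(at least two aces | at least one ace)
p_any_ace = (total - no_ace - exactly_one) / (total - no_ace)

# P(at least two aces | the hand contains the ace of spades)
with_as = comb(51, 4)            # choose the other four cards freely
no_more = comb(48, 4)            # the other four contain no further ace
p_spades = (with_as - no_more) / with_as

print(p_any_ace, p_spades)   # ≈ 0.1222 vs ≈ 0.2214
```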

  • 619
  • 3
  • 13

This was grazed both in the original post and in the comments, but I'd like to flesh it out a bit.

Polya's Theorem

The simple random walk on $\mathbb Z^d$ is recurrent iff $d < 3$. Restated:

  • A random walk on $\mathbb Z$ started at the origin will eventually return to the origin (with probability $1$).

  • A random walk on $\mathbb Z^2$ started at the origin will eventually return to the origin (with probability $1$).

  • A random walk on $\mathbb Z^3$ started at the origin may return to the origin, but it also may not; the probability of an eventual return is about $0.34$.

Expected hitting time

Although a random walk on $\mathbb Z$ or $\mathbb Z^2$ that starts at the origin is guaranteed to return to the origin as above, the expected time to do so is infinite.

NB: This is a fairly elementary result in Markov Chain theory, so whether it is surprising is obviously up to the beholder. This is my favorite example of a random variable that is finite with probability 1, yet has infinite expected value.

Penney's Game, revisited

The Numberphile video explored the nontransitive nature of Penney's game; that is, for any 3-coin sequence I choose, you could respond by choosing a 3-coin sequence that is likely to appear first. There is another feature of Penney's Game that I find similarly counterintuitive: The expected wait time is not the same for all sequences of 3 coins. This is not a statement about the competitive likelihood of finding one before another, but rather a statement about the expected number of throws required to encounter a sequence in isolation. Specifically:

  • HHH / TTT require 14 flips on average

  • HTH / THT require 10 flips on average

  • The other 4 combinations require 8 flips on average
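These expected wait times can be computed exactly with Conway's correlation trick: for a fair coin, the expected wait for a pattern is the sum of $2^k$ over every $k$ for which the pattern's length-$k$ prefix equals its length-$k$ suffix (so self-overlapping patterns like HHH take longer to appear).

```python
def expected_wait(pattern):
    """Conway's correlation formula for a fair coin: sum 2^k over every k
    where the length-k prefix of the pattern equals its length-k suffix."""
    n = len(pattern)
    return sum(2 ** k for k in range(1, n + 1) if pattern[:k] == pattern[-k:])

for p in ("HHH", "HTH", "HHT", "HTT", "THH", "TTH"):
    print(p, expected_wait(p))   # 14, 10, 8, 8, 8, 8
```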

Aaron Montgomery
  • 9,575
  • 1
  • 14
  • 28