143

There's an 80% probability of a certain outcome. We then get some new information that means the outcome is 4 times more likely to occur.

What's the new probability as a percentage and how do you work it out?

As I remember it the question was posed like so:

Suppose there's a student, Tom W, and you are asked to estimate the probability that Tom is a student of computer science. Without any other information you would only have the base rate to go by (the percentage of all students enrolled on computer science). Suppose this base rate is 80%.

Then you are given a description of Tom W's personality. Suppose that from this description you estimate that Tom W is 4 times more likely to be enrolled on computer science.

What is the new probability that Tom W is enrolled on computer science?

The answer given in the book is 94.1% but I couldn't work out how to calculate it!

Another example in the book uses a base rate of 3%; 4 times more likely than this is stated as 11%.

Jim
  • 26
    Where does this question come from? Mathematically and colloquially, I'd expect 4x more likely to mean multiply the probability by 4, but that leads to a probability that is >1, which is not allowed. – TooTone Aug 12 '13 at 11:37
  • From a chapter in Daniel Kahneman's Thinking Fast & Slow, I'll edit the Q with more context – Jim Aug 12 '13 at 11:45
  • 11
    thanks looking at that chapter the author makes clear that "the description of Tom W is 4 times more likely for a graduate student in that field _than in other fields_" (my italics). He also says the problem is to do with Bayesian reasoning. – TooTone Aug 12 '13 at 12:32
  • Hopefully, there is a description of what the author means by "likelihood". There seems to be a definition that matches the results in your question (see my answer), but I would hope the author would mention this explicitly. – robjohn Aug 12 '13 at 12:32
  • 1
    Full quote is "Bayes’s rule specifies how ... base rates ... should be combined with the diagnosticity of the evidence, the degree to which it favors the hypothesis over the alternative. E.g., if you believe that 3% of graduate students are enrolled in computer science (the base rate), and you also believe that the description of Tom W is 4 times more likely for a graduate student in that field than in other fields, then Bayes’s rule says you must believe that the probability that Tom W is a computer scientist is now 11%. If the base rate had been 80%, the new degree of belief would be 94.1%." – TooTone Aug 12 '13 at 12:43
  • Thanks TooTone, sounds like you're enjoying it. Give it a read! – Jim Aug 12 '13 at 12:48
  • @Jim I might well do: I'm going to have some time on my hands and it looks like a good one! – TooTone Aug 12 '13 at 12:58
  • I was itching to write a tongue-in-cheek answer of 320% but I managed to refrain :P – Andrei Rînea Aug 12 '13 at 15:52
  • 13
    I read that losing 10% of your body weight makes you 30% less likely to get adult onset diabetes. Then I lost 30% of my bodyweight, so I'm practically immune to diabetes. – Ben Jackson Aug 12 '13 at 17:57
  • Is there any reason to find a way for the expression to make sense? – André Nicolas Aug 13 '13 at 05:51
  • 4
    Is there a specific reason for all the numbers to be in latex? It seems like the only thing it adds is rendering time... – Dason Aug 13 '13 at 16:52
  • Since I can't comment yet, let me post this in an answer, even though it's not. My apologies for this. You probably mean "4 times **as** likely". When you say "4 times **more** likely", you have not indicated by how much the likelihood is being increased four times. Indeed, "4 times more likely" means increasing the likelihood by an undefined amount four times, which results in an unknown. In contrast, "4 times **as** likely" means that the likelihood becomes four times as great, which I believe is what you mean. :) – Timo Aug 13 '13 at 10:04
  • 1
    @Dason: the numerals are different in TeX than not (I think the TeX numerals look better, but that is just my opinion). To keep things uniform when TeX is interspersed, people generally put all numbers in TeX. – robjohn Aug 14 '13 at 07:20

6 Answers

173

The most reasonable way to match the answer in the book would be to define the likelihood to be the ratio of success over failure (aka odds): $$ q=\frac{p}{1-p} $$ The probability as a function of the odds is then $$ p=\frac{q}{1+q} $$ In your case $p=0.8$, so the odds are $4:1$; $4$ times as likely would be $16:1$ odds, which has a probability of $$ \frac{16}{17}\approx94.1176470588235\% $$ This matches the $3\%$ to $11.0091743119266\%$ transformation as well: odds of $3:97$ become $12:97$, and $\frac{12}{109}\approx11.0091743119266\%$.
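A minimal Python sketch of this conversion, assuming "4 times as likely" means multiplying the odds by 4. The function name `scale_odds` is purely illustrative and does not come from the book or the question.

```python
def scale_odds(p: float, factor: float) -> float:
    """Multiply the odds p / (1 - p) by `factor` and convert back to a probability."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    q = p / (1.0 - p)      # odds in favour
    q_new = factor * q     # scaled odds
    return q_new / (1.0 + q_new)


print(round(scale_odds(0.80, 4) * 100, 1))  # base rate 80%
print(round(scale_odds(0.03, 4) * 100, 1))  # base rate 3%
```

The two calls should reproduce the book's figures of about 94.1% and 11%.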


Bayes' Rule

Bayes' Rule for a single event says that $$ O(A\mid B)=\frac{P(B\mid A)}{P(B\mid\neg A)}\,O(A) $$ where the odds of $X$ is defined as earlier $$ O(X)=\frac{P(X)}{P(\neg X)}=\frac{P(X)}{1-P(X)} $$ This is exactly what is being talked about in the later addition to the question, where it is given that $$ \frac{P(B\mid A)}{P(B\mid\neg A)}=4 $$

robjohn
  • Why is q 4 * 4? – Jim Aug 12 '13 at 13:04
  • 9
    The $q$ from $80\%$ is $\frac{.8}{1-.8}=4$. $4$ times that likelihood would be $16$. – robjohn Aug 12 '13 at 13:10
  • 13
    I think $q$ may be better described as [*odds*](http://en.wikipedia.org/wiki/Odds) rather than as *likelihood* – Henry Aug 12 '13 at 18:32
  • @Henry: that's true. I've updated the answer. – robjohn Aug 12 '13 at 19:23
  • 1
    Your original answer was the best because of its simplicity; now you've made it too verbose with math jargon!  – Jim Aug 13 '13 at 15:42
  • 7
    @Jim: the original answer is still there (I've changed nothing above the dividing line). However, I was not really pleased with it since it only matched the numbers without any real mathematical support. I can delete the proof, but I really think the mention of Bayes' Rule is important to relate my original answer to the more precise statement of the question. – robjohn Aug 13 '13 at 15:50
  • 1
    would the downvoter care to comment? – robjohn Apr 24 '15 at 16:55
38

Daniel Kahneman's book mentions Bayesian reasoning. An answer using Bayesian reasoning is as follows:

Let $C$ be the event that Tom is compsci, $N$ be the event that he has a "nerdy" personality.

We are given $P(N|C)/P(N|\neg C)= 4$, which implies that $P(N|\neg C) = P(N|C)/4$.

By Bayes' theorem (using the law of total probability to expand the denominator):

$$\begin{eqnarray*} P(C|N) &=& \frac{P(N|C) P(C)}{ P(N)} \\ &=& \frac{P(N|C) P(C)}{P(N|C)P(C) + P(N|\neg C) P(\neg C)} \\ &=& \frac{P(N|C) P(C)}{P(N|C)P(C) + 0.25 P(N|C)P(\neg C)} \\ &=& \frac{P(C)}{P(C) + 0.25 P(\neg C)} \\ &=& \frac{0.8}{0.8 + 0.25 \times 0.2} \\ &\approx& 0.9411765 \end{eqnarray*}$$

Similar reasoning in the $3\%$ case leads to $P(C|N) = 0.03 / (0.03 + 0.25 \times 0.97) \approx 0.1100917$.
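The cancellation of $P(N|C)$ in the derivation can be checked numerically. The sketch below is illustrative only: the function name `posterior` and the sample values of $P(N|C)$ are assumptions, chosen just to show that the result depends on the ratio $P(N|C)/P(N|\neg C)$ and not on either value alone.

```python
def posterior(prior: float, p_n_given_c: float, p_n_given_not_c: float) -> float:
    """P(C|N) from Bayes' theorem, with P(N) expanded by total probability."""
    numerator = p_n_given_c * prior
    evidence = numerator + p_n_given_not_c * (1.0 - prior)
    return numerator / evidence


# Hypothetical values of P(N|C); P(N|not C) is always a quarter of it.
for p_n_given_c in (0.2, 0.4, 0.8):
    print(p_n_given_c, posterior(0.8, p_n_given_c, p_n_given_c / 4))
```

Each line should print roughly 0.941, whatever value is assumed for $P(N|C)$.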

TooTone
21

The statement of the context (in my words) is as follows:

If you believe that $80\%$ of graduate students are enrolled in computer science (base rate), and you also believe that the description of Tom W is four times more likely for a graduate student in computer science than for a graduate student in other fields, then Bayes’s rule says you must believe that the probability that Tom W is a computer scientist is now $\approx94.1\%$.

Here is how to perform the Bayesian reasoning. Let $\rm CS$ be the event that a student is enrolled in computer science, and $\rm desc$ the event that [description] holds true of a graduate student. Then

  • The base rate says that $P(\rm CS)=80\%$.
  • The relative statement says that $P({\rm desc|CS})=4P({\rm desc|\neg CS})$

Thus $P(\neg{\rm CS})=0.2$ and $P({\rm desc|\neg CS})=0.25P({\rm desc|CS})$. Bayesian reasoning says that

$$\begin{array}{ll} P({\rm desc}) & = P({\rm desc~\&~CS})+P({\rm desc~\&~\neg CS}) \\ & =P({\rm desc|CS})P({\rm CS})+P({\rm desc|\neg CS})P(\neg{\rm CS}) \\ & =(0.8+0.25\cdot0.2)P({\rm desc|CS}) \\ & =0.85P({\rm desc|CS}) \end{array}$$

The definition of conditional probability, from which Bayes' rule follows, says that

$$\begin{cases} P({\rm desc|CS})= \frac{P({\rm desc~\&~CS})}{P({\rm CS})} \\ \phantom{blah} \\ P({\rm CS|desc})=\frac{P({\rm CS~\&~desc})}{P({\rm desc})} \end{cases}$$

Therefore

$$P({\rm CS|desc})=\frac{P({\rm desc|CS})P({\rm CS})}{P({\rm desc})}=\frac{0.8}{0.85}=0.9411764705882352\dots\approx94.1\% $$


Similarly, if the base rate were $3\%$ instead of $80\%$, the calculation would go as follows:

$$\begin{array}{ll} P({\rm desc}) & = P({\rm desc~\&~CS})+P({\rm desc~\&~\neg CS}) \\ & =P({\rm desc|CS})P({\rm CS})+P({\rm desc|\neg CS})P(\neg{\rm CS}) \\ & =(0.03+0.25\cdot0.97)P({\rm desc|CS}) \\ & =0.2725P({\rm desc|CS}) \end{array}$$

$$P({\rm CS|desc})=\frac{P({\rm desc|CS})P({\rm CS})}{P({\rm desc})}=\frac{0.03}{0.2725}=0.1100917431192660\dots\approx11\% $$
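As a sanity check, both results can be approximated by simulation. This is a rough sketch under stated assumptions: the function name `simulate`, the sample size, the seed, and the value $P({\rm desc|CS})=0.4$ are all arbitrary choices made here, not part of the original problem. Only the factor of 4 between the two conditional probabilities comes from the problem statement.

```python
import random


def simulate(base_rate: float, p_desc_cs: float, n: int = 200_000, seed: int = 0) -> float:
    """Estimate P(CS|desc) by sampling students; P(desc|not CS) is p_desc_cs / 4."""
    rng = random.Random(seed)
    cs_and_desc = 0
    desc_total = 0
    for _ in range(n):
        is_cs = rng.random() < base_rate
        p_desc = p_desc_cs if is_cs else p_desc_cs / 4
        if rng.random() < p_desc:
            desc_total += 1
            cs_and_desc += is_cs  # True counts as 1, False as 0
    return cs_and_desc / desc_total


print(simulate(0.80, 0.4))  # should land near 0.941
print(simulate(0.03, 0.4))  # should land near 0.110
```

The estimates fluctuate with the seed and sample size, so they will only agree with the exact values to a couple of decimal places.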

anon
  • Thanks, can you explain why the relative statement translates to P(desc|CS) = 4*P(desc | not CS)? Shouldn't it be the other way around? P(CS|desc) = 4*P(CS | not desc) -- "from this description you estimate that Tom W is 4 times more likely to be a CS student." So probability of being a CS student when observing the description is four times the base rate? – Sherwin Yu Aug 12 '13 at 15:32
  • 1
    @Sherwin: The relative statement is *the description [of Tom W] is four times more likely for a graduate student in computer science than for a graduate student in other fields*. That is, the probability of the description holding for a student, given the student is enrolled in computer science, is four times the probability the description holds for a student given the student is in a different field. This is P(desc|CS)=4P(desc|~CS). OP's version - *from this description you estimate that Tom W is 4 times more likely* - is an inaccurate characterization of the original problem (which I read). – anon Aug 12 '13 at 16:24
  • 1
    Note that TooTone gives a direct quote of the original problem in the comments above. – anon Aug 12 '13 at 16:27
  • Thanks for the clarification. – Sherwin Yu Aug 12 '13 at 17:25
19

Well, I'd say $80\%$ chance of success means failure $1$ out of $5$ times. $4$ times more likely means failure only $1$ out of $20$ times, so the new probability would be $95\%$.

Christian Chapman
walcher
  • That makes sense, but in the book (see edit) the answer given is 94.1% – Jim Aug 12 '13 at 11:52
  • 8
    Unless that book gives an explanation and justification of that result, we're going to have to assume it is wrong. – DonAntonio Aug 12 '13 at 12:13
  • 39
    I used similar reasoning, but came up with the 94.1 solution: 80% chance of success means 4 successes for 1 failure. We want 4 times the success rate, which gives 16 successes for 1 failure. That gives the 94.1% rate that is being sought. – Beska Aug 12 '13 at 15:33
  • 4
    I think that's erroneous logic, because you've incorporated the failure into your multiplication. It's 4 times the chance of success, which is 4x4. The error comes because 1 in 20 times failure isn't 5 times a 1 in 4 chance. – NibblyPig Aug 12 '13 at 16:25
  • 21
    -1 by this logic, 5 times **less** likely would mean failure "1 out of 1" = 100% of the time. And 6x would be 120% failure-rate etc. @Beska has the correct logic. – BlueRaja - Danny Pflughoeft Aug 12 '13 at 18:36
  • 3
    This answer and the comments could be a very educational example of why mathematically precise notation is necessary. – Sarien Aug 13 '13 at 15:24
7

To solve the exercise you should be familiar with the concept of odds. Unfortunately, the term "odds" is used inconsistently by different groups of people and therefore often causes confusion.

Quoting Wikipedia:

In statistics, odds are an expression of relative probabilities, generally quoted as the odds in favor. The odds (in favor) of an event or a proposition is the ratio of the probability that the event will happen to the probability that the event will not happen.

The initial odds that Tom is a computer science student are $$8:2$$ where the symbol $:$ should be read as "to". To see that these odds correspond to the probabilities $80\%$ (for) and $20\%$ (against), divide both numbers by their sum: $$(8/10):(2/10) = 0.8:0.2 = 80\%:20\%$$

If it is $4$ times more likely that Tom is enrolled in computer science, then the above ratio becomes $$(4\cdot8):2 = 32:2$$ which corresponds to the probabilities $$\frac{32}{32+2}:\frac{2}{32+2} \approx 0.9412:0.0588 = 94.12\%:5.88\%$$


Similarly, if an event is $3\%$ likely to occur, then its odds can be expressed as $$3:97$$ Now if this event becomes $4$ times more likely, the odds become $$(4\cdot3):97= 12:97$$ which corresponds to a probability of $$\frac{12}{12+97}\approx0.11=11\%$$ for and a probability of $\frac{97}{12+97}\approx89\%$ against.
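The same integer-odds arithmetic can be done exactly with Python's standard `fractions` module. This is only a sketch; the function name `scale_odds_exact` and its parameter names are made up for illustration.

```python
from fractions import Fraction


def scale_odds_exact(in_favour: int, against: int, factor: int) -> Fraction:
    """Scale integer odds in_favour:against by `factor`; return the exact probability in favour."""
    scaled = factor * in_favour
    return Fraction(scaled, scaled + against)


print(scale_odds_exact(8, 2, 4))   # odds 32:2
print(scale_odds_exact(3, 97, 4))  # odds 12:97
```

`Fraction` reduces automatically, so the first call prints `16/17` and the second prints `12/109`.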

Jimmy R.
  • Very clear answer, thanks! One question: You use arrow in some places and equal sign in others. Why is that? Would `(4⋅8):2 = 32:2` be wrong? – Stijn de Witt Jul 12 '17 at 23:24
  • @StijndeWitt No, it wouldn't. So, I changed it everywhere to $=$. Thank you for pointing that out. – Jimmy R. Jul 13 '17 at 08:03
0

The only way I see to make sense of this is to divide by $4$ the probability it does not happen. Here we obtain $20/4=5$, so the new probability is $95\%$.

Denis