
I have trouble understanding the massive importance that is afforded to Bayes' theorem in undergraduate courses in probability and popular science.

From the purely mathematical point of view, I think it would be uncontroversial to say that Bayes' theorem does not amount to a particularly sophisticated result. Indeed, the relation $$P(A|B)=\frac{P(A\cap B)}{P(B)}=\frac{P(B\cap A)P(A)}{P(B)P(A)}=\frac{P(B|A)P(A)}{P(B)}$$ is a one line proof that follows from expanding both sides directly from the definition of conditional probability. Thus, I expect that what people find interesting about Bayes' theorem has to do with its practical applications or implications. However, even in those cases I find the typical examples being used as a justification of this to be a bit artificial.


To illustrate this, the classical application of Bayes' theorem usually goes something like this: Suppose that

  1. 1% of women have breast cancer;
  2. 80% of mammograms are positive when breast cancer is present; and
  3. 10% of mammograms are positive when breast cancer is not present.

If a woman has a positive mammogram, then what is the probability that she has breast cancer?

I understand that Bayes' theorem allows to compute the desired probability with the given information, and that this probability is counterintuitively low. However, I can't help but feel that the premise of this question is wholly artificial. The only reason why we need to use Bayes' theorem here is that the full information with which the other probabilities (i.e., 1% have cancer, 80% true positive, etc.) have been computed is not provided to us. If we have access to the sample data with which these probabilities were computed, then we can directly find $$P(\text{cancer}|\text{positive test})=\frac{\text{number of women with cancer and positive test}}{\text{number of women with positive test}}.$$ In mathematical terms, if you know how to compute $P(B|A)$, $P(A)$, and $P(B)$, then this means that you know how to compute $P(A\cap B)$ and $P(B)$, in which case you already have your answer.
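To make the arithmetic concrete, here is a minimal Python sketch (the cohort size is invented for illustration) showing that the direct count and the Bayes'-theorem computation coincide once the full table is available:

```python
# Hypothetical cohort built from the question's numbers:
# 1% prevalence, 80% true-positive rate, 10% false-positive rate.
N = 100_000                     # assumed cohort size (made up for illustration)
cancer = 0.01 * N               # 1,000 women with cancer
no_cancer = N - cancer          # 99,000 without
true_pos = 0.80 * cancer        # 800 positive tests among cancer cases
false_pos = 0.10 * no_cancer    # 9,900 positive tests among non-cases

# Direct count, as the question suggests:
p_direct = true_pos / (true_pos + false_pos)

# Bayes' theorem, using only the three stated rates:
p_bayes = (0.80 * 0.01) / (0.80 * 0.01 + 0.10 * 0.99)

print(p_direct, p_bayes)  # both ≈ 0.0748
```

Either route gives roughly a 7.5% chance of cancer given a positive test, which is the counterintuitively low number the classical example is built around.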


From the above arguments, it seems to me that Bayes' theorem is essentially only useful for the following reasons:

  1. In an adversarial context, i.e., someone who has access to the data only tells you about $P(B|A)$ when $P(A|B)$ is actually the quantity that is relevant to your interests, hoping that you will get confused and will not notice.
  2. An opportunity to dispel the confusion between $P(A|B)$ and $P(B|A)$ with concrete examples, and to explain that these are very different when the ratio between $P(A)$ and $P(B)$ deviates significantly from one.

Am I missing something big about the usefulness of Bayes' theorem? In light of point 2., especially, I don't understand why Bayes' theorem stands out so much compared to, say, the Borel-Kolmogorov paradox, or the "paradox" that $P[X=x]=0$ when $X$ is a continuous random variable, etc.

shangq_tou
user78270
  • It’s elementary. How interesting it is is subjective, I guess. – Jonathan Hole Jan 27 '21 at 16:10
  • Note that there are particular situations where doing experiments of the variable $\cdot|B$ is difficult, and so it might be interesting (economically, if you want) to do experiments of the variable $\cdot|A$. That is my view, at least. – fcz Jan 27 '21 at 16:17
  • Why “Bayes” gets talked about a lot has more to do with Bayesian inference methods and the Bayesian interpretation of probability than just the theorem itself. – spaceisdarkgreen Jan 27 '21 at 16:17
  • The mammogram example is not artificial. See https://opinionator.blogs.nytimes.com/2010/04/25/chances-are/ – Ethan Bolker Jan 28 '21 at 16:06
  • When building a breast cancer test, you'd likely want to test it in a known positive and known negative population, rather than a random sample of the population. If you just tested randomly, you'd need a *lot* more data, since you'd only be able to evaluate the prognostic value on positive patients in 1% of your entire cohort. The breast cancer example isn't contrived - Bayes' theorem will let you apply fixed characteristics *of the test* to populations with very different characteristics, which is often what's done when optimizing and applying a medical test, for example. – Nuclear Hoagie Jan 28 '21 at 17:26
  • Bayes' theorem is not so interesting by itself but the Bayesian perspective for inference can be used to frame many well-known problems in machine learning and statistics. Learning the underlying distributions of input data and using them for inference is a key part of many fields and those techniques are in wide use today as well as many other techniques that are derived from this idea such as graphical approaches and sampling techniques for numerical estimation. At their core, all these rely on Bayes' theorem. – syntonicC Jan 28 '21 at 21:54
  • mandatory xkcd: https://xkcd.com/1132/ – Franky Jan 29 '21 at 13:12
  • There is an element of [synecdoche](https://en.wikipedia.org/wiki/Synecdoche) here. "Bayes Theorem" is a small (but important) part of an entire approach to probability and statistics. For historical reasons, this approach to statistics has become identified with this one part. – John Coleman Jan 29 '21 at 13:30
  • You assume that you can run experiments to measure all the quantities of interest. Bayes lets you reason about things while not having full information. In order to know the probability that one has breast cancer given the result of a mammogram, you need to conduct a huge number of tests where there is a huge number of ppl who actually have cancer. Bayes lets you get the result with things that are much easier to test and get statistically significant results. Also, you avoid having to do something like give a huge number of people cancer just to get around not having a good sample size. – iheanyi Jan 29 '21 at 16:20
  • It's not the mathematical statement of the theorem that's interesting, it's its implications. I'm sure fields like quantum physics are full of formulas that look pretty straightforward mathematically, too. – user253751 Jan 30 '21 at 02:02
  • @Franky: While that comic is funny, it's actually an inaccurate characterization of frequentists. I'm not exactly one, but a proper frequentist does not conclude as shown in the comic. In fact, the correct rebuttal to that comic's characterization of frequentists is [this](https://xkcd.com/882), namely that one should always *expect* 5% of conclusions based on 95% confidence intervals to be false. So why on earth would we use a 95% confidence interval to assess whether the sun has gone nova or not?? – user21820 Jan 30 '21 at 04:23
  • Semi-related: You might like **[my question](https://stats.stackexchange.com/q/43471/10636)**. – user541686 Jan 30 '21 at 12:10
  • BTW I wonder whether "accorded" rather than "afforded" is the appropriate word here. – Michael Hardy Feb 28 '21 at 17:47

8 Answers

105

You are mistaken in thinking that what you perceive as "the massive importance that is afforded to Bayes' theorem in undergraduate courses in probability and popular science" is really "the massive importance that is afforded to Bayes' theorem in undergraduate courses in probability and popular science." But it's probably not your fault: This usually doesn't get explained very well.

What is the probability of a Caucasian American having brown eyes? What does that question mean? By one interpretation, commonly called the frequentist interpretation of probability, it asks merely for the proportion of persons having brown eyes among Caucasian Americans.

What is the probability that there was life on Mars two billion years ago? What does that question mean? It has no answer according to the frequentist interpretation. "The probability of life on Mars two billion years ago is $0.54$" is taken to be meaningless because one cannot say it happened in $54\%$ of all instances. But the Bayesian, as opposed to frequentist, interpretation of probability works with this sort of thing.

The Bayesian interpretation applied to statistical inference is immune to various pathologies afflicting that field.

Possibly you have seen that some people attach massive importance to the Bayesian interpretation of probability and mistakenly thought it was merely massive importance attached to Bayes's theorem. People who do consider Bayesianism important seldom explain this very clearly, primarily because that sort of exposition is not what they care about.

Michael Hardy
  • That is very interesting, thank you for your answer! – user78270 Jan 27 '21 at 18:50
  • For a more "grounded" example, I like to use "the probability that candidate X wins the election" - elections are deterministic, so if you imagine holding the election many times under identical circumstances, the same people will vote or not-vote in the same ways and you will get the same results, every time. But surely the various polling models we see on Fivethirtyeight and similar sites cannot be completely meaningless (even if we might criticize them for other reasons). So Bayesian probability is required for them to make sense. – Kevin Jan 28 '21 at 07:26
  • Did I miss a joke here or is there a typo in the first sentence? The two phrases in quotes are identical ... Should the second occurrence of "Bayes' theorem" be "Bayesian statistics"? – CL. Jan 28 '21 at 09:47
  • @CL.: I don't think there's a typo - I think it's drawing a distinction between "what you think of as the importance given to (etc)" and "the actual importance given to (etc)" – psmears Jan 28 '21 at 11:13
  • @psmears Oh, thanks! After misunderstanding the sentence the first time I read it, I became so focused on the quoted parts that I overlooked the context. – CL. Jan 28 '21 at 11:56
  • So what does “what is the probability that there was life on Mars two billion years ago” mean according to the Bayesian interpretation? – Sweeper Jan 28 '21 at 15:00
  • @Sweeper : Answers may vary. It is a degree of belief assigned to an uncertain proposition. Often you'll hear people say this is subjective. Others may say there are logical reasons for assignment of some particular number. Others use such things as improper priors, considering those to be epistemically objective. – Michael Hardy Jan 28 '21 at 18:12
  • @CL I'm still unsure what is meant by that sentence. Is it saying that it's a mistaken perception and that it's not really afforded massive importance? Is it saying that it's a correct perception and that there's good reason to afford it massive importance? Something else? There's surely a more straightforward and less wordy way to make whatever it's meant to convey. – JimmyJames Jan 28 '21 at 21:09
  • @JimmyJames Given that I misunderstood the sentence in the first place, I'm not sure if I'm in the position to explain it now. But here we go: To make the sentence less wordy, I’ll write `X` instead of the words in quotes (without the leading “the”). Then the statement is “You are mistaken in thinking that what you perceive as the `X` is really the `X`”. Or put very simply: “You may think that `X`, but this is actually a misperception.” – CL. Jan 29 '21 at 07:31
  • I'm not a fan of saying something like the life-on-Mars question is meaningless in the frequentist interpretation. When we talk about the probability of life on Mars, it's meant to be understood as the probability _given the information we have available_, i.e. a conditional probability. Frequentists would interpret this as, out of all possible histories (or historical models) that are consistent with the information we do have, in what fraction of them did life on Mars exist two billion years ago? Not to take anything away from the Bayesian approach, but it's far from meaningless. – David Z Jan 29 '21 at 10:13
  • @DavidZ I was about to say the same thing, although I would have given a slightly different frequentist interpretation. I don't think the set of all possible histories can really be modelled as a finite set. It's more like, of all planets in the galaxy which have all of the same properties that Mars does (all of the properties which are factored into your estimate of the probability of life on mars), what fraction of them supported life at some point? – Jack M Jan 29 '21 at 13:42
  • @CL Right. So does that mean it's not important? That seems unlikely to be what was meant. It's not 'massively' important? I'm not sure what that means exactly. 'Massive' roughly means 'great' or 'much' but probably it's been interpreted as meaning exaggerated or undue importance in this answer. At best the phrase is imprecise. But repeating it twice doesn't clear anything up and doesn't give us much of a clue as to how this answer's author understands the phrase. If the answer means that the importance afforded to this idea is appropriate, it should just say that. – JimmyJames Jan 29 '21 at 15:30
  • My understanding is that the first sentence means the following: My feeling that massive importance is afforded to something having to do with Bayes' theorem is correct. What is incorrect is my interpretation that what people are so excited about is the statement of the theorem itself, and its applications in counterintuitive brainteasers such as the mammogram example I posted. Michael's answer posits that the massive importance is instead afforded to the general philosophy behind Bayesian inference. – user78270 Jan 29 '21 at 15:52
  • The claim, then, as I see it, is that: (1) While Bayesian inference also has "Bayes" in its name and uses Bayes' theorem as a crucial ingredient, there is a lot more going on there in terms of probabilistic/statistical philosophy than what my answer would suggest. (2) The latter is what people are mostly (or at least to a significant degree) excited about in popular science and when teaching Bayes' theorem in undergrad. probability. – user78270 Jan 29 '21 at 15:55
  • @JackM Yeah you're right that the set of all possible histories may not be representable or finite - I sort of glossed over that. In practice we'd work with some theoretical model that proposes a finite or at least measurable set of histories and associated probabilities. But anyway, the main point I wanted to make is that you can apply some form of frequentist logic even when the real population size is one, because there's still a set of models. That can be argued around for Mars, sure, but it definitely applies when talking about e.g. the universe. – David Z Jan 29 '21 at 18:45
  • @DavidZ : Are you a native speaker of English? – Michael Hardy Jan 30 '21 at 21:49
  • @MichaelHardy Why does it matter? – David Z Jan 31 '21 at 06:15
  • @DavidZ : You wrote: "That can be argued around". I wonder if this is merely an instance of someone's unfamiliarity with standard English, or are people recently starting to use the word "around" where what has been standard is "about"? One other instance of that is on this present page, and I seem to vaguely recall having seen that somewhere else recently. – Michael Hardy Jan 31 '21 at 06:23
  • @MichaelHardy "...argued around" isn't a common phrase, sure, but as far as I know, it's not so rare that I would expect it to confuse people. That has been the case as long as I can remember. Its appearing twice on this page is almost certainly nothing more than a coincidence. "...argued about" is, of course, a different phrase with an entirely different meaning. – David Z Jan 31 '21 at 06:33
70

While I agree with Michael Hardy's answer, there is a sense in which Bayes' theorem is more important than any random identity in basic probability. Write Bayes' Theorem as

$$\text{P(Hypothesis|Data)}=\frac{\text{P(Data|Hypothesis)P(Hypothesis)}}{\text{P(Data)}}$$

The left hand side is what we usually want to know: given what we've observed, what should our beliefs about the world be? But the main thing that probability theory gives us is in the numerator on the right side: the frequency with which any given hypothesis will generate particular kinds of data. Probabilistic models in some sense answer the wrong question, and Bayes' theorem tells us how to combine this with our prior knowledge to generate the answer to the right question.
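As a toy illustration of this left-side/right-side distinction (all numbers below are assumed, not taken from any real model), a short Python sketch:

```python
# Two hypotheses about a coin, with assumed prior beliefs.
priors = {"fair": 0.5, "biased": 0.5}   # P(Hypothesis), assumed
p_heads = {"fair": 0.5, "biased": 0.9}

data = "HHHTHHHH"                        # invented observations

def likelihood(h):
    # P(Data | Hypothesis): the quantity a probabilistic model hands us directly
    p = p_heads[h]
    return p ** data.count("H") * (1 - p) ** data.count("T")

# Bayes' theorem turns the right-side quantity into the left-side one.
evidence = sum(likelihood(h) * priors[h] for h in priors)        # P(Data)
posterior = {h: likelihood(h) * priors[h] / evidence for h in priors}
print(posterior)  # the "biased" hypothesis dominates after this data
```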

Frequentist methods that try not to use the prior have to reason about the quantity on the left by indirect means or else claim the left side is meaningless in many applications. They work, but frequently confuse even professional scientists. E.g. the common misconceptions about $p$-values come from people assuming that they are a left-side quantity when they are a right-side quantity.

Robert Mastragostino
  • How do we know P(Hypothesis)? – Jonas Frey Feb 22 '21 at 22:15
  • @JonasFrey We don't, which is why frequentist statistics doesn't like it. This formula tells you how to move from a set of old probabilities to a set of new ones, but doesn't tell you where to start. In practice we either treat all hypotheses equally (frequentist) or propose a particular P(Hypothesis) based on past experience as an explicit input into our model (bayesian). You can also imagine this as P(Hypothesis|past data) so that we are chaining new data onto old results. That can be metaphor or literal depending on circumstance. – Robert Mastragostino Feb 26 '21 at 02:55
25

You might know only $\Pr[A\mid B]$ and not $\Pr[B\mid A]$, not because someone "adversarially told you the wrong one", but because one of those is a natural quantity to compute, and the other is a natural quantity to want to know.

I am about to teach Bayes' theorem in an undergraduate course in probability. The general setting I want to consider is when:

  • We have several competing hypotheses about the world. (Several candidates for $B$.)
  • If we assume one of these hypotheses, then we get a nice and easy probability problem where it's easy to find the probability of $A$: some observations that we've made. (Outside undergraduate probability courses, "nice and easy" is a relative term.)
  • We want to figure out which hypothesis is likelier.

The mammogram example might be natural, but it's less obviously natural because we have to track down where the numbers that are given to us come from, and ask why we couldn't be given the other quantities in the problem. So here are some examples where we have fewer numbers coming to us out of thin air.

  1. Suppose you are communicating over a binary channel which flips bits $10\%$ of the time. (This part is given to us out of nowhere, but it's the natural quantity to ask about first.) Your friend has several possible messages they might send you: these are the hypotheses $B_1, B_2, \dots, B_n$. You receive a message: that's the observation $A$. Then $\Pr[A \mid B_i]$ is just $(0.1)^k (0.9)^{n-k}$ if $B_i$ is an $n$-bit message that differs from the one you received in $k$ places. On the other hand, $\Pr[B_i \mid A]$ is the quantity we want: it will tell us how likely it is that your friend sent each message.
  2. You have a coin, and you don't know anything about its fairness. One possible assumption is that it lands heads with probability $p$, where $p \sim \text{Uniform}(0,1)$, but we could vary this. Then you flip the coin $n$ times and see $k$ heads. There are infinitely many hypotheses $B_p$, one for each possible $p$; under each of them, $\Pr[A \mid B_p]$ is just a binomial probability. Knowing the conditional PDF of $p$, which is what Bayes' theorem tells us, tells us more about how likely the coin is to land heads.
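Example 1 can be sketched numerically. The candidate messages and received string below are invented, and a uniform prior over the candidates is assumed:

```python
# Hypothetical setup for the noisy-channel example: each bit is flipped
# independently with probability 0.1.
flip = 0.1
candidates = ["0000", "1111", "1110"]   # assumed messages B_1, B_2, B_3
received = "1011"                        # the observation A

def likelihood(sent):
    # P(A | B_i) = flip^k * (1 - flip)^(n - k), where k bits differ
    k = sum(a != b for a, b in zip(received, sent))
    return flip ** k * (1 - flip) ** (len(sent) - k)

# With a uniform prior over the candidates, P(B_i | A) is proportional
# to P(A | B_i); Bayes' theorem just normalizes.
total = sum(likelihood(s) for s in candidates)
posterior = {s: likelihood(s) / total for s in candidates}
print(max(posterior, key=posterior.get))  # "1111": only one bit differs
```

The computation runs in the "easy" direction (message to received bits), and Bayes' theorem converts it into the direction we actually care about (received bits to message).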
Misha Lavrov
20

There are two main issues here. One is that on a Bayesian interpretation of probability (this term doesn't reference the theorem, but they're both named for Bayes), probability quantifies how well we know individual events, not detailed available frequency statistics. The best-of-both-worlds hope, if you combine Bayesian and frequentist perspectives, is that past data give us the mammogram values you cited, and an individual woman can be diagnosed based on Bayes's theorem.

The second issue is that $P(A|B)$ need not be remotely close to $P(B|A)$. To wit:

  • A test that's usually right may still have most of its positives be false, which warrants some scepticism, as well as further testing.
  • Conflating $P(A|B)$ with $P(B|A)$ is a danger in the legal system. Will we arrest people based on accuracy, precision etc., even if their guilt is unlikely? Will "this evidence is unlikely if they're innocent" get them convicted, even though it may not mean their innocence is unlikely? And yes, this has had real-world fallout in both policing and court decisions.
  • Statistics tests what probability assumes (e.g. "if this is Gaussian then..."). Statistical tests often boil down to, "we can't measure the probability the null hypothesis is true, but we'll assess it based on the probability, under the null hypothesis, that data at least this surprising would occur". Indeed, which statement gets to be the null hypothesis is more about its facilitating such calculations than its being a "default" or "reasonable" assumption.
J.G.
  • I see. So perhaps the second reason I stated (i.e., confusing $P(A|B)$ and $P(B|A)$) is common enough and has dire enough potential consequences that it justifies extended discussion. – user78270 Jan 27 '21 at 17:28
  • I agree with the legal system danger of confusing $P(A|B)$ and $P(B|A)$. However, if (for example) you were to **stipulate** "this evidence is unlikely if the defendant is innocent" and you were also to **stipulate** that the evidence is *not* unlikely when the defendant is guilty, then the evidence does provide a math basis for concluding that the defendant is *probably* (i.e. greater than 50%) guilty. – user2661923 Jan 27 '21 at 18:49
  • @user2661923 But this is not true - or am I misunderstanding your statement? You can have $P(E|I) = 1/100$ (evidence unlikely if the defendant is innocent), $P(E|G) = 99/100$ (evidence not unlikely if the defendant is guilty), yet if the prior probability of guilt is sufficiently low (let's say the defendant was just randomly picked on the street), the conditional probability of guilt given evidence will be low as well - for instance for $P(G) = 1/1000$, we get $P(G|E) \approx 0.09$. – aekmr Jan 28 '21 at 11:05
  • @aekmr +1: very good catch - I totally overlooked your analysis - good rebuttal. In defense, I was influenced by the fact that a Police Dept is a political organization that hates to be embarrassed, so they won't generally arrest someone *at random*. However, your rebuttal certainly stands. – user2661923 Jan 28 '21 at 14:32
  • @user2661923 To be honest, I had to check my calculation a few times to verify I'm not saying something stupid. Which, together with the fact you made that mistake in the first place, seems like a good illustration that using Bayes' theorem in everyday situations isn't something people understand intuitively and that it benefits from a good exposition :). There are many examples of even well educated people getting conditional probabilities very wrong - to take the example of breast cancer screening from op, see [here](https://www.bbc.com/news/magazine-28166019). – aekmr Jan 29 '21 at 08:55
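The base-rate effect discussed in this thread is easy to check numerically, plugging in the numbers given in the comments above:

```python
# Numbers from the comment: P(E|I) = 1/100, P(E|G) = 99/100, P(G) = 1/1000.
p_e_given_i = 0.01     # evidence unlikely if innocent
p_e_given_g = 0.99     # evidence likely if guilty
p_g = 0.001            # prior probability of guilt (random person off the street)

# Bayes' theorem: P(G|E) = P(E|G) P(G) / (P(E|G) P(G) + P(E|I) P(I))
p_g_given_e = (p_e_given_g * p_g) / (
    p_e_given_g * p_g + p_e_given_i * (1 - p_g)
)
print(round(p_g_given_e, 2))  # 0.09, despite the evidence being "usually right"
```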
9

Let me start by a memory. From my undergraduate days, 30 years ago, I vividly remember the time when Bayes was introduced. We had spent a lot of time and effort on sampling theory and how to know if things could be proved. And to me, at the time, it always ended up that we needed to have a sample size of x (my remembrance was that a sample size of 7 often was the minimum).

To me Bayes represented a totally different approach which to me was more in alignment with my view of reality. In sampling we looked at groups; with Bayes we started with individual things. So for me this was a very eye-opening addition to the field of probability praxis (and theory of course, but that came later for me). The book we had, written by Raiffa I believe, was about decision theory. 30 years later I still remember the discussion about whether to do one more test drilling in the oil field.

So, just maybe, in your curriculum the importance placed on Bayes is there to show that statistics does have several different branches, not only sampling theory or how to present graphs as correctly as possible.

ghellquist
7

You are correct that Bayes' theorem follows trivially from axioms of probability that everyone accepts. The difference between Bayesians and frequentists is a cultural one. The actual mathematical axioms they subscribe to are trivially homologous.

The cultural divide is a pretty stark one though.

  • Frequentists tend to think computation is a dirty word and they don't care to analyse problems that they cannot approach analytically, so basically they would prefer to think that everything is a Gaussian. Also some of them tend to do this funny numerology thing where they fetishise numbers like 0.01 and 0.05.

  • Bayesians think that if they write down a uniform prior as a formula it looks more like real mathematics and less like a stupid assumption that rarely applies (appeals to 'entropy' make them feel great too); and they delude themselves into thinking that labelling part of their likelihood function a prior makes them special; as if frequentists couldn't multiply different likelihood functions together to get a joint one just fine.

Actual examples where a non-strawman version of either approach to the same problem yields a different result do not actually exist, because there are not actually any differences in the fundamental axioms they subscribe to. That being said, it is not as if the language, computational tools, and modelling approaches you use are unimportant to guiding your thought process. It'd be better if teaching methods focused more on said homology, though.

  • Keith Winstein provides an [excellent, neutral explanation](https://stats.stackexchange.com/a/2287/258489) of the difference between the two approaches. His explanation was my first encounter with a neutral comparison between the two approaches and he has convinced me that neither approach is superior. – Brian Jan 28 '21 at 19:27
  • It's not cultural: Bayesians follow a degree-of-belief interpretation of probability whereas frequentists assign probabilities only when they can be interpreted as relative frequencies. – Michael Hardy Jan 30 '21 at 22:25
  • Those are nice words. And they might indeed be influential in steering one's mind in one direction or another. The mathematical axioms are homologous though. And I've never met a practicing statistician who strictly abided by those categorisations. – Eelco Hoogendoorn Jan 31 '21 at 10:51
1

Not exactly an answer to the posted question, but Bayesian ideology is important in many practical problems in artificial intelligence, including character recognition, medical diagnosis, and more, the key structure being a Bayesian inference network.

richard1941
0

First see the comments following this answer, especially the last few comments. I was totally unaware that Bayes' Theorem is simply a consequence of the definition of conditional probability under the standard axioms. Based on this assertion, I can't refute the idea that the following problem can be solved without Bayes' Theorem.


Hard to imagine attacking a conditional probability problem without it. Imagine traveling back in time 1000 years. You are the captain of a ship. You have two sailors, A and B that you independently use to predict rain.

A is right 90% of the time and B is right 80% of the time.
A says it will rain today, and B says it won't rain today.
Absent Bayes Theorem, and absent any info on how often (in general) it rains, how do you (intuitively) determine the chance that it will rain today? Clearly, the problem is well defined, so it has a meaningful answer. Absent Bayes Theorem, or anything like it, how do you compute the answer?
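One way to set up this calculation in a few lines (assuming, as the comments below discuss, that the two sailors' correctness is independent, and using no base rate for rain):

```python
# Historical accuracy of the two sailors (given in the problem).
p_a, p_b = 0.9, 0.8

# They disagree today, so exactly one of them is right.
a_right = p_a * (1 - p_b)      # A right, B wrong: 0.9 * 0.2
b_right = (1 - p_a) * p_b      # A wrong, B right: 0.1 * 0.8

# Conditioning on the disagreement; A is the one who predicted rain.
p_rain = a_right / (a_right + b_right)
print(round(p_rain, 3))        # 0.692
```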

user2661923
  • (1/2) I accept that some exercises that are given in undergraduate problem sets are stated in such a way that Bayes' theorem is required. One example of this is the mammogram problem I posed in my question, and another is what you stated here. However, I think that this fits into the purely artificial "adversarial" type problem. Who is telling us that A is right 90% of the time and that B is right 80% of the time? In order to compute these two probabilities in the first place, you need to know something about how often it rains in general. – user78270 Jan 27 '21 at 16:57
  • (2/2) In short, if it is truly impossible to compute $P(A|B)$ directly, then it should not be possible to compute $P(B|A)$, $P(A)$, and $P(B)$ either. If someone is unable to provide information on how the probability that A\B is right is 90%/80% of the time (information with which you could compute the conditional probabilities you are interested in directly), then why should you trust that these numbers are accurate? – user78270 Jan 27 '21 at 17:01
  • @user78270 Absolutely not. You only need to know, by historical records how often A and B are independently right or wrong. I repeat, absent any information about how often it rains, the problem is clearly meaningful with the Bayes Theorem answer that probability of A being right is $$\frac{0.9 \times 0.2}{[0.9 \times 0.2] + [0.8 \times 0.1]}.$$ Try calculating this without Bayes Theorem. – user2661923 Jan 27 '21 at 17:01
  • My bad: we can indeed compute $P(\text{rain}|A\text{ predicts rain})$ by dividing the number of times it rained when $A$ said it would by the total number of times $A$ said it would rain. It is possible to do that while somehow neglecting to keep count of how often it rains in general. I'm still confused about a few things, however. What is the computation that you did in your comment (i.e., what are 0.2 and 0.1)? How do you compute $P(\text{rain}|A\text{ predicts rain and }B\text{ predicts no rain})$ without $P(\text{rain})$? – user78270 Jan 27 '21 at 17:25
  • @user78270 You are hung up on focusing on *rain*. Pretend instead that there is some unknown event which either will or won't happen, and that the Captain doesn't even know what the event is. $(0.9) \times (0.2)$ represents the combined prob. of the (presumably) independent events that A is right and B is wrong. $(0.8) \times (0.1)$ represents the reverse. Since it is given that there is no info on how often it rains, and no historical info on (for example) how often A is right when he says no rain vs when he says rain, the meaningful answer (based on the lack of info) is as I described. – user2661923 Jan 27 '21 at 17:32
  • I still don't understand where we're going with any of this, and how this relates to Bayes. You have defined two independent events, say $E_A=\{A\text{'s prediction is correct}\}$ and $E_B=\{B\text{'s prediction is correct}\}$, with respective probabilities 0.9 and 0.8. Now you've computed $P(E_A\cap E_B^c)/(P(E_A\cap E_B^c)+P(E_A^c\cap E_B))$. How does this relate to the initial problem and Bayes' Theorem? – user78270 Jan 27 '21 at 17:44
  • Come to think of it, in this interpretation, isn't what we're looking for just $P(E_A\cap E_B^c)$? Also, if this is how we frame the problem, then I don't see how the events $E_A$ and $E_B$ can be independent, since they are mutually exclusive (one predicts rain and the other does not, so they can't both be right). – user78270 Jan 27 '21 at 17:46
  • @user78270 Let D represent the event that A and B disagree and E represent the event that A is correct. The problem requires you to compute $$p(E|D) = \frac{p(\text{events E and D both occur)}}{p(D)} = \frac{0.9 \times 0.2}{[0.9 \times 0.2] + [0.8 \times 0.1]}.$$ – user2661923 Jan 27 '21 at 17:49
  • This is not an application of Bayes' theorem. The first equality is the definition of conditional probability, and the second uses the facts that (a) $P(A\cap B)=P(A)P(B)$ when $A$ and $B$ are independent, and (b) $P(E\cup F)=P(E)+P(F)$ when $E$ and $F$ are disjoint. – user78270 Jan 27 '21 at 17:56
  • @user78270 "The first equality is the definition of conditional probability": yes, where the definition is based on Bayes Theorem. "$P(A \cap B) = P(A)P(B)$ when A and B are independent." I intended "...that you independently use to predict rain" to signify that the predictions of A and B are independent of each other. Further, you are given no info that the predictions of A and B are not independent. Finally, examining the two ways that A and B can disagree, [A right, B wrong] and vice-versa, these are clearly **disjoint** events. – user2661923 Jan 27 '21 at 18:01
  • Yes, I understand where independence and disjointness are coming from. More importantly: the conclusion, then, is that we do not actually need Bayes' theorem in your problem at all. I would submit to you that the content of your answer then should exclusively focus on your claim that "the definition of conditional probability is based on Bayes' theorem." Without additional context this is in fact incorrect. Under the Kolmogorov/frequentist approach to probability, $P(A|B)$ is defined without any mention whatsoever of Bayes' theorem. – user78270 Jan 27 '21 at 18:25
  • Bayes' theorem is then proved using the definition of conditional probability, not the other way around. The problem you posed can be solved entirely without Bayes' theorem under the standard Kolmogorov axioms. In this case, I would consider editing your answer to focus on your claim that the definition of conditional probability comes from Bayes' theorem (which, again, without additional information, such as "if we adopt a different axiomatization or philosophy than Kolmogorov's", is objectively incorrect.) – user78270 Jan 27 '21 at 18:25
  • @user78270 Very good rebuttal. I was totally unaware that Bayes Theorem was a consequence of axioms around conditional probability. – user2661923 Jan 27 '21 at 18:28
  • @user78270 Saying that definition A is used to prove theorem B is great when there is only one typical way to develop the theory - but it's just as valid, and in many ways more intuitive, to start by defining conditional probability via Bayes' theorem rather than deriving it. The two approaches *are* equivalent, so the question might be which came first. And historically, while Bayes did derive his theorem, he predated Kolmogorov by a solid two centuries, so it's a bit silly to say that the Kolmogorov axioms are used to derive Bayes' theorem. – David Manheim Jan 29 '21 at 13:37
  • @DavidManheim Very interesting comment. This is an area where I am totally ignorant; all I can do is edit the start of my answer, referring to these comments (**done**), and then sit back and passively observe. – user2661923 Jan 29 '21 at 13:40
  • @DavidManheim To be honest, I still don't understand what it means to "start defining conditional probability by Bayes theorem rather than deriving it." Do you have any quick reference for an explanation of how this is done? After a quick googling it seems to me that [Bayes himself](https://en.wikipedia.org/wiki/An_Essay_towards_solving_a_Problem_in_the_Doctrine_of_Chances) proved his theorem by (a) deriving the usual formula $P(A|B)=P(A\cap B)/P(B)$ (ironically enough with a frequentist intuition), and then (b) writing down the short proof I have in the first paragraph of my OP. – user78270 Jan 29 '21 at 13:56
  • @user78270 You can just as easily start with a definition of Bayes theorem and derive the standard definition of conditional probability; the two are equivalent, so it doesn't matter which you do. The choice of the Kolmogorov axioms is one convenient choice, but it's not fundamental, and saying that Bayes Theorem is a consequence of the definition of conditional probability is no more true than saying that conditional probability is a consequence of the definition of Bayes theorem. – David Manheim Feb 09 '21 at 12:47
  • @DavidManheim I see, so first you introduce a collection of numbers $(P[A|B])_{A,B\subset\Omega}$ which "by definition" must obey the condition $P[A|B]=P[B|A]P[A]/P[B]$ (motivated by some Bayesian philosophy), and then you prove a kind of existence/uniqueness result showing that the only possibility is $P[A|B]=P[A\cap B]/P[B]$. – user78270 Feb 09 '21 at 19:04
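The equivalence discussed in the last few comments can be checked on a toy finite sample space: whether one takes $P(A|B)=P(A\cap B)/P(B)$ as the definition, or instead posits $P(A|B)=P(B|A)P(A)/P(B)$ and derives the ratio form, both expressions yield the same number. A small sketch with hypothetical events of my own choosing:

```python
# Toy finite sample space with uniform probability; the events A and B
# below are arbitrary illustrative choices, not from the thread.
omega = set(range(100))
A = {w for w in omega if w % 2 == 0}   # even outcomes
B = {w for w in omega if w < 30}       # outcomes below 30

def P(event):
    """Uniform probability of an event (subset of omega)."""
    return len(event) / len(omega)

# "Kolmogorov-first": conditional probability as a ratio.
kolmogorov = P(A & B) / P(B)

# "Bayes-first": P(B|A) * P(A) / P(B), with P(B|A) itself computed
# as a ratio on this finite space.
p_B_given_A = P(A & B) / P(A)
bayes_first = p_B_given_A * P(A) / P(B)

print(kolmogorov, bayes_first)
```

This is of course circular as a mathematical argument (the two formulas are algebraically identical whenever $P(A),P(B)>0$); the point is only to make the equivalence that the commenters agree on concrete.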