**Please ignore the old comments - I have radically changed my answer**. It's long, but bear with me - after reading this answer, you should feel intuitively-comfortable with almost all other probabilistic fallacies, not just this one.

The first and most important thing is to define what we mean by "probability." Let's define it to mean **"the expected percentage of positive outcomes when repeating an observation over a large sample"** *(Go ahead, read that again, more slowly)*. Also, let's call the method we use to choose this large sample the "model."

This may sound trivial, but it has some important implications. For example, what is the probability you will die before age 40? According to our definition, this question has no meaning: we can't observe *you* multiple times before age forty and record how many times *you* die. Instead, we observe other people below age 40, and record how many of *them* die.

So let's say we observe all other people on earth below age 40 (our model), and find that 1/2 of them die before hitting the big four-oh. Does this mean the probability of you dying before 40 is 50%? Well, according to this model, it does! However, this is hardly a fair model. Perhaps you live in a first-world country - now when we revise our model to include only people below 40 in first-world countries, your chances become a much-less-grim 1/10^{†}. But you are also not a smoker, and don't live in the city, and ride your bike on Sundays with your wife and crazy mother-in-law, which makes your chances 123/4567! That's much better... however, our model still doesn't take into account that you are also an avid skydiver ;)

^{†} I am pulling these numbers out of nowhere, they are not real statistics.

So the point is, asking for a "probability" only makes sense in the context of a certain model - a way of repeating our observation many times. Without that, asking for a probability is meaningless.

Now, back to the original question. Before we can assign a probability, we must choose a model; how are we choosing the families to sample from? I see two obvious choices, which will lead to different answers:

- Consider only families which have two children, one of whom is a girl, and choose one randomly.
- Consider only girls who have exactly one sibling, and choose one randomly.

Do you see the difference? In the first case, every family has the same probability of being chosen. However, in the second case, the families with two girls are *more* likely to be chosen than families with only one girl, because every *girl* has an equal chance of being chosen: the two-girl families have doubled their odds by having two girls. If children were raffle tickets, they would have bought two tickets while the one-girl families bought only one.

Thus, we should expect the probabilities in these two cases to be different. Let's calculate them more rigorously (writing `BG` to mean "boy was born, then girl"):

- There are three equally probable family-types: `BG`, `GB`, and `GG` *(`BB` was removed from consideration, because they have no girls)*. Since only one of the three has two girls, our chances of having two girls are **1/3**.
- We have the same possibilities as above, but now `GG` is twice as likely as `BG` or `GB`. Thus, the probabilities are `GG: 2/4`, `GB: 1/4`, and `BG: 1/4`, meaning the probability of a girl-sibling is 2/4 = **1/2** *(alternatively, we could have noted that there are only two equally-probable possibilities for the sibling: boy or girl)*.
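If you'd rather check the two calculations than take my word for it, here is a minimal Python sketch that enumerates the four birth orders and weights each family according to the model. It assumes boys and girls are equally likely and that births are independent; the names `families` and `p_two_girls` are just made up for this illustration.

```python
from fractions import Fraction
from itertools import product

# Assumption: each child is a boy or a girl with probability 1/2, independently,
# so the four birth orders BB, BG, GB, GG are equally likely.
families = ["".join(p) for p in product("BG", repeat=2)]

def p_two_girls(weight):
    """Chance of picking GG when each family is picked in proportion to weight(family)."""
    total = sum(weight(f) for f in families)
    return Fraction(weight("GG"), total)

# Model 1: choose a family with at least one girl; every such family counts once.
model_family = p_two_girls(lambda f: 1 if "G" in f else 0)

# Model 2: choose a girl; a family counts once per girl it contains
# (the "raffle tickets" from earlier).
model_girl = p_two_girls(lambda f: f.count("G"))

print(model_family)  # 1/3
print(model_girl)    # 1/2
```

The only thing that differs between the two models is the weight function, which is exactly where the two answers part ways.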

Here lies the fallacy: the model our intuition assumes is the second one, but the way the problem is worded strongly implies the first one. When we think in terms of "randomly choosing a family (over a large number of families)," our intuition meshes perfectly with the result.

Let's take a look at another, similar problem:

In a family of two children, where the oldest child is a girl, what is the probability they are both girls?

Once again, I can see two different, plausible models for observing our random sample:

- Consider only families with two children, the oldest of whom is a girl, and choose one randomly.
- Consider only girls who have exactly one (younger) sibling, and choose one at random.

So once again, the question strongly implies the first model, though arguments for either could be made. However, when we actually calculate the probability...

- Same as before, but we've also eliminated `BG`, where the first child was a boy. This leaves only two equally-probable possibilities, `GB` and `GG`. Thus, the chances are **1/2**.
- We've also eliminated `BG` from this case, leaving `GB` and `GG`. However, unlike the original question, `GG` is no longer twice as likely, since the younger child can no longer be the one who was randomly chosen. Thus, `GB` and `GG` are equally likely, and we again have a probability of **1/2** *(alternatively, we could have noted that there are only two equally-probable possibilities for the sibling: boy or girl)*.

...we find that the choice between these two models doesn't matter, because in this case both have the same probability! Among two-child families with an older daughter, it doesn't matter if we randomly choose the family or the older daughter, because in both cases there is only one of each per family.
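Since we defined probability as what happens over a large number of repeated observations, here is a rough Python simulation of exactly that. It is only a sketch: it assumes boys and girls are equally likely and independent, and `estimate_gg` and `tickets` are hypothetical names invented for this example. Each simulated family receives some number of "raffle tickets," and we measure what fraction of all tickets belong to two-girl families.

```python
import random

def estimate_gg(tickets, n_families=100_000, seed=0):
    """Estimate the chance of two girls when a family holds tickets(older, younger) tickets."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    total = gg = 0
    for _ in range(n_families):
        older, younger = rng.choice("BG"), rng.choice("BG")
        t = tickets(older, younger)
        total += t
        if older == younger == "G":
            gg += t
    return gg / total

# Older-child question, model 1: one ticket per family whose older child is a girl.
by_family = estimate_gg(lambda o, y: 1 if o == "G" else 0)

# Older-child question, model 2: one ticket per older daughter.
# There is exactly one per qualifying family, so the ticket counts coincide.
by_older_daughter = estimate_gg(lambda o, y: int(o == "G"))

# Original question, model 2, for contrast: one ticket per girl of either age.
by_any_girl = estimate_gg(lambda o, y: (o == "G") + (y == "G"))

print(by_family, by_older_daughter, by_any_girl)  # all three should land near 0.5
```

Notice that the first two ticket rules hand out identical tickets, which is the whole reason the model stops mattering here. In the original question the family rule and the girl rule hand out different tickets, and the estimates split into roughly 1/3 and 1/2.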

Hopefully that all made sense. For bonus points, try applying this reasoning to the Monty Hall problem. What is our model - how are we making repeated observations? Why does it clash with our intuition?

For even **more** bonus points, try to figure out the following question; the math isn't too difficult, but it took me a long while to figure out why, intuitively, the answer should be correct:

In a family of two children, one of whom is a girl named Florida, what are the chances of two girls?

*(If you have trouble, post a question and leave a link to it in the comments, and I'll try to answer it there as best I can :) )*