217

Do men or women have more brothers?

I think women have more, as no man can be his own brother. But how can one prove it rigorously?


I am going to suggest some reasonable background assumptions:

  1. There are a large number of individuals, of whom half are men and half are women.
  2. The individuals are partitioned into nonempty families.
  3. The distribution of the sizes of the families is deliberately not specified.
  4. However, in each family, the sex of each member is independent of the sexes of the other members.

I believe these assumptions are roughly correct for the world we actually live in.

Even in the absence of any information about point 3, what can one say about the relative expectations of the random variables “Number of brothers of individual $I$, given that $I$ is female” and “Number of brothers of individual $I$, given that $I$ is male”?

And how can one directly refute the argument that claims that the second expectation should almost certainly be smaller than the first, based on the observation that in any single family, say with two girls and one boy, the girls have at least as many brothers as do the boys, and usually more?

Marc van Leeuwen
layman
  • 3
    See: http://www.cut-the-knot.org/Curriculum/Probability/FamilyStats.shtml – lulu May 21 '16 at 13:30
  • 48
    Choose a random person. Their siblings are equally likely to be male or female. Therefore men and women on average have the same number of brothers. – Théophile May 21 '16 at 13:43
  • 53
    The problem needs to be explicit about its family planning assumptions. In a setting where people stop as soon as their first boy is born, no boy will have any brothers and every girl will have exactly one. – Barry Cipra May 21 '16 at 14:08
  • @Barry Cipra Interesting comment. The more information he can give, the better, of course. A tip to the original poster: maybe change your question into "Do men or women in general have more brothers?". I believe the answer will be that you have an equal number of brothers, and it does not matter whether you are born male or female, since giving birth to a boy or a girl does not change the probability of the next child being a boy or a girl. – Pedro May 21 '16 at 14:26
  • 15
    @Barry Cipra: The main rule when dealing with problems like the one formulated here, or exercises in a textbook, is Occam's razor: Make the simplest assumptions that are compatible with the givens. – Christian Blatter May 21 '16 at 15:28
  • 3
    @MJD: Thank you. It was high time to come back to reason in this matter. – Christian Blatter May 21 '16 at 16:27
  • 12
    FWIW, I would be against this question being closed. It can be made an unambiguous combinatorics / probability problem in several ways, all of which I think are interesting. – Qiaochu Yuan May 21 '16 at 17:21
  • 1
    This is the Monty Hall problem: https://en.wikipedia.org/wiki/Monty_Hall_problem – Takahiro Waki May 21 '16 at 19:42
  • It seems to me that MJD's assumption #4 rules out the possibility that this is one of those hypothetical societies where each family stops having children after the first male birth. – David K May 21 '16 at 20:50
  • 1
    The problem is not clear to me. Say you have two males and one female, the female has two brothers, but there are two males with one brother each. So, while the female has more brothers than the males, there are more males with brothers. – user May 22 '16 at 04:10
  • 2
    I think, suitably interpreted, that requirements 1 and 4 contradict each other. If exactly half the individuals are men and half are women, then learning that you're male makes it slightly less likely that your siblings are male. – Qiaochu Yuan May 22 '16 at 04:23
  • 36
    In each family the women have more brothers than the men, but it does not follow the same is true for the population as a whole. This is [Simpson's paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox). – Julian Rosen May 22 '16 at 04:27
  • It evens out. Consider 8 families with 3 children, covering all gender distributions. In a family with 1 girl, the girl will have 2 brothers. The three such families account for 6 brothers. In a family with two girls each girl has 1 brother and 1 sister. The six girls in these families account for 6 brothers and 6 sisters. In the one family with 3 girls, each girl has 2 sisters. This accounts for 6 sisters. Among the 12 girls there are 12 sisters and 12 brothers total. This works for all numbers. A family with 25 girls will account for 600 sisters, which is the same as the number of brothers in the 25 families with only 1 girl. – fleablood May 22 '16 at 05:23
  • Note that your assumption 4 is quite questionable if applied to our world. There are a few studies showing that men who have more brothers tend to have male children while men who have more sisters tend to have female children, so it **may not** be true that the sexes of children are completely independent of one another or of the sexes of other relatives. – Bakuriu May 22 '16 at 08:20
  • 1
    before I read the answers: I don't buy the "every child has equal chances of being a boy or a girl, therefore men and women have equal numbers of brothers" just because, say there is a family with two girls, and a family with two boys, men obviously have more brothers. But if each family had one boy and one girl, then only women have brothers. This is an interesting question. – sig_seg_v May 22 '16 at 10:01
  • @Najib (since you were just looking at the question): Shouldn't this be tagged [tag:probability]? – alexis May 23 '16 at 10:03
  • 2
    Is this question about mathematics or about actual state of the world? – Joker_vD May 23 '16 at 10:26
  • @ChristianBlatter: Since the outcome of an open question is not by any means a "given", that attempt to apply Occam's razor fails. It would be different if the question were: statistics show that men and women have on average equal numbers of brothers; how can this be explained? – Marc van Leeuwen May 23 '16 at 10:45
  • 2
    Ignoring genetic patterns within families or sex-selective family planning, there is a further biological issue: individuals who are identical twins are more likely to have more same-sex siblings than other-sex siblings, and this will then marginally affect the overall position. – Henry May 23 '16 at 13:44
  • "I think women have more as no man can be his own brother." No women can be her own brother too so this intuitive observation is probably meaningless. – Kamil Szot May 23 '16 at 20:21
  • Suppose, to the contrary, that women have on average a different number of brothers than men. That would mean that your gender is not independent of the gender composition of the rest of your siblings (and vice versa). Since this is an idealized math problem, that is not true. – Kamil Szot May 23 '16 at 20:31
  • 1
    @Théophile but the sampling is not completely random. Without proof, it reminds me of the fact that, on average, your friends have more friends than you. – Davidmh May 24 '16 at 05:27
  • 1
    @Davidmh, the friends paradox comes about because the sample is *your friends*, and so it's biased. (People with lots of friends are more likely to be in it.) If you sample everyone instead of just one's friends, you get the real average. Theophile's argument samples everyone, and is correct. (But it's the same argument I have in my answer, so I'd say that :-)) – alexis May 24 '16 at 12:30
  • @BarryCipra says: "In a setting where people stop as soon as their first boy is born, no boy will have any brothers and every girl will have exactly one." This violates assumption 4 as the sex of a first child can not be male if the sex of the second is male. – Theodore Norvell May 24 '16 at 14:53
  • 1
    @TheodoreNorvell, I made that comment when the problem consisted of just the first two lines. The four background assumptions were added later. – Barry Cipra May 24 '16 at 15:00
  • 2
    No man can be his own brother, sure. But no woman can be her own brother either. >_> – Devsman May 24 '16 at 17:27
  • 1
    Assumptions 1 and 4 are at odds. Pick a person x. Tell me the gender of all the other people, and I can tell you the gender of x. Thus the gender of x is not independent of the gender of their siblings. Even if I didn't know the gender of the people from the other families, assumption 1 dictates a slight bias toward gender parity in each individual family. As the number increases, the bias gets smaller. I think this is the nub of why this is a confusing question. If you take assumption 1 seriously, then women will have (ever so slightly) more brothers than men. Assumption 4 leads to parity. – Theodore Norvell May 24 '16 at 21:12
  • Women have more brothers; men have more sisters. In a family of 2 boys/2 girls each boy has one brother but two sisters. Each girl has two brothers but only one sister. The issue is that boys "use up" one potential brother being themselves; likewise, girls use up one potential sister because she's her. – Bob Jarvis - Слава Україні May 25 '16 at 03:29
  • 1
    Assumptions 1 and 2 do not describe how real populations work. Are you looking for a real-world answer, or an answer based on these assumptions? – Kevin Krumwiede May 25 '16 at 05:13
  • 1
    @BobJarvis, read some of the answers. You're wrong. – alexis May 25 '16 at 19:30
  • 1
    http://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=1038&context=bio_fac – DavePhD May 26 '16 at 15:31
  • Possible duplicate of http://math.stackexchange.com/questions/137568/sex-distribution – Anixx May 27 '16 at 14:33
  • @DavePhD Also, search for that paper (e.g. by title) using Google Scholar and then click on the "Related articles" link below the citation. –  May 27 '16 at 14:45

18 Answers

96

So many long answers! But really it's quite simple.

  • Mathematically, the expected number of brothers is the same for men and women.
  • In real life, we can expect men to have slightly more brothers than women.

Mathematically:

Assume, as the question puts it, that "in each family, the sex of each member is independent of the sexes of the other members". This is all we assume: we don't get to pick a particular set of families. (This is essential: If we were to choose the collection of families we consider, we can find collections where the men have more brothers, collections where the women have more brothers, or where the numbers are equal: we can get the answer to come out any way at all.)

I'll write $p$ for the gender ratio, i.e. the proportion of all people who are men. In real life $p$ is close to 0.5, but this doesn't make any difference. In any random set of $n$ persons, the expected (average) number of men is $n\cdot p$.

  1. Take an arbitrary child $x$, and let $n$ be the number of children in $x$'s family.
  2. Let $S(x)$ be the set of $x$'s siblings. Note that there are no gender-related restrictions on $S(x)$: It's just the set of children other than $x$.
  3. Obviously, the expected number of $x$'s brothers is the expected number of men in $S(x)$.
  4. So what is the expected number of men in this set? Since $x$ has $n-1$ siblings, it's just $(n-1)\cdot p$, or approximately $(n-1)\div 2$, regardless of $x$'s gender. That's all there is to it.

Note that the gender of $x$ didn't figure in this calculation at all. If we were to choose an arbitrary boy or an arbitrary girl in step 1, the calculation would be exactly the same, since $S(x)$ is not dependent on $x$'s gender.
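
The argument above can be checked exactly for any concrete family-size distribution. The following is a minimal sketch, not part of the original answer: the function name `expected_brothers` and the example size weights are hypothetical, and the only modelling assumption is the one stated above (each child is male with probability $p$, independently of the others). A random man or woman is picked from the whole population, so larger families are automatically counted more often.

```python
from fractions import Fraction
from itertools import product

def expected_brothers(size_weights, p):
    """Exact E[# brothers] for a randomly chosen man and a randomly chosen woman.

    size_weights: dict mapping family size -> relative frequency of that size
                  (hypothetical input; the question leaves it unspecified).
    p: probability (as a Fraction) that any given child is male,
       independently of all other children.
    """
    # For each sex: [probability-weighted brother count, probability-weighted head count]
    totals = {"M": [Fraction(0), Fraction(0)], "F": [Fraction(0), Fraction(0)]}
    for n, weight in size_weights.items():
        for sexes in product("MF", repeat=n):          # every possible family of size n
            boys = sexes.count("M")
            prob = Fraction(weight) * p**boys * (1 - p)**(n - boys)
            for sex in sexes:                          # every child in that family
                brothers = boys - 1 if sex == "M" else boys
                totals[sex][0] += prob * brothers
                totals[sex][1] += prob
    return {sex: num / den for sex, (num, den) in totals.items()}

# Illustrative weights: 5 parts one-child, 3 parts two-child, 1 part four-child families.
result = expected_brothers({1: 5, 2: 3, 4: 1}, Fraction(1, 2))
print(result)  # both entries are Fraction(3, 5)
```

Changing `p` or the size weights changes the common value but not the equality, which is the point of step 4.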

In real life:

In reality, the gender distribution of children does depend on the parents a little bit (for biological reasons that are beyond the scope of math.se). I.e., the distribution of genders in families is not completely random. Suppose some couples cannot have boys, some might be unable to have girls, etc. In such a case, being male is evidence that your parents can have a boy, which (very) slightly raises the odds that you can have a brother.

In other words: If the likelihood of having boys does depend on the family, men on average have more brothers, not fewer. (I am expressly putting aside the "family planning" scenario where people choose to have more children depending on the gender of the ones they have. If you allow this, anything could happen.)
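
To make the direction of this effect concrete, here is a small sketch under an assumed toy model (not from the original answer): every family has the same size $n$, and each family draws its own boy-probability uniformly from a short list `p_values`. The function name and the example values are hypothetical. A random man is found in a family in proportion to that family's $p$, which is what tilts his siblings toward being male.

```python
from fractions import Fraction

def brothers_with_family_bias(n, p_values):
    """Exact E[# brothers] for a random man and a random woman when each
    family of n children draws its own boy-probability uniformly from
    p_values (a hypothetical, illustrative distribution)."""
    k = len(p_values)
    mean_p = sum(p_values) / k
    mean_p2 = sum(p * p for p in p_values) / k
    # A random man comes from a family in proportion to p; each of his
    # n - 1 siblings is then male with that same family's p.
    men = (n - 1) * mean_p2 / mean_p
    # A random woman comes from a family in proportion to 1 - p.
    women = (n - 1) * (mean_p - mean_p2) / (1 - mean_p)
    return men, women

# Families of 3 children; half the families have p = 2/5, half have p = 3/5.
men, women = brothers_with_family_bias(3, [Fraction(2, 5), Fraction(3, 5)])
print(men, women)  # 26/25 24/25
```

With a single value in `p_values` the two numbers coincide, recovering the mathematical case; any spread in the family-level probabilities pushes the men's average above the women's.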

alexis
  • 2,079
  • 12
  • 11
  • 8
    This is the best answer in my opinion. No extensive math required. I would have left out the comments about real life, because they don't add anything and make your post unnecessarily long. – Bernhard May 22 '16 at 05:40
  • 6
    I find it utterly unconvincing. We are not asking about males, but about brothers. – gnasher729 May 22 '16 at 07:44
  • I'm pretty sure based on biological mechanics that any viable couple is (equally) capable of producing male and female offspring, and that the distribution of genders in families only "seems" non-random. – sig_seg_v May 22 '16 at 10:06
  • 1
    @sig_seg_v, I should have sourced that better. Since this is math.se, let's just say that a defect in the X chromosome might be viable (as a recessive) in female offspring, but non-viable in males; and that womb conditions might favor one or the other sex. But [here](http://www.genetics.org/content/genetics/15/5/445.full.pdf) is a very old article showing mathematically that the gender of siblings is not randomly distributed: "The obvious explanation is that the sex of the offspring of the same parents is not wholly independent, but correlated." – alexis May 22 '16 at 12:17
  • 14
    @gnasher, the point is that the number of brothers you have *is* the number of males in the set of siblings you have. – alexis May 22 '16 at 12:18
  • 1
    Very good answer +1. However, point 1 is not stated clearly, as it is not clear whether $n$ is fixed first or follows from the choice. The fact that the conclusion is independent of the value of $n$ is _not sufficient_ to blur out the difference, because one might be falling into the trap of Simpson's paradox, as mentioned in the comment to OP by Julian Rosen. (But in fact that does not happen.) Therefore the value of $n$ should follow from the choice: you choose a random person, and let $n$ be the size of the corresponding set of siblings (including the person itself). – Marc van Leeuwen May 23 '16 at 11:59
  • Thanks Marc, you are correct of course: We choose $x$ first. (In fact the result would be the same, but there's no need to select an $n$-- it's irrelevant.) – alexis May 23 '16 at 12:04
  • I appreciated the "mathematically" part of this answer. In the "in real life" discussion in this answer, I would have liked a discussion of how parents do select for certain gender distributions in their kids. So, for example, if parents have 2 boys, they may be more likely to keep trying for a girl, but if they were to have had a boy and a girl they would have stopped. If this is true, then I believe girls actually have more brothers than boys. – 6005 May 23 '16 at 21:00
  • 3
    Thanks. If parents make childbearing decisions based on what children they have, _anything_ could happen. I didn't go into the topic to keep the answer short. (Anyway I don't know what most people _actually_ do, so I could only continue hypothetically.) – alexis May 23 '16 at 21:41
  • 1
    @gnasher if you flip a coin and get heads, are you more likely to get heads or tails the next time? Answer: neither. You have a 50/50 chance of getting heads or tails, for the next and all future flips assuming a fair coin, and hence an equal distribution for future flips. If we assume the odds of getting a boy and girl are equal and they do not depend on prior flips, then it's the same exact problem. – Jonathan Baldwin May 26 '16 at 14:23
  • I find it truly remarkable that this result does not depend on having the same probability for females versus males. – Enredanrestos Aug 23 '16 at 17:17
80

Edit, 5/24/16: After some thought I don't particularly like this answer anymore; please take a look at my second answer below instead.


Here's a simple version of the question. Suppose there is exactly one family which has $n$ children, of which $k$ are male with some probability $p_k$. When this happens, the men each have $k-1$ brothers, while the women have $k$ brothers. So it would seem that no matter what the probabilities $p_k$ are, the women will always have more brothers on average.

However, this is not true, and the reason is that sometimes we might have $k = 0$ (no males) or $k = n$ (no females). In the first case the women have no brothers and the men don't exist, and in the second case the men have $n-1$ brothers and the women don't exist. In these cases it's unclear whether the question even makes sense.


Another simple version of the question, which avoids the previous problem and which I think is more realistic, is to suppose that there are two families with a total of $2n$ children between them, $n$ of which are male and $n$ of which are female, but now the children are split between the families in some random way. If there are $m$ male children in the first family and $f$ female children, then the average number of brothers a man has is

$$\frac{m(m-1) + (n-m)(n-m-1)}{n}$$

while the average number of brothers a woman has is

$$\frac{mf + (n-m)(n-f)}{n}.$$

The first quantity is big when $m$ is either big or small (in other words, when the distribution of male children is lopsided between the two families) while the second quantity is big when $m$ and $f$ are either both big or both small (in other words, when the distributions of male and female children are similar in the two families). If we suppose that "big" and "small" are disjoint and both occur with some probability $p \le \frac{1}{2}$ (say $p = \frac{1}{3}$ to be concrete), then the first case occurs with probability $2p$ (say $2 \cdot \frac{1}{3} = \frac{2}{3}$) while the second case occurs with probability $2p^2$ (say $2 \cdot \frac{1}{9} = \frac{2}{9}$). So heuristically, in this version of the question:

If it's easy for there to be many or few men in a family, men could have more brothers than women because it's easier for men to correlate with themselves than for women to correlate with men.

But you don't have to take my word for it: we can actually do the computation. Let me write $M$ for the random variable describing the number of men in the first family and $F$ for the random variable describing the number of women in the first family, and let's assume that they are 1) independent and 2) symmetric about $\frac{n}{2}$, so that in particular

$$\mathbb{E}(M) = \mathbb{E}(F) = \frac{n}{2}.$$

$M$ and $F$ are independent, so

$$\mathbb{E}(MF) = \mathbb{E}(M) \mathbb{E}(F) = \frac{n^2}{4}.$$

and similarly for $n-M$ and $n-F$. This is already enough to compute the expected number of brothers a woman has, which is (because $MF$ and $(n-M)(n-F)$ have the same distribution by assumption)

$$\frac{2}{n} \left( \mathbb{E}(MF) \right) = \frac{n}{2}.$$

In other words, the expected number of brothers a woman has is precisely the expected number of men in one family. This also follows from linearity of expectation.

Next we'll compute the expected number of brothers a man has. This is (again because $M(M-1)$ and $(n-M)(n-M-1)$ have the same distribution by assumption)

$$\frac{2}{n} \left( \mathbb{E}(M(M-1)) \right) = \frac{2}{n} \left( \mathbb{E}(M^2) - \frac{n}{2} \right) = \frac{2}{n} \left( \text{Var}(M) + \frac{n^2}{4} - \frac{n}{2} \right) = \frac{n}{2} - 1 + \frac{2 \text{Var}(M)}{n}$$

where we used $\text{Var}(M) = \mathbb{E}(M^2) - \mathbb{E}(M)^2$. As in Donkey_2009's answer, this computation reveals that the answer depends delicately on the variance of the number of men in one family (although be careful comparing these two answers: in Donkey_2009's answer he's choosing a random family to inspect while I'm choosing a random distribution of males and females among two families). More precisely,

Men have more brothers than women on average if and only if $\text{Var}(M)$ is strictly larger than $\frac{n}{2}$.

For example, if the men are distributed by independent coin flips, then we can compute that $\text{Var}(M) = \frac{n}{4}$, so in fact in this case women have more brothers than men (and this doesn't depend on the distribution of $F$ at all, as long as it's independent of $M$). Here the heuristic argument about bigness and smallness doesn't apply because the probability of $M$ deviating from its mean is quite small.

But if, for example, $m$ is instead chosen uniformly at random among the possible values $0, 1, 2, \dots, n$, then $\mathbb{E}(M^2) = \frac{n(2n+1)}{6}$, so $\text{Var}(M) = \frac{n(2n+1)}{6} - \frac{n^2}{4} = \frac{n^2}{12} + \frac{n}{6}$, which is quite a bit larger than in the previous case, and this gives about $\frac{2n}{3}$ expected brothers for men.
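
The two examples can be checked numerically. This is a small sketch with hypothetical helper names (`men_avg_brothers`, `variance`); it evaluates the expectation of the men's average $\frac{M(M-1) + (n-M)(n-M-1)}{n}$ directly from a probability mass function for $M$, so the result can be compared with $\frac{n}{2} - 1 + \frac{2\text{Var}(M)}{n}$ and with the women's value $\frac{n}{2}$.

```python
from fractions import Fraction
from math import comb

def men_avg_brothers(n, pmf):
    """E[(M(M-1) + (n-M)(n-M-1)) / n], where pmf[m] = P(M = m) for m = 0..n."""
    return sum(q * Fraction(m * (m - 1) + (n - m) * (n - m - 1), n)
               for m, q in enumerate(pmf))

def variance(pmf):
    """Variance of M under the same pmf."""
    mean = sum(q * m for m, q in enumerate(pmf))
    return sum(q * m * m for m, q in enumerate(pmf)) - mean**2

n = 6  # illustrative choice: 6 men and 6 women split between two families
coin = [Fraction(comb(n, m), 2**n) for m in range(n + 1)]  # independent coin flips
uniform = [Fraction(1, n + 1)] * (n + 1)                   # M uniform on 0..n

print(variance(coin), men_avg_brothers(n, coin))        # 3/2 5/2
print(variance(uniform), men_avg_brothers(n, uniform))  # 4 10/3
```

For $n = 6$ the women's expectation is $3$ in both cases, so men have fewer brothers under coin flips (variance $\frac{3}{2} < \frac{n}{2}$) and more under the uniform distribution (variance $4 > \frac{n}{2}$), in line with the criterion above.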

One quibble you might have with the above model is that you might not think it's reasonable for $M$ and $F$ to be independent. On the one hand, some families just like having lots of children, so you might expect $M$ and $F$ to be correlated. On the other hand, some families don't like having lots of children, so you might expect $M$ and $F$ to be anticorrelated. Without the independence assumption the computation for women acquires an extra term, namely $\frac{2 \text{Cov}(M, F)}{n}$ (as in Donkey_2009's answer), and now the answer also depends on how large this is relative to $\text{Var}(M)$.

Note that the argument in the OP that "no man can be his own brother" (basically, the $-1$ in $m(m-1)$) ought to imply, if it worked, that the difference between expected number of brothers for men and women is exactly $1$: this happens iff we are allowed to write $\mathbb{E}(M(M-1)) = \mathbb{E}(M) \mathbb{E}(M-1)$ iff $M$ is independent of itself iff it's constant iff $\text{Var}(M) = 0$.


Edit: Perhaps the biggest objection you might have to the model above is that a given person's gender is not independent of the gender of their siblings; that is, as Greg Martin points out in the comments below, requirement 4 in the OP is not satisfied. This is easiest to see in the extreme case that $n = 1$: in that case we're only distributing one male and one female child, and so any siblings you have must have opposite gender from you. In general the fact that the number of male and female children is fixed here means that your siblings are slightly more likely to be a different gender from you.

A more realistic model would be to both distribute the children randomly and to assign their genders randomly. Beyond that we should think more about how to model family sizes.

Qiaochu Yuan
  • 359,788
  • 42
  • 777
  • 1,145
  • 1
    Reading other people's answers to this problem makes me feel super simple-minded and doubt whether I should continue studying mathematics.. :-D – Tesla May 21 '16 at 18:48
  • 5
    Don't worry. I like simple answer. – Takahiro Waki May 21 '16 at 19:10
  • 6
    @Sigma Relax, it's just that Qiaochu is brilliant. Just learn from the answer and be happy. – Ryan Reich May 21 '16 at 20:43
  • 6
    I've said this before, but this is in the running for *Best Answer To Any Question Ever*. – dgo May 21 '16 at 22:34
  • 4
    While this is a wonderful model, unfortunately it does not satisfy the background assumptions! The fact that $M$ and $F$ are independent random variables does *not* necessarily imply that "in each family, the sex of each member is independent of the sexes of the other members". – Greg Martin May 21 '16 at 23:06
  • @Greg: this is a somewhat delicate point. I think that depending on how that requirement is interpreted, it may or may not end up being a reasonable requirement. There's a Monty Hall-like effect here, I think. – Qiaochu Yuan May 22 '16 at 01:18
  • 1
    I agree that the requirement might end up being subtly unreasonable. – Greg Martin May 22 '16 at 08:19
  • 10
    I think the problem is worse: If the family has $n$ children, we must have $M+F = n$ so they are simply **not** independent. – alexis May 22 '16 at 12:57
  • 1
    I also agree that the requirement might end up being subtly unreasonable. I considered coming back to rephrase it, but there were several answers by then and I decided I would just have to live with my mistakes. – MJD May 22 '16 at 15:24
  • In the real world, male infants are more likely to die early (from in utero up through the first several years of life, then wars, then poor life choices: drinking, smoking, fighting). So there are more women, in general. And since the question asks about "men and women", not "male and female", perhaps we can assume we're talking about adults. So that should fairly confidently push the real world situation toward women having more brothers. – niels May 23 '16 at 20:25
  • @niels, indeed, in the real world there are (slightly) more men than women at young ages, and more women than men at every age group after 14 or so (iirc). But that does NOT push the situation toward women having more brothers. As long as the gender of each child is independent of the others, the number of brothers is equal at any gender proportion. (See my answer for the demonstration). – alexis May 24 '16 at 10:05
  • I would say that perhaps a simpler answer is women have more brothers, just like men have more sisters. Because children are not included in the count of children of the opposite sex, that tends towards more of the opposite sex. When a woman has a brother, there are `n` brothers in the family, where as when a man has a brother, there are `n - 1` brothers in the family, in relation to each male. –  May 24 '16 at 17:48
  • 1
    @Zymus: this argument is incorrect, and the entire discussion all of these answers are participating in is about why. Suppose there are two families with gender distributions MMF and MFF. In the first family, females have two brothers and males have one; in the second family, females have one brother and males have none. Nevertheless the average number of brothers is the same for both men and women (it is $\frac{2}{3}$). This is because the first family is being counted twice in the average for men, since it has two men. As stated in my second answer, this is an example of Simpson's paradox. – Qiaochu Yuan May 24 '16 at 18:30
  • 1
    @QiaochuYuan, very bad choice of example: In *this* sample of families, women have an average of 1.33 brothers, men have an average of 0.66. If I only had two sisters, each of them would have one brother (me). For everyone else, let's say this yet again: When we choose sets of families at will, *anything can happen* to the relationship between the averages. It's all in how the genders are distributed. When particular (very reasonable) independence conditions hold, *then* the average is the same. – alexis May 25 '16 at 11:27
  • 2
    Maybe you meant to use bigger families: MMMF and FFFM. Now (if *I* didn't make a mistake) the average is 3/2 for both genders. This illustrates why the intuitive conclusion "women must necessarily have more brothers" does not hold; but again, *this is just one grab-bag of families.* It only tells us that we can push the proportions in any direction we want. – alexis May 25 '16 at 11:52
  • @alexis: oops. Thanks for the catch. – Qiaochu Yuan May 25 '16 at 16:27
61

I think I will argue that Cut the Knot is correct.

The distribution of sizes of families is not specified. Let's do some examples. Suppose all families have size 1. Then every boy has no brothers and every girl has no brothers. (So we certainly cannot conclude girls have more brothers than boys independently of the distribution of family sizes.)

Next example. All families have size 2, with random genders for the kids. Then there are four types of families, all equally likely:

$$\begin{array}{cc} B & B\\ B & G\\ G & B\\ G & G \end{array}$$

I wrote B = boy, G = girl, in order of birth. Now, if we choose a boy at random, how many brothers does he have? There are four Bs in the list; two of them have 1 brother and two of them have no brothers. So a boy chosen at random has no brother with probability $1/2$ and has one brother with probability $1/2$. (Note: we chose a boy at random, not a family at random.) Now repeat, choosing a G at random. We get: a girl chosen at random has no brother with probability $1/2$ and has one brother with probability $1/2$. Again, it is false that a random girl has more brothers than a random boy.

If you like, do it again for families of size 3. A boy chosen at random has: no brother with probability $1/4$, one brother with probability $1/2$, and two brothers with probability $1/4$. Same for a girl chosen at random.

This works for any size families, as long as the sizes are fixed in advance, and the genders are random and independent.
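
The enumerations above are easy to reproduce for any fixed family size. Here is a minimal sketch (the function name `brother_distribution` is hypothetical) that lists every equally likely birth sequence, then tallies the number of brothers over all boys, or over all girls, exactly as in the size-2 table: a child is chosen at random, not a family.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def brother_distribution(size, sex):
    """Distribution of the number of brothers of a child of the given sex
    ("B" or "G"), chosen uniformly among all such children across every
    equally likely family of the given size."""
    counts = Counter()
    for family in product("BG", repeat=size):   # all 2**size birth sequences
        boys = family.count("B")
        for child in family:
            if child == sex:
                # A boy does not count himself among his brothers.
                counts[boys - 1 if sex == "B" else boys] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}

print(brother_distribution(3, "B"))  # {0: 1/4, 1: 1/2, 2: 1/4} as Fractions
print(brother_distribution(3, "G"))  # the same distribution
```

This only covers families of one fixed size with fair, independent genders, which is the setting of this answer.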

GEdgar
  • 96,878
  • 7
  • 95
  • 235
  • This would be the basic version of the boy-girl paradox. Let's assume that the above holds, _generally_. Then the answer to the question would be "in the margins", in which case I believe assumption 4 in OP's question above possibly makes too large an impact on the result of this theoretical study, as any given man will, marginally, produce more sperm predisposed to one of the genders (sperm predisposed to carrying X or Y chromosomes, in direct correlation with the gender of the child). ... – dfrib May 21 '16 at 19:27
  • ... This could be used to argue that a first-born boy is marginally more likely to have a brother as sibling when(/if) the 2nd child arrives, and even more so in families of many boys (w.r.t. girls) and vice versa. Using this approach, we could argue that any boy is marginally more likely to have more brothers than any girl (however: making use of "marginal deviations", we could probably find a similar argument to point the other way around, in which case my discussion here falls apart. Leaving this as a note :). – dfrib May 21 '16 at 19:29
  • I added an answer with more general assumptions under which this works. – zyx May 21 '16 at 20:31
  • How do you prove that it works for any size family? It is plausible, but won't you need induction then? – Bernhard May 22 '16 at 05:36
  • 7
    The calculation for families of size $C$: the probability that there are $G$ girls is $2^{-C}\binom CG$, and each such family contributes $G$ girls with $C-G$ brothers each and $C-G$ boys with $C-G-1$ brothers each. Then one calculates $$\frac{\sum_{G=0}^C 2^{-C}\binom CGG(C-G)}{\sum_{G=0}^C 2^{-C}\binom CGG} = \frac{C-1}2 = \frac{\sum_{G=0}^C 2^{-C}\binom CG(C-G)(C-G-1)}{\sum_{G=0}^C 2^{-C}\binom CG(C-G)}$$ as the expected number of brothers in each case (both numerators are $\frac{C(C-1)}4$ and both denominators are $\frac C2$). – Greg Martin May 22 '16 at 08:24
  • 6
    Basically boys and girls have the same number of brothers on average because two brothers count as 2 men having a brother, while a sister having a brother counts as one girl having a brother. The double-brothers case makes up for the higher probability of the brother-and-sister scenario. – ChiseledAbs May 22 '16 at 16:32
  • 2
    @Chisele, that's not it at all: In a family with two boys and a girl, the boys have one brother _on average_, the girl has two. What makes a difference is all the families with only boys (which add to the boy average), and only girls (which lower the girl average). – alexis May 22 '16 at 19:26
  • @alexis You are wrong telling me "that's not it at all". If you consider families with 2 kids there are 2 cases in which the sister has a brother and only one where there are 2 boys. Therefore people intuitively think that the girls have more brothers but forget that the 2 boys count as two men having a brother, thus it makes up for the higher probability of having a boy and a girl as kids. That's all I wanted to express in a comment. Now if you still claim it's wrong I want you to explain why it is and not you to start describing your own view without addressing mine. – ChiseledAbs May 23 '16 at 17:14
  • 1
    +1 for clarity, but it needs a little more to be complete since it only covers the case where all families are the same size. – Readin May 24 '16 at 04:34
  • @ChiseledAbs that was the missing piece of the puzzle to understand it intuitively. – StuperUser May 24 '16 at 09:00
  • Suppose there are 4 children. 1/2 are boys and 1/2 are girls. The answer above covers the case where family sizes are 2 and 2. What if the family sizes are 1 and 3. On average boys have 1/2 a brother and girls have 1 brother. Of course this violates the assumption that the numbers are large. As the numbers get bigger, the ratio approaches even. – Theodore Norvell May 24 '16 at 16:55
  • 1
    @Readin While not explicitly stated, it is certainly the case that if the average number of brothers in an $n$-person family is the same for boys and for girls, the average number of brothers in one $n$-person family and in one $m$-person family (and one $p$-person family, and one $q$-person family...) is the same for boys and girls, and equal to the weighted average of the averages across family sizes. – P... May 24 '16 at 18:51
  • +1 for @Greg Martin, and here is a proof of his equation: Let $$f(x,y) = (x+y)^C$$, then it is easy to check that $$\frac{\partial^2 f}{\partial x^2}|_{x=1,y=1} = \frac{\partial^2 f}{\partial x\partial y}|_{x=1,y=1}$$, and if you expand the polynomial $f(x,y)$ and then take the derivatives, you get the numerators in Greg's equations. – Jay.H May 25 '16 at 20:46
  • And as alexis' answer shows, if you replace in the equation above $2^{-C}$ by $p^{(C-G)}(1-p)^G$, then the result still holds (with the term $(C-1)p$ in the middle of course). That is, men have on average as many brothers as women independent of a person being male/female with probability $1/2$. – Enredanrestos Aug 23 '16 at 17:27
24

I've been accused of overcomplicating the issue, so here's a shorter and different answer. This mostly repeats things that have been said already, e.g. in zyx's answer. Consider any model where

  1. Children are male with probability $\frac{1}{2}$ and female with probability $\frac{1}{2}$,
  2. A given child's gender is independent of the gender of their siblings, and
  3. A given child's gender is also independent of the size of the family they're in.

With these assumptions, the expected number of brothers of any child is $\frac{F-1}{2}$ where $F$ is the expected size of a family (where we pick a random family by picking a random child). By linearity of expectation, the expected number of brothers of any man, as well as any woman, is also $\frac{F-1}{2}$. A simple example of a model satisfying all of these assumptions is a model where children are both distributed to a family uniformly and independently and also assigned a gender independently. Another example is a model where the size of each family is fixed, and genders are chosen independently.
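
As a sanity check on the $\frac{F-1}{2}$ formula, here is a minimal Python sketch that computes both averages exactly for a made-up distribution of family sizes. The `size_probs` values are purely hypothetical. Genders are fair and independent, as in assumptions 1 to 3, and "average" means expected total brothers divided by expected number of men (or women), which is what a large population gives.

```python
from fractions import Fraction
from math import comb

def brother_averages(size_probs):
    """Exact average number of brothers per man and per woman.

    size_probs maps a family size C to the probability that a randomly
    chosen *family* has that size (hypothetical input). Genders are fair
    and independent of each other and of C.
    """
    men = women = men_brothers = women_brothers = Fraction(0)
    for size, prob in size_probs.items():
        for boys in range(size + 1):
            girls = size - boys
            weight = prob * Fraction(comb(size, boys), 2 ** size)
            men += weight * boys
            women += weight * girls
            men_brothers += weight * boys * (boys - 1)  # each boy has boys-1 brothers
            women_brothers += weight * girls * boys     # each girl has boys brothers
    return men_brothers / men, women_brothers / women

def size_biased_mean(size_probs):
    """Expected family size as seen by a randomly chosen child: E[C^2] / E[C]."""
    first = sum(prob * size for size, prob in size_probs.items())
    second = sum(prob * size * size for size, prob in size_probs.items())
    return second / first

# Assumed example distribution over family sizes (not data).
size_probs = {1: Fraction(1, 4), 2: Fraction(1, 2), 5: Fraction(1, 4)}
per_man, per_woman = brother_averages(size_probs)
F = size_biased_mean(size_probs)
print(per_man, per_woman, (F - 1) / 2)
```

For this particular distribution all three printed values coincide. Note that $F$ is the size-biased mean $E[C^2]/E[C]$, not the plain mean family size, which is why the family is picked by picking a random child.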

The model in my previous answer (dividing a fixed pool of children with fixed genders between two families) does not satisfy assumption 2.

Interestingly enough, it can happen that there are no families with only male or only female children, meaning that in every family the women have more brothers than the men, and nevertheless it's still true that the expected number of brothers is the same for women and men. The reason is that when we compute the expectation for men, families with more male children are weighted more heavily. As Julian Rosen says in a comment on the OP, this is an example of Simpson's paradox.

Qiaochu Yuan
  • 359,788
  • 42
  • 777
  • 1,145
  • 1
    You are not saying anything more than Alexis in his answer. The link to Simpson's paradox alone is not worthy of a new answer, IMO – Bernhard May 22 '16 at 05:45
  • @Bernhard: Alexis does not identify assumption 3 above as important (gender being independent of family size), which I believe is an assumption I need to make the argument go through. It might in fact end up being unnecessary but I don't see that at the moment. – Qiaochu Yuan May 22 '16 at 07:20
  • Actually I thought that my answer was a repetition of your other one and MJD's edit. The point was just to state a precise set of assumptions for which M=F. @Bernhard 's comment, I don't see much connection of this answer with the one by Alexis, or any point in trying to shut down additional answers. – zyx May 22 '16 at 18:34
  • "Interestingly enough, it can happen that there are no families with only male or only female children, meaning that in every family the women have more brothers than the men, and nevertheless it's still true that the expected number of brothers is the same for women and men. The reason is that when we compute the expectation for men, families with more male children are weighted more heavily." **No,** the reason is that "it can happen" and expectation are different things. "It can happen" that I toss a coin 6 times and all 6 are Heads, but the expected number of Heads in 6 tosses is still 3. – alexis May 22 '16 at 20:48
  • @alexis: sorry, I was being a bit unclear. There are two expectations I might be taking when I say "expected number of brothers," one of which is an average over men (or women) and one of which is an average over distributions (of children, genders, etc), and in the last paragraph I'm referring to the first thing, with the distribution of genders etc fixed. An example is two families with gender distributions MMF and MFF. – Qiaochu Yuan May 22 '16 at 22:10
  • I don't believe you can interpret your statement this way and still say that "the expected number of brothers is the same for women and men." If you take the expected number of brothers over arbitrary distributions (aka arbitrary sets of families), the three assumptions you listed no longer hold and _anything_ can happen. It's no longer true that the expectation is equal for men and women. E.g., if each family has at most one boy, girls have more brothers; if each family only has children of a single gender, boys do. – alexis May 22 '16 at 22:17
  • Those counterexamples violate the conditions in the answer. @alexis – zyx May 23 '16 at 07:01
  • @zyx, yes they violate them, that's what I'm saying too. If you take particular sets of families chosen out of the distribution ("it can happen that ..."), anything goes: the conditions don't hold, and neither do the conclusions. – alexis May 23 '16 at 08:18
  • 1
    I thought your other answer overcomplicated things considerably and made arbitrary, uninteresting assumptions (why only 2 families?). However, I thought this answer was good. – 6005 May 23 '16 at 21:11
  • @QiaochuYuan, interesting what you say about the third condition. Could you give us a case that satisfies condition 2 (independence from sibling genders), violates condition 3 (i.e., gender does depend on family size), and the number of brothers is *not* the same for men and women? I don't believe condition 1 is necessary, incidentally; so that's optional. – alexis May 24 '16 at 12:15
  • 3
    @alexis: ah, yes, you're right, condition 1 is irrelevant. As for a situation where condition 2 holds but condition 3 doesn't, perhaps something like "every family first has one random child, and then if that child is male they have one more child, otherwise they have two more children" would work, although I haven't checked. – Qiaochu Yuan May 24 '16 at 18:34
  • This answer cannot be right, as it does not invoke or use a "no gender based family planning" hypothesis. As said in a comment to OP, the conclusion cannot be valid if gender based family planning is possible. – Marc van Leeuwen May 27 '16 at 05:19
  • @Marc: I think that's what condition 3 is for. – Qiaochu Yuan May 27 '16 at 05:44
  • You may be right, but it is a curious way of formulating it. Children are not dropped into pre-formed families; rather families form as a consequence of children being born. Saying the gender of the firstborn child is dependent on whether the family continues to grow after its birth is strange; if there is dependency, it is the other way around. I know that (in)dependence of random variables is a symmetric relation, but these are not obviously random variables (unless the probabilistic setting is made more precise). – Marc van Leeuwen May 27 '16 at 08:04
10

$\DeclareMathOperator{\ex}{\mathbb E}\DeclareMathOperator{\Var}{Var}\DeclareMathOperator{\Cov}{Cov}$If there are two giant families, one with all the women in it and one with all the men in it, then men will have more brothers on average. So you cannot say anything in general, even assuming that there are the same number of men and women.

So we will need to make some assumptions. In order to find out what the correct assumptions are, I shall define some notation. For convenience, I shall use probabilistic notation, but we are really just talking about counting.

Let $M$ be the random variable corresponding to the number of men in a family chosen at random from the set of families, and let $W$ be the quantity corresponding to the number of women in a randomly chosen family. Let $\mathcal M$ denote the total number of men, let $\mathcal W$ denote the total number of women and let $\mathcal F$ denote the total number of families.

Important Note: The families themselves should be treated as constants. They are not random samples drawn from some kind of distribution or anything like that. When I use probabilistic notation, it is purely for the sake of convenience - I am interpreting the question combinatorially, and it so happens that probabilistic constructs such as sample variance do a good job of capturing certain combinatorial quantities that are relevant in this question.

If you like, we are working over the discrete measure space $(F, \mathcal P(F), \mathbb P)$, where $F$ is the set of families and $\mathbb P(A)=|A|/|F|$ for any $A\subset F$. $M$ is then the random variable defined by $M(f)=\textrm{number of men in $f$}$, while $W(f)=\textrm{number of women in $f$}$. $\ex,\Var,\Cov$ will all take their usual meanings as population mean, population variance and population covariance.

We want to compute the average number of brothers that each man has. To do this, we shall double-count the set $A$ of pairs $(m_1,m_2)$ such that $m_1$ and $m_2$ are brothers.

The first way we count this set will be by family. For a family $f$, let $m_f$ denote the number of men in family $f$. Then we have \begin{align} |A|&=\sum_f m_f(m_f-1)\\ &=\mathcal F\ex(M(M-1)) \end{align} since in each family $f$, we have $m_f$ choices for the first brother and $m_f-1$ choices for the second brother.

The second way to count this set will be by man. For a man $m$, denote by $b_m$ the number of brothers that $m$ has. Then we have $$ |A|=\sum_m b_m $$ Therefore: $$ \sum_m b_m=\mathcal F\ex(M(M-1)) $$ Then: \begin{align} \textrm{Average number of brothers a man has}&=\sum_m b_m/\mathcal M\\ &=\frac{\mathcal F}{\mathcal M}\ex(M(M-1))\\ &=\frac{\ex(M(M-1))}{\ex M} \end{align}

A similar argument gives us $$ \textrm{Average number of brothers a woman has}=\frac{\ex(MW)}{\ex W} $$ In this second case, if we write $w_f$ for the number of women in family $f$, then the number of pairs $(w,m)$ such that $w$ and $m$ are sister and brother is $m_fw_f$.
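
To make the double count concrete, here is a small Python sketch on a made-up list of families (the `families` data is hypothetical). It computes the men's average both ways, by family and by man, and then the women's average from the $(w,m)$ pairs.

```python
from fractions import Fraction

def average_brothers(families):
    """families: list of (men, women) pairs, one per family (made-up data)."""
    total_men = sum(m for m, _ in families)
    total_women = sum(w for _, w in families)
    brother_pairs = sum(m * (m - 1) for m, _ in families)   # ordered (man, brother) pairs
    sister_brother_pairs = sum(m * w for m, w in families)  # (woman, brother) pairs
    return (Fraction(brother_pairs, total_men),
            Fraction(sister_brother_pairs, total_women))

families = [(2, 1), (1, 2), (3, 0), (0, 2)]

# Count by family, as in the first way of counting |A|.
per_man, per_woman = average_brothers(families)

# Count by man, as in the second way: list b_m for every man individually.
brothers_of_each_man = [m - 1 for m, _ in families for _ in range(m)]
per_man_direct = Fraction(sum(brothers_of_each_man), len(brothers_of_each_man))

print(per_man, per_man_direct, per_woman)
```

Both counts of the men's average agree, as the double-counting argument says they must.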

We would like to show that this second quantity is bigger than the first. Let's try and rewrite each quantity first.

\begin{align} \frac{\ex(M(M-1))}{\ex M}&=\frac1{\ex M}\left(\ex M^2-\ex M\right)\\ &=\frac1{\ex M}\left(\Var M+(\ex M)^2-\ex M\right)\\ &=\frac{\Var M}{\ex M} + \ex M - 1 \end{align}

while

\begin{align} \frac{\ex(MW)}{\ex W}&=\frac{1}{\ex W}\left(\Cov(M,W)+\ex M\ex W\right)\\ &=\frac{\Cov(M,W)}{\ex W} + \ex M \end{align}

Therefore, in order to ensure that the average number of brothers a woman has is greater than the average number of brothers a man has, we need to assume that: $$ \frac{\Cov(M,W)}{\ex W}>\frac{\Var M}{\ex M}-1 $$

We can turn this into a condition saying that the number of men per family has to have small variance $$ \Var M<\ex M+\frac{\mathcal M}{\mathcal W}\Cov(M, W) $$
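
For completeness, the rearrangement can be spelled out as follows, assuming $\ex M>0$ and $\ex W>0$ so that multiplying through by $\ex M$ preserves the inequality: \begin{align} \frac{\Cov(M,W)}{\ex W}>\frac{\Var M}{\ex M}-1 &\iff \frac{\ex M}{\ex W}\Cov(M,W)>\Var M-\ex M\\ &\iff \Var M<\ex M+\frac{\ex M}{\ex W}\Cov(M,W) \end{align} and, since $\ex M=\mathcal M/\mathcal F$ and $\ex W=\mathcal W/\mathcal F$, $$ \frac{\ex M}{\ex W}=\frac{\mathcal M/\mathcal F}{\mathcal W/\mathcal F}=\frac{\mathcal M}{\mathcal W} $$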

Is this a reasonable assumption to make? Assuming that the number of men in a family is independent of the number of women in that family, we should expect $\Cov(M, W)$ to be small. And if we assume that the total number of men is roughly equal to the total number of women, then $\mathcal M/\mathcal W$ will be roughly equal to $1$. So, under these assumptions, women have more brothers if and only if the variance in the number of men per family is less than $\ex M$.

This is a surprising result, since there's no reason to suppose that the variance in the number of men per family should be less than $\ex M$. In fact, numerical experiments indicate that $\Var M$ is quite often larger than $\ex M$, which means that in fact it will be men who have more brothers than women.

So the answer is that it depends on how large the families are, but your intuition that women will have more brothers is not true in general, even under fairly strong assumptions. If the number of men per family varies greatly from family to family, it is in fact men who have more brothers on average.

This surprising result can easily be confirmed with numerical experiment.
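
One such experiment, as a minimal Python sketch on a made-up population (an assumption for illustration, not data): take one family for each pair $(M,W)$ with $M,W\in\{1,5\}$, so that $\Cov(M,W)=0$ and $\Var M>\ex M$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: one family for every (men, women) pair with
# men, women in {1, 5}, so M and W are uncorrelated across families.
families = list(product([1, 5], repeat=2))
count = len(families)

mean_m = Fraction(sum(m for m, _ in families), count)
mean_w = Fraction(sum(w for _, w in families), count)
var_m = Fraction(sum(m * m for m, _ in families), count) - mean_m ** 2
cov_mw = Fraction(sum(m * w for m, w in families), count) - mean_m * mean_w

# Averages over people, using the pair counts m(m-1) and m*w per family.
per_man = Fraction(sum(m * (m - 1) for m, _ in families),
                   sum(m for m, _ in families))
per_woman = Fraction(sum(m * w for m, w in families),
                     sum(w for _, w in families))

print(mean_m, var_m, cov_mw)  # population mean, variance, covariance
print(per_man, per_woman)     # average brothers per man, per woman
```

Here every family contains both men and women, so within each family every woman has one more brother than every man, yet the men's overall average comes out larger than the women's.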


What's the explanation? Well, let's consider a situation in which the number of men per family varies greatly from family to family. Suppose we assume also that the number of men per family is uncorrelated with the number of women per family.

What this means is that there are going to be a significant number of families with lots of men and very few women, and a significant number of families with lots of women and very few men.

Now, within any given family, we know that the women will have more brothers than the men. But look at the overall contribution to the average:

  • The first type of family gives us lots of men with lots of brothers, and a small number of women with lots of brothers.
  • The second type of family gives us lots of women with very few brothers, and a small number of men with very few brothers.

So on average, the men will tend to have lots of brothers, and the women will tend to have very few brothers, as long as the variance in the number of men is large enough to counteract the extra brother that each woman has. With large enough families, that one extra brother counts for less and less, and the variance effect takes over, giving you precisely the opposite effect from what you expected.


Let's take one last look at the result. We found that the important factor was that the variance in the number of men per family should not be too large. What if this value were equal to zero? That would mean that every family had the same number $a$ of men, so then it would be true that women have more brothers, since every man would have $a-1$ brothers and every woman would have $a$ brothers.

The dependence on the covariance is interesting, too. By the Cauchy-Schwarz inequality, we have $\Cov(M,W)\le\sqrt{\Var M\Var W}$, and if we assume that $\Var M$ and $\Var W$ are roughly the same, then we have $\Cov(M,W)\le\Var M$. This extreme value occurs if $W=\lambda M$ for some positive constant $\lambda$. In that case, a simple counting argument shows us that women will have an average of one more brother than men.
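
The counting argument in the case $W=\lambda M$ can be written out using the two formulas derived earlier: \begin{align} \frac{\ex(MW)}{\ex W}&=\frac{\lambda\ex M^2}{\lambda\ex M}=\frac{\ex M^2}{\ex M}\\ \frac{\ex(M(M-1))}{\ex M}&=\frac{\ex M^2}{\ex M}-1 \end{align} so the women's average exceeds the men's by exactly $1$.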

John Gowers
  • 23,385
  • 4
  • 58
  • 99
  • @QiaochuYuan I'm assuming that the number of men, women etc. are all fixed beforehand. $\ex$ means sample mean, $\Var$ means sample variance and so on. I'm using probabilistic language for convenience, but I'm really talking about the sizes of sets. – John Gowers May 21 '16 at 17:50
  • If there are no men, then the sample mean for the number of men per family will be $0$, so I won't get a well-defined answer. I think I made it clear that this was my approach in my answer, but perhaps I can make it clearer still. – John Gowers May 21 '16 at 17:56
  • Ah, okay. This is a bit confusing, though; it could be clearer which things are random variables and which things are constants. – Qiaochu Yuan May 21 '16 at 17:58
  • 1
    @QiaochuYuan I hope I've made it clearer, but perhaps I ought to rewrite the whole answer. What do you think? – John Gowers May 21 '16 at 18:05
  • Why would an equality just after "to assume that" give the greater-than you're after? –  May 21 '16 at 18:20
  • @RickyDemer It wouldn't. If it makes you more comfortable, I'll change it to a strict inequality. – John Gowers May 21 '16 at 18:21
  • Where does the "1=" just before "Is this" come from? –  May 21 '16 at 18:26
  • @RickyDemer That was a typo introduced in my last edit. Let me fix it. – John Gowers May 21 '16 at 18:32
  • Now I'm pretty sure your assessment of that assumption is wrong. Note that the variance of a p=1/2 [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution) will be exactly 1/2 of the distribution's expected value, and for m chosen from an (n,1/2) binomial distribution, Cov(m,n-m) will equal Var(m). –  May 21 '16 at 18:47
  • @RickyDemer Can you tell me specifically what I have said that is wrong? – John Gowers May 21 '16 at 18:48
  • To start, I _should have_ said that Cov(m,n-m) will be _negative_ Var(m). With that in mind, for fixed family size the two sides will be roughly equal, since the expected value will be approximately twice the left side, and the product term will be approximately the negative of the left side. I don't see any reason the small difference should favor one sign over the other. –  May 21 '16 at 19:09
  • @RickyDemer My men and women are not drawn from a binomial distribution, nor from any other kind of distribution. The situation is: I have the families set up as they are, and I choose one of the families or one of the people at random. The variance is then the population variance among all the families. – John Gowers May 21 '16 at 19:20
  • For the reasons given in my previous comment, I see no reason why "for large family sizes, we should expect the variance in the number of men per family to be much greater than" $\mathbb{E}(M)$. –  May 21 '16 at 19:29
  • @RickyDemer Ah, fair enough. I think that originates with an earlier error that I corrected. I'll remove that statement. – John Gowers May 21 '16 at 19:44
  • 1
    While this is a wonderful model, unfortunately it does not satisfy the background assumptions! The fact that $M$ and $F$ are independent random variables does *not* imply that "in each family, the sex of each member is independent of the sexes of the other members". – Greg Martin May 21 '16 at 23:07
  • 1
    @GregMartin The background assumptions weren't in the original question, but were added by another user. My discussion is more aimed towards trying to find which (combinatorial) background assumptions we can make that will allow us to derive the conclusion. – John Gowers May 21 '16 at 23:22
5

Girls, in general, have more brothers than boys$^1$.

$^1$Given some assumptions and MATLAB simulations.

Here's my code: everything before the $---$ is the iterative procedure, and everything after is the function $\text{fplan}$ where most of the substance is.

n = 500; % number of boys and girls, each
f = 500; % number of families total

b_avg_count = 0;
g_avg_count = 0;

iter = 10000;

for i = 1:iter

    [b_avg, g_avg] = fplan(n, f);

    b_avg_count = b_avg_count + b_avg;
    g_avg_count = g_avg_count + g_avg;

end

b_avg_avg = b_avg_count/iter;
g_avg_avg = g_avg_count/iter;

---

function [b_avg, g_avg] = fplan(n, f)

% n = number of boys and girls, each
% f = number of families total
% n/f should be 1 to average ~2 kids per family

b = zeros(1, n);
g = ones(1, n);

c = [b g]; % generate all the children (boy = 0, girl = 1)

c = c(randperm(length(c))); % randomize ordering of children

f_sep = randperm(length(c)-1, f-1); % choose where families begin/end
f_sep = sort(f_sep);

f_test = [0 f_sep 2*n];
max_size = max(diff(f_test)); % determines largest family size

f_matrix = zeros(f, max_size);

b_amt = 0; % number of brothers had by the boys
g_amt = 0; % number of brothers had by the girls

for i = 1:f % iterate through all f families

    fam_num = c(1+f_test(i):f_test(i+1)); % generate i-th family
    fam_offset = 2*ones(1, max_size - length(fam_num));
    f_matrix(i, :) = [fam_num fam_offset];

    fam_bn = length(fam_num(fam_num == 0)); % number of boys in family
    fam_gn = length(fam_num(fam_num == 1)); % number of girls in family

    b_amt = b_amt + fam_bn*(fam_bn - 1); % each of B boys has B-1 brothers
    g_amt = g_amt + fam_bn*fam_gn; % each of G girls has B brothers

end

b_avg = b_amt/n; % average number of brothers had by the boys
g_avg = g_amt/n; % average number of brothers had by the girls

end

What things did I assume?

  • There are an equal number of boys and girls.
  • The number of families is known, and every family has at least one child.
  • You cannot be a brother to yourself.

(This first assumption is clearly very important. This case is very, very different from the case where we have $2n$ children and we assign each a gender with probability $\frac 12$.)

The gist of what I did was generate a vector with boys ($0$) and girls ($1$), randomize the order, and create families. How I created the families was to choose random distinct indices in my "children vector" and have families be all the children between these indices. This means that you can decide the average family size by choosing $\frac nf$, but the family sizes are "randomly" distributed.

Some cases ($200$ families with $200$ children) have deterministic outcomes (everyone is an only child), so choosing $n$ high, $f$ high, and $\frac nf \approx 1$ seems to be the most realistic way to approach the problem (average family size is $2$).

Given $1000$ children split between $500$ families, and iterating through this procedure $10000$ times, we get that:

$$\text{average brothers per male} = 0.9963$$ $$\text{average brothers per female} = 0.9984$$

This doesn't exactly answer the OP's question (in his/her view, the distribution of family sizes is entirely unknown), so take these results with a grain of salt. I think this model is realistic, however, and because of the high number of iterations and relative difference between the values, I'm heuristically convinced that girls have more brothers than boys.
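
A possible way to see where the small gap comes from, sketched in Python rather than MATLAB and using hypothetical small inputs: with exactly $n$ boys and $n$ girls, a boy's given sibling is a boy with probability $\frac{n-1}{2n-1}$, while a girl's given sibling is a boy with probability $\frac{n}{2n-1}$, so the girls' average should be $\frac{n}{n-1}$ times the boys' whatever the family sizes are. The sketch below checks this by brute force, averaging over every placement of the boys for a fixed split into families. The function names and the family sizes are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations

def brother_totals(genders, sizes):
    """Total brothers counted over the boys and over the girls.

    genders: tuple of 0 (boy) / 1 (girl), in list order.
    sizes: consecutive family sizes that partition the list.
    """
    boys_total = girls_total = 0
    start = 0
    for size in sizes:
        family = genders[start:start + size]
        boys = family.count(0)
        girls = size - boys
        boys_total += boys * (boys - 1)  # each of B boys has B-1 brothers
        girls_total += boys * girls      # each of G girls has B brothers
        start += size
    return boys_total, girls_total

def exact_ratio(n, sizes):
    """Girls' average brothers divided by boys', over all placements of n boys."""
    assert sum(sizes) == 2 * n
    boys_sum = girls_sum = 0
    for boy_positions in combinations(range(2 * n), n):
        genders = tuple(0 if i in boy_positions else 1 for i in range(2 * n))
        b, g = brother_totals(genders, sizes)
        boys_sum += b
        girls_sum += g
    # Equal numbers of boys and girls, so the ratio of totals is the ratio of averages.
    return Fraction(girls_sum, boys_sum)

print(exact_ratio(3, [1, 2, 3]))  # 3 boys, 3 girls, hypothetical family sizes
print(exact_ratio(2, [2, 2]))     # 2 boys, 2 girls, two families of two
```

For $n=500$ the factor $\frac{n}{n-1}$ is about $1.002$, which is in line with the ratio of the two simulated averages above, and it tends to $1$ as $n$ grows.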

I'm going to play around with the $n$ and $f$ parameters a bit, and if I find anything interesting, I'll try to whip up some graphs. The difference between the boy/girl values seems to increase as $f$ decreases, with deterministic behavior (obviously) at $f=1$.

anonymouse
  • 1,988
  • 9
  • 24
4

To elaborate on MJD's edit clarifying the question, here are some assumptions sufficient for males and females to be "equal".

  • no family planning based on gender. Within each family, the number of births is determined independently of the genders of children born to that family.

  • probability 1/2 for every birth to give M or F as the gender

  • the number of families and distribution of family sizes in the total collection of families is determined independent of the genders of children that are/were born.

  • we compare $E[FM]$ to $E[M(M-1)]$ in each family, not number of brothers per male or female. The latter are undefined if there is positive probability that all the children are of one gender.

zyx
  • 34,340
  • 3
  • 43
  • 106
  • Nice attempt to make something precise. However, I have difficulty interpreting the last two points; the third seems to be restating the first one but using statistics rather than probability, and the last one is not an assumption but a clarification of what the question is. – Marc van Leeuwen May 23 '16 at 11:49
  • The third is different from the first. There could be, for example, a government that regulates the number of children in a family and does so not based on previous children in the family, but on previous children in *other* families. The last one can be read as "we assume that the question is A and not B". – zyx May 24 '16 at 19:44
4

Do men or women have more brothers?

So many of the answers have probabilities in them but I see no reason for any argument to be made on the basis of probability. We should simply count the number of men who have brothers and the number of women who have brothers, and compare those two numbers.

We can ignore all "only children" as they have no brothers.

Consider the people who have one sibling. Let's pair them off into sibling pairs, ordered by age. There are roughly equally many M-M, M-F, F-M and F-F pairs. Call the number of each n. From the first group we have 2n males with brothers. From the second group we have n females with brothers. From the third group we have n females with brothers. From the fourth group we have 0 brothers. So we have a total of 2n males with brothers and 2n females with brothers. Neither side is winning in the two-sibling groups, so let's ignore them too.

Now consider the people who have two siblings. Again, group them into triples ordered by age. MMM, MMF, MFM, MFF, FMM, FMF, FFM, FFF. Suppose there are n of each group again. In the first we have 3n males with brothers. In each of the second, third and fifth we have 2n males with brothers, and in the remaining groups no male has a brother. So that's a total of 3n + 6n = 9n males with brothers. Similarly we have 9n females with brothers (n in each of the second, third and fifth groups, and 2n in each of the fourth, sixth and seventh). So the 3-sibling groups are a wash as well. We can ignore them.

Does the pattern continue to 4, 5, 6 sibling groups? Can you find a group size where the number of females with brothers differs from the number of males with brothers? Or can you find a pattern that lets you prove that they must always be the same?
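
One way to explore the larger groups is to enumerate them. The following Python sketch assumes, as in the paragraphs above, that every gender sequence of a given length occurs equally often (one group per sequence here), and counts the males and the females who have at least one brother. The function name is made up for illustration.

```python
from itertools import product

def count_with_brothers(group_size):
    """Count males and females having at least one brother, with one
    sibling group for each possible gender sequence of the given size."""
    males = females = 0
    for genders in product("MF", repeat=group_size):
        boys = genders.count("M")
        girls = group_size - boys
        if boys >= 2:
            males += boys     # every boy in the group has a brother
        if boys >= 1:
            females += girls  # every girl in the group has a brother
    return males, females

for size in range(2, 7):
    print(size, count_with_brothers(size))
```

For sizes 2 and 3 this reproduces the 2n and 9n counts above (with n = 1), and the two counts stay equal for the other sizes tried.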

Eric Lippert
  • 3,428
  • 1
  • 18
  • 18
  • 9
    Probability appears naturally in this question when you model it by assuming that children are randomly assigned genders. You're doing it too! When you say "there are roughly equally many M-M, M-F, F-M, and F-F pairs," you're implicitly appealing to a probabilistic model where children are randomly assigned genders (independently) with probability $\frac{1}{2}$ each, together with the law of large numbers. – Qiaochu Yuan May 22 '16 at 04:27
  • This is a less clear version of GEdgar's answer. – durron597 May 23 '16 at 04:15
  • "0 brothers" is still a number of brothers (though I don't think it changes the answer to include them) – Glen_b May 26 '16 at 16:39
4

Consider a family with $n$ siblings of undetermined sex (each chosen independently at random). For each ordered pair of siblings, the probability that both are men is $1/4$. There are $n(n-1)$ ordered pairs, so the expectation for the sum of the number of brothers each man has is $n(n-1)/4$. Similarly, the probability that the first sibling in the pair is a woman and the second sibling in the pair is a man is also $1/4$, so the expectation for the sum of the number of brothers each woman has is also $n(n-1)/4$.

We have shown that in each family of fixed size, the expected number of brothers of men is equal to the expected number of brothers of women. So it is also true for the population as a whole: the expected number of brothers of women is the same as the expected number of brothers of men.
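
The ordered-pair count can be checked by brute force. This Python sketch (an illustration under the stated assumption of fair, independent sexes) averages over all $2^n$ equally likely assignments in a family of $n$ siblings and compares both expected totals with $n(n-1)/4$.

```python
from fractions import Fraction
from itertools import product

def expected_brother_totals(n):
    """Expected total brothers of the men and of the women in a family of n
    siblings, averaging over all 2**n equally likely gender assignments."""
    men_total = women_total = 0
    for genders in product("MF", repeat=n):
        for i, j in product(range(n), repeat=2):
            if i == j or genders[j] != "M":
                continue  # only count pairs where j is a brother of i
            if genders[i] == "M":
                men_total += 1
            else:
                women_total += 1
    return Fraction(men_total, 2 ** n), Fraction(women_total, 2 ** n)

for n in range(1, 6):
    print(n, expected_brother_totals(n), Fraction(n * (n - 1), 4))
```

Note that this compares expected totals, not averages per man or per woman.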

Julian Rosen
  • 15,329
  • 3
  • 34
  • 64
  • In your last paragraph you deduce from a conclusion valid in each family the same conclusion valid in the whole population. But how can you be sure that this deduction is allowed, in the light of Simpson's paradox mentioned in your own comment to OP? (I'm not saying this is a case of Simpson's paradox, just that Simpson's paradox shows there is a danger in inferring a probabilistic conclusion from an exhaustive collection of conditional probabilities.) – Marc van Leeuwen May 23 '16 at 12:12
  • I'm just using linearity of expectation: the expectation for the total number of brothers of men in the population is equal to the sum over families of the expectation for the total number of brothers of men in that family, and similarly for women. I should have been more clear about the conclusion: I'm only concluding something about total brothers of men vs total brothers of women, not average brothers per man vs average brothers per woman (so this might not answer the OP's question, depending on interpretation). – Julian Rosen May 23 '16 at 12:53
3

There is some unspecified distribution of family sizes, according to which each person born may expect to have $s$ siblings, half of them brothers.

Christian Blatter
  • 216,873
  • 13
  • 166
  • 425
  • This assumes there is no correlation in the birth rate within families, nor the death rate within families, and therefore makes no allowance e.g. for gendercide, sex-selected abortion, bias in mortality rates in zones of war, regions where childbirth is dangerous, families with sex-dependent genetic diseases etc. – samerivertwice May 23 '16 at 13:56
  • 1
    @RobertFrost: yes, because the question explicitly excludes all those things in point 4. They would all lead to families in which the sex of a member either is somewhat correlated to the sex of the others, or else is not 50/50 as stated in point 1. – Steve Jessop May 23 '16 at 15:38
  • @SteveJessop I guess that's fair enough then. Although I guess we should point out to the OP why those supposedly "reasonable" assumptions are entirely unreasonable, since they're the fundamental determinants of the answer to the question! – samerivertwice May 23 '16 at 15:41
  • 1
    @Robert: yes, of course we *know* that these assumptions aren't true. But I think the question is all about handling a faulty intuition (specifically, "women have more as no man can be his own brother"), so only a simple model is needed to address that intuition and falsify the wrong claim. – Steve Jessop May 23 '16 at 15:42
2

Assuming that a family (not including the parents) contains a nonzero number of both men and women, say $3$ men and $3$ women, a woman obviously has $3$ brothers, whereas each of her brothers has only $2$ brothers. Conversely, every man has more sisters than a woman does. To generalize it, if there are $n$ men and $m$ women for an arbitrary $n,m \in \Bbb N$, a woman has $n$ brothers while a man has $n-1$ brothers. Similarly, a man has $m$ sisters while one of his sisters has $m-1$ sisters.

Tesla
  • 1,255
  • 9
  • 33
  • You could also say that it is the same. Let us say you are drawn to be a female. Then you have a 50% chance to get a brother, and 50% chance to get a sister as a second child. It is not that if you are drawn to be a female, that the chance of getting a male/female increases or decreases. We know that the statistic says 50/50 but I wouldn't try to fit to that statistic and would just answer that it's the same. By flipping heads the chance of getting a tail also does not go up. – Pedro May 21 '16 at 13:57
  • I don't know whether this is a good model. Is it good to assume that there are equally many males/females in a household? If there always are, you are right. But it oftentimes is not the case. The only thing we can say is that at birth you have 50% of being a man or 50% of being a female I guess. – Pedro May 21 '16 at 14:13
  • @Pedro: No such assumption is actually being made: read the last two sentences of the answer. The sex distribution of births is irrelevant. – Brian M. Scott May 21 '16 at 14:45
  • @BrianM.Scott But how about this answer then? Why is assuming that there are an equal amount of men and women in a family not irrelevant then? That is equally, if not more irrelevant than assuming a certain birth distribution. If you are going to model child birth, you are not going to match this with a model on how many people were born in the past, but are going to assign a certain rate to giving birth to a boy/girl. – Pedro May 21 '16 at 14:47
  • 1
    @Pedro: Read the **whole** answer. The initial example, with $3$ of each, is just illustrative; the actual answer is in the penultimate sentence, the one that begins ‘To generalize it’. – Brian M. Scott May 21 '16 at 14:49
  • @BrianM.Scott The generalisation given is a false one. You are falling into a well known paradox. Please see: http://stats.stackexchange.com/questions/122722/please-explain-the-waiting-paradox/122725 . How on earth can you assume that the total number of people is fixed first, and then calculate the probabilities afterwards? That is wrong. => you can't first fix the total amount, even if you draw them randomly. – Pedro May 21 '16 at 14:54
  • 5
    @Pedro: I’m afraid that you are completely confused. No probabilities are involved in this question. None. It is a purely combinatorial question, involving only counting. – Brian M. Scott May 21 '16 at 14:55
  • @BrianM.Scott The answer is 50/50. How you can say that the chance is bigger that you have a sister, if you are a boy yourself. This violates sane reasoning. Think about it. Let me assume. I was born first, I am a boy. What is the chance of the next child being a girl? Yes the probabilistic chance of giving birth to a girl. What is the chance of the next child being a boy? Yes the probabilistic chance of giving birth to a boy. It is not harder than that. – Pedro May 21 '16 at 14:59
  • Well I primitively interpreted the question without involving any probabilities since you do not need any if you just consider a given amount of men and women in a family. Of course you do have to consider the probabilities if you pick random people and so on, but that was never the case in my interpretation of the question. – Tesla May 21 '16 at 15:05
  • 4
    @Pedro: I’m afraid that you simply don’t understand the question that **Sigma** is (correctly) answering. You are answering a different question. Both are reasonable interpretations of the OP’s question. – Brian M. Scott May 21 '16 at 15:06
  • 3
    This attempt at a general solution fails in the case that $m$ or $n$ are zero (that is, for any family of all men or all women). See GEdgar's post and alexis' comment beneath it. – Daniel R. Collins May 23 '16 at 05:55
  • 1
    If this is an answer to a particular interpretation of the question, it would be good to clearly state that interpretation in the answer. In particular because the question has been undergoing rather drastic edits. Personally I have great difficulty in interpreting the question in any way that does not involve any probability or statistics. One possibility would be "in population X, the person with the most brothers is a woman", which is true for many, but not all, specific values for X. But I don't think the question says that. – Marc van Leeuwen May 23 '16 at 11:33
  • 1
    @DanielR.Collins I disagree; even in the case that $m$ is zero, each woman of the family (of which there are none) has $m$ brothers, and each man has $m-1$ brothers. Since the assumption is contradictory in one case, there is no problem with whatever conclusion by principle of explosion. – Mario Carneiro May 25 '16 at 01:16
2

Your assumption that females have more brothers is likely correct, but the reasoning you used is not valid.

"However, in each family, the sex of each member is independent of the sexes of the other members."

I don't believe that this is a valid assumption, in a society that assigns so much importance to gender.

Some in society might say "I will have children until I have a son, and then stop"

Possibilities are:

M (1/2)

FM (1/4)

FFM (1/8)

FFFM (1/16)

And so on...

Average number of brothers:

Males: 0

Females: 1

Even if one were to go with a slightly different plan "I will have 2 children, and then have up to 2 more if I don't yet have a son" The odds are

MF (1/4) MM (1/4) FM (1/4) FFM (1/8) FFFM (1/16) FFFF (1/16)

Average number of brothers:

Males: 4/16

Females: 1/4+1/4+1/8+1/16 = 11/16

Women have nearly 3 times as many brothers as men under these situations.
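As a rough check of the second plan, here is a minimal Python sketch that enumerates the six outcomes listed above with exact fractions. The function name `brother_averages` and the two ways of averaging are my own illustration, not part of the answer: the "per-family" figure reproduces the 4/16 and 11/16 quoted above, while the "per-individual" figure weights each family by how many boys or girls it actually contains. Under that second weighting the averages come out as 8/19 for males and 15/19 for females, so the gap is still there but closer to a factor of two than three.

```python
from fractions import Fraction

# Outcomes of the "two children, then up to two more until a son" plan,
# written as birth-order strings with the probabilities given in the answer.
OUTCOMES = {
    "MF": Fraction(1, 4),
    "MM": Fraction(1, 4),
    "FM": Fraction(1, 4),
    "FFM": Fraction(1, 8),
    "FFFM": Fraction(1, 16),
    "FFFF": Fraction(1, 16),
}

def brother_averages(outcomes):
    """Return (per_family, per_individual) average brother counts by sex."""
    per_family = {"M": Fraction(0), "F": Fraction(0)}
    brothers_total = {"M": Fraction(0), "F": Fraction(0)}
    headcount = {"M": Fraction(0), "F": Fraction(0)}
    for family, prob in outcomes.items():
        boys = family.count("M")
        girls = family.count("F")
        # Number of brothers seen by one boy / one girl in this family.
        seen = {"M": max(boys - 1, 0), "F": boys}
        size = {"M": boys, "F": girls}
        for sex in "MF":
            if size[sex] > 0:
                # Family-weighted: each family counts once, however many
                # children of this sex it has.
                per_family[sex] += prob * seen[sex]
            # Individual-weighted: every child of this sex counts once.
            brothers_total[sex] += prob * size[sex] * seen[sex]
            headcount[sex] += prob * size[sex]
    per_individual = {s: brothers_total[s] / headcount[s] for s in "MF"}
    return per_family, per_individual

per_family, per_individual = brother_averages(OUTCOMES)
print(per_family)      # M: 1/4 (= 4/16), F: 11/16
print(per_individual)  # M: 8/19, F: 15/19
```

Which weighting is appropriate depends on whether "average" means averaging over families or over people; the question as posed is about individuals.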

In scenarios where parents decide on number of children for reasons unrelated to gender, and where the breakdown is actually 50/50 (it isn't exactly, it depends on things like age, diet, stress etc), your assumption is incorrect - men cannot be their own brother, but females cannot be their own brother either.

You cannot apply the assumption "families on average have equal numbers of males and females" to a particular family where you know the gender of one person in it, but have no information about the number of children or the gender of the others. You could try to apply the same logic to coin flips and this should highlight the issue more clearly. An average series of X coin flips will turn up equal numbers of heads and tails, but having seen that one of the coin flips came up heads, I now know that this series of X coin flips is likely to contain more heads than tails, because the other X-1 flips will still, on average, split evenly between heads and tails.

Scott
  • 129
  • 4
  • In response to your example: Possibilities are: M (1/2) FM (1/4) FFM (1/8) FFFM (1/16), your maths is right but the rule is unrealistic. A family is more likely to say "I will have two or three children and if we still don't have at least one son we'll keep going until we do. I think this will reverse the bias in your result. – samerivertwice May 23 '16 at 13:52
  • The first rule is clearly unrealistic, some males have brothers after all and some females don't. The second was intended to be at least a little realistic for some fraction of society. But realistically, there will be a large number of different scenarios, with similar qualities. I'm not sure that your scenario would reverse the bias? Surely any system which involves sometimes stopping when you have your first male child will lead to more families with only 1 male child, and thus males having fewer brothers? Am I missing something? – Scott May 23 '16 at 22:45
  • My bad re the unrealistic rules, I didn't read the question properly. Re the other, I've a feeling I was thinking it'll lead to brothers with more brothers, but it looks wrong now I come back to it. – samerivertwice May 24 '16 at 06:22
2

There are many factors affecting this matter but the fundamental determinant of the answer is whether correlation within families is positive or negative, i.e. are parents having one male offspring more likely to have another male offspring, in relation to the overall population mean.

This could be due to a genetic disposition to having male children within certain families for example, or because of geographical variances in mortality rates.

If we look at countries where female children are aborted or killed after birth for example, having a son makes having another son slightly more likely since given that the family has a living son, it is more likely that that particular family is a practitioner of so-called "gendercide". Thus brothers will have more brothers and fewer sisters.

The male predisposition for more dangerous occupations and activities is probably the most important cause of correlation. For example in a region where there is war, men will generally be killed in greater proportion and therefore there will be a significant population of women with fewer brothers compared with the global population. Since this region will not contain so many men with fewer brothers (since they too are dead), the impact will be to create more women with fewer brothers.

The contra argument would be that in regions where childbirth is dangerous there will be men with fewer sisters, since death in childbirth will again be correlated within the family based on the simple fact that families are likely to be geographically local to each other. As a result, there will be fewer sisters around, and more men there having fewer sisters.

The most obvious way of balancing these arguments is to look at the mortality rate. The fact that men die on average younger than women provides a strong argument that the effect will make more women have fewer brothers than vice versa.

We therefore only require the assumptions a) that correlation within a family is positive, e.g. because family membership is obviously correlated with geographical proximity, and b) that variations in mortality rate are correlated with geographical proximity. That will be sufficient to prove that women have fewer brothers than men do, on the basis of men having a higher mortality rate than women.

So the answer is, men have more brothers than women do.

samerivertwice
  • 8,241
  • 2
  • 20
  • 55
1

In the case that we choose a person randomly and that the intra-family probabilities for each family are also $50\%:50\%$ there should be: $p(boy)= p(girl)=50\%$

If the family size then is $n$ we would expect $n/2$ boys and girls respectively, or $n/2 - 1$ excluding the gender randomly picked. This gives:

$$(n/2-1) \cdot \frac{1}{2} + (n/2) \cdot \frac{1}{2} = \frac{1}{2}\cdot(n-1)$$

That is, we'd expect half of the remaining $(n-1)$ siblings to be girls.

However, the conditional probabilities become $p(boy) = 0\%$ and $p(girl) = 100\%$ if it is given that the sibling picked was a girl. Then our calculation becomes:

$$(n/2-1) \cdot \frac{0}{1} + (n/2) \cdot \frac{1}{1} = \frac{1}{2}\cdot(n)$$

We can leave the case $p(boy)=100\%$ as an exercise; we will land at $\frac{1}{1}(n/2-1)$.

The fraction of the expected values $\frac{E[boys|girl]}{E[boys|boy]}$ will be $\frac{n/2}{n/2-1} = \frac{n}{n-2}$. Since the numerator is larger than the denominator for every family size $n > 2$ (you can convince yourself by drawing a graph or using calculus) we can conclude we should expect more boys if we randomly picked a girl than if we randomly picked a boy.

mathreadler
  • 24,082
  • 9
  • 33
  • 83
  • 1
    If you think the population is 50:50 and then picked from random and the choice was a girl, to be truly rigorous, technically that requires that you amend your best estimator of the population mean to reflect the new evidence so now girls are to be judged more likely than boys. – samerivertwice May 23 '16 at 13:59
  • I assume a priori that there is 50:50 also at family level ( I should probably write that down ), then sample and it happens to be a girl. That new information skews the probabilities. For various reasons that 50:50 assumption could be correct or not, but in lack of evidence of the contrary I went with it. In the extreme case you _could_ in theory imagine a 50:50 average society where half the families have 100% boys and half have 100% girls. – mathreadler May 23 '16 at 14:11
  • It's Bayesian probability I was describing but I guess the point I am making is that your argument about the effect of choosing a boy or a girl is virtually equal and opposite across the entire population to the Bayesian effect if you choose to allow for it. – samerivertwice May 23 '16 at 15:07
1

If the (entirely unreasonable) assumptions 1-4 are taken as given and you want to examine only the properties of the fact that you have taken one "man" from your population, then we have to discard reality and the answer to this question is fundamentally down to the debate between Bayesian and conventional probability. This depends on how "confident" you are in your rule that men and women are 50:50, and do you modify that estimator of the population mean in response to evidence, or do you hold it fast and adjust your knowledge of your sample in response to the sample of one taken.

If we are taking assumptions 1-4 then we have to ask how much of reality are we discarding?

If you discard it all then you are a computer with limited information and you probably have total confidence in the 50:50 rule. When you take a man out under these circumstances, you know there to be one more woman in the remaining population and therefore women have more brothers than do men.

A family is a tiny sample of say 2-4 siblings from a population of several billion. The fact that the brother is taken out of the population modifies the likelihood only very marginally. But again, we are introducing the reality of a population of billions of people and a small family, and it is unclear how much of reality you want to eliminate.

If however you put less confidence in your 50:50 rule, the fact that you have taken a man from the population proves that a man exists and makes your best estimator of the percentage of men greater than 50%, but with assumptions 1-4 this leaves the answer unchanged.

Introduce a little bit more of reality and you know the 50:50 rule is the population mean deduced by observation and that the underlying distribution is binomial, very closely approximating the normal. As such taking one man out leaves the distribution unchanged and therefore women and men have the same average number of brothers.

If you carry on introducing more of reality however, you will arrive at my other answer which is the correct answer - men have more brothers than women do.

samerivertwice
  • 8,241
  • 2
  • 20
  • 55
0

This is really a comment but is too long (and I don't have the rep).

Monte Carlo confirms @Donkey_2009's answer; men have more brothers on average.

I built a simple (and probably clumsy) Excel spreadsheet with 2000 families, each with a random number (0...5) each of sons and daughters. Thus the family size (only counting kids) varies between 0 and 10. Within each family I calculated the number of brothers and sisters of boys and girls, then averaged them over all families. Typical results with this family size are that boys have 2.65 brothers and 2.45 sisters, similarly for girls. The spreadsheet is here. Typical formulae are:

nBoys (column C) =INT(6*RAND())
nGirls (column D) =INT(6*RAND())
n_bros|B (column E) =IF(C2=0,0,C2*(C2-1)) [if no boys then zero, else Boys(Boys-1)]
n_bros|G (column F) =IF(D2=0,0,C2*D2) [if no girls then zero, else Boys*Girls]

I'd be grateful if someone could check my calculations.
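For anyone who wants to rerun this without the spreadsheet, here is a minimal Python sketch of the same model. It assumes the averaging is done per individual (total brother count divided by total number of boys or girls), which may differ from how the spreadsheet averages; the function name `simulate` and its parameters are illustrative. Note that this model draws the numbers of boys and girls independently, rather than drawing each child's sex independently.

```python
import random

def simulate(n_families=200_000, max_each=5, seed=0):
    """Mimic the spreadsheet: boys and girls each uniform on 0..max_each."""
    rng = random.Random(seed)
    boys_total = girls_total = 0
    bros_of_boys = bros_of_girls = 0
    for _ in range(n_families):
        boys = rng.randint(0, max_each)    # column C: INT(6*RAND())
        girls = rng.randint(0, max_each)   # column D: INT(6*RAND())
        boys_total += boys
        girls_total += girls
        bros_of_boys += boys * (boys - 1)  # column E: brothers summed over boys
        bros_of_girls += boys * girls      # column F: brothers summed over girls
    # Per-individual averages: total brothers seen / number of individuals.
    return bros_of_boys / boys_total, bros_of_girls / girls_total

avg_for_boys, avg_for_girls = simulate()
print(f"brothers per boy:  {avg_for_boys:.3f}")
print(f"brothers per girl: {avg_for_girls:.3f}")
```

Under this particular model the exact per-individual values are $(E[B^2]-E[B])/E[B] = 8/3$ brothers per boy and $E[B] = 5/2$ brothers per girl, where $B$ is uniform on $0,\dots,5$, so the simulated numbers should land close to 2.67 and 2.50.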

NL_Derek
  • 143
  • 4
  • 1
    "each with a random number (0...5) each of sons and daughters" -- this violates the questioner's axiom 4, "in each family, the sex of each member is independent of the sexes of the other members". For example, if there are 0 boys among the others then the conditional probability of being a boy is certainly different from if there are 5 boys among the others (in which case it's zero). So it's an interesting test, since it illustrates that the assumptions built into your model, and that can affect the outcome, are not always easy to stick to. – Steve Jessop May 24 '16 at 12:26
  • ... exaggerating what you did, suppose instead of 0 .. 5 you'd chosen 0 .. 1 each of boys and girls in a family. Then clearly no boy has a brother, but some girls do, and so girls *would* have more brothers on average than boys. So I'm surprised you came up with the result you did in your simulation, I'd intuitively expect a slight trend to the reverse. – Steve Jessop May 24 '16 at 12:28
  • @Steve you are right about axiom 4; still thinking about a solution. But I did some more tests and the bigger the families, the stronger the "more brothers" effect; as Donkey_2009 pointed out, the men with n-1 brothers come to dominate the *global* average. – NL_Derek May 25 '16 at 21:04
0

I'm going to provide an answer using basic probability rules and the Binomial distribution. Let $n$ be the number of children in a family.

Let $B$ be the number of boys among the $n$ children in a family - obviously, $B\leq n$, and I assume that $$ B\sim \textrm{Binomial}(n,p) $$ where $p$ is the probability of a male child. Now, I will use $M$ and $F$ to refer to male and female children, respectively.

$$ P(B=m)=\binom{n}{m}p^m(1-p)^{n-m}\\ P(M)=p\\ P(M|B=m)=\frac{m}n $$

And so, it is easy enough to see that $$ P(B=m|M)=\frac{m\binom{n}m p^m(1-p)^{n-m}}{np} = \binom{n-1}{m-1} p^{m-1}(1-p)^{n-m} $$ And so, we have $$ E(B-1| M) = \sum_{m=1}^n (m-1) P(B=m|M) = (n-1)p $$ This is all as expected.

Similarly, $$ P(B=m|F)=\frac{(n-m)\binom{n}m p^m(1-p)^{n-m}}{n(1-p)} = \binom{n-1}{m} p^{m}(1-p)^{n-1-m} $$ so, $$ E(B|F) = \sum_{m=0}^{n-1} m P(B=m|F) = (n-1)p $$

Therefore, the number of brothers is the same for men and women, so long as independence is maintained. Based on the assumptions made in the question, we have $p=\frac12$, so both work out to $\frac{n-1}2$.

Note that this result is for a fixed $n$, and so the distribution of family size does not change the result.

Also note that the result changes if we change the distribution. For example, if $P(B=m)=\frac1{n+1}$ (for a uniform distribution of number of boys in a family of size $n$), then $E(B-1|M)=\frac{2(n-1)}3$ and $E(B|F)=\frac{n-1}3$, so men will have twice as many brothers as women will.
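Both cases can be checked with exact arithmetic. The following is a minimal Python sketch under the same setup as above (a fixed family size $n$, and a child picked uniformly from the family); the helper name `expected_brothers` and the choice $n=5$ are just for illustration.

```python
from fractions import Fraction
from math import comb

def expected_brothers(pmf):
    """pmf[m] = P(B = m) for m = 0..n boys in a family of n children.

    Returns (E[B - 1 | M], E[B | F]) for a child picked uniformly
    at random from a family of that size, using P(M | B = m) = m / n.
    """
    n = len(pmf) - 1
    p_male = sum(Fraction(m, n) * pmf[m] for m in range(n + 1))
    for_men = sum((m - 1) * Fraction(m, n) * pmf[m]
                  for m in range(n + 1)) / p_male
    for_women = sum(m * Fraction(n - m, n) * pmf[m]
                    for m in range(n + 1)) / (1 - p_male)
    return for_men, for_women

n, p = 5, Fraction(1, 2)
binomial = [comb(n, m) * p**m * (1 - p)**(n - m) for m in range(n + 1)]
uniform = [Fraction(1, n + 1)] * (n + 1)

print(expected_brothers(binomial))  # both equal (n - 1) * p = 2
print(expected_brothers(uniform))   # 2(n - 1)/3 = 8/3 and (n - 1)/3 = 4/3
```

Swapping in any other distribution for the number of boys shows how strongly the answer depends on the independence assumption.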

In reality, it is likely that a small bias in favour of men having more brothers will occur as the probability of a male child in one family is likely fixed but specific to that family, and so children of the same gender are slightly more likely than children of different genders.

Glen O
  • 11,786
  • 27
  • 38
0

Correct me if I'm wrong but this is just a more complicated version of the Boy or Girl Paradox. https://en.wikipedia.org/wiki/Boy_or_Girl_paradox

For sake of simplifying the problem, let's say each family only has two children. Then, if you randomly take a family and ask if there's a girl, then what's the chance that she has a brother? Probability will obviously be 2/3.

On the other hand, say a family pops out a girl, then they pop out another child. Probability is obviously 1/2. This is just another paradox which shows that the exact random event generation matters.

So, to address OP's question, OP needs to answer which scenario is occurring.
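The two scenarios for two-child families can be enumerated directly. This is a small Python sketch assuming the four birth orders BB, BG, GB, GG are equally likely; the third quantity (picking a random girl rather than a random family) is my own addition, since it is the sampling closest to "a randomly chosen woman".

```python
from fractions import Fraction
from itertools import product

# All two-child families as birth-order tuples, assumed equally likely.
families = list(product("BG", repeat=2))

# Scenario 1: pick a family and learn only that it has at least one girl.
with_girl = [f for f in families if "G" in f]
p_family_view = Fraction(sum("B" in f for f in with_girl), len(with_girl))

# Scenario 2: the first child is a girl; is the second child a boy?
first_girl = [f for f in families if f[0] == "G"]
p_birth_view = Fraction(sum(f[1] == "B" for f in first_girl), len(first_girl))

# Scenario 3 (illustrative): pick a girl at random, so a two-girl family
# is counted once for each of its girls.
girls = [f for f in families for child in f if child == "G"]
p_random_girl = Fraction(sum("B" in f for f in girls), len(girls))

print(p_family_view)   # 2/3
print(p_birth_view)    # 1/2
print(p_random_girl)   # 1/2
```

Sampling by individual gives 1/2 again, because the two-girl family contributes two girls without brothers.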

JobHunter69
  • 3,114
  • 20
  • 57