In the best-selling book Thinking, Fast and Slow (p. 205), Daniel Kahneman (a Nobel Prize winner in Economics) makes the following claim: 'Suppose you consider many pairs of firms. The two firms in each pair are generally similar, but the CEO of one of them is better than the other. How often will you find that the firm with the stronger CEO is the more successful of the two? . . . A correlation of .30 [between stronger CEO and more successful firm] implies that you would find the stronger CEO leading the stronger firm in about 60% of the pairs.'

How is the 'about 60%' computed under plausible assumptions?

Richard Hevener

3 Answers


As a first approximation:

We examine a pair of firms and pick one of the two at random. Let $A$ be the event that the chosen firm is the more successful of the pair and $B$ the event that it has the stronger CEO. We are after $\mathsf P(A\mid B)$, the probability that a firm is the more successful given that it has the stronger CEO.

The correlation between the indicators of $A$ and $B$ is by definition:$$\begin{align}\rho ~=~ & \dfrac{\mathsf P(A \cap B)~-~\mathsf P(A)~\mathsf P(B)}{\sqrt{~\mathsf P(A)~(1-\mathsf P(A))~\mathsf P(B)~(1-\mathsf P(B))~}}\\[2ex] ~=~ & \dfrac{(\mathsf P(A \mid B)-\mathsf P(A))~\mathsf P(B)}{\sqrt{~\mathsf P(A)~(1-\mathsf P(A))~\mathsf P(B)~(1-\mathsf P(B))~}}\end{align}$$

Now, exactly one firm in every pair has the stronger CEO, and exactly one is the more successful; just not necessarily the same one. So $\mathsf P(A)=\tfrac 12, \mathsf P(B)=\tfrac 12$, the denominator equals $\tfrac 14$, and hence:

$$\begin{align}\rho ~=~& 2~\mathsf P(A \mid B)-1 \\[2ex] \mathsf P(A\mid B) ~=~ & \dfrac{1+\rho}{2} \\[1ex] ~=~& \dfrac{1+0.30}{2} \\[1ex] ~=~& 0.65\end{align}$$
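The arithmetic of this first approximation can be checked numerically. The following is a minimal Python sketch, not part of the original answer; the function names `joint_from_rho` and `indicator_correlation` are made up for illustration. It builds $\mathsf P(A\cap B)$ from $\rho$ with $\mathsf P(A)=\mathsf P(B)=\tfrac12$ and then recovers the correlation from the joint probability.

```python
from math import sqrt

def joint_from_rho(rho, p_a=0.5, p_b=0.5):
    """P(A and B) for two events whose indicators have correlation rho."""
    return p_a * p_b + rho * sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))

def indicator_correlation(p_ab, p_a, p_b):
    """Pearson correlation of the indicators 1_A and 1_B."""
    return (p_ab - p_a * p_b) / sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))

rho = 0.30
p_ab = joint_from_rho(rho)      # joint probability P(A and B)
p_a_given_b = p_ab / 0.5        # P(A | B) = P(A and B) / P(B)

print(p_a_given_b)                           # approximately 0.65
print(indicator_correlation(p_ab, 0.5, 0.5)) # approximately 0.30
```

Going in both directions is just a consistency check on the formula $\mathsf P(A\mid B)=\tfrac{1+\rho}{2}$; it says nothing about whether the binary model is the one Kahneman had in mind.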

More reasonably:

We might consider the success and the CEO strength of any company $i$ to be bivariate normal random variables ($A_i, B_i$), identically distributed across companies, with the two components of each pair dependent. Then for any pair of companies $(i,j)$ we are looking for $\mathsf P(A_i>A_j \mid B_i>B_j)$.

This would be obtained through a similar, but slightly more involved, procedure.
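Short of the full derivation, the probability can be estimated by simulation. The following is a rough Python sketch under assumptions that are mine rather than the answer's: standard normal margins, correlation $0.3$ within each firm, and firms drawn independently. The function name `estimate_pair_probability` is hypothetical.

```python
import random
from math import sqrt

def estimate_pair_probability(rho, n_pairs=200_000, seed=1):
    """Monte Carlo estimate of P(A_i > A_j | B_i > B_j)."""
    rng = random.Random(seed)
    c = sqrt(1 - rho * rho)

    def draw_firm():
        b = rng.gauss(0.0, 1.0)                # CEO strength
        a = rho * b + c * rng.gauss(0.0, 1.0)  # success, correlation rho with b
        return a, b

    stronger = 0  # pairs in which firm i has the stronger CEO
    both = 0      # ... and firm i is also the more successful
    for _ in range(n_pairs):
        a_i, b_i = draw_firm()
        a_j, b_j = draw_firm()
        if b_i > b_j:
            stronger += 1
            if a_i > a_j:
                both += 1
    return both / stronger

estimate = estimate_pair_probability(0.30)
print(round(estimate, 3))
```

With this many pairs the estimate should land near $0.6$ rather than $0.65$, which is the gap between the normal model and the binary first approximation.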


Postscript: the formula for $\rho$ above is the usual covariance divided by the product of the standard deviations, applied to indicator variables. Let $\mathbf 1_A$ be the indicator random variable that event $A$ occurs. $$\begin{split}\mathsf E(\mathbf 1_A) &= 1\mathsf P(A)+ 0\mathsf P(A^\complement)\\ &= \mathsf P(A)\\[2ex]\mathsf {Var}(\mathbf 1_A)& = \mathsf E(\mathbf 1_A^2)-\mathsf E(\mathbf 1_A)^2\\&= 1^2\mathsf P(A)-\mathsf P(A)^2\\&= \mathsf P(A)(1-\mathsf P(A))\\[2ex] \mathsf {Cov}(\mathbf 1_A,\mathbf 1_B) &= \mathsf E(\mathbf 1_A\mathbf 1_B)-\mathsf E(\mathbf 1_A)\mathsf E(\mathbf 1_B)\\&= \mathsf E(\mathbf 1_{A\cap B})-\mathsf E(\mathbf 1_A)\mathsf E(\mathbf 1_B)\\&= \mathsf P(A\cap B)-\mathsf P(A)\mathsf P(B)\end{split}$$

Graham Kemp
  • Could you explain why this definition of the correlation is correct (or guide me to a place where I'd find out)? I'm only familiar with the definition where the covariance and standard deviations are used and I don't understand why for instance $\sqrt{P(A)(1 - P(A))} = \sigma_A$. – titusAdam Feb 09 '18 at 16:09
  • See the postscript , @titusAdam – Graham Kemp Feb 09 '18 at 21:55
  • @GrahamKemp I think this correlation coefficient is interesting, and perhaps applicable to confusion matrix statistics. Is there a citable source you could refer me to that uses this coefficient? – Galen Dec 23 '20 at 20:06
  • @GrahamKemp, shouldn't the numerator in the second line of the definition of $\rho$ be $(P(A|B) - P(A))P(B)$ and not $(P(A|B) - 1)P(B)$? – Xin Yuan Li Jan 13 '21 at 20:11

For a randomly selected firm let $X$ be the CEO quality and $Y$ be the firm success (however these are measured). The author's assertion follows under the assumption that $(X,Y)$ has a bivariate normal distribution with correlation $\rho=0.3$.

If $(X_1,Y_1)$ and $(X_2,Y_2)$ are measured for independently selected firms, then the difference $(X_1-X_2,Y_1-Y_2)$ is also bivariate normal, with mean zero and the same correlation $\rho$. Kahneman's claim is that $P(Y_1>Y_2\mid X_1>X_2)\approx 0.6$. This follows from a fact(*) about bivariate normal variables:

If $(A,B)$ are bivariate normal with means $\mu_A$ and $\mu_B$ respectively, and correlation $\rho$, then $$P(B>\mu_B\mid A>\mu_A)=\frac12 + \frac{\arcsin\rho}\pi.$$

If $\rho=0.3$ the RHS works out to $0.59698668$.
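The quoted value is easy to reproduce. The snippet below is a small Python sketch that evaluates the right-hand side for a few correlations; the function name `p_stronger_ceo_wins` is chosen here for illustration only.

```python
from math import asin, pi

def p_stronger_ceo_wins(rho):
    """Evaluate 1/2 + arcsin(rho)/pi, the RHS of the bivariate normal fact."""
    return 0.5 + asin(rho) / pi

for rho in (0.0, 0.3, 0.5, 1.0):
    print(rho, round(p_stronger_ceo_wins(rho), 4))
```

The endpoints behave as expected: zero correlation gives a coin flip, and perfect correlation gives certainty.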

(*) The fact can be deduced from this result.
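The linked result is not reproduced here. One standard route, assuming the result in question is the orthant probability for a bivariate normal pair, goes as follows. Standardize to $U=(A-\mu_A)/\sigma_A$ and $V=(B-\mu_B)/\sigma_B$, which are standard normal with the same correlation $\rho$. The orthant probability (Sheppard's formula) and the symmetry of $U$ about zero then give

$$\begin{align}\mathsf P(U>0,\,V>0) &= \frac14+\frac{\arcsin\rho}{2\pi},\\[1ex] \mathsf P(B>\mu_B\mid A>\mu_A) &= \frac{\mathsf P(U>0,\,V>0)}{\mathsf P(U>0)} = \frac{\tfrac14+\tfrac{\arcsin\rho}{2\pi}}{\tfrac12} = \frac12+\frac{\arcsin\rho}{\pi}.\end{align}$$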

  • Could you explain why the difference $(X_1 - X_2, Y_1 - Y_2)$ has mean zero? If the mean of $(X_1, Y_1)$ is $\mu_1$ and the mean of $(X_2, Y_2)$ is $\mu_2$, shouldn't the mean of $(X_1 - X_2, Y_1 - Y_2)$ be $\mu_1 - \mu_2$? – titusAdam Feb 09 '18 at 16:04
  • @titusAdam We are selecting firms from the same population, so the mean of $X_1$ (i.e., the expected value of $X_1$) is the same as the mean of $X_2$, and the mean of $Y_1$ is the same as the mean of $Y_2$. In other words, $\mu_1=\mu_2$ follows from random sampling. – grand_chat Feb 09 '18 at 17:39

I suspect that the key assumption is a bivariate normal distribution.

Suppose $X$ (the difference in CEO quality) and $Y$ (the difference in firm success) have a bivariate normal distribution with zero means and a correlation of $0.3$ (for example, they could each have variance $1$ and a covariance between them of $0.3$).

In that case, the probability that $X$ and $Y$ have the same sign is about $0.597$, close enough to $0.6$. Here is a simulation in R illustrating the point:

> set.seed(1)
> cases <- 1000000
> correl <- 0.3
> A <- rnorm(cases)
> B <- rnorm(cases)
> C <- rnorm(cases)
> X <- sqrt(correl)*A + sqrt(1-correl)*B 
> Y <- sqrt(correl)*A + sqrt(1-correl)*C
> cor(X,Y)
[1] 0.2998031
> mean(X*Y > 0) 
[1] 0.597309

By contrast if $X$ and $Y$ each took the values $1$ and $-1$ with probabilities of $\frac12$ and with correlation between them of $0.3$, I suspect the probability that $X$ and $Y$ had the same sign would be $0.65$.
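That suspicion can be confirmed with a short calculation, sketched here on the assumption that $X$ and $Y$ take only the values $\pm1$ with probability $\tfrac12$ each. Both then have mean $0$ and variance $1$, so the correlation is simply $\mathsf E(XY)$, and $XY$ equals $+1$ when the signs agree and $-1$ when they differ:

$$\begin{align}\rho = \mathsf E(XY) &= \mathsf P(XY=1)-\mathsf P(XY=-1) = 2\,\mathsf P(XY=1)-1,\\[1ex] \mathsf P(XY=1) &= \frac{1+\rho}{2} = \frac{1+0.3}{2} = 0.65.\end{align}$$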
