If $f(x)$ is a density function and $F(x)$ is a distribution function of a random variable $X$, then I understand that the expectation of $X$ is often written as:

$$E(X) = \int x f(x) dx$$

where the bounds of integration are implicitly $-\infty$ and $\infty$. The idea of multiplying $x$ by the probability of $x$ and summing makes sense in the discrete case, and it's easy to see how it generalises to the continuous case. However, in Larry Wasserman's book All of Statistics, he writes the expectation as follows:

$$E(X) = \int x dF(x)$$

I guess my calculus is a bit rusty, in that I'm not that familiar with the idea of integrating over functions of $x$ rather than just $x$.

  • What does it mean to integrate over the distribution function?
  • Is there an analogous process to repeated summing in the discrete case?
  • Is there a visual analogy?

UPDATE: I just found the following extract from Wasserman's book (p.47):

The notation $\int x d F(x)$ deserves some comment. We use it merely as a convenient unifying notation so that we don't have to write $\sum_x x f(x)$ for discrete random variables and $\int x f(x) dx$ for continuous random variables, but you should be aware that $\int x d F(x)$ has a precise meaning that is discussed in a real analysis course.

Thus, I would be interested in any insights that could be shared about what is the precise meaning that would be discussed in a real analysis course?

Jeromy Anglim

5 Answers


There are many definitions of the integral, including the Riemann integral, the Riemann-Stieltjes integral (which generalizes and expands upon the Riemann integral), and the Lebesgue integral (which is even more general). If you're using the Riemann integral, then you can only integrate with respect to a variable (e.g. $x$), and the notation $dF(x)$ isn't defined.

The Riemann-Stieltjes integral generalizes the concept of the Riemann integral and allows for integration with respect to a cumulative distribution function that isn't continuous.

The notation $\int_{a}^{b} g(x)dF(x)$ is roughly equivalent to $\int_{a}^{b} g(x) f(x) dx$ when $f(x)=F'(x)$. However, if $F(x)$ is a function that isn't differentiable at all points, then you simply can't evaluate $\int_{a}^{b} g(x) f(x) dx$, since $f(x)=F'(x)$ isn't defined.

In probability theory, this situation occurs whenever you have a random variable with a discontinuous cumulative distribution function. For example, suppose $X$ is $0$ with probability $\frac{1}{2}$ and $1$ with probability $\frac{1}{2}$. Then

$$ \begin{align} F(x) &= 0 & x &< 0 \\ F(x) &= 1/2 & 0 &\leq x < 1 \\ F(x) &= 1 & x &\geq 1 \\ \end{align} $$

Clearly, $F(x)$ doesn't have a derivative at $x=0$ or $x=1$, so there isn't a probability density function $f(x)$ at those points.

Now, suppose that we want to evaluate $E[X^3]$. This can be written, using the Riemann-Stieltjes integral, as

$$E[X^3]=\int_{-\infty}^{\infty} x^3 dF(x).$$

Note that because there isn't a probability density function $f(x)$, we can't write this as

$$E[X^{3}]=\int_{-\infty}^{\infty} x^3 f(x) dx.$$

However, we can use the fact that this random variable is discrete to evaluate the expected value as:

$$E[X^{3}] = 0^{3}\cdot\tfrac{1}{2} + 1^{3}\cdot\tfrac{1}{2} = \tfrac{1}{2}.$$

So, the short answer to your question is that you need to study alternative definitions of the integral, including the Riemann and Riemann-Stieltjes integrals.
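
To make the idea concrete, here is a minimal numerical sketch of $\int x^3\,dF(x)$ for the two-point distribution above. The helper name `stieltjes_sum`, the uniform partition, and the integration range $(-1, 2]$ are illustrative assumptions, not part of any standard library; the sum simply weights $g$ by the increments of $F$ on each cell.

```python
def cdf(x):
    """CDF of X with P(X = 0) = P(X = 1) = 1/2 (the example above)."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.5
    return 1.0


def stieltjes_sum(g, F, a, b, n):
    """Approximate the integral of g dF over (a, b] with n equal cells.

    g is evaluated at the right endpoint of each cell and weighted by the
    increment of F over that cell (hypothetical helper, for illustration).
    """
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        left = a + k * h
        right = a + (k + 1) * h
        total += g(right) * (F(right) - F(left))
    return total


# Only the two cells that contain the jumps at 0 and 1 contribute.
approx = stieltjes_sum(lambda x: x**3, cdf, -1.0, 2.0, 3000)
print(approx)  # close to 0.5, matching the discrete calculation
```

Every cell on which $F$ is flat contributes nothing, so the sum picks out exactly the jump points, which is why the integral reduces to a sum in the discrete case.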

Brian Borchers

Another way to understand integration with respect to a distribution function is via the Lebesgue-Stieltjes measure. Let $F\!:\mathbb R\to\mathbb R$ be a distribution function (i.e. non-decreasing and right-continuous). Then there exists a unique measure $\mu_F$ on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$ that satisfies $$ \mu_F((a,b])=F(b)-F(a) $$ for any choice of $a,b\in\mathbb R$ with $a<b$. Actually there is a one-to-one correspondence between probability measures on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$ and non-decreasing, right-continuous functions $F\!:\mathbb R\to\mathbb R$ satisfying $F(x)\to 1$ for $x\to\infty$ and $F(x)\to 0$ for $x\to-\infty$.

Now, the integral $$ \int x\,\mathrm dF(x) $$ can be viewed as simply the integral $$ \int x\,\mu_F(\mathrm dx)\quad\text{or}\quad \int x \,\mathrm d\mu_F(x). $$

Now if $X$ is a random variable having distribution function $F$, then the Lebesgue-Stieltjes measure is nothing but the distribution $P_X$ of $X$: $$ P_X((a,b])=P(X\in (a,b])=P(X\leq b)-P(X\leq a)=F(b)-F(a)=\mu_F((a,b]),\quad a<b, $$ showing that $P_X=\mu_F$. In particular we see that $$ {\rm E}[X]=\int_\Omega X\,\mathrm dP=\int_\mathbb{R}x\,P_X(\mathrm dx)=\int_\mathbb{R}x\,\mu_F(\mathrm dx)=\int_\mathbb{R}x\,\mathrm dF(x). $$
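
As a small worked illustration of this identification (an added example, assuming a variable $X$ that takes the values $0$ and $1$ with probability $\tfrac12$ each), the measure $\mu_F$ is a sum of two point masses and the integral collapses to a finite sum:

$$ \mu_F=\tfrac12\,\delta_0+\tfrac12\,\delta_1,\qquad \int_\mathbb{R}x\,\mu_F(\mathrm dx)=0\cdot\tfrac12+1\cdot\tfrac12=\tfrac12. $$

At the other extreme, if $F$ has a density $f$, then $\mu_F(A)=\int_A f(x)\,\mathrm dx$ for Borel sets $A$, and the same integral becomes $\int_\mathbb{R}x\,f(x)\,\mathrm dx$.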

Stefan Hansen
  • This is an older answer, but I'll try nonetheless: how is one to understand $\int_\mathbb{R} x \mu_{F}(dx)$? I've never seen that notation and am unsure of what it's supposed to mean. – Dahn Aug 06 '15 at 09:13
  • @DahnJahn: As a [Lebesgue integral](https://en.wikipedia.org/wiki/Lebesgue_integration). – Stefan Hansen Aug 06 '15 at 09:32
  • Thanks, ah, so is it simply a notation issue and $\int d\mu = \int \mu(dx)$ by definition? – Dahn Aug 06 '15 at 09:51
  • @DahnJahn: Yep, it's just two ways of writing the same thing. Usually, one would write $\int f(x)\,\mu(\mathrm dx)$ or $\int f(x)\, \mathrm d\mu(x)$ to make the integration variable explicit. Otherwise, one writes $\int f\,\mathrm d\mu$. – Stefan Hansen Aug 06 '15 at 10:00
  • Hi @StefanHansen, sorry for coming very late to this discussion. I would like to ask why $P_X((a,b])=P(X\leq b)-P(X\leq a)$ and not $P_X((a,b])=P_X(X\leq b)-P_X(X\leq a)$? How are the two different from each other? – Blg Khalil Jan 05 '20 at 21:10
  • @BlgKhalil First of all, $P_X(X\leq b)$ doesn't even make sense because $P_X$ is a measure on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$ and $\{X\leq b\} = \{\omega\in\Omega\mid X(\omega)\leq b\}$ is a subset of $\Omega$, your probability space. In fact, $P_X$ is defined in terms of $P$ and $X$ in the following way: $P_X(A)=P(X\in A)$, where $\{X\in A\}=X^{-1}(A)=\{\omega\in\Omega\mid X(\omega)\in A\}$ for $A\in\mathcal{B}(\mathbb{R})$. See also this [answer](https://math.stackexchange.com/questions/508790/what-does-it-mean-by-mathcalf-measurable/508801#508801). – Stefan Hansen Jan 06 '20 at 08:02

The integral is in the sense of Riemann-Stieltjes. The precise definition can be found in the link, but loosely put, it is defined as something like:

$$\int_a^b g(x)\,dF(x)=\lim_{\|P\|\rightarrow 0} \sum_{k=0}^{n-1} g(x_k)[F(x_{k+1})-F(x_k)],$$

where the points $x_k$ partition the interval you are integrating over, $[a,b]$, that is, $P:=\{x_0=a,x_1,\ldots, x_{n-1},x_n=b\}$, and the mesh size $\|P\|:=\max_k\,(x_{k+1}-x_k)$ goes to 0. The point of this definition is that $F(x_{k+1})-F(x_k)$ is the probability of $X$ falling within the interval $(x_{k},x_{k+1}]$. When $F$ is differentiable with derivative $f$, you can show that $dF(x)=f(x)dx$, so that the integral becomes the usual Riemann integral of $g(x)f(x)$. However, when $F$ is not differentiable, particularly when $F$ has a jump (which is equivalent to your random variable taking on a single value with positive probability), you need this generalization of the integral. For example, if $X$ is a constant random variable, say $X=c$, then $F(x)$ jumps from 0 to 1 at $x=c$, and so $X$ doesn't have a density function in the classical sense but rather a point mass (that is, a Dirac delta functional).
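
The two situations described here can be checked numerically. The following is a rough sketch, assuming a uniform partition and left-endpoint sums of the form $\sum_k g(x_k)[F(x_{k+1})-F(x_k)]$; the function names, the choice of an Exp(1) distribution for the smooth case, and the constant $c=2.5$ are illustrative assumptions.

```python
import math


def rs_sum(g, F, a, b, n):
    """Left-endpoint Riemann-Stieltjes sum of g dF on a uniform partition of [a, b]."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    return sum(g(xs[k]) * (F(xs[k + 1]) - F(xs[k])) for k in range(n))


def riemann_sum(h, a, b, n):
    """Left-endpoint Riemann sum of h(x) dx on a uniform partition of [a, b]."""
    dx = (b - a) / n
    return sum(h(a + k * dx) for k in range(n)) * dx


def F_exp(x):
    """CDF of an Exp(1) variable (smooth case)."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0


def f_exp(x):
    """Density of an Exp(1) variable, the derivative of F_exp on x > 0."""
    return math.exp(-x) if x >= 0 else 0.0


c = 2.5  # value of the constant random variable X = c (assumed for illustration)


def F_const(x):
    """CDF of X = c: jumps from 0 to 1 at c, so no density exists."""
    return 1.0 if x >= c else 0.0


def identity(x):
    return x


# Smooth case: integrating against dF agrees with integrating x * f(x) dx.
mean_dF = rs_sum(identity, F_exp, 0.0, 40.0, 100_000)
mean_fdx = riemann_sum(lambda x: x * f_exp(x), 0.0, 40.0, 100_000)

# Jump case: only the cell containing c contributes, giving roughly c.
mean_const = rs_sum(identity, F_const, 0.0, 5.0, 100_000)

print(mean_dF, mean_fdx, mean_const)
```

The first two numbers should both be close to 1, the mean of Exp(1), while the third should be close to $c$, even though the constant variable has no density to integrate against.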

Alex R.

Because the CDF $F(x)$ is a function defined for every real $x$ in either case, whether $f(x)$ is discrete or continuous.

In other words, when $f(x)$ is discrete, $F(x)$ is a step function [as in FIGURE 2.1, page 21] that is still defined along the whole real line, whereas $f(x)$ would be represented as isolated points not connected to each other, so it is not appropriate to use the integral sign with $f(x)dx$. When $f(x)$ is continuous, it doesn't matter whether we use $dF(x)$ or $f(x)dx$.


The definition of dF(x) is f(x). F(x) is the cumulative distribution function (AKA the CDF). f(x) is the probability density function (PDF). See for yourself: http://mathworld.wolfram.com/DistributionFunction.html

  • Because $f(x)=F'(x)$? Therefore you can replace $dF(x)$ with $f(x)dx$? – Jeromy Anglim May 04 '13 at 01:39
  • Thanks. I understand that the density function is the derivative of the distribution function; I suppose the gap in my understanding is probably around integral notation particularly the concept of $dF(x)$ in the integral and the idea of replacing $dF(X)$ with something else. – Jeromy Anglim May 04 '13 at 01:44
  • To say that the definition of $dF(x)$ is $f(x)$ works ONLY when the distribution has a density. It doesn't apply to things like the Cantor distribution, even though the c.d.f. of that distribution is everywhere continuous. – Michael Hardy Aug 10 '14 at 18:08