I'm having trouble finding a good explanation of the Lebesgue integral. As per the definition, it is the expectation of a random variable. Then how does it model the area under the curve? Let's take for example a function $f(x) = x^2$. How do we find the integral of $f(x)$ over $[0,1]$ using the Lebesgue integral?


8 Answers


As has been noted, the usual definition of the Lebesgue integral has little to do with probability or random variables (though the notions of measure theory and the integral can then be applied to the setting of probability, where under suitable interpretations it will turn out that the (Lebesgue) integral of a certain function corresponds to the expectation of a certain random variable).

But this is not the origin of the Lebesgue integral. Here is an intuitive idea of what the Lebesgue integral is, as compared to the Riemann integral.

Recall from Calculus the idea behind the Riemann integral: the integral $\int_a^b f(x)\,dx$ is meant to represent the net signed area between the $x$-axis, the graph of $y=f(x)$, and the lines $x=a$ and $x=b$. The way we attempt to do this is by breaking up the domain, $[a,b]$, into subintervals $[a=x_0,x_1]$, $[x_1,x_2],\ldots,[x_{n-1},x_n=b]$. Then, on each subinterval $[x_i,x_{i+1}]$ we pick a point $x_i^*$, and we estimate the area under the graph of the function with the rectangle of height $f(x_i^*)$ and base $[x_i,x_{i+1}]$. This leads to the Riemann sums $$ \sum_{i=0}^{n-1} f(x_i^*)(x_{i+1}-x_i)$$ as estimates of the area under the graph. We then consider finer and finer partitions of $[a,b]$ and take limits to estimate the area.
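As a rough numerical illustration of the domain-partition idea, here is a minimal Python sketch. It assumes equal-width subintervals and the left endpoint as the sample point $x_i^*$; the function name `riemann_sum` is made up for this example.

```python
def riemann_sum(f, a, b, n):
    """Left-endpoint Riemann sum of f over [a, b] with n equal subintervals."""
    width = (b - a) / n
    # Rectangle i has base [a + i*width, a + (i+1)*width] and height f(a + i*width).
    return sum(f(a + i * width) * width for i in range(n))


# Finer partitions of the domain give better estimates of the area under x^2.
for n in (10, 100, 1000):
    print(n, riemann_sum(lambda x: x * x, 0.0, 1.0, n))
```

The printed estimates approach $\frac{1}{3}$ from below, since left endpoints underestimate an increasing function.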

Lebesgue's idea was that instead of partitioning the domain, we will partition the range; if the function takes values between $c$ and $d$, we can divide the range $[c,d]$ into subintervals $[c=y_0,y_1]$, $[y_1,y_2],\ldots,[y_{m-1},y_m=d]$. Then, we let $E_i$ be the set of all points in $[a,b]$ whose value under $f$ lies between $y_i$ and $y_{i+1}$. That is, $$ E_i = f^{-1}([y_i,y_{i+1}]) = \{ x\in[a,b]\,|\, y_i \leq f(x) \leq y_{i+1}\}.$$

If we have a way of assigning a "size" to $E_i$, call it its "measure" $\mu(E_i)$, then the area $A$ under the part of the graph of $y=f(x)$ that lies between the horizontal lines $y=y_i$ and $y=y_{i+1}$ satisfies $$ y_i\mu(E_i) \leq A \leq y_{i+1}\mu(E_i).$$ So Lebesgue suggests approximating the area by picking a number $y_i^*$ between $y_i$ and $y_{i+1}$, and considering the sums $$ \sum_{i=0}^{m-1} \mu(E_i)y_i^*.$$ Then consider finer and finer partitions of $[c,d]$, and this gives finer and finer approximations of the area by these sums. The Lebesgue integral will be the limit of these sums. (The analogy given by Mike Spivey is very apt for the distinction between partitioning the domain and partitioning the range to find the sum.)

But in order for this to make sense, we need to develop a way of measuring fairly intricate subsets of the line, so that we can compute $\mu(E_i)$. So we first develop a way of doing this; it turns out that if you accept the Axiom of Choice, then it is impossible to come up with a way of measuring that will (i) assign to an interval $[a,b]$ the "measure" $b-a$; (ii) be invariant under translation, so that if $F=E+c = \{e+c \mid e\in E\}$ then $\mu(F)=\mu(E)$; (iii) be countably additive: if $E = \cup_{i=1}^{\infty}E_i$ and the $E_i$ are pairwise disjoint, then $\mu(E) = \sum\mu(E_i)$; and (iv) give every subset of the line a well-defined (possibly infinite) measure. (If you don't accept the Axiom of Choice, then there are models of the reals where we can achieve this.) So one drops the restriction (iv), and constructs a measure for which some sets will be "too weird" to have a measure. We then restrict attention to certain kinds of functions (called the measurable functions), which are the ones for which the sets we get in the process described above are all measurable sets. And then we define the Lebesgue integral for those functions, following the idea described above (but one does not define it exactly that way; instead the usual way is to describe $f$ as a limit of functions for which the integral is easy, and then compute the integral of $f$ as a limit of the integrals that are easy).

For your function, $f(x)=x^2$, this is fairly easy: the values all lie between $0$ and $1$, so say that we break up the range into subintervals of length $1/n$, so $y_i = i/n$, $i=0,\ldots,n$. Then $$f^{-1}([y_i,y_{i+1}]) = f^{-1}([i/n, (i+1)/n]) = [\sqrt{i/n},\sqrt{(i+1)/n}],$$ so the $n$th estimate, picking $y_i^* = y_i = i/n$, is just $$\sum_{i=0}^{n-1} (i/n)\left(\sqrt{(i+1)/n} - \sqrt{i/n}\right).$$ Take the limit as $n\to\infty$, and you will get that the limit is $\frac{1}{3}$, as expected. (I will spare you the details; see the end of this answer for a high-power way of getting the answer similar to the way you do it with the Riemann integral).
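For readers who want to see the numbers, here is a minimal Python sketch of this range-partition sum. It assumes the choice $y_i^* = y_i = i/n$ and uses the fact that $E_i = [\sqrt{i/n},\sqrt{(i+1)/n}]$ is an interval, so its measure is just its length; the function name `lebesgue_lower_sum` is made up for this example.

```python
import math


def lebesgue_lower_sum(n):
    """Range-partition sum for f(x) = x**2 on [0, 1] with y_i = i / n.

    E_i = f^{-1}([y_i, y_{i+1}]) = [sqrt(y_i), sqrt(y_{i+1})], so
    mu(E_i) = sqrt(y_{i+1}) - sqrt(y_i); the sample height is y_i* = y_i.
    """
    total = 0.0
    for i in range(n):
        y_lo, y_hi = i / n, (i + 1) / n
        measure = math.sqrt(y_hi) - math.sqrt(y_lo)  # length of E_i
        total += y_lo * measure
    return total


# Finer partitions of the range give better estimates of the integral.
for n in (10, 100, 1000, 10000):
    print(n, lebesgue_lower_sum(n))
```

One way to fill in the skipped limit: summation by parts rewrites the sum as $1 - \frac{1}{n}\sum_{i=1}^{n}\sqrt{i/n}$, and the remaining sum tends to $\int_0^1\sqrt{y}\,dy = \frac{2}{3}$, which gives $1-\frac{2}{3}=\frac{1}{3}$.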

It turns out that not every function is Lebesgue-integrable, just like not every function is Riemann-integrable. But every function that is Riemann-integrable will also be Lebesgue integrable, and the value of its Lebesgue integral will be the same as the value of its Riemann integral. But there are functions that are not Riemann-integrable but are Lebesgue-integrable (for example, the characteristic function of the rationals is Lebesgue-integrable, with integral $0$ over any interval, but is not Riemann-integrable). We also have a "Fundamental Theorem of Calculus" for the Lebesgue Integral:

Theorem. If $F$ is a differentiable function, and the derivative $F'$ is bounded on the interval $[a,b]$, then $F'$ is Lebesgue integrable on $[a,b]$ and $$\int_a^x F'\,d\mu = F(x) - F(a).$$

Here, the integral is the Lebesgue integral.

In particular, to finally answer the question you ask about your example, since $F(x)=\frac{x^3}{3}$ is a differentiable function whose derivative is bounded over any finite interval, in particular over $[0,1]$, then from this theorem you can deduce that the integral over the interval $[0,1]$ of the derivative $F'(x)=x^2$ is equal to $F(1)-F(0)$; that is, $$\int_0^1 x^2\,d\mu = \int_0^1 \left(\frac{x^3}{3}\right)'\,d\mu = \frac{1}{3} - \frac{0}{3} =\frac{1}{3}.$$

I recommend the book A Garden of Integrals by Frank E. Burk (Dolciani Mathematical Expositions 31, MAA, 2007, ISBN 9-780883-853375); it discusses and compares the Cauchy integral, the Riemann integral, the Riemann-Stieltjes integral, the Lebesgue integral, the Lebesgue-Stieltjes integral, and the Henstock-Kurzweil integral; it also discusses the Wiener and Feynman integral. I just finished reading it recently.

Arturo Magidin

One of my graduate school professors, Erhan Cinlar, used to give the following analogy to explain the intuitive difference between the Lebesgue integral and the Riemann integral.

Suppose you have a pile of coins of different denominations, and you want to know how much money you have. The Riemann integral is like picking up the coins, one-by-one, and adding the denomination of each to a running total. The Lebesgue integral is like sorting the coins by denomination first, and then getting the total by multiplying each denomination by how many you have of that denomination and then adding up those numbers. The methods are different, but you obtain the same result by either method.

In the same way, when both the Riemann integral and the Lebesgue integral are defined, they give the same value. As others have said, though, there are functions for which the Lebesgue integral is defined but the Riemann integral is not, and so in that sense the Lebesgue integral is more general than the Riemann.

Mike Spivey
  • One way to see your analogy in the context of integrals is to notice that Riemann integrals approximate the area by sums of vertical rectangles, while the Lebesgue integral instead uses horizontal rectangles... – Mariano Suárez-Álvarez Oct 21 '10 at 21:22
  • @Mariano: Or, you have a bunch of piles of coins. Riemann adds up each pile separately, then adds up the totals. Lebesgue counts how many pennies are in all the piles, and gets a partial total; then counts how many nickels; then how many dimes; etc. And then adds up the totals. – Arturo Magidin Oct 22 '10 at 04:45
  • This analogy was actually given by Lebesgue himself (according to Dunham, The Calculus Gallery). – Michael Greinecker Jan 24 '12 at 07:39
  • @MarianoSuárez-Álvarez Could you please clarify "Lebesgue integral uses horizontal rectangles"? I'm learning from the book by Stein & Shakarchi and so far I have found no mention of this. In that book we first define the integral of simple functions, then bounded functions supported on a set of finite measure, etc. The only place where I might imagine horizontal rectangles is one chapter back, where we proved that every measurable function is the limit of a sequence of simple functions, and I believe in that construction we partitioned the range. Is this the connection with what you said? – Ovi Mar 29 '18 at 04:15

The Lebesgue integral is a generalization of the usual Riemann integral taught in basic calculus. If the Riemann integral of a function over a set exists, then it equals the Lebesgue integral. So the Lebesgue integral of $x^2$ over $[0,1]$ is just the old $(1/3) 1^3-(1/3)0^3 = 1/3$.

The Lebesgue integral has the benefit of being defined for many more functions than the Riemann integral. Even more importantly, the Lebesgue integral has useful limit properties.

The expectation of a random variable is a particular application of the Lebesgue integral where the function to be integrated is the random variable (viewed as a function on the sample space) and the integration is with respect to a probability measure.

You need to look at one of the many probability and measure books for the details. My own favourites are:

  • Pollard, A User's Guide to Measure-Theoretic Probability
  • Dudley, Real Analysis and Probability

Terence Tao has some online lecture notes.

Jyotirmoy Bhattacharya

The Riemann integral is pretty good and very intuitive, however the main reason to consider other types of integrals is that "the space of functions that are Riemann integrable", say $R(I)$ where $I\subset\mathbb{R}^n$ is compact, is too small (even though it is a linear space in the sense you can add them and multiply by constants).

If you only look at piecewise continuous functions that vanish outside a bounded region, then you can get by with the Riemann integral. In mathematical analysis, however, we look at various kinds of limits of functions, and we would like the limit functions to stay in "the space" (we want the space to be complete).

About the best we can do in the Riemann case is to look at uniformly convergent sequences $f_n$ on a compact interval $I\subset\mathbb{R}^n$ - in that case the limit $\lim f_n\in R(I)$ and $\lim\int f_n =\int \lim f_n$. However, uniform convergence is very rare! (The sums of many Fourier series are not continuous even though their partial sums are, etc.)

The Lebesgue integral can be constructed in several ways (ending up with the same space though). A first try might be to start by norming $R(I)$ with $\|f\|=\int|f|$; we would then get a distance between $f,g\in R(I)$ by $\|f-g\|$, ending up with a metric space which we may complete by adding all possible limits. This will not work, however, because even though $R(I)$ is small it is too large (there are unbounded functions such that $\|f\|=\infty$). A better start would be to look at $C(I)$, the space of continuous functions on (the compact set) $I$ (certainly each $f\in R(I)$ is a point-wise limit of $C(I)$ functions). If we norm $C(I)$ in the same manner, we do indeed get a normed space, and the completion of that space is $L^1(I)$.

In $L^1$ you can certainly take limits in norm and moreover, as has already been pointed out in other answers, you have many other, better limit theorems, such as Lebesgue's dominated convergence theorem or the monotone convergence theorem. Also, bounded functions of $R(I)$ do belong to $L^1(I)$.

In addition to the above: in order for the suggested norm to be a norm, we need to consider two functions, $f$ and $g$, as equal whenever $\int|f-g|\,dx=0$, which, for example, happens when their values differ only at a single point of $I$.

AD - Stop Putin -

You may want to consider the following sources:


The definition here doesn't mention probability or expectation or random variable. Intuitively, it just says the measure (in $\mathbb R^2$) is the area of the smallest set of rectangles that will cover the set. Then for an area the Lebesgue integral is just the integral of 1 over the set.

Ross Millikan

You may also like to refer to these two books:

  • A Radical Approach to Lebesgue's Theory of Integration by Bressoud.

  • Real Analysis by G.B. Folland.

As Jyotirmoy pointed out, the Lebesgue integral is a generalization of the Riemann integral. The Riemann integral has shortcomings, and the Lebesgue integral was developed to address them. A rigorous definition of the Lebesgue integral requires you to know what a simple function is; you can read more on this at http://en.wikipedia.org/wiki/Lebesgue_integration


I first understood Lebesgue as an integral over the range rather than an integral over the domain, but it never seemed plausible to me that this can resolve the problem of discontinuity. I think the trick is that the measure is a better notion of the "width" of a region (on which the function value is the same) than Riemann's $\Delta x \rightarrow 0$. The Lebesgue integral is basically

$$\int_S f(s)d\mu = \sum {f(s) \mu(s)}$$

You may think of it as scanning over $f(s)$, that is, over all the values that you may have on the domain $S = \cup(s)$, and summing up all rectangles of size $\text{height}(s) \times \text{width}(s) = f(s) \mu(s)$.

You can say that you integrate over the range of values $f(s)$. However, there are other differences as well. Riemann implies that the width of every interval is the same $dx$; that is, if you have the function

$$f(1) = 1,\quad f(2) = 1,\quad f(3) = 2,\quad f(4) = 1$$

the integral will accumulate the sum as 0+1+1+2+1. So, if you have 4 coins, you can compute the sum by simply iterating over them. You see that the rate of growth is proportional only to the coin value $f(x)$ and not to the coin index $x$. This is not the case for Lebesgue. For Lebesgue, you group the coins by value (I come from Scala): List(1,1,2,1) => Occurrences(1->3, 2->1), and you see immediately that you have two heaps of coins: ones in the first heap and twos in the second. These are two rectangles (coin value × heap size) that you need to add together. So you integrate over the range here, but you do not simply add together the measures (the numbers of coins) that exceed the current value; you add coin value × count. This is different from Riemann. This is my second stage of understanding the integrals.
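The two ways of counting can be sketched in a few lines of Python (used here in place of Scala purely for illustration; `collections.Counter` plays the role of the grouping step, and the variable names are made up for this example):

```python
from collections import Counter

coins = [1, 1, 2, 1]  # the values f(1), f(2), f(3), f(4) from the example

# Riemann-style: walk the domain in order, adding each value as it comes.
running_total = 0
for value in coins:
    running_total += value

# Lebesgue-style: group by value first, then add value * (size of its heap).
heaps = Counter(coins)  # one heap of three 1s, one heap of a single 2
grouped_total = sum(value * count for value, count in heaps.items())

print(running_total, grouped_total)  # 5 5
```

Both totals agree; only the order of bookkeeping differs.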

This enables integrating point-charge-like discontinuities. Consider a step function. It jumps instantly at some points. It is an integral of some infinitely narrow spikes, and the integral grows instantly at the spike point, as if there is a finite amount of charge/mass concentrated in an infinitesimal point of space, in contrast to charge/mass distributed continuously in space. The amount of charge at the point determines the height of the step. It seems that Riemann has difficulty integrating the spike because, regardless of how infinitesimal you make your $dx$, you fail to break the infinitesimal interval into rectangles of constant (and infinite) height $f(x)$ to sum them up. On the other hand, we can say that there is charge $n$ at $x=x_0$, and when we integrate over the axis of values, we suddenly pass through the value $n$, which is measured (counting measure) to have a width of $k$ (coins), even though it is confined to a single real point of width 0.


Now, how is this related to expectations? Your 4 coins, 3*1 + 1*2, add up to 5 euros and, when you draw an arbitrary coin, you expect its value to be (3*1 + 1*2)/4 = 5/4 = 1.25 euros. That is, one coin contributes 1.25 euros on average. The Lebesgue integral is equal to the expectation when the measure of all the coins (the number of coins) is one. That is, each of the $n$ coins is not actually a coin but $1/n$th of one. Now the Lebesgue integral, 3/4*1 + 1/4*2 = 5/4, is the expectation. That is not surprising, because integrating every value times its probability is what the expectation is.
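As a small check of this arithmetic, here is a Python sketch that normalises the coin counts into a probability measure and sums value times probability. It assumes each coin is equally likely to be drawn; the variable names are made up for this example, and exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction

coins = [1, 1, 2, 1]
counts = Counter(coins)

# Normalise the counting measure so that the whole pile has measure 1.
prob = {value: Fraction(count, len(coins)) for value, count in counts.items()}

# Integral of the coin value with respect to the normalised measure.
expectation = sum(value * p for value, p in prob.items())
print(expectation)  # 5/4
```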

I hesitated over whether to post my undergraduate garbage reflections. But this video persuaded me that I am on the right track.
