
I've been going back over my notes from Stats class and came across the Probability Integral Transform. From my limited understanding, the basic idea is that a cdf in terms of one variable can be transformed into another cdf in terms of a different variable:

  • i.e. from $F_X(x)$ to $F_Y(y)$

Is this understanding correct? What is the purpose behind this? Finally, is there a general procedure for performing the transformation?

James Mertz

1 Answer


Your understanding looks basically correct to me.

As for purpose, I've seen it used mostly to generate random variables from continuous distributions. For instance, if $X$ has a $U(0,1)$ distribution, then $F_X(x) = x$ for $0 \le x \le 1$. Thus the requirement $F_X(x) = F_Y(y)$ in the probability integral transform reduces to $x = F_Y(y)$, or $y = F_Y^{-1}(x)$ (assuming $F_Y$ is continuous and strictly increasing, so that the inverse exists). Since $y$ is then an observation from the distribution of $Y$, this means that we can generate observations from the distribution of $Y$ by generating $U(0,1)$ random variables (which most software programs can do easily) and applying the $F_Y^{-1}$ transformation.

For example, suppose you want to generate instances of an exponential$(\lambda)$ random variable. For $y \ge 0$, the cdf is $$F(y) = \int_0^y \lambda e^{-\lambda t}\, dt = 1 - e^{-\lambda y}.$$ Setting $x = F(y)$ and solving for $y$, we have $$x - 1 = - e^{-\lambda y} \Rightarrow -\lambda y = \ln (1- x) \Rightarrow y = F^{-1}(x) = -\ln(1-x)/\lambda.$$

Thus if $x$ is an observation from a $U(0,1)$ distribution, then $y = -\ln(1-x)/\lambda$ is an observation from an exponential$(\lambda)$ distribution. Moreover, $x$ having a $U(0,1)$ distribution is equivalent to $1-x$ having a $U(0,1)$ distribution, so we often express the transformation as $y = -\ln x/\lambda$.
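As a minimal sketch of the recipe above (the function name `sample_exponential`, the seed, and the choice $\lambda = 2$ are illustrative assumptions, not part of the original answer), the transformation can be tried out in a few lines of Python using only the standard library:

```python
import math
import random


def sample_exponential(lam, n, rng=None):
    """Draw n exponential(lam) observations by inverse transform sampling."""
    rng = rng or random.Random()
    # rng.random() returns x in [0, 1), so 1 - x lies in (0, 1]
    # and the logarithm is always defined.
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]


rng = random.Random(0)   # fixed seed, only so the run is reproducible
lam = 2.0                # illustrative rate parameter
ys = sample_exponential(lam, 100_000, rng)

sample_mean = sum(ys) / len(ys)
print(sample_mean)       # should land near the theoretical mean 1/lam
```

The sketch uses $-\ln(1-x)/\lambda$ rather than $-\ln x/\lambda$ for a purely practical reason: many uniform generators, including Python's `random.random()`, can return exactly 0 but never exactly 1, so $1 - x$ is always a valid argument for the logarithm.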

As for a general procedure for performing the transformation, what I've done here with the uniform and exponential distributions should give you a guide. Unfortunately, though, there aren't that many commonly-used distributions for which the cdf can be inverted analytically.
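When the cdf has no closed-form inverse, one common workaround (not something the answer itself prescribes) is to solve $F_Y(y) = x$ numerically. The sketch below does this by bisection for the standard normal cdf, written via `math.erf`. The helper names `normal_cdf` and `invert_cdf`, the search bracket $[-10, 10]$, and the tolerance are all assumptions chosen for illustration.

```python
import math
import random


def normal_cdf(y):
    """Standard normal cdf, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def invert_cdf(cdf, x, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve cdf(y) = x for y by bisection.

    Assumes cdf is continuous and increasing on [lo, hi] and that the
    solution lies inside that bracket.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < x:
            lo = mid   # solution is to the right of mid
        else:
            hi = mid   # solution is at or to the left of mid
    return 0.5 * (lo + hi)


rng = random.Random(1)   # fixed seed, only so the run is reproducible
samples = [invert_cdf(normal_cdf, rng.random()) for _ in range(20_000)]

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)         # should be near 0 and 1 respectively
```

Bisection is slow compared with the specialised methods that statistical libraries use, but it needs nothing beyond the ability to evaluate the cdf, which makes the general idea easy to see.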

Mike Spivey
  • "Unfortunately, though, there aren't that many commonly-used distributions for which the cdf can be inverted analytically." - indeed. :( – J. M. ain't a mathematician Apr 20 '11 at 00:05
  • Mike, I have a question. When you say we want to generate instances of an exponential random variable, you are saying our objective is to obtain a realization from a random experiment whose distribution is exponential, correct? So the value of the Probability Integral Transform is that if we have a means of generating realizations from the standard uniform distribution, we can easily transform them (as you did above by solving for y) and get realizations from the exponential distribution, correct? – Frank Swanton Dec 21 '16 at 21:13
  • My understanding is that if you have a column of, say, 1,000 realizations of a standard uniform random variable in Excel, apply the -ln(x)/lambda formula in the next column, and plot the results, the histogram will look pretty close to the exponential density? – Frank Swanton Dec 21 '16 at 21:15
  • @FrankSwanton: That's exactly right. – Mike Spivey Jan 03 '17 at 22:18
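The spreadsheet experiment described in the comments can be approximated in Python. This is only a sketch under assumed settings ($\lambda = 1$, bins of width 0.5 on $[0, 3)$, a fixed seed); it compares the fraction of transformed values falling in each bin with the exact exponential probability of that bin, $e^{-\lambda a} - e^{-\lambda b}$.

```python
import math
import random

rng = random.Random(42)   # fixed seed, only so the run is reproducible
lam = 1.0                 # illustrative rate parameter
n = 1000                  # same sample size as in the comment

# "Column 1": standard uniform realizations.
xs = [rng.random() for _ in range(n)]
# "Column 2": transformed values. 1 - x is used instead of x to avoid
# log(0); both versions have the same distribution.
ys = [-math.log(1.0 - x) / lam for x in xs]

width = 0.5
edges = [i * width for i in range(7)]   # 0.0, 0.5, ..., 3.0

rows = []
for a, b in zip(edges, edges[1:]):
    observed = sum(a <= y < b for y in ys) / n
    expected = math.exp(-lam * a) - math.exp(-lam * b)
    rows.append((a, b, observed, expected))
    print(f"[{a:.1f}, {b:.1f})  observed={observed:.3f}  expected={expected:.3f}")
```

With only 1,000 draws the observed fractions fluctuate by a percentage point or two around the expected ones, which matches the "pretty close" wording in the comment.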