
I have read other posts (such as here) on getting the "reverse" of quantile -- that is, to get the percentile that corresponds to a certain value in a series of values.

However, the answers don't give me the same value as quantile for the same data series.

I have also read that quantile offers 9 different algorithms (selected with the type argument) for computing a quantile.

So my question: is there a reliable way to get the reverse of the quantile function? ecdf does not take a "type" argument so it doesn't seem that one can make sure they are using the same method.

Reproducible example:

# Simple data
x = 0:10
pcntile = 0.5


# Get value corresponding to a percentile using quantile
(pcntile_value <- quantile(x, pcntile))     

# 50%    
# 5               # returns 5 as expected for 50% percentile     



# Get percentile corresponding to a value using ecdf function
(pcntile_rev <- ecdf(x)(5))                


# [1] 0.5454545   #returns 54.54% as the percentile for the value 5


# Not the same answer as quantile produces
halfer
  • https://stackoverflow.com/questions/35927956/quantile-vs-ecdf-results should provide you the answer. – tmfmnk Jun 23 '19 at 14:40

2 Answers


The answer in the link is really good, but perhaps it helps to look at ecdf itself. Just run the following code:

# Simple data
x = 0:10
p0 = 0.5

# Get value corresponding to a percentile using quantile
sapply(c(1:7), function(i) quantile(x, p0, type = i))
# 50% 50% 50% 50% 50% 50% 50% 
# 5.0 5.0 5.0 4.5 5.0 5.0 5.0 
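The loop above covers types 1 to 7, while the documentation of quantile lists nine. As a minimal sketch, the same check can be extended to all nine types (the name q_all is only for illustration):

```r
# Same data and probability as above
x <- 0:10
p0 <- 0.5

# Evaluate the median under every documented quantile type (1 to 9);
# unname() drops the "50%" label so a plain numeric vector is returned
q_all <- sapply(1:9, function(i) unname(quantile(x, p0, type = i)))
q_all
```

For this data only type 4 should differ (4.5 instead of 5), so adding types 8 and 9 does not change the picture.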

Thus, it is not a question of type. You can step into the function using debug:

# Get percentile corresponding to a value using ecdf function
debug(ecdf)
my_ecdf <- ecdf(x)

The crucial part is

rval <- approxfun(vals, cumsum(tabulate(match(x, vals)))/n, 
    method = "constant", yleft = 0, yright = 1, f = 0, ties = "ordered")

After this you can check

data.frame(x = vals, y = round(cumsum(tabulate(match(x, vals)))/n, 3), stringsAsFactors = FALSE)

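As you divide by n = 11, the result is not surprising: six of the eleven values are less than or equal to 5, so ecdf(x)(5) = 6/11, which is about 0.545. As said, for the theory have a look at the other answer.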
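The division by n can also be checked by hand, without stepping through the debugger. A small sketch of the same counting (count_le_5 and manual are illustrative names):

```r
x <- 0:10
n <- length(x)               # 11 observations

# Number of observations less than or equal to 5
count_le_5 <- sum(x <= 5)

# Fraction of observations <= 5, which is what ecdf reports
manual <- count_le_5 / n
manual

# Compare with the value returned by ecdf
all.equal(manual, ecdf(x)(5))
```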

By the way, you can also plot the function

plot(my_ecdf)

Concerning your comment: I think it's not a question of reliability but a question of how to define the "inverse" of a distribution function when a true inverse does not exist:

[Three images in the original answer; content not recovered]

A good reference for generalized inverses: Paul Embrechts, Marius Hofert: "A note on generalized inverses", Math Meth Oper Res (2013) 77:423–432
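If the goal is a function that round-trips with one specific quantile type, one possible approach is to invert that type's own interpolation rule instead of using ecdf. The sketch below does this for the default type 7 only, assuming the data contain no ties. The helper inv_quantile7 is hypothetical, not part of base R; it relies on type 7 placing the k-th sorted value at probability (k - 1)/(n - 1) and interpolating linearly in between.

```r
# Hypothetical helper: inverse of quantile(x, p, type = 7).
# Assumes x has no tied values; returns NA outside the range of x.
inv_quantile7 <- function(x, value) {
  xs <- sort(x)
  n <- length(xs)
  # Probabilities that type 7 assigns to the sorted observations
  p <- (seq_len(n) - 1) / (n - 1)
  # Linear interpolation of probability as a function of the value
  approx(xs, p, xout = value)$y
}

x <- 0:10
p_back <- inv_quantile7(x, 5)
p_back

# Round trip: feeding the probability back into quantile recovers the value
quantile(x, p_back, type = 7)
```

Other types use different plotting positions, so each would need its own inverse; this is a sketch for type 7, not a general solution.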

Christoph
  • So is the answer that one cannot reliably get the reverse of the quantile function? I followed your answer, but it still results in a discrepancy between the quantile function (the 50th percentile = 5) vs. the ecdf function (5 is the 54.54th percentile). – dave_in_newengland Jun 23 '19 at 21:58

ecdf is giving the result of the formula in the documentation.

x <- 0:10
Fn <- ecdf(x)

Now, the object Fn is an interpolating step function.

str(Fn)
#function (v)  
# - attr(*, "class")= chr [1:3] "ecdf" "stepfun" "function"
# - attr(*, "call")= language ecdf(x)
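The step behaviour can be seen by evaluating the function between data points. A short sketch (the evaluation points are arbitrary choices for illustration):

```r
x <- 0:10
Fn <- ecdf(x)

# Between two observations the function stays flat ...
between <- Fn(c(5, 5.5, 5.999))
between

# ... and it jumps by 1/n at the next observation
at_next <- Fn(6)
at_next
```

Every point from 5 up to, but not including, 6 gets the same value, so many inputs map to one probability and the function has no ordinary inverse.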

And it keeps the original x values and the corresponding y values.

environment(Fn)$x
# [1]  0  1  2  3  4  5  6  7  8  9 10

environment(Fn)$y
# [1] 0.09090909 0.18181818 0.27272727 0.36363636 0.45454545 0.54545455
# [7] 0.63636364 0.72727273 0.81818182 0.90909091 1.00000000

The latter are exactly the same values as the result of what the documentation says is the formula used to compute them. From help('ecdf'):

For observations x= (x1,x2, ... xn), Fn is the fraction of
observations less or equal to t, i.e.,

Fn(t) = #{xi <= t}/n = 1/n sum(i=1,n) Indicator(xi <= t).

Instead of 1:length(x) I will use seq_along.

seq_along(x)/length(x)
# [1] 0.09090909 0.18181818 0.27272727 0.36363636 0.45454545 0.54545455
# [7] 0.63636364 0.72727273 0.81818182 0.90909091 1.00000000
Fn(x)
# [1] 0.09090909 0.18181818 0.27272727 0.36363636 0.45454545 0.54545455
# [7] 0.63636364 0.72727273 0.81818182 0.90909091 1.00000000
Rui Barradas
  • I have the same question for you that I put in the comment that I just posted to Christoph's answer – dave_in_newengland Jun 23 '19 at 21:59
  • @dave_in_newengland I believe the answer is yes: only in limit cases will you get the same values. The ECDF is a step function, and the values between the extremes of each interval do not necessarily correspond to a value of the independent variable `x`. In the case above, `Fn(5)` is about `0.545` but the quantile is `50%`. It's not that much of a nonsense. – Rui Barradas Jun 24 '19 at 07:51