How can I get the top n values with their indices in R?

Question

I have a data frame with just one column, and I want to find the three largest values together with their indices. For example, my data frame df looks like:

  distance
1 1
2 4
3 2
4 3
5 4
6 5
7 5

I want to find the 3 largest distinct values with their indices, so my expected result is:

  distance    
6 5
7 5
2 4
5 4
4 3

How can I do this? Since I have just one column, is this also possible with a list instead of a data frame?

akrun · Accepted Answer · 2015-09-14T13:31:52.430

8

We can use sort with index.return=TRUE to return the values together with their indices in a list. Then we can subset each element of that list based on the first 3 unique elements of `x`.

lst <- sort(df1$distance, index.return=TRUE, decreasing=TRUE)
lapply(lst, `[`, lst$x %in% head(unique(lst$x),3))
#$x
#[1] 5 5 4 4 3

#$ix
#[1] 6 7 2 5 4
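Since the question also asks about working without a data frame, and the number of distinct values may vary, the same idea can be wrapped in a small function. This is only a sketch: `top_n_distinct` and its arguments are hypothetical names, not part of the answer, and it assumes a numeric vector without NA values.

```r
# Hypothetical helper: values and positions of the n largest
# distinct values of a numeric vector (assumes no NAs).
top_n_distinct <- function(x, n) {
  s <- sort(x, decreasing = TRUE, index.return = TRUE)
  # Keep every element whose value is among the first n unique values
  keep <- s$x %in% head(unique(s$x), n)
  list(x = s$x[keep], ix = s$ix[keep])
}

distance <- c(1, 4, 2, 3, 4, 5, 5)
res <- top_n_distinct(distance, 3)
res
# $x
# [1] 5 5 4 4 3
#
# $ix
# [1] 6 7 2 5 4
```

Because ties are kept, the length of the result depends on the data rather than on `n` alone.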

edited Sep 14 '15 at 13:31

answered Sep 14 '15 at 13:15

akrun


Thanks very much for the answer. But I don't know in advance how many values will be returned. It may be 5 or 4 or 3... – xirururu Sep 14 '15 at 13:18
Hi akrun, I see that you use `[` in lapply. What does the `[` mean? – xirururu Sep 14 '15 at 13:40
@xirururu It just subsets each list element based on the logical index returned from `lst$x %in% head(unique..`, without using an anonymous function. It can otherwise be written as `lapply(lst, function(y) y[lst$x %in% head(unique(lst$x),3)])` – akrun Sep 14 '15 at 13:42
@xirururu You can find more info from `?Extract` or `?"["` – akrun Sep 14 '15 at 13:43
Hi akrun, thanks very much! :D I am now on the `?Extract` page. It is really cool, I can learn so much just from a small question. :D – xirururu Sep 14 '15 at 13:45
@xirururu Glad to know that it helped. BTW, I used `sort` with `index.return` as it is more specific than depending on the numeric row names. – akrun Sep 14 '15 at 13:46
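To illustrate the point about `[` from the comments, here is a minimal sketch showing that passing `[` to lapply gives the same result as an explicit anonymous function. The list and the logical vector `keep` are made-up example data.

```r
# Made-up example data: a list of two equal-length vectors
lst <- list(x = c(5, 5, 4), ix = c(6, 7, 2))
keep <- c(TRUE, TRUE, FALSE)

# `[` is an ordinary function, so lapply can call it with extra arguments
a <- lapply(lst, `[`, keep)

# Equivalent form with an anonymous function
b <- lapply(lst, function(y) y[keep])

identical(a, b)
# [1] TRUE
```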

SabDeM · Answer 2 · 2015-09-14T13:26:40.287

2

A little clumsy version of my previous code:

 df[order(df$distance, decreasing = TRUE)[sort(unique(df$distance))], , drop = FALSE]
  distance
6        5
7        5
2        4
5        4
4        3
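Note that the one-liner above indexes the ordering with the sorted unique values themselves, which here are 1 to 5 and so happen to select the first five positions. For other data, a sketch along the following lines may be more robust. The variable names `n`, `top_vals` and `ord` are illustrative assumptions, not from the answer.

```r
df <- data.frame(distance = c(1, 4, 2, 3, 4, 5, 5))

n <- 3  # number of distinct values to keep (assumed parameter)

# The n largest distinct values
top_vals <- head(sort(unique(df$distance), decreasing = TRUE), n)

# Row positions in decreasing order of distance
ord <- order(df$distance, decreasing = TRUE)

# Keep only the ordered rows whose value is one of the top values
res <- df[ord[df$distance[ord] %in% top_vals], , drop = FALSE]
res
#   distance
# 6        5
# 7        5
# 2        4
# 5        4
# 4        3
```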

edited Sep 14 '15 at 13:26

answered Sep 14 '15 at 13:17

SabDeM


Theodor · Answer 3 · 2015-09-14T13:18:33.093

1

df[order(df, decreasing=TRUE)[1:3],,drop=FALSE]

If you have more columns, specify the column to sort by:

 df[order(df$column_name, decreasing=TRUE)[1:3],,drop=FALSE]

edited Sep 14 '15 at 13:18

answered Sep 14 '15 at 13:17

Theodor


Hi Theodor, thanks for the answer, but I got the result: 5, 5, 4. Actually, I want 3 distinct values, so the top 3 values are 5, 5, 4, 4, 3. Do you know how I can do this? – xirururu Sep 14 '15 at 13:24

score 1 · Answer 4 · answered Oct 08 '20 at 23:09

If you want the top N rows of one column, sorted in decreasing order:

# Row positions of the N largest values, largest first
indexes <- order(df$ColumnName, decreasing = TRUE)[1:N]

# Row names of those rows, in the same order (no loop needed)
result <- rownames(df)[indexes]

result

Here the row names are saved in the 'result' vector, and 'indexes' holds the corresponding row positions.

score 1 · Answer 5 · answered Nov 27 '20 at 06:16

Using the data.table library is a faster solution, because setorder is faster than order and sort:

library(data.table)

select_top_n<-function(scores,n_top){
    d <- data.frame(
        x   = copy(scores),
        indice=seq(1,length(scores)))
    
    setDT(d)
    setorder(d,-x)
    n_top_indice<-d$indice[1:n_top]
    return(n_top_indice)
}


select_top_n2<-function(scores,n_top){
    
    n_top_indice<-order(-scores)[1:n_top]
    return(n_top_indice)
}

select_top_n3<-function(scores,n_top){
    
    n_top_indice<-sort(scores, index.return=TRUE, decreasing=TRUE)$ix[1:n_top]
    return(n_top_indice)
}

Testing:

set.seed(123)
s=runif(100000)

library(microbenchmark)
mbm<-microbenchmark(
    ind1 = select_top_n(s,100),
    ind2=select_top_n2(s,100),
    ind3=select_top_n3(s,100),
    times = 10L
)

Output:

Unit: milliseconds
 expr       min       lq      mean    median        uq       max neval
 ind1  5.824576  5.98959  6.209746  6.052658  6.270312  7.422736    10
 ind2  9.627950 10.08661 10.274867 10.377451 10.560912 10.588223    10
 ind3 10.397383 11.32129 12.087122 12.498817 12.856840 13.155845    10
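One caveat with the `[1:n_top]` subsetting used in the functions above: if `n_top` exceeds the length of `scores`, the result is padded with NA. A minimal sketch of a variant that avoids this with head() follows. `select_top_n_safe` is a hypothetical name and was not part of the benchmark.

```r
# Hypothetical variant: head() never returns more elements than exist
select_top_n_safe <- function(scores, n_top) {
  head(order(scores, decreasing = TRUE), n_top)
}

scores <- c(0.2, 0.9, 0.5)

padded <- order(-scores)[1:5]
padded
# [1]  2  3  1 NA NA

safe <- select_top_n_safe(scores, 5)
safe
# [1] 2 3 1
```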

Refer to Getting the top values by group

score 1 · Answer 6 · answered Nov 29 '20 at 03:45

You can use the nth function from the Rfast package to get either the indices or the values:

> x=runif(100000)
> num.of.nths <- 3
> Rfast2::benchmark(a<-Rfast::nth(x,3,num.of.nths,TRUE,TRUE),b<-order(x,decreasing = T)[1:3],times = 10)
   milliseconds 
                                        min     mean     max
a <- Rfast::nth(x, 3, 3, TRUE, TRUE) 1.6483  2.12419  3.1238
b <- order(x, decreasing = T)[1:3]   6.8648 12.31633 27.1988
> 
> a
      [,1]
[1,]  8058
[2,] 63946
[3,] 17556
> b
[1]  8058 63946 17556

score 0 · Answer 7 · answered Apr 22 '21 at 16:26

Get top percentage (proportion) of any column

library(dplyr)

df <- df %>% slice_max(IndexCol, prop = .25)

or by a group

df <- df %>% group_by(col1, col2) %>% slice_max(IndexCol, prop = .25)

https://dplyr.tidyverse.org/reference/slice.html
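For the original question (the 3 largest distinct values together with their row index), a dplyr sketch could look like the following. It assumes the question's example data and uses `dense_rank()` rather than `slice_max()`. The `idx` column is an illustrative name added here to carry the original row position.

```r
library(dplyr)

df <- data.frame(distance = c(1, 4, 2, 3, 4, 5, 5))

res <- df %>%
  mutate(idx = row_number()) %>%              # remember the original row position
  filter(dense_rank(desc(distance)) <= 3) %>% # keep the 3 largest distinct values, ties included
  arrange(desc(distance))                     # largest values first

res$distance
# [1] 5 5 4 4 3
```

The `idx` column of `res` then holds the positions of those rows in the original data frame.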

micahkimel
