R: How to subset a dataframe to obtain the top "features" per "id" according to a value column?

Asked Jul 07 '17 at 07:09

Active Jul 07 '17 at 07:26

Say I have a data frame like mydf in the MWE below:

set.seed(1)
ids <- rep(paste(sample(LETTERS, 10), sample(1:100, 10), sep=''), c(34,56,12,98,23,13,24,45,10,21))
feats <- paste(sample(letters, length(ids), replace=TRUE), sample(letters, length(ids), replace=TRUE), sample(1:1000, length(ids)), sep='')
perc <- sample(seq(1,100,0.01), length(ids), replace=TRUE)
mydf <- data.frame(ID=ids, FEATURE=feats, ABUNDANCE=perc)
mydf

That looks like:

> mydf
     ID FEATURE ABUNDANCE
1   G21   yw821     34.98
2   G21   fc599     70.80
3   G21   qx425     59.56
4   G21   dm560     47.47
5   G21   gc790     34.30
6   G21   ki168     96.82
7   G21   av971     64.94
8   G21   jh474     20.43
9   G21   wp930     36.36
10  G21   iv901     51.79

How can I subset it to obtain the top X (5, for example) most abundant FEATUREs per ID? I feel it should be pretty easy, but I can't wrap my head around a simple way to do it... Thanks!

DaniCee

`library(dplyr); mydf %>% group_by(ID) %>% top_n(n = 5, wt = ABUNDANCE)` – Ronak Shah Jul 07 '17 at 07:13
Yes this is it!! I'm not really familiar with dplyr so I didn't see that question; it is clearly a duplicate. Thanks! – DaniCee Jul 07 '17 at 07:19
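
For reference, a minimal sketch of the same grouped selection, assuming dplyr 1.0.0 or later, where `top_n()` is superseded by `slice_max()`. The `toy` data frame and its values are hypothetical stand-ins for `mydf` (same column names), and `n = 2` is used instead of 5 only to keep the example small.

```r
library(dplyr)

# Hypothetical stand-in for mydf: same columns, made-up values
toy <- data.frame(
  ID = rep(c("A1", "B2"), times = c(4, 3)),
  FEATURE = c("aa1", "bb2", "cc3", "dd4", "ee5", "ff6", "gg7"),
  ABUNDANCE = c(10.5, 80.2, 45.0, 60.1, 5.5, 99.9, 30.3),
  stringsAsFactors = FALSE
)

# Keep the 2 rows with the highest ABUNDANCE within each ID
top_dplyr <- toy %>%
  group_by(ID) %>%
  slice_max(order_by = ABUNDANCE, n = 2) %>%
  ungroup()

top_dplyr
```

Like `top_n()`, `slice_max()` keeps tied rows by default, so a group can return more than `n` rows; passing `with_ties = FALSE` caps it at exactly `n`.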
So you're not open to base R solutions? :) – Roman Luštrik Jul 07 '17 at 07:21
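
No base R solution was actually posted in this thread, so the following is only one possible sketch of what it could look like, using `order()` and `ave()`. The `toy` data frame and the `top_x` variable are hypothetical; `toy` mirrors the column names of `mydf` with made-up values.

```r
# Hypothetical stand-in for mydf: same columns, made-up values
toy <- data.frame(
  ID = rep(c("A1", "B2"), times = c(4, 3)),
  FEATURE = c("aa1", "bb2", "cc3", "dd4", "ee5", "ff6", "gg7"),
  ABUNDANCE = c(10.5, 80.2, 45.0, 60.1, 5.5, 99.9, 30.3),
  stringsAsFactors = FALSE
)

top_x <- 2  # number of rows to keep per ID (5 in the question)

# Sort by ID, then by descending ABUNDANCE within each ID
ordered <- toy[order(toy$ID, -toy$ABUNDANCE), ]

# Position of each row within its ID group (1 = most abundant)
rank_in_id <- ave(seq_len(nrow(ordered)), ordered$ID, FUN = seq_along)

# Keep only the first top_x rows of every group
top_base <- ordered[rank_in_id <= top_x, ]
rownames(top_base) <- NULL

top_base
```

Unlike `top_n()`, this keeps exactly `top_x` rows per ID even when abundances are tied, so which of the tied rows survives depends on the sort order.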
