Say I have a data frame like mydf in the MWE below:
set.seed(1)
ids <- rep(paste(sample(LETTERS, 10), sample(1:100, 10), sep=''), c(34,56,12,98,23,13,24,45,10,21))
feats <- paste(sample(letters, length(ids), replace=TRUE), sample(letters, length(ids), replace=TRUE), sample(1:1000, length(ids)), sep='')
perc <- sample(seq(1,100,0.01), length(ids), replace=TRUE)
mydf <- data.frame(ID=ids, FEATURE=feats, ABUNDANCE=perc)
mydf
Its first 10 rows look like this:
> head(mydf, 10)
    ID FEATURE ABUNDANCE
1  G21   yw821     34.98
2  G21   fc599     70.80
3  G21   qx425     59.56
4  G21   dm560     47.47
5  G21   gc790     34.30
6  G21   ki168     96.82
7  G21   av971     64.94
8  G21   jh474     20.43
9  G21   wp930     36.36
10 G21   iv901     51.79
How can I subset it to obtain the top X (5, for example) most abundant FEATURE values per ID? I feel it should be pretty easy, but I can't wrap my head around a simple way to do it. Thanks!
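To make the goal concrete, here is a minimal base R sketch of one possible approach, not necessarily the best one: sort by ID and descending ABUNDANCE, number the rows within each ID, and keep the first X. The names top_n, sorted, rank_in_id and top_df are made up for illustration, and ties in ABUNDANCE are assumed to be broken arbitrarily by row order.

```r
# Rebuild the example data from the question.
set.seed(1)
ids <- rep(paste0(sample(LETTERS, 10), sample(1:100, 10)),
           c(34, 56, 12, 98, 23, 13, 24, 45, 10, 21))
feats <- paste0(sample(letters, length(ids), replace = TRUE),
                sample(letters, length(ids), replace = TRUE),
                sample(1:1000, length(ids)))
perc <- sample(seq(1, 100, 0.01), length(ids), replace = TRUE)
mydf <- data.frame(ID = ids, FEATURE = feats, ABUNDANCE = perc)

top_n <- 5  # hypothetical name for the "X" in the question

# Sort by ID, then by descending ABUNDANCE within each ID.
sorted <- mydf[order(mydf$ID, -mydf$ABUNDANCE), ]

# Number the rows within each ID (1 = most abundant after sorting).
rank_in_id <- ave(seq_len(nrow(sorted)), sorted$ID, FUN = seq_along)

# Keep at most top_n rows per ID and reset the row names.
top_df <- sorted[rank_in_id <= top_n, ]
rownames(top_df) <- NULL
```

With this data every ID has at least 10 rows, so top_df should end up with exactly top_n rows per ID. An ID with fewer than top_n rows would simply keep all of its rows.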