I have a data frame like this:
probe.id gene.name variance database
A_23_P100002 FAM174B 0.93285966 Database1
A_23_P100013 AP3S2 0.48936044 Database1
...
A_23_P100020 RBPMS2 0.77441359 Database2
A_23_P100072 AVEN 0.36194383 Database2
...
I would like to reduce this data frame so that, within each database, only the 100 genes with the highest variance remain. It seems that aggregate() could do the job, but I don't know how to write the function I would pass to it. I would greatly appreciate any help.
Thank you!
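Note that aggregate() collapses each group to one summary value per column, so it is not a natural fit for keeping whole rows. A minimal base-R sketch of one alternative, assuming the columns are named `variance` and `database` (adjust the names if they differ in your data): sort by database and then by decreasing variance, number the rows within each database with ave(), and keep the first n. The helper name `top_n_per_group` is hypothetical, and ties at the cutoff are broken by the original row order.

```r
# Hypothetical helper: keep the n rows with the largest `variance`
# within each `database` (column names assumed; rename as needed).
top_n_per_group <- function(df, n = 100) {
  # Sort so that, within each database, the highest variance comes first
  df <- df[order(df$database, -df$variance), ]
  # Number rows 1, 2, 3, ... within each database, following the sorted order
  rank_in_group <- ave(seq_along(df$variance), df$database, FUN = seq_along)
  # Keep only the top n rows of every database
  df[rank_in_group <= n, ]
}

# Usage on the rows shown in the question, keeping 1 gene per database
expr <- data.frame(
  probe.id  = c("A_23_P100002", "A_23_P100013", "A_23_P100020", "A_23_P100072"),
  gene.name = c("FAM174B", "AP3S2", "RBPMS2", "AVEN"),
  variance  = c(0.93285966, 0.48936044, 0.77441359, 0.36194383),
  database  = c("Database1", "Database1", "Database2", "Database2")
)
top_n_per_group(expr, n = 1)
```

With the real data, `top_n_per_group(expr, n = 100)` would keep up to 100 rows per database; a database with fewer than 100 rows is kept whole.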