0

I'm trying to get the top three observation in the asset variable for each state. Using a smaller example, my starting data frame would be as follows:

#starting data frame
state = c("TX","TX","TX","TX","TX","TX","PA","PA","PA","PA","PA","PA","DE","DE","DE","DE","DE","DE") 
assets = c(600,100,200,400,200,300,600,100,200,400,200,300,600,100,200,400,200,300) 
stating_df = data.frame(state,assets)

And I'd like to go to this:

#ending data frame
state = c("TX","TX","TX","PA","PA","PA","DE","DE","DE") 
assets = c(600,500,400,600,500,400,600,500,400) 
ending_df = data.frame(state,assets)

I can easily use the sort command, but that doesn't account for each state or the top three. I have 50 states and over 1m observations in the data frame. Thank you.

BlueDevilPride
  • 137
  • 3
  • 13
  • `dplyr:;group_by() ` and `dplyr::arrange()`. – RobertMyles Jun 21 '17 at 00:52
  • `library(dplyr); stating_df %>% group_by(state) %>% top_n(n = 3)` – Ronak Shah Jun 21 '17 at 00:57
  • Possible duplicate of [Getting the top values by group using dplyr](https://stackoverflow.com/questions/27766054/getting-the-top-values-by-group-using-dplyr) Also [how to find the top N values by group or within category (groupwise) in an R data.frame](https://stackoverflow.com/questions/14800161/how-to-find-the-top-n-values-by-group-or-within-category-groupwise-in-an-r-dat) – Ronak Shah Jun 21 '17 at 01:00
  • thank you, everyone. I'm tried to locate a similar question on here, but couldn't find that. I guess I wasn't searching the correct terms. Thanks again. – BlueDevilPride Jun 21 '17 at 01:45

2 Answers2

2

you can use aggregate

aggregate(stating_df$assets, list(stating_df$state), 
    function(x) tail(sort(x), 3))
  Group.1 x.1 x.2 x.3
1      DE 300 400 600
2      PA 300 400 600
3      TX 300 400 600
G5W
  • 32,266
  • 10
  • 31
  • 60
0

The following works

library(dplyr)
library(reshape2)
state = c("TX","TX","TX","TX","TX","TX","PA","PA","PA","PA","PA","PA","DE","DE","DE","DE","DE","DE") 
assets = c(600,100,200,400,200,300,600,100,200,400,200,300,600,100,200,400,200,1300) 
starting_df = data.frame(state,assets)

ending_df = starting_df %>% group_by(state) %>%
  summarize(top3 = list(sort(assets, decreasing=T)[1:3]))

# I don't know your use case. But I recommend stopping in the above step

z = matrix(ending_df$top3 %>% unlist, ncol = 3, byrow = T)
ending_df = cbind(ending_df, z) %>% select(-top3)

ending_df = melt(ending_df, id.vars = "state")
ending_df = ending_df %>% arrange(state)