2

I am trying to summarise some data. I am trying to take a set of data with years, months, and refunds. What I want to so is group by year, and show the month with highest refunds and the corresponding refund amount.

Some example data:

Year, Month, Ref 
2017, Jan, 1234 
2017, Feb, 2345 
2017, Mar, 1123 
2018, Jan, 1133 
2018, Feb, 3453 
2018, Mar, 2343 

What I have so far:

RefTable <- returns_data %>% group_by(Year) %>%
  summarise(MaxRefAmt  = max(Ref))

This will pull in the correct amount but finding the corresponding month is proving very difficult. I am thinking an ifelse statement needs to be involved but I am not to sure on how to go about doing this. I also am trying to use dplyr to do this as I need the practice with this package.

Any help would be greatly appreciated. Please let me know if I need to clear anything up.

Edit: I have noticed that this was marked duplicate. I did not realize that it was. However, after reviewing the similar question, it is apparent to me that I do not understand the previous answer. This answer makes more sense to me and was more in the context of the actual problem I was looking into. Furthermore, the results on the previous question that this is similar to were not working while the top result from this question works without issue.

Chris
  • 186
  • 1
  • 16

3 Answers3

2

We can use slice after grouping by 'Year'

returns_data %>%
   group_by(Year) %>%
   slice(which.max(Ref))
#    A tibble: 2 x 3
# Groups:   Year [2]
#   Year Month   Ref
#  <int> <chr> <dbl>
#1  2017 Feb    2345
#2  2018 Feb    3453
akrun
  • 674,427
  • 24
  • 381
  • 486
1

I think this is the simplest way to get what you're looking for. Try combining group_by() and filter(). Using summarise() is actually unnecessary here because you're returning a single row from within each group, instead of summarizing them into a single row. Try this:

library(tidyverse)
RefTable <- group_by(returns_data, Year) %>%
            filter(Ref == max(Ref))
# Output:
# A tibble: 2 x 3
   Year Month   Ref
  <int> <fct> <int>
1 2017  Feb   2345
2 2018  Feb   3453
Marcus Campbell
  • 2,558
  • 4
  • 21
  • 34
0
RefTable <- returns_data %>%
    group_by(Year) %>% 
    summarise(MaxRefAmt = max(Ref),Month=Month[which.max(Ref)])

A tibble: 2 x 3
   Year MaxRefAmt Month 
  <int>     <dbl> <fct> 
1  2017     2345. " Feb"
2  2018     3453. " Feb"

Using base R:

unsplit(with(dat,by(dat,Year,function(x)x[which.max(x$Ref),])),1:2)
  Year Month  Ref
2 2017   Feb 2345
5 2018   Feb 3453
Onyambu
  • 31,432
  • 2
  • 14
  • 36