Selecting data in summarize based on another column in R if it == max(salary)

Question

First of all, thanks for browsing through my question.

I'm working with 19 years of NBA data: 7978 observations and 56 variables, collected from the 2000 to 2018 NBA regular seasons. I'm exploring how different regular-season variables affect NBA salary. Variables include ppg, effective field goal percentage, height, school, etc.

Anyway, what I wanted to do was explore the best and worst team for each of the 19 years and then compare them.

Right now I'm trying to figure out how to write my summarize so it includes the name of the player who had the highest salary, the most points per game, or the highest effective field goal percentage.

For example, the Houston Rockets won 65 games during the 2018 NBA season, and I know James Harden was their highest paid player. I want to show his name in NameTopSal by picking the name from the row where salary == max(salary).

Below is the code I've written.

data %>% subset(PlayerYear == 2018 & team == "HOU",
              select = name:SalCap) %>% 
  summarize(total = sum(salary),TopPaid = max(salary),
            #NameTopSal = select(name, salary == max(salary)),
            highscore = max(ppg), 
            #NameTopPpg = subset(salary == max(salary), select = name),
            efficient = max(EFG),
            #NameTopEFG = subset(salary == max(salary), select = name),
            HighPlusMinus = max(PlusMinus),
            #NameTopPM = subset(salary == max(salary), select = name),
            LeastPaid = min(salary), 
            #NameLowSal = subset(salary == min(salary), select = name),
            AvgSal = mean(salary), 
            tmsalary = median(tmsalary), salcap = median(SalCap),
            OverUnder = (median(tmsalary)/(median(SalCap))), 
            wins = median(TeamWins))

Any help on this matter would be greatly appreciated.

Thanks in advance.

Possible duplicate: https://stackoverflow.com/questions/24237399/how-to-select-the-rows-with-maximum-values-in-each-group-with-dplyr or https://stackoverflow.com/questions/25314336/extract-the-maximum-value-within-each-group-in-a-dataframe (Ronak Shah, Jan 14 '19 at 04:45)

Gregor Thomas · Answer 1 · 2019-01-14T04:43:57.863

To select the element of one vector (name) corresponding to the highest value of another vector (salary), you can test which salary element is the max and use that as the index:

name[which(salary == max(salary))]

This is common enough that there's a utility function for it, which.max, so we can simplify to

name[which.max(salary)]
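As a quick check that the two forms agree when there is a single maximum, here is a minimal base R sketch. The vectors are hypothetical toy values, not taken from the NBA data in the question.

```r
# Hypothetical toy data (assumed for illustration only)
name   <- c("Player A", "Player B", "Player C")
salary <- c(10, 30, 20)

# Logical test, then which() to turn TRUE positions into indices
top_by_which <- name[which(salary == max(salary))]

# which.max() goes straight to the index of the (first) maximum
top_by_which_max <- name[which.max(salary)]

print(top_by_which)
print(top_by_which_max)
```

Both expressions pick out the same single name here because only one salary equals the maximum.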

which.max is nicer than which(...) for your use case because it always returns exactly one index, that of the first max, even when multiple values equal the max. That single value per group is what summarize expects, so it will work nicely there.
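Here is a sketch of how this could look inside summarize, assuming dplyr is installed. The data frame players and its values are made up for illustration; only the column names (team, name, salary, ppg) and the output names mirror the question. Two HOU players deliberately share the top salary to show the tie behaviour.

```r
library(dplyr)

# Hypothetical toy data; column names mirror the question, values are invented
players <- data.frame(
  team   = c("HOU", "HOU", "HOU", "GSW", "GSW"),
  name   = c("Player A", "Player B", "Player C", "Player D", "Player E"),
  salary = c(30, 30, 5, 25, 10),
  ppg    = c(20.1, 28.4, 6.2, 26.0, 11.3),
  stringsAsFactors = FALSE
)

team_summary <- players %>%
  group_by(team) %>%
  summarize(
    TopPaid    = max(salary),
    NameTopSal = name[which.max(salary)],  # first of the tied HOU players
    highscore  = max(ppg),
    NameTopPpg = name[which.max(ppg)]      # indexed by ppg, not salary
  )

print(team_summary)
```

With name[which(salary == max(salary))] the HOU group would yield two names instead of one, which, depending on the dplyr version, either raises an error or produces extra rows. Note also that each name column is indexed by the column it describes (ppg for NameTopPpg, and so on).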
