-1

I have the following sample dataframe of baseball game attendance:

team <- rep("SFG", 6)
attendance <- seq(1000:1005)
year <- c(2000, 2000, 2001, 2001, 2002, 2002)
dates <- as.Date(c("2000-01-01", "2000-01-02", "2001-01-03", "2001-01-04", "2002-02-05", "2002-03-06"))

df <- data.frame(team, attendance, year, dates)

I would like to create two new columns.

The first would be a simple binary variable called "opening_day" that indicates (with 1 or 0) whether or not the game is the first game of the year.

The second would be called "opening_day_attendance" and would contain the attendance number of that first game. In other words, all rows for 2000 would have the attendance for that first game of the year.

If possible I'd like to achieve this with dplyr and piping.

Thanks so much!

EDIT: Others have noted that this is a possible duplicate question, and while it is generally similar to the one mentioned, most of the answers to the other question A) Don't create a new column in the original dataset with the values and B) Don't use dplyr and piping to achieve the results.

Michael
  • 71
  • 9
  • 4
    What did you try so far? Where did you get stuck? – Sotos Mar 13 '18 at 18:21
  • 1
    Possible duplicate of [How to select the rows with maximum values in each group with dplyr?](https://stackoverflow.com/questions/24237399/how-to-select-the-rows-with-maximum-values-in-each-group-with-dplyr) – denis Mar 13 '18 at 18:33
  • @Sotos I got stuck trying something like this: df %>% group_by(year) %>% mutate(opening_game = min(date)) – Michael Mar 13 '18 at 18:42

2 Answers2

1

This is an obvious duplicate question. You could try this:

library(data.table)
df <- data.table(team, attendance, year, dates)
df[,opening_day:=ifelse(dates==min(dates),1,0), by=year]
df[,opening_day_attendance:=attendance[dates==min(dates)], by=year]
JeanVuda
  • 1,618
  • 11
  • 27
  • This is a great solution, thank you! I didn't see that other question while searching, but the answers there seem to work. Thanks again. – Michael Mar 13 '18 at 18:56
1

I would recommend creating a separate data frame that contains your minimum dates, as I imagine your actual data would contain more than one team:

require(doBy)
mins       <- summaryBy(dates ~ team, data = df, FUN = min, keep.names = TRUE)
mins$dates <- as.Date(mins$dates, origin = '1970-01-01')

> mins
  team      dates
1  SFG 2000-01-01

Then, you can merge your original data into this data frame, to isolate game info for each team's first game:

mins <- merge(mins, df, by = c('team', 'dates'), all.x = TRUE)

> mins
  team      dates attendance year
1  SFG 2000-01-01          1 2000
93i7hdjb
  • 786
  • 1
  • 6
  • 15