Question on filtering down a large dataset

Question

I have a data set of popular baby names going back to 1880. I am trying to find the timelessly popular baby names, meaning the names that rank among the 30 most common for their gender in every year of my data.

I have tried using group_by, top_n, and filter, but I am not very well versed in the program yet, so I am unsure of the proper order and thinking here.

library(babynames)

timeless <- babynames %>% group_by(name, sex, year) %>% top_n(30) %>% filter()

I am getting a large data table back with the 30 most common names for each year of data, but I want to compare those to find the names that are among the most common in every year. My prof hinted that there should be four timeless boy names and one timeless girl name. Any help is appreciated!

Possible duplicate/Related https://stackoverflow.com/questions/27766054/getting-the-top-values-by-group — Ronak Shah, Sep 26 '19 at 01:10

score 1 · Accepted Answer · answered Sep 26 '19 at 00:27

Here is the answer.

library(babynames)
library(dplyr)

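timeless <- babynames %>% 
  # Rank names within each sex and year, not within each name
  group_by(sex, year) %>% 
  # Keep the 30 most common names per group, ranked by the count column n
  top_n(30, n) %>%
  ungroup() %>%
  # Count the number of years in which each name made the top 30
  count(sex, name) %>%
  # Keep only names that made the top 30 in every year of the data
  filter(n == max(babynames$year) - min(babynames$year) + 1)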

timeless
# # A tibble: 5 x 3
#   sex   name          n
#   <chr> <chr>     <int>
# 1 F     Elizabeth   138
# 2 M     James       138
# 3 M     John        138
# 4 M     Joseph      138
# 5 M     William     138

Regarding your original code, group_by(name, sex, year) %>% top_n(30) does not make sense: every combination of name, sex, and year is unique, so each group contains a single row and there is nothing to filter down to a "top 30".
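To make the grouping point concrete, here is a minimal sketch on a made-up data frame (the `toy` data and its names are hypothetical, not taken from babynames). It only illustrates how the choice of grouping columns changes what `top_n` ranks within.

```r
library(dplyr)

# Hypothetical toy data: one row per (year, sex, name), shaped like babynames
toy <- data.frame(
  year = c(2000, 2000, 2000, 2001, 2001, 2001),
  sex  = "M",
  name = c("Al", "Bo", "Cy", "Al", "Bo", "Cy"),
  n    = c(30, 20, 10, 25, 35, 5)
)

# Grouping by name, sex, and year puts every row in its own group,
# so asking for the "top 2" keeps all six rows.
too_fine <- toy %>%
  group_by(name, sex, year) %>%
  top_n(2, n) %>%
  ungroup()

# Grouping by sex and year ranks the names within each year,
# so asking for the "top 2" keeps two rows per year.
per_year <- toy %>%
  group_by(sex, year) %>%
  top_n(2, n) %>%
  ungroup()

nrow(too_fine)  # 6
nrow(per_year)  # 4
```

In the toy data, "Cy" is dropped in both years under the second grouping, which is the behaviour needed before counting how many years each name survives.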

answered Sep 26 '19 at 00:27

www

You're the bomb; thank you! Why do you add the +1 in the line with the filter function? To add it as another column in the tibble? – PageSim Sep 26 '19 at 00:34
`max(babynames$year) - min(babynames$year) + 1` calculates the total number of years available in the dataset. The final `n` should equal that number. – www Sep 26 '19 at 00:35
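The +1 is ordinary inclusive counting. As a small sketch, assuming the data runs from 1880 through 2017 (which is what the n of 138 in the output implies), the arithmetic looks like this. The variable names are illustrative only.

```r
# Assumption: first and last years inferred from the question and the output
first_year <- 1880
last_year  <- 2017

gap   <- last_year - first_year       # 137: the distance between the endpoints
n_yrs <- last_year - first_year + 1   # 138: both endpoints are counted

# Counting the distinct years directly gives the same number,
# and does not depend on the years being consecutive.
years <- first_year:last_year
length(unique(years))  # 138
```

With the real data, `length(unique(babynames$year))` would be the direct count; the max minus min formula matches it only when no year is missing from the range.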
