Run the same codes with data and variable names changed in R

Question

I need to run very similar code on 3 different datasets. My current code looks like this:

## data a
a_dat2 <- merge(a_dat, zip, by = "zip", all.x = T)
a_dat2 <- a_dat2 %>%
    group_by(zip) %>%
    summarize(dist_a_min = min(dist))
## data b
b_dat2 <- merge(b_dat, zip, by = "zip", all.x = T)
b_dat2 <- b_dat2 %>%
    group_by(zip) %>%
    summarize(dist_b_min = min(dist))
## data c
c_dat2 <- merge(c_dat, zip, by = "zip", all.x = T)
c_dat2 <- c_dat2 %>%
    group_by(zip) %>%
    summarize(dist_c_min = min(dist))

The code for the 3 datasets is the same except that the name of the data varies: a_dat, b_dat, c_dat. The name of the summary variable varies too: dist_a_min, dist_b_min, dist_c_min. What function or loop can I use to shorten the code so that I don't need to copy and paste it for each dataset separately?

If the frames are all similar, I recommend storing them in a `list`-of-frames instead of individual frames (ref: https://stackoverflow.com/a/24376207). From there, you can do `lapply(list_of_frames, function(a) merge(a, zip, by = "zip") %>% ...)` (or `purrr::map`). — r2evans, May 28 '19 at 17:56
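
A base R sketch of the list-of-frames idea from the comment above, assuming `dist` lives in each data frame and `zip` is a lookup table keyed by `zip`. The toy data and the helper name `min_by_zip` are made up for illustration; the real frames are not shown in the question.

```r
# Hypothetical toy data standing in for the real a_dat, b_dat, c_dat and zip
zip   <- data.frame(zip = c("10001", "10002"), city = c("X", "Y"))
a_dat <- data.frame(zip = c("10001", "10001", "10002"), dist = c(5, 3, 8))
b_dat <- data.frame(zip = c("10001", "10002", "10002"), dist = c(2, 9, 4))
c_dat <- data.frame(zip = c("10001", "10002"), dist = c(7, 1))

# Keep the frames in one named list; the names become the column identifiers
frames <- list(a = a_dat, b = b_dat, c = c_dat)

# Merge with the lookup table, take the minimum dist per zip,
# and rename the result column to dist_<id>_min
min_by_zip <- function(dat, id) {
  merged <- merge(dat, zip, by = "zip", all.x = TRUE)
  out <- aggregate(dist ~ zip, data = merged, FUN = min)
  names(out)[names(out) == "dist"] <- paste0("dist_", id, "_min")
  out
}

# Map walks over the frames and their names in parallel
results <- Map(min_by_zip, frames, names(frames))
results$a
```

`Map` is used rather than `lapply` here only because the helper needs both the frame and its name; `lapply(names(frames), ...)` would work just as well.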

akrun · Accepted Answer · 2019-05-28T18:17:40.723

One option is to collect the data frames in a list with mget, loop over the list with imap, left-join each element with the 'zip' dataset (see ?left_join), group by 'zip', and take the min of 'dist', building the column name from the first letter of each list element's name:

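library(tidyverse)
# pick up the input frames a_dat, b_dat, c_dat as a named list
mget(ls(pattern = "_dat$")) %>%
    imap(~ left_join(.x, zip, by = 'zip') %>%
             group_by(zip) %>%
             # .y is the list name, e.g. "a_dat"; its first letter goes into the column name
             summarise(!! str_c('dist_', substr(.y, 1, 1), '_min') := min(dist)))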

Another option is to wrap the repeated steps in a function:

joinSumm <- function(dat, groupName, colName, data2) {
    groupName <- enquo(groupName)
    colName <- enquo(colName)
    # column name built from the first letter of the object passed as `dat`
    nm1 <- str_c('dist_', str_sub(rlang::as_name(enquo(dat)), 1, 1), '_min')
    dat %>%
        left_join(data2, by = rlang::as_name(groupName)) %>%
        group_by(!! groupName) %>%
        summarise(!! nm1 := min(!! colName))
}
joinSumm(a_dat, zip, dist, zip)
joinSumm(b_dat, zip, dist, zip)
joinSumm(c_dat, zip, dist, zip)
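
One thing to watch with `joinSumm`: the suffix is taken from the name of the object passed in, so it would not pick up the right letter when the function is called from inside `lapply` or `map`, where the argument is no longer the symbol `a_dat`. Below is a sketch of a variant that takes the identifier as an explicit argument and uses the `{{ }}` shorthand (needs rlang 0.4.0 or later). The name `join_summ2`, its `id` argument, and the toy data are all hypothetical.

```r
library(dplyr)

# Hypothetical toy data standing in for the real frames
zip   <- tibble(zip = c("10001", "10002"), city = c("X", "Y"))
a_dat <- tibble(zip = c("10001", "10001", "10002"), dist = c(5, 3, 8))
b_dat <- tibble(zip = c("10001", "10002", "10002"), dist = c(2, 9, 4))

join_summ2 <- function(dat, lookup, group_col, value_col, id) {
  out_name <- paste0("dist_", id, "_min")                   # e.g. "dist_a_min"
  by_name  <- rlang::as_name(rlang::enquo(group_col))       # column name as a string
  dat %>%
    left_join(lookup, by = by_name) %>%
    group_by({{ group_col }}) %>%
    summarise(!!out_name := min({{ value_col }}))
}

# The identifier comes from the list names, not from the object name
frames  <- list(a = a_dat, b = b_dat)
results <- Map(function(d, id) join_summ2(d, zip, zip, dist, id),
               frames, names(frames))
results$a
```

Passing `id` explicitly costs one extra argument but keeps the column name correct however the function is called.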

A reproducible example with the built-in dataset iris (without the join part):

list(a_dat = iris, b_dat = iris, c_dat = iris) %>%
    imap(~ .x %>%
             group_by(Species) %>%
             summarise(!! str_c('dist_', substr(.y, 1, 1), '_min') := min(Sepal.Length)))
#$a_dat
# A tibble: 3 x 2
#  Species    dist_a_min
#  <fct>           <dbl>
#1 setosa            4.3
#2 versicolor        4.9
#3 virginica         4.9
#
#$b_dat
# A tibble: 3 x 2
#  Species    dist_b_min
#  <fct>           <dbl>
#1 setosa            4.3
#2 versicolor        4.9
#3 virginica         4.9
#
#$c_dat
# A tibble: 3 x 2
#  Species    dist_c_min
#  <fct>           <dbl>
#1 setosa            4.3
#2 versicolor        4.9
#3 virginica         4.9

I used `merge(dat, zip, by = "zip", all.x = T)` in my original codes, so I guess should be `left_join`? — mandy, May 28 '19 at 18:07
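
On the question in the comment above: `merge(..., all.x = TRUE)` keeps every row of the first frame and fills unmatched columns with `NA`, which is also what `left_join` does. A small check with made-up data (the frame `dat` and all values below are hypothetical):

```r
library(dplyr)

# Hypothetical toy data: zip "10003" has no match in the lookup table
dat <- data.frame(zip = c("10001", "10002", "10003"), dist = c(5, 8, 2),
                  stringsAsFactors = FALSE)
zip <- data.frame(zip = c("10001", "10002"), city = c("X", "Y"),
                  stringsAsFactors = FALSE)

merged <- merge(dat, zip, by = "zip", all.x = TRUE)
joined <- left_join(dat, zip, by = "zip")

# Both keep all three rows of dat; the unmatched zip gets NA for city
merged
joined
```

One difference to keep in mind: `merge` sorts the result by the key columns by default, while `left_join` keeps the row order of its first argument, so the two results can differ in row order even when they hold the same rows.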
