I am looking to store the names of a few tibbles into a list which I will later use to iterate through each tibble separately. Is there a way to do this?

##This is how I am reading in my data from a csv's
i <- 1
for (name in datasets) {
  file <- paste(".csv", name, sep="")
  
  assign(paste0("DF", i), read_csv(file))
  
  i <- i + 1
  list_of_df <- c(list_of_df, )
}

list <- c(tib1, tib2 tib3)
  for(tibble in list) {
    #do things here
  }
    What is `list_of_df`? Notes: (1) `tibble::lst` by default uses the name of the object; (2) building a list of frames is more efficiently done using `lapply`, see https://stackoverflow.com/a/24376207/3358272; (3) naming a list `list` is really bad practice, it's not clear (to the human eye) if you mean the base R function `list` or the variable you created named `list`; (4) while `lapply` is generally better, in a `for` loop it may be more useful to use `for (nm in names(list_of_frames)) { dat – r2evans Oct 03 '20 at 16:25
    I think `file` is wrong: `paste(".csv", name, sep="")` produces `.csvname`, not `name.csv`. Also, why not use `paste0`? And you are naming the objects `DF#` but later referring to them as `tib#`. – Abdessabour Mtk Oct 03 '20 at 17:23

2 Answers

Following up on my comment:

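library(readr)

files <- paste0(datasets, ".csv")
# Read each file, assign it to tib1, tib2, ... in the global environment,
# and keep the names that were created
list_of_df <- lapply(seq_along(files), function(x) {
  y <- paste0("tib", x)
  assign(y, read_csv(files[x]), envir = .GlobalEnv)
  y
})
### Another, simpler way to do it: read everything into a named list first
list_of_df <- lapply(files, read_csv)
names(list_of_df) <- paste0("tib", seq_along(files))
# Without envir, list2env() fills a new environment instead of the global one
list2env(list_of_df, envir = .GlobalEnv)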

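# Store the names, not the objects ('tib_names' avoids masking base::names)
tib_names <- paste0("tib", 1:3)
# Look up each tibble by name; the result is a list of tibbles
lapply(tib_names, function(x) {
  get(x, envir = .GlobalEnv)
})
# Or
for (x in tib_names) {
  tib <- get(x, envir = .GlobalEnv)
  # do things with tib here
}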
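As a hedged aside, when the tibbles already exist as separate objects, base R's `mget()` can replace the `get()` loop: it takes a character vector of names and returns a named list in one step. The sketch below uses small stand-in data frames; `tib1`, `tib2`, their values, and `tib_list` are made up for illustration.

```r
# Stand-in objects; in practice these would be the tibbles read from CSV
tib1 <- data.frame(ID = 1:2, val = c(10, 20))
tib2 <- data.frame(ID = 1:2, val = c(30, 40))

# Store the names, as the question asks
tib_names <- c("tib1", "tib2")

# mget() looks up every name and returns a named list of the objects
tib_list <- mget(tib_names)

# Iterate by name so each object stays identifiable inside the loop
for (nm in names(tib_list)) {
  cat(nm, "has", nrow(tib_list[[nm]]), "rows\n")
}
```

From here on, `tib_list` can be passed to `lapply()` like any other list, so the individual `tib#` objects are no longer needed.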
Abdessabour Mtk
    `fortunes::fortune(236)` says *The only people who should use the assign function are those who fully understand why you should never use the assign function. -- Greg Snow R-help (July 2009)* – Uwe Oct 03 '20 at 22:06

If I understand correctly, the OP creates separate tibbles in the environment but wants to store the names of the tibbles in a list in order to iterate through each tibble separately.

Instead of storing the names of the tibbles in a list, it is common practice to store the data objects themselves in a list:

library(dplyr)
datasets <- c("Dataset_A", "Dataset_B")
file_names <- paste0(datasets, ".csv")
list_of_tibbles <- lapply(file_names, readr::read_csv) %>% 
  setNames(datasets)

For the given sample files (see below), we get

list_of_tibbles
$Dataset_A
# A tibble: 2 x 2
     ID   val
  <dbl> <dbl>
1     1 0.288
2     2 0.788

$Dataset_B
# A tibble: 2 x 2
     ID   val
  <dbl> <dbl>
1     1 0.409
2     2 0.883
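To try this without the original sample files, the sketch below writes two small CSV files to a temporary directory and reads them back into a named list. The values are taken from the output shown in this answer. Two things here are assumptions rather than part of the answer: base R's `read.csv()` is swapped in for `readr::read_csv()` so that no package is required, and `tmp_dir` and `list_of_frames` are names introduced for illustration.

```r
datasets <- c("Dataset_A", "Dataset_B")

# Write the sample data to a throwaway directory
tmp_dir <- tempfile("tibbles_")
dir.create(tmp_dir)
file_names <- file.path(tmp_dir, paste0(datasets, ".csv"))

write.csv(data.frame(ID = 1:2, val = c(0.2875775, 0.7883051)),
          file_names[1], row.names = FALSE)
write.csv(data.frame(ID = 1:2, val = c(0.4089769, 0.8830174)),
          file_names[2], row.names = FALSE)

# Read every file and name each list element after its dataset
list_of_frames <- setNames(lapply(file_names, read.csv), datasets)
str(list_of_frames)
```

With `readr` installed, replacing `read.csv` with `readr::read_csv` should give tibbles instead of plain data frames.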

Now we can iterate through the list of tibbles, as requested by the OP:

for (tib in list_of_tibbles) {
  # do things here
    print(sum(tib$val))
}
[1] 1.075883
[1] 1.291994
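When the loop body needs to know which dataset it is working on, one option (along the lines of the comment under the question) is to iterate over the names of the list rather than over its elements. A minimal sketch, with the list built inline from the values shown in this answer; `list_of_frames` and `totals` are illustrative names:

```r
# Inline stand-in for the list read from the CSV files
list_of_frames <- list(
  Dataset_A = data.frame(ID = 1:2, val = c(0.2875775, 0.7883051)),
  Dataset_B = data.frame(ID = 1:2, val = c(0.4089769, 0.8830174))
)

totals <- numeric(0)
for (nm in names(list_of_frames)) {
  dat <- list_of_frames[[nm]]   # the data for this name
  totals[nm] <- sum(dat$val)    # store the result under the same name
  cat(nm, ":", totals[nm], "\n")
}
```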

In many circumstances, lapply() is more convenient because it returns the results of the operation as a list as well:

lapply(list_of_tibbles, function(x) sum(x$val))
$Dataset_A
[1] 1.075883

$Dataset_B
[1] 1.291994
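If a plain named vector is preferable to a list, `vapply()` is one possible alternative: it behaves like `lapply()` but checks that every result has the declared type and length. A sketch under the same assumptions as above, with the list built inline and `list_of_frames` and `sums` as illustrative names:

```r
# Inline stand-in for the list read from the CSV files
list_of_frames <- list(
  Dataset_A = data.frame(ID = 1:2, val = c(0.2875775, 0.7883051)),
  Dataset_B = data.frame(ID = 1:2, val = c(0.4089769, 0.8830174))
)

# numeric(1) declares that each call must return exactly one number
sums <- vapply(list_of_frames, function(x) sum(x$val), numeric(1))
print(sums)
```

The result keeps the list names, so `sums[["Dataset_A"]]` retrieves the total for that dataset.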

My preferred approach is to combine the single tibbles into one large tibble and to use grouping in order to operate on the data of each original tibble:

combined_tibble <- lapply(file_names, readr::read_csv) %>% 
  setNames(datasets) %>% 
  bind_rows(.id = "source")

combined_tibble
  source       ID   val
  <chr>     <dbl> <dbl>
1 Dataset_A     1 0.288
2 Dataset_A     2 0.788
3 Dataset_B     1 0.409
4 Dataset_B     2 0.883

The source column indicates the origin of each row and can be used to aggregate by group, e.g.,

combined_tibble %>% 
  group_by(source) %>% 
  summarize(sum(val))
  source    `sum(val)`
  <chr>          <dbl>
1 Dataset_A       1.08
2 Dataset_B       1.29
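The summary column above gets the automatic name `sum(val)`. As a small sketch, naming the expression inside `summarize()` gives a column that is easier to refer to later. The combined data is rebuilt inline from the values shown in this answer, and `combined`, `total`, and `n_rows` are names chosen here for illustration; `dplyr` is assumed to be installed.

```r
library(dplyr)

# Inline stand-in for the combined tibble
combined <- data.frame(
  source = rep(c("Dataset_A", "Dataset_B"), each = 2),
  ID     = c(1, 2, 1, 2),
  val    = c(0.2875775, 0.7883051, 0.4089769, 0.8830174)
)

# Named arguments to summarize() become the output column names
totals <- combined %>%
  group_by(source) %>%
  summarize(total = sum(val), n_rows = n())

print(totals)
```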

For the sake of completeness, here is the same approach using data.table:

library(data.table)
datasets <- c("Dataset_A", "Dataset_B")
file_names <- paste0(datasets, ".csv")
combined_dt <- lapply(file_names, fread) %>% 
  setNames(datasets) %>% 
  rbindlist(idcol = "source")

combined_dt
      source ID       val
1: Dataset_A  1 0.2875775
2: Dataset_A  2 0.7883051
3: Dataset_B  1 0.4089769
4: Dataset_B  2 0.8830174

combined_dt[, sum(val), by = source]
      source       V1
1: Dataset_A 1.075883
2: Dataset_B 1.291994
Uwe