If I understand correctly, the OP creates separate tibbles in the environment but wants to store the names of the tibbles in a list in order to iterate through each tibble separately.
Instead of storing the names of the tibbles in a list, it is common practice to store the data objects themselves in a list:
library(dplyr)
datasets <- c("Dataset_A", "Dataset_B")
file_names <- paste0(datasets, ".csv")
list_of_tibbles <- lapply(file_names, readr::read_csv) %>%
  setNames(datasets)
For the given sample files (see below), we get
list_of_tibbles
$Dataset_A
# A tibble: 2 x 2
ID val
<dbl> <dbl>
1 1 0.288
2 2 0.788
$Dataset_B
# A tibble: 2 x 2
ID val
<dbl> <dbl>
1 1 0.409
2 2 0.883
Now we can iterate through the list of tibbles, as requested by the OP:
for (tib in list_of_tibbles) {
  # do things here
  print(sum(tib$val))
}
[1] 1.075883
[1] 1.291994
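Note that the loop above does not tell you which tibble is currently being processed. If the name is needed inside the loop, one option is to iterate over the names instead. The following is only a minimal sketch: it uses in-memory data frames as stand-ins for the two CSV files (an assumption made so that the snippet runs on its own, with values taken from the output shown in this answer), and the variable names `nm` and `totals` are made up for illustration.

```r
# Stand-ins for the two CSV files (assumption: same values as in the sample output)
list_of_tibbles <- list(
  Dataset_A = data.frame(ID = 1:2, val = c(0.2875775, 0.7883051)),
  Dataset_B = data.frame(ID = 1:2, val = c(0.4089769, 0.8830174))
)

# Iterate over the names so that each result can be tied to its dataset
totals <- numeric(0)
for (nm in names(list_of_tibbles)) {
  tib <- list_of_tibbles[[nm]]   # look up the tibble by name
  totals[nm] <- sum(tib$val)     # store the result under the same name
  cat(nm, ":", totals[nm], "\n")
}
```

Afterwards, `totals` is a named numeric vector with one element per dataset.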
In many circumstances, lapply() is more convenient because it returns the results of the operation as a list which keeps the dataset names:
lapply(list_of_tibbles, function(x) sum(x$val))
$Dataset_A
[1] 1.075883
$Dataset_B
[1] 1.291994
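If each result is a single number, as it is here, a named vector may be handier than a list. A possible variant is base R's `vapply()`, which also checks that every result has the declared type and length. Again, this is only a sketch that uses in-memory data frames as stand-ins for the CSV files (an assumption, with values taken from the output shown in this answer).

```r
# Stand-ins for the two CSV files (assumption: same values as in the sample output)
list_of_tibbles <- list(
  Dataset_A = data.frame(ID = 1:2, val = c(0.2875775, 0.7883051)),
  Dataset_B = data.frame(ID = 1:2, val = c(0.4089769, 0.8830174))
)

# numeric(1) declares that each call must return exactly one number
totals <- vapply(list_of_tibbles, function(x) sum(x$val), numeric(1))
totals
```

The names of the list are carried over, so `totals[["Dataset_B"]]` picks the sum for the second dataset.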
My preferred approach is to combine the single tibbles into one large tibble and to use grouping in order to operate on the data of the single tibbles:
combined_tibble <- lapply(file_names, readr::read_csv) %>%
  setNames(datasets) %>%
  bind_rows(.id = "source")
combined_tibble
source ID val
<chr> <dbl> <dbl>
1 Dataset_A 1 0.288
2 Dataset_A 2 0.788
3 Dataset_B 1 0.409
4 Dataset_B 2 0.883
The source column indicates the origin of each row and can be used to aggregate by group, e.g.,
combined_tibble %>%
  group_by(source) %>%
  summarize(sum(val))
source `sum(val)`
<chr> <dbl>
1 Dataset_A 1.08
2 Dataset_B 1.29
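The auto-generated column name `sum(val)` is awkward to refer to in later steps, so it may be worth naming the summary columns explicitly. Here is a minimal sketch, assuming in-memory tibbles in place of the CSV files (values taken from the output shown in this answer); the column names `total` and `n_rows` are made up for illustration.

```r
library(dplyr)

# Stand-ins for the two CSV files (assumption: same values as in the sample output)
combined_tibble <- list(
  Dataset_A = tibble(ID = 1:2, val = c(0.2875775, 0.7883051)),
  Dataset_B = tibble(ID = 1:2, val = c(0.4089769, 0.8830174))
) %>%
  bind_rows(.id = "source")

# Named arguments to summarize() become the column names of the result
result <- combined_tibble %>%
  group_by(source) %>%
  summarize(total = sum(val), n_rows = n())
result
```

The result has one row per source, and the columns can be addressed directly, e.g., `result$total`.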
For the sake of completeness, here is the same approach using data.table:
library(data.table)
library(magrittr)  # provides %>%; not needed if dplyr is already attached
datasets <- c("Dataset_A", "Dataset_B")
file_names <- paste0(datasets, ".csv")
combined_dt <- lapply(file_names, fread) %>%
  setNames(datasets) %>%
  rbindlist(idcol = "source")
combined_dt
source ID val
1: Dataset_A 1 0.2875775
2: Dataset_A 2 0.7883051
3: Dataset_B 1 0.4089769
4: Dataset_B 2 0.8830174
combined_dt[, sum(val), by = source]
source V1
1: Dataset_A 1.075883
2: Dataset_B 1.291994