I have 30 sas-files (dataset1.sas7bdat through dataset30.sas7bdat, approx. 10 GB per file) in a folder, and need to analyse a subset of rows in these data files (all rows where the character variable A begins with 10). Thus, I need to read each of the sas-files into R, filter a subset with grep on variable A and then save each of these filtered datasets as a .rds-file. I'm trying to achieve this using a for loop of list.files() and the Haven package to read the sas-file. In order to avoid going out-of-memory, I need to remove the imported dataset on each iteration after the subset has been filtered and saved as .rds.
Though not elegant nor satisfying, I could hard-code it manually 30 times over like this, copy/pasting and incrementing the suffixes by 1 each time:
dt1 <- haven::read_sas("~/folder/dataset1.sas7bdat")
dt1 <- data.table::as.data.table(dt1)
dt1 <- dt1[grep("^10", A)]
saveRDS(dt1, "~/folder/subset1.rds")
dt2 <- haven::read_sas("~/folder/dataset2.sas7bdat")
dt2 <- data.table::as.data.table(dt2)
dt2 <- dt1[grep("^10", A)]
saveRDS(dt2, "~/folder/subset2.rds")
etc.
While the following for loop technically works to read the files into memory, it is never going to finish due to massively going out of memory, so it does not allow me to filter the data:
folder <- "~/folder/"
file_list <- list.files(path = folder, pattern = "^dataset")
for (i in 1:length(file_list)) {
assign(file_list[i], Haven::read_sas(paste(folder, file_list[i], sep='')))
}
Is there a way to - on each iteration in the loop - filter the dataset, remove the unfiltered dataset and save the subset in a .rds-file? I can't seem to come up with a way to incorporate this into my approach of using the assign() function. Is there a better way to go about this?