I'm just getting used to lapply, and I've been trying to figure out how to use names from a vector both to build the filenames I read and to name the new data frames. I understand I can use paste to build the paths to the files of interest, but I'm not sure how to create the new data frames with the site name appended (e.g. all_site_a instead of all_var).

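site_list <- c("site_a", "site_b", "site_c", "site_d")
lapply(site_list,
  function(var) {
    # paste0() avoids the spaces paste() inserts, giving e.g. "I:/Results/site_a_all.csv"
    all_var  <- read.csv(paste0("I:/Results/", var, "_all.csv"))
    tbl_var  <- read.csv(paste0("I:/Results/", var, "_tbl.csv"))
    rsid_var <- read.csv(paste0("I:/Results/", var, "_rsid.csv"))
    # these three objects are local to the function and are discarded;
    # only the site name itself is returned
    return(var)
})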
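A minimal sketch of the named-list approach for this question, assuming each site has exactly three files named `<site>_all.csv`, `<site>_tbl.csv` and `<site>_rsid.csv`. To keep it runnable without `I:/Results/`, it first writes small dummy CSVs to a temporary folder; `results_dir` and `read_site` are hypothetical names. The last step shows one way to get separate objects such as `all_site_a` if you really need them, although keeping everything in `res` is usually easier to work with.

```r
# Hypothetical stand-in for "I:/Results/": write small dummy files to a temp dir
results_dir <- file.path(tempdir(), "Results")
dir.create(results_dir, showWarnings = FALSE)
site_list <- c("site_a", "site_b", "site_c", "site_d")
for (var in site_list) {
  for (type in c("all", "tbl", "rsid")) {
    write.csv(data.frame(site = var, type = type, value = 1:3),
              file.path(results_dir, paste0(var, "_", type, ".csv")),
              row.names = FALSE)
  }
}

# Read the three files for one site and return them together as a named list
read_site <- function(var) {
  list(
    all  = read.csv(file.path(results_dir, paste0(var, "_all.csv"))),
    tbl  = read.csv(file.path(results_dir, paste0(var, "_tbl.csv"))),
    rsid = read.csv(file.path(results_dir, paste0(var, "_rsid.csv")))
  )
}

# setNames() makes the names of the result match the site names
res <- setNames(lapply(site_list, read_site), site_list)
str(res$site_a$tbl)

# Optional: flatten one level (names like "site_a.all"), rename to "all_site_a",
# and copy each data frame into the global environment as its own object
flat <- unlist(res, recursive = FALSE)
names(flat) <- sub("^(.*)\\.(all|tbl|rsid)$", "\\2_\\1", names(flat))
invisible(list2env(flat, envir = globalenv()))
```

With this layout, `res$site_b$rsid` or `res[["site_c"]][["all"]]` reaches any table, and further processing can be done with another `lapply` over `res`.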
Axeman
mgtrek

1 Answer

Generally, when using lapply it makes more sense to apply a function to each list element and return a list, in which your results are stored and can be named, than to create separately named variables. Example (edit: uses split to process each site's files together):

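files <- list.files(path = "I:/Results/", pattern = "site_[abcd]_.*\\.csv$", full.names = TRUE)
# group the paths by site letter: a named list with elements a, b, c, d
files <- split(files, gsub(".*site_([abcd]).*", "\\1", files))
processFiles <- function(x){
    # each pattern must match exactly one path in x, otherwise read.csv() fails
    all <- read.csv(x[grep("_all\\.csv$", x)])
    rsid <- read.csv(x[grep("_rsid\\.csv$", x)])
    tbl <- read.csv(x[grep("_tbl\\.csv$", x)])
    # do more stuff, generate df, return(df); here all three tables are returned
    list(all = all, rsid = rsid, tbl = tbl)
}
res <- lapply(files, processFiles)   # e.g. res$a$all holds the "_all" table for site_a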
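To see what `split()` produces, and why each `x[grep(...)]` has to match exactly one path, here is a disk-free sketch on a hypothetical vector of paths like those `list.files(full.names = TRUE)` would return. `pick_one` is a hypothetical helper, not part of the answer above. If a pattern matches no path (or several), `read.csv()` receives `character(0)` (or a longer vector) and stops with `invalid 'description' argument`; the helper turns that into a clearer message.

```r
# Hypothetical paths, as list.files(full.names = TRUE) might return them
files <- c("I:/Results/site_a_all.csv", "I:/Results/site_a_rsid.csv",
           "I:/Results/site_a_tbl.csv", "I:/Results/site_b_all.csv",
           "I:/Results/site_b_rsid.csv", "I:/Results/site_b_tbl.csv")

# Group the paths by the site letter captured from the file name
groups <- split(files, gsub(".*site_([abcd]).*", "\\1", files))
names(groups)   # "a" "b"

# Return the single path ending in `suffix`; stop with a readable message otherwise
pick_one <- function(x, suffix) {
  hit <- x[endsWith(x, suffix)]
  if (length(hit) != 1) {
    stop(sprintf("expected 1 file ending in '%s', found %d", suffix, length(hit)))
  }
  hit
}

pick_one(groups$b, "_rsid.csv")   # "I:/Results/site_b_rsid.csv"
```

Inside `processFiles`, `read.csv(pick_one(x, "_all.csv"))` could replace `read.csv(x[grep("_all\\.csv$", x)])`; `endsWith()` also sidesteps the regex question of escaping the dot.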
user12728748
  • Thanks so much, this is exactly what I had in mind. What if the site names aren't consistent, i.e. site_([abcd]) doesn't actually work because they're different city names (boston, toronto, etc.)? Is there a way to integrate that? (I should've used that in the original example.) – mgtrek Feb 06 '20 at 21:11
  • Sure, you just have to match the regex pattern, e.g. to `pattern = "\\w+_.*csv"` and split accordingly, e.g. `split(files, gsub("(\\w+)_.*", "\\1", basename(files)))`. – user12728748 Feb 06 '20 at 21:35
  • Hi, this solution worked great, except that on inspection the grep calls don't seem to work. If I comment out the rsid and tbl lines (and remove their files from the folder) and work with all alone, the 'all' files are stored in the list "res" (not "all"). Once I reintroduce the tbl and rsid lines (and their files), I get the error "Error in file(file, "rt") : invalid 'description' argument". So (1) the lists 'all', 'rsid' and 'tbl' are not created, and (2) grep doesn't seem to recognize and separate out the file names. Any suggestions are much appreciated! – mgtrek Feb 07 '20 at 19:16
  • If you give me an example of real file names, it may be easier to find a working regex pattern... – user12728748 Feb 07 '20 at 19:39
  • all_boston_data.csv, all_boston_met1.tbl.csv, ... all_toronto_data.csv, all_toronto_met1.tbl.csv, ... and so on. (I did revise the extensions in the code accordingly.) Thank you! – mgtrek Feb 07 '20 at 20:04
  • Your naming scheme does not match your example and is still unclear to me: you now seem to have `all__[data|met1.tbl].csv` but no `_all.csv`, `_rsid.csv`, or `_tbl.csv`, and I see no revisions in your code. – user12728748 Feb 07 '20 at 20:55
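For the file names given in the comments (`all_boston_data.csv`, `all_boston_met1.tbl.csv`, ...), the `_all.csv`/`_rsid.csv`/`_tbl.csv` patterns match nothing, which would explain the `invalid 'description' argument` error. A sketch under the assumption that every file is named `all_<city>_<type>.csv` and that city names contain no underscores; `process_city` and its `reader` argument are hypothetical, and only the two file types named in the comments are handled (any others in the "..." would need their own suffix). The city is captured with `[^_]+` rather than `\\w+`, because `\\w` also matches `_`.

```r
# Hypothetical file names following the scheme described in the comments;
# in practice: list.files("I:/Results/", pattern = "^all_.*\\.csv$", full.names = TRUE)
files <- c("all_boston_data.csv", "all_boston_met1.tbl.csv",
           "all_toronto_data.csv", "all_toronto_met1.tbl.csv")

# Capture the city between the "all_" prefix and the next underscore
city <- sub("^all_([^_]+)_.*$", "\\1", basename(files))
groups <- split(files, city)

# Choose each file by its exact suffix and read it with `reader`
process_city <- function(x, reader = read.csv) {
  data_file <- x[endsWith(x, "_data.csv")]
  tbl_file  <- x[endsWith(x, "_met1.tbl.csv")]
  stopifnot(length(data_file) == 1, length(tbl_file) == 1)
  list(data = reader(data_file), tbl = reader(tbl_file))
}

# Dry run: identity() as the reader shows which path lands in which slot
dry <- lapply(groups, process_city, reader = identity)
str(dry)
```

Dropping `reader = identity` (so the default `read.csv` is used) gives `res$boston$data`, `res$toronto$tbl`, and so on, once the real files exist.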