
I am facing a problem that I thought would be straightforward to solve but turned out to be far beyond my horizon. I guess I have a misconception stuck in my brain.

I have some data.frames which I imported from files. All of them have exactly the same columns with the same names. Since there are quite a lot of them, I wanted to automate combining them into one data.frame using bind_rows.

library(readr)   # read_tsv()
library(purrr)   # map()

# File names only (no path), so read_tsv() below assumes the working directory is this folder
files <- list.files(path = "/home/username/Documents/", pattern = "\\.txt$")

batch.import <- function(filename) {
  name <- unlist(strsplit(filename, "\\."))[1] # get rid of .txt
  df <- read_tsv(filename)
  colnames(df) <- c("name1", "name2", "name3", "name4")
  assign(name, df, envir = .GlobalEnv)
}

map(files, batch.import)

dataframes <- unlist(strsplit(files,"\\."))[seq(1,length(unlist(strsplit(files,"\\."))),2)]  # This produces a chr vector with all the data.frames I want to merge
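The strsplit()/seq() trick above only works if every file name contains exactly one dot. A minimal sketch of a more robust alternative using base R's tools::file_path_sans_ext(); the file names here are made-up examples:

```r
# Hypothetical file names; the second one contains an extra dot
files <- c("sample_a.txt", "sample_b.v2.txt")

# Approach from the question: split on every dot and keep every second piece.
# This yields "sample_a", "sample_b", "txt": the second name is cut short
# and an extension ends up in the result.
pieces <- unlist(strsplit(files, "\\."))
by_split <- pieces[seq(1, length(pieces), 2)]

# Strip only the final extension of each name instead
by_tools <- tools::file_path_sans_ext(files)
```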

First thing I obviously tried was:

combinedData <- bind_rows(dataframes)

That would have been too easy, I agree. Since it is a character vector, I understand that it doesn't actually refer to the data.frames; bind_rows just tries to do something with the text.
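A character vector holds only the names of the objects, not the objects themselves. A minimal sketch with two made-up data frames (df1 and df2 are hypothetical) showing how base R's get() and mget(), the latter suggested in the comments, look objects up by name:

```r
library(dplyr)

# Two small example data frames with identical columns
df1 <- data.frame(name1 = 1:2, name2 = c("a", "b"))
df2 <- data.frame(name1 = 3:4, name2 = c("c", "d"))

dataframes <- c("df1", "df2")   # only strings, no data

class(dataframes)          # "character": nothing here that bind_rows() can bind
class(get(dataframes[1]))  # "data.frame": get() returns the object with that name

# mget() returns a named list of all the named objects, which bind_rows() accepts
combinedData <- bind_rows(mget(dataframes))
```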

So I tried combinedData <- bind_rows(paste(dataframes)), which I thought might do the job, but it didn't combine the data.frames either.

So I tried something more sophisticated, a for loop (I also tried using map() here, but unfortunately I don't remember how):

for (df in dataframes) {
  if (exists("combinedData")) {
    combinedData <- bind_rows(combinedData, .data[[df]]) # I think the error is here (if not earlier); I also tried {{}}
  } else {
    combinedData <- .data[[df]]
  }
}

From what I have read so far, I need to do something with {{}} or .data[[]], but this concept still hasn't made it through to my synapses.
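In dplyr, .data[[...]] and {{ }} are tidy-evaluation tools for referring to columns inside verbs such as mutate() or filter(); they do not look up data frames in the workspace, which is why the loop above fails. A minimal sketch contrasting the two, with hypothetical object and column names:

```r
library(dplyr)

df1 <- data.frame(name1 = 1:2, name2 = c("a", "b"))
df2 <- data.frame(name1 = 3:4, name2 = c("c", "d"))
dataframes <- c("df1", "df2")

# .data[[...]] works inside a data-masking verb and selects a column by name
col <- "name1"
doubled <- mutate(df1, twice = .data[[col]] * 2)

# To fetch a whole data frame by its name, use base R's get() instead
combinedData <- NULL
for (df in dataframes) {
  # bind_rows() ignores NULL inputs, so the first iteration needs no exists() check
  combinedData <- bind_rows(combinedData, get(df))
}
```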

Any suggestions on how I can use my character vector of data.frame names to combine the respective data.frames?

Thank you very much!

Michael

Le-Machi
  • `bind_rows` doesn't deal with strings, so I'm not sure why that could have worked. It deals with frames only, so you need to give it frames. While the use of `assign` is bad practice, since you've already done it, try `bind_rows(mget(dataframes))`. However, please read https://stackoverflow.com/a/24376207/3358227; your `batch.import` function goes against at least a couple of "best practices". – r2evans May 11 '21 at 18:44
  • Thank you very much @r2evans! That helped a lot! I should have mentioned that I am a rookie still trying to learn the concepts. In the end, it turned out I was completely on the wrong track. – Le-Machi May 11 '21 at 20:14
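The answer linked in the comments recommends keeping related data frames in a list rather than creating one global variable per file with assign(). A minimal, self-contained sketch of that pattern with readr and dplyr; it writes two hypothetical tab-separated files to a temporary folder so it can run anywhere:

```r
library(readr)
library(dplyr)

# Create two example .txt files (made-up data) in a temporary folder
dir <- file.path(tempdir(), "tsv_demo")
dir.create(dir, showWarnings = FALSE)
write_tsv(data.frame(a = 1:2, b = c("x", "y")), file.path(dir, "first.txt"))
write_tsv(data.frame(a = 3:4, b = c("z", "w")), file.path(dir, "second.txt"))

# Full paths, so reading does not depend on the working directory;
# "\\.txt$" matches only a literal .txt extension
files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file into one list element instead of assigning global variables
list_df <- lapply(files, function(f) {
  df <- read_tsv(f)
  colnames(df) <- c("name1", "name2")
  df
})

combinedData <- bind_rows(list_df)
```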

1 Answer


You can use foreach instead. Here is a sketch:

library(foreach)
library(dplyr)
library(readr)

files <- list.files(path = "/home/username/Documents/", pattern = "\\.txt$", full.names = TRUE)

# foreach returns a list of data frames, which you can combine later using bind_rows
list_df <- foreach(i_file = files) %do% {
  df <- read_tsv(i_file)
  colnames(df) <- c("name1", "name2", "name3", "name4")
  df
}

combine_df <- bind_rows(list_df)
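foreach() can also do the combining itself through its .combine argument, here set to bind_rows, so no intermediate list is needed. A minimal sketch, assuming the foreach, readr and dplyr packages are installed; the two input files are made up and written to a temporary folder:

```r
library(foreach)
library(readr)
library(dplyr)

dir <- file.path(tempdir(), "foreach_demo")
dir.create(dir, showWarnings = FALSE)
write_tsv(data.frame(a = 1:2, b = c("x", "y")), file.path(dir, "first.txt"))
write_tsv(data.frame(a = 3:4, b = c("z", "w")), file.path(dir, "second.txt"))

files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)

# .combine = bind_rows merges the per-file results instead of returning a list
combine_df <- foreach(i_file = files, .combine = bind_rows) %do% {
  df <- read_tsv(i_file)
  colnames(df) <- c("name1", "name2")
  df
}
```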

If you want a named list of the imported data frames:

files_name_no_ext <- gsub(pattern = "\\.txt$", replacement = "", basename(files)) # basename() drops the directory part
names(list_df) <- files_name_no_ext
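With a named list, bind_rows() can record which file each row came from through its .id argument. A minimal sketch with two small in-memory data frames standing in for the imported files (the names are hypothetical):

```r
library(dplyr)

list_df <- list(
  data.frame(name1 = 1:2, name2 = c("a", "b")),
  data.frame(name1 = 3:4, name2 = c("c", "d"))
)
names(list_df) <- c("sample_a", "sample_b")   # e.g. file names without extension

# .id adds a first column holding the list element name for every row
combine_df <- bind_rows(list_df, .id = "source")
```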
Sinh Nguyen
  • Thank you for your time and effort. My question was a bit different, and I was able to solve it with the suggestion in the comments below the initial post. Anyway, thanks a lot; very much appreciated. – Le-Machi May 11 '21 at 20:17