
I have the .DAT files in one variable. I would like to create a loop that reads each ";"-separated file into a data frame and merges all the files into a single data frame as it iterates through the list.

Each file can be viewed with alldata[[1]], alldata[[2]], and so on.

The content of the list.

Could someone suggest a loop that iterates through the list and reads each .DAT file (sep=";")?

  • Hi r2evans, thank you for posting your solution. But it seems to create a list of all the files in the directory. I am actually interested in getting the content of the files into one data frame – Dipayan Banerjee Jan 22 '19 at 03:39
  • Using a `for` loop to iteratively add to a frame will perform horribly with a reasonable number of files. The typical path with a list of frames is to follow it up with something like `dplyr::bind_rows`, `data.table::rbindlist`, or `do.call(rbind.data.frame, ...)`. – r2evans Jan 22 '19 at 06:21
  • Functions `rio::import_list`, `io::qread`, `tor::list_rds` and `tor::load_csv` might be of help here. – meriops Jan 22 '19 at 08:28
  • Dipayan, see https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames/24376207#24376207 for some uses of lists of frames. – r2evans Jan 22 '19 at 14:52
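The list-of-frames approach described in these comments can be sketched in base R: read every file into a list with `lapply()`, then bind once with `do.call(rbind, ...)`. This is a minimal sketch; the two sample files, their `id`/`value` columns and the temporary folder are invented for illustration, and it assumes every file has a header row and the same columns.

```r
# Hypothetical sample data: two ";"-separated files in a fresh temporary folder
dat_dir <- tempfile("dat_example")
dir.create(dat_dir)
writeLines(c("id;value", "1;10", "2;20"), file.path(dat_dir, "a.DAT"))
writeLines(c("id;value", "3;30", "4;40"), file.path(dat_dir, "b.DAT"))

# List the files (regex pattern, case-insensitive so ".DAT" matches)
files <- list.files(dat_dir, pattern = "\\.dat$", ignore.case = TRUE,
                    full.names = TRUE)

# Read each file into its own data frame, collected in a list
frames <- lapply(files, read.delim, sep = ";")

# Bind all frames in a single step instead of growing a data frame in a loop
combined <- do.call(rbind, frames)
combined
```

Binding once at the end avoids copying the growing data frame on every iteration, which is the performance problem r2evans points out.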

2 Answers


All your files are being read, but the result is overwritten in data.2002 on every iteration, so in the end you only see the last file. Instead, store each file's data in a list, at a new index, at the end of each iteration.

EDIT: As noted by Nick below, your file.type variable (which should be called file_list or something similar) should contain the actual file names, so that its length matches the number of files; otherwise you may end up with subscript errors. (I added similar code.)

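# pattern is a regular expression; ignore.case = TRUE also matches ".DAT"
files <- list.files(path_to_your_folder, pattern = "\\.dat$", ignore.case = TRUE, recursive = TRUE, include.dirs = FALSE)

data.2002 <- list()
counter <- 1

for (i in files) {
  # Read one ";"-separated file
  tempFile <- read.delim(file.path(path_to_your_folder, i), sep = ";")

  # ... your modifications to tempFile go here ...

  # Store the result in the list instead of overwriting it
  data.2002[[counter]] <- tempFile
  counter <- counter + 1
}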

Then you can bind the rows afterwards; there are at least a couple of ways:

df <- do.call("rbind", data.2002)

dplyr::bind_rows(data.2002, .id = "column_label")
Fons MA
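If you also want to know which file each row came from (what the `.id` argument of `dplyr::bind_rows` in the answer above provides), one base-R way is to name the list by file and add that name as a column before binding. This is a sketch; the `source_file` column name and the sample data frames are illustrative assumptions.

```r
# Hypothetical list of data frames, named after the files they came from
data.2002 <- list(
  "a.DAT" = data.frame(id = 1:2, value = c(10, 20)),
  "b.DAT" = data.frame(id = 3:4, value = c(30, 40))
)

# Add a column recording each frame's source, then bind once
labelled <- Map(function(df, name) {
  df$source_file <- name
  df
}, data.2002, names(data.2002))

combined <- do.call(rbind, labelled)
rownames(combined) <- NULL  # drop the "a.DAT.1"-style row names rbind creates
combined
```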

You could also try this to iterate through all of the files in the directory.

# Read all of the DAT files in the directory.
# Ensure only the DAT files you need are in there.
# pattern is a regular expression, so match ".dat" at the end of the name, ignoring case.
temp <- list.files(YOUR_DIRECTORY, pattern = "\\.dat$", ignore.case = TRUE, full.names = TRUE)

# Create a placeholder data frame with one row of NAs.
# Change ncol to match the number of columns in your files.
outputs.df <- data.frame(matrix(NA, nrow = 1, ncol = 10))

# Import the DAT files from the YOUR_DIRECTORY location
for (i in seq_along(temp)) {

    # Read in each DAT file
    myfiles <- read.delim(temp[i],
        header = FALSE, skip = 0, sep = ";") # change skip = X to ignore the first X rows as required.

    # Ensure column names are identical
    names(outputs.df) <- names(myfiles)
    # Bind the rows
    outputs.df <- rbind(outputs.df, myfiles)
}

# Remove the first row, which holds the NA placeholders
outputs.df <- outputs.df[-1, ]
Nick
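`list.files()` treats `pattern` as a regular expression, not a shell wildcard like `*.dat`, and matches case-sensitively by default, so `.DAT` files can be missed. The sketch below uses hypothetical file names to show the effect; `utils::glob2rx()` converts a wildcard into the equivalent anchored regex, and `list.files()` accepts the same `ignore.case` argument used here with `grepl()`.

```r
# Hypothetical file names, as list.files() might return them
candidates <- c("run1.DAT", "run2.dat", "notes.dat.bak", "data.txt")

# glob2rx() turns the wildcard "*.dat" into an anchored regex
rx <- utils::glob2rx("*.dat")
rx

# Case-sensitive match: only the lowercase ".dat" file is kept
case_sensitive <- candidates[grepl(rx, candidates)]

# ignore.case = TRUE also accepts the ".DAT" file
case_insensitive <- candidates[grepl(rx, candidates, ignore.case = TRUE)]
case_insensitive
```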
  • My .DAT files are separated by ";", so read.table won't work; I am using read.delim() instead. – Dipayan Banerjee Jan 22 '19 at 02:02
  • I've updated the code to show read.delim(), basically the same arguments required as before though. – Nick Jan 22 '19 at 02:15
  • Hello Nick, Thanks for sharing your solution, but the loop does not stop. – Dipayan Banerjee Jan 22 '19 at 02:24
  • The loop should only iterate through the number of files in the directory i.e. 1 to length(temp). Make sure you only have the DAT files required in a separate folder and point to that. – Nick Jan 22 '19 at 02:26
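A related pitfall with a loop written as `for (i in 1:length(temp))`: if `list.files()` finds nothing (wrong folder or pattern), `length(temp)` is 0 and `1:0` is `c(1, 0)`, so the loop body still runs twice and typically fails on the first pass. `seq_along()` returns an empty sequence instead. A small illustration, using an empty hypothetical file list:

```r
# Simulate list.files() finding no matching files
temp <- character(0)

# 1:length() counts down from 1 to 0 when the vector is empty
bad_idx <- 1:length(temp)
bad_idx

# seq_along() returns an empty integer vector, so the loop body never runs
good_idx <- seq_along(temp)
iterations <- 0
for (i in good_idx) {
  iterations <- iterations + 1
}
iterations
```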