0

I have about 50 CSV files that need to be imported into R, and then I want to set a header row which is the first two rows of data in each file. As an example:

DATE      FIRST  LAST ID
(M/D/Y)   NAME   NAME 

The data starts in Row 3. I can import all the files as their own unique dataframe which

temp = list.files(pattern="*.csv")
list2env(
  lapply(setNames(temp, make.names(gsub("*.csv$", "", temp))), 
         read.csv), envir = .GlobalEnv)

But how can I change that code to combine the first two rows as a header

Gregor Thomas
  • 104,719
  • 16
  • 140
  • 257
ac1998
  • 1
  • 1

1 Answers1

0

I'd recommend reading in the first two rows, making the column names you want, and then read in the rest, skipping the first 2 rows. Here's an example for one file, found at filepath x:

cnames = read.csv(x, nrow = 2)
cnames = sapply(cnames, paste, collapse = "")
data = read.csv(x, skip = 2, col.names = cnames)

I'd also strongly encourage you to leave your results in a nice, easy-to-use list, rather than creating about 50 data frames in your environment. My reasoning for this recommendation is detailed here. That could look like this:

temp = list.files(pattern="*.csv")
data_list = list()
for(i in seq_along(temp)) {
  cnames = read.csv(x, nrow = 2)
  cnames = sapply(cnames, paste, collapse = "")
  data_list[[i]] = read.csv(x, skip = 2, col.names = cnames) 
}

names(data_list) = sub(".csv", "", basename(temp), fixed = TRUE)

# Then if you want to combine them all
complete_data = dplyr::bind_rows(data_list, .id = "source_file")
Gregor Thomas
  • 104,719
  • 16
  • 140
  • 257
  • I see your point, what I'd ultimately like to do is use rbind to combine all of those 50 spreadsheets into one master spreadsheet, but the spreadsheets sometimes have additional columns so creating a standard header might not work for each of the spreadsheets – ac1998 Nov 06 '19 at 17:28
  • I'm not recommending a standard header - these three lines will be run for each data frame inside the loop/lapply. – Gregor Thomas Nov 06 '19 at 17:55
  • @ac1998 see edits for a more thorough example. `dplyr::bind_rows` will include all columns that appear in any data frame. – Gregor Thomas Nov 06 '19 at 18:03