0

I have a few hundred .csv's (here is a sample of 3 of these files) I would like to:

  1. Import into R
  2. Trim to a common row length
  3. Extract a specific column
  4. Combine into one data frame with object name as column name.

In my case I can get the files in fine using:

temp = list.files(pattern="*.csv")
list2env(
lapply(setNames(temp, make.names(gsub("*.csv$", "", temp))), 
read.csv), envir = .GlobalEnv)} 

but after trying a few different suggestions I can't see how to loop and trim the data frames to 25,000 observations each, and extract the 'data.Activity' column for each.

Ideally I would end up with an object 25,000 observations, and each of the 'data.Activity' variables from the supplied datasets named as their 'Clinstag_XX' ID.

Any help would be much appreciated.

  • This question is too vague to answer. However, here's a hint: you should keep them in a list rather than use `list2env`. This way, you can again use `lapply` on all of the dataframes to "trim" them to a common row length, etc. See gregor's answer [here](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames) for more motivating examples. – lmo Nov 03 '17 at 12:23
  • 1
    @Imo Thank you, I scrapped the list2env and used lapply which worked great. – Jonathon P Nov 04 '17 at 00:35

2 Answers2

0

Here's an example:

foo <- data.frame(a = 1:3, data.Activity = runif(3))
bar <- data.frame(data.Activity = runif(5), b = letters[1:5])
rows <- 2:3
do.call(cbind, lapply(ls(pattern="foo|bar"), function(dfname) {
  setNames(get(dfname)[rows, "data.Activity", drop=F], dfname)
}))
# bar        foo
# 2 0.4387158 0.08075924
# 3 0.3078052 0.92174396
lukeA
  • 48,497
  • 5
  • 73
  • 84
0

You can always work with data tables and do each operation in a for loop.

Something like that.

temp = list.files(path = 'data', pattern = "*.csv")


datatable <- read.csv(file = paste0("data/",temp[1])) %>% as.data.table()
datatable <- datatable[1:2500, names(datatable)[names(datatable)%like%"Activity|Date"], with = F]


for(file in temp[2:length(temp)]){
temp1 <- read.csv(file = paste0("data/",file)) %>% as.data.table()
temp11 <- temp1[1:2500, names(temp1)[names(temp1)%like%"Activity|Date"], with = F]
datatable <- rbind(datatable, temp11, fill = T)
}

You can adjust the row number of the variables that you want to extract. Note: I have put the csv files in a folder called data.

  • 1
    Thanks! after some minor tweaking (like cbind instead or rbind) I have worked your suggestion. Learnt a lot from your example. Thanks again for your effort. – Jonathon P Nov 04 '17 at 00:33