0

I would like to import some xlsx files into rstudio. These files are all in the same folder, but there are also some other xlsx files in there, that I am not interested in:

  • Alles_2017.xlsx
  • Alles_2018.xlsx
  • Alles_2019.xlsx
  • Alles_2020_incl_Oct.xlsx
  • Blabla.xlsx
  • Idontknow.xlsx

I would like to import all four "Alles_...." files building a single dataframe (called for example All) containing all this data. And I would like the function to ignore every other file not starting with "Alles_"

Which function can I use?

Thank you in advance!

joran
  • 157,274
  • 30
  • 404
  • 439
Beginner
  • 25
  • 6
  • Most of this is a duplicate of https://stackoverflow.com/a/24376207/3358272: reading multiple files iteratively; as such, I recommend that you load these into a list of frames instead of individual objects, since I'm inferring that you will be doing the same or very similar operations on each frame. The part that is different is "some files", meaning you want to filter. For that, either craft the correct filter for one call to `list.files(...)`, or combine multiple calls with specific patterns (since all you need is a vector of filenames/paths). – r2evans Nov 02 '20 at 20:27

2 Answers2

0

The list.files will create a vector of file names, the pattern argument acts like a filter where it will only match the files with Alles in them. map_dfr is a fancy way of doing a for loop and binding the tables together. You may need to try map itself first if all the alles files tables aren't structured the same. That will return a list so you can inspect them. read_xlsx will do as you would think. You may need to figure out what sheet in the file you need, the default is the first sheet in the workbook.

library(purrr)
library(magrittr)
library(readxl)
list.files(path = "directory they live", 
           pattern = "Alles_", 
           full.names = TRUE) %>% 
  map_dfr(., read_xlsx)
NotThatKindODr
  • 656
  • 3
  • 14
  • (1) You can simplify the second half to `%>% map(., read_xlsx)`, since it's an unmodified function call. (2) I encourage specifying which exact libraries from `tidyverse` are required, as some people do not have the gargantuan meta-package `tidyverse` available, even if many of the tidy packages *are*; in this example, it's either `magrittr` or `dplyr`, plus the already broken-out `readxl`. – r2evans Nov 02 '20 at 20:28
  • 1
    Thank you for the suggestions. Both are just bad lazy habit I have when I am trying to give people quick answers – NotThatKindODr Nov 02 '20 at 21:56
  • Thank you! And if these excel files have several sheets and I always want only the second one, how can I add this to the code? – Beginner Nov 03 '20 at 15:44
  • Either use the `sheet = 2` OR `sheet = "sheetname"` as an argument in `read_xlsx`, I prefer to use the string because it gives me an explicit detail of what I am loading. That is entirely preference – NotThatKindODr Nov 03 '20 at 19:48
0

Firstly, I would set the working directory with setwd(), and then:

dir()[grepl('Alles', ignore.case = T, dir())] -> files    
do.call('rbind', lapply(files, function(file) openxlsx::read.xlsx(file)))

In case you want the second sheet:

do.call('rbind', lapply(files, function(file) openxlsx::read.xlsx(file, sheet = 2)))
AlexB
  • 1,962
  • 2
  • 8
  • 13
  • I recommend *against* requiring `setwd` in order to get into the correct directory. While the debate of absolute-vs-relative filenames is slightly different, I think the default use of `full.names=TRUE` fixes the vast majority of files-in-other-directory operations. – r2evans Nov 02 '20 at 20:48
  • Thank you! And if these excel files have several sheets and I always want only the second one, how can I add this to the code? – Beginner Nov 03 '20 at 15:44
  • @Beginner, if the solution helped you, please upvote and accept the answer. – AlexB Nov 04 '20 at 16:08