0

I have an excel file with multiple sheets, and assign each sheet to a separate variable. I decided to try a for statement like this:

  x = 2
for (i in 1:6){
  x <- x + 1
  assign(paste0(i,"_file"), read.xlsx(file.path(path,"Template.xlsx"), sheetIndex = x, colIndex = 1:5, startRow = 6, stringsAsFactors=FALSE)
) 

#remove rows where the entire row is NA (do not remove rows that have some values and/or NA, has to be completely NA)
  paste0(i,"_file")<- paste0(i,"_file") %>% filter_at(colnames(paste0(i,"_file")), any_vars(!is.na(.)))

}

the issue is I don't know how to remove rows after I use assign(). After I assign each sheet to a variable, I want to do some cleaning and remove rows that are all NA. I tried to use paste0 and the <- operator but it doesn't work.


UPDATE:

Using list:] but how to remove last 4 rows of each data frame?

  x = 2
for (i in 1:6){
  x <- x + 1
  scoring_raw[[i]] <- read.xlsx(file.path(path,"Template.xlsx"), sheetIndex = x, colIndex = 1:5, startRow = 6, stringsAsFactors=FALSE) %>%
    filter_all(any_vars(!is.na(.))) %>% #want to remove last 4 rows of dataframe


}
karuno
  • 97
  • 8
  • 1
    This would be far easier if you kept all the objects in a list rather than assigning them to the global environment. The simplest way to do what you're trying to do though is to add your `filter_at` _inside_ the assign, straight after `read.xlsx`. You might also want to look into the function `get`, which does the opposite to `assign` – Allan Cameron Sep 14 '20 at 19:04
  • 5
    [Don't ever create `d1` `d2` `d3`, ..., `dn` in the first place. Create a list `d` with `n` elements.](https://stackoverflow.com/a/24376207/1422451) – Parfait Sep 14 '20 at 19:04
  • 1
    Does this answer your question? [Variable assignment within a for-loop](https://stackoverflow.com/questions/13213845/variable-assignment-within-a-for-loop) – Oliver Sep 14 '20 at 19:18

1 Answers1

1

Although @Parfait is totally right and you should prefer using list, here is a possibility to answer your question, using eval and parse. But you need to put your index i at the end of your object name, not in the beginning, to avoid problems:

x = 2
for (i in 1:6){
  x <- x + 1
  assign(paste0("file_",i), read.xlsx(file.path(path,"Template.xlsx"), sheetIndex = x, colIndex = 1:5, startRow = 6, stringsAsFactors=FALSE)
  ) 
  
  #remove rows where the entire row is NA (do not remove rows that have some values and/or NA, has to be completely NA)
  assign(paste0("file_",i), 
         eval(parse(text = paste0("file_",i))) %>% 
           filter_all(any_vars(!is.na(.)))
  
}

You can change filter_at by filter_all, as you are selecting all columns. But you could actually do:

for (i in 1:6){
  x <- x + 1
  assign(paste0("file_",i), 
         read.xlsx(file.path(path,"Template.xlsx"), sheetIndex = x, colIndex = 1:5, startRow = 6, stringsAsFactors=FALSE) %>%
           filter_all(any_vars(!is.na(.)))
  ) 
}
denis
  • 4,710
  • 1
  • 8
  • 33
  • Thanks, do you know how I can remove the last 4 rows from my data frame following %>%? I've updated my question above – karuno Sep 15 '20 at 14:32