
I have many years of data, each stored in an identically structured data frame. I want to put all the years into a list and then write one `for` loop instead of repeating the command for each year.

# set dummies as factors (these dummies repeat across years)
mydummies <- c('hru_i', 'ge_nonngsother_i', 'ge_sgt_i')
DF2012[, mydummies] <- lapply(DF2012[, mydummies], factor)
DF2013[, mydummies] <- lapply(DF2013[, mydummies], factor)

I tried putting all the data frames in a list so that I could run a loop, but that did not change the data frames:

df.list <- list(DF2012, DF2013)
# want to create a loop here
RLP

2 Answers


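A nested `lapply` will work for this: the outer call loops over the data frames in the list, and the inner call converts each dummy column to a factor. The result is assigned back to `df.list`: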

df.list <- lapply(df.list, function(d) {
  d[mydummies] <- lapply(d[mydummies], factor)
  d
})

Here's a reproducible example demonstrating that it works:

df.list = list(a = head(mtcars), b = head(mtcars))
mydummies = c("cyl", "am")
sapply(df.list, sapply, class)
#      a         b        
# mpg  "numeric" "numeric"
# cyl  "numeric" "numeric"
# disp "numeric" "numeric"
# hp   "numeric" "numeric"
# drat "numeric" "numeric"
# wt   "numeric" "numeric"
# qsec "numeric" "numeric"
# vs   "numeric" "numeric"
# am   "numeric" "numeric"
# gear "numeric" "numeric"
# carb "numeric" "numeric"

df.list <- lapply(df.list, function(d) {
  d[mydummies] <- lapply(d[mydummies], factor)
  d
})

sapply(df.list, sapply, class)
#      a         b        
# mpg  "numeric" "numeric"
# cyl  "factor"  "factor" 
# disp "numeric" "numeric"
# hp   "numeric" "numeric"
# drat "numeric" "numeric"
# wt   "numeric" "numeric"
# qsec "numeric" "numeric"
# vs   "numeric" "numeric"
# am   "factor"  "factor" 
# gear "numeric" "numeric"
# carb "numeric" "numeric"
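Since the question asks for a `for` loop specifically, the same conversion can be sketched as an explicit loop over the list by index. This is a minimal sketch, not part of the original answer: the toy data frames built from `mtcars`, with `cyl` and `am` standing in for the real dummy columns, and the list names `"2012"` and `"2013"` are assumptions for illustration.

```r
# Toy stand-ins for DF2012 and DF2013 (assumption: any data frames that
# share the dummy columns behave the same way)
df.list <- list("2012" = head(mtcars), "2013" = head(mtcars))
mydummies <- c("cyl", "am")

# Loop over the list by index and assign the result back into the list.
# Writing `for (d in df.list)` and modifying `d` would only change a
# temporary copy, leaving df.list untouched.
for (i in seq_along(df.list)) {
  df.list[[i]][mydummies] <- lapply(df.list[[i]][mydummies], factor)
}

# Check the classes of the dummy columns in every data frame
sapply(df.list, sapply, class)[mydummies, ]
```

Every entry in the printed matrix should be `"factor"`. The `lapply` version above does the same thing; the loop just makes the assign-back step explicit.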
Gregor Thomas
  • I originally thought the same (use a nested lapply), and it runs without an error, but when I look at DF2012 after running the nested lapply, the dummies are NOT changed to factors. I am not sure why that lapply does not seem to work – RLP Nov 24 '20 at 20:24
  • I just added a demonstration - seems to work fine. Make sure you're assigning things back, and if you still have trouble please add a reproducible example to your question. – Gregor Thomas Nov 24 '20 at 20:27
  • When I try it, it runs, but within the data frames in the list the dummies are not changed to factors. I can't see what I am missing; perhaps I am not assigning the outputs correctly? df.list str(DF2012) 'data.frame': 122 obs. of 266 variables: $ ID : int 1 2 3 4 5 6 7 13 14 15 ... $ hru_i : int 0 0 0 0 0 0 0 0 0 0 ... – RLP Dec 18 '20 at 00:20
  • I mean, the code modifies `df.list`, not `DF2012`. Look at `str(df.list[[1]])` (and use that subsequently). – Gregor Thomas Dec 18 '20 at 14:24
  • When you put `DF2012` in the list, it is copied. There is a copy in the list, and a copy not in the list. When you've got a bunch of data frames that are similar, it's nice to put them in a list, because it's easy to do things to all the data frames in the list. And then you use the data that's in the list, not the original copies that aren't in the list. So the idea is that the list replaces all your individual data frames, and you use the list instead. – Gregor Thomas Dec 18 '20 at 16:01
  • See my answer at [How to make a list of data frames?](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames/24376207#24376207) for more discussion and examples. Basically, I'd recommend ignoring the data frames that aren't in the list. You can probably simplify the code that creates/imports your data frames, so that they go straight to the list without ever creating `DF2012` and `DF2013` in the first place. Naming the list can also be nice, so you can use, e.g., `df.list[["2012"]]` and `df.list[["2013"]]`. – Gregor Thomas Dec 18 '20 at 16:04
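A small sketch of the copy semantics described in the comments above. The toy stand-ins for `DF2012` and `DF2013` (built from `mtcars`, with `cyl` and `am` as the dummy columns) are assumptions for illustration. `mget()` collects existing objects into a list named after them, and `list2env()` is one way to write the converted data frames back over the originals if you really do need the standalone objects; working from the list alone, as recommended above, avoids that step.

```r
# Toy stand-ins for the yearly data frames (assumption for illustration)
DF2012 <- head(mtcars)
DF2013 <- head(mtcars)
mydummies <- c("cyl", "am")

# mget() returns a named list; the names are "DF2012" and "DF2013"
df.list <- mget(c("DF2012", "DF2013"))

df.list <- lapply(df.list, function(d) {
  d[mydummies] <- lapply(d[mydummies], factor)
  d
})

class(df.list[["DF2012"]]$cyl)  # "factor": the copy in the list changed
class(DF2012$cyl)               # "numeric": the original is untouched

# Optional: overwrite the standalone objects with the converted copies.
# list2env() assigns each list element to a variable named after it in the
# current environment (the global environment when run at top level).
list2env(df.list, envir = environment())
class(DF2012$cyl)               # "factor"
```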

We can use the tidyverse:

library(dplyr)
library(purrr)
df.list <- map(df.list, ~ .x %>% 
                    mutate(across(all_of(mydummies), factor)))

Data:

df.list <- list(a = head(mtcars), b = head(mtcars))
mydummies <- c("cyl", "am")
akrun
  • I get the following error for this: Error in mutate(., across(all_of(mydummies), factor)) : could not find function "mutate" – RLP Nov 24 '20 at 21:09
  • @RLP It is from `dplyr`, which I loaded in the answer with `library(dplyr)` – akrun Nov 24 '20 at 21:10
  • Okay, re-installed tidyverse, but the dummy variables are still not factors... – RLP Nov 24 '20 at 21:20
  • @RLP I tested it on a reproducible example. It is working fine for me – akrun Nov 24 '20 at 21:21
  • @RLP Please check the example data on my post. – akrun Nov 24 '20 at 21:22
  • The output below shows that the dummy variables in the data frames are not changed... Am I missing how to assign the outputs, perhaps? > df.list <- map(df.list, ~ .x %>% + mutate(across(all_of(mydummies), factor))) > str(DF2012) 'data.frame': 122 obs. of 266 variables: $ ID : int 1 2 3 4 5 6 7 13 14 15 ... $ hru_i : int 0 0 0 0 0 0 0 0 0 0 ... – RLP Dec 18 '20 at 00:13