
I have 3 data frames that I'd like to run the same data.table function on. I could do this manually for each data.frame but I'd like to learn how to do it more efficiently.

Using the data.table package, I want to replace the contents of col1 with the contents of col2 only if col1 contains "a", and I want to run this code over three different data frames. On a single data.frame, this works fine:

df1 <- data.frame(col1 = c("a", "a", "b"), col2 = c("AA", "AA", "AA"))
library(data.table)
dt = data.table(df1)
dt[grepl(pattern = "a", x = col1), col1 := col2]
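A note on the condition, as a small sketch with made-up data: `grepl()` does a regular-expression match, so it selects any value that *contains* "a" (for example "ab"), not only values equal to "a". If exact equality is what's intended, `col1 == "a"` is stricter; for a literal substring match without regex semantics, `grepl()` also accepts `fixed = TRUE`. Inside `dt[i, ...]`, columns can be referred to by name, so no `df1$` prefix is needed.

```r
library(data.table)

# Toy data (hypothetical): "ab" contains "a" but is not exactly "a"
dt <- data.table(col1 = c("a", "ab", "b"), col2 = c("AA", "BB", "CC"))

# Regex match: flags every value that contains "a"
grepl(pattern = "a", x = dt$col1)   # TRUE TRUE FALSE

# Exact comparison: flags only values equal to "a"
dt$col1 == "a"                      # TRUE FALSE FALSE

# Columns are visible by name inside dt[i, ...]
dt[col1 == "a", col1 := col2]
dt$col1                             # "AA" "ab" "b"
```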

But I am lost trying to get this to run over multiple data frames:

df1 <- data.frame(col1 = c("a", "a", "b"), col2 = c("AA", "AA", "AA"))
df2 <- data.frame(col1 = c("b", "b", "a"), col2 = c("AA", "BB", "BB"))
df3 <- data.frame(col1 = c("b", "b", "b"), col2 = c("AA", "AA", "BB"))

library(data.table)
listdfs = list(df1, df2, df3)
for (i in dt[[]]) {
  dt[[i]][grepl(pattern = "a", x = df[[i]]$col1), col1 := col2]
}

But this obviously doesn't work because I have no clue what I'm doing with the for loop. Any guidance/teaching would be appreciated. Thanks!

tyluRp
moxed

1 Answer


If we are looping through the list, then loop over the sequence of the list's indices and do the assignment on each element:

listdfs = list(df1, df2, df3)
lapply(listdfs, setDT) # convert each `data.frame` in the list to a `data.table` by reference
for (i in seq_along(listdfs)) { # loop over the indices of the list
  listdfs[[i]][grepl(pattern = "a", x = col1), col1 := col2]
}

This changes the data.table elements within `listdfs`, and also the objects 'df1', 'df2' and 'df3' themselves, because `setDT` and `:=` modify by reference and we didn't create any copy:

df1
#   col1 col2
#1:   AA   AA  # change
#2:   AA   AA  # change
#3:    b   AA

df2
#   col1 col2
#1:    b   AA
#2:    b   BB
#3:   BB   BB   # change

df3
#   col1 col2
#1:    b   AA
#2:    b   AA
#3:    b   BB
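An alternative sketch (not part of the original answer) that avoids indexing: put the update in a small helper, here the hypothetical `replace_a()`, and apply it to every element with `lapply()`. Because `:=` works by reference, the helper updates each data.table in the list without any reassignment. This version converts with `as.data.table()` rather than `setDT()` and adds `stringsAsFactors = FALSE` (an assumption, so the columns are character vectors on R versions before 4.0 too).

```r
library(data.table)

# Same example data as in the question
df1 <- data.frame(col1 = c("a", "a", "b"), col2 = c("AA", "AA", "AA"), stringsAsFactors = FALSE)
df2 <- data.frame(col1 = c("b", "b", "a"), col2 = c("AA", "BB", "BB"), stringsAsFactors = FALSE)
df3 <- data.frame(col1 = c("b", "b", "b"), col2 = c("AA", "AA", "BB"), stringsAsFactors = FALSE)

# Build a list of data.tables from the data.frames
listdts <- lapply(list(df1, df2, df3), as.data.table)

# Hypothetical helper: the same conditional update, applied to one data.table
replace_a <- function(dt) {
  dt[grepl(pattern = "a", x = col1), col1 := col2]  # modifies dt by reference
  invisible(dt)
}

# Run the helper on every element; invisible() just suppresses printing
invisible(lapply(listdts, replace_a))

listdts[[2]]$col1   # "b" "b" "BB"
```

Passing each data.table to a function keeps the update logic in one place, which helps if the condition or the target column changes later.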
akrun
  • This is great. Thank you. Is "seq_along()" standard for running through lists in a loop? – moxed Feb 18 '18 at 03:59
  • @moxed `seq_along` can be used on a `vector`, a data.frame/data.table column, list elements, etc. – akrun Feb 18 '18 at 03:59
  • @moxed The standard for running data.tables/data.frames through a loop, however, is to not run them through a loop. You can use `rbind` or `rbindlist` to get a single table, explained under the "combining" section of Gregor's answer here https://stackoverflow.com/a/24376207/ – Frank Feb 18 '18 at 07:23
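On the `seq_along` question, a small illustration of one common reason to prefer it over `1:length(x)`: for an empty list, `seq_along()` yields a zero-length sequence, so the loop body never runs, whereas `1:length(x)` evaluates to `1:0`, i.e. `c(1, 0)`, and would try to index elements that don't exist. The empty list here is a made-up example.

```r
# An empty list, e.g. when there happen to be no tables to process
listdfs <- list()

seq_along(listdfs)   # integer(0): a for loop over it runs zero times
1:length(listdfs)    # 1 0: a loop over this would attempt listdfs[[1]]

# Count how many times the loop body actually executes
n_iter <- 0
for (i in seq_along(listdfs)) n_iter <- n_iter + 1
n_iter               # 0
```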
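Following Frank's suggestion, a minimal sketch of the combine-then-update pattern: `rbindlist()` with `idcol` stacks the tables and records which input each row came from, a single `:=` call updates every row, and `split(..., by = )` recovers separate tables if they are still needed. The id column name `src` is an arbitrary choice for illustration, and `stringsAsFactors = FALSE` is added so the columns are character on older R versions.

```r
library(data.table)

df1 <- data.frame(col1 = c("a", "a", "b"), col2 = c("AA", "AA", "AA"), stringsAsFactors = FALSE)
df2 <- data.frame(col1 = c("b", "b", "a"), col2 = c("AA", "BB", "BB"), stringsAsFactors = FALSE)
df3 <- data.frame(col1 = c("b", "b", "b"), col2 = c("AA", "AA", "BB"), stringsAsFactors = FALSE)

# Stack the three tables; with a named list, idcol holds the names "df1", "df2", "df3"
all_dt <- rbindlist(list(df1 = df1, df2 = df2, df3 = df3), idcol = "src")

# One update now covers the rows of all three original tables
all_dt[grepl(pattern = "a", x = col1), col1 := col2]

# If separate tables are still needed, split back out by the id column
parts <- split(all_dt, by = "src", keep.by = FALSE)
parts$df2$col1   # "b" "b" "BB"
```

Keeping everything in one table with a source column also makes later grouped operations (`by = src`) straightforward.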