My goal is to sum certain columns of my data frame and store that sum in a new column.

Suppose I have the following data.frame:

df <- data.frame(names=c("a","b","c","d","e","f"),
                 wb01=c(1,1,0,1,1,0),
                 wb02=c(0,0,0,0,1,1),
                 wb03=c(0,0,1,1,1,1),
                 wb04=c(1,1,0,1,1,1),
                 wb05=c(1,0,1,0,0,1),
                 wb06=c(1,1,1,1,1,1)) 

rownames(df) <- df$names

  wb01 wb02 wb03 wb04 wb05 wb06
a  1    0    0    1    1    1
b  1    0    0    1    0    1
c  0    0    1    0    1    1
d  1    0    1    1    0    1
e  1    1    1    1    0    1
f  0    1    1    1    1    1

I would like to select the columns to be summed by using a vector that contains their names. (My real data frame is quite large, and the columns I will be choosing are not adjacent, i.e. I can't just choose columns 3-5, nor do I want to type each column name, since there would be over 2k of them.)

But back to the example, here are the columns I'd like to sum:

genelist <- c("wb02", "wb03", "wb06")

So the results would look like this:

  wb01 wb02 wb03 wb04 wb05 wb06 sum_genelist
a  1    0    0    1    1    1         1
b  1    0    0    1    0    1         1
c  0    0    1    0    1    1         2
d  1    0    1    1    0    1         2
e  1    1    1    1    0    1         3
f  0    1    1    1    1    1         3

Thanks for any help or tips!

2 Answers

2

We can use rowSums

df$sum_genelist <- rowSums(df[intersect(genelist, names(df))], na.rm = TRUE)
df
#  names wb01 wb02 wb03 wb04 wb05 wb06 sum_genelist
#a     a    1    0    0    1    1    1            1
#b     b    1    0    0    1    0    1            1
#c     c    0    0    1    0    1    1            2
#d     d    1    0    1    1    0    1            2
#e     e    1    1    1    1    0    1            3
#f     f    0    1    1    1    1    1            3
 

where

genelist <- c('wb02', 'wb03', 'wb06')

data

df <- structure(list(names = c("a", "b", "c", "d", "e", "f"), wb01 = c(1, 
1, 0, 1, 1, 0), wb02 = c(0, 0, 0, 0, 1, 1), wb03 = c(0, 0, 1, 
1, 1, 1), wb04 = c(1, 1, 0, 1, 1, 1), wb05 = c(1, 0, 1, 0, 0, 
1), wb06 = c(1, 1, 1, 1, 1, 1)), row.names = c("a", "b", "c", 
"d", "e", "f"), class = "data.frame")
akrun
  • Thanks, I tried that and got this error: Error in `[.data.frame`(df, genelist) : undefined columns selected. That makes me think there is something in my genelist that is not a column in my data frame (in my large, real df, not this example). – Laura Chipman Aug 19 '20 at 01:29
  • @LauraChipman This error can happen only when `genelist` has some names that are not in the original data. Try `rowSums(df[intersect(genelist, names(df))])` – akrun Aug 19 '20 at 01:33
  • @LauraChipman What I meant is that you can replicate the same error on the built-in dataset `mtcars`: `mtcars[c('ab', 'cd')]` gives Error in `[.data.frame`(mtcars, c("ab", "cd")) : undefined columns selected – akrun Aug 19 '20 at 01:36
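
To see which names are triggering the "undefined columns selected" error discussed in the comments, one option is to compare `genelist` against `names(df)` with `setdiff()` before subsetting. This is only a sketch on a cut-down copy of the example data; `wb99` is a made-up name that stands in for whatever is missing from the real data frame.

```r
# Cut-down copy of the example data (only the columns needed here).
df <- data.frame(wb02 = c(0, 0, 0, 0, 1, 1),
                 wb03 = c(0, 0, 1, 1, 1, 1),
                 wb06 = c(1, 1, 1, 1, 1, 1))

# 'wb99' is a hypothetical name that is NOT a column of df.
genelist <- c("wb02", "wb03", "wb06", "wb99")

# Names that were requested but do not exist in the data frame.
missing_genes <- setdiff(genelist, names(df))
print(missing_genes)

# Names that can safely be used for subsetting.
present_genes <- intersect(genelist, names(df))
df$sum_genelist <- rowSums(df[present_genes], na.rm = TRUE)
```

Checking `missing_genes` first makes it clear whether names are being dropped silently by `intersect()`, which matters if a typo in `genelist` would otherwise go unnoticed.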
1

You can use any_of to select only those columns that are present in your data.

genelist <- c('wb02', 'wb03', 'wb06', 'a')
library(dplyr)
df %>% mutate(sum_genelist = rowSums(select(., any_of(genelist))))

#  names wb01 wb02 wb03 wb04 wb05 wb06 sum_genelist
#1     a    1    0    0    1    1    1            1
#2     b    1    0    0    1    0    1            1
#3     c    0    0    1    0    1    1            2
#4     d    1    0    1    1    0    1            2
#5     e    1    1    1    1    0    1            3
#6     f    0    1    1    1    1    1            3
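
Assuming dplyr 1.0.0 or later, the same idea can also be written with `across()`, which avoids the `.` placeholder inside `select()`. This is a variation on the answer rather than part of it; as above, the extra name `'a'` in `genelist` is deliberately not a column, and `any_of()` ignores it.

```r
library(dplyr)

# Example data from the question.
df <- data.frame(names = c("a", "b", "c", "d", "e", "f"),
                 wb01 = c(1, 1, 0, 1, 1, 0),
                 wb02 = c(0, 0, 0, 0, 1, 1),
                 wb03 = c(0, 0, 1, 1, 1, 1),
                 wb04 = c(1, 1, 0, 1, 1, 1),
                 wb05 = c(1, 0, 1, 0, 0, 1),
                 wb06 = c(1, 1, 1, 1, 1, 1))

# 'a' is not a column name, so any_of() skips it instead of raising an error.
genelist <- c("wb02", "wb03", "wb06", "a")

# across() returns the selected columns as a data frame; rowSums() adds them per row.
res <- df %>%
  mutate(sum_genelist = rowSums(across(any_of(genelist))))
```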
Ronak Shah