R - Sum columns after spread without knowing column names

Question

I have a dataset and I want to sum all the columns values after I spread them.

For example, if I have the data.frame:

data.frame(
     country = c('US','US','Brazil','Brazil','Canada'), 
     variable = c('v1','v2','v1','v3','v4'),
     value = c(1,2,3,4,5)
   ) %>%
   spread(variable, value, fill = 0)

It results in:

  country v1 v2 v3 v4
1  Brazil  3  0  4  0
2  Canada  0  0  0  5
3      US  1  2  0  0

I want it to finish like this:

  country v1 v2 v3 v4  total
1  Brazil  3  0  4  0  7
2  Canada  0  0  0  5  5
3      US  1  2  0  0  3

Normally a simple mutate(total = v1 + v2 + v3 + v4) would solve the problem, but, in my case, I have no prior knowledge of the columns names.

How can I create this new column?

score 2 · Answer 1 · answered Mar 12 '18 at 19:37

I would just use rowSums, dropping the first column. But I agree with @MrFlick that it's probably easier to do this before spreading.

data.frame(
     country = c('US','US','Brazil','Brazil','Canada'), 
     variable = c('v1','v2','v1','v3','v4'),
     value = c(1,2,3,4,5)
   ) %>%
   spread(variable, value, fill = 0) %>% 
   mutate(total = rowSums(.[ ,-1]))

  country v1 v2 v3 v4 total
1  Brazil  3  0  4  0     7
2  Canada  0  0  0  5     5
3      US  1  2  0  0     3

score 0 · Answer 2 · answered Mar 12 '18 at 19:33

Seems like it would be easier to compute the value while your data is in long format first

dd <- data.frame(
  country = c('US','US','Brazil','Brazil','Canada'), 
  variable = c('v1','v2','v1','v3','v4'),
  value = c(1,2,3,4,5)
)

dd %>% group_by(country) %>% 
  summarize(variable="total", value=sum(value)) %>% 
  rbind(dd) %>% 
  spread(variable, value, fill = 0)
#   country total    v1    v2    v3    v4
# *  <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1  Brazil     7     3     0     4     0
# 2  Canada     5     0     0     0     5
# 3      US     3     1     2     0     0

score 0 · Accepted Answer · answered Mar 12 '18 at 19:35

Lots of ways, you're probably overthinking this :p. It doesn't have to all be on one line, if it's more natural to split it up (which it is IMO).

df <- data.frame(
  country = c('US','US','Brazil','Brazil','Canada'), 
  variable = c('v1','v2','v1','v3','v4'),
  value = c(1,2,3,4,5)
)
results1 <- df %>% group_by(country) %>% summarize(total = sum(value))
results2 <- df %>% spread(variable, value, fill = 0)
results <- results2 %>% full_join(results1, by="country")

R - Sum columns after spread without knowing column names

3 Answers3