0

I have a dataset and I want to sum all the columns values after I spread them.

For example, if I have the data.frame:

data.frame(
     country = c('US','US','Brazil','Brazil','Canada'), 
     variable = c('v1','v2','v1','v3','v4'),
     value = c(1,2,3,4,5)
   ) %>%
   spread(variable, value, fill = 0)

It results in:

  country v1 v2 v3 v4
1  Brazil  3  0  4  0
2  Canada  0  0  0  5
3      US  1  2  0  0

I want it to finish like this:

  country v1 v2 v3 v4  total
1  Brazil  3  0  4  0  7
2  Canada  0  0  0  5  5
3      US  1  2  0  0  3

Normally a simple mutate(total = v1 + v2 + v3 + v4) would solve the problem, but, in my case, I have no prior knowledge of the columns names.

How can I create this new column?

Daniel
  • 525
  • 5
  • 23

3 Answers3

2

I would just use rowSums, dropping the first column. But I agree with @MrFlick that it's probably easier to do this before spreading.

data.frame(
     country = c('US','US','Brazil','Brazil','Canada'), 
     variable = c('v1','v2','v1','v3','v4'),
     value = c(1,2,3,4,5)
   ) %>%
   spread(variable, value, fill = 0) %>% 
   mutate(total = rowSums(.[ ,-1]))

  country v1 v2 v3 v4 total
1  Brazil  3  0  4  0     7
2  Canada  0  0  0  5     5
3      US  1  2  0  0     3
Daniel Anderson
  • 2,239
  • 8
  • 23
0

Seems like it would be easier to compute the value while your data is in long format first

dd <- data.frame(
  country = c('US','US','Brazil','Brazil','Canada'), 
  variable = c('v1','v2','v1','v3','v4'),
  value = c(1,2,3,4,5)
)

dd %>% group_by(country) %>% 
  summarize(variable="total", value=sum(value)) %>% 
  rbind(dd) %>% 
  spread(variable, value, fill = 0)
#   country total    v1    v2    v3    v4
# *  <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1  Brazil     7     3     0     4     0
# 2  Canada     5     0     0     0     5
# 3      US     3     1     2     0     0
MrFlick
  • 163,738
  • 12
  • 226
  • 242
0

Lots of ways, you're probably overthinking this :p. It doesn't have to all be on one line, if it's more natural to split it up (which it is IMO).

df <- data.frame(
  country = c('US','US','Brazil','Brazil','Canada'), 
  variable = c('v1','v2','v1','v3','v4'),
  value = c(1,2,3,4,5)
)
results1 <- df %>% group_by(country) %>% summarize(total = sum(value))
results2 <- df %>% spread(variable, value, fill = 0)
results <- results2 %>% full_join(results1, by="country")
thc
  • 8,747
  • 1
  • 18
  • 33