Spread over multiple columns in R - dplyr tidyr solution

Question

I've got monthly year over year data in a long format that I'm trying to spread with two columns. The only examples I've seen include a single key.

> dput(df)
structure(list(ID = c("a", "a", "a", "a", "a", "a", "a", "a", 
"a", "b", "b", "b", "b", "b", "b", "b", "b", "b"), Year = c(2015L, 
2015L, 2015L, 2016L, 2016L, 2016L, 2017L, 2017L, 2017L, 2015L, 
2015L, 2015L, 2016L, 2016L, 2016L, 2017L, 2017L, 2017L), Month = c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L), Value = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 6L, 7L, 8L, 
9L, 10L, 11L, 12L, 13L, 14L)), .Names = c("ID", "Year", "Month", 
"Value"), class = "data.frame", row.names = c(NA, -18L))

I'm trying to get it into a data format with years as columns 2:5, and one row per month per ID

ID  Month   2015    2016    2017
a   1         1     2       3
a   2         1     2       3
a   3         1     2       3
a   1         6     9       12
a   2         7     10      13
a   3         8     11      14

I've tried the following with the following error:

by_month_over_years = spread(df,key = c(Year,Month), Value)
Error: `var` must evaluate to a single number or a column name, not an integer vector

Have you looked at https://stackoverflow.com/questions/30592094/r-spreading-multiple-columns-with-tidyr — André Costa, Aug 14 '18 at 17:43

score 3 · Answer 1 · answered Aug 14 '18 at 17:16

3

library(tidyr)
library(dplyr)
df %>% group_by(ID) %>% spread(Year, Value)

# A tibble: 6 x 5
# Groups:   ID [2]
  ID    Month `2015` `2016` `2017`
  <chr> <int>  <int>  <int>  <int>
 1 a         1      1      2      3
 2 a         2      1      2      3
 3 a         3      1      2      3
 4 b         1      6      9     12
 5 b         2      7     10     13
 6 b         3      8     11     14

answered Aug 14 '18 at 17:16

A. Suliman

11,665
5
16
31

2

I don't think you need the grouping step. Doesn't `spread(df, Year, Value)` do the same thing? – aosmith Aug 14 '18 at 17:35
@aosmith you are correct, Indeed `spread(df, Year, Value)` can do it. `group_by` is just an extra step of caution. – A. Suliman Aug 14 '18 at 17:39

IceCreamToucan · Answer 2 · 2018-08-14T17:23:41.310

3

library(reshape2) # or data.table, for dcast

dcast(df, ID + Month ~ Year)

#   ID Month 2015 2016 2017
# 1  a     1    1    2    3
# 2  a     2    1    2    3
# 3  a     3    1    2    3
# 4  b     1    6    9   12
# 5  b     2    7   10   13
# 6  b     3    8   11   14

edited Aug 14 '18 at 17:23

answered Aug 14 '18 at 17:16

IceCreamToucan

23,575
2
13
25

score 2 · Answer 3 · answered Aug 15 '18 at 01:38

2

Here is a base R option with reshape

reshape(df, idvar = c('ID', 'Month'), direction = 'wide', timevar = 'Year')
#   ID Month Value.2015 Value.2016 Value.2017
#1   a     1          1          2          3
#2   a     2          1          2          3
#3   a     3          1          2          3
#10  b     1          6          9         12
#11  b     2          7         10         13
#12  b     3          8         11         14

answered Aug 15 '18 at 01:38

akrun

674,427
24
381
486

1

really great learning from this command, thanks for sharing ++ve. could you please explain about idvar, direction and timevar, though I checked on google but will be nice if you could add it more explanation here too. – RavinderSingh13 Aug 15 '18 at 07:45
1

@RavinderSingh13 the `idvar` is similar to the one you use in `melt` (`id.var`), `direction` is the direction of transformation (here we change long to 'wide', and `timevar`splits up the 'Value' column based on the variable in 'timevar' – akrun Aug 15 '18 at 12:56

Spread over multiple columns in R - dplyr tidyr solution

3 Answers3