2

I've got monthly year over year data in a long format that I'm trying to spread with two columns. The only examples I've seen include a single key.

> dput(df)
structure(list(ID = c("a", "a", "a", "a", "a", "a", "a", "a", 
"a", "b", "b", "b", "b", "b", "b", "b", "b", "b"), Year = c(2015L, 
2015L, 2015L, 2016L, 2016L, 2016L, 2017L, 2017L, 2017L, 2015L, 
2015L, 2015L, 2016L, 2016L, 2016L, 2017L, 2017L, 2017L), Month = c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L), Value = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 6L, 7L, 8L, 
9L, 10L, 11L, 12L, 13L, 14L)), .Names = c("ID", "Year", "Month", 
"Value"), class = "data.frame", row.names = c(NA, -18L))

I'm trying to get it into a data format with years as columns 2:5, and one row per month per ID

ID  Month   2015    2016    2017
a   1         1     2       3
a   2         1     2       3
a   3         1     2       3
a   1         6     9       12
a   2         7     10      13
a   3         8     11      14

I've tried the following with the following error:

by_month_over_years = spread(df,key = c(Year,Month), Value)
Error: `var` must evaluate to a single number or a column name, not an integer vector
longlivebrew
  • 281
  • 3
  • 15

3 Answers3

3
library(tidyr)
library(dplyr)
df %>% group_by(ID) %>% spread(Year, Value)

# A tibble: 6 x 5
# Groups:   ID [2]
  ID    Month `2015` `2016` `2017`
  <chr> <int>  <int>  <int>  <int>
 1 a         1      1      2      3
 2 a         2      1      2      3
 3 a         3      1      2      3
 4 b         1      6      9     12
 5 b         2      7     10     13
 6 b         3      8     11     14
A. Suliman
  • 11,665
  • 5
  • 16
  • 31
  • 2
    I don't think you need the grouping step. Doesn't `spread(df, Year, Value)` do the same thing? – aosmith Aug 14 '18 at 17:35
  • @aosmith you are correct, Indeed `spread(df, Year, Value)` can do it. `group_by` is just an extra step of caution. – A. Suliman Aug 14 '18 at 17:39
3
library(reshape2) # or data.table, for dcast

dcast(df, ID + Month ~ Year)

#   ID Month 2015 2016 2017
# 1  a     1    1    2    3
# 2  a     2    1    2    3
# 3  a     3    1    2    3
# 4  b     1    6    9   12
# 5  b     2    7   10   13
# 6  b     3    8   11   14
IceCreamToucan
  • 23,575
  • 2
  • 13
  • 25
2

Here is a base R option with reshape

reshape(df, idvar = c('ID', 'Month'), direction = 'wide', timevar = 'Year')
#   ID Month Value.2015 Value.2016 Value.2017
#1   a     1          1          2          3
#2   a     2          1          2          3
#3   a     3          1          2          3
#10  b     1          6          9         12
#11  b     2          7         10         13
#12  b     3          8         11         14
akrun
  • 674,427
  • 24
  • 381
  • 486
  • 1
    really great learning from this command, thanks for sharing ++ve. could you please explain about idvar, direction and timevar, though I checked on google but will be nice if you could add it more explanation here too. – RavinderSingh13 Aug 15 '18 at 07:45
  • 1
    @RavinderSingh13 the `idvar` is similar to the one you use in `melt` (`id.var`), `direction` is the direction of transformation (here we change long to 'wide', and `timevar`splits up the 'Value' column based on the variable in 'timevar' – akrun Aug 15 '18 at 12:56