R spreading multiple columns with tidyr

Question

Take this sample data frame:

df <- data.frame(month=rep(1:3,2),
                 student=rep(c("Amy", "Bob"), each=3),
                 A=c(9, 7, 6, 8, 6, 9),
                 B=c(6, 7, 8, 5, 6, 7))

I can use spread from tidyr to change this to wide format.

> df[, -4] %>% spread(student, A)
  month Amy Bob
1     1   9   8
2     2   7   6
3     3   6   9

But how can I spread two value columns, e.g. both A and B, so that the output looks something like this?

  month Amy.A Bob.A Amy.B Bob.B
1     1     9     8     6     5
2     2     7     6     7     6
3     3     6     9     8     7

score 192 · Accepted Answer · edited Nov 14 '19 at 15:10

Here's a simple and very efficient solution using data.table:

library(data.table) ## v >= 1.9.6
dcast(setDT(df), month ~ student, value.var = c("A", "B")) 
#    month A_Amy A_Bob B_Amy B_Bob
# 1:     1     9     8     6     5
# 2:     2     7     6     7     6
# 3:     3     6     9     8     7

Or a possible tidyr solution

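library(tidyr) # gather/unite/spread; tidyr also re-exports %>%
df %>%
  gather(variable, value, -(month:student)) %>%  # stack A and B into key/value pairs
  unite(temp, student, variable) %>%             # build names such as "Amy_A"
  spread(temp, value)                            # one column per student/variable pair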

#   month Amy_A Amy_B Bob_A Bob_B
# 1     1     9     6     8     5
# 2     2     7     7     6     6
# 3     3     6     8     9     7
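If you want the dot-separated names from the question (Amy.A rather than Amy_A), `unite` takes a `sep` argument. This is only a sketch of that variation: the `wide` name and the final reordering step are my own additions for illustration, not part of the answer above.

```r
library(tidyr)  # also re-exports %>%

df <- data.frame(month = rep(1:3, 2),
                 student = rep(c("Amy", "Bob"), each = 3),
                 A = c(9, 7, 6, 8, 6, 9),
                 B = c(6, 7, 8, 5, 6, 7))

wide <- df %>%
  gather(variable, value, A, B) %>%           # stack A and B into key/value pairs
  unite(temp, student, variable, sep = ".") %>%  # "Amy.A", "Amy.B", ...
  spread(temp, value)                         # columns come out in alphabetical order

# Optional: reorder to the layout asked for in the question
wide <- wide[, c("month", "Amy.A", "Bob.A", "Amy.B", "Bob.B")]
wide
```

`spread` sorts the new columns alphabetically, which is why the explicit reordering is needed if the A columns should come before the B columns.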

EDIT 22/10/2019

As @gjabel mentioned in the comments, newer tidyr versions (v1.0.0+) provide the pivot_wider and pivot_longer functions (currently in the maturing lifecycle stage), so a newer approach would be:

pivot_wider(data = df, 
            id_cols = month, 
            names_from = student, 
            values_from = c("A", "B"))
# # A tibble: 3 x 5
#     month A_Amy A_Bob B_Amy B_Bob
#     <int> <dbl> <dbl> <dbl> <dbl>
#   1     1     9     8     6     5
#   2     2     7     6     7     6
#   3     3     6     9     8     7
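To get the exact column names requested in the question (student first, dot-separated), `pivot_wider` has a `names_glue` argument. A minimal sketch, assuming tidyr 1.1.0 or later (as far as I know `names_glue` is not available in 1.0.0); the `wide` name is just for illustration.

```r
library(tidyr)  # names_glue assumes tidyr >= 1.1.0

df <- data.frame(month = rep(1:3, 2),
                 student = rep(c("Amy", "Bob"), each = 3),
                 A = c(9, 7, 6, 8, 6, 9),
                 B = c(6, 7, 8, 5, 6, 7))

wide <- pivot_wider(df,
                    id_cols = month,
                    names_from = student,
                    values_from = c(A, B),
                    # {student} is the names_from column, {.value} is "A" or "B"
                    names_glue = "{student}.{.value}")

names(wide)
# "month" "Amy.A" "Bob.A" "Amy.B" "Bob.B"
```

Only the names change; the column order stays the same as in the default output (all A columns, then all B columns).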

edited Nov 14 '19 at 15:10 by steveb
answered Jun 02 '15 at 09:31 by David Arenburg

I have the same problem, but I have multiple entries of student, A, and B for some months. The code gives the following error: Error: Duplicate identifiers for rows. Please help. – Polar Bear Aug 20 '16 at 09:42

@PolarBear How do you want to handle dupes? You want to sum? mean? Try the `data.table` solution and add `fun.aggregate = sum` into `dcast` – David Arenburg Aug 20 '16 at 17:51
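A sketch of what adding `fun.aggregate = sum` to the `dcast` call could look like. The `df_dup` data below is made up for illustration (Amy has two rows for month 1); it is not the commenter's data.

```r
library(data.table)  # multiple value.var needs v >= 1.9.6

# Hypothetical data with a duplicated (month, student) pair
df_dup <- data.frame(month = c(1, 1, 2, 1, 2),
                     student = c("Amy", "Amy", "Amy", "Bob", "Bob"),
                     A = c(9, 7, 6, 8, 6),
                     B = c(6, 8, 7, 5, 6))

# Duplicates within each month/student cell are collapsed with sum()
wide_sum <- dcast(setDT(df_dup), month ~ student,
                  value.var = c("A", "B"),
                  fun.aggregate = sum)
wide_sum
```

Swapping `sum` for `mean` or `median` changes how the duplicates are collapsed.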
I want to take median of the dupes with the help of tidyr – Polar Bear Aug 21 '16 at 16:36

@PolarBear `spread` and `gather` weren't designed to apply functions. You would probably need to use `dplyr` for that. Or you could just use `dcast` as I've suggested above. Or you could post a new question if you feel strongly about it. – David Arenburg Aug 21 '16 at 16:39
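For the median case, one way the `dplyr` route could look is to collapse the duplicates first and reshape afterwards. This is only a sketch: `df_dup` is made-up data, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with a duplicated (month, student) pair
df_dup <- data.frame(month = c(1, 1, 2, 1, 2),
                     student = c("Amy", "Amy", "Amy", "Bob", "Bob"),
                     A = c(9, 7, 6, 8, 6),
                     B = c(6, 8, 7, 5, 6))

wide_median <- df_dup %>%
  group_by(month, student) %>%
  # one row per month/student, taking the median of any duplicates
  summarise(A = median(A), B = median(B), .groups = "drop") %>%
  pivot_wider(id_cols = month,
              names_from = student,
              values_from = c(A, B))
wide_median
```

Once each month/student pair is unique, the reshaping step no longer hits the duplicate identifiers error.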

I did a benchmark for these: https://stackoverflow.com/a/54889598/2563804 – hplieninger Feb 26 '19 at 16:06

`pivot_wider(data = df, id_cols = month, names_from = student, values_from = c("A", "B"))` should work in tidyr 1.0.0 or above – guyabel Oct 17 '19 at 05:11
@gjabel I eventually decided to add it as an edit (with credit to you), as it seems to be very hard to find in the dupe. Thanks – David Arenburg Oct 22 '19 at 13:23

`pivot_wider` also works without quotation marks for variable names (in this case A and B), i.e. `pivot_wider(data = df, id_cols = month, names_from = student, values_from = c(A, B))` – jlp Apr 02 '20 at 22:41
