1
obs pre.data1   post.data1  pre.data2   post.data2  pre.data3   post.data3
1   0.40    0.12    0.61    0.15    0.58    0.06
2   0.21    0.05    0.18    0.49    0.35    0.24
3   0.48    0.85    0.00    0.62    0.96    0.37
4   0.66    0.29    0.88    0.56    0.13    0.72
5   0.43    0.23    0.80    0.78    0.05    0.90
6   0.86    0.04    0.25    0.34    0.99    0.79
7   0.57    0.86    0.20    0.34    0.11    0.34
8   0.13    0.39    0.24    0.51    0.79    0.63
9   0.87    0.57    0.00    0.55    0.86    0.72

I've attached an example of my dataset above.

I'm trying to convert from wide to long format with tidyr based off of the prefixes pre and post of each variable. Ideally I would like my data to be in the following format

obs key    data1    data2   data3
1   pre     0.50    0.26    0.12
1   post    0.18    0.50    0.42
2   pre     0.06    0.07    0.47
2   post    0.98    0.87    0.89
3   pre     0.34    0.55    0.72
3   post    0.26    0.99    0.71
4   pre     0.64    0.80    0.54
4   post    0.01    0.36    0.38

I'm fairly new to R but I know it uses the gather function, but I'm only familiar with gather with 2 columns as shown on many tutorial. Is there a simple way of doing the above? Thank you

Michael Luu
  • 365
  • 1
  • 9

2 Answers2

4

We can do this easily with data.table as the melt in data.table (to convert from 'wide' to 'long' format) can take multiple patterns for the measure columns.

library(data.table)
dM <- melt(setDT(df1), measure = patterns("data1$", "data2", "data3"), 
    value.name = c('data1', 'data2', 'data3'), variable.name = 'key')
dM[, key:= unique(sub("\\..*", "", names(df1)[-1]))[key]]
dM[order(obs)]
#    obs  key data1 data2 data3
# 1:   1  pre  0.40  0.61  0.58
# 2:   1 post  0.12  0.15  0.06
# 3:   2  pre  0.21  0.18  0.35
# 4:   2 post  0.05  0.49  0.24
# 5:   3  pre  0.48  0.00  0.96
# 6:   3 post  0.85  0.62  0.37
# 7:   4  pre  0.66  0.88  0.13
# 8:   4 post  0.29  0.56  0.72
# 9:   5  pre  0.43  0.80  0.05
#10:   5 post  0.23  0.78  0.90
#11:   6  pre  0.86  0.25  0.99
#12:   6 post  0.04  0.34  0.79
#13:   7  pre  0.57  0.20  0.11
#14:   7 post  0.86  0.34  0.34
#15:   8  pre  0.13  0.24  0.79
#16:   8 post  0.39  0.51  0.63
#17:   9  pre  0.87  0.00  0.86
#18:   9 post  0.57  0.55  0.72

Or using tidyr, we convert to 'long' format with gather, then separate the 'Var' column in to 'key' and "Var2' and spread it to 'wide' format.

library(dplyr)
library(tidyr)
gather(df1, Var, Val, -obs) %>% 
      separate(Var, into = c("key", "Var2")) %>% 
       mutate(key = factor(key, levels = c("pre", "post"))) %>%
      spread(Var2, Val)
#   obs  key data1 data2 data3
#1    1  pre  0.40  0.61  0.58
#2    1 post  0.12  0.15  0.06
#3    2  pre  0.21  0.18  0.35
#4    2 post  0.05  0.49  0.24
#5    3  pre  0.48  0.00  0.96
#6    3 post  0.85  0.62  0.37
#7    4  pre  0.66  0.88  0.13
#8    4 post  0.29  0.56  0.72
#9    5  pre  0.43  0.80  0.05
#10   5 post  0.23  0.78  0.90
#11   6  pre  0.86  0.25  0.99
#12   6 post  0.04  0.34  0.79
#13   7  pre  0.57  0.20  0.11
#14   7 post  0.86  0.34  0.34
#15   8  pre  0.13  0.24  0.79
#16   8 post  0.39  0.51  0.63
#17   9  pre  0.87  0.00  0.86
#18   9 post  0.57  0.55  0.72

This can be also done using reshape from base R (though, it may need some post-processing to make it tidy)

reshape(df1, idvar="obs", varying = list(2:3, 4:5, 6:7), direction="long")
akrun
  • 674,427
  • 24
  • 381
  • 486
3

In tidyr:

# melt to long form
df %>% gather(var, val, -obs) %>% 
    # separate into key and test labels
    separate(var, c('key', 'var')) %>% 
    # spread to wide form
    spread(var, val) %>%
    # make it pretty so pre is before post for each obs
    arrange(obs, desc(key))

#    obs  key data1 data2 data3
# 1    1  pre  0.40  0.61  0.58
# 2    1 post  0.12  0.15  0.06
# 3    2  pre  0.21  0.18  0.35
# 4    2 post  0.05  0.49  0.24
# 5    3  pre  0.48  0.00  0.96
# 6    3 post  0.85  0.62  0.37
# 7    4  pre  0.66  0.88  0.13
# 8    4 post  0.29  0.56  0.72
# 9    5  pre  0.43  0.80  0.05
# 10   5 post  0.23  0.78  0.90
# 11   6  pre  0.86  0.25  0.99
# 12   6 post  0.04  0.34  0.79
# 13   7  pre  0.57  0.20  0.11
# 14   7 post  0.86  0.34  0.34
# 15   8  pre  0.13  0.24  0.79
# 16   8 post  0.39  0.51  0.63
# 17   9  pre  0.87  0.00  0.86
# 18   9 post  0.57  0.55  0.72
alistaire
  • 38,696
  • 4
  • 60
  • 94