gather multiple times at once

Question

Say I have a data.frame dat like this one

id       X_a      X_b       Y_a      Y_b
v1 -0.012758 0.004537 -0.022725 0.005833
v2 -0.016706 0.003332 -0.006948 0.004965
v3 -0.005629 0.006972 -0.010578 0.006069

I want it to become "gathered". Note that X_a and X_b shall be gathered as well as Y_a and Y_b. The desired result is:

id    X_Y_type         X         Y
v1       X_Y_a -0.012758 -0.022725
v2       X_Y_a -0.016706 -0.006948
v3       X_Y_a -0.005629 -0.010578
v1       X_Y_b  0.004537  0.005833
v2       X_Y_b  0.003332  0.004965
v3       X_Y_b  0.006972  0.006069

gather() applied two times is not what I want:

dat %>%
  gather(X_type, X, X_a:X_b) %>%
  gather(Y_type, Y, Y_a:Y_b)

id X_type      X Y_type         Y
v1 X_a -0.012758    Y_a -0.022725
v2 X_a -0.016706    Y_a -0.006948
v3 X_a -0.005629    Y_a -0.010578
v1 X_b  0.004537    Y_a -0.022725
v2 X_b  0.003332    Y_a -0.006948
v3 X_b  0.006972    Y_a -0.010578
v1 X_a -0.012758    Y_b  0.005833
v2 X_a -0.016706    Y_b  0.004965
v3 X_a -0.005629    Y_b  0.006069
v1 X_b  0.004537    Y_b  0.005833
v2 X_b  0.003332    Y_b  0.004965
v3 X_b  0.006972    Y_b  0.006069

I want a matching of X_a to Y_a and X_b to Y_b, so a fake syntax would be

dat %>%
  gather(X_type = Y_type, list(X, Y), list(X_a:X_b, Y_a:Y_b))

Any hints?

akrun · Accepted Answer · 2015-10-04T14:37:06.133

Using dplyr/tidyr, one option is gather/spread. We convert the 'wide' to 'long' format using all the columns in the key/value argument except the 'id' column, separate the 'key' column into two, reshape from 'long' to wide' with one of the split columns and the 'val', mutate the 'X_Y_type' column by pasteing the substring 'X_Y_', and change the order with `arrange.

library(dplyr)
library(tidyr)
gather(dat, key, val, -id) %>% 
       separate(key, into=c('var1', 'X_Y_type')) %>%
       spread(var1, val) %>% 
       mutate(X_Y_type= paste0('X_Y_', X_Y_type)) %>%
       arrange(X_Y_type)
#  id X_Y_type         X         Y
#1 v1    X_Y_a -0.012758 -0.022725
#2 v2    X_Y_a -0.016706 -0.006948
#3 v3    X_Y_a -0.005629 -0.010578
#4 v1    X_Y_b  0.004537  0.005833
#5 v2    X_Y_b  0.003332  0.004965
#6 v3    X_Y_b  0.006972  0.006069

But, this can be done with melt from data.table. It can take multiple patterns in the measure argument to convert the 'wide' to 'long' format.

library(data.table)#v1.9.6+
DT <- melt(setDT(dat), measure=patterns('^X', '^Y'), 
          value.name=c('X', 'Y'), variable.name='X_Y_type')

If needed, the 'X_Y_type' column values can be changed to the expected output using paste.

DT[, X_Y_type := paste(names(DT)[2],
      unique(sub('.*_','', names(dat)[-1])), sep="_")[X_Y_type]]

DT
#   id   X_Y_type         X         Y
#1: v1 X_Y_type_a -0.012758 -0.022725
#2: v2 X_Y_type_a -0.016706 -0.006948
#3: v3 X_Y_type_a -0.005629 -0.010578
#4: v1 X_Y_type_b  0.004537  0.005833
#5: v2 X_Y_type_b  0.003332  0.004965
#6: v3 X_Y_type_b  0.006972  0.006069

Or using reshape from base R

reshape(dat, idvar='id', varying=2:ncol(dat), sep="_", direction='long')

@Helix123, what does it break? It'd be nice of you to report such things so that they can be fixed.. — Arun, Oct 05 '15 at 11:24
Thank you for the overview of solutions! Did go with `base::reshape` in this case to avoid another dependency. — Helix123, Oct 07 '15 at 11:24
@Arun, code breakage with `data.table` v1.9.6 had already been reported by that time (issue #1361 in data.table's github repo https://github.com/Rdatatable/data.table/issues/1361) — Helix123, Oct 07 '15 at 11:26
@Helix123, thanks. You can turn of auto indexing and use `v1.9.6` until it is fixed. — Arun, Oct 07 '15 at 12:08

gather multiple times at once

1 Answers1