Reshape data wide-to-long, preserve variable order in `varying`

Question

Data:

new_data <- structure(list(Day = 1:13, Morning_1_id = structure(1:13, .Label = c("20180502-033-000005", 
"20180503-033-000005", "20180507-033-000006", "20180508-033-000005", 
"20180510-033-000005", "20180511-033-000005", "20180514-033-000005", 
"20180516-033-000005", "20180517-033-000001", "20180518-033-000005", 
"20180521-033-000006", "20180522-033-000005", "20180523-033-000005"
), class = "factor"), W = c(26.3666666666667, 26.4433333333333, 
26.2, 26.2866666666667, 26.43, 25.8733333333333, 26.64, 26.5233333333333, 
27.27, 26.6, 26.6966666666667, 26.27, 26.24), R = c(5.87258333333333, 
5.84598, 5.92537333333333, 6.02874666666667, 5.99018666666667, 
5.88347333333333, 5.25210666666667, 5.88159666666667, 5.87579333333333, 
5.92004, 5.68929, 5.89672, 5.93005), Morning_2_id = structure(1:13, .Label = c("20180502-033-000006", 
"20180503-033-000006", "20180507-033-000007", "20180508-033-000006", 
"20180510-033-000006", "20180511-033-000006", "20180514-033-000006", 
"20180516-033-000006", "20180517-033-000002", "20180518-033-000006", 
"20180521-033-000007", "20180522-033-000006", "20180523-033-000006"
), class = "factor"), W1 = c(26.3066666666667, 26.7233333333333, 
25.7866666666667, 27.12, 26.09, 25.82, 27, 26.2166666666667, 
26.5066666666667, 26.7233333333333, 26.8766666666667, 26.1733333333333, 
26.28), R1 = c(5.74259666666667, 5.91224, 5.85586333333333, 5.99682, 
5.99842333333333, 5.28803333333333, 5.88124333333333, 5.85363, 
5.85148333333333, 5.68396333333333, 5.68045666666667, 5.95528, 
5.84653666666667), Afternoon_1_id = structure(1:13, .Label = c("20180502-033-000024", 
"20180503-033-000015", "20180507-033-000020", "20180508-033-000020", 
"20180510-033-000011", "20180511-033-000017", "20180514-033-000011", 
"20180516-033-000012", "20180517-033-000012", "20180518-033-000011", 
"20180521-033-000012", "20180522-033-000011", "20180523-033-000011"
), class = "factor"), W2 = c(27.0733333333333, 26.2233333333333, 
26.4533333333333, 26.4166666666667, 26.0966666666667, 26.5833333333333, 
26.6266666666667, 26.2766666666667, 26.39, 25.5633333333333, 
25.1866666666667, 26.89, 25.17), R2 = c(5.95638, 5.97475666666667, 
5.78408, 5.91546333333333, 5.73866333333333, 5.79964666666667, 
5.87522333333333, 5.53540333333333, 5.85597666666667, 5.75941666666667, 
5.88696333333333, 5.56677, 5.50966666666667), Afternoon_2_id = structure(1:13, .Label = c("20180502-033-000025", 
"20180503-033-000016", "20180507-033-000021", "20180508-033-000021", 
"20180510-033-000012", "20180511-033-000018", "20180514-033-000012", 
"20180516-033-000014", "20180517-033-000014", "20180518-033-000012", 
"20180521-033-000013", "20180522-033-000012", "20180523-033-000012"
), class = "factor"), W3 = c(26.2233333333333, 26.1266666666667, 
25.7733333333333, 26.7933333333333, 26.8166666666667, 26.6633333333333, 
26.45, 25.7833333333333, 26.18, 26.9433333333333, 26.4666666666667, 
26.78, 26.3666666666667), R3 = c(5.83166, 5.88337, 5.93851, 5.96334666666667, 
5.83277, 5.92955, 5.92999333333333, 5.78252333333333, 5.79061666666667, 
5.61290333333333, 5.88305333333333, 5.88644666666667, 5.79076
)), class = "data.frame", row.names = c(NA, 13L))

I want to convert this data from wide to long format (preferably in base R) so that the ids and the values of 'W' and 'R' are stacked by day.

I use the reshape function as follows:

mydata<- reshape(new_data, direction='long', 
                 varying = c('Morning_1_id', 'W', 'R', 
                             'Morning_2_id', 'W1', 'R1', 
                             'Afternoon_1_id', 'W2', 'R2',
                             'Afternoon_2_id', 'W3', 'R3'), 
                 v.names = c('TId', 'W', 'R'),
                 timevar = c('W', 'R'), # differentiates
                 times = c('Morning1', 'Morning2', 'Afternoon1', 'Afternoon2'),
                 idvar = 'Day')

This changes the order of the columns: the column names no longer match the values they contain. I want to correct this before moving on to further processing.

What is the correct way to carry this out?

`reshape(new_data,matrix(2:ncol(new_data),3),idvar=1,dir="long",v.names = c('TestId', 'WBC', 'RBC'), times = c('Morning1', 'Morning2', 'Afternoon1', 'Afternoon2'))` — Onyambu, Aug 11 '18 at 00:59

score 3 · Answer 1 · edited Aug 15 '18 at 06:15


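The issue is in the `varying` argument. Pass it as a list with one vector of column positions per long-format variable (`TId`, `W`, `R`):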

varlist <- lapply(2:4, function(x) seq(x, ncol(new_data), by = 3))
out <- reshape(new_data, direction='long', varying=varlist, 
         v.names = c('TId', 'W', 'R'),
         times = c('Morning1', 'Morning2', 'Afternoon1', 'Afternoon2'), 
         idvar = 'Day')

head(out, 3)

#           Day     time                 TId        W        R
#1.Morning1   1 Morning1 20180502-033-000005 26.36667 5.872583
#2.Morning1   2 Morning1 20180503-033-000005 26.44333 5.845980
#3.Morning1   3 Morning1 20180507-033-000006 26.20000 5.925373
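To see what the list form does, it can help to look at which columns each element of `varlist` selects, and to compare that with how base `split()` orders its groups. The sketch below uses only the column names from the question; `cols`, `grouped` and `flat_groups` are names introduced here for illustration. That `reshape()` groups a flat `varying` vector in a `split()`-like way is an assumption about its internals, not something stated in this answer.

```r
# Column names of the wide data, in the order given in the question
cols <- c("Day",
          "Morning_1_id", "W", "R",
          "Morning_2_id", "W1", "R1",
          "Afternoon_1_id", "W2", "R2",
          "Afternoon_2_id", "W3", "R3")

# Same construction as above: one vector of positions per long-format column
varlist <- lapply(2:4, function(x) seq(x, length(cols), by = 3))
grouped <- lapply(varlist, function(i) cols[i])
names(grouped) <- c("TId", "W", "R")
print(grouped)

# For comparison: split() returns its groups sorted by label,
# not in the order in which the labels were supplied
flat_groups <- split(cols[-1], rep(c("TId", "W", "R"), times = 4))
print(names(flat_groups))
```

If labels were then applied positionally in the order `TId`, `W`, `R` to groups sorted like `flat_groups`, the `R` columns would end up under the `TId` label, which would be consistent with the mislabelled columns described in the question. An explicit list keeps each group attached to the intended name.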

answered Aug 10 '18 at 22:57 by akrun · edited Aug 15 '18 at 06:15 by Uwe

@VisheshShrivastav change the variable names in all post or none (i.e. rollback the change here). It's looking odd now! – wp78de Aug 13 '18 at 21:00
Changed in all posts. – Vishesh Shrivastav Aug 13 '18 at 22:32

score 2 · Accepted Answer · edited Aug 15 '18 at 06:15

Since your variables have different names, you have to specify them in the order you need. If the names differed only in their numbering, we could have used the `sep` and/or `split` arguments of `reshape` instead. Here you just need to pass `varying` as a matrix of column positions, as below:

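# varying as a 3-row matrix of column positions: one row per long variable
mydata <- reshape(new_data, direction = "long",
                  varying = matrix(2:ncol(new_data), nrow = 3),
                  idvar = "Day",
                  v.names = c('TId', 'W', 'R'),
                  times = c('Morning1', 'Morning2', 'Afternoon1', 'Afternoon2'))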
head(mydata)
              Day       time                 TId        W        R
1.Morning1      1   Morning1 20180502-033-000005 26.36667 5.872583
2.Morning1      2   Morning1 20180503-033-000005 26.44333 5.845980
3.Morning1      3   Morning1 20180507-033-000006 26.20000 5.925373
4.Morning1      4   Morning1 20180508-033-000005 26.28667 6.028747
5.Morning1      5   Morning1 20180510-033-000005 26.43000 5.990187
6.Morning1      6   Morning1 20180511-033-000005 25.87333 5.883473
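The `sep` route mentioned above only applies when the wide columns share a stem and differ in a suffix. A minimal sketch on a made-up data frame follows; `toy` and its columns `W_1`, `R_1`, `W_2`, `R_2` are hypothetical and are not the question's data.

```r
# Hypothetical wide data: column names follow <variable>_<time>
toy <- data.frame(id  = 1:3,
                  W_1 = c(26.1, 26.4, 26.2), R_1 = c(5.8, 5.9, 6.0),
                  W_2 = c(26.3, 26.7, 25.8), R_2 = c(5.7, 5.9, 5.8))

# With a consistent separator, reshape() can derive the long-format
# variable names and the time values from the column names
toy_long <- reshape(toy, direction = "long",
                    varying = c("W_1", "R_1", "W_2", "R_2"),
                    sep = "_", idvar = "id")
print(toy_long)
```

The question's columns (`W`, `W1`, `Morning_1_id`, ...) do not follow such a scheme, which is why the positions have to be spelled out there.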

Uwe · Answer 3 · 2018-08-14T21:20:25.887

For the sake of completeness, the melt() function from the data.table package can reshape multiple measurement columns simultaneously. In addition, it allows the columns to be specified by regular expressions, which saves a lot of typing:

library(data.table)
melt(setDT(new_data), measure.vars = patterns("id$", "^W", "^R"),
     value.name = c("TId", "W", "R"))

    Day variable                 TId        W        R
 1:   1        1 20180502-033-000005 26.36667 5.872583
 2:   2        1 20180503-033-000005 26.44333 5.845980
 3:   3        1 20180507-033-000006 26.20000 5.925373
 4:   4        1 20180508-033-000005 26.28667 6.028747
 5:   5        1 20180510-033-000005 26.43000 5.990187
 6:   6        1 20180511-033-000005 25.87333 5.883473
 7:   7        1 20180514-033-000005 26.64000 5.252107
 8:   8        1 20180516-033-000005 26.52333 5.881597
 9:   9        1 20180517-033-000001 27.27000 5.875793
10:  10        1 20180518-033-000005 26.60000 5.920040
11:  11        1 20180521-033-000006 26.69667 5.689290
12:  12        1 20180522-033-000005 26.27000 5.896720
13:  13        1 20180523-033-000005 26.24000 5.930050
14:   1        2 20180502-033-000006 26.30667 5.742597
15:   2        2 20180503-033-000006 26.72333 5.912240
...
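To check which columns each regular expression picks up, the patterns can be tried against the column names with base `grep()`. This only illustrates the regex matching, not how data.table implements `patterns()`; `cols`, `pats` and `matched` are names introduced here.

```r
# Column names of the wide data, in the order given in the question
cols <- c("Day",
          "Morning_1_id", "W", "R",
          "Morning_2_id", "W1", "R1",
          "Afternoon_1_id", "W2", "R2",
          "Afternoon_2_id", "W3", "R3")

# One pattern per long-format column
pats <- c(TId = "_id$", W = "^W", R = "^R")

# For each pattern, list the column names it matches
matched <- lapply(pats, function(p) grep(p, cols, value = TRUE))
print(matched)
```

Each pattern matches four columns, in their original order. The unanchored `"id$"` used in the call above selects the same four id columns for this data.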

If required, the factor column "variable" can be replaced by a time column to get the same result as reshape:

melt(setDT(new_data), measure.vars = patterns("_id$", "^W", "^R"),
     value.name = c("TId", "W", "R"), variable.name = "time")[
       , time := c('Morning1', 'Morning2', 'Afternoon1', 'Afternoon2')[time]][]

    Day       time                 TId        W        R
 1:   1   Morning1 20180502-033-000005 26.36667 5.872583
 2:   2   Morning1 20180503-033-000005 26.44333 5.845980
 3:   3   Morning1 20180507-033-000006 26.20000 5.925373
 4:   4   Morning1 20180508-033-000005 26.28667 6.028747

Here, the time values are given explicitly (like the times = parameter in reshape). Alternatively, the time values can be created automatically by pattern matching on the column names and extracting the matching part:

melt(setDT(new_data), measure.vars = patterns("_id$", "^W", "^R"),
     value.name = c("TId", "W", "R"), variable.name = "time")[
       , time := na.omit(stringr::str_extract(names(new_data), ".*(?=_id$)"))[time]][]

    Day        time                 TId        W        R
 1:   1   Morning_1 20180502-033-000005 26.36667 5.872583
 2:   2   Morning_1 20180503-033-000005 26.44333 5.845980
 3:   3   Morning_1 20180507-033-000006 26.20000 5.925373
 4:   4   Morning_1 20180508-033-000005 26.28667 6.028747

Here, the column names of new_data are searched for entries that end in "_id", and the preceding part of each matching name is extracted.
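The same labels can also be derived without stringr. The following is a base R sketch of the extraction step only, offered as an alternative rather than as the method used above; `cols`, `id_cols` and `time_labels` are names introduced here.

```r
# Column names of the wide data, in the order given in the question
cols <- c("Day",
          "Morning_1_id", "W", "R",
          "Morning_2_id", "W1", "R1",
          "Afternoon_1_id", "W2", "R2",
          "Afternoon_2_id", "W3", "R3")

# Keep only the names ending in "_id", then strip that suffix
id_cols <- grep("_id$", cols, value = TRUE)
time_labels <- sub("_id$", "", id_cols)
print(time_labels)
```

Indexing `time_labels` by the `time` factor, as in the call above, would then map each factor level to its label.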
