0

I have data in wide format and I want it in long format. I also want to add a new column based on a value in the columns (a/b/c). Would appreciate guidance in achiving this!

id <- 1001:1003
q1a <- c(2,3,2)
q1b <- c(7,9,6)
q1c <- c(6,2,8)
q2a <- c(2,3,7)
q2b <- c(4,5,6)
q2c <- c(6,2,8)

df <- data.frame(id,q1a,q1b,q1c, q2a, q2b, q2c)
df
    id q1a q1b q1c q2a q2b q2c
1 1001   2   7   6   2   4   6
2 1002   3   9   2   3   5   2
3 1003   2   6   8   7   6   8

     id trt q1 q2
1  1001   a  2  2
2  1001   b  7  4
3  1001   c  6  6
4  1001   a  3  3
5  1001   b  9  5
6  1001   c  2  2
7  1002   a  2  7
8  1002   b  6  6
9  1002   c  8  8
10 1002   a  2  2
11 1002   b  7  4
12 1002   c  6  6
13 1003   a  3  3
d.b
  • 29,772
  • 5
  • 24
  • 63
raquel
  • 101
  • 5
  • The rows 10:13 seems to be redundant based on the input example. Also, the 'id' values are not correctly placed for 4 to 9 – akrun Sep 16 '19 at 16:14
  • A few posts that might help: https://stackoverflow.com/q/25925556/5325862 and https://stackoverflow.com/q/56685106/5325862, plus the posts those are marked as duplicates of – camille Sep 16 '19 at 17:01

2 Answers2

4

We can do this with tidyr (version -1.0.0)

library(tidyr)  
pivot_longer(df, -id, names_to = c(".value", "trt"), names_pattern= "(.*\\d+)(\\D+)")
# A tibble: 9 x 4
#     id trt      q1    q2
#  <int> <chr> <dbl> <dbl>
#1  1001 a         2     2
#2  1001 b         7     4
#3  1001 c         6     6
#4  1002 a         3     3
#5  1002 b         9     5
#6  1002 c         2     2
#7  1003 a         2     7
#8  1003 b         6     6
#9  1003 c         8     8
akrun
  • 674,427
  • 24
  • 381
  • 486
  • how should I interpret the characters in names_pattern? Can I read about this somewhere? Cant find anything about it – raquel Sep 17 '19 at 12:08
  • @raquel It is a regex notation. The `(.*\\d+)` is capturing the characters (`.*`) along with one or more digits (`\\d+`) as a group (`(..)`) followed by the non-digits (`\\D+`) as another set of groups i..e `q1a` would be split as `(q1)` and `(a)` – akrun Sep 17 '19 at 15:52
1
library(dplyr)
library(tidyr)
df %>%
    gather(var, val, -id) %>%
    mutate(trt = substring(var, nchar(var), nchar(var)),
           var = substring(var, 1, 2)) %>%
    spread(var, val)
#    id trt q1 q2
#1 1001   a  2  2
#2 1001   b  7  4
#3 1001   c  6  6
#4 1002   a  3  3
#5 1002   b  9  5
#6 1002   c  2  2
#7 1003   a  2  7
#8 1003   b  6  6
#9 1003   c  8  8
d.b
  • 29,772
  • 5
  • 24
  • 63