Reshape data frame from wide to long format using extract()

Question

I'm trying to add an "Error" column next to my Measurements using extract(). However, I think I'm getting hung up with the regex and/or extract() syntax. Would appreciate some help.

Ideally, I should end up with a long-format data frame with these columns:

Reading Category Measurement Error Sample

Reproducible code:

Reading <- c(1,2,3,4)
Cat1 <- runif(4)*10
Cat1_err <- runif(4)/10
Cat2 <- runif(4)*10
Cat2_err <- runif(4)/10
Cat3 <- runif(4)*10
Cat3_err <- runif(4)/10
Sample <- c("X14","X23","X11","X10")
df_wide <- data.frame(Reading,Cat1,Cat1_err,Cat2,Cat2_err,Cat3,Cat3_err,Sample)
df_wide
  Reading     Cat1   Cat1_err     Cat2   Cat2_err     Cat3   Cat3_err Sample
1       1 7.375116 0.01014747 2.234376 0.08978868 5.373709 0.02245759    X14
2       2 5.097937 0.07036843 5.691806 0.05561866 1.823026 0.07658357    X23
3       3 2.034116 0.01689391 8.192971 0.03844054 4.242167 0.01036751    X11
4       4 9.129536 0.09130868 5.908125 0.05505775 5.747843 0.05774527    X10

df_long <- df_wide %>% 
  gather(key=Category, value=Measurement, Cat1:Cat3_err, factor_key = TRUE) %>%
  extract(Measurement,c("Meas","Error"),"Cat\d_err", remove=FALSE)


Error in names(l) <- enc2utf8(into) : 
  'names' attribute [2] must be the same length as the vector [0]

This error is easy to fix: double the backslash, `\\d`. However, it won't fix the whole solution. — Wiktor Stribiżew, Oct 12 '17 at 20:42
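The error message is consistent with a regex that contains no capture groups: extract() expects one parenthesised group per name in `into`, and it should be pointed at the column holding the names (Category), not the values (Measurement). A minimal sketch on a toy column follows; the data frame `toy` and the names `Cat` and `Suffix` are made up for illustration.

```r
library(tidyr)

# Hypothetical toy data standing in for the gathered Category column
toy <- data.frame(Category = c("Cat1", "Cat1_err", "Cat2", "Cat2_err"),
                  stringsAsFactors = FALSE)

# Two names in `into`, so the regex carries two capture groups.
# "\\d" in an R string literal is the regex \d (one digit).
out <- extract(toy, Category, into = c("Cat", "Suffix"),
               regex = "(Cat\\d)_?(err)?")
out
```

Rows without an `_err` suffix have nothing to put in the second group, so they still need separate handling before spreading.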

score 1 · Accepted Answer · answered Oct 12 '17 at 20:54

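I don't think you want to use extract() here. separate() and spread() are probably what you want. The following works, but it generates a warning because the plain Cat1, Cat2 and Cat3 names contain no "_" to split on, so Type is filled with NA for those rows.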

library(tidyverse)

df_long <- df_wide %>% 
  gather(key=Category, value=Measurement, Cat1:Cat3_err, factor_key = TRUE) %>%
  separate(Category, into = c("Category", "Type")) %>%
  mutate(Type = ifelse(is.na(Type), "Measurement", "Error")) %>%
  spread(Type, Measurement) %>%
  select(Reading, Category, Measurement, Error, Sample)
df_long
   Reading Category Measurement       Error Sample
1        1     Cat1   0.8453114 0.074961215    X14
2        1     Cat2   4.5962112 0.059012908    X14
3        1     Cat3   5.4100838 0.076049726    X14
4        2     Cat1   4.5956145 0.016215603    X23
5        2     Cat2   1.7768868 0.040258838    X23
6        2     Cat3   1.9597101 0.027356213    X23
7        3     Cat1   1.6204584 0.057760820    X11
8        3     Cat2   4.9478913 0.054855327    X11
9        3     Cat3   2.9670444 0.004276482    X11
10       4     Cat1   0.1831593 0.038415489    X10
11       4     Cat2   2.5716471 0.024932980    X10
12       4     Cat3   8.5517659 0.015378512    X10
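In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same reshape could be sketched as below, assuming dplyr 1.0.0 or later for rename_with(). The seed is an arbitrary choice so that the toy data is reproducible; the columns are first renamed so that every name has the form `Category_Type`.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # arbitrary seed, only so the example is reproducible
df_wide <- data.frame(
  Reading = 1:4,
  Cat1 = runif(4) * 10, Cat1_err = runif(4) / 10,
  Cat2 = runif(4) * 10, Cat2_err = runif(4) / 10,
  Cat3 = runif(4) * 10, Cat3_err = runif(4) / 10,
  Sample = c("X14", "X23", "X11", "X10")
)

df_long <- df_wide %>%
  # Cat1 -> Cat1_Measurement, Cat1_err -> Cat1_Error, and so on
  rename_with(~ paste0(.x, "_Measurement"), matches("^Cat\\d$")) %>%
  rename_with(~ sub("_err$", "_Error", .x), ends_with("_err")) %>%
  # ".value" turns the part after "_" into the output column names
  pivot_longer(
    cols = starts_with("Cat"),
    names_to = c("Category", ".value"),
    names_sep = "_"
  ) %>%
  select(Reading, Category, Measurement, Error, Sample)
df_long
```

Because every column name now contains exactly one "_", no pieces are missing and no NA placeholder has to be recoded.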

I was going off of https://stackoverflow.com/questions/25925556/gather-multiple-sets-of-columns. With your solution I get this warning message: "Too few values at 12 locations: 1, 2, 3, 4, 9, 10, 11, 12, 17, 18, 19, 20". Any idea what it means? — val, Oct 12 '17 at 21:00
@val It means not all rows have the pattern "_", so `NA` is created for those rows. The warning message should not affect what you want to do. — www, Oct 12 '17 at 21:02
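If the warning is distracting, one option is to tell separate() explicitly how to treat values with too few pieces, using its `fill` argument. A minimal sketch on made-up data (the data frame `toy` and its values are hypothetical):

```r
library(tidyr)

# Hypothetical two-row stand-in for the gathered data
toy <- data.frame(Category = c("Cat1", "Cat1_err"),
                  Measurement = c(7.4, 0.01),
                  stringsAsFactors = FALSE)

# fill = "right" pads the missing right-hand piece with NA and does not warn;
# sep = "_" makes the split character explicit
out <- separate(toy, Category, into = c("Category", "Type"),
                sep = "_", fill = "right")
out
```

Type is then NA for "Cat1" and "err" for "Cat1_err", which is what the mutate() step in the answer relies on.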

score 1 · Answer 2 · answered Oct 12 '17 at 21:03

There might be a quicker way to do this, but the following seems to do what you're looking for:

library(tidyverse)

df_wide %>% 
  gather(key=Category, value=Measurement, Cat1:Cat3_err, factor_key = TRUE) %>%
  extract(Category,c("Meas","Error"),"(Cat\\d)[_]*([a-z]*)")  %>% 
  spread(key = Error, value = Measurement)

Note, among other things, that the regex `\d` has to be written as "\\d" in R, because the backslash itself must be escaped inside an R string literal.
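The escaping can be checked in base R without any packages. This is a small sketch; the variable name `pattern` is arbitrary.

```r
# The string literal "Cat\\d_err" holds a single backslash
pattern <- "Cat\\d_err"

n_chars <- nchar(pattern)   # counts C a t \ d _ e r r
cat(pattern, "\n")          # shows the string as the regex engine sees it

# \d matches one digit, so only the "_err" name matches
hits <- grepl(pattern, c("Cat1", "Cat1_err"))
hits
```

Writing "Cat\d_err" with a single backslash is rejected by the R parser, since `\d` is not a recognised string escape.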
