1

I have the following data frame:

dat <- data.frame(
    c = c(1 , 2) , a1 = c(1 , 2) , a2 = c(3 , 4) , b1 = c(5 , 6) , b2 = c(7 , 8)
)
  c a1 a2 b1 b2
1 1  1  3  5  7
2 2  2  4  6  8

That I would like to merged columns based on shared prefixes to become this data frame:

dat2 <- data.frame(
    c = c(1 , 2 , 1 , 2) , a = c(1 , 2 , 3 , 4) , b = c(5 , 6 , 7 , 8)
)
  c a b
1 1 1 5
2 2 2 6
3 1 3 7
4 2 4 8

The only way I can think of is to try do this using melt(). This was my attempt:

melt(dat , measure.vars = c(grep("^a" , colnames(dat)) , grep("^b" , colnames(dat))))
    variable value
1 1       a1     1
2 2       a1     2
3 1       a2     3
4 2       a2     4
5 1       b1     5
6 2       b1     6
7 1       b2     7
8 2       b2     8
>

Needless to say, this is incorrect.

mnosefish
  • 369
  • 2
  • 9
  • I think `melt` was an excellent start. Now you just need to read the help page for `?dcast` and complete the process. – IRTFM Apr 22 '16 at 00:45
  • Two things: using `dcast(data = dat , formula = value ~ variable) ` will get me back to where I was (In theory, I think, though it's throwing an error saying it can't find `value` for some reason, but also, I can't seem to figure out how to strip the numbers off of the `variable`s. – mnosefish Apr 22 '16 at 01:01

3 Answers3

4

This is a case where base R's reshape is actually perfectly suited.

reshape(dat, idvar="c", direction="long", sep="", varying=-1, timevar=NULL)

#    c a b
#1.1 1 1 5
#2.1 2 2 6
#1.2 1 3 7
#2.2 2 4 8

The sep="" essentially tells reshape() there is nothing between the group identifier (a and b in this case) and the time indicator - (1 and 2 in this case) in your variable names. So all the renaming is taken care of automagically.

It might be more obvious if I don't set the timevar=NULL as well:

reshape(dat, idvar="c", direction="long", sep="", varying=-1)

#    c time a b
#1.1 1    1 1 5
#2.1 2    1 2 6
#1.2 1    2 3 7
#2.2 2    2 4 8

If you have many id variables that you wish to keep constant for your other melted data, then try this code:

# an example bit of data
dat2 <- cbind(x=1:2,y=2:3,z=3:4, dat)
dat2

#  x y z c a1 a2 b1 b2
#1 1 2 3 1  1  3  5  7
#2 2 3 4 2  2  4  6  8

idv <- match(c("x","y","z","c"), names(dat2))
reshape(dat2, idvar=idv, direction="long", sep="", varying=-idv, timevar=NULL)

#          x y z c a b
#1.2.3.1.1 1 2 3 1 1 5
#2.3.4.2.1 2 3 4 2 2 6
#1.2.3.1.2 1 2 3 1 3 7
#2.3.4.2.2 2 3 4 2 4 8
thelatemail
  • 81,120
  • 12
  • 111
  • 172
  • I like this approach. I'm having trouble with `varying`. If my first 12 columns are to be excluded from reshaping (like `c`), I'm trying something like -(1:12), but this isn't working. Any insights? `data2 – mnosefish Apr 22 '16 at 01:38
  • This is the best answer, imo. Base R and no one in hell can figure out how to use `reshape()`. Nice work. – Rich Scriven Apr 22 '16 at 02:00
  • @RichardScriven - I am one with `reshape()` - it speaks to me. :-P – thelatemail Apr 22 '16 at 03:04
  • @mnosefish - see my updated answer which shows how to exclude other columns. You essentially had it already. – thelatemail Apr 22 '16 at 03:14
3
library(tidyr)
library(dplyr)

dat %>%
  gather(key, value, -c) %>% # this gets you were you were...
  separate(key, into = c("letter", "number"), sep = 1) %>%
  spread(letter, value) %>%
  select(-number)
JasonAizkalns
  • 18,131
  • 6
  • 47
  • 99
  • I'm thoroughly unfamiliar with tidyr until today. WIll this work with multiple columns like c which will not be merged? – mnosefish Apr 22 '16 at 01:13
2

We can use melt from data.table which takes multiple patterns in the measure

library(data.table)
melt(setDT(dat), measure=patterns("^a\\d+", "^b\\d+"), 
                   value.name=c("a", "b"))[, variable:= NULL][]
#   c a b
#1: 1 1 5
#2: 2 2 6
#3: 1 3 7
#4: 2 4 8
akrun
  • 674,427
  • 24
  • 381
  • 486