1

I've worked with R for quite some time now, but have made little use of the reshape or reshape2 packages. I'm currently trying to reshape a data set from wide to long format where the indicator variables are part of the variable names. This is the current structure of my data frame:

mydf <- data.frame(district = c(1:2),
v.mandate = c(1, 3),
s.mandate = c(2, 4),
v.perc = c(.4, .3),
s.perc = c(.5, .6))

> mydf
  district v.mandate s.mandate v.perc s.perc
1        1         1         2    0.4    0.5
2        2         3         4    0.3    0.6

I want to reshape this to long format and extract the "v." and "s." as id variables (in the real data set the list is longer). See example below.

mydf2 <- data.frame(district = c(1, 1, 2, 2),
party = c("v", "s", "v", "s"),
mandate = c(1, 2, 3, 4),
perc = c(.4, .5, .3, .6))

> mydf2
  district party mandate perc
1        1     v       1  0.4
2        1     s       2  0.5
3        2     v       3  0.3
4        2     s       4  0.6

I have tried using both the reshape and melt functions, but I can't seem to extract the indicator variables from the variable names. Instead, the data set is reshaped into long format, but with the full variable names as id variables. See example below.

> melt(mydf, id.vars=1)
  district  variable value
1        1 v.mandate   1.0
2        2 v.mandate   3.0
3        1 s.mandate   2.0
4        2 s.mandate   4.0
5        1    v.perc   0.4
6        2    v.perc   0.3
7        1    s.perc   0.5
8        2    s.perc   0.6

It might be a trivial problem, but I haven't been able to find a solution on-line.

Greatly appreciate any help!

chrstnsn
  • 177
  • 2
  • 6
  • 1
    Look at this related question: http://stackoverflow.com/questions/25925556/gather-multiple-sets-of-columns-with-tidyr. For your data: `mydf %>% gather(key, value, -district) %>% separate(key, c("party", "type"), "\\.") %>% spread(type, value)` with dplyr and tidyr – talat Mar 16 '15 at 13:10
  • In the end I managed to do it using the reshape function, but this solution is definitely cleaner. – chrstnsn Mar 16 '15 at 14:40

2 Answers2

3

data.table v1.9.5 can melt to multiple columns directly. Installation instructions.

require(data.table) # v1.9.5+
ans = melt(setDT(mydf), measure=patterns("^mandate", "^perc"), 
             value.name=c("mandate", "perc"))
#    district variable mandate perc
# 1:        1        1       1  0.4
# 2:        2        1       3  0.3
# 3:        1        2       2  0.5
# 4:        2        2       4  0.6

setattr(ans$variable, 'levels', c("v", "s"))

You can automate the extraction of levels using gsub(). There is no direct way to extract levels yet (it doesn't seem straightforward to me in all cases).

Arun
  • 108,644
  • 21
  • 263
  • 366
  • What if I have not only 'mandate' and 'perc' but a many columns? Can it be done easily on data.table 1.9.7? – skan May 03 '16 at 17:07
3

I may be incompetent, but it seems to be that you have named your variables reversely of what reshape assumes (e.g. v.mandate instead of mandate.v). I reversed their naming, and could get it to work:

mydf <- data.frame(district = c(1:2),
                   mandate.v = c(1, 3),
                   mandate.s = c(2, 4),
                   perc.v = c(.4, .3),
                   perc.s = c(.5, .6))

#reshape data
mydf2 = reshape(mydf,
                varying=2:5, #variables 2:5 varies
                direction = "long", #towards long
                timevar="party", #the grouping variable
                idvar="district", #identifying variable
                sep = ".") #separated by dots

This gives:

> mydf2
    district party mandate perc
1.v        1     v       1  0.4
2.v        2     v       3  0.3
1.s        1     s       2  0.5
2.s        2     s       4  0.6

You may need some kind of automatic way of reversing the names in the entire dataframe. I did it manually above because there were only 4 variables. If you have 100, that's not worth it.

CoderGuy123
  • 5,189
  • 3
  • 48
  • 77
  • Thanks! That worked like a charm. I reversed the variable names using paste() and just a few lines of code. – chrstnsn Mar 16 '15 at 14:37