1

I have a problem of creating and rearrange a dataset. This wrangling is too advanced for me and I'm really thankful for help on this one. It would be great if this could be done with dplyr for example. I have created an example of my problem below: my df:

     vehicle  color  a  b  c  d  A1  A2  A3  B1  B2  B3  C1  C2  C3  D1  D2  D3
resp                                                                           
1       bike  green  5  4  1  3   3   4   5   3   5   3 NaN NaN NaN NaN NaN NaN
2       walk    red  5  3  3  3   4   5   3   3   5   4 NaN NaN NaN NaN NaN NaN
3        car  green  4  2  3  3   4   3   5   4   5   5 NaN NaN NaN NaN NaN NaN
4        car   blue  4  5  4  4 NaN NaN NaN NaN NaN NaN   5   5   5   3   3   4
5        bus  black  2  4  4  3 NaN NaN NaN   2   3   3   2   2   1 NaN NaN NaN
6        car    red  4  2  3  3   3   4   4 NaN NaN NaN   4   4   4 NaN NaN NaN
7        bus   blue  5  5  2  3   3   3   5   4   3   2 NaN NaN NaN NaN NaN NaN
8       walk    red  3  3  4  3 NaN NaN NaN   5   5   5   5   3   3 NaN NaN NaN
9        car   blue  5  3  4  3   3   3   3 NaN NaN NaN   4   3   4 NaN NaN NaN

The dataset contains respondents and answers to a questionare. What I would like to do is to make a new dataframe with resp as index and the data from how the respondents answered rearranged. The data in columns a,b,c,d, vehicle and color are stacked for the respondents (Hope thats the right way to express it) in the new dataframe. Also the values from columns A to C are in the new frame under columns BL_val. Only the data that corresponds from Capital letter (A1-D3) to small letter (a,b,c,d) are filled in. The rest are NAN.

I would like to create a new dataframe from this and it should look like:

ds:

     vehicle  color sl  sl_val  BL_val1  BL_val2  BL_val3
resp                                                     
1       bike  green  a       5        3        4        5
1       bike  green  b       4        3        5        3
1       bike  green  c       1      NaN      NaN      NaN
1       bike  green  d       3      NaN      NaN      NaN
2       walk    red  a       5        4        5        3
2       walk    red  b       3        3        5        4
2       walk    red  c       3      NaN      NaN      NaN
2       walk    red  d       3      NaN      NaN      NaN
3        car  green  a       4        4        3        5
3        car  green  b       2        4        5        5
3        car  green  c       3      NaN      NaN      NaN
3        car  green  d       3      NaN      NaN      NaN
4        car   blue  a       4      NaN      NaN      NaN
4        car   blue  b       5      NaN      NaN      NaN
4        car   blue  c       4        5        5        5
4        car   blue  d       4        3        3        4
5        bus  black  a       2      NaN      NaN      NaN
5        bus  black  b       4        2        3        3
5        bus  black  c       4        2        2        1
5        bus  black  d       3      NaN      NaN      NaN
6        car    red  a       4        3        4        4
6        car    red  b       2      NaN      NaN      NaN
6        car    red  c       3        4        4        4
6        car    red  d       3      NaN      NaN      NaN
7        bus   blue  a       5        3        3        5
7        bus   blue  b       5        4        3        2
7        bus   blue  c       2      NaN      NaN      NaN
7        bus   blue  d       3      NaN      NaN      NaN
8       walk    red  a       3      NaN      NaN      NaN
8       walk    red  b       3        5        5        5
8       walk    red  c       4        5        3        3
8       walk    red  d       3      NaN      NaN      NaN
9        car   blue  a       5        3        3        3
9        car   blue  b       3      NaN      NaN      NaN
9        car   blue  c       4        4        3        4
9        car   blue  d     NaN      NaN      NaN      NaN

I really need some help with this, I cant figure it out!!

Sam Firke
  • 17,062
  • 6
  • 70
  • 83
jonas
  • 10,523
  • 21
  • 53
  • 71

1 Answers1

4

With data.table v1.9.6:

require(data.table) # v1.9.6+
ans = melt(setDT(df), measure=patterns("^[abcd]$", "1$", "2$", "3$"), 
         variable.name="sl", value.name = c("sl_val", paste0("BL_val", 1:3)))
setattr(ans$sl, 'levels', letters[1:4])
setorder(ans, resp)

data.table's melt function accepts a list of measure.vars and combines each of them into a separate column. From there, all there's left is to set levels accordingly, and then reorder the data.table by resp.

See this post for advantages of setorder. See Efficient reshaping using data.tables vignette and my UseR'15 talk to learn more about reshaping using data.tables.

Community
  • 1
  • 1
Arun
  • 108,644
  • 21
  • 263
  • 366
  • 1
    @Arun...Great, thank you so much. I'll definitely check data.table up for future wrangling. Also extra cred for the links and your material. – jonas Oct 12 '15 at 07:24