I have a data frame which is structured like this one:

dd <- data.frame(round = c("round1", "round2", "round1", "round2"),
                 var1 = c(22, 11, 22, 11),
                 var2 = c(33, 44, 33, 44),
                 nam = c("foo", "foo", "bar", "bar"),
                 val = runif(4))

   round var1 var2 nam        val
1 round1   22   33 foo 0.32995729
2 round2   11   44 foo 0.89215038
3 round1   22   33 bar 0.09213526
4 round2   11   44 bar 0.82644723

From this I would like to obtain a data frame with two rows, one for each value of nam, and the variables var1_round1, var1_round2, var2_round1, var2_round2, val_round1, val_round2. Ideally I would like a dplyr solution.

  nam var1_round1 var1_round2 var2_round1 var2_round2 val_round1 val_round2
1 foo          22          11          33          44 0.32995729  0.8921504
2 bar          22          11          33          44 0.09213526  0.8264472

The closest thing I can think of would be to use spread() in some creative way, but I can't figure out how.

Theodor

1 Answer

We can do this with tidyr/dplyr. First gather() the value columns into 'long' format, then unite() the 'variable' and 'round' columns into a single 'var' column, and finally spread() back to 'wide' format.

library(dplyr)
library(tidyr)
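# Stack var1, var2 and val into long format (columns: variable, value)
gather(dd, variable, value, var1, var2, val) %>%
  unite(var, variable, round) %>%  # build names such as "var1_round1"
  spread(var, value)               # back to wide: one row per nam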
#  nam val_round1 val_round2 var1_round1 var1_round2 var2_round1 var2_round2
#1 bar  0.7187271  0.6022287          22          11          33          44
#2 foo  0.2672339  0.7199101          22          11          33          44

NOTE: The 'val' values differ from those in the question because the OP didn't set a seed for runif().
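
For readers on a newer tidyr (1.0.0 or later), where gather() and spread() are superseded, the same gather/unite/spread idea can probably be written as a single pivot_wider() call. This is only a sketch and not part of the original answer: the set.seed(42) call and the name `wide` are assumptions added here for reproducibility and illustration.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: fixed seed so that val is reproducible
dd <- data.frame(round = c("round1", "round2", "round1", "round2"),
                 var1 = c(22, 11, 22, 11),
                 var2 = c(33, 44, 33, 44),
                 nam = c("foo", "foo", "bar", "bar"),
                 val = runif(4))

# Each value column is spread across the levels of 'round'; with several
# values_from columns the new names are built as <value column>_<round>.
wide <- dd %>%
  pivot_wider(names_from = round,
              values_from = c(var1, var2, val))

wide
```

With this form the columns come out grouped by variable (var1_*, var2_*, val_*) rather than alphabetically, which is closer to the layout asked for in the question.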

akrun