
I would like to use tidyr's spread function to convert a data frame with multiple IDs in the rows and several columns into a data frame with a single row, with one indicator column for every combination of ID and category. If dplyr and tidyr are not the most appropriate tools for this, I'm open to other spread-like functions.

In the script below, I can specify only one column as the value column. I would like both cat1 and cat2 to be value columns. I would also like the field names to be "sentid1_cat1", "sentid1_cat2", etc.

library(tidyr)  # provides spread()

test.df <- data.frame(sentid = 1:3, 
                      cat1 = c(1,0,0), 
                      cat2 = c(0,1,0))

test.df %>%
    spread(key = sentid, value = cat1, sep = '_')

EDIT

Desired output:

output.df <- data.frame(sentid1_cat1 = 1,
                        sentid1_cat2 = 0,
                        sentid2_cat1 = 0,
                        sentid2_cat2 = 1,
                        sentid3_cat1 = 0,
                        sentid3_cat2 = 0)
alistaire
matsuo_basho
  • I am a little uncertain what you are asking. Do you mind including an output df of what the desired result would look like? – Dave Gruenewald Oct 09 '17 at 15:01
  • Maybe [this post](https://stackoverflow.com/questions/30592094/r-spreading-multiple-columns-with-tidyr) will be helpful. It would be helpful if you show your desired output. – lmo Oct 09 '17 at 15:01
  • Does my answer solve your problem? – acylam Oct 12 '17 at 13:25

1 Answer

A solution with dplyr + tidyr:

library(dplyr)
library(tidyr)

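test.df %>%
  # Stack cat1 and cat2 into long format: one row per (sentid, category)
  gather(variable, value, -sentid) %>%
  # Combine sentid and the category name into one key, e.g. "1_cat1"
  unite(variable, sentid, variable) %>%
  # Prefix the key so it reads "sentid1_cat1"
  mutate(variable = paste0("sentid", variable)) %>%
  # Spread the keys back out into a single wide row
  spread(variable, value)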

Result:

  sentid1_cat1 sentid1_cat2 sentid2_cat1 sentid2_cat2 sentid3_cat1 sentid3_cat2
1            1            0            0            1            0            0
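
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same reshaping could be sketched with those functions as below. This is an adaptation and not part of the original answer, and the name `result` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
test.df <- data.frame(sentid = 1:3,
                      cat1 = c(1, 0, 0),
                      cat2 = c(0, 1, 0))

result <- test.df %>%
  # Long format: one row per (sentid, category) pair
  pivot_longer(-sentid, names_to = "variable", values_to = "value") %>%
  # Build the final column name, e.g. "sentid1_cat1"
  mutate(variable = paste0("sentid", sentid, "_", variable)) %>%
  # Drop sentid so it is not treated as an id column when widening
  select(-sentid) %>%
  # Wide format: one column per name, collapsed into a single row
  pivot_wider(names_from = variable, values_from = value)

result
```

Note that pivot_wider() keeps the new columns in order of first appearance, so the result should not depend on how the keys sort alphabetically.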
acylam