Create 'dummy variables' by spreading duplicate rows into columns in R

Question

Thanks in advance for the help.

There are several questions using spread (from long to wide) on duplicate rows with unite such as this.

I think what makes my question unique is the need to output dummy variables.

I anticipate an input like so:

df <- data.frame(id = c(1,1,2,3,4), fruit = c("apple","pear","apple","orange","apple"))

And an output like so:

output <- data.frame(id=c(1,2,3,4), apple = c(1,1,0,1), pear = c(1,0,0,0), orange = c(0,0,1,0))

Any help would be greatly appreciated. Thanks.

score 6 · Accepted Answer · answered Jan 14 '18 at 18:45


Using the tidyverse, you can add a new column of ones and then use spread() to turn each fruit into its own column.

library(tidyverse)

df %>% mutate(i = 1) %>% spread(fruit, i, fill = 0)

# result
  id apple orange pear
1  1     1      0    1
2  2     1      0    0
3  3     0      1    0
4  4     1      0    0
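
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A minimal sketch of the same idea with that function follows. The distinct() step is my own addition, on the assumption that a repeated id/fruit pair should still produce a single 1, and `wide` is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Input from the question
df <- data.frame(id = c(1, 1, 2, 3, 4),
                 fruit = c("apple", "pear", "apple", "orange", "apple"))

wide <- df %>%
  distinct(id, fruit) %>%   # assumption: collapse repeated id/fruit pairs
  mutate(i = 1) %>%         # indicator value that fills the new columns
  pivot_wider(names_from = fruit,
              values_from = i,
              values_fill = list(i = 0))  # 0 where an id lacks a fruit

wide
```

Unlike spread(), which sorts the new columns alphabetically, pivot_wider() keeps them in order of first appearance, and it returns a tibble rather than a plain data.frame.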

m0nhawk

Thanks. Both this comment and @clemens are on point. I should have been more specific about my preference for dplyr. Thanks so much! – ReginaldMilton Jan 14 '18 at 18:54

score 2 · Answer 2 · answered Jan 14 '18 at 18:45

You can use dcast() from the data.table package.

data.table::dcast(df, 
                  id ~ fruit, 
                  fun.aggregate = function(x) 1L,
                  fill = 0L)

Which will return

  id apple orange pear
1  1     1      0    1
2  2     1      0    0
3  3     0      1    0
4  4     1      0    0
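
Newer data.table releases may refuse a plain data.frame in dcast() and ask for a data.table instead. If the call above complains, a sketch along these lines should behave the same way. The explicit conversion and the value.var argument are my additions, not part of the original answer, and `dt` and `wide` are illustrative names.

```r
library(data.table)

# Input from the question
df <- data.frame(id = c(1, 1, 2, 3, 4),
                 fruit = c("apple", "pear", "apple", "orange", "apple"))

dt <- as.data.table(df)  # convert a copy; df itself is left untouched

wide <- dcast(dt,
              id ~ fruit,
              value.var = "fruit",             # named explicitly to avoid the guessing message
              fun.aggregate = function(x) 1L,  # any id/fruit match becomes 1
              fill = 0L)                       # missing combinations become 0

wide
```

The result is a data.table, so it prints slightly differently from the data.frame output shown above, but the values are the same.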
