Thanks in advance for the help.

There are several existing questions about using spread (long to wide) on duplicate rows with unite, such as this one.

I think what makes my question unique is the need to output dummy variables.

I anticipate an input like so:

df <- data.frame(id = c(1,1,2,3,4), fruit = c("apple","pear","apple","orange","apple"))

And an output like so:

output <- data.frame(id=c(1,2,3,4), apple = c(1,1,0,1), pear = c(1,0,0,0), orange = c(0,0,1,0))

Any help would be greatly appreciated. Thanks.

ReginaldMilton

2 Answers


Using the tidyverse, you can add a new column of ones and then use spread, filling the missing combinations with 0.

library(tidyverse)

df %>% mutate(i = 1) %>% spread(fruit, i, fill = 0)

# result
  id apple orange pear
1  1     1      0    1
2  2     1      0    0
3  3     0      1    0
4  4     1      0    0
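
In newer versions of tidyr, spread() is superseded by pivot_wider(). A minimal sketch of the same idea, assuming tidyr 1.1.0 or later (where values_fill accepts a single value) and using the df from the question; the distinct() step and the name wide are my additions, there to guard against repeated id/fruit pairs:

```r
library(dplyr)
library(tidyr)

# Input from the question
df <- data.frame(id = c(1, 1, 2, 3, 4),
                 fruit = c("apple", "pear", "apple", "orange", "apple"))

wide <- df %>%
  distinct(id, fruit) %>%   # assumption: drop repeated id/fruit pairs first
  mutate(i = 1) %>%         # indicator column, as in the spread() version
  pivot_wider(names_from = fruit, values_from = i, values_fill = 0)

wide
```

Unlike spread(), pivot_wider() orders the new columns by first appearance rather than alphabetically, and it returns a tibble.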
m0nhawk
  • Thanks. Both this comment and @clemens are on point. I should have been more specific about my preference for dplyr. Thanks so much! – ReginaldMilton Jan 14 '18 at 18:54

You can use dcast() from the data.table package.

data.table::dcast(df, 
                  id ~ fruit, 
                  fun.aggregate = function(x) 1L,
                  fill = 0L)

This returns:

  id apple orange pear
1  1     1      0    1
2  2     1      0    0
3  3     0      1    0
4  4     1      0    0
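
As far as I know, recent data.table releases expect dcast() to receive a data.table rather than a plain data.frame, so it may be necessary to convert the input first. A sketch under that assumption, using the df from the question; the names dt and wide_dt are mine:

```r
library(data.table)

# Input from the question
df <- data.frame(id = c(1, 1, 2, 3, 4),
                 fruit = c("apple", "pear", "apple", "orange", "apple"))

# Convert to a data.table so the data.table method of dcast() is used
dt <- as.data.table(df)

wide_dt <- dcast(dt,
                 id ~ fruit,
                 value.var = "fruit",             # stated explicitly to avoid the guessing message
                 fun.aggregate = function(x) 1L,  # any present id/fruit group becomes 1
                 fill = 0L)                       # absent combinations become 0

wide_dt
```

Because the aggregate function always returns 1L, repeated id/fruit pairs still give 1 rather than a count.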
clemens