I'm trying to calculate differences between products and users to use in a recommendation system.

The data has two columns and many rows; it needs to be reshaped so that rows are users and columns are products.

I tried the cast function from the reshape package, with no success.

library(dplyr)
library(reshape2)
library(tidyr)
library(reshape)
data <- tibble("customerId" = c(1,2,3,4,1,1), productId = c(10,11,12,10,11,10))

I want to transform it to this format:

   10  11  12
1   1   1   0
2   0   1   0
3   0   0   1
4   1   0   0

My main problem right now is duplicate records: a customer-product pair that appears more than once should be counted only once, so that every value is 0 or 1.

Mehdi Zare

1 Answer

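An option would be to drop the duplicate rows with distinct, create a column of 1s, and then spread to 'wide' format. Without the distinct step, spread stops with an error because the pair (1, 10) appears twice.

library(tidyverse)
data %>% 
  distinct() %>%   # count each customer-product pair only once
  mutate(n = 1) %>%
  spread(productId, n, fill = 0) %>%
  column_to_rownames('customerId')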
#  10 11 12
#1  1  1  0
#2  0  1  0
#3  0  0  1
#4  1  0  0
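
In current tidyr releases spread is superseded by pivot_wider, so the same reshape could be sketched as below. This is a minimal sketch, not part of the original answer: it assumes tidyr 1.1.0 or later (for a scalar values_fill), and the names wide and mat are only illustrative. The last statement shows a base R route through table for comparison.

```r
library(dplyr)
library(tidyr)
library(tibble)

data <- tibble(customerId = c(1, 2, 3, 4, 1, 1),
               productId  = c(10, 11, 12, 10, 11, 10))

wide <- data %>%
  distinct() %>%          # drop the repeated (1, 10) row
  mutate(n = 1) %>%       # indicator column of 1s
  pivot_wider(names_from = productId,
              values_from = n,
              values_fill = 0) %>%   # absent pairs become 0
  column_to_rownames("customerId")

# Base R alternative: count each pair, then cap the counts at 1
mat <- (unclass(table(data$customerId, data$productId)) > 0) * 1
```

Both wide and mat hold only 0/1 values, with customers as rows and products as columns, so either could be passed on to a distance or similarity calculation.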
akrun