2

I have a shopping cart data, which look like the sample dataframe below:

sample_df<-data.frame(
   clientid=1:10,
   ProductA=c("chair","table","plate","plate","table","chair","table","plate","chair","chair"),
   QuantityA=c(1,2,1,1,1,1,2,3,1,2),
   ProductB=c("table","doll","shoes","","door","","computer","computer","","plate"),
   QuantityB=c(3,1,2,"",2,"",1,1,"",1)
)
#sample data frame
   clientid ProductA QuantityA ProductB QuantityB
1  1        chair    1         table   3
2  2        table    2         doll    1   
3  3        plate    1         shoes   2             
4  4        plate    1             
...
10 10       chair    2         plate   1

I would like to transform it into different format, which will be like:

#ideal data frame
   clientid ProductNumber Product Quantity
1  1        A             chair   1
2  1        B             table   3
3  2        A             table   2
4  2        B             doll    1
...
11 6        A             chair   1
...
17 10       A             chair   2
18 10       B             plate   1 

I have tried

library(tidyr)
sample_df_gather<- sample_df %>% select(clientid, ProductA, ProductB) 
%>% gather(ProductNumber, value, -clientid) %>% filter(!is.na(value))

#this gives me
    clientid ProductNumber value
1   1        ProductA      chair
2   2        ProductB      table
3   3        ProductA      plate
4   4        ProductB      plate
...

However, I don't know how to add Quantity to the data frame. Also, in the actual data frame, there are more columns such as Title, price, which I would like to transform into the ideal data frame as well. Is there a way to transform the data into the ideal format?

Johnny Chiu
  • 319
  • 5
  • 12
  • For QuantityB, you really don't want to use ""... try NA instead. – Frank Apr 21 '16 at 14:57
  • 1
    `reshape(sample_df, dir = 'long', varying = list(c(2,4), c(3,5)))` gives me 20 rows or is that wrong – rawr Apr 21 '16 at 15:06
  • 1
    thanks you @Frank! the reshape function provided here solved my problem. @aosmith, yes I have checked it before I ask this question, but still couldn't find a way to my transform into my ideal data frame. – Johnny Chiu Apr 22 '16 at 04:57
  • It looked to be almost an identical problem to me. Maybe edit your question to show how those solutions didn't work for you? – aosmith Apr 22 '16 at 21:31

1 Answers1

8

With data.table:

library(data.table)
res = melt(setDT(sample_df), 
  measure.vars = patterns("^Product", "^Quantity"), 
  variable.name = "ProductNumber")
res[, ProductNumber := factor(ProductNumber, labels = c("A","B"))]

which gives

    clientid ProductNumber   value1 value2
 1:        1             A    chair      1
 2:        2             A    table      2
 3:        3             A    plate      1
 4:        4             A    plate      1
 5:        5             A    table      1
 6:        6             A    chair      1
 7:        7             A    table      2
 8:        8             A    plate      3
 9:        9             A    chair      1
10:       10             A    chair      2
11:        1             B    table      3
12:        2             B     doll      1
13:        3             B    shoes      2
14:        4             B       NA     NA
15:        5             B     door      2
16:        6             B       NA     NA
17:        7             B computer      1
18:        8             B computer      1
19:        9             B       NA     NA
20:       10             B    plate      1

Data (since the OP's original data was borked):

structure(list(clientid = 1:10, ProductA = structure(c(1L, 3L, 
2L, 2L, 3L, 1L, 3L, 2L, 1L, 1L), .Label = c("chair", "plate", 
"table"), class = "factor"), QuantityA = c(1L, 2L, 1L, 1L, 1L, 
1L, 2L, 3L, 1L, 2L), ProductB = structure(c(6L, 2L, 5L, NA, 3L, 
NA, 1L, 1L, NA, 4L), .Label = c("computer", "doll", "door", "plate", 
"shoes", "table"), class = "factor"), QuantityB = c(3L, 1L, 2L, 
NA, 2L, NA, 1L, 1L, NA, 1L)), .Names = c("clientid", "ProductA", 
"QuantityA", "ProductB", "QuantityB"), row.names = c(NA, -10L
), class = "data.frame")
Frank
  • 63,401
  • 8
  • 85
  • 161