1

Not exactly sure how to word what I am looking for but I have a main data set (1000's of lines) and another file of details (1000's of lines). It is a one to many relationship and I want to join the table but instead of having multiple lines of the main data set, I'd like to create columns based on a column in the detail file.

Below is an example of what I'm looking for:

> maindata <- data.frame(ID = c(1,2,3,4), Customer = c("Bob","Mary","Bill","Steve"), stringsAsFactors = FALSE)
> maindata
  ID Customer
  1      Bob
  2     Mary
  3     Bill
  4    Steve

> reference <- data.frame(ID = c(1,1,1,1,2,2,4), Code = c("Credit","Price","Cust","Task","Credit","Task","Cust"), Num = c(5,4,3,8,3,4,6), stringsAsFactors = FALSE)
> reference
  ID   Code Num
  1 Credit   5
  1  Price   4
  1   Cust   3
  1   Task   8
  2 Credit   3
  2   Task   4
  4   Cust   6

I'd like it to transform to:

> IdealOutput
  ID Customer Credit Price Cust Task
1  1      Bob      5     4    3    8
2  2     Mary      3    NA   NA    4
3  3     Bill     NA    NA   NA   NA
4  4    Steve     NA    NA    6   NA

EDIT

Having 2 header columns with 2 value columns.

> reference <- data.frame(ID = c(1,1,1,1,2,2,4), Code = c("Credit","Price","Cust","Task","Credit","Task","Cust"), Num = c(5,4,3,8,3,4,6), Code2 = c("Banana","Apple","Pear","Grapes","Pear","Grapes","Banana"), Num2 = c(2,5,4,7,3,6,5), stringsAsFactors = FALSE)
> reference
  ID   Code Num  Code2 Num2
1  1 Credit   5 Banana    2
2  1  Price   4  Apple    5
3  1   Cust   3   Pear    4
4  1   Task   8 Grapes    7
5  2 Credit   3   Pear    3
6  2   Task   4 Grapes    6
7  4   Cust   6 Banana    5

I'd like to transform it to:

> IdealOutput
  ID Customer Credit Price Cust Task Banana Apple Pear Grapes
1  1      Bob      5     4    3    8      2     5    4      7
2  2     Mary      3    NA   NA    4     NA    NA    3      6
3  3     Bill     NA    NA   NA   NA     NA    NA   NA     NA
4  4    Steve     NA    NA    6   NA      5    NA   NA     NA

I can do two separate spreads and then Left_join but that's not intuitive.

Kevin
  • 1,648
  • 11
  • 41

1 Answers1

1

We can do a full_join by 'ID' and then sread

library(tidyverse)
full_join(reference, maindata) %>% 
      spread(Code, Num) %>%
      select(1:6)
#   ID Customer Credit Cust Price Task
#1  1      Bob      5    3     4    8
#2  2     Mary      3   NA    NA    4
#3  3     Bill     NA   NA    NA   NA
#4  4    Steve     NA    6    NA   NA
akrun
  • 674,427
  • 24
  • 381
  • 486
  • Hey @akrun, what about if I have two sets of columns? I have seen here https://stackoverflow.com/questions/30592094/r-spreading-multiple-columns-with-tidyr if there is 1 header column with 2 value columns but not 2 and 2. I've added an edit above. – Kevin May 10 '19 at 13:39