3

Say I have an original dataset whose values in the first column are from a to d in the alphabet df1:

a x1
b x2
c x3
d x4
e x5

and then I have another dataset which multiple columns but whose entries reference the columns in the aforementioned dataset df2

---------
A | B | C
---------
a   b   c
    d   e

I would like to use a R function to use the unique values in df2 (a,b, c and d above) in order to create a new column in the df1 dataset that references the title of the corresponding column in df2 , i.e. df3

a x1 A
b x2 B
c x3 C
d x4 B
e x5 C

. Working example:

> # data frame with numbers and characters
> df1 = data.frame(unique_values=letters[1:5], other_col=paste(rep("x",5), 1:5, sep=""))
> print(df1)
  unique_values other_col
1             a        x1
2             b        x2
3             c        x3
4             d        x4
5             e        x5
> #  Create dataset that is then used to create new column
> df2 = data.frame(A = c("a",NA), B=c("b","d"), C=c("c","e") )
> df2
     A B C
1    a b c
2 <NA> d e

# Using df1 and columns referenging the df1 in df2 create df3
library(dplyr)
#df3?
user849541
  • 93
  • 5

3 Answers3

3

A base R option using merge + stack

merge(df1, setNames(na.omit(stack(df2)), c("unique_values", "names")))

gives

  unique_values other_col names
1             a        x1     A
2             b        x2     B
3             c        x3     C
4             d        x4     B
5             e        x5     C
ThomasIsCoding
  • 53,240
  • 4
  • 13
  • 45
2

Reshape the second data to 'long' format and then do a join

library(dplyr)
library(tidyr)
pivot_longer(df2, everything(), values_to = 'unique_values', 
    values_drop_na = TRUE) %>%
  left_join(df1)

-output

# A tibble: 5 x 3
#  name  unique_values other_col
#  <chr> <chr>         <chr>    
#1 A     a             x1       
#2 B     b             x2       
#3 C     c             x3       
#4 B     d             x4       
#5 C     e             x5      
akrun
  • 674,427
  • 24
  • 381
  • 486
1

data.table version :

library(data.table)

merge(setDT(df1), melt(setDT(df2), measure.vars = names(df2)), 
      by.x = 'unique_values', by.y = 'value')

#   unique_values other_col variable
#1:             a        x1        A
#2:             b        x2        B
#3:             c        x3        C
#4:             d        x4        B
#5:             e        x5        C
Ronak Shah
  • 286,338
  • 16
  • 97
  • 143