
I am working with the dplyr package in R. Let's say I have a data frame of names and IDs

   df <- data.frame(dID=c(1  ,2  , 1 ),
                   name=c("a","a","b"))

and I want to look up each ID in another database and retrieve the information I need.

   db <- data.frame(dID=c(1   ,2   ,3   ,4   ),
                  info1=c("A" ,"B" ,"C" ,"D" ),
                  info2=c("AA","BB","CC","DD"))

Currently, I am using the following code.

   df %>% rowwise() %>%
   mutate(INFO1 = (function(id){paste(db %>% filter(dID == id) %>% select(info1))})(dID),
          INFO2 = (function(id){paste(db %>% filter(dID == id) %>% select(info2))})(dID))

Is it possible to avoid repeating this part of the code

   db %>% filter(dID == id)

by storing it in a temporary variable? For example, when I change my code to

   df %>% rowwise() %>%
   mutate(tmp <- db %>% filter(dID == dID),
          INFO1 = paste(tmp %>% select(info1)),
          INFO2 = paste(tmp %>% select(info2)))

I get this error

Error in mutate_impl(.data, dots) : Column tmp <- db %>% filter(dID == dID) is of unsupported class data.frame

Is there any way to make the code tidier and faster?

divibisan
user9224

1 Answer


I agree with Marius' comment. To demonstrate, the following left_join reproduces the result of your rowwise dplyr chain:

left_join(df, db, by = "dID") %>% mutate_at(vars(starts_with("info")), ~as.numeric(as.factor(.x)))
#  dID name info1 info2
#1   1    a     1     1
#2   2    a     2     2
#3   1    b     1     1
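
If the goal is the looked-up values themselves rather than the integer codes shown above, a join alone may be enough. The following is a minimal sketch under that assumption: it rebuilds `df` and `db` with `stringsAsFactors = FALSE` (an assumption, so the columns are plain character vectors on any R version), and `result` is just an illustrative name. The `INFO1`/`INFO2` column names are taken from the question.

```r
library(dplyr)

# Sample data from the question, with strings kept as characters (assumption)
df <- data.frame(dID  = c(1, 2, 1),
                 name = c("a", "a", "b"),
                 stringsAsFactors = FALSE)

db <- data.frame(dID   = c(1, 2, 3, 4),
                 info1 = c("A", "B", "C", "D"),
                 info2 = c("AA", "BB", "CC", "DD"),
                 stringsAsFactors = FALSE)

# A single vectorised join replaces the per-row filter() lookups,
# so no temporary variable and no rowwise() are needed.
result <- df %>%
  left_join(db, by = "dID") %>%
  rename(INFO1 = info1, INFO2 = info2)

result
#   dID name INFO1 INFO2
# 1   1    a     A    AA
# 2   2    a     B    BB
# 3   1    b     A    AA
```

Because the lookup happens once for the whole data frame instead of once per row, this also tends to be faster than the rowwise version. IDs in `df` that have no match in `db` would get `NA` in the joined columns.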
Maurits Evers