df1 <- data.frame(freetext = c("open until monday night", "one more time to insert your coin"), numid = c(291,312))
df2 <- data.frame(freetext = c("open until night", "one time to insert your be"), aid = c(3,5))

I would like to merge the two data frames using the freetext column as the `by` key. However, the text is not exactly the same, because some words have been removed or changed.

Is there any way to find, for each row, the row with the largest number of shared words and merge on that basis?

Here is an example of the expected output:

df3 <- data.frame(freetext = c("open until night", "one time to insert your be"), aid = c(3,5), numid = c(291,312))
foc
  • Dupe-oids: [dplyr: inner_join with a partial string match](https://stackoverflow.com/questions/32914357/dplyr-inner-join-with-a-partial-string-match); [How can I match fuzzy match strings from two datasets?](https://stackoverflow.com/questions/26405895/how-can-i-match-fuzzy-match-strings-from-two-datasets) – Henrik Jul 05 '20 at 11:22

1 Answer


Perhaps you can look into the stringdist joins from fuzzyjoin and experiment with the max_dist parameter until you find a value that suits your data.

fuzzyjoin::stringdist_inner_join(df1, df2, by = 'freetext', max_dist = 10)

#  freetext.x                        numid freetext.y                   aid
#  <chr>                             <dbl> <chr>                      <dbl>
#1 open until monday night             291 open until night               3
#2 one more time to insert your coin   312 one time to insert your be     5
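
If the goal is literally to match on the largest number of shared words, as the question asks, a minimal base R sketch could look like the following. It is an illustration rather than part of fuzzyjoin: the names `words1`, `words2`, `overlap` and `best` are made up here, words are assumed to be separated by whitespace, and ties are broken by taking the first candidate.

```r
df1 <- data.frame(freetext = c("open until monday night", "one more time to insert your coin"), numid = c(291, 312))
df2 <- data.frame(freetext = c("open until night", "one time to insert your be"), aid = c(3, 5))

# Split every string into its set of unique lower-case words
# (assumption: words are separated by whitespace).
words1 <- lapply(strsplit(tolower(df1$freetext), "\\s+"), unique)
words2 <- lapply(strsplit(tolower(df2$freetext), "\\s+"), unique)

# overlap[i, j] = number of words shared by row i of df2 and row j of df1.
overlap <- outer(
  seq_along(words2), seq_along(words1),
  Vectorize(function(i, j) length(intersect(words2[[i]], words1[[j]])))
)

# For each row of df2, pick the df1 row with the most shared words
# (assumption: on ties, the first candidate wins).
best <- max.col(overlap, ties.method = "first")

df3 <- data.frame(freetext = df2$freetext, aid = df2$aid, numid = df1$numid[best])
df3
```

This compares every row of df2 with every row of df1, so it is only practical for small to medium tables. It also always returns a match, even when no words are shared, so you may want to drop rows where `overlap[cbind(seq_along(best), best)]` is below some threshold.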
Ronak Shah