First of all, I am sorry if my formatting is bad; this is my first time posting (I am also new to programming and R).
I am trying to merge two data frames together on string variables. I am merging university names, which might not match up perfectly, so I was hoping to merge using a fuzzy or approximate string matching function. I was happy when I found the ‘fuzzyjoin’ package.
From the CRAN documentation: stringdist_join: Join two tables based on fuzzy string matching of their columns
stringdist_join(x, y, by = NULL, max_dist = 2,
  method = c("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex"),
  mode = "inner", ignore_case = FALSE, distance_col = NULL, ...)
My code:
stringdist_left_join(new, institutions, by = c("tm_9_undergradu" = "Institution.Name"))
Error:
Error in dists[include] <- stringdist::stringdist(v1[include], v2[include], :
NAs are not allowed in subscripted assignments
I know that there are some NAs in these columns, but I am not sure how to remove them, because I need those rows to stay in the data as well. I know that in other join and merge functions the NAs are simply ignored. Does anyone know a way to get around this error in this package, or another way to do an approximate join on strings? Thank you for your help.
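
To make the question concrete, here is a minimal sketch of the only workaround I can think of: split off the rows whose key is NA, run the fuzzy join on the rest, and then bind the NA rows back on. The toy data frames below are made up for illustration (only the two column names come from my real data), and I am assuming dplyr is acceptable alongside fuzzyjoin. I do not know whether this is the recommended approach.

```r
library(dplyr)
library(fuzzyjoin)

# Hypothetical toy data standing in for my real data frames
new <- data.frame(
  id = 1:4,
  tm_9_undergradu = c("Harvard Universty", NA, "Stanford Univ", "MIT"),
  stringsAsFactors = FALSE
)
institutions <- data.frame(
  Institution.Name = c("Harvard University", "Stanford University", NA),
  state = c("MA", "CA", NA),
  stringsAsFactors = FALSE
)

# Separate the rows with a missing key so the join never sees an NA
new_ok  <- filter(new, !is.na(tm_9_undergradu))
new_na  <- filter(new, is.na(tm_9_undergradu))
inst_ok <- filter(institutions, !is.na(Institution.Name))

# Fuzzy left join on the rows that have a usable key
joined <- stringdist_left_join(
  new_ok, inst_ok,
  by = c("tm_9_undergradu" = "Institution.Name"),
  max_dist = 2
)

# Put the NA-key rows back; bind_rows fills the columns they lack with NA
result <- bind_rows(joined, new_na)
```

This keeps every row of `new`, but the row order changes and it feels clumsy, so I would still like to know if there is a cleaner way.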