
This is probably a common use case. I have done it in Python before, but this time I have to do it in R. How do I replace "rd" with "road", "st" with "street", and so on, in R?

Suppose I have a mapping dictionary like this,

dict = {"St": "Street", "Rd": "Road", "Ln": "Lane", "Pl": "Place"}

In my df,

Address
2/20,Queen St,London,UK
1,King Ln,Paris,France
5,Stuart Pl,Paris,France

How do I get this?

Address
2/20,Queen Street,London,UK
1,King Lane,Paris,France
5,Stuart Place,Paris,France

Thanks.

ds_user

1 Answer


You can use the function gsub for that. gsub("Ln", "Lane", addresses), where addresses is a character vector containing your addresses, replaces all occurrences of "Ln" with "Lane". The pattern can be a regular expression, but a plain string is enough for a simple case like this.
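As a small sketch of that call, using the three addresses from the question (the variable names `with_lane`, `naive`, and `bounded` are only illustrative): a bare pattern replaces every occurrence, including ones inside longer words, which is why anchoring the pattern with `\\b` may be worth considering.

```r
# Example vector of addresses (taken from the question)
addresses <- c("2/20,Queen St,London,UK",
               "1,King Ln,Paris,France",
               "5,Stuart Pl,Paris,France")

# Plain substitution: every occurrence of "Ln" becomes "Lane"
with_lane <- gsub("Ln", "Lane", addresses)

# Without word boundaries, "St" also matches the start of "Stuart"
naive <- gsub("St", "Street", addresses)

# With \\b on both sides, only the standalone token "St" is replaced
bounded <- gsub("\\bSt\\b", "Street", addresses)

print(with_lane)
print(naive)
print(bounded)
```

Here `naive` turns "Stuart" into "Streetuart", while `bounded` leaves it alone.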

So all you have to do is call that function once for each substitution you want to make, and you're done. R doesn't have a built-in dictionary type (as far as I know), so doing it all at once requires another structure to store your mappings.

To answer your question on how to do it for multiple dictionary entries:

Since we don't have dictionaries in R, we take the next best thing: lists. List entries have a name and an object (value, vector, anything really). We can make the name of the entry the dictionary key, and the value its translation:

dict <- list(St = "Street",
             Rd = "Road",
             Ln = "Lane",
             Pl = "Place")
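To illustrate how such a named list can stand in for a dictionary, here is a minimal sketch of key lookup; the variables `keys`, `lane`, and `unknown` and the key "Ave" are made up for the example.

```r
# The same mapping as above, written on one line
dict <- list(St = "Street", Rd = "Road", Ln = "Lane", Pl = "Place")

# names() returns the keys in insertion order
keys <- names(dict)

# [[ ]] looks up a single value by its (exact) name
lane <- dict[["Ln"]]

# A name that is not in the list yields NULL rather than an error
unknown <- dict[["Ave"]]

print(keys)
print(lane)
print(is.null(unknown))
```

Using `[[ ]]` rather than `$` avoids partial matching of names, which is usually what you want for dictionary-style lookup.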

Taking the addresses in your example:

Adresses <- c("2/20,Queen St,London,UK",
              "1,King Ln,Paris,France",
              "5,Stuart Pl,Paris,France")

Now we can loop over the entries of the list, build the regular expression (using the \b word-boundary anchors mentioned by @wibeasley), and replace each match with the corresponding entry in the list. Each iteration overwrites the Adresses vector with the result, so the substitutions are applied one after another.

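# Apply each substitution in turn; seq_along() is also safe if dict is empty
for (i in seq_along(dict)) {
  # \\b restricts the match to the whole word, e.g. "St" but not "Stuart"
  Adresses <- gsub(paste0("\\b", names(dict)[i], "\\b"), dict[[i]], Adresses)
}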
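Since the question mentions a data frame, the loop above could be wrapped in a small helper and applied to the Address column. This is only a sketch: the function name `expand_abbreviations` is hypothetical, and `df` is rebuilt here from the question's sample rows.

```r
# Hypothetical helper: applies every key -> value substitution in `dict`
# to the character vector `x`, matching whole words only.
expand_abbreviations <- function(x, dict) {
  for (key in names(dict)) {
    pattern <- paste0("\\b", key, "\\b")
    x <- gsub(pattern, dict[[key]], x)
  }
  x
}

dict <- list(St = "Street", Rd = "Road", Ln = "Lane", Pl = "Place")

# Sample data frame built from the rows in the question
df <- data.frame(
  Address = c("2/20,Queen St,London,UK",
              "1,King Ln,Paris,France",
              "5,Stuart Pl,Paris,France"),
  stringsAsFactors = FALSE
)

# Overwrite the column with the expanded addresses
df$Address <- expand_abbreviations(df$Address, dict)
print(df)
```

The match is case-sensitive, so "st" would not be expanded; passing `ignore.case = TRUE` to `gsub` is one way to relax that if your data mixes cases.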
JAD
  • Surround `Ln` with [boundary](http://www.regular-expressions.info/wordboundaries.html) tags (ie, `\\bLn\\b`) so it doesn't pick up letters that are part of a larger word – wibeasley Jul 06 '17 at 05:28
  • Thanks to both of you. But how do I do this for multiple items, something like dictionary in R? – ds_user Jul 06 '17 at 05:32
  • @ds_user see my edit. – JAD Jul 06 '17 at 06:47