How does this word replacement function work?

Question

import re
def multiwordReplace(text, wordDic):
    rc = re.compile('|'.join(map(re.escape, wordDic)))
    def translate(match):
        return wordDic[match.group(0)]
    return rc.sub(translate, text)

This code was copied from another source, but I am unsure how it replaces the words in a passage of text, and I don't understand why the 're' module is used here.

You should read about the [regular expressions](https://docs.python.org/2/howto/regex.html). — Tony Babarino, May 13 '16 at 10:15
What should we do with such questions? It is not a question like [What does the regex mean](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean), but similar. — Wiktor Stribiżew, May 13 '16 at 10:19

dron22 · Answer 1 · 2016-05-13T12:29:20.937

re.compile() - Compiles the expression string into a regex object. The string consists of the escaped keys of wordDic joined with the separator |. Given a wordDic {'hello':'hi', 'goodbye': 'bye'}, the string would be 'hello|goodbye', which can be read as "hello or goodbye".
def translate(match): - Defines a callback function that processes every match.
rc.sub(translate, text) - Performs the string replacement. Wherever the regex matches in the text, the matched string (always a key of wordDic) is looked up in wordDic by the callback, and the corresponding value is substituted for it.

Example:

wordDic = {'hello':'hi', 'goodbye': 'bye'}
text = 'hello my friend, I just wanted to say goodbye'
translated = multiwordReplace(text, wordDic)
print(translated)

Output is:

hi my friend, I just wanted to say bye
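
To make the steps visible, here is a small sketch that prints the pattern string and the matches the callback would receive. The names pattern, matches and replacements are only illustrative, and the pattern order shown assumes Python 3.7+, where dictionaries keep insertion order.

```python
import re

word_dic = {'hello': 'hi', 'goodbye': 'bye'}
text = 'hello my friend, I just wanted to say goodbye'

# Build the alternation pattern from the escaped dictionary keys.
pattern = '|'.join(map(re.escape, word_dic))
rc = re.compile(pattern)

# Every match is a key of the dictionary; the callback maps it to its value.
matches = [m.group(0) for m in rc.finditer(text)]
replacements = [word_dic[m] for m in matches]

print(pattern)       # hello|goodbye
print(matches)       # ['hello', 'goodbye']
print(replacements)  # ['hi', 'bye']
```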

EDIT

The main advantage of re.compile(), though, is the performance gain when the same regex is used multiple times. In the original function the regex is recompiled on every call, so there is no such gain. If the same wordDic is used multiple times, you can generate a replacement function for that wordDic so that the compiling is done just once:

import re
def generateMwR(wordDic):
    rc = re.compile('|'.join(map(re.escape, wordDic)))
    def f(text):
        def translate(match):
            print(match.group(0))
            return wordDic[match.group(0)]
        return rc.sub(translate, text)
    return f

Usage would be like this:

wordDic = {'hello': 'hi', 'goodbye': 'bye'}
text = 'hello my friend, I just wanted to say goodbye'
f = generateMwR(wordDic)
translated = f(text)
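
The benefit only shows up when the generated function is applied to several texts. A minimal sketch of that, using a variant of the factory without the debug print (generate_mwr, replace and texts are illustrative names, not part of the original code):

```python
import re

def generate_mwr(word_dic):
    # The pattern is compiled once, when the factory is called.
    rc = re.compile('|'.join(map(re.escape, word_dic)))

    def replace(text):
        # Each call reuses the already compiled pattern.
        return rc.sub(lambda m: word_dic[m.group(0)], text)

    return replace

replace = generate_mwr({'hello': 'hi', 'goodbye': 'bye'})
texts = ['hello there', 'goodbye for now', 'nothing to change']
results = [replace(t) for t in texts]
print(results)  # ['hi there', 'bye for now', 'nothing to change']
```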

score 1 · Accepted Answer · answered May 13 '16 at 10:41

Piece by piece...

import re

# Our dictionary
wordDic = {'hello': 'foo', 'hi': 'bar', 'hey': 'baz'}

# Escape every key in the dictionary so that any regex special characters
# in the dictionary words are treated literally and won't break the regex
map(re.escape, wordDic)

# Join all escaped keys with a pipe | to make the string 'hello|hi|hey'
'|'.join(map(re.escape, wordDic))

# Compile a regular expression object from that string.
# The pipe in the string is interpreted as "OR",
# so our regex will now try to find "hello" or "hi" or "hey"
rc = re.compile('|'.join(map(re.escape, wordDic)))

So rc now matches the words that are in the dictionary, and rc.sub replaces those words in the given string. The translate function just returns the corresponding value for the key whenever the regex finds a match.
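
Two details are easier to see by running them: what re.escape protects against, and the fact that the plain alternation also matches inside longer words. The sketch below illustrates both. The \b variant at the end is not part of the original function; it is only one possible way to restrict the replacement to whole words.

```python
import re

word_dic = {'hello': 'foo', 'hi': 'bar', 'hey': 'baz'}

# 1. re.escape turns regex metacharacters into literal characters.
escaped_literal = re.escape('a.b')
unescaped_hits = re.findall('a.b', 'a.b axb')
escaped_hits = re.findall(escaped_literal, 'a.b axb')
print(escaped_literal)  # a\.b
print(unescaped_hits)   # ['a.b', 'axb']
print(escaped_hits)     # ['a.b']

# 2. The plain alternation also matches 'hi' inside 'this'.
plain = re.compile('|'.join(map(re.escape, word_dic)))
plain_result = plain.sub(lambda m: word_dic[m.group(0)], 'hi, this is it')
print(plain_result)     # bar, tbars is it

# 3. Assumed variant: \b anchors restrict matches to whole words.
whole = re.compile(r'\b(?:' + '|'.join(map(re.escape, word_dic)) + r')\b')
whole_result = whole.sub(lambda m: word_dic[m.group(0)], 'hi, this is it')
print(whole_result)     # bar, this is it
```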
