re.compile()
- Compiles the pattern string into a regex object. The string consists of the keys of wordDic joined with the separator |. Given the wordDic {'hello': 'hi', 'goodbye': 'bye'}, the string would be 'hello|goodbye', which can be read as "hello or goodbye".
def translate(match):
- Defines a callback function that is called for every match and returns the replacement for it.
rc.sub(translate, text)
- Performs the string replacement. For every match of the regex in text (each match is one of the keys of wordDic), the callback looks the key up in wordDic and returns its translation, which replaces the match.
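To see the pattern-building step and what the callback receives in isolation, here is a small sketch. It passes each key through re.escape, as the generateMwR version in the edit below does, so that keys containing regex metacharacters such as . or * are matched literally. Whether the original multiwordReplace escapes its keys is not shown here. re.finditer is used only to show what match.group(0) returns for each hit.

```python
import re

wordDic = {'hello': 'hi', 'goodbye': 'bye'}

# Join the (escaped) keys with '|' to build one alternation pattern.
pattern = '|'.join(map(re.escape, wordDic))
print(pattern)  # hello|goodbye

# re.escape only changes keys that contain metacharacters, e.g. '.'.
print(re.escape('a.b'))  # a\.b

rc = re.compile(pattern)
text = 'hello my friend, I just wanted to say goodbye'

# group(0) of each match object is the matched key, which is exactly
# what a translate(match) callback uses to look up the replacement.
keys_found = [m.group(0) for m in rc.finditer(text)]
print(keys_found)                        # ['hello', 'goodbye']
print([wordDic[k] for k in keys_found])  # ['hi', 'bye']
```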
Example:
wordDic = {'hello':'hi', 'goodbye': 'bye'}
text = 'hello my friend, I just wanted to say goodbye'
translated = multiwordReplace(text, wordDic)
print(translated)
Output is:
hi my friend, I just wanted to say bye
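The example relies on multiwordReplace, which is not repeated in this section. Below is a minimal sketch consistent with the three steps described above, assuming the signature multiwordReplace(text, wordDic) used in the example. It is a reconstruction, not necessarily the exact original code, and it compiles the pattern on every call.

```python
import re

def multiwordReplace(text, wordDic):
    """Replace every key of wordDic found in text with its value.

    Sketch: the pattern is rebuilt and compiled on every call.
    """
    rc = re.compile('|'.join(map(re.escape, wordDic)))

    def translate(match):
        # The matched text is one of the keys; return its translation.
        return wordDic[match.group(0)]

    return rc.sub(translate, text)

wordDic = {'hello': 'hi', 'goodbye': 'bye'}
text = 'hello my friend, I just wanted to say goodbye'
print(multiwordReplace(text, wordDic))
# hi my friend, I just wanted to say bye
```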
EDIT
The main advantage of using re.compile(), though, is the performance gain when the same regex is used multiple times. Since multiwordReplace compiles the regex on every call, there is no gain there. If the same wordDic is used multiple times, you can instead generate a replacement function for that wordDic once, so the compilation is done just once:
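import re

def generateMwR(wordDic):
    # Compile the alternation of escaped keys once, when f is generated
    rc = re.compile('|'.join(map(re.escape, wordDic)))

    def f(text):
        def translate(match):
            # match.group(0) is the matched key; return its replacement
            return wordDic[match.group(0)]
        # f reuses the compiled rc from the enclosing scope on every call
        return rc.sub(translate, text)

    return f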
Usage would be like this:
wordDic = {'hello': 'hi', 'goodbye': 'bye'}
text = 'hello my friend, I just wanted to say goodbye'
f = generateMwR(wordDic)
translated = f(text)
print(translated)  # hi my friend, I just wanted to say bye
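One caveat applies to both versions: regex alternation tries the alternatives from left to right and takes the first one that matches, not the longest. If one key is a prefix of another (for example 'he' and 'hello'), the result depends on the order of the keys in wordDic. A common remedy, shown here as a sketch, is to sort the keys longest-first before joining them. The name generateMwRLongestFirst is hypothetical.

```python
import re

def generateMwRLongestFirst(wordDic):
    # Hypothetical variant of generateMwR: longer keys come first in the
    # alternation, so 'hello' is tried before its prefix 'he'.
    keys = sorted(wordDic, key=len, reverse=True)
    rc = re.compile('|'.join(map(re.escape, keys)))

    def f(text):
        return rc.sub(lambda match: wordDic[match.group(0)], text)

    return f

wordDic = {'he': 'she', 'hello': 'hi'}

# Unsorted pattern 'he|hello': 'he' wins at the start of 'hello'.
naive = re.compile('|'.join(map(re.escape, wordDic)))
print(naive.sub(lambda m: wordDic[m.group(0)], 'hello'))  # shello

f = generateMwRLongestFirst(wordDic)
print(f('hello'))    # hi
print(f('he said'))  # she said
```

As with generateMwR, the sort and the compilation happen once per wordDic, so repeated calls to f pay only for the substitution itself.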