The code is presented below:
import re

line = "dogs are better than humans"
matchObj = re.match(r'(.*) are (.*?) .*', line)
if matchObj:
    # group() with no arguments returns the entire match
    print("matchObj.group() : ", matchObj.group())
(.*): matches and captures any character (except newlines) any number of times, possibly zero. The . denotes "any character" and the * signifies repetition. The parentheses denote a capture group (explained below).
 are : the literal string " are ", including the space on each side.
(.*?): same as (.*) except that it matches as few characters as possible (non-greedy), so it stops as soon as the rest of the pattern can match. If the remainder of your string contains several spaces, a greedy (.*) would run on to the last of them and capture "better than". Adding the non-greedy symbol (?) makes it stop at the first space, since a space is the character that follows this segment of the expression.
.*: any character any number of times. Here it consumes the rest of the line ("than humans").
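To see what the non-greedy ? changes in practice, here is a minimal sketch that runs the pattern with and without it on the same line. The variable names greedy and lazy are only illustrative and are not part of the original code.

```python
import re

line = "dogs are better than humans"

# Greedy second group: (.*) grabs as much as it can, then backtracks
# only far enough to leave one space for the trailing " .*".
greedy = re.match(r'(.*) are (.*) .*', line)

# Non-greedy second group: (.*?) stops at the first space it reaches.
lazy = re.match(r'(.*) are (.*?) .*', line)

print(greedy.group(2))  # better than
print(lazy.group(2))    # better
```

Both patterns match the whole line; only the contents of the second group differ.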
Capture groups, or captures for short, are portions of the entire match. Wrapping part of your regex in parentheses allows you to easily retrieve that portion of the match.
(dogs) are (better) than humans
(.*)   are (.*?)    .*
In your example, dogs and better would be captured. These are also referred to as "groups". In regular expressions, they are marked by a pair of parentheses.
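As a short sketch of how those groups can be read back in Python, the match object exposes them by number. The name match_obj is just an illustrative rename of matchObj from the question's code.

```python
import re

line = "dogs are better than humans"
match_obj = re.match(r'(.*) are (.*?) .*', line)

if match_obj:
    print(match_obj.group(0))  # entire match: dogs are better than humans
    print(match_obj.group(1))  # first capture group: dogs
    print(match_obj.group(2))  # second capture group: better
    print(match_obj.groups())  # all groups as a tuple: ('dogs', 'better')
```

Group 0 is always the whole match, so the numbered capture groups start at 1, counted by the position of their opening parenthesis.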
Play around with the regex here. Hover on the match to see which portions of the expression are captured.