Python re with groups: extract runs of a repeated character in a string, and which character repeats

Question

I found this:

>>> re.findall(r'((.)\2*)',s)
[('111', '1'), ('22', '2'), ('1', '1')]
>>> s
'111221'
>>>

I can't follow \2*. How does this regex work? The first group gives me the run of the second group's character repeating in s. It's amazing!

\2 means the second group, but what is the second group here?

Note: this is to get the number of times a character repeats in a string.

score 2 · Accepted Answer · 2016-04-16T19:51:47.693

\2 is a backreference to whatever capture group 2 matched.
For example, if group 2 captured b, then \2* can only match zero or more further b characters (nothing, b, bb, and so on).
So (.)\2* behaves like bb*, where b stands for any single character except a newline.

 (                 # (1 start)
      ( . )             # (2), Any character
      \2*               # Backreference to capture group 2, 0 to many times
 )                 # (1 end)
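To make the breakdown above concrete, here is a minimal sketch that runs the pattern from the question on the same input and turns each match into a (character, run length) pair. The names pattern, pairs, and runs are only illustrative and do not come from the original post.

```python
import re

s = '111221'

# Group 1 (outer parentheses) is the whole run.
# Group 2 (inner parentheses) is the single character that starts the run.
# \2* then matches zero or more further copies of that same character.
pattern = re.compile(r'((.)\2*)')

# With two capture groups, findall returns one (group 1, group 2) tuple per match.
pairs = pattern.findall(s)
print(pairs)

# Convert each (run, char) tuple into (char, length of the run).
runs = [(char, len(run)) for run, char in pairs]
print(runs)
```

Because group numbers are assigned by counting opening parentheses from left to right, the outer group is 1 and the inner (.) is 2, which is why the backreference has to be \2 rather than \1.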

jil · Answer 2 · 2016-04-16T19:52:18.527

2

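In your example, capture group 1 (\1) is ((.)\2*) and capture group 2 (\2) is (.).

If you do not need the first capture group, you could make it a non-capturing group instead: (?:(.)\1*). The inner (.) then becomes group 1, so the backreference changes from \2 to \1. Note that re.findall then returns only the captured character, not the whole run.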
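As a rough sketch of the non-capturing variant, assuming the intended pattern is (?:(.)\1*) with the * quantifier on the backreference, the snippet below shows what findall returns and how the full run can still be recovered from the match object. The names compact, chars_only, and run_lengths are hypothetical.

```python
import re

s = '111221'

# (?:...) groups without capturing, so the inner (.) is now group 1
# and the backreference is written \1 instead of \2.
compact = re.compile(r'(?:(.)\1*)')

# With a single capture group, findall returns just that group's text.
chars_only = compact.findall(s)
print(chars_only)

# The whole run is still available as group(0) of each match object.
run_lengths = [(m.group(1), len(m.group(0))) for m in compact.finditer(s)]
print(run_lengths)
```

If the runs themselves are needed from findall, keeping the outer capturing group as in the question is the simpler choice.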

edited Apr 16 '16 at 19:52

answered Apr 16 '16 at 19:43

