How to properly match a sequence using a regular expression in Python?

Question

I have a string, s="abaaaababbb".

I am using the findall method and I want to find all the occurrences of (ab)+. The code that I am using is:

import re
s = "abaaaababbb"
x = re.findall("[ab]+",s)
print(x)

Output: ['abaaaababbb']

Instead, I want output like: ['ab', 'abab']

How do I write the correct regular expression for this?

@Sweeper I want to know wherever ababab... occurs in the string, and then I will find the longest such occurrence. – Som Shekhar Mukherjee, Jul 03 '19 at 07:00

score 2 · Accepted Answer · answered Jul 03 '19 at 07:03

The regex you mentioned in your question ((ab)+) is almost correct.

You just need to make the capturing group a non-capturing one:

(?:ab)+

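This is because findall returns the contents of the groups (as opposed to the whole matches) if the regex contains any capturing groups. With (ab)+, each result would be only the last repetition captured by the group, i.e. 'ab', no matter how long the match was.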
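To make the difference concrete, here is a small sketch using the string from the question. The variable names `matches` and `longest` are only illustrative, and the `max(..., key=len)` step is one possible way to pick the longest run mentioned in the comment on the question, not the only one.

```python
import re

s = "abaaaababbb"

# Capturing group: findall returns the group's text for each match,
# which is only the last repetition of "ab".
print(re.findall(r"(ab)+", s))      # ['ab', 'ab']

# Non-capturing group: findall returns each whole match.
matches = re.findall(r"(?:ab)+", s)
print(matches)                      # ['ab', 'abab']

# Longest run of "ab" (empty string if there is no match at all).
longest = max(matches, key=len, default="")
print(longest)                      # abab
```

If you also need the positions of the runs, `re.finditer` yields match objects, and `m.group(0)` is always the whole match regardless of capturing groups.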

Sweeper

I don't understand what a capturing group is. And what if, in place of 'ab', I want to match '()'? – Som Shekhar Mukherjee Jul 03 '19 at 07:14
@SomShekharMukherjee you’d still need a non-capturing group. You’d also need to escape the parentheses, so `(?:\(\))+`. You should really learn more about regex first. – Sweeper Jul 03 '19 at 07:16
@SomShekharMukherjee See [the regex reference](https://stackoverflow.com/q/22937618/5133585) – Sweeper Jul 03 '19 at 07:17
Thanks a lot, I found what I was looking for (y) – Som Shekhar Mukherjee Jul 03 '19 at 07:26
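For the '()' case discussed in the comments, a minimal sketch might look like the following. The helper name `runs_of` and the sample string are made up for illustration; `re.escape` is used so that the parentheses (or any other special characters in the token) are matched literally, which for '()' produces the same pattern as `(?:\(\))+`.

```python
import re

def runs_of(token, text):
    """Return every run of one or more consecutive copies of `token` in `text`.

    Hypothetical helper: the token is escaped so regex metacharacters
    such as parentheses are treated as literal characters.
    """
    pattern = "(?:" + re.escape(token) + ")+"
    return re.findall(pattern, text)

print(runs_of("()", "()a()()b"))      # ['()', '()()']
print(runs_of("ab", "abaaaababbb"))   # ['ab', 'abab']
```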
