0

I want to write a regular expression that returns every occurrence of the text between a and the first following occurrence of b or c.

So I tried this code:

import re

text = 'dfgahfjbjicij'
re.findall('a(.*?)(b|c)', text)

Output:

[('hfj', 'b')]

Expected:

['hfj']

How do I make it return only the text from the first group, as a string rather than a tuple?

JSK

2 Answers

2

Use a non-capturing group:

>>> import re
>>> text = 'dfgahfjbjicij'
>>> re.findall('a(.*?)(b|c)',text)
[('hfj', 'b')]
>>> re.findall('a(.*?)(?:b|c)',text)
['hfj']
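The reason this works is that re.findall returns a list of tuples when the pattern has more than one capturing group, and a list of plain strings when it has exactly one. The sketch below reuses the text from the question; the character-class and re.finditer variants are additions for comparison, not part of the original answer.

```python
import re

text = 'dfgahfjbjicij'

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r'a(.*?)(b|c)', text)

# (?:...) groups without capturing, so only one capturing group remains
# and findall returns plain strings.
one_group = re.findall(r'a(.*?)(?:b|c)', text)

# For single characters, a character class avoids the second group entirely.
char_class = re.findall(r'a(.*?)[bc]', text)

# Alternative: keep both groups and pick the first one from each match object.
picked = [m.group(1) for m in re.finditer(r'a(.*?)(b|c)', text)]

print(two_groups)  # [('hfj', 'b')]
print(one_group)   # ['hfj']
print(char_class)  # ['hfj']
print(picked)      # ['hfj']
```

The re.finditer form is useful if you still need the delimiter (b or c) elsewhere but want only the inner text in the result list.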
Tordek
0

This expression might work, with lookarounds:

(?<=a)(.*?)(?=b|c)

or:

(?<=a)(.*?)(?=[bc])

or:

a([^bc]*)[bc]

Test

import re


expression = r"(?<=a)(.*?)(?=b|c)"

string = """

dfgahfjbjicij
dfgahfjjicij

"""

print(re.findall(expression, string))

Output

['hfj', 'hfjji']
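As a rough comparison, the three expressions above give the same result on this sample, but they are not interchangeable in general. The sketch below runs all three on the same two lines, then on a hypothetical edge case (the string "xayz\nqb" is made up for illustration) where the a has no b or c on its own line. Without re.DOTALL, . does not match a newline, while [^bc] does.

```python
import re

patterns = {
    "lookaround, alternation": r"(?<=a)(.*?)(?=b|c)",
    "lookaround, char class": r"(?<=a)(.*?)(?=[bc])",
    "negated char class": r"a([^bc]*)[bc]",
}

sample = "dfgahfjbjicij\ndfgahfjjicij"
results = {name: re.findall(p, sample) for name, p in patterns.items()}
for name, found in results.items():
    print(name, found)  # each prints ['hfj', 'hfjji']

# Hypothetical edge case: the first b or c after "a" is on the next line.
edge = "xayz\nqb"
edge_results = {name: re.findall(p, edge) for name, p in patterns.items()}
for name, found in edge_results.items():
    # The two lookaround patterns find nothing because "." stops at the
    # newline; the negated class crosses it and captures 'yz\nq'.
    print(name, found)
```

If matches should never span lines, the negated class can be written as [^bc\n]* instead.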

To explore, simplify, or modify the expression, see the explanation in the top-right panel of regex101.com, where you can also watch how it matches against sample inputs.


RegEx Circuit

jex.im visualizes regular expressions.

Emma