
Given the string bcacaca, the regex pattern b?(.a)* should match the whole string. However, calling re.findall('b?(.a)*', 'bcacaca') returns ['ca', '']. It seems to return only what the group matched rather than the whole match. What's going on here?

My understanding of findall is that it should return all non-overlapping matches of the regex pattern. In this case it should return ['bcacaca', ''].
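Inspecting the match objects directly shows what findall is reporting. This is a minimal sketch using only the standard `re` module (the variable names are illustrative); it compares findall's output with `group(0)` and `group(1)` of each match returned by `re.finditer`.

```python
import re

text = 'bcacaca'
pattern = 'b?(.a)*'

# With exactly one capturing group, findall reports group 1 for each match
groups = re.findall(pattern, text)
print(groups)  # ['ca', '']

# group(0) is the whole match; group(1) keeps only the LAST repetition of (.a),
# or None when the group did not take part in the match
matches = [(m.group(0), m.group(1)) for m in re.finditer(pattern, text)]
print(matches)  # [('bcacaca', 'ca'), ('', None)]
```

The whole string is matched, but findall hands back the group instead, and a repeated capturing group only remembers its final iteration, hence 'ca'. The second, empty match has no group value, which findall turns into ''.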

halfer
Mutating Algorithm
    That's how [`findall`](https://docs.python.org/3/library/re.html#re.findall) works: "If one or more groups are present in the pattern, return a list of groups". If you want to match the whole string, change the group to non-capturing `b?(?:.a)*` – Nick May 19 '20 at 02:12
  • I believe you are looking for "re.search(...).group()" – Joshua May 19 '20 at 02:14
  • @Nick So `?:` is included at the start of every group that you want to mark as non-capturing? – Mutating Algorithm May 19 '20 at 02:15
    @MutatingAlgorithm correct. See https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean – Nick May 19 '20 at 02:16
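The two suggestions from the comments can be checked side by side. A minimal sketch with the standard `re` module; the variable names are only for illustration.

```python
import re

text = 'bcacaca'

# (?:...) groups .a for the * quantifier without capturing it, so the pattern
# has no groups and findall returns the whole matches
whole = re.findall('b?(?:.a)*', text)
print(whole)  # ['bcacaca', '']

# re.search returns the first match object; .group() with no argument is group(0)
first = re.search('b?(.a)*', text).group()
print(first)  # bcacaca
```

findall suits collecting every match, while re.search(...).group() is enough when only the first match is needed.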

1 Answer


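Use '(b(.a)*)' as your regex pattern instead. The outer group captures the whole match, but because the pattern now has two groups, findall returns a tuple (group 1, group 2) for each match. In the following example, the full match is therefore result[0][0].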

import re

result = re.findall('(b(.a)*)', 'bcacaca')
print(result)

Output:

[('bcacaca', 'ca')]
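When the text contains more than one match, the outer group can be pulled out of every tuple. A sketch assuming a hypothetical input 'bcacaca bca' (chosen only to produce two matches) and the same two-group pattern as above:

```python
import re

# Hypothetical text with two matches, for illustration
result = re.findall('(b(.a)*)', 'bcacaca bca')
print(result)  # [('bcacaca', 'ca'), ('bca', 'ca')]

# Each item is (outer group, last repetition of (.a)); keep only the outer group
outer = [whole for whole, _last in result]
print(outer)  # ['bcacaca', 'bca']
```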

A Better Option - Using a Non-capturing Group

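As @Nick mentioned, a non-capturing group `(?:...)` can be used here instead: it groups `.a` for the `*` quantifier without capturing it, so findall returns the whole match. Consider the following example, where the text contains several matching substrings. For a step-by-step explanation, see the next section. I also encourage you to experiment with the pattern on regex101.com.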

import re

## Define text and pattern
text = 'bcacaca dcaca dbcaca'
pattern = 'b?(?:.a)*'

## Evaluate regex
result = re.findall(pattern, text)
# output
# ['bcacaca', '', '', 'caca', '', '', 'bcaca', '']

## Drop empty strings from result
result = list(filter(None, result))
# output
# ['bcacaca', 'caca', 'bcaca']
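The empty strings appear because b?(?:.a)* can match zero characters. As an alternative to filtering afterwards, the pattern can be split into two branches that each consume at least one character. This rewritten pattern is my own suggestion, not part of the approach above; the sketch checks it against the filtered result on the same text.

```python
import re

text = 'bcacaca dcaca dbcaca'

# Alternative pattern (an assumption, not from the answer): either a 'b'
# followed by any number of .a pairs, or at least one .a pair, so no branch
# can produce an empty match
pattern = 'b(?:.a)*|(?:.a)+'
result = re.findall(pattern, text)
print(result)  # ['bcacaca', 'caca', 'bcaca']

# Same result as dropping empty strings from the original pattern's matches
filtered = [s for s in re.findall('b?(?:.a)*', text) if s]
print(result == filtered)  # True
```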

Explanation for Using a Non-capturing Group

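A step-by-step view of what findall does can be produced with `re.finditer`, which exposes where each match starts and ends. A minimal sketch, assuming Python 3.7 or later (the version whose empty-match rules produce the output shown earlier):

```python
import re

text = 'bcacaca dcaca dbcaca'
pattern = 'b?(?:.a)*'

# Record (start, end, matched text) for every match, including zero-width ones
steps = [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]
for start, end, matched in steps:
    print(f'{start:2}-{end:2} {matched!r}')
```

Because both b? and (?:.a)* may match zero times, the pattern succeeds at every position. Where no match can be extended (at the spaces at 7 and 13, at the 'd' characters at 8 and 14, and at the end of the string at 20), the result is an empty match, which is where the '' entries in findall's output come from.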

References

  1. Remove empty strings from a list of strings
CypherX