
I am new to Python. While learning the re module, I found something strange that I don't understand:

import re

pattern = re.compile(r'[0-9]{3}-[0-9]{3}-[0-9]{4}')
list_phoneNumbers = pattern.findall('phone number : 123-456-7894, my home number : 789-456-1235')
print(list_phoneNumbers)

pattern = re.compile(r'bat(wo)?man')
batman_match = pattern.search('batman is there')
batwoman_match = pattern.search('batwoman is there')
bat_list_all = pattern.findall('batman is there but not batwoman')

print(batman_match.group())
print(batwoman_match.group())
print(bat_list_all)

Output :

['123-456-7894', '789-456-1235']
batman
batwoman
['', 'wo']

Why did print(bat_list_all) not give the list ['batman', 'batwoman']? What am I missing?

azro
Ravi Jiyani

1 Answer


This is because your pattern contains a capturing group, (wo)?, so findall returns what that group matched instead of the whole match:

  • '' for batman
  • 'wo' for batwoman

You can use a non-capturing group instead: pattern = re.compile(r'bat(?:wo)?man')
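As a minimal sketch of the fix above, the snippet below runs the question's sentence through both patterns. The variable names (capturing, non_capturing, full_matches) are made up for illustration. The last line shows one possible alternative, assuming you want to keep the capturing group: iterate with re.finditer and ask each match object for group(0).

```python
import re

text = 'batman is there but not batwoman'

# Capturing group: findall returns only what (wo) captured.
capturing = re.findall(r'bat(wo)?man', text)
print(capturing)        # ['', 'wo']

# Non-capturing group: findall returns the whole match.
non_capturing = re.findall(r'bat(?:wo)?man', text)
print(non_capturing)    # ['batman', 'batwoman']

# Alternative: keep the capturing group, but take the full match
# (group 0) from each match object yielded by finditer.
full_matches = [m.group(0) for m in re.finditer(r'bat(wo)?man', text)]
print(full_matches)     # ['batman', 'batwoman']
```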


re.findall(): return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.
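To illustrate the three cases described in that documentation excerpt, here is a small sketch that reuses the phone-number string from the question. The grouped variants of the pattern and the variable names are assumptions added for illustration, not part of the original code.

```python
import re

text = 'phone number : 123-456-7894, my home number : 789-456-1235'

# No groups: a list of whole matches.
no_group = re.findall(r'[0-9]{3}-[0-9]{3}-[0-9]{4}', text)
print(no_group)      # ['123-456-7894', '789-456-1235']

# One group: a list of that group's contents only.
one_group = re.findall(r'([0-9]{3})-[0-9]{3}-[0-9]{4}', text)
print(one_group)     # ['123', '789']

# Several groups: a list of tuples, one tuple per match.
three_groups = re.findall(r'([0-9]{3})-([0-9]{3})-([0-9]{4})', text)
print(three_groups)  # [('123', '456', '7894'), ('789', '456', '1235')]
```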

azro
  • I thought (wo)? meant 0 or 1 occurrence of wo. Thanks for pointing out the group. – Ravi Jiyani Mar 08 '20 at 09:02
  • @RaviJiyani It does, but the parentheses also create a capturing group, whose content can be retrieved, for example with findall – azro Mar 08 '20 at 09:02
  • Yes, I just went through the documentation: (?:...) A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern. – Ravi Jiyani Mar 08 '20 at 09:04