I am using Python 3.6, working through the "Automate the Boring Stuff" course, and trying to learn how to use VERBOSE mode in regular expressions. When the following code is executed, the printed result is:

[('123-', ''), ('415-', ''), ('905-', '')]

Can someone tell me what I am doing wrong? I would like the code to return both phone numbers in the string.

import re

phoneNum = re.compile(r'''
(\d\d\d-)|  # area code without parentheses but with dash
(\(\d\d\d\) ) # -or- area code with parentheses and no dash
\d\d\d # first 3 digits
-      # second dash
\d\d\d\d # last 4 digits''', re.VERBOSE) 

print(phoneNum.findall('(415) 123-2342 and 415-905-1234 are the numbers.'))
Steve

1 Answer

The first grouping is wrong: you need to alternate \d\d\d- and \(\d\d\d\) inside a single group, and also escape the space after the parenthesized digits, or it will be treated as formatting whitespace and ignored (since you are using re.VERBOSE).
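To illustrate the whitespace point, here is a minimal sketch (the names `bare`, `escaped`, and `text` are made up for this example) showing that a bare space in a VERBOSE pattern is dropped, while an escaped one is matched literally:

```python
import re

# In VERBOSE mode a bare space in the pattern is ignored,
# so this is effectively \(\d{3}\)\d{3}.
bare = re.compile(r'\(\d{3}\) \d{3}', re.VERBOSE)

# An escaped space is kept and must appear in the input.
escaped = re.compile(r'\(\d{3}\)\ \d{3}', re.VERBOSE)

text = '(415) 123'
print(bare.search(text))               # None
print(escaped.search(text).group(0))   # (415) 123
```
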

The regex can be fixed as

(?:\d{3}-|   # area code without parentheses but with dash
\(\d{3}\)\ ) # -or- area code with parentheses and no dash
\d{3}        # first 3 digits
-            # second dash
\d{4}        # last 4 digits

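Note the \ before the space on the second line. See the regex demo. You may add \b at the end of the expression, and in front of the \d{3}- alternative, to match a number as a whole word (a \b directly before \( would only match right after a word character).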
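As a rough sketch of the word-boundary idea (the name `phone_bounded` and the extra test string are assumptions for illustration, not part of the original answer), the \b can go inside the dash alternative and at the very end:

```python
import re

phone_bounded = re.compile(r'''
(?:\b\d{3}-|     # area code with a dash, starting at a word boundary
\(\d{3}\)\ )     # -or- area code in parentheses plus an escaped space
\d{3}            # first 3 digits
-                # second dash
\d{4}            # last 4 digits
\b               # the number must not run into further digits or letters
''', re.VERBOSE)

text = '(415) 123-2342 and 415-905-1234 are the numbers, 415-905-12345 is not.'
print(phone_bounded.findall(text))
# ['(415) 123-2342', '415-905-1234']
```

The leading \b is placed only on the dash alternative because "(" is not a word character, so a \b in front of \( would fail at the start of a string or after a space.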

Use

import re
phoneNum = re.compile(r'''
(?:\d{3}-|  # area code without parentheses but with dash
\(\d{3}\)\ ) # -or- area code with parentheses and no dash
\d{3} # first 3 digits
-      # second dash
\d{4} # last 4 digits''', re.VERBOSE) 
print(phoneNum.findall('(415) 123-2342 and 415-905-1234 are the numbers.'))
# => ['(415) 123-2342', '415-905-1234']

See the Python demo.

Wiktor Stribiżew
  • Thanks for the reply. I am new to coding, so I am still unsure about some of the operators and the syntax. Could you explain to me what ?: does in the first line of your code? Must it be used every time I use the pipe | character? – Steve Dec 10 '17 at 21:50
  • @Steve Sorry, I forgot to mention that you must also use a non-capturing group (the `(?:...)` is its syntax) so that `re.findall` does not return just the captured substrings. If you use `re.finditer` and grab `match.group(0)` you do not have to care about what kind of group you are using. – Wiktor Stribiżew Dec 10 '17 at 22:46
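A small sketch of the difference described in the comment above, assuming the same sample string (the names `capturing` and `whole` are made up for this example): with a capturing group, `re.findall` returns only the group contents, while `re.finditer` with `match.group(0)` returns each full match.

```python
import re

text = '(415) 123-2342 and 415-905-1234 are the numbers.'

# Capturing group around the area code alternatives.
capturing = re.compile(r'''
(\d{3}-|        # area code with a dash
\(\d{3}\)\ )    # -or- area code in parentheses plus an escaped space
\d{3}-\d{4}     # remaining seven digits
''', re.VERBOSE)

# findall returns just what the single group captured.
print(capturing.findall(text))
# ['(415) ', '415-']

# group(0) is always the whole match, whatever groups the pattern has.
whole = [m.group(0) for m in capturing.finditer(text)]
print(whole)
# ['(415) 123-2342', '415-905-1234']
```
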