Hey guys, I'm building a phone and email extractor using Python regex. It works for the emails, but it won't work for the phone numbers.

The code for finding phone number matches on the clipboard is below:

```python
for groups in phoneR.findall(text):
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])
    try:
        if groups[8] != '':
            phoneNum += ' x' + groups[8]
    except IndexError as i:
        print('not in range', i)
    matches.append(phoneNum)
```

Groups 1, 3, 5, and 8 are supposed to be the area code, the first 3 digits, the last 4 digits, and the extension (if there is one), respectively. Yet when I run this code, it returns this:

```
not in range tuple index out of range
not in range tuple index out of range
not in range tuple index out of range
Copied to clipboard:
.-.-
.-.-
.-.-
info@nostarch.com
media@nostarch.com
academic@nostarch.com
info@nostarch.com
```

I've printed the error with try/except to show more info. I don't understand why .-.- appears instead of an actual phone number, so I'm posting the phone regex below along with the test link I used. If anybody can give some insight, it'd be much appreciated:

```python
# phone regex
phoneR = re.compile(r'''
(\d{3}|\d{3}\))?        # area code
(\s|-|\.)?      # separator
(\d{3})     # first 3 digits
(\s|-|\.)       # separator
(\d{4})     # last 4 digits
(\s*(ext|x|ext.)\s*(\d{2,5}))?      # extension
''', re.VERBOSE)
```

Here's the test link: https://nostarch.com/contactus/
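
A likely explanation for the `.-.-` output, sketched under a few assumptions: `findall` returns plain tuples indexed from 0, so with eight capture groups the valid indexes are 0 to 7. That would make `groups[1]` and `groups[3]` the two separators and `groups[5]` the whole (here empty) extension, while `groups[8]` does not exist. The sample text below is made up, `phone_re` is just an illustrative name, and the pattern differs from the posted one in two assumed typo fixes (an opening `\(` for the parenthesised area code and an escaped dot in `ext\.`).

```python
import re

# Pattern from the question with two assumed typo fixes (see the note above).
phone_re = re.compile(r'''
(\d{3}|\(\d{3}\))?                # group 1: area code
(\s|-|\.)?                        # group 2: separator
(\d{3})                           # group 3: first 3 digits
(\s|-|\.)                         # group 4: separator
(\d{4})                           # group 5: last 4 digits
(\s*(ext|x|ext\.)\s*(\d{2,5}))?   # group 6: whole extension, 7: keyword, 8: digits
''', re.VERBOSE)

sample = 'Call 415.863.9900 or 415-863-9950 x123.'  # made-up test text
rows = phone_re.findall(sample)  # list of 8-item tuples, indexed 0..7

# Indexes used in the question: two separators plus the (empty) extension.
as_posted = '-'.join([rows[0][1], rows[0][3], rows[0][5]])

# Zero-based indexes: area code, first 3 digits, last 4 digits, extension digits.
fixed = []
for groups in rows:
    phone = '-'.join([groups[0], groups[2], groups[4]])
    if groups[7] != '':
        phone += ' x' + groups[7]
    fixed.append(phone)

print(len(rows[0]), repr(as_posted), fixed)
# 8 '.-.-' ['415-863-9900', '415-863-9950 x123']
```

Shifting every index down by one (0, 2, 4 and 7) keeps the rest of the original loop unchanged.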

  • Use `re.finditer` as it will return match data objects. `re.findall` only returns lists of tuples, or lists of strings. – Wiktor Stribiżew Feb 28 '20 at 13:56
  • re.finditer actually solved the problem, although it initially led to a TypeError, since join() can only join strings, not NoneTypes. So I wrapped the groups[8] that's inside the if statement in str() to correct this, which worked. The phone numbers printed but had 'Nonex' attached to the end. To clean it up, I used strip('Nonex') to show just the phone number. There's probably a cleaner way to achieve this, but it got it done, so thanks very much Wiktor. – Marcelino Velasquez Feb 28 '20 at 14:14
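
One possibly cleaner alternative to `str()` plus `strip('Nonex')`, as a rough sketch: with `finditer`, `m.group(n)` is numbered from 1 and returns `None` for a group that did not take part in the match, so the extension can be tested directly before it is appended. Note also that `str.strip('Nonex')` removes any of the characters N, o, n, e, x from both ends, not the literal text `Nonex`. Here `extract_phones` and `phone_re` are hypothetical names, the sample strings are made up, and the pattern assumes two typo fixes to the posted regex (an opening `\(` for the parenthesised area code and an escaped dot in `ext\.`).

```python
import re

# Pattern from the question with two assumed typo fixes (see the note above).
phone_re = re.compile(r'''
(\d{3}|\(\d{3}\))?                # group 1: area code
(\s|-|\.)?                        # group 2: separator
(\d{3})                           # group 3: first 3 digits
(\s|-|\.)                         # group 4: separator
(\d{4})                           # group 5: last 4 digits
(\s*(ext|x|ext\.)\s*(\d{2,5}))?   # group 6: whole extension, 7: keyword, 8: digits
''', re.VERBOSE)


def extract_phones(text):
    """Return phone numbers found in text, normalised to 123-456-7890 [x12]."""
    phones = []
    for m in phone_re.finditer(text):
        # group(1) is None when the optional area code is absent.
        area = (m.group(1) or '').strip('()')
        # Skip empty parts so a missing area code does not leave a leading '-'.
        parts = [p for p in (area, m.group(3), m.group(5)) if p]
        phone = '-'.join(parts)
        # group(8) is None when there is no extension, so test it directly.
        if m.group(8) is not None:
            phone += ' x' + m.group(8)
        phones.append(phone)
    return phones


print(extract_phones('Call 415.863.9900 or (415) 863-9950 ext 12.'))
# ['415-863-9900', '415-863-9950 x12']
```

Checking for `None` before concatenating avoids both the TypeError and the need to strip anything afterwards.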
