selecting variations of phone numbers using regex

Question

    import re
    s = 'so the 1234 2-1-1919 215.777.9839 1333331234 20-20-2000 A1234567 (515)2331129 7654321B (511)231-1134 512-333-1134 7777777 a7727373 there 1-22-2001 *1831 5647 and !2783 '
    reg = r'[()\d-]{7,}'
    r1 = re.findall(reg,s)

I have the following regex, which gives this result:

    r1
    ['2-1-1919',
     '1333331234',
     '20-20-2000',
     '1234567',
     '(515)2331129',
     '7654321',
     '(511)231-1134',
     '512-333-1134',
     '7777777',
     '7727373',
     '1-22-2001']

I want to get the following output:

    ['(515)2331129',
     '(511)231-1134',
     '512-333-1134']

So I tried altering reg = r'[()\d-]{7,}' by adding \b:

    reg = r'[\b()\b\d-]{7,}'

But this doesn't work. How do I change reg = r'[()\d-]{7,}' to get the output I want?

`[()\d-]` is a character class that matches `(`, `)`, `-` or a digit. `\b` in a character class means backspace. — Toto, Sep 01 '19 at 18:45
@Toto - `Reference - What does this regex mean?` is too broad a topic, he's asking how to match variations of phone numbers. — , Sep 01 '19 at 19:17
@sln: They will find there what is a character class and the meaning of their regex. — Toto, Sep 01 '19 at 19:40
@Toto - I don't think a faux pas in remedial character-class understanding warrants a redirection to an overbloated regex link page trying to replace the Google search engine. Then 99% of the regex questions qualify for that. Who decides that? — , Sep 01 '19 at 19:59
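
Toto's point about `\b` can be checked directly. The snippet below is a small sketch using only the standard `re` module; the short test strings (`'a\x08b'` and the three-token sample) are made up for illustration.

```python
import re

s = ('so the 1234 2-1-1919 215.777.9839 1333331234 20-20-2000 A1234567 '
     '(515)2331129 7654321B (511)231-1134 512-333-1134 7777777 a7727373 '
     'there 1-22-2001 *1831 5647 and !2783 ')

# Inside a character class, \b is the backspace character (\x08).
backspaces = re.findall(r'[\b]', 'a\x08b')

# So the attempted pattern only adds backspace to the allowed characters;
# on this input it matches exactly what the original pattern matches.
original = re.findall(r'[()\d-]{7,}', s)
attempted = re.findall(r'[\b()\b\d-]{7,}', s)

# Outside a character class, \b is a zero-width word boundary.
bounded = re.findall(r'\b\d{7}\b', 'A1234567 7777777 7654321B')

print(backspaces)             # ['\x08']
print(original == attempted)  # True
print(bounded)                # ['7777777']
```

A word boundary alone would not produce the desired output either, since it says nothing about where the parentheses and dashes may appear.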

score 1 · Answer 1 · answered Sep 01 '19 at 20:50

To put my two cents in, you could use a regex/parser combination as in:

    import re

    from parsimonious.exceptions import IncompleteParseError, ParseError
    from parsimonious.grammar import Grammar

    junk = """so the 1234 2-1-1919 215.777.9839 1333331234 20-20-2000 A1234567 (515)2331129 7654321B 
    (511)231-1134 512-333-1134 7777777 a7727373 there 1-22-2001 *1831 5647 and !2783"""

    # Stage 1: grab every run of digits, dashes and parentheses.
    rx = re.compile(r'[-()\d]+')

    # Stage 2: a phone number is an area code followed by two digit groups.
    grammar = Grammar(
        r"""
        phone       = area part part
        area        = (lpar digits rpar) / digits
        part        = dash? digits

        lpar        = "("
        rpar        = ")"
        dash        = "-"
        digits      = ~"\d{3,4}"
        """
    )

    for match in rx.finditer(junk):
        possible_number = match.group(0)
        try:
            # parse() raises unless the whole candidate fits the grammar
            grammar.parse(possible_number)
            print(possible_number)
        except (ParseError, IncompleteParseError):
            pass

This yields:

    (515)2331129
    (511)231-1134
    512-333-1134

The idea is to first match possible candidates with the regex and then check each candidate against the parser grammar.
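
The same two-stage idea (collect candidates, then validate each one as a whole) can also be sketched with the standard library alone. This is not the grammar above: `valid_rx` is a hypothetical validator that spells out the three wanted formats explicitly, and `re.fullmatch` plays the role of the parser.

```python
import re

junk = """so the 1234 2-1-1919 215.777.9839 1333331234 20-20-2000 A1234567 (515)2331129 7654321B
(511)231-1134 512-333-1134 7777777 a7727373 there 1-22-2001 *1831 5647 and !2783"""

# Stage 1: every run of digits, dashes and parentheses is a candidate.
candidate_rx = re.compile(r'[-()\d]+')

# Stage 2 (assumed formats): "(ddd)ddddddd", "(ddd)ddd-dddd" or "ddd-ddd-dddd".
valid_rx = re.compile(r'\(\d{3}\)\d{3}-?\d{4}|\d{3}-\d{3}-\d{4}')

phones = [
    m.group(0)
    for m in candidate_rx.finditer(junk)
    if valid_rx.fullmatch(m.group(0))  # the whole candidate must fit
]

print(phones)  # ['(515)2331129', '(511)231-1134', '512-333-1134']
```

The formats are written out rather than copied from the grammar because a regex backtracks: a loose pattern such as three `\d{3,4}` groups with optional dashes would also accept `1333331234`.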

Emma · Accepted Answer · 2019-09-01T18:22:34.260

We could use alternation, with one alternative for each format you expect:

\d{3}-\d{3}-\d{4}|\(\s*\d{3}\s*\)\d{7}|\(\s*\d{3}\s*\)\s*\d{3}-\d{4}

If necessary, we can also add boundaries so that a match must be delimited by whitespace or by the start or end of the string:

(?<!\S)(?:\d{3}-\d{3}-\d{4}|\(\s*\d{3}\s*\)\d{7}|\(\s*\d{3}\s*\)\s*\d{3}-\d{4})(?!\S)
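
To see what the lookarounds change, here is a small comparison. The `sample` string is made up for illustration; the two patterns are the ones shown above.

```python
import re

core = r"\d{3}-\d{3}-\d{4}|\(\s*\d{3}\s*\)\d{7}|\(\s*\d{3}\s*\)\s*\d{3}-\d{4}"
# Same alternation, but nothing except whitespace may touch either side.
bounded = rf"(?<!\S)(?:{core})(?!\S)"

# Hypothetical input: a glued prefix, an extra trailing digit,
# a clean number, and a number followed by a full stop.
sample = "x512-333-1134 512-333-11345 512-333-1134 (511) 231-1134."

loose_hits = re.findall(core, sample)
strict_hits = re.findall(bounded, sample)

print(loose_hits)   # ['512-333-1134', '512-333-1134', '512-333-1134', '(511) 231-1134']
print(strict_hits)  # ['512-333-1134']
```

Note that the bounded version also rejects a number followed directly by punctuation, which may or may not be what you want.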

Test

    import re

    expression = r"\d{3}-\d{3}-\d{4}|\(\s*\d{3}\s*\)\d{7}|\(\s*\d{3}\s*\)\s*\d{3}-\d{4}"

    string = """
    so the 1234 2-1-1919 215.777.9839 1333331234 20-20-2000 A1234567 (515)2331129 7654321B (511)231-1134 512-333-1134 7777777 a7727373 there 1-22-2001 *1831 5647 and !2783 (511) 231-1134 ( 511)231-1134 (511 ) 231-1134
    511-2311134

    """

    print(re.findall(expression, string))

Output

['(515)2331129', '(511)231-1134', '512-333-1134', '(511) 231-1134', '( 511)231-1134', '(511 ) 231-1134']

If you wish to explore, simplify, or modify the expression, regex101.com explains it in its top-right panel, and it can also show how the expression matches against sample inputs.
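
As an offline alternative to that explanation panel, the expression can be annotated in place with `re.VERBOSE`. This is only a sketch: the comments are my reading of each alternative, and `phone_rx` is a name chosen for illustration.

```python
import re

phone_rx = re.compile(r"""
      \d{3}-\d{3}-\d{4}               # 512-333-1134
    | \(\s*\d{3}\s*\)\d{7}            # (515)2331129, spaces allowed inside the parentheses
    | \(\s*\d{3}\s*\)\s*\d{3}-\d{4}   # (511)231-1134 or (511) 231-1134
""", re.VERBOSE)

s = ('so the 1234 2-1-1919 215.777.9839 1333331234 20-20-2000 A1234567 '
     '(515)2331129 7654321B (511)231-1134 512-333-1134 7777777 a7727373 '
     'there 1-22-2001 *1831 5647 and !2783 ')

found = phone_rx.findall(s)
print(found)  # ['(515)2331129', '(511)231-1134', '512-333-1134']
```

In verbose mode, whitespace in the pattern is ignored and `#` starts a comment, so the three alternatives can sit on separate lines without changing what is matched.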

RegEx Circuit

jex.im visualizes regular expressions.
