1

I need to find phone numbers written in several different formats.

They could be in these formats:

+xx xx xxxxxxxx  - /[+][0-9]{2}\s[0-9]{2}\s[0-9]{8}/
xxxx xxxxxx      - /[0-9]{4}\s[0-9]{6}/
xxxxx xxxxxx     - /[0-9]{5}\s[0-9]{6}/
xxxxxxxxxxx      - /[0-9]{11}/
+xx xxxxxxxxxx   - /[+][0-9]{2}\s[0-9]{10}/
xxxx xxxxxxxxxx  - /[0-9]{4}\s[0-9]{10}/

I wrote a regular expression for each one, but I'm not sure how to combine them into one big expression.

How can I combine them into a single expression that finds all of these formats when it is run over a file of numbers?

JameshGong
  • 165
  • 7
  • 1
    It's too tedious to do a factor combination. Just join them with alternations `|` and enclose them in a group `(?: ,,, )`. This gives one regex. If you want to get fancy, you could surround the group with whitespace boundaries `(? –  Nov 05 '18 at 18:04
  • 1
    Trying to merge multiple regexes into one big one gets really complicated very fast. It's generally a better idea to keep them separate and run each one individually. It also keeps your code much cleaner, more readable, and much more maintainable. Trust me, trying to modify super complex regexes is EXTREMELY frustrating. – doom87er Nov 05 '18 at 18:08
  • @doom87er I get that, but the brief in my work says that it should be one big expression – JameshGong Nov 05 '18 at 18:13
  • I'm assuming this is an assignment for a class? – doom87er Nov 05 '18 at 18:14
  • There are some great resources for learning regex here: https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean and https://regex101.com is a great site for testing and debugging your regexes – doom87er Nov 05 '18 at 18:17
  • @doom87er oh thanks, will test it on there. – JameshGong Nov 05 '18 at 18:19

2 Answers

2

If you just want a single regular expression that catches all these cases, you can just 'or' together the cases that you've provided:

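(?:[+][0-9]{2}\s[0-9]{2}\s[0-9]{8})|(?:[0-9]{4}\s[0-9]{10})|(?:[0-9]{4}\s[0-9]{6})|(?:[0-9]{5}\s[0-9]{6})|(?:[0-9]{11})|(?:[+][0-9]{2}\s[0-9]{10})

I just wrapped each regex in a non-capturing group (?:) and or'd | them together. The xxxx xxxxxxxxxx case goes before xxxx xxxxxx because alternation takes the first branch that matches, and the shorter pattern would otherwise match only the first part of the longer number.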

However, this isn't any different than iterating through and checking each regular expression individually, and is much less maintainable. I'd check the cases individually.

John
  • 2,285
  • 13
  • 21
0

Your formats are really just text strings:

+xx xx xxxxxxxx
xxxx xxxxxx
xxxxx xxxxxx
xxxxxxxxxxx
+xx xxxxxxxxxx
xxxx xxxxxxxxxx

If you run your formats through this tool, it will give you a regex like this:

     \+xx [ ] xx
     (?: [ ] )?
     xxxxxxxx
  |  xxxx
     (?:
          [ ] xxxxxx
          (?: xxxx )?
       |  x
          (?: [ ] )?
          xxxxxx
     )

Then just replace [ ] with \s{1,9} and x with \d, giving your final regex:

\+\d{2}\s{1,9}\d{2}(?:\s{1,9})?\d{8}|\d{4}(?:\s{1,9}\d{6}(?:\d{4})?|\d(?:\s{1,9})?\d{6})

https://regex101.com/r/nF2L9T/1

    \+ \d{2} \s{1,9} \d{2} 
     (?: \s{1,9} )?
     \d{8} 
  |  
     \d{4} 
     (?:
          \s{1,9} \d{6} 
          (?: \d{4} )?
       |  \d 
          (?: \s{1,9} )?
          \d{6} 
     )
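As a quick check, the expanded form above can be pasted almost verbatim into Python using re.VERBOSE, which ignores whitespace inside the pattern. This is only a sketch: the names FACTORED and SAMPLES and the sample numbers are made up for illustration.

```python
import re

# The factored regex, laid out as in the expanded listing.
# re.VERBOSE makes the engine ignore the layout whitespace.
FACTORED = re.compile(
    r"""
      \+ \d{2} \s{1,9} \d{2}
      (?: \s{1,9} )?
      \d{8}
    |
      \d{4}
      (?:
          \s{1,9} \d{6}
          (?: \d{4} )?
        | \d
          (?: \s{1,9} )?
          \d{6}
      )
    """,
    re.VERBOSE,
)

# One made-up number per format from the question.
SAMPLES = [
    "+12 34 12345678",   # +xx xx xxxxxxxx
    "0123 456789",       # xxxx xxxxxx
    "01234 567890",      # xxxxx xxxxxx
    "01234567890",       # xxxxxxxxxxx
    "+12 3456789012",    # +xx xxxxxxxxxx
    "0123 4567890123",   # xxxx xxxxxxxxxx
]

if __name__ == "__main__":
    for sample in SAMPLES:
        # fullmatch requires the whole string to fit the pattern.
        print(sample, bool(FACTORED.fullmatch(sample)))
```

Note that because of \s{1,9}, this regex is slightly looser than the six originals: it also accepts runs of up to nine whitespace characters where they allow exactly one.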

Since this is a full-blown ternary trie, it is many times faster than an ordinary
bunch of alternations.

Regex1:   \+\d{2}\s{1,9}\d{2}(?:\s{1,9})?\d{8}|\d{4}(?:\s{1,9}\d{6}(?:\d{4})?|\d(?:\s{1,9})?\d{6})
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   6
Elapsed Time:    0.72 s,   715.33 ms,   715325 µs
Matches per sec:   419,389
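To measure the difference yourself, a rough timing sketch in Python follows. It assumes Python's re module, which is not the engine used for the figures above, so the numbers will differ and the speedup may not carry over. PLAIN, FACTORED, TEXT and time_pattern are illustrative names, and the sample numbers are made up.

```python
import re
import timeit

# Plain alternation of the six formats, longest-first where prefixes overlap.
PLAIN = re.compile(
    r"\+\d{2}\s\d{2}\s\d{8}|\+\d{2}\s\d{10}|\d{4}\s\d{10}"
    r"|\d{5}\s\d{6}|\d{4}\s\d{6}|\d{11}"
)

# The factored (trie-style) regex from this answer.
FACTORED = re.compile(
    r"\+\d{2}\s{1,9}\d{2}(?:\s{1,9})?\d{8}"
    r"|\d{4}(?:\s{1,9}\d{6}(?:\d{4})?|\d(?:\s{1,9})?\d{6})"
)

# Made-up input: one number per format, one per line, repeated 100 times.
SAMPLES = [
    "+12 34 12345678",
    "0123 456789",
    "01234 567890",
    "01234567890",
    "+12 3456789012",
    "0123 4567890123",
]
TEXT = ("\n".join(SAMPLES) + "\n") * 100


def time_pattern(pattern, number=200):
    """Return the seconds taken to scan TEXT `number` times with pattern."""
    return timeit.timeit(lambda: pattern.findall(TEXT), number=number)


if __name__ == "__main__":
    print("plain alternation:", time_pattern(PLAIN))
    print("factored regex:   ", time_pattern(FACTORED))
```

Both patterns find the same matches on this input, so the comparison is like for like; on other input the factored regex can match more, since it tolerates extra whitespace.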