
I'm trying to build a complex regular expression, and I recall reading an article on email address validation where each portion of the regular expression was broken down into much simpler, individual regular expressions like:

email      := <localpart>@<domainpart>
localpart  := (?:<mailbox>\+)?(<username>)
domainpart := <domainname>\.<tld>
etc...

But I can't seem to find any documentation on anything similar. Is there a valid, programmatic syntax like this, or am I misremembering some RFC-style pseudocode?

Notes:

  • I'm not trying to validate an email address; I know about filter_var() and FILTER_VALIDATE_EMAIL.
  • I've added the perl tag because, in my travels, someone said "I think Perl has something like that".
reinierpost
Sammitch
  • `my $email = qr/$localpart\@$domainpart/;` – mpapec Mar 13 '14 at 19:35
  • I think you might find this [answer](http://stackoverflow.com/a/18151617) interesting. – HamZa Mar 13 '14 at 19:39
  • You should be clear in your question that you're looking for a PHP solution, otherwise you're likely to get Perl answers from people who browse the [tag:perl] tag. – ThisSuitIsBlackNot Mar 13 '14 at 19:43
  • @ThisSuitIsBlackNot a PHP solution would be nice, but I'm not averse to a perl solution either. In fact, the question that HamZa linked is a perl solution and it seems to be working in PHP as well. – Sammitch Mar 13 '14 at 19:47
  • @Sammitch The answer I linked is a PCRE solution. Note that [`PCRE != perl regex`](http://www.manpagez.com/man/3/pcrecompat/). Since PHP uses PCRE, it should be what you're looking for. Btw, I also expect it to work on perl, not sure though. – HamZa Mar 13 '14 at 19:48
  • @Sammitch [Here's](https://github.com/KyraD/stack-csp/blob/master/src/KyraD/Stack/Csp/Policy.php#L174) an example. I think this question is a duplicate to the thread I linked earlier. If you don't agree, would you mind to comment what you expect? – HamZa Mar 13 '14 at 19:57
  • removed the 'perl' tag – reinierpost Mar 14 '14 at 09:22

2 Answers

1

The syntax I was looking for, as suggested by @HamZa in the comments, is:

/
    (?(DEFINE)
        (?<userpart> thomas | richard | harold )
        (?<domainpart> gmail | yahoo | hotmail )
        (?<tld> com | net | co\.uk )
        (?<email> (?&userpart)@(?&domainpart)\.(?&tld) )
    )
    ^To:\s.*\s<(?&email)>$
/xi

This matches a line such as: To: Mr. Selleck <thomas@gmail.com>
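
For anyone who wants to try this out, here is a minimal Perl sketch of the same pattern (Perl supports `(?(DEFINE)...)` and `(?&name)` from version 5.10 on). The variable name `$to_line` and the second sample line are made up for illustration, and the `@` is escaped as a precaution against array interpolation:

```perl
use strict;
use warnings;

# Named sub-patterns live in the DEFINE block; they match nothing by themselves
# and are only used when called with (?&name).
my $to_line = qr{
    (?(DEFINE)
        (?<userpart>   thomas | richard | harold )
        (?<domainpart> gmail | yahoo | hotmail )
        (?<tld>        com | net | co\.uk )
        (?<email>      (?&userpart) \@ (?&domainpart) \. (?&tld) )
    )
    ^To:\s.*\s<(?&email)>$
}xi;

# Hypothetical sample lines; the second uses a user name not in the list.
my @lines = (
    'To: Mr. Selleck <thomas@gmail.com>',
    'To: Mr. Nobody <bob@gmail.com>',
);

for my $line (@lines) {
    my $verdict = $line =~ $to_line ? 'match' : 'no match';
    print "$line => $verdict\n";
}
```

The DEFINE block keeps each piece readable and lets the final line of the pattern read almost like the pseudo-grammar in the question.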

Edit: I've also found a more implementation-independent syntax that can be used: https://stackoverflow.com/a/22871592/1064767

Sammitch
-1

In Perl, and in many Perl-style regex implementations, the /x modifier makes the engine ignore unescaped whitespace in the pattern and allows comments (sample pulled shamelessly from perl.com):

$_ =~ m/^                       # anchor at beginning of line
      The\ quick\ (\w+)\ fox    # fox adjective
      \ (\w+)\ over             # fox action verb
      \ the\ (\w+)\ dog         # dog adjective
      (?:                       # whitespace-trimmed comment:
        \s* \# \s*              #   whitespace and comment token
        (.*?)                   #   captured comment text; non-greedy!
        \s*                     #   any trailing whitespace
      )?                        # this is all optional
      $                         # end of line anchor
     /x;                        # allow whitespace

You can also store regular expressions in variables (example taken from perlop) and combine several of them into a larger pattern. Be careful not to interpolate user input, lest you create a "regular expression injection". Storing a regex as a string and concatenating it into a larger regex string should work in any language that lets you store regexes as strings (all that I'm aware of).

$rex = qr/my.STRING/is;
print $rex;                 # prints (?^si:my.STRING) on Perl 5.14+, (?si-xm:my.STRING) on older versions
s/$rex/foo/;
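
To tie this back to the grammar in the question, here is a rough sketch of building a larger pattern out of small `qr//` pieces. The character classes are deliberately simplistic placeholders of my own (not an RFC-compliant email grammar), and `$user_input` is a hypothetical stand-in for untrusted text:

```perl
use strict;
use warnings;

# Small building blocks (illustrative assumptions, not a real email spec).
my $mailbox    = qr/[a-z0-9_]+/i;
my $username   = qr/[a-z0-9_.]+/i;
my $domainname = qr/[a-z0-9-]+(?:\.[a-z0-9-]+)*/i;
my $tld        = qr/[a-z]{2,}/i;

# Compose them the way the question's pseudo-grammar does.
my $localpart  = qr/(?:$mailbox\+)?($username)/;
my $domainpart = qr/$domainname\.$tld/;
my $email      = qr/$localpart\@$domainpart/;

for my $candidate ('inbox+sam@example.com', 'sam@example.com', 'not an email') {
    if ($candidate =~ /\A$email\z/) {
        print "$candidate: username is $1\n";
    }
    else {
        print "$candidate: no match\n";
    }
}

# Untrusted text should be escaped with quotemeta (or \Q...\E) before it is
# interpolated, so that it is matched literally rather than as a pattern.
my $user_input = 'a.b';                   # hypothetical user-supplied string
my $literal    = quotemeta $user_input;   # the dot is now escaped: a\.b
print "dot is literal\n" if 'axb' !~ /\A$literal\z/;
```

Each `qr//` object keeps its own flags when interpolated, so the `/i` on the small pieces still applies inside the composed pattern.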
xenoterracide