I found the bash script below for email validation on a random site. It works fine, but I would like to understand how it works.

I would greatly appreciate a clear-cut explanation of it,

especially of "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+.[a-zA-Z]{2,4}"

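#!/bin/bash
# [[ ... =~ ... ]] and read -p are bash features, so this needs bash rather than plain sh.
while true; do
    read -r -p "Enter Email ID: " to_recipient
    # Leave the loop once the input contains a match for the pattern.
    if [[ "$to_recipient" =~ [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4} ]]
    then
        break
    else
        echo "Please enter a valid email address"
    fi
done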

Thanks again!

eLRuLL
M.S. Arun
  • You removed an important backslash. – Cyrus Dec 28 '17 at 21:47
  • I suggest starting here: [Regular Expressions Tutorial - Learn How to Use and Get The Most out of Regular Expressions](https://www.regular-expressions.info/tutorial.html) and [The Stack Overflow Regular Expressions FAQ](http://stackoverflow.com/a/22944075/3776858) – Cyrus Dec 28 '17 at 21:49
  • You just need to learn more about regexp: http://www.rexegg.com/regex-quickstart.html – George Vasiliou Dec 28 '17 at 21:53
  • However, the regular expression does not cover all allowed cases. This is a valid email address: `foo@[1.2.3.4]` – Cyrus Dec 28 '17 at 22:23
  • Validate them against what? [RFC 2822, Internet Message Format](https://tools.ietf.org/html/rfc2822) and friends allow so many forms of addresses that it's probably not worthwhile to do. – jww Dec 28 '17 at 23:54

1 Answer

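The short version is that =~ is bash's regex match operator: inside [[ ]], it tests whether the string on the left matches the regular expression on the right. The long story is that you need to learn the grammar of regular expressions to understand the pattern itself.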
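To see the operator on its own before looking at the email pattern, here is a minimal sketch. The helper name matches is made up for this illustration; the point it shows is that =~ succeeds when the regex matches anywhere in the string, unless the regex is anchored with ^ and $.

```bash
#!/usr/bin/env bash
# matches STRING REGEX: succeeds if REGEX (a POSIX extended regex) matches
# somewhere in STRING. "matches" is a hypothetical helper, not a bash builtin.
matches() {
    local re=$2
    # The regex is left unquoted on purpose: quoting it would make bash
    # treat it as a literal string instead of a pattern.
    [[ "$1" =~ $re ]]
}

# Unanchored: the digits only need to occur somewhere in the string.
if matches 'abc123' '[0-9]+'; then echo 'unanchored: match'; fi

# Anchored: the whole string would have to consist of digits.
if ! matches 'abc123' '^[0-9]+$'; then echo 'anchored: no match'; fi
```

Keeping the regex in a variable, as done here, is a common way to avoid quoting surprises on the right-hand side of =~.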

Here is a short explanation of the specific regex you present:

In regular expressions, [ ] delimits a 'character class', which matches any single character listed inside it. Within a class you can specify ranges of characters using -. So the first one, [a-zA-Z0-9._%+-], matches any lowercase letter, any uppercase letter, any digit, or one of ., _, %, +, or - (a - at the very end of a class is a literal hyphen rather than a range). The + outside that class is a Kleene plus, which means one or more of the previous expression (in this case, the character class). The next bit is an @ sign, which should be self-explanatory.

The last two classes are supposed to match a domain name. [a-zA-Z0-9.-]+ allows letters, digits, . and - in the part before the final dot (the SLD and any subdomains), \. matches a literal dot (an unescaped . would match any character), and [a-zA-Z]{2,4} allows only 2 to 4 letters in the TLD part (the {N,M} syntax gives lower and upper bounds on the number of repetitions of the previous expression). I note here that the {2,4} bound is at odds with the longer TLDs which are perfectly valid nowadays: .shopping, etc. Also, the pattern is not anchored with ^ and $, and =~ succeeds if it matches anywhere in the input, so the script accepts any line that merely contains something address-shaped.
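The pieces described above can be made visible with capture groups. This is only a sketch: the variable names and the explain_match helper are invented for the illustration, and the parentheses are an addition to the original pattern so that bash's BASH_REMATCH array can report what each part matched.

```bash
#!/usr/bin/env bash
# The question's regex, split into named parts (names are illustrative only).
local_part='[a-zA-Z0-9._%+-]+'   # one or more letters, digits, or . _ % + -
domain='[a-zA-Z0-9.-]+'          # one or more letters, digits, dots, or hyphens
tld='[a-zA-Z]{2,4}'              # two to four letters

# Parentheses add capture groups; \. is a literal dot.
pattern="(${local_part})@(${domain})\.(${tld})"

# explain_match STRING: print what each part of the pattern matched.
explain_match() {
    if [[ "$1" =~ $pattern ]]; then
        # BASH_REMATCH[0] is the whole match; [1] to [3] are the groups.
        printf 'local=%s domain=%s tld=%s\n' \
            "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    else
        printf 'no match\n'
        return 1
    fi
}

explain_match 'john.doe+tag@mail.example.com'
```

For the sample address this prints local=john.doe+tag domain=mail.example tld=com, which shows that the middle class also swallows subdomains, not just the SLD.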

Matching every address that the email RFCs allow takes a considerably more complicated pattern than the one you have here.
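If full RFC coverage is not the goal, one pragmatic option is to anchor the pattern with ^ and $ and drop the upper bound on the TLD length. The sketch below assumes that trade-off is acceptable; is_plausible_email is a made-up name, and the pattern still rejects valid forms such as foo@[1.2.3.4].

```bash
#!/usr/bin/env bash
# is_plausible_email STRING: hypothetical helper, not an RFC-compliant check.
# Compared with the question's pattern it is anchored (^ ... $) and uses
# {2,} so that TLDs longer than four letters are accepted.
is_plausible_email() {
    local re='^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    [[ "$1" =~ $re ]]
}

for candidate in 'user@example.shopping' 'junk user@example.com junk' 'foo@[1.2.3.4]'; do
    if is_plausible_email "$candidate"; then
        printf '%s: accepted\n' "$candidate"
    else
        printf '%s: rejected\n' "$candidate"
    fi
done
```

Only the first candidate is accepted: the second has text around the address, which the anchors now forbid, and the third uses a bracketed IP literal that the domain class does not allow.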

For more information look up:

  1. https://en.wikipedia.org/wiki/Kleene_plus
  2. https://en.wikipedia.org/wiki/Regular_expression
  3. https://www.ietf.org/rfc/rfc0822.txt?number=822

I hope this helps.

JawguyChooser