How to validate email ID using bash script?

Question

I got the bash script below for email validation from a random site. It works fine, but I would like to know how it works.

I would greatly appreciate a clear-cut explanation of it,

especially "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}"

#!/bin/bash
while true; do
read -p "Enter Email ID: " to_recipient
if [[ "$to_recipient" =~ [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4} ]]
then
    break;
else
    echo "Please enter a valid email address"
fi
done

Thanks again!

I suggest to start there: [Regular Expressions Tutorial - Learn How to Use and Get The Most out of Regular Expressions](https://www.regular-expressions.info/tutorial.html) and [The Stack Overflow Regular Expressions FAQ](http://stackoverflow.com/a/22944075/3776858) — Cyrus, Dec 28 '17 at 21:49
You just need to learn more about regular expressions: http://www.rexegg.com/regex-quickstart.html — George Vasiliou, Dec 28 '17 at 21:53
However, the regular expression does not cover all allowed cases. This is a valid email address: `foo@[1.2.3.4]` — Cyrus, Dec 28 '17 at 22:23
Validate them against what? [RFC 2822, Internet Message Format](https://tools.ietf.org/html/rfc2822) and friends allow so many forms of addresses that it's probably not worthwhile to do. — jww, Dec 28 '17 at 23:54

score 1 · Accepted Answer · answered Dec 28 '17 at 21:54

The short version is that =~ is bash's regex match operator: the test succeeds when the string on the left matches the regular expression on the right. The long story is that you need to learn the grammar of regular expressions to understand it.
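
To make the =~ operator a bit more concrete, here is a minimal sketch. The sample input and the deliberately simple pattern are my own assumptions for illustration, not part of the script in the question. It relies on bash's BASH_REMATCH array, which holds the text matched by the whole pattern and by each parenthesized group.

```shell
#!/usr/bin/env bash
# Hypothetical sample input; any string works here.
input="user@example.com"

# Keep the pattern in a variable and expand it unquoted on the right of =~.
# Quoting the right-hand side would make bash treat it as a literal string.
pattern='^([^@]+)@(.+)$'

if [[ "$input" =~ $pattern ]]; then
    echo "whole match: ${BASH_REMATCH[0]}"   # user@example.com
    echo "local part:  ${BASH_REMATCH[1]}"   # user
    echo "domain part: ${BASH_REMATCH[2]}"   # example.com
else
    echo "no match"
fi
```

Note that [[ ... =~ ... ]] is a bash feature, so the script has to run under bash rather than a plain POSIX sh.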

Here is a short explanation of the specific regex you present:

In regular expressions, [ ] delimits a 'character class', which matches any single character listed in it. Within a class you can specify ranges of characters using -. So the first one, [a-zA-Z0-9._%+-], matches any lowercase letter, any uppercase letter, any digit, or one of ., _, %, +, or - (the - is literal here because it comes last in the class). The + outside the class is a Kleene plus, which means one or more of the previous expression (in this case, the character class). The next bit is a literal @ sign.

The last two classes are supposed to match a domain name. [a-zA-Z0-9.-]+ allows alphanumerics, . and - in the SLD part, \. is an escaped (literal) dot, and [a-zA-Z]{2,4} allows only 2 to 4 letters in the TLD part (the {N,M} syntax sets lower and upper bounds on the number of repetitions of the previous expression).

Note that the pattern is not anchored with ^ and $, so =~ succeeds as soon as any part of the input looks like an address. If you do anchor it, the {2,4} bound will reject the longer TLDs which are perfectly valid nowadays: .shopping, etc.
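
As a rough illustration of how anchoring and the TLD length bound change the result, the sketch below compares the pattern from the question with two variants. The variant patterns, the check helper, and the sample inputs are my own assumptions, not something from the original script.

```shell
#!/usr/bin/env bash
# The pattern from the question, unchanged: no anchors, TLD limited to 2-4 letters.
original='[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}'
# Assumed variant: ^ and $ force the whole input to match.
anchored_short='^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$'
# Assumed variant: anchored, and {2,} removes the upper bound on TLD length.
anchored_long='^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'

# Hypothetical helper: prints "match" or "no match" for a pattern ($1) and an input ($2).
check() {
    if [[ "$2" =~ $1 ]]; then echo "match"; else echo "no match"; fi
}

check "$original"       'see user@example.com now'   # match (a substring is enough)
check "$anchored_short" 'see user@example.com now'   # no match
check "$anchored_short" 'user@example.shopping'      # no match (TLD longer than 4 letters)
check "$anchored_long"  'user@example.shopping'      # match
check "$anchored_long"  'foo@[1.2.3.4]'              # no match, although the address is valid
```

None of these variants is a complete validator. They only show which part of the pattern is responsible for which behaviour.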

Matching email addresses against the full RFC grammar is considerably more complicated than what you've got here.

For more information look up:

I hope this helps.
