11

I was reading through this other question, which has some really good regexes for the job, but as far as I can see none of them work with Bash commands, as Bash commands don't support such complex regexes.

if echo "http://www.google.com/test/link.php" | grep -q '(https?|ftp|file)://[-A-Z0-9\+&@#/%?=~_|!:,.;]*[-A-Z0-9\+&@#/%=~_|]'; then 
    echo "Link valid"
else
    echo "Link not valid"
fi

But this doesn't work as grep -q doesn't work ...

Edit: OK, I just realised that grep has an "extended-regex" (-E) option, which seems to make it work. But if anyone has a better/faster way, I would still love to hear about it.
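
For reference, a minimal sketch of the -E variant described in the edit. The function name `looks_like_url` is hypothetical, and `a-z` has been added to the character classes on the assumption that lowercase URLs should match regardless of locale. Without -E, grep reads the pattern as a basic regex, where `?`, `|` and `(...)` are literal characters.

```shell
# Hypothetical helper: succeeds if the argument contains a URL-like substring.
looks_like_url() {
    # -E switches to extended regex syntax; -q suppresses output so only
    # the exit status is used.
    printf '%s\n' "$1" |
        grep -Eq '(https?|ftp|file)://[-A-Za-z0-9\+&@#/%?=~_|!:,.;]*[-A-Za-z0-9\+&@#/%=~_|]'
}

if looks_like_url "http://www.google.com/test/link.php"; then
    echo "Link valid"
else
    echo "Link not valid"
fi
```

Note that the pattern is unanchored, so any string that merely contains a URL also passes.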

Community
Mint
  • You might want to examine your motivation for asking this question. Programmers often try to statically validate URLs/URIs and e-mail addresses, but showing that one is syntactically valid tells you nothing about its accessibility. For example, `http://www.example.com/bogus#fragment` will always be syntactically valid and will (presumably) always return a 404 error. – msw Jul 06 '10 at 04:38
  • This function doesn't need to know if the link works or not, just that it is a link. – Mint Jul 06 '10 at 04:55
  • Of what use is a URL that never locates a resource? Why even bother checking? – msw Jul 06 '10 at 10:32

3 Answers

25

The following works in Bash >= version 3.2 without using grep:

regex='(https?|ftp|file)://[-A-Za-z0-9\+&@#/%?=~_|!:,.;]*[-A-Za-z0-9\+&@#/%=~_|]'
string='http://www.google.com/test/link.php'
if [[ $string =~ $regex ]]
then 
    echo "Link valid"
else
    echo "Link not valid"
fi

Your regex doesn't seem to include lowercase alpha characters [a-z] so I added them here.
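
As a small extension, the same test can be wrapped in a function, and `BASH_REMATCH` gives access to the captured scheme. This is only a sketch; the function name `url_scheme` is made up for illustration. The regex is kept in a variable and expanded unquoted because, in Bash 3.2 and later, a quoted right-hand side of `=~` is matched as a literal string.

```shell
#!/usr/bin/env bash
regex='(https?|ftp|file)://[-A-Za-z0-9\+&@#/%?=~_|!:,.;]*[-A-Za-z0-9\+&@#/%=~_|]'

# Hypothetical helper: prints the scheme of the first URL-like match,
# or returns 1 if the string contains none.
url_scheme() {
    # $regex must stay unquoted so it is treated as a pattern.
    if [[ $1 =~ $regex ]]; then
        # BASH_REMATCH[1] holds the first parenthesised group, i.e. the scheme.
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

url_scheme 'http://www.google.com/test/link.php'   # prints "http"
url_scheme 'ftp://example.com/file.txt'            # prints "ftp"
```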

Dennis Williamson
  • 3
    Bug with `http://печки-лавочки.рф/` which is a valid URL. A more complete regex can be found at http://stackoverflow.com/questions/161738 – Nicolas Raoul Dec 03 '13 at 09:35
  • Nice, this works flawlessly with egrep too (I especially needed it for some URLs ending in .mp3). Nicolas Raoul, two problems: 1. I was searching for a Bash solution, NOT PHP (those regexes don't always work in Bash and aren't easy to convert). Also, international domain names are a pain: they usually only apply within one country, and anyone who wants visitors from everywhere won't use characters outside standard ASCII (I live in such a country and try to avoid them at all costs) ... I'm not even mentioning that you don't have an answer marked as a solution ... – THESorcerer Sep 21 '15 at 13:17
  • 1
    This was helpful. But I think you want to anchor the regex to avoid a string like `'garbage http://google.com'` being passed as valid. I just added ^ and $ to the beginning and end of the regex respectively, like so: `regex='^(https?|ftp|file)://[-A-Za-z0-9\+&@#/%?=~_|!:,.;]*[-A-Za-z0-9\+&@#/%=~_|]$'` – Christopher Werby Oct 02 '16 at 21:14
3

Since I don't have enough rep to comment above, I am going to amend the answer given by Dennis above with this one.

I incorporated Christopher's update to the regex and then added more to it so that the URL has to at least be in this format:

http://w.w (has to have a period in it).

And tweaked output a bit :)

regex='^(https?|ftp|file)://[-A-Za-z0-9\+&@#/%?=~_|!:,.;]*[-A-Za-z0-9\+&@#/%=~_|]\.[-A-Za-z0-9\+&@#/%?=~_|!:,.;]*[-A-Za-z0-9\+&@#/%=~_|]$'

url='http://www.google.com/test/link.php'
if [[ $url =~ $regex ]]
then 
    echo "$url IS valid"
else
    echo "$url IS NOT valid"
fi
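
To see what the anchors and the required period change, here is a rough sketch that runs a few sample strings through the same regex. The helper name `is_url` and the sample strings are only illustrative.

```shell
#!/usr/bin/env bash
regex='^(https?|ftp|file)://[-A-Za-z0-9\+&@#/%?=~_|!:,.;]*[-A-Za-z0-9\+&@#/%=~_|]\.[-A-Za-z0-9\+&@#/%?=~_|!:,.;]*[-A-Za-z0-9\+&@#/%=~_|]$'

# Hypothetical helper: succeeds only if the whole string matches the regex.
is_url() {
    [[ $1 =~ $regex ]]
}

for candidate in \
    'http://www.google.com/test/link.php' \
    'garbage http://google.com' \
    'http://localhost'
do
    if is_url "$candidate"; then
        echo "$candidate IS valid"
    else
        echo "$candidate IS NOT valid"
    fi
done
```

Only the first string passes: the second is rejected by the `^` anchor, and the third has no period after the scheme.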
Patrick Steil
1

Probably because the regular expression is written in PCRE syntax. See if you have (or can install) the program pcregrep on your system - it has the same syntax as grep but accepts Perl-compatible regexes - and you should be able to make that work.

Another option is to try the -P option to grep, but the man page says that's "highly experimental" so it may or may not actually work.

I will say that you should think carefully about whether it's really appropriate to be using this or any regex to validate a URL. If you want to have a correct validation, you'd probably be better off finding or writing a small script in, say, Perl, to use the URL validation facilities of the language.

EDIT: In response to your edit in the question, I didn't notice that that regex is also valid in "extended" syntax. I don't think you can get better/faster than that.

David Z
  • This is only the backend; more validation will be done in PHP before anything gets displayed. – Mint Jul 06 '10 at 04:56