Depending on how strictly you want your script to validate the URL, the regex you provided works fairly well once you get rid of the '^' and '$' anchors (as seen here).
Note that I added some whitespace in the regex just for readability.
I see several issues with that regex (as you can probably see on that page). It matches in places where it shouldn't (such as repeated .. characters), and for sites ending in .co.uk it matches the domain name along with the .co portion, then matches .uk separately. That, by itself, is fairly easy to fix by adding those edge cases directly to the second group (the one with (com|org|...)).
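As a rough illustration of the .co.uk fix, here is a minimal Python sketch. The patterns are simplified, hypothetical stand-ins for the regex in the question (only the TLD alternation is modeled), so treat it as a demonstration of the idea rather than a drop-in replacement.

```python
import re

# Hypothetical, simplified patterns: the alternation stands in for the
# (com|org|...) group of the original regex.
before = re.compile(r"[a-z0-9-]+\.(?:com|org|co|uk)")
# Add the multi-part suffix to the same group, listed BEFORE the shorter "co".
after = re.compile(r"[a-z0-9-]+\.(?:co\.uk|com|org|co|uk)")

text = "visit example.co.uk today"

print(before.findall(text))  # ['example.co']
print(after.findall(text))   # ['example.co.uk']
```

The order inside the group matters: Python tries alternatives left to right and stops at the first one that lets the match succeed, so co\.uk has to come before co or the shorter alternative wins.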
The reason you'll need to remove the '^' and '$' anchors is that, with them in place, the pattern will only match if the URL is the only thing on the line: '^' has to match at the beginning of the line, and '$' can only match at the end. With input like <b>www.google.com</b>, the leading <b> makes the anchored pattern fail, since the URL doesn't start at the beginning of the line.
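To make the effect of the anchors concrete, here is a small Python sketch. The URL pattern is a hypothetical, much-simplified stand-in for the one in the question; only the presence or absence of '^' and '$' is the point.

```python
import re

# Hypothetical stand-in for the URL pattern (not the original regex).
URL = r"(?:www\.)?[a-z0-9-]+\.(?:com|org|net)"

anchored = re.compile(r"^" + URL + r"$")  # URL must be the whole line
unanchored = re.compile(URL)              # URL may appear anywhere

text = "<b>www.google.com</b>"

print(anchored.search(text))            # None
print(unanchored.search(text).group())  # www.google.com
```

The anchored version still matches a line that contains nothing but www.google.com; it only fails once other text, such as the surrounding tags, shares the line.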
The other suggestions, such as @amalloy's link, give a much more comprehensive solution that will match everything correctly, but it is very complex.
So knowing exactly what you want to match, and what you're willing to ignore/trade/give up, will help craft something that works for you.