So I've got this URL regex:

/(?:((?:[^-/"':!=a-z0-9_@]|^|\:))((https?://)((?:[^\p{P}\p{Lo}\s].-|[^\p{P}\p{Lo}\s])+.[a-z]{2,}(?::[0-9]+)?)(/(?:(?:([a-z0-9!*';:=+\$/%#[]-_,~]+))|@[a-z0-9!*';:=+\$/%#[]-_,~]+/|[.\,]?(?:[a-z0-9!*';:=+\$/%#[]-_~]|,(?!\s)))*[a-z0-9=#/]?)?(\?[a-z0-9!*'();:&=+\$/%#[]-_.,~]*[a-z0-9_&=#/])?))/iux

What it's currently matching:

I need it to also match:

  • www.google.com
  • google.com

I tried making the protocol part of the regex optional by appending a ? to that group, giving "(https?:\/\/)?", but that didn't change what it matches.

Ideas?

gpojd
Jon Ursenbach
  • possible duplicate of [What is the best regular expression to check if a string is a valid URL?](http://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url) – gpojd Aug 03 '11 at 22:35

1 Answer

I'd look for something built into the language you are using to do this; URLs are tough to match with a regex. If you insist, here is your pattern with the (https?://) group made optional. I have not tested it.

/(?:((?:[^-/"':!=a-z0-9_@]|^|\:))((https?://)?((?:[^\p{P}\p{Lo}\s].-|[^\p{P}\p{Lo}\s])+.[a-z]{2,}(?::[0-9]+)?)(/(?:(?:([a-z0-9!*';:=+\$/%#[]-_,~]+))|@[a-z0-9!*';:=+\$/%#[]-_,~]+/|[.\,]?(?:[a-z0-9!*';:=+\$/%#[]-_~]|,(?!\s)))*[a-z0-9=#/]?)?(\?[a-z0-9!*'();:&=+\$/%#[]-_.,~]*[a-z0-9_&=#/])?))/iux

The following pattern comes from RFC 3986 (Appendix B); I was directed there by this comment. Note that it splits a URI reference into its components rather than validating it. I'd still recommend using something built into whatever language you are using rather than a regex.

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
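To see what this pattern does with the inputs from the question, it helps to run it. The sketch below uses Python rather than PHP purely for illustration; `split_uri` and the dictionary keys are my own names, and the group numbers follow the component numbering given in the RFC (2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment).

```python
import re

# Pattern copied from RFC 3986, Appendix B.
URI_PARTS = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(text):
    """Split a URI reference into its five components (hypothetical helper)."""
    # Every part of the pattern is optional, so it matches any input string.
    m = URI_PARTS.match(text)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


if __name__ == "__main__":
    for candidate in ("http://www.google.com/search?q=regex#top",
                      "www.google.com",
                      "google.com"):
        print(candidate, "->", split_uri(candidate))
```

With `www.google.com` or `google.com` the whole string lands in the path and the authority is empty. So the pattern is useful for taking a URI apart, but on its own it will not tell you whether a bare host name should count as a URL.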

Since you are using PHP, have you considered parse_url? According to its documentation, it may return false on seriously malformed URLs.
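One way to handle scheme-less input such as google.com with a parser instead of a regex is to add a default scheme before parsing and then check that a host came out. The sketch below shows the idea in Python, with the standard library's `urllib.parse.urlsplit` as a stand-in for PHP's parse_url. The `extract_host` helper, the `http://` default, and the "host must contain a dot" rule are all assumptions made for illustration, not something taken from the question.

```python
from urllib.parse import urlsplit


def extract_host(candidate, default_scheme="http"):
    """Return the host of a URL-like string, or None (hypothetical helper)."""
    # Without a scheme, the parser treats "google.com" as a path,
    # so prepend an assumed default scheme first.
    if "://" not in candidate:
        candidate = default_scheme + "://" + candidate
    try:
        host = urlsplit(candidate).hostname
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs.
        return None
    # Assumed rule for this sketch: a usable host contains at least one dot.
    if not host or "." not in host:
        return None
    return host


if __name__ == "__main__":
    for candidate in ("google.com", "www.google.com/path?x=1", "http://"):
        print(repr(candidate), "->", extract_host(candidate))
```

The dot rule is deliberately crude (it rejects `localhost`, for example), so treat it as a placeholder for whatever acceptance rule your application actually needs.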

gpojd