
I have this RegExp in JS, and I am using the `.test()` method to validate URLs:

new RegExp(/^((https|http):\/\/www\.)?(www\.)?[a-z0-9_-]+\.[a-z]+(\/)?(\/[a-z0-9]+(\.(php|html|asp|aspx))?)?$/i)

I want a RegExp that can validate these types of URLs (note the optional parts, such as the http/https scheme):

https://www.page.com/about.php
https://www.page.com/about 
https://www.page.com/  
www.page.com
page.com
page-10.com
1234.com

This RegExp works for all of those URLs, but it also accepts this one, which should be rejected:

www.page

`.test()` returns true for it, and I don't know why.
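
A minimal reproduction, assuming Node.js or a browser console: it runs the pattern over the URLs listed above plus www.page. A regex literal is used directly, which behaves the same as wrapping it in `new RegExp(...)`.

```javascript
// The pattern from the question (the i flag makes it case-insensitive).
const original = /^((https|http):\/\/www\.)?(www\.)?[a-z0-9_-]+\.[a-z]+(\/)?(\/[a-z0-9]+(\.(php|html|asp|aspx))?)?$/i;

const urls = [
  "https://www.page.com/about.php",
  "https://www.page.com/about",
  "https://www.page.com/",
  "www.page.com",
  "page.com",
  "page-10.com",
  "1234.com",
  "www.page", // should be rejected, but is not
];

for (const url of urls) {
  console.log(url, original.test(url));
}
// Every line prints true, including www.page.
```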

The first part of my regex says the URL may start with http://www., https://www., or www., or with none of them:

/^((https|http):\/\/www\.)?(www\.)?
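
To see what the first part accepts on its own, here is a sketch that anchors it at both ends (the closing `$` is added only for this experiment). A scheme is only allowed when `www.` follows it, so `https://` alone is not accepted (which is also why a URL such as https://page.com is rejected by the full pattern), while the two optional groups can combine into `https://www.www.`.

```javascript
// First part of the question's regex, anchored at both ends for testing only.
const prefix = /^((https|http):\/\/www\.)?(www\.)?$/i;

const prefixes = ["", "www.", "http://www.", "https://www.", "https://", "https://www.www."];
for (const p of prefixes) {
  console.log(JSON.stringify(p), prefix.test(p));
}
// "https://" prints false; every other prefix prints true.
```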

The second part says that after the first part there will be letters, numbers or some symbols, then a dot, then more letters, so the result could be something like page-10.com. I think this is where my error comes from: why doesn't it recognize the dot?

[a-z0-9_-]+\.[a-z]+

The third part is optional; it allows a trailing slash, or a path with an optional extension, as in page.com/about.php:

(\/)?(\/[a-z0-9]+(\.(php|html|asp|aspx))?)?$
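
The third part can be tried on its own the same way (a sketch; the leading `^` is added only for testing). Because `(\/)?` and the group after it are independent, a double slash such as `//about.php` also passes, which means the full pattern accepts page.com//about.php, while a slash after a path (`/about/`) does not.

```javascript
// Third part of the question's regex, anchored at the start for testing only.
const suffix = /^(\/)?(\/[a-z0-9]+(\.(php|html|asp|aspx))?)?$/i;

const paths = ["", "/", "/about", "/about.php", "/about.txt", "/about/", "//about.php"];
for (const p of paths) {
  console.log(JSON.stringify(p), suffix.test(p));
}
// "", "/", "/about", "/about.php" and "//about.php" print true;
// "/about.txt" and "/about/" print false.
```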

Question: In the second part I require a dot with \., but it doesn't seem to be recognized. How can I be explicit and require it?

pharesdiego
  • This part of your regex `[a-z0-9_-]+\.[a-z]+` matches `www.page`; the rest of the regex is optional in this case. – abhishekkannojia Oct 05 '17 at 07:16
  • This may be helpful: https://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url – abhishekkannojia Oct 05 '17 at 07:16
  • See https://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url for regexes that validate URLs against the specification. Do you only need to validate specific URLs? – DanteTheSmith Oct 05 '17 at 09:15

1 Answer


Let me break your regex into parts:

^
((https|http):\/\/www\.)? # Match http://www. or https://www. OR NOTHING
(www\.)? # Match www. OR NOTHING
[a-z0-9_-]+\. # Match at least 1 character in group [a-z0-9_-] followed by a dot
[a-z]+(\/)? # Match at least 1 character in group [a-z], optionally followed by "/"
(\/[a-z0-9]+(\.(php|html|asp|aspx))?)? # Match "/", at least 1 character in group [a-z0-9] and an optional web page file extension, OR NOTHING
$
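
JavaScript regexes have no extended (verbose) flag, so a commented layout like the one above cannot be pasted into a regex literal. One way to keep the annotations in real code is to build the pattern from commented string pieces with `new RegExp`; this is a sketch, and the names `parts` and `rebuilt` are made up for illustration. Inside string literals every backslash must be doubled.

```javascript
// Rebuild the question's regex from annotated pieces (backslashes doubled in strings).
const parts = [
  "^",
  "((https|http):\\/\\/www\\.)?",             // http://www. or https://www. OR NOTHING
  "(www\\.)?",                                // www. OR NOTHING
  "[a-z0-9_-]+\\.",                           // 1+ chars from [a-z0-9_-], then a dot
  "[a-z]+(\\/)?",                             // 1+ letters, then an optional "/"
  "(\\/[a-z0-9]+(\\.(php|html|asp|aspx))?)?", // optional "/name" with optional extension
  "$",
];

const rebuilt = new RegExp(parts.join(""), "i");
const original = /^((https|http):\/\/www\.)?(www\.)?[a-z0-9_-]+\.[a-z]+(\/)?(\/[a-z0-9]+(\.(php|html|asp|aspx))?)?$/i;

console.log(rebuilt.source === original.source); // true
console.log(rebuilt.test("www.page")); // true, same behavior as the original
```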

As you can see, the parts marked "OR NOTHING" are optional, so they simply match the empty string when their text does not appear in the test string. Your test case www.page is therefore matched entirely by these parts:

[a-z0-9_-]+\. # Match at least 1 character in group [a-z0-9_-] followed by a dot
[a-z]+(\/)? # Match at least 1 character in group [a-z], optionally followed by "/"
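
Wrapping those two parts in capture groups (added only for this illustration) shows exactly which characters each part consumes in www.page:

```javascript
// The two parts that match "www.page", wrapped in capture groups for inspection.
const core = /^([a-z0-9_-]+)(\.)([a-z]+)(\/)?$/i;

const m = "www.page".match(core);
console.log(m.slice(1, 4)); // ["www", ".", "page"]
```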

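That answers your question:

Question: In the second part I expect a dot when I say: \., but it doesn't recognize it, how can I be explicit and ask for it?

The dot is recognized. In www.page it is the dot right after www: `www` is matched by `[a-z0-9_-]+`, the dot by `\.`, and `page` by `[a-z]+`. Because the `www.` prefix is optional, nothing forces `www.` to be treated as a prefix instead of as the domain name.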


The fixed version of your regex:

^((https|http):\/\/)?(www\.)?([\w-]{2,}\.[\w-]{2,3}\.[\w-]{2,3}|[\w-]{2,}\.[a-zA-Z]{2,3})(\/[a-z0-9]+(\.(php|html|asp|aspx))?)?$

Match test result: https://regex101.com/r/wGp68e/6
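
A quick harness for re-checking the fixed regex against the URLs from the question and the counterexamples raised in the comments (a sketch for Node.js or a browser console). Running it shows that, as written, the fixed regex rejects https://www.page.com/ from the question's list, because the `/` must be followed by at least one character from `[a-z0-9]`. The `withSlash` variant is a hypothetical tweak, not part of the answer: it makes the name after the `/` optional so a bare trailing slash is accepted.

```javascript
// The fixed regex from the answer, unchanged (note: no i flag).
const fixed = /^((https|http):\/\/)?(www\.)?([\w-]{2,}\.[\w-]{2,3}\.[\w-]{2,3}|[\w-]{2,}\.[a-zA-Z]{2,3})(\/[a-z0-9]+(\.(php|html|asp|aspx))?)?$/;

// Hypothetical tweak: "/" may stand alone, so "page.com/" is also accepted.
const withSlash = /^((https|http):\/\/)?(www\.)?([\w-]{2,}\.[\w-]{2,3}\.[\w-]{2,3}|[\w-]{2,}\.[a-zA-Z]{2,3})(\/([a-z0-9]+(\.(php|html|asp|aspx))?)?)?$/;

const shouldPass = [
  "https://www.page.com/about.php",
  "https://www.page.com/about",
  "https://www.page.com/",
  "www.page.com",
  "page.com",
  "page-10.com",
  "1234.com",
];
// www.page from the question, plus the counterexamples from the comments.
const shouldFail = ["www.page", "ww-w.page.com", "wwwwpage.page.com", "1234567._-d"];

for (const url of [...shouldPass, ...shouldFail]) {
  console.log(url, "fixed:", fixed.test(url), "withSlash:", withSlash.test(url));
}
```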

Duc Filan
  • Thank you for answering. I was testing your regex, but it has some problems: ww-w.page.com returns true, and it even allows anything in the first part, like wwwwpage.page.com. – pharesdiego Oct 05 '17 at 14:00
  • In your edited answer I found just one error: if there are 2 or 3 letters, symbols (-_) or digits after the first dot, it returns true, and the part before the first dot can be any length. This returns true: `1234567._-d` (by the way, it only happens when there are 2 or 3 characters after the dot :) ) – pharesdiego Oct 05 '17 at 14:20
  • I fixed the case you pointed out. Let me know if there is any error. – Duc Filan Oct 05 '17 at 14:41