Regex allows spaces

Question

For the following regular expression:

var regex = new RegExp("^(www\\.)?[0-9A-Za-z-\\.@:%_\+~#=]+(\\.[a-zA-Z]{2,})+(/.*)?(\\?.*)?");

I don't understand why the string "www.goo gle.com" passes the regex test. When I add $ at the end of the pattern:

var regex = new RegExp("^(www\\.)?[0-9A-Za-z-\\.@:%_\+~#=]+(\\.[a-zA-Z]{2,})+(/.*)?(\\?.*)?$");

the string no longer passes, which is what I want.

I tried finding a "simulator" online to help me figure out how the regex is matching but couldn't find much help.

@PeterOlson: Yes, but why does the regex work once `$` is added at the end of the pattern? Shouldn't it still match the `gle.com` part? — name_masked, Jun 26 '17 at 19:09
@revo: My question is why the regex works once `$` is added at the end of the pattern. — name_masked, Jun 26 '17 at 19:11
Doesn't it function the same with/without the `$`? You only require 1 or more `0-9A-Za-z-\\.@:%_\+~#=`, then one or more instance of `\.[a-zA-Z]{2,}`. — chris85, Jun 26 '17 at 19:11
In the first regular expression you are doing a *partial* match, since no exact match is required. As soon as a match is found, the engine is satisfied. In contrast, enclosing the whole regex in the *beginning of input string* and *end of input string* anchors (`^` & `$`) requires an exact match, which starts at the beginning and must finish at the end of the input string; otherwise it fails. — revo, Jun 26 '17 at 19:22
@sln: Weird, it doesn't complain about an invalid range for me. The `-` in `... z-\\.@ ..` is taken literally and not as a range. — name_masked, Jun 26 '17 at 19:23
@name_masked - Yeah, it's funny like that: with any ambiguity, the `-` is taken literally, e.g. `[a-z-A-Z]`. But some engines are strict about this and require it to be escaped if ambiguous. Otherwise, if the engine is lame and has weird internal parsing rules, it might apply its own interpretation, and the result is undefined behavior (like this?). — , Jun 26 '17 at 19:28
To the OP: don't use `.*` anywhere in the regex. Use `\S*?` if you don't want whitespace. You might also want to use or modify a more commercial regex for URLs: `^(?!mailto:)(?:(?:https?|ftp):\/\/)?(?:\S+(?::\S*)?@)?(?:(?:(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))|localhost)(?::\d{2,5})?(?:\/[^\s]*)?$` — , Jun 26 '17 at 19:31
Put your regex into regex101.com and you'll see why it's working. It matches everything up to the space because the rest is all optional. — Barmar, Jun 26 '17 at 19:34
@name_masked: [Your regex matches partially](https://regex101.com/r/JtAUev/1) and matches `www.goo` when input is `www.goo gle.com`. So no it is not matching space everywhere but due to missing end anchor it matches partially. — anubhava, Jun 26 '17 at 19:35

valiano · Accepted Answer · 2017-06-26T19:48:41.883

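"www.goo gle.com" passes the test because, without an end anchor, the pattern only has to match a prefix of the string. Here www is matched by [0-9A-Za-z-\.@:%_+~#=]+ and .goo is matched by (\.[a-zA-Z]{2,})+ (that group requires a literal dot, so the engine backtracks until the character class stops just before it). The leading (www\.)? and the last two groups are optional, so the regex is satisfied even when they match nothing, and there is no need to go on and match " gle.com".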

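With $ added, the match must extend to the end of the string, and it cannot: the space is not in the character class, and the two trailing groups can only start with / or ?, so no subexpression can consume the space.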
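To see which part of the input is actually matched, you can inspect the result of exec instead of only calling test. The following is a minimal sketch, assuming the two patterns from the question are rewritten as regex literals (equivalent to the new RegExp(...) strings); the variable names are only illustrative.

```javascript
// The two patterns from the question, written as regex literals.
// The only difference between them is the trailing $ anchor.
const unanchored = /^(www\.)?[0-9A-Za-z-\.@:%_+~#=]+(\.[a-zA-Z]{2,})+(\/.*)?(\?.*)?/;
const anchored = /^(www\.)?[0-9A-Za-z-\.@:%_+~#=]+(\.[a-zA-Z]{2,})+(\/.*)?(\?.*)?$/;

const input = "www.goo gle.com";

// exec returns the matched substring at index 0 and the groups after it.
const partial = unanchored.exec(input);
const matchedText = partial[0];   // "www.goo": only a prefix of the input
const wwwGroup = partial[1];      // undefined: the optional (www\.)? matched nothing
const lastTldGroup = partial[2];  // ".goo"

const passesUnanchored = unanchored.test(input); // true
const passesAnchored = anchored.test(input);     // false

console.log(matchedText, wwwGroup, lastTldGroup, passesUnanchored, passesAnchored);
```

The match stops at "www.goo", right before the space, which is why test returns true without $ and false with it.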

edited Jun 26 '17 at 19:48

answered Jun 26 '17 at 19:34

valiano
