To validate a URL in JavaScript, I used the following code from an SO answer:

function validateURL(textval) {
          var urlregex = new RegExp(
                "^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$");
          return urlregex.test(textval);
        }

This function works for most URLs I tried, but it hangs on the following Amazon URL (ironically, for the book Creating Mobile Apps with jQuery Mobile). Chrome DevTools shows nothing, and clicking anywhere inside the tab doesn't do anything.

http://www.amazon.com/gp/product/178216006X/ref=s9_simh_gw_p351_d3_i4?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-2&pf_rd_r=0SDC8SED1N96XPK44VD2&pf_rd_t=101&pf_rd_p=1389517282&pf_rd_i=507846

The URL is pretty long, but as far as I can tell there's nothing special in it. In fact, it passes the JavaScript validation on Scott's Playground page.

My question is NOT how to do URL validation in JavaScript. My question is this: if a JavaScript regular expression hangs on a piece of text, what makes it freeze the browser like this? How can I catch the cases that do that?

Is this something that only happens with `new RegExp(...)` and not with `/regex/`, as mentioned in this answer?

In terms of the actual validation, I switched to a different `/regex/`, but I still wanted to post this question because it led to a pretty painful debugging process. (Then again, anything that tries to validate URLs or emails with regular expressions will probably be painful.)

Community
Alan Turing
  • No offense, but that regular expression is completely ridiculous. It's unreadable, unmaintainable and quite simply too long. It's so bad you don't even know what's going on, and you wrote it! – Halcyon May 21 '13 at 19:46
  • I didn't write it. As I said in the first sentence, I took it verbatim from the SO answer linked there. That's not the point of this question. We often use regular expressions that other people write. The question is what can make it hang and how to catch such cases. – Alan Turing May 21 '13 at 19:49
  • _"We often use regular expressions that other people write."_ - I don't know about you, but I never use a regular expression I haven't scrutinized myself first. This doesn't look like a generic URL matcher; it seems to be trying to match some specific set of numbers, and the non-exhaustive TLD list is a joke. – Halcyon May 21 '13 at 19:52
  • Thanks for your constructive criticism of the regular expression, and for completely ignoring my question. – Alan Turing May 21 '13 at 19:56

2 Answers

For this regular expression, the hang seems to happen with `new RegExp(...)` and not with `/regex/`. So for URL validation and other types of regex matching, use:

function validFoo(value) {
    return /foo/i.test(value);
}

Here `foo` stands for the regular expression.
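A minimal sketch of why the two forms can behave differently, using a hypothetical pattern `^a\.b$` ("a", a literal dot, "b"). Inside an ordinary string literal, `\.` is not a recognized escape, so the backslash is silently dropped and `new RegExp` receives an unescaped dot that matches any character. The regex literal keeps the escape.

```javascript
// Hypothetical pattern: "a", then a literal ".", then "b".
const literal = /^a\.b$/;

// In a string literal "\." is just ".", so the constructor
// actually receives the pattern ^a.b$ (dot = any character).
const fromString = new RegExp("^a\.b$");

console.log(literal.source);    // ^a\.b$
console.log(fromString.source); // ^a.b$

console.log(literal.test("aXb"));    // false: \. only matches "."
console.log(fromString.test("aXb")); // true: . matches any character
```

In the question's pattern every `\.` outside a character class is affected the same way. For example, `([a-zA-Z0-9\-]+\.)*` reaches the engine as `([a-zA-Z0-9-]+.)*`, where the trailing dot can also match a letter or digit, so a long URL can be split into repetitions in a huge number of ways. That ambiguity is a likely cause of the runaway backtracking, though this sketch only shows the escaping difference, not the timing.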

Alan Turing
  • That is due to the quoting. When you write a regular expression as an ordinary string, you have to double the backslashes, so `/\w+\./` would be quoted as `"\\w+\\."`. And that expression is a joke. You can most likely use something a lot simpler and better. – Qtax May 21 '13 at 20:14
  • It depends on what you want or need. A real regex for URL validation would probably need to be a lot longer, just better structured. Please don't use the word "joke" to refer to someone's legitimate attempt. SO shouldn't be an elitist club of hecklers; otherwise, you risk driving away a lot of smart people.
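Qtax's point about quoting can be illustrated with a small sketch based on his own example, `/\w+\./`. It shows two ways to build the same pattern from a string: doubling every backslash, or using `String.raw`, which keeps backslashes exactly as typed.

```javascript
// Qtax's example written as a regex literal.
const asLiteral = /\w+\./;

// Same pattern from an ordinary string: every backslash must be doubled.
const doubled = new RegExp("\\w+\\.");

// String.raw leaves the backslashes untouched, so no doubling is needed.
const raw = new RegExp(String.raw`\w+\.`);

console.log(asLiteral.source === doubled.source); // true
console.log(asLiteral.source === raw.source);     // true
console.log(doubled.test("foo."));                // true
console.log(doubled.test("foo"));                 // false: no literal "."
```

Applied to the question's `validateURL`, this would mean doubling every backslash in the pattern string, or writing it as a regex literal instead.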
It looks like there's a problem with the regular expression and how Chrome interprets it. In Firefox, it works.

The issue lies at the second request parameter in your URL (the `&`), where Chrome's JavaScript engine gets stuck in a loop.

If you don't need to validate the port in the URL, use something like this: `/^(https?:\/\/)?([\da-z.-]+)\.([a-z.]{2,6})([\/\w .-]*)\/?$/`
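A sketch of how such a simplified pattern might be used. The function name `isSimpleUrl` is hypothetical, and the pattern is an assumption about the intended regex: the `/` characters are escaped so the literal parses (an unescaped `/` inside `(https?://)` would end the literal), the dot between host and TLD is escaped, and the path part uses a single `*`.

```javascript
// Hypothetical helper wrapping the simplified pattern from this answer.
// Note: no "i" flag, so the host and TLD must be lowercase.
function isSimpleUrl(value) {
  return /^(https?:\/\/)?([\da-z.-]+)\.([a-z.]{2,6})([\/\w .-]*)\/?$/.test(value);
}

console.log(isSimpleUrl("http://www.example.com/foo/bar")); // true
console.log(isSimpleUrl("www.example.com"));                // true (scheme is optional)
// "?", "=" and "&" are not in [\/\w .-], so query strings are rejected.
console.log(isSimpleUrl("http://www.example.com/?a=1&b=2")); // false
```

Because "?", "=" and "&" fall outside the path character class, this pattern would also reject the Amazon URL from the question; it trades coverage for simplicity.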

Andy