
Users provide a string, and I want to search for that string in a list of predefined strings. The catch is that the user's string can be anything and may include regex metacharacters such as [, *, ., ?, ^, etc. So something like

"first half of my regex"   + `USER_STRING` +  "second half of my regex"

won't work. An obvious solution is to replace every special character in USER_STRING with its escaped counterpart, but there's gotta be a better way.
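To make the problem concrete, here is a minimal sketch of how naive concatenation misbehaves. The input values and the `^`/`$` template are hypothetical, chosen only for illustration:

```javascript
// Hypothetical input: the user wants the literal text "a.c".
const userString = "a.c";

// Naive concatenation: the "." is interpreted as "any character".
const naive = new RegExp("^" + userString + "$");

console.log(naive.test("a.c")); // true, as intended
console.log(naive.test("abc")); // also true, because "." matches "b"

// Some inputs do not even compile, e.g. an unbalanced "[".
let compileError = null;
try {
    new RegExp("^" + "[" + "$");
} catch (e) {
    compileError = e; // the RegExp constructor throws a SyntaxError
}
console.log(compileError instanceof SyntaxError); // true
```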

PS: also, a simple string search won't do because I need to match patterns to the left and right of the string.

pseudosudo
  • What better way would there be other than making the user string regex-safe? – Dave Newton Feb 21 '13 at 14:05
  • Could you do it in two parts? First match your regex, then do the string search through the matches. – freejosh Feb 21 '13 at 14:07
  • @TimPietzcker: I don't know why those answers escape `-`, though. It is not necessary to do so, when `[]` have been escaped. – nhahtdh Feb 21 '13 at 14:54
  • @nhahtdh: True, but it doesn't hurt either. Some regex escape routines (for example Python's `re.escape()` simply escape *every* non-word character :) – Tim Pietzcker Feb 21 '13 at 15:39

2 Answers


At the time of writing, JavaScript has no built-in regular expression escape function (analogous to PHP's preg_quote, for example). Some people have taken it upon themselves to create an equivalent, though: https://stackoverflow.com/a/6829401/454533

So no, there's not a better way.

Explosion Pills

Just use this function to make sure that all special characters are quoted and treated as literal characters in the regex:

function escapeRegex(input) {
    // Prefix every regex metacharacter with a backslash; $& is the matched character.
    return input.replace(/[[\](){}?*+^$\\.|]/g, '\\$&');
}

The function expects a String as input and outputs a String with all the special characters escaped. It is meant to create a String that can be fed to the RegExp constructor to create a regex that matches the original string. As for whether the output of this method can be concatenated safely, check my additional notes below.
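As a minimal usage sketch, the escaped string can be dropped into a template and the resulting regex used to filter a list. The list contents and the `id=...;` template are hypothetical, made up for illustration:

```javascript
function escapeRegex(input) {
    return input.replace(/[[\](){}?*+^$\\.|]/g, '\\$&');
}

// Hypothetical data for illustration.
const predefined = ["id=a.c;", "id=abc;", "id=[x]*;", "name=a.c;"];
const userString = "a.c";

// Template: literal "id=" on the left, ";" on the right.
const pattern = new RegExp("^id=" + escapeRegex(userString) + ";$");
const matches = predefined.filter(s => pattern.test(s));

console.log(matches); // [ 'id=a.c;' ]
```

Because the "." is escaped, "id=abc;" is no longer matched, and an input such as "[x]*" compiles and matches only itself.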

List of all special characters in JS regex on MDN.

  • There is not much to say about ^, $, ., |, *, ?, +.
    Escaping them also disables the special meaning of ^ as the first character inside [] and of ? as the first character inside ().
    The same goes for ? and the lazy-matching behavior it triggers when it follows a quantifier.

  • - is only meaningful inside [], and it loses that meaning once [ and ] are escaped.
    There might be a problem if the template string is "[" + input + "]". I don't emulate the behavior of \Q and \E inside a character class here, but you can add - to the regex in the function above if you want to.

  • \ followed by a special sequence loses its meaning when \ is escaped.
    On a related note, the method above fails when the template string is "\\" + input. However, I would say the fault lies with whoever wrote the template string, since this is total nonsense.

  • :, =, ! are only meaningful inside () (for non-capturing groups and look-aheads) and must directly follow ?, so they also lose their meaning when ( and ) are escaped. The ? is already escaped, so it poses no problem when the escaped string is inserted between ( and ).
    Without escaping those, the method above fails when the template string is "(?" + input + ")". Again, I blame whoever wrote this, since they are the ones allowing the injection.

  • , is only meaningful inside {}, but it loses its meaning when { and } are escaped.
    The escaping fails when you have a template string (e.g. to match an initializer) such as "\\w+ = {" + input + "}", but normally one would escape { and } in the template string if the intention is to match them as literal characters.
    There is also the case of repetition, but then the template string should be ".{" + start + "," + end + "}", and the input must be sanitized first.

In summary, the meta-characters in the template string must be properly escaped for any escaping function to work. If the escaped string is to be used inside a character class, add - to the character class in the escaping function's regex.
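A minimal sketch of the character-class case follows. The name escapeRegexForClass is hypothetical. It is the function above with - added to the set of escaped characters:

```javascript
// Hypothetical variant of escapeRegex that also escapes "-".
function escapeRegexForClass(input) {
    return input.replace(/[-[\](){}?*+^$\\.|]/g, '\\$&');
}

const input = "a-c";

// Without escaping "-", the class [a-c] is a range from "a" to "c".
const unsafe = new RegExp("^[" + input + "]$");

// With escaping, the class [a\-c] contains exactly "a", "-" and "c".
const safe = new RegExp("^[" + escapeRegexForClass(input) + "]$");

console.log(unsafe.test("b")); // true
console.log(safe.test("b"));   // false
console.log(safe.test("-"));   // true
```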

nhahtdh