
How do you get a string not containing a specific group?

(?:[0-9-+*/()x]|abs|pow|ln|pi|e|a?(sin|cos|tan)h?)+

The regular expression above matches mathematical expressions. How do you get the parts of a string that are not mathematical expressions?

Example input string: WIDTH+LENGTH*abs(2)

Expected output: WIDTH LENGTH
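The question names no language, so this is a minimal sketch assuming Python's standard `re` module; the variable names are illustrative. It makes the `(sin|cos|tan)` group non-capturing, because with a capturing group `re.findall` would return the group's contents instead of whole matches. Splitting on the same pattern is one way (not taken from the answers below) to get the pieces between the math runs:

```python
import re

# The pattern from the question, with (sin|cos|tan) made non-capturing.
MATH = r"(?:[0-9-+*/()x]|abs|pow|ln|pi|e|a?(?:sin|cos|tan)h?)+"

text = "WIDTH+LENGTH*abs(2)"

# Runs of characters the pattern treats as part of a math expression.
print(re.findall(MATH, text))  # ['+', '*abs(2)']

# Everything between those runs; drop the empty strings split() can leave at the edges.
rest = [part for part in re.split(MATH, text) if part]
print(rest)  # ['WIDTH', 'LENGTH']
```

This is only as good as the pattern itself: a lowercase name such as `elevation` would be cut apart at every `e`.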

Wiktor Stribiżew
tjvg1991

3 Answers

You can put the regex inside a negative lookahead and follow it with a class that matches the words you want: either the \w shorthand class (word characters: letters, digits and underscore) or, as here, [a-zA-Z] with \b word boundaries:

(?![0-9-+*/()x]|abs|pow|ln|pi|e|a?(?:sin|cos|tan)h?)\b[a-zA-Z]+\b

See regex demo
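A minimal Python sketch of this lookahead approach, assuming the standard `re` module (the pattern needs no changes); `NOT_MATH` is just an illustrative name. The pattern has no capturing groups, so `re.findall` returns whole matches:

```python
import re

# The negative lookahead rejects any match that would start with a math token;
# \b[a-zA-Z]+\b then takes a whole run of letters.
NOT_MATH = r"(?![0-9-+*/()x]|abs|pow|ln|pi|e|a?(?:sin|cos|tan)h?)\b[a-zA-Z]+\b"

words = re.findall(NOT_MATH, "WIDTH+LENGTH*abs(2)")
print(" ".join(words))  # WIDTH LENGTH
print(re.findall(NOT_MATH, "pixel*2"))  # []
```

Because the lookahead only inspects where a word starts, a name like `pixel` or `exponent` is rejected too, since it begins with `pi` or `e`.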

Since [a-zA-Z] only matches letters, the digits and operator symbols in [0-9-+*/()x] can never start a match anyway; only x needs to stay in the lookahead, so this reduces to

(?!x|abs|pow|ln|pi|e|a?(?:sin|cos|tan)h?)\b[a-zA-Z]+\b

See another demo
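A quick Python check, assuming the standard `re` module, that removing the digits and operators from the lookahead does not change the result; the sample inputs are made up for illustration:

```python
import re

FULL = r"(?![0-9-+*/()x]|abs|pow|ln|pi|e|a?(?:sin|cos|tan)h?)\b[a-zA-Z]+\b"
REDUCED = r"(?!x|abs|pow|ln|pi|e|a?(?:sin|cos|tan)h?)\b[a-zA-Z]+\b"

samples = ["WIDTH+LENGTH*abs(2)", "area*sin(x)", "pow(base,2)/ln(y)"]

for s in samples:
    # [a-zA-Z]+ can never start on a digit or operator, so only x matters.
    assert re.findall(FULL, s) == re.findall(REDUCED, s)
    print(s, "->", re.findall(REDUCED, s))
```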

Wiktor Stribiżew
  • Thank you so much!! It worked. Saved me a lot of time. – tjvg1991 Jun 23 '15 at 09:25
  • In this case it's probably irrelevant, but note that this is not an actual inversion of the original regex, as neither regex matches, for example, special characters like `#$§!?~_:;,.{}[]`. – Siguza Jun 23 '15 at 09:26
  • @Siguza: I agree it is not an inversion, perhaps, it can even be written in a better form. I guess `[0-9-+*/()x]` is not that necessary since we only allow `[a-zA-Z]`. Perhaps, we can reduce the expression to `(?!x|abs|pow|ln|pi|e|a?(?:sin|cos|tan)h?)\b[a-zA-Z]+\b`. – Wiktor Stribiżew Jun 23 '15 at 09:31
  • Doesn't work as expected on `axabs(3)`, where it should extract only the first `a`. – Siguza Jun 23 '15 at 09:57
  • @Siguza: No idea what `axabs` is, but if it is a function name, it should be [included into the alternative list](https://regex101.com/r/uW6yP4/5). – Wiktor Stribiżew Jun 23 '15 at 10:02
  • @stribizhev I'd say it's an alternative form of `a*abs`, but I'm just messing around with tjvg's regex, trying to find cases where your regex does not match the inverse of what his regex matches. :P – Siguza Jun 23 '15 at 10:15
  • How about a regex that caters for variables with numbers? (e.g. width1 + width2) – tjvg1991 Jul 07 '15 at 02:53
  • @tjvg1991: You will need to extend the `[a-zA-Z]` class to `[a-zA-Z0-9]` => `(?!x|abs|pow|ln|pi|e|a?(?:sin|cos|tan)h?)\b[a-zA-Z0-9]+\b`. – Wiktor Stribiżew Jul 07 '15 at 05:09
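The extension from the last comment, sketched in Python assuming the standard `re` module; the second call shows a side effect worth knowing about:

```python
import re

# [a-zA-Z0-9] lets variable names carry digits, e.g. width1.
NOT_MATH_DIGITS = r"(?!x|abs|pow|ln|pi|e|a?(?:sin|cos|tan)h?)\b[a-zA-Z0-9]+\b"

print(re.findall(NOT_MATH_DIGITS, "width1 + width2"))  # ['width1', 'width2']

# Numeric literals are whole "words" now as well, so the 2 in abs(2) is returned too.
print(re.findall(NOT_MATH_DIGITS, "WIDTH+LENGTH*abs(2)"))  # ['WIDTH', 'LENGTH', '2']
```

If bare numbers are unwanted, requiring a leading letter, for example `\b[a-zA-Z][a-zA-Z0-9]*\b` after the lookahead, is one possible tweak; it is a suggestion, not part of the answer above.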

When you want to "skip" certain expressions, here is what you do in regex:

"Tarzan"|skip1|skip2|skip3|more|complicated|expressions|here|(Tarzan)

... as simple as The Best Regex Trick Ever.

When you iterate over the collection of regex matches, keep only the matches that have something in the first capturing group and ignore all the others.

There is no need for complicated lookarounds, which tend to break down on overlapping edge cases.
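Applied to this question, the trick could look like the Python sketch below, assuming the standard `re` module. The skip list is adapted from the question's pattern; the `\b` anchors around it and the `([A-Za-z]+)` capture are assumptions of this sketch, added so that `e` and `x` are skipped only as whole words and not inside names such as `exponent`. `non_math_words` is a hypothetical helper name.

```python
import re

# Left side: math tokens to skip (matched but not captured).
# Right side: group 1 captures the words we actually want.
TRICK = r"\b(?:abs|pow|ln|pi|e|x|a?(?:sin|cos|tan)h?)\b|([A-Za-z]+)"

def non_math_words(text):
    # Keep only the matches in which group 1 participated.
    return [m.group(1) for m in re.finditer(TRICK, text) if m.group(1)]

print(non_math_words("WIDTH+LENGTH*abs(2)"))  # ['WIDTH', 'LENGTH']
print(non_math_words("sin(x)+exponent*pi"))   # ['exponent']
```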

wqw

While stribizhev's answer might work in most situations, it is not a true inversion of the regex in the question, as there are things that neither regex matches:

  • Spaces
  • Special characters like ?!^~;:_,.[]{}<> (and probably more).

And things that both regexes match:

  • Strings such as axabs(3), where the xabs part is matched by both.
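These claims can be checked with a short Python sketch, assuming the standard `re` module. `Q` is the question's pattern with its group made non-capturing and `A` is the reduced pattern from the first answer; the test strings are made up for illustration:

```python
import re

Q = r"(?:[0-9-+*/()x]|abs|pow|ln|pi|e|a?(?:sin|cos|tan)h?)+"
A = r"(?!x|abs|pow|ln|pi|e|a?(?:sin|cos|tan)h?)\b[a-zA-Z]+\b"

# Neither pattern matches the spaces or the '?' here.
print(re.findall(Q, "WIDTH + LENGTH?"))  # ['+']
print(re.findall(A, "WIDTH + LENGTH?"))  # ['WIDTH', 'LENGTH']

# Both patterns cover the "xabs" part of "axabs(3)".
print(re.findall(Q, "axabs(3)"))  # ['xabs(3)']
print(re.findall(A, "axabs(3)"))  # ['axabs']
```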

This could probably be fixed by fiddling around, but hell, I want an actual inversion! :P

So here it is:

(?:(?!e|ln|(?<=l)n|pi|(?<=p)i|abs|(?<=a)bs|(?<=ab)s|pow|(?<=p)ow|(?<=po)w|sin|(?<=s)in|(?<=si)n|cos|(?<=c)os|(?<=co)s|tan|(?<=t)an|(?<=ta)n|asin|acos|atan)[^0-9-+*/()x])+

It works like this:

  1. Match any character that is not one of 0-9-+*/()x (= [^0-9-+*/()x]).
  2. But do not match that character if it is itself a certain character and is preceded/followed by a certain pattern of characters.
    Inside the negative lookahead ((?!...)), the first character of each alternative (after any (?<=...) prefix) is the current character, and the characters after it are the ones following the current one; a (?<=...) is a positive lookbehind, asserting certain preceding characters.
    So, for example, in order to not match sin, we need to "not match" s if followed by in, not match i if preceded by s and followed by n, and not match n if preceded by si.
    In regex (lookaround part only): (?!sin|(?<=s)in|(?<=si)n)
    Constructing the full list for e, ln, pi, etc. results in:

    (?!e|ln|(?<=l)n|pi|(?<=p)i|abs|(?<=a)bs|(?<=ab)s|pow|(?<=p)ow|(?<=po)w|sin|(?<=s)in|(?<=si)n|cos|(?<=c)os|(?<=co)s|tan|(?<=t)an|(?<=ta)n|asin|acos|atan)
    
  3. Match the above one or more times ((?:...)+).
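The long form runs unchanged in Python's `re` module, because each of its lookbehinds has a single fixed width. A minimal sketch, with illustrative names and sample strings:

```python
import re

# Match runs of characters outside [0-9-+*/()x], refusing any character
# that belongs to an occurrence of e, ln, pi, abs, pow, sin, cos, tan,
# asin, acos or atan.
INVERSE = (
    r"(?:(?!e|ln|(?<=l)n|pi|(?<=p)i|abs|(?<=a)bs|(?<=ab)s|pow|(?<=p)ow|(?<=po)w"
    r"|sin|(?<=s)in|(?<=si)n|cos|(?<=c)os|(?<=co)s|tan|(?<=t)an|(?<=ta)n"
    r"|asin|acos|atan)[^0-9-+*/()x])+"
)

print(re.findall(INVERSE, "WIDTH+LENGTH*abs(2)"))  # ['WIDTH', 'LENGTH']
print(re.findall(INVERSE, "axabs(3)"))             # ['a']
print(re.findall(INVERSE, "WIDTH + LENGTH"))       # ['WIDTH ', ' LENGTH']
```

The spaces in the last result are expected: as a true inversion, the pattern treats spaces as non-math text.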

By merging parts like (?<=l)n, (?<=si)n and (?<=ta)n into (?<=l|si|ta)n, the regex can be shortened a bit:

(?:(?!e|ln|(?<=l|si|ta)n|pi|(?<=p)i|abs|(?<=a)bs|(?<=ab|co)s|pow|(?<=p)ow|(?<=po)w|a?(?:sin|cos|tan)|(?<=s)in|(?<=c)os|(?<=t)an)[^0-9-+*/()x])+

A demo of this, as well as a beautiful visualization, can be viewed on Debuggex.

Note 1: This regex does not work in JavaScript engines that predate ES2018, as JavaScript regexes did not support lookbehind before then.
Note 2: Appending a single multi-byte character (such as §°☀☁️❄️) to the test string in Debuggex might seem to break it; however, this is not an issue with my regex, as can be verified with PHP, for example.
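A portability note on the shortened version above, sketched in Python: PCRE (and therefore PHP) accepts lookbehind alternatives of different fixed lengths such as `(?<=l|si|ta)`, but Python's `re` module requires each lookbehind to have one fixed width and raises `re.error`. Rewriting it as `(?:(?<=l)|(?<=si)|(?<=ta))` is one workaround, an assumption of this sketch rather than part of the answer; the loop checks that it agrees with the long form on a few made-up inputs:

```python
import re

LONG = (
    r"(?:(?!e|ln|(?<=l)n|pi|(?<=p)i|abs|(?<=a)bs|(?<=ab)s|pow|(?<=p)ow|(?<=po)w"
    r"|sin|(?<=s)in|(?<=si)n|cos|(?<=c)os|(?<=co)s|tan|(?<=t)an|(?<=ta)n"
    r"|asin|acos|atan)[^0-9-+*/()x])+"
)

# Shortened form, with (?<=l|si|ta) split into single-width lookbehinds
# because Python's re rejects lookbehind alternatives of mixed width.
SHORT = (
    r"(?:(?!e|ln|(?:(?<=l)|(?<=si)|(?<=ta))n|pi|(?<=p)i|abs|(?<=a)bs|(?<=ab|co)s"
    r"|pow|(?<=p)ow|(?<=po)w|a?(?:sin|cos|tan)|(?<=s)in|(?<=c)os|(?<=t)an)"
    r"[^0-9-+*/()x])+"
)

try:
    re.compile(r"(?<=l|si|ta)n")
except re.error as exc:
    print("rejected by Python re:", exc)

for s in ["WIDTH+LENGTH*abs(2)", "axabs(3)", "asin(y)*tangent", "WIDTH + LENGTH"]:
    assert re.findall(LONG, s) == re.findall(SHORT, s)
    print(s, "->", re.findall(SHORT, s))
```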

Siguza