
Is there a simple way to remove positive/negative lookbehind/lookahead groups from a regular expression using another regular expression, while taking nested parentheses into account?

The example source expression: A(?<!B(C)D)E(?<=F)G(?!H(I(J))K)L(?=M(O)P)Q(?>R)S(T)

The parts I want removed:

  • (?<!B(C)D)
  • (?<=F)
  • (?!H(I(J))K)
  • (?=M(O)P)

So far, I have come up with the expression \(\?\<?[!=].+?\) to find the parts to remove, but nested parentheses cause problems. For example, instead of matching the whole group (?<!B(C)D), it stops at the first closing parenthesis and matches only (?<!B(C).
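To make the failure concrete, here is a minimal Kotlin sketch that runs the lazy pattern against the example expression. The helper name `firstLazyMatch` is made up for illustration; the pattern and the source string are the ones from the question.

```kotlin
// Hypothetical helper: returns the first match of the lazy pattern from the question.
fun firstLazyMatch(source: String): String? =
    Regex("""\(\?\<?[!=].+?\)""").find(source)?.value

fun main() {
    val source = "A(?<!B(C)D)E(?<=F)G(?!H(I(J))K)L(?=M(O)P)Q(?>R)S(T)"
    // The lazy quantifier stops at the first ')' it can reach,
    // which is the one closing the inner group (C).
    println(firstLazyMatch(source)) // prints (?<!B(C)
}
```

The lazy `.+?` has no notion of nesting depth, so it cannot tell the inner `)` from the one that closes the lookbehind.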

I thought about replacing (?<!, (?!, (?<= and (?= with (?# (turning them into embedded comments). That works perfectly on regex101.com, but sadly not in Java.

I'm trying to avoid having to loop through every character with a bunch of if-else logic.

Note: I'm using these regular expressions in Java (Kotlin), with the "containsMatchIn" method to match the source expression against actual text.

Mark Rotteveel
Don Madrino
  • I'd write or use a parser--having a stack is helpful when you can have arbitrarily nested parens because regexes can't store state. If you do use regex, you'll need a recursive one. It's a bit harder than the dupe target suggests because you'll need to omit escaped substrings like `\(` and other edge cases but it should give you a starting point. – ggorlen Oct 02 '20 at 14:48
  • @ggorlen Recursion seemed promising until I found out it's not available in Java. :-( – Don Madrino Oct 02 '20 at 15:24
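
For reference, the parser-style approach suggested in the comment above could look roughly like the following Kotlin sketch. The function name `stripLookarounds` is hypothetical, and the sketch makes simplifying assumptions: it skips backslash-escaped characters but does not handle parentheses inside character classes such as `[(]` or `\Q...\E` quoting.

```kotlin
// Hypothetical helper: removes lookahead/lookbehind groups by tracking paren depth.
// Assumption: no parentheses inside character classes and no \Q...\E quoting.
fun stripLookarounds(pattern: String): String {
    val prefixes = listOf("(?<!", "(?<=", "(?!", "(?=")
    val out = StringBuilder()
    var i = 0
    while (i < pattern.length) {
        val c = pattern[i]
        if (c == '\\' && i + 1 < pattern.length) {
            // Copy an escaped character (e.g. \( ) verbatim.
            out.append(c).append(pattern[i + 1])
            i += 2
            continue
        }
        val prefix = prefixes.firstOrNull { pattern.startsWith(it, i) }
        if (prefix == null) {
            out.append(c)
            i++
            continue
        }
        // Skip the whole lookaround group, counting nested parentheses.
        var depth = 1
        i += prefix.length
        while (i < pattern.length && depth > 0) {
            when (pattern[i]) {
                '\\' -> i++ // also skip the character after the backslash
                '(' -> depth++
                ')' -> depth--
            }
            i++
        }
    }
    return out.toString()
}

fun main() {
    val source = "A(?<!B(C)D)E(?<=F)G(?!H(I(J))K)L(?=M(O)P)Q(?>R)S(T)"
    println(stripLookarounds(source)) // prints AEGLQ(?>R)S(T)
}
```

It is a loop over characters, which the question hoped to avoid, but the depth counter is the only state it needs.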

2 Answers


You will have to use Perl-compatible regular expressions (PCRE) rather than the standard Java ones, because Java's regex engine does not support recursion. Try this: \(\?<?[!=](?<r>[^()]|\(\g<r>+\))+\).

  • \(\?<?[!=] and \) are the beginning and end of an assertion,
  • (?<r>[^()]|\(\g<r>+\))+ is a regular expression for a string with balanced parentheses,
  • [^()]|\(\g<r>+\) is either a non-parenthesis, or a string with balanced parentheses (called recursively) within parentheses,
  • \g<r> is a recursive call to the named group r.

Saved: https://regex101.com/r/mjMoyz/1.
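
If switching engines is not an option, one possible workaround (a different technique from the recursive pattern above, not a replacement for it) is to unroll the recursion to a fixed nesting depth, which java.util.regex can handle. The sketch below is in Kotlin; the function name `lookaroundPattern` and the `maxDepth` parameter are made up for illustration, and it assumes the lookaround bodies contain no escaped parentheses or character classes with parentheses.

```kotlin
// Hypothetical helper: builds a recursion-free pattern that matches lookaround
// groups whose contents nest at most maxDepth levels of parentheses.
fun lookaroundPattern(maxDepth: Int): Regex {
    // Depth 0: a single character that is not a parenthesis.
    var inner = """[^()]"""
    // Each pass allows one more level of balanced parentheses.
    repeat(maxDepth) {
        inner = """(?:[^()]|\((?:$inner)+\))"""
    }
    return Regex("""\(\?<?[!=](?:$inner)+\)""")
}

fun main() {
    val source = "A(?<!B(C)D)E(?<=F)G(?!H(I(J))K)L(?=M(O)P)Q(?>R)S(T)"
    val regex = lookaroundPattern(maxDepth = 2)
    println(regex.findAll(source).map { it.value }.toList())
    // prints [(?<!B(C)D), (?<=F), (?!H(I(J))K), (?=M(O)P)]
    println(regex.replace(source, ""))
    // prints AEGLQ(?>R)S(T)
}
```

The obvious limitation is that groups nested deeper than `maxDepth` are not matched, so the depth has to be chosen with the expected input in mind.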

Alexander Mashin
  • Do you know of any active/recent Perl-compatible regex libraries for Java? (I'm developing in Android Studio, so I don't have much of a choice about using Java) – Don Madrino Oct 02 '20 at 15:48
  • You might find these answers helpful: https://stackoverflow.com/a/37890800/6632736 or https://stackoverflow.com/a/27616512/6632736. – Alexander Mashin Oct 02 '20 at 16:00

I finally found a solution... had to do a little out of the box thinking :-)

The following code did the trick. Instead of removing the lookaround groups, it neutralizes them, so they behave much like Perl's comment group construct (?#...):

Using string replacement:

expression = expression.replace("(?<!", "(?<!_").replace("(?<=", "(?<=|").replace("(?!", "(?!_").replace("(?=", "(?=|")

Or using regex replacement:

expression = Regex("""(\(\?<?!)""").replace(Regex("""(\(\?<?=)""").replace(expression, "$1|"), "$1_")
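
As a self-contained illustration of the idea, here is a Kotlin sketch with the escaping written out in full. The function name `neutralizeLookarounds` is made up for illustration. My reading of the trick is that the inserted `|` adds an empty alternative, so a positive lookaround always succeeds, while the inserted `_` makes a negative lookaround fail only when the text really contains `_` followed by the original content. That means it presumably relies on `_` not showing up at those positions in the text.

```kotlin
// Hypothetical helper wrapping the regex-replacement variant of the answer.
fun neutralizeLookarounds(expression: String): String {
    val positive = Regex("""(\(\?<?=)""") // matches (?= and (?<=
    val negative = Regex("""(\(\?<?!)""") // matches (?! and (?<!
    // "$1" is a group reference for the regex engine, not a Kotlin template.
    return negative.replace(positive.replace(expression, "$1|"), "$1_")
}

fun main() {
    val source = "A(?<!B(C)D)E(?<=F)G(?!H(I(J))K)L(?=M(O)P)Q(?>R)S(T)"
    val neutralized = neutralizeLookarounds(source)
    println(neutralized)
    // prints A(?<!_B(C)D)E(?<=|F)G(?!_H(I(J))K)L(?=|M(O)P)Q(?>R)S(T)

    // With the lookarounds neutralized, only the plain parts have to match.
    println(Regex(source).containsMatchIn("AEGLQRST"))      // prints false
    println(Regex(neutralized).containsMatchIn("AEGLQRST")) // prints true
}
```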

Don Madrino