I need a regex that matches anything between parentheses, except when it sits inside the following pattern, an S character followed by square brackets:

S[]

Like in this sentence:

I am a (test) S[ but i am (not catched)], catch (me (if you can))
       ^^^^^^                                   ^^^^^^^^^^^^^^^^^   # should be matched
              ^^^^^^^^^^^^^^^^^^^^^^^^^^                            # should not be matched

It should also catch nested parentheses.

I tried to make it work using various examples, but the closest I got was this one:

(?![^S\[]*\])\(([^()]*|\(([^()]*|\(([^()]*|\([^()]*\))*\))*\))*\)?

=> but it fails when you remove the S from the test sentence.

Any idea of how to do it?

Edit: It should match as in this case, but taking the S into account: https://regex101.com/r/WzECSS/1

Edit: this one should do the trick : (?<!S\[[^\]]+)\((?:[^()]|\([^)]*\))+\) thanks @ctwheels

Edit: the previous regex fails when you put a parenthesis directly after the opening square bracket, as in:

"I am a (test) S[( but i am (not catched)], catch (me (if you can))"

Does anyone have an idea how to fix this? Thanks.

xenope
  • Please specify the expected result for your example. – Kosh Dec 11 '19 at 14:57
  • can't you just do a two pass? ```replace(/S\[^]+\]/g,'')``` then capture your matching parenthesis? – grodzi Dec 11 '19 at 14:58
  • If you make it a boolean like `S\[.*?\]|(\(.*?\))` then your desired results will be in `$1`. If you need to remove the `S` from the input then either make the `S` optional in the regex like this `S?\[.*?\]|(\(.*?\))` or remove it from the regex. – MonkeyZeus Dec 11 '19 at 14:59
  • Try [this](https://regex101.com/r/DiiYMT/1): `(?<=(? – ctwheels Dec 11 '19 at 15:22
  • @KoshVery It should catch like in this example : https://regex101.com/r/sX5hZ2/1 , except that here it doesn't take the "S" into account. MonkeyZeus: It should not match anything between S[ ... ], it does in what you provide. EDIT: sorry wrong link here is the good one => https://regex101.com/r/WzECSS/1 – xenope Dec 11 '19 at 15:27
  • @ctwheels it doesn't work it only matches the *(if you can)* part look at this one : https://regex101.com/r/WzECSS/1 – xenope Dec 11 '19 at 15:46
  • @xenope but you stated `catch anything between parenthesis except when it is between the following pattern [...] S[]`. What else is it supposed to match? – ctwheels Dec 11 '19 at 15:47
  • @ctwheels any parenthesis between S[ ...] must be ignored anything between the other parenthesis must be retrieved like here : https://regex101.com/r/WzECSS/1 **(test)** and **(me (if you can))** are matched – xenope Dec 11 '19 at 15:55
  • @xenope is my update to your question correct? – ctwheels Dec 11 '19 at 15:59
  • yes it is, any idea on how to do it? – xenope Dec 11 '19 at 16:04
  • Dude you killed it! thanks a lot! – xenope Dec 11 '19 at 16:17

1 Answer

You can use the following regex in ECMAScript 2018+ engines (for example V8, as used by Chrome and Node.js). Earlier versions don't support lookbehinds, and this pattern relies on a variable-length one.

See regex in use here

(?<!S\[[^\]]+)\((?:[^()]|\([^)]*\))+\)

How this works:

  • (?<!S\[[^\]]+) negative lookbehind ensuring the following does not match:
    • S\[ match S[ literally
    • [^\]]+ match any character except ] one or more times
  • \( match ( literally
  • (?:[^()]|\([^)]*\))+ match either of the following options one or more times
    • [^()] match any character except ( and )
    • \([^)]*\) match (, then any character except ) any number of times, then )
  • \) match ) literally
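
As a quick check of the breakdown above, here is a minimal sketch that runs the regex against the test sentence from the question. The variable names are only illustrative, and it assumes a runtime with lookbehind support (ES2018+), such as a recent Node.js.

```javascript
// Regex copied verbatim from the answer, with the g flag added so that
// String.prototype.match returns every match instead of only the first.
const answerRegex = /(?<!S\[[^\]]+)\((?:[^()]|\([^)]*\))+\)/g;

// Test sentence from the question.
const sampleSentence =
  "I am a (test) S[ but i am (not catched)], catch (me (if you can))";

// "(not catched)" is rejected by the lookbehind because "S[ but i am "
// sits directly before it with no "]" in between.
const foundGroups = sampleSentence.match(answerRegex);

console.log(foundGroups); // [ '(test)', '(me (if you can))' ]
```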

Please note, however, that this only matches to a depth of two parentheses (one set with another nested inside it). You can't easily balance parentheses in JavaScript's regex engine, since recursion and similar constructs aren't currently supported.
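
If deeper nesting matters, one alternative in plain JavaScript is to drop the regex and scan the string by hand. The sketch below is not part of the original answer. `extractParens` is a hypothetical helper, and it makes a few assumptions: `S[` is only recognised outside of parentheses, an `S[` without a closing `]` hides the rest of the string, and unbalanced parentheses are silently ignored.

```javascript
// Hypothetical helper (not from the answer): collects top-level parenthesized
// groups of any nesting depth, skipping everything inside S[...].
function extractParens(text) {
  const results = [];
  let depth = 0;   // current parenthesis nesting depth
  let start = -1;  // index of the "(" that opened the current top-level group
  let i = 0;

  while (i < text.length) {
    const ch = text[i];

    // Outside any parentheses, jump over a whole S[...] block.
    if (depth === 0 && ch === "S" && text[i + 1] === "[") {
      const close = text.indexOf("]", i + 2);
      if (close === -1) break; // assumption: an unterminated S[ hides the rest
      i = close + 1;
      continue;
    }

    if (ch === "(") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === ")" && depth > 0) {
      depth--;
      // A top-level group just closed: record it including both parentheses.
      if (depth === 0) results.push(text.slice(start, i + 1));
    }
    i++;
  }
  return results;
}

console.log(extractParens("I am a (test) S[( but i am (not catched)], catch (me (if you can))"));
// [ '(test)', '(me (if you can))' ]
console.log(extractParens("x (a (b (c))) y"));
// [ '(a (b (c)))' ]
```

The scanner trades the compactness of a single pattern for a depth counter, which is what lets it handle three or more levels of nesting.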

This answer explains how to balance parentheses in different regex engines (including JavaScript if you use XRegExp).

Some examples of implementations in other regex engines (not possible in JavaScript, since it doesn't include recursion, control verbs, balancing groups, etc.):

PCRE: See here

S\[[^]]*\](*SKIP)(*FAIL)|\((?:[^()]|(?R))*\)

.NET: See here

(?<!S\[[^\]]+)\((?:[^()]|(?<p>\()|(?<-p>\)))+(?(p)(?!))\)

EDIT

Changing the quantifier in the lookbehind from + to * prevents it from matching in the S[(...)] case, where a parenthesis directly follows the opening bracket:

(?<!S\[[^\]]*)\((?:[^()]|\([^)]*\))+\)
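
To see the difference between the two quantifiers, the following sketch runs both versions against the second test sentence from the question. The variable names are illustrative, and a runtime with lookbehind support (ES2018+) is assumed.

```javascript
// Second test sentence from the question: "(" directly follows "S[".
const trickySentence =
  'I am a (test) S[( but i am (not catched)], catch (me (if you can))';

// Original version: the lookbehind needs at least one character after "S[",
// so the "(" immediately following "S[" is not rejected.
const plusVersion = /(?<!S\[[^\]]+)\((?:[^()]|\([^)]*\))+\)/g;

// Edited version: "*" also allows zero characters after "S[".
const starVersion = /(?<!S\[[^\]]*)\((?:[^()]|\([^)]*\))+\)/g;

const plusMatches = trickySentence.match(plusVersion);
const starMatches = trickySentence.match(starVersion);

console.log(plusMatches);
// [ '(test)', '( but i am (not catched)], catch (me (if you can))' ]
console.log(starMatches);
// [ '(test)', '(me (if you can))' ]
```
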
ctwheels
  • ++Very nice solution! – The fourth bird Dec 11 '19 at 16:46
  • Fails with this use case : "I am a (test) S[( but i am (not catched)], catch (me (if you can))" – xenope Jun 04 '20 at 12:37
  • Any idea? @ctwheels maybe? – xenope Jun 04 '20 at 14:17
  • @xenope according to your question, it should not match `S[...]` - are you looking instead for it to not match `S[...]` unless it has a nested parentheses set `S[(())]`? If you are trying to match the nested parentheses sets within `S[]`, are you also looking to match the `S[]` part or just the nested parentheses set portion: `S[(a(b))]` or just `(a(b))` in that example? – ctwheels Jun 04 '20 at 23:14
  • @ctwheels Hi, It should never match the S[...] whatever is inside it, in this case if we put a parenthesis right after the brackets it matches it but I don't want to. – xenope Jun 09 '20 at 13:27