
I am looking for a regex to replace all terms in parentheses unless the parentheses are within square brackets.

e.g.

(matches) #match
[(do not match)] #should not match
[[does (not match)]] #should not match

I currently have:

[^\]]\([^()]*\) #Not a square bracket, an opening bracket, any non-bracket character and a closing bracket.

However this is still matching words within the square brackets.

I have also created a rubular page of my progress so far: http://rubular.com/r/gG22pFk2Ld

Phrogz
Gazler
  • Bit of a tough one. The double sets of square brackets will require recursion (at least once) to match up all the pairs. Definitely possible with a more advanced regex library than Ruby has (hello Perl fans!). I have a feeling you might have to settle for passing this through multiple Ruby regexes. – Douglas F Shearer May 23 '11 at 19:20

2 Answers


A regex is not going to cut it for you if you can nest the square brackets (see this related question).

I think you can only do this with a regex if (a) you only allow one level of square brackets and (b) you assume all square brackets are properly matched. In that case

\([^()]*\)(?![^\[]*])

is sufficient: it matches any parenthesised expression not followed by an unpaired ]. You need (b) because of the limitations of negative lookbehind (fixed-length patterns only in Ruby 1.9, and not supported at all in 1.8), which mean you are stuck matching (match)] even if you don't want to.
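
As a quick check of the pattern above against the three inputs from the question, here is a minimal sketch. The constant `PAREN_OUTSIDE_BRACKETS`, the helper `replace_parens` and the replacement string "X" are made up for illustration, and the closing `]` is escaped here only to avoid a Ruby warning; the pattern is otherwise the same.

```ruby
# The pattern from this answer: a parenthesised group that is NOT followed
# by an unpaired "]". Assumes square brackets are balanced (condition b).
PAREN_OUTSIDE_BRACKETS = /\([^()]*\)(?![^\[]*\])/

# Hypothetical helper, named for this example only.
def replace_parens(text, replacement)
  text.gsub(PAREN_OUTSIDE_BRACKETS, replacement)
end

puts replace_parens("(matches)", "X")             # X
puts replace_parens("[(do not match)]", "X")      # [(do not match)]
puts replace_parens("[[does (not match)]]", "X")  # [[does (not match)]]
puts replace_parens("a (x) [b (y)] (z)", "X")     # a X [b (y)] X
```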

So basically if you need to nest, or to allow unmatched brackets, you should ditch the regex and look at the answer to the question I linked to above.

Andrew Haines
  • This appears to work, even for nested square brackets as the depth of the nesting of the square brackets does not make a difference, the term is still matched. – Gazler May 23 '11 at 19:59
  • Thanks Andy, that is not necessary for my requirements. The square brackets are always at the start/end. I should have been more clear in my question. Thanks for the solution. :) – Gazler May 23 '11 at 20:13

This is a type of expression you cannot parse using a pure-regex approach, because you need to keep track of the current nesting/state_if_in_square_bracket (so you no longer have a type 3, i.e. regular, language).

However, depending on the exact circumstances, you can parse it with multiple regexes or simple parsers. Example approaches:

  • Split into sub-strings, delimited by [/[[ or ]/]], change the state whenever such a square bracket is encountered, and replace () in a sub-string only while in the "not_in_square_bracket" state
  • Parse for square brackets (including content), remove & remember them (these are "comments"), now replace all the content in normal brackets and re-add the square brackets stuff (you can remember stuff by using unique temp strings)
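
The first approach could be sketched roughly as below. This is only an illustration under some assumptions: the method name `replace_outside_brackets` is invented, escaped brackets are not handled, and a stray `]` with no opening bracket is simply ignored rather than treated as an error.

```ruby
# Hypothetical helper: replaces every "(...)" that lies outside all square
# brackets, tracking the bracket nesting depth while walking the pieces.
def replace_outside_brackets(text, replacement)
  depth = 0
  # Split on runs of "[" or "]"; the capture group keeps the delimiters.
  pieces = text.split(/(\[+|\]+)/).map do |chunk|
    if chunk.start_with?("[")
      depth += chunk.length                  # entering one or more levels
      chunk
    elsif chunk.start_with?("]")
      depth = [depth - chunk.length, 0].max  # leaving; never drop below 0
      chunk
    elsif depth.zero?
      chunk.gsub(/\([^()]*\)/, replacement)  # outside brackets: replace
    else
      chunk                                  # inside brackets: leave as-is
    end
  end
  pieces.join
end

puts replace_outside_brackets("a (x) [[b (y)]] (z)", "X")  # a X [[b (y)]] X
```

Because the depth is counted rather than stored as a simple flag, this version also copes with nested square brackets, at the cost of no longer being a single regex.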

The complexity of your solution also depends on whether escaping ] is allowed.

J-_-L