30

I am trying to create a .NET RegEx expression that will properly balance out my parentheses. I have the following RegEx expression:

func([a-zA-Z_][a-zA-Z0-9_]*)\(.*\)

The string I am trying to match is this:

"test -> funcPow((3),2) * (9+1)"

What should happen is that the regex matches everything from funcPow up to and including the second closing parenthesis, and stops there. Instead, it matches all the way to the very last closing parenthesis. The regex returns this:

"funcPow((3),2) * (9+1)"

It should return this:

"funcPow((3),2)"
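The question's pattern over-matches because .* is greedy. The minimal sketch below assumes a .NET 6+ console project with top-level statements. It shows that switching to the lazy .*? doesn't fix the problem either, because neither quantifier counts parentheses.

```csharp
using System;
using System.Text.RegularExpressions;

const string Input = "test -> funcPow((3),2) * (9+1)";

// Greedy ".*" first runs to the end of the string, then backtracks only as
// far as the LAST ')' so that "\)" can match.
string greedy = Regex.Match(Input, @"func([a-zA-Z_][a-zA-Z0-9_]*)\(.*\)").Value;

// Lazy ".*?" grows one character at a time and stops at the FIRST ')',
// which here is the inner one.
string lazy = Regex.Match(Input, @"func([a-zA-Z_][a-zA-Z0-9_]*)\(.*?\)").Value;

Console.WriteLine(greedy); // funcPow((3),2) * (9+1)
Console.WriteLine(lazy);   // funcPow((3)
```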

Any help on this would be appreciated.

Icemanind

4 Answers

55

.NET regular expressions can definitely do balanced-parentheses matching. It can be tricky, and it requires a couple of the more advanced regex features, but it's not too hard.

Example:

var r = new Regex(@"
    func([a-zA-Z_][a-zA-Z0-9_]*) # The func name

    \(                      # First '('
        (?:                 
        [^()]               # Match all non-braces
        |
        (?<open> \( )       # Match '(', and capture into 'open'
        |
        (?<-open> \) )      # Match ')', and delete the 'open' capture
        )+
        (?(open)(?!))       # Fails if 'open' stack isn't empty!

    \)                      # Last ')'
", RegexOptions.IgnorePatternWhitespace);

Balancing groups have a couple of features, but for this example we're only using the capture-deleting feature. The line (?<-open> \) ) will match a ) and delete the most recent "open" capture. If there is no "open" capture left to delete, it fails, so an unmatched ) ends the loop.

The trickiest line is (?(open)(?!)), so let me explain it. (?(open) is a conditional expression that takes its "yes" branch only if there is an "open" capture. (?!) is an empty negative lookahead, which always fails. Therefore, (?(open)(?!)) says "if there is an open capture, then fail".

Microsoft's documentation was pretty helpful too.
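The minimal usage sketch below assumes a .NET 6+ console project with top-level statements. It runs the same pattern against the question's input, written on one line without IgnorePatternWhitespace. It also shows a side effect of the )+ quantifier: an empty argument list such as funcFoo() doesn't match, because the loop has to consume at least one character. The Star variant, which uses )* instead, is an illustrative tweak and not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as the answer, with the whitespace and comments removed.
const string Plus = @"func([a-zA-Z_][a-zA-Z0-9_]*)\((?:[^()]|(?<open>\()|(?<-open>\)))+(?(open)(?!))\)";
// Illustrative variant: '*' instead of '+' so an empty argument list is accepted.
const string Star = @"func([a-zA-Z_][a-zA-Z0-9_]*)\((?:[^()]|(?<open>\()|(?<-open>\)))*(?(open)(?!))\)";

Match m = Regex.Match("test -> funcPow((3),2) * (9+1)", Plus);
Console.WriteLine(m.Value);           // funcPow((3),2)
Console.WriteLine(m.Groups[1].Value); // Pow (unnamed groups are numbered before named ones)

Console.WriteLine(Regex.IsMatch("funcFoo()", Plus)); // False: '+' needs at least one iteration
Console.WriteLine(Regex.IsMatch("funcFoo()", Star)); // True
```

The loop stops just before the call's own closing parenthesis. The (?<-open> ...) alternative fails there because nothing is left to pop, so the final \) consumes that parenthesis.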

Todd Menier
Scott Rippey
22

Using balanced groups, it is:

Regex rx = new Regex(@"func([a-zA-Z_][a-zA-Z0-9_]*)\(((?<BR>\()|(?<-BR>\))|[^()])*\)");

var match = rx.Match("funcPow((3),2) * (9+1)");

var str = match.Value; // funcPow((3),2)

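(?<BR>\() and (?<-BR>\)) work together as a balancing group (I used the name BR for Brackets). (?<BR>\() pushes a capture for every (. (?<-BR>\)) pops one for every ), and it fails when there is none left to pop. Perhaps it's clearer written as (?<BR> \( )|(?<-BR> \) ), so that the \( and \) are more "evident".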

If you really hate yourself (and the world/your fellow co-programmers) enough to use these things, I suggest using the RegexOptions.IgnorePatternWhitespace and "sprinkling" white space everywhere :-)
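Following that suggestion, here is a sketch of the one-liner spread out with RegexOptions.IgnorePatternWhitespace. It assumes a .NET 6+ console project with top-level statements. The loop matches a single non-parenthesis character per iteration, so each character can be consumed in only one way. A nested form like ([^()]*)+ can backtrack catastrophically when the call is never closed.

```csharp
using System;
using System.Text.RegularExpressions;

// With IgnorePatternWhitespace, whitespace outside character classes is
// ignored and '#' starts a comment that runs to the end of the line.
var rx = new Regex(@"
    func([a-zA-Z_][a-zA-Z0-9_]*)     # function name
    \(                               # the call's opening '('
    (
        (?<BR> \( )                  # '(' : push a capture onto BR
      | (?<-BR> \) )                 # ')' : pop a BR capture; fails if none is left
      | [^()]                        # any other single character
    )*
    \)                               # the ')' that closes the call
", RegexOptions.IgnorePatternWhitespace);

Console.WriteLine(rx.Match("funcPow((3),2) * (9+1)").Value); // funcPow((3),2)
```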

xanatos
  • I think you are missing the last critical part, `(?(BR)(?!))` – Scott Rippey Oct 26 '11 at 06:22
  • @ScottRippey No. There are other expressions after the closing `)`. The OP question was VERY precise. He wants `funcsomething()`, not to parse the entire expression. So the first "unbalanced" bracket I find is the closing bracket of my sub-expression. `funcPow((3),2) * (9+1) -> funcPow((3),2)` – xanatos Oct 26 '11 at 06:24
  • Oh, I realized that `(?(BR)(?!))` is only to ensure the opening brace has a closing brace. Microsoft's website: "The final subexpression, (?(Open)(?!)), indicates whether the nesting constructs in the input string are properly balanced" – Scott Rippey Oct 26 '11 at 06:50
  • @ScottRippey Exactly as you wrote :-) Note that it's a combination of an alternation construct http://msdn.microsoft.com/en-us/library/36xybswe(v=VS.71).aspx of type `(?(name)yes|no)` with a negative lookahead expression `(?!)` that always fails. So it reads "if there is a capture of name `name` then fail" – xanatos Oct 26 '11 at 07:44
  • This was a good discussion :-) We think alike. Too alike. Now I must destroy you. I'm sorry. – Scott Rippey Oct 26 '11 at 07:59
  • Just for the record, the difference between including the `(?(BR)(?!))` and not including it is that, without it, the expression will _match_ up to and including the last closing parenthesis if there are not _enough_ closing parentheses. With it, the expression as a whole will _not match_. – Rawling Feb 06 '13 at 16:33
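Rawling's point can be checked with the short sketch below, which assumes a .NET 6+ console project with top-level statements. NoCheck and WithCheck are hypothetical variants of xanatos's pattern that differ only in the trailing (?(BR)(?!)). In the input funcA((b), one ( is never closed.

```csharp
using System;
using System.Text.RegularExpressions;

// Identical except for the final (?(BR)(?!)) "fail if BR still has captures" check.
const string NoCheck   = @"func([a-zA-Z_][a-zA-Z0-9_]*)\((?:(?<BR>\()|(?<-BR>\))|[^()])*\)";
const string WithCheck = @"func([a-zA-Z_][a-zA-Z0-9_]*)\((?:(?<BR>\()|(?<-BR>\))|[^()])*(?(BR)(?!))\)";

const string Unbalanced = "funcA((b)"; // the inner '(' is never closed
const string Question   = "test -> funcPow((3),2) * (9+1)";

// Without the check, the engine backtracks until the last ')' can serve as the
// call's closing parenthesis, leaving one BR capture unpopped.
Console.WriteLine($"'{Regex.Match(Unbalanced, NoCheck).Value}'");   // 'funcA((b)'
// With the check, that leftover capture makes every attempt fail.
Console.WriteLine($"'{Regex.Match(Unbalanced, WithCheck).Value}'"); // ''

// On balanced input the two patterns agree.
Console.WriteLine(Regex.Match(Question, NoCheck).Value);   // funcPow((3),2)
Console.WriteLine(Regex.Match(Question, WithCheck).Value); // funcPow((3),2)
```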
0

Regular expressions in the strict, formal sense only describe regular languages. This means that a regular expression can find things of the sort "any combination of a's and b's" (ab, babbabaaa, etc.), but it can't find "n a's, one b, n a's" (a^n b a^n). It can't guarantee that the first run of a's is as long as the second.

Because of this, they can't match equal numbers of opening and closing parentheses. It would be easy enough to write a function that traverses the string one character at a time instead. Keep two counters, one for opening parentheses and one for closing ones, and increment them as you traverse the string. Return false as soon as the closing count exceeds the opening count (as in ")("), and at the end return false if opening_paren_count != closing_paren_count.
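The counting idea can be applied directly to the original problem. The minimal sketch below assumes a .NET 6+ console project with top-level statements, and ExtractFuncCall is a hypothetical helper name. It uses a single depth counter, which equals the opening count minus the closing count. A plain regex is used only to find where the call starts.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the first "func<name>(...)" call in the input,
// including its arguments, or null if that call's '(' is never closed.
static string? ExtractFuncCall(string input)
{
    // A plain regex only locates "func<name>("; the counting does the rest.
    Match head = Regex.Match(input, @"func[a-zA-Z_][a-zA-Z0-9_]*\(");
    if (!head.Success) return null;

    // depth = (opening parens seen) - (closing parens seen)
    int depth = 0;
    for (int i = head.Index + head.Length - 1; i < input.Length; i++)
    {
        if (input[i] == '(')
        {
            depth++;
        }
        else if (input[i] == ')')
        {
            depth--;
            // Back to zero: this ')' closes the call's own '('.
            if (depth == 0)
                return input.Substring(head.Index, i - head.Index + 1);
        }
    }
    return null; // reached the end with the call still open
}

Console.WriteLine(ExtractFuncCall("test -> funcPow((3),2) * (9+1)")); // funcPow((3),2)
```

Because the scan starts on the call's own (, the depth never goes negative before it returns to zero. The ")(" check from the answer is therefore only needed when validating an arbitrary string.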

jb.
  • That's as may be, but **regexes** can be used on almost any kind of text as long as you understand their limitations. Recursive/balanced patterns are ugly and (IMO) seldom worth the effort, but they *are* supported by many regex flavors. – Alan Moore Oct 26 '11 at 06:26
-1
func[a-zA-Z0-9_]*\((([^()])|(\([^()]*\)))*\)

You can use that, but if you're working with .NET, there may be better alternatives.

This part you already know:

 func[a-zA-Z0-9_]*\( --weird part-- \)

The --weird part-- just means: repeat, as many times as needed, either a single character that isn't a parenthesis ([^()]) or a parenthesized section that contains no further parentheses (\([^()]*\)). You can't use . for "any character" here, because it would also consume the parentheses. [^()] excludes them.

(([^()])|(\([^()]*\)))*
rkw
  • You should specify that this will work with only one level of nesting. – Scott Rippey Oct 26 '11 at 08:03
  • @ScottRippey: It still works if there is a function within that function. The | condition handles that. Can you give an example where this regex would provide a false match? – rkw Oct 26 '11 at 08:47
  • It works correctly, and does exactly as the OP asked, so it is a good answer. However, it is hard coded to only match one level of nesting, so it would fail to match: `func(a(b(c)d)e)`. It is unclear if the OP needed this. – Scott Rippey Oct 26 '11 at 16:40
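Scott Rippey's point is easy to check with the sketch below, which assumes a .NET 6+ console project with top-level statements. OneLevel is rkw's pattern verbatim. Balanced is a hypothetical balancing-group variant with the same name prefix, included only for comparison. The third input shows that a function within a function works only one level deep. At two levels, the one-level pattern returns just the inner call.

```csharp
using System;
using System.Text.RegularExpressions;

// rkw's pattern, copied verbatim: handles at most one level of nested parentheses.
const string OneLevel = @"func[a-zA-Z0-9_]*\((([^()])|(\([^()]*\)))*\)";
// Balancing-group variant (hypothetical, for comparison): any nesting depth.
const string Balanced = @"func[a-zA-Z0-9_]*\((?:[^()]|(?<open>\()|(?<-open>\)))*(?(open)(?!))\)";

string[] inputs =
{
    "test -> funcPow((3),2) * (9+1)", // one level: both agree
    "funcF(a(b(c)d)e)",               // two levels: OneLevel finds nothing
    "funcA(funcB(funcC(x)))",         // two levels: OneLevel returns only the inner call
};

foreach (string s in inputs)
{
    Console.WriteLine(s);
    Console.WriteLine($"  one-level: '{Regex.Match(s, OneLevel).Value}'");
    Console.WriteLine($"  balanced:  '{Regex.Match(s, Balanced).Value}'");
}
```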