
I need to match the entire following statement:

{{CalendarCustom|year={{{year|{{#time:Y}}}}}|month=08|float=right}}

Basically, every { needs a corresponding }, no matter how many nested { } pairs sit inside the original tag. For example, it should match {{match}}, {{ma{{tch}}}}, and {{m{{a{{t}}c}}h}}.

I have this right now:

(\{\{.+?(:?\}\}[^\{]+?\}\}))

This does not quite work.

Oded
thirsty93
  • What exactly are you trying to get out of the string? – Oded May 14 '11 at 15:00
  • I just want to match the entire statement so I can remove it. Like there is other text surrounding that and I want to match anything inside {} brackets and remove it. – thirsty93 May 14 '11 at 15:02
  • In general, regexps are not the right tool to match brackets; see e.g. [here](http://stackoverflow.com/q/546433/577423). – Howard May 14 '11 at 15:06
  • @Howard: "Regular expressions" have come a long way away from being regular. Modern regex flavors offer many new things, and a problem like this is perfectly suited for a recursive regex. – Tim Pietzcker May 14 '11 at 15:18
  • Can you just use JSON? This kind of sounds like you're outputting this string yourself, and then trying to parse it later. If you do in fact own both ends (and are just serializing and deserializing), you'll save yourself a lot of work if you just go with an existing solution ;) – John Gibb May 14 '11 at 15:21

2 Answers


The .NET regex engine can match nested constructs through balancing groups (it has no true recursion, but the effect here is the same):

// Requires: using System.Text.RegularExpressions;
string result = Regex.Match(subject,
    @"\{                   # opening {
        (?>                # now match...
           [^{}]+          # any characters except braces
        |                  # or
           \{  (?<DEPTH>)  # a {, increasing the depth counter
        |                  # or
           \}  (?<-DEPTH>) # a }, decreasing the depth counter
        )*                 # any number of times
        (?(DEPTH)(?!))     # until the depth counter is zero again
      \}                   # then match the closing }",
    RegexOptions.IgnorePatternWhitespace).Value;
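
Since the goal is to remove these blocks rather than just find one, here is a minimal sketch that wraps the same pattern in `Regex.Replace`. The class name `TemplateStripper` and the method `Strip` are made up for illustration; only `Regex`, `RegexOptions` and `Replace` come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TemplateStripper
{
    // Same balancing-group pattern as above, built once and reused.
    private static readonly Regex Braces = new Regex(@"
        \{                      # opening {
        (?>                     # atomic group: no backtracking inside one iteration
            [^{}]+              # a run of non-brace characters
          | \{ (?<DEPTH>)       # a {, pushing a capture onto DEPTH
          | \} (?<-DEPTH>)      # a }, popping a capture from DEPTH
        )*
        (?(DEPTH)(?!))          # fail if some inner { is still unclosed
        \}                      # closing }",
        RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: removes every balanced {...} span from the input.
    public static string Strip(string input)
    {
        return Braces.Replace(input, string.Empty);
    }
}
```

Calling `TemplateStripper.Strip(subject)` should leave only the text outside the braces. Text with unbalanced braces may be only partly removed, so check the result if your input can be malformed.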
Tim Pietzcker
  • Thanks for pointing this out. Learnt something today... Do you have a link that documents `(?<DEPTH>)`? – Oded May 14 '11 at 15:17
  • @Oded: `DEPTH` is an arbitrary name - it's just an empty named capturing group `(?<id>)` which in .NET counts the number of matches; `(?<-id>)` is the same, just decreasing the counter. And `(?(id)(?!))` only matches if the `id` counter is zero. This is documented on page 436 of Friedl's "Mastering Regular Expressions". – Tim Pietzcker May 14 '11 at 15:23
  • I tried using a basic regex solution that I found but it was crazy slow. Like 2+ minutes to run. This one is like instantaneous. – thirsty93 May 14 '11 at 15:57

I suggest writing a simple parser/tokenizer for this.

Basically, you loop over all the characters and keep a depth counter: increment it for each { and decrement it for each }. Record the index of the { that takes the counter from 0 to 1 and the index of the } that brings it back to 0, and you will have the start and end indexes of each top-level embedded expression.

At this point you can use substring to get these and remove/replace them from the original string.
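
The steps above can be sketched roughly like this in C#. The names `BraceScanner` and `RemoveBraced` are invented for the example, and keeping unbalanced text unchanged is an assumption of mine, not something the question specifies.

```csharp
using System;
using System.Text;

public static class BraceScanner
{
    // Returns the input with every top-level balanced {...} span removed.
    // Assumption: a stray } or an unclosed { is kept in the output unchanged.
    public static string RemoveBraced(string input)
    {
        var output = new StringBuilder(input.Length);
        int depth = 0;    // number of currently open braces
        int start = -1;   // index of the { that opened the current top-level span

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '{')
            {
                if (depth == 0) start = i;   // counter goes 0 -> 1: span begins
                depth++;
            }
            else if (c == '}' && depth > 0)
            {
                depth--;
                if (depth == 0) start = -1;  // counter back to 0: span ends, drop it
            }
            else if (depth == 0)
            {
                output.Append(c);            // text outside any braces is kept
            }
        }

        // An unclosed span at the end of the input is copied through as-is.
        if (depth > 0) output.Append(input, start, input.Length - start);
        return output.ToString();
    }
}
```

This makes a single pass over the string and uses only managed code, so it should be usable anywhere plain C# runs.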

See this question and answers for why RegEx is not suitable.

Oded
  • I second this. I've seen a company I used to work for go down the road of parsing via regex, and it only seems like it's going to be easier. It's a big learning curve, but it'll be worth it in the long run. Check out ANTLR for a starting point.... – John Gibb May 14 '11 at 15:20
  • Here's a very simple example of using ANTLR to parse and evaluate expressions. Notice how simple it is to define what the valid 'tokens' are and then sprinkle in inline Java source code (it works with C# as well); ANTLR does the rest. http://www.antlr.org/wiki/display/ANTLR3/Expression+evaluator – John Gibb May 14 '11 at 15:26
  • I'm making something that runs on an xbox, so no unmanaged code allowed. – thirsty93 May 14 '11 at 15:58
  • @Paul - you can write this in C#. – Oded May 14 '11 at 16:00
  • @Paul - What? Looping through each char in a string? I described a simple algorithm. Where do you think unmanaged code comes into this? I do not mean ANTLR. – Oded May 14 '11 at 20:29