So I looked at "How to write a recursive regex that matches nested parentheses?" and other solutions for recursive regex matching, but I'm still not getting a proper match in RegexBuddy.

I have a generic handlebars-style template that I want to parse myself, a table with headings:

<table>
    <thead>
        <tr>
            {{#each columns as col }}<th>{{col}}</th>{{/each}}
        </tr>
    </thead>
    <tbody>
        {{#each rows as row }}
        <tr>
            {{#each row as col }}<td>col</td>{{/each}}
        </tr>
        {{/each}}
    </tbody>
</table>

And trying to match with

/{{\#each (\w+) as (\w+) }}(.*?|(?R)){{/each}}/s

The regex matches the {{#each columns... in the <thead> just fine, but it seems to ignore the |(?R) part and matches {{#each rows... only until the first {{/each}}. I, of course, would like it to match both the inner and outer #each expressions. How? This is perhaps much more complex than simple nested parentheses.

(I always feel like I'm a pro at RegEx until I run into things like this. I have been trying for a while to make this work, and regular-expressions.info is just confusing me more.)


I'm currently working around this by using {{#each_sub...}}...{{/each_sub}} for the inner loop so my regex won't stop on the first closing tag, but that's obviously a sub-optimal way of doing it. I have several other applications that would benefit from recursive regex, but I can't figure out what I'm doing wrong.

Phil Tune
  • `[^()]*` in the regex you link to means *not the leading nor trailing boundary*. Thus, you need something like [`{{#each (\w+) as (\w+) }}(?:(?!{{#each\b[^}]*}}|{{\/each}}).)*(?:(?R)(?:(?!{{#each\b[^}]*}}|{{\/each}}).)*)*+{{\/each}}`](https://regex101.com/r/sT4yB2/1). – Wiktor Stribiżew Apr 13 '16 at 22:03
  • Thank you @WiktorStribiżew! That matches exactly what I wanted. – Phil Tune Apr 14 '16 at 12:51

1 Answer

It isn't ignoring the recursion, it's just never reaching it. Because .*? is capable of matching your delimiters ({{#each...}} and {{/each}}), it matches the first closing delimiter it finds and reports success without ever needing to recurse.
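To see the early stop in isolation, here is a minimal sketch in Python. Two assumptions to flag: the stdlib `re` module has no `(?R)`, so that branch is simply left out (for this input the result should be the same, since the branch is never needed), and the template is a condensed stand-in for the one in the question. The braces are escaped only to keep the pattern unambiguous.

```python
import re

# Condensed stand-in for the table template in the question.
template = """<thead><tr>
{{#each columns as col }}<th>{{col}}</th>{{/each}}
</tr></thead>
<tbody>
{{#each rows as row }}
<tr>
{{#each row as col }}<td>{{col}}</td>{{/each}}
</tr>
{{/each}}
</tbody>"""

# The question's pattern minus the (?R) branch; re.S plays the role of /s.
lazy = re.compile(r"\{\{#each (\w+) as (\w+) \}\}(.*?)\{\{/each\}\}", re.S)

matches = [m.group(0) for m in lazy.finditer(template)]
for text in matches:
    print(repr(text))
```

The second match starts at {{#each rows... but ends at the inner {{/each}}, so the outer block's closing tag is never part of any match.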

For this technique to work, the branch before the (?R) has to match anything that's not a delimiter. Since your delimiters consist of multiple characters, you can't use a negated character class, as they did in the question you linked to. Instead, you need to use a tempered greedy token:

(?:(?!{{[#/]each\b).)*

This is the same as .*, except before it consumes each character it checks to make sure it's not the beginning of {{#each or {{/each. Here it is in context:
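The tempered token needs no recursion, so it can be tried on its own with Python's stdlib `re`. This is only a sketch; the sample string is made up for illustration.

```python
import re

# Tempered greedy token: consume any character, but only after checking that
# an {{#each or {{/each delimiter does not start at the current position.
token = re.compile(r"(?:(?!\{\{[#/]each\b).)*", re.S)

text = "<td>{{col}}</td>{{/each}}</tr>"

tempered = token.match(text).group(0)        # stops right before {{/each}}
plain = re.match(r".*", text, re.S).group(0) # plain .* runs to the end

print(repr(tempered))
print(repr(plain))
```

Note that {{col}} is consumed without trouble: the lookahead only rejects positions where {{ is followed by #each or /each.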

{{\#each (\w+) as (\w+) }}(?:(?:(?!{{[#/]each\b).)*|(?R))*{{/each}}

If the first branch fails, it means you've encountered something that looks like a delimiter. If it's an opening delimiter, the second branch takes over and tries to match the whole pattern recursively. Otherwise, it pops out of the loop (note the * after the group; you were missing that, too) and tries to match a closing delimiter.
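If your engine has no recursion at all (Python's stdlib `re` is one example), the same control flow can be imitated with an explicit depth counter. To be clear, this is a swapped-in technique and not the regex above: `find_each_blocks` is a hypothetical helper, the template is a condensed stand-in, and only top-level blocks are reported.

```python
import re

# Either an opening {{#each X as Y }} tag or a closing {{/each}} tag.
TAG = re.compile(r"\{\{#each\s+(\w+)\s+as\s+(\w+)\s*\}\}|\{\{/each\}\}")

def find_each_blocks(text):
    """Return (start, end) spans of every balanced top-level each block."""
    spans, depth, start = [], 0, None
    for m in TAG.finditer(text):
        if m.group(1) is not None:   # opening tag: go one level deeper
            if depth == 0:
                start = m.start()
            depth += 1
        elif depth > 0:              # closing tag: come back up one level
            depth -= 1
            if depth == 0:
                spans.append((start, m.end()))
        # a closing tag seen at depth 0 is unbalanced and is skipped
    return spans

template = (
    "<thead>{{#each columns as col }}<th>{{col}}</th>{{/each}}</thead>\n"
    "<tbody>{{#each rows as row }}<tr>"
    "{{#each row as col }}<td>{{col}}</td>{{/each}}"
    "</tr>{{/each}}</tbody>"
)

blocks = [template[s:e] for s, e in find_each_blocks(template)]
for block in blocks:
    print(repr(block))
```

One difference worth knowing: if an outer block is never closed, the regex can still retry from an inner opening tag, while this sketch reports nothing for that stretch.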

While the regex above will work fine on valid input, it's subject to catastrophic backtracking if input is malformed. To avoid that, you can use an unrolled loop in place of the alternation (as @Wiktor did in his comment):

{{\#each\s+(\w+)\s+as\s+(\w+)\s*}}(?:(?!{{[#/]each\b).)*(?:(?R)(?:(?!{{[#/]each\b).)*)*{{/each}}

Here's a slightly more readable version (it needs the x flag so the whitespace and line breaks are ignored), with possessive quantifiers added to squeeze out even more speed:

{{\#each\s+(\w+)\s+as\s+(\w+)\s*}}
(?:(?!{{[#/]each\b).)*+
(?:
  (?R)
  (?:(?!{{[#/]each\b).)*+
)*+
{{/each}}
Alan Moore
  • Wow, fantastic answer, I don't think I would have stumbled upon this easily. It's working in my application. It'll take me a bit to understand how it's working, but your answer is a great resource. Thanks to you and Wiktor! – Phil Tune Apr 14 '16 at 12:54