
Suppose a string in the following format:

Use \hyperlink{aaa}{apple {pear} banana} and \hyperlink{bbb}{banana {pear} {apple}}.

I want to extract:

\hyperlink{aaa}{apple {pear} banana}
\hyperlink{bbb}{banana {pear} {apple}}

What regex could be used for such an extraction?

I got stuck with this:

\\hyperlink{\S+}{.+}
woeterb
  • Can there be arbitrary nesting? What language? See [Regular expression to match balanced parentheses](https://stackoverflow.com/questions/546433/regular-expression-to-match-balanced-parentheses?noredirect=1&lq=1) – bobble bubble Oct 15 '19 at 16:37
  • @bobblebubble what is meant by arbitrary nesting? I'm programming in Python – woeterb Oct 17 '19 at 08:33
  • I meant whether there can be deeper nesting than `{{}{}}`, e.g. `{{}{{}}}`. For arbitrarily deep nesting you'd need a recursive regex; in Python that is possible with a third-party package. – bobble bubble Oct 17 '19 at 18:02

2 Answers


If there is no arbitrary nesting, you can use a pattern built on the negated character class [^}{], like this:

\\hyperlink{[^}{]*}{[^}{]*(?:{[^}{]*}[^}{]*)*}

This is similar to this answer, but unrolled. See the demo at regex101. To extract the contents of the braces, use capturing groups (demo).
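A minimal Python sketch of the pattern above using the standard re module, run on the sample string from the question. The names TEXT, LINK, LINK_GROUPS, extract_links and extract_parts are illustrative only, not part of any library:

```python
import re

# Sample input from the question (raw string keeps "\h" as backslash + "h").
TEXT = r"Use \hyperlink{aaa}{apple {pear} banana} and \hyperlink{bbb}{banana {pear} {apple}}."

# The unrolled pattern: at most ONE level of nested {...} in the second argument.
# In Python's re, a "{" that does not start a valid quantifier such as {2} or
# {1,3} is matched literally, so the braces need no escaping here.
LINK = re.compile(r"\\hyperlink{[^}{]*}{[^}{]*(?:{[^}{]*}[^}{]*)*}")

# Same pattern with capturing groups around the two arguments.
LINK_GROUPS = re.compile(r"\\hyperlink{([^}{]*)}{([^}{]*(?:{[^}{]*}[^}{]*)*)}")


def extract_links(text):
    """Return every whole \\hyperlink{...}{...} occurrence."""
    return LINK.findall(text)


def extract_parts(text):
    """Return (target, label) tuples, one per occurrence."""
    return LINK_GROUPS.findall(text)


for link in extract_links(TEXT):
    print(link)
for target, label in extract_parts(TEXT):
    print(target, "->", label)
```

With two groups in the pattern, findall returns tuples instead of whole matches, which is usually what you want for extraction.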

Depending on your environment and regex flavor, it may be necessary to escape each opening { that is outside a character class with a backslash (\{) so that it is matched literally.
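To illustrate the escaping point, here is a small Python check. It assumes Python's re module, which accepts both spellings, and shows that the bare and escaped versions find the same matches:

```python
import re

TEXT = r"Use \hyperlink{aaa}{apple {pear} banana} and \hyperlink{bbb}{banana {pear} {apple}}."

# Opening braces left bare: accepted by Python's re.
bare = r"\\hyperlink{[^}{]*}{[^}{]*(?:{[^}{]*}[^}{]*)*}"

# Opening braces outside the character classes escaped as \{: the more portable
# spelling for flavors that treat a bare "{" as the start of a quantifier.
escaped = r"\\hyperlink\{[^}{]*}\{[^}{]*(?:\{[^}{]*}[^}{]*)*}"

bare_matches = re.findall(bare, TEXT)
escaped_matches = re.findall(escaped, TEXT)
print(bare_matches == escaped_matches)
```

Escaping is never wrong here, so the escaped form is the safer choice if the pattern may be reused in another language.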

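Also note that in your attempt \S+ can consume } (it matches any non-whitespace character), and the greedy .+ runs on to the last } it can reach, so it can match much more than intended.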
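A quick Python check of both effects on the pattern from the question (capturing groups added for inspection). The second input, with a trailing {c} group, is a hypothetical example chosen only to show \S+ crossing a brace:

```python
import re

TEXT = r"Use \hyperlink{aaa}{apple {pear} banana} and \hyperlink{bbb}{banana {pear} {apple}}."
ATTEMPT = r"\\hyperlink{(\S+)}{(.+)}"  # the question's pattern, with groups added

# Greedy .+ backtracks only as far as the LAST "}" on the line,
# so both links come back as a single match.
whole = re.search(ATTEMPT, TEXT).group(0)
print(whole)

# \S+ may cross "}{", so the first argument swallows part of the next one.
m = re.search(ATTEMPT, r"\hyperlink{aaa}{b}{c}")
print(m.group(1), m.group(2))
```

Restricting each argument with a negated class such as [^}{]* removes both problems, because neither part can then run past a brace.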

bobble bubble

Here is how you can do it with a recursive regex:

\\hyperlink\{[^}]+?\}(\{(?>[^{}]+|(?1))+\})

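See the regex demo and the background on recursive regex. This needs a flavor that supports recursion and atomic groups, such as PCRE. Python's built-in re module does not support (?1), but the third-party regex module does.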
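If you cannot install a package with recursion support, a small brace-counting scanner handles arbitrary nesting in plain Python. This is a swapped-in, non-regex technique, not this answer's method. find_hyperlinks and _match_group are hypothetical names, and the sketch assumes balanced braces with no escaped \{ or \} inside the arguments:

```python
def _match_group(text, pos):
    """If text[pos] opens a brace group, return the index just past its
    matching close brace; return None if there is no group or it never closes."""
    if pos >= len(text) or text[pos] != "{":
        return None
    depth = 0
    for i in range(pos, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    return None


def find_hyperlinks(text, command="\\hyperlink"):
    """Return every command{...}{...} whose two arguments may nest braces
    to any depth. Assumes balanced braces and no escaped braces."""
    results = []
    start = text.find(command)
    while start != -1:
        pos = start + len(command)
        ok = True
        for _ in range(2):  # \hyperlink takes exactly two brace arguments
            end = _match_group(text, pos)
            if end is None:
                ok = False
                break
            pos = end
        if ok:
            results.append(text[start:pos])
            start = text.find(command, pos)
        else:
            # Not a well-formed occurrence: resume the search just after it.
            start = text.find(command, start + 1)
    return results


TEXT = r"Use \hyperlink{aaa}{apple {pear} banana} and \hyperlink{bbb}{banana {pear} {apple}}."
print(find_hyperlinks(TEXT))
print(find_hyperlinks(r"\hyperlink{x}{a {b {c}} d} end"))
```

Counting depth is exactly what a recursive regex does internally, so this stays correct for inputs like `{{}{{}}}` that the one-level pattern cannot handle.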

Code Maniac