I want to catch bracket/parenthesis pairs that are next to each other and get hold of the words inside them. In the following text I want to catch [oh](so) and [bad](things).

[oh](so)funny
[all]the[bad](things)

If I use the regex r'\[(.*?)\]\((.*?)\)' it will catch [oh](so) and [all]the[bad](things), which is not what I want.
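To make the problem concrete, here is a small reproduction. It is only a sketch: it assumes Python's built-in `re` module and the two sample lines above, and the variable names are just for illustration.

```python
import re

# The two sample lines from the question.
text = "[oh](so)funny\n[all]the[bad](things)"

# The lazy pattern from the question.
lazy = re.compile(r'\[(.*?)\]\((.*?)\)')

# Collect the full text of every match.
matches = [m.group(0) for m in lazy.finditer(text)]
print(matches)

# The first group of the second match runs across the unrelated "[all]the" prefix.
first_groups = [m.group(1) for m in lazy.finditer(text)]
print(first_groups)
```

The lazy `.*?` still expands past the first `]` whenever the character after it is not `(`, which is why the second match starts at `[all]`.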

What's a good regex to solve this?

thameera
  • You could also loop over your pairs of delimiters. Also your question is similar to this one for anyone interested: https://stackoverflow.com/questions/546433/regular-expression-to-match-outer-brackets – tommy.carstensen Nov 22 '17 at 12:28

1 Answer

Don't use .*?.

Instead use [^\]]+ and [^\)]+

In other words:

r'\[([^\]]+)\]\(([^\)]+)\)'
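
As a quick check, a minimal sketch assuming Python's built-in `re` module and the sample text from the question (the names `pair` and `found` are just illustrative):

```python
import re

# Negated character classes cannot run past the closing delimiter,
# so "[all]" is rejected because it is not immediately followed by "(".
pair = re.compile(r'\[([^\]]+)\]\(([^\)]+)\)')

text = "[oh](so)funny\n[all]the[bad](things)"

# findall returns one (bracket_text, paren_text) tuple per match.
found = pair.findall(text)
print(found)  # [('oh', 'so'), ('bad', 'things')]
```

Because `[^\]]+` can never consume a `]`, the engine has no way to stretch a match from `[all]` over to `[bad]`.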

Lone Shepherd
  • It does the trick! But it would still fail at `[some[good](text)]`. Here I want to catch `[good](text)` only. Any idea how to deal with this situation as well? – thameera Aug 26 '12 at 18:07
  • That wasn't part of your question. If you need more sophisticated tag matching, I would recommend using a parsing module of some sort. That said, `\[([^\]\[]+)\]\(([^\)]+)\)` will correctly match the example in your comment. – Lone Shepherd Aug 26 '12 at 18:26
  • @thameera: To expand on what Lone Shepherd said, regular expressions *can't* deal with nested brackets. You can prove that the language consisting only of balanced parentheses is non-regular, *i.e.* that it cannot be matched by any regular expression. Python regexen are more powerful than formal regular expressions (as are most modern implementations, thanks to backreferences), but I don't think they have the necessary power to recognize the even simpler language consisting of all strings of the form `a...ab...b` with an equal number of `a`s and `b`s. – Antal Spector-Zabusky Aug 26 '12 at 19:19
  • @AntalS-Z: while you're correct about Python's stock `re`, more advanced regex engines (Python `regex`, PCRE, .NET, etc.) can match nested brackets and `a^n b^n`, for example by means of recursive groups like `(?R)` (or, in .NET, balancing groups). – georg Aug 26 '12 at 20:36
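
For the `[some[good](text)]` case raised in the comments, the two patterns can be compared side by side. This is a rough sketch using only the built-in `re` module; the names `simple` and `no_open_bracket` are made up for the example, and it covers only this one level of nesting, not arbitrarily nested brackets.

```python
import re

nested = "[some[good](text)]"

# Pattern from the answer: the first class excludes only "]".
simple = re.compile(r'\[([^\]]+)\]\(([^\)]+)\)')

# Pattern from the comment: the first class also excludes "[",
# so a match cannot begin at an outer opening bracket.
no_open_bracket = re.compile(r'\[([^\]\[]+)\]\(([^\)]+)\)')

simple_result = simple.findall(nested)
strict_result = no_open_bracket.findall(nested)

print(simple_result)  # [('some[good', 'text')]
print(strict_result)  # [('good', 'text')]
```

Excluding `[` inside the first group forces the match to start at the innermost opening bracket, which is all this particular input needs.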