I have a challenge similar to finding "matching brackets", but I suppose this one is simpler. For instance, a string like "xAAAyBBBz" should match, as there are 3 A's and 3 B's. However, "xAAyBBBz" should not match, as there is one "unmatched" B. The strings have arbitrary length, and the solution is supposed to be one single regexp. I could use in-regexp code evaluation (it is Perl, after all), and I could (should!) avoid regexps altogether. But now I'm curious.

Andy Lester
creaktive
  • What if x, y, or z contain A or B? – melpomene Dec 14 '12 at 17:26
  • The pattern you describe is not matchable by traditional regular expressions, as the pattern is not a regular language (http://en.wikipedia.org/wiki/Regular_language). It may be matchable via extensions, however. – Mark A. Fitzgerald Dec 14 '12 at 17:32
  • I believe that [this previously asked question](http://stackoverflow.com/questions/7434272/match-an-bn-cn-e-g-aaabbbccc-using-regular-expressions-pcre) answers your question. :) – Mark A. Fitzgerald Dec 14 '12 at 17:36

1 Answer

^[^AB]*(A(?:[^AB]*|(?-1))B)[^AB]*\z

^
[^AB]*       # "x"
(
  A
  (?:
    [^AB]*   # "y"
  |
    (?-1)
  )
  B
)
[^AB]*       # "z"
\z

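The capturing group (A(?:[^AB]*|(?-1))B) matches an A at the beginning and a B at the end. In between, there is either a run of zero or more characters other than A and B, or a recursive match of the first capture group's pattern at this position. The (?-1) is a relative reference to the closest preceding capture group, which here is group 1 itself. Each level of recursion consumes exactly one A and one B, which guarantees that the As and Bs are balanced.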
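To try the pattern out, here is a minimal sketch that wraps it in a small helper and runs it against the two strings from the question. The names `$balanced` and `is_balanced` are made up for this illustration, and the sketch assumes Perl 5.10 or later, since recursive groups such as (?-1) are not available in older versions.

```perl
use 5.010;      # recursive groups like (?-1) need Perl 5.10+
use strict;
use warnings;

# The pattern from the answer, compiled once with /x so it can be commented.
# $balanced is a hypothetical name used only in this sketch.
my $balanced = qr{
    ^
    [^AB]*            # "x": leading characters other than A and B
    (                 # capture group 1
        A
        (?:
            [^AB]*    # "y": innermost filler
          |
            (?-1)     # recurse into group 1
        )
        B
    )
    [^AB]*            # "z": trailing characters other than A and B
    \z
}x;

# Hypothetical helper: returns 1 if the string matches, 0 otherwise.
sub is_balanced {
    my ($s) = @_;
    return $s =~ $balanced ? 1 : 0;
}

# The two examples from the question.
for my $s (qw(xAAAyBBBz xAAyBBBz)) {
    printf "%-10s %s\n", $s, is_balanced($s) ? 'match' : 'no match';
}
```

With the question's examples, "xAAAyBBBz" should be reported as a match and "xAAyBBBz" as no match. Note that the pattern requires at least one A...B pair, so a string with no A or B at all does not match.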

melpomene
  • Perhaps you should explain your regex too. – TLP Dec 14 '12 at 17:54
  • This works if the As and Bs are consecutive, but I'm not sure that's always the case. If they don't have to be, simply extend the parens to include both outer "`[^AB]*`". – ikegami Dec 14 '12 at 18:04
  • @melpomene Answers should as a general rule come with explanations. Even though something might seem obvious to you, it may not be to others. – TLP Dec 14 '12 at 18:06
  • @ikegami "*I have a challenge similar to finding "matching brackets", but I suppose this is a simpler one.*" If the A/B's don't have to be consecutive, it is exactly the matching brackets problem. – melpomene Dec 14 '12 at 18:06
  • @TLP That's why I was asking which parts were non-obvious. When I write a solution in, say, C, I don't annotate each line a la `int i; // declares an integer variable` because I assume the asker already knows C. Similarly, for regex questions I assume the asker knows regexes. – melpomene Dec 14 '12 at 18:08
  • I know that's why you asked. My previous comment still stands. – TLP Dec 14 '12 at 18:17
  • @TLP If you already knew, why comment? My previous question still stands. – melpomene Dec 14 '12 at 18:25
  • @melpomene I would have thought that would be obvious. I made the comment because your answer lacked an explanation. It was a suggestion on how you could improve your answer. This is a learning environment; people come here to learn new things, and that is why explanations are important. – TLP Dec 14 '12 at 18:49
  • @TLP I still don't know what exactly I'm supposed to explain. – melpomene Dec 14 '12 at 18:52
  • Oh, I think you do. And if not, it will be easier for you to figure out than it would be for a novice to decipher your regex. – TLP Dec 14 '12 at 19:03
  • @melpomene: kudos for the great tip; however, I'd recommend pointing out that `(?-1)` is the *prestige* part of your magic ;) @amon: thanks for providing the explanation! – creaktive Dec 14 '12 at 21:38