I want to write a regular expression which matches anything between



  • what the ... 1. What's your input (examples)? 2. What do you want to fetch? 3. What regex have you tried that didn't work? – Hannes Oct 27 '10 at 08:11
  • please clarify -- anything between what? where in that string of brackets do you want to find the anything? – Spudley Oct 27 '10 at 08:11
  • I have a condition which will hold true if it matches () or (()) or ((())) or ()()() or (()()()). The brackets can be of any number. If the condition does not match, I want to go to the false part. The input can be any of the examples mentioned above, and some wrong inputs would be (() or (()()( or (())) etc. – gaurav Oct 27 '10 at 08:19
  • no only )( is not a valid string – gaurav Oct 27 '10 at 09:53
  • why is (() invalid? It matches "anything between ()" (in this case, the "anything" is "(") – Bryan Oakley Nov 01 '10 at 15:04
  • An equal number of ( and ) should be present – gaurav Nov 23 '10 at 08:19

4 Answers


All these answers claiming you can't use patterns to match a string with balanced nested parens are quite wrong. It's not practical to pretend that the patterns matched by modern programming languages are restricted to "regular languages" in the pathological textbook sense. As soon as you permit backreferences, they're not. This allows real-world patterns to match much more than the textbook versions, making them far more practical.

The simplest pattern for matching balanced parens is \((?:[^()]*+|(?0))*\). But you should never write that, because it is too compact to be easily read. You should always write it with /x mode to allow for whitespace and comments. So write it like this:

  \(              # literal open paren
     (?:          # begin alternation group
         [^()]*+  #  match nonparens possessively
       |          # or else
         (?0)     #  recursively match entire pattern
     )*           # repeat alternation group
  \)              # literal close paren
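For readers whose engine lacks `(?0)`: Python's built-in `re` module, for one, has no recursion construct. The following is only a rough sketch of one workaround, not the pattern above: it repeatedly deletes innermost `( ... )` groups until nothing changes, then checks that no paren is left over. The names `INNERMOST` and `is_balanced` are made up for illustration.

```python
import re

# One innermost group: an open paren, any run of non-paren characters, a close paren.
INNERMOST = re.compile(r"\([^()]*\)")


def is_balanced(text):
    """Return True if every paren in text is properly nested and closed."""
    previous = None
    while previous != text:
        previous = text
        # Peel off one level of nesting per pass.
        text = INNERMOST.sub("", text)
    # Any paren that survives the peeling has no partner.
    return "(" not in text and ")" not in text


for sample in ["()", "(()()())", "(()", ")("]:
    print(repr(sample), is_balanced(sample))
```

Each pass rescans the string, so deeply nested input costs one pass per nesting level. That is one reason a native recursive pattern is preferable where the engine offers it.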

There's also a lot to be said for naming your abstractions, and decoupling their definition and its ordering from their execution. That leads to this sort of thing:

my $nested_paren_rx = qr{
    (?&nested_parens)

    (?(DEFINE)
        (?<open>       \(       )
        (?<close>       \)      )
        (?<nonparens> [^()]     )

        (?<nested_parens>
            (?&open)
            (?:
                (?&nonparens) *+
              |
                (?&nested_parens)
            ) *
            (?&close)
        )
    )
}x;

The second form is now amenable to inclusion in larger patterns.

Don't ever let anybody tell you that you can't use a pattern to match something that's recursively defined. As I've just demonstrated, you most certainly can.

While you're at it, make sure never to write line-noise patterns. You don't have to, and you shouldn't. No programming language can be maintainable that forbids white space, comments, subroutines, or alphanumeric identifiers. So use all those things in your patterns.

Of course, it does help to pick the right language for this kind of work. ☺

  • @tchrist: It would be good if you specified what languages your example would work in. Of course, it would also be good if the OP specified what languages he is looking to implement this in. – Avi Oct 27 '10 at 16:42
  • @avi: Agreed; hence my last line. As far as I know, recursive patterns work in Perl, PHP, and PCRE. The variable syntax I used was for Perl, but that's somewhat incidental to the problem. I expect that more languages will adopt recursive patterns in the next few years, now that PCRE supports them. Be careful, though, because PCRE has more restrictions on head-vs-tail recursion than Perl does. – tchrist Oct 27 '10 at 17:32
  • @tchrist, Your statement that "backreferences" make regexes non regular is incorrect. Back-references are just a short hand for _lots_ of alternatives - `(.)\1` is just shorthand for `aa|bb|cc|dd|...` you can do the same transformation for all uses of back-references. Indeed `[...]` notation and `?` notation are all just shorthand for alternatives in classical regexes. Recursive regexes on the other hand are a very different kettle of fish, using that feature stops it from being regular... – tobyodavies Jan 29 '11 at 12:57
  • @tobyodavies: Consider the pattern `(.+).*\1`. This requires auxiliary storage beyond what is needed for the automaton’s states, and indeed it requires storage proportionate to the length of the input string being matched against. This clearly violates one of the fundamental properties of a ʀᴇɢᴜʟᴀʀ language, and so cannot be solved by a DFA because no further storage can be required, especially no storage proportionate to the input length. Therefore the language which that pattern describes is by definition not ʀᴇɢᴜʟᴀʀ. – tchrist Jan 29 '11 at 17:08
  • @tchrist, true, it is however a far cry from context free with just back-references... as it's still impossible to match `S ::= '(' S ')'`. I would love to see if anyone has written an analysis of what class of languages normal regexes can parse... (I still don't consider recursive regexes a 'normal' feature yet... I've really not seen them used outside of this post) – tobyodavies Jan 29 '11 at 23:46
  • @tobyodavies: I use recursive regexes pretty frequently. A simple example is `s/\((?:[^()]*+|(?0))*\)//g` to remove nested parens. A more elaborate example is [this pattern to parse a legal RFC 5322 mail address](http://stackoverflow.com/questions/764247/why-are-regular-expressions-so-controversial/4053506#4053506), which allows for nested comments per the spec. There are plenty of grammatical constructs that include recursion, such as the one you yourself cite. One shouldn’t have to write a *de novo* parser for such simplistic tasks as all these. Modern patterns are perfect for it! – tchrist Jan 30 '11 at 02:05
  • @tchrist, i'd rather use a recursive descent parser where possible... They are dead easy to write and usually very easy to understand without having a fantastic comprehension of regex-feature-of-the-week or EBNF. Do recursive regexes actually match all CFLs or LL grammars or some subset? i'd be really interested to know... in fact i'm gonna make a question out of it :D – tobyodavies Jan 30 '11 at 03:23
  • http://stackoverflow.com/questions/4840988/the-recognizing-power-of-modern-regexes - i suspect you will have something to add – tobyodavies Jan 30 '11 at 03:34

In case you are stuck with a language whose regular expression syntax does not support recursive matching, here is my simple JavaScript implementation, from which you should be able to make your own in the language of your choice:

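function testBraces(s) {
    // j counts currently open parens; the loop stops early if j goes negative
    for (var i=0, j=0; i<s.length && j>=0; i++)
        switch(s.charAt(i)) {
            case '(': { j++ ; break; }
            case ')': { j-- ; break; }
        }

    // balanced only if every open paren was closed
    return j == 0;
}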

And here you can play with it: http://jsfiddle.net/BFsn2/
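As an illustration of porting the idea, here is a rough Python sketch of the same counter approach. The name `braces_balanced` is made up for this example. It mirrors the `j >= 0` guard above by returning as soon as the count goes negative.

```python
def braces_balanced(s):
    """Return True if the parens in s are balanced, ignoring all other characters."""
    depth = 0  # number of '(' currently waiting for a ')'
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ')' arrived with nothing open, e.g. the string ")(".
                return False
    # Balanced only if every '(' was eventually closed.
    return depth == 0


for sample in ["()", "(())", "()()()", "(()", "(()))", ")("]:
    print(repr(sample), braces_balanced(sample))
```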

Marko Dumic
  • True, `j` should never go below zero as it indicates imbalance. – Marko Dumic Oct 27 '10 at 09:48
  • I only just noticed the `&& j>=0` bit in the end condition of the for loop (was it there all the time or did you edit it in the five-minute window?). Perfect. – Tim Pietzcker Oct 27 '10 at 14:34
  • @Tim: It was there from the start, as is in the demo (on jsFiddle). – Marko Dumic Oct 27 '10 at 14:49
  • This answer is also wrong: patterns in modern programming languages are perfectly up to the job. – tchrist Oct 27 '10 at 17:34
  • I didn't know that some regular expression flavors do recursive pattern matching, because in the languages I use, they don't. And most of the time (read: on my employer's projects) I do not get to choose the languages I work with. I fixed my answer though. – Marko Dumic Nov 01 '10 at 14:53

Such nested structures cannot be effectively handled by regular expressions. What you need is a grammar and a parser for that grammar. In your case the grammar is simple enough. If you are using Python, try pyparsing or funcparserlib.

With pyparsing you can do the following:

from pyparsing import nestedExpr
nestedExpr().parseString( "(some (string you) (want) (to) test)" ).asList()

This will return a list containing the parsed components of the nested string. The default delimiters for nestedExpr are parentheses, so you do not have to do anything extra. If you want to use funcparserlib, you can try the following:

from funcparserlib.parser import forward_decl, many, a
bracketed = forward_decl()
bracketed.define(a('(') + many(bracketed) + a(')'))

After this you can call

bracketed.parse( "(()(()))" )

and it will return the parsed elements in a tuple. As defined, this grammar accepts only parenthesis characters, so the input must not contain other text.
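If adding a library is not an option, the grammar is small enough to parse by hand. The following is a minimal recursive-descent sketch using only the standard library. The function name `parse_nested` and its error messages are invented for illustration, and it makes no attempt to reproduce the exact output of either library.

```python
def parse_nested(text):
    """Parse text into nested lists of words; raise ValueError if parens are unbalanced."""
    pos = 0  # index of the next unread character

    def parse_items(depth):
        nonlocal pos
        items = []
        while pos < len(text):
            ch = text[pos]
            if ch == "(":
                pos += 1
                # Recurse for the contents of this group.
                items.append(parse_items(depth + 1))
            elif ch == ")":
                if depth == 0:
                    raise ValueError(f"unexpected ')' at index {pos}")
                pos += 1
                return items
            elif ch.isspace():
                pos += 1
            else:
                # Read one word: a run of characters up to a paren or whitespace.
                start = pos
                while (pos < len(text) and text[pos] not in "()"
                       and not text[pos].isspace()):
                    pos += 1
                items.append(text[start:pos])
        if depth > 0:
            raise ValueError("missing ')' before end of input")
        return items

    return parse_items(0)


print(parse_nested("(some (string you) (want) (to) test)"))
```

Because it recurses once per nesting level, extremely deep input could hit Python's recursion limit. An explicit stack avoids that at the cost of slightly longer code.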

  • This is another wrong answer. Just because Python is incapable of doing the sophisticated pattern matching necessary does not mean all other languages are similarly hampered. Perl, PHP, and PCRE can all handle this perfectly well. See my answer. – tchrist Oct 27 '10 at 17:36
  • When you add backreferences, the possessive "+", or the (?number) syntax, it is not a regular language/expression anymore. It can have similar syntax, but that doesn't mean it's the same thing. Perl 5.10 and above and, if I remember correctly, languages in the .NET framework offer such extensions. I think there is a bug in your pattern, though I might be wrong; commenting separately there. Oh, I cannot comment there. Anyway, I think the (?0) you have should be (?1), because you should recurse once you match at least one matched parenthesis. – srean Oct 27 '10 at 18:52
  • No, recursing on (?0) for the whole parenthesized is correct. There is no group 1 to recurse on as written. I *did* test all these. Honest! I prefer the final version with named groups used as a sort of regex subroutines. It's infinitely easier to read, debug, extend, and maintain. Hope this helps. – tchrist Oct 28 '10 at 00:04
  • You are probably right about (?0). But if you prefer readability, maintainability, clarity etc nestedExpr().parseString( ) seems shorter and easier :), the right language or not. – srean Oct 28 '10 at 10:35

I wish you good luck. You'd need a finite-state automaton with a stack (a pushdown automaton) to parse something like this. It can't be parsed using only regex, since it's not powerful enough.
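To make the "automaton with a stack" idea concrete, here is a small Python sketch. The name `first_unmatched` is made up for this example. It keeps a stack of open-paren positions, so that on failure it can report where the imbalance is instead of only answering yes or no.

```python
def first_unmatched(text):
    """Return the index of the first unmatched paren, or None if text is balanced."""
    stack = []  # indices of '(' characters still waiting for a ')'
    for index, ch in enumerate(text):
        if ch == "(":
            stack.append(index)
        elif ch == ")":
            if not stack:
                # A ')' with nothing left to close.
                return index
            stack.pop()
    # Anything left on the stack was never closed; the bottom entry is the earliest.
    return stack[0] if stack else None


for sample in ["(()()())", "(()", "(()))", ")("]:
    print(repr(sample), first_unmatched(sample))
```

With a single kind of bracket, a plain counter would be enough to decide balance. The stack earns its keep when positions need reporting, as here.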

  • You get a downvote for parroting the broken refrain about patterns being fundamentally incapable of handling grammars with recursive definitions. Perl, PCRE, and PHP can all handle them just fine. Schools stuck on teaching formal automata **theory** err by not explaining that **practical** considerations have led many tools to extend regexes with backreferences and even recursion. – tchrist Nov 07 '10 at 12:39