This can be done in PHP, which uses the PCRE engine.
The regex below is just an example.
You could comment out the errors check, then insert boundary constructs
around the regex to make it definitively pass or fail.
The biggest problem is not the recursion but defining the content boundary
conditions. I've pretty much boiled those down for you. These checks have to
be maintained however you do it (state, stacks, ...); it's all the same.
(This regex was constructed and tested using RegexFormat 6.)
Sample input:
((( (1 and 2 and 3) or (9) or ( ( 4 and 5)) and 5 ) and 7) )
Tested output:
** Grp 0 - ( pos 0 , len 64 )
((( (1 and 2 and 3) or (9) or ( ( 4 and 5)) and 5 ) and 7) )
** Grp 1 - ( pos 1 , len 62 )
(( (1 and 2 and 3) or (9) or ( ( 4 and 5)) and 5 ) and 7)
** Grp 2 - NULL
** Grp 3 - NULL
** Grp 4 - NULL
Regex:
5/29 All Forms:
Empty form  ( )                            not allowed
Empty form  ) (                            not allowed
Form  ) and (                              ok
Form  ) and 2 and (                        ok
Form  ( 1 and 2 )                          ok
Form  ( 1 )                                ok
Form  ) and 2 )                            ok
Form  ( 1 and (                            ok
Form  ( whitespace (  or  ) whitespace )   ok
# (?s)(?:\(((?!\s*\))(?&core))\)|\s*([()]))(?(DEFINE)(?<core>(?>(?&content)|\((?:(?!\s*\))(?&core))\)(?!\s*\())+)(?<content>(?>(?<=\))\s*(?:and|or)\s*(?=\()|(?<=\))\s*(?:(?:and|or)\s+\d+)+\s*(?:and|or)\s*(?=\()|(?<=\()\s*\d+(?:(?:\s+(?:and|or)\s+)?\d+)*\s*(?=\))|(?<=\))\s*(?:(?:and|or)\s+\d+)+\s*(?=\))|(?<=\()\s*(?:\d+\s+(?:and|or))+\s*(?=\()|\s+)))
# //////////////////////////////////////////////////////
# // The General Guide to 3-Part Recursive Parsing
# // ----------------------------------------------
# // Part 1. CONTENT
# // Part 2. CORE
# // Part 3. ERRORS
(?s) # Dot-All modifier (used in a previous incarnation)
(?:
# ( # (1), Take off CONTENT (not used here)
# (?&content)
# )
# | # OR
\( # Open paren
( # (1), parens CORE
(?! \s* \) ) # Empty form '( )' not allowed
(?&core)
)
\) # Close paren
| # OR
\s*
( # (2), Unbalanced (delimiter) ERRORS
# - Generally, on a whole parse, these
# are delimiter or content errors
[()]
)
)
# ///////////////////////
# // Subroutines
# // ---------------
(?(DEFINE)
# core
(?<core> # (3)
(?>
(?&content)
|
\( # Open paren
(?:
(?! \s* \) ) # Empty form '( )' not allowed
(?&core)
)
\) # Close paren
(?! \s* \( ) # Empty form ') (' not allowed
)+
)
# content
(?<content> # (4)
(?>
(?<= \) ) # Form ') and ('
\s*
(?: and | or )
\s*
(?= \( )
|
(?<= \) ) # Form ') and 2 and ('
\s*
(?:
(?: and | or )
\s+
\d+
)+
\s*
(?: and | or )
\s*
(?= \( )
|
(?<= \( ) # Form '( 1 and 2 )'
\s*
\d+
(?:
(?:
\s+
(?: and | or )
\s+
)?
\d+
)*
\s*
(?= \) )
|
(?<= \) ) # Form ') and 2 )'
\s*
(?:
(?: and | or )
\s+
\d+
)+
\s*
(?= \) )
|
(?<= \( ) # Form '( 1 and ('
\s*
(?:
\d+
\s+
(?: and | or )
)+
\s*
(?= \( )
|
\s+ # Interstitial whitespace
# '( here (' or ') here )'
)
)
)
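
Since the same boundary checks have to be kept whichever way you do it, here is a minimal Python sketch of the state-based alternative. It is not a translation of the regex above: it splits the input into tokens and checks which kind of token may follow which, which is one way to encode the allowed forms listed earlier. The names TOKEN, ALLOWED_BEFORE, kind_of and check are made up for illustration. It assumes the whole string must be one parenthesised group, and it may be slightly stricter about whitespace than the regex (every number and every and/or has to be a separate word).

```python
import re

# A token is a single paren or a run of non-space, non-paren characters.
TOKEN = re.compile(r"[()]|[^\s()]+")

# For each token kind, the kinds that may come directly before it.
# Anything missing here is a rejected form, e.g. '( )' or ') ('.
ALLOWED_BEFORE = {
    "(": {"start", "(", "op"},
    ")": {"num", ")"},
    "num": {"(", "op"},
    "op": {"num", ")"},
}


def kind_of(token):
    """Classify a token as '(', ')', 'op', 'num', or None if unknown."""
    if token in ("(", ")"):
        return token
    if token in ("and", "or"):
        return "op"
    if token.isascii() and token.isdigit():
        return "num"
    return None


def check(text):
    """Return (ok, message) for a parenthesised and/or expression."""
    depth = 0
    prev = "start"
    for token in TOKEN.findall(text):
        kind = kind_of(token)
        if kind is None:
            return False, f"unexpected token {token!r}"
        # Once the outermost group has closed, nothing else may follow.
        if prev != "start" and depth == 0:
            return False, "content after outermost group"
        if prev not in ALLOWED_BEFORE[kind]:
            return False, f"{kind!r} may not follow {prev!r}"
        if kind == "(":
            depth += 1
        elif kind == ")":
            depth -= 1
        prev = kind
    if prev == "start":
        return False, "empty input"
    if depth != 0:
        return False, "unclosed '('"
    return True, "ok"


sample = "((( (1 and 2 and 3) or (9) or ( ( 4 and 5)) and 5 ) and 7) )"
print(check(sample))  # (True, 'ok')
print(check("( )"))   # rejected: ')' directly after '('
```

The adjacency table does the work that the lookbehind/lookahead pairs do in the content subroutine, and the depth counter stands in for the recursion.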