
I need a regexp that returns all the parts of a string that contains parentheses.

An example would be:

if ((a and b) or (a and)) or (c and d) or (e and f)

would return

['if', '((a and b) or (a and))', 'or', '(c and d)', 'or', '(e and f)']


Can anybody point me in the right direction? Unfortunately, I'm not very familiar with re.

The biggest problem is parentheses nested inside parentheses.

Many thanks.

user2194805

3 Answers


Matching arbitrarily deep nested parentheses is not doable with regular expressions (at least not with Python's re module).

You could do it if you had a fixed pattern, like parentheses at most three levels deep plus a second set of siblings at the top level, and so on. But matching arbitrary closing parentheses with their opening ones is not easily feasible with regexes alone (if there is a practical way of doing it with regexes at all).
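The fixed-depth case can be sketched like this, assuming the maximum nesting depth is known in advance. The helper names nested_group_pattern and tokenize_fixed_depth are hypothetical. The pattern is built by wrapping a one-level group pattern in one more layer of parentheses per extra level, so its size grows with the depth.

```python
import re

def nested_group_pattern(max_depth):
    """Hypothetical helper: regex source for one parenthesized group
    nested at most max_depth levels deep (max_depth >= 1)."""
    # Depth 1: a group containing no parentheses at all.
    pattern = r"\([^()]*\)"
    for _ in range(max_depth - 1):
        # Wrap one more level: '(' then any mix of single non-parenthesis
        # characters or shallower groups, then ')'. The two alternatives
        # start with different characters, so each character can be
        # consumed in only one way.
        pattern = r"\((?:[^()]|" + pattern + r")*\)"
    return pattern

def tokenize_fixed_depth(text, max_depth):
    # Try a whole group first, otherwise fall back to a bare word.
    return re.findall(nested_group_pattern(max_depth) + r"|\w+", text)

text = "if ((a and b) or (a and)) or (c and d) or (e and f)"
print(tokenize_fixed_depth(text, 2))
# ['if', '((a and b) or (a and))', 'or', '(c and d)', 'or', '(e and f)']
print(tokenize_fixed_depth(text, 1))  # depth too small
# ['if', '(a and b)', 'or', '(a and)', 'or', '(c and d)', 'or', '(e and f)']
```

With max_depth=1 the outer group of the example is broken apart, which is exactly the "parenthesis inside a parenthesis" problem from the question; any input nested deeper than max_depth fails the same way.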

It is much easier to write a couple of lines of Python and let Python itself match the outer parenthesis groups, since you can just count the open parentheses as you scan the string. So, something along these lines (it could be done in fewer lines):

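def extract_parentheses_groups(text):
    count = 0      # current nesting depth
    groups = []
    buffer = ""    # characters of the piece being collected
    for char in text:
        if char == "(":
            # A top-level group starts: flush any plain text collected so far.
            if count == 0 and buffer.strip():
                groups.append(buffer.strip())
                buffer = ""
            count += 1
        buffer += char
        if char == ")":
            count -= 1
            # The top-level group just closed: emit it as one piece.
            if count == 0:
                groups.append(buffer.strip())
                buffer = ""
    # Plain text left over after the last group.
    if buffer.strip():
        groups.append(buffer.strip())
    return groups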

Running your example input through this, I get:

In [17]: a = """if ((a and b) or (a and)) or (c and d) or (e and f)"""

In [18]: extract_parentheses_groups(a)
Out[18]: ['if', '((a and b) or (a and))', 'or', '(c and d)', 'or', '(e and f)']
jsbueno

Well, as mentioned in "Regular expression to match balanced parentheses", matching nested parentheses is not a task for regex. But here is some Python code that gets the result without regex:

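w = 'if ((a and b) or (a and)) or (c and d) or (e and f)'

result = []
curr = ''
depth = 0  # nesting depth (not named open, to avoid shadowing the built-in)

# Assuming that we don't have broken parentheses, i.e. all '(' are closed with ')'
for c in w:
    curr += c

    if c in '()':
        depth += 1 if c == '(' else -1
        # A top-level group just closed, or a new top-level group just opened
        if not depth or (c == '(' and depth == 1):
            # On an opening paren, flush the text before it (without the '(')
            piece = curr[:-1].strip() if depth else curr.strip()
            if piece:
                result.append(piece)
            # Always restart: with the '(' of the new group, or empty after a close
            curr = '(' if depth else ''

curr = curr.strip()
if curr:
    result.append(curr)

print(result)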

Output:

['if', '((a and b) or (a and))', 'or', '(c and d)', 'or', '(e and f)']
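Both loops rely on the assumption spelled out in the comment: every '(' is closed by a ')'. On unbalanced input they quietly return a split instead of reporting the problem. A minimal sketch of the same depth-counting idea that raises ValueError instead; split_top_level_groups is a hypothetical name, and raising is just one possible policy.

```python
def split_top_level_groups(text):
    """Split text into top-level parenthesized groups and the text between them.

    Hypothetical variant of the depth-counting loops: raises ValueError on
    unbalanced parentheses instead of returning a split anyway.
    """
    parts = []
    depth = 0
    start = 0  # index where the current piece begins
    for i, c in enumerate(text):
        if c == "(":
            if depth == 0:
                # A top-level group opens: flush the plain text before it.
                piece = text[start:i].strip()
                if piece:
                    parts.append(piece)
                start = i
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unmatched ')' at index {i}")
            if depth == 0:
                # The top-level group closes: keep it as one piece.
                parts.append(text[start:i + 1])
                start = i + 1
    if depth:
        raise ValueError(f"{depth} unclosed '(' in input")
    tail = text[start:].strip()
    if tail:
        parts.append(tail)
    return parts

print(split_top_level_groups("if ((a and b) or (a and)) or (c and d) or (e and f)"))
# ['if', '((a and b) or (a and))', 'or', '(c and d)', 'or', '(e and f)']
```

Working with slice indices instead of a growing string buffer is only a stylistic choice; the depth counting is the same as in both answers.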
andnik
  • yep, essentially the same code as I wrote, apart from the ordering. I guess this is the "one obvious way to do it" in Python – jsbueno Mar 27 '19 at 18:37
  • @jsbueno, yep, you're right :) It's always fun to practice simple algorithms, unlike this one: https://stackoverflow.com/questions/55379954/shortest-subarray-containing-all-elements-without-using-arrays – andnik Mar 27 '19 at 18:44

You could use something like:

(\((?>[^()]+|(?1))*\))|(\w+)

See demo and explanation here.
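The pattern above depends on (?1), which recurses into capture group 1; that recursion is what lets engines such as PCRE match arbitrarily deep nesting. Python's built-in re module has no recursion syntax, which is why a separate Python version is needed. (The third-party regex package does support recursive patterns, but it is not part of the standard library.) A quick check, using a hypothetical helper compiles_with_re:

```python
import re

# The recursive pattern above, unchanged: (?1) means "match group 1 again here".
RECURSIVE_PATTERN = r"(\((?>[^()]+|(?1))*\))|(\w+)"
# The "EDIT FOR PYTHON" pattern from this answer, for comparison.
TWO_LEVEL_PATTERN = r"(\((?:[^()]*|\([^()]*\))*\))|(\w+)"

def compiles_with_re(pattern):
    """Return True if Python's built-in re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

print(compiles_with_re(RECURSIVE_PATTERN))  # False: re has no (?1) recursion
print(compiles_with_re(TWO_LEVEL_PATTERN))  # True
```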

EDIT FOR PYTHON

For Python's re module, you can use this instead:

(\((?:[^()]*|\([^()]*\))*\))|(\w+)

See demo and explanation here.

Note: as @jsbueno pointed out, this only works for up to two levels of nested parentheses.
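Because the Python pattern has two capturing groups, re.findall returns a (group, word) tuple per match rather than plain strings. A minimal sketch of turning those tuples into the list from the question, and of what the two-level limit means in practice; tokenize_two_levels is a hypothetical name.

```python
import re

# The "EDIT FOR PYTHON" pattern: group 1 is a parenthesized group nested
# at most two levels deep, group 2 is a bare word.
TWO_LEVEL = re.compile(r"(\((?:[^()]*|\([^()]*\))*\))|(\w+)")

def tokenize_two_levels(text):
    # findall returns one (group1, group2) tuple per match; exactly one of
    # the two is non-empty, so keep whichever matched.
    return [group or word for group, word in TWO_LEVEL.findall(text)]

print(tokenize_two_levels("if ((a and b) or (a and)) or (c and d) or (e and f)"))
# ['if', '((a and b) or (a and))', 'or', '(c and d)', 'or', '(e and f)']
print(tokenize_two_levels("x or (((a)))"))  # three levels deep
# ['x', 'or', '((a))']
```

With three levels, the outermost pair of parentheses is silently dropped rather than reported, so this is only safe when the input is known to nest at most two levels deep.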

ALFA