1

I have strings with multiple parenthesized groups and want to remove every group that contains at least one digit.

I have tried the following. However, since it is greedy, it removes everything from the first open parenthesis to the last close parenthesis. I also tried to defeat the greediness by excluding an open parenthesis, but that did not work either.

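import pandas as pd

names = ['d((123))', 'd(1a)(ab)', 'd(1a)(ab)(123)']
data = pd.DataFrame(names, columns=['name'])

# Attempt 1: lazy quantifiers around a run of digits
# (regex=True is needed on pandas >= 2.0, where the default is False)
print(data.name.str.replace(r"\(.*?\d+.*?\)", "", regex=True))
# Output: ['d)', 'd(ab)', 'd']

# Attempt 2: negative lookaheads to exclude an open parenthesis
print(data.name.str.replace(r"\((?!\().*[\d]+(?!\().*\)", "", regex=True))
# Output: ['d(', 'd', 'd']

# desired output: ['d', 'd(ab)', 'd(ab)']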
Hao Wu
cccfran
  • Your problem is about balancing parentheses, so this is probably relevant https://stackoverflow.com/questions/546433/regular-expression-to-match-balanced-parentheses - also, it's not clear what you want in some cases. For example, do you want to go from `'d(ab(12))'` to `'d(ab)'` or to `'d'`? – Grismar Mar 04 '21 at 00:48
  • Playing around with it, I found the issue was with `.` matching parentheses as well as alphanumeric characters. I tried `data.name.str.replace(r'\(+\w*\d+\w*?\)+', "")` and got the desired output. – jrhode2 Mar 04 '21 at 00:54
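
The point made in the comment above can be checked outside pandas with plain `re.sub`. This is only a sketch: the variable names `dot_pattern` and `word_pattern` are made up for illustration, and the second pattern is the one suggested in the comment, not a general solution.

```python
import re

names = ['d((123))', 'd(1a)(ab)', 'd(1a)(ab)(123)']

# First pattern from the question: '.' can also consume '(' and ')',
# so a match may run across group boundaries.
dot_pattern = r'\(.*?\d+.*?\)'

# Pattern from the comment: \w cannot cross a parenthesis.
word_pattern = r'\(+\w*\d+\w*?\)+'

dot_result = [re.sub(dot_pattern, '', x) for x in names]
word_result = [re.sub(word_pattern, '', x) for x in names]

print(dot_result)   # ['d)', 'd(ab)', 'd']
print(word_result)  # ['d', 'd(ab)', 'd(ab)']
```

Note that `\w` also excludes spaces and punctuation, so a group such as `(a 1)` would be left in place by the second pattern.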

2 Answers

3

This regex seems valid: \([^)\d]*?\d+[^)]*?\)+

>>> import re
>>> pattern = r'\([^)\d]*?\d+[^)]*?\)+'
>>> names = ['d((123))', 'd(1a)(ab)', 'd(1a)(ab)(123)']
>>> [re.sub(pattern, '', x) for x in names]
['d', 'd(ab)', 'd(ab)']

I don't know if there are more complex cases but for those that you've supplied and similar, it should do the trick.
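
As a quick probe of the "more complex cases" caveat, the sketch below runs the same pattern on two nested inputs. The first input comes from the comment under the question. The second is a hypothetical one made up for illustration. Which output is "right" depends on what the asker wants for nested groups, which the question leaves open.

```python
import re

pattern = r'\([^)\d]*?\d+[^)]*?\)+'

# Nested input from the comment under the question.
nested_digit_inside = re.sub(pattern, '', 'd(ab(12))')

# Hypothetical input: a digit group followed by a sibling inside one outer group.
sibling_groups = re.sub(pattern, '', 'd((12)(ab))')

print(nested_digit_inside)  # d
print(sibling_groups)       # d(ab))
```

The second result has an unmatched close parenthesis, so for inputs shaped like this a parenthesis-aware approach is probably needed.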

gribvirus74
-1

Python's built-in re module does not support recursive patterns, but the third-party regex module does. You can install it with:

pip install regex

Then you can say something like:

import regex

names = ['d((123))', 'd(1a)(ab)', 'd(1a)(ab)(123)']
# (?R) recurses into the whole pattern, so a nested group is matched as one unit
pattern = r'\((?:[^()]*?\d[^()]*?|(?R))+\)'
print([regex.sub(pattern, '', x) for x in names])

Output:

['d', 'd(ab)', 'd(ab)']
tshiono
  • If all you're looking for is recursion, there's no need to install a whole package for it. – Aaron Mar 04 '21 at 02:19
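
One way to read the comment above is that the nesting can be handled without a third-party package. The following is a minimal sketch of that idea using an explicit stack instead of a regex. The function name `remove_digit_groups` is hypothetical. It also assumes that a group is dropped if it contains a digit at any depth, so `'d(ab(12))'` becomes `'d'`; the question does not say which behaviour is wanted in that case.

```python
def remove_digit_groups(text):
    """Drop each balanced (...) group that contains a digit at any depth."""
    # stack[0] collects top-level output; every deeper entry is an open group.
    # Each entry is [pieces, has_digit].
    stack = [[[], False]]
    for ch in text:
        if ch == '(':
            stack.append([['('], False])
        elif ch == ')' and len(stack) > 1:
            pieces, has_digit = stack.pop()
            if has_digit:
                # Discard the group and mark the enclosing one as tainted
                # (assumption: a digit at any depth removes the outer group).
                stack[-1][1] = True
            else:
                stack[-1][0].append(''.join(pieces) + ')')
        else:
            # Ordinary character, or a stray ')' at top level.
            stack[-1][0].append(ch)
            if ch.isdigit():
                stack[-1][1] = True
    # Unclosed groups are kept verbatim.
    return ''.join(''.join(pieces) for pieces, _ in stack)


names = ['d((123))', 'd(1a)(ab)', 'd(1a)(ab)(123)']
results = [remove_digit_groups(x) for x in names]
print(results)  # ['d', 'd(ab)', 'd(ab)']
```

The pass is linear in the length of the string and leaves unbalanced parentheses untouched rather than raising an error.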