
Assume the following text:

text = """{ |p{3cm}|p{3cm}|p{3cm}| } \hline \multi{3}{|c|}{City List} \ \hline Name ... """

I would like to extract only the content of the first pair of curly brackets. So the desired output would be:

desired_output = "p{3cm}|p{3cm}|p{3cm}"

Currently I get the content of every pair of curly brackets in the text:



import re

text = """{ |p{3cm}|p{3cm}|p{3cm}|  } \\hline \\multi{3}{|c|}{City List} \\ \\hline Name ... """

# The lazy pattern stops at the first closing brace, so nested braces split the match.
false_output = re.findall(r'\{(.*?)\}', text)
false_output
# [' |p{3cm', '3cm', '3cm', '3', '|c|', 'City List']

# Also no success with:
re.findall(r'({\w+\})', text)

NDel
  • If the braces inside the first brace pair can be nested arbitrarily deep (can they?), this can't be solved with a single regular expression. – Michael Butscher Nov 01 '19 at 14:02
  • Take a look at this question: https://stackoverflow.com/questions/546433/regular-expression-to-match-balanced-parentheses . It is not specific to Python, but one of the answers there suggests a Python option: use the regex package. – MrPisarik Nov 18 '19 at 15:48
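
As a side note to the first comment: if the nesting is known to be at most one level deep, as in the sample text, the built-in re module is enough. The following is only a sketch under that assumption; the names ONE_LEVEL and first_braces_one_level are made up for illustration.

```python
import re

# ASSUMPTION: braces inside the first pair are nested at most one level deep.
# The pattern accepts either a non-brace character or one flat {...} group.
ONE_LEVEL = re.compile(r'\{((?:[^{}]|\{[^{}]*\})*)\}')


def first_braces_one_level(text):
    """Return the content of the first brace pair, or None if nothing matches."""
    match = ONE_LEVEL.search(text)
    return match.group(1) if match else None


sample = r"{ |p{3cm}|p{3cm}|p{3cm}|  } \hline \multi{3}{|c|}{City List} \\ \hline Name ... "
content = first_braces_one_level(sample)
print(repr(content))        # ' |p{3cm}|p{3cm}|p{3cm}|  '
print(content.strip(' |'))  # p{3cm}|p{3cm}|p{3cm}
```

If the first pair contains deeper nesting, this pattern cannot match it as a whole and the search moves on to an inner group instead, so it should not be used when the depth is unknown.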

1 Answer


I don't think this can be done with a regular expression. The last time I had to tackle something like this (parsing wikitext), I ended up tracking the nesting depth with a counter, used like a simple stack: increment it at every opening character, decrement it at every closing one, and stop once the outermost pair has been closed.

Please note that this only handles the first top-level bracket pair; it won't work for repeated top-level brackets.

The code was more optimized than this, but the basic idea is as follows:

def keep_between(text, start, end):
    # Nesting depth: 0 means we are outside the first bracket pair.
    counter = 0
    result = []
    beginning = text.find(start)
    if beginning != -1:
        remaining_text = text[beginning:]
        for c in remaining_text:
            if c == start:
                counter += 1
                continue  # delimiter characters are never collected
            if c == end:
                counter -= 1
                continue
            if not counter:
                break  # first character after the outer pair has closed
            result.append(c)
    return ''.join(result)

print(keep_between(text, '{', '}'))

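That gets me ' |p3cm|p3cm|p3cm| '. Note that the nested braces themselves are dropped, because delimiter characters are never appended to the result.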
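
If the nested braces need to be kept, as in the desired output from the question, the same depth-counting idea can return a slice of the original text instead of collecting characters one by one. This is only a sketch of one possible variant, not the optimized code mentioned above; the function name first_group is hypothetical.

```python
def first_group(text, start='{', end='}'):
    """Return the content of the first balanced start/end pair.

    Nested delimiters are kept in the result. Returns None if there is
    no opening delimiter or the first pair is never closed.
    """
    begin = text.find(start)
    if begin == -1:
        return None
    depth = 0
    for i in range(begin, len(text)):
        c = text[i]
        if c == start:
            depth += 1
        elif c == end:
            depth -= 1
            if depth == 0:
                # Slice between the outer delimiters, excluding both of them.
                return text[begin + 1:i]
    return None


sample = r"{ |p{3cm}|p{3cm}|p{3cm}|  } \hline \multi{3}{|c|}{City List} \\ \hline Name ... "
print(repr(first_group(sample)))      # ' |p{3cm}|p{3cm}|p{3cm}|  '
print(first_group(sample).strip(' |'))  # p{3cm}|p{3cm}|p{3cm}
```

Unlike the one-level regular expression approach, this works for any nesting depth, and the final strip(' |') is only needed to match the exact desired output without the surrounding spaces and pipes.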

Alex