Suppose I have a string s = "SU 3180 and (CMG 3200 or SU 3210)". I need to turn this string into a tree diagram like this:

               X
              / \
       SU 3180   ()
                / - \
        CMG 3200     SU 3210               

The main goal is to show the difference between an and split and an or split, as shown in the diagram. For example, I have marked the or split with a hyphen between its branches. I have no idea how to proceed with this. Any ideas are welcome!

  • You have a number of problems to solve: you need to deal with parentheses, order of precedence, recursively or iteratively parsing long strings and finally printing a representation of it that's properly laid out - that's a bit much to ask for in a single question without an attempt at solving it yourself. Please provide code you tried yourself and ask from there. – Grismar Sep 18 '19 at 01:15
  • You need to write a parser for this, and that is probably the point of your assignment. Regex is the wrong tool for the job. – Tim Biegeleisen Sep 18 '19 at 01:17
  • I agree with Grismar, we need to see what you have already, as this is a pretty complicated question and it's easier to help that way. That being said, you could look into a grammar parser like [Lark](https://github.com/lark-parser/lark). – Rob Rose Sep 18 '19 at 01:18
  • @TimBiegeleisen Most parsers use regex for tokenizing though. – gilch Sep 18 '19 at 01:25

1 Answer

I'm not sure how to process that string algorithmically in the general case, but for just this one case you can start with this simple expression,

([A-Z]+\s+\d+)\s+and\s+\(([A-Z]+\s+\d+)\s+or\s+([A-Z]+\s+\d+)\)

and replace it with something similar to:

           X\n               /\\\n        \1  ()\n               /  -  \\\n       \2       \3

Test

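import re

# Matches exactly one shape: "<course> and (<course> or <course>)".
regex = r"([A-Z]+\s+\d+)\s+and\s+\(([A-Z]+\s+\d+)\s+or\s+([A-Z]+\s+\d+)\)"

string = "SU 3180 and (CMG 3200 or SU 3210)"

# Replacement template as a raw string: \n is a newline, \\ is a literal
# backslash, and \1, \2, \3 are the three captured courses.
subst = r"               X\n               /\\\n        \1  ()\n               /  -  \\\n       \2       \3"

print(re.sub(regex, subst, string))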

Output

           X
           /\
    SU 3180  ()
           /  -  \
   CMG 3200       SU 3210

If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also watch in this link how it matches against some sample inputs.


A somewhat more complex starting point is an expression that uses (?R) to check for balanced brackets. Note that (?R) is a PCRE recursion feature that Python's built-in re module does not support, although the third-party regex module does. With it you could capture what comes before and after the brackets level by level (by depth), loop over the captures, and design methods that print the tree level by level, which is pretty complicated:

(?>([^(]*?)\s*([(]([^()]*|(?R))*[)])([^)]*?)\s*)  

or

([^(]*?)\s*([(]([^()]*|(?R))*[)])([^)]*?)\s*

Demo
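Because Python's built-in re module cannot run the recursive (?R) expressions above, the same "before, inside, after the brackets" split can be sketched with a plain depth counter instead. This is only a minimal sketch: split_top_level is a hypothetical helper name, and it handles just the first top-level group.

```python
def split_top_level(text):
    """Split text around its first top-level (...) group.

    Returns (before, inside, after), or None when there is no group.
    Raises ValueError when the parentheses are unbalanced.
    NOTE: hypothetical helper, not part of the original answer.
    """
    depth = 0
    start = end = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0 and start is None:
                start = i  # opening bracket of the first top-level group
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unbalanced ')' at position {i}")
            if depth == 0 and end is None:
                end = i  # matching close of the first top-level group
    if depth != 0:
        raise ValueError("unbalanced '('")
    if start is None:
        return None
    return text[:start].strip(), text[start + 1:end], text[end + 1:].strip()


print(split_top_level("SU 3180 and (CMG 3200 or SU 3210)"))
# ('SU 3180 and', 'CMG 3200 or SU 3210', '')
```

Calling the function again on the returned inside part goes one level deeper, which is the level-by-level loop described above.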

Reference

Regular expression to match balanced parentheses
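For inputs beyond the single case, the comments suggest writing a parser, with a regex used only for tokenizing. A minimal sketch of that idea follows. It rests on assumptions that are not in the question: and binds tighter than or, course codes look like uppercase letters followed by digits, and the tree is printed as an indented outline rather than the slash diagram. The names tokenize, parse, and render are hypothetical.

```python
import re

# One token per match: a course code, an operator, or a parenthesis.
# Assumption: courses look like "SU 3180" (uppercase letters, then digits).
TOKEN = re.compile(r"\s*(?:(?P<course>[A-Z]+\s+\d+)|(?P<op>and|or)\b|(?P<paren>[()]))")


def tokenize(text):
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected text at position {pos}: {text[pos:]!r}")
        tokens.append(m.group(m.lastgroup))
        pos = m.end()
    return tokens


def parse(tokens):
    """Return a course string or a nested (op, left, right) tuple.

    Assumption: 'and' binds tighter than 'or'.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():  # factor := COURSE | "(" expr ")"
        tok = take()
        if tok == "(":
            node = expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok in (None, ")", "and", "or"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok

    def term():  # term := factor ("and" factor)*
        node = factor()
        while peek() == "and":
            take()
            node = ("and", node, factor())
        return node

    def expr():  # expr := term ("or" term)*
        node = term()
        while peek() == "or":
            take()
            node = ("or", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree


def render(node, indent=0):
    """Return the tree as a list of indented lines, operator first."""
    pad = "    " * indent
    if isinstance(node, str):
        return [pad + node]
    op, left, right = node
    return [pad + op] + render(left, indent + 1) + render(right, indent + 1)


tree = parse(tokenize("SU 3180 and (CMG 3200 or SU 3210)"))
print("\n".join(render(tree)))
```

For the sample string this builds the tuple ("and", "SU 3180", ("or", "CMG 3200", "SU 3210")), so the and / or distinction is carried by the node label rather than by a hyphen. Laying the same tree out as a centered slash diagram is a separate formatting step that this sketch does not attempt.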

Emma