I am using Go.

I want to extract and store the contents of each pair of square brackets. The contents may include nested brackets, which should be ignored as delimiters and kept as part of the content. Whatever lies to the right of the last bracket should be stored as a single match, regardless of how many lines, quotes, or other kinds of characters it contains, e.g.

[TSLA] [Model-ST[3000]123] "(Model:"3000"),
("Tesla":"CyberTruck"),
("Tesla":"Model(y)"),
("Tesla":"Battery-Day

When iterating over the matches, I would like to get:

match1 = TSLA

match2 = Model-ST[3000]123

match3 = "(Model:"3000"),
("Tesla":"CyberTruck"),
("Tesla":"Model(y)"),
("Tesla":"Battery-Day

The regex pattern I currently have is:

(\[(.*?)\])|"([^"]+".*|[(\+)])

This lets me extract TSLA and Model-ST[3000, but it does not ignore the nested bracket, and it also fails to extract the remaining content to the right of the last bracket as a whole.

  • A regular expression cannot count, and therefore can't handle arbitrary nestings (see [here](https://stackoverflow.com/q/546433/1256452) and [here](https://stackoverflow.com/q/7898310/1256452) for more, and note that Go's regexp matcher does not implement recursion or stacks). – torek Sep 22 '20 at 05:53
  • Parse it "by hand". Regexps are the wrong tool for this kind of task. – Volker Sep 22 '20 at 06:09
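
The "by hand" approach suggested in the comments could look roughly like the sketch below. It keeps a depth counter so nested brackets stay inside the enclosing group, and treats the first non-space character found outside any bracket as the start of the trailing content. The helper name `splitBracketed` is hypothetical, and the sketch assumes the input is well formed (every `[` has a matching `]`).

```go
package main

import (
	"fmt"
	"unicode"
)

// splitBracketed (hypothetical helper) returns the contents of each
// top-level [...] group, followed by the remaining text as one element.
// Assumption: brackets are balanced; an unclosed group is dropped.
func splitBracketed(s string) []string {
	var out []string
	depth, start := 0, 0
	for i, r := range s {
		switch {
		case r == '[':
			if depth == 0 {
				start = i + 1 // content begins after the opening bracket
			}
			depth++
		case r == ']' && depth > 0:
			depth--
			if depth == 0 {
				out = append(out, s[start:i]) // closed a top-level group
			}
		case depth == 0 && !unicode.IsSpace(r):
			// First non-space character outside brackets:
			// everything from here to the end is the tail.
			return append(out, s[i:])
		}
	}
	return out
}

func main() {
	input := "[TSLA] [Model-ST[3000]123] \"(Model:\"3000\"),\n(\"Tesla\":\"CyberTruck\")"
	for i, m := range splitBracketed(input) {
		fmt.Printf("match%d = %s\n", i+1, m)
	}
}
```

Unlike a regular expression, the counter handles any nesting depth, which is the limitation the first comment points out.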

1 Answer

If you have at most one level of nesting, you can use a pattern like this:

\[([^[]+]*|.*?\[.*?\].*?)\]|("(?:.|[\n])+)

  • [^[]+]* : handles brackets with no nesting

  • .*?\[.*?\].*? : handles one level of nesting (could be tightened by using [^\]\[] instead of the dots)

You'll get this:

  • Capture group 1 of match 1 will be TSLA
  • Capture group 1 of match 2 will be Model-ST[3000]123
  • Full match 3 will be "(Model:"3000"), ("Tesla":"CyberTruck"), ("Tesla":"Model(y)"), ("Tesla":"Battery-Day

Demo: https://regex101.com/r/Dcwszv/2
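
Since the question is about Go, here is a minimal sketch of how the pattern above might be used with the standard `regexp` package. The pattern is copied unchanged; the helper name `extract` is hypothetical. It picks capture group 2 when the trailing quoted block matched and capture group 1 otherwise.

```go
package main

import (
	"fmt"
	"regexp"
)

// pattern is the expression from the answer, unchanged.
var pattern = regexp.MustCompile(`\[([^[]+]*|.*?\[.*?\].*?)\]|("(?:.|[\n])+)`)

// extract (hypothetical helper) returns one string per match:
// group 1 for bracketed content, group 2 for the trailing quoted block.
func extract(s string) []string {
	var out []string
	for _, m := range pattern.FindAllStringSubmatch(s, -1) {
		if m[2] != "" {
			out = append(out, m[2]) // everything from the first quote onward
		} else {
			out = append(out, m[1]) // contents between the outer brackets
		}
	}
	return out
}

func main() {
	input := "[TSLA] [Model-ST[3000]123] \"(Model:\"3000\"),\n" +
		"(\"Tesla\":\"CyberTruck\"),\n" +
		"(\"Tesla\":\"Model(y)\"),\n" +
		"(\"Tesla\":\"Battery-Day"
	for i, m := range extract(input) {
		fmt.Printf("match%d = %s\n", i+1, m)
	}
}
```

On the sample input from the question this should yield the three values listed above. Note that the trailing block is only recognized if it starts with a double quote, and anything nested deeper than one level will not be captured correctly.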

Vincent