
I would like to extract aaa, bb b={{b}}bb bbb and {ccc} ccc from the following string using a regular expression:

zyx={aaa}, yzx={bb b={{b}}bb bbb}, xyz={{ccc} ccc}

Note: aaa represents an arbitrary sequence of characters of any length, with no fixed pattern. For instance, {ccc} ccc could be {cccccccccc}cc {cc} cccc cccc, or any other combination.

I have written the following regular expression:

(?<a>[^{}]*)\s*=\s*{((?<v>[^{}]+)*)},*

This expression extracts aaa, but fails on the rest of the input with catastrophic backtracking because of the nested curly brackets.

Any thoughts on how I can update the regex to process the nested brackets correctly?

(In case you need engine-specific options, I am using C# on .NET Core 3.0. Also, I would rather not add extra processing in code, and would prefer to work with the regex pattern only.)


Similar question

The question "Regular expression to match balanced parentheses" is similar to this one, with one difference: here the brackets are not necessarily balanced; rather, they follow an x={y} pattern.


Update 1

Inputs such as the following are also possible:

yzx={bb b={{b}},bb bbb,}, 

Note the comma after {{b}} and after bbb.


Update 2

I wrote the following pattern, which matches everything except aaa from the first example:

(?<A>[^{}]*)\s*=\s*{(?<V>(?<S>([^{}]?)\{(?:[^}{]+|(?&S))+\}))}(,|$)
Hamed
  • Does this answer your question? [Regular expression to match balanced parentheses](https://stackoverflow.com/questions/546433/regular-expression-to-match-balanced-parentheses) There is at least one answer that applies to C# as well. – Sebastian Proske Nov 10 '19 at 09:05
  • From what I understand, you want to extract parts that are separated by commas and an equal sign. Try to focus on that, rather than trying to parse nested brackets with a regexp (which you can't). – LeGEC Nov 10 '19 at 10:36

1 Answer

Regex.Matches, pretty good

"={(.*?)}(, |$)" could work: the lazy group stops at the first } that is followed by ", " or by the end of the input.

// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
string input = "zyx={aaa}, yzx={bb b={{b}}bb bbb}, yzx={bb b={{b}},bb bbb,}, xyz={{ccc} ccc}";

string pattern = "={(.*?)}(, |$)";

var matches = Regex.Matches(input, pattern)
        .Select(m => m.Groups[1].Value)
        .ToList();

foreach (var m in matches) Console.WriteLine(m);

Output

aaa
bb b={{b}}bb bbb
bb b={{b}},bb bbb,
{ccc} ccc

Regex.Split, really good

I think for this job Regex.Split may be a better tool.

string input = "zyx={aaa}, yzx={bb b={{b}}bb bbb}, yzx={bb b={{b}},bb bbb,}, ttt={nasty{t, }, }, xyz={{ccc} ccc}, zzz={{{{{{{huh?}";
var matches2 = Regex.Split(input, "(^|, )[a-zA-Z]+=", RegexOptions.ExplicitCapture); // Or "(?:^|, )[a-zA-Z]+=" without the flag


Console.WriteLine("-------------------------"); // Adding this to show the empty element (see note below)
foreach (var m in matches2) Console.WriteLine(m);
Console.WriteLine("-------------------------");

Output

-------------------------

{aaa}
{bb b={{b}}bb bbb}
{bb b={{b}},bb bbb,}
{nasty{t, }, }
{{ccc} ccc}
{{{{{{{huh?}
-------------------------

Note: The empty element is there because:

If a match is found at the beginning or the end of the input string, an empty string is included at the beginning or the end of the returned array.
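
If the leading empty element is unwanted, it can be filtered out after the split. A minimal sketch of that idea follows; the class and method names are made up for illustration, and it assumes names consist only of ASCII letters, as in the pattern above.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitSketch
{
    // Splits on "name=" separators (optionally preceded by ", ")
    // and drops the empty entries that Regex.Split produces when
    // a separator sits at the very start or end of the input.
    public static string[] SplitValues(string input) =>
        Regex.Split(input, "(?:^|, )[a-zA-Z]+=")
             .Where(s => s.Length > 0)
             .ToArray();
}
```

Calling SplitSketch.SplitValues(input) then yields only the brace-wrapped values, with the outer braces still attached.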

Case 3

string input = "AAA={aaa}, BBB={bbb, bb{{b}}, bbb{b}}, CCC={ccc}, DDD={ddd}, EEE={00-99} ";
var matches2 = Regex.Split(input, "(?:^|, )[a-zA-Z]+="); // Or drop '?:' and use RegexOptions.ExplicitCapture

foreach (var m in matches2) Console.WriteLine(m);

Output

{aaa}
{bbb, bb{{b}}, bbb{b}}
{ccc}
{ddd}
{00-99} 
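
When the braces inside each value are balanced, .NET balancing groups are another option that keeps everything in the pattern. This is a different technique from the Matches and Split approaches above, shown only as a sketch. It assumes names consist of ASCII letters and that each value's inner braces pair up; the class name, method name, and the group name "open" are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class BalancedSketch
{
    // (?<open>\{) pushes onto the "open" stack for every inner "{",
    // (?<-open>\}) pops it for every inner "}", and (?(open)(?!))
    // rejects the match if any inner "{" is still unclosed.
    private static readonly Regex Pair = new Regex(
        @"(?<a>[A-Za-z]+)\s*=\s*\{(?<v>(?:[^{}]|(?<open>\{)|(?<-open>\}))*)(?(open)(?!))\}");

    // Returns the text between the outer braces of each name={...} pair.
    public static string[] ExtractValues(string input) =>
        Pair.Matches(input)
            .Select(m => m.Groups["v"].Value)
            .ToArray();
}
```

Because the value is consumed one character at a time with mutually exclusive alternatives, there is no nested quantifier of the kind that tends to cause catastrophic backtracking. An entry with unbalanced braces, such as zzz={{{{{{{huh?}, is simply not matched by this pattern, so it would need separate handling.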
tymtam
  • This is from `.NETCoreApp,Version=v3.0` – tymtam Nov 10 '19 at 09:16
  • Thank you, that solved for most of my cases; however, there is one possible combination that this regex pattern does not solve. Please see my updated question. Any thoughts on that? – Hamed Nov 10 '19 at 09:31
  • The solution with `Regex.Split` does not work properly. For instance take this input: `AAA={aaa}, BBB={bbb, bb{{b}}, bbb{b}}, CCC={ccc}, DDD={ddd}, EEE={00-99}`. I have tried another regex, please see my updated question. – Hamed Nov 10 '19 at 18:07