5

The example PHP regex below relies on subroutine calls.

If I try to use it with the C# Regex class, I get an error: Unrecognized grouping construct

Is it possible to rewrite this into C# regex syntax?

Would it be a simple translation, or does another (regex) approach need to be used?

If it is not possible, what is the name of the feature it is using, so I can add it to this question to make it more useful to others with the same problem?

PHP, which works with all JSON RFC test data:

$pcre_regex = '
  /
  (?(DEFINE)
     (?<number>   -? (?: [1-9]\d*| 0 ) (\.\d+)? (e [+-]? \d+)? )    
     (?<boolean>   true | false | null )
     (?<string>    " (?>[^"\\\\]+ | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " )
     (?<array>     \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \] )
     (?<pair>      \s* (?&string) \s* : (?&json)  )
     (?<object>    \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \} )
     (?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
  )
  \A (?&json) \z
  /six   
';

And the version that does not work in C#:

string pattern = @"(?(DEFINE)
 (?<number>   -? (?: [1-9]\d* | 0 ) (\.\d+)? (e [+-]? \d+)? )
 (?<boolean>   true | false | null )
 (?<string>    "" (?>[^""\\]+ | \\ [""\\bfnrt\/] | \\ u [0-9a-f]{4} )* "" )
 (?<array>     \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \] )
 (?<pair>      \s* (?&string) \s* : (?&json)  )
 (?<object>    \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \} )
 (?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* ))
\A (?&json) \z
";
// In a verbatim string, a double quote is escaped by doubling it, and backslashes are not doubled.
string input = @"[{""Example"": ""data""}]";
RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline;

// Throws ArgumentException: Unrecognized grouping construct
bool isValid = Regex.IsMatch(input, pattern, options);

Edit: This question is NOT about using regex with JSON; it is about how to do something (subroutine calls) in C# that CAN be done in PHP regex.

Just because there is a way of parsing json in C# DOES NOT answer the question. Please keep your answers and comments on topic.

Casimir et Hippolyte
DarcyThomas
  • You should not be using regex with html. html is not regular and regex is for regular text. Use an html class and method in the class. – jdweng Nov 12 '17 at 09:22
  • When you simplify the regex to find the construct that provokes the error message, what did you find? Please read about [mcve] and the other [help] pages. – AdrianHHH Nov 12 '17 at 11:22
  • @jdweng Why do you think my question is about HTML? – DarcyThomas Nov 12 '17 at 21:27
  • FWIW json is regular enough to use with (some) modern regex engines. See: https://stackoverflow.com/a/3845829/309634 – DarcyThomas Nov 12 '17 at 22:54
  • @AdrianHHH Added an MCV C# example. When I tried to simplify, I got completely broken regex syntax (or regex which is so simple it does nothing useful). C# regex syntax (for grouping) is, as far as I can tell, quite different from the PHP syntax. If I could make a simpler working example then I would not need to be asking how to do it. I know the PHP version works, so I don't want to introduce noise by slicing and dicing that up in the example (PHP is not my forte). – DarcyThomas Nov 12 '17 at 23:09
  • .NET regex does have a related feature: https://www.regular-expressions.info/balancing.html - At least would ease nesting; however I'm not sure if it allows alternating between structures. – mario Nov 24 '17 at 08:39
  • It's not possible with a single regex since recursion isn't possible. Even using balancing groups doesn't provide all the functionality that recursion does. I was able to create a regex that does 99% of this, but what it cannot do is match nested objects inside an array since it cannot recurse the parent group (object) in the child group (array) – ctwheels Nov 24 '17 at 20:25
  • @CasimiretHippolyte What is your thinking around your edits to the regex? The original regex comes from this answer https://stackoverflow.com/a/3845829/309634 So I am inclined to roll back to keep them the same, but I am interested to hear your thoughts/reasoning behind your edits first. – DarcyThomas Nov 28 '17 at 21:12
  • @DarcyThomas: Ok, about the "number" subpattern, testing with a lookahead is stupid since you can directly match the beginning of the number. Also, since the whole pattern is case insensitive, no need to write: `[eE]`. About the "string" subpattern, a branch that can match an empty string in a group that isn't atomic (or repeated with a possessive quantifier) in an alternation is clearly the way to go if you want to obtain a catastrophic backtracking (for example with a string without a closing quote). To finish, `\Z` is for the end of a line, `\z` is for the end of the string. – Casimir et Hippolyte Nov 28 '17 at 21:39
  • @CasimiretHippolyte I'll keep the change. Maybe you would want to edit the answer that I sourced the pattern from as well then? – DarcyThomas Nov 28 '17 at 21:44
  • @DarcyThomas: perhaps, I will see to that tomorrow. – Casimir et Hippolyte Nov 28 '17 at 21:52

2 Answers

4

This does not directly answer the question, but it is a workaround.

Rather than using the BCL Regex class, there is a project called PCRE.NET, which wraps the PCRE regex engine (the same engine which is used in the PHP example) with C# function calls.

This would allow the use of regex with subroutine calls in C# land.
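As a rough sketch of how that might look: the snippet below assumes the PCRE.NET NuGet package is referenced and that it exposes a `PcreRegex` class in the `PCRE` namespace with an `IsMatch` method (check the package documentation for the exact API). The class and method names `JsonRegexDemo` and `IsValidJson` are made up for illustration. The `i`, `s`, and `x` options are set inline with `(?six)` so that no library-specific option enum is needed.

```csharp
using System;
using PCRE; // assumption: provided by the PCRE.NET NuGet package

public static class JsonRegexDemo
{
    // Same pattern as the PHP example, written as a C# verbatim string:
    // quotes are doubled, backslashes are not.
    private const string Pattern = @"(?six)
      (?(DEFINE)
         (?<number>   -? (?: [1-9]\d* | 0 ) (\.\d+)? (e [+-]? \d+)? )
         (?<boolean>  true | false | null )
         (?<string>   "" (?> [^""\\]+ | \\ [""\\bfnrt\/] | \\ u [0-9a-f]{4} )* "" )
         (?<array>    \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \] )
         (?<pair>     \s* (?&string) \s* : (?&json)  )
         (?<object>   \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \} )
         (?<json>     \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
      )
      \A (?&json) \z";

    // Compile once and reuse; the pattern is handed to the PCRE engine,
    // which understands (?(DEFINE)...) and (?&name) subroutine calls.
    private static readonly PcreRegex Regex = new PcreRegex(Pattern);

    public static bool IsValidJson(string input) => Regex.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(IsValidJson(@"[{""Example"": ""data""}]"));
        Console.WriteLine(IsValidJson(@"[{""Example"": }]"));
    }
}
```

The pattern itself is unchanged from the PHP version; only the string-literal escaping differs.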

DarcyThomas
  • I'm glad you found my lib useful :) To answer the original question: *no*, there is no general-purpose way to convert a recursive PCRE pattern into a .NET regex. Those two regex engines are *fundamentally different* in several ways, and each one supports some features the other one doesn't. This is what motivated me to write the library in the first place. You can sometimes work around the lack of recursion in .NET regexes with balancing groups, but as soon as you have different kinds of groups you're most probably out of luck, or you'll have to write a monstrous pattern. – Lucas Trzesniewski Nov 24 '17 at 18:13
  • See [here](https://stackoverflow.com/a/20644634/3764814) and [here](https://kobikobi.wordpress.com/2010/12/14/net-regex-matching-mixed-balanced-parentheses/) for some really good info (by Kobi) relevant to your question. – Lucas Trzesniewski Nov 24 '17 at 18:24
2

The short answer is kinda, but not really.

.NET regex has a concept called balancing groups.

This is really good for checking that all of your opening braces have matching closing braces (i.e., nesting is OK, but overlapping is not).

For example, this regex will ensure that all of the curly braces match:

{(?:[^{}]|(?<Open>{)|(?<Content-Open>}))+(?(Open)(?!))}

Which matches this string:

{1 2 {3} {4 5 {6}} 7}
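A minimal sketch of running that pattern with the BCL `Regex` class follows. The class and method names (`BraceCheck`, `IsBalanced`) are made up for illustration, and the `\A` and `\z` anchors are an addition: without them, `IsMatch` can succeed on a balanced substring of an otherwise unbalanced input. The literal braces are also escaped for readability.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BraceCheck
{
    // (?<Open>\{) pushes a capture for every inner opening brace.
    // (?<Content-Open>\}) pops one for every closing brace and fails if none is left.
    // (?(Open)(?!)) fails the match if any opening brace is still unclosed.
    private static readonly Regex Balanced = new Regex(
        @"\A\{(?:[^{}]|(?<Open>\{)|(?<Content-Open>\}))+(?(Open)(?!))\}\z");

    public static bool IsBalanced(string input) => Balanced.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(IsBalanced("{1 2 {3} {4 5 {6}} 7}")); // True
        Console.WriteLine(IsBalanced("{1 {2}"));                // False: one brace left open
        Console.WriteLine(IsBalanced("{1}}"));                  // False: extra closing brace
    }
}
```

Note that this only tracks one kind of bracket; it does not distinguish between different nested structures such as JSON arrays and objects.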

However, it is beyond me to craft a regex which includes several nested groupings, like in the example.

Furthermore, it looks like you would need to make a nested regex pattern with as many nesting levels as you expect in your source data.

What you could try is combining balanced groups with some recursive C# to pare down each grouping. There is something similar in this answer (but I would not recommend it in this case).

Alternatively, you could add this NuGet package, which is a wrapper around the PCRE regex engine, which supports recursive subroutines. Details here.

DeltaTango