I've been writing a parser using regular expressions in PowerShell, and it works really well so far, except for one issue.

\s*([a-zA-Z_]+)\s*=(?:\s*"(.*)"|([^;#]*))

I wrote this regex to match the following scenarios, which it does successfully:

Name= "Value" ;Comment

This takes everything between the quotes after the = and disregards the rest.

Name=Value ;Comment

This takes everything after the = up to a ; or # as the value.

That works, but the problem is that scenario one puts the value in capture group 2, while scenario two puts it in capture group 3. I then have to check which of the two groups contains something to get the final value, which isn't clean and, I'm sure, isn't necessary. So here is the question: how can group 2 hold the result of both alternatives? (Using an entirely different regex isn't an issue; I've rewritten this one several times already.)
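
A minimal sketch of the check described above, assuming the pattern is applied with `[regex]::Match`. The helper name `Get-IniValue` is made up for illustration and is not part of the original parser:

```powershell
$pattern = '\s*([a-zA-Z_]+)\s*=(?:\s*"(.*)"|([^;#]*))'

# Hypothetical helper: returns the value from whichever group took part in the match.
function Get-IniValue([string]$Line) {
    $m = [regex]::Match($Line, $pattern)
    if (-not $m.Success) { return $null }
    # Quoted form: only group 2 participates in the match.
    if ($m.Groups[2].Success) { return $m.Groups[2].Value }
    # Bare form: only group 3 participates in the match.
    return $m.Groups[3].Value
}

Get-IniValue 'Name= "Value" ;Comment'
Get-IniValue 'Name=Value ;Comment'
```

Note that in the bare form the captured text keeps the space before the `;`, because `[^;#]*` consumes everything up to the comment character.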

Ansgar Wiechers
Blah
  • How about something like this: [`\S*([a-zA-Z_]+)\s*=\s*("?)([^"]*?)\2\s*[;#]`](https://regex101.com/r/R9LcE3/2) – Faibbus Feb 15 '17 at 09:45
  • This would no longer capture `Name="Value1;Value2"`, and it would not capture a space after the = if no quotes are specified, as in `Name= has been found by `. Thanks – Blah Feb 15 '17 at 10:22
  • I think it would be better to make the alternation a capturing group and remove outer double quotes in a second step: `^\s*(\w+)\s*=\s*(".*?"|[^;#]*)` – Ansgar Wiechers Feb 15 '17 at 10:28
  • Or if powershell supports [branch reset](http://www.regular-expressions.info/branchreset.html): [`\S*([a-zA-Z_]+)\s*=(?|\s*"(.*)"|([^;#]*))`](https://regex101.com/r/R9LcE3/3) (Your question might actually be a duplicate of http://stackoverflow.com/questions/5377782/what-is-the-equivalent-of-branch-reset-operator-found-in-phppcre-in-c) – Faibbus Feb 15 '17 at 10:34
  • Possible duplicate of [What is the equivalent of branch reset operator ("?|") found in php(pcre) in C#?](http://stackoverflow.com/questions/5377782/what-is-the-equivalent-of-branch-reset-operator-found-in-phppcre-in-c) – Faibbus Feb 15 '17 at 10:39
  • Fantastic! I tested it, and PowerShell does support multiple groups with the same name. I still think there might be a way to rewrite this to work as expected without using multiple groups, which might not work in other languages according to regex101.com (used for the tests). Portability, especially for this parsing operation, is a key element. – Blah Feb 15 '17 at 11:03
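
The two-step approach suggested in the comments (capture the whole alternation, then strip the outer quotes afterwards) might look like the sketch below. It uses only numbered groups, so it should not depend on duplicate group names. The helper name `Get-IniValueTwoStep` is hypothetical:

```powershell
# Pattern from the comment above: group 2 captures either alternative, quotes included.
$pattern = '^\s*(\w+)\s*=\s*(".*?"|[^;#]*)'

# Hypothetical helper; returns $null when the line is not a Name=Value pair.
function Get-IniValueTwoStep([string]$Line) {
    $m = [regex]::Match($Line, $pattern)
    if (-not $m.Success) { return $null }
    # Second step: drop one pair of surrounding double quotes, if present.
    return $m.Groups[2].Value -replace '^"(.*)"$', '$1'
}

Get-IniValueTwoStep 'Name= "Value" ;Comment'
Get-IniValueTwoStep 'Name="Value1;Value2"'
Get-IniValueTwoStep 'Name=Value ;Comment'
```

One trade-off: the `\s*` after the `=` swallows leading whitespace for unquoted values too, so a line like `Name= has been found by ` loses its leading space with this pattern.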

1 Answer

Use a named capture group instead of a non-capturing group:

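# The named group 'value' wraps the whole alternation, so it holds whichever branch matched
$pattern = '\s*([a-zA-Z_]+)\s*=(?<value>\s*"(.*)"|([^;#]*))'
# Trim surrounding spaces and double quotes from the captured text
$value = ($string | Select-String -Pattern $pattern).Matches.Groups['value'].Value.Trim(' "')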
Mathias R. Jessen
  • Your answer leads to the right direction, but it includes the `"` in the match. – Faibbus Feb 15 '17 at 13:12
  • Yes, but even this way, you cannot capture a space after the = if no quotes are specified like: `Name= has been found by `. You could do this by naming your inner groups with the same name. – Faibbus Feb 15 '17 at 14:12
  • Yes, I've started using the same-name group trick. I wish there was a way to do without it, as it is not a portable solution. This is what I now use: `^\s*(?[\w]+)\s*=(?:\s*"(?.*)"|\s*(?[\d]*)\s*|(?[^;#]*))` – Blah Feb 15 '17 at 15:36
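
For reference, the same-name group trick discussed in the comments can be sketched as follows. The group names `name` and `value` are assumptions chosen for illustration (the names in the pattern quoted above did not survive), and the pattern is a simplified two-branch version of the original regex. It relies on .NET allowing several groups to share one name, which many other regex engines reject:

```powershell
# Both branches of the alternation write into the same named group 'value'.
$pattern = '^\s*(?<name>[a-zA-Z_]+)\s*=(?:\s*"(?<value>.*)"|(?<value>[^;#]*))'

$lines = 'Name= "Value" ;Comment', 'Name=Value ;Comment', 'Name="Value1;Value2"'

$results = foreach ($line in $lines) {
    $m = [regex]::Match($line, $pattern)
    if ($m.Success) {
        # No need to check which branch matched: 'value' holds the capture either way.
        '{0} = [{1}]' -f $m.Groups['name'].Value, $m.Groups['value'].Value
    }
}
$results
```

The brackets in the output make whitespace visible: the unquoted form still keeps the space before the `;`, so a `.Trim()` on the value may be wanted depending on how the parser should treat it.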