
I have the following key=value pairs in a single-line string:

start=("a", "b") and between=("range(2019, max, to=\"le\")") and end=("a", "b")

Using a regex in Go, I want to extract the key=value pairs as follows:

  1. start=("a", "b")
  2. between=("range(2019, max, to=\"le\")")
  3. end=("a", "b")

There are solutions on Stack Overflow, but they do not work with Go's regex engine.

Here is a link to my failed attempt with Go regex: regex101 golang flavor

I would appreciate any help.

bobble bubble
M.H. Hussaini
  • `strings.Split(x, " and ")` – Peter Dec 13 '19 at 13:36
  • For the given example your solution works, but it might not work for a value like _start=("a", " b and a")_. – M.H. Hussaini Dec 13 '19 at 14:08
  • Don't use regexp for that. Write a parser; it is super easy. – mh-cbon Dec 13 '19 at 14:09
  • A quick [unrolled](http://www.softec.lu/site/RegularExpressions/UnrollingTheLoop) idea: [`\w+=\([^)(]*(?:\([^)(]*\)[^)(]*)*\)`](https://regex101.com/r/Fs8nTw/1/) But if it comes to arbitrarily nested parentheses and no regex recursion is available, a parser is the only solution. If it's about the escaped quote, you can also try something like [`\w+=\(".*?[^\\]"\)`](https://regex101.com/r/Fs8nTw/2) – bobble bubble Dec 13 '19 at 14:23
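
For illustration, the parser suggested in the comments might look roughly like the sketch below. It is not code from the thread. The function name `splitPairs` is hypothetical, and it assumes that every pair has the form key=(...), that keys contain no spaces, and that a backslash inside a double-quoted string escapes the next byte. Parentheses inside strings are ignored, so `" b and a"` and `range(...)` inside quotes do not confuse it.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPairs (hypothetical name) scans s once and returns every key=(...) pair.
// Assumptions: keys contain no spaces, values are parenthesized, and a
// backslash inside a double-quoted string escapes the following byte.
func splitPairs(s string) ([]string, error) {
	var pairs []string
	start, end := 0, 0 // start of the current pair, end of the previous one
	depth := 0         // parenthesis nesting outside of strings
	inString := false
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case inString && c == '\\':
			i++ // skip the escaped byte, e.g. the quote in \"
		case c == '"':
			inString = !inString
		case inString:
			// ordinary byte inside a string: nothing to do
		case c == '(':
			if depth == 0 {
				// The key is the last space-free token before this '('.
				start = end + strings.LastIndexByte(s[end:i], ' ') + 1
			}
			depth++
		case c == ')':
			depth--
			if depth < 0 {
				return nil, fmt.Errorf("unbalanced ')' at offset %d", i)
			}
			if depth == 0 {
				pairs = append(pairs, s[start:i+1])
				end = i + 1
			}
		}
	}
	if inString || depth != 0 {
		return nil, fmt.Errorf("unterminated string or parenthesis")
	}
	return pairs, nil
}

func main() {
	input := `start=("a", " b and a") and between=("range(2019, max, to=\"le\")", "b")`
	pairs, err := splitPairs(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range pairs {
		fmt.Println(p)
	}
}
```

Unlike `strings.Split(x, " and ")`, this keeps `start=("a", " b and a")` in one piece, because the ` and ` inside the quoted string is never treated as a separator.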

1 Answer


The problem is the escaped quotes:

\S+=(\([^(]*(?:[^("]*"(?:[^\\"]|\\["\\])*")(\)))

https://regex101.com/r/3ytO9P/1

I changed [^"] to (?:[^\\"]|\\["\\]). This makes the regex match either a regular character or an escape sequence (\" or \\). Because the escape is consumed as a single unit, \" can no longer end the match.
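
To see the difference in isolation, here is a small Go sketch that applies only the string part of the pattern to the value from the question. The variable names `naive` and `escapeAware` are mine, chosen for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// naive stops at the first double quote it sees, even an escaped one.
var naive = regexp.MustCompile(`"[^"]*"`)

// escapeAware consumes \" and \\ as single units inside the string,
// so an escaped quote cannot terminate the match.
var escapeAware = regexp.MustCompile(`"(?:[^\\"]|\\["\\])*"`)

func main() {
	input := `"range(2019, max, to=\"le\")"`
	fmt.Println(naive.FindString(input))       // "range(2019, max, to=\"
	fmt.Println(escapeAware.FindString(input)) // "range(2019, max, to=\"le\")"
}
```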

Your regex has other problems though. This should work better:

\S+=(\([^("]*(?:[^("]*"(?:[^\\"]|\\["\\])*")*(\)))

https://regex101.com/r/OuDvyX/1

It changes [^(] to [^("] to prevent " from being matched unless it’s part of a complete string.


UPDATE:

@Wiktor Stribiżew commented below:

It still does not support other escape sequences. The first [^("]* is redundant in the current pattern. It won't match between=("a",,,) but will match between=("a",,",") - this is inconsistent. The right regex will match valid double quoted string literals separated with commas and any amount of whitespace between them. The \S+=(\([^(]*(?:[^("]*"(?:[^\\"]|\\["\\])*")(\))) is not the right pattern IMHO

If you really want the regex to be that robust, you should use a parser, but you could fix those problems by using:

\S+=(\((?:[^("]*"(?:[^\\"]|\\.)*"[^("]*)*(\)))
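
A minimal sketch of using this last pattern from Go, assuming the standard `regexp` package and the input from the question. The names `pairRe` and `extractPairs` are only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// pairRe is the final pattern from above. A raw string literal (backticks)
// avoids having to double every backslash.
var pairRe = regexp.MustCompile(`\S+=(\((?:[^("]*"(?:[^\\"]|\\.)*"[^("]*)*(\)))`)

// extractPairs (hypothetical helper) returns all key=(...) matches in s.
func extractPairs(s string) []string {
	return pairRe.FindAllString(s, -1)
}

func main() {
	input := `start=("a", "b") and between=("range(2019, max, to=\"le\")") and end=("a", "b")`
	for i, p := range extractPairs(input) {
		fmt.Printf("%d. %s\n", i+1, p)
	}
	// Prints:
	// 1. start=("a", "b")
	// 2. between=("range(2019, max, to=\"le\")")
	// 3. end=("a", "b")
}
```

The pattern uses no lookarounds or backreferences, so it compiles under Go's RE2-based engine.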
Anonymous
  • You must explain the regex a bit. It's too complicated IMO – CinCout Dec 13 '19 at 13:42
  • But it won't match `between=("range(2019, max, to=\"le\")", "b")`, see https://regex101.com/r/3ytO9P/2. `\S+=(\([^(]*(?:[^("]*"(?:[^\\"]|\\["\\])*")(\)))` is a wrong pattern. – Wiktor Stribiżew Dec 13 '19 at 13:47
  • @WiktorStribiżew Thanks. I didn’t see that in the original regex. Updated. – Anonymous Dec 13 '19 at 13:55
  • Thanks @Anonymous, works perfectly :) – M.H. Hussaini Dec 13 '19 at 14:06
  • @M.H.Hussaini You’re welcome :) – Anonymous Dec 13 '19 at 14:21
  • It still does not support other escape sequences. The first `[^("]*` is redundant in the current pattern. It won't match `between=("a",,,)` but will match `between=("a",,",")` - this is inconsistent. The right regex will match valid double quoted string literals separated with commas and any amount of whitespace between them. The `\S+=(\([^(]*(?:[^("]*"(?:[^\\"]|\\["\\])*")(\)))` is not the right pattern IMHO – Wiktor Stribiżew Dec 13 '19 at 15:02
  • @WiktorStribiżew There are definitely other changes you could make. It’s not clear from the question that these are what the OP wants, but it’s good to point them out. – Anonymous Dec 13 '19 at 16:34