
I have a line from a CSV file that I want to split on commas. A field may contain a comma inside double quotes or inside a JSON object. For example, if the string is:

abc, pq"r,s", {"one":1, "two":2}

The regex should split it into three tokens as:

  1. abc
  2. pq"r,s"
  3. {"one":1, "two":2}

I have tried this regex:

(?x)[,](?=([^"]*"[^"]*")*[^"]*$)

Can anyone please suggest the right regex?

horcrux

1 Answer


Here is a regex that works on your example abc, pq"r,s", {"one":1, "two":2}:

,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)(?=(?:[^{}]*{[^{}]*})*[^}]*$)

Or check out this regex101 example

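There are three parts to this regex:

  1. , is the comma we want to match.
  2. (?=(?:[^\"]*\"[^\"]*\")*[^\"]*$) is a lookahead taken from the discussion "Java: splitting a comma-separated string but ignoring commas in quotes" (answer by Bart Kiers). It only succeeds when an even number of double quotes follows the comma, i.e. when the comma is not inside a quoted section.
  3. (?=(?:[^{}]*{[^{}]*})*[^}]*$) is the same lookahead adapted to { ... }. It only succeeds when everything after the comma consists of complete { ... } groups, so a comma followed by an unmatched } is rejected.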
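As a rough illustration, the regex can be used from Java along these lines. This is only a sketch: the class name CsvSplitSketch and the helper split are made up for the example, the fields are trimmed because the sample input has a space after each comma, and the braces outside the character classes are escaped because java.util.regex can reject a bare { as an illegal repetition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class CsvSplitSketch {
    // The comma, followed by the quote lookahead and the brace lookahead.
    // Braces outside character classes are escaped for java.util.regex.
    private static final Pattern SEPARATOR = Pattern.compile(
            ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"
            + "(?=(?:[^{}]*\\{[^{}]*\\})*[^}]*$)");

    // Hypothetical helper: splits one line and trims each field.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        for (String field : SEPARATOR.split(line)) {
            fields.add(field.trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "abc, pq\"r,s\", {\"one\":1, \"two\":2}";
        // Prints one field per line.
        split(line).forEach(System.out::println);
    }
}
```

On the sample input this yields the three fields abc, pq"r,s" and {"one":1, "two":2}.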

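I am not sure it will work on other examples. In particular, the brace lookahead only understands flat { ... } groups, so nested braces will break it.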

Magdrop
  • well, that gets you halfway there. now you just need to add a lookahead for balanced brackets. don't forget to ignore brackets inside quoted strings! – Patrick Parker Aug 07 '18 at 18:46
  • @Patrick, I agree with you and most of the comments that a parser is probably the best option. Not sure where the OP wants to go with this, but if he wants to learn some complex regex, I hope to point him to the other post, which has a nice long explanation of what that regex does. – Magdrop Aug 07 '18 at 18:47
  • it's an interesting idea, but you should be aware that it only works based on the assumptions in RFC 4180... namely that doublequotes are escaped by preceding with yet another doublequote. Based on @op's sample input, we can already see that he is not RFC 4180 compliant. – Patrick Parker Aug 07 '18 at 19:01
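
The parser route suggested in the comments could look roughly like the following Java sketch. It is not taken from the thread: the class name CsvScannerSketch and the helper split are hypothetical, and it assumes that every double quote simply toggles the quoted state (no backslash-escaped or doubled quotes). Braces inside quoted text are ignored, and nested braces are tracked with a depth counter.

```java
import java.util.ArrayList;
import java.util.List;

class CsvScannerSketch {
    // Hypothetical helper: splits on commas that are outside double quotes
    // and outside any { ... } group. Assumes quotes are never escaped.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false; // true while between two double quotes
        int depth = 0;            // nesting level of { ... } outside quotes

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (!inQuotes && c == '{') {
                depth++;
            } else if (!inQuotes && c == '}' && depth > 0) {
                depth--;
            }

            if (c == ',' && !inQuotes && depth == 0) {
                // A top-level comma ends the current field.
                fields.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString().trim());
        return fields;
    }

    public static void main(String[] args) {
        split("abc, pq\"r,s\", {\"one\":1, \"two\":2}").forEach(System.out::println);
    }
}
```

Unlike the lookahead approach, this scans the line once and also copes with input such as x, {"a":{"b":1, "c":2}}, y, where the JSON object is nested.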