I am splitting the string "foo,bar,c;qual="baz,blurb",d;junk="quux,syzygy"" on commas, but I want to keep the commas that are inside quotes. This was answered in the question "Java: splitting a comma-separated string but ignoring commas in quotes", but that answer does not fully explain how the poster arrived at this piece of code:

line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1);

OK, so I understand some of what is going on, but one part confuses me. I know the first comma is the character being matched.

Then

        (?= 

is a forward search.

Then the first part is grouped

  ([^\"]*\"[^\"]*\"). 

This is where I get confused. So the first part

  [^\"]* 

means that beginning of any line with quotes separate tokens zero or more times.

Then comes \". Now is this like opening a quote in string or is it saying match this quote?

Then it repeats the exact same line of code, why?

      ([^\"]*\"[^\"]*\")

Is the second part adding the same pattern again to say that it must finish with a quote?

Can someone explain the part I am not getting?

spaga
  • `^` inside square brackets means "not". `\"` means `"`, but the backslash is there as an escape character. So `[^\"]*` matches any string that does not contain `"`. – M. Shaw Jul 18 '15 at 12:26
  • Thanks, it would help if I knew that part. I still don't understand the part after [^\"] though – spaga Jul 18 '15 at 12:29
  • @M.Shaw it must be `[^\"]*` matches any character but not of `"`, zero or more times. – Avinash Raj Jul 18 '15 at 12:32
  • `[^\"]` is any string without `"`. `\"` matches `"`. So basically `([^\"]*\"[^\"]*\")` matches a string that contains 2 `"` and the last character is `"`. – M. Shaw Jul 18 '15 at 12:34
  • @AvinashRaj Which is basically any string that doesn't contain `"`. – M. Shaw Jul 18 '15 at 12:34
  • Always thought regex is a write-only thing ... No one can read a regex, only write ;) – Christian Kuetbach Jul 18 '15 at 13:00

3 Answers

[^\"] matches any single character other than ", so [^\"]* matches any string without ". \" matches ". So basically ([^\"]*\"[^\"]*\") matches a string that contains exactly 2 " and whose last character is ".
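
To make that concrete, here is a minimal sketch that tests the grouped sub-pattern on its own. The class and method names (QuotePairDemo, isOneQuotePair) are made up for illustration and do not come from the linked answer.

```java
import java.util.regex.Pattern;

class QuotePairDemo {
    // The grouped sub-pattern from the question, as a Java string literal.
    // In the regex itself the backslash before the quote is not needed;
    // it only escapes the quote inside the Java string.
    static final Pattern PAIR = Pattern.compile("[^\"]*\"[^\"]*\"");

    // Hypothetical helper: true if the WHOLE input is "non-quotes, quote,
    // non-quotes, quote", i.e. exactly two quotes with the second one last.
    static boolean isOneQuotePair(String s) {
        return PAIR.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isOneQuotePair("c;qual=\"baz,blurb\"")); // true
        System.out.println(isOneQuotePair("foo,bar"));               // false: no quotes
        System.out.println(isOneQuotePair("a\"b\"c"));               // false: text after the closing quote
    }
}
```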

M. Shaw

I think they do a pretty good job of explaining later in the answer:

[^\"] matches any character other than a quote. \" matches a quote.

So this part ([^\"]*\"[^\"]*\") is

  1. [^\"]* match other than quote 0 or more times
  2. \" match quote, yes this is the opening quote
  3. [^\"]* match other than quote 0 or more times
  4. \" match quote, closing quote

They only require the first [^\"]* because their fields do not start with a quote; their example input is like a="abc",b="d,ef". If you were parsing "abc","d,ef" you wouldn't need it.
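
As a rough check of the breakdown above, the following sketch runs the split from the question on the question's own sample string. The class and method names (QuoteAwareSplit, splitOutsideQuotes) are hypothetical; only the regex and the -1 limit come from the question.

```java
class QuoteAwareSplit {
    // Split on commas that are followed by an even number of quotes,
    // i.e. commas that are not inside a quoted section.
    // The limit -1 keeps trailing empty tokens.
    static String[] splitOutsideQuotes(String line) {
        return line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1);
    }

    public static void main(String[] args) {
        String line = "foo,bar,c;qual=\"baz,blurb\",d;junk=\"quux,syzygy\"";
        // Prints four tokens, one per line; the commas inside the quotes survive.
        for (String token : splitOutsideQuotes(line)) {
            System.out.println(token);
        }
    }
}
```

The comma inside "baz,blurb" is skipped because an odd number of quotes follows it, so the lookahead cannot pair them all up before reaching the end of the string.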

weston
  • Yes, the poster does explain what is going on well; it did not help that I was a little off base. I just didn't really understand the syntax – spaga Jul 18 '15 at 12:50

Here is your regex: /,(?=([^\"]*\"[^\"]*\")*[^\"]*$)/

Here is the readout from https://regex101.com/:

, matches the character , literally
(?=([^\"]*\"[^\"]*\")*[^\"]*$) Positive Lookahead - Assert that the regex below can be matched
  1st Capturing group ([^\"]*\"[^\"]*\")*
    Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
    Note: A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
    [^\"]* match a single character not present in the list below
      Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
      \" matches the character " literally
    \" matches the character " literally
    [^\"]* match a single character not present in the list below
      Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
      \" matches the character " literally
    \" matches the character " literally
  [^\"]* match a single character not present in the list below
    Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
    \" matches the character " literally
  $ assert position at end of the string
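
The same structure can also be written out inside Java itself. The sketch below is one possible way to do it, not the original poster's code: it uses Pattern.COMMENTS so the regex can carry its own annotations, and it swaps the capturing group for a non-capturing (?:...) group, as the regex101 note suggests when the captured text is not needed. The names CommentedSplit, COMMA_OUTSIDE_QUOTES and split are made up for this example.

```java
import java.util.regex.Pattern;

class CommentedSplit {
    // In COMMENTS mode, whitespace in the pattern is ignored and
    // everything from # to the end of the line is a comment.
    static final Pattern COMMA_OUTSIDE_QUOTES = Pattern.compile(
          ",           # a literal comma, matched only if...\n"
        + "(?=         # ...this lookahead succeeds; it consumes nothing\n"
        + "  (?:       #   non-capturing group holding one quoted pair:\n"
        + "    [^\"]*  #     any run of non-quote characters\n"
        + "    \"      #     an opening quote\n"
        + "    [^\"]*  #     the quoted text\n"
        + "    \"      #     the closing quote\n"
        + "  )*        #   repeated zero or more times\n"
        + "  [^\"]*    #   then only non-quote characters\n"
        + "  $         #   up to the end of the input\n"
        + ")",
        Pattern.COMMENTS);

    // Same behaviour as line.split(regex, -1), with a precompiled pattern.
    static String[] split(String line) {
        return COMMA_OUTSIDE_QUOTES.split(line, -1);
    }

    public static void main(String[] args) {
        // Prints three tokens: a, "b,c" and d.
        for (String token : split("a,\"b,c\",d")) {
            System.out.println(token);
        }
    }
}
```
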
LhasaDad