
I want to split a string into terms using these rules:

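  • , (comma): separates terms
  • " (double quote): encloses a string value; special characters inside it, such as commas, are ignored
  • [] (square brackets): enclose an array; commas inside it are not split on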

For instance:

Input: a=1,b="1,2,3",c=[d=1,e="1,2,3"]

Expected output:

    a=1
    b="1,2,3"
    c=[d=1,e="1,2,3"]

But I cannot get this result.

I have written the code below:

 String line = "a=1,b=\"1,2,3\",c=[d=1,e=\"1,11\"]";
 String[] tokens = line.split(",(?=(([^\"]*\"){2})*[^\"]*$)");
 for (String t : tokens)
      System.out.println("> " + t);

and my output is:

a=1
b="1,2,3"
c=[d=1
e="1,11"]
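The reason for the third line: the look-ahead in the code above only checks that an even number of double quotes follows the comma, and it knows nothing about square brackets. The comma after d=1 is followed by exactly two quotes, so it counts as a split point. The diagnostic below is a sketch (QuoteParity and splittingCommas are hypothetical names) that counts quotes directly instead of running the regex:

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteParity {
    // Returns the index of every comma that the question's regex
    // ,(?=(([^"]*"){2})*[^"]*$) would split on, i.e. every comma
    // followed by an even number of double quotes. Brackets are never inspected.
    static List<Integer> splittingCommas(String line) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) != ',') continue;
            long quotes = line.substring(i + 1).chars().filter(c -> c == '"').count();
            if (quotes % 2 == 0) result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "a=1,b=\"1,2,3\",c=[d=1,e=\"1,11\"]";
        for (int i : splittingCommas(line)) {
            System.out.println("comma at index " + i + " is a split point");
        }
    }
}
```

For the string in the code above it reports the commas at indices 3, 13 and 20; index 20 is the comma inside [...] that produced the unwanted split.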

What do I need to change to get the expected output? Should I stick to a regular expression or might another solution be more flexible and easier to maintain?

phimuemue
user2636961
  • See also: http://stackoverflow.com/questions/1757065/java-splitting-a-comma-separated-string-but-ignoring-commas-in-quotes – assylias Aug 01 '13 at 05:37
  • possible duplicate of [java regex pattern split commna](http://stackoverflow.com/questions/17963969/java-regex-pattern-split-commna) – Kyle Strand Aug 01 '13 at 05:39
  • Don't re-post your question, especially without explaining why you think a repost is necessary. (If the answer provided on that question isn't sufficient, *edit* the original question instead of re-posting.) Also, don't copy-and-paste code from someone and say that you "have written" it. – Kyle Strand Aug 01 '13 at 05:41
  • Regular expressions are not appropriate for general text parsing. You want a lexical scanner, not a regular expression. – PaulProgrammer Aug 01 '13 at 05:44
  • Sorry, the code in my previous question was wrong, so I reposted. – user2636961 Aug 01 '13 at 05:48
  • @PaulProgrammer yeah, but see my answer. As long as the structure isn't too crazy or flexible, regex will work. – Bohemian Aug 01 '13 at 05:50
  • Sure, you **can** do a lot of crazy things, but that doesn't mean you **should** – PaulProgrammer Aug 01 '13 at 05:52
  • Again, **don't re-post; EDIT.** Failing to receive an answer that works for you is **NEVER** a good enough reason to re-post. And keep in mind that your original question hasn't even been open for 24 hours yet. – Kyle Strand Aug 01 '13 at 06:07
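PaulProgrammer's suggestion of a scanner instead of a regex could look like the following minimal sketch. It walks the string once, tracking whether it is inside double quotes and how deeply it is nested in square brackets, and splits only on commas at the top level. Assumptions not stated in the question: quotes are never escaped inside a string value, and brackets inside quotes are plain text. The class name TermSplitter is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TermSplitter {
    // Splits on commas that are outside "..." and outside [...].
    // Assumes no escaped quotes; nested brackets are allowed.
    static List<String> split(String line) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        int depth = 0; // current [ ] nesting level
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (!inQuotes && c == '[') {
                depth++;
            } else if (!inQuotes && c == ']') {
                depth--;
            } else if (!inQuotes && depth == 0 && c == ',') {
                // Top-level comma: close the current term and drop the comma
                terms.add(current.toString());
                current.setLength(0);
                continue;
            }
            current.append(c);
        }
        terms.add(current.toString()); // the last term has no trailing comma
        return terms;
    }

    public static void main(String[] args) {
        String line = "a=1,b=\"1,2,3\",c=[d=1,e=\"1,11\"]";
        for (String t : split(line)) System.out.println(t);
    }
}
```

This prints the three expected terms. Unlike a single regex, the depth counter also copes with nested arrays such as x=[[1,2],3],y=2; the trade-off is more code to read and maintain.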

2 Answers


This regex does the trick:

",(?=(([^\"]*\"){2})*[^\"]*$)(?=([^\\[]*?\\[[^\\]]*\\][^\\[\\]]*?)*$)"

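It works by adding a second look-ahead that requires matching pairs of square brackets after the comma. If the comma is inside a square-bracketed term, the brackets that follow it cannot be balanced (the next bracket is the closing ] of that same term), so the comma is not split on.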

Here's some test code:

String line = "a=1,b=\"1,2,3\",c=[d=1,e=\"1,11\"]";
String[] tokens = line.split(",(?=(([^\"]*\"){2})*[^\"]*$)(?=([^\\[]*?\\[[^\\]]*\\][^\\[\\]]*?)*$)");
for (String t : tokens)
    System.out.println(t);

Output:

a=1
b="1,2,3"
c=[d=1,e="1,11"]
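One caveat worth checking: every iteration of the answer's bracket look-ahead has to consume a [...] pair, so a comma followed only by plain text (for example the one in a=1,b=2) is not split at all. The sketch below keeps the quote look-ahead unchanged and compares the answer's bracket look-ahead with a modified one, [^\[\]]*(\[[^\]]*\][^\[\]]*)*$, which allows bracket-free text. The modified look-ahead is an assumption of this sketch, not part of the answer, and like the original it assumes brackets are neither nested nor inside quotes. The class and constant names are hypothetical.

```java
import java.util.Arrays;

public class BracketAwareSplit {
    // Quote look-ahead (question and answer): an even number of quotes follows the comma.
    static final String OUTSIDE_QUOTES = "(?=(([^\"]*\"){2})*[^\"]*$)";

    // The answer's bracket look-ahead. Each group iteration must consume a [...] pair,
    // so a comma followed only by bracket-free text fails it.
    static final String ANSWER_BRACKETS = "(?=([^\\[]*?\\[[^\\]]*\\][^\\[\\]]*?)*$)";

    // Modified bracket look-ahead (assumption): bracket-free text, then zero or more
    // [...] pairs, each followed by bracket-free text, up to the end of the string.
    static final String OUTSIDE_BRACKETS = "(?=[^\\[\\]]*(\\[[^\\]]*\\][^\\[\\]]*)*$)";

    static final String ANSWER_REGEX = "," + OUTSIDE_QUOTES + ANSWER_BRACKETS;
    static final String MODIFIED_REGEX = "," + OUTSIDE_QUOTES + OUTSIDE_BRACKETS;

    public static void main(String[] args) {
        String[] lines = {"a=1,b=\"1,2,3\",c=[d=1,e=\"1,11\"]", "a=1,b=2"};
        for (String line : lines) {
            System.out.println(line);
            System.out.println("  answer:   " + Arrays.toString(line.split(ANSWER_REGEX)));
            System.out.println("  modified: " + Arrays.toString(line.split(MODIFIED_REGEX)));
        }
    }
}
```

On the question's input both versions give the same three terms; on a=1,b=2 the answer's version returns the string unsplit, while the modified one returns a=1 and b=2.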
Bohemian

I know the question is nearly a year old, but... this regex is much simpler:

\[[^]]*\]|"[^"]*"|(,)
  • The leftmost branch of the | matches [complete brackets]
  • The middle branch matches "strings like this"
  • The right side captures commas to Group 1, and we know they are the right commas because they weren't matched by the expressions on the left
  • All we need to do is split on Group 1
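To see the branches at work, the sketch below lists every match the regex finds on the question's input and marks which ones set Group 1. It uses the same alternation, written with the ] inside the first character class escaped; BranchDemo and splitPoints are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BranchDemo {
    // Branch 1 consumes [...] whole, branch 2 consumes "..." whole,
    // branch 3 matches any other comma and captures it in Group 1.
    static final Pattern REGEX = Pattern.compile("\\[[^\\]]*\\]|\"[^\"]*\"|(,)");

    // Returns the index of every comma captured by Group 1, i.e. the split points.
    static List<Integer> splitPoints(String subject) {
        List<Integer> points = new ArrayList<>();
        Matcher m = REGEX.matcher(subject);
        while (m.find()) {
            // Matches without Group 1 are whole [...] or "..." terms; their commas are skipped
            if (m.group(1) != null) points.add(m.start(1));
        }
        return points;
    }

    public static void main(String[] args) {
        String subject = "a=1,b=\"1,2,3\",c=[d=1,e=\"1,11\"]";
        Matcher m = REGEX.matcher(subject);
        while (m.find()) {
            String branch = m.group(1) != null ? "Group 1 comma" : "skipped as a whole";
            System.out.println(m.start() + ": " + m.group() + "  (" + branch + ")");
        }
        System.out.println("split points: " + splitPoints(subject));
    }
}
```

Only the commas at indices 3 and 13 reach Group 1; the commas inside "1,2,3" and [d=1,e="1,11"] are consumed by the first two branches before the third branch ever sees them.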

Splitting on Group 1 Captures

You can do it like this (see the output at the bottom of the online demo):

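import java.util.regex.Matcher;
import java.util.regex.Pattern;

String subject = "a=1,b=\"1,2,3\",c=[d=1,e=\"1,11\"]";
// [...] and "..." are matched whole; any other comma is captured in Group 1
Pattern regex = Pattern.compile("\\[[^]]*\\]|\"[^\"]*\"|(,)");
Matcher m = regex.matcher(subject);
StringBuffer b = new StringBuffer();
while (m.find()) {
    if (m.group(1) != null) {
        // Top-level comma: swap it for the split marker
        m.appendReplacement(b, "@@SplitHere@@");
    } else {
        // Bracket or string match: copy it back verbatim
        // (quoteReplacement keeps any $ or \ in it from being read as group references)
        m.appendReplacement(b, Matcher.quoteReplacement(m.group(0)));
    }
}
m.appendTail(b);
String replaced = b.toString();
String[] splits = replaced.split("@@SplitHere@@");
for (String split : splits) System.out.println(split);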

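This is a two-step split: first, we replace the top-level commas with something distinctive, such as @@SplitHere@@, then we split the result on that marker. The marker must be a string that never occurs in the input.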

Pros and Cons

  • The main benefit of this technique is that it is extremely easy to understand and maintain. If you later decide to also ignore commas {inside , curlies}, you just add another OR branch to the left of the regex: \{[^{}]*\} (in Java the braces outside the character class must be escaped, i.e. "\\{[^{}]*\\}" in a string literal)
  • Once you are familiar with it, you can reuse it in many contexts
  • In this case, the main drawback is that we proceed in two steps as we replace before splitting. In my view, with modern processors that's irrelevant. Maintainable code is much more important.
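If the two steps or the marker string are a concern, one possible single-pass variant (a sketch, not part of the original answer) cuts the substrings directly at the positions where Group 1 matched, using Matcher.start(1) and Matcher.end(1), so no marker can collide with the input. The helper name splitOnGroup1 is hypothetical; the example also adds the {curly} branch described above, escaped for Java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupOneSplit {
    // Splits `subject` at every match in which Group 1 took part.
    // The regex's other branches consume the text whose commas must be ignored.
    static List<String> splitOnGroup1(Pattern regex, String subject) {
        List<String> parts = new ArrayList<>();
        Matcher m = regex.matcher(subject);
        int start = 0; // beginning of the current term
        while (m.find()) {
            if (m.group(1) != null) {
                parts.add(subject.substring(start, m.start(1)));
                start = m.end(1);
            }
        }
        parts.add(subject.substring(start)); // the last term
        return parts;
    }

    public static void main(String[] args) {
        // {curly} branch added to the left; the ] in the bracket branch is escaped
        Pattern regex = Pattern.compile("\\{[^{}]*\\}|\\[[^\\]]*\\]|\"[^\"]*\"|(,)");
        String subject = "a=1,b=\"1,2,3\",c=[d=1,e=\"1,11\"],f={g=1,h=2}";
        for (String part : splitOnGroup1(regex, subject)) System.out.println(part);
    }
}
```

This prints a=1, b="1,2,3", c=[d=1,e="1,11"] and f={g=1,h=2} on separate lines. Adding another protected construct still only means adding a branch to the left of (,).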

Reference

This technique has many applications. It is fully explained in these two links.

Community
zx81