I have a relatively simple Java question. I have a string that looks like this:

"Anderson,T",CWS,SS

I need to parse it so that I end up with

Anderson,T    
CWS    
SS

all as separate strings.

Thanks!

David Brossard

2 Answers

Here's a solution that captures quoted fields, trims whitespace around each field, and matches empty fields:

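import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Wrapper class added so the snippet compiles; the name is arbitrary.
public class CsvRegexDemo {
    public static void main(String[] args) {
        // Quoted field; the inner group captures the contents without the quotes.
        String quoted = "\"(.*?(?<!\\\\)(?:\\\\\\\\)*)\"";
        Pattern regex = Pattern.compile(
            "(?:^|(?<=,))\\s*(" + quoted + "|[^,]*?)\\s*(?:$|,)");

        String line = "\"Anderson,T\",CWS,\"single quote\\\"\", SS ,,hello,,";
        Matcher m = regex.matcher(line);
        int count = 0;
        while (m.find()) {
            // group(2) is non-null only for quoted fields; otherwise use group(1).
            String s = m.group(2) == null ? m.group(1) : m.group(2);
            System.out.println(s);
            count++;
        }
        System.out.printf("(%d matches found)%n", count);
    }
}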

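I split out the quoted part of the pattern to make it a bit easier to follow. Capturing group 1 is the whole field (including the surrounding quotes when the field is quoted); group 2 is the contents of a quoted field without the quotes, and is null for unquoted fields.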

To break down the overall pattern:

  1. Look for start of line or previous comma (?:^|(?<=,)) (don't capture)
  2. Ignore 0+ spaces \\s*
  3. Look for quoted string or string without comma (" + quoted + "|[^,]*?) (The non-comma match is non-greedy so it doesn't grab any following spaces)
  4. Ignore 0+ spaces again \\s*
  5. Look for end of line, or comma (?:$|,) (don't capture)
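
As a rough sketch of how the overall pattern could be reused, the same two pattern strings can be wrapped in a helper that returns the fields as a list. The class name CsvLineParser and the method parseLine are made up for this illustration and are not part of the answer; the sketch is run on the string from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; only the two pattern strings come from the answer.
class CsvLineParser {
    private static final String QUOTED = "\"(.*?(?<!\\\\)(?:\\\\\\\\)*)\"";
    private static final Pattern FIELD = Pattern.compile(
        "(?:^|(?<=,))\\s*(" + QUOTED + "|[^,]*?)\\s*(?:$|,)");

    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            // group(2) is set only when the field was quoted.
            fields.add(m.group(2) != null ? m.group(2) : m.group(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints [Anderson,T, CWS, SS]
        System.out.println(parseLine("\"Anderson,T\",CWS,SS"));
    }
}
```

Because each match consumes the comma that ends the field, the next match can rely on the lookbehind (?<=,) to find its starting point.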

To break down the quote pattern:

  1. Look for opening quote \"
  2. Start group capture (
  3. Get the minimum match of any character .*?
  4. Match an even number (0 or more) of backslashes that are not preceded by another backslash (?<!\\\\)(?:\\\\\\\\)* (so that an escaped quote, with or without preceding escaped backslashes, is not taken as the closing quote)
  5. Close capturing group )
  6. Match closing quote \"
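
To see the quote pattern on its own, here is a small sketch that applies it to a few inputs containing escaped quotes and escaped backslashes. The class QuotedPatternDemo and the method firstQuoted are hypothetical names used only for this demo. Note that the captured text keeps its backslash escapes; the pattern does not unescape anything.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern string comes from the answer.
class QuotedPatternDemo {
    static final Pattern QUOTED =
        Pattern.compile("\"(.*?(?<!\\\\)(?:\\\\\\\\)*)\"");

    // Returns the contents of the first quoted string in text, or null if none.
    static String firstQuoted(String text) {
        Matcher m = QUOTED.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Plain quoted field: prints Anderson,T
        System.out.println(firstQuoted("\"Anderson,T\",CWS"));
        // Escaped quotes inside the field: prints say \"hi\"
        System.out.println(firstQuoted("\"say \\\"hi\\\"\",x"));
        // Field ending in an escaped backslash: prints ends with \\
        System.out.println(firstQuoted("\"ends with \\\\\",x"));
    }
}
```
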
boot-and-bonnet

Assuming your string looks like this:

String input = "\"Anderson,T\",CWS,SS";

You can use this solution found for a similar scenario.

import java.util.ArrayList;
import java.util.List;

String input = "\"Anderson,T\",CWS,SS";
List<String> result = new ArrayList<>();
int start = 0; // index where the current field starts
boolean inQuotes = false;

for (int current = 0; current < input.length(); current++) { // iterate through the characters
    char c = input.charAt(current);
    if (c == '"') {
        inQuotes = !inQuotes; // toggle state on every quote
    } else if (c == ',' && !inQuotes) {
        // a comma outside quotes ends the current field
        result.add(input.substring(start, current));
        start = current + 1; // the next field starts after the comma
    }
}
result.add(input.substring(start)); // add the last field (empty if the input ends with a comma)
System.out.println(result); // ["Anderson,T", CWS, SS]

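I have modified it a bit to improve readability. This code stores the fields in the list result. Note that a quoted field keeps its surrounding quotes ("Anderson,T"), so you still need to strip them if you want the bare value.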
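
If the quotes should be dropped as well, one possible variation is to build each field character by character instead of taking substrings. This is only a sketch: the class QuoteAwareSplitter and its split method are hypothetical names, and it assumes the input never contains escaped quotes inside a quoted field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical variation of the loop above that omits the quote characters.
class QuoteAwareSplitter {
    static List<String> split(String input) {
        List<String> result = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;          // toggle state, skip the quote itself
            } else if (c == ',' && !inQuotes) {
                result.add(field.toString());  // comma outside quotes ends the field
                field.setLength(0);
            } else {
                field.append(c);               // ordinary character, or a comma inside quotes
            }
        }
        result.add(field.toString());          // last field
        return result;
    }

    public static void main(String[] args) {
        // prints [Anderson,T, CWS, SS]
        System.out.println(split("\"Anderson,T\",CWS,SS"));
    }
}
```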

JavierCastro