I have a relatively simple Java question. I have a string that looks like this:
"Anderson,T",CWS,SS
I need to parse it so that I end up with
Anderson,T
CWS
SS
all as separate strings.
Thanks!
Here's a solution that will capture quoted strings, trim the whitespace around each item, and match empty items:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String quoted = "\"(.*?(?<!\\\\)(?:\\\\\\\\)*)\"";
        Pattern regex = Pattern.compile(
            "(?:^|(?<=,))\\s*(" + quoted + "|[^,]*?)\\s*(?:$|,)");
        String line = "\"Anderson,T\",CWS,\"single quote\\\"\", SS ,,hello,,";
        Matcher m = regex.matcher(line);
        int count = 0;
        while (m.find()) {
            // group(2) is non-null only when the item was quoted
            String s = m.group(2) == null ? m.group(1) : m.group(2);
            System.out.println(s);
            count++;
        }
        System.out.printf("(%d matches found)%n", count);
    }
}
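I split out the quoted part of the pattern to make it a bit easier to follow. Capturing group 1 is the whole item (for a quoted item it still includes the quotes); group 2 is the text inside the quotes and is null for unquoted items, which is why the code falls back to group 1.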
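To make the group handling concrete, here is a minimal sketch that wraps the same pattern in a method returning a list and runs it on the input from the question. The class name RegexCsvLine and the method parse are made up for illustration; only the two pattern strings come from the answer. Note that backslash escapes inside a quoted item are kept as-is, not unescaped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; the names here are illustrative only.
class RegexCsvLine {
    // Quoted item: content is captured, an escaped quote (\") does not end it.
    private static final String QUOTED = "\"(.*?(?<!\\\\)(?:\\\\\\\\)*)\"";
    // One item: optional spaces, quoted or unquoted text, optional spaces, then ',' or end.
    private static final Pattern ITEM = Pattern.compile(
            "(?:^|(?<=,))\\s*(" + QUOTED + "|[^,]*?)\\s*(?:$|,)");

    static List<String> parse(String line) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(line);
        while (m.find()) {
            // group(2) is set only for quoted items; otherwise fall back to group(1)
            items.add(m.group(2) != null ? m.group(2) : m.group(1));
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(parse("\"Anderson,T\",CWS,SS")); // prints [Anderson,T, CWS, SS]
    }
}
```

A line that ends with a comma produces a final empty item, which matches the "match empty items" behaviour described above.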
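To break down the overall pattern:
(?:^|(?<=,))
  start of the line, or the position just after a comma (don't capture)
\\s*
  any leading whitespace
(" + quoted + "|[^,]*?)
  either a quoted string or a run of non-comma characters (the non-comma match is non-greedy so it doesn't grab any following spaces)
\\s*
  any trailing whitespace
(?:$|,)
  end of the line, or the separating comma (don't capture)
To break down the quote pattern:
\"
  the opening quote
(
  start of the group that captures the quoted content
.*?
  as few characters as possible
(?<!\\\\)(?:\\\\\\\\)*
  an even number of backslashes that is not preceded by another backslash (to avoid matching escaped quotes with or without preceding escaped backslashes)
)
  end of the capturing group
\"
  the closing quote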
Assuming your string looks like this:
String input = "\"Anderson,T\",CWS,SS";
you can use this solution, which I found for a similar scenario:
String input = "\"Anderson,T\",CWS,SS";
List<String> result = new ArrayList<String>();
int start = 0; // index where the current word starts
boolean inQuotes = false;
for (int current = 0; current < input.length(); current++) { // iterate through the characters
    char c = input.charAt(current);
    if (c == '"') { // found a quote
        inQuotes = !inQuotes; // toggle state
    } else if (c == ',' && !inQuotes) { // found a comma that is not inside quotes
        result.add(input.substring(start, current)); // add everything between the start index and the comma (one word)
        start = current + 1; // the next word starts after the comma
    }
}
result.add(input.substring(start)); // add the last word (empty if the input ends with a comma)
System.out.println(result);
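I have modified it a bit to improve readability. This code stores the strings you want in the list result. Note that the first element keeps its surrounding quotes ("Anderson,T"), so strip them if you need the bare text.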
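If the surrounding quotes should be dropped as well, as the question asks, one option is to build each field character by character instead of using substring. This is a minimal sketch, assuming quotes are always balanced and never escaped inside a field; the class name QuoteAwareSplitter and the method split are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names here are illustrative only.
class QuoteAwareSplitter {

    // Splits on commas that are outside double quotes and drops the quotes themselves.
    // Assumption: quotes are balanced and there are no escaped quotes inside a field.
    static List<String> split(String input) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;          // toggle state, do not keep the quote
            } else if (c == ',' && !inQuotes) {
                fields.add(field.toString());  // comma outside quotes ends the field
                field.setLength(0);
            } else {
                field.append(c);               // ordinary character, or a comma inside quotes
            }
        }
        fields.add(field.toString());          // last field
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("\"Anderson,T\",CWS,SS")); // prints [Anderson,T, CWS, SS]
    }
}
```

For real CSV files with escaped quotes or multi-line fields, a dedicated CSV library is likely a safer choice than either hand-written approach.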