
I have a CSV file in the format below.

H,"TestItems_20100107.csv",07/01/2010,20:00:00,"TT1198","MOBb","AMD",NEW,,

I need the split command to ignore the commas inside the double quotes, so I used the split command below, which I took from an earlier post (its title is given after the code).

String items[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
System.out.println("items.length"+items.length);

Java: splitting a comma-separated string but ignoring commas in quotes

When I run this against the CSV data above, items.length is printed as 8. The last two commas at the end of the line, after NEW, are ignored. I want the split command to pick up these commas and return a length of 10. Empty fields are dropped when they are at the end of the line, but they are kept when they are in the middle of the string. I am not sure what I need to change in the split command to resolve this. Also, in the CSV file, double quotes within the contents of a text field can be repeated (e.g. "This account is a ""large"" one").

Arav

2 Answers


There's nothing wrong with the regular expression. The problem is that split discards empty matches at the end:

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

A workaround is to supply an argument greater than the number of columns you expect in your CSV file:

 String[] tokens = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", 99);
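
As a minimal sketch of how the limit argument changes the result, the snippet below runs the question's sample line through the same regular expression with the default limit and with a negative limit. The class, constant, and method names are hypothetical. It relies on the documented behavior of String.split: a limit of zero discards trailing empty strings, while a negative limit applies the pattern as many times as possible and keeps them, so no column count has to be guessed.

```java
class SplitLimitDemo {
    // Matches a comma that is followed by an even number of double quotes
    // up to the end of the line, i.e. a comma outside any quoted field.
    static final String COMMA_OUTSIDE_QUOTES = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: split and keep trailing empty fields.
    static String[] splitKeepingEmpties(String line) {
        return line.split(COMMA_OUTSIDE_QUOTES, -1);
    }

    public static void main(String[] args) {
        String line = "H,\"TestItems_20100107.csv\",07/01/2010,20:00:00,"
                + "\"TT1198\",\"MOBb\",\"AMD\",NEW,,";

        // Default limit of 0: trailing empty strings are dropped.
        System.out.println(line.split(COMMA_OUTSIDE_QUOTES).length); // prints 8

        // Negative limit: trailing empty strings are kept.
        System.out.println(splitKeepingEmpties(line).length);        // prints 10
    }
}
```

The quotes themselves stay in the returned fields, so a field such as "MOBb" still needs its surrounding quotes stripped afterwards.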
Mark Byers
  • Actually, wouldn't it be better to specify -1 instead of 99? According to the docs, the difference between 0 and non-positive numbers is that with 0 "trailing empty strings will be discarded." – Powerlord Feb 11 '10 at 13:50
  • @R. Bemrose: I missed that bit, but if you're right then yes that sounds better. – Mark Byers Feb 11 '10 at 13:53
  • @bemrose You guys are amazing! Thanks for the help though I almost cracked the reg exp :D – Sandeep Jul 29 '10 at 07:51

I came across this same problem today and found a simple solution for CSV files: append an extra field containing just one space to the line when the split is executed:

(line + ", ").split(",");

This way, no matter how many consecutive empty fields exist at the end of a line, split() will always return n+1 fields, where n is the number of fields in the line.

Example session (using bsh)

bsh % line = "H,\"TestItems_20100107.csv\",07/01/2010,20:00:00,\"TT1198\",\"MOBb\",\"AMD\",NEW,,";
bsh % System.out.println(line);
H,"TestItems_20100107.csv",07/01/2010,20:00:00,"TT1198","MOBb","AMD",NEW,,
bsh % String[] items = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
bsh % System.out.println(items.length);
8
bsh % items = (line + ", ").split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
bsh % System.out.println(items.length - 1 );
10
bsh %
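
The same idea can be sketched as a plain Java method that also removes the extra field again, so callers get exactly n fields. This is only an illustration under the assumptions of the session above; the class and method names are hypothetical, and it uses the quote-aware regular expression from the question rather than a bare ",".

```java
import java.util.Arrays;

class SentinelSplitDemo {
    // Comma outside quoted fields, as in the question.
    static final String COMMA_OUTSIDE_QUOTES = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: append a one-space sentinel field so the last
    // element is never empty, split, then drop the sentinel again.
    static String[] splitWithSentinel(String line) {
        String[] parts = (line + ", ").split(COMMA_OUTSIDE_QUOTES);
        return Arrays.copyOf(parts, parts.length - 1);
    }

    public static void main(String[] args) {
        String line = "H,\"TestItems_20100107.csv\",07/01/2010,20:00:00,"
                + "\"TT1198\",\"MOBb\",\"AMD\",NEW,,";
        System.out.println(splitWithSentinel(line).length); // prints 10
    }
}
```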
jee