I know how to use Java's String.split method, like this:

String s="a,b,c";
s.split(",");

But how do I split the following lines?

 FirstName,LastName,Companies,Title,EmailAddress,PhoneNumbers,Tags,Note
 Tom,Smith,ABC Co.,Manager,toms@xyz.com,,,
 John,White,"Some-Company, Inc.","Network Manager, Architect",xyz@abc.com,,,

The first two lines are easy to handle, but the third is a problem because it has commas inside double-quoted fields. Any suggestions?

Edit: I've looked at the two similar questions and their answers, but they differ from mine. My delimiter is a comma, while theirs is a space, and my data also contains spaces, which complicates matters, so those answers won't work here. Please don't mark this as a duplicate.

Frank
  • Use a CSV library to help you parse the text. – Hovercraft Full Of Eels Jun 22 '17 at 21:04
  • @Carcigenicate: ***NO***. Regex should not be used for this sort of thing and does not work well for nested symbols. Using regex will lead to two problems: the original one, and regex code that is difficult to use and maintain. – Hovercraft Full Of Eels Jun 22 '17 at 21:05
  • @HovercraftFullOfEels Really? I would have thought this is exactly the kind of thing regex would be useful for. OK, removing. – Carcigenicate Jun 22 '17 at 21:06
  • Check out [Apache Commons CSV](https://commons.apache.org/proper/commons-csv/). – Hovercraft Full Of Eels Jun 22 '17 at 21:06
  • @Carcigenicate: please check out [Now you have Two Problems](https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/). – Hovercraft Full Of Eels Jun 22 '17 at 21:07
  • How frequently do you expect strings like this to contain commas? If it's not frequent, you can still split on commas, then check the first character of each field for a " character. If found, merge with the subsequent fields until you reach the terminating " character. – deckeresq Jun 22 '17 at 21:07
  • @Carcigenicate: also check out [parsing CSV with regex](https://softwareengineering.stackexchange.com/questions/166454/can-the-csv-format-be-defined-by-a-regex). – Hovercraft Full Of Eels Jun 22 '17 at 21:08
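
For illustration, the split-then-merge idea from deckeresq's comment might look like the sketch below. The class and method names (`SplitAndMerge`, `splitAndMerge`) are hypothetical, and the sketch assumes quotes appear only at the start and end of a field, with no escaped quotes (`""`) inside. A CSV library, as suggested above, remains the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;

class SplitAndMerge {
    // Hypothetical helper: split on every comma, then re-join the pieces
    // that belong to a single quoted field. Escaped quotes are not handled.
    static List<String> splitAndMerge(String line) {
        // A limit of -1 keeps trailing empty strings, so ",,," is not lost.
        String[] parts = line.split(",", -1);
        List<String> fields = new ArrayList<>();
        StringBuilder pending = null; // non-null while inside a quoted field

        for (String part : parts) {
            if (pending != null) {
                // Still inside a quoted field: put back the comma that split() removed.
                pending.append(',').append(part);
                if (part.endsWith("\"")) {
                    pending.setLength(pending.length() - 1); // drop the closing quote
                    fields.add(pending.toString());
                    pending = null;
                }
            } else if (part.length() >= 2 && part.startsWith("\"") && part.endsWith("\"")) {
                // Quoted field without an embedded comma: just strip the quotes.
                fields.add(part.substring(1, part.length() - 1));
            } else if (part.startsWith("\"")) {
                // Opening quote with no closing quote yet: start merging.
                pending = new StringBuilder(part.substring(1));
            } else {
                fields.add(part);
            }
        }
        if (pending != null) {
            fields.add(pending.toString()); // unterminated quote: keep what we have
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "John,White,\"Some-Company, Inc.\",\"Network Manager, Architect\",xyz@abc.com,,,";
        List<String> fields = splitAndMerge(line);
        for (int i = 0; i < fields.size(); i++) {
            System.out.println("<" + (i + 1) + "> [" + fields.get(i) + "]");
        }
    }
}
```

On the third sample line this produces 8 fields, with the two quoted values merged back together and the three trailing empty fields preserved.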

1 Answer

After some research and testing, I found an answer. Here is some sample code:

import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str="John,White,\"Some-Company, Inc.\",\"Network Manager, Architect\",xyz@abc.com,,,";
// Pre-fill 8 slots so trailing empty fields still appear in the output
Vector<String> v=new Vector<String>(8);
for (int i=0;i<8;i++) v.add("");

// Group 1: text inside double quotes; group 2: an unquoted, non-empty field
String regex="\"([^\"]*)\"|([^,\\s][^\\,]*[^,\\s]*)";
Matcher m=Pattern.compile(regex).matcher(str);
int index=0;
while (m.find())
{
  if (m.group(1) != null) v.set(index,"Quoted [" + m.group(1) + "]");
  else v.set(index," Plain [" + m.group(2) + "]");
  index++;
}
for (int i=0;i<v.size();i++) System.out.println("<"+(i+1)+"> "+v.elementAt(i));

Since I also have to account for empty items at the end of the line, I pre-fill the vector with 8 empty strings as placeholders.
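
One caveat with the regex above: `find()` never matches an empty field, so an empty field in the middle of a line (for example `a,,b`) is skipped and every later value shifts one slot to the left. As a possible alternative, here is a minimal sketch of a single-pass scanner that keeps every field in position. The names `CsvLineSplitter` and `splitCsvLine` are hypothetical, and treating a doubled quote (`""`) inside a quoted field as a literal quote is an assumption based on common CSV conventions, not something the sample data requires.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLineSplitter {
    // Hypothetical helper: walks the line once, tracking whether the
    // current character is inside a double-quoted field.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: literal quote (assumed convention)
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas are ordinary text inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // end of field, possibly empty
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String str = "John,White,\"Some-Company, Inc.\",\"Network Manager, Architect\",xyz@abc.com,,,";
        List<String> fields = splitCsvLine(str);
        for (int i = 0; i < fields.size(); i++) {
            System.out.println("<" + (i + 1) + "> [" + fields.get(i) + "]");
        }
    }
}
```

Because a field is emitted at every unquoted comma, no placeholder vector is needed: the sample line yields 8 entries, and `a,,b` yields three, with the empty one in the middle.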

Frank