1

I'm stuck on a problem with regular expressions.

Suppose I have a string that I've read from a file, containing:

first_name, Hello, "test Drive"

Then I just split it using , as the delimiter, and I get

myString[0] = "first_name";
myString[1] = "Hello";
myString[2] = "\"test Drive\"";

My problem is when the system reads a string with a , inside the double quotes:

first_name, Hello, "test, Drive"

I get

myString[0] = "first_name";
myString[1] = "Hello";
myString[2] = "\"test";
myString[3] = "Drive\"";

My question

How would I split a string using , as the delimiter, with the condition that the , is not enclosed by " on its left and right side? Or is there some workaround that would be much easier?

Thanks.

Bk Santiago
  • Therein lies the fundamental problem with using a context-free language to extract context-sensitive data. The good news is that with modern regex implementations you can do most of this, but you won't get past the fact that quotes within quotes within ... eventually reaches the limit of what is possible. Consider using a proper parser, such as from a CSV library. – caskey Sep 05 '13 at 06:17
  • You have to be more precise and more formal about the grammar used. Can the first segment have quotes? And the second? And what happens when quotes are present in a quoted string? Are they escaped? A BNF grammar definition may be useful... – Aubin Sep 05 '13 at 06:21
  • there are lot of duplicates you will get, search google `skip comma in double quote string split regex`, first page gives you only stackoverflow links :) – Nandkumar Tekale Sep 05 '13 at 06:22
  • http://stackoverflow.com/questions/1757065/java-splitting-a-comma-separated-string-but-ignoring-commas-in-quotes – Nandkumar Tekale Sep 05 '13 at 06:30

2 Answers

2

It looks like you're working on a CSV file. Have you already considered using one of the CSV libraries to do this (like opencsv or supercsv)?
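
If adding a dependency is not an option, the core of what such a library does for this particular case can be sketched by hand. This is only a minimal illustration, not how opencsv or supercsv are implemented: `QuoteAwareSplitter` and `split` are hypothetical names, and the sketch assumes the double quotes are balanced and never escaped inside a quoted field.

```java
import java.util.ArrayList;
import java.util.List;

final class QuoteAwareSplitter {

    // Splits on commas that lie outside double quotes.
    // Assumption: quotes are balanced and there are no escaped quotes.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;      // entering or leaving a quoted section
                current.append(c);         // keep the quote, as in the question
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString().trim());
                current.setLength(0);      // start the next field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString().trim()); // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // prints [first_name, Hello, "test, Drive"]
        System.out.println(split("first_name, Hello, \"test, Drive\""));
    }
}
```

A real CSV library additionally handles escaped quotes, embedded line breaks and different separators, which is why it is usually the safer choice.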

MrD
  • It's just a plain `*.txt` file with some `"` in it, but I'll check out some of the libraries. – Bk Santiago Sep 05 '13 at 06:16
  • If that file contains a specific formatting such as: "there are three fields of each line, separated by comma, and the last field is a quoted string", then that is considered as a CSV file =D – justhalf Sep 05 '13 at 06:30
0

If you know in advance how many fields the string has, you can pass that number as the limit argument to split, and the code below will work for you:

        String test = "first_name, Hello, \"test, Drive\"";

        // A limit of 3 stops splitting after the second ", ",
        // so the quoted field stays in one piece.
        String[] tests = test.split(", ", 3);

        System.out.println("1  " + tests[0]);
        System.out.println("2  " + tests[1]);
        System.out.println("3  " + tests[2]);

output:
   1  first_name
   2  Hello
   3  "test, Drive"

If you don't know the limit, the input string has to follow a stricter format: a comma inside the quotes must not be followed by a space, while the delimiter between fields is always a comma followed by a space.

        String test = "first_name, Hello, \"test,Drive\"";

        // Splits only on ", " so the bare "," inside the quotes is left alone.
        String[] tests = test.split(", ");

        System.out.println("1  " + tests[0]);
        System.out.println("2  " + tests[1]);
        System.out.println("3  " + tests[2]);

output:
   1  first_name
   2  Hello
   3  "test,Drive"
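
When neither the number of fields nor the spacing can be guaranteed, another option is the lookahead regex discussed in the duplicate question linked in the comments above. The sketch below is an illustration under assumptions, not a full CSV parser: `RegexQuoteSplit` and `split` are hypothetical names, and it assumes the double quotes are balanced and not escaped. The pattern only matches a comma that is followed by an even number of double quotes up to the end of the string, which means the comma is outside any quoted section.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class RegexQuoteSplit {

    // A comma (with optional surrounding whitespace) followed by an even
    // number of double quotes until end of input, i.e. a comma outside quotes.
    private static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile("\\s*,\\s*(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    static String[] split(String line) {
        // Limit -1 keeps trailing empty fields instead of dropping them.
        return COMMA_OUTSIDE_QUOTES.split(line, -1);
    }

    public static void main(String[] args) {
        String test = "first_name, Hello, \"test, Drive\"";
        // prints [first_name, Hello, "test, Drive"]
        System.out.println(Arrays.toString(split(test)));
    }
}
```

The lookahead rescans the rest of the line for every comma, so on very long lines a CSV library or a single-pass scanner is likely to be faster.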
pappu_kutty