
I'm trying to parse some data sets to throw into a database. The CSV files that I'm working with have the data separated by commas. However, some of the data is enclosed in quotation marks and includes commas that are part of the string. How do I step through each line of data and parse out each data entry given this obstacle?

Edit: CSV parsers are not an option because this program will also be used for non-CSV files, and I'm trying to minimize external libraries.

I found the issue because my code here would just split right through the quoted strings:

while (bufferinput.hasNext()) {
    nextline = bufferinput.nextLine();
    // Naive split: this also breaks on commas inside quoted fields.
    dataArray = nextline.split(",");

    for (i = 0; i < colNum; i++) {
        // split(",") drops trailing empty strings, so a row that ends in empty
        // cells yields fewer than colNum entries; treat the missing ones as null.
        if (i < dataArray.length && !dataArray[i].isEmpty()) {
            sqlData.add(dataArray[i]);
        } else {
            // Empty cell between non-empty cells, or a missing cell at the end of the row.
            sqlData.add(null);
        }
    }
}

An example of this data would be something like `,,"Ontario, Canada",xxxxx.....`

Locations are the biggest cause of it, since the addresses contain commas.

user3521471
  • Use a [CSV parser](http://commons.apache.org/proper/commons-csv/)? – GriffeyDog Jun 18 '14 at 20:49
  • In addition to the above-linked Apache Commons CSV parser, which isn't yet at a stable release (and hence may not be as feature rich yet), there are various other open-source and permissively licensed Java CSV parser libraries out there. – ajp15243 Jun 18 '14 at 20:52
  • @GriffeyDog I can't, because I'm using this program to parse other files which aren't CSVs as well. – user3521471 Jun 18 '14 at 20:52
  • @user3521471 Do those non-CSV files also run through your posted code? – ajp15243 Jun 18 '14 at 20:53
  • @ajp15243 yes they do – user3521471 Jun 18 '14 at 20:54
  • Use the right tool for the problem. Use a CSV parser for parsing your CSV files, and use something else as appropriate for the others. – MadProgrammer Jun 18 '14 at 20:56
  • @user3521471 If you don't want to use any external tools, I think you're going to have to manually do the work that `split()` is doing automatically, since you don't actually want to split on all `,`s. Manually doing it will allow you to selectively choose which `,`s you want to use as a separator. – ajp15243 Jun 18 '14 at 20:57
  • Maybe you can consider using regex like here: http://stackoverflow.com/questions/6432408/regular-expression-to-match-csv-delimiters – Lexandro Jun 18 '14 at 20:59
  • @ajp15243 That's what I thought I was going to have to do. Just solved the problem by changing the split line to this: `dataArray = nextline.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");` – user3521471 Jun 18 '14 at 21:02
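
For anyone who would rather not use a regex, the "do manually what `split()` does" idea from the comments above could look roughly like the following. This is only a minimal sketch, not code from the thread: the class name `QuoteAwareSplitter` and the method `splitLine` are made up for illustration, and it assumes quotes are balanced and that a quote character never appears escaped inside a quoted field.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareSplitter {
    // Hypothetical helper: splits a line on commas, but ignores commas that
    // sit between a pair of double quotes.
    // Assumption: quotes are balanced and never escaped (no "" inside a field).
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;      // entering or leaving a quoted field
                current.append(c);         // keep the quote, as split() would
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());  // separator outside quotes
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());    // last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitLine(",,\"Ontario, Canada\",xxxxx"));
    }
}
```

Unlike `split(",")`, this version keeps empty fields at the end of a row, so every row with the same number of separators yields the same number of entries.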

1 Answer


So this was a really easy fix. All I had to change was my split line to this:

dataArray = nextline.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
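
To see why this works: the lookahead `(?=([^\"]*\"[^\"]*\")*[^\"]*$)` only lets a comma match when the rest of the line contains an even number of quote characters, which means the comma is outside any quoted string. Here is a small self-contained sketch using the sample row from the question. The class name `RegexSplitDemo` and the `-1` limit argument are my own additions, not part of the fix above; the limit just tells `split` to keep empty fields at the end of a row instead of dropping them. It still assumes balanced quotes.

```java
import java.util.Arrays;

class RegexSplitDemo {
    // Same pattern as in the answer: a comma followed by an even number of quotes.
    static final String SEPARATOR = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    static String[] splitLine(String line) {
        // Assumption for this sketch: limit -1 so trailing empty cells are kept.
        return line.split(SEPARATOR, -1);
    }

    public static void main(String[] args) {
        String[] fields = splitLine(",,\"Ontario, Canada\",xxxxx,,");
        System.out.println(fields.length);            // number of cells in the row
        System.out.println(Arrays.toString(fields));  // quoted field stays in one piece
    }
}
```

One thing to keep in mind is that the lookahead rescans the rest of the line for every comma, so on very long lines the hand-written loop approach may be the cheaper option.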
ajp15243
user3521471