12

I have a CSV file that I am having trouble parsing. I am using the opencsv library. Here is what my data looks like and what I am trying to achieve.

RPT_PE,CLASS,RPT_MKT,PROV_CTRCT,CENTER_NM,GK_TY,MBR_NM,MBR_PID "20150801","NULL","33612","00083249P PCP602","JOE SMITH ARNP","NULL","FRANK, LUCAS E","50004655200"

The issue I am having is the member name ("FRANK, LUCAS E") is being split into two columns and the member name should be one. Again I'm using opencsv and a comma as the separator. Is there any way I can ignore the commas inside the double-quotes?

        public void loadCSV(String csvFile, String tableName,
            boolean truncateBeforeLoad) throws Exception {

        CSVReader csvReader = null;
        if (null == this.connection) {
            throw new Exception("Not a valid connection.");
        }
        try {

            csvReader = new CSVReader(new FileReader(csvFile), this.seprator);

        } catch (Exception e) {
            e.printStackTrace();
            throw new Exception("Error occured while executing file. "
                    + e.getMessage());
        }
        String[] headerRow = csvReader.readNext();

        if (null == headerRow) {
            throw new FileNotFoundException(
                    "No columns defined in given CSV file."
                    + "Please check the CSV file format.");
        }

        String questionmarks = StringUtils.repeat("?,", headerRow.length);
        questionmarks = (String) questionmarks.subSequence(0, questionmarks
                .length() - 1);

        String query = SQL_INSERT.replaceFirst(TABLE_REGEX, tableName);
        System.out.println("Base Query: " + query);
        String headerRowMod = Arrays.toString(headerRow).replaceAll(", ]", "]");
        String[] strArray = headerRowMod.split(",");

        query = query
                .replaceFirst(KEYS_REGEX, StringUtils.join(strArray, ","));

        System.out.println("Add Headers: " + query);
        query = query.replaceFirst(VALUES_REGEX, questionmarks);
        System.out.println("Add questionmarks: " + query);

        String[] nextLine;
        Connection con = null;
        PreparedStatement ps = null;
        try {
            con = this.connection;
            con.setAutoCommit(false);
            ps = con.prepareStatement(query);

            if (truncateBeforeLoad) {
                //delete data from table before loading csv
                con.createStatement().execute("DELETE FROM " + tableName);
            }

            final int batchSize = 1000;
            int count = 0;
            Date date = null;
            while ((nextLine = csvReader.readNext()) != null) {
                System.out.println("Next Line: " + Arrays.toString(nextLine));
                if (null != nextLine) {
                    int index = 1;
                    for (String string : nextLine) {
                        date = DateUtil.convertToDate(string);
                        if (null != date) {
                            ps.setDate(index++, new java.sql.Date(date
                                    .getTime()));
                        } else {
                            ps.setString(index++, string);
                        }
                    }
                    ps.addBatch();
                }
                if (++count % batchSize == 0) {
                    ps.executeBatch();
                }
            }
            ps.executeBatch(); // insert remaining records
            con.commit();
        } catch (SQLException | IOException e) {
            con.rollback();
            e.printStackTrace();
            throw new Exception(
                    "Error occured while loading data from file to database."
                    + e.getMessage());
        } finally {
            if (null != ps) {
                ps.close();
            }
            if (null != con) {
                con.close();
            }
            csvReader.close();
        }
    }

    public char getSeprator() {
        return seprator;
    }

    public void setSeprator(char seprator) {
        this.seprator = seprator;
    }

    public char getQuoteChar() {
        return quoteChar;
    }

    public void setQuoteChar(char quoteChar) {
        this.quoteChar = quoteChar;
    }
}
ltd9938
  • 1,208
  • 1
  • 12
  • 26
KalebD
  • 121
  • 1
  • 6
  • According to the CSVReader documentation this case should be handled. Post the key parts of your code. – Eric J. Jul 31 '15 at 18:29
  • See my code example. – KalebD Jul 31 '15 at 19:09
  • I wrote a simple program , it seems working for me. instead of extra column, I got the full name with the comma which is expected.You can try it, it may give a bit clue – Jegg Jul 31 '15 at 19:44

4 Answers4

2

Did you try the the following?

CSVReader reader = new CSVReader(new FileReader("yourfile.csv"), ',');

I wrote a following program and it works for me, I got the following result:

[20150801] [NULL] [33612] [00083249P PCP602] [JOE SMITH ARNP] [NULL] [FRANK, LUCAS E] [50004655200]

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

import au.com.bytecode.opencsv.CSVReader;

public class CVSTest {

    /**
     * @param args
     */
    public static void main(String[] args) {
        CSVReader reader = null;
        try {

            reader = new CSVReader(new FileReader(
                    "C:/Work/Dev/Projects/Pure_Test/Test/src/cvs"), ',');
        } catch (FileNotFoundException e1) {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        }
        String[] nextLine;
        try {
            while ((nextLine = reader.readNext()) != null) {
                // nextLine[] is an array of values from the line
                System.out.println("[" + nextLine[0] + "] [" + nextLine[1]
                        + "] [" + nextLine[2] + "] [" + nextLine[3] + "] ["
                        + nextLine[4] + "] [" + nextLine[5] + "] ["
                        + nextLine[6] + "] [" + nextLine[7] + "]");
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

    }

}
Jegg
  • 541
  • 3
  • 11
0

According to the documentation, you can supply custom separator and quote characters in the constructor, which should deal with it:

CSVReader(Reader reader, char separator, char quotechar)

Construct your reader with , as separator and " as quotechar.

Ankit
  • 1,190
  • 15
  • 23
0

Your case should be handled out of the box with no special configuration required.

If you can't make it work, then just switch to uniVocity-parsers to do this for you - it's twice as fast in comparison to OpenCSV, requires much less code and is packed with features.

CsvParserSettings settings = new CsvParserSettings();     // you have many configuration options here - check the tutorial.

CsvParser parser = new CsvParser(settings);

List<String[]> allRows = parser.parseAll(new FileReader(new File("C:/Work/Dev/Projects/Pure_Test/Test/src/cvs")));

Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

josliber
  • 41,865
  • 12
  • 88
  • 126
Jeronimo Backes
  • 5,701
  • 2
  • 20
  • 28
0

It is simple to load your CSV as an SQL table into HSQLDB, then select rows from the table to insert into another database. HSQLDB handles commas inside quotes. You need to define your text source as "quoted". See this:

http://hsqldb.org/doc/2.0/guide/texttables-chapt.html

fredt
  • 22,590
  • 3
  • 37
  • 60