2

How can I parse my CSV file without parsing the first line?

This class works, but I don't want to parse the header of my CSV.

import groovy.sql.Sql

class CSVParserService {

    boolean transactional = false

    def sql = Sql.newInstance("jdbc:mysql://localhost/RProject", "xxx", "xxx", "com.mysql.jdbc.Driver")

    def CSVList = sql.dataSet("ModuleSet")

    def CSVParser(String filepath, boolean header) {

      def parse = new File(filepath)

      // split and populate GeneInfo
      parse.splitEachLine(',') {fields ->

        CSVList.add(
                Module : fields[0],
                Function : fields[1],
                Systematic_Name : fields[2],
                Common_Name : fields[3],
              )

         return CSVList
      }

    }
}

I changed my class, so now I have:

import groovy.sql.Sql

class CSVParserService {

    boolean transactional = false

    def sql = Sql.newInstance("jdbc:mysql://localhost/RProject", "xxx", "xxx", "com.mysql.jdbc.Driver")

    def CSVList = sql.dataSet("ModuleSet")

    def CSVParser(String filepath, boolean header) {

    def parse = new File(filepath).readLines()[1..-1]

    parse.each {line ->

      // split and populate GeneInfo
      line.splitEachLine(',') {fields ->

        CSVList.add(
                Module : fields[0],
                Function : fields[1],
                Systematic_Name : fields[2],
                Common_Name : fields[3],
              )

         return CSVList
      }
     }
    }
}

This works fine until it reaches this part of my CSV:
"Homo sapiens interleukin 4 receptor (IL4R), transcript variant 1, mRNA."

When my parser gets to this part, it cuts it into 3 pieces (it should stay as 1):
- Homo sapiens interleukin 4 receptor (IL4R)
- transcript variant 1
- mRNA.

How can I fix that? Thank you for your help.

-- New comment -- Here is a copy of the 2nd line of my CSV:
"M6.6",NA,"ILMN_1652185",NA,NA,"IL4RA; CD124",NA,"NM_000418.2","16","16p12.1a","Homo sapiens interleukin 4 receptor (IL4R), transcript variant 1, mRNA.",3566,...

As you can see, my problem is with the field "Homo sapiens interleukin 4 receptor (IL4R), transcript variant 1, mRNA."; I don't want to cut the text between the opening and closing quotes. My parser should split only on commas outside quotes, not on commas between quotes. For example, given "part1","part2","part3", I just want to get part1, part2 and part3, and if there are commas inside part2, I don't want to split on them.

To sum up, I just want to ignore commas in quoted elements.

Bill the Lizard
Fabien Barbier

2 Answers

1

You can read every line of the file except the first into a List using:

List<String> allLinesExceptHeader = new File(filepath).readLines()[1..-1]

Each line of the file (an element of allLinesExceptHeader) can then be parsed using code similar to that shown in the question:

allLinesExceptHeader.each {line ->    
    // Code to parse each line goes here
}
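
As a minimal sketch of the same idea, the snippet below skips the header with drop(1) instead of the [1..-1] range; drop(1) simply returns an empty list when the file contains only a header line. The temporary file and its two-column content are hypothetical sample data, not the real CSV from the question, and the naive split(',') is only there to show where the per-line parsing would go.

```groovy
// Hypothetical sample data standing in for the real CSV file.
def file = File.createTempFile('modules', '.csv')
file.deleteOnExit()
file.text = 'Module,Function\n"M6.6",NA\n"M6.7",NA\n'

// drop(1) skips the first (header) line and keeps the rest.
List<String> allLinesExceptHeader = file.readLines().drop(1)

allLinesExceptHeader.each { line ->
    // Naive split for illustration only; it does not respect quoted commas.
    println line.split(',').toList()
}
```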
Dónal
1

OK, I have my fix!

Here is the code:

import groovy.sql.Sql

class CSVParserService {

    boolean transactional = false

    def sql = Sql.newInstance("jdbc:mysql://localhost/RProject", "xxx", "xxx", "com.mysql.jdbc.Driver")

    def CSVList = sql.dataSet("ModuleSet")

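    def CSVParser(String filepath, boolean header) {

        // Read all lines and skip the header (first line)
        def parse = new File(filepath).readLines()[1..-1]

        // Matches a comma only when it is followed by an even number of
        // double quotes, i.e. a comma outside a quoted field
        def token = ',(?=([^\"]*\"[^\"]*\")*[^\"]*$)'

        parse.each { line ->

            // Split on unquoted commas and populate the ModuleSet table
            line.splitEachLine(token) { fields ->

                CSVList.add(
                    Module : fields[0],
                    Function : fields[1],
                    Systematic_Name : fields[2],
                    Common_Name : fields[3]
                )
            }
        }
    }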
}

See this post for more details: "Java: splitting a comma-separated string but ignoring commas in quotes".
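
To see what the pattern does in isolation, here is a small sketch that applies it to a single row. The row is a shortened version of the line quoted in the question (some columns are left out), and stripping the surrounding quotes afterwards is an extra step that the code above does not perform.

```groovy
// Same pattern as above: a comma followed by an even number of double quotes.
def token = ',(?=([^\"]*\"[^\"]*\")*[^\"]*$)'

// Shortened sample row based on the line quoted in the question.
def line = '"M6.6",NA,"ILMN_1652185","Homo sapiens interleukin 4 receptor (IL4R), transcript variant 1, mRNA.",3566'

// Split on unquoted commas, then strip the surrounding quotes from each field.
def fields = line.split(token).collect { it.replaceAll('^"|"$', '') }

fields.each { println it }
```

The quoted description stays in one piece even though it contains two commas. This approach does not unescape doubled quotes inside a field and cannot handle a quoted field that spans several lines; for such input a dedicated CSV library is likely the safer choice.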

Community
Fabien Barbier
  • Did you consider using a CSV parser that does all that for you, like Ostermiller's? http://ostermiller.org/utils/CSV.html – Philippe Aug 23 '10 at 13:27
  • 1
    Here's another csv parsing lib for Groovy that I created a while back: [GroovyCSV](http://xlson.com/groovycsv/). It's based on opencsv. – xlson Apr 11 '11 at 13:43