
Since we know that each line of a CSV file is terminated by \r\n, we can read the lines easily with code like this:

scanner.useDelimiter("\r\n");
while (scanner.hasNext()) {
    String line = scanner.next(); // one line of the file
}

But if a CSV field itself contains "\r\n", this code doesn't work. Take this CSV, for example:

Row1: "abc\r\nabc","abc","abc"
Row2: "efg", "efg", "efg"
Row3: "hjk", "hjk"

I would like the scanner to read it in as these three records:

"abc\r\nabc","abc","abc"
"efg", "efg", "efg"
"hjk", "hjk"

but with "\r\n" as the delimiter, the result is:

"abc
abc","abc","abc"
"efg", "efg", "efg"
"hjk", "hjk"
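A small, self-contained reproduction of this behaviour may help. It is a sketch that assumes the CSV content is already in a String (the class and method names are hypothetical) and uses the same useDelimiter("\r\n") call as above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class NaiveSplitDemo {
    // Splits on every CRLF, exactly as scanner.useDelimiter("\r\n") does.
    static List<String> naiveRecords(String csv) {
        List<String> records = new ArrayList<>();
        try (Scanner scanner = new Scanner(csv)) {
            scanner.useDelimiter("\r\n");
            while (scanner.hasNext()) {
                records.add(scanner.next());
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // The question's three rows; the first field contains a real CRLF.
        String csv = "\"abc\r\nabc\",\"abc\",\"abc\"\r\n"
                + "\"efg\", \"efg\", \"efg\"\r\n"
                + "\"hjk\", \"hjk\"";
        List<String> records = naiveRecords(csv);
        System.out.println(records.size()); // prints 4, not 3
        System.out.println(records.get(0)); // prints "abc
    }
}
```

The delimiter has no notion of quoting, so the first record is cut at its embedded CRLF and three logical records come back as four tokens.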

What should I change? How can I modify scanner.useDelimiter("\r\n") so that the pattern works?

Duncan Jones
Pengzhi Zhou
  • Shouldn't the delimiter be "," instead? – Majid Laissi Sep 09 '12 at 20:10
  • I want to use the scanner to separate lines. At first I used BufferedReader, but it stops at "\n", and I just found out that scanner.useDelimiter("\r\n") doesn't work either, since a CSV field may contain "\r\n". – Pengzhi Zhou Sep 09 '12 at 20:10
  • "," can also appear inside a CSV field. In my code, I try to separate the lines first, reading each CSV line in, and then split each line into fields using another pattern. – Pengzhi Zhou Sep 09 '12 at 20:15
  • If \r\n is between the quotation marks, then it's not a CSV file. – gigadot Sep 09 '12 at 20:17
  • I was wrong: https://tools.ietf.org/html/rfc4180 – gigadot Sep 09 '12 at 20:26
  • @gigadot, really? So you mean a line like ("abc\r\nabc","abc","abc") cannot appear in a CSV file? – Pengzhi Zhou Sep 09 '12 at 20:26

3 Answers


Firstly, I would recommend you research existing CSV parsing libraries. I imagine they will do a very good job coping with anything that exists between your field delimiters (""), including the same character(s) that terminate your lines. See related question: CSV API for Java.

Failing that, I would attempt to implement the solutions presented in this SO question: Java: splitting a comma-separated string but ignoring commas in quotes.
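Applied to record separators instead of commas, the lookahead trick from that linked question might look like the following sketch. It assumes the whole file fits in memory as a String, and the class name is hypothetical. A CRLF counts as a separator only when an even number of quote characters follows it, which means it is not inside a quoted field:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class QuoteAwareRecordSplit {
    // CRLF followed by an even number of '"' up to the end of input,
    // i.e. a CRLF that is outside any quoted field.
    static final Pattern RECORD_SEPARATOR =
            Pattern.compile("\\r\\n(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    static List<String> splitRecords(String content) {
        return Arrays.asList(RECORD_SEPARATOR.split(content));
    }

    public static void main(String[] args) {
        String csv = "\"abc\r\nabc\",\"abc\",\"abc\"\r\n"
                + "\"efg\", \"efg\", \"efg\"\r\n"
                + "\"hjk\", \"hjk\"";
        List<String> records = splitRecords(csv);
        System.out.println(records.size()); // prints 3
    }
}
```

Because the lookahead rescans the rest of the input at every CRLF, this is quadratic in the file size and only reasonable for small files; a single pass over the characters scales better.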

Duncan Jones
  • "," can also appear inside a CSV field. In my code, I try to separate the lines first, reading each CSV line in, and then split each line into fields using another pattern. I have found a very useful pattern for the fields, but my problem now is that I can't separate the lines correctly. Maybe my approach is just naive, but thanks anyway. – Pengzhi Zhou Sep 09 '12 at 20:19
  • @PengzhiZhou I would *strongly* recommend you seek a CSV library. Don't re-invent the wheel on this one! People have already battled these issues and have succeeded. See also the new link in my answer. – Duncan Jones Sep 09 '12 at 20:20
  • Thanks for your kind suggestion. Actually, this is a work test, and I don't think they would want to see me using another CSV library. – Pengzhi Zhou Sep 09 '12 at 20:25
  • @PengzhiZhou Then see the other link I added to my edited answer. – Duncan Jones Sep 09 '12 at 20:26
  • I have already read that post; that is the pattern I use to split each line into fields on ",". But to apply it, I first need to read a line in as a string. I have a CSV file that I need to split into lines first, and I have no idea how to do that. If I apply that pattern directly to the entire file, I will have no way to reconstruct the file from the resulting data. – Pengzhi Zhou Sep 09 '12 at 20:34
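To read records first and fields second, as described in this comment, one option is a single pass over the characters that tracks whether it is currently inside a quoted field. The sketch below is hypothetical (CsvRecordReader is not from any library). It assumes a record ends at a CRLF or LF outside quotes, keeps the quote characters in the returned text, and reuses the comma lookahead pattern from the linked question for the fields:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class CsvRecordReader {
    // Comma followed by an even number of quotes, i.e. not inside a quoted field.
    static final String FIELD_SEPARATOR = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    private final Reader in;

    CsvRecordReader(Reader in) {
        this.in = in;
    }

    // Returns the next raw record with its quotes kept and its line break removed,
    // or null at end of input. Only a line break outside quotes ends a record.
    String nextRecord() {
        StringBuilder sb = new StringBuilder();
        boolean inQuotes = false;
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (c == '"') {
                    inQuotes = !inQuotes; // an escaped "" toggles twice, leaving the state unchanged
                    sb.append('"');
                } else if (c == '\n' && !inQuotes) {
                    int last = sb.length() - 1;
                    if (last >= 0 && sb.charAt(last) == '\r') {
                        sb.setLength(last); // drop the CR of a CRLF pair
                    }
                    return sb.toString();
                } else {
                    sb.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // End of input: return a last record that had no trailing line break.
        return sb.length() > 0 ? sb.toString() : null;
    }

    // Splits one record into fields, keeping empty trailing fields.
    static String[] splitFields(String record) {
        return record.split(FIELD_SEPARATOR, -1);
    }

    public static void main(String[] args) {
        String csv = "\"abc\r\nabc\",\"abc\",\"abc\"\r\n"
                + "\"efg\", \"efg\", \"efg\"\r\n"
                + "\"hjk\", \"hjk\"\r\n";
        CsvRecordReader reader = new CsvRecordReader(new StringReader(csv));
        String record;
        while ((record = reader.nextRecord()) != null) {
            String[] fields = splitFields(record);
            System.out.println(fields.length + " fields: " + record.replace("\r\n", "\\r\\n"));
        }
    }
}
```

Since the record text is returned unchanged apart from the separator, the original file can be rebuilt by joining the records with "\r\n". Removing the surrounding quotes and unescaping "" inside fields is left out of this sketch.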

It's always tempting to roll your own solution, but it's issues like embedded newlines that make it far easier to use a CSV library.

Super CSV caters for embedded newlines (it's compliant with RFC4180 - the MIME type definition of CSV), as well as embedded quotes and delimiters (all configurable). As well as being able to read into a List, Map or POJO, you can define processors to convert or validate your data, and you'll get a lot more information when something goes wrong (the line number, row number, column number, etc).

We (the Super CSV team) have just released a new version, which brings many improvements and bug fixes as well as a powerful new extension that maps between CSV files and POJOs using Dozer.

It's available for download on SourceForge or Maven.

James Bassett

You could try the delimiter:

 "\"\r\n\""

which should work provided each line begins and ends with a ". However, it would still break if one of your strings contained just a newline.
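A variant of this delimiter using lookbehind and lookahead checks for the surrounding quotes without consuming them, so each record keeps its own quote characters (the plain "\"\r\n\"" pattern swallows them). This is a sketch on an in-memory String with a hypothetical class name. It still finds no separator when a line starts or ends with an unquoted field, and it still splits inside a quoted field whose text contains a quote, CRLF, quote sequence, such as "x""\r\n""y":

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class QuotedLineSplit {
    // Lookbehind/lookahead require a '"' on each side of the CRLF without
    // consuming it, so records keep their opening and closing quotes.
    static final Pattern BETWEEN_QUOTES = Pattern.compile("(?<=\")\\r\\n(?=\")");

    static List<String> splitRecords(String content) {
        return Arrays.asList(BETWEEN_QUOTES.split(content));
    }

    public static void main(String[] args) {
        String csv = "\"abc\r\nabc\",\"abc\",\"abc\"\r\n"
                + "\"efg\", \"efg\", \"efg\"\r\n"
                + "\"hjk\", \"hjk\"";
        System.out.println(splitRecords(csv).size()); // prints 3
        // Unquoted first field on the second line: no separator matches.
        System.out.println(splitRecords("a,\"b\"\r\nc,\"d\"").size()); // prints 1
    }
}
```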

MikeFHay
  • This delimiter doesn't find any match, so I only get the entire file as one string. I don't think each \r\n line break in my file is surrounded by quotes. I have also tested it in my code; sorry, it doesn't work. – Pengzhi Zhou Sep 09 '12 at 21:05