
I have a CSV file that I need to read line by line with the help of a Scanner and store only country names into an array of strings. Here is my CSV file:

World Development Indicators
Number of countries,4
Country Name,2005,2006,2007
Bangladesh,6.28776238,13.20573922,23.46762823
"Bahamas,The",69.21279415,75.37855087,109.340767
Brazil,46.31418452,53.11025849,63.67475185
Germany,94.55486999,102.2828888,115.1403608

This is what I have so far:

public String[] getCountryNames() throws IOException, FileNotFoundException{
    String[] countryNames = new String[3];
    int index = 0;
    BufferedReader br = new BufferedReader(new FileReader(fileName));
    br.readLine();
    br.readLine();
    br.readLine();
    String line = br.readLine();
    while((br.readLine() != null) && !line.isEmpty()){
        String[] countries = line.split(",");
        countryNames[index] = countries[0];
        index++;
        line = br.readLine();
    }
    System.out.println(Arrays.toString(countryNames));
    return countryNames;
}

Output:

[Bangladesh, Brazil, null]

For some reason it skips "Bahamas, The" and can't read Germany. Please help me, I have been stuck on this method for hours already. Thanks for your time and effort. The return should be an array of Strings (country names).

nbrooks
Dmytro Marych
  • Use an appropriate parsing library, like [Apache Commons CSV](https://commons.apache.org/proper/commons-csv/) or [OpenCSV](http://opencsv.sourceforge.net) – MadProgrammer Jan 29 '18 at 00:42
  • Whenever you call `br.readLine()` you are effectively reading a new line. It does not matter if you are doing it inside or outside of the loop or just for comparison. – DobromirM Jan 29 '18 at 00:42

2 Answers

You're calling readLine() more often than you process its result, so some lines are read and then discarded:

String line = br.readLine(); // Reads 1 line
while((br.readLine() != null) && !line.isEmpty()){ // Reads 1 line per iteration (and doesn't store it in a variable)
    String[] countries = line.split(",");
    countryNames[index] = countries[0];
    index++;
    line = br.readLine(); // Reads another line per iteration
}

A corrected version of the while loop reads each line once and stores it:

String line;

while((line = br.readLine()) != null && !line.isEmpty() && index < countryNames.length) {
    String[] countries = line.split(",");
    countryNames[index++] = countries[0];
}

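Notice that line is assigned inside the loop condition rather than in the loop body, so each line is read exactly once. Also note that countryNames is declared with length 3 while the file lists 4 countries; with the index guard in place the loop stops before Germany is stored, so size the array to 4 to keep every name.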
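
For reference, here is a minimal, self-contained sketch of that read-once loop. The class name ReadLoopSketch, the method firstFields, and the shortened sample data are made up for illustration. It collects the names in a List so the result doesn't depend on a hard-coded array length, and it still uses a plain split(","), so it assumes names without embedded commas.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name, used only for this illustration.
final class ReadLoopSketch {

    // Returns the first comma-separated field of every data row,
    // calling readLine() exactly once per row.
    static String[] firstFields(Reader source, int headerLines) {
        List<String> names = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            for (int i = 0; i < headerLines; i++) {
                br.readLine(); // skip a header row
            }
            String line;
            // The assignment happens inside the condition, so no line is discarded.
            while ((line = br.readLine()) != null && !line.isEmpty()) {
                names.add(line.split(",")[0]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Shortened, made-up sample with three header rows and no quoted names.
        String csv = "Title\n"
                + "Number of countries,3\n"
                + "Country Name,2005\n"
                + "Bangladesh,6.28\n"
                + "Brazil,46.31\n"
                + "Germany,94.55\n";
        // prints [Bangladesh, Brazil, Germany]
        System.out.println(Arrays.toString(firstFields(new StringReader(csv), 3)));
    }
}
```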

Jacob G.

There are two issues with your code for parsing this CSV file. As a few folks have pointed out, you're calling readLine on your reader too many times and discarding the output. Each read advances the stream, so you lose access to any data before the current read point. For example, br.readLine() != null reads a new line from the stream, checks that it isn't null, and then immediately discards it because the result isn't stored in a variable. That's the main reason you're losing data while reading.
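
To see that effect in isolation, here is a small sketch. The class and method names are hypothetical and the three-line input is made up; it only shows that a readLine() call used purely for a null check still consumes a line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical class name, used only for this illustration.
final class ReadLineConsumesSketch {

    // Performs one readLine() whose result is only compared to null,
    // then returns whatever the next readLine() yields.
    static String lineAfterDiscardedRead(String text) {
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            boolean hasData = br.readLine() != null; // consumes the first line; its text is not kept
            return hasData ? br.readLine() : null;   // the reader is already at the second line
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // prints Bahamas, because Bangladesh was consumed by the null check
        System.out.println(lineAfterDiscardedRead("Bangladesh\nBahamas\nBrazil\n"));
    }
}
```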

The second issue is your split condition. You're splitting on commas, which makes sense since this is a CSV file, but your data contains commas too (for example, "Bahamas, The"). You'll need a more specific split condition, as described in this post.

Here's an example of what this might look like (using a list for the countryNames instead of an array, because that's much easier to work with):

private static final String csv = "World Development Indicators\n"
    + "Number of countries,4\n"
    + "Country Name,2005,2006,2007\n"
    + "Bangladesh,6.28776238,13.20573922,23.46762823\n"
    + "\"Bahamas,The\",69.21279415,75.37855087,109.340767\n"
    + "Brazil,46.31418452,53.11025849,63.67475185\n"
    + "Germany,94.55486999,102.2828888,115.1403608\n";

public static String[] getCountryNames() throws Exception {
    List<String> countryNames = new ArrayList<>();

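    // Swap in the FileReader line to read from the actual file instead of the sample string.
    //BufferedReader br = new BufferedReader(new FileReader(fileName));
    BufferedReader br = new BufferedReader(new StringReader(csv));
    // Skip the title, the country count, and the column headers.
    br.readLine();
    br.readLine();
    br.readLine();

    // Read each data row exactly once; stop at end of input or at a blank line.
    String line = br.readLine();
    while (line != null && !line.isEmpty()) {
        // Split only on commas followed by an even number of quotes,
        // i.e. commas that are not inside a quoted field.
        String[] countries = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);
        countryNames.add(countries[0]); // quoted names keep their surrounding quotes
        line = br.readLine();
    }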

    System.out.println(countryNames);
    return countryNames.toArray(new String[0]);
}
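
As an alternative to the regular expression, the first field can also be extracted with a single left-to-right scan that tracks whether the current position is inside quotes. This is only a sketch under stated assumptions, not a replacement for a real CSV parser: the class name CsvFirstFieldSketch and the method firstField are hypothetical, and it assumes a field never spans multiple lines. Unlike the split above, it drops the surrounding quotes from the returned name.

```java
// Hypothetical class name, used only for this illustration.
final class CsvFirstFieldSketch {

    // Returns the first field of one CSV line, honoring double quotes.
    // Assumes the field does not continue onto another line.
    static String firstField(String line) {
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"'); // a doubled quote inside a quoted field is a literal quote
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote, not part of the value
                }
            } else if (c == ',' && !inQuotes) {
                break; // the first comma outside quotes ends the field
            } else {
                field.append(c);
            }
        }
        return field.toString();
    }

    public static void main(String[] args) {
        // prints Bahamas,The
        System.out.println(firstField("\"Bahamas,The\",69.21279415,75.37855087,109.340767"));
        // prints Brazil
        System.out.println(firstField("Brazil,46.31418452,53.11025849,63.67475185"));
    }
}
```

In the loop above, countryNames.add(firstField(line)) would take the place of the split call and the countries[0] lookup.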
nbrooks
  • That regex is slow, since it has to scan the rest of the line every time it encounters a comma. The code also cannot handle a quoted multi-line value. Would be much better to use an actual CSV Parser. --- Upvote for fixing `while` loop and use of `List`. Downvote for slow/incomplete CSV parsing. No vote either way from me. – Andreas Jan 29 '18 at 01:17
  • @Andreas Yes, using an actual CSV parser would be the ideal alternative, but I figured that pointing that out wouldn't be very helpful for the OP's understanding of the issues with their current code. That regex is from the post I linked (i.e. I didn't write it), but it performs well enough for the use case here. Since there aren't likely to be any country names that are multiple lines, accounting for that probably isn't necessary or relevant in this solution. Thanks for noting that though! – nbrooks Jan 29 '18 at 01:23
  • @nbrooks StringReader did not work for me, but FileReader did. Thanks a lot! And thanks for all the explanation, I really appreciate it – Dmytro Marych Jan 29 '18 at 02:37