I have noticed that using java.util.Scanner
is very slow when reading large files (in my case, CSV files).
I want to change the way I am currently reading files, to improve performance. Below is what I have at the moment. Note that I am developing for Android:
InputStreamReader inputStreamReader;
try {
inputStreamReader = new InputStreamReader(context.getAssets().open("MyFile.csv"));
Scanner inputStream = new Scanner(inputStreamReader);
inputStream.nextLine(); // Ignores the first line
while (inputStream.hasNext()) {
String data = inputStream.nextLine(); // Gets a whole line
String[] line = data.split(","); // Splits the line up into a string array
if (line.length > 1) {
// Do stuff, e.g:
String value = line[1];
}
}
inputStream.close();
} catch (IOException e) {
e.printStackTrace();
}
Using Traceview, I managed to find that the main performance issues, specifically are: java.util.Scanner.nextLine()
and java.util.Scanner.hasNext()
.
I've looked at other questions (such as this one), and I've come across some CSV readers, like the Apache Commons CSV, but they don't seem to have much information on how to use them, and I'm not sure how much faster they would be.
I have also heard about using FileReader
and BufferedReader
in answers like this one, but again, I do not know whether the improvements will be significant.
My file is about 30,000 lines in length, and using the code I have at the moment (above), it takes at least 1 minute to read values from about 600 lines down, so I have not timed how long it would take to read values from over 2,000 lines down, but sometimes, when reading information, the Android app becomes unresponsive and crashes.
Although I could simply change parts of my code and see for myself, I would like to know if there are any faster alternatives I have not mentioned, or if I should just use FileReader
and BufferedReader
. Would it be faster to split the huge file into smaller files, and choose which one to read depending on what information I want to retrieve? Preferably, I would also like to know why the fastest method is the fastest (i.e. what makes it fast).