3

I'm doing some tasks on CodeEval. Basically the task is very simple: "Print out the sum of all the integers read from the file".

My solution is following:

import java.io.File;
import java.io.IOException;
import java.io.BufferedReader;
import java.io.FileReader;

public class SumIntegersFromFile {

    public static void main(String args[]) throws IOException{

        File file = new File(args[0]);
         BufferedReader br = new BufferedReader( new FileReader(file));
         String line;
         int i=0;
         while((line=br.readLine())!=null){
            int k = Integer.parseInt(line);
             i+=k;
         }
         br.close();
         System.out.println(i);
    }
}

But I was told this solution is not optimal from a performance point of view.

The code is based on the recommendations in the question Best way to read a text file. The only difference here is I am reading integers instead of strings.

What is the most performance-efficient way to read integers from a file in Java?

Community
  • 1
  • 1
Ivan T
  • 956
  • 1
  • 8
  • 21
  • 3
    What does " I got only 29.352 from 35" mean? – BitNinja Aug 31 '14 at 20:11
  • 9
    This question appears to be off-topic because it is about improving working code. Try posting it at [codereview.se]. – Keppil Aug 31 '14 at 20:14
  • @BitNinja I meant scores, maximum score is 35, I got 29.352 – Ivan T Aug 31 '14 at 20:16
  • 1
    try not storring `Integer.parseInt(line);` in a variable? Use `i += Integer.parseInt(line);` – qbit Aug 31 '14 at 20:17
  • @xgeorgekx I don't think that will make any difference. The `k` is not used subsequently, so I imagine the two things will produce the same bytecode. – chiastic-security Aug 31 '14 at 20:18
  • @xgeorgekx Indeed it helped, it improved from 29.352 to 29.461 – Ivan T Aug 31 '14 at 20:30
  • 4
    @Keppil and all who voted to close: can you point to some specific text in the Help Center that this question falls foul of? It seems to me to fit within the scope as defined there. – chiastic-security Sep 01 '14 at 08:19
  • 2
    @chiastic: I agree that it is not as clear as it could be. When it comes to reviewing and improving working code, [codereview.se] is dedicated to just that, therefore I think that the question should be asked there instead. – Keppil Sep 01 '14 at 09:30
  • 4
    @Keppil I agree that it's in scope for Code Review, but I don't see that that makes it out of scope here. There are lots of questions that could legitimately be asked in more than one place. A large number of the questions on SO are of the form where someone posts some "working" but inefficient code and wants some clues for how to improve it, e.g., http://stackoverflow.com/questions/25576302/wary-of-flattening-longish-liststring-into-string , to which no one would object. – chiastic-security Sep 01 '14 at 09:41
  • possible duplicate of [Fastest way to sum integers in text file](http://stackoverflow.com/questions/25606833/fastest-way-to-sum-integers-in-text-file) – Florent Bayle Sep 10 '14 at 07:44

2 Answers2

1

Unless you've been explicitly told otherwise, you shouldn't assume that the total will fit in an int. Try changing the type of i to a long, or even a BigInteger, and see if that makes a difference to your score.

You might try doing the same with k (and using Long.parseLong(line)). It will depend on the exact wording of the question, but perhaps the individual values could exceed the limits of an int too.

One more thing... the question, as you've phrased it, just says that you should sum all the integers. That leaves open the possibility that there will be lines that aren't integers, in which case you should skip them, rather than throwing a NumberFormatException (which is what your code will do at the moment).

(And presumably you've been told that it's one entry per line...)

But if you want to squeeze every last bit of performance out, you need to read the file as binary rather than line by line: turning each line into a String is just too expensive. A detailed account of how to do it can be found in this question on summing integers from a text file.

Community
  • 1
  • 1
chiastic-security
  • 19,689
  • 3
  • 33
  • 62
  • Thank you for the answer. I solve the issue correctly, the problem is that it is not so optimized. The full description is here: https://www.codeeval.com/open_challenges/24/ – Ivan T Aug 31 '14 at 20:56
  • Can you post the full analysis of the score? How much detail does it give you? – chiastic-security Aug 31 '14 at 21:04
  • Yes you can find it here: max_memory = 20 * 1024 * 1024 # 20 MB max_time = 10 * 1000 # 10 sec # if submission takes more than 10 seconds # or uses more than 20MB of memory # the score is 0 if memory_taken > max_memory or time_taken > max_time: return 0 max_total_score = total_max[category] memory_factor = 1 - memory_taken/max_memory time_factor = 1 - time_taken/max_time factor = (memory_factor + time_factor)/2 return score * max_total_score * factor / 100 – Ivan T Aug 31 '14 at 21:08
  • OK, I've added a suggestion for increasing performance too. It relies on the assumption that the whole thing will fit in memory. Really this seems rather pedantic: any sensible coder would code it up exactly the way you have done. – chiastic-security Aug 31 '14 at 21:09
  • 2
    @IvanT I've asked this new question that hopefully will give significantly more insight into the answer to yours, but also raises some further questions of its own: http://stackoverflow.com/questions/25606833/fastest-way-to-sum-integers-in-text-file – chiastic-security Sep 01 '14 at 13:23
  • Could you please edit your answer with the link to your question here: http://stackoverflow.com/questions/25606833/fastest-way-to-sum-integers-in-text-file and I'll accept it as a right one. – Ivan T Sep 04 '14 at 07:48
1

I see nothing wrong with the performance of your code. That is, I dispute the claim that your program has anything wrong with it.

Reading data from files, or across the network, is several orders of magnitude slower than manipulating data in memory. The performance of code that mixes I/O with some manipulation of data in memory is therefore typically dominated by the time taken for the I/O. Tweaks to the manipulation of data in memory are rarely worthwhile. If I/O operations happen in parallel with data manipulation (which will be the case if the O/S does some read-ahead), the data manipulation can be almost free: making the data manipulation faster will not decrease the time taken because any decreases in the CPU time for the data manipulation will be precisely offset by an increase in the amount of time that the program blocks while awaiting input.

Programs that do I/O and need good performance must decrease the amount of time they spend blocked awaiting I/O. They should operate in a manner that enables them to take advantage of the optimizations that the hardware and operating system provide to reduce the amount of blocking.

Importantly, at the low level, disks and networks do not operate on small numbers of bytes for each operation. They use larger units of packets or blocks. Interacting with the operating system to read fewer bytes than are stored in one disk block is wasteful. Programs avoid doing that by buffering their I/O, so the program itself changes a sequence of many small I/O operations into fewer but larger operations. You are using a BufferedReader, so you are already doing that.

The operating system is likely to do some read-ahead: if you ask for bytes in a block at the start of a file it will guess that you are probably going to read the file sequentially, so it would be worthwhile for it to also fetch some of the subsequent blocks of the file, in anticipation of your program also needing those. Reading files sequentially gives better performance. You are already doing that.

Raedwald
  • 40,290
  • 35
  • 127
  • 207