
I am trying to read a file using a Scanner object with the following code:

public void read(){
    Scanner scanner = new Scanner(dataFile).useDelimiter("\n");
    String line;
    int i = 0;
    while(scanner.hasNext()){
          line = scanner.next();
          i++;
    }
    System.out.println(i);
}

The file I am trying to read has 117000 lines, of which the scanner reads only the first 59550 or so. It does not throw any exception; it simply returns.

When I change the implementation to use a BufferedReader, it reads all 117000 lines:

public void read(){
    BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(dataFile)));
    String line;
    int i=0;
    while((line = br.readLine())!= null){ 
          i++;
    }
    System.out.println(i);
}

Can anyone explain why the scanner doesn't read all lines?

Aniket
  • I'm not sure, but I do know `Scanner` has an internal cache buffer. It seems the file is too large for this Scanner object, a problem which BufferedReader doesn't have. – Joetjah Jan 24 '14 at 15:37
  • I can't reproduce this - it works fine for me... although the code that you say is broken doesn't even compile (System vs system), which leads me to wonder whether the *really* broken code is significantly different in some way. It would really help if you could post a short but complete program which demonstrates the problem. – Jon Skeet Jan 24 '14 at 15:38
  • are there any special characters in the lines in the file? – Ross Drew Jan 24 '14 at 15:38
  • How long are the lines in general? How long is the longest line? – Andreas Fester Jan 24 '14 at 15:41
  • @JonSkeet: Changed system.out.println() to System.out.println(). Thanks for pointing that out. – Aniket Jan 24 '14 at 16:03
  • @Andreas: Each line is approximately 300-400 Characters long. – Aniket Jan 24 '14 at 16:10
  • @JonSkeet: I am guessing the file size, i.e. the number of lines, is the problem here, as the Scanner implementation works with all the other files I have, which are much smaller. The largest of those has 28856 lines. – Aniket Jan 24 '14 at 16:17
  • @acoolguy: I very much doubt that it's the size of the file. I suspect it's more likely to be the delimiters in this particular file. Can you copy the original file, make it smaller (e.g. 10 lines) and reproduce the problem? Can you give us a link to the original file? – Jon Skeet Jan 24 '14 at 16:19
  • @JonSkeet: Unfortunately I cannot copy the file here as it contains confidential data. I am trying to see the end of the line at which the scanner object stops reading the file. – Aniket Jan 24 '14 at 16:31
  • @acoolguy: Why not just change both bits of code to print out the lines they're reading (with the notional line number)? I suspect you'll find that it really is a problem with the line separator - I very much doubt that Scanner just "stops" reading the file... – Jon Skeet Jan 24 '14 at 16:35
  • @JonSkeet: I simply tried printing out entire lines. The scanner doesn't even read the entire line before returning. Every line in the file ends with the word "international.test". On line no. 59554 it only reads "international." and that is it. It somehow encounters an end of file at that point. – Aniket Jan 24 '14 at 18:43
  • @JonSkeet: I looked at the characters with a binary file editor and did not see any weird character. The character at which the scanner stops reading is '2e'. – Aniket Jan 27 '14 at 17:01
  • @Aniket: Unfortunately without any way of us reproducing the problem, I don't think we're going to be able to help you further. – Jon Skeet Jan 27 '14 at 23:24
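
Jon Skeet's suggestion in the comments above, having both readers report what they read alongside a notional line number, could be sketched roughly as follows. The class and method names are hypothetical, and the demo runs on in-memory text because the original file is not available; for a real file, two separate `FileReader` instances over the same path could be passed in instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Scanner;

class FirstDivergence {
    /**
     * Reads the same content through Scanner and BufferedReader in lockstep.
     * Returns the 1-based number of the first line on which they disagree,
     * or -1 if both produce identical lines and end together.
     */
    static int firstDifference(Reader forScanner, Reader forBuffered) {
        try (Scanner scanner = new Scanner(forScanner);
             BufferedReader br = new BufferedReader(forBuffered)) {
            int lineNo = 0;
            while (true) {
                String fromScanner = scanner.hasNextLine() ? scanner.nextLine() : null;
                String fromBuffered = br.readLine();
                lineNo++;
                if (fromScanner == null && fromBuffered == null) {
                    return -1; // both reached the end at the same point
                }
                if (fromScanner == null || !fromScanner.equals(fromBuffered)) {
                    System.out.println("Line " + lineNo
                            + ": Scanner=" + fromScanner
                            + " | BufferedReader=" + fromBuffered);
                    return lineNo;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "a\nb\nc\n";
        System.out.println(firstDifference(new StringReader(text), new StringReader(text)));
    }
}
```

The line number it reports is where to look in a binary editor; a `null` on the Scanner side means Scanner believed the input had ended there.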

4 Answers


One probable reason could be that Scanner's buffer (1 KB) is smaller than BufferedReader's (8 KB).
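
One way to test the buffer-size hypothesis is to feed Scanner generated input far larger than 1 KB and count the tokens. The sketch below is only that: the class and method names are hypothetical, the line counts and lengths are arbitrary, and it uses in-memory text rather than the original file, which is not available.

```java
import java.util.Arrays;
import java.util.Scanner;

class ScannerBufferCheck {
    /** Builds `lines` lines of `lineLength` 'x' characters, each ending in \n. */
    static String generate(int lines, int lineLength) {
        char[] line = new char[lineLength + 1];
        Arrays.fill(line, 'x');
        line[lineLength] = '\n';
        StringBuilder sb = new StringBuilder(lines * line.length);
        for (int i = 0; i < lines; i++) {
            sb.append(line);
        }
        return sb.toString();
    }

    /** Counts tokens the way the question does: "\n" delimiter, hasNext()/next(). */
    static int countLines(String text) {
        try (Scanner scanner = new Scanner(text).useDelimiter("\n")) {
            int count = 0;
            while (scanner.hasNext()) {
                scanner.next();
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) {
        // Many lines of roughly the length mentioned in the question's comments.
        System.out.println(countLines(generate(20_000, 350)));
        // A few lines that are each much longer than 1 KB.
        System.out.println(countLines(generate(10, 5_000)));
    }
}
```

If both counts match the number of generated lines, total input size and line length alone do not make Scanner stop, which would point back at the content of the specific file.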

Kazekage Gaara

The following program works for me:

    Scanner scanner = new Scanner(dataFile);
    String line;
    int i = 0;
    while(scanner.hasNextLine()){
          line = scanner.nextLine();
          // System.out.println(line); // remove comment for debug
          i++;
    }
    System.out.println(i);
    scanner.close();

The changes from the original program are:

  1. Changed hasNext() and next() to hasNextLine() and nextLine(). In this case the default delimiter is fine.
  2. Fixed a typo - system.out.println should be System.out.println
  3. Added a comment to print line (and check if the delimiter is OK)
  4. Added scanner.close()
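
If a loop like the one above still stops early without any exception, one further thing that may be worth checking is whether Scanner recorded an error silently. This is a possibility to rule out, not something established in this thread. When the underlying source throws an `IOException`, Scanner treats it as end of input and keeps the exception, which `ioException()` returns. The sketch below uses hypothetical names (`ScannerErrorCheck`, `failingAfter`) and a simulated failing reader purely for demonstration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Scanner;

class ScannerErrorCheck {
    /** Counts lines, then surfaces any IOException that Scanner swallowed. */
    static int countLines(Reader source) {
        try (Scanner scanner = new Scanner(source)) {
            int count = 0;
            while (scanner.hasNextLine()) {
                scanner.nextLine();
                count++;
            }
            IOException swallowed = scanner.ioException();
            if (swallowed != null) {
                throw new UncheckedIOException(
                        "Scanner stopped after " + count + " lines", swallowed);
            }
            return count;
        }
    }

    /** Hypothetical reader for the demo: serves the text, then fails instead of ending. */
    static Reader failingAfter(String text) {
        return new Reader() {
            private final StringReader delegate = new StringReader(text);

            @Override
            public int read(char[] buf, int off, int len) throws IOException {
                int n = delegate.read(buf, off, len);
                if (n == -1) {
                    throw new IOException("simulated read failure");
                }
                return n;
            }

            @Override
            public void close() {
                delegate.close();
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(countLines(new StringReader("one\ntwo\n")));
        try {
            countLines(failingAfter("one\ntwo\n"));
        } catch (UncheckedIOException e) {
            System.out.println(e.getMessage() + ": " + e.getCause().getMessage());
        }
    }
}
```

With a real file, the same check after the loop would show whether Scanner stopped because of a read or decoding problem instead of a genuine end of file.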
gromi08
  • The same implementation works perfectly for other, much smaller files. So I am guessing the number of lines is the problem here. – Aniket Jan 24 '14 at 16:12
  • [This discussion](http://stackoverflow.com/questions/8330695/java-scanner-not-going-through-entire-file) is for a similar issue. It seems like there is some special character that makes scanner think that EOF is there. Anyhow, BufferedReader seems to be more robust. – gromi08 Jan 24 '14 at 16:17
  • Even with .hasNextLine() and .nextLine() the scanner doesn't go beyond the same line 59554 :( – Aniket Jan 24 '14 at 18:20
  • What if you delete the first 59000 lines? does it fail in line 554? – gromi08 Jan 25 '14 at 09:31
  • If I remove first 59554 lines the scanner reads 0 lines. So it has to be some weird character in the file that the scanner interprets as end of file. – Aniket Jan 27 '14 at 16:43
  • Here is the worst part. I looked at the characters with a binary file editor and did not see any weird characters. The character at which the scanner stops reading is '2e'. – Aniket Jan 27 '14 at 19:53

It's probably something to do with the line-ending delimiter used by Scanner.

You should use the methods `hasNextLine()` and `nextLine()`.
Eugene

"Can anyone explain why scanner doesn't read all lines?"

`br.readLine()` also treats a lone \r (without \n) as a line ending. That is one important difference from your Scanner, which only splits on \n.
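
The difference can be seen on a small in-memory string. This is a minimal sketch with hypothetical class and method names; the sample text is made up and only illustrates the line-terminator behaviour, not the asker's file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Scanner;

class LineTerminatorDemo {
    /** Counts tokens as in the question: Scanner with "\n" as the only delimiter. */
    static int countWithScanner(String text) {
        try (Scanner scanner = new Scanner(text).useDelimiter("\n")) {
            int count = 0;
            while (scanner.hasNext()) {
                scanner.next();
                count++;
            }
            return count;
        }
    }

    /** Counts lines with BufferedReader.readLine(), which accepts \n, \r and \r\n. */
    static int countWithBufferedReader(String text) {
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            int count = 0;
            while (br.readLine() != null) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String crSeparated = "a\rb\rc\n"; // lines separated by lone \r
        System.out.println(countWithScanner(crSeparated));        // prints 1
        System.out.println(countWithBufferedReader(crSeparated)); // prints 3
    }
}
```

Note that this difference makes Scanner return fewer, longer tokens; on its own it would not make Scanner stop in the middle of a line.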

ljgw
  • I checked the file with the Notepad++ editor and each line ends with LF. – Aniket Jan 24 '14 at 16:15
  • @acoolguy: Don't check with Notepad++ - check with a binary file editor to see the exact bytes involved. This *would* explain it if you had a bizarre separator of "\n\r" for example. – Jon Skeet Jan 24 '14 at 16:21
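
As an alternative to a binary editor, the exact terminator bytes can also be counted programmatically. The following is a rough sketch with hypothetical names; it assumes an encoding in which LF and CR are the single bytes 0x0A and 0x0D (true for ASCII, ISO-8859-1 and UTF-8, not for UTF-16), and it loads the whole file into memory for simplicity.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class LineEndingBytes {
    /**
     * Returns three counts: [0] LF bytes (0x0A), [1] CR bytes (0x0D),
     * [2] LF bytes immediately followed by CR (the "\n\r" case).
     */
    static int[] count(byte[] data) {
        int lf = 0;
        int cr = 0;
        int lfThenCr = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0x0A) {
                lf++;
                if (i + 1 < data.length && data[i + 1] == 0x0D) {
                    lfThenCr++;
                }
            } else if (data[i] == 0x0D) {
                cr++;
            }
        }
        return new int[] {lf, cr, lfThenCr};
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LineEndingBytes <file>");
            return;
        }
        int[] c = count(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("LF=" + c[0] + " CR=" + c[1] + " LF-then-CR=" + c[2]);
    }
}
```

If the LF count matches the expected number of lines and the other two counts are zero, the separators themselves are unlikely to be the cause.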