Today I ran into some trouble while using the Java Scanner. I use the Scanner class in many places in my project and had never run into a problem before.

Basically, I always do something like this:

try (Scanner scanner = new Scanner(file)) {
    while (scanner.hasNextLine()) {
        String line = scanner.nextLine();
        // ... process the line ...
    }
} catch (FileNotFoundException e) {
    // ... handle the error ...
} finally {
    // ... clean up ...
}

and the Scanner works just fine, since the code is so simple. Today, however, I used the code above to read text files of about 17000 lines.

At first the code worked as expected when I ran it through Eclipse, but after I exported the project, the Scanner would stop reading after about 400 lines.

I googled a bit and in the end I solved the problem thanks to these answers:

All I had to do was change the constructor from

Scanner scanner = new Scanner(file)

to

Scanner scanner = new Scanner(new FileInputStream(sql))

It is some weird encoding problem, I get it. But why when I ran the code from Eclipse it worked flawlessly and when I ran it from my exported jar the Scanner stopped after reading about 400 lines?

The code does exactly the same thing in both cases, because I set up Eclipse to use the same working directory as the exported .jar archive (which has some data subdirectories):

  1. Take the same .gz archive
  2. Extract a file from the .gz
  3. Read the file as shown above

Not sure if it helps but Eclipse is set up to save source files in UTF-8 format.

Thanks in advance

xuT

1 Answer

But why when I ran the code from Eclipse it worked flawlessly and when I ran it from my exported jar the Scanner stopped after reading about 400 lines?

Scanner has two different constructors that accept a File as an argument. From the docs:

Scanner(File source)

Bytes from the file are converted into characters using the underlying platform's default charset.

and

Scanner(File source, String charsetName)

Bytes from the file are converted into characters using the specified charset.

So if you do not specify the charsetName, it will use the environment's default charset.
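
To see why the charset matters, here is a minimal sketch that decodes the same bytes with two different charsets. The class name `DecodeDemo` and the sample character are made up for illustration, and ISO-8859-1 is used only as a stand-in for a single-byte platform default such as windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decode the UTF-8 bytes of "é" (0xC3 0xA9) using the given charset.
    static String decodeAs(Charset charset) {
        byte[] utf8Bytes = "\u00e9".getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, charset);
    }

    public static void main(String[] args) {
        // Same bytes, two interpretations: one character vs. two wrong ones.
        System.out.println(decodeAs(StandardCharsets.UTF_8).length());
        System.out.println(decodeAs(StandardCharsets.ISO_8859_1).length());
    }
}
```

The bytes on disk never change; only the charset used to interpret them does, which is why the same file can read differently in two environments.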

The environment encoding when you run your project outside Eclipse is probably something other than UTF-8. To check whether this is the case, you can write a simple program like this:

import java.nio.charset.Charset;

class CheckDefaultCharset {
    public static void main(String... args) {
        // Prints the charset the JVM uses when none is specified
        System.out.println(Charset.defaultCharset());
    }
}

And run it in both environments.

For example, when running the above code from Eclipse I get:

UTF-8

And when running it in PowerShell (Windows 7), I get:

windows-1252

To avoid this type of problem it would be better to always specify the encoding of the files you intend to use when using Scanner.
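
As a rough sketch of that advice, the loop from the question could pass the charset name explicitly. This assumes the file really is UTF-8; the class name `ReadWithCharset`, the helper `readLines`, and the temporary demo file are hypothetical and only there to make the example self-contained.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ReadWithCharset {
    // Read all lines, decoding as UTF-8 regardless of the platform default.
    static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
            // Scanner swallows read errors; rethrow instead of stopping silently.
            IOException problem = scanner.ioException();
            if (problem != null) {
                throw problem;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical demo file, written as UTF-8 for this example only.
        File file = File.createTempFile("scanner-demo", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(), "first\nsecond \u00e9\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(file).size());
    }
}
```

Checking `ioException()` after the loop is optional, but it helps tell a genuine end of file apart from a read that stopped early.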

Anderson Vieira
  • This is it. I figured it was some kind of environment variable problem and knew Scanner had that second argument but I never thought about using it since I never had any problem at all. In the end it was just a simple and stupid problem. Thanks a lot! – xuT Apr 05 '15 at 12:46
  • Glad to help! And don't worry, encoding problems are very common. – Anderson Vieira Apr 05 '15 at 13:00