How to read integers/doubles from a large text file in Java

Question

I am making a Pi-based RNG (random number generator) for a research project. I am stumped at this point because I can't figure out how to read the digits from a rather large file (1 GB). Here is the input:

....159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214808651328230664709384460955058223172535940812848111745028410270193852110555964462294895493038196442881097566593344612847564823378678316527120190914564856692346034861045432664821339360726024914127372458700660631558817488152092096282925409171536436789259036001133053054882046652138414695194151160943305727036575959195309218611738193261179310511854807446237996274956735188575272489122793818301194912983367336244065664308602139494639522473719070217986094370277053921717629317675238467481846766940513200056812714526356082778577134275778960917363717872146844090122495343014654958537105079227968925892354201995611212902196086403441815981362977477130996051870721134999999837297804995105973173281609631859502445945534690830264252230825334468503526193118817101000313783875288658753320838142061717766914730359825349042875546873115956286388235378759375195778185778053217122680661300192787661119590921642019893809525720106548586327886593615338182....

The file is ugly, I know; it is Pi to the one-billionth decimal place. I won't go into detail on why I am doing this, but here is my goal: I want to be able to skip x decimal places before I begin printing output, and I also need to read out y consecutive digits at a time. For example, with 4 digits at a time the output would look like:

1111\n 2222\n 3333\n 4444\n....

My base objective is to be able to print at least 1 digit at a time, since after that I can piece the digits together however I want. So the basic output is:

For input 3.1415.. I get.. 3,1,4,1,5....

I tried a bunch of file streams from the Java API, but they only print bytes/bits, and I have no idea how to convert those into something meaningful.

Also, reading line by line is not optimal because my numbers have to be the same length, and I feel like reading line by line would cut them off in a funny way.

Why can't you read the bytes in and then convert them to a string? — DejaVuSansMono, Nov 12 '14 at 20:03
That's where I get stuck. Any classes you have in mind that do that? — pirate694, Nov 12 '14 at 20:27
Apache Commons has a set of IOUtils with a toString method that takes in a byte stream. It would be up to you to figure out when you want to cut it off, and it requires a file encoding. I would look at this though http://stackoverflow.com/questions/8512121/byte-to-string-java — DejaVuSansMono, Nov 12 '14 at 20:28
"... read the digits form a rather file"?? Assuming the file consists of nothing but ASCII digits: [RandomAccessFile](https://docs.oracle.com/javase/8/docs/api/java/io/RandomAccessFile.html), [seek(long)](https://docs.oracle.com/javase/8/docs/api/java/io/RandomAccessFile.html#seek-long-), [read(byte[\])](https://docs.oracle.com/javase/8/docs/api/java/io/RandomAccessFile.html#read-byte:A-), `new String(bytes, StandardCharsets.US_ASCII)` — David Conrad, Nov 12 '14 at 20:39
...Rather large file*. Simple typo. The file has to be accessed in order and not randomly, since random access would mean using another pseudo-RNG, which is not my goal. — pirate694, Nov 12 '14 at 20:45
@DavidConrad I agree that reading bytes should work. Also, considering that the first digit is always 3, the file doesn't even need to contain it, but it could be a nice way to validate the file: the first digit must always be 3. RandomAccessFile would be my choice. After all, you can always set the offset to 0. — hfontanez, Nov 12 '14 at 21:11
@DavidConrad I think the character set should be UTF-8 instead. — hfontanez, Nov 12 '14 at 21:17
@hfontanez If it contains only digits, the difference between UTF_8, ISO_8859_1, and US_ASCII is academic. — David Conrad, Nov 12 '14 at 21:35
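
For reference, the recipe from David Conrad's comment (RandomAccessFile, seek, read, then decode as US_ASCII) could look roughly like the sketch below. It assumes the file holds nothing but single-byte ASCII characters, so a byte offset equals a character offset. The names SeekRead and readAt are made up for illustration, and the demo in main writes a tiny temporary file instead of touching the real 1 GB input.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating the seek-and-read idea from the comments.
final class SeekRead {

    // Reads `length` ASCII characters starting at byte `offset` of the file.
    // Assumption: one byte per character (pure ASCII content).
    static String readAt(String path, long offset, int length) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(offset);
            byte[] bytes = new byte[length];
            // readFully throws EOFException if fewer than `length` bytes remain
            file.readFully(bytes);
            return new String(bytes, StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        // Small stand-in for the real digits file
        Path tmp = Files.createTempFile("pi-sample", ".txt");
        try {
            Files.write(tmp, "3.14159265".getBytes(StandardCharsets.US_ASCII));
            // Offset 2 skips the leading "3."
            System.out.println(readAt(tmp.toString(), 2, 4)); // prints 1415
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Seeking to a fixed offset and reading forward from there is still sequential access; no random numbers are involved in choosing the position.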

Lolo · Answer 1 · 2014-11-12T20:50:10.057

What you need is a character stream, basically a subclass of Reader, so you can read character by character, rather than byte by byte.

To achieve what you need, you will have to:

- open a character stream to the file containing your input digits. Wrap the FileReader in a BufferedReader to speed up the I/O, since reading char by char from an unbuffered stream can be very slow, especially with large files
- keep track of the previous character read (if any) and group runs of identical characters in an appropriate data structure (for instance a StringBuilder)
- if you need to skip the first n characters, use Reader.skip(n) at the start

The following code does what I understand your requirements to be:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

public class Test {
  public static void main(String[] args) {
    try (Reader reader = new BufferedReader(new FileReader("pi.txt"))) {
      int prevC = -1; // previous character read from the stream
      int c; // latest character read from the stream
      StringBuilder sb = new StringBuilder();
      while ((c = reader.read()) != -1) {
        // a different character ends the current group: print it and reset sb
        if (prevC != -1 && c != prevC && sb.length() > 0) {
          System.out.println(sb);
          sb.setLength(0);
        }
        sb.append((char) c);
        prevC = c;
      }
      // print the last group of digits
      if (sb.length() > 0) {
        System.out.println(sb);
      }
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}
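
If the goal is instead the one stated in the question (skip x digits, then print y digits at a time), the same character-stream approach can be adapted. The following is only a sketch under a few assumptions: the class and method names (DigitGroups, read) are hypothetical, every non-digit character such as the decimal point or a line break is ignored, and a trailing group shorter than y digits is dropped. It counts skipped digits itself rather than calling Reader.skip(n), so that a stray '.' or newline does not shift the offset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
final class DigitGroups {

    // Skips the first `skip` digits, then returns up to `maxGroups` groups
    // of exactly `groupSize` consecutive digits each.
    static List<String> read(Reader in, long skip, int groupSize, int maxGroups)
            throws IOException {
        List<String> groups = new ArrayList<>();
        StringBuilder sb = new StringBuilder(groupSize);
        long skipped = 0;
        int c;
        while (groups.size() < maxGroups && (c = in.read()) != -1) {
            if (c < '0' || c > '9') {
                continue; // ignore '.', line breaks, and anything else
            }
            if (skipped < skip) {
                skipped++; // still inside the part to skip
                continue;
            }
            sb.append((char) c);
            if (sb.length() == groupSize) {
                groups.add(sb.toString());
                sb.setLength(0);
            }
        }
        return groups; // an incomplete trailing group is dropped
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in; for the real file, wrap a FileReader instead
        try (Reader in = new BufferedReader(new StringReader("3.14159265358979"))) {
            for (String group : read(in, 2, 4, 3)) {
                System.out.println(group); // prints 4159, 2653, 5897
            }
        }
    }
}
```

For the 1 GB file you would probably print each group as it completes rather than collect them in a list; the list is used here only to keep the example easy to check.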

If you want to read at arbitrary places in a 1 GB file, a RandomAccessFile that you can seek in is a better choice, I think. — David Conrad, Nov 12 '14 at 20:42
Again, I cannot use any RNG; I have to read the data sequentially. — pirate694, Nov 12 '14 at 20:46
@pirate694 Sorry, I misunderstood. I thought you wanted to read digits from the file starting at a randomly-chosen point. — David Conrad, Nov 12 '14 at 21:37

score 0 · Accepted Answer · answered Nov 12 '14 at 21:47

Okay, I have spoken to a CS professor, and it seems that I had forgotten my basic Java training: in an ASCII file, 1 byte = 1 char. In this case, BufferedInputStream.read() returns the ASCII value of each char, which can be cast back to a char. Here is a simple solution:

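// Needs java.io.BufferedInputStream and java.io.FileInputStream.
// pi is the input file (a File or a path String) containing 1 billion digits.
try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(pi))) {
  int b = bis.read(); // next byte as an int, or -1 at end of file
  System.out.println((char) b); // build strings or parse chars however you want
}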

...Rinse and repeat. Sorry for wasting your time, but I hope this will set someone on the right track down the road.
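
To go one step further and get actual integers instead of chars, subtracting '0' from each ASCII byte yields the digit value. Below is a rough sketch of that idea, assuming an ASCII file; the names DigitStream, nextDigit, and nextNumber are invented for this example. Note that a numeric value loses leading zeros (the digits 0042 become 42), so keep the digits as a string if their count matters.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper built on the byte-stream approach above.
final class DigitStream {

    // Returns the next decimal digit (0-9) from the stream, or -1 at end of
    // stream. Bytes that are not ASCII digits (such as '.') are skipped.
    static int nextDigit(InputStream in) throws IOException {
        int b;
        while ((b = in.read()) != -1) {
            if (b >= '0' && b <= '9') {
                return b - '0'; // ASCII code to digit value
            }
        }
        return -1;
    }

    // Combines the next `count` digits into one number, or returns -1 if the
    // stream ends first. A long holds up to 18 digits safely.
    static long nextNumber(InputStream in, int count) throws IOException {
        long value = 0;
        for (int i = 0; i < count; i++) {
            int digit = nextDigit(in);
            if (digit == -1) {
                return -1;
            }
            value = value * 10 + digit;
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in; for the real file, use a FileInputStream instead
        byte[] sample = "3.14159265".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(sample))) {
            System.out.println(nextDigit(in));     // prints 3
            System.out.println(nextNumber(in, 4)); // prints 1415
        }
    }
}
```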

pirate694

Remember that your solution works for this case because you are dealing with numbers. As David Conrad stated, the difference (encoding) is academic. Dealing with actual characters is a whole different story. Good luck on your classes! – hfontanez Nov 12 '14 at 21:53
Yes, my solution is what I was looking for. Anything else is not in scope of my question. Thank you. – pirate694 Nov 12 '14 at 22:13
