I am building a Pi-based RNG (random number generator) for a research project. I am stumped at this point because I can't figure out how to read the digits from a rather large file (1 GB). Here is the input:

....159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214808651328230664709384460955058223172535940812848111745028410270193852110555964462294895493038196442881097566593344612847564823378678316527120190914564856692346034861045432664821339360726024914127372458700660631558817488152092096282925409171536436789259036001133053054882046652138414695194151160943305727036575959195309218611738193261179310511854807446237996274956735188575272489122793818301194912983367336244065664308602139494639522473719070217986094370277053921717629317675238467481846766940513200056812714526356082778577134275778960917363717872146844090122495343014654958537105079227968925892354201995611212902196086403441815981362977477130996051870721134999999837297804995105973173281609631859502445945534690830264252230825334468503526193118817101000313783875288658753320838142061717766914730359825349042875546873115956286388235378759375195778185778053217122680661300192787661119590921642019893809525720106548586327886593615338182....

The file is ugly, I know... it's Pi to the 1 billionth decimal place. I won't go into detail on why I am doing this, but here is my goal: I want to be able to skip x decimal places before I begin printing output. I also need to be able to read out y consecutive digits at a time, so if it were 4 at a time the output would look like:

1111\n 2222\n 3333\n 4444\n....

My base objective is to be able to print at least 1 digit at a time, because after that I can piece them together however I want. So the basic output is:

For input 3.1415.. I get.. 3,1,4,1,5....

I tried a bunch of file streams from the Java API, but they only print bytes/bits... I have no idea how to convert them into something meaningful.

Also, reading line by line is not optimal because my numbers have to be the same length, and I feel like reading line by line would cut them off in a funny way.

pirate694
  • Why can't you read the bytes in and then convert them to a string? – DejaVuSansMono Nov 12 '14 at 20:03
  • That's where I get stuck. Any classes you have in mind that do that. – pirate694 Nov 12 '14 at 20:27
  • Apache Commons has a set of IOUtils with a toString method that takes in a byte stream. It would be up to you to figure out where you want to cut it off, and it requires a file encoding. I would look at this though http://stackoverflow.com/questions/8512121/byte-to-string-java – DejaVuSansMono Nov 12 '14 at 20:28
  • "... read the digits form a rather file"?? Assuming the file consists of nothing but ASCII digits: [RandomAccessFile](https://docs.oracle.com/javase/8/docs/api/java/io/RandomAccessFile.html), [seek(long)](https://docs.oracle.com/javase/8/docs/api/java/io/RandomAccessFile.html#seek-long-), [read(byte[\])](https://docs.oracle.com/javase/8/docs/api/java/io/RandomAccessFile.html#read-byte:A-), `new String(bytes, StandardCharsets.US_ASCII)` – David Conrad Nov 12 '14 at 20:39
  • ...Rather large file*. Simple typo. The file has to be accessed in order and not randomly, because random access would amount to using another pseudo-RNG, which is not my goal. – pirate694 Nov 12 '14 at 20:45
  • @DavidConrad I agree that reading bytes should work. Also, considering that the first digit is always 3, the file doesn't even need to contain it, but it could be a nice way to validate the file: the first digit must always be 3. RandomAccessFile would be my choice. After all, you can always set the offset to 0. – hfontanez Nov 12 '14 at 21:11
  • @DavidConrad I think the character set should be UTF-8 instead. – hfontanez Nov 12 '14 at 21:17
  • @hfontanez If it contains only digits, the difference between UTF_8, ISO_8859_1, and US_ASCII is academic. – David Conrad Nov 12 '14 at 21:35
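
For reference, the RandomAccessFile suggestion from the comments can be sketched roughly as below. Seeking once to a start offset and then reading forward still returns the digits in file order. The class name `SeekDigits`, the method `readAt`, and the temporary sample file are hypothetical stand-ins, and the sketch assumes the file holds only ASCII characters. It uses `readFully` rather than `read(byte[])` so a short read cannot go unnoticed.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SeekDigits {
    /** Returns `length` characters starting at byte `offset` (assumes an ASCII-only file). */
    static String readAt(Path file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);                 // jump past the part to skip
            byte[] buf = new byte[length];
            raf.readFully(buf);               // throws EOFException if too few bytes remain
            return new String(buf, StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical small sample standing in for the 1 GB file.
        Path tmp = Files.createTempFile("pi-sample", ".txt");
        Files.write(tmp, "3.14159265358979".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readAt(tmp, 2, 4)); // offset 2 skips "3."
        Files.delete(tmp);
    }
}
```

With the sample above, the offset of 2 skips the leading "3." and the call prints the first four decimals.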

2 Answers

What you need is a character stream, basically a subclass of Reader, so you can read character by character, rather than byte by byte.

To achieve what you need, you will have to:

  • open a character stream to the file containing your input digits. Wrap the FileReader in a BufferedReader to speed up the I/O, since unbuffered char-by-char reads can be very slow, especially with large files
  • keep track of the previous character read (if any) and group strings of identical characters in an appropriate data structure (for instance a StringBuilder)
  • if you need to skip the first n characters, call Reader.skip(n) at the start

The following code does what I understand your requirements to be:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

public class Test {
  public static void main(String[] args) {
    try (Reader reader = new BufferedReader(new FileReader("pi.txt"))) {
      int prevC = -1; // previous character read from the stream
      int c; // latest character read from the stream
      StringBuilder sb = new StringBuilder();
      while ((c = reader.read()) != -1) {
        // if first character or same as previous character, extend the group
        if ((prevC == -1) || (c == prevC)) {
          sb.append((char) c);
        } else {
          // print the finished group and start a new one
          if (sb.length() > 0) {
            System.out.println(sb.toString());
            sb.setLength(0);
          }
          sb.append((char) c);
        }
        prevC = c;
      }
      // print the last group
      if (sb.length() > 0) {
        System.out.println(sb.toString());
      }
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}
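
If the goal is instead fixed-size groups, that is, skip x digits and then emit y consecutive digits at a time as the question describes, a variation along these lines might work. This is only a sketch: `DigitGroups`, `readGroups`, and its parameters are hypothetical names, and a short in-memory string stands in for pi.txt. It counts skipped digits itself rather than calling Reader.skip(n), so the decimal point and any line breaks are not counted as digits.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class DigitGroups {
    /**
     * Skips the first `skip` digits of `in`, then returns up to `maxGroups`
     * strings of `groupSize` consecutive digits each. Non-digit characters
     * (decimal point, line breaks) are ignored and never counted.
     */
    static List<String> readGroups(Reader in, long skip, int groupSize, int maxGroups)
            throws IOException {
        List<String> groups = new ArrayList<>();
        StringBuilder sb = new StringBuilder(groupSize);
        long skipped = 0;
        int c;
        while (groups.size() < maxGroups && (c = in.read()) != -1) {
            if (c < '0' || c > '9') {
                continue;            // not a digit
            }
            if (skipped < skip) {
                skipped++;           // still inside the part to skip
                continue;
            }
            sb.append((char) c);
            if (sb.length() == groupSize) {
                groups.add(sb.toString());
                sb.setLength(0);     // start the next group
            }
        }
        return groups;               // a trailing partial group is dropped
    }

    public static void main(String[] args) throws IOException {
        // Assumption: for the real file, use new BufferedReader(new FileReader("pi.txt")).
        try (Reader in = new BufferedReader(new StringReader("3.14159265358979"))) {
            for (String group : readGroups(in, 1, 4, 3)) {
                System.out.println(group);
            }
        }
    }
}
```

With the sample string, skipping 1 digit drops the leading 3, and the three groups printed are 1415, 9265, and 3589, one per line.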
Lolo

Okay, I have spoken to a CS professor and it seems that I had forgotten my basic Java training. In this file, 1 byte = 1 char, and BufferedInputStream's read() returns the ASCII value of each char. Here is a simple solution:

FileInputStream ifs = new FileInputStream(pi); // pi: the input file containing 1 billion digits
BufferedInputStream bis = new BufferedInputStream(ifs);
int b = bis.read(); // the next byte as an int (its ASCII code), or -1 at end of file
if (b != -1) {
    System.out.println((char) b); // build strings or parse chars however you want
}

...Rinse and repeat. Sorry for wasting your time, but I hope this will set someone on the right track down the road.
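
The "rinse and repeat" part could be sketched as a loop like the one below, which turns each ASCII byte into its numeric value by subtracting '0'. The names `ByteDigits` and `nextDigits` are hypothetical, and a small in-memory byte array stands in for the real file. It relies on the same assumption as above, that the file is plain ASCII.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteDigits {
    /** Reads up to `count` digits from `in` and returns their numeric values (0 to 9). */
    static int[] nextDigits(InputStream in, int count) throws IOException {
        int[] digits = new int[count];
        int n = 0;
        int b;
        while (n < count && (b = in.read()) != -1) { // -1 signals end of stream
            if (b >= '0' && b <= '9') {
                digits[n++] = b - '0';               // ASCII code to digit value
            }                                        // anything else (e.g. '.') is skipped
        }
        return Arrays.copyOf(digits, n);             // shorter if the stream ran out
    }

    public static void main(String[] args) throws IOException {
        // Assumption: for the real file, wrap new FileInputStream(...) instead.
        byte[] sample = "3.1415".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(sample))) {
            System.out.println(Arrays.toString(nextDigits(in, 5)));
        }
    }
}
```

For the sample input 3.1415 this prints [3, 1, 4, 1, 5]. Calling `nextDigits` again on the same stream continues where the previous call stopped, so the digits are consumed strictly in file order.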

pirate694
  • Remember that your solution works for this case because you are dealing with numbers. As David Conrad stated, the difference (encoding) is academic. Dealing with actual characters is a whole different story. Good luck on your classes! – hfontanez Nov 12 '14 at 21:53
  • Yes, my solution is what I was looking for. Anything else is not in scope of my question. Thank you. – pirate694 Nov 12 '14 at 22:13