
I am trying to read an ASCII file and find the position of the newline character "\n", so that I know which characters, and how many, each line contains. The file size is 538 MB. When I run the code below it never prints anything. I searched a lot but didn't find anything for ASCII files. I use NetBeans and Java 8. Any ideas?

Below is my code.

import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;

String inputFile = "C:\\myfile.txt";
FileInputStream in = new FileInputStream(inputFile);
FileChannel ch = in.getChannel();
int BUFSIZE = 512;
ByteBuffer buf = ByteBuffer.allocateDirect(BUFSIZE);
Charset cs = Charset.forName("ASCII");
int rd;

while ( (rd = ch.read( buf )) != -1 ) {
    buf.rewind();
    CharBuffer chbuf = cs.decode(buf);

    for ( int i = 0; i < chbuf.length(); i++ ) {
        if (chbuf.get() == '\n'){
            System.out.println("PRINT SOMETHING");
        }
    }
}
dimcode
  • Have you looked at http://stackoverflow.com/questions/4716503/best-way-to-read-a-text-file-in-java ? – Gaël J Oct 25 '15 at 12:28
  • I have already seen this post, but with BufferedReader I get a Java OutOfMemoryError, so I am not able to use the readLine() method. – dimcode Oct 25 '15 at 12:40
  • Use `RandomAccessFile` instead of `FileReaders` for large files. – ccc Oct 25 '15 at 12:55
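A note on why the loop in the question never prints, at least for a file much larger than the buffer: after the first `ch.read(buf)` the buffer is full, `buf.rewind()` only resets the position (the limit stays at the capacity), and nothing ever calls `buf.clear()`. Every later `read` therefore has no room, returns 0 rather than -1, and the loop keeps re-decoding the same first 512 bytes forever. On top of that, `chbuf.length()` returns the number of *remaining* characters, which shrinks with every `get()`, so the `for` loop only inspects the first 256 of them. Nothing is printed unless a '\n' happens to sit in the first 256 bytes of the file (and then it is printed over and over). Below is a minimal sketch of the same FileChannel approach with `flip()` and `clear()` in place. It compares the raw bytes with '\n' (byte value 10) directly, which is only valid because every ASCII character is exactly one byte. The class name `NewlineScanner`, the helper `newlinePositions` and the 8192-byte buffer are my own assumptions, not taken from the question.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

class NewlineScanner {

    // Hypothetical helper: returns the byte offset of every '\n' (byte 10) in the file.
    // For an ASCII file, byte offsets and character offsets are the same.
    static List<Long> newlinePositions(String path) throws IOException {
        List<Long> positions = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.allocateDirect(8192); // assumed buffer size
        long offset = 0; // file offset of the first byte currently in buf
        try (FileChannel ch = FileChannel.open(Paths.get(path), StandardOpenOption.READ)) {
            while (ch.read(buf) != -1) {
                buf.flip(); // limit = number of bytes just read, position = 0
                while (buf.hasRemaining()) {
                    long pos = offset + buf.position();
                    if (buf.get() == '\n') {
                        positions.add(pos);
                    }
                }
                offset += buf.limit();
                buf.clear(); // make the whole buffer writable again for the next read
            }
        }
        return positions;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "C:\\myfile.txt"; // path from the question
        long lineStart = 0;
        int line = 0;
        for (long nl : newlinePositions(path)) {
            // the length counts every byte before '\n', including a '\r' in Windows files
            System.out.println("line " + line + ": " + (nl - lineStart)
                    + " characters, \\n at offset " + nl);
            lineStart = nl + 1;
            line++;
        }
    }
}
```

Skipping the charset decoding keeps the sketch simple. With a multi-byte encoding such as UTF-8, a character could be split across two reads, and a `CharsetDecoder` that carries its state from one buffer to the next would be needed instead.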

3 Answers


A method to read the contents of a file into a string:

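import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

static String readFile(String path, Charset encoding) throws IOException 
{
    byte[] encoded = Files.readAllBytes(Paths.get(path));
    return new String(encoded, encoding);
}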

Here's a way to find the occurrences of a character in the entire string:

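import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public static void main(String [] args) throws IOException
{
    // char indexes of every '\n' in the file
    List<Integer> indexes = new ArrayList<Integer>();
    String content = readFile("filetest", StandardCharsets.UTF_8);
    int index = content.indexOf('\n');
    while (index >= 0)
    {
        indexes.add(index);
        index = content.indexOf('\n', index + 1);
    }
    System.out.println(indexes.size() + " newline characters found");
}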

Found here and here.

Daffyd

The number of characters in a line is the length of the string read by a readLine call:

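import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

File file = new File("C:\\myfile.txt"); // the question's input file
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    int iLine = 0;
    String line;
    while ((line = br.readLine()) != null) {
        System.out.println( "Line " + iLine + " has " +
                            line.length() + " characters." );
        iLine++;
    }
} catch( IOException ioe ){
    // ...
}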

Note that readLine strips the line terminator ("\n", "\r" or "\r\n", whichever the file uses) from the returned string.
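To see what readLine does with the different terminators without needing a data file, a small sketch can feed it a string through a StringReader. Per the BufferedReader Javadoc, a line ends at "\n", at "\r", or at "\r\n", independent of the platform. The class and method names here are only illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ReadLineTerminators {

    // Collects the strings returned by BufferedReader.readLine() for the given text.
    static List<String> lines(String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                result.add(line); // terminator already removed
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // "\r\n", "\n" and "\r" each end a line, on every platform
        for (String line : lines("a\r\nbb\nccc\rdddd")) {
            System.out.println(line + " -> " + line.length() + " characters");
        }
    }
}
```

This prints lengths 1, 2, 3 and 4: none of the terminator characters end up in the returned strings, and a trailing terminator does not produce an extra empty line.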

If a very large file contains no newlines, it is indeed possible to run out of memory. Reading character by character will avoid this.

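    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.Reader;

    File file = new File( "Z.java" );
    // BufferedReader only adds a fixed-size buffer; characters are still handled one at a time
    try (Reader reader = new BufferedReader(new FileReader(file))) {
        int len = 0;
        int c;
        int iLine = 0;
        while( (c = reader.read()) != -1) {
            if( c == '\n' ){
                iLine++;
                // in Windows files, len includes the '\r' that precedes '\n'
                System.out.println( "line " + iLine + " contains " +
                                    len + " characters" );
                len = 0;
            } else {
                len++;
            }
        }
    }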
laune
  • With BufferedReader it throws java.lang.OutOfMemoryError: Java heap space. That's why I use ByteBuffer. – dimcode Oct 25 '15 at 16:41
  • @Iostromos Is it possible that the entire file doesn't contain line ends? Is this a "regular" text file or some weird bunch of bytes? – laune Oct 25 '15 at 16:47
  • @Iostromos Added a version not storing any file data - this should be OK. (If too slow: it can be improved.) – laune Oct 25 '15 at 16:57
  • It really does throw an OOM. What kind of verifiable example do you want me to post? Here is a link to a screenshot I just captured with the error message [link](http://prntscr.com/8v6vqn) – dimcode Oct 25 '15 at 16:58
  • The file contains newline characters; it's a 65x65 array with 0 and 1 as elements, but in ASCII format. – dimcode Oct 25 '15 at 17:29
  • I tried the new version of your code. It runs without errors, but the program ends without ever entering the if branch. Any other ideas? – dimcode Oct 25 '15 at 19:03
  • If I run the 2nd version with its own .java file as input it enters the then branch correctly for each line. - A 65x65 array of 0 and 1 isn't 538MB. And how do you know it contains newline characters? - Can you post a data file where you think the 2nd version doesn't find line lengths correctly? – laune Oct 25 '15 at 21:20
  • I didn't understand the file correctly, but now I do. It's not a 65x65 array, my mistake. It has 65 columns and 1000+ rows with 0 and 1 as elements. Also, a comment on your code above: as I mentioned, it's an ASCII file, so in your code the value of c holds the ASCII codes of 0, 1 and space (48, 49 and 32 respectively). I have to decode the values read first to get 0 and 1 rather than their ASCII codes. – dimcode Oct 28 '15 at 19:03
  • So you don't need to know the number of characters in a line - you have to decode the characters and store the (numeric or bit) values 0 and 1 in a suitable data structure. - It's not the reading of the lines that is causing the OOM - you create your problems with the data structure, I bet. You should show all of your code instead of just a part that you claim causes the OOM. – laune Oct 28 '15 at 19:25
  • Are you sure that you aren't confusing something? 1000 lines containing 65 zeroes and ones have a size of less than 130K. To reach 538MB you need - well, you can do the math. – laune Oct 28 '15 at 20:32
  • You were right about the size of the array. It's a 1680x1680 array. And I found the solution. Your code is correct, in a way. To find the position of the newline character I just wrote an if statement checking whether c equals 92 (the ASCII code for "\") (or 110, the ASCII code for "n"). As for the OOM, as I said before, the second version you posted doesn't throw it. Anyway, thanks for the help. – dimcode Oct 29 '15 at 18:44
  • Well, so mark the answer as accepted - and don't test for "\" - this isn't in the ASCII file, and neither is "n"! – laune Oct 29 '15 at 18:49
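Following laune's suggestion to decode the characters and keep only the 0/1 values, here is a minimal sketch. It assumes, based on the comments, that the file holds rows of '0'/'1' characters separated by spaces and that each row ends with a real newline, i.e. the single character 10, not a backslash (92) followed by 'n' (110). A '\r' before the newline is tolerated. Each row is stored in a `java.util.BitSet`. The `BitMatrixReader` class, its `Matrix` result type and `readRows` are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class BitMatrixReader {

    // Hypothetical result type: one BitSet per row plus the number of values in that row
    // (a BitSet does not remember trailing zeros, so the width is kept separately).
    static final class Matrix {
        final List<BitSet> rows = new ArrayList<>();
        final List<Integer> widths = new ArrayList<>();
    }

    // Assumes values are the characters '0'/'1' separated by spaces, and each row
    // ends with '\n' (code 10), optionally preceded by '\r' (code 13).
    static Matrix readRows(Reader in) throws IOException {
        Matrix m = new Matrix();
        BitSet row = new BitSet();
        int col = 0; // number of values read so far in the current row
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                // a real newline is one character (10), not '\\' (92) followed by 'n' (110)
                if (col > 0) { // blank lines are skipped
                    m.rows.add(row);
                    m.widths.add(col);
                }
                row = new BitSet();
                col = 0;
            } else if (c == '0' || c == '1') {
                int value = c - '0'; // '0' (48) -> 0, '1' (49) -> 1
                if (value == 1) {
                    row.set(col);
                }
                col++;
            }
            // spaces (32), '\r' (13) and any other characters are ignored
        }
        if (col > 0) { // last row without a trailing newline
            m.rows.add(row);
            m.widths.add(col);
        }
        return m;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path of the data file
        try (Reader in = new BufferedReader(new FileReader(args[0]))) {
            Matrix m = readRows(in);
            System.out.println(m.rows.size() + " rows read");
            if (!m.widths.isEmpty()) {
                System.out.println("first row has " + m.widths.get(0) + " values");
            }
        }
    }
}
```

Storing bits rather than the decoded text keeps memory use proportional to the number of values, at one bit each, rather than to the size of the file.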

You should use FileReader, which is a convenience class for reading character files.

The FileInputStream Javadoc clearly states:

FileInputStream is meant for reading streams of raw bytes such as image data. For reading streams of characters, consider using FileReader.

Try the following:

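import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

File file = new File("C:\\myfile.txt"); // the question's input file
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    String line;
    int iLine = 0;
    while ((line = br.readLine()) != null) {
        // readLine() strips the terminator, so line.indexOf("\n") would always be -1;
        // the terminator started right after the last character of the line
        System.out.println("line " + iLine + ": line terminator at column " + line.length());
        iLine++;
    }
}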
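One caveat with FileReader in Java 8: it always decodes with the platform default charset and has no constructor that takes a Charset (those were added in Java 11). To state the encoding explicitly, `Files.newBufferedReader(Path, Charset)` (available since Java 7) can be used instead. Its decoder reports malformed input as an IOException rather than silently replacing it, so a byte outside the ASCII range makes the read fail instead of being miscounted. A minimal sketch, where the class and method names are only illustrative:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class AsciiLineLengths {

    // Returns the length of every line, decoding the file strictly as US-ASCII.
    static List<Integer> lineLengths(Path path) throws IOException {
        List<Integer> lengths = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(path, StandardCharsets.US_ASCII)) {
            String line;
            while ((line = br.readLine()) != null) {
                lengths.add(line.length()); // terminator already stripped by readLine()
            }
        }
        return lengths;
    }

    public static void main(String[] args) throws IOException {
        List<Integer> lengths = lineLengths(Paths.get(args[0])); // args[0]: input file
        for (int i = 0; i < lengths.size(); i++) {
            System.out.println("Line " + i + " has " + lengths.get(i) + " characters.");
        }
    }
}
```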
M Sach