
I'm trying to build a mini programming language using my own regular-expression and abstract-syntax-tree parsing library, 'srl.h' (a.k.a. "String and Regular-Expression Library"), and I've run into an issue I can't quite figure out.

The problem is this: when my code encounters an error, it throws an error message containing information about the error, including the line number at which the error occurred.

The issue is that C++ seems to simply ignore lines that contain no characters (i.e. lines consisting only of CRLF) until it reaches a line that does contain characters; from then on, empty lines are handled properly. As a result, every error reports an incorrect line number, and all of them are off by the same amount.

For example, given a file containing "(crlf)(crlf)abc(crlf)def", it is read as though its content were "abc(crlf)def": the initial newlines are ignored, so every error reports the wrong line number.

Here's the (very messily coded) function I'm using to read the text of a file. If anyone could tell me what's going on here, that'd be awesome.

```cpp
template<class charT> inline std::pair<bool, std::basic_string<charT>> load_text_file(const std::wstring& file_path, const char delimiter = '\n') {
    std::ifstream fs(file_path);
    std::string _nl = srl::get_nlp_string<char>(srl::newline_policy);

    if (fs.is_open()) {
        std::string s;
        char b[SRL_TEXT_FILE_MAX_CHARS_PER_LINE];

        while (!fs.eof()) {
            if (s.length() > 0)
                s += _nl;

            fs.getline(b, SRL_TEXT_FILE_MAX_CHARS_PER_LINE, delimiter);

            s += std::string(b);
        }

        fs.close();

        return std::pair<bool, std::basic_string<charT>>(true, srl::string_cast<char, charT>(s));
    }
    else
        return std::pair<bool, std::basic_string<charT>>(false, std::basic_string<charT>());
}
```
Tirous
  • Take a look at https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-i-e-while-stream-eof-cons – cigien Jul 05 '20 at 13:51
  • Thanks, but I don't quite understand how this post addresses the problem I'm having. If you could elaborate in an answer about how it does, that would be great. :) – Tirous Jul 05 '20 at 14:19
  • You have buggy code. Fix that first, and if your problem persists, then update the question. – cigien Jul 05 '20 at 14:27
  • @Tirous: If you want to write a parser system, then you shouldn't be using `getline` to read data from the file. Read the *exact text* from the file and process it. Just read the entire file. – Nicol Bolas Jul 05 '20 at 14:52

1 Answer


`std::ifstream::getline()` extracts the delimiter (in this case, '\n') from the stream but does not store it in the buffer. That is why every newline in the file, including the leading ones, is discarded on reading.

The newlines between later lines only seem to survive because of this check:

```cpp
if (s.length() > 0)
    s += _nl;
```

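Every newline in the result is actually inserted here, and none can be inserted while the leading empty lines are read: reading an empty line appends nothing, so `s` stays empty and `s.length() > 0` stays false until the first non-empty line has been read.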

This can be verified with a small test program:

```cpp
#include <cstdio>   // EOF
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    // contains (crlf)(crlf)abc(crlf)def; in text mode on Windows each CRLF is read as '\n'
    std::ifstream inFile{ "test.txt" };

    char line[80]{};
    int lineCount{ 0 };

    std::string script;

    while (inFile.peek() != EOF) {
        inFile.getline(line, 80, '\n'); // extracts the '\n' but does not store it
        lineCount++;

        script += line;
    }

    std::cout << "***Captured via getline()***" << std::endl;
    std::cout << script << std::endl; // prints "abcdef": none of the newlines survive
    std::cout << "***End***" << std::endl << std::endl;

    // 4 for this file: the two leading empty lines are read (as "") and counted too
    std::cout << "Number of lines: " << lineCount << std::endl;
}
```

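If the `if` condition is removed, leaving just

```cpp
s += _nl;
```

then a newline is appended for every line read, including the leading empty ones, so they are no longer lost. The newlines in `s` are replacements: as long as '\n' is the delimiter, `std::ifstream::getline()` still discards the original ones. Note, however, that without the condition `_nl` is also appended before the very first line, so the result starts with one extra newline and every line number is off by one in the other direction. Appending `_nl` only from the second line onward (tracking whether a line has already been read, rather than testing `s.length()`) avoids both problems.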

As a final touch, I would suggest using

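```cpp
while (fs.peek() != EOF) { /* ... */ }
```

instead of

```cpp
while (fs) { /* ... */ }
```

or

```cpp
while (!fs.eof()) { /* ... */ }
```

In the test program, `lineCount` shows the difference. `while (fs)` always makes one redundant iteration after the last line, because reaching end-of-file while reading that line sets only `eofbit`, which does not make the stream test false. `while (!fs.eof())` makes the same redundant iteration whenever the file ends with a newline, since the last successful `getline()` then stops at the '\n' without hitting end-of-file.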