0

I'm trying to implement a Reader class for my own hobby programming language which I'm working on. The job of the reader would be very simple, to read the source file, and remove comments.

Here's the definition of the Reader class:

// reader.hh
// Contains Reader Class specifications

#ifndef PHI_SRC_FRONTEND_READER_HH
#define PHI_SRC_FRONTEND_READER_HH

#include "errhandler.hh"

class Reader
{
public:
    Reader() = default;

    auto val() const -> String const & { return val_; }

    void read(String const &filename);

    explicit operator bool() const { return success; }
    auto operator!() const -> bool { return !success; }

    friend auto operator==(Reader const &lhs, Reader const &rhs) -> bool
    {
        return lhs.val_ == rhs.val_;
    }

    friend auto operator!=(Reader const &lhs, Reader const &rhs) -> bool
    {
        return lhs.val_ != rhs.val_;
    }

    friend auto operator<<(std::ostream &stream, Reader const &read) -> std::ostream &
    {
        return stream << read.val_;
    }

private:
    String val_;
    bool success;
};

#endif

Previously, I used a very simple placeholder for the Reader's read function. It basically copied everything inside the file using an istreambuf_iterator

void Reader::read(String const &filename)
{
    val_.clear();
    success = true; // success flag, true by default

    auto file = std::ifstream{filename};

    if(!file)
    {
        log_error(Error{Error::Type::ReadError, "Could not open file"});
        success = false;
    }

    val_.assign(std::istreambuf_iterator<char>(file), std::istreambuf_iterator<char>());
    // Read entire file into val_
}

This worked fine, it passed the unit tests, and I manually checked the output as well, which was also fine.

But this was just a placeholder, the actual Reader needed to remove comments. Which made me implement this:

// reader.cc
// Contains Reader Class Implementation
// Work In Progress, placeholders being used for now

#include <fstream>
#include <sstream>
#include "reader.hh"

void Reader::read(String const &filename)
{
    val_.clear();
    success = true; // success flag, true by default

    auto inStringLiteral = false;
    // Variable to determine if the reader is currently reading a string literal
    // (enclosed in double quotes)
    // In order to not mistake '//' inside literals as comments"

    auto file = std::ifstream{filename};

    if(!file)
    {
        log_error(Error{Error::Type::ReadError, "Cannot open file: " + filename});
        success = false;
        return;
    }

    for (unsigned char c; file >> c; )
    {
        // ASCII characters only use 7 bits, which is up to 127
        // So value of an ascii char must be lesser than 128
        if (c < 128)
        {
            if(c == '"')
            {
                inStringLiteral = !inStringLiteral; // flip the value of the boolean
            }

            if(!inStringLiteral && c == '/')
            {
                // If we're not inside a string literal enclosed in quotes, and find a backslash
                // Peek at the next character to check if it is a backslash
                // In case two consecutive backslashes are found, treat it as a comment and
                // ignore everything until the end of line
                if(file >> c)
                {
                    if(c == '/')
                    {
                        // keep reading until a newline is found
                        while(file >> c && c != '\n')
                        {
                        }
                    }

                    else
                    {
                        c = '/';
                        file.unget();
                    }
                }

                else
                {
                    c = '/';
                }
            }

            val_ += c;
        }

        else
        {
            log_error(Error{Error::Type::ReadError, "Unrecognized character(s) found in file: " + filename});
            success = false;
            return;
        }
    }
}

However, this weirdly causes the Unit Tests to fail.. I've compared the outputs of both versions of the read function, both have (apparently) the exact same output. Note I haven't actually checked for equality of the strings, but they do look the same. I've tried to find the reason for the error a lot but have failed....

Here's the Unit Test I'm using for the Reader (using GoogleTest):

#include <gtest/gtest.h>
#include "frontend/reader.hh"

TEST(ReaderTest, BaseTestCase)
{
    auto TestReader = Reader{};
    auto const ExpectedOutput = String{
R"delim(Int test = 0;
String test2 = "abcdefgh";
Float test3 = 0.9876;
)delim"};

    TestReader.read("TestFiles/ReaderTest_BaseTestCase.phi");

    ASSERT_FALSE(!TestReader);

    ASSERT_EQ(TestReader.val(), ExpectedOutput);
    // If Reader Base Test Case fails, no need to continue next tests
}

TEST(ReaderTest, Should_Fail_When_FileDoesNotExist)
{
    auto TestReader = Reader{};
    TestReader.read("Non_existent_test_file.txt");
    EXPECT_TRUE(!TestReader);
}

As I mentioned before, it worked quite fine for the first placeholder version, but the actual read function doesn't seem to pass the tests.... The weird thing is, the sample file doesn't even have any comments, here's the sample file the Reader reads:

Int test = 0;
String test2 = "abcdefgh";
Float test3 = 0.9876;

(Yes, that's literally it. Also as I mentioned before, the language the Reader is reading is not C++, but rather a homemade language I'm working on reading, but that's probably irrelevant for this question).

Oh and in case you need to compile this, you'd need to implementation and definition of errhandler (errhandler.hh and errhandler.cc), I'll put them here as well:

Declaration (errhandler.hh):

    // errhandler.hh
    // Contains Phi Error Handling specifications
    // Mostly complete, minor changes still might be made though

    #ifndef PHI_SRC_FRONTEND_ERRHANDLER_HH
    #define PHI_SRC_FRONTEND_ERRHANDLER_HH

    #include <iostream>
    #include "utils.hh"

    class Error
    {
    public:
        enum class Type : unsigned char
        {
            ReadError, LexError, ParseError, SemanticError, InterpretError
        };

        Error() = delete;
        Error(Type type__, String const &val__) : type_(type__), val_(val__) {}

        auto type() const -> Type { return type_;  }
        auto val() const -> String { return val_; }

        friend auto operator<<(std::ostream &stream, Error const &error) -> std::ostream&
        {
            return stream << error.val();
        }

    private:
        Type type_;
        String val_;
    };

    class ErrorLog
    {
    public:
        using iterator = Vector<Error>::iterator;
        using const_iterator = Vector<Error>::const_iterator;
        using reverse_iterator = Vector<Error>::reverse_iterator;
        using const_reverse_iterator = Vector<Error>::const_reverse_iterator;

        void push(Error const &error) { errors.push_back(error); }
        void pop() { errors.pop_back(); }

        auto size() const -> Size { return errors.size(); }

        auto operator[](Size index) -> Error& { return errors[index]; }
        auto operator[](Size index) const -> Error const& { return errors[index]; }

        auto begin() -> iterator { return errors.begin(); }
        auto end() -> iterator { return errors.end(); }

        auto cbegin() const -> const_iterator { return errors.cbegin(); }
        auto cend() const -> const_iterator { return errors.cend(); }

        auto rbegin() -> reverse_iterator { return errors.rbegin(); }
        auto rend() -> reverse_iterator { return errors.rend(); }

        auto crbegin() -> const_reverse_iterator { return errors.crbegin(); }
        auto crend() -> const_reverse_iterator { return errors.crend(); }

        friend auto operator<<(std::ostream &stream, ErrorLog const &error_log) -> std::ostream&
        {
            for (Size i = 0; i < error_log.size(); i++)
                stream << error_log[i];

            return stream;
        }

    private:
        Vector<Error> errors;
    };

    void log_error(Error const &error);
    void show_errors(std::ostream& stream);

    extern ErrorLog errlog;
    // The global error log to be used by every part of the Phi system
    // To be declared in main()

    #endif

Definition (errhandler.cc):

// errhandler.cc
// Contains Phi Error Handling implementation
// Work In Progress, placeholders are temporarily being used

#include "errhandler.hh"

void log_error(Error const& error)
{
    errlog.push(error);
}

void show_errors(std::ostream& stream)
{
    stream << errlog;
}

And you also, finally, need the utils header, containing the utilities

// utils.hh
// Contains globally used utility functions, templates, using declarations, etc.

#ifndef PHI_SRC_UTILS_HH
#define PHI_SRC_UTILS_HH

#include <string>
#include <vector>
#include <limits>

#define UNUSED(x) (void)x

template <typename T>
using Vector = std::vector<T>;
using String = std::string;
using Size = std::size_t;

class bad_narrow_cast : public std::bad_cast
{
public:
    bad_narrow_cast(const char* message) : what_(message)
    {
    }

    char const *what() { return what_; }

private:
    char const *what_;
};

template <typename Target, typename Base> static inline
typename std::enable_if<std::numeric_limits<Target>::is_specialized,
Target>::type narrow_cast(Base const &base)
{
    if(base > static_cast<Base>(std::numeric_limits<Target>::max()) ||
        base < static_cast<Base>(std::numeric_limits<Target>::min()))
    {
        throw(bad_narrow_cast((String() + "Invalid narrowing conversation from type " +
        typeid(Target).name() + " to type " + typeid(Base).name()).c_str()));
    }

    return static_cast<Target>(base);
}

template <typename Target, typename Base> static inline
typename std::enable_if<!std::numeric_limits<Target>::is_specialized,
Target>::type narrow_cast(Base const &base)
{
    Target target = static_cast<Target>(base);
    Base narrowed_base = static_cast<Base>(target);

    if (base == narrowed_base)
        return target;

    throw(bad_narrow_cast((String() + "Invalid narrowing conversation from type " +
    typeid(Target).name() + " to type " + typeid(Base).name()).c_str()));
}

#endif

This question really halted my progress on the project. Helping me solve it would be really helpful

Famiu
  • 51
  • 7
  • May I recommend a technique called debugging. The goal would be to visually inspect the bytes of the strings that look the same but aren't. – rustyx Oct 27 '19 at 14:14
  • 1
    Likely relevant: [Why is iostream::eof inside a loop condition (i.e. `while (!stream.eof())`) considered wrong?](https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-i-e-while-stream-eof-cons) – Igor Tandetnik Oct 27 '19 at 14:19
  • @rustyx Well I *would* do that but even then it probably won't make any actual sense on *why* it fails the unit tests. I can't understand why the codes would have different outputs, they're supposed to do the exact same thing, since the file doesn't even have comments – Famiu Oct 27 '19 at 14:20
  • @IgorTandetnik Oh wow, I didn't know about that, gonna try it now, thanks a lot for informing me – Famiu Oct 27 '19 at 14:21
  • @uneven_mark That's a raw string, and what I used is a delimiter for the raw string.... – Famiu Oct 27 '19 at 14:41
  • @IgorTandetnik I tried to use >> instead of .eof(), I updated the code in my PC and in the question, but the problem still persists – Famiu Oct 27 '19 at 14:44
  • @Arnil Oops, sorry. – walnut Oct 27 '19 at 14:46
  • @IgorTandetnik I finally fixed the problem, I had to use std::noskipws along with >>, thanks for all the help – Famiu Oct 27 '19 at 15:07

1 Answers1

0

So, people in the comments really helped me, and suggested me to use >> for reading characters from the stream instead of eof() and get(), as eof() is unreliable.. But even that didn't solve the problem. Until I, through some googling, figured it out myself, I had to use std::noskipws, in order to make the >> operator not skip whitespaces, and then it worked. Thanks for all the help, I really appreciate it

Famiu
  • 51
  • 7