I used the following links to get closer to a solution, but I am still running into read, memory, or looping errors. I am looping through a log file and extracting the JSON. Each JSON object is preceded by a date and time, along with a message and an object ID that correspond to that JSON, so both are needed. Time is also a factor as the log file grows. I need help figuring out where I am going wrong.

https://riptutorial.com/cplusplus/example/19029/string-streams

Multilines regex in C++

https://www.codeproject.com/Questions/1221494/Simple-multiline-regex-in-Cplusplus

http://www.cplusplus.com/reference/iterator/next/

Stringstream c++ while loop

I can use regex on a string with no problem, including as a multiline match. To read from a file using a stringstream I have while(input >> sstr.rdbuf()); which, to my understanding, buffers the stream.

When I cout << sstr.str(), it is only read once.

std::ifstream input("log.txt");
std::stringstream sstr;
std::smatch m;
std::regex reg("(\\{|\\[)(\\n\\s+.*)+\\n*(\\}||\\])"); 

while (input >> sstr.rdbuf());

std::string strang = sstr.str();
while (std::regex_search(strang, m, reg)) {
    std::cout << "Results : \n" << m.str() << '\n';
    for (i = 0; i < strang.length(); i++) {
        std::cout << m.str(i);
        i++;
    }
}

This seems to loop over the file forever if the file is small. For larger files (30 MB+) there is no output.

I am looking at vectors and hashmaps, but I am not certain how to apply regex to a hashmap; it seems odd. On top of that, I have learned that vectors only store up to about 30 variables anyway, so this type of workload is too much.

Thanks!

Another Variation

void PrintMatches(std::string str, std::regex reg) { 
    std::smatch matches;
    std::cout << matches.size() << std::endl;

}
int main() {
    std::ifstream input("log.txt");
    std::stringstream sstr;
    std::smatch m;
    std::regex reg("(\\{|\\[)(\\n\\s+.*)+\\n*(\\}||\\])");

    while (input >> sstr.rdbuf());

    std::string str = sstr.str();
    std::cout << str;
    //PrintMatches(str, reg);

    return 0;
}


imhotep
  • Please provide all the definitions of the variables used then add examples of how the program is called, and the expected output. – Robert Andrzejuk Oct 25 '19 at 17:56
  • `strang ==sstr.str();` is comparison, not assignment. – Barmar Oct 25 '19 at 18:02
  • Your `while` loop doesn't change any of the variables, so it's an infinite loop. Should that be `if` instead of `while`? – Barmar Oct 25 '19 at 18:04
  • When I run it as an `if` statement it just fails. I agree that the `while` is making it infinite. – imhotep Oct 25 '19 at 18:09
  • Also, what exactly are you trying to match? Your regex just specifies an empty group. – zdan Oct 25 '19 at 18:10
  • @imhotep: With an `if`, it would either run or not run. With a `while`, it either runs forever, or never runs. `while` isn't improving things. I answered what I could (I have limited familiarity with `std::regex_search`), but we'd need a [MCVE] to provide a complete answer. – ShadowRanger Oct 25 '19 at 18:10
  • For the record "apply regex to hashmap" is nonsensical. Hashmaps require exact matches, you can't fuzzy match their keys. And "I have learned that vectors only store upto about 30 variables anyway" is also nonsense. `vector`s will store at least a few hundred MB of data on 32 bit systems (limited by virtual address space/RAM), and potentially TB of data on 64 bit systems (limited more by RAM than physical address space). If you have an implementation of `vector` that fails beyond 30 items, each item had better be 10s of MB in size on a 32 bit system, or gigabytes in size on a 64 bit system. – ShadowRanger Oct 25 '19 at 18:16

1 Answer

while (input >> sstr.rdbuf()); makes no sense. operator>> on a streambuf slurps in one action, or it fails. If it fails, it almost certainly won't succeed no matter how many times you retry, and at least some of those failure modes (e.g. insertion into the output sequence failing) won't change the "truthiness" of the istream, so the loop will become infinite (which could potentially explain why you see no output on larger files, though it would be odd for slurping to fail on files that small). Take a look at an efficient (if perhaps slightly overly compact) file slurping implementation here (which will avoid at least one unnecessary copy that your code requires).
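As a rough illustration of "slurps in one action", here is a minimal sketch of reading a whole file with a single streambuf insertion and no retry loop. The helper name `slurp` is hypothetical, and this is the simple stringstream form, not the copy-avoiding implementation linked above.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Read an entire file into a std::string.
// `slurp` is a hypothetical helper name, not something from the question.
std::string slurp(const std::string& path) {
    std::ifstream input(path, std::ios::binary);
    if (!input) {
        // Report an unreadable file once instead of retrying in a loop.
        throw std::runtime_error("cannot open " + path);
    }
    std::ostringstream buffer;
    buffer << input.rdbuf();  // a single call copies everything available
    return buffer.str();
}
```

Calling `slurp("log.txt")` would then replace both the `while (input >> sstr.rdbuf());` line and the later `sstr.str()` call.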

Another problem is:

strang ==sstr.str();

which is comparing an empty string to a temporary, then throwing away the result; presumably you wanted:

strang = sstr.str();

Also, this loop never runs:

for (i = 0; i > strang.length(); i++) {

The condition i > strang.length() is false on the very first test: i starts at 0, and strang.length() is always greater than or equal to 0, so the inner loop body never executes.
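If the intent of that inner loop was to print each capture group of a match (an assumption on my part), the natural bound is the number of sub-matches, `m.size()`, rather than the length of the input string. A minimal sketch, with `groups_of` as a hypothetical helper name:

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Collect the full match (index 0) and every capture group of the first match.
// `groups_of` is a hypothetical helper; it returns an empty vector on no match.
std::vector<std::string> groups_of(const std::string& text, const std::regex& re) {
    std::vector<std::string> groups;
    std::smatch m;
    if (std::regex_search(text, m, re)) {
        // m.size() is the number of capture groups plus one for the whole match.
        for (std::size_t i = 0; i < m.size(); ++i) {
            groups.push_back(m.str(i));
        }
    }
    return groups;
}
```

Note the `<` comparison and the single increment per iteration.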

Your regex pattern is empty (I'm guessing omitted for brevity, but if it's really just capturing nothing, I have no idea what you were trying to do).

Finally, your while (std::regex_search(strang, m, reg)) { never changes strang, m or reg; it's either never going to run, or it will loop forever.
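One way to make that loop advance is to let `std::sregex_iterator` walk the string, since each increment resumes the search after the previous match. This is a sketch, not necessarily what you need for your log format; `find_all` is a hypothetical helper name and the pattern is left to the caller.

```cpp
#include <regex>
#include <string>
#include <vector>

// Return every non-overlapping match of `re` in `text`, in order.
// `find_all` is a hypothetical helper name used only for illustration.
std::vector<std::string> find_all(const std::string& text, const std::regex& re) {
    std::vector<std::string> matches;
    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        matches.push_back(it->str());
    }
    return matches;
}
```

The alternative is to keep `std::regex_search` but replace the searched string with `m.suffix()` on each pass, which also terminates but copies the remainder of the input every time.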

ShadowRanger
  • Your points are valid; I made some corrections that should help. – imhotep Oct 25 '19 at 18:14
  • What would be better than strang.length()? I'd rather use sstr.eof()... – imhotep Oct 25 '19 at 18:17
  • @imhotep: The problem is that you're testing `>`, rather than `<`. – ShadowRanger Oct 25 '19 at 18:24
  • eof() can't be used anyway. Any ideas on how to convert the file into a string for regex to work? vector and .push_back()? Or sstream_iterator? Or simply make it read once? The idea of push_back() or an iterator is just to read the lines and append them to a string, maybe. I'm expecting the log file to be very, very large, so scalability is at the core. – imhotep Oct 25 '19 at 19:14
  • @imhotep: I linked the standard "slurp a whole file" solution in my first paragraph... If the file is large enough that you can't hold it in memory, you may want to look at OS specific means of memory mapping the file (`boost` provides portable wrappers for it IIRC), so it can behave like it's in memory, while seamlessly loading and releasing pages under the hood (without paging out read-only data, which a normal slurp to memory would do). – ShadowRanger Oct 25 '19 at 19:25