
I asked this question a few days ago:

How to look for an ANSI string in a binary file?

and I got a really nice answer, which later turned into a much harder question: Can input iterators be used where forward iterators are expected? That one is now well beyond the level I can understand.

I am still learning C++ and I am looking for an easy way to search for a string in a binary file.

Could someone show me simple code for a minimal C++ console program that looks for a string in a binary file and prints the locations to stdout?

If possible, could you show me

  1. a version where the file is copied into memory (supposing the binary file is small)

  2. and another one that uses the proper approach from the linked questions

Sorry if it sounds like I'm asking for someone's code, but I am just learning C++, and I think others could benefit from this question if someone posted some high-quality code that is nice to learn from.

hyperknot

2 Answers


Your requirements are unclear. For example, where does "121" appear in "12121": only at the first character (after which searching continues at the 4th), or at the 3rd as well? The code below uses the former approach.

#include <cstring>
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, const char* argv[])
{
    if (argc != 3)
    {
        std::cerr << "Usage: " << argv[0] << " filename search_term\n"
            "Prints offsets where search_term is found in file.\n";
        return 1;
    }

    const char* filename = argv[1];
    const char* search_term = argv[2];
    std::size_t search_term_size = std::strlen(search_term);
    if (search_term_size == 0)
    {
        // An empty term would match at every offset and the loop below
        // would never advance.
        std::cerr << "search_term must not be empty\n";
        return 1;
    }

    std::ifstream file(filename, std::ios::binary);
    if (file)
    {
        // Find the file size up front so the string allocates only once.
        file.seekg(0, std::ios::end);
        std::size_t file_size = file.tellg();
        file.seekg(0, std::ios::beg);
        std::string file_content;
        file_content.reserve(file_size);
        char buffer[16384];
        std::streamsize chars_read;

        // Read in 16 KiB blocks; the last read() may be short, so rely on
        // gcount() rather than on the stream state.
        while (file.read(buffer, sizeof buffer), chars_read = file.gcount())
            file_content.append(buffer, chars_read);

        // Search only if the loop ended because end-of-file was reached.
        if (file.eof())
        {
            // Non-overlapping search: resume just past each match.
            for (std::string::size_type offset = 0, found_at;
                 file_size > offset &&
                 (found_at = file_content.find(search_term, offset)) !=
                                                            std::string::npos;
                 offset = found_at + search_term_size)
                std::cout << found_at << std::endl;
        }
    }
}
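To make the two readings of the "121" in "12121" question concrete, here is a small sketch that supports both. `find_all` and its `allow_overlap` flag are hypothetical names chosen for illustration, not part of the program above; it takes the term as a `std::string`, so a search term containing NUL bytes would also work.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): collects every offset at which
// `needle` occurs in `haystack`.
std::vector<std::size_t> find_all(const std::string& haystack,
                                  const std::string& needle,
                                  bool allow_overlap)
{
    assert(!needle.empty());  // an empty needle would match at every offset
    std::vector<std::size_t> offsets;
    // After a hit, resume one byte later (overlapping matches allowed)
    // or just past the match (non-overlapping, as in the program above).
    const std::size_t step = allow_overlap ? 1 : needle.size();
    for (std::size_t pos = haystack.find(needle);
         pos != std::string::npos;
         pos = haystack.find(needle, pos + step))
        offsets.push_back(pos);
    return offsets;
}
```

With `allow_overlap` set to `false`, searching "12121" for "121" reports only offset 0; with `true` it reports offsets 0 and 2.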
Tony Delroy
  • @ildjarn: true (but hey, it still runs *more than twice* as fast as your non-boost solution in my benchmarks ;-P) – Tony Delroy Jun 27 '11 at 02:58
  • Fair enough, I benchmarked and verified your results; I didn't expect copying from an `istreambuf_iterator` pair to be so slow. :-[ – ildjarn Jun 27 '11 at 04:03
  • @ildjarn: what happened to your code? Even if it's not the fastest solution, it would be a really good reference to have here! I was planning on learning from all 4 solutions. – hyperknot Jun 28 '11 at 10:54
  • @ildjarn: zsero's right... you have good solutions to list... it might be something simple like not using reserve on the deque - I didn't have time to investigate - but that's not the point anyway: it might run faster in someone else's/future library implementation etc.... – Tony Delroy Jun 29 '11 at 01:01

This is one way to do part 1. Not sure I would describe it as high quality, but it is maybe on the minimalist side.

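#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

using namespace std;

int main(int argc, char *argv[])
{
    // Without both arguments, argv[1] or argv[2] must not be used.
    if (argc != 3)
    {
        cerr << "Usage: " << argv[0] << " filename search_term" << endl;
        return 1;
    }

    // Binary mode: no newline translation is applied to the data.
    std::ifstream ifs(argv[1], ios::binary);

    // Read the whole file into a string through stream-buffer iterators.
    std::string str((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>());

    // Only the first occurrence is reported.
    size_t pos = str.find(argv[2]);

    if (pos != string::npos)
        cout << "string found at position: " << pos << endl;
    else
        cout << "could not find string" << endl;

    return 0;
}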
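If building the string from the `istreambuf_iterator` pair turns out to be the slow part, one possible alternative is to size the string once and fill it with a single `read()` call. This is only a sketch: `read_file` is a hypothetical helper name, and it assumes the file is a regular, seekable file small enough to fit in memory.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: reads the whole file at `path` into `out`.
// Returns false if the file cannot be opened or cannot be read completely.
bool read_file(const std::string& path, std::string& out)
{
    // std::ios::ate opens the stream positioned at the end of the file,
    // so tellg() immediately yields the file size.
    std::ifstream file(path.c_str(), std::ios::binary | std::ios::ate);
    if (!file)
        return false;

    const std::streamoff size = file.tellg();
    if (size < 0)
        return false;

    file.seekg(0, std::ios::beg);
    out.resize(static_cast<std::size_t>(size));

    // read() sets failbit if fewer than `size` bytes could be extracted.
    if (size > 0 && !file.read(&out[0], size))
        return false;
    return true;
}
```

The program above could then call `read_file(argv[1], str)` and keep the `str.find(argv[2])` search unchanged. Whether this is actually faster depends on the standard library implementation, so it is worth measuring.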
Duck
  • Thx, it works perfectly and is really nice to learn from! But my problem is that std::string str (std::istreambuf_iterator, std::istreambuf_iterator) is extremely slow, while the actual search takes almost no time to find the result. Is there any way to do the string creation faster? – hyperknot Jun 28 '11 at 16:10
  • @zsero - The iterators are slow. Faster ways are to (1) read buffers of data and search as you go along, rather than reading the whole file into memory when not all of it may be needed; (2) drop down to more OS-specific things like memory-mapping or using OS hints like posix_fadvise. Simply using a good buffer size and fstream.read() will be faster than this. – Duck Jun 28 '11 at 21:33