
I want to speed up my program's processing. Here's the code:

int main(){
    ifstream fin("./europarl_clean_1-5000.en");
    ifstream fin2("./europarl_clean_1-5000.fr");
    while(!(fin.eof()&&fin2.eof())){
        string english,chinese;
        getline(fin,english);
        getline(fin2,chinese);
        fun1();
        fun2();
        fun3();
    }
}

The two files contain over 5000 lines each. The code processes them one line at a time; I want to process more than two lines at once to reduce the run time. How can I rewrite it?

    To begin with, you should read [Why is iostream::eof inside a loop condition considered wrong?](http://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-considered-wrong). – Some programmer dude Jul 16 '16 at 05:52

1 Answer


As for your problem, use two threads to read the files, one file per thread, into two vectors. Then you can process the vectors any way you want.

If the lines of data in the input files are not related, you can use, for example, two threads per vector, with each thread handling half of the vector. If the lines are related and need to be processed sequentially, use one thread per vector. And if you need to process alternating lines from each vector, use just a single thread.
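For the first case (independent lines, two threads per vector), a minimal sketch could look like the following. The names `process_line`, `process_range` and `process_in_halves` are made up for illustration, and `process_line` is only a placeholder for whatever your `fun1()`, `fun2()` and `fun3()` really do; here it just returns the line length so that there is something to combine.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical per-line work; stands in for the real processing functions.
std::size_t process_line(const std::string& line)
{
    return line.size();
}

// Process lines[begin, end) sequentially and return the combined result.
std::size_t process_range(const std::vector<std::string>& lines,
                          std::size_t begin, std::size_t end)
{
    assert(begin <= end && end <= lines.size());
    std::size_t total = 0;
    for (std::size_t i = begin; i < end; ++i)
        total += process_line(lines[i]);
    return total;
}

// Split the vector into two halves and process each half on its own thread.
// Only safe when the lines are independent and the vector is not modified
// while the two tasks are running.
std::size_t process_in_halves(const std::vector<std::string>& lines)
{
    const std::size_t mid = lines.size() / 2;

    auto first = std::async(std::launch::async, process_range,
                            std::cref(lines), std::size_t{0}, mid);
    auto second = std::async(std::launch::async, process_range,
                             std::cref(lines), mid, lines.size());

    // get() blocks until the corresponding task has finished.
    return first.get() + second.get();
}
```

Calling `process_in_halves(english)` and `process_in_halves(chinese)` would then use two threads per vector. Whether that is faster than a plain loop depends on how expensive the per-line work is, so measure it.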


A note about the reading of the files: Even if you use threads for the reading, the performance may actually be worse. It all depends on where the files are stored. If both files are on a single mechanical hard drive, reading them in parallel might be slower, as the disk has to jump back and forth between the files. If the files are on two separate disks, or on an SSD, then reading should be faster with one thread per file.


A possible implementation for the reading of the files:

// Needs <functional>, <future>, <istream>, <string> and <vector>
auto reader = [](std::vector<std::string>& v, std::istream& f)
{
    std::string s;
    v.reserve(5000);  // Allocate space for 5000 strings
    while (std::getline(f, s))
        v.push_back(s);
};

std::vector<std::string> english;
std::vector<std::string> chinese;

auto future1 = std::async(std::launch::async, reader, std::ref(english), std::ref(fin));
auto future2 = std::async(std::launch::async, reader, std::ref(chinese), std::ref(fin2));

future1.wait();
future2.wait();

// Here all lines from fin will be in the vector english
// and all lines from fin2 will be in the vector chinese

As for the processing, I really can't give you any code, not even pseudo-code, since you are the only one who knows how the data should be processed.

References:

Also, don't blindly use the code above. First of all, it's not tested; it might give build errors or not work at all, but it should be enough to give you an idea of how to continue. And please try to understand what the code I showed is actually doing. Read the linked references, experiment, and most importantly benchmark it to see if it's better than your current code. Like I said, the performance may vary or even be worse depending on what kind of disks the files are on and where on the disks they are stored.
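For the benchmarking, a rough sketch of a timing helper using `std::chrono::steady_clock` is shown here. The name `time_ms` is made up. It measures a single run only, so repeat the measurement a few times, and keep in mind that the operating system may cache the files after the first read.

```cpp
#include <chrono>
#include <utility>

// Run `work` once and return the elapsed wall-clock time in milliseconds.
template <typename Work>
double time_ms(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

You would wrap each variant in a lambda, for example `time_ms([&] { /* threaded version */ })`, and compare the numbers against the same call around your current loop.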

Some programmer dude
  • @yihanghwang I've added some example code for the reading of the file. Please continue to read the rest of the text as well, don't just copy it blindly. – Some programmer dude Jul 16 '16 at 11:19