Why is this C++ code slower than the C# variant?

Question

Recently, we had a requirement where more than 100,000 XML files all needed the same piece of data modified. A simple Perl command would have done the job, but Perl was not installed on the machine where the files are located, so I wrote a small C# program instead.

// Requires: System, System.Diagnostics, System.IO, System.Linq, System.Text, System.Threading.Tasks
private static void ModifyXML(string[] args)
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    string path = args[0];   // source directory
    string opath = args[1];  // output directory
    string token = "value_date=\"20121130\"";
    string target = "value_date=\"20121019\"";

    // Process the files in parallel: read, edit in memory, write to the output directory.
    Parallel.ForEach(Directory.EnumerateFiles(path), (file) =>
    {
        StringBuilder sb = new StringBuilder(File.ReadAllText(file));
        sb.Remove(0, 55);           // drop the first 55 characters
        sb.Replace(token, target);  // replaces every occurrence of token
        var filename = file.Split(new char[] { '\\' }).Last();
        File.WriteAllText(string.Format("{0}\\{1}", opath, filename), sb.ToString());
    });
    TimeSpan ts = sw.Elapsed;
    Console.WriteLine("Took {0} secs", ts.TotalSeconds);
}

I then decided to implement a C++ version. It turned out that the C++ version was not significantly faster than the C# version. I ran both versions several times; in fact, the C++ version is only as fast as the C# version during some of the runs.

For C# I used .NET 4.0 and for C++ it's VC10.

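#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <ppl.h>                 // Concurrency::parallel_for_each (VC10 PPL)
#include <boost/chrono.hpp>
#include <boost/filesystem.hpp>

// Assumed from the unqualified names used below.
using namespace std;
using namespace Concurrency;

void FileHandling(std::string src, std::string dest)
{
    namespace fs = boost::filesystem;
    auto start = boost::chrono::system_clock::now();
    string token = "value_date=\"20121130\"";
    string target = "value_date=\"20121019\"";

    // Collect every directory entry up front so the range can be split across threads.
    fs::directory_iterator end_iter;
    fs::directory_iterator dir_itr(src);
    vector<fs::path> files;
    files.insert(files.end(), dir_itr, end_iter);
    string dest_path = dest + "\\";

    parallel_for_each(files.begin(), files.end(), [=](const fs::path& filepath)
    {
        // Read the whole file into memory, one character at a time.
        ifstream inpfile(filepath.generic_string());
        string line;
        line.insert(line.end(), istreambuf_iterator<char>(inpfile), istreambuf_iterator<char>());
        line.erase(0, 55);  // drop the first 55 characters

        // Replace the first occurrence of token, if any.
        auto index = line.find(token, 0);
        if (index != string::npos)
        {
            line.replace(index, token.size(), target);
        }
        ofstream outfile(dest_path + filepath.filename().generic_string());
        outfile << line;
    });

    boost::chrono::duration<double> finish = boost::chrono::system_clock::now() - start;
    std::cout << "Took " << finish.count() << " secs\n";
}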
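
One behavioural difference between the two listings is worth keeping in mind when comparing timings: StringBuilder.Replace in the C# version replaces every occurrence of the token, whereas the C++ version calls find and replace once and therefore changes only the first occurrence. If the two programs are meant to do the same work, a helper along the following lines could be used on the C++ side. This is a minimal sketch; the name replace_all is hypothetical and not part of the original code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: replace every occurrence of `token` in `text` with
// `target`, in place. Returns the number of replacements made.
std::size_t replace_all(std::string& text,
                        const std::string& token,
                        const std::string& target)
{
    if (token.empty())
        throw std::invalid_argument("token must not be empty");

    std::size_t count = 0;
    std::size_t pos = 0;
    while ((pos = text.find(token, pos)) != std::string::npos) {
        text.replace(pos, token.size(), target);
        pos += target.size();  // continue after the inserted text
        ++count;
    }
    return count;
}
```

Inside the lambda, the find/replace pair would then become a single call such as replace_all(line, token, target).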

@JesusRamos, what is the point of the C++ performance argument, then? — Jagannath, Feb 20 '13 at 05:47
Your program is likely to be waiting for file IO most of the time. That's not where I'd expect any language to be faster than others significantly. — atoMerz, Feb 20 '13 at 05:47
Take a look at this question for information on how to read an entire file into a string quickly: http://stackoverflow.com/q/2602013/951890 — Vaughn Cato, Feb 20 '13 at 05:54
As an aside, I'd also note that for a case like this, you might well find that serial processing is faster than parallel. A hard drive can only supply one stream of data at a time, and processing many files in parallel may lead to more disk seeking. — Jerry Coffin, Feb 20 '13 at 05:55
@JerryCoffin, I tried the serial version as well before posting here. It was slower than parallel version. The server on which code is run has 24 cores. Since it's a disc IO issue it's not relevant I believe. — Jagannath, Feb 20 '13 at 06:04
@Jagannath: The question would be less about how many cores than how many disk drives/total disk bandwidth available. — Jerry Coffin, Feb 20 '13 at 06:06
a) see if your CPU is at 100% during execution b) try to kill [=] — NoSenseEtAl, Feb 27 '13 at 11:10
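
As a rough illustration of the "read an entire file into a string quickly" suggestion in the comments above, the sketch below sizes the string once and fills it with a single bulk read instead of appending character by character through istreambuf_iterator. The function name read_file is hypothetical. It assumes the file fits in memory and opens it in binary mode, so on Windows a CRLF pair is kept as two bytes; that may shift the 55-character offset used in the question if a line break falls inside that prefix.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a whole file into a string with one bulk read.
std::string read_file(const std::string& path)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    // Find the file size by seeking to the end, then rewind.
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 0)
        throw std::runtime_error("cannot determine size of " + path);
    in.seekg(0, std::ios::beg);

    // Allocate once and read straight into the string's buffer.
    std::string contents(static_cast<std::size_t>(size), '\0');
    if (size > 0)
        in.read(&contents[0], static_cast<std::streamsize>(size));
    if (!in)
        throw std::runtime_error("short read on " + path);
    return contents;
}
```

Whether this makes a measurable difference depends on how much of the run time is spent waiting for the disk, so it is worth timing before and after.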

score 7 · Answer 1 · answered Feb 20 '13 at 05:51

It seems you have many files with very little work done on each of them, so the main bottleneck here is disk I/O. If you had a complex, CPU-intensive task for each file, the C++ version could be faster, but for small tasks like this the language is irrelevant, since I/O is the problem.
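
One way to check this claim is to time the read, the in-memory edit, and the write separately for a sample of files. The sketch below is an assumption-laden illustration, not the asker's code: PhaseTimes and time_one_file are hypothetical names, and it uses std::chrono, which needs a C++11 standard library (with VC10, boost::chrono as used in the question offers an equivalent steady clock).

```cpp
#include <chrono>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Seconds spent in each phase of handling one file.
struct PhaseTimes {
    double read_s;
    double transform_s;
    double write_s;
};

// Hypothetical helper: process one file and report where the time went.
PhaseTimes time_one_file(const std::string& in_path,
                         const std::string& out_path,
                         const std::string& token,
                         const std::string& target)
{
    typedef std::chrono::steady_clock Clock;
    typedef std::chrono::duration<double> Seconds;

    // Phase 1: read the whole file.
    const Clock::time_point t0 = Clock::now();
    std::ifstream in(in_path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + in_path);
    std::string text((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());

    // Phase 2: edit in memory (first occurrence only, as in the question).
    const Clock::time_point t1 = Clock::now();
    const std::string::size_type pos = text.find(token);
    if (pos != std::string::npos)
        text.replace(pos, token.size(), target);

    // Phase 3: write the result.
    const Clock::time_point t2 = Clock::now();
    std::ofstream out(out_path, std::ios::binary);
    out << text;
    out.close();  // hand the data to the OS before stopping the clock
    if (!out)
        throw std::runtime_error("cannot write " + out_path);
    const Clock::time_point t3 = Clock::now();

    PhaseTimes t = { Seconds(t1 - t0).count(),
                     Seconds(t2 - t1).count(),
                     Seconds(t3 - t2).count() };
    return t;
}
```

If the read and write phases dominate the totals, the choice of language is unlikely to matter much. Note that files already in the operating system's cache will read far faster than files fetched from disk, so the first run and later runs over the same files can differ considerably.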

score 3 · Answer 2 · answered Feb 20 '13 at 05:50

Despite some people's perception, C# isn't slow at all as long as you avoid certain slow features such as reflection. In fact, people end up coding faster and with fewer obscure bugs, so they have more time to spend optimizing performance and logic rather than fixing bugs, meaning the result ends up being faster.

Other than that, you are using more common libraries in the C# code, which are generally well written and optimized by the MS devs, compared to having to roll your own functions in the C++ code.

Rob

C# does have some overheads that aren't present in C++ or other languages that are natively compiled, since in general there are things like JIT happening to convert IL to native code, and the various things that can slow you down like boxing and such. The productivity argument is largely irrelevant... people code best in what they know best, so a C++ programmer coding in C# is less productive than a C++ programmer coding in C++. It's all relative. Good code in C# can beat bad code in C++ and vice versa. – Corey Feb 20 '13 at 05:56
This is far too general and isn't really a proper answer to the question. Productivity has nothing to do with the question that has been asked, which is about why one piece of code runs faster than the other. – David Saxon Feb 20 '13 at 06:18

score 0 · Answer 3 · answered Feb 20 '13 at 06:05

When you 'compile' C# code, the 'compiler' generates intermediate language code (MSIL); this code is then compiled to native code at runtime by the JIT compiler of the .NET Framework. The JIT-compiled code is optimised for the environment that you are executing the code on. This happens only once for each function: once the function has been compiled to native code, that code is reused until the application terminates. So if you have one function called over and over again, the JIT-generated, optimised code may outperform conventionally compiled C++ code.
