
Recently, we had a requirement where more than 100,000 XML files all needed a particular piece of data modified. A simple Perl one-liner would have done the job, but Perl was not installed on the machine where the files are located, so I wrote a small C# program instead.

private static void ModifyXML(string[] args)
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    string path = args[0];    // source directory
    string opath = args[1];   // output directory
    string token = "value_date=\"20121130\"";
    string target = "value_date=\"20121019\"";

    Parallel.ForEach(Directory.EnumerateFiles(path), (file) =>
    {
        StringBuilder sb = new StringBuilder(File.ReadAllText(file));
        sb.Remove(0, 55);            // drop the first 55 characters
        sb.Replace(token, target);   // swap the value_date attribute
        var filename = file.Split(new char[] { '\\' }).Last();
        File.WriteAllText(string.Format("{0}\\{1}", opath, filename), sb.ToString());
    });
    TimeSpan ts = sw.Elapsed;
    Console.WriteLine("Took {0} secs", ts.TotalSeconds);
}

I decided to implement a C++ version. It turned out that the C++ version was not significantly faster than the C# one. I ran both versions several times; in fact, during some of the runs it was only as fast as the C# version.

For C# I used .NET 4.0, and for C++ I used VC10.

void FileHandling(std::string src, std::string dest)
{
    namespace fs = boost::filesystem;
    auto start = boost::chrono::system_clock::now();
    string token = "value_date=\"20121130\"";
    string target = "value_date=\"20121019\"";
    fs::directory_iterator end_iter;
    fs::directory_iterator dir_itr(src);
    vector<fs::path> files;
    files.insert(files.end(), dir_itr, end_iter);   // collect all paths up front
    string dest_path = dest + "\\";
    // process the files in parallel, analogous to Parallel.ForEach in the C# version
    parallel_for_each(files.begin(), files.end(), [=](const fs::path& filepath)
    {
        ifstream inpfile(filepath.generic_string());
        string line;
        // read the whole file into one string
        line.insert(line.end(), istreambuf_iterator<char>(inpfile), istreambuf_iterator<char>());
        line.erase(0, 55);                          // drop the first 55 characters
        auto index = line.find(token, 0);
        if (index != string::npos)
        {
            line.replace(index, token.size(), target);   // replace the first occurrence
        }
        ofstream outfile(dest_path + filepath.filename().generic_string());
        outfile << line;
    });

    boost::chrono::duration<double> finish = boost::chrono::system_clock::now() - start;
    std::cout << "Took " << finish.count() << " secs\n";
}
Jagannath

3 Answers


It seems you have many files with very little work done on each of them, so the main bottleneck here is disk I/O. If you had some complex, CPU-consuming task for each file, the C++ version could come out faster, but for small tasks like this the language is irrelevant, since I/O is the problem.
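
A rough way to check this is to time the read, the replace, and the write phases separately. Below is a minimal sketch (the ProfilePhases method is made up for the illustration), assuming the same path/opath arguments and token/target strings as the code in the question; run it on a sample of the files rather than all 100,000, since it keeps everything in memory:

// needs System, System.Diagnostics, System.IO, System.Linq
static void ProfilePhases(string[] args)
{
    string path = args[0], opath = args[1];
    string token = "value_date=\"20121130\"", target = "value_date=\"20121019\"";
    var files = Directory.EnumerateFiles(path).ToList();

    var read = Stopwatch.StartNew();
    var contents = files.Select(f => File.ReadAllText(f)).ToList();   // pure I/O: read
    read.Stop();

    var cpu = Stopwatch.StartNew();
    var modified = contents
        .Select(t => t.Substring(55).Replace(token, target))
        .ToList();                                                    // pure CPU: string work
    cpu.Stop();

    var write = Stopwatch.StartNew();
    for (int i = 0; i < files.Count; i++)                             // pure I/O: write
        File.WriteAllText(Path.Combine(opath, Path.GetFileName(files[i])), modified[i]);
    write.Stop();

    Console.WriteLine("read {0}s, replace {1}s, write {2}s",
        read.Elapsed.TotalSeconds, cpu.Elapsed.TotalSeconds, write.Elapsed.TotalSeconds);
}

If the read and write timings dwarf the replace timing, the comparison between C# and C++ is really a comparison of how fast the disk is.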

ixSci

In spite of some people's perception, C# isn't slow at all as long as you don't use certain slow features such as reflection. In fact, people end up coding faster and with fewer obscure bugs, so they have more time to spend optimizing performance and logic rather than fixing bugs, which means the end result often ends up being faster.

Other than that, you are using more common libraries in the C# code, which are generally well written and optimized by the Microsoft developers, compared to having to roll your own functions in the C++ code.
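
For what it's worth, the cost of a "slow feature" like reflection is easy to see in a micro-benchmark. A minimal sketch below, where the Item type and its Value property are made up for the example, compares a direct property read with the same read done through reflection:

// needs System, System.Diagnostics, System.Reflection
class Item { public int Value { get; set; } }

static void CompareReflection()
{
    var item = new Item { Value = 42 };
    PropertyInfo prop = typeof(Item).GetProperty("Value");
    const int n = 1000000;

    var direct = Stopwatch.StartNew();
    long sum1 = 0;
    for (int i = 0; i < n; i++) sum1 += item.Value;                      // direct property access
    direct.Stop();

    var reflected = Stopwatch.StartNew();
    long sum2 = 0;
    for (int i = 0; i < n; i++) sum2 += (int)prop.GetValue(item, null);  // same read via reflection
    reflected.Stop();

    Console.WriteLine("direct: {0} ms, reflection: {1} ms ({2} {3})",
        direct.ElapsedMilliseconds, reflected.ElapsedMilliseconds, sum1, sum2);
}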

Rob
    C# does have some overheads that aren't present in C++ or other languages that are natively compiled, since in general there are things like JIT happening to convert IL to native code, and the various things that can slow you down like boxing and such. The productivity argument is largely irrelevant... people code best in what they know best, so a C++ programmer coding in C# is less productive than a C++ programmer coding in C++. It's all relative. Good code in C# can beat bad code in C++ and vice versa. – Corey Feb 20 '13 at 05:56
    This is far too general and isn't really a proper answer to the question. Productivity has nothing to do with the question that has been asked, which is about why one piece of code runs faster than the other. – David Saxon Feb 20 '13 at 06:18

When you ‘compile’ C# code, the ‘compiler’ generates intermediate language code (MSIL); this code is then compiled at runtime to native code by the .NET Framework's JIT compiler. The JIT-compiled code is optimised for the environment you are executing the code on. This happens only once for each function, and once a function has been compiled to native code it is reused until the application terminates. So if you have a function that is called over and over again, JIT-generated and optimised code may outperform generically compiled C++ code.
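
That one-time cost per function is easy to observe. A minimal sketch below, where the Work method is made up for the example: the first call pays for JIT compilation, subsequent calls reuse the generated native code.

// needs System, System.Diagnostics
static double Work(double x)
{
    for (int i = 0; i < 1000; i++) x = Math.Sqrt(x + i);   // arbitrary computation
    return x;
}

static void MeasureJitWarmup()
{
    var first = Stopwatch.StartNew();
    Work(1.0);                      // first call: IL is JIT-compiled, then executed
    first.Stop();

    var second = Stopwatch.StartNew();
    Work(1.0);                      // second call: reuses the compiled native code
    second.Stop();

    Console.WriteLine("first call: {0} ticks, second call: {1} ticks",
        first.ElapsedTicks, second.ElapsedTicks);
}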

Chintana Meegamarachchi