Today I downloaded the Electronic Arts STL implementation (EASTL) and created a sample project with it, and EA's vector looks much slower to me than the standard one. I just created two vectors and filled each of them with 1 million items:

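#include <cstdio>   // printf
#include <ctime>    // clock, clock_t, CLOCKS_PER_SEC
#include <string>
#include <vector>
#include <EASTL/string.h>
#include <EASTL/vector.h>

void performance_test(void)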
{
    clock_t start;
    clock_t end;


    // EA

    eastl::string strEA = "hello";
    eastl::vector<eastl::string> vec_EA;

    start = clock();
    for (size_t i = 0; i < 1000000; i++)
    {
        vec_EA.push_back(strEA);
    }

    end = clock();
    printf("EA       %f\n", double(end - start) / CLOCKS_PER_SEC); // seconds

    // Standard

    std::string strStandard = "hello";
    std::vector<std::string> vec_Standard;

    start = clock();
    for (size_t i = 0; i < 1000000; i++)
    {
        vec_Standard.push_back(strStandard);
    }

    end = clock();
    printf("Standard %f\n", double(end - start) / CLOCKS_PER_SEC); // seconds
}

And the results are:

  1. EA 0.759000
  2. Standard 0.064000

So, am I doing something wrong, or did I just miss something? The sample was compiled with the v100 platform toolset.

leemes
CsOkemf
  • Why did you tag C? The compiler's own standard library can be optimized specially for that compiler. I'm not sure if it should be that order of magnitude, however. What is v100 platform toolset? – Neil Kirk Mar 03 '15 at 18:28
  • I'm not sure I would trust `clock` for timing, but when the difference is this great it should be OK. It would be worth trying an EASTL vector with a standard string and vice versa, to narrow down the problem. – Mark Ransom Mar 03 '15 at 18:36
  • @MarkRansom: Hmm. EA's vector with a standard string is faster than the standard vector, but shouldn't EA's string be faster too? – CsOkemf Mar 03 '15 at 18:49
  • The `std::vector` implementation may use an allocation strategy that reserves more space than requested, leading to fewer `new` calls. Furthermore, `std::string` might implement [short-string optimization](http://stackoverflow.com/questions/21694302/what-are-the-mechanics-of-short-string-optimization-in-libc). – Brett Hale Mar 03 '15 at 18:52
  • You might want to try some of the following: 1) switch the order of benchmarking of the `eastl` objects and the `std` objects to see if the first loop is somehow 'priming' caches or memory; 2) at the end of the benchmark, print information (size, the contents of some elements, etc.) from each vector to try to make sure operations aren't being elided due to the optimizer recognizing that the `vec_Xxxx` objects aren't actually used; 3) examine or step through the assembly code to see if anything jumps out as a reason; 4) let us know exactly what compiler options are being used to build. – Michael Burr Mar 03 '15 at 18:53
  • @MichaelBurr I swapped the order so that EA's STL now runs second, but the result is the same. It's only faster with the standard string. – CsOkemf Mar 03 '15 at 19:03
  • Take a look at the performance section of this: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html Nothing is listed for a vector of strings. I can't speak for other games or game libraries, but in the games I worked on there were very few use cases for a vector of strings. They are usually static, as in initialized and never updated again, only accessed when needed; in those cases a different data structure would be used for memory efficiency. In the case of EASTL, my guess would be that there were few use cases, so no optimization was done for push_back on a vector of strings. – Tony J Mar 03 '15 at 19:15
  • @TonyJ: I'm working with a very old engine and I'm just trying to make it a bit faster. I thought EASTL would be a great solution for me. – CsOkemf Mar 03 '15 at 19:22
  • MSVC's v100 platform toolset (VS 2010) supports move semantics for strings. I'm not sure if the EASTL originally did, but it looks like at least some support for move semantics can be enabled in the version available at https://github.com/paulhodge/EASTL if you define the macro `EA_COMPILER_HAS_MOVE_SEMANTICS`. I'd be interested in hearing if enabling that configuration helps. – Michael Burr Mar 03 '15 at 19:41
  • @MichaelBurr: This macro is already defined. – CsOkemf Mar 03 '15 at 19:49
  • I would measure the performance of some actual code with `std::vector` vs. `eastl::vector`. This kind of artificial test will lead to artificial results. – Khouri Giordano Mar 03 '15 at 21:40
  • Interesting - I pulled down the EASTL and ran the test myself and I get the following results in VS2010 with a release mode build: `EA 0.065000` and `Standard 0.072000`. I used the `new` overloads that are required by the EASTL which are in `example/example1.cpp` (basically they just call `malloc()`). I don't know why we see such different results for the EASTL part of the benchmark. – Michael Burr Mar 04 '15 at 01:41
  • CppCon 2015: Scott Wardle “Memory and C++ debugging at Electronic Arts” including EASTL - https://www.youtube.com/watch?v=8KIvWJUYbDA – rbento Feb 22 '21 at 20:03

1 Answer

When I run your benchmark with a release build in VS 2010, I get results similar to what one might hope for:

EA       0.063000
Standard 0.073000

However, when I run the same release build under the VS debugger, the results change dramatically:

EA       1.293000
Standard 0.080000

The object cleanup that follows takes even longer (tens of seconds). Keep in mind that this is the same release-mode build, not a debug build.

I haven't looked into why EASTL is impacted so severely by the debugger environment. I assume it has something to do with the debug heap.
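To make the debugger's influence visible in the output, the benchmark could report whether a debugger is attached next to each timing. The following is only a sketch under a few assumptions: it needs a C++11 compiler with `<chrono>` (newer than the VS 2010 toolset used above), and `debugger_attached`, `time_push_back`, `PushBackTiming` and `report` are names I made up for illustration. `IsDebuggerPresent()` is the Win32 call; on other platforms the sketch simply reports `false`. As far as I know, setting the environment variable `_NO_DEBUG_HEAP=1` before launching under the debugger turns the Windows debug heap off, which would be one way to test the assumption above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: true when a debugger is attached.
// Only implemented for Windows; always false elsewhere.
inline bool debugger_attached()
{
#ifdef _WIN32
    return IsDebuggerPresent() != 0;
#else
    return false;
#endif
}

// Timing plus the final element count, so the caller observes the vector's state.
struct PushBackTiming
{
    double      seconds;
    std::size_t final_size;
};

// Times `count` push_back calls of `value` into a fresh Vector.
// The vector is destroyed after the result is computed, so cleanup is not timed.
template <typename Vector, typename Value>
PushBackTiming time_push_back(const Value& value, std::size_t count)
{
    Vector vec;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i)
        vec.push_back(value);
    const auto stop = std::chrono::steady_clock::now();

    PushBackTiming result = { std::chrono::duration<double>(stop - start).count(),
                              vec.size() };
    return result;
}

// Prints one result line and flags runs made under a debugger.
inline void report(const char* label, const PushBackTiming& t, std::size_t expected)
{
    assert(t.final_size == expected); // sanity check: every push_back happened
    std::printf("%-8s %f s (%lu elements)%s\n", label, t.seconds,
                static_cast<unsigned long>(t.final_size),
                debugger_attached() ? "  [debugger attached]" : "");
}
```

A call such as `report("Standard", time_push_back<std::vector<std::string>>(std::string("hello"), 1000000), 1000000);` would then print one line per container; the same template should accept `eastl::vector<eastl::string>` if EASTL is on the include path.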


Update (4 March 2015):

Another detail that affects the results is the length of the string involved. VS uses a 'short string optimization' in std::string which will reduce the number of allocations that occur for string objects that have a value like "hello". If you change the initialized value for the strings used in the example from "hello" to "hello - but not too short", you'll get results more like the following:

// release build, run outside of the debugger
EA       0.078000
Standard 0.113000

// release build, run under the debugger
EA       0.762000
Standard 1.414000
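To check which of the two values fits in the short-string buffer, one can test whether the character data lives inside the string object itself. This is a sketch, not something taken from the library documentation: `stored_inline` and `report_storage` are hypothetical helpers, and the trick assumes a `std::string` implementation that uses a short-string buffer (as current MSVC, libstdc++ and libc++ do).

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <string>

// Hypothetical helper: true when the character buffer of `s` lies inside the
// string object itself, i.e. the value sits in the short-string buffer and
// was not allocated on the heap.
inline bool stored_inline(const std::string& s)
{
    const void* object_begin = &s;
    const void* object_end   = reinterpret_cast<const char*>(&s) + sizeof(s);
    const void* buffer       = s.data();

    std::less<const void*> before; // gives a total order even for unrelated pointers
    return !before(buffer, object_begin) && before(buffer, object_end);
}

// Prints size, capacity and storage location for one value.
inline void report_storage(const std::string& s)
{
    assert(s.size() <= s.capacity()); // invariant of any std::string
    std::printf("\"%s\": size=%lu capacity=%lu inline=%s\n", s.c_str(),
                static_cast<unsigned long>(s.size()),
                static_cast<unsigned long>(s.capacity()),
                stored_inline(s) ? "yes" : "no");
}
```

Calling `report_storage` on a string holding "hello" and on one holding "hello - but not too short" should show the first stored inline and the second on the heap, on implementations with a short-string buffer.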

Once both string types have to allocate, both slow down under the debugger. This suggests that the order-of-magnitude difference seen when the benchmark is run under the debugger is due to the debug heap spending a lot of time tracking the string memory allocations.
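The number of heap allocations behind each variant can be counted directly by replacing the global `operator new`. The sketch below uses only the standard containers; `g_allocations` and `count_push_back_allocations` are names invented for this illustration, and the counter is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <vector>

// Bumped by every call to the replaced global operator new (not thread-safe).
static std::size_t g_allocations = 0;

void* operator new(std::size_t size)
{
    ++g_allocations;
    if (void* p = std::malloc(size ? size : 1))
        return p;
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Counts the heap allocations made while pushing `count` copies of `value`.
// Allocations needed to build `value` itself happen before counting starts.
std::size_t count_push_back_allocations(const std::string& value, std::size_t count)
{
    std::vector<std::string> vec;
    const std::size_t before = g_allocations;
    for (std::size_t i = 0; i < count; ++i)
        vec.push_back(value);
    assert(vec.size() == count); // sanity check: every push_back happened
    return g_allocations - before;
}
```

With a short value such as "hello", the count should only reflect the vector's own reallocations; with "hello - but not too short", every element adds at least one allocation, and that is the work a debug heap has to track.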

Michael Burr
  • Hmm, can you please share the entire solution? Maybe I have different compiler settings. – CsOkemf Mar 04 '15 at 08:24
  • `git clone https://github.com/mburr/eastl-test.git` then open the VS2010 solution in `eastl-test\vs2010-test\eastl-test\eastl-test.sln`. You may also want to read `eastl-test\vs2010-test\README-vs2010-test.txt` – Michael Burr Mar 05 '15 at 05:35
  • Thank you so much, now I'm sure there was something wrong with my settings. Now I'm getting your results. – CsOkemf Mar 05 '15 at 06:35
  • I'd be interested in knowing what the difference in the project is that caused you to always see the behavior, if you ever track that down. – Michael Burr Mar 05 '15 at 20:58
  • @MichaelBurr probably Whole Program Optimization, Inline Any or Profile Guided Optimization – Luc Bloom Jan 16 '18 at 13:09
  • Five years later, I have a puzzle. I took @MichaelBurr's repo and rebuilt it locally with an up-to-date VS2019, using both CL and Clang. The test loop size is 10 million and the specimen string is longer. EASTL passes, but MSVC STL consistently fails with a "bad allocation" message from a caught `std::exception`. The repo is here: `https://github.com/DBJDBJ/eastl-test`. Any idea? – Chef Gladiator Aug 31 '20 at 14:59
  • Changing the specimen to "Hello" produces the same result: MSVC STL reports "bad allocation". – Chef Gladiator Aug 31 '20 at 15:03
  • Shortening the loop to 1 million makes MSVC STL happy. Hmm... – Chef Gladiator Aug 31 '20 at 15:05