4

When I use a C++ vector, the loop takes 718 milliseconds, while when I use a plain array it takes almost 0 milliseconds.

Why is there such a big performance difference?

#include <iostream>
#include <vector>
#include <ctime>
#include <tchar.h>
using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    const int size = 10000;
    clock_t start, end;

    start = clock();
    vector<int> v(size * size);
    for (int i = 0; i < size; i++)
    {
        for (int j = 0; j < size; j++)
        {
            v[i * size + j] = 1;
        }
    }
    end = clock();
    cout << (end - start) << " milliseconds." << endl; // 718 milliseconds

    start = clock();
    int arr[size * size];
    for (int i = 0; i < size; i++)
    {
        for (int j = 0; j < size; j++)
        {
            arr[i * size + j] = 1;
        }
    }
    end = clock();
    cout << (end - start) << " milliseconds." << endl; // 0 milliseconds
    return 0;
}
Joel Coehoorn
blue_river
  • How is this compiled? Are optimizations enabled? And which compiler are you using? – jalf Dec 22 '09 at 13:05
  • int arr[size*size]: Causes a segmentation fault on my machine (when built without optimization), as putting this on the stack seems to exceed the stack frame size allowed by the compiler (gcc version 4.2.1 (Apple Inc. build 5646)): http://stackoverflow.com/questions/216259/is-there-a-max-array-length-limit-in-c/216731#216731 – Martin York Dec 22 '09 at 18:57
  • Funnily enough, the time for the array goes up to around 500ms on my machine as well if you allocate it dynamically (which you should since it's so big. As Martin York says, you're asking for a stack overflow otherwise) – jalf Dec 23 '09 at 13:38
  • vector is used specifically for random access of arrays, therefore I assume its performance may come at a trade-off, and it would best be used in situations where random access (insertion and deletion of arbitrary array elements) needs to be optimized. – marshal craft Oct 29 '16 at 03:30
  • Note that the only right answer is the one given by el.pescado below. If the array hadn't been optimized away, you would get a stack overflow too, as Loki Astari commented. – migle Jan 09 '18 at 11:50
  • To prove that the array-writing loop is not executing, consider that you are writing a 400MB array. Writing it (once or twice, it doesn't matter much) in 718 ms means 530MB/s (or double that), a plausible number. But writing it in only 10 ms would mean 38GB/s. – migle Jan 09 '18 at 12:01

8 Answers

22

Your array arr is allocated on the stack, i.e., the compiler has calculated the necessary space at compile time. At the beginning of the function, the compiler inserts an assembler instruction like

sub esp, 10000*10000*sizeof(int)

which means the stack pointer (esp) is decreased by 10000 * 10000 * sizeof(int) bytes to make room for an array of 10000² integers. This operation is almost instant.

The vector is heap allocated, and heap allocation is much more expensive. When the vector allocates the required memory, it has to ask the operating system for a contiguous chunk of memory, and the operating system has to perform significant work to find one.

As Andreas says in the comments, all your time is spent in this line:

vector<int> v(size*size); 

Accessing the vector inside the loop is just as fast as for the array.
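
Here is a minimal sketch (standard C++, using the same clock()-based timing as the question) that splits the measurement, so you can see for yourself where the time goes under your own compiler settings:

#include <ctime>
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    const int size = 10000;

    clock_t start = clock();
    vector<int> v(size * size);        // allocation + zero-initialization
    clock_t ctorEnd = clock();

    for (int i = 0; i < size; i++)
        for (int j = 0; j < size; j++)
            v[i * size + j] = 1;       // plain element access
    clock_t fillEnd = clock();

    cout << "constructor: " << (ctorEnd - start) << " ticks" << endl
         << "fill loop:   " << (fillEnd - ctorEnd) << " ticks" << endl;
    return 0;
}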

Edit:

After all the comments about performance optimizations and compiler settings, I did some measurements this morning. I had to set size=3000 so I did my measurements with roughly a tenth of the original entries. All measurements performed on a 2.66 GHz Xeon:

  1. With debug settings in Visual Studio 2008 (no optimization, runtime checks, and debug runtime) the vector test took 920 ms compared to 0 ms for the array test.

    98.48% of the total time was spent in vector::operator[], i.e., the time was indeed spent on the runtime checks.

  2. With full optimization, the vector test needed 56 ms (with a tenth of the original number of entries) compared to 0 ms for the array.

    The vector ctor required 61.72% of the total application running time.

So I guess everybody is right depending on the compiler settings used. The OP's timing suggests an optimized build or an STL without runtime checks.

As always, the moral is: profile first, optimize second.

Sebastian
  • +1 Yes, move `vector v(size*size);` out of the timing and there shouldn't be any difference. – Andreas Brinck Dec 22 '09 at 11:25
  • you may also need to allow the compiler to inline stuff to get the same speeds of course, i.e. don't compare the speeds with optimizations off – jk. Dec 22 '09 at 11:35
  • Switching optimisations on is indeed the key. C++ is designed to take advantage of compiler optimisations - if you don't use them, performance will definitely suffer. –  Dec 22 '09 at 11:47
  • Or you could make the array be: int* arr = new int[size*size]; which would use a memory allocation. However don't include these setup costs in your timing unless it is relevant to what you want to measure. – Daemin Dec 22 '09 at 12:04
  • @Daemin that's what I was thinking. It's really the only fair comparison. Once the memory is allocated, it shouldn't matter whether it came from the stack or the heap, but yes: making a system call to allocate memory is going to be expensive. – San Jacinto Dec 22 '09 at 12:53
  • I highly doubt it takes 700ms to perform a single heap allocation. – jalf Dec 22 '09 at 13:03
  • @jalf: In a debug build? With a debug heap and checking iterators? – Sebastian Dec 22 '09 at 15:32
  • Yes. Checked iterators do not affect the time taken for heap allocations, which was what your post claimed. Of course *other* aspects of `std::vector` cause the slowdown in a debug build, but it is certainly not the single call to `new`. – jalf Dec 22 '09 at 20:27
  • Checked iterators can slow down access. C++ doesn't forbid `[]` from doing bounds checks (it requires `.at()` to perform them), and it's perfectly reasonable for a debug build to check. – David Thornley Dec 22 '09 at 22:36
  • The difference between stack and heap allocation should not be able to account for 718 milliseconds of time. – Omnifarious Dec 23 '09 at 00:12
  • Sounds great, too bad it's not true. 718ms for a single allocation on a clean heap? The real answer is that operator[] is much slower for a vector. – Charles Eli Cheese Dec 23 '09 at 03:03
9

If you are compiling this with a Microsoft compiler, to make it a fair comparison you need to switch off iterator security checks and iterator debugging, by defining _SECURE_SCL=0 and _HAS_ITERATOR_DEBUGGING=0.
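
For example, a sketch for Visual C++ 2008 (later versions replaced these macros with _ITERATOR_DEBUG_LEVEL):

// Both macros must appear before any standard header is included,
// and must have the same value in every translation unit.
#define _SECURE_SCL 0
#define _HAS_ITERATOR_DEBUGGING 0

#include <vector>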

Secondly, the constructor you are using initialises each vector value with zero, and you are not memsetting the array to zero before filling it. So you are traversing the vector twice.

Try:

vector<int> v; 
v.reserve(size*size);
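
Note that reserve only raises capacity; as the comment below points out, the elements then have to be appended. A sketch of the completed approach:

vector<int> v;
v.reserve(size * size);            // one allocation up front, size() stays 0
for (int i = 0; i < size * size; ++i)
    v.push_back(1);                // no reallocation, each element written once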
xcut
  • After `vector::reserve` you have to call `vector::push_back` to increase the vector's size. Using an unchecked `operator[]` would work, but it'd be evil. `vector::resize` would also initialize with 0. – Sebastian Dec 23 '09 at 09:59
3

Change the assignment to e.g. arr[i*size+j] = i*j, or some other non-constant expression. I think the compiler optimizes away the whole loop, as the assigned values are never used, or replaces the array with precalculated values, so that the loop isn't even executed and you get 0 milliseconds.

Having changed 1 to i*j, I get the same timings for both vector and array, unless I pass the -O1 flag to gcc, in which case I get 0 milliseconds in both cases.

So, first of all, double-check whether your loops are actually executed.
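
For example, a sketch (reusing the question's arr and size): store a value the compiler cannot fold to a constant, then read the result back, so the loops cannot be dropped as dead code:

long long checksum = 0;
for (int i = 0; i < size; i++)
    for (int j = 0; j < size; j++)
        arr[i * size + j] = i * j;   // non-constant value
for (int k = 0; k < size * size; k++)
    checksum += arr[k];
cout << checksum << endl;            // using the result keeps the loops alive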

el.pescado
  • Oh, God! This is the right answer and all other answers are way off. Does anyone think that stack or heap allocation of a 400MB array could matter? Or even that a 400MB array can be allocated on the stack in some implementation? It's a pity this one is not upvoted enough. – migle Jan 09 '18 at 11:48
3

To get a fair comparison I think something like the following should be suitable:

#include <sys/time.h>
#include <vector>
#include <iostream>
#include <algorithm>
#include <numeric>


int main()
{
  static size_t const size = 7e6;

  timeval start, end;
  int sum;

  gettimeofday(&start, 0);
  {
    std::vector<int> v(size, 1);
    sum = std::accumulate(v.begin(), v.end(), 0);
  }
  gettimeofday(&end, 0);

  std::cout << "= vector =" << std::endl
        << "(" << end.tv_sec - start.tv_sec
        << " s, " << end.tv_usec - start.tv_usec
        << " us)" << std::endl
        << "sum = " << sum << std::endl << std::endl;

  gettimeofday(&start, 0);
  int * const arr =  new int[size];
  std::fill(arr, arr + size, 1);
  sum = std::accumulate(arr, arr + size, 0);
  delete [] arr;
  gettimeofday(&end, 0);

  std::cout << "= Simple array =" << std::endl
        << "(" << end.tv_sec - start.tv_sec
        << " s, " << end.tv_usec - start.tv_usec
        << " us)" << std::endl
        << "sum = " << sum << std::endl << std::endl;
}

In both cases, dynamic allocation and deallocation is performed, as well as accesses to elements.

On my Linux box:

$ g++ -O2 foo.cpp 
$ ./a.out 
= vector =
(0 s, 21085 us)
sum = 7000000

= Simple array =
(0 s, 21148 us)
sum = 7000000

Both the std::vector<> and array cases have comparable performance. The point is that std::vector<> can be just as fast as a simple array if your code is structured appropriately.


On a related note, switching off optimization makes a huge difference in this case:

$ g++ foo.cpp 
$ ./a.out 
= vector =
(0 s, 120357 us)
sum = 7000000

= Simple array =
(0 s, 60569 us)
sum = 7000000

Many of the optimization assertions made by folks like Neil and jalf are entirely correct.

HTH!

EDIT: Corrected code to force vector destruction to be included in time measurement.

Void
  • Vector deallocation is only done after the vector test end time is measured here, at the end of the block, isn't it? This makes the comparison code slightly unfair. – Olli Etuaho Aug 17 '11 at 22:28
  • @Olli: Good point! I've updated the code and results accordingly. With the correction in place, the `std::vector<>` case is no longer consistently faster, but it is still consistently comparable to the array case - sometimes slightly faster, sometimes slightly slower. Thanks for pointing out the problem in the code! – Void Aug 31 '11 at 22:28
2

You are probably using VC++, in which case the standard library components perform many checks at run time by default (e.g. whether an index is in range). These checks can be turned off by defining some macros as 0 (I think _SECURE_SCL).

Another thing is that I can't even run your code as is: the automatic array is way too large for the stack. When I make it global, then with MinGW 3.5 the times I get are 627 ms for the vector and 26875 ms (!!) for the array, which indicates there are really big problems with an array of this size.

As to this particular operation (filling with value 1), you could use the vector's constructor:

std::vector<int> v(size * size, 1);

and the fill algorithm for the array:

std::fill(arr, arr + size * size, 1);
visitor
1

Two things. One, operator[] is much slower for vector. Two, in most implementations vector will behave strangely at times when you add elements one at a time. I don't just mean that it allocates more memory; it does some genuinely bizarre things at times.

The first one is the main issue. For a mere million bytes, even reallocating the memory a dozen times should not take long (it won't do it on every added element).

In my experiments, preallocating doesn't change its slowness much. When the contents are actual objects it basically grinds to a halt if you try to do something simple like sort it.

Conclusion: don't use STL or MFC vectors for anything large or computation-heavy. They are implemented poorly/slowly and cause lots of memory fragmentation.
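
A minimal sketch for testing the operator[] claim in isolation (allocation kept outside the timed region; assumes an optimized build):

#include <ctime>
#include <iostream>
#include <vector>

int main()
{
    const size_t n = 10000000;                  // 10M ints, both heap-allocated
    std::vector<int> v(n);
    int* a = new int[n]();

    clock_t t0 = clock();
    for (size_t i = 0; i < n; i++) v[i] = 1;    // vector::operator[]
    clock_t t1 = clock();
    for (size_t i = 0; i < n; i++) a[i] = 1;    // raw array indexing
    clock_t t2 = clock();

    // Reading the data back discourages (but does not guarantee against)
    // the compiler eliminating the stores above.
    std::cout << v[n - 1] + a[n - 1] << "\n"
              << "vector: " << (t1 - t0) << ", array: " << (t2 - t1) << std::endl;
    delete[] a;
    return 0;
}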

0

When you declare the array, it lives on the stack (or in a static memory zone), which is very fast, but its size cannot be changed.

When you declare the vector, it allocates dynamic memory, which is not as fast, but is more flexible: you can change the size at run time instead of dimensioning it to the maximum size up front.
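
For illustration (a sketch):

int fixed[100];                  // size fixed at compile time, on the stack
std::vector<int> flexible(100);  // storage on the heap
flexible.resize(250);            // a vector can grow at run time; fixed cannot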

Khelben
0

When profiling code, make sure you are comparing similar things.

vector<int> v(size*size); 

initializes each element in the vector,

int arr[size*size]; 

doesn't. Try

int arr[size * size];
memset( arr, 0, size * size * sizeof(int) ); // memset counts bytes, not ints

and measure again...

DevSolar
  • I disagree - it is a flaw of `vector` that even with POD types, there is no way to avoid initialization in the case where you're going to manually set every element immediately afterwards. It is absolutely right that a benchmark of vector vs. array should show that array is faster in cases where you don't need zero-initialization. That said, in this case he's manually initializing all the values to 1, so it might be more fair to compare the array code as it is, against `vector v(size*size,1);` – Steve Jessop Dec 22 '09 at 15:58
  • Have you tried `vector v(0); v.resize( DESIRED_SIZE );`? It should result in an empty, zero-sized vector being assigned, which is then re-sized to DESIRED_SIZE, without any constructors / initialisation. – DevSolar Dec 22 '09 at 16:49
  • No, `resize` is really `void resize(size_type sz, T c = T())`. Same deal as the constructor, it initializes all the new values. – Steve Jessop Dec 22 '09 at 18:32
  • Are you absolutely positive about that? `resize()` changes `capacity()`, not `size()`...?!? – DevSolar Dec 23 '09 at 13:33
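
For the record, resize() changes size() and value-initializes any new elements, while reserve() only changes capacity():

std::vector<int> v;
v.reserve(100);   // capacity() >= 100, size() == 0 -- nothing constructed
v.resize(100);    // size() == 100 -- new ints value-initialized to 0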