
Last week I read about concepts such as cache locality and pipelining in a CPU. Although these concepts are easy to understand, I have two questions. Suppose one can choose between a vector of objects or a vector of pointers to objects (as in this question).

Then an argument for using pointers is that shuffling larger objects may be expensive. However, I am not able to figure out when I should call an object large. Is an object of several bytes already large?

An argument against the pointers is the loss of cache locality. Will it help if one uses two vectors, where the first one contains the objects and is never reordered, and the second one contains pointers to these objects? Say that we have a vector of 200 objects, create a vector with pointers to these objects, and then randomly shuffle the latter vector. Is the cache locality then lost if we loop over the vector of pointers?

This last scenario happens a lot in my programs, where I have City objects and then around 200 vectors of pointers to these Cities. To avoid having 200 instances of each City, I use a vector of pointers instead of a vector of Cities.
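
To make the setup concrete, here is a minimal sketch of what I mean (the City fields are just placeholders; only the pointer vector is shuffled, the object vector stays put):

```cpp
#include <algorithm>
#include <random>
#include <string>
#include <vector>

// Placeholder City type; the real one has whatever fields you need.
struct City {
    std::string name;
    double latitude = 0.0;
    double longitude = 0.0;
};

int main() {
    // The objects themselves, stored contiguously and never reordered.
    // Note: the pointers below stay valid only as long as this vector
    // is not resized.
    std::vector<City> cities(200);

    // A separate vector of pointers into `cities`.
    std::vector<City*> route;
    route.reserve(cities.size());
    for (City& c : cities) route.push_back(&c);

    // Only the pointers get shuffled; the cities stay where they are.
    std::mt19937 rng{42};
    std::shuffle(route.begin(), route.end(), rng);

    // Looping over `route` visits the cities in random order.
    for (const City* c : route) {
        (void)c->latitude; // pretend we do something with each city
    }
}
```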

  • [Explain my data](http://blog.explainmydata.com/2012/07/expensive-lessons-in-python-performance.html): "Profile everything! Profile mercilessly and relentlessly. Let no assumption about efficiency pass without being put under the cold and uncaring gaze of a profiler." – Ami Tavory May 18 '15 at 07:10
  • It really depends on a lot of things, and the only definitive answer is to benchmark all options. Are you often iterating linearly over your vector? Are you often shuffling the objects around? Try all three solutions, benchmark, then benchmark, and benchmark some more! – user703016 May 18 '15 at 07:11
  • Profiling is definitely true! However, I like to start with the solution that is most likely the best one, since I find all the options in this case equally easy. Furthermore, I really have no idea when an object is considered large, independent of the performance. – Michiel uit het Broek May 18 '15 at 07:14
  • @MichielUitHetBroek There is no right-in-all-cases answer. For instance, is the access pattern random, sequential, or both? You need to profile. Only then can you find where the hot spots are. You might even find that the vector is not a performance issue at all, thus saving you from wasting your time. – Richard Critten May 18 '15 at 07:36
  • Profiling is definitely necessary, but in general, std::vector gives you better cache locality, and a sorted vector is sometimes better than my favorite data structure in the STL, the std::map. Here is one good discussion: http://stackoverflow.com/questions/24542936/vector-vs-map-performance-confusion – Erik Alapää May 18 '15 at 07:58
  • @RichardCritten the access is in both random and sequential order. – Michiel uit het Broek May 20 '15 at 16:04

1 Answer


There is no simple answer to this question. You need to understand how your system behaves with regard to memory, what operations you perform on the container, and which of those operations are "important". But by understanding the concepts and what affects what, you can get a better understanding of how things work. So here's some "discussion" on the subject.

"Cache locality" is largely about "keeping things in the cache". In other words, if you look at A, then B, and A is located close to B, they are probably getting loaded into the cache together.

If objects are large enough that they fill one or more cache lines (modern CPUs have cache lines of 64-128 bytes; mobile ones are sometimes smaller), the "next object in line" will not be in the cache anyway [1], so the cache locality of the "next element in the vector" is less important. The smaller the object, the more you benefit from this - assuming you access objects in the order they are stored. If you access elements at random, other factors start to become important [2], and cache locality matters much less.
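
As a rough sketch of that arithmetic (the struct layouts below are invented, and the 64-byte cache line is only an assumption - check what your CPU actually uses):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Assume a 64-byte cache line (typical for x86-64 desktops; check your CPU).
constexpr std::size_t kCacheLine = 64;

struct Small { std::int32_t x, y; };   // 8 bytes   -> about 8 elements per cache line
struct Large { double data[32]; };     // 256 bytes -> each element spans ~4 cache lines

int main() {
    std::printf("Small: %zu bytes, ~%zu per cache line\n",
                sizeof(Small), kCacheLine / sizeof(Small));
    std::printf("Large: %zu bytes, ~%zu cache lines each\n",
                sizeof(Large), (sizeof(Large) + kCacheLine - 1) / kCacheLine);
}
```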

On the other hand, as objects get larger, moving them within the vector (including growing, removing and inserting, as well as a "random shuffle") becomes more time consuming, since more data has to be copied.

Of course, one further step is always needed to read through a pointer compared with reading an element directly from a vector, since the pointer itself needs to be "read" before we can get to the actual data in the pointee object. Again, this matters more when accessing things randomly.
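
A small sketch of the difference (nothing here is specific to any library; it just shows the extra dereference per element):

```cpp
#include <vector>

// Contiguous data: one pass straight over the vector's own storage.
long sum_values(const std::vector<int>& v) {
    long s = 0;
    for (int x : v) s += x;          // each element read directly from the vector
    return s;
}

// Indirect data: the pointer is read first, then the int it points to,
// which may live anywhere in memory.
long sum_pointers(const std::vector<int*>& v) {
    long s = 0;
    for (const int* p : v) s += *p;  // extra dereference per element
    return s;
}
```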

I always start with "whatever is simplest" (which depends on the overall structure of the code; e.g. sometimes it's easier to create a vector of pointers because you have to create the objects dynamically in the first place). Most of the code in a system is not performance critical anyway, so why worry about its performance - just get it working and leave it be if it doesn't turn up in your performance measurements.

Of course, if you are doing a lot of movement of objects within a container, maybe vector isn't the best container. That's why there are multiple container variants - vector, list, map, tree, deque - as they have different characteristics with regard to access and insert/remove, as well as for walking the data linearly.

Oh, and in your example, you talk of 200 city objects - well, they will probably all fit in the cache of any modern CPU anyway. So stick them in a vector. Unless a city contains a list of every individual living in the city... but that should probably be a vector (or other container) in its own right.
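
As a back-of-the-envelope check (the City layout below is invented, and the 32 KB L1 data cache is just a typical figure):

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical City layout, just for the size estimate.
struct City {
    char   name[32];
    double latitude;
    double longitude;
};

int main() {
    constexpr std::size_t count = 200;
    std::printf("sizeof(City) = %zu bytes, 200 cities ~ %zu bytes\n",
                sizeof(City), count * sizeof(City));
    // With this layout that is roughly 9-10 KB, which fits comfortably
    // in a typical 32 KB L1 data cache, let alone L2/L3.
}
```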

As an experiment, write a program that performs the same operations on a std::vector<int> and a std::vector<int*> [such as filling them with random numbers, then sorting the elements]. Then make an object that is large [stick an array of integers in there, or some such] but still holds one integer, so that you can perform the very same operations on it. Vary the size of the stored object and see how it behaves. On YOUR system, where is the benefit of having pointers over having plain objects? Of course, also vary the number of elements to see what effect that has.
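
A minimal skeleton for such an experiment might look like the following (the element count, payload size and timing approach are just starting points; a serious benchmark also needs warm-up runs and must make sure the compiler doesn't optimise the work away):

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <random>
#include <vector>

// Pad the payload to vary the object size; change PAYLOAD to experiment.
constexpr std::size_t PAYLOAD = 16;        // extra ints per object
struct Obj {
    int key;
    int padding[PAYLOAD];                  // makes the object "large"
};

template <typename F>
double time_ms(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main() {
    constexpr std::size_t n = 1'000'000;
    std::mt19937 rng{123};
    std::uniform_int_distribution<int> dist(0, 1'000'000);

    // Variant 1: objects stored directly in the vector.
    std::vector<Obj> objs(n);
    for (auto& o : objs) o.key = dist(rng);

    // Variant 2: vector of pointers to separately allocated objects.
    std::vector<std::unique_ptr<Obj>> owners(n);
    std::vector<Obj*> ptrs(n);
    for (std::size_t i = 0; i < n; ++i) {
        owners[i] = std::make_unique<Obj>();
        owners[i]->key = dist(rng);
        ptrs[i] = owners[i].get();
    }

    // Sorting moves whole objects in one case and only pointers in the other.
    double by_value = time_ms([&] {
        std::sort(objs.begin(), objs.end(),
                  [](const Obj& a, const Obj& b) { return a.key < b.key; });
    });
    double by_pointer = time_ms([&] {
        std::sort(ptrs.begin(), ptrs.end(),
                  [](const Obj* a, const Obj* b) { return a->key < b->key; });
    });

    std::printf("sort by value:   %.1f ms\n", by_value);
    std::printf("sort by pointer: %.1f ms\n", by_pointer);
}
```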

[1] Well, modern processors use cache-prefetching, which MAY load "next data" into the cache speculatively, but we certainly can't rely on this.

[2] An extreme case of this is a telephone exchange with a large number of subscribers (millions). When a call is placed, the caller and callee are looked up in a table. But the chance of either caller or callee being in the cache is nearly zero, because (assuming we're dealing with a large city, say London) the number of calls placed and received every second is quite large. So caches become useless, and it gets worse: the processor also caches page-table entries, and those are, most likely, stale as well. For this sort of application, CPU designers have "huge pages", which means the memory is split into 1 GB pages instead of the usual 4 KB or 2 MB pages that have been around for a while. This reduces the amount of memory reading needed before "we get to the right place". Of course, the same applies to various other "large database, unpredictable access pattern" workloads - airlines, Facebook and Stack Overflow all have this sort of problem.

Mats Petersson
  • Thanks for your detailed answer. Although your main message is that you cannot say anything definitive, you have helped me with the feeling that 200 objects isn't that large. In this scenario I will use a vector of pointers, since that will be slightly easier in the code. – Michiel uit het Broek May 20 '15 at 11:06