To prepare an optimization in an existing software framework, I ran a standalone performance test so I could assess the potential gains before investing a lot of time in it.
The Situation
There are N different types of components, some of which implement an IUpdatable interface - those are the interesting ones. They are grouped in M objects, each maintaining a list of components. Updating them works like this:
foreach (GroupObject obj in objects)
{
    foreach (Component comp in obj.Components)
    {
        IUpdatable updatable = comp as IUpdatable;
        if (updatable != null)
            updatable.Update();
    }
}
The Optimization
My goal was to optimize these updates for large numbers of grouping objects and components. The first step: make sure all components of one kind are updated in a row, by caching them in one array per kind. Essentially, this:
foreach (IUpdatable[] compOfType in typeSortedComponents)
{
    foreach (IUpdatable updatable in compOfType)
    {
        updatable.Update();
    }
}
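For context, here is a minimal sketch of how such a per-type cache could be built. The names UpdateCache, Build and SpriteComponent are invented for illustration; only Component and IUpdatable mirror the names above, and the actual framework may build its cache differently:

```csharp
// Hypothetical sketch of building one array of updatables per concrete type.
using System;
using System.Collections.Generic;
using System.Linq;

interface IUpdatable { void Update(); }

class Component { }

// Example component type, invented for illustration.
class SpriteComponent : Component, IUpdatable
{
    public int Ticks;
    public void Update() => Ticks++;
}

static class UpdateCache
{
    // Flatten all components, keep only the updatable ones,
    // and group them into one array per concrete runtime type.
    public static IUpdatable[][] Build(IEnumerable<Component> allComponents)
    {
        return allComponents
            .OfType<IUpdatable>()
            .GroupBy(u => u.GetType())
            .Select(g => g.ToArray())
            .ToArray();
    }
}
```

With a cache like this, the update loop above never has to perform the as cast and null check, and each inner loop dispatches to the same concrete Update implementation throughout.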
The thought behind it was that the JIT or the CPU might have an easier time operating on the same object type over and over again than on a shuffled sequence of types.
In the next step, I wanted to improve the situation further by making sure that all data for one Component type is stored contiguously in memory - by keeping it in a struct array, something like this:
foreach (ComponentDataStruct[] compDataOfType in typeSortedComponentData)
{
    for (int i = 0; i < compDataOfType.Length; i++)
    {
        compDataOfType[i].Update();
    }
}
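As a concrete illustration, a ComponentDataStruct might look like the sketch below. The fields and the Batch wrapper are invented here; only the indexed for-loop pattern is taken from the snippet above. Note that the indexed loop is not just style: foreach over a struct array yields copies, so mutating updates must go through the indexer.

```csharp
// Hypothetical data layout for the struct-array variant.
struct ComponentDataStruct
{
    public float Position;   // example payload fields, invented for illustration
    public float Velocity;

    public void Update() => Position += Velocity;
}

class Batch
{
    // One contiguous array per component type keeps the data packed in memory.
    public ComponentDataStruct[] Data;

    public void UpdateAll()
    {
        // Indexing mutates the array elements in place;
        // foreach over a struct array would operate on copies.
        for (int i = 0; i < Data.Length; i++)
            Data[i].Update();
    }
}
```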
The Problem
In my standalone performance tests, there is no significant performance gain from either of these changes, and I'm not sure why. "No significant gain" means: with 10,000 components, each batch running 100 update cycles, all main tests take around 85 milliseconds +/- 2 milliseconds. (The only measurable difference comes from introducing the as cast and if check, but that's not really what I was testing for.)
- All tests were performed in Release mode, without an attached debugger.
- External disturbances were reduced by pinning the process to one core and raising its priority:
  currentProc.ProcessorAffinity = new IntPtr(2);
  currentProc.PriorityClass = ProcessPriorityClass.High;
  currentThread.Priority = ThreadPriority.Highest;
- Each test did some primitive math work, so it isn't just measuring empty method calls that could be optimized away.
- Garbage collection was performed explicitly before each test, to rule out that interference as well.
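Putting those precautions together, the harness looks roughly like the sketch below. The Bench/Measure names are invented; the affinity, priority, and GC lines correspond to the measures listed above, wrapped in a try/catch since they are best-effort and not supported on every OS:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class Bench
{
    // Hypothetical minimal harness matching the test setup described above.
    public static double Measure(Action test, int cycles)
    {
        try
        {
            Process currentProc = Process.GetCurrentProcess();
            currentProc.ProcessorAffinity = new IntPtr(2);          // pin to one core
            currentProc.PriorityClass = ProcessPriorityClass.High;  // reduce preemption
            Thread.CurrentThread.Priority = ThreadPriority.Highest;
        }
        catch (Exception)
        {
            // Affinity/priority tweaks are best-effort; some platforms refuse them.
        }

        // Collect up front so a pending GC does not land inside the timed loop.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < cycles; i++)
            test();
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }
}
```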
- The full source code (VS Solution, Build & Run) is available here
I would have expected a significant change due to the contiguous memory layout and the repetition in update patterns. So, my core question really is: why wasn't I able to measure a significant improvement? Am I overlooking something important? Did I miss something in my test setup?