Should I prefer to calculate matrices on the CPU or GPU?

Let's say I have the matrix product P * V * M. Should I compute it on the CPU and send the final matrix to the GPU (GLSL), or should I send the three matrices separately to the GPU and let GLSL compute the final matrix?

In the second case GLSL would have to calculate the MVP matrix for every vertex, so it is probably faster to precompute it on the CPU.

But let's say that GLSL only has to calculate the MVP matrix once. Would the GPU calculate the final matrix faster than the CPU?

Maik Klein
  • 1
    Can you give an example that you only perform this calculation once in GPU? – Amadeus May 18 '13 at 03:06
  • 3
    1. Optimize last. Are you sure you're not just procrastinating? :) 2. If you can't measure performance and identify bottlenecks, don't think about optimization. – Andreas Haferburg May 18 '13 at 09:30
  • Just a note: if you multiply a vector directly, as in `projection * view * model * vertex`, then there are actually no matrix-matrix multiplications, only matrix-vector multiplications, as long as it is evaluated from right to left: `(projection * (view * (model * vertex)))`. This is much cheaper to calculate. – Justin Meiners Jun 15 '17 at 03:39
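
The effect of the evaluation order mentioned in the last comment can be checked with a small CPU-side sketch in C++. It assumes column-major 4×4 matrices as in GLSL, and all names (`mat_vec`, `mat_mat`, `g_scalar_mults`, `scaling`, `translation`) are made up for illustration. One caveat: `*` is left-associative in both GLSL and C++, so an unparenthesized `projection * view * model * vertex` is parsed as `((projection * view) * model) * vertex`. The cheaper right-to-left order needs explicit parentheses unless the compiler happens to reorder the expression.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>;  // column-major: element (row r, column c) is m[c * 4 + r]

long g_scalar_mults = 0;  // running count of scalar multiplications (illustration only)

// Matrix-vector product: 16 scalar multiplications.
Vec4 mat_vec(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (std::size_t c = 0; c < 4; ++c)
        for (std::size_t r = 0; r < 4; ++r) {
            out[r] += m[c * 4 + r] * v[c];
            ++g_scalar_mults;
        }
    return out;
}

// Matrix-matrix product: 64 scalar multiplications.
Mat4 mat_mat(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (std::size_t c = 0; c < 4; ++c)
        for (std::size_t r = 0; r < 4; ++r)
            for (std::size_t k = 0; k < 4; ++k) {
                out[c * 4 + r] += a[k * 4 + r] * b[c * 4 + k];
                ++g_scalar_mults;
            }
    return out;
}

// Hypothetical helpers that build simple test matrices.
Mat4 scaling(float s) {
    Mat4 m{};
    m[0] = m[5] = m[10] = s;
    m[15] = 1.0f;
    return m;
}

Mat4 translation(float x, float y, float z) {
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    m[12] = x; m[13] = y; m[14] = z;
    return m;
}

// P * (V * (M * x)): three matrix-vector products.
Vec4 transform_right_to_left(const Mat4& p, const Mat4& v, const Mat4& m, const Vec4& x) {
    return mat_vec(p, mat_vec(v, mat_vec(m, x)));
}

// ((P * V) * M) * x: two matrix-matrix products, then one matrix-vector product.
Vec4 transform_left_to_right(const Mat4& p, const Mat4& v, const Mat4& m, const Vec4& x) {
    return mat_vec(mat_mat(mat_mat(p, v), m), x);
}

// Usage: both orders give the same vector, at very different cost.
void demo_evaluation_order() {
    const Mat4 p = scaling(2.0f), v = translation(1.0f, 2.0f, 3.0f), m = scaling(3.0f);
    const Vec4 x = {1.0f, 1.0f, 1.0f, 1.0f};
    g_scalar_mults = 0;
    const Vec4 a = transform_right_to_left(p, v, m, x);
    const long cheap = g_scalar_mults;      // 3 * 16
    g_scalar_mults = 0;
    const Vec4 b = transform_left_to_right(p, v, m, x);
    const long expensive = g_scalar_mults;  // 2 * 64 + 16
    assert(a == b && cheap < expensive);
}
```

With this naive counting, the right-to-left order costs 48 scalar multiplications per vertex and the left-to-right order costs 144. Real hardware uses fused and vectorized instructions, so treat the ratio as indicative rather than as a timing prediction.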

2 Answers

General rule: if you can pass it to a shader in the form of a uniform, always precalculate it on the CPU; no exceptions. Calculations on the shader side make sense only for values that vary between vertices or fragments. Everything that is constant across a whole batch of vertices is most efficiently dealt with on the CPU.

GPUs are not magic "does everything faster" machines. There are tasks where a CPU can easily outperform a GPU, even for very large datasets. So a very simple guideline is: if doing the calculation on the CPU costs less CPU time than the total overhead the GPU needs to process it, do it on the CPU. Calculating a single matrix is one of those tasks.
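
As a rough illustration of this rule, the sketch below folds the three matrices into one on the CPU, once per draw call, so that the per-vertex work is a single matrix-vector product. It is a minimal CPU-only sketch, not a definitive implementation: the type aliases and function names (`multiply`, `transform`, `compute_mvp`, `transform_vertices`) are hypothetical, matrices are assumed to be column-major as in GLSL, and `transform_vertices` merely stands in for what a vertex shader would do.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>;  // column-major: element (row r, column c) is m[c * 4 + r]

Mat4 identity() {
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    return m;
}

// Plain 4x4 matrix product a * b.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (std::size_t c = 0; c < 4; ++c)
        for (std::size_t r = 0; r < 4; ++r)
            for (std::size_t k = 0; k < 4; ++k)
                out[c * 4 + r] += a[k * 4 + r] * b[c * 4 + k];
    return out;
}

// Matrix-vector product m * v.
Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (std::size_t c = 0; c < 4; ++c)
        for (std::size_t r = 0; r < 4; ++r)
            out[r] += m[c * 4 + r] * v[c];
    return out;
}

// CPU side, once per draw call: MVP = P * V * M.
// In an OpenGL program the result would then typically be uploaded as one
// uniform, e.g. glUniformMatrix4fv(location, 1, GL_FALSE, mvp.data()).
Mat4 compute_mvp(const Mat4& projection, const Mat4& view, const Mat4& model) {
    return multiply(multiply(projection, view), model);
}

// Stand-in for the vertex shader: one matrix-vector product per vertex.
std::vector<Vec4> transform_vertices(const Mat4& mvp, const std::vector<Vec4>& vertices) {
    std::vector<Vec4> out;
    out.reserve(vertices.size());
    for (const Vec4& v : vertices) out.push_back(transform(mvp, v));
    return out;
}

// Usage: the combined matrix gives the same result as applying M, V, P in turn.
void demo_precomputed_mvp() {
    Mat4 p = identity(); p[0] = p[5] = p[10] = 2.0f;            // uniform scale by 2
    Mat4 v = identity(); v[12] = 1.0f; v[13] = 2.0f; v[14] = 3.0f;  // translate by (1, 2, 3)
    Mat4 m = identity(); m[0] = m[5] = m[10] = 3.0f;            // uniform scale by 3
    const Vec4 x = {1.0f, 1.0f, 1.0f, 1.0f};
    const Mat4 mvp = compute_mvp(p, v, m);
    assert(transform(mvp, x) == transform(p, transform(v, transform(m, x))));
}
```

The two matrix-matrix products happen once, no matter how many vertices the batch contains.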

datenwolf
  • 8
    GPUs are typically faster for problems which are "embarrassingly" parallel. – fluffels May 18 '13 at 13:07
  • 7
    Isn't matrix multiplication parallel? Each element can be calculated independently. – Calmarius Dec 24 '13 at 12:32
  • 7
    @Calmarius: Yes, you can parallelize matrix calculation, and in fact most CPUs will parallelize it using their vector instruction sets. But the matrices need to be calculated exactly *once* and stay the same for all the vertices of a model using that particular transformation. The overhead alone of making a GPU perform a 4×4 matrix multiplication takes more instructions (and time) than doing that calculation on the CPU. If you want to parallelize, say, a 2k×2k matrix multiplication, then GPUs will nicely parallelize that. But for a 4×4 matrix the overhead is simply not worth it. – datenwolf Dec 24 '13 at 16:32
  • 6
    It's not always about the GPU's speed and parallelism. Remember that the matrix multiplication in your vertex shader occurs for every vertex! If you have a complex mesh with thousands of vertices, guess what? You will multiply P × V × M thousands of times, calculating the exact same MVP matrix values over and over. That is wasteful. – mchiasson Sep 13 '16 at 13:21
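
To put a rough number on the redundancy described in the last comment, here is a back-of-the-envelope count of scalar multiplications. It assumes a naive 4×4 product (64 multiplications per matrix-matrix product, 16 per matrix-vector product), a hypothetical mesh of N = 10,000 vertices, and a shader compiler that does not hoist the uniform-only product out of the per-vertex code:

```latex
\begin{align*}
\text{per-vertex } ((P\,V)\,M)\,x:\quad & N\,(2 \cdot 64 + 16) = 10\,000 \cdot 144 = 1\,440\,000 \\
\text{precomputed } \mathit{MVP}\,x:\quad & 2 \cdot 64 + 16\,N = 128 + 160\,000 = 160\,128
\end{align*}
```

Under these assumptions the per-vertex variant performs roughly nine times as many multiplications, all of them recomputing the same matrix.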

Like most situations with OpenGL, it depends.

In most cases, a single calculation can be done faster on the CPU than on the GPU. The GPU's advantage is that it can do lots of calculations in parallel.

On the other hand, it also depends where your bottlenecks are. If your CPU is doing lots of other work, but your shaders are not a bottleneck yet on the lowest-powered target system, then you could easily see some performance improvement by moving some matrix multiplications to the vertex shader.

Generally, you should avoid any work in the fragment shader that could also be done in the vertex shader or on the CPU. Beyond that, it depends on the situation. Unless you are running into performance issues, do it whichever way is easiest for you; if you are having performance issues, do it both ways and profile to see which works better.

bcrist
  • 5
    Calculating a handful of matrices, especially the MVP one, is never a bottleneck on the CPU. – datenwolf May 18 '13 at 08:19
  • 7
    @datenwolf no it's not, but it isn't free either. So if the program isn't graphically demanding, but has other parts that are computationally demanding, letting the GPU calculate a few redundant MVP matrices could be just fine. As I said, it all depends on the situation. – bcrist May 18 '13 at 08:40
  • 8
    The whole overhead of making the GPU calculate something takes far more cycles than a simple matrix-matrix multiplication. You can do a 4×4 · 4×4 calculation in 16 instructions on modern CPUs. That's far less than what's required to select a shader program, set the uniforms and get the GPU's cogs turning by sending in a primitive to render. The overhead of making the GPU actually do something is quite high, hence you try to send jobs to the GPU in batches that are as large as possible. – datenwolf May 18 '13 at 10:25