Display list vs. VAO performance

Question

I recently added functionality to my rendering engine that lets it compile models into either display lists or VAOs based on a runtime setting, so that I can compare the two against each other.

I'd generally prefer to use VAOs, since I can make multiple VAOs sharing actual vertex data buffers (and also since they aren't deprecated), but I find them to actually perform worse than display lists on my nVidia (GTX 560) hardware. (I want to keep supporting display lists anyway to support older hardware/drivers, however, so there's no real loss in keeping the code for handling them.)

The difference is not huge, but it is certainly measurable. As an example, at a point in the engine state where my drawing loop takes a steady average of about 10.0 ms per cycle using VAOs, I can switch to display lists and see the cycle time drop to an equally steady average of about 9.1 ms. Steady, here, means that a cycle normally deviates by less than ±0.2 ms, far less than the difference.
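To make that comparison concrete, here is a minimal sketch in C of how per-cycle timings could be summarized and compared. The names `frame_stats`, `summarize_frames`, and `difference_exceeds_noise` are hypothetical and not part of the engine, and the rule that the gap between the averages must exceed the combined deviations is an assumption chosen for illustration, not a rigorous statistical test.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of a series of per-cycle timings, in milliseconds.
 * Hypothetical type, used for illustration only. */
typedef struct {
    double mean;     /* arithmetic mean of the samples */
    double max_dev;  /* largest absolute deviation from the mean */
} frame_stats;

/* Compute the mean and the largest deviation from it for n samples. */
frame_stats summarize_frames(const double *ms, size_t n)
{
    frame_stats s = {0.0, 0.0};
    size_t i;

    assert(ms != NULL && n > 0);

    for (i = 0; i < n; i++)
        s.mean += ms[i];
    s.mean /= (double)n;

    for (i = 0; i < n; i++) {
        double d = ms[i] - s.mean;
        if (d < 0.0)
            d = -d;
        if (d > s.max_dev)
            s.max_dev = d;
    }
    return s;
}

/* Assumed rule of thumb: treat the difference as real only if the gap
 * between the two means is larger than both deviations combined. */
int difference_exceeds_noise(frame_stats a, frame_stats b)
{
    double gap = a.mean - b.mean;
    if (gap < 0.0)
        gap = -gap;
    return gap > a.max_dev + b.max_dev;
}
```

With samples resembling the figures above (averages of about 10.0 ms and 9.1 ms, each deviating by at most about 0.2 ms), the gap of 0.9 ms is larger than the combined 0.4 ms of deviation, so `difference_exceeds_noise` would report a real difference.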

The only thing that changes between these settings is the drawing code for a normal mesh. It changes from the VAO code, whose OpenGL calls are simply these...

glBindVertexArray(id);
glDrawElements(GL_TRIANGLES, num, GL_UNSIGNED_SHORT, NULL); // Using an index array in the VAO

... to the display-list code which looks as follows:

glCallList(id);

Both code paths apply other state as well for various models, of course, but they do so in exactly the same manner, so the calls above should be the only differences. I've made sure not to unbind the VAO unnecessarily between draw calls, as that, too, turned out to perform measurably worse.

Is this behavior to be expected? I had expected VAOs to perform better than, or at least as well as, display lists, since they are more modern and not deprecated. On the other hand, I've read online that nVidia's implementation has particularly well-optimized display lists, so I'm thinking their VAO implementation might still be lagging behind. Has anyone else got findings that match (or contradict) mine?

Otherwise, could I be doing something wrong? Are there any known circumstances that make VAOs perform worse than they should, on nVidia hardware or in general?

For reference, I've run the same comparison on an Intel HD Graphics (Ironlake) as well, and there VAOs performed just as well as rendering directly from client memory, while display lists were much worse than either. I wish I had AMD hardware to try on, but I don't.

As mentioned, you will see this performance gain disappear on other implementations. nV is noted for actually being an advocate of display lists, other vendors are not big fans of them. Display lists in and of themselves are not bad things, they are even supported in modern proprietary game console APIs where you have a much lower-level control over the command buffer. They are just nastily implemented in OpenGL and they violate the spirit of OpenGL's modern design direction (object model) in my opinion. — Andon M. Coleman, Oct 07 '13 at 03:53
Yes, I realize that it's most likely highly implementation-dependent, but I'm still wondering if it's normal and expected that VAOs perform explicitly *worse* than display lists, even on nVidia hardware. — Dolda2000, Oct 07 '13 at 04:11
Just for an additional experiment: How about you try out what happens if you're not using indexed vertices, but just a sequential array of vertices and draw that using `glDrawArrays`? — datenwolf, Oct 07 '13 at 09:20
How is your data organized when you use VAOs? Are you using a single buffer with interleaved vertex data? — GuyRT, Oct 07 '13 at 14:45
The dirty little secret about VAOs is the state setup overhead, it is not enormous but it can actually be slightly higher than deprecated behavior (be it VBOs with no VAOs or display lists). Binding a VAO can be more expensive than changing the bound VBOs and one or two vertex attrib. pointers. This is what display lists boil down to, the bare minimum set of commands needed to setup the proper states to draw something (but they also leak their state long after the display list is no longer in-use, VAOs do not do this - binding a different VAO nullifies any side-effects of a previous VAO bind). — Andon M. Coleman, Oct 07 '13 at 16:56
@GuyRT: I have to admit that my data is not interleaved, but in separate buffers. I've wanted to try and see if interleaving would make any significant difference, but it's a bit hard to fit into my architecture, and since reports on the Internet at large about the performance gains to be had from it are ambiguous I haven't made it a priority. I'll probably figure out a way to fit it in at some point, and I look forward to seeing what difference it would make. — Dolda2000, Oct 07 '13 at 17:12
Just stumbled upon this discussion from 2y ago... out of curiosity, any updates on your findings? — fegemo, Dec 29 '15 at 16:59
I'd like to know if using a core profile makes a difference vs. using a compatibility profile. — rob mayoff, Aug 19 '16 at 01:56
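Since interleaving comes up in the comments above, the following is a minimal sketch in C of packing separate attribute arrays into a single interleaved array. The `vertex` layout (position, normal, texture coordinate) and the function name `interleave_vertices` are assumptions for illustration; the attributes the engine actually uses are not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed vertex layout; the real engine's attributes may differ. */
typedef struct {
    float position[3];
    float normal[3];
    float texcoord[2];
} vertex;

/* Pack three tightly packed, separate attribute arrays into one
 * interleaved array of `count` vertices. */
void interleave_vertices(const float *positions, /* 3 floats per vertex */
                         const float *normals,   /* 3 floats per vertex */
                         const float *texcoords, /* 2 floats per vertex */
                         size_t count,
                         vertex *out)
{
    size_t i, k;

    assert(count == 0 || (positions && normals && texcoords && out));

    for (i = 0; i < count; i++) {
        for (k = 0; k < 3; k++) {
            out[i].position[k] = positions[3 * i + k];
            out[i].normal[k]   = normals[3 * i + k];
        }
        for (k = 0; k < 2; k++)
            out[i].texcoord[k] = texcoords[2 * i + k];
    }
}

/* When the interleaved array is uploaded to a single buffer, each
 * attribute would be described to glVertexAttribPointer with a stride
 * of sizeof(vertex) and a byte offset such as offsetof(vertex, normal). */
```

Whether this layout would close the gap to display lists is untested here; the sketch only shows what the data reorganization would look like under the assumed layout.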

