
This article is commonly referenced when anyone asks about video streaming textures in OpenGL.

It says:

To maximize the streaming transfer performance, you may use multiple pixel buffer objects. The diagram shows that 2 PBOs are used simultaneously; glTexSubImage2D() copies the pixel data from a PBO while the texture source is being written to the other PBO.

[Diagram: Double PBO]

For nth frame, PBO 1 is used for glTexSubImage2D() and PBO 2 is used to get new texture source. For n+1th frame, 2 pixel buffers are switching the roles and continue to update the texture. Because of asynchronous DMA transfer, the update and copy processes can be performed simultaneously. CPU updates the texture source to a PBO while GPU copies texture from the other PBO.
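The role swap described in the quote can be sketched as simple index arithmetic. This is a hypothetical helper for illustration only; the struct and function names are not taken from the article or its benchmark. Indices are 0-based, so index 0 corresponds to the quote's "PBO 1". One consequence of the scheme is that pixels written during frame n reach the texture only on frame n + 1.

```cpp
#include <cassert>

// Roles of the two PBOs in the double-PBO scheme (hypothetical names).
struct PboRoles {
    int uploadIndex;  // PBO that glTexSubImage2D() copies from this frame
    int writeIndex;   // PBO the CPU maps and fills with the next frame's pixels
};

// For frame n, PBO (n % 2) feeds the texture while PBO ((n + 1) % 2) is
// being written by the CPU; on frame n + 1 the two swap roles.
PboRoles rolesForFrame(long frame) {
    assert(frame >= 0);
    int upload = static_cast<int>(frame % 2);
    return PboRoles{upload, (upload + 1) % 2};
}
```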

They provide a simple benchmark program that lets you cycle between texture updates without a PBO, with a single PBO, and with two PBOs used as described above.

I see a slight performance improvement when enabling one PBO, but the second PBO makes no real difference.

Right before the code calls glMapBuffer on the PBO, it calls glBufferData with the data pointer set to NULL. It does this to avoid a synchronization stall:

// map the buffer object into client's memory
// Note that glMapBufferARB() causes sync issue.
// If GPU is working with this buffer, glMapBufferARB() will wait(stall)
// for GPU to finish its job. To avoid waiting (stall), you can call
// first glBufferDataARB() with NULL pointer before glMapBufferARB().
// If you do that, the previous data in PBO will be discarded and
// glMapBufferARB() returns a new allocated pointer immediately
// even if GPU is still working with the previous data.
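The trade-off this comment describes can be illustrated with a deliberately simplified toy model. Everything here is hypothetical: the struct and function names are invented, and the rule that every GPU copy finishes by the next frame is an assumption. Real drivers may recycle orphaned storage from a pool and real timings vary, so the model says nothing about actual performance; it only counts when a map would wait and when orphaning would need fresh storage.

```cpp
#include <cassert>
#include <cstddef>

// Toy model (hypothetical; not how any particular driver is implemented).
// A PBO's current storage is either idle or still read by an in-flight copy.
struct ToyPbo {
    bool gpuBusy = false;          // storage still being read by the GPU
    std::size_t freshStorage = 0;  // times orphaning handed out new storage
    std::size_t stalls = 0;        // times mapping had to wait for the GPU

    // Stands in for glTexSubImage2D() sourcing from this PBO.
    void startUpload() { gpuBusy = true; }

    // Stands in for glBufferData(..., NULL): busy storage is detached and
    // the buffer gets fresh storage, so the next map need not wait.
    void orphan() {
        if (gpuBusy) {
            ++freshStorage;
            gpuBusy = false;
        }
    }

    // Stands in for glMapBuffer(): waits if the GPU still owns the storage.
    void map() {
        if (gpuBusy) {
            ++stalls;
            gpuBusy = false;  // after the wait, the copy has finished
        }
    }

    // Model assumption: any copy issued in a frame is done by the next frame.
    void endOfFrame() { gpuBusy = false; }
};

struct ToyResult {
    std::size_t stalls;
    std::size_t freshStorage;
};

// Runs `frames` frames with one or two PBOs, with or without orphaning.
ToyResult simulate(int numPbos, bool orphanBeforeMap, int frames) {
    assert(numPbos == 1 || numPbos == 2);
    ToyPbo pbos[2];
    for (int n = 0; n < frames; ++n) {
        ToyPbo& upload = pbos[n % numPbos];       // feeds the texture
        ToyPbo& write = pbos[(n + 1) % numPbos];  // receives new pixels
        upload.startUpload();
        if (orphanBeforeMap) write.orphan();
        write.map();
        // (The CPU would write the new pixels and unmap here.)
        for (ToyPbo& p : pbos) p.endOfFrame();
    }
    ToyResult r{0, 0};
    for (const ToyPbo& p : pbos) {
        r.stalls += p.stalls;
        r.freshStorage += p.freshStorage;
    }
    return r;
}
```

In this model, a single PBO with orphaning avoids stalls only by receiving fresh storage every frame, a single PBO without orphaning waits every frame, and two PBOs avoid both, which is essentially the trade-off under discussion.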

So here is my question: doesn't this make the second PBO completely useless? Just a waste of memory?!

With two PBOs, the texture data is stored three times: once in the texture and once in each PBO.

With a single PBO, there are two copies of the data, and only temporarily a third, in the event that glMapBuffer creates a new buffer because the existing one is still being DMA'ed to the texture?
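The copy count above translates directly into memory. A small sketch of the arithmetic, assuming an RGBA8 format (4 bytes per pixel) and taking 8192 x 8192 as one reading of the "8K" resolution used in the answer below; both are assumptions, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for one copy of a frame (e.g. RGBA8 = 4 bytes per pixel).
std::uint64_t frameBytes(std::uint64_t width, std::uint64_t height,
                         std::uint64_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Steady-state copies of the frame: one in the texture plus one per PBO.
// Storage a driver might hand out temporarily after orphaning is not counted.
std::uint64_t steadyStateBytes(std::uint64_t width, std::uint64_t height,
                               std::uint64_t bytesPerPixel, int numPbos) {
    assert(numPbos >= 0);
    return frameBytes(width, height, bytesPerPixel) *
           (1 + static_cast<std::uint64_t>(numPbos));
}
```

Under those assumptions one frame is 256 MiB, so a single PBO brings the steady-state total to 512 MiB and the second PBO adds another 256 MiB, for 768 MiB.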

The comments seem to suggest that OpenGL drivers are internally capable of creating the second buffer if, and only when, it is required to avoid stalling the pipeline: the in-use buffer is being DMA'ed, and my call to map yields a new buffer for me to write to.

The author of that article appears to be more knowledgeable in this area than I am. Have I completely misunderstood the point?

genpfault
cds84
    I have no idea what I am talking about, but my blind guess is: you use two buffers so that at any time one can be written and the other can be read without reads and writes interfering; memory is rather cheap compared to the cost of avoiding race conditions – 463035818_is_not_a_number May 30 '18 at 13:15
  • IMHO it's like this: imagine that one of the PBOs is being sent to the render buffer to be drawn to the screen. While this operation is taking place during the current frame, what if the information in the PBO needs to change before the next frame or draw call? Instead of waiting for the frame to finish drawing, getting the information back, updating it and sending it back to the render buffer, the updates can take place in the background during the current frame; once that frame is done drawing, the buffers can be swapped almost immediately to draw the updated information. – Francis Cugler May 30 '18 at 13:19
  • Don't quote me on it exactly, since I have not used PBOs; I have mostly used vbo, vba and rbo, but that is what I think the reasoning would be. – Francis Cugler May 30 '18 at 13:21
  • @FrancisCugler Thanks. But the information in the PBO cannot change. Any attempt to change it will stall the GPU until it is no longer used. OR, if the buffer has been orphaned by glBufferData(...NULL), any attempt to modify it simply provides a brand new buffer. – cds84 May 30 '18 at 13:25
    @cds84 didn't you just answer your own question there? "Any attempt to change it will stall the GPU until it is no longer used", i.e. you need 2 so that you don't stall the GPU – UKMonkey May 30 '18 at 13:27
  • @UKMonkey NO. "OR if the buffer has been orphaned by the glBufferData(...NULL)"... This was the question. The buffer is orphaned, so no stall will take place. A new buffer will be allocated if a stall WOULD have taken place. So a second PBO is not required... surely? – cds84 May 30 '18 at 13:30
  • @cds84 "A new buffer will be allocated if a stall WOULD have taken place" so a 2nd will be made for you? Still seems like a second to me. The problem is that by allocating a new buffer every time, it's a bigger drain than having 2 and reusing them both; and while you may not notice the difference on your hardware for the given example, slower hardware might be more obvious. – UKMonkey May 30 '18 at 13:32
  • @UKMonkey Thanks. Perhaps the author intended to enable the *new buffer* feature, but tried to prevent it from being required. Could any gurus comment? It was my understanding that this kind of behavior was very cheap (similar to D3D_DISCARD?). – cds84 May 30 '18 at 13:50
  • @cds84 Oh okay I think I vaguely remember reading something about the PBO not being able to change its data internally; it's been a while since I've read any material or docs on it, and as I stated the applications that I work with mostly just use basic vbo & vba. Once in a while I may use a frame buffer or render to texture; I've never really had a use to work with the PBO's directly. – Francis Cugler May 30 '18 at 14:10

1 Answer


Answering my own question... but I won't accept it as an answer (YET).

There are many problems with the benchmark program linked to in the question. It uses immediate mode. It uses GLUT!

The program was spending most of its time doing things we are not interested in profiling. Mainly rendering text via GLUT, and writing pretty stripes to the texture. So I have removed those functions.

I cranked the texture resolution up to 8K and added more PBO modes:

  • No PBO (yields 6 fps).

  • 1 PBO, orphaning the previous buffer (yields 12.2 fps).

  • 2 PBOs, orphaning the previous buffer (yields 12.2 fps).

  • 1 PBO, NOT orphaning the previous buffer (possible stall; added by me; yields 12.4 fps).

  • 2 PBOs, NOT orphaning the previous buffer (possible stall; added by me; yields 12.4 fps).
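Frame rates hide how small some of these differences are; converting the reported figures to milliseconds per frame makes them easier to compare. A minimal sketch (the function name is hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Converts frames per second into milliseconds per frame.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}
```

Applied to the numbers above, 6 fps is about 166.7 ms per frame, 12.2 fps about 82.0 ms and 12.4 fps about 80.6 ms. Adding the first PBO therefore saves roughly 85 ms per frame, while the gap between the orphaning and non-orphaning modes is roughly 1.3 ms.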

If anyone else would like to examine my code, it is available here.

I have experimented with different texture sizes and different updatePixels functions, but despite my best efforts I cannot get the double-PBO implementation to perform any better than the single-PBO implementation.

Furthermore, NOT orphaning the previous buffer actually yields slightly better performance, exactly the opposite of what the article claims.

Perhaps modern drivers/hardware do not suffer from the problem that this design is attempting to fix...

Perhaps my graphics hardware / driver is buggy, and not taking advantage of the double-PBO...

Perhaps the commonly referenced article is completely wrong?

Who knows? My test hardware is Intel(R) HD Graphics 5500 (Broadwell GT2).

cds84