I've been working on implementing deferred shading because I want to have at least 20 lights in my scene. I was having problems making it fast enough (and still am), but then I made a change that I expected to make it slower and that in fact almost doubled my frame rate.

Initial code:

geometryPassFBO = createFBO(); // position texture, normal texture, colour texture and depth buffer
while (1)
{
    bind geometryPassFBO;
    allObjects.draw();

    bind systemFBO;
    for each light
        send light info
        draw light sphere sampling from position, normal and colour textures.

    blit depth buffer from geometryFBO to systemFBO

    for each light
        light.draw(); // draw a cube to represent the light

    2DObjects.draw(); // frame rate, etc...
}

I was in the process of setting up a stencil test so that the lighting pass only runs on pixels that were written during the geometry pass (i.e. it skips the background, where normal = 0,0,0, position = 0,0,0 and colour = 0,0,0).

However, I was having difficulty copying the combined depth / stencil buffer to the default depth / stencil buffer. Apparently this doesn't work well, because we don't know what format the system depth / stencil buffer uses. I had read that it is better to set up another FBO where we can specify the depth / stencil buffer format, render to that, and then either blit or render a screen quad to get the result out to the screen.

So before adding any stencil stuff, I simply added the new FBO to get that bit working.

My new code now looks like:

geometryPassFBO = createGeometryFBO(); // position texture, normal texture, colour texture and depth buffer
lightingPassFBO = createLightingFBO(); // colour texture and depth buffer
while (1)
{
    bind geometryPassFBO;
    allObjects.draw();

    bind lightingPassFBO;
    for each light
        send light info
        draw light sphere sampling from position, normal and colour textures.

    blit depth buffer from geometryFBO to lightingPassFBO

    for each light
        light.draw(); // draw a cube to represent the light

    2DObjects.draw(); // frame rate, etc...

    bind systemFBO;
    render screen quad sampling from colour texture.
}

This works as expected. What was not expected is that my frame rate jumped from 25 FPS to 45 FPS.

Why is this? How can doing an additional shader pass for a screen quad be more efficient than not doing one?

Quick follow-up question: which is more efficient, rendering a screen quad using a simple vertex and fragment shader that samples a texture based on gl_FragCoord, or blitting the colour attachment directly to the system FBO?

1 Answer

Well, it's probably this:

blit depth buffer from geometryFBO to lightingPassFBO

As you point out, format conversion can be slow. But since you're defining both the input and output buffers for this blit operation, they're probably using the same depth format. So the blitting operation may proceed much faster.

Also, you probably shouldn't even do this blit at all. Just attach geometryFBO's depth/stencil buffer to the lightingPassFBO before you render your light cubes. Just remember to remove the attachment after rendering the lights (otherwise your deferred pass will have undefined behavior, assuming you're reading from the depth buffer in your deferred pass).
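A minimal sketch of that attach/detach sequence, under some loud assumptions: the geometry pass's depth/stencil buffer is a 2D texture (a renderbuffer would use glFramebufferRenderbuffer instead), and `GlApi`, `drawMarkersWithSharedDepth` and the `drawLightMarkers` callback are hypothetical names made up for illustration. So that the sketch compiles without a GL header or context, the two entry points are passed in as function pointers with the same parameter lists as glBindFramebuffer and glFramebufferTexture2D, and the typedefs and enum values are declared locally.

```cpp
#include <cassert>

// Stand-ins for the OpenGL typedefs and enum values so the sketch compiles
// without a GL header. Drop these when including a real GL header or loader.
using GLenum = unsigned int;
using GLuint = unsigned int;
using GLint  = int;
constexpr GLenum GL_FRAMEBUFFER              = 0x8D40;
constexpr GLenum GL_DEPTH_STENCIL_ATTACHMENT = 0x821A;
constexpr GLenum GL_TEXTURE_2D               = 0x0DE1;

// Hypothetical table of the two entry points used below; the parameter lists
// match glBindFramebuffer and glFramebufferTexture2D.
struct GlApi {
    void (*bindFramebuffer)(GLenum target, GLuint framebuffer);
    void (*framebufferTexture2D)(GLenum target, GLenum attachment,
                                 GLenum textarget, GLuint texture, GLint level);
};

// Draws the light markers into lightingFbo, depth-tested against the geometry
// pass, by temporarily attaching the geometry pass's depth/stencil texture
// instead of blitting it. drawLightMarkers is a hypothetical callback that
// issues the cube draw calls.
void drawMarkersWithSharedDepth(const GlApi& gl, GLuint lightingFbo,
                                GLuint geometryDepthStencilTex,
                                void (*drawLightMarkers)()) {
    assert(gl.bindFramebuffer && gl.framebufferTexture2D && drawLightMarkers);

    gl.bindFramebuffer(GL_FRAMEBUFFER, lightingFbo);

    // Share the geometry pass's depth/stencil texture with the lighting FBO.
    gl.framebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_STENCIL_ATTACHMENT,
                            GL_TEXTURE_2D, geometryDepthStencilTex, 0);

    drawLightMarkers();

    // Detach again (texture name 0) so the next frame's lighting pass does not
    // sample a depth texture that is still attached to the bound framebuffer.
    gl.framebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_STENCIL_ATTACHMENT,
                            GL_TEXTURE_2D, 0, 0);
}
```

In a real program the table would simply be filled with glBindFramebuffer and glFramebufferTexture2D once the loader has been initialised, or the calls would be made directly.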

As for your question about blitting vs. a full-screen quad, I have a better question: why are you accumulating 20+ lights in a scene and not using high dynamic range lighting? Because the final pass to render to the screen should also use tone-mapping to convert your HDR image to an LDR image for display.
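To illustrate what such a tone-mapping step computes per colour channel, here is a minimal sketch. It assumes the Reinhard operator, which is only one common choice and not something named above, and the `exposure` parameter is a hypothetical scene-dependent scale factor. In a renderer this arithmetic would live in the fragment shader of the final full-screen pass; it is written as plain C++ here only so it can be run on its own.

```cpp
#include <algorithm>
#include <cassert>

// Maps one linear HDR channel value in [0, inf) to an LDR value in [0, 1)
// using the Reinhard operator x / (1 + x). Assumed operator, see lead-in.
float reinhardToneMap(float hdr, float exposure = 1.0f) {
    assert(exposure > 0.0f);
    // Negative inputs are clamped to 0 before the exposure scale is applied.
    const float scaled = std::max(hdr, 0.0f) * exposure;
    return scaled / (1.0f + scaled);
}
```

With 20 lights each contributing up to 1.0 to a channel, the accumulated value can reach 20.0; a plain clamp would saturate everything above 1.0 to white, whereas this mapping compresses the whole range into [0, 1) while keeping brighter inputs brighter.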

But as for the exact question, a blit operation should be no slower than an FSQ, assuming that there is no format conversion going on. If there is format conversion happening, then it could be more efficient to do the copy in a shader with an FSQ instead.

Nicol Bolas
  • Thanks, that makes sense. I'll look into swapping the depth buffer's attachment. In answer to your question, I'm not doing HDR, because I haven't got to it yet. It's next on my list after improving performance. – Andrew Parlane Jan 24 '16 at 19:43