I've been working on implementing deferred shading because I want to have at least 20 lights in my scene. I was having problems making it fast enough (and still am), but then I made a change that I expected to make it slower and that in fact almost doubled my frame rate.
Initial code:
geometryPassFBO = createFBO(); // position texture, normal texture, colour texture and depth buffer
while (1)
{
    bind geometryPassFBO
    allObjects.draw();
    bind systemFBO
    for each light
        send light info
        draw light sphere sampling from position, normal and colour textures
    blit depth buffer from geometryPassFBO to systemFBO
    for each light
        light.draw(); // draw a cube to represent the light
    2DObjects.draw(); // frame rate, etc...
}
I was in the process of setting up a stencil test so that the lighting pass only runs on pixels that were written during the geometry pass (i.e. skipping the background, where normal = 0,0,0, position = 0,0,0 and colour = 0,0,0).
However, I was having difficulty copying the combined depth / stencil buffer to the default depth / stencil buffer. Apparently this doesn't work well, because we don't know what format the system depth / stencil buffer uses. I had read that it is better to set up another FBO where we can specify the depth / stencil buffer format, render to that, and then either blit or render a screen quad to get the result out to the screen.
So before adding any stencil stuff, I simply added the new FBO to get that bit working.
My new code now looks like:
geometryPassFBO = createGeometryFBO(); // position texture, normal texture, colour texture and depth buffer
lightingPassFBO = createLightingFBO(); // colour texture and depth buffer
while (1)
{
    bind geometryPassFBO
    allObjects.draw();
    bind lightingPassFBO
    for each light
        send light info
        draw light sphere sampling from position, normal and colour textures
    blit depth buffer from geometryPassFBO to lightingPassFBO
    for each light
        light.draw(); // draw a cube to represent the light
    2DObjects.draw(); // frame rate, etc...
    bind systemFBO
    render screen quad sampling from lightingPassFBO colour texture
}
This works as expected. What was not expected is that my frame rate jumped from 25 FPS to 45 FPS.
Why is this? How can doing an additional shader pass for a screen quad be more efficient than not doing it?
Quick follow-up question: which is more efficient, rendering a screen quad using a simple vertex and fragment shader to sample a texture based on gl_FragCoord, or blitting the colour attachment directly to the system FBO?