
In a deferred shading framework, I am using different framebuffer objects to perform various render passes. In the first pass I write the DEPTH_STENCIL_ATTACHMENT for the whole scene to a texture, let's call it DepthStencilTexture. To access the depth information stored in DepthStencilTexture from the other render passes, each of which uses its own framebuffer object, I know two ways:
1) I bind the DepthStencilTexture to the shader and access it in the fragment shader, where I do the depth test manually, like this:

uniform sampler2D DepthStencilTexture;
uniform vec2 WinSize; // window dimensions
vec2 uv = gl_FragCoord.st / WinSize;
float depth = texture(DepthStencilTexture, uv).r;
if (gl_FragCoord.z > depth) discard;

I also call glDisable(GL_DEPTH_TEST) and glDepthMask(GL_FALSE).

2) I bind the DepthStencilTexture to the framebuffer object as DEPTH_STENCIL_ATTACHMENT and set glEnable(GL_DEPTH_TEST) and glDepthMask(GL_FALSE). (Edit: in this case I won't bind the DepthStencilTexture to the shader, to avoid a feedback loop, see the answer by Nicol Bolas; if I need the depth in the fragment shader I will use gl_FragCoord.z.)

In certain situations, such as drawing light volumes, where I need the stencil test and to write to the stencil buffer, I go with solution 2). In other situations, where I completely ignore the stencil and just need the depth stored in the DepthStencilTexture, does option 1) give any advantage over the more "natural" option 2)?

For example, I have a (silly, I think) doubt about it. Sometimes in my fragment shaders I compute the WorldPosition from the depth. In case 1) it would be like this:

uniform mat4 invPV; // inverse PV matrix
vec2 uv = gl_FragCoord.st / WinSize;
vec3 ndc = vec3(uv, texture(DepthStencilTexture, uv).r) * 2.0 - 1.0; // remap [0,1] to [-1,1]
vec4 WorldPosition = invPV * vec4(ndc, 1.0);
WorldPosition = WorldPosition / WorldPosition.w;
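To sanity-check the math in case 1), here is a minimal pure-Python sketch (my own illustration, not part of the shaders above): it pushes a point through a perspective projection, stores uv and depth the way the first pass would, then reconstructs the world position through the inverse matrix. It assumes an identity view matrix and the default glDepthRange(0, 1); the helper names `perspective`, `mat_vec`, and `invert4` are made up for this sketch. Note the [0, 1] to [-1, 1] remap of both uv and depth before applying invPV:

```python
import math

def perspective(fovy, aspect, near, far):
    # Standard OpenGL perspective projection matrix, written row-major
    # here and applied as m * v.
    f = 1.0 / math.tan(fovy / 2.0)
    return [[f / aspect, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
            [0.0, 0.0, -1.0, 0.0]]

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

def invert4(m):
    # Gauss-Jordan inversion of a 4x4 matrix with partial pivoting.
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(4)]
         for i, row in enumerate(m)]
    for col in range(4):
        piv = max(range(col, 4), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(4):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[4:] for row in a]

PV = perspective(math.radians(60.0), 1.0, 0.1, 100.0)  # view = identity
world_point = [0.3, -0.2, -5.0, 1.0]

# Forward: what the first pass would store.
clip = mat_vec(PV, world_point)
ndc = [c / clip[3] for c in clip[:3]]
uv = [ndc[0] * 0.5 + 0.5, ndc[1] * 0.5 + 0.5]  # texture coordinates
depth = ndc[2] * 0.5 + 0.5                      # value in DepthStencilTexture

# Reconstruction: remap [0,1] -> [-1,1] BEFORE multiplying by invPV.
invPV = invert4(PV)
h = mat_vec(invPV, [uv[0] * 2 - 1, uv[1] * 2 - 1, depth * 2 - 1, 1.0])
reconstructed = [x / h[3] for x in h]
```

Dividing by the w component at the end recovers the original world-space point; without the [0, 1] to [-1, 1] remap the reconstruction comes out wrong.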

In case 2) it would be like this (edit: this is wrong, because gl_FragCoord.z is the current fragment's depth, not the depth stored in the texture):

uniform mat4 invPV; // inverse PV matrix
vec2 uv = gl_FragCoord.st / WinSize;
vec3 ndc = vec3(uv, gl_FragCoord.z) * 2.0 - 1.0; // remap [0,1] to [-1,1]
vec4 WorldPosition = invPV * vec4(ndc, 1.0);
WorldPosition = WorldPosition / WorldPosition.w;

I am assuming that gl_FragCoord.z in case 2) will be the same as texture(DepthStencilTexture, uv).r in case 1), or, in other words, the depth stored in the DepthStencilTexture. Is that true? Is gl_FragCoord.z read from the currently bound DEPTH_STENCIL_ATTACHMENT even with glDisable(GL_DEPTH_TEST) and glDepthMask(GL_FALSE)?

darius

1 Answer


Going strictly by the OpenGL specification, option 2 is not allowed. Not if you're also reading from that texture.

Yes, I realize you're using write masks to prevent depth writes. It doesn't matter; the OpenGL specification is quite clear. According to section 9.3.1 of the OpenGL 4.4 specification, a rendering feedback loop is established when:

  • an image from texture object T is attached to the currently bound draw framebuffer object at attachment point A

  • the texture object T is currently bound to a texture unit U, and

  • the current programmable vertex and/or fragment processing state makes it possible (see below) to sample from the texture object T bound to texture unit U

That is the case in your code. So you technically have undefined behavior.

One reason this is left undefined is so that implementations don't have to do things like flush framebuffer and/or texture caches just because you changed a write mask.

That being said, you can get away with option 2 if you employ NV_texture_barrier, which, despite the name, is quite widely available on AMD hardware. The main thing is to issue a barrier after you do all of your depth writing, so that all subsequent reads are guaranteed to work. The barrier does all of the cache clearing and such that you need.

Otherwise, option 1 is the only choice: doing the depth test manually.

I am assuming that gl_FragCoord.z in case 2) will be the same as texture(DepthStencilTexture, uv).r in case 1), or, in other words, the depth stored in the DepthStencilTexture. Is that true?

Neither is true. gl_FragCoord is the coordinate of the fragment being processed. This is the fragment generated by the rasterizer, based on the data for the primitive being rasterized. It has nothing to do with the contents of the framebuffer.
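A small numeric illustration of that point (pure Python, with made-up eye-space depths; it assumes a standard perspective projection and the default glDepthRange(0, 1)): when a light-volume fragment is rasterized in front of a scene surface that was written in the first pass, gl_FragCoord.z comes from the light volume's own primitive, not from the value sitting in the depth attachment:

```python
def window_depth(z_eye, near=0.1, far=100.0):
    # Window-space depth in [0,1] produced by a standard OpenGL
    # perspective projection for a point at eye-space depth z_eye.
    a = (far + near) / (near - far)
    b = 2.0 * far * near / (near - far)
    ndc_z = (a * z_eye + b) / (-z_eye)  # clip.z / clip.w
    return ndc_z * 0.5 + 0.5

stored_depth = window_depth(-5.0)  # scene surface written in the first pass
frag_coord_z = window_depth(-3.0)  # light-volume fragment drawn on top

# gl_FragCoord.z reflects the light volume's own depth (-3.0 eye space),
# not the surface depth stored in the attachment.
```

So the case 2) reconstruction would recover the light volume's surface position, not the scene position behind it.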

Nicol Bolas
  • This explains a lot, thanks. But in my option 2) I don't sample from the texture in the shader; I access the depth information with gl_FragCoord.z. So the third condition for the feedback loop is not met. Am I right? (gonna edit my question to make it more clear). – darius Aug 16 '13 at 08:02
  • @darius: So if you're not reading the depth, how are you reconstructing the position for deferred rendering? Are you actually writing the position? – Nicol Bolas Aug 16 '13 at 08:06
  • No. I wrote that at the end of my question: with gl_FragCoord.z. My idea is: I have the DepthStencilTexture bound to the FBO as DEPTH_STENCIL_ATTACHMENT, so gl_FragCoord.z should give me the same result as if I were sampling from the texture, but without actually sampling from it. – darius Aug 16 '13 at 08:11
  • @darius: That's not how it works. `gl_FragCoord` is the *current fragment's* coordinate. It has absolutely nothing to do with the framebuffer. – Nicol Bolas Aug 16 '13 at 08:13
  • Ah, ok! Of course! That was my big error here. This really cleared my mind. Thank you. – darius Aug 16 '13 at 08:19
  • Also, since I have a feeling doing it the way described in option 2 is going to break hierarchical Z buffering and other things needed for early-Z, testing the depth in your shader should give the same performance as using the built-in depth test... it has to fall back to traditional post-shading depth accept/reject testing. As you should know, `discard` only discards the results of a shader; the majority of the time the GPU has to continue evaluating the shader anyway because of the way it schedules fragment shaders (so option 1 is not a substitute for early-Z). – Andon M. Coleman Aug 16 '13 at 20:11
  • @AndonM.Coleman Yes, I know about discard. Why do I need hierarchical Z buffering for early Z? – darius Sep 02 '13 at 13:00
  • Early Z is often implemented using hierarchical Z, which effectively compresses a copy of the depth buffer into tiled regions. Primitives can be clipped against these tiles in a Hi-Z implementation before fragment shading occurs, rather than waiting for the fragment shader to finish and then performing a depth test. But if you `discard` in a fragment shader, or use too many depth buffers in your scene, Hi-Z may not be possible. It is hardware dependent; older generations had strict rules on what would break Hi-Z and what would allow it to continue working, and testing is the only way to know for sure. – Andon M. Coleman Sep 02 '13 at 14:28
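As a toy model of the Hi-Z idea described in that last comment (entirely illustrative Python, not how any real GPU implements it; tile size and depth values are made up): a Hi-Z structure keeps the farthest stored depth per tile, and with a GL_LESS depth test a primitive whose nearest possible depth is not closer than that value can be rejected for the whole tile before any fragment shading happens:

```python
TILE = 4
W = H = 8

# Depth buffer from the first pass: far background (0.9) with one
# near patch (0.3) covering the top-left tile.
depth_buffer = [[0.9] * W for _ in range(H)]
for y in range(TILE):
    for x in range(TILE):
        depth_buffer[y][x] = 0.3

def tile_farthest(tx, ty):
    # The Hi-Z structure stores the maximum (farthest) depth per tile.
    return max(depth_buffer[ty * TILE + j][tx * TILE + i]
               for j in range(TILE) for i in range(TILE))

def hi_z_reject(prim_nearest_z, tx, ty):
    # With GL_LESS, a fragment passes only if frag_z < stored_z.
    # If even the primitive's nearest depth is >= the tile's farthest
    # stored depth, no fragment in that tile can pass: early reject.
    return prim_nearest_z >= tile_farthest(tx, ty)

# A primitive at depth ~0.5 can be rejected over the near tile
# (0.5 >= 0.3) but must still be shaded over the far tiles (0.5 < 0.9).
```

This is also why per-fragment `discard` or manual depth tests interfere with the scheme: the tile-level decision is only valid when the hardware controls the depth result.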