
I'm working on a Three.js WebGL scene and am noticing 60 FPS when I'm zoomed out so that all observations (~20,000 triangles) are in view, but very low FPS when I'm zoomed in so that only a small subset of the triangles are in view.

I'd like to figure out what's causing this discrepancy. My intuition is that the opposite should be true: I'd assume that when the user is zoomed in, the near and far clipping planes would remove many triangles from the scene, which would increase FPS. I want to figure out why this intuition is wrong in this scene.

How can one identify the full stack of calls used within a three.js program? Ideally I'd like to identify all the function/method calls and the time required to execute each one, so that I can figure out which portion of the shaders I'm working on is killing the FPS when the user is zoomed in.

  • It seems the low FPS is due to the fact that the triangles overlap on the z-plane significantly and fill a large region of the screen when zoomed in, which causes lots of "overdrawing" (each triangle's pixels are shaded each frame, and because the triangles are filling the screen, there are lots of triangles that are shaded each second for each pixel on screen). – duhaime Apr 28 '18 at 01:20
  • Are they semi-transparent? If they are opaque then you can z-sort them and draw front to back. The depth buffer will then help with overdraw. – gman Apr 28 '18 at 01:54
  • @gman they're not transparent -- they're fully opaque raster textures painted on triangles. Do you by chance have a link for a sample that z-sorts and draws front to back in Three.js? (I came to your comment just now after reading half a dozen other helpful notes by you scattered throughout SO -- thanks for helping so many people) – duhaime Apr 28 '18 at 01:57
  • Here's a thread on how to control sorting in THREE: https://stackoverflow.com/questions/15514274/three-js-how-to-control-rendering-order The real solution in my case is to put together a better layout algorithm, as I can prevent z-axis overlaps which will help – duhaime Apr 28 '18 at 01:59
  • Would someone mind writing an answer? It took me a long time to figure out why this would happen, and I had the same assumption as the OP. Is the fragment shader guaranteed not to run if the depth test fails, or does it depend (i.e. what is early rejection)? – pailhead Apr 28 '18 at 10:15
  • @gman do you have a reference I can read to better understand how z-sorting triangles and/or the depth buffer help mitigate overdrawing costs? -- Edit, the wiki page is pretty succinct: https://en.wikipedia.org/wiki/Z-buffering – duhaime Apr 29 '18 at 01:34

1 Answer


GPUs spend their computing power in a few basic places, and two should be pretty obvious: running the vertex shader once per vertex, and running the fragment shader once per pixel/fragment.

There are almost always far more pixels than vertices. A single 1920x1080 screen is nearly 2 million pixels, yet it can be covered by a 3-vertex triangle or a 4- to 6-vertex quad (2 triangles). That means that to cover the entire screen the vertex shader ran 3 to 6 times, but the fragment shader ran 2 million times!

Sending too much work to the fragment shader is called being "fill bound": you've maxed out the fill rate (the rate at which the GPU can fill triangles with pixels), and that is what you're seeing. In the worst case, on my 2014 MacBook Pro I might only be able to draw about 6 screens' worth of pixels before I hit the fill rate limit for updating the screen at 60 frames a second.

There are various solutions to this.

The first is the z-buffer (depth buffer). The GPU first tests the depth buffer to see whether it needs to run the fragment shader at all; if the depth test fails, the fragment shader is skipped. So, if you sort your opaque objects and draw them from closest to furthest, then most of the distant objects will fail the depth test when the pixels of their triangles are rendered. Note that this early rejection is only possible if your fragment shader does not write to gl_FragDepth and does not use the discard keyword.
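
In three.js, one way to get that front-to-back order is to assign each opaque mesh a renderOrder based on its distance to the camera every frame. This is just a minimal sketch, not code from the original question; opaqueMeshes and camera are assumed to be your own objects, and lower renderOrder values are drawn first:

    const tmp = new THREE.Vector3();

    function sortFrontToBack(opaqueMeshes, camera) {
      opaqueMeshes
        .map(mesh => ({
          mesh: mesh,
          dist: mesh.getWorldPosition(tmp).distanceToSquared(camera.position),
        }))
        .sort((a, b) => a.dist - b.dist)                 // nearest first
        .forEach((entry, i) => { entry.mesh.renderOrder = i; });
    }

    // call once per frame, before renderer.render(scene, camera), so near
    // meshes fill the depth buffer early and far fragments get rejected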

This is a method of "avoiding overdraw". Overdraw is any pixel that is drawn more than once. If you draw a cube in the distance and then draw a sphere up close such that it covers the cube then for every pixel that was rendered for the cube it was "overdrawn" by the sphere pixels. That was a waste of time.

If your fragment shaders are really complicated and therefore slow to run, some 3D engines will do a "Z buffer pre-pass". They draw all the opaque geometry with the simplest possible vertex and fragment shader: the vertex shader only needs positions, and the fragment shader just emits a constant value. They'll even turn off drawing to the color buffer with gl.colorMask(false, false, false, false), or possibly use a depth-only framebuffer if the hardware supports it. They use this pass to fill out the depth buffer. When finished, they render everything again with the expensive shaders and the depth test set to LEQUAL (or whatever works for their engine). This way each pixel is shaded by the expensive fragment shader only once. Of course it's not free: it still takes GPU time to rasterize the triangles and test every pixel, but it can still be faster than the overdraw if the shaders are expensive.
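
In three.js a depth pre-pass could be wired up roughly like the sketch below. This is an assumption about one way to do it, not code from the answer; it relies on scene.overrideMaterial, the colorWrite flag, and the default material depth function (LessEqualDepth):

    // pass 1 material: writes depth only, no color
    // (equivalent of gl.colorMask(false, false, false, false))
    const depthOnlyMaterial = new THREE.MeshBasicMaterial({ colorWrite: false });

    renderer.autoClear = false;                 // we clear manually once per frame

    function render() {
      renderer.clear(true, true, true);         // clear color, depth, stencil

      scene.overrideMaterial = depthOnlyMaterial; // pass 1: cheap, depth only
      renderer.render(scene, camera);

      scene.overrideMaterial = null;              // pass 2: real (expensive) materials;
      renderer.render(scene, camera);             // default depthFunc LessEqualDepth lets
                                                  // each visible pixel through exactly once
      requestAnimationFrame(render);
    }
    requestAnimationFrame(render);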

Another way is to try to figure out which objects are going to be occluded by closer objects and not even submit them to the GPU. There are tons of ways to do this, usually involving bounding spheres and/or bounding boxes. Some potentially visible set (PVS) techniques can also help with occlusion culling. You can even ask the GPU to compute some of this using occlusion queries, though those are only available in WebGL2.
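
For reference, the WebGL2 occlusion query API looks roughly like the sketch below. This is a simplified illustration, not code from the answer; drawBoundingBox() is a hypothetical cheap proxy draw standing in for the expensive object, and the result arrives asynchronously (typically a frame or more later):

    const query = gl.createQuery();

    // Frame N: ask whether any samples of the cheap proxy pass the depth test.
    gl.beginQuery(gl.ANY_SAMPLES_PASSED, query);
    drawBoundingBox();                          // hypothetical stand-in draw
    gl.endQuery(gl.ANY_SAMPLES_PASSED);

    // Some later frame: read the result back without stalling the GPU.
    if (gl.getQueryParameter(query, gl.QUERY_RESULT_AVAILABLE)) {
      const anySamplesPassed = gl.getQueryParameter(query, gl.QUERY_RESULT);
      if (!anySamplesPassed) {
        // the proxy was fully occluded: skip submitting the real object
      }
    }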

The easiest way to see if you're fill bound is to make your canvas tiny, like 2x1 pixels (or just size your browser window really small). If your app starts running fast, it's likely fill bound. If it's still slow, it's either geometry bound (the vertex shader is doing too much work) or CPU bound (whatever work you're doing on the CPU is taking too long, whether that's just issuing WebGL commands or computing animation, collisions, physics, etc.).
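
With three.js, one quick way to run that test (a sketch, assuming renderer is your WebGLRenderer) is to shrink only the drawing buffer:

    renderer.setSize(2, 1, false);   // false = don't touch the canvas CSS size

    // If the frame rate jumps back to 60fps with this tiny drawing buffer, the
    // app is almost certainly fill bound. Restore the real size afterwards, e.g.
    // renderer.setSize(window.innerWidth, window.innerHeight, false);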

In your case you are likely fill bound: when all the triangles are small it runs fast (because very few pixels are being drawn), whereas when you're zoomed in and lots of triangles cover the screen it runs slow (because too many pixels are being drawn).

There are no really "simple" solutions. It really just depends on what you're trying to do. You're apparently using three.js; I know it can sort transparent objects, but I have no idea whether it sorts opaque objects. The other techniques listed are, I believe, kind of outside the scope of three.js and more up to your app, e.g. taking things in and out of the scene or setting their visibility to false, etc.

Note: here is a simple demo to show how little overdraw your GPU can handle. It just draws a bunch of fullscreen quads. By default it likely can't draw that many, especially at fullscreen size, before it can no longer hit 60fps. Turn on sorting front to back and it will be able to draw more and still hit 60fps.

Also note that drawing with blending enabled is slower than drawing with blending disabled. That should make sense: without blending the GPU just writes the pixel; with blending the GPU first has to read the destination pixel so it can blend, which is slower.
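
If your objects really are opaque, make sure blending is actually off. A tiny sketch (raw WebGL shown; in three.js, leaving material.transparent = false should have the same effect):

    gl.disable(gl.BLEND);   // opaque: just write the pixel, no read-modify-write
    // gl.enable(gl.BLEND) would force the GPU to read the destination pixel
    // before writing, which costs extra memory bandwidth per fragment.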

gman
  • wow, thanks for this spectacular answer. These solutions and debugging suggestions (the resizing of the canvas to 2x1 and your overdraw test page) are spectacular. Thanks for taking the time to lay all this out so clearly. – duhaime Apr 29 '18 at 11:52
  • This is still one of the most helpful posts I've found on Stack Overflow -- thanks for taking the time to write this up. – duhaime Jul 06 '18 at 15:46