How does WebGL work?

Question

I'm looking for a deep understanding of how WebGL works. I want to gain knowledge at a level that most people care less about, because the knowledge isn't necessarily useful to the average WebGL programmer. For instance, what role does each part (browser, graphics driver, etc.) of the total rendering system play in getting an image on the screen? Does each browser have to create a JavaScript/HTML engine/environment in order to run WebGL in the browser? Why is Chrome ahead of everyone else in terms of being WebGL compatible?

So, what are some good resources to get started? The Khronos specification is kind of lacking (from what I saw browsing it for a few minutes) for what I want. I mostly want to know how this is accomplished/implemented in browsers and what else needs to change on your system to make it possible.

score 45 · Accepted Answer · edited Sep 20 '17 at 02:08

Hopefully this little write-up is helpful to you. It overviews a big chunk of what I've learned about WebGL and 3D in general. BTW, if I've gotten anything wrong, somebody please correct me -- because I'm still learning, too!

Architecture

The browser is just that, a Web browser. All it does is expose the WebGL API (via JavaScript), which the programmer does everything else with.

As near as I can tell, the WebGL API is essentially just a set of (browser-supplied) JavaScript functions which wrap around the OpenGL ES specification. So if you know OpenGL ES, you can adopt WebGL pretty quickly. Don't confuse this with pure OpenGL, though. The "ES" is important.

The WebGL spec was intentionally left very low-level, leaving a lot to be re-implemented from one application to the next. It is up to the community to write frameworks for automation, and up to the developer to choose which framework to use (if any). It's not entirely difficult to roll your own, but it does mean a lot of overhead spent on reinventing the wheel. (FWIW, I've been working on my own WebGL framework called Jax for a while now.)

The graphics driver supplies the implementation of OpenGL ES that actually runs your code. At this point, it's running on the machine hardware, below even the C code. While this is what makes WebGL possible in the first place, it's also a double-edged sword, because bugs in the OpenGL ES driver (of which I've noted quite a number already) will show up in your Web application, and you won't necessarily know it unless you can count on your user base to file coherent bug reports including OS, video hardware and driver versions. Here's what the debug process for such issues ends up looking like.

On Windows, there's an extra layer which exists between the WebGL API and the hardware: ANGLE, or "Almost Native Graphics Layer Engine". Because the OpenGL ES drivers on Windows generally suck, ANGLE receives those calls and translates them into DirectX 9 calls instead.

Drawing in 3D

Now that you know how the pieces fit together, let's look at a lower-level explanation of how they produce a 3D image.

JavaScript

First, the JavaScript code gets a 3D context from an HTML5 canvas element. Then it registers a set of shaders, which are written in GLSL ([Open] GL Shading Language) and essentially resemble C code.
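Here is a minimal sketch of that first step, assuming a browser that supports WebGL 1. The helper names `compileShader` and `linkProgram`, the shader sources, and the names `a_position` and `u_mvp` are illustrative choices made for this example and are not part of the WebGL API; the `gl.*` calls are the standard ones.

```javascript
// GLSL sources are plain strings; the driver compiles them at run time.
const VERTEX_SOURCE = `
  attribute vec4 a_position;   // per-vertex input (hypothetical name)
  uniform mat4 u_mvp;          // per-draw input (hypothetical name)
  void main() {
    gl_Position = u_mvp * a_position;
  }
`;

const FRAGMENT_SOURCE = `
  precision mediump float;
  void main() {
    gl_FragColor = vec4(1.0, 0.5, 0.0, 1.0); // hard-coded orange
  }
`;

// Compile one shader stage and surface the driver's error log on failure.
function compileShader(gl, type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    const log = gl.getShaderInfoLog(shader);
    gl.deleteShader(shader);
    throw new Error("Shader compile failed: " + log);
  }
  return shader;
}

// Link a vertex shader and a fragment shader into one program object.
function linkProgram(gl, vertexSource, fragmentSource) {
  const program = gl.createProgram();
  gl.attachShader(program, compileShader(gl, gl.VERTEX_SHADER, vertexSource));
  gl.attachShader(program, compileShader(gl, gl.FRAGMENT_SHADER, fragmentSource));
  gl.linkProgram(program);
  if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
    throw new Error("Program link failed: " + gl.getProgramInfoLog(program));
  }
  return program;
}

// Browser usage (needs a <canvas> element on the page):
//   const gl = document.querySelector("canvas").getContext("webgl");
//   const program = linkProgram(gl, VERTEX_SOURCE, FRAGMENT_SOURCE);
//   gl.useProgram(program);
```

Checking the compile and link status is optional as far as the API is concerned, but it is usually the only way to see GLSL errors reported by the driver.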

The rest of the process is very modular. You need to get vertex data and any other information you intend to use (such as vertex colors, texture coordinates, and so forth) down to the graphics pipeline using uniforms and attributes which are defined in the shader, but the exact layout and naming of this information is very much up to the developer.

JavaScript sets up the initial data structures and sends them to the WebGL API, which sends them to either ANGLE or OpenGL ES, which ultimately sends them off to the graphics hardware.
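As a rough sketch of that hand-off, the function below uploads vertex positions as an attribute and a matrix as a uniform, then issues a draw call. The function name `uploadAndDraw` and the shader variable names `a_position` and `u_mvp` are assumptions made for this example; `gl` is expected to be a WebGL context and `program` an already linked shader program.

```javascript
// Upload vertex data and a matrix, then draw. The names "a_position" and
// "u_mvp" are hypothetical; they must match whatever the shaders declare.
function uploadAndDraw(gl, program, positions, mvpMatrix) {
  gl.useProgram(program);

  // Attributes: per-vertex data is copied into a buffer owned by the GL.
  const buffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
  gl.bufferData(gl.ARRAY_BUFFER, new Float32Array(positions), gl.STATIC_DRAW);

  const positionLocation = gl.getAttribLocation(program, "a_position");
  gl.enableVertexAttribArray(positionLocation);
  // 3 floats per vertex, tightly packed, starting at byte offset 0.
  gl.vertexAttribPointer(positionLocation, 3, gl.FLOAT, false, 0, 0);

  // Uniforms: values that stay constant for the whole draw call.
  const mvpLocation = gl.getUniformLocation(program, "u_mvp");
  gl.uniformMatrix4fv(mvpLocation, false, new Float32Array(mvpMatrix));

  const vertexCount = positions.length / 3;
  gl.drawArrays(gl.TRIANGLES, 0, vertexCount);
  return vertexCount;
}

// Browser usage, one triangle with an identity matrix:
//   uploadAndDraw(gl, program,
//     [0, 0, 0,  1, 0, 0,  0, 1, 0],
//     [1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1]);
```

A real application would normally create and fill the buffer once and only repeat the uniform update and draw call each frame; everything is folded into one function here to keep the flow visible.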

Vertex Shaders

Once the information is available to the shader, the shader must transform the information in 2 phases to produce 3D objects. The first phase is the vertex shader, which sets up the mesh coordinates. (This stage runs entirely on the video card, below all of the APIs discussed above.) Most commonly, the work done in the vertex shader looks something like this:

gl_Position = PROJECTION_MATRIX * VIEW_MATRIX * MODEL_MATRIX * VERTEX_POSITION

where VERTEX_POSITION is a 4D vector (x, y, z, and w, which is usually set to 1); VIEW_MATRIX is a 4x4 matrix representing the camera's view into the world; MODEL_MATRIX is a 4x4 matrix which transforms object-space coordinates (that is, coords local to the object before rotation or translation have been applied) into world-space coordinates; and PROJECTION_MATRIX is a 4x4 matrix representing the camera's lens.

Most often, the VIEW_MATRIX and MODEL_MATRIX are precomputed and called MODELVIEW_MATRIX. Occasionally, all 3 are precomputed into MODELVIEW_PROJECTION_MATRIX or just MVP. These are generally meant as optimizations, though I'd like to find time to do some benchmarks. It's possible that precomputing is actually slower in JavaScript if it's done every frame, because JavaScript itself isn't all that fast. In this case, the hardware acceleration afforded by doing the math on the GPU might well be faster than doing it on the CPU in JavaScript. We can of course hope that future JS implementations will resolve this potential gotcha by simply being faster.
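To make the precomputation concrete, here is a small plain-JavaScript sketch showing that multiplying the three matrices once on the CPU gives the same result as applying them one after another to a vertex. The helper names are made up for this example, and `projection` is only a uniform-scale stand-in, not a real perspective matrix.

```javascript
// 4x4 matrices are flat arrays in column-major order, as WebGL expects:
// element (row, col) lives at index col * 4 + row.
function multiplyMatrices(a, b) {
  const out = new Array(16).fill(0);
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      for (let k = 0; k < 4; k++) {
        out[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
      }
    }
  }
  return out;
}

// Multiply a 4x4 matrix by a 4D column vector.
function transformVector(m, v) {
  const out = [0, 0, 0, 0];
  for (let row = 0; row < 4; row++) {
    for (let k = 0; k < 4; k++) {
      out[row] += m[k * 4 + row] * v[k];
    }
  }
  return out;
}

function translation(x, y, z) {
  return [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, x, y, z, 1];
}

function uniformScale(s) {
  return [s, 0, 0, 0, 0, s, 0, 0, 0, 0, s, 0, 0, 0, 0, 1];
}

const model = translation(1, 2, 3);      // object space -> world space
const view = translation(0, 0, -10);     // world space -> camera space
const projection = uniformScale(0.5);    // stand-in for a real projection
const vertex = [1, 1, 1, 1];

// What the shader does if it receives three separate matrices.
const stepByStep = transformVector(
  projection,
  transformVector(view, transformVector(model, vertex))
);

// What the shader does if JavaScript precomputes a single MVP matrix.
const mvp = multiplyMatrices(projection, multiplyMatrices(view, model));
const precomputed = transformVector(mvp, vertex);

console.log(stepByStep, precomputed);
```

Both paths produce the same vector; the trade-off is two matrix multiplications per object in JavaScript against two extra matrix-vector multiplications per vertex on the GPU.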

Clip Coordinates

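When all of these have been applied, the gl_Position variable will have a set of XYZ coordinates and a W component. These are called clip coordinates. A point is inside the visible volume when its X, Y, and Z each fall within [-W, W] (that is, within [-1, 1] once they have been divided by W).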

It's worth noting that clip coordinates are the only thing the vertex shader really needs to produce. You can completely skip the matrix transformations performed above, as long as you produce a clip coordinate result. (I have even experimented with swapping out matrices for quaternions; it worked just fine but I scrapped the project because I didn't get the performance improvements I'd hoped for.)

After you supply clip coordinates to gl_Position, WebGL divides the result by gl_Position.w, producing what's called normalized device coordinates. From there, projecting a pixel onto the screen is a simple matter of multiplying by 1/2 the screen dimensions and then adding 1/2 the screen dimensions.^[1] Here are some examples of clip coordinates (with W = 1, so the division changes nothing) translated into 2D coordinates on an 800x600 display:

clip = [0, 0]
x = (0 * 800/2) + 800/2 = 400
y = (0 * 600/2) + 600/2 = 300

clip = [0.5, 0.5]
x = (0.5 * 800/2) + 800/2 = 200 + 400 = 600
y = (0.5 * 600/2) + 600/2 = 150 + 300 = 450

clip = [-0.5, -0.25]
x = (-0.5  * 800/2) + 800/2 = -200 + 400 = 200
y = (-0.25 * 600/2) + 600/2 =  -75 + 300 = 225
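The same arithmetic can be written as two small functions, one for the divide by W and one for the viewport mapping. This is only a sketch: the function names and the `viewport` object are assumptions for this example, with the object mirroring the arguments of gl.viewport(x, y, width, height).

```javascript
// Perspective divide: clip coordinates -> normalized device coordinates.
function clipToNdc([x, y, z, w]) {
  return [x / w, y / w, z / w];
}

// Viewport transform: normalized device coordinates -> window coordinates,
// i.e. multiply by half the size, then add half the size.
function ndcToWindow([ndcX, ndcY], viewport) {
  return [
    viewport.x + (ndcX * viewport.width) / 2 + viewport.width / 2,
    viewport.y + (ndcY * viewport.height) / 2 + viewport.height / 2,
  ];
}

const viewport = { x: 0, y: 0, width: 800, height: 600 };
const examples = [
  [0, 0, 0, 1],
  [0.5, 0.5, 0, 1],
  [-0.5, -0.25, 0, 1],
  [-1, -0.5, 0, 2], // W = 2: divides down to the same point as the line above
];
const pixels = examples.map((clip) => ndcToWindow(clipToNdc(clip), viewport));
console.log(pixels);
```

Note that GL window coordinates measure Y from the bottom edge, so a larger Y here means higher up on the canvas.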

Pixel Shaders

Once it's been determined where a pixel should be drawn, the pixel is handed off to the pixel shader (called a fragment shader in OpenGL and WebGL), which chooses the actual color the pixel will be. This can be done in a myriad of ways, ranging from simply hard-coding a specific color to texture lookups to more advanced normal and parallax mapping (which are essentially ways of "cheating" texture lookups to produce different effects).

Depth and the Depth Buffer

Now, so far we've ignored the Z component of the clip coordinates. Here's how that works out. When we multiplied by the projection matrix, the third clip component resulted in some number. If, after the division by W, that number is greater than 1.0 or less than -1.0, then it is beyond the view range of the projection matrix, corresponding to the matrix zFar and zNear values, respectively.
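A short sketch can show where those limits come from. It builds a perspective matrix with the same formula as the classic gluPerspective helper and computes the Z value, after the divide by W, for a point straight ahead of the camera. The function names are made up for this example.

```javascript
// Perspective projection matrix in column-major order, using the same
// formula as the classic gluPerspective helper.
function perspective(fovYRadians, aspect, zNear, zFar) {
  const f = 1 / Math.tan(fovYRadians / 2);
  const rangeInv = 1 / (zNear - zFar);
  return [
    f / aspect, 0, 0, 0,
    0, f, 0, 0,
    0, 0, (zFar + zNear) * rangeInv, -1,
    0, 0, 2 * zFar * zNear * rangeInv, 0,
  ];
}

// Z after projection and the divide by W, for a point straight ahead of the
// camera at the given distance (the camera looks down -Z in view space).
function ndcDepthAtDistance(projection, distance) {
  const viewZ = -distance;
  const clipZ = projection[10] * viewZ + projection[14]; // input W is 1
  const clipW = projection[11] * viewZ;
  return clipZ / clipW;
}

const lens = perspective(Math.PI / 3, 800 / 600, 1, 100);
console.log(ndcDepthAtDistance(lens, 1));   // at zNear: about -1
console.log(ndcDepthAtDistance(lens, 100)); // at zFar: about +1
console.log(ndcDepthAtDistance(lens, 200)); // beyond zFar: greater than 1
```

The mapping between distance and depth is not linear: most of the [-1, 1] range is used up close to zNear, which is one reason very small zNear values tend to cause depth precision problems.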

So if it's not in the range [-1, 1] then it's clipped entirely. If it is in that range, then the Z value is scaled to 0 to 1^[2] and is compared to the depth buffer^[3]. The depth buffer has the same dimensions as the screen, so that if a projection of 800x600 is used, the depth buffer is 800 pixels wide and 600 pixels high. We already have the pixel's X and Y coordinates, so they are plugged into the depth buffer to get the currently stored Z value. If the stored Z value is greater than the new Z value, then the new Z value is closer than whatever was previously drawn, and replaces it^[4]. At this point it's safe to light up the pixel in question (or in the case of WebGL, draw the pixel to the canvas), and store the new Z value as the depth value.

If the new Z value is greater than the stored depth value, then it is deemed to be "behind" whatever has already been drawn, and the pixel is discarded.
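The depth test can be mimicked in a few lines of plain JavaScript. This is a toy model rather than how any GPU implements it: the class name `DepthBuffer` and its method are made up, it uses the default gl.LESS comparison and the default 0 to 1 depth range, and it rejects out-of-range Z per pixel, whereas real hardware clips whole primitives before rasterizing them.

```javascript
// A toy software depth buffer using the default "less than" comparison.
class DepthBuffer {
  constructor(width, height) {
    this.width = width;
    this.height = height;
    // Cleared to 1.0, the farthest possible depth (the default clear value).
    this.depth = new Float32Array(width * height).fill(1.0);
    this.color = new Array(width * height).fill(null);
  }

  // Returns true if the pixel was drawn, false if it was discarded.
  writeFragment(x, y, ndcZ, color) {
    if (ndcZ < -1 || ndcZ > 1) return false; // outside zNear..zFar
    const z = ndcZ * 0.5 + 0.5;              // scale [-1, 1] to [0, 1]
    const index = y * this.width + x;
    if (z < this.depth[index]) {             // closer than what is stored
      this.depth[index] = z;
      this.color[index] = color;
      return true;
    }
    return false;                            // behind: discard
  }
}

const buffer = new DepthBuffer(800, 600);
console.log(buffer.writeFragment(400, 300, 0.5, "red"));   // drawn
console.log(buffer.writeFragment(400, 300, 0.8, "blue"));  // behind red
console.log(buffer.writeFragment(400, 300, 0.0, "green")); // in front of red
```

Because the test only compares against whatever is already stored, opaque objects can be drawn in any order and still end up correctly layered.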

^[1] The actual conversion uses the gl.viewport settings to convert from normalized device coordinates to pixels.

^[2] It's actually scaled to the gl.depthRange settings. They default to 0 and 1.

^[3] Assuming you have a depth buffer and you've turned on depth testing with gl.enable(gl.DEPTH_TEST).

^[4] You can set how Z values are compared with gl.depthFunc.

OpenGL ES was never used to render WebGL on desktop cards; OpenGL ES is only used on mobile hardware. On desktops with an operating system other than Windows, the browser engine translates WebGL API calls into OpenGL calls. This info is also found in this very nice writeup: codeflow.org/entries/2013/feb/02/why-you-should-use-webgl — LJᛃ, Nov 28 '13 at 00:37
WRT your comment about working around driver bugs: PLEASE FILE BUGS at http://crbug.com or with Firefox etc. The browser vendors don't want you to have to deal with driver bugs, and they will do their best to work around them and make WebGL behave consistently everywhere. That includes writing new tests so drivers won't regress. But they can only do this if they know about the bugs, and they can only know that if you file them. — gman, Apr 22 '14 at 14:27

gman · Answer 2 · 2015-04-28T08:30:07.057

I would read these articles

http://webglfundamentals.org/webgl/lessons/webgl-how-it-works.html

Assuming those articles are helpful, the rest of the picture is that WebGL runs in a browser. It renders to a canvas tag. You can think of a canvas tag like an img tag, except you use the WebGL API to generate an image instead of downloading one.

Like other HTML5 tags, the canvas tag can be styled with CSS and can sit under or over other parts of the page. It is composited (blended) with other parts of the page, and it can be transformed, rotated, and scaled by CSS along with other parts of the page. That's a big difference from OpenGL or OpenGL ES.