
The gzip_proxied directive allows for the following options (non-exhaustive):

  • expired
    enables compression if a response header includes the “Expires” field with a value that disables caching;
  • no-cache
    enables compression if a response header includes the “Cache-Control” field with the “no-cache” parameter;
  • no-store
    enables compression if a response header includes the “Cache-Control” field with the “no-store” parameter;
  • private
    enables compression if a response header includes the “Cache-Control” field with the “private” parameter;
  • no_last_modified
    enables compression if a response header does not include the “Last-Modified” field;
  • no_etag
    enables compression if a response header does not include the “ETag” field;
  • auth
    enables compression if a request header includes the “Authorization” field;
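For context, a minimal sketch of how the directive is used (values illustrative). Note that nginx decides whether a request is "proxied" by the presence of the "Via" request header, and that the listed parameters are OR'ed together, so matching any one of them enables compression:

```nginx
# Sketch (MIME types illustrative). The default is "gzip_proxied off;"
# (never compress proxied responses); "any" compresses responses to all
# proxied requests. Multiple parameters are OR'ed: compression is enabled
# if any one condition matches.
gzip on;
gzip_types text/plain text/css application/json;
gzip_proxied expired no-cache no-store private;
```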

I can't see any rational reason to use most of these options. For example, why should the presence of an Authorization header in the request, or of Cache-Control: private in the response, affect whether or not I want to gzip that response?

Given that old versions of Nginx strip ETags from responses when gzipping them, I can see a use case for no_etag: if you don't have Nginx configured to generate ETags for your gzipped responses, you may prefer to pass on an uncompressed response with an ETag rather than generate a compressed one without an ETag.
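A minimal sketch of that no_etag setup (relevant on an nginx old enough to strip ETags when gzipping; everything other than gzip_proxied is illustrative):

```nginx
# Responses that carry an ETag pass through uncompressed, preserving the
# validator; proxied responses without one are eligible for gzip.
gzip on;
gzip_proxied no_etag;
```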

I can't figure out the others, though.

What are the intended use cases of each of these options?

Mark Amery
George Reith

1 Answer


From the admin guide: (emphasis mine)

> The directive has a number of parameters specifying which kinds of proxied requests NGINX should compress. For example, it is reasonable to compress responses only to requests that will not be cached on the proxy server. For this purpose the gzip_proxied directive has parameters that instruct NGINX to check the Cache-Control header field in a response and compress the response if the value is no-cache, no-store, or private. In addition, you must include the expired parameter to check the value of the Expires header field. These parameters are set in the following example, along with the auth parameter, which checks for the presence of the Authorization header field (an authorized response is specific to the end user and is not typically cached).
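The configuration example that passage refers to did not survive here, but the parameters it names combine onto a single directive line, roughly:

```nginx
# Sketch reconstructed from the parameters named in the quoted passage;
# the enclosing server block is illustrative.
server {
    gzip on;
    gzip_proxied no-cache no-store private expired auth;
}
```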

I'd agree that not compressing cacheable responses is reasonable. Consider that the primary benefit of caching at a proxy is increased performance (faster response times) and reduced time and bandwidth spent requesting the upstream resource. The tradeoff for these performance gains is the cost of cache storage. Here are some use cases where not compressing cacheable responses makes sense:

  1. In the normal web traffic of many sites, non-personalized responses (which constitute the majority of cacheable responses) have already been optimized through techniques like script minification, image size optimization, etc., in a web build process. While these static resources might shrink slightly from compression, the CPU cost of trying to gzip them smaller is probably not an efficient use of the proxy layer machine resources. But dynamically generated pages, served to logged-in users, containing tons of application-generated content would very likely benefit from compression (and would typically not be cacheable).

  2. You are setting up a proxy in front of a costly upstream service, but you are serving responses to another proxy that will handle compression for each user agent. For example, a CDN may make multiple requests to the same costly upstream resource (from separate geographical edge locations), and you want to ensure the costly response can be reused. If the CDN caches uncompressed versions (to serve both compressed and uncompressed user agents), you may be compressing at your proxy only to have the CDN uncompress again, which is simply a waste of hardware and electricity on both sides, to save bandwidth in the highest-bandwidth part of the chain. (Response gzip compression is most beneficial at the last mile, to get the response data to your user's phone, which has dropped to one bar of signal as they enter the subway.)

  3. For sizable response entities, byte-range requests for the resource may come in (from various user agents, often via downstream CDN intermediaries), including from user agents that don't support compression. The CDN can serve those byte-range requests from its own cache, provided it already holds an uncompressed version.
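Putting points 2 and 3 together, a hedged sketch of nginx as a caching layer behind a CDN (the upstream name, cache path, and zone name are hypothetical):

```nginx
# Hypothetical setup: nginx proxies and caches an expensive upstream, and
# sits behind a CDN (so incoming requests carry a Via header). Cacheable
# responses pass through uncompressed, so the CDN can store one copy and
# serve byte-range requests from it; non-cacheable responses (no-cache,
# no-store, private, expired, or authorized) are gzipped.
proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m;  # path/zone hypothetical

server {
    listen 80;

    location / {
        proxy_pass http://app_upstream;  # hypothetical upstream group
        proxy_cache app_cache;

        gzip on;
        gzip_proxied no-cache no-store private expired auth;
    }
}
```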

Alex Nauda
    "*the CPU cost of trying to gzip them smaller is probably not an efficient use of the proxy layer machine resources*" - but we're not talking about gzipping in the proxy layer, right? We're talking about gzipping on the Nginx backend, behind some proxy layer that has forwarded the request to that Nginx backend; the proxy layer doesn't need to compress or uncompress the response itself. (Doesn't affect the overall thrust of your point, just nitpicking.) – Mark Amery Nov 03 '15 at 16:33
  • Point 1 (about gzipping often not being worthwhile on cacheable resources because cacheable resources are often already minified and gzipping minified resources is a waste of time) feels very weak to me. It seems reasonable in principle, but I think it's false in practice; people who've [measured the effect of gzipping minified content elsewhere on Stack Overflow](http://stackoverflow.com/questions/807119/gzip-versus-minify) have found that gzipping still significantly reduces file size even when the content being gzipped is already minified. – Mark Amery Nov 03 '15 at 16:41
  • Points 2 and 3, on the other hand, are much more compelling! If I may provide a tl;dr: proxies generally need a decompressed version of content that they cache, either to serve user agents that don't support gzip or to serve byte range requests. So if they're caching it, proxies need to decompress your gzipped response, costing CPU time. The only benefit you get in exchange is reduced bandwidth usage between the proxy and the backend, but in some circumstances (such as proxy and backend being on the same local network) proxy-to-backend bandwidth may be practically unlimited anyway. – Mark Amery Nov 03 '15 at 16:46
  • In response to your first point about gzipping in the proxy layer vs on the nginx backend... I wouldn't call nginx the backend in the use cases where I've used gzip_proxied. For example, I've used nginx as a proxy in front of an application server (Node.js, Django, JVM, or what have you) in a load balanced set. In this use case, nginx provides a great way to cache (and optionally gzip) responses to lessen the load on your application server instances. So yes, I'm talking about gzipping in the "proxy layer" which is what I would call nginx in this setup. – Alex Nauda Nov 03 '15 at 21:10