I'm looking to implement cache busting for our JavaScript and CSS based on code changes (changes to a DLL) in a .NET project by adding ?v=number to the end of the link/script tag src paths. I want to do this in the most performant way possible, since this will be used on every page of the application. Would just obtaining the DLL version be the best way to generate that number, as explained here?

Donald
  • I'm confused, why not just use the .net optimization bundle for this? – Eric Herlitz Jul 21 '15 at 16:42
  • Right now I am using Gulp to do all of the minification (and uglification of JS). – Donald Jul 21 '15 at 16:43
  • Is there a specific reason you want to bypass the caching using a querystring parameter? If you want to invalidate a cache entry in a client when you make a change to the resource, there are cache-control mechanisms specifically for this job. – Paul Turner Jul 21 '15 at 16:45
  • There are 2 hard problems in CS, naming things, cache invalidation and off by one errors. – Aron Jul 21 '15 at 16:45
  • @Tragedian I'm rolling out fairly large JavaScript changes that have a large impact on certain parts of the web site (as well as conversion) so I want to make sure that users get the most up-to-date version when I roll out an important change. – Donald Jul 21 '15 at 16:48
  • Okay, so you don't have a specific motivation to use a querystring? You just want a way for clients to know when a file has changed so they can clear their caches and get the most recent version? – Paul Turner Jul 21 '15 at 16:51
  • Correct, I figured a querystring based on DLL changes would be the best way to achieve this, but if there's a better way to do it that would be great. – Donald Jul 21 '15 at 16:53
  • @Tragedian For instance, [this answer](http://stackoverflow.com/a/32427/2465599) suggests a similar approach. – Donald Jul 21 '15 at 17:08

1 Answer

Caching is one of the hardest problems in Computer Science. There's no magic answer to fit all problems, because caching is an optimisation technique: you make tradeoffs to achieve the performance profile you're interested in. Only you will know what is "best" for your problem.

For HTTP, there's a series of headers which indicate to clients how they should perform caching. As with all headers, the client may choose to ignore them and do its own thing, but you should be comfortable that most clients will pay attention to what you send back.

The relevant headers to this discussion are:

  • cache-control
  • etag

cache-control

This header indicates back to clients what basic caching rules they should apply. If this header is not specified, the client can make its own choice on what to do in regards to caching. If you're not sending this header, you can't make many assumptions about what your clients are doing.

The cache-control header is composed of a number of directives to indicate the caching rules to apply to the resource. The common ones are:

  • private | public - A private directive indicates that proxy servers should not cache this value; responsibility of caching lies entirely with the client. A public directive indicates that proxy servers may cache this resource. If you are serving resources which are customized for end-users (such as having the user's name somewhere on the page), the private directive is appropriate. If you are serving resources which are shared across all your users, public is appropriate (such as a favicon or logo).
  • max-age - This indicates how many seconds the cached copy may be used before the client goes back to the server for another copy. It is the maximum amount of time the cached copy is considered fresh; once it elapses, the client has to check with the server again.
  • no-cache - This tells clients not to reuse a cached copy without first checking with the server for a new version. This doesn't mean the client will not cache the resource at all, but that it will check to see if the resource has changed every time a request is made. The etag header will be relevant here.
  • no-store - This indicates that the client should not store the response at all.

A cache-control value to indicate a dynamic resource (a resource which changes on every request) is:

cache-control: no-cache

This tells the client and any proxy servers that this resource should be checked every time a request is made.

A cache-control header to cache a resource for 1 day looks like this:

cache-control: public, max-age=86400
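To make the two example values above concrete, here is a minimal sketch in Python that composes a cache-control value from the directives described. The function name `cache_control` and its parameters are made up for illustration and are not part of any framework; in a .NET application you would set the header through whatever mechanism your framework provides.

```python
from typing import Optional


def cache_control(visibility: Optional[str] = None,
                  max_age: Optional[int] = None,
                  no_cache: bool = False,
                  no_store: bool = False) -> str:
    """Compose a cache-control header value (hypothetical helper)."""
    directives = []
    if visibility is not None:
        if visibility not in ("public", "private"):
            raise ValueError("visibility must be 'public' or 'private'")
        directives.append(visibility)
    if no_store:
        directives.append("no-store")
    if no_cache:
        directives.append("no-cache")
    if max_age is not None:
        if max_age < 0:
            raise ValueError("max_age must not be negative")
        # max-age is expressed in seconds.
        directives.append(f"max-age={max_age}")
    if not directives:
        raise ValueError("at least one directive is required")
    return ", ".join(directives)


ONE_DAY = 24 * 60 * 60  # seconds in one day

# The two examples from the answer: a dynamic resource and a 1-day cache.
print("cache-control:", cache_control(no_cache=True))
print("cache-control:", cache_control("public", max_age=ONE_DAY))
```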

etag

The etag header is short for Entity Tag. You can think of the etag header as being like a hash-code for your resource. When you provide this header, you give the client a way to determine whether the resource has changed without having to retrieve the whole resource.

When a client has an etag value for a resource, it can make a request to the server which is like "give me this resource if its etag value is different from the one I have". You still have the cost of a network round-trip, but your client will only receive the resource again if it has changed.

The etag header is most useful for saving bandwidth. If you are using an etag, your clients will only download the new version when it actually changes, and can otherwise keep using their cached copy. The validation requests the clients do make are very small and complete more quickly than re-downloading the full resource.
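As a rough illustration of that exchange, the sketch below derives an etag from a hash of the content and answers a conditional request with 304 (Not Modified) when the client's value still matches. It is written in Python only to keep it short; `make_etag` and `respond` are hypothetical names, hashing the body is just one possible way to produce an etag, and the `If-None-Match` parsing is simplified (it ignores `*` and weak validators).

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive an etag from the content; etag values are quoted strings."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def respond(body: bytes,
            if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET on a static resource.

    if_none_match is the value of the client's If-None-Match request
    header, or None when the client has no cached copy.
    """
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # The header may carry several comma-separated etag values.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates:
            # Unchanged: send headers only, no body.
            return 304, headers, b""
    return 200, headers, body


# First request: the client has nothing cached and downloads the file.
status, headers, payload = respond(b"console.log('v1');", None)

# Later request: the client sends back the etag it was given.
status_again, _, payload_again = respond(b"console.log('v1');", headers["ETag"])

# After a deployment the content differs, so the old etag no longer matches.
status_new, _, payload_new = respond(b"console.log('v2');", headers["ETag"])
```

The 304 response carries no body, which is where the bandwidth saving comes from.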

When you combine etag with cache-control, the cache-control header decides when the local cached value is no longer valid, and the etag is used in the subsequent request to see whether the resource has been changed.
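The way the two headers interact can be sketched from the client's point of view. This is a simplified model in Python, not how any particular browser is implemented, and `next_action` and its parameters are names invented for the example.

```python
def next_action(age_seconds: int, max_age: int, has_etag: bool) -> str:
    """Decide what a client does with a cached copy (simplified model).

    age_seconds: how long ago the copy was fetched or last revalidated.
    max_age:     the max-age value from the cache-control header.
    has_etag:    whether the cached response came with an etag.
    """
    if age_seconds < max_age:
        # Still fresh: no request is made at all.
        return "use cached copy"
    if has_etag:
        # Stale, but the etag allows a cheap conditional request.
        return "revalidate with etag"
    # Stale and nothing to validate against.
    return "download again"


# With max-age=86400, a copy fetched an hour ago is used without a request,
# while a copy fetched two days ago is revalidated using its etag.
fresh = next_action(age_seconds=3600, max_age=86400, has_etag=True)
stale = next_action(age_seconds=172800, max_age=86400, has_etag=True)
```

Under this model, a change you deploy can take up to max-age seconds to reach a client that already holds a fresh copy, which is the tradeoff to weigh when picking the value.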


Depending on what frameworks and libraries you are using, there are many ways to control these header values, but you will need to make informed guesses as to what to set them to.

I would suggest that, where you can cheaply produce one, applying an etag to a response is a simple way to achieve decaching as you have asked for. You should generally combine an etag with a cache-control header that includes a max-age value that is appropriate for how "responsive" you need your clients to be when the values are changed.

As a final note, don't forget that caching is an optimisation. You can turn it off and you should if the cost of caching isn't worth the payoff.

Paul Turner
  • Having written all this, I suspect the problem has its root with in-place versioning, with caching being just one symptom of the flaw. – Paul Turner Jul 21 '15 at 17:38
  • Thank you for the very well-thought out reply. I wasn't aware of these headers, so I think with a combination of the two I can definitely achieve what I'm looking for! – Donald Jul 21 '15 at 17:52