3

I have a mostly static HTML website served from a CDN (plus a bit of AJAX to the server). I want users' browsers to cache everything until I update a file, at which point I want them to fetch the new version.

How do I achieve this for all types of static files on my site (HTML, JS, CSS, images etc.), whether through settings in the HTML or elsewhere? Obviously I can tell the CDN to expire its cache, so it's the client side I'm thinking of.

Thanks.

Alex Kerr
  • I usually add some kind of random "cache-breaker" attribute when loading a resource, to ensure the user always gets the latest version of it. You'd do something like `server.com/resource.css?cache_breaker` – Ulydev Dec 23 '15 at 23:41
  • That is how I do it too – loneshark99 Dec 23 '15 at 23:43
  • Yeah, me four. I just have the server dictate what the `?versionNumber` is so when it changes for release the version number also changes. But using the queryString approach is definitely the way to go. – Travis J Dec 23 '15 at 23:48
  • Look into HTTP ETAG. – Sam Axe Dec 23 '15 at 23:49
  • See this question on stopping HTML caching: http://stackoverflow.com/questions/1341089/using-meta-tags-to-turn-off-caching-in-all-browsers. In addition, a quick fix for CSS and JS is to add a version number or cache breaker, as mentioned in the comments above. Beyond that, if you have access to the server configuration, you can change it to control which resources are cached and for how long; there are plenty of resources online for whichever server you use. More: http://stackoverflow.com/questions/49547/making-sure-a-web-page-is-not-cached-across-all-browsers – pratikpawar Dec 23 '15 at 23:54
  • Also see http://stackoverflow.com/questions/3870726/force-refresh-of-cached-css-data/3870743#3870743. A query string parameter is fine when used correctly, although it still requires a request to the server (unlike the "slowly-changing URL" method). – Tim Medora Dec 24 '15 at 01:17

3 Answers

5

One way to achieve this is to make use of the HTTP Last-Modified or ETag headers. In the HTTP headers of the served file, the server sends either the date when the page was last modified (in the Last-Modified header), or an opaque identifier representing the current state of the page (ETag), or both:

HTTP/1.1 200 OK
Content-Type: text/html
Last-Modified: Fri, 18 Dec 2015 08:24:52 GMT
ETag: "208f11-52727df9c7751"
Cache-Control: must-revalidate

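If the Cache-Control header is set to must-revalidate, the browser caches the page along with the Last-Modified and ETag values it received. Strictly speaking, must-revalidate only forbids reusing a stale copy without checking, so to have the browser check with the server before every reuse, no-cache (or max-age=0, must-revalidate) is the more reliable choice. On the next request, the browser sends the stored values back as If-Modified-Since and If-None-Match: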

GET / HTTP/1.1
Host: example.com
If-None-Match: "208f11-52727df9c7751"
If-Modified-Since: Fri, 18 Dec 2015 08:24:52 GMT

If the current ETag of the page matches the one sent by the browser, or if the page hasn’t been modified since the date sent by the browser, the server doesn’t send the page again. Instead it replies with a 304 Not Modified status and an empty body:

HTTP/1.1 304 Not Modified

Note that only one of the two mechanisms (ETag or Last-Modified) is required; each works on its own.

The disadvantage is that a request still has to be sent, so the benefit is greatest for pages that contain a lot of data. On connections with high latency, the page will still take a while to load, although your traffic will certainly drop.

Apache automatically generates an ETag (from file attributes such as the modification time and size; older versions also include the inode number) and a Last-Modified header (based on the file’s modification time) for static files. I don’t know about other web servers, but I assume they behave similarly. For dynamic pages, you can set the headers yourself (for example by sending the MD5 sum of the content as the ETag).
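To make the exchange above concrete, here is a minimal Python sketch of how a server might answer a conditional request for a dynamic page. The function names `make_etag` and `conditional_response` are hypothetical, the ETag is the MD5 sum of the content as suggested above, and weak validators (`W/"..."`) are ignored for brevity, so treat it as an illustration rather than a complete implementation.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime


def make_etag(body: bytes) -> str:
    """Return a quoted ETag derived from the MD5 sum of the content."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def conditional_response(request_headers: dict, body: bytes, mtime: float):
    """Return (status, headers, body) for a GET, honoring cache validators.

    request_headers: hypothetical dict of request header name -> value.
    mtime: last modification time of the content, in seconds since the epoch.
    """
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(mtime, usegmt=True),
        # Assumption: no-cache asks the browser to revalidate before every reuse.
        "Cache-Control": "no-cache",
    }
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        not_modified = "*" in candidates or etag in candidates
    elif if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            not_modified = False  # unparsable date: send the full page
        else:
            # HTTP dates have one-second resolution, so drop fractions.
            not_modified = int(mtime) <= since
    else:
        not_modified = False

    if not_modified:
        return 304, headers, b""
    return 200, headers, body
```

A first request with no validators yields a 200 with the body; replaying the returned ETag or Last-Modified value yields a 304 with an empty body until the content changes.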

By default, Apache doesn’t send a Cache-Control header, which leaves browsers to decide heuristically how long to keep a response. This example .htaccess file (it needs mod_headers to be enabled) makes Apache send the header for all .html files:

<FilesMatch "\.html$">
    Header set Cache-Control "must-revalidate"
</FilesMatch>

The other mechanism is to let the browser cache the file for a long time by sending, for example, Cache-Control: public, max-age=31536000, and to change the URL whenever the file changes, for example by appending the file’s modification time as a query string (?12345). This only works if the page or file is linked only from within your own web application, so that you can generate the links to it dynamically. For example, in PHP you could do something like this:

<script src="script.js?<?php echo filemtime("script.js"); ?>"></script>
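The same idea can be sketched in Python for sites that are not generated with PHP. The helper names `versioned_url` and `script_tag` are hypothetical; the only assumption is that the file being linked is readable on disk when the HTML is generated.

```python
import os


def versioned_url(url_path: str, file_path: str) -> str:
    """Append the file's modification time (whole seconds) as a query string."""
    mtime = int(os.path.getmtime(file_path))
    return f"{url_path}?{mtime}"


def script_tag(url_path: str, file_path: str) -> str:
    """Build a <script> tag whose URL changes whenever the file is modified."""
    return f'<script src="{versioned_url(url_path, file_path)}"></script>'
```

Because the query string only changes when the file does, browsers can keep the cached copy indefinitely and still pick up a new version as soon as the generated HTML points at the new URL.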
cdauth
1

To achieve what you want on the client side, you have to change the URL of your static files when you load them in HTML, i.e. change the file name, add a random query string like unicorn.css?p=1234, etc. An easy way to automate this is to use a task runner such as Gulp together with the gulp-rev package.

In short, if you integrate gulp-rev into your Gulp task, it will automatically append a content hash to the name of every static file piped into the task stream and generate a JSON manifest file that maps the original file names to the renamed ones. So a file like unicorn.css will become unicorn-d41d8cd98f.css. You can then write another Gulp task to crawl through your HTML/JS/CSS files and replace all the URLs, or use the gulp-rev-replace package.
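For readers who don't use Gulp, the renaming step can be illustrated with a short Python sketch. This is not gulp-rev's code, only an approximation of the idea: the function names are hypothetical, the hash is assumed to be the first 10 hex characters of the content's MD5 sum, and references are rewritten by plain text substitution, which would also match a path that happens to be the tail of a longer name.

```python
import hashlib
import posixpath
import re


def hashed_name(path: str, content: bytes, length: int = 10) -> str:
    """Insert a short content hash before the extension: a.css -> a-<hash>.css."""
    digest = hashlib.md5(content).hexdigest()[:length]
    stem, ext = posixpath.splitext(path)
    return f"{stem}-{digest}{ext}"


def build_manifest(files: dict) -> dict:
    """Map each original path to its hashed name; `files` maps path -> bytes."""
    return {path: hashed_name(path, content) for path, content in files.items()}


def rewrite_references(text: str, manifest: dict) -> str:
    """Replace every occurrence of an original path with its hashed name."""
    if not manifest:
        return text
    # Try longer paths first when several candidates start at the same position.
    paths = sorted(manifest, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in paths))
    return pattern.sub(lambda match: manifest[match.group(0)], text)
```

Since the file name changes only when the content does, hashed files can be served with a very long cache lifetime, while the HTML that references them is kept short-lived or revalidated.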

There should be plenty of online tutorials that show you how to accomplish this. If you use Yeoman, you can check out this static webapp generator I wrote here, which contains a Gulp routine for this.

Andrew Wei
-1

This is what the HTML5 Application Cache does for you. Put all of your static content into the Cache Manifest and it will be cached in the browser until the manifest file is changed. As an added bonus, the static content will be available even if the browser is offline.

The only change to your HTML is in the <html> tag:

<!DOCTYPE HTML>
<html manifest="cache.appcache">
...
</html>
Brent Washburne