One way to achieve this is to use the HTTP Last-Modified or ETag headers. Along with the served file, the server sends either the date when the page was last modified (in the Last-Modified header), or an opaque identifier representing the current version of the page (in the ETag header), or both:
HTTP/1.1 200 OK
Content-Type: text/html
Last-Modified: Fri, 18 Dec 2015 08:24:52 GMT
ETag: "208f11-52727df9c7751"
Cache-Control: must-revalidate
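If the Cache-Control header is set to must-revalidate, the browser caches the page along with the Last-Modified and ETag values it received, and it must not reuse the cached copy once it is stale without checking with the server first. (Without an explicit max-age, browsers may still treat the copy as fresh for a heuristically chosen time; Cache-Control: no-cache, or max-age=0 together with must-revalidate, forces a check on every request.) When it revalidates, the browser sends the stored values back as If-Modified-Since and If-None-Match: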
GET / HTTP/1.1
Host: example.com
If-None-Match: "208f11-52727df9c7751"
If-Modified-Since: Fri, 18 Dec 2015 08:24:52 GMT
If the current ETag of the page matches the one sent by the browser, or if the page hasn’t been modified since the date sent by the browser, the server doesn’t send the page again. Instead, it responds with the status 304 Not Modified and an empty body:
HTTP/1.1 304 Not Modified
Note that only one of the two mechanisms (ETag or Last-Modified) is required; each works on its own.
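The check the server performs can be sketched in a few lines of Python. This is only an illustration, not how any particular web server implements it: the function name is_not_modified and its parameters are made up for this example, and the ETag list is split naively on commas. When both headers are present, If-None-Match is evaluated and If-Modified-Since is ignored, which is the precedence the HTTP specification describes.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_not_modified(request_headers, current_etag, last_modified):
    """Return True if the server may answer 304 Not Modified.

    request_headers: dict of header name -> value sent by the browser.
    current_etag: quoted ETag of the current page, e.g. '"abc"'.
    last_modified: timezone-aware datetime of the last change.
    (All three names are hypothetical, chosen for this sketch.)
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        # Simplification: split on commas, ignore a weak "W/" prefix.
        def strip_weak(tag):
            return tag[2:] if tag.startswith("W/") else tag

        candidates = [strip_weak(t.strip()) for t in if_none_match.split(",")]
        return "*" in candidates or strip_weak(current_etag) in candidates

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # unparsable date: send the full page
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates only have one-second resolution.
        return last_modified.replace(microsecond=0) <= since

    return False  # no validators sent: send the full page
```

If the function returns True, the server would send 304 with an empty body; otherwise it sends the page with a 200 status and fresh ETag and Last-Modified headers.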
The disadvantage is that a request still has to be sent, so the performance benefit is mostly limited to pages that contain a lot of data. On connections with high latency in particular, the page will still take a long time to load. (It will certainly reduce your traffic, though.)
For static files, Apache automatically generates an ETag (from the file’s modification time and size; versions before 2.4 also included the inode number by default) and a Last-Modified header (based on the file’s modification time). I don’t know about other web servers, but I assume they behave similarly. For dynamic pages, you can set the headers yourself (for example by sending the MD5 sum of the content as the ETag).
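For a dynamic page, deriving the ETag from the MD5 sum of the content could look roughly like this in Python. The helper name content_etag is hypothetical. How the header is attached to the response depends on your framework and is not shown here.

```python
import hashlib


def content_etag(body: bytes) -> str:
    """Build a quoted ETag value from the MD5 sum of the response body."""
    digest = hashlib.md5(body).hexdigest()
    # ETag values are sent as quoted strings.
    return f'"{digest}"'
```

The same content always yields the same ETag, and any change to the content yields a different one, which is all the browser needs for revalidation. MD5 serves here only as a fingerprint, not for security.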
By default, Apache doesn’t send a Cache-Control header, which leaves it to the browser to decide how long to reuse the page before checking again. This example .htaccess file (it requires mod_headers to be enabled) makes Apache send the header for all .html files:
<FilesMatch "\.html$">
Header set Cache-Control "must-revalidate"
</FilesMatch>
The other mechanism is to let the browser cache the page by sending Cache-Control: public (typically together with a long max-age), and to change the URL whenever the file changes, for example by appending the modification time of the file as a query string (?12345). This only really works if your page/file is linked only from within your web application, in which case you can generate the links to it dynamically. For example, in PHP you could do something like this:
<script src="script.js?<?php echo filemtime("script.js"); ?>"></script>
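The same idea is not tied to PHP. As a rough Python equivalent, a small helper (versioned_url is a made-up name) can build the link from the file’s modification time:

```python
import os


def versioned_url(url_path: str, file_path: str) -> str:
    """Append the file's modification time (whole seconds) as a query string."""
    mtime = int(os.path.getmtime(file_path))
    return f"{url_path}?{mtime}"
```

A template would then emit something like a script tag whose src is versioned_url("script.js", "/path/to/script.js"), where the second argument is the file’s location on disk (the path here is a placeholder). The URL changes as soon as the file is modified, so the browser fetches the new version.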