
I have a URL with a test PDF on it; this is my origin: https://powered-by.qbank.se/miso/MISO_Testing_Document279626.pdf

I have set up that origin in an Azure CDN using the Microsoft provider. Its URL is: https://misocdn-fail.azureedge.net/MISO_Testing_Document279626.pdf

When I update the PDF on the origin site, every browser I have tested returns the NEW document after a plain F5 refresh, not even a Ctrl+F5. The CDN, however, keeps caching the PDF basically indefinitely (2 days according to the docs, or until I purge).

My question is: why can't my CDN detect the change at the origin when the browser can?

I understand that the CDN caches, but I don't understand what the browser is doing to figure out that this content is new.

Paul Duer

1 Answer


To better understand the phenomenon, a good start is to observe the response headers received from the direct (origin) URL. One way to do that is to run curl -I <YOUR_URL> in your terminal.

You will see something like:

HTTP/1.1 200 OK
Date: Mon, 01 Oct 2018 09:03:57 GMT
Server: Apache
Last-Modified: Fri, 28 Sep 2018 19:11:57 GMT
ETag: "11ff1-576f33ab4c2a0"
Accept-Ranges: bytes
Content-Length: 73713
Cache-Control: max-age=86400
Expires: Tue, 02 Oct 2018 09:03:57 GMT
Content-Type: application/pdf
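
As a rough illustration of reading these headers programmatically, here is a minimal Python sketch that parses a header block like the one above (pasted as text) and extracts the max-age value. The names parse_headers and max_age_seconds are hypothetical helpers, and the parser ignores edge cases such as repeated or folded header lines.

```python
from __future__ import annotations

# The header block printed by `curl -I` above, pasted as a string.
SAMPLE = """HTTP/1.1 200 OK
Date: Mon, 01 Oct 2018 09:03:57 GMT
Server: Apache
Last-Modified: Fri, 28 Sep 2018 19:11:57 GMT
ETag: "11ff1-576f33ab4c2a0"
Accept-Ranges: bytes
Content-Length: 73713
Cache-Control: max-age=86400
Expires: Tue, 02 Oct 2018 09:03:57 GMT
Content-Type: application/pdf
"""


def parse_headers(raw: str) -> dict[str, str]:
    """Hypothetical helper: map lower-cased header names to their values.
    Skips the status line; a repeated header simply overwrites the earlier one."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:
        if ":" not in line:
            continue
        # Split at the first colon only, so times like 09:03:57 stay intact.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def max_age_seconds(cache_control: str) -> int | None:
    """Hypothetical helper: return the max-age directive, or None if absent."""
    for directive in cache_control.split(","):
        key, _, value = directive.strip().partition("=")
        if key.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


headers = parse_headers(SAMPLE)
print(headers["etag"])                            # "11ff1-576f33ab4c2a0"
print(max_age_seconds(headers["cache-control"]))  # 86400
```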

Out of these headers, the browser uses Cache-Control to decide how long the content stays fresh, and ETag and Last-Modified to check whether a stale copy is still valid. Cache-Control: max-age=<seconds> is the maximum amount of time (relative to the time of the request) for which a resource is considered fresh.
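
To make the max-age arithmetic concrete, here is a minimal Python sketch (standard library only) that computes when the sample response stops being fresh. It assumes freshness is measured from the Date header and ignores the Age header and network delay, which real caches also take into account; freshness_expiry and is_fresh are hypothetical names.

```python
from datetime import datetime, timedelta
from email.utils import format_datetime, parsedate_to_datetime


def freshness_expiry(date_header: str, max_age: int) -> datetime:
    """Moment the response becomes stale: Date + max-age (simplified)."""
    return parsedate_to_datetime(date_header) + timedelta(seconds=max_age)


def is_fresh(date_header: str, max_age: int, now: datetime) -> bool:
    """True while a cache may reuse its stored copy without asking the server."""
    return now < freshness_expiry(date_header, max_age)


expiry = freshness_expiry("Mon, 01 Oct 2018 09:03:57 GMT", 86400)
print(format_datetime(expiry, usegmt=True))  # Tue, 02 Oct 2018 09:03:57 GMT
```

The computed value matches the Expires header in the sample response, which is what you would expect when the server derives both from the same one-day TTL.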

The Mozilla Developer Network (MDN) describes freshness as follows:

Once a resource is stored in a cache, it could theoretically be served by the cache forever. Caches have finite storage so items are periodically removed from storage. This process is called cache eviction. On the other side, some resources may change on the server so the cache should be updated. As HTTP is a client-server protocol, servers can't contact caches and clients when a resource changes; they have to communicate an expiration time for the resource. Before this expiration time, the resource is fresh; after the expiration time, the resource is stale. Eviction algorithms often privilege fresh resources over stale resources. Note that a stale resource is not evicted or ignored; when the cache receives a request for a stale resource, it forwards this request with an If-None-Match to check if it is in fact still fresh. If so, the server returns a 304 (Not Modified) header without sending the body of the requested resource, saving some bandwidth.

So, to validate a cached resource, the browser sends an If-None-Match header if the ETag header was part of the response for the resource (and, similarly, an If-Modified-Since header if a Last-Modified header was present).
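
A minimal sketch of that revalidation exchange, written as plain Python functions rather than real HTTP. conditional_headers and origin_status are hypothetical; the toy origin compares ETags by exact string equality only, whereas real servers also handle weak validators, lists of ETags and If-Modified-Since date comparisons. The second ETag value below is made up for illustration.

```python
from __future__ import annotations


def conditional_headers(cached: dict[str, str]) -> dict[str, str]:
    """Build the validators a cache sends when revalidating its stored copy."""
    request = {}
    if "etag" in cached:
        request["If-None-Match"] = cached["etag"]
    if "last-modified" in cached:
        request["If-Modified-Since"] = cached["last-modified"]
    return request


def origin_status(current_etag: str, request: dict[str, str]) -> int:
    """Toy origin: 304 Not Modified if the ETag still matches, else 200 with the body."""
    if request.get("If-None-Match") == current_etag:
        return 304
    return 200


cached = {"etag": '"11ff1-576f33ab4c2a0"', "last-modified": "Fri, 28 Sep 2018 19:11:57 GMT"}
request = conditional_headers(cached)
print(origin_status('"11ff1-576f33ab4c2a0"', request))    # 304: PDF unchanged
print(origin_status('"hypothetical-new-etag"', request))  # 200: PDF replaced
```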

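This is the mechanism that makes your browser download the new version of your PDF when it is accessed directly: on a normal F5 reload, most browsers revalidate the page's main resource with the server even if the cached copy is still fresh, and because the replaced file has a new ETag, the origin answers with 200 and the new body instead of 304. Please also note that these headers are present in the response from the CDN URL as well, but there the conditional request is answered by the CDN edge servers, which are still storing your old file.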

When it comes to the CDN cache, the ETag and Last-Modified headers are not used to detect changes at the origin. It is only the Cache-Control header in the HTTP response from the origin server that defines the time-to-live (TTL) of a resource; in your case, that is 86400 seconds. So, in theory, the new version of your PDF will be served one day after the first request through the CDN link. Until then, the CDN edge servers will keep serving the old PDF. You can read more about expiration management in the Azure CDN documentation.
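
The difference between the two paths can be sketched with a toy model of an edge server. Origin and EdgeCache are hypothetical classes, times are plain seconds, and on expiry the edge simply refetches the file instead of sending a conditional request to the origin. The point is only that nothing prompts the edge to contact the origin before the TTL runs out, and that a browser's conditional request is answered against the edge's own stored ETag.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Origin:
    """Toy origin holding the current file and its ETag."""
    body: str
    etag: str


@dataclass
class EdgeCache:
    """Toy CDN edge that keeps its stored copy until cached_at + ttl."""
    origin: Origin
    ttl: int
    body: str | None = None
    etag: str | None = None
    cached_at: float = 0.0

    def get(self, now: float, if_none_match: str | None = None) -> tuple[int, str | None]:
        if self.body is None or now >= self.cached_at + self.ttl:
            # Cache miss or TTL expired: only now does the edge contact the origin.
            self.body, self.etag = self.origin.body, self.origin.etag
            self.cached_at = now
        # A browser revalidation (e.g. on F5) is checked against the edge's copy.
        if if_none_match is not None and if_none_match == self.etag:
            return 304, None
        return 200, self.body


origin = Origin(body="old PDF", etag='"v1"')
edge = EdgeCache(origin, ttl=86400)
print(edge.get(now=0))                           # (200, 'old PDF')
origin.body, origin.etag = "new PDF", '"v2"'     # file replaced at the origin
print(edge.get(now=3600, if_none_match='"v1"'))  # (304, None): F5 keeps the old file
print(edge.get(now=86400))                       # (200, 'new PDF'): TTL elapsed
```

Here the 86400-second TTL plays the role of the max-age from your response headers.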

dferenc