The performance of our websites is an essential factor to consider. One of the ways to improve it is to cache our resources, and there are many ways to implement that. In this article, we define what caching is and go through the most common approaches.
Implementing a cache mechanism means storing a copy of a resource and reusing it when it is needed again. It is a well-established approach to improving the quality of the user experience on the web. When a user visits a website, we have a chance to preserve its assets so that they can be reused on a revisit. An example of an asset that is a good candidate for caching is an image. When the user needs the content again, we can first check if a valid copy of it can be used without making a new request to the server.
The above can have a big impact on the responsiveness of the page, and making fewer requests might also unburden your server. However, handling caching in a careless manner can lead to serving stale assets that should already have been updated. Because of that, it is crucial to configure the mechanism according to our needs.
Cache-Control is an HTTP header that specifies how to approach caching the incoming data. It can be added to the response to control the cache mechanism from the provider’s perspective. Doing so changes the way the browser handles caching of a particular resource.
The Cache-Control header can also be added to the request to control the behavior of the cache from the consumer’s perspective – for example, the browser. You might want to use it if you aim to adjust the cache policy of some resources, although we can do that only in some cases. The browser, when processing the HTML file, sends requests for various resources on its own, and you don’t have a straightforward way of configuring what headers it attaches to those requests. A good example of a way to affect this behavior is disabling the cache while DevTools is open. Doing that causes the browser to send the Cache-Control: no-cache header.
Let’s go through the most important directives.
Attaching Cache-Control: no-cache means that the browser can’t use the cached resource without first checking whether there is an updated version of it. The browser achieves that by inspecting the ETag response header, which is an identifier of the resource version. By comparing the ETag tokens, we can determine whether the cached resource is up-to-date. The server changes the ETag token every time the resource gets updated.
When we use no-cache, the client always connects to the server and compares the ETag of the cached resource with the one on the server. This process ensures the freshness of the resource without downloading the content again if the cache is up-to-date.
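The server-side part of that exchange can be sketched as a simple comparison; the function and helper names below are hypothetical, and a real server would read the tokens from the If-None-Match request header and the stored resource:

```javascript
// Hypothetical revalidation step: compare the ETag the client cached
// (sent in the If-None-Match header) against the current one.
function revalidate(ifNoneMatch, currentEtag) {
  if (ifNoneMatch === currentEtag) {
    // Cached copy is still fresh: no body, just a 304 Not Modified
    return { status: 304, body: null };
  }
  // Resource changed: send the new content together with its new ETag
  return { status: 200, body: loadResource(), etag: currentEtag };
}

// Stand-in for reading the actual resource from disk or a database
function loadResource() {
  return '<html>...</html>';
}
```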
The Cache-Control: no-store directive turns off the cache mechanism completely. Every time someone requests the data, a fresh copy needs to be provided. It might be useful for very sensitive data that we don’t want written to any cache, but we rarely need it in standard situations.
A response with Cache-Control: private can only be cached by the client, such as a browser. It indicates that proxies and CDNs shouldn’t store it in a public (shared) cache. We might attach this header to prevent caching of private data that is relevant only to a single user.
The Cache-Control: public header means that the response might be cached by any cache; the restriction that was present with the private directive does not apply here.
The Cache-Control: max-age=<seconds> header defines the maximum time that the browser may consider the resource up-to-date, counted from the moment the client downloaded it. We can combine the max-age directive with the ones above:
Cache-Control: max-age=3600, public
Another way to specify the expiry time of a resource is to use the Expires header. The browser ignores it if the max-age directive is present.
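The freshness decision a client makes can be sketched as follows; this is a simplified model (it ignores directives like no-cache and the Age header), with max-age taking precedence over Expires as described above:

```javascript
// Simplified freshness check: max-age (relative to the download time)
// takes precedence over the absolute Expires header.
function isFresh(headers, downloadedAt, now) {
  const cacheControl = headers['cache-control'] || '';
  const match = cacheControl.match(/max-age=(\d+)/);
  if (match) {
    const maxAgeMs = Number(match[1]) * 1000;
    return now - downloadedAt < maxAgeMs;
  }
  if (headers['expires']) {
    return now < Date.parse(headers['expires']);
  }
  return false; // no freshness information: revalidate with the server
}
```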
Other caching techniques
There are more ways to help the browser cache assets properly. Let’s briefly go through some of them.
Another way to force fresh content is to change the URL of a resource whenever its content changes – for example, by adding a hash of the file contents to its name. Appending a query string to the request URL can achieve a similar effect, but it is not always encouraged.
Another way to handle resources is to use service workers. With them, we can specify very precisely how we want to cache any data. Service workers can implement very simple caching or even help us build Progressive Web Applications (PWA). For more details, check out The basics of Service Workers: how should our application behave when offline?
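The cache-first strategy that a service worker commonly implements can be sketched independently of the browser APIs. In a real service worker, the cache would be the Cache Storage API and fetchFn would be fetch(); here a Map and a plain function stand in so the logic is easy to follow:

```javascript
// Cache-first lookup: serve from the cache when possible, otherwise hit
// the network and remember the result for next time.
async function cacheFirst(cache, fetchFn, url) {
  if (cache.has(url)) {
    return cache.get(url);
  }
  const response = await fetchFn(url);
  cache.set(url, response);
  return response;
}
```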
Service workers can do more than just caching. For additional features, look into Using Push Notifications with Service Workers and Node.js.
Another interesting approach to caching is to make use of the HTTP/2 protocol and server push. With it, we can populate the cache even before the browser makes a request.
For example, when the user visits the main page, the browser requests the index.html file. When the browser parses it, it notices that it needs to fetch some CSS files. This causes a slight delay that we can deal with by placing the CSS file in the cache before the browser asks for it.
For more details, check out the 15th part of the Node.js TypeScript series: Benefits of the HTTP/2 protocol.
In this article, we’ve learned the primary methods of setting up the cache mechanism for our resources. The more we use the cache and the further in the future we set the expiration, the more responsive our applications will be. Still, it is crucial not to overdo it and serve stale content. Aside from learning the basics of the Cache-Control header, we’ve also got to know other techniques such as adding hashes to filenames, using service workers, or making use of server push. Combining the above methods can significantly increase the responsiveness of our website, and therefore – the overall user experience.