Deep Dive into HTTP Caching: cache-control, no-cache, no-store, max-age, ETag and etc.

Captions
Hey programmers, welcome back. Before we dive deep into the specifications and some of the best practices of HTTP caching, let's first define why caching is so important for us. First of all, it's gonna make your website or web app much, much faster, because you're not gonna make an unnecessary call to the server to download the same asset again. The second reason is that your users are often on mobile data, and they're paying money for that data. Imagine a user has to download the same heavy asset, let's say a 200-kilobyte image, every time they refresh a page: they're gonna be paying for that image every time. To save them money and not make them download this image over and over again, you can use caching. Now let's get deeper into the topic.

All right, so this is gonna be our diagram, and I'm gonna try to make it visual so that it's much easier to understand. Let's get started. When we talk about the client and server relationship in terms of caching, keep in mind that the client is not always the browser, although it is most of the time, like 99% of the time. What I mean by that is that there are other devices, embedded devices for example, that don't have a browser but still make requests to the server and might have caches. But to make our life easier, just imagine that we're talking about web development and it's always a browser that makes the request to the server.

The other entity is obviously the server, which sends assets back to the browser, be it HTML, CSS files, JavaScript files, or media files such as videos. We also have some intermediary entities like reverse proxies or CDNs. I'm gonna talk about CDNs in just a bit, so please bear with me. Next, we're also gonna talk about directives; they are very important when it comes to how the cache gets validated. But first things first: I wanna talk about hits and misses, because this is the bare minimum of caching that one has to
know. So I'm gonna switch to a pen. Imagine we have a browser, and this cache is the browser's cache; that's why it's drawn closer to the browser than to the server. When the browser is parsing an HTML file and finds, let's say, a script or an image, it needs to download that image. First it's gonna check its own cache and ask whether the cache has this image. If it does, the cache returns the image, and we call this a hit: the request hit the cache, the cache made sure that its copy of the image is still fresh, and the browser reuses the image from the cache.

What is a miss, then? A miss happens when we request something from the cache and the cache either says that this file was never in the cache, or that it got flushed because the file's age expired. The cache tells you to fetch a new one, so we simply go to the server and fetch the data, in our case an image. The server gives the image back to the browser, and the browser then probably stores it in the cache, depending on what the server tells the browser to do.

Now you know more about hits and misses. One thing to note here is that you as a developer, especially as a front-end developer, don't have much control over what gets stored in the browser's cache, and especially over how to clean it. The reason is that when the server sends files back to the browser, it can also send some headers. The browser automatically parses those headers, and based on them it stores the files in the cache. So in your JavaScript code you have no control over this. Maybe one thing the user can do is simply clear their browser cache, but that's a different topic. What I want to say is that there is no specific API for working
with the browser's HTTP cache on the client side. Now let's go back to our diagram. Before that, when I mentioned that the server sends headers that define how the cache is gonna store a file, what I meant is basically this. As you can see, we are on Cloudflare's website; the user requested an HTML file from Cloudflare, and Cloudflare returns some headers. One of them is Cache-Control, and Cache-Control defines how this file is gonna be stored in the browser's cache. In our case it also carries some directives, or attributes so to say: one of them is public and another one is max-age. We're gonna go over the directives you can see here on the right side in a bit, but this is what it usually looks like when you open your dev tools in the browser and inspect a resource that you just downloaded.

All right, so going back to our diagram, what do we have here? We have a browser that makes a request, and this request goes to the server, but we can also have some intermediary entities. One of those cases is the CDN. What is a CDN? CDN stands for content delivery network. I'm gonna draw something to make it easier for you. Imagine you are a user somewhere in Asia. Let's say this is you, and you are requesting some data, and the origin server that is supposed to send you this data is sitting, let's say, in North America. Obviously there's some distance between Asia and North America. What a CDN can do is be deployed somewhere in between: let's say somewhere in Europe, it can also be deployed somewhere in Australia, and one of them can also be deployed somewhere in Asia, and it's gonna act as a cache. When you see a proxy here, a reverse proxy, a CDN is kind of a type of reverse proxy. CDNs serve two main purposes. First of all, they are very good at caching and invalidating that cache for the users. The second reason is that they are usually globally
distributed across the world, across different continents. So if you are a user in Asia (let me change the color, so you are here in Asia) and the server is far away in North America, you're probably gonna grab this data from a CDN instead of going all the way to the origin server, because the CDN has the same data here. It depends on your location: if you suddenly change your location and you're in Europe now, then you can simply go to this CDN instead of going to the one in Asia. CDNs are basically leveraging location, so you're usually gonna go to the CDN that is closest to you. All right, so it's just a proxy.

Now let's see what directives we have here. First of all, max-age. It's the one that you would see most often; as you can see in this image above, we also set max-age with some value on the right-hand side. This value is a number of seconds, in this case 14,400 seconds, which is four hours. Okay, but what do these seconds do? Let's take a look. I'm gonna simply clear all of this. When a browser requests some data from the server, the server is gonna respond, as we understood, and it's gonna set some max-age, and it sets this max-age in the Cache-Control header. So all the directives here are related to Cache-Control, don't forget that. Down to here, these are the directives; I'm gonna talk about Pragma, Expires and Vary very, very soon.

So the server sends some data, and let's say it has max-age. Max-age basically means that this is the maximum time that the asset can be served from the cache. When the asset's age goes beyond this max-age, the cache treats it as expired and stops serving it as it is. It's basically like an expiry date on a bottle of milk, you know; it works pretty much the same way. Once the cached copy has expired, on the next request the browser goes to the server instead of simply using the cache. We're gonna come back to max-age in just a second.
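To make the expiry-date idea concrete, here is a minimal sketch in Python of how a cache could read `max-age` out of a `Cache-Control` value and decide whether a stored response is still fresh. The function names `parse_cache_control` and `is_fresh` are made up for this illustration and are not part of any browser or library API; a real cache also looks at other headers and directives.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: value-or-None}.

    Hypothetical helper for illustration only; it does not handle
    commas inside quoted strings.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        # Directives without "=" (public, no-store, ...) carry no value.
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def is_fresh(header: str, age_seconds: float) -> bool:
    """Return True while the stored response is younger than max-age."""
    max_age = parse_cache_control(header).get("max-age")
    if max_age is None or not max_age.isdigit():
        # No usable max-age: this sketch simply treats the response as not fresh.
        return False
    return age_seconds < int(max_age)


# The header value from the Cloudflare example: 14,400 seconds is four hours.
header = "public, max-age=14400"
print(parse_cache_control(header))
print(is_fresh(header, age_seconds=3600))   # one hour old
print(is_fresh(header, age_seconds=14400))  # exactly four hours old
```

With the value from the example, a copy that is one hour old is still fresh, while a copy that has reached the four-hour mark is not.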
So the next one is no-store. What is no-store? Directives like no-store, no-cache and must-revalidate are very confusing, and I really understand that, but I'm gonna do my best to explain them to you because they're very important. No-store, as the name probably suggests, means that you shouldn't store any data in the cache. When the server responds with the no-store attribute, it means that the browser should not store the response in a cache in any way. What are some use cases for this attribute? Imagine you're a banking application: you obviously don't want to cache the amount of money that is left in a user's account, because then you would have stale information, and you always want to be up to date on how much money a person has. So don't store sensitive information in the cache. The second use case is when the server responds with an HTML file. It's risky to cache HTML files, because the HTML file is like the entry point of our application and holds references to other assets like our CSS files, JavaScript and images. So the HTML file should always be up to date, and a common approach is to send no-store, or at least no-cache, with an HTML file.

The next one is no-cache. No-store and no-cache sound kind of similar, right? What no-cache means is this: imagine we again fetch some file from the server and save it in the cache, and then the browser wants to make another request for the same file. With the no-cache directive, the stored copy is never reused without checking with the server first. So the browser always goes to the server first and asks whether the file it has is still valid. It doesn't matter whether the file was already in the cache or not; it's never served straight from the cache without that check.

Okay, and the next directive that we have here is must-revalidate. What must-revalidate does is, whenever the browser goes to the cache to ask for a specific asset, it
makes sure that you're never gonna get a stale response from the cache, no matter what the circumstances. Imagine you did some magic there and the cache is still somehow able to give you stale information: must-revalidate means that once the file in the cache has gone stale, it has to be revalidated with the server before it's used. That's why must-revalidate is usually used together with max-age, so that must-revalidate can see whether the max-age has passed or not.

All right, the next ones are public and private. This is very easy to explain. Basically, when a Cache-Control header, as you can see here, has the public value, it means that the origin server allows the file to be cached on CDNs, reverse proxies or anything in between. When we have the private directive, on the other hand, the file may only be stored on the client, so only in the browser's cache. Why is it important? First of all, notice that CDNs are usually shared, so they're not private to your user, and you should never store very sensitive information there; otherwise it's gonna be a security flaw. All right, this is clear, we can move on.

So: immutable, stale-while-revalidate, stale-if-error. These are also pretty interesting ones. Immutable basically means that the asset the server gives to the browser is never gonna change, so the browser can keep using its cached copy for as long as it's fresh without checking back with the server. It goes together with max-age, and the maximum that's commonly used for max-age is one year. I would personally never use that; it's quite risky.

The next two are also interesting. Stale-while-revalidate: do you remember how with no-cache we always revalidate, so we always go and ask the server first before using what's in the cache? Well, what stale-while-revalidate means is that you're able to get the stale version of the data, so you can grab this data from the cache while I'm revalidating it with the server. Let's say revalidating takes five seconds and you still
have some data in the cache for the same object: of course you're gonna take this file from the cache and show it while we are fetching the fresh one from the server for those five seconds. I'm not sure when you would want to use this, but it's kind of a fallback, so to say. We also have stale-if-error: imagine we go to the server to revalidate the data and an error happens. Then we simply serve the one from the cache, even if it's stale. So these are kind of interesting fallbacks that you might also want to keep in mind.

All right, these two, or rather three. Pragma is a very outdated header; it's almost never used unless you want to support really, really old browsers. Same with Expires: nowadays you only use max-age and shouldn't be using Expires, although if you want to support old machines, go for it. Vary is mostly used for CDNs to define user-agent-specific settings, for example the browser language, because operating systems are not always in English; they can be in Italian, in German or whatever. You want a public CDN to keep track of that and then serve only the data that matches what the user agent sent in the headers listed in Vary. There are different values like Accept, Accept-Language and so on, but I'm not gonna dive deep into this; instead I'm gonna link a video in the description so that you can check it out.

All right, now let's go a bit deeper and talk about heuristic caching: what's gonna happen if you don't supply Cache-Control? In this example we have Cache-Control and some directives, but what happens if you don't have it? Well, modern browsers are still gonna cache the response, yes, and they're gonna use this heuristic caching. What they do is look at other headers, for example Last-Modified, and calculate the difference between the current date and Last-Modified. Let's say we are in 2023 or
22, and the file was last modified in 2021, so there's a one-year difference. Modern browsers are still gonna cache this asset, but only for about 10 percent of the time span that has passed since it was last modified. So it's gonna be cached in a way very similar to max-age, as if max-age were set to a bit over a month, because 10 percent of one year is roughly 36 days. It's not actually max-age, but the browser stores it heuristically.

All right, the next point is If-Modified-Since. When you look at this part, these are the response headers that the origin server sends to the browser: Cache-Control with max-age, Last-Modified, Date and so on. But did you know that we can also send cache-related headers when we make requests? If you open your developer tools and inspect one file in the network tab, you're gonna see that we have request headers and response headers. This one is a request header, and the way we ask the origin server whether it should send us a new file or not is by sending this header called If-Modified-Since, with the date that we take from here, from Last-Modified. If the server sees that yes, its modification date is newer than the one the client sent, then it sends the client a newer version, because it has something new. But this led to a lot of issues when it comes to time zones, because it's really hard to manage time zones. If you don't believe me, I'm gonna link a video by Tom Scott, who talks about this issue in a very interesting way, so please check it out.

So people came up with this other idea of ETags and If-None-Match. ETag plays basically the same role as Last-Modified, although it doesn't give you back a date; it gives you back some kind of a hash, an ETag. Let's say this is our ETag. Then what we do is simply, upon making a request to the server, supply the same ETag in this header, If-None-Match, and we're gonna send
the same header. Now the server can compare this ETag to the one that it sent previously. If the ETag doesn't match, then the server knows that the file has been changed: I have a new version, so I'm gonna send you, the client, a new version. You could ask: how does the server generate this ETag? Servers are free to choose the way they generate an ETag, but a very common way is simply taking the contents of a file and hashing them. Basically, if there's a change in the contents of the file, then a new ETag is gonna be generated. If you're a Node developer, a JavaScript developer, there's a package called etag on npm which can do that directly. Nowadays CDNs are so mainstream, and CDNs are used everywhere for assets that have to be cached, so you don't usually need to worry about ETags; they're doing it automatically.

The last technique is cache busting, and it's also very, very common. Let me tell you what it is. Let's say you have a React or Angular application, or one from some other framework. Whenever you build your app, it's gonna spit out some files into the dist folder, and probably one of the files is main.js, and it usually has a hash in the name: let's say main, dot, some random hash, dot js. Have you ever wondered why this hash is in the file name? The reason is that whenever we deploy a new version, this hash is gonna change, and it's basically gonna be a new file with a different name. So we're basically making sure that the browser's local cache and CDNs never serve us a stale version, because we're essentially requesting a completely different file. By using this technique you can always make sure that your users get the changes that you push. Obviously this is good for content that changes often, and everything else that we have discussed above is good for more or less static content like assets, images, styles that rarely
change, and so on. All right guys, I hope you enjoyed this video and learned something new. If you're still interested in similar topics, make sure you check the playlist of software development videos, and I'm gonna see you in the next one. Goodbye.
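As a recap of the ETag and cache-busting parts of the video, here is a minimal Python sketch of both ideas. It assumes the common approach mentioned in the video, hashing the file contents; the function names (`make_etag`, `respond`, `hashed_filename`), the choice of SHA-256 and the truncated digest lengths are assumptions made for this illustration, not how any particular server, CDN or build tool does it. The sketch also ignores weak validators (the `W/` prefix).

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body (assumed scheme: truncated SHA-256)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a GET, honoring an If-None-Match request header if one was sent."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            # The client's copy is still current: 304 Not Modified, no body.
            return 304, headers, b""
    # No validator sent, or the content changed: send the full response.
    return 200, headers, body


def hashed_filename(name: str, body: bytes) -> str:
    """Cache busting: embed a content hash in the file name (main.js -> main.<hash>.js)."""
    stem, dot, ext = name.rpartition(".")
    digest = hashlib.sha256(body).hexdigest()[:8]
    return f"{stem}.{digest}.{ext}" if dot else f"{name}.{digest}"


old_build = b"console.log('v1');"
new_build = b"console.log('v2');"

status, headers, _ = respond(old_build, None)
print(status, headers["ETag"])                     # first visit: full response
print(respond(old_build, headers["ETag"])[0])      # unchanged file: revalidation succeeds
print(respond(new_build, headers["ETag"])[0])      # changed file: full response again
print(hashed_filename("main.js", old_build))
print(hashed_filename("main.js", new_build))
```

The first three calls show the revalidation flow: a full response on the first visit, a 304 with an empty body while the contents are unchanged, and a full response again once the contents change. The last two lines show that a new build produces a different file name, which is why caches cannot serve the old file for it.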
Info
Channel: Software Developer Diaries
Views: 24,519
Keywords: software development, software developer, programming, software engineering, web development, coding, http caching, aws http cache, varnish http cache, web cache proxy server, web cache server, cache control header, cache headers, http cache server, http cache headers, http cache control headers, http header no cache, http no cache, http etag, html header no cache, pragma header, cloudflare etag, http caching explained, http caching policy, http caching with etags
Id: Cy2ZJOBgk84
Length: 21min 27sec (1287 seconds)
Published: Sat Sep 17 2022