19. System Design: Distributed Cache and Caching Strategies | Cache-Aside, Write-Through, Write-Back

Video Statistics and Information

Captions
hey guys, welcome to Concept && Coding, this is Shreyansh. Today's high-level design topic is a very important one: caching. I have divided it into two parts. In part one, which we cover today, I will walk through the caching strategies with the help of sequence diagrams, so that you never forget the different strategies available, even in interviews. Distributed caching is also covered here; cache eviction policies I will cover in part two.

So let's start without wasting any time. What is caching? Caching is a technique to store frequently used data in a fast-access memory rather than reading it every time from a slow-access memory. A slow-access memory could be your hard disk or your DB; a fast-access memory could be RAM, which is much faster than a hard disk or DB access. This makes our system fast: since reads are served from the fast-access memory, latency is very low. Caching also helps achieve fault tolerance, and it is very important to understand how; you will get the answer in one of the caching strategies, write-back, where we cover in depth how it helps in achieving fault tolerance.

Now, different types of caching are present at different layers of the system, from front end to back end to DB. I am not a front-end guy, but I know there is browser caching: the browser keeps frequently used web pages in memory so the same pages are not loaded again from the host. So we have caching at the client side. At the CDN we keep static data so that geographically distributed users get faster content delivery. In the previous video I showed that the load balancer also has caching capability. But today I will focus on server-side application caching, like Redis; this is very important.

So let's go into server-side application caching. The components generally present are a client, a load balancer, app servers, and a DB, and the cache sits between your server and the DB. If a client wants certain information, the request goes to the load balancer, and since there may be many app servers, one of them gets the request. The app server first tries to fetch the data from the cache instead of getting it from the DB. That's where the application cache, the server-side cache, or you can say the back-end cache (you can call it by different names) comes into the picture. Before the application talks directly to the DB, it first asks the cache: hey, do you have this data present? If yes, give it to me and I will
return it to the client; if not, the app goes and fetches the data from the DB. There are different strategies, which we will come to soon, but this is the point where the cache sits between the app and the DB in server-side application caching.

Now, what is distributed caching? Before even understanding the term, consider this: you have multiple app servers (app server 1, app server 2, app server 3) all using only one cache server which holds the data. You already know the limitations of this. The first is scalability: it has limited space and resources, and beyond a particular point you cannot scale it any more. The second is that it is a single point of failure: if it fails, your caching capability is gone. That's where distributed caching comes into the picture. In distributed caching we have a cache pool containing many cache servers (cache server 1, cache server 2, cache server 3, cache server 4, and so on), and each app server has its own cache client. The app server uses its cache client to connect to a particular cache server, where it stores or reads the data. Now the question should come to your mind: how does one cache server get allotted to a particular app server? For that it uses the consistent hashing technique. If you have not seen video number six of my High Level Design, Basics to Advanced playlist, I covered consistent hashing in depth there; please have a look, because it will make your fundamentals clear about how consistent hashing works. Here I will just give you a glimpse. Consistent hashing arranges all the cache servers on a ring: cache server 1, cache server 2, cache server 3, cache server 4, cache server 5. Whenever any request comes in, say a put from app server 1 wanting to store data, it is hashed to a point on the ring; from that point we move in a clockwise manner, and the first cache server we reach is the one allotted to it. So there is nothing special about distributed caching: it is essentially the consistent hashing technique. In the consistent hashing video I called them nodes, because a node can be any server: a DB server, an app server, or, as in this case, a cache server.

Now let's come to the major section of this video: the different caching strategies. There are five: cache-aside, read-through cache, write-around cache, write-through cache, and write-back (or write-behind) cache. Let's see them one by one, each with an example. First, cache-aside. Let's read the definition and then look at the sequence diagram. The application first checks the cache. If the data is found in the cache, it is called a cache hit: you got the data, and it is returned to the client. If the data is not found in the cache, it is called a cache miss, and the application then
fetches the data from the DB, stores it in the cache, and returns it to the client. Now let's walk through the sequence. The client issues a read, a GET call, which goes to our application, the back-end server. The application first checks the cache: hey cache, do you have this data? Then there is an if-else condition: if the cache has the data, it's a cache hit; if not, it's a cache miss. On a hit, the cache returns the data and the server sends the response back to the client. On a miss, the application fetches the data from the DB, and then also writes that data into the cache, so that any further GET calls become cache hits.

There are certain pros and cons. First, this approach is good for read-heavy applications: whenever a GET request comes we look in the cache first, and on a cache hit we don't have to query the DB at all. Second, if the cache goes down, it doesn't break us: every lookup is simply treated as a cache miss and we fetch the data from the DB. The request still succeeds; the only thing is we can't utilize the cache, neither reading from it nor writing into it, but we can still fulfill queries from the DB. So even with the cache down, requests will not fail. Third, the data model in the cache is independent of how the DB stores the data. Say the DB has an employee table with ID, name, and address, and another table, say salary, holding salary details. In the cache you can use the employee ID as the key and store any value structure you want: name, address, and salary, with salary as a nested object holding more details, plus any extra data you want to add. You don't have to mirror the DB's structure, because the application takes the responsibility of writing into the cache: whatever data you get from the DB, you can reshape it however you want before storing it.

Now the cons. In cache-aside, write operations don't touch the cache at all; a write goes directly into the DB. So when a write puts new data into the DB, that new data is definitely not in the cache, and the first GET for it will always be a cache miss; only then is it fetched from the DB and written into the cache. So for new data, the first read is always a cache miss. But there is a bigger problem: a chance of inconsistency. Let's understand how it can arise. Suppose a write operation puts the value 10 into the DB, fresh. A GET request comes, say for some employee data: the cache doesn't have it, so we fetch it from the DB, insert it into the cache, and the cache holds 10. Now another write request comes and updates the value to 11, so the DB now stores 11 instead of 10. But did that write invalidate the cache entry? No, the cache entry is still considered valid. So when the next GET comes and checks the cache, it will say yes and return 10, even though the DB now has 11.
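The cache-aside read and write paths just described, including the stale-read scenario, can be sketched in a few lines. This is a minimal illustration under my own assumptions: two plain dicts stand in for the cache (e.g. Redis) and the DB, and the key names are invented.

```python
# Cache-aside sketch: the APPLICATION owns all the cache logic.
# Plain dicts stand in for the real cache (e.g. Redis) and the real DB.

cache = {}                                        # fast-access store
db = {"emp:1": {"name": "Asha", "city": "Pune"}}  # slow-access store

def read(key):
    if key in cache:              # cache hit: serve without touching the DB
        return cache[key]
    value = db.get(key)           # cache miss: fall back to the DB
    if value is not None:
        cache[key] = value        # populate the cache for future reads
    return value

def write(key, value):
    db[key] = value               # cache-aside writes go straight to the DB;
                                  # the cache is NOT updated -> stale-read risk

# The inconsistency from the transcript: a read caches 10, a later
# write stores 11 in the DB, but a re-read still returns the stale 10.
db["x"] = 10
first = read("x")                 # miss -> DB -> cache now holds 10
write("x", 11)                    # DB has 11, cache still has 10
stale = read("x")                 # hit: returns the stale 10
```

A write-around or write-through strategy (covered next) is what closes this gap between cache and DB.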
So without an appropriate caching strategy on the write path, there is a chance of inconsistency: your DB has the latest value, your cache has stale data, and your reads will get the stale data. That's why you have to use some caching strategy for writes as well; otherwise your reads become inconsistent. So you see how inconsistency between cache and DB can arise.

Very similar to cache-aside is the read-through cache. The application first checks the cache, the same as before. If the data is found in the cache, it's a cache hit: you got the data and it is returned to the client, exactly as in cache-aside. The difference is the third point: if the data is not found in the cache, it's a cache miss, and in cache-aside the application takes the responsibility of fetching the data and putting it into the cache, but in read-through the cache itself takes the responsibility of fetching the data from the DB and storing it back in the cache. So here our application no longer has to worry about fetching from the DB on a miss. A GET request comes to our application; the application first checks with the cache; if it's a hit, the data is returned; if it's a miss, the cache library (say, whatever library we are using) takes the responsibility of fetching the data from the DB, updating itself with whatever it gets, and returning it to the application, which returns it to the client. Earlier, fetching the data and updating the cache was taken care of by the application or server; here the cache library takes care of fetching the data from the DB and updating itself.

The pros: first, it's a good approach for read-heavy applications, for the same reason as cache-aside, since every read first checks the cache. Second, the logic of fetching data from the DB and updating the cache is separated from the application. From the server's perspective, you don't have to worry whether it's a cache hit or a cache miss, or carry your own miss-handling logic; all of that lives inside the cache library you are using, which fetches the data from the DB and writes it into the cache. But there are cons too, similar to cache-aside: for any new data there will always be a cache miss first, and without a write strategy there is a chance of inconsistency between cache and DB. Another con: the cache document structure has to match the DB table; you can't define your own cache structure now. If the DB's employee table has 20 fields, then the cached document, keyed by employee ID, will be the same document with all 20 fields, a one-to-one mapping with the DB table. So that is read-through cache.

Now let's see the write-around cache. This is what helps with the inconsistency problem I showed you in both cache-aside and read-through; as I said, those generally have to be paired with some write strategy, and write-around is one. In write-around we write data directly into the DB; we don't even touch the cache on the write path. The write does not update the cache data; what it does is
invalidate the cache entry, or you can say mark it dirty. Let's say the cache already has the value 10 and the DB also has 10, in sync. GET calls go through the cache; any PUT, PATCH, or POST connects directly to the DB. Now a PUT request comes: change this 10 to 11. It writes directly to the DB, changing it to 11, but it also invalidates the entry in the cache: there is a dirty flag, and it sets it to true, meaning this data has changed, don't read it. During a GET we check whether the entry we are about to read is dirty or not; if it is dirty, it is treated as a cache miss, we read from the DB, and we put the latest data back.

What is the advantage? Again, it is good for read-heavy applications, because that is the only place write-around is useful: it has to be used together with either read-through or cache-aside. Alone, this strategy is of no use. What it does is resolve the inconsistency problem between cache and DB. The disadvantages: for new data there will always be a cache miss first, the same issue as before, because write-around only supports the read-through or cache-aside strategy. When a POST inserts new data, it doesn't even touch the cache; it only writes to the DB, so the first GET will always be a miss. One more thing: if the DB is down, your write operation will fail. Your write is totally dependent on DB availability; if there is a disaster at the data center where your DB lives and it goes down, the write will try to reach the DB and fail. So is your write fault tolerant? No: when the DB is down, your writes fail too. That is the disadvantage of write-around.

Now let's see the write-through cache. The name itself makes it clear: the write passes through. First write the data into the cache, and then, in a synchronous manner, write the data into the DB. Say a request comes in, a POST with some new data. Our application first writes it into the cache, so the cache always has the latest data; after that it writes it into the DB, so the DB also has the latest data. So you have a cache and a DB, and a POST request comes: put 10, just assume I need to insert 10.
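A minimal write-through sketch, under my own assumptions: dicts stand in for the cache and DB, the rollback-on-DB-failure behavior (the two-phase requirement discussed next) is simulated with an exception, and `FailingDB` is an invented stand-in for a DB that is down.

```python
# Write-through sketch: write the cache first, then the DB synchronously.
# If the DB write fails, roll back the cache so the two stay consistent.

cache = {}
db = {}

def write_through(key, value, db_store=db):
    old = cache.get(key)          # remember previous cache state for rollback
    cache[key] = value            # step 1: cache gets the latest data
    try:
        db_store[key] = value     # step 2: synchronous DB write
    except Exception:
        # DB failed: undo the cache write, then fail the whole operation
        if old is None:
            cache.pop(key, None)
        else:
            cache[key] = old
        raise

class FailingDB(dict):
    """Invented stand-in for a DB that is down: every write raises."""
    def __setitem__(self, key, value):
        raise ConnectionError("DB is down")

write_through("x", 10)            # both cache and DB now hold 10
try:
    write_through("x", 11, db_store=FailingDB())
except ConnectionError:
    pass                          # write failed; cache rolled back to 10
```

Note the either-both-succeed-or-both-fail shape: that is exactly the two-phase behavior the transcript describes next.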
So what it will do is first insert 10 into the cache and then synchronously insert it into the DB. If either insertion fails, you have to throw an exception: both must succeed. If the DB insert fails, or the cache insert fails, we have to fail the transaction. That's what I mean by writing into the DB in a synchronous manner: you write the data into the cache, then synchronously write it into the DB, and if the DB write fails you have to roll back whatever you inserted into the cache.

So what are the pros? First, cache and DB always remain consistent: either both succeed, or on failure the cache write is rolled back; and if the cache itself fails, we fail the write operation. Second, the chance of cache hits increases a lot. Now even new data is present in the cache: whenever a POST request comes, it adds the data to the cache as well as the DB, so when the GET comes, it finds the data. Remember the disadvantage that for new data there is always a cache miss first? With write-through that goes away, because the write operation itself inserts the data into the cache.

But there are certain cons too. First, alone it is not useful. If you are inserting data into the cache and the DB but nobody reads from the cache, what's the advantage? You have to use it with either cache-aside or read-through; otherwise you are only adding latency: earlier we just wrote into the DB, now with write-through we also have to write into the cache. So write-through alone we can't use; we have to pair it with read-through or cache-aside. Second, a two-phase commit needs to be implemented. As I told you, either both should succeed or both should fail. It should not be that the cache succeeds and the DB fails: then the cache would hold the latest data while the DB, having failed, holds the old data, and we'd have an inconsistency issue. So: cache success and DB success, fine; cache success and DB failure, we have to roll back the cache too. Either both succeed or both fail; that's what two-phase commit says, and we have to ensure it. Another thing: it is still not fully fault tolerant. If the DB goes down or the cache goes down, the write operation will fail; if the DB is down, we won't be able to write into it, so the write fails.

And last, the very important write-back (or write-behind) cache strategy. Similar to write-through, we first write the data into the cache, but we do not write into the DB synchronously; we write in an asynchronous manner. The client says: here is my POST request, add this new data. Our application first writes it into the cache; now the data is present in the cache. Then, instead of writing the data into the DB directly, I am not dependent on the DB any more: I push the data into a queue, and later, from that queue, I read it and put the data into the DB asynchronously. Now I am no more dependent on the DB. Even if the DB goes down for two hours, I am okay: say my cache TTL is 24 hours, then for up to 24 hours of DB downtime I can still fulfill requests from the cache, because the cache has the data, and once the DB comes back, from the queue
I will read it and write it into the DB. So write-back brings you fault tolerance: you now have a certain tolerance level. What are the pros? It is good for write-heavy applications. One advantage you will see is that our write API latency goes down. Earlier, the normal write flow wrote into the DB; now we are not writing into the DB, we are only writing into the cache, and writing into a cache is faster than writing into a DB. And we don't wait for the DB write: we just publish a message into the queue saying, you take care of writing this into the DB asynchronously. By the time we have written the data into the cache, we have already sent the success response; the DB write happens async and might take some time, but we don't care: as soon as the data is in the cache, we are good, and we return success. Second, the chance of cache hits increases a lot, because the cache always has the updated data: whenever any write (POST, PUT) comes, the cache gets updated, so it always has the latest data, and GET requests are served from the cache itself. Third, it gives much better performance when used with read-through or cache-aside. Using it alone still has a benefit: even if the DB goes down, my system stays up and my write or POST requests do not fail. But the real advantage comes when you pair it with cache-aside or read-through, because you are inserting the data into the cache; if somebody can read from it, your whole read path also becomes much faster.

But there are certain cons. (The first point listed, "even when DB fails, write operations will still work", is really a pro, not a con.) The real disadvantage is this. Say you have a cache, a DB, and a queue, and the TTL of the data in the cache is three hours. You want to put 10: the cache has the latest data, and the queue holds a pending message, "write 10 into the DB", but the DB doesn't have this data yet. Now, when the queue tries to write this data into the DB, the DB goes down, and the queue just waits for it to come back. Suppose the DB is down for five hours. After three hours the TTL expires and the data is removed from the cache; the DB is still down, so the DB doesn't have the data either. Any application that has read this data and tries to do some operation on top of it will fail, because 10 is now available in neither the cache nor the DB. So that kind of issue can arise.

So these are the caching strategies I wanted to cover in depth; cache eviction policies I will cover in the next video, part two. Okay guys, if you have any doubts, feel free to ping me in the comment section, or connect with me on LinkedIn and we can discuss more. Okay guys, thank you, bye.
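A toy write-back sketch of the flow above, under my own assumptions: dicts stand in for the cache and DB, a `collections.deque` stands in for the message queue, and a manual `flush_queue` call stands in for the asynchronous consumer.

```python
from collections import deque

# Write-back sketch: writes hit the cache and a queue; a separate
# consumer drains the queue into the DB later, asynchronously.

cache = {}
db = {}
queue = deque()                   # stand-in for a real message queue

def write_back(key, value):
    cache[key] = value            # fast write: cache gets the latest data
    queue.append((key, value))    # publish "write this to the DB" message
    return "success"              # respond before the DB is ever touched

def flush_queue(db_up=True):
    """Stand-in for the async consumer; does nothing while the DB is down."""
    while db_up and queue:
        key, value = queue.popleft()
        db[key] = value

write_back("x", 10)               # client already got success
flush_queue(db_up=False)          # DB down: message stays queued,
                                  # but reads of "x" are served from cache
flush_queue(db_up=True)           # DB back: queue drains, DB catches up
```

The TTL pitfall from the transcript is visible here too: if the cache entry expired while `db_up` was still false, the value would exist only in the queue, and readers would find it in neither the cache nor the DB until the consumer catches up.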
Info
Channel: Concept && Coding - by Shrayansh
Views: 27,035
Keywords: Distributed Cache, Caching Strategies, Distributed Cache and Caching Strategies, System Design: Distributed Cache, Cache System design, Cache-Aside, cache aside strategy, Read Through Cache, read through cache strategy, Cache-Aside Caching Strategy, Write Around Caching Strategy, Write Through Caching Strategy, Write Back Caching Strategy, write behind cache, write-through cache, write-around cache, what is distributed cache, how distributed caching works, HLD, interview, cache
Id: RtOyBwBICRs
Length: 37min 40sec (2260 seconds)
Published: Sun Jun 18 2023