My thoughts on the CAP theorem

Captions

Just like most acronyms in software engineering, the CAP theorem is something we engineers dealt with on a daily basis even before this thing was given a label and a name. The CAP theorem stands for consistency, availability, and partition tolerance, and even if you don't know anything about this, I guarantee that if you work with distributed systems it just makes sense. When I explain what this means, you're going to say, "Duh, of course." But when you just read the acronym or the definition of the CAP theorem, you will get so confused until you put it in an example, and that's the goal of this video. Let's just jump into it.

Welcome to the Backend Engineering Show with your host, Hussein Nasser. The CAP theorem, guys, is very important to help you as an engineer, as a backend engineer to be specific, make better decisions on your distributed system, if you have a distributed system. Even if you don't, it's kind of good to know, because despite the name, and despite this theorem being attached to distributed systems and networked systems, you can essentially have some version of it on a single node. I'm going to explain that as well.

In order to explain the CAP theorem, we have to explain what it stands for: consistency, availability, and partition tolerance. The first thing here is consistency, and consistency here differs a little bit from the consistency in ACID. What we mean by consistency here is consistency in reads. If I write something to a system, and treat the system as a black box (it could be one node, it could be 700), then if I turn around and read it, or someone else reads it, they had better see my change. That's what a consistent system is. Very simple definition.

You might think now, "Hussein, okay, aren't all systems consistent?" Not really. If you have a master and a bunch of replicas, or Redis caches, for example, or memcached, then if you write something to the main source of truth it takes time to
propagate to the rest of the nodes. In that time, although you actually committed and the client who issued the write was told, "Sure, you got it, it's written," a read to the other replicas, or the other cache node, or the other thing might not get the latest change. As a result you get an inconsistent system. So that's what consistency is: when you get the same results, you're consistent; when you don't, you're inconsistent.

I know some people say "eventual consistency" and all that stuff, but you can add "eventual" to literally anything and it becomes true. "Hey, my system is not working, it's failing." "Oh well, we're eventually available, just wait." "Your system is not really performant." "Don't worry, it has a lot of load now, but eventually it's going to become performant. Just wait, it's eventually performant." You can add "eventually" to anything and it doesn't mean anything. I don't like this term at all. If you're inconsistent, you're just inconsistent at this moment; don't add "eventually" to it to make it better. I know a lot of people disagree with that, but that's my opinion. You can have yours, of course.

So let's talk about the other one: availability. Availability is when you want to issue a write or issue a read, and your write always succeeds and your read always succeeds. It doesn't matter if you get a wrong result or inconsistent results; hey, you're available, you get something. That's why people add caches here. The more caches you have, the more available you are, but is this information up to date? Not really. "But I favor availability" is what people mean: I favor availability over consistency. Nothing wrong with that. You can make design choices based on your real requirements, if you think about it.

So we have consistency and availability, and the final one, which is the most confusing term here, is partition tolerance. If you work with databases and you hear the word partition here, do
not confuse it with actual database partitioning. It has nothing to do with it, and I made this mistake. Now, partitioning here is when you have a distributed system with multiple nodes and they cannot talk to each other. That's called a network partition. That's what it really means: they cannot talk to each other; there is a possibility of failure in the network. So when you hear that a system is partitioned, that means there is a possibility of failure. Partition tolerance is saying, "Hey, I know my system is partitioned and there is a possibility of a failure in communication."

Let's say you have a master node and you have replicas, and the master has to phone the replicas in order to issue the writes. If there is a possibility of a failure in the network, your network is partitioned, your system is partitioned. You might say, "I can tolerate partitions: I have partitioning in my system, I might have failures, but I'm going to decide to tolerate them." Or you can decide, "No, my system cannot possibly be partitioned. There is no chance; I won't have partitions at all." That means you're a single node: "I have a single beefy machine and there are no other nodes." When you look at that, it's just one thing, and when I write to it there is no partition; I'm not talking to anything else.

Another choice is, "Hey, I have gigabit Ethernet connectivity between this machine and this machine, so I don't consider this partitioned." You can make the decision that you're not partitioned. But when you have networks, especially in the cloud where everything is virtualized, software-defined networking and whatnot, you're going to get partitioned. And when I say partitioned, I mean you're going to get failures; there is at least a tiny chance of getting a partition.

So now that we understand consistency, availability, and partition tolerance, the CAP theorem, now let's actually
put it to the test and put it in an example. In a nutshell, this is how the theorem goes: if you have a possibility of partitioning in your system, then you can have either a consistent system or an available one; you cannot have both. That's what the theorem states, essentially.

So let's take an example of what that really means. Let's say you have a master node here which takes your writes, and you have two read replicas, or you can say they're Redis caches or whatever. So we have two replicas, secondary nodes. Just by looking at this, my system is networked, and as a result it has to be partition tolerant. I decide, "Hey, I'm going to tolerate partitioning, I'm going to tolerate failures," and as a result I get a choice. What kind of choice? Let's continue.

If I issue a write to my master node, I need to update the replicas, and I get two choices here. The first choice is that, the moment the write comes to the master, I can immediately "ACID-ify" it and commit it to the actual primary node. So I am ACID on that node; it's completely committed. And I can decide to return immediately and say, "Client, sure, you got your write." So my system is available. I don't care if my replicas get updated after one second or 10 seconds; I'm available, I wrote and I got a commit. Asynchronously, in the background, I'm going to update those replicas whenever I have the time, whenever I feel like it. So as far as we know, writes are available, good, and because the write committed immediately and it's fast, reads are available too, because I can read and immediately get a result.

Are they consistent? No. You can never guarantee consistency here, because between the moment you write and commit and the time it takes to propagate that, even if it's milliseconds, someone could have already issued a read and got the old results.

One example is back
in the day, in 2006. YouTube had this idea of replication with MySQL, so when you update your profile on YouTube, the write to your profile goes to the master node, and when you immediately refresh, the read goes to the replica and you don't see the change. The change that you just made, you don't see it, and you freak out. The lead at YouTube actually explained this: the read is going to the replica, so you're not seeing the change. So the system was not consistent then, but it was available. That's the choice they made.

Now, in the case of a partition, if there is a failure, you don't care: "Hey, I can get failures on the back end. These asynchronous updates can fail, and they can decide to retry or decide not to update at all, but I'm still available." So that's AP: I tolerate partitions, but I'm available.

The flip side of the coin is this. Let's take the same example, but now I want a consistent system. A write comes to my master and I apply the changes to my write-ahead log, my WAL, my journal, my redo logs, but I don't commit. I don't tell the user, "You're done." No, I let the user block. I block the user, and synchronously, while this is happening, I call the first replica and I call the second replica and I say, "Write, write." I have not told the client, "Hey, you're good." No, I leave the client waiting. I write to the first replica and ask, "Did the replica reply yes?" Not until all of them say, "Yes, yes, yes, committed, committed, committed," will I reply to the user and say, "Your write has been completed."

I'm fully consistent in this case. Why? If someone tries to read while I'm writing, they're going to get the old result, and that's fine, because it's still consistent: I didn't say that the write was committed; you've got to wait. But once all of these commits have happened successfully, any read that happens
from anywhere in my system is going to be consistent.

But are we available? Well, if everything goes fine and dandy, then sure. But I'm tolerating partitioning, so this can fail: a network error can happen while I synchronously update the replicas, and you get a choice. You have two choices as a designer of the system. You can fail the write; then you're unavailable, but you're still consistent, because readers are going to get the old results, which is consistency to me. But you're no longer available, because, hey, you just failed my write.

The other choice is to retry. "Hey, I wrote to this primary, and I'm trying to write to the replicas: one, two, three, success, success, success. The final node, I try to write to it and it fails." You can choose to fail, like the first situation, but you can also choose to retry: retry, retry, retry, and eventually it will succeed. So you can continue retrying until you eventually succeed, but you're still not available. That's another thing that Eric Brewer expanded upon 12 years later. He said, okay, if I retry, then my system is technically available, but the latency just does not justify calling it available, because you are so slow. So I'm going to consider you unavailable, or "eventually available." He didn't like to use "eventually available" because it would be silly: eventually consistent, eventually available. But that's what it is. Even in the case of retries, you're going to be slow, and in that case you're also not available. That's the definition here. So you're consistent and you're partition tolerant, so CP, and then you're essentially not available.

So you can be AP, available and tolerating partitions. You can be CP, consistent and partition tolerant. And you can also be CA, consistent and available, if you can guarantee that, no matter what, you do not have partitions in your system at all. You are a badass. I have a single beautiful machine. It's
beefy, with three terabytes of RAM and 64 cores, and it's a single machine, so it is consistent and available all the time. I don't have any partitioning; I cannot tolerate any partition. So you have these two, and that's basically what an ACID system is when you look at this kind of thing.

Now, within this single beefy machine you can still have partitioning if you are connecting to a device like a NAS, which has multiple disk drives, and you write to multiple disks. That's kind of a similar thing, but you don't have multiple physical machines; you have one single machine that does the work, but the storage array is also partitioned. What do you do in that case? Does the CAP theorem apply there? Maybe it does, I don't know. But if you think about it, the CAP theorem can apply, and it really depends on where you are looking in the stack. When you're zoomed in to a single node, you can get a consistent, available system, and when you zoom out, you could be an available, partition-tolerant system but not consistent. But within a single machine, you are consistent when you issue a write.

One confusing thing here is that the consistency in ACID (atomicity, consistency, isolation, durability) is different from the consistency we're talking about, which is consistency of reads. Consistency in ACID is within the data itself. "Hey, I have a primary key here that has to be unique, so if I insert multiple things with the same name, you have to fail. Please guarantee that my data is consistent." "Hey, I have a foreign key that points to a single value; if I delete that, please cascade and delete all the relevant keys. Don't leave orphan rows." That's what consistency really means there.

Another form of consistency in ACID is one that you sometimes have to maintain yourself at that level. Say you have a picture with a field called number of likes, let's say on Instagram, and you have another table that has all the people who actually like
the picture. If you count all the people who liked this picture, it had better equal that actual count. That's also consistency within the data; it has nothing to do with the reads themselves. Essentially, you want to avoid corruption at that level. If you get inconsistency in ACID, that's really nasty; you don't want it. The inconsistency in reads in the CAP theorem, you can tolerate it, pun intended. You can tolerate inconsistency when you read something. Say this Caitlyn Jenner picture has like 3.2 million likes; if you got 3.1 million likes, who cares? It's not really a big deal. But if you have a discrepancy in a banking system, where the transactions don't sum up to the actual account...

All right, guys, that's it for me today. I'm going to see you in the next one. You guys stay awesome. Goodbye.
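
The two replication choices described in the captions (acknowledge the write immediately and replicate in the background, or block until every replica has confirmed) can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the class names `Primary` and `Replica`, the `mode` flag, and the `reachable` switch that stands in for a network partition are all hypothetical and do not come from the video or from any real database. The CP path is also simplified to a reachability check followed by the replica writes, with no write-ahead log and no retries.

```python
class Replica:
    """Hypothetical read replica; `reachable=False` simulates a network partition."""

    def __init__(self):
        self.data = {}
        self.reachable = True

    def apply(self, key, value):
        if not self.reachable:
            raise ConnectionError("replica unreachable (network partition)")
        self.data[key] = value


class Primary:
    """Hypothetical primary node. mode='AP' replicates asynchronously,
    mode='CP' acknowledges a write only after every replica has it."""

    def __init__(self, replicas, mode):
        self.replicas = replicas
        self.mode = mode
        self.data = {}
        self.backlog = []  # AP only: writes not yet pushed to all replicas

    def write(self, key, value):
        """Return True if the write is acknowledged to the client."""
        if self.mode == "AP":
            # Commit locally and acknowledge at once; replicas catch up later.
            self.data[key] = value
            self.backlog.append((key, value))
            return True
        # CP: refuse the write if any replica cannot be reached, so no
        # reader can ever observe a value that differs between nodes.
        if not all(r.reachable for r in self.replicas):
            return False
        for r in self.replicas:
            r.apply(key, value)
        self.data[key] = value
        return True

    def replicate_backlog(self):
        """AP background job: push pending writes, keep the ones that failed."""
        pending = []
        for key, value in self.backlog:
            try:
                for r in self.replicas:
                    r.apply(key, value)
            except ConnectionError:
                pending.append((key, value))
        self.backlog = pending


def demo(mode):
    """Write once, partition one replica, write again, then read everywhere."""
    replicas = [Replica(), Replica()]
    primary = Primary(replicas, mode)
    primary.write("profile", "old")
    primary.replicate_backlog()
    replicas[1].reachable = False  # the partition happens here
    acked = primary.write("profile", "new")
    return acked, primary.data["profile"], replicas[0].data["profile"]


if __name__ == "__main__":
    print("AP:", demo("AP"))
    print("CP:", demo("CP"))
```

In this sketch the AP run acknowledges the second write, so the primary holds "new" while a read from the replica still returns "old": available but inconsistent. The CP run rejects the second write, so every node still returns "old": consistent but unavailable for that write.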

Info

Channel: Hussein Nasser

Views: 11,784

Keywords: hussein nasser, backend engineering, distributed systems, CAP theorem, CAP Eric Brewer, CAP theorem explained, Consistency availability, replication, database replication, DBMS ACID, ACID vs CAP

Id: KmGy3sU6Xw8

Length: 17min 33sec (1053 seconds)

Published: Fri Jun 11 2021