Distributed Transactions: Two-Phase Commit Protocol

Captions
Distributed transactions have to be one of the most intimidating topics out there. Either you take them for granted because your database guarantees them out of the box, or you only know them theoretically. In this two-part video series, I want to walk you through the concept of distributed transactions and implement them using the two-phase commit protocol. We'll take the example of Zomato's 10-minute food delivery to understand the overall protocol for guaranteeing atomicity across microservices. In the next video, we will simulate this entire environment locally to see distributed transactions in action. So let's jump into the first part, where we solidify our understanding of the two-phase commit protocol using Zomato's 10-minute food delivery as an example.

How is Zomato, in India, able to guarantee 10-minute food delivery? Zomato purchases a lot of food in bulk from restaurants, because they have the data to know which food will be ordered in a given area at a given time. They can predict the load on a particular food item, so they can pre-purchase that food and keep it available in a store. They have these mini dark stores across the cities, one for each locality. So whenever an order is placed, the food is already available in the Zomato store. They take two minutes to warm the food, and then they have eight minutes to deliver it to the customer. That is how they can guarantee food delivery in 10 minutes. That is the business side of things. Now let's talk about how you, as an engineer, would ensure that you can guarantee 10-minute food delivery.
What are the engineering challenges in doing it? To guarantee food delivery in under 10 minutes, Zomato should accept an order for 10-minute delivery only when the food is available in the store and a delivery partner is available to deliver it. Say you are ordering a burger: that particular burger from that particular restaurant has to actually be available in the Zomato store for you to place the order. If either of the two is missing, whether the food or the delivery partner, you cannot guarantee 10-minute delivery. If the food is not in the store, it has to be picked up from the restaurant, which might be far away, and then delivered to the user. And if the food is there but no delivery partner is available, how will you even deliver it within 10 minutes?

This is a classic case of a distributed transaction. Hypothetically, say the flow looks like this. You, as a user, want to place an order. The request comes to an order service, which is about to place the order in its database, which stores all user orders. But before that, the order service needs to talk to the store service to reserve food for the order, and to the delivery service to reserve a delivery partner, basically marking that this food is for this order and this delivery partner is for this order. Both of these things should be done: the food should be blocked for the order and a delivery partner should be assigned to it. Only then do you tell the user that the order is placed. So say you, as an engineer, are building this order service. It would place an order only when these two are done; otherwise it would not place a 10-minute guaranteed order. So the one thing a distributed transaction definitely needs to get right, and the toughest of all, is
atomicity, which says that if a transaction has, say, two steps, either all of those steps happen or none of them happen. All or none.

Let's extend our example. For the order service to work, it talks to the store service to reserve food for a particular order: this food is now assigned to this order. Say the three services talk over a simple HTTP REST-based protocol. The order service makes a call to the store service on an HTTP endpoint saying, assign this food to this order, and then makes another call to the delivery service saying, assign a delivery partner to this order. Now say the store service assigned the food, but the delivery partner assignment failed. What does this mean? The store got the order but no delivery agent is assigned, so the store started warming the food item, packing it, and keeping it ready for delivery, while no delivery agent is actually booked for that order. This incurs a bit of loss, and food hygiene is at stake, because you end up rewarming the food again and again without delivering it. So you need to somehow ensure that either both happen or neither does.

Take the other flow, where the delivery agent is booked but the store service is down or the food is not available. Then the delivery agent is at the store to deliver the food, but the food is not prepared. A very weird situation. This incurs a business loss plus a poor experience on the delivery agent's side.
He or she is waiting to pick up and deliver the food, but the food is not even there, which is a poor experience for the delivery partner. And in the first case, your store spends time heating and packing food that is not getting delivered, so you incur some loss and the hygiene of the food is at stake. So this is a classic case of a distributed transaction: only when both of these happen do I tell the user that the order is placed with a 10-minute guarantee. Otherwise, I would say that I cannot guarantee 10-minute delivery and that it will be delivered in 20 or 30 minutes, something like that. But if you are guaranteeing 10-minute food delivery, you need to ensure that both of these steps are done.

So let's see how the two-phase commit protocol helps us ensure this. Two-phase commit splits the entire thing into two phases: the first is the prepare phase and the second is the commit phase. The prepare phase is all about reserving items, while the commit phase is all about assigning or booking them, basically committing that this item is now for this particular order.

Say we have an order service, a store service, and a delivery service. In the first phase, the order service makes a call to the store service to reserve food. Reserving food is like taking a lock on that item: you mark the item as unavailable for any other order. Say you are ordering a burger from Burger King. That one specific unit of burger is reserved in the database, so no other transaction, no other order, is able to book that exact same burger. That is the first part. So you
would reserve food for an order, and then you would reserve an agent for the delivery. Now, when I say reserve, it is just reserving the food. It does not mean that the store got an order to warm and pack the food. We are just reserving the food in the database, making it unavailable for any other order, kind of like telling the store to keep this burger aside and let no one else have it. Similarly, when you make a call to the delivery service, you are saying in the database: keep this delivery agent aside, no other order should be able to book him or her. The agent is reserved for an order, but you are just reserving; neither of them is intimated. You have not told the store to start warming or packing the food, and you have not assigned a delivery partner yet. You are just blocking them, making them unavailable for any other order to pick that food item or that delivery person. Making them unavailable by reserving them is the first step.

The second phase, your commit phase, is all about booking or assigning them. Now that you have one burger and one delivery agent reserved for your order, they are unavailable for anyone else to book. So if you want to go through with it, you say, book this food, which means the reserved food, this exact burger, is now assigned to a particular order. Then you make another call to book the agent for that order. You are able to book the food because it was reserved, so no one else can claim it; and because the agent was reserved, no other order can be assigned to him or her. So then you inform both
of the services, the store service and the delivery service, that this particular food is now for this particular order, and that the agent we reserved can now be informed. Once both of these are done, you place the order. That is, after the commit phase is done, you make an entry in your orders database and inform your end user that the order is placed and will be delivered in 10 minutes. For that to happen, the food and the agent both had to be available, and we just blocked them. So as soon as you do that, a delivery partner is assigned immediately and the food is delivered within 10 minutes.

Let's talk about a few edge cases, or rather a few scenarios. In the first phase, the reservation phase, if both reservations fail, your transaction fails. If you are not able to reserve the food and not able to reserve the agent, your preparation phase is gone, so you tell your end user that the order is not placed because the food or the delivery partner is unavailable. If only one succeeds, we cancel the reservation and abort. Say you are able to reserve food but not an agent. Because you have not informed the store or the agent about anything, it is okay to cancel your reservation: you release the lock on the food you kept aside. Your business is not impacted; it is just one entry in your database that is updated, nothing else. So if the food reservation is done but the agent reservation is not, you reverse the food reservation because you are not able to
find a delivery partner, and vice versa. But here is one critical thing to notice: every time you reserve something, there is a timer attached to it. After I reserve, I can hold the reservation only for, let's say, two minutes, not more. It is a distributed system, so anything can fail and any process can become unresponsive. Say, hypothetically, the order service itself becomes unresponsive. You cannot perpetually reserve something in your system. You cannot perpetually reserve an agent for an order that will never get placed, because then how will that agent ever get an order? That is why, any time you reserve something, you add a timer to it, and once that timer expires the reservation is auto-cancelled. Because we have not informed either the store or the delivery agent, cancelling makes no difference to them; you just release the lock that you took in your database. With this auto-cancellation of reservations, you ensure that no food and no agent is perpetually reserved.

Then, if both succeed, meaning you are able to reserve food and you are able to reserve an agent, you move forward to the next phase, which is your commit phase. What happens in the commit phase? You have food already reserved for an order and a delivery agent already reserved for it. Now you make a call to the store saying that the food you have reserved is now assigned to this order. So you are booking that particular food: your order service makes a call to the store service saying that the
food we just reserved, the burger we just reserved, is now assigned to this order. Then you make a call to book the agent, where you say that the agent you reserved is now booked for this particular order. This is where you inform the store that it can start heating and packing the food, and where you inform the agent: come to this particular store, you need to deliver something. Because both were already reserved, no one else was contending for that resource, the food or the agent. So as long as there is no network failure, these two calls are bound to be successful. And if both succeed, you actually place the order, informing your end user that the order is placed and will be delivered in 10 minutes.

What if one of them fails? It is all sequential, so say we first book the food and then the call to book the agent fails. Then you simply revoke the reservation that you made for the order. Because the items were already reserved and no one else was contending for them, it is pretty simple. Then again, because it was reserved, the call has to succeed eventually. A network failure is not going to be perpetual; eventually the call will go through, and the food or the agent will be assigned to that order. Which means that because it is reserved, it is not
contended by anyone else, so these two calls will succeed: after a few retries past the network failure, you will be able to book it, and the reserved item will be assigned to that order. So this is not really a concern. And hypothetically, if the network failure goes on for very long, say your services are down for hours (obviously they won't be, but hypothetically), you still have the reservation timer running. So that food item or that agent will never be perpetually reserved or blocked; it will always be freed up for some other order to consume. That is the commit phase. Because you were able to reserve, no one else was able to book them. You reserve, and then you commit.

Hypothetically, say your order service, which is your coordinator, fails at any of these stages. Because you have timers set, the item is never perpetually reserved. And because the order service has failed, you will never tell your end user that the order is placed, since the order service itself is down and the transaction is not complete. So by splitting the entire thing into two phases, first the reservation phase and then the commit phase, you are able to guarantee that either both happen or neither happens. And the failures become transient: because you have reserved the food item, a network failure is something you get past with retries, so it is really not a concern. That is how the two-phase commit protocol works.

Now let's talk about some advantages and disadvantages. The advantage of using the two-phase
commit protocol is pretty straightforward: it guarantees atomic transactions. We saw how we get atomic transactions here by first reserving the items across services and then actually assigning them to an order. You reserve, and then you commit. That is how you guarantee atomicity across multiple services, and if either one fails, we have timers and so on to handle it. The second advantage is that it guarantees isolation; it gives exclusivity. Once you have reserved food, an actual burger, for an order, no one else is able to assign that exact burger to another order. That is how your inventory works.

As for the disadvantages of using two-phase commit for distributed transactions: first of all, it is pathetically slow. A lot of things happen sequentially, and it takes a long time because you are first reserving and then committing the items for that order. But if you want those sorts of guarantees for a distributed transaction, you have to do that. You do not think about throughput at large when you want very strong consistency guarantees across your system. Second, it is prone to deadlock. Because you have a separate reservation phase and a separate commit phase, there is a very high chance that you create a cyclic dependency across your store, order, and delivery services, where A is waiting for B, B is waiting for C, and C is waiting for A. So there is a high chance of a deadlock, which means you might want a deadlock detection algorithm running so that your transaction never gets stuck in one.

So that is it for the theoretical part of the two-phase commit protocol. In the next video we will simulate this
entire environment locally by mimicking the order service, store service, and delivery service, to see how we can guarantee atomicity across multiple services by mimicking failures, successes, timers, and so on. I hope you liked this video. If you did, give it a thumbs up, and if you like the channel, give it a sub. I post three engineering videos every week, and I'll see you in the next one. Thanks a ton.
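The reserve-then-commit flow described in the captions can be sketched roughly as follows. This is a minimal in-memory illustration and not the code from the follow-up video: the `Participant` class, the `place_order` function, and the `ttl_seconds` and `clock` parameters are all hypothetical names chosen here, and plain method calls stand in for the HTTP calls between services. The two-minute default mirrors the "let's say two minutes" reservation timer mentioned in the talk.

```python
import time


class Participant:
    """In-memory stand-in for the store or delivery service (hypothetical)."""

    def __init__(self, items, ttl_seconds=120.0, clock=time.monotonic):
        self.available = set(items)   # items nobody has reserved or booked
        self.reserved = {}            # order_id -> (item, expiry time)
        self.booked = {}              # order_id -> item
        self.ttl_seconds = ttl_seconds
        self.clock = clock

    def _release_expired(self):
        # Auto-cancel reservations whose timer has run out.
        now = self.clock()
        for order_id, (item, expires_at) in list(self.reserved.items()):
            if expires_at <= now:
                del self.reserved[order_id]
                self.available.add(item)

    def reserve(self, order_id):
        # Phase 1: set one item aside for this order, with an expiry.
        self._release_expired()
        if not self.available:
            return False
        item = self.available.pop()
        self.reserved[order_id] = (item, self.clock() + self.ttl_seconds)
        return True

    def cancel(self, order_id):
        # Abort: release the reservation; nobody was informed, so this is cheap.
        entry = self.reserved.pop(order_id, None)
        if entry is not None:
            self.available.add(entry[0])

    def book(self, order_id):
        # Phase 2: turn a live reservation into an assignment.
        self._release_expired()
        entry = self.reserved.pop(order_id, None)
        if entry is None:
            return False
        self.booked[order_id] = entry[0]
        return True


def place_order(order_id, store, delivery):
    """Coordinator (the order service): reserve everything, then book everything."""
    participants = [store, delivery]
    reserved = []
    for participant in participants:
        if participant.reserve(order_id):
            reserved.append(participant)
        else:
            # One reservation failed: cancel the ones already held and abort.
            for held in reserved:
                held.cancel(order_id)
            return False
    # In this sketch book() can only fail if a reservation expired; a real
    # coordinator would retry these calls on network failures.
    results = [participant.book(order_id) for participant in participants]
    return all(results)


if __name__ == "__main__":
    store = Participant(["burger-1"])
    delivery = Participant(["agent-1"])
    print(place_order("order-1", store, delivery))  # True
    print(place_order("order-2", store, delivery))  # False: no burger left
```

Passing the clock in as a parameter is only a convenience for simulating timer expiry without sleeping; a real service would more likely store the expiry alongside the reserved row in its database.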
Info
Channel: Arpit Bhayani
Views: 27,029
Keywords: Computer Science, Software Engineering, System Design, Interview Preparation, Handling Scale, Asli Engineering, Architecture, Distributed Systems, Distributed Transactions, Two-Phase Commit Protocol, Zomato, Microservices, Challenges in Microservices, Atomicity in Distributed Systems, distributed transaction explained, how to do distributed transactions, distributed transactions using two-phase commit protocol, deadlock in distributed systems, two phase commit in microservices
Id: 7FgU1D4EnpQ
Length: 21min 21sec (1281 seconds)
Published: Mon Mar 28 2022