Airbnb System Design | Booking.com System Design | System Design Interview Question

Captions

Hi everyone! Welcome to CodeKarle. My name is Sandeep. In this video we will be looking at a very common design interview problem which has been asked by a lot of companies lately: how do we design a hotel booking system, something very similar to Booking.com or Airbnb? Just one thing to call out: we will be looking at a very high-level architecture of the whole system, not at lower-level class diagrams and all of that. Before we jump into the problem, let's first look at the functional requirements, then the non-functional requirements of what we want to achieve, and then the design.
We have two major consumers of this application: one is the hotel side of users, and then there are the consumers who want to book a hotel. For the hotel managers we'll have three major functionalities: 1) they should be able to onboard onto our platform; 2) they should be able to update their property, so for example they might want to add a new room, change the pricing or add new images; 3) they should be able to see what bookings they have and, along with that, get some insight into the revenue numbers.
From a user standpoint, they should be able to search for a property in a particular location with a couple of search criteria. For example, they might want to filter within a price range or by some aspect of the property, like a five-star property or a beachfront property.
Then they should be able to book that hotel, and once they have booked, they should be able to look at their bookings. These are the major requirements. We should also design it in a way that leaves scope for some kind of analytics to be done. That is the functional side of things.
From a non-functional side of things, we need this platform to run at very low latency, and it should give very high availability and very high consistency. By high consistency I mean that if a user books a hotel, they should be able to see that booking immediately.
Now, from a scale standpoint, what kind of scale do we want? A quick Google search tells me that there are roughly 500,000 hotels in the whole world at this point in time, with roughly 10-12 million rooms across all of them. You can assume that a single hotel has up to about a thousand rooms. There are some hotels which have more than 7,000 rooms, but those are edge cases, and we should be able to handle them. The reason I am talking about this thousand number is this: let's say a hotel has a thousand rooms. These rooms will be booked over the course of many days, so there will never be a situation where there is just one room available and thousands of users are trying to book it. At most, there is one room and two or three users are trying to book it, and we will be able to use that assumption to our advantage later.
Now let's look at the overall design of the whole system and how the data flows between the components. Then we'll look individually into some of the components.
The whole business flow starts at this point, which is basically a UI that we give out to the hotel managers. It could be either a website or a mobile app, but through this UI they would onboard onto our platform, and the same UI would be used by them to modify the property. So if they want to add a new image or a new room, or make any other modification, this is the UI that they talk to.
This UI talks to a Load Balancer, through which it talks to a Hotel Service. This is basically the service which manages the hotel part, which is the onboarding and the management. If there's a spike in traffic, multiple nodes of this Hotel Service can be added, so this is a horizontally scalable component.
Hotel data in itself is very much relational data, and earlier we talked about the number of hotels, which is not too many, so it doesn't even have a scale problem. So we'll be using a clustered MySQL here with one master and multiple slaves. Slaves can be added as and when required: if there's a huge spike in read traffic, we can add more slaves. But this data resides within a MySQL database.
Now let's say an image is added. Hotels can add images of their rooms, of their whole building and so on. All those images would be stored in a CDN, and the reference to the CDN, which is basically the URL of the image, would be stored in the database. That URL would be sent out to customers, and whenever they want to render an image, it would be looked up directly from the CDN. Now what is a CDN? It's basically a geographically distributed data store, which we will be using for sending out images throughout the whole world.
So let's say I'm connecting from India and somebody else is connecting from the US, and we both want to look up an image of a particular hotel. I'll look it up on the CDN server which is in India; the other person will look it up on the CDN server which is in the US. So this is the hotel life cycle management.
The next thing: each time a modification happens to a hotel, or let's say a new hotel comes in, we want to bubble it up to the users who are going to search for it. There are multiple ways in which we can send this information to the search piece. I'll be using Kafka here. Each modification that happens within the Hotel Service will flow through a Kafka cluster, and there'll be multiple consumers sitting on top of this cluster which populate their own data stores for serving the search traffic and other traffic as well.
One of the consumers will be the Search Consumer. Let's say a hotel gets a new room. A payload is put into Kafka which has all the information that is required. The Search Consumer pulls that payload from Kafka and stores it in its own database, and this database is used to power the search on the website.
For search, I am using Elasticsearch. Elasticsearch is basically a search data store built on top of Lucene. Instead of Elasticsearch you could also use Solr here. Both are similar components; ideally it would depend on what infrastructure is already being used in your company. The idea behind using Elasticsearch is that I want this piece to support fuzzy search. Let's say a user is searching for a hotel in the Maldives and might not know the correct spelling. If they type in a wrong word, I don't want them to get no results.
I want this to support fuzzy search, so it has to handle typos and spelling mistakes, and I also want to be able to return similar results. That's the reason I'm using Elasticsearch here. So all the data of each individual hotel flows through Kafka, via the Search Consumer, into this Elasticsearch cluster.
On top of this Elasticsearch cluster sits the Search Service. Again, if there's a spike in traffic, I can increase the number of nodes in the Kafka cluster, the number of Search Consumers and the number of nodes in the Elasticsearch cluster. So whatever we have talked about till now is horizontally scalable.
Coming to the Search Service: this is the service which powers the search on the website. I'm using a generic term here; sometimes I'll say website, sometimes I'll say UI, but it basically means all the channels through which a user can come in. That could be an app or a website. The user talks, again through a load balancer, to the Search Service whenever they want to search for a particular hotel. They will give a date range and a location, for example, as search criteria, and along with that they could also provide some tags. Those tags would be the properties of the hotels. Going back to my previous example, a five-star property is a tag; a beachfront property is a tag. The search on Elasticsearch would happen on any of these tags and on the ranges that are provided, basically the date range, the price range and so on. So this takes care of the search flow.
Once the user has seen some results on the website, they would want to book a hotel. The booking again happens through this UI, so I've labelled it as a search and book UI. Normally it will be the same app or the same website through which they search and then book.
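The search request described above (a typo-tolerant location match, tags and a price range) could translate into an Elasticsearch query body roughly as follows. This is a minimal sketch, not the video's implementation: the field names `location`, `tags` and `price` are assumed, the sketch assumes a hotel must carry every requested tag (the video leaves that open), and date-range availability is left out for brevity. The `match` query with `"fuzziness": "AUTO"` is what tolerates spelling mistakes.

```python
from typing import Any


def build_hotel_search_query(
    location: str,
    tags: list[str],
    price_min: float,
    price_max: float,
) -> dict[str, Any]:
    """Build an Elasticsearch request body for a typo-tolerant hotel search.

    Field names ("location", "tags", "price") are hypothetical.
    """
    # One "term" filter per tag, so a hotel must carry every requested tag.
    tag_filters = [{"term": {"tags": tag}} for tag in tags]
    return {
        "query": {
            "bool": {
                # Fuzzy full-text match: "maldievs" can still find "maldives".
                "must": [
                    {"match": {"location": {"query": location, "fuzziness": "AUTO"}}}
                ],
                # Filters narrow the result set without affecting scoring.
                "filter": tag_filters
                + [{"range": {"price": {"gte": price_min, "lte": price_max}}}],
            }
        }
    }
```

The Search Service would send the returned dictionary as the body of a search request against the hotel index.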
A booking request again comes to this load balancer and talks to the Booking Service. The Booking Service again sits on top of a MySQL database. Note that these are two different MySQL clusters. I am purposefully not using the same cluster here. We could use the same cluster and have two different databases in it, but because we are talking about a fairly large system with a good amount of scale, I want to keep different clusters so that each can be scaled independently of the other.
Whenever a booking happens, that booking gets stored in this MySQL. We'll go over the exact flow of booking when we go over the implementation details of the Booking Service, but essentially it stores the data in this MySQL and it talks to a Payment Service. Normally, a booking request will come in, the service stores something, it sends the request for payment, and once there's a success, it marks the booking as confirmed.
Again, whenever a booking happens, the data flows into the same Kafka. Why? Let's say there was just one room available in a hotel and that room is now booked. I want to make sure that this hotel no longer shows up in search for that date range, because it's not available. So all of that information is sent to the same Kafka, which is read by the Search Consumer, and it takes care of removing the hotels which are now completely booked.
Now, as you can see, there's something called an Archival Service here. I am storing only the live data in MySQL. By live data, I mean the bookings that have been made but not yet completed, which keeps the scale low enough that MySQL can easily handle it. Once a booking moves to a terminal state, so let's say the booking is cancelled or completed, it moves through the Archival Service to a Cassandra cluster.
The reason I'm using Cassandra here is that it is a very good database for handling a huge amount of reads and writes. It has a constraint: it needs a partition key on which all the queries should happen. So if I want to search by booking_id, my partition key has to be booking_id, and I cannot do arbitrary kinds of queries on Cassandra. Therefore I did not use Cassandra as the source-of-truth database, because on that database I need to do a large variety of queries. We'll come to all of those when we go into the details of the Booking Service. But once the data is archived, we just need to do GETs on it, so Cassandra makes good sense here.
Once the booking is done, we need to notify all the people involved. So then comes the Notification Service. Whenever a booking is made, or any changes happen to a booking, or it moves into a terminal state, a Notification Service consumes events from this Kafka and notifies people. For example, on each booking we need to notify the hotel. Whenever a booking is cancelled by the hotel, we need to notify the consumer, and in fact on each booking we need to notify the consumer with an invoice. All of that is taken care of by this Notification Service.
Now coming back to the UI for hotels and users. Each time a booking is done, or even without that, a user might want to see their old bookings, or a hotel might want to see all the bookings that they have. This is more of a read-only view for them. It will be powered by this Booking Management Service, which talks to two data sources. It talks to the MySQL cluster for all the active bookings, which are to happen sometime in the future, and to the Cassandra cluster for the bookings that have already happened.
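The split between live and archived bookings, and the two-source read path of the Booking Management Service, can be sketched as below. This is only an illustration under stated assumptions: plain in-memory dictionaries stand in for the MySQL table and the Cassandra archive, the archive is keyed by `user_id` as an assumed partition key, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Booking:
    booking_id: str
    user_id: int
    status: str  # RESERVED, BOOKED, CANCELLED or COMPLETED


TERMINAL_STATUSES = {"CANCELLED", "COMPLETED"}


class BookingStores:
    """In-memory stand-ins for the live MySQL table and the Cassandra archive."""

    def __init__(self) -> None:
        self.live: dict[str, Booking] = {}  # stand-in for MySQL
        # Stand-in for Cassandra; user_id plays the role of the partition key.
        self.archive_by_user: dict[int, list[Booking]] = {}

    def save_live(self, booking: Booking) -> None:
        self.live[booking.booking_id] = booking

    def archive_if_terminal(self, booking_id: str) -> bool:
        """Archival Service step: move a booking out once it is terminal."""
        booking = self.live.get(booking_id)
        if booking is None or booking.status not in TERMINAL_STATUSES:
            return False
        self.archive_by_user.setdefault(booking.user_id, []).append(booking)
        del self.live[booking_id]
        return True

    def bookings_for_user(self, user_id: int) -> list[Booking]:
        """Booking Management Service read path: live rows plus the archive."""
        active = [b for b in self.live.values() if b.user_id == user_id]
        return active + self.archive_by_user.get(user_id, [])
```

Because the archive is only ever read by its key, the lookup maps directly onto a partition-key GET, which is the access pattern Cassandra is being chosen for.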
I am also adding a Redis on top of this MySQL to reduce the load on it. Redis will act as my cache, so whenever I have a query, for example "get bookings of a user", I can cache the result in Redis. It'll be a write-through cache, so whenever a new booking comes in, the cache gets updated.
This is the functional flow. The bigger question here is how we do the analytics. Let's say a business person wants to know how much revenue we are making, how many bookings we are having, which are our best performing hotels and so on. They need to do a lot of analytics. While designing the system we will rarely know up front what kind of analytics is required, so what I've done is use a Hadoop cluster, into which I'm pushing all the events that go into my Kafka. That is basically information about all my hotels, all my bookings and all the transactions that happen in my system. There will be a Spark Streaming consumer that reads from this Kafka and puts all the data into the Hadoop cluster, on which I can run Hive queries or any other kind of queries and build up a lot of reporting. So this is how the overall system looks and how the data flows.
Now let's go into the details of some of the components, starting with what the Hotel Service is internally. It's not a very complicated service. It is basically a CRUD service which provides Create, Read, Update and Delete operations on the hotel data store, and it is the source of truth for hotel data. What you see here is not an exhaustive list of either the APIs or the DB schema; there will be a lot more, but this will give you a feel for how it should look. So let's look at some of the APIs.
1) There'll be a POST API, POST /hotels, to create a hotel, which will be part of the onboarding process.
2) There will be a GET API with an id, GET /hotel/{hotel_id}, which gives back the information of the hotel so that it can be rendered on the screen and the hotel manager can see it.
3) There will be a PUT API, PUT /hotel/{hotel_id}, which will be used to update any information of a hotel.
4) Similarly there will be a PUT API, PUT /hotel/{hotel_id}/room/{room_id}, which would be used to update the room information, create new rooms and so on.
This is not an exhaustive list; there'll be a lot more APIs that you can add as and when there's a requirement.
Now let's look at how the DB schema might look. There are a couple of important tables; this is again not an exhaustive list of the tables. Everything in red here is either a primary key or a foreign key; everything in blue is just a column.
There's a hotel table in this hotel DB. It contains your very standard things: id, name, locality_id (which is a foreign key to the locality table), description, original_images, display_images and is_active. Why do I have two columns for original and display images? original_images is basically the artifact that people have uploaded. display_images could be a compressed version of that, it could be the version that we have uploaded to the CDN, it could be something different from the original image, but we still need to keep both, so we store both here. is_active is basically a soft delete flag.
Then coming to the rooms table. It has a room id obviously, a hotel_id which references the hotel table, a display_name which could just be an identifier to tell the customer what kind of room it is, is_active which is again a soft delete flag, quantity which tells how many such rooms there are in the hotel, and a price_min and a price_max. Now why do we have two prices? Remember the Hadoop cluster that we had in the original design.
It has a lot of data about various kinds of things. We might as well run a machine learning model on it, do some supply and demand analytics, and come up with the optimal price. Let's say supply is low, there's a lot of demand and there are just a few rooms left: we might as well increase the price. Or if there are too many rooms and very few customers, we might as well reduce the price. So price_min and price_max could be the range which the hotel provides, within which the price can be varied by the system. A good starting point could be the average of the two prices.
Then there's a facilities table, which is basically a list of all the facilities that a hotel or a room can possibly have. hotels_facilities and room_facilities are mapping tables that model a many-to-many relationship, for example between a hotel_id and a facility_id. Again, the is_active flag everywhere is basically a soft delete flag.
Again, this is not a full list of tables; a lot of information is missing. I've skipped the auditing information and the bookkeeping information like created_on, updated_time and all of that. But this will give you a fair idea, and it will be a good starting point for you to come up with a DB schema.
One more thing to note: if you remember the original design, I did not keep a Redis cache on top of this MySQL database, but I did keep a Redis cache on the other MySQL database, the Booking DB. Why is that? We could have kept a cache on top of this one, and all these GET APIs could have been a bit faster. But this database is not in the critical path of any high-throughput business interaction. Customers are not querying this database or this service; they are always querying the Search Service. So if this service is a little slow, that's okay, whereas adding a Redis cluster is a cost.
So you need to do a trade-off analysis between the infrastructure cost you are adding and the benefit it brings. If it is worth it, you might as well go and add a Redis cluster here, but I don't think it is, and that's the reason I did not add it.
Now let's look at the internal functioning of the Booking Service. We'll start by walking through the DB schema. Again, it's not a full-fledged schema; a lot of details are missing, like the bookkeeping information (created_time, updated_time and so on), but let's focus on the meaty part.
It has a table called available_rooms, which has a room_id, a date, an initial_quantity that comes from the Hotel Service, and an available_quantity. available_quantity is basically the number of rooms remaining for that particular room_id on that particular date. It has a constraint saying it cannot go negative. This is where we use the true power of MySQL, and that's the reason why I chose to use MySQL here.
The other table is a booking table. It has a booking_id, which is the primary key and which will be referenced across the whole system. It has a room_id, which again comes from the room table, a user_id, a start_date and an end_date for the booking, number_of_rooms which is how many rooms the person has booked, a status and an invoice_id. Looking at this design, we can see that one booking cannot contain different room types. You can have multiple rooms of the same room type, but you cannot have one deluxe room and one regular room in one booking. If you want that, a small change is required, but I think that's a minor detail which can be taken care of easily.
The important part here is the status column. It has four values: RESERVED, BOOKED, CANCELLED and COMPLETED. CANCELLED and COMPLETED are the terminal statuses. A booking is first created in RESERVED status.
Then, based on the payment outcome, it moves to either BOOKED or CANCELLED. And once the user has completed their stay, it moves to COMPLETED. You can add more statuses depending upon your conversation with your interviewer, but these four are the main ones that will help us achieve what we initially set out to do.
Now let's look at the API signature. There is one important API, the book API. It will be a POST API which takes five attributes: a user_id, a room_id, a quantity, a start_date and an end_date. If you want to book multiple room types with different quantities, we'd have to change it a bit to take an array, but let's stick to this for now. Let's assume for now that the price comes from somewhere else. It will actually come from the data store which contains the price of the room at this point in time. We generally don't want to take the price from the user, because then the request can be tampered with, and that's not a good design.
Now let's do a quick revision of the design, because I skipped some important details in the earlier, larger diagram, and we'll go over them now. The way the Booking Service actually works is that when it gets a request to make a booking, it first queries this table, the available_rooms table, and checks whether that many rooms are remaining. If there are no rooms left for that particular room_id on that particular date, there's no point in proceeding, so we can error out from there. But if we do have rooms, then we go ahead with blocking the room: we block it temporarily, and if the payment is a success, we actually book the room.
I'll do a quick dry run of what actually happens. Assume this is the request that came in: user_id: 1 | room_id: 5 | quantity: 1, for some date "dt" to "dt + 1". The room_id: 5 on that particular date "dt" has 7 available rooms.
So our first check is a success: we have enough rooms. What happens next is that a row is created in the booking table with booking_id: (some_uuid) | room_id: 5 | user_id: 1 | start_date: dt | end_date: dt + 1. number_of_rooms is the quantity from the request, which is 1, and the status at this point in time is RESERVED. invoice_id at this point is NULL, because no invoice has been created yet.
Now we have a record, and along with that we also decrement the quantity. Here again we are using a very important feature of MySQL, which is transactions and their ACID properties. We are creating a record in the booking table and reducing the quantity in available_rooms to 6, and we are binding both of these into one transaction. So let's say there was just one room left and two or three requests came in: only one transaction would succeed in doing both of these things, that is, inserting the record and reducing the quantity, because we have the constraint which says that the quantity cannot be negative. So only one of the transactions will succeed, only one room will be booked, and no two users will be redirected to payment for the same room.
With that taken care of, what is the next step? I have written down the steps here if you want to look at them. What we have gone through till now is step one and step two: we've inserted into booking and reduced the quantity in available_rooms. Step three is something that I did not cover as part of the larger design review, because the diagram was getting too cluttered. We cannot keep this room reserved for an infinite amount of time. What we can say is: if the payment succeeds in the next five minutes, well and good; if not, we'll assume that the payment will not go through and we'll unblock the room so that somebody else can book it.
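Steps one and two above (insert a RESERVED booking and decrement the availability inside one transaction, guarded by a non-negative constraint) can be sketched as follows. This is a minimal sketch rather than the video's code: it uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere, the column types are assumed, and for brevity it only decrements the row for the start date instead of every night of the stay. On MySQL itself, note that `CHECK` constraints are only enforced from version 8.0.16 onward.

```python
import sqlite3
import uuid


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the two Booking DB tables described above (types are assumed)."""
    conn.executescript(
        """
        CREATE TABLE available_rooms (
            room_id INTEGER NOT NULL,
            date TEXT NOT NULL,
            initial_quantity INTEGER NOT NULL,
            available_quantity INTEGER NOT NULL CHECK (available_quantity >= 0),
            PRIMARY KEY (room_id, date)
        );
        CREATE TABLE booking (
            booking_id TEXT PRIMARY KEY,
            room_id INTEGER NOT NULL,
            user_id INTEGER NOT NULL,
            start_date TEXT NOT NULL,
            end_date TEXT NOT NULL,
            number_of_rooms INTEGER NOT NULL,
            status TEXT NOT NULL,
            invoice_id TEXT
        );
        """
    )


def reserve(conn, user_id, room_id, quantity, start_date, end_date):
    """Insert a RESERVED booking and decrement availability atomically.

    Returns the new booking_id, or None when not enough rooms are left.
    Simplification: only the availability row for start_date is touched.
    """
    booking_id = str(uuid.uuid4())
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "INSERT INTO booking VALUES (?, ?, ?, ?, ?, ?, 'RESERVED', NULL)",
                (booking_id, room_id, user_id, start_date, end_date, quantity),
            )
            cur = conn.execute(
                "UPDATE available_rooms "
                "SET available_quantity = available_quantity - ? "
                "WHERE room_id = ? AND date = ?",
                (quantity, room_id, start_date),
            )
            if cur.rowcount != 1:
                raise sqlite3.IntegrityError("no availability row for this date")
    except sqlite3.IntegrityError:
        # CHECK violation (would go negative) or missing row: nothing is kept.
        return None
    return booking_id
```

If the decrement would push `available_quantity` below zero, the database rejects the update and the whole transaction, including the inserted booking row, is rolled back, which is the behaviour the dry run relies on.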
There are multiple ways to implement that. What I chose to implement here uses the TTL (Time To Live) of Redis. Because we are using Redis anyway, we can use the same Redis cluster for this use case as well. We'll put a key in Redis saying that some booking_id expires at some timestamp. The expiry time could be a configurable number: it could be fixed across the board, or it could be country specific, say five minutes for India and four minutes for the US, something of that sort. Whatever that time is, we'll insert it into Redis.
Redis has something called callbacks (Redis itself calls them keyspace notifications), a concept introduced in one of its later versions. Whenever a key expires you get a notification, and you can do whatever you need to do at that point in time. So, if you get a success notification from payment, well and good. A success notification means the payment has gone through, and you mark the booking as BOOKED. But if, before that, you get a callback from Redis saying that the key has expired and you have not got the success from payment, you mark the booking as CANCELLED. Alternatively, you could get a failure response from the Payment Service saying that for whatever reason the payment didn't go through; in that case too you mark it CANCELLED.
If you want to distinguish the varieties of CANCELLED, you can either have multiple statuses, like cancelled because of invoicing, cancelled because of payment or cancelled because of expiry, or you could add a status_reason column or something of that sort. That's a very minor detail, so we'll skip it for now.
So let's go over all the possibilities here and how each of them behaves. The first, very simple one: what happens when payment is a success?
In case payment is a success, everything remains the same; only the status becomes BOOKED. In that case we also get an invoice_id. Basically we get an invoice_id from the Payment Service whenever a booking succeeds, and we update the invoice_id on the booking. Then the regular Kafka events are also sent, saying that the booking is now complete, in case somebody wants to act on that.
What happens when payment fails? We just have these four statuses, so the booking status becomes CANCELLED. There is no invoice_id in this case. Why? Because if the payment did not go through, there is obviously no invoice generated. Everything else remains the same, but since the payment did not go through, we need to revert the available_quantity. So available_quantity in that case goes back to seven.
Now let's say your key expired. The user was redirected to the payment screen and, for whatever reason, there was no response from the Payment Service. What happens then? We get a callback from Redis, and based on that callback we can say that the payment has not gone through. We follow the same process as for a payment failure: we mark the booking CANCELLED and we increment available_quantity so that the room is available for somebody else. Again, in that scenario there is no invoice generated. But you do this only if the status is RESERVED. Why? That brings us to the next case.
What happens if both (3) and (1) happen, that is, you get a key expiry event and the payment also succeeds? There are two orderings. If the payment has already succeeded and the booking has already moved to BOOKED status, and after that you get the key expired event, then you don't do anything, because that expiry is anyway bound to happen.
But what if it happens the other way around? What if the key expired first, you moved the booking to CANCELLED state, and then you get a notification saying the payment succeeded? There are multiple directions you could take here, based on your conversation with your interviewer, the non-functional requirements and in fact even the functional requirements. You could revert the payment, saying that for whatever reason we were not able to book the room, so here's your money back. Alternatively, you could do something smarter. You could say: I have got the payment from the user anyway, so I'll check whether rooms are still available and, if so, book them. Which one you do depends on the requirement; you could talk to your interviewer and implement it either way.
All good so far, but there are a couple of caveats. The TTL that we talked about is not a very precise measure. Let's say a key was supposed to expire at 10:00. You will probably never get the callback at exactly that time; it will always have some delay. In this case it doesn't matter too much: if you get it at 10:01 instead of 10:00, that's probably okay. The reason for the delay is the way expiry is implemented in Redis. I'll not go too deep into that, but for keys that are not being accessed, there's a background process that runs in Redis, and a key is expired only when that process gets to it. So it is not guaranteed to expire at exactly the time you set.
But let's say you wanted it to be totally precise. Then you could tweak the implementation a bit. Instead of a TTL-based approach, you could implement a queue within Redis and have a poller that queries the head of the queue every second and handles whichever entries it finds have expired. That is obviously more precise, but it comes at a cost. You'll have to build a polling mechanism, which is additional development effort, and it will be continuously hitting Redis every second, so a lot of CPU is used on both sides, on the poller side and on the Redis side. You might have to add more nodes to the Redis cluster and also on the side where the poller runs. So that's a trade-off: do you want to be notified absolutely immediately when keys are supposed to expire, at the cost of additional hardware? That trade-off you can again settle in conversation with your interviewer. All of this being said, I would still go with the TTL-based approach, because in this particular example the precision doesn't really matter that much.
Now a couple of optimizations you could do. Let's say payment is a success. You know that the key will expire after some time for sure, because it's there in Redis. But you don't need to keep that key: you know the payment succeeded, so you can evict it. The same goes for the payment failure case: you know the payment has failed and the key would expire after five minutes, so you might as well delete it then and there. These are optimizations you could add to make the implementation even better. But by and large, this is how the booking flow works.
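The cases walked through above (payment success, payment failure, key expiry, and a payment that arrives after the key has already expired) amount to a small state machine over the booking status. The sketch below is one possible reading, not the video's implementation: the class and method names are hypothetical, inventory is a single in-memory counter, and for the late-payment case it picks the "refund" option, whereas re-booking if rooms are still available is the other option mentioned.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Booking:
    booking_id: str
    quantity: int
    status: str = "RESERVED"
    invoice_id: Optional[str] = None


class BookingFlow:
    """Applies payment and expiry events for one room type on one date."""

    def __init__(self, available_quantity: int) -> None:
        self.available_quantity = available_quantity
        # booking_ids whose late payment must be returned (assumed policy)
        self.refunds: list[str] = []

    def _release(self, booking: Booking) -> None:
        """Cancel the hold and give the rooms back to the inventory."""
        booking.status = "CANCELLED"
        self.available_quantity += booking.quantity

    def on_payment_success(self, booking: Booking, invoice_id: str) -> None:
        if booking.status == "RESERVED":
            booking.status = "BOOKED"
            booking.invoice_id = invoice_id
        elif booking.status == "CANCELLED":
            # Payment arrived after the hold expired: this sketch refunds it.
            self.refunds.append(booking.booking_id)

    def on_payment_failure(self, booking: Booking) -> None:
        if booking.status == "RESERVED":
            self._release(booking)

    def on_key_expired(self, booking: Booking) -> None:
        # Only acts on RESERVED bookings, so an expiry that arrives after a
        # successful payment is ignored.
        if booking.status == "RESERVED":
            self._release(booking)
```

Guarding every handler on the current status is what makes the two orderings of "expiry" and "payment success" safe: whichever event arrives second sees a non-RESERVED booking and cannot release the rooms twice.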
Now, to reiterate, we have used a couple of important features of MySQL, and that is what makes the code on the application side much simpler. Had we used some other database that doesn't provide them, for example Cassandra, we would not have had access to transactions, constraints and all of that, and we would have had to implement them on the application side. That's additional effort on our side to make sure things are consistent; in this case I would rather leave it to MySQL. Now, coming back to the same architecture again, I just want to call out that all of the components you see here are individually horizontally scalable. So let's say there's a traffic spike on one of the components: we could increase the number of nodes in that particular service, or maybe that particular database, and that should work just fine. As far as the Kafka and Hadoop clusters are concerned, we could add more nodes to those as well, and they should scale well beyond what we need. Cool, so now let's look at what alternatives we could have used instead of these particular design choices. First of all, why MySQL? We could use any other relational database here. We could use Postgres, we could use SQL Server; anything that provides ACID guarantees should be fine. As far as Redis is concerned, we could use Memcached or any other cache instead, and that should also be good.
For Cassandra, I would still stick with Cassandra, because that is exactly what we need here. Technically, in place of Cassandra we could also use HBase; that would work fine too, but it has a lot of operational overhead in terms of deployment and maintenance over time, so that's the reason I would prefer Cassandra over HBase or any other similar system. The way Cassandra works is that all data is sharded by a partition key, so each query has to happen on a partition key. Now, the queries that we are doing are of just two varieties: 1) get bookings by hotel, or 2) get bookings by user. There is no third variety, so we basically have two kinds of data, distributed by two different partition keys, on which the queries happen. So Cassandra is a very good choice here. In place of Kafka, we could have used ActiveMQ or RabbitMQ or any other queueing mechanism; there's Amazon's queue (SQS) as well. But I think Kafka scales much better than most of them, so it's a fairly good choice here. Other than that, in general we definitely need to monitor how our CPU and memory are behaving. If there is a CPU spike at certain points in time, that is something we need to look at. So across the whole infrastructure we need to keep an eye on CPU usage percentage, memory usage percentage, disk usage for Redis, disk usage for Elasticsearch; all of these are things we need to monitor. Monitoring could be done through a Grafana kind of tool, on which I can set up alerts. Let's say a particular metric has a threshold: the moment it crosses that threshold, or meets certain conditions, I could send out an alert, and the team would be notified that something is potentially wrong and needs to be looked at. This will help us make sure that in the end we achieve the NFRs we talked about, low latency and high availability.
Because let's say something goes wrong, let's say memory is utilized more than we expected: eventually it will lead to some machines going down, and that will lead to lower availability than we expected. So yeah, these are the things that we need to monitor and alert on. Now, in the next section, let's look at how this whole thing would be spread across geographies. For example, let's say there's an earthquake at one of the data centers and everything just goes away out of the blue. What do we do? Let's look at that next. Let's say we have four data centers, data center 1, data center 2, data center 3 and data center 4, which are located in different geographical regions across the globe. We want to create a topology that gives us low latency and high availability. One very simple approach is to say that DC1 is our primary, the other three DCs are our secondary data centers, and data is replicated to all three of them in near real time. That's okay, but it's not very good, to be honest, because only 25 percent of our capacity, the primary, is active, and the other three data centers are sitting idle and not really doing anything. So let's try to improve on that. What we could do instead is divide the data centers, and thus the globe, into two parts. We could say that this is region one (R1) and this is region two (R2). Now the people accessing our services who are closer to R1 will connect to R1, and the people who are closer to R2 will connect to R2. How are we able to do that? The data in a hotel management system is fairly specific to a geography, so all the hotels in, let's say, India can be separated from all the hotels in the US.
Similarly, all the rooms and all the bookings are specific to hotels, and thus specific to a geography, so we could partition the data by geography, which lets us divide the system into two halves. Now what will happen here? Let's say DC1 is the primary in R1 and DC3 is the primary in R2. All the data in DC1 is getting replicated to DC2 in near real time, so if DC1 goes down, DC2 can become active. And how do the clients connect? There will be a bunch of clients connecting to DC1 via DNS; if the link to DC1 is broken, DNS can flip and connect them to DC2. A similar thing can happen on the R2 side. So this way we are basically dividing our infrastructure into two halves, so that clients connect to the servers that are closer to them, which gives them lower latency. Now, we could go even one step further: we could divide the globe into four parts, and we could go as deep as we want to reduce the latency and increase the availability. But I think for all practical purposes, at least for a hotel booking system, this R1/R2 split is more than sufficient to give us good enough latency and very high availability. So I think, yeah, that should be it for a hotel booking system.
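The R1/R2 failover idea can be sketched as a small routing function. This is a toy model, not a DNS configuration: the `REGIONS` table and `pick_data_center` are hypothetical names, and the pairing of DC4 as the standby for DC3 is an assumption (the talk only says a similar thing happens on the R2 side).

```python
# Each region lists its primary data center first, then its standby.
# DC4 as the R2 standby is an assumption made for this sketch.
REGIONS = {
    "R1": ["DC1", "DC2"],
    "R2": ["DC3", "DC4"],
}


def pick_data_center(region, healthy):
    """Return the first healthy data center for a region, or None.

    `healthy` is the set of data centers currently passing health checks.
    This mimics DNS flipping from the primary to the standby.
    """
    for dc in REGIONS[region]:
        if dc in healthy:
            return dc
    return None
```

With every data center healthy, R1 clients resolve to DC1; once DC1 drops out of the healthy set they resolve to DC2, while R2 traffic is unaffected.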

Info

Channel: codeKarle

Views: 38,220

Rating: 4.8993134 out of 5

Keywords: System Design, System Architecture, Interview, codekarle, Code Karle, System, Architecture, Design, airbnb, airbnb system design, booking.com, booking system design, hotel, system design airbnb, system design booking.com, grokking, FAANG, system design tutorial, tutorial, system design interview question, interview questions, hotel booking system design, bookmyshow system design

Id: YyOXt2MEkv4

Length: 38min 57sec (2337 seconds)

Published: Tue May 26 2020