UBER System design | OLA system design | uber architecture | amazon interview question

Captions
hello everyone, my name is Narendra, and in this session let's talk about system design for Uber, Lyft, Ola, or Grab — any cab aggregation service. Uber's technology may look simple, but it is not. When a user requests a ride on the app, a driver arrives at their place to take them to the destination, but behind the scenes there are tons of services supporting that trip, and terabytes of data are touched for that one trip. Like many other startups, Uber started with a monolithic architecture: a backend service, a frontend (the application), a database, and a couple of real-time services. This couldn't work well once they started rolling the service out to different regions. Initially the design used Python for the application servers, the Python-based Celery framework for asynchronous tasks, and PostgreSQL as the database. Since around 2014, Uber's architecture has evolved into a service-oriented architecture; Uber now handles not just taxis but also food delivery and cargo, everything built into one system. The challenging thing for Uber, or any cab aggregation platform, is to match supply to demand. The main task for the backend is to serve mobile traffic, because without mobile phones it is pretty hard to run this service — everything works on GPS. Next, Uber's dispatch system acts like a real-time marketplace to match riders to cabs, which means we clearly need two different services in our architecture: a supply service and a demand service. Here is the complete architecture for Uber or any taxi aggregation platform; I have drawn all the major components here, but instead of jumping right into explaining each and every one, I am going to concentrate on one particular component.
That component is DISCO, Uber's dispatch optimization system. Let's talk about how it works. The dispatch system works completely on map and location data, which means we have to model our maps and location data properly. Since the earth is spherical, it is hard to do summarization and approximation just with raw latitude and longitude data. To solve this problem, Uber uses the Google S2 library. This library takes the spherical map data and divides it into tiny cells of, say, one kilometer by one kilometer; when we join all these cells we get the complete map. Each cell is given a unique ID, which makes it much easier to spread this data across a distributed system and store it: whenever we want to access a particular cell, if we know the ID we can easily go to the server where that data is present, for example by using consistent hashing based on the cell ID. The S2 library can also give you the coverage for any given shape. Say we want to draw a circle on the map and figure out all the supply available inside that circle: we give the radius to the S2 library and it automatically filters out all the cells that contribute to that circle. That way we know all the relevant cell IDs, so we can easily fetch exactly the data belonging to those cells — such as the list of supply available in each cell — and from that we can calculate ETAs and so on. So when we want to match a rider to a driver, or even just show the number of cars available in your region, all we need to do is draw a circle of about a two-to-three-kilometer radius and list all the cabs available using the S2 library. Then, with the list of all the cabs available, we need to check the ETA for each one.
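The cell-ID idea above can be illustrated with a crude sketch. This is not the actual S2 library (S2 uses a hierarchical quadtree on the sphere); it is a flat ~1 km grid standing in for S2 cells, plus a hash-based stand-in for picking the owning server:

```python
import math

CELL_DEG = 0.009  # ~1 km in latitude degrees; a crude flat-grid stand-in for S2 cells

def cell_id(lat: float, lng: float) -> tuple[int, int]:
    """Map a GPS point to the (row, col) of the ~1 km grid cell containing it."""
    return (int(math.floor(lat / CELL_DEG)), int(math.floor(lng / CELL_DEG)))

def cover_circle(lat: float, lng: float, radius_km: float) -> set[tuple[int, int]]:
    """Return the IDs of all grid cells that may intersect a circle of radius_km,
    analogous to asking S2 for the covering of a circular region."""
    steps = int(math.ceil(radius_km))  # how many ~1 km cells to scan in each direction
    row0, col0 = cell_id(lat, lng)
    return {(row0 + dr, col0 + dc)
            for dr in range(-steps, steps + 1)
            for dc in range(-steps, steps + 1)}

def owning_server(cid: tuple[int, int], num_servers: int) -> int:
    """Pick the server responsible for a cell (simple hash; a real system
    would use consistent hashing so servers can join and leave cheaply)."""
    return hash(cid) % num_servers
```

With this, "show the cabs near me" reduces to `cover_circle(...)` followed by a lookup of the supply stored under each returned cell ID.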
Say with a circle of about two kilometers radius we found a handful of cabs near the rider. Now we have to calculate the ETA, or the distance, from each cab to the rider. We could obviously compute something like the Euclidean distance, but that won't accurately give the ETA, because you can't just drive a cab in a straight line from one point to another — the cab has to follow the connected roads. So we have to find the ETA, or the distance, along the road network. When we do that, one cab might turn out to be 0.8 kilometers away by road, another 2.5 kilometers, another 3, and so on. Now we know which cabs are suitable for this particular rider, and in that order we can send the notification to the drivers; when a driver accepts, we match the rider to that driver.

Enough explanation — now let's jump into the system design and understand all the components needed for the dispatch optimization component. Here you can see the supply, which is the cabs, and the demand, which is the users asking for rides. Every four seconds or so, the cabs keep sending their location data to the Kafka REST APIs. Every call goes through the Web Application Firewall, then hits the load balancer, and then the location update is pushed to Kafka, from where it is consumed in different places. A copy of the location is also sent to the database, and another to the dispatch optimization component to keep the state machine updated with the latest location of each cab.
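The road-network ETA described above can be sketched with Dijkstra's shortest path over a toy road graph. The graph shape, the average speed, and the node names are all illustrative assumptions, not Uber's actual routing engine:

```python
import heapq

def road_eta(graph, src, dst, speed_kmph=30.0):
    """Shortest road distance (km) from src to dst via Dijkstra, converted to minutes.

    graph: {node: [(neighbor, edge_km), ...]} — a toy road network, not real map data.
    """
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d / speed_kmph * 60  # minutes at an assumed average speed
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, km in graph.get(node, []):
            nd = d + km
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return float("inf")  # destination unreachable by road
```

A straight-line distance would ignore one-way streets and missing road links entirely, which is exactly why the video insists on road-connected distance.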
Why do we need a Web Application Firewall here? The reason is simple: security. We can block requests from blocked IPs, block requests from bots, or block requests from regions where Uber is not yet launched. Then we obviously need a load balancer. Load balancers can be of different types — hardware or software — and can operate at different layers: layer 3, layer 4, and layer 7. A layer 3 load balancer works on IPs, for example all IPv4 traffic goes to one set of servers and IPv6 traffic to a different set; at layer 4 we can balance based on transport-level information; and at layer 7 it is application-level load balancing. The Kafka REST APIs provide the endpoint to ingest the location data from every cab. Say we have a thousand cabs running in a city, each sending a location every four seconds: that means every four seconds a thousand location points hit this endpoint. That data is buffered and pushed to Kafka, then consumed by different components, with a copy also going to the NoSQL store while the ride is happening; the latest location is also sent to DISCO to keep the state machine updated. There are a few more components here which we'll talk about later.

The next important component is WebSockets — and why do we need WebSockets? Unlike normal HTTP requests, WebSockets are really helpful in this kind of application because we need an asynchronous way of sending messages from client to server and from server to client at any given time. That means we should have a connection established between the cab application and the server, and between the user's application and the server.
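The location-ingestion path described above — cab pings every ~4 seconds, buffered, then pushed to Kafka — can be sketched with an in-memory stand-in for the endpoint and the topic. The batch size and field names are illustrative; the deque merely simulates a Kafka topic:

```python
import json
import collections

class LocationIngest:
    """In-memory stand-in for the Kafka REST endpoint: buffers cab pings
    and flushes them to a 'topic' in batches."""
    def __init__(self, batch_size=3):
        self.batch_size = batch_size
        self.buffer = []
        self.topic = collections.deque()  # stands in for the Kafka topic

    def ping(self, cab_id, lat, lng, ts):
        """Called roughly every 4 seconds per cab with its GPS fix."""
        self.buffer.append(json.dumps(
            {"cab_id": cab_id, "lat": lat, "lng": lng, "ts": ts}))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Push the buffered pings downstream; consumers (DB, DISCO) read from here."""
        self.topic.extend(self.buffer)
        self.buffer.clear()
```

Downstream, one consumer would write these records to the database while another updates the dispatch state machine — the fan-out the video describes.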
WebSockets keep the connection open to all the Uber applications, and based on changes happening in the dispatch system or in any component on the server, data is exchanged back and forth between the applications and the server. The supply, demand, and WebSocket components are mainly written in Node.js, since Node.js is really good at asynchronous, small-message communication and is an event-driven framework; that's why people these days use Node.js for these kinds of requirements.

Now let's jump into understanding the dispatch optimization itself and see how the dispatch system works. The Uber dispatch system is built using Node.js; the advantage is that it is asynchronous and event-driven, so the server can push a message to the application whenever it wants. The next question is how we scale these servers. In the DISCO component there are several servers running these services, and to scale them Uber uses something called Ringpop. It has a few functionalities. First, it does consistent hashing to distribute the work between these servers; consistent hashing arranges the servers in a ring-like structure (if you don't know about consistent hashing, you can check another video I have made about it — it works the same way here). Second, it uses RPC calls to make calls from one server to another when needed — I will explain in a few minutes why we need server-to-server calls. Along with this, it also uses the SWIM protocol, a gossip protocol, which helps every server know every other server's responsibility.
Every server has a specific responsibility: even though they all run the same code, the work of computing for a specific set of locations is assigned to each server. So why do we need a gossip protocol? The advantage of gossip is that we can easily add a server to the ring or remove one from it. When we add a server, part of the responsibility is redistributed to it; when we remove one, its responsibility is redistributed to the other servers. Either way, every server still knows which server is responsible for which work.

Now let's see how this setup works in real time when a user places a request for a ride. As you know, the WebSocket layer has connections to both the user and the cabs. When the user places a ride request, it lands on the WebSocket server, which hands the request over to the demand service. The demand service knows the requirements of the ride — say I need a mini, or an SUV, or a sedan, or, in the case of Uber Pool, I need two seats rather than the whole cab. The demand service then asks the supply service: I need a cab of this kind at this particular location. The supply service knows the location of the user — that is, the cell ID of the user's position on the map. As I mentioned earlier, the Google S2 library breaks the map into small cells; if the user is present in a given cell, we know that cell's ID. The demand service gives the cell ID to the supply service, and based on that ID the supply service contacts one of the servers in the ring. In consistent hashing, the responsibility is roughly equally distributed. Say we have about ten cells in total — in reality there would be millions, but ten keeps the explanation simple.
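The Ringpop sharding idea — hash cell IDs onto a ring of servers so that members can join and leave with minimal reshuffling — can be sketched as a toy consistent-hash ring. This is not Ringpop's actual implementation; `hashlib.md5` and the virtual-node count are arbitrary choices for illustration:

```python
import bisect
import hashlib

class Ring:
    """Toy consistent-hash ring: cell IDs map to servers; servers join/leave cheaply."""
    def __init__(self, replicas=100):
        self.replicas = replicas  # virtual nodes per server, to even out the load
        self.keys = []            # sorted hash positions on the ring
        self.owners = {}          # hash position -> server name

    def _hash(self, value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_server(self, name: str):
        for i in range(self.replicas):
            h = self._hash(f"{name}#{i}")
            bisect.insort(self.keys, h)
            self.owners[h] = name

    def remove_server(self, name: str):
        # Only this server's keys move; other cells keep their owners.
        for i in range(self.replicas):
            h = self._hash(f"{name}#{i}")
            self.keys.remove(h)
            del self.owners[h]

    def lookup(self, cell_id: str) -> str:
        """Server responsible for a cell: first virtual node clockwise from its hash."""
        h = self._hash(cell_id)
        idx = bisect.bisect(self.keys, h) % len(self.keys)
        return self.owners[self.keys[idx]]
```

In Ringpop, the membership list that `lookup` consults is what the SWIM gossip protocol keeps in sync across servers; here it is just a local dict.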
With ten cells, the responsibility might be divided like this: cells one and two handled by the first server, three and four by the next, five and six by another, seven and eight by this one, and nine and ten by the last. Now say the user is requesting from cell five: the supply service knows the cell ID is five, hits the server responsible for that cell, and asks it to find a cab for the rider. That server draws a circle on the map around the rider and figures out all the cells from which cabs could be found. It then makes a list of all those cabs, figures out the ETA for each and every cab using the maps/ETA service, sorts by ETA, and returns all this information to the supply service. The supply service, over the WebSocket connections, sends the request to the first few cabs nearest to the user, and whichever driver accepts first gets assigned to the rider.

Sometimes the circle for a rider in, say, cell five also covers cells belonging to other servers — say cells four and seven. In that case the supply service does not talk to each of those servers directly: it hands the request to one server, and that server internally forwards the request, using RPC calls, to the other servers responsible for computing the cabs in cells four and seven.
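The fan-out just described — one ring server gathering cabs from every covering cell, asking peer servers for cells it doesn't own, then sorting by ETA — might look like this in outline. Plain function calls stand in for the RPC layer, and all the data shapes are assumptions:

```python
def find_cabs(cells, responsibility, cab_index, eta_fn):
    """Gather candidate cabs for the covering cells and sort them by ETA.

    cells:          cell IDs covering the search circle
    responsibility: cell ID -> server name (what gossip keeps in sync)
    cab_index:      server name -> {cell ID -> [cab IDs]} (each server's local view)
    eta_fn:         cab ID -> ETA in minutes (stand-in for the maps/ETA service)
    """
    candidates = []
    for cell in cells:
        server = responsibility[cell]                  # peer that owns this cell
        candidates += cab_index[server].get(cell, [])  # "RPC" to that peer
    return sorted(candidates, key=eta_fn)              # nearest-by-ETA first
```

The supply service would then offer the ride to the first few entries of the sorted list over their WebSocket connections.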
Once those servers have figured out their cabs and ETAs, they all respond back to the supply service, and the supply service takes care of notifying the drivers and matching the demand with the supply.

Next, suppose we need to add more servers to the existing dispatch optimization Ringpop ring — say because we need to handle the traffic from a newly added city. We add two new servers, and initially their responsibility is unknown. Ringpop learns all the newly added cell IDs and distributes the responsibility for the new cells to the new servers: say, cells 11 and 12 to one and cells 13 and 14 to the other. It works the same way when we take down a server: the cell IDs it was responsible for — that is, the computation for those particular cells — are reassigned to whichever servers have capacity.

Now let's talk about geospatial design. As I already mentioned, Uber heavily uses the S2 library to break the map into cells, and that is used to easily locate the cabs near any particular rider's location. Next, about building or using maps in the application: earlier Uber used Mapbox, partly because of Google's pricing strategy, but Uber has since moved back to Google's APIs and maps. Uber now heavily uses Google's map framework: it uses Google Maps to show the map in the app, and it uses Google's Maps APIs to calculate the ETA from point A to point B, that is, from the pickup point to the destination. Earlier, Uber used to do all of this on its own: it would repeatedly trace cabs' GPS points and build the road network system itself, and it also used real-time speed and other information from the cabs to calculate ETAs. But now Uber relies heavily on Google's services.
The next thing is preferred access points. Say there is a big campus in the city: no matter how many times you book a cab from inside the campus, Uber always shows you one or more preferred access points — for example, near the entry and exit gates of the campus. How did it learn this? It learned from the fact that Uber drivers repeatedly stop near the entry and exit gates, because they can't enter the campus. Uber picked up that pattern and now automatically shows customers that pickup is only possible from those points. These are called preferred access points, and Uber uses different algorithms and machine learning to keep figuring them out automatically.

Now let's talk a little about how ETAs are calculated and why they are such an important component of Uber or any cab aggregation service. Say a rider is requesting a cab from a certain point, and the available cabs near the rider are cab 1, cab 2, and cab 3. When the user requests a cab, the demand service asks the supply service to find cabs for the rider. The supply service draws the circle, finds the three cabs that are free to take the trip, and then calculates the ETA from each of these cabs to the rider along the road network. But considering only idle cabs doesn't always work, because it can lead to bigger ETAs: say there is one more cab, very near the rider, which is about to finish a ride. Even though its trip is still in progress, it can be a better selection than any of the free cabs.
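Returning to preferred access points for a moment: the video only says Uber learns them with machine learning, so here is one naive density heuristic (my assumption, not Uber's algorithm) — snap every historically observed driver stop to a small grid cell and keep the most frequented cells as candidate pickup points:

```python
from collections import Counter

def preferred_access_points(stops, grid_deg=0.0001, top_k=2):
    """Cluster GPS stop points by snapping them to ~10 m grid cells;
    return the centers of the top_k most frequented cells.

    stops: iterable of (lat, lng) where drivers repeatedly stopped (e.g. campus gates).
    """
    counts = Counter(
        (round(lat / grid_deg), round(lng / grid_deg)) for lat, lng in stops)
    # Convert the winning grid cells back to approximate coordinates.
    return [(r * grid_deg, c * grid_deg)
            for (r, c), _ in counts.most_common(top_k)]
```

A production system would use proper clustering and filter out noise, but the core signal is the same: drivers stopping at the same spots over and over.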
Since that trip is about to complete in a few minutes and the cab is much nearer to the rider, Uber includes all the different factors — U-turn cost, turn cost, traffic conditions, and so on — when calculating the ETA, and based on these ETAs, not just idle cabs but sometimes also cabs that are already serving a trip are considered for a particular rider.

Now let's talk about the database. Earlier, Uber used an RDBMS — a PostgreSQL database — for everything: they saved profile information, GPS points, everything in it. It couldn't scale as Uber rolled out the service to more cities. So they built a new NoSQL-style database on top of MySQL, called Schemaless. These were the points they considered while building it. The first one is that it should be horizontally scalable: you can linearly add capacity in different regions to the network — here you can see multiple nodes in different regions which, all together, act as one database. (If you don't want to design your own, you can use Bigtable, Cassandra, or MongoDB instead, since they behave in a similar way.) The other considerations while building Schemaless were write and read availability. As I already mentioned, every four seconds the cabs send their GPS locations to the Kafka REST APIs; those points are sent to Kafka for different processing, written to the NoSQL store for record purposes, and also sent to the state machines. So this is a write-heavy application — and when a user requests a cab, all the latest cab information is also fetched from the DB to show to the customer on the application. That means there are tons of reads and tons of writes happening.
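The Schemaless idea — a sharded, append-only store built on MySQL, where writes add immutable versions instead of overwriting — can be sketched in memory like this. The class, shard count, and key layout are illustrative assumptions, not Uber's implementation:

```python
import json

class SchemalessSketch:
    """Append-only cells keyed by (row_key, column); reads return the latest version."""
    def __init__(self, num_shards=4):
        # Each dict stands in for a MySQL node holding one shard.
        self.shards = [dict() for _ in range(num_shards)]

    def _shard(self, row_key):
        return self.shards[hash(row_key) % len(self.shards)]

    def put(self, row_key, column, body: dict):
        """Append a new immutable JSON version; never overwrite an old one."""
        cells = self._shard(row_key).setdefault((row_key, column), [])
        cells.append(json.dumps(body))

    def get(self, row_key, column):
        """Return the latest version for the cell, or None if never written."""
        cells = self._shard(row_key).get((row_key, column))
        return json.loads(cells[-1]) if cells else None
```

Because writes are append-only and shards are independent, adding capacity means adding shards, and a node doing maintenance never blocks writes to the others — the availability properties the video emphasizes.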
So these systems should handle heavy reads and writes, and they should never have downtime — we haven't heard of Uber being down even for a minute, right? Every minute people are requesting cabs and taking trips, so we can't just say we're doing some maintenance and the database is not available. The system should always be available no matter what you are doing to it: if you are adding nodes, it should be available; if you are taking a backup of the storage, the system shouldn't go down; if you are adding indexes, the system should still be up and running. No matter what you do to the system, it should always be up. These are the points they kept in mind while building Schemaless on top of MySQL. Also, when Uber rolls out the service in new cities, it tries to build a data center near them to give seamless service; if that's not possible, the nearest available data center is selected and the data is served from there.

Now let's talk about analytics. In simple words, analytics is making sense of the data we have. Uber does that a lot, because you need to understand the customer and the behavior of the cab drivers; that's how you optimize the system, minimize the cost of operations, and improve customer satisfaction. So let's see the different tools and frameworks Uber uses for its analytics. As I already mentioned, there are tons of GPS data flowing in from the drivers and a lot of information coming in from the customers. All this data is saved either in NoSQL, or in the RDBMS, or sometimes in HDFS. If we are not saving the data directly to HDFS, we can take a dump of all the data we have in NoSQL and put it onto HDFS.
Sometimes, for certain kinds of analysis, we need the data in real time; that can be consumed from Kafka. Now let's talk about each of these components. The Hadoop platform has a lot of analytics-related tools we can make use of to run analysis on the existing data: we can take a constant dump of the data we have in the database into HDFS and use query tools like Hive to get the data we want from HDFS. Next, the maps/ETA component: we can consume the historical data along with the real-time data, retrace the previous map data we have, and then either build new maps altogether or improve the map data we already have. Also, combining the historical data with the real-time information coming from the cabs — like the traffic situation, the speed at which the cab is driving, and the road conditions — we can use this data to compute ETAs. When there is a request to the supply service, the dispatch servers contact this component for ETA calculation. Uber also uses techniques like simulation and artificial intelligence to calculate ETAs faster.

The next component is machine learning for fraud detection. There are different kinds of fraud happening in the system: payment fraud, incentive abuse, and usage of compromised accounts. Different algorithms are used to detect payment fraud, where people use stolen credit cards to offer trips at discounted prices on different forums — Uber takes care of that as well. Then there is incentive abuse, mostly done by cab drivers. Uber offers extra money when drivers finish, say, 25 rides in a day; what an abusive driver does is simulate trips using fake-GPS location apps and then claim the incentives while doing nothing, or make bookings for themselves using another mobile phone of their own.
To find these kinds of abuses, Uber uses the altitude data of historical trips: it retraces the suspect trip's altitude against the historical altitude along the same route, and that way it can easily figure out that a particular trip was fake. Uber then warns the driver that if they keep doing it, their account will be cancelled. The next thing is compromised accounts: a lot of the time, hackers using phishing techniques get hold of customers' usernames and passwords and use them to drain the money in the wallet and that sort of thing. How does Uber tackle that? It uses the historical behavioral data of the customer — the usual locations from which they book, the usual destinations, the country of booking, and so on — and applies machine learning techniques on top of this information to figure out usage of compromised accounts. Apart from that, for real-time streaming distributed analysis, we can go with frameworks like Spark or Storm to figure out trending things happening in the system.

After analytics, the very important thing is logging. We have a pretty complex system here, and since Uber uses a service-oriented architecture, all of these components are different services, which means they run independently of each other. So if we want to track or debug what's happening in each and every service, we need a strong logging mechanism: each service writes logs, and we keep forwarding all the log lines to a cluster, on top of which we can build a platform using Elasticsearch, Logstash, and Kibana. Using Kibana we can build dashboards showing the total errors that occurred, the system's health, and so on. And not just Kibana — there are a lot of different tools available, such as Grafana, which can gather the logs from different systems and show them in beautiful dashboards.
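The altitude-trace check mentioned above for incentive abuse can be sketched as comparing a claimed trip's altitude readings against the known altitude profile of the same route: fake-GPS apps often emit flat or implausible altitudes. The tolerance value and function shape are made up for illustration:

```python
def looks_fake(trip_alts, expected_alts, tol_m=15.0):
    """Flag a trip whose altitude trace doesn't match the road's known profile.

    trip_alts:     altitudes (m) reported by the driver app along the route
    expected_alts: historical altitudes (m) at the same route positions
    """
    if len(trip_alts) != len(expected_alts) or not trip_alts:
        return True  # can't line up the traces -> suspicious
    mean_err = sum(abs(a - b) for a, b in zip(trip_alts, expected_alts)) / len(trip_alts)
    return mean_err > tol_m
```

A real detector would combine many signals (speed plausibility, device fingerprints, booking patterns); altitude is just the one the video calls out.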
Now let's talk about how to handle a total data center failure. Data center failures don't happen often, but when they do, the situation is very difficult to handle. How does Uber handle it? Uber has built a backup data center that has all the components needed to keep the ongoing trips running — but Uber never copies the existing data into the backup data center. You might think: without the data, how will the backup data center help in that situation? The interesting part is that the driver's app itself acts as the data source in the event of a data center failure. Every time there is a transaction or an API call between the driver's app and the data center, the app also keeps track of what data it holds — the latest state of the trip — identified by a state digest, a kind of unique ID. Say the primary data center goes down. The next time the app calls that data center, it learns that the system is not available; the backup data center kicks in, and the driver and rider applications start talking to the backup data center instead. This data center doesn't have any information about the trips in progress; the APIs in the backup data center simply discover that they don't know the state of the trip. Then, using the state digest present in the app, all the trip data is sent from the driver's application up to the backup data center. That way the backup data center now has all the information needed to finish the trips that are happening right now — and when all this happens, the user and the driver never even know that there was a data center failure and that a backup data center is now helping to run the trip.
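The driver-as-datasource failover above can be sketched as follows: the app retains the latest trip state plus a digest ID from every call, and when the backup data center sees a digest it doesn't know, it rebuilds the trip from the app's copy. The class names and flow are illustrative, not Uber's actual protocol:

```python
class DriverApp:
    """Driver app keeps the last trip state + digest returned by the data center."""
    def __init__(self):
        self.trip_state = None
        self.digest = None

    def on_response(self, state, digest):
        # Every API response from the (primary) data center updates the local copy.
        self.trip_state, self.digest = state, digest

class BackupDataCenter:
    """Cold backup: starts with no trip data and rebuilds it from the apps."""
    def __init__(self):
        self.trips = {}  # digest -> trip state

    def handle_call(self, app: DriverApp):
        if app.digest not in self.trips:
            # Backup has never seen this trip: the app replays its own state.
            self.trips[app.digest] = app.trip_state
        return self.trips[app.digest]
```

Because every in-flight trip has a live driver app holding its state, the backup can recover exactly the trips that matter without ever replicating the primary's database.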
Now I think I have explained all the different components in the whole system design for something like Uber. Most of this material was drawn from Uber's engineering blog; I strongly suggest you go through the blog and read each and every article, because they give a lot of information about all the components here — and a lot of information I haven't covered. I will also leave a number of links in the description of this video; please go through them to better understand the system, because due to the time limit I'm not able to explain each and every component in depth, but you can always learn more from the internet. If you liked this video, please subscribe and hit the like button, and please comment and share it with your friends. I am always open to suggestions, so if there is any correction needed in this system design or any of the videos, please contact me. Thank you.
Info
Channel: Tech Dummies Narendra L
Views: 712,625
Keywords: Amazon interview question, interview questions, software interview preparation, developer interview questions, google interview question, Technical interview question, easy learn algorithms, system design, system design for uber, system design for ola, software architecture for uber, uber architecture, ola architecture, system design interview preperation, design uber, design ola, design lyft, system design for ride sharing service, system design for cab aggregation service
Id: umWABit-wbk
Length: 36min 55sec (2215 seconds)
Published: Sat Aug 11 2018