System Design Full Tutorial for Beginners | Learn System Design from Scratch | System Architecture

Hello everyone, welcome to yet another session by Scaler. In this session we are going to go through a complete course on system design. Before we get into what will be covered, please make sure to subscribe to this channel so that you don't miss our upcoming videos, and leave a like if you enjoy our content. If you have any queries, leave a comment down below and we would love to help you out.

Now let's see what will be covered. We'll start off with relational data modeling, then look into horizontal versus vertical scaling, then stateless versus stateful systems and load balancing in stateful systems. Once that is done, we'll look at consistent hashing and caching. After that we'll get into system design interview questions and mock interviews, which will be the last part of this session. So, without any delays, let's begin and see how to apply these ideas in action.

The first feature is that a user should be able to view and edit his profile, which contains the following information. This gives us a hint that there should be a user entity, so let's add one; it could also be called a user profile, so let's keep it as user profile for now. Every entity that you create will have one attribute, id, which is used to uniquely identify a user profile. The other attributes will come from the next set of requirements: on the top left a user should have a profile photo, and on the right side of the profile pic we should also show the name and contact information. So let's add these attributes to the user profile entity.

First, the profile pic. A profile pic can be stored in two places. One is in the database itself, but if you store the profile pic in the database, the amount of storage required to hold all the users' information becomes very large. So it's wise to store all the images and video links associated with a profile in a file store like S3, or in a CDN. Instead of storing the profile pic itself, I'll store a URI that points to S3, a CDN, or any other file store. The next pieces of information to store are the name and the contact information, in the form of an email, a phone number, or both.

A user can have multiple skills, and there are two options for adding them. The first option is to add skills as a list in the same user profile table, with sample values like "java, javascript" and so on. This poses one major problem: if I want to find all the users who have Java among their skills, I would have to write something like SELECT * FROM user_profile WHERE skills LIKE '%java%', which is a scan operation. If I have multiple rows in my database, I end up scanning all of them; if I have one billion rows, I scan one billion rows. This approach will not scale. The other downside is that I will not be able to auto-suggest skills to the user, because that would again be a scan over every row in the database. So what we'll do is add one more entity called skills.
Now, there are two ways of adding a skill. We could store the skill id, name, and user id directly in the skills table, but if we do that, autocomplete will still be slow, because there are multiple users and each user has multiple skills. If you look closely at the two entities, user profile and skill, what kind of relationship is there between them? A user can have multiple skills, and similarly a skill can belong to multiple users, so it's a many-to-many relationship. To model a many-to-many relationship, you should create a mapping table; let's call it user_skills. Like all our entities it will start with an id, and it will also have a user_id and a skill_id, and we can remove the user id that we had put in the skills table.

If we read the next set of requirements, we need to keep information about educational institutions and employers, and if you look closely, the attributes for educational institutions and for employers are the same. So I can create one entity called organization, with an id field and all the other fields: name, start year, end year, and let's say description. I will also need one more attribute, type, which stores whether this row is an institution or a company. For the given set of features this table works really well. The challenge is: what if I want to add other attributes later? Say I want to add CGPA for educational institutions and salary details for employers. The problem is that I would have to add CGPA as one column and salary as another, and depending on whether the type is institution or company, either CGPA will be null or salary will be null. It is not a good idea to enforce application logic in your database like this; just by looking at the database you should be able to figure out what each column means. To avoid this kind of problem, and to have a design that is extensible, we can split this into two tables: one for education and one for company. Both will have an id, a name, a start year, and an end year; type is no longer required because we have already split the table.

But this alone will not tell me which users went to a particular university or worked at a particular company. What is the relationship between user and education, and between user and company? A user can work at multiple companies, and he can also attend multiple institutions; similarly, an institution has many students, and a company has many employees. So it's again a many-to-many relationship, and we have learned from the example above that whenever we have a many-to-many relationship, we create mapping entities. So there will be one table for user_education and one for user_company. The attributes will be id, user_id, and education_id (institution might actually be the better word here), and for the other table: id, user_id, and company_id. Now if I want to find all the users who went to a particular institution, I go to the user_education table and query SELECT * FROM user_education WHERE education_id = the institute's id.
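To make the schema discussed so far concrete, here is a minimal runnable sketch in Python using SQLite. The table names follow the video's naming, but the exact columns and types are my own assumptions:

```python
# Minimal sketch of the profile/skill schema discussed above (SQLite).
# Column choices beyond what the video names are assumptions.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE user_profile (
    id              INTEGER PRIMARY KEY,
    name            TEXT,
    email           TEXT,
    phone           TEXT,
    profile_pic_uri TEXT            -- URI into S3/CDN, not the image bytes
);
CREATE TABLE skill (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE                -- one row per skill; enables autocomplete
);
CREATE TABLE user_skills (          -- mapping table for the many-to-many
    id       INTEGER PRIMARY KEY,
    user_id  INTEGER REFERENCES user_profile(id),
    skill_id INTEGER REFERENCES skill(id)
);
""")

# Find every user who knows Java via indexed joins, instead of the
# LIKE '%java%' full-table scan the video warns against.
rows = db.execute("""
    SELECT up.id, up.name
    FROM user_profile up
    JOIN user_skills us ON us.user_id = up.id
    JOIN skill s        ON s.id       = us.skill_id
    WHERE s.name = 'java'
""").fetchall()
print(rows)
```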
The next requirement is recommendations. A particular user can either give recommendations or receive them. Since this is a list, it does not make sense to put it among the attributes of the user itself; it makes sense to have a separate entity. So let's create a recommendation entity. It will have an id, the user id of the person who gave the recommendation, and a recipient_id. Do I need a mapping table in this case? A particular recommendation has just one giver and one recipient, and that information is already captured in this table, so we do not need a mapping table for recommendations. If I want to find all the recommendations I have received, I go to the recommendation table and run SELECT * FROM recommendation WHERE recipient_id = my user id.

The next requirement is connections and followers. The way connections and followers work is: if A sends a request to B, then A will be able to see all the public posts made by B, and if B accepts that connection request, then B will also see all the posts made by A. Let's design a table that can model this feature set. We'll have a connection table with an id, a user_id_1, and a user_id_2. Whenever a new connection request comes in, I add a row, and as soon as B accepts A's request, I add one more row. If I want to find anybody's followers, all I need to do is SELECT * FROM connection WHERE user_id_2 = the given user. For example, A sent a request to B, so A is a follower of B; if I want to find B's followers, I query the second column, user_id_2. And because the mapping becomes bi-directional once accepted, I have added that reverse row as well, so I can find A's followers the same way.

The problem, however, is that I'm taking twice the space I should be. To optimize on space, I can instead have one boolean column called is_accepted. If A and B are a connection, not just a follower relationship, meaning A sent a request to B and B accepted it, I keep the is_accepted flag true; otherwise it stays false. Now if I want to find all the followers of A: where the flag is true, I query for rows where either user_id_1 = A or user_id_2 = A; where it is false, only user_id_2 = A counts.
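As a sketch of the follower query this single-table design supports (assuming the column names above; the flag semantics are as described in the video):

```python
# Follower lookup on the space-optimized connection table.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE connection (
    id          INTEGER PRIMARY KEY,
    user_id_1   INTEGER,            -- sender of the request
    user_id_2   INTEGER,            -- recipient of the request
    is_accepted INTEGER DEFAULT 0   -- 1 once the recipient accepts
)""")
db.execute("INSERT INTO connection (user_id_1, user_id_2, is_accepted) VALUES (1, 42, 1)")
db.execute("INSERT INTO connection (user_id_1, user_id_2, is_accepted) VALUES (2, 42, 0)")

me = 42
# Followers of `me`: everyone who sent me a request, plus everyone I
# sent a request to who accepted it (the bi-directional case).
followers = db.execute("""
    SELECT user_id_1 FROM connection WHERE user_id_2 = ?
    UNION
    SELECT user_id_2 FROM connection WHERE user_id_1 = ? AND is_accepted = 1
""", (me, me)).fetchall()
print(followers)   # users 1 and 2 both follow user 42
```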
The last set of entities I'm going to talk about is post, comment, and like. Let's create entities for all of them. There will be a post entity with an id. Now, what is the relationship of a post with a user? One user can have multiple posts, whereas the reverse is not true: one post is written by exactly one user. So let's add user_id to the post entity. Whenever you have a one-to-many (or many-to-one) relationship, add the id of the "one" side as an attribute of the dependent entity; the "one" entity here is user, so we put user_id inside post, the dependent entity, along with some text to go with it. Similarly, the comment entity will have an id; a comment is made by a user, so it depends on user_id, and a comment cannot exist independently of a post, so it will also have a post_id. Again this is a many-to-one relationship, and the comment will also have text.

Now for likes we could have two tables. One is a post_like table with an id, a post_id (again a one-to-many relationship), and a user_id; the other is a comment_like table with the same shape: id, comment_id, and user_id. If you notice, the post_id and comment_id are really just the ids of the entity that the like was made on: in post_like the entity was a post, in comment_like it was a comment. We can club these two tables together, similar to what we saw with the education and company tables. The combined table will look like this: a like will have an id, an entity_id, and an entity_type; the entity_type can currently take one of two values, post or comment. Finally, we also store the user who liked that particular thing. If I want to add one more likeable entity, say users can have blogs as well, then I create a blog table and simply allow type = blog here too; without adding another table, my like table keeps working.
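A quick sketch of that combined table (naming it like_ because LIKE is a SQL keyword; the columns follow the video, the rest is my assumption):

```python
# Polymorphic "like" table: entity_type says which table entity_id points into.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE like_ (
    id          INTEGER PRIMARY KEY,
    entity_id   INTEGER,     -- id of the liked post/comment/blog
    entity_type TEXT,        -- 'post', 'comment', 'blog', ...
    user_id     INTEGER      -- who liked it
)""")
db.execute("INSERT INTO like_ (entity_id, entity_type, user_id) VALUES (7, 'post', 1)")

# Count likes on post 7; supporting a new likeable entity needs no
# schema change, just a new entity_type value.
n = db.execute(
    "SELECT COUNT(*) FROM like_ WHERE entity_id = ? AND entity_type = 'post'",
    (7,),
).fetchone()[0]
print(n)   # 1
```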
I hope this video gave you some sense of how to design a schema for big systems like LinkedIn. This is just a glimpse of what we teach and how we teach at Scaler. We'll keep adding videos on similar topics, so please subscribe to our channel to get notified about new videos.

Hi, I'm Mohit Yadav and I teach at Scaler. In this video we are going to talk about Delicious (del.icio.us), a site that started off as a college project and then scaled to 5 million users, and we are going to learn about vertical scaling, horizontal scaling, and the challenges that come with them. Delicious was a website that let users save bookmarks so that they could access them from any computer. It was started in 2003 by a college student and was later acquired by Yahoo in 2005. This was way back when Chrome wasn't around and there was no other bookmarking service either. By the end of 2008 the website had more than 5 million users and approximately 180 million saved bookmarks. It scaled from one laptop to 5 million users, and we'll start from the day it went live on the creator's laptop.

Let's start by understanding what happens when you type delicious into the browser. Browsers do not understand names; they only understand IP addresses, hence we need a way to convert this URL into an IP address. In the case of Delicious, DNS will return the IP address of the computer on which it is hosted. Every machine connected to the internet gets an IP address, and this IP address may change every time we reconnect; therefore we need to associate a static IP, one that does not change across reconnections, and assign it to the computer. To get the whole system up and running, we first need to purchase the domain name delicious and then map it to the static IP of this computer.

If you look closely, there are two major kinds of requests involved. One is the GET, or read, request, in the form of "get user bookmarks", and the second is the POST, or write, request, in the form of "add bookmark". Using the IP address returned from DNS, both of these requests go to the same computer, and within that computer two things are running: the application server, which listens to all requests coming to HTTP port 80, and the MySQL server, which stores all the bookmarks my users are trying to save. So far life is good: I'm running the website for free on campus internet, my database is set up on my personal computer so I'm not paying anything for resources, and I'm making money out of ads.

Soon the website became an instant hit. The user base kept growing, and the challenges kept piling up in the same proportion. Every new bookmark meant extra storage was needed to store it. The best hardware available at the time was around a 50 GB hard disk and 128 MB of RAM; this was the Pentium 4 era. To keep up with demand he bought better hardware, meaning a better laptop, a better computer; the bottom line is he bought more RAM, more hard disk, maybe multiple cores. But soon even the best available hardware was not enough to keep up with the growing demand of the business. In fact, most companies of that era focused on upgrading their hardware to solve growing business needs. This form of scaling is called vertical scaling.

The Delicious founder, however, was different. He knew that in the long run this approach would not work: not only would he be restrained by his system's resources, he might also lose valuable customer data if his laptop crashed, and the website was unreachable whenever his computer was turned off. This computer was a single point of failure. To overcome the challenges of vertical scaling, his friends suggested utilizing their own laptops and computers. The idea was to uniformly distribute the incoming requests across the available computers. This mode, where we overcome scalability hurdles using multiple servers and computers, is called horizontal scaling.
Now, we know that the client talks to DNS to get the IP address corresponding to a particular website, but the website is now running on multiple server instances. Say we have four servers running the Delicious website: which of the four should be stored in DNS? If I randomly choose one of the available servers, all the requests will go to that particular server and it will eventually die. To fix this, we have one machine whose only responsibility is to route traffic to the available instances. This machine sits between the DNS and my servers, and in the DNS registry I record the IP address of this special machine, not of any of my instances.

Now the question comes: won't this special machine also die if all the requests pass through it? The reason it can handle much higher traffic is that it is only routing the traffic it receives to one of the available instances; it is not doing any compute- or memory-intensive work, and modern machines can handle up to a million requests per second this way. With this we have solved for a high number of queries per second, but one problem still exists: a single point of failure. Just like any other computer, what if my special server dies too? To solve this, we keep a standby routing server, which does not serve or route any requests normally; when my main server goes down, we swap the main server with the standby.

But how do I swap? Should I change the IP address in the DNS registry? Doing that would take minutes at best. Can we do better? Yes: remember that we assigned a static IP to the special server. The key is to take that static IP and assign it to the passive one, the standby server. The special server we have been talking about is called a load balancer, and we will study more about it in the upcoming videos.
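Here is a toy simulation of that static-IP swap, just to pin down the idea; the IP and machine names are invented:

```python
# The DNS record keeps pointing at one "virtual" static IP. On failure
# we rebind that IP from the active load balancer to the standby, so
# no DNS update (minutes, or far longer once caches are involved) is needed.
STATIC_IP = "203.0.113.10"           # what the DNS registry stores

ip_owner = {STATIC_IP: "lb-active"}  # which box currently holds the IP

def failover() -> None:
    ip_owner[STATIC_IP] = "lb-standby"

print(ip_owner[STATIC_IP])  # lb-active
failover()                  # main server died; reassign the static IP
print(ip_owner[STATIC_IP])  # lb-standby; clients notice nothing
```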
Hello, hi, and welcome everyone to my new video on system design. I'm Mohit Yadav and I teach at Scaler. I have been fortunate enough to work on systems that saw over 10 million concurrent users and to build data pipelines that can crunch around one TB of data per day. Having led teams at Nutanix and Hotstar, which built such massively scalable systems, I'm here today to share that knowledge with you all. In this video we are going to learn about a very important system design concept: load balancing.

To understand how a load balancer works, we first need to look at how a typical request flows through the system. A request starts from a client application; this client can be a browser, a web app, or a mobile app. The client calls DNS to resolve the domain name it is trying to access. For example, if I type delicious.com in the browser, the request goes to the DNS server, which returns the IP address corresponding to delicious.com back to the client. Subsequent requests then go to the IP address returned by DNS. This IP address can either correspond to the application server capable of serving that request, or to a proxy server which acts as a gateway to route requests to the application servers.

Since DNS happens to be the first point of contact for almost every request on the internet, this system needs to be very scalable and available; the load it takes is immense compared to any other system in the world. Also, the latency, which is simply a measure of the time it takes a system to respond to a request, needs to be extremely low. What would the total latency be here? The total latency of the system is the DNS resolution time plus the request time. The request time is application dependent, but the time for DNS resolution is the same for every request on the internet; if this time is high, the overall latency of every system increases. So my goal as an engineer is to make DNS resolution latency extremely low, and to achieve this we cache the DNS mapping at various stages: in the browser, in the OS, at the ISPs, and we also replicate it across DNS servers.

There's one more important thing. If the DNS servers were placed at only one geographical location, say the US, and the requests originated from, say, India, that would add a lot of latency, because India and the US are geographically far apart. It typically takes around 200 milliseconds for a request to travel between India and the US, so every request originating in India and resolving in the US would carry an inherent lag of 200 milliseconds. That is not a desirable outcome, so DNS is replicated across the globe. And because it is replicated across the globe, updating or changing an IP address on these DNS servers is not an easy task: the change has to propagate to the entire world, which can take hours or even days in some cases. I have a wonderful resource in the description that you should read to understand how DNS works in detail. I'd also like to pause here and request you to please like and share this video, and support our free content on system design and software engineering in general.

Now coming back to the topic. In the example we considered, DNS returned just one IP address. That works for most websites in the world, but consider DNS resolution at Google or Facebook scale: tech giants with a massive user base that process billions of queries per second. In those cases, if I return one single IP address, chances are that every client's requests will hit that one instance, and no single instance can handle the traffic of the entire world, so that machine will eventually die. To avoid this, these tech giants deploy multiple IPs that client requests can resolve to: instead of returning, say, IP1, they return a list of IPs (IP1, IP2, IP3, IP4) and send this list to the client. Now it is the client's responsibility to figure out which IP to talk to. In the DNS resolution phase the client typically follows something called a waterfall model: it tries to connect to the first IP, and if that connection succeeds, it never tries IP2, IP3, or IP4. But if every client connects to the first available IP, returning a list has solved nothing; every client would still hit the same IP. So what DNS servers do is return the list in a shuffled order: user 1 might get a list like IP1, IP2, IP3, and so on; user 2 gets a slightly different order like IP2, IP1, IP3; user 3 gets IP3, IP2, and so on. This way the load is successfully distributed across my available instances.
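A small sketch of both halves of that story, the shuffled answer from DNS and the client's waterfall; the IPs and the try_connect stub are stand-ins, not a real resolver:

```python
import random

SERVER_IPS = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

def resolve(hostname: str) -> list[str]:
    # Round-robin DNS: return the same A records in a shuffled order,
    # so different clients favor different "first" IPs.
    return random.sample(SERVER_IPS, k=len(SERVER_IPS))

def try_connect(ip: str) -> bool:
    return True          # stand-in for a real TCP connect attempt

def connect(hostname: str) -> str:
    # Waterfall model: try IPs in order, stop at the first that works.
    for ip in resolve(hostname):
        if try_connect(ip):
            return ip
    raise ConnectionError("no server reachable")

print(connect("delicious.example"))
```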
Another way to do traffic routing is to use a geo-based routing strategy: people residing in India get an IP address corresponding to an India data center, and people residing in the US get an IP address for a US data center. That also bridges the latency gap, because if a user sitting in India gets the IP address of a data center in the US, there will be an inherent lag in every subsequent request. So it makes sense to have a geo-based routing strategy as well.

Now we have identified the IP address that will be responsible for answering all my client requests. It could correspond to a single server, or to a gateway server which routes traffic to my available servers. If we route the entire traffic to a single computational server, the only way to scale is to run our application on better hardware. If instead we route traffic via a gateway server, we can add more instances as and when required, and adding more instances lets us cater to a higher number of incoming requests. Another important advantage of this approach is that it allows the application servers to keep evolving without the need for DNS changes, and we have learned that DNS changes can take hours or even days. If I have to update the code of my application servers, all I need to do is bring up new application servers, say AS1, AS2, and AS3, register them with the load balancer, and de-register the older ones; I have saved myself from updating the DNS record. In the single-server case, the IP address of that one server would have changed and I would have had to update the DNS mapping as well.

Lastly, one very important benefit of having a gateway machine fronting my application servers is that it provides a layer of security. How? All the machines behind the load balancer can sit in my private network, disconnected from the internet; whenever they need any data from the internet, it comes only via the load balancer. So my load balancer is the wall that touches the internet, and my application servers talk only to the load balancer. The load balancer also takes care of encrypting and decrypting messages and securing requests via HTTPS: my load balancer layer can speak HTTPS while my application servers implement plain HTTP.

Now, there are two major types of load balancer: hardware load balancers and software load balancers. A hardware load balancer is specialized hardware that takes care of routing the requests coming to it. Most likely you will end up using a software load balancer, which can run on commodity hardware. Software load balancers are again divided into two types, layer 4 load balancers and layer 7 load balancers, where layer 4 and layer 7 refer to the OSI network layers: layer 4, if you remember, is the transport layer, and layer 7 is the application layer. At the transport layer, the only thing we get from the incoming request is the IP information, so we route traffic to the available servers based only on the source IP and port of the incoming request.
In an application (layer 7) load balancer, by contrast, we get much richer information: the headers passed in the request, the query params, the path params, and the HTTP method. But what are the pros and cons of each, and in what cases should you use layer 4? At the transport layer we only have the source IP and port to rely on, and we route traffic based on those, so the amount of computation a layer 4 load balancer does is small compared to a layer 7 load balancer, which processes the entire request. Layer 4 routing is therefore much faster, whereas a layer 7 load balancer waits for all the network packets to arrive before deciding which server should serve the incoming request. These days we have much better hardware, which drastically reduces layer 7's slowness, but there is one more reason to still use layer 4 in some cases: because layer 4 exposes only minimal information to the load balancer, just the source IP address and port and nothing else, layer 4 load balancers are slightly more secure than layer 7 ones, where the entire request can be inspected.

The last thing to look at is how we actually route traffic from the load balancer to the available application servers. A load balancer keeps a registry of all the available application servers, so it keeps AS1, AS2, and AS3, and along with that it maintains the state these systems are in: if AS1, AS2, and AS3 are available it marks them green, and servers that are not available are marked red, and requests are not routed to them. But what should the routing strategy be? Our goal is to optimize the resources we have and route requests accordingly. One very trivial, brute-force way to route traffic is to send the first request to the first server, the second to the second, the third to the third, then the fourth back to the first, the fifth to the second, the sixth to the third, and to keep doing this circular distribution across all the available servers. This kind of routing strategy is called round robin.

But as you can see, this is not always the best strategy. Why not? Suppose application server 2 has twice the RAM, twice the cores, and twice the disk compared to application servers 1 and 3; it should then ideally process more requests than the other servers. So we can shape the routing so that application server 2 gets approximately twice as many requests as the others: route the first request to application server 1, the second and third to application server 2, the fourth to application server 3, the fifth to 1, the sixth and seventh to 2, and the eighth to 3. That way the first server gets two requests, the second gets four, and the last server again gets two: application server 2 takes 50 percent of the load, while application servers 1 and 3 each take 25 percent. This kind of routing strategy is called weighted round robin.
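A minimal weighted round-robin sketch matching that example, with AS2 given weight 2; the list-expansion trick is one common way to implement it, not the only one:

```python
from itertools import cycle

def expand(weights: dict[str, int]) -> list[str]:
    # Repeat each server proportionally to its weight.
    return [server for server, w in weights.items() for _ in range(w)]

servers = cycle(expand({"as1": 1, "as2": 2, "as3": 1}))

for request_id in range(8):
    print(request_id, "->", next(servers))
# Over 8 requests: as1 serves 2 (25%), as2 serves 4 (50%), as3 serves 2 (25%).
```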
The other routing strategy worth mentioning here is the least connection routing strategy. Least connection looks at the number of active connections each application server has, and the server with the least number of active connections gets the next incoming request. For example, if there are five connections to application server 1, six to application server 2, and four to application server 3, the next incoming request goes to application server 3 and its count increases to 5; as soon as that request is processed, the count decrements back to 4. Here we have the same problem as in round robin, that we are not accounting for the capacity of each instance; to accommodate that, there is also a weighted least connection routing strategy, where you assign weights to each application server and route based on those. The last strategy I'm going to talk about is the least response time strategy. Here we look at two parameters, the number of connections a server has and the average time it takes to complete a request, and considering both we decide which server caters to the next incoming request.

Now, all the strategies covered so far work well when every app server is equally well equipped to answer my query, meaning that if I have a query A, I get the same response whether I send it to application server 1, 2, or 3; regardless of where the query is processed, the response is the same. This kind of system is called a stateless system. The last set of routing strategies, which I have not covered in this video, is hash-based routing: we take the source IP address or the URL and calculate a hash out of it. We have another video on consistent hashing that explains this routing strategy in depth, so please go watch it to learn more. Hash-based routing typically applies to stateful systems, and stateful versus stateless systems is something we are going to cover in the upcoming videos, so please stay tuned for that as well.
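Before moving on, here is a compact sketch of the least-connection strategy, using the 5/6/4 example above; keeping the counters in a plain dict is purely for illustration:

```python
# Least-connection routing: dispatch to the server with the fewest
# active connections, tracking the count over the request lifetime.
active = {"as1": 5, "as2": 6, "as3": 4}

def dispatch() -> str:
    server = min(active, key=active.get)  # as3, with 4 connections
    active[server] += 1                   # as3 now has 5
    return server

def complete(server: str) -> None:
    active[server] -= 1                   # back to 4 once the request is done

s = dispatch()
print(s, active[s])   # as3 5
complete(s)
print(active[s])      # 4
```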
Hello, hi, and welcome everyone to my new video on system design. If you are a software engineer, chances are you have heard terms like stateless and stateful systems. This video is a beginner's guide to understanding these terms, with practical insights about when and where to use each kind of system. I'm Mohit Yadav and I teach at Scaler Academy. I've been fortunate enough to work on systems that saw over 10 million concurrent users and to build data pipelines that process terabytes of data per day. Now let's get started.

Let's say you're building an online calculator and you're asked to add two numbers. Notice that every server in the system is equally well equipped to answer this query; it doesn't matter which server is asked to do the operation, every server will yield the exact same result. The reason each server can do so is that the servers are not saving any context or state, which makes every server equally favorable to answer the query. Such systems are called stateless systems, because they're not storing any state, context, or data in the application server.

Now let's consider another example. Say we are building a chatbot this time, and a user sends a query asking for restaurants near his location. The chatbot first saves this message in memory and then responds with a list of nearby restaurants. It then asks the same user to pick one restaurant from the list. Say the user chooses the third restaurant. If this request goes to the same application server, it will be able to return the relevant information, since it has context about the previous chat history. However, if the request goes to some other server, that server will not be able to comprehend the message properly, due to the absence of the previous conversation history. In such systems the context of the conversation really matters. (This context could be decoupled from the application server by adding a storage layer to save the messages.) Systems where the context or state lives within the application server are called stateful systems.

Now that we understand what stateless and stateful systems are, let's take a moment to appreciate the pros and cons of each. We'll start with stateless systems, which are much easier to scale. The reason is that each server in a stateless system is equally well equipped to handle any request, so if the load on my system increases, I can just add or remove servers and my infrastructure keeps working just fine. These systems are also resilient to server failures: because no state is maintained on the server, even if one server goes down there is no actual loss of data; we just replace the lost server with a new one and the system as a whole works as expected. However, most systems you design will need to store some state one way or another. If you want to keep the system stateless, chances are you will store this data, the state, in a separate storage layer, and querying it will involve a network I/O call. That noticeably slows the system down: fetching the data over the network from the storage layer is slower than reading it directly within the application server.

Now let's compare this with stateful systems. Since the context, or state, required to answer client queries resides within the application, the latency of a stateful system is much lower than that of a stateless one. However, these systems are not resilient to server failures: if one of the servers goes down, the data residing in the application goes down with it. Hence it makes sense to store only derived data, or data which is very short-lived. For the same reason it is also harder to scale stateful systems, because with an increasing number of requests we have to transfer some amount of state from existing servers to newly added ones; that might sound trivial, but in reality it carries many of the risks associated with distributed systems. You may have heard somewhere that stateless systems are beautiful, and that software engineers should always try to design their systems to be stateless. This is true in most cases, because we can just add or remove machines and the load gets uniformly distributed. But in some cases it is essential to store data locally in the application server, especially when the state is very short-lived.
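To make the distinction concrete, here is a contrived contrast between a stateless handler and a stateful one; the session dict stands in for the in-process chat history described above:

```python
def add(a: float, b: float) -> float:
    # Stateless: no stored context, so any server returns the same answer.
    return a + b

sessions: dict[str, list[str]] = {}  # in-memory state, local to this server

def chat(user_id: str, message: str) -> str:
    # Stateful: a different server would not have this user's history
    # and could not interpret "I'll take the third one".
    history = sessions.setdefault(user_id, [])
    history.append(message)
    return f"seen {len(history)} message(s) from you"

print(add(2, 3))                          # 5, on every server
print(chat("u1", "restaurants nearby?"))  # depends on which server got it
```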
I'll give you one very good example. Let's say you're designing a multiplayer game like PUBG. If you store the state of a particular match in a separate storage layer, the latency of the entire system will be huge. Imagine somebody shooting someone in the game: that state is stored in a separate DB, and it has to be propagated to the other users playing the same game, who read it from the DB. There will be a lot of lag. The latency of such systems needs to be very low, so we store the state of such applications in the application server itself. It's also important to note that the state of a PUBG match is very transient: we will not need this information once the match has ended. Hence in such cases stateful systems make more sense. In a nutshell, just like any other system design concept, deciding which kind of system to go for depends on the use case at hand.

Ever wondered how games like PUBG scale horizontally? We learned that designing a game like PUBG requires us to keep the state in the application server itself, and we also saw that this state is very short-lived. To understand why, let's see what information we store about the game. Every game has players, and for each player we store their location on the map, their health, their weapons, and their team information. We need this information only while the match is live; once the match has ended, we can destroy it. If we stored this information in a database, we would waste a lot of time, because we'd have to query the database, a network I/O call, every time a person requested the match state. Hence it is critical to store this information in the application server itself.

Now, for a match to work seamlessly, all players of a particular match must be routed to the same application server, or else they won't find the relevant data for that match. To achieve this, we can assign a match id and ask our clients to send this match id with every request; they can send it as a path param, as a header, or as a query param. The thing to note is that the load balancer must route traffic based on this match id. In simple words, if the first request for, say, match id 2 goes to server 1, all subsequent requests for the same match id must be routed to server 1 only. Which algorithm can distribute traffic like this? In this video we are going to discuss one such approach, look at its pitfalls, and build intuition for the upcoming video on consistent hashing. I'm Mohit Yadav and I teach at Scaler Academy. I have been fortunate enough to work at scale at Hotstar and Nutanix, where I built systems with massive read and write volumes, and one of the reasons behind this system design primer series is to impart the knowledge I've gained over the past decade and help budding software engineers get better at architecting systems.

Now coming back to the topic. Remember, our goal was to distribute the load such that all requests with the same match id are routed to the same server: if match id 2 maps to server 1, all subsequent requests for match id 2 go to server 1. The second goal is that the load should be uniformly distributed, meaning that if I have n servers, the load on each server should be equal, or nearly equal, to 1/n. And finally, I should be able to add or remove instances from the cluster, because that is the essence of horizontal scaling.
I don't want to build a system with a constant number of machines; the load on my system can increase or decrease, and depending on that, the number of servers in my cluster should be able to increase or decrease too.

If I have to think of an approach, the first strategy that comes to mind is this: I have a match id, and if I have to route traffic evenly across servers, I can simply take the match id modulo the number of servers, n. This number tells me which server processes my request. Let's take a few match ids to see how this plays out, with three servers running. Say the match id is 11: 11 mod 3 is 2, so it goes to server 2. 42 mod 3 is 0, so it is routed to the zeroth server. 34 mod 3 is 1, so it goes to the first server. Similarly, 53 mod 3 is 2 and 63 mod 3 is 0, so they are routed to the second and zeroth servers respectively. We can see that this approach satisfies my first two goals.

Now let's dig a little deeper and find out what happens when we add one more server, so that n becomes 4. Taking all the match ids from the previous case: 11 mod 4 becomes 3, 42 mod 4 becomes 2, 34 mod 4 is again 2, 53 mod 4 is 1, and 63 mod 4 is 3. Now consider the other case, where we remove one instance from the cluster: 11 mod 2 is 1, 42 mod 2 is 0, 34 mod 2 is again 0, 53 mod 2 is 1, and 63 mod 2 is again 1. What is happening here? Take the first case: match id 11, which was routed to server 2 when we had 3 servers, will now be routed to server 3 when we add a new instance. Unless we transfer the data already residing on server 2 over to server 3, our users will not be able to get their match information. And as we can clearly see from the examples above, whenever the number of servers changes, a majority portion of the keys gets redistributed to different servers, which is by no means ideal behavior. We need a way to minimize the data transfer required whenever we decide to add or remove instances.

Before winding up, I'd like to give you a home assignment; it's a math puzzle. It goes like this: initially there are m servers in the cluster, and you can add or remove servers. Say you do some add and remove operations and the final number becomes n; this n can be greater than m, meaning new servers were added to the cluster, or less than m, meaning a few were removed. What you have to do is write an equation for the amount of data that will be moved. Write your answers in the comment section, and we'll see who gets it right. I hope you got an idea of the basic approach to load balancing in stateful systems; we will look at a better approach, consistent hashing, in the upcoming video.
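You can replay the examples from this video in a few lines; this small simulation counts how many of the sample match ids land on a different server when the cluster grows from 3 to 4:

```python
match_ids = [11, 42, 34, 53, 63]

before = {m: m % 3 for m in match_ids}  # {11: 2, 42: 0, 34: 1, 53: 2, 63: 0}
after  = {m: m % 4 for m in match_ids}  # {11: 3, 42: 2, 34: 2, 53: 1, 63: 3}

moved = [m for m in match_ids if before[m] != after[m]]
print(moved)  # all five keys move to a different server
```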
How do you distribute load evenly in stateful systems? In the last video we saw one approach, modulo hashing. That approach works perfectly fine until we decide to add or remove instances: whenever we change the number of servers, a majority portion of the keys needs to be rehashed. In this video we are going to look at a better approach called consistent hashing. I'm Mohit Yadav and I teach at Scaler Academy.

Now coming back to the topic, let's look at the problem statement again. We need a distribution scheme that does not depend directly on the number of servers, so that whenever we add or remove servers, the number of keys that needs to be relocated is minimized. To understand the solution, visualize a circle with markings from 0 to a very large number, say 10^18; we'll call this circle the consistent hashing ring. Then we take two hash functions, each of which generates keys in the range 0 to 10^18; let's call them Hs and Hc. The Hs hash function is used to mark the servers on the consistent hashing ring, and the Hc hash function is used to mark the partition key, or the client id, on the same ring. To find out which server a given key should be routed to, we locate the key on the circle and then move in the counter-clockwise direction until we find a server.

Now, there are two main problems with this basic approach. First, uneven distribution: the server located at the maximum distance in the counter-clockwise direction will get the most requests. Second, weighted routing: we might have a scenario where one server has twice the resources of the others, in which case we need to ensure that the server with more resources gets more requests. To solve both problems, we can assign many labels to each server. So instead of single labels a, b, c for the three servers we had, we mark labels a0 to a9, b0 to b9, and c0 to c9 on the same circle. And if server b is twice as powerful as the rest, it can be assigned twice as many labels, and as a result it ends up holding twice as many objects on average. This way we can be sure the keys are evenly distributed and that servers with better resources hold more data.

Now let's see what happens when we remove a server. Imagine server c is removed: to account for this, we must remove all the labels c0 to c9 from the circle representing the consistent hashing ring. This results in the object keys that lived on c being reassigned to servers a and b. But what happens to the other object keys, the ones that originally belonged to a and b? The answer is nothing, absolutely nothing, and that's the beauty of consistent hashing: the absence of the cX labels does not affect those keys in any way. Removing a server results in its object keys being redistributed among the remaining servers, leaving all other keys untouched. And if you look closely, all servers are still holding 1/n of the data: earlier, with three servers, each held one third of the data, and after removing one server, each of the two remaining holds 50 percent. Consistent hashing is used to uniformly distribute data in systems like Kafka, Cassandra, and so on; it is also a very powerful tool for stateful application servers, as a way of implementing sticky sessions in the load balancer. I hope you got an idea of what consistent hashing is and how it works.
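Here is a compact sketch of such a ring with weighted labels, in the spirit of the description above; the hash function, label counts, and walking direction are my implementation choices, not prescriptions from the video:

```python
import bisect
import hashlib

def h(key: str) -> int:
    # Map any string onto the ring's key space (0 .. 2^128 - 1 here).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, labels_per_server: dict[str, int]):
        # e.g. {"a": 10, "b": 20, "c": 10}: b is twice as powerful,
        # so it gets twice as many labels on the ring.
        self._points = sorted(
            (h(f"{server}{i}"), server)
            for server, count in labels_per_server.items()
            for i in range(count)
        )
        self._hashes = [p for p, _ in self._points]

    def lookup(self, key: str) -> str:
        # Walk from the key's position to the next server label,
        # wrapping around the ring (the direction is just a convention).
        i = bisect.bisect(self._hashes, h(key)) % len(self._points)
        return self._points[i][1]

ring = Ring({"a": 10, "b": 20, "c": 10})
print(ring.lookup("match-42"))   # the same key always lands on the same server
```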
Hi, my name is Anshuman and I'm one of the co-founders of Scaler Academy. If you are new to tech and you have recently heard the term caching and don't know what it means, then this video is for you. I was fortunate to see scale at Facebook, to be part of teams that built products serving millions of queries per second and billions of users, and I'm here to share some of those learnings with you. I'll try to explain caching with as simple an example as possible. By the way, there is a bonus problem at the end of this video, so please do stay till the end to look at its solution.

Let's imagine that you want to make milk tea. One obvious ingredient of milk tea is milk itself, and milk is found in most grocery and department stores around me. So one obvious approach, whenever I'm making milk tea, is to go to the department store, fetch milk from there, and then make my tea. This to-and-fro, by the way, ends up taking a lot of time. However, one thing strikes me: I can buy a refrigerator, a fridge that stays in my house, where I can keep a few milk cartons at a time. When I'm making milk tea, as long as there is some milk in the fridge, I use it from there; if there is no milk in the fridge, I go to the department store and buy the next set of cartons. What have I done in the process? I could still have gotten milk from the department store, it's the same milk, but I created a temporary storage close to me, which is much faster to fetch from, so that I can make my milk tea faster. Plus, I have more motivation to make milk tea, because it's right there in my fridge.

Very similarly, if you look at the world of distributed systems, you always have systems that are looking for some piece of information: machines talk to each other and query each other for data. Imagine you load your Facebook profile page. That profile page needs to show your name, email id, your friends, and so on, so one machine asks for your profile information, which lives on some other machine's hard disk, and reading from a hard disk is slow. You may have noticed that when you transfer movies from a friend's laptop, those movies are written to your hard disk, and those hard disk writes are fairly slow; reading from the hard disk is similarly slow, especially if your data could be lying anywhere and you have to search through an index to find where the profile information is before fetching it. If the same information were somehow in RAM, the main memory, accessing it would be faster. But if you're talking about Facebook, there are so many people and so many profiles that all the information cannot reside in RAM; it has to live on hard disks, which is the same as me going to the department store every single time I want to make milk tea. However, if I know there are certain profile pages that are fetched very often, that are more likely to be fetched than others, then maybe I can create a "fridge" where I store the information of those pages. When somebody asks for those pages, I don't go to the department store, which is the hard disk in this case; I have that information in RAM, or some faster-access place where I can find it quicker than reading from the hard disk of a different machine. That process is called caching.
Caching is exactly what I did with my refrigerator and the department store; it exists to reduce the latency of fetching information. A cache will never be as big as your total storage, so there will always be cases when I run out of milk in my refrigerator, that is, when the information is not present in the fast storage. And this has names, by the way: when I find data in my cache, it's called a hit; when I do not find the data, because it is not present, and therefore have to go to the department store, or go and read from the hard disk, that's called a miss. Whenever you find data in the cache it's a hit; when you don't, it's a miss.

Caching is something you encounter on a day-to-day basis. For example, you might have noticed that when you load a website for the first time it's usually slower, but when you refresh, it loads faster. Sometimes when your internet is barely working you keep refreshing, and after a while the website loads; your internet didn't get faster, that's not why it loads. It loads because once an image on the website has loaded, your browser has gotten its carton of milk: it caches the image in its own memory, in Chrome say. The next time you load the same website with the same image, the browser doesn't fetch the entire image from the remote site; it checks whether it is the same image, and if so, fetches it from local memory. I don't have to go over that slow internet to fetch the image, and therefore caching has significantly reduced the latency of fetching that data. That is caching in your browser, and systems internally do caching in various forms too.

Now, one simple question. If you haven't seen my CAP theorem video, please go and watch it, because this question builds on it slightly. Imagine you are building a very consistent system. You would want that, if you're caching, the cached data is exactly consistent, exactly the same as the data present on your hard disk, for whatever entries it holds. If you're building a consistent system and you're doing caching, then when I ask you to update, how do you go about updating such a system? In the context of the department store and the milk tea example: let's say you want your milk tea to always use the latest milk present in the department store. How do you guarantee that the milk you have in your fridge right now is exactly the same milk that is present in the department store? If you look at how you would do it, there are only two ways. One is this: when you tell me "here is new milk to restock with", or "here is new data", say my profile name changes for some reason, then whenever data pertaining to me is updated, it does not go directly to the hard disk. Imagine this is my hard disk, or other storage, and here is the cache. Whenever an update operation comes, it goes through the cache: first I check whether this entry is present in the cache, that is, is my profile cached? If it is cached, I first update the entry in the cache, then I update the entry on the hard disk, and only when both of them have been updated do I return success. This is called a write-through cache.
Write-through means that whenever new milk comes to the department store, it goes through me, through my house; I do a verification, and only then do I let the milk pass. The other way is this: if the database is aware of where my caching system is, then whenever there is a profile update, the database tells the cache to throw away whatever information it has about my profile, because it is now out of date; only once the write has been done and the invalidation has been done do you return success. In the context of department stores and milk: whenever my department store gets new milk, they call me up and say, "hey, please throw away the milk in your refrigerator, it's old now, we have fresher milk." That way I make sure I'm always cooking my milk tea with the newest milk in the market, and I stay consistent. That was all from me for today; I hope caching made sense to you and you understood why caching happens in systems.
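Pulling the whole video together, here is a small sketch of the read path (hits and misses) plus the two write strategies, with dicts standing in for the fridge and the department store:

```python
store: dict[str, str] = {}   # the department store / hard disk
cache: dict[str, str] = {}   # the fridge / fast memory

def read(key: str) -> str:
    if key in cache:         # hit: the milk is in the fridge
        return cache[key]
    value = store[key]       # miss: go to the department store
    cache[key] = value       # restock the fridge for next time
    return value

def write_through(key: str, value: str) -> None:
    cache[key] = value       # update the cache first...
    store[key] = value       # ...then the store; only now report success

def write_with_invalidation(key: str, value: str) -> None:
    store[key] = value       # update the store...
    cache.pop(key, None)     # ...and tell the cache to throw the old milk away

write_through("profile:me", "name=Anshuman")
print(read("profile:me"))    # hit
```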
Hi, my name is Anshuman and I'm one of the co-founders of Scaler Academy. If you're in tech, chances are you have heard about the CAP theorem. Today we'll talk about what C, A and P in CAP theorem stand for, why it works the way it does, and what the CAP theorem means in the modern ecosystem. I was fortunate to see tech at scale at Facebook and was able to design products that served millions of queries per second and billions of users, and I'm here to share a few of those learnings with you. So first, why do we even need to learn the CAP theorem? It is one of the most basic principles of distributed storage, so if you think about building or working on distributed systems in the future, the CAP theorem is a must-know. Before I explain it, and I'll try to do that with as simple an example as possible, I want to give credit to Kaushik, whose blog I have borrowed a few ideas from; thank you, Kaushik, for coming up with a very innovative example.

To elaborate the CAP theorem, let's take a real-world example. Imagine I am an entrepreneur looking for ideas, and I feel people need reminders: let's build a service that just stores reminders. Imagine I name my company ABCD and I get this very premium number, 1234 1234. People can call me on 1234 1234 and tell me what they want to remember, and when they call back, I tell them what they wanted to remember. It seems to me this is a great idea. So I set up this line and sit in the office, and every time somebody calls, say I get a call from X and X says, hey, please remember that I have a flight tomorrow at 9 pm, I say great. Next time they call and ask what they wanted me to remember, I tell them: you have a flight tomorrow at 9 pm. I charge people a very reasonable amount of money to store this information, there's a big market, the service takes off, everybody starts using it, and then I start to hit a problem. You'll notice most systems start to hit problems when they hit scale. I realize I'm getting a lot of calls and I'm not enough to handle all of them myself. So I think: sure, let me get my wife involved as well. Now this number reaches both me and my wife; both of us sit with our diaries, and a call gets routed to whichever of us is free. When someone asks me to take a reminder, I quickly note it down, and when they call back, I respond with the reminder. Great, I have suddenly doubled my capacity: if I could take 1000 requests a day before, now I can take 2000, because my wife is helping as well.

Very soon I get my first unhappy customer. Mr. X calls me one day and says, hey, can you please remind me when my flight is? I look through my diary and realize I don't have an entry corresponding to Mr. X, and I tell him so. Mr. X is very disappointed, very angry, and disconnects the call. Now I start to think about what might have happened: either Mr. X was delusional, or there was a fault at my end. And then I see the problem. It is possible that when Mr. X wanted to store his reminder, he reached my wife, and she stored the entry: X wants a reminder for a flight at 7 pm tomorrow. That entry is not with me, because that call never came to me. But when X wanted to know when his flight was, he ended up reaching me instead of my wife, because he only knows the one number, 1234 1234, and I don't have the entry. That is why X is unhappy. This is what is called a data consistency problem: a consumer might have stored some data with us, and when they ask for it back, there is no guarantee they'll get it, because it depends on where the request lands. If the request goes to my wife, they get their data back; if it comes to me, they don't, even though my wife's diary has that entry. This is a big problem; if not solved, my company will die.

So I think and think, and I come up with a solution. When X calls me to say, please note down that I have a flight tomorrow at, say, 6 pm, I'll note it in my diary, and I'll also tell my wife to note down the same entry: tomorrow, 6 pm, flight. Only when both of us have written the entry in our own diaries do I return success and tell Mr. X I have noted it down. That way there is a guarantee that both diaries, mine and my wife's, have the exact same entries: consistency of data. Great, my consistency problem is solved; now when X calls either of us, we can respond with the right data.

Things were going great, until one day I hit another problem: my wife calls in sick. She's not keeping well, she's not in the office; for the time being, she's gone for the day. At that very moment another gentleman, Y, calls me and asks me to note down an entry. I say, okay, sure, what's your entry? He tells me: I have a flight tomorrow at 3 pm. I say great, let me note it down, and let me also go and tell my wife to note it down. But my wife is not here; she can't note it down.
Therefore I'll have to tell Y: hey, I can't note down your request, your request has failed. Which means for the entire day I can't take in any new requests. That is an availability problem: my system is not available to take all kinds of write requests. In the case of software systems, think of Facebook saying that for an entire day, or an hour, or a few minutes, you can't make any new posts or upload any new photos; that is an availability problem. So again I'm very stressed; I don't want my business to fail. I think and think and come up with another solution. When my wife is not here, whatever new entries I get, I note down: Mr. X asks me to note something, I note it; Mr. Y asks, I note it. And when my wife returns, because I want both diaries to be identical, I tell her: you will not take any phone calls until you have copied all the additional entries I made in the previous day, or the previous two days if she was gone for two. First she catches up on all the additional entries, and only then does she start taking calls. My problem is solved: if X calls and says, please note down that I have a flight tomorrow at 8 pm, even if my wife is not in the office I can still note the entry, so I am still available. And since my wife catches up with whatever new entries are in my diary before answering any phone call, we are also consistent. Great, things are going super well.

And then comes another problem. One day my wife and I have a fight and we stop talking to each other. Imagine I get a phone call: X says, hey, please note down that I have a flight tomorrow at 9 pm. I can't sync it with my wife, because I'm not able to talk to her. If I note it down anyway and tell X that I've noted his entry, my system becomes inconsistent: my diary now has different entries from my wife's, and she's still taking calls. When this network partition happens, when I'm not able to talk to my wife or systems can't talk to other systems, I have to make a choice: do I want to stay consistent, or do I want to stay available? If I want to stay consistent, then when Mr. X calls and asks me to note down an entry, I have to say: sorry, I can't, my system is down, it's not accepting any new write requests. If I want to stay available, I can take down the entry, but then I become inconsistent. If I choose consistency, I become unavailable; if I choose availability, I become inconsistent. And that is exactly the CAP theorem. The CAP theorem says that out of consistency, availability and partition tolerance, you can only guarantee two at a time; you can't promise all three at once. Which means that when a network partition happens, I have to choose between consistency and availability; I can't have both.
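Here is a toy sketch of that choice in code, reusing the two-diaries story. Everything here (the class names, the mode flag) is invented purely to illustrate the trade-off, not a real replication protocol.

```python
# Toy model of the CP-versus-AP choice during a network partition.

class Diary:
    def __init__(self):
        self.entries = {}
        self.reachable = True  # set to False to simulate a partition

class ReminderService:
    def __init__(self, mode):
        self.mode = mode      # "CP" (stay consistent) or "AP" (stay available)
        self.mine = Diary()
        self.partner = Diary()
        self.backlog = []     # writes the partner still has to catch up on

    def note_down(self, caller, reminder):
        if self.partner.reachable:
            self.mine.entries[caller] = reminder
            self.partner.entries[caller] = reminder  # both diaries match
            return "success"
        if self.mode == "CP":
            # Consistent but unavailable: refuse writes while partitioned.
            return "sorry, not taking new requests right now"
        # AP: available but (temporarily) inconsistent.
        self.mine.entries[caller] = reminder
        self.backlog.append((caller, reminder))
        return "success, will sync with the other diary later"

svc = ReminderService(mode="AP")
svc.partner.reachable = False          # the fight: partition begins
print(svc.note_down("X", "flight tomorrow 9 pm"))
```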
Now I have two bonuses to add here. Most systems that are available and partition tolerant, called AP (A for availability, P for partition tolerance, C for consistency), can still make sure that they become eventually consistent. What does that mean? It means that eventually my diary and my wife's diary have the same entries. How do I make sure of that? Whenever we start talking again, we can exchange notes. Or imagine there's a clerk I've hired who takes all the new entries my wife made in the last one hour and tells me what they were, so I note those down; he also asks me what entries I made in the last one hour, goes back to my wife, and tells her. There might be periods in between where we were inconsistent, but eventually we become consistent. That is eventual consistency.

There is one more extension to the CAP theorem, made not so recently, about 10 years back, called the PACELC theorem. It states: when a network partition (P) happens, when my systems can't talk to each other, then sure, you have to choose between availability (A) and consistency (C), you can't have both; else (E), when there is no partition and my systems can talk to each other, I have to choose between latency (L) and consistency (C), and I can't have both of those either. Notice that when I made the decision to stay consistent, making sure on every single call that my wife also notes down the entry before I return success, I am actually waiting for my wife to note down that entry. She could be doing something else at that time, answering some other call, and then the entire request might take a lot of time: the latency becomes high because I want consistency. If I were willing to let go of consistency, just noting it down myself and letting my wife catch up later, latency could stay really low. So even when there is no network partition, you balance latency against consistency; that's what the ELC part is. It's an extension to the CAP theorem, because the CAP theorem does not talk about latency.
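A small sketch of that "else" half of PACELC, with invented names and artificial delays: synchronous replication buys consistency at the cost of latency, while asynchronous replication keeps latency low and lets a background "clerk" reconcile the replicas later, which is exactly eventual consistency.

```python
import time

class Replica:
    def __init__(self, write_delay=0.0):
        self.data = {}
        self.write_delay = write_delay  # simulates a busy partner

    def write(self, key, value):
        time.sleep(self.write_delay)
        self.data[key] = value

def write_consistent(primary, partner, key, value):
    """Wait for the partner too: replicas match before returning,
    but the request latency includes the partner's delay."""
    primary.write(key, value)
    partner.write(key, value)

def write_low_latency(primary, backlog, key, value):
    """Return right after the local write; the partner catches up later."""
    primary.write(key, value)
    backlog.append((key, value))

def clerk_sync(partner, backlog):
    """The 'clerk': replays missed writes so the replicas converge."""
    for key, value in backlog:
        partner.write(key, value)
    backlog.clear()

mine, partners = Replica(), Replica(write_delay=0.2)
backlog = []
write_low_latency(mine, backlog, "X", "flight 9 pm")  # fast, briefly inconsistent
clerk_sync(partners, backlog)                         # eventually consistent
assert mine.data == partners.data
```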
I hope that made sense and was helpful. The CAP theorem is one of the most basic and most useful theorems you'll come across when designing distributed systems. I'll keep doing more videos like this; if you liked this one, please like it and subscribe to the channel. Thank you so much.

Hello everyone, welcome to this session on system design. The primary objective of this session is to tell you what kind of questions you can expect in a system design round and, more importantly, how to answer them. We'll take one example and discuss a framework for approaching such questions. So let's get started. First, a brief on why these system design topics are important: they matter in two areas, your day-to-day job and your interviews. It especially matters for people with more experience; people coming in with, say, two to three years of experience are expected to have some knowledge of system design. System design encompasses two main components: low-level design, where you'll be expected to write the schema, design the APIs, or design the classes and interfaces, that is, how you would structure that particular code; and the other is having a broader perspective of how different components are going to behave. Typically you get around one hour, maybe a bit more, to discuss a particular system in detail during your interview, so it's best to follow this framework. I followed this framework for Facebook, I was able to crack Facebook with it, and when you apply, your recruiter will tell you about much the same framework. So without wasting more time, let me introduce the framework; if you have any questions, type them in the comment section and I'll read them from there.

The framework has five steps. You always start with gathering requirements: your first step should be to gather, or ask clarifying questions about, the requirements. Alongside each step I'll give you one example and correlate what the ideal answer or output should look like. Suppose I appear for a Facebook interview and my interviewer asks me to design a URL shortener. Even if the problem is 100% clear to you, you must ask some clarifying questions. Most of the time the problem will be vague enough that you'll have to; but even if it's crystal clear, reiterate the problem in your own words, so that you and the interviewer are on the same page. For a URL shortener, my clarifying questions could be: do I need to add personalization? Do I need support for analytics? Do I need to support expiry of a particular long URL? If somebody uses my service and I give back a short URL, for how long should I keep that short-URL-to-long-URL mapping? That is also a business decision, and it will dictate how my design looks. So it is important to ask such questions in the gather-requirements phase.

Once you have a sense of the requirements you'll be building, write them down: on paper, in a Google Doc, or if you're on a whiteboard, freeze the requirements on the whiteboard. It is important not to spend a lot of time on this part; one, or at maximum two to three minutes. After that, you spend time estimating the scale. There are multiple subsections to estimating scale, and just like gathering requirements, this is an area where your interviewer works with you: you state certain assumptions and the interviewer says yes or no, or you ask the interviewer questions and they answer. For example, if you don't know what the starting point is, just ask how many daily active users there are going to be.
Since I was asked to design a URL shortener, I asked my interviewer how many daily active users are going to use it. Let's suppose my interviewer said there will be 1 million daily active users. In the estimate-scale step, what I need to do is convert that 1 million daily active users into how much storage is going to be required. I also tell the interviewer that I am assuming 90% of my users will just read data and only 10% will write data. This is an assumption I'm making; I express it to my interviewer, and based on whether he says it's fair or not, I adjust the numbers. But for our system, let's say these are the numbers. 10% means 1 million times 10%, which is around 100k users generating data on a daily basis. What data are they generating? They give me a big URL, I do some transformation magic, and I return a small URL; the big URL might be, say, 100 characters and the short URL 10 characters. Let's suppose storing one full record takes 1 KB of space. Then the bare-minimum daily space my system needs is 1 KB per record multiplied by the number of users generating data per day, which is 100k. That is 10^5 KB, around 100 MB of data daily. A viewer suggests it should be 1 million times 1 KB: it won't be 1 million, because only 10% of my users are generating data; only the users actually using my service to shorten a URL count, hence the 100k. So I'll be generating around 100 MB of data a day, and for one year I multiply by 365, which gives around 36.5 GB of data per year that my service is going to produce. Why am I not taking reads into consideration for storage? Because I only have to store something when a person requests a short URL for a long URL. When a person wants to resolve a short URL back to the big URL, I go to storage, read the corresponding big URL, and return it; no new storage is needed for that. So in the second step, estimating the scale, the most important thing is to estimate how much storage you are going to require.
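As a sanity check, here is the same back-of-envelope arithmetic in a few lines of Python; the inputs are exactly the assumptions from the discussion above.

```python
daily_active_users = 1_000_000   # assumed DAU
write_fraction = 0.10            # 90% readers, 10% writers
record_size_kb = 1               # ~1 KB per stored mapping

writes_per_day = daily_active_users * write_fraction    # 100,000
daily_mb = writes_per_day * record_size_kb / 1000       # ~100 MB/day
yearly_gb = daily_mb * 365 / 1000                       # ~36.5 GB/year
print(writes_per_day, daily_mb, yearly_gb)
```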
This roughly 36 GB number tells me that a single instance can hold the data: the entire data for one year fits on one machine. If it had been around 1 TB or more, I would have required multiple machines to store it, and since I'd need more machines, I'd need some sort of sharding. In our case it is just around 36 GB a year, so one instance can handle it. There are still problems, and we'll come to them, but I'm just giving you the numbers we need to take care of. Coming back to the important parameters for estimating scale: the first was storage; the other important parameter is to identify the TPS of your system, meaning how many requests you are getting per second. That includes write TPS, where a person gives you a long URL and you need to return a short URL, and read TPS, where a person gives a short URL and you have to return the long URL. TPS is transactions per second; some people call it QPS, queries per second, or reads per second and writes per second. Whatever you call it, the crux is that you need to find out the number of read requests and the number of write requests. Because if these numbers are very high, suppose you are getting 1 million reads per second, that means you need some caching enabled and some replication enabled, because one server will not be able to cater to 1 million I/O calls; there is a network bandwidth limitation as well. This is why you should estimate the TPS too. I hope that makes sense.
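And here is the matching throughput estimate under the same assumptions; the ten-times peak factor at the end is a common rule of thumb, not something fixed by the problem.

```python
requests_per_day = 1_000_000     # 1M DAU, ~1 request each per day
seconds_per_day = 86_400

avg_qps = requests_per_day / seconds_per_day   # ~12 requests/second
read_qps = 0.9 * avg_qps                       # ~10 reads/second
write_qps = 0.1 * avg_qps                      # ~1 write/second
peak_qps = 10 * avg_qps                        # rule-of-thumb peak load
print(avg_qps, read_qps, write_qps, peak_qps)
```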
Now let's move to the third important part: once you have estimated the scale, you look at your design goals, what you should solve for in this particular problem. To give you an idea of typical design goals: if you haven't heard about CAP, go to the system design playlist on our YouTube channel and watch the primer content. The CAP theorem states that in a scenario of partition tolerance, you can have either consistency or availability. So one design goal is to identify whether you are designing a very consistency-heavy system or whether you are okay with eventual consistency and prefer availability in that scenario. Let's talk about the system we are building: do we need strong consistency? If I have to explain consistency in two lines: say there are two data nodes, node 1 and node 2, and initially the value of x is 5 in both. Then a write request updates x to 15, and that value is written to data node 1. Now if I fire a read request, depending on whether it goes to data node 1 or data node 2, I get 15 or 5. In a strongly consistent system we always get 15 as the response; in a weakly or eventually consistent system I may get 15 or 5, depending on which node answers. So, do I require that every data node in my system has the exact same data at all times? The answer is no, because there are very few updates: I'm only adding entries. Once a mapping is added into the database, I'm not modifying it, so either the value won't be there yet, or if it is there, it is the same everywhere. I don't need strong consistency; I can live with eventual consistency. Do I need availability? Suppose I've shortened a particular URL and sent it to millions of people: say it was a Google I/O event, or a link to a speech by Donald Trump or Modi, shortened with my service and distributed everywhere. If those people are not able to reliably access my service, it is a massive hit on the credibility of my website, so I cannot let go of availability. In the design-goals step you identify this trade-off: consistency versus availability in the presence of partition tolerance (and in most cases you do need partition tolerance). The second trade-off is latency: what kind of latency are you looking at? Different systems have different latency requirements. For the URL shortener, if somebody types a shortened URL into the browser and tries to reach the bigger URL, and the lookup itself takes a lot of time, that is a bad experience; the URL might represent some event, and I'm further slowing down the service of the end provider. Since I don't want that, I need the system to be very low-latency. But people might think such latencies are needed in every scenario, and that is not the case. Batch systems, for example, can have higher latency: when you sign up for Netflix, it asks which movies you like and, based on that, computes recommendations for your Netflix home page; the same thing happens with Spotify. For those systems the latency can be higher, still bounded by seconds, maybe 10 or 15, compared to simply signing up with your email and password. So depending on the system you're building, you might have varying levels of latency.
By varying levels of latency, I mean you decide whether you are building a batch system or a real-time processing system. I'm taking a very simple example, but I hope you see that the same methodology lets you solve really complex problems as well. By the way, if you're interested in very complex problems, we run master classes; they are completely free, and you can join us. It's a slightly longer format, so we get to pick topics more difficult than designing a URL shortener; for example, how YouTube delivers its data to millions of globally distributed customers, which I'll be covering on Sunday at 5 pm. There are a couple more master classes listed, on data structures as well as system design.

Cool, coming back to our framework. Before I tell you the next part, think about how you typically build anything: if you have to run a marathon, you don't directly participate in one; you start by walking longer distances, build up stamina, run five kilometers, then ten, and then maybe the 42-kilometer marathon. Similarly, in system design, whenever you're designing a particular system, first assume that everything, the storage requirements and the TPS requirements you identified in step two, can fit in a single machine. Why make this assumption? Because if you keep everything in a single box, you retain full flexibility: all the data fits in one machine, which means you can use an RDBMS, and there is no need for optimization yet. We then identify the bottlenecks and iteratively build on top, removing them one by one. So the fourth step is: design for a single server. When designing for a single server you need to take care of the DB schema, and it should be an RDBMS-based schema. Do not jump directly to any NoSQL system; that is a major red flag. Even if you have worked thoroughly with Cassandra or any other database, jumping straight to it is a red flag; always start with an RDBMS-based schema. Once you have that, you move to designing the APIs and writing the business logic. Let's do the same for our case, starting with the APIs.
There will have to be a POST API, something like /shorten, with the URL and the expiry in the body; that is one API. The second is a GET API: GET /<short_url>, with no body, which returns the long URL as the response. These are the two main APIs. I won't spend much time on how to design REST APIs; what makes a URL RESTful and what doesn't is a topic for a separate session. Then you design the schema for this problem, which is a simple one-table schema: a short URL, a long URL, and a validity, plus which user generated it, maybe a created-at timestamp, and perhaps three or four other columns. But the main columns are the short URL, the long URL and the validity; using these three columns I can answer both of the APIs I listed.
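Here is a minimal sketch of that single-server design, using SQLite as a stand-in for the RDBMS. The function names, and the fact that the short code is passed in from outside, are illustrative only; generating the code is the business logic discussed next.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE urls (
        short_url TEXT PRIMARY KEY,
        long_url  TEXT NOT NULL,
        validity  TEXT              -- expiry supplied by the client
    )
""")

def shorten(short_url, long_url, validity=None):
    """Backs POST /shorten with body {url, expiry}."""
    db.execute("INSERT INTO urls VALUES (?, ?, ?)",
               (short_url, long_url, validity))

def resolve(short_url):
    """Backs GET /<short_url>: returns the long URL, or None."""
    row = db.execute("SELECT long_url FROM urls WHERE short_url = ?",
                     (short_url,)).fetchone()
    return row[0] if row else None

shorten("a1B", "https://example.com/some/very/long/path")
print(resolve("a1B"))
```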
So, a quick checklist: I have made the DB schema and the APIs; the only thing remaining is the business logic. If you think about it, this problem can be solved in a variety of ways. One way is to have an id field, an integer that gets auto-generated (auto-incremented, to be precise) whenever you store something in the database. It starts with 1, 2, 3, 4, 5, and whatever that number is becomes the shortened URL. It won't grow beyond a certain limit; we are generating only 100k records per day and some 36 GB of data per year, so the shortened URL won't go beyond six, seven, maybe eight digits. Whatever big URL you get, you just return the id, and that is the shortened URL. But that is not a good approach, and I hope everyone can see why: relying on the database's auto-generated id is easily guessable. For example, suppose I used your API to generate a small URL and you returned me 54. That tells me there are 53 other entries already present in the database, so I can just query 52, 53, and find out who else is using your service. It is easily guessable, so that is out of the picture. So the first approach was relying on the auto-generated id; the second approach, which Rahul has also pointed out, is using a UUID or a hash function like MD5 or SHA-256. By default we cannot use a UUID. Why? It is 32 hex characters (128 bits); we don't need anything that long. MD5 is also 128 bits, and SHA-256 is 256 bits; I don't require that many characters. Let's look at how many URLs I can represent with a single character. I have 26 choices of alphabet letters plus the 10 digits 0 to 9, so around 36 choices with just lowercase alphanumerics. If I also include uppercase, that adds another 26 characters, bringing the total to 62. So with a single character I can represent 62 big URLs: suppose my big URL was some long string; I transform it, say I'll represent this big URL with "a", and store that mapping in the database. How many URLs can I store with two characters? The first character can represent 62 and the second can also represent 62, so it is 62 into 62, which is 62 raised to the power 2. With three characters, 62^3, and so on: with n characters, 62^n. Even with just 5 characters, 62^5 is about 916 million, which will do the job; in case I need more, I can add one more character, 62^6, and I will get what I need.
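Here is the counting argument as a small sketch: a base-62 encoder over the 62 alphanumeric characters, along with the numbers just mentioned.

```python
import string

# 10 digits + 26 lowercase + 26 uppercase = 62 characters
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_base62(n):
    """Map a non-negative integer to its base-62 string."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

print(62 ** 5)             # 916132832: ~916 million 5-character codes
print(encode_base62(54))   # the auto-increment id 54 from the example
```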
Shivam Dubey says it should be 62 into 61. No, I can use "a" and capital "A" again; the other character position allows exactly the same repetitions. In either case, 61 or 62, it doesn't really matter: the point I was making is that you can create an exponential number of combinations with a limited set of characters. So I have solved one part of the problem: which characters to use, and what my strategy is for representing a million or a billion long URLs. What I have not solved is the guessability factor, meaning that if I return someone "aa", or say "ab", that tells them there must already be a URL generated with "aa". So what can I do to remove this guessability? I can pre-compute all the codes, pick up the first million, and push them into a hash set; a hash set will randomize the order. Then I just iterate over the keys, and once I have handed out a particular code, I save it in the database. That ensures the guessability factor is removed, and it is scalable as well. That was point number four: design a system that can work on a single machine. Now, to add the scalability factor, what can the bottlenecks be? One server may not be enough to cater to a higher TPS. In that case I can have multiple servers running this same piece of logic, with each server responsible for generating its own set of codes. Suppose I am running three servers: server one, server two, and server three. I give one million pre-generated codes to the first, one million to the second, and one million to the third; as and when I add new servers, I hand each one its own million pre-generated codes. While assigning these one-million batches, I can ensure there are no collisions; and since the codes reside in the memory of the server, I am not introducing any new latency. I have designed a low-latency system that can scale to millions of users.
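A sketch of that key hand-out scheme follows; in practice the codes would come from a base-62 encoder and each batch would hold a million codes, but tiny made-up numbers are used here so it runs instantly.

```python
import random

def make_batches(total, batch_size):
    """Pre-compute codes, shuffle away guessability, cut into batches."""
    codes = [str(i) for i in range(total)]   # stand-in for base-62 codes
    random.shuffle(codes)                    # plays the 'hash set' role above
    return [codes[i:i + batch_size] for i in range(0, total, batch_size)]

# e.g. three servers with ten pre-assigned, collision-free codes each
pools = dict(zip(["server-1", "server-2", "server-3"], make_batches(30, 10)))
print(pools["server-1"])
```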
So that was the basic idea: you go through the first four steps. You gather the requirements, estimate the scale, find out what design choices or trade-offs you need to make, then design a system that works on a single server, which involves writing the DB schema, the APIs, and the core business logic. Once that is done, you scale for the numbers you identified in the estimate-scale step. And once you have discussed all the approaches and all the problems, you go back to the requirements you froze at the start: go through them one by one and validate that whatever you've designed works and that all the features you discussed are taken care of. That was the basic idea; now let's jump into the questions, and I'll stop the screen share.

One viewer asks: while estimating the scale, shouldn't I consider the reads as well, since the numbers of users doing reads and writes are different? That is not required for storage: only writes to the database consume storage. Reads you consider while calculating the TPS, the throughput on your system; there you do have to account for read requests as well. Another question: the expiry of a URL, would it be 24 hours, one week? Availability is a must. That's a valid question which you should definitely ask when appearing for interviews. I didn't clarify it while designing, but I had taken the expiry from the client itself: whoever generates the URL can tell me when they want it to expire, and by default I can keep the link alive for, say, years. But you should attach a fixed expiry to whatever you add to the database, and this is a key concept if you think about it closely: expiry can save you from having to add multiple shards. Why? Because without expiry your data keeps growing, linearly or exponentially depending on how people interact with your service; but if you truncate older data, the same single database instance can keep serving requests. So it is genuinely important to clarify with your interviewer what their expectation around expiry is. If you are asked to design a URL shortener, expiry will most likely not be an explicit ask; it is something you, as the interviewee, should have clarity on and raise yourself.

Batch versus stream: when to use which? That's a very interesting question, and it depends on your use case. In some cases batch processing makes sense. One example: suppose we have a website like Cricbuzz, and Cricbuzz starts a game where people can comment live while a match is going on; after the match is over, they find whoever has the maximum number of likes on a comment and reward that user. Here you have both options: you can stream all the incoming comments and aggregate likes per comment as they arrive, or, since it's a one-off activity that happens just after the match, you can use a batch system. I would use a batch system here, solely because the activity happens only after the match. Or look at the IRCTC website: it goes down after 11:30 pm every day because they want to reconcile everything; that is a batch process. Salaries, for example, are a batch process. Where would I need a stream?
I think a stream processing engine makes sense, for example, if I'm running a cloud deployment and I want to check in real time whether there are any security anomalies in it, or for any kind of audit I want to maintain. For example, if somebody launches an EC2 instance and attaches a public IP to it, that is an opening into my VPC, which is a security threat, and ideally I'd want to catch it in real time. There I would go for a stream processing engine, because I want a live action defined for every event that happens.

Okay, a very good question: when I was defining the schema for my URL shortener, I didn't add an id; I brought an id in later, only in the business-logic part. A quick yes or no in your mind: do you think a primary key is required in this case? Let's talk about the implications of having an explicit id. If I have an id, MySQL or any RDBMS will physically sort the data by it: id 1 is placed next to 2, 2 next to 3, 3 next to 4, and so on. This helps with fast retrieval, because the database can look up that key very quickly. But I've seen many people add an id just for the heck of it, with no real use for it. Most of the time my requests are going to be: given a short URL, what is the corresponding long URL? Those are the reads to optimize for. So I'll keep the short URL as my primary key: it is unique, and I want the data fetched by it, so an explicit auto-increment or any other id is not required; the short URL alone serves the purpose. If you want to go deeper, read about locality of reference, a very important concept in DBMS systems. UUIDs and hash-based ids don't work here solely because they generate long values, 32 hex characters, and even if I trim them down to the last five or six characters, there will be many chances of collisions, so dealing with those areas is just problematic.

On pseudo-random number generators: for anything you do with randomness, the range has to be large. For a 128-bit space, MD5 is a good enough hack to generate a unique value, or I can go with a UUID generator. But if my range is limited, collisions quickly become a problem.
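A quick simulation of that degradation, with made-up numbers: with k of n ids already taken, each random try succeeds with probability (n - k) / n, so on average you need about n / (n - k) tries to find a fresh id.

```python
import random

def tries_until_fresh(n, used):
    """Count attempts until a random id outside `used` appears."""
    tries = 0
    while True:
        tries += 1
        if random.randrange(n) not in used:
            return tries

n = 100
used = set(range(99))  # 99 of 100 ids already taken
samples = [tries_until_fresh(n, used) for _ in range(10_000)]
print(sum(samples) / len(samples))  # ~100 tries on average
```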
To give you an example: say I have only two numbers, 0 and 1, and I ask you to generate a random number between them. Suppose the first number generated is 0. I ask again, and again 0 is generated, and again. Now you have just one number remaining, and there is a 50% chance you generate a number that was already generated before. Increase the range: you have the numbers 0, 1, 2, 3, and suppose 0, 1 and 2 are already taken. The probability of generating a new random number that has not been used is just 0.25; there's a 75% chance you generate one that was already generated. So as you keep exhausting the available resources, the operation that was O(1) tends toward O(n): it can take on the order of n attempts to generate a fresh number. If you have a hundred numbers and 99 are filled, the probability of randomly generating an unused one is just 1/100, so probabilistically you'd need about 100 tries to find it. That's not a good way to do it; randomness helps in some cases, but randomness alone is not a good enough criterion. There is a famous related problem, asked in coding interviews as well: you have an array, and you want to pick random indexes with equal probability. Say you have the numbers 1 through 6; you want to give each an equal probability and generate a random pick. The standard way: since there are six places, you generate a random index from 0 to 5; whatever index is picked, say 3, you swap that element with the last element and decrement the range. The range was 0 to 5 for this pick; for the next iteration it is 0 to 4, then 0 to 3, and so on, and you keep swapping the picked element with the last value in the live range.

How is a server assigned its set of a million records out of all the possible records? I discussed this briefly: generate all the sequences, the first million, the second million, the third million, and put each batch into a hash set. A hash set automatically rearranges the keys (unlike a list, it does not keep things in order), which jumbles the records and ensures you hand out a random id every time. Viewers suggest: batch processing for Amazon orders, salaries, stock markets; stream processing for live sentiment analysis. Yes, though sentiment analysis can also be done in batch; streaming makes sense if you want to triage, alert, or mark misinformation on Twitter or Facebook in real time. Let's take this question: if we pre-assign the short keys, won't searching for an available key to assign be time-consuming? No; let me explain the process. Suppose these are my available slots. Whenever a request comes, I need to pick one number out of them at random and assign it to the incoming request. This is stored in memory, meaning on the same servers that my application runs on.
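Here is that swap trick as a minimal sketch (essentially one step of a Fisher-Yates shuffle per pick); the walkthrough below traces the same moves by hand. Pool exhaustion is not handled, to keep the sketch short.

```python
import random

class KeyPool:
    """Hand out each slot exactly once, in random order, O(1) per pick."""
    def __init__(self, slots):
        self.slots = list(slots)
        self.size = len(self.slots)  # slots[0:size] are still unused

    def pick(self):
        i = random.randrange(self.size)          # random live index
        last = self.size - 1
        self.slots[i], self.slots[last] = self.slots[last], self.slots[i]
        self.size -= 1                           # shrink the live range
        return self.slots[last]

pool = KeyPool([0, 1, 2, 3, 4, 5, 6])
print([pool.pick() for _ in range(7)])  # all seven, random order, no repeats
```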
So suppose the first request comes. I generate a random number between 0 and 6 and it returns 2. I return 2, which is now assigned to the big URL, and I swap: 2 goes to the end and 6 comes to where 2 was; that is the new array. The next request comes, and now I decrease the size: I generate a random number between 0 and 5. Suppose it is 1: I swap the two elements, 5 comes to position 1, 1 goes to the end, and the size decreases again. For the next pick I take a random index in the smaller range, swap again, decrement the size, and I can just continue this way. Generating a random number between two numbers is very easy, an O(1) operation, so with this approach I can hand out keys while keeping the guessability factor under control. And how do I distribute keys across multiple servers? Suppose each server can handle only 10 keys and I have a list of 200 URLs that can be generated: I pick the first 10 and store them on the first server, the next 10 on the next, and so on. That is how I ensure each server has 10 keys to work with. All right guys, thank you so much for joining in. Bye bye.

All right, we are live. Hello everyone, I am Mumit Yadav, and today we have with us Lokesh, who will be doing a mock interview with us. The topic for today is system design, and I'll let you introduce yourself.

Hi everyone, my name is Lokesh. I'm currently working as a senior software engineer at Akamai, and I have a total of six years of experience. Prior to Akamai I worked at BigBasket, and I'm currently working as a full-stack developer. I graduated back in 2015.
Yeah, so that's about me.

Sure, cool. I'll quickly explain the format for our viewers: I'll be asking questions, Lokesh will be answering, and wherever I feel I can add valuable feedback, I'll give it to Lokesh and to the wider audience; instead of keeping the feedback for the end, I'll give it as and when we go along. A few users have joined us; let's wait a couple more minutes before we get started. Meanwhile I'll share the screen, so whenever there's a need to draw something on a whiteboard, I'll use my iPad: if you say load balancer, I'll draw a load balancer; if you mention application servers or databases, I'll draw those for you as well.

Cool, let's get started. The question for today is: you need to design a Google-search-style typeahead. How would you go about designing that?

Okay. So if I look at the requirement: basically, when a user types something in the UI, they need to get a list of suggestions in the dropdown; that's the basic requirement. Coming to the suggestions: first, the suggestions should be sorted by popularity in decreasing order, so the most popular is at the top. Also, we need to take care of blocked words: we don't want to show pirated or other inappropriate terms, so we need to filter them; that's another requirement, filtering. I'm also assuming we need to return the top N popular phrases.

Let's just say top five will do.

Yeah, okay, top five words. Also, very importantly, it should handle typos: if the user makes a typo, it should still work; that is one more requirement. So I'm assuming this is the scope of the system as per the functional requirements.

Anything else you see in a typical Google search? For example, if you search for weather and I search for weather, we'll be shown different results.

You mean the informational results? Okay, so like "prime minister of india": in the dropdown we'd get personalized suggestions?

For example, I watch prime-time news a lot, so if I type "prime", the suggestion for me can be "prime time"; suppose you search "prime minister" a lot, then for you "prime minister" will come up.

Oh, okay, got it. So using cookies, cookies for user personalization. Yeah, personalization, that is one thing I missed. Also, should we consider location-based search or not?

Yeah, we can.
So this is basically the first phase, where we just scope things out: by looking at Google search typeahead, these are the five or ten things that come to mind. Now, in a 45-minute interview it is not possible to design each and every feature, so we'll limit the number of features we're going to discuss. We definitely need to sort the results by popularity; that's a must-have. The filtering of words we can assume is not required, so we can remove that. Top five is a primary requirement for Google search, so we need that. Typo handling we can ignore, and personalization and location-based suggestions are also something we can ignore for now; we may discuss them toward the end if we have time.

Right. So let's start with popularity and top five. Those are the functional requirements; coming to non-functional requirements: first of all, going by CAP, the system will anyway have to handle partition tolerance, because network failures will happen, so partitions will always be there. Then, I'm choosing scalability to be my first priority, and between availability and consistency, for this case I go with availability. The reason I don't go with strong consistency is that eventual consistency is okay for this system: say we have a top-five list but we briefly show a misplaced entry, say the sixth term instead of the fifth. That is fine; it can propagate eventually and correct itself. That's why I'm going with availability and scalability. Coming to scale estimation: first, the number of requests. Since it's at the scale of Google, I'm thinking maybe some five billion requests per day.

How did you come up with that particular number?

I mean, I just picked it: the requests will obviously be in the billions for a Google search, so I came up with five or ten billion and went with five.

Okay, so here's some feedback: a good way to do scale estimation is to look at the daily active users and then derive what your TPS and storage requirements would be. Google has, I think, close to two or three billion users; for the sake of calculation, let's assume Google has one billion users and every user queries at least once a day, uses Google search at least once a day. So there will be one billion queries per day. Those are the reads, and every one of them is
contributing to the writes as well. I'll leave the thought exactly there and let you continue.

Exactly, yeah. So in this system reads and writes are equal, one to one. That means we're looking at one billion requests per day, because we have one billion users. Then, coming to the storage: in our system we only need to store the queries, the words or phrases. Since they're strings, I'm thinking each query may not take more than, say, half a kilobyte, 512 bytes or so. That means one billion requests into 0.5 KB, which will be, let me calculate, something in terabytes.

But is that calculation correct? Although we are getting one billion search queries, do you want to relook at that number? Because all of them are not unique.

Exactly, yeah, all of them are not unique, and we don't write all of them; most may already be in the system. That's why, as a maximum threshold: say we get a word or phrase that, in the worst case, is not in the system; then we have to store the whole phrase. So I'd revise it down, maybe around 256 bytes at maximum.

Okay, we'll leave the storage part here and move ahead; let's just assume that one box is not enough to hold the entire data. There were some issues with the storage estimate: you reduced the size of a record, whereas what I'm arguing is that you won't have one billion records to write, because if I search for michael and you also search for michael, both land on the same entry and you just update its frequency. But let's see the design first and come back to storage.

Yeah, sure. So I'll first go with the basic design. The first thing that comes to mind, and it's fairly obvious from the question, is to store the words in a trie. Let me first walk through the flow: the user sends an HTTP request with the words in it; say the user searches for donald trump. Initially I'm assuming not large scale but a small system: the client sends the request, and I go and look up the trie. On each lookup, if it's a successful match, a complete word, I update the frequency at the terminal node. In a trie we store a boolean, is_word, at each node to tell whether a word ends there; in our case we also store a frequency there, which is incremented every time that word is hit.
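To make the design concrete, here is a rough sketch of the trie being described, with a per-word frequency at the terminal node and the naive top-five traversal whose cost the discussion below digs into. It is an illustration of the idea, not the candidate's actual code.

```python
import heapq

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False
        self.frequency = 0

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def record(self, word):
        """Insert the word (or bump its frequency if it already exists)."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
        node.frequency += 1

    def suggest(self, prefix, k=5):
        """Naive top-k: walks the entire subtree under the prefix."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        found = []
        def walk(n, word):
            if n.is_word:
                found.append((n.frequency, word))
            for ch, child in n.children.items():
                walk(child, word + ch)
        walk(node, prefix)
        return [w for _, w in heapq.nlargest(k, found)]

t = Trie()
for w in ["don", "donald", "donald", "donut"]:
    t.record(w)
print(t.suggest("don"))  # ['donald', 'donut', 'don']
```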
Sure. I'll start with the basic design. The data structure that comes to mind, fairly obviously from the question, is a trie: we store the search terms in a trie. Let me go by the flow. The user sends an HTTP request with the typed text in it; say the user searches for "donald trump". Initially I'm assuming a small system, not a large-scale one. The client sends the request and I look up the trie. On a successful match, a complete word, I update the frequency at the terminal node. In a trie we normally keep a boolean, isWord, at the node where a word ends, to tell whether that path is a word; in our case each terminal node will also store a frequency, incremented every time that word is selected.

One quick question there. Suppose the search query ends here, and there is a terminal marker here as well, meaning "don" is also a valid word. You suggested two words, "don" and "donald", and the user clicked "donald". Each node can have its own frequency, so which frequencies would you modify?

Since the user selected "donald", we increment the frequency of "donald" at its terminal node; that's where the hit lands. There are still problems with this, but we'll find them as we go.

So that's the basic approach. What are the problems with it? What I want is the top five: what is the complexity of your current solution?

That's exactly where the problem shows up. Say the user has typed d-o-n: starting from the n node, I have to traverse the entire subtree below it, whatever its depth, collect the candidates, find the largest among them, and return those. Drilling down the subtree on every keystroke is the problem.

First question first: even while doing that iteration, how would you actually find the top five elements?

It's as good as having an array and finding the k largest, so my first thought is to build a heap. But thinking further: as we build the trie, can we store this information in each node, so that each node already holds its top five words as references to the terminal nodes below it? For the data structure holding those five references, I'm thinking a heap, a max heap.
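Before going further, a minimal sketch of the trie as described up to this point, with the frequency kept at the terminal node (class and method names are illustrative, not from the discussion; the per-node top-five comes next):

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # char -> TrieNode
        self.is_word = False    # terminal marker ("isWord")
        self.frequency = 0      # selection count, meaningful on terminals

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def record_selection(self, word: str) -> None:
        """Walk/extend the path for `word` and bump its terminal frequency."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
        node.frequency += 1

trie = Trie()
trie.record_selection("don")
trie.record_selection("donald")
trie.record_selection("donald")   # "donald" now has frequency 2
```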
A max heap? Are you sure a max heap is what you'd use for finding the top five?

Well, we need them in decreasing order of popularity, so every time I'd remove the maximum...

(A brief audio glitch interrupts the call here; once the connection is restored, the discussion resumes.)

So, back to where we were: you said we'd have a max heap here. Do you see a problem with that? I'll give you some numbers: 4, 5, 3, 0, 6, 3, 6. Suppose you want the top three elements and you start building a heap: you add 4, then 5, then 3. Now comes the value 0. Should it be added or not? How do you decide?

Basically you'd have to check the elements in the heap... we need to check the minimum element.

But the minimum is not at the top of a max heap; the maximum is.

That's true. So instead of a max heap, what we should have is a min heap.

Exactly, a min heap. 4 goes in, 5 goes in, then 3 arrives and becomes the root. Then 0 comes: 0 is not greater than 3, so it is not added. Then 6 comes: 6 is greater than 3, so we remove 3 and add 6. That's how it's done. So every node will have a min heap of size five attached to it, and using that we can answer the top-five query in O(1) time.
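A small sketch of that bounded min-heap idea, using the numbers from the exchange (Python's heapq is a min heap, which is exactly what's wanted here):

```python
import heapq

def top_k(stream, k):
    """Keep the k largest values seen so far in a size-k min heap."""
    heap = []
    for value in stream:
        if len(heap) < k:
            heapq.heappush(heap, value)
        elif value > heap[0]:               # compare to the smallest kept
            heapq.heapreplace(heap, value)  # evict it, admit the newcomer
    return sorted(heap, reverse=True)

print(top_k([4, 5, 3, 0, 6, 3, 6], 3))  # [6, 6, 5]
```

In the actual design, each trie node's heap would hold (frequency, terminal-node reference) pairs rather than bare integers.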
So up to there the problem is solved. We're clear on that; what can we do after it?

This is the basic design, so now I'm thinking about what other problems we have when we scale it up. The main problem is storage: how do we store the trie, how do we handle such a huge scale? One thing we can do is replicate the trie, a master-slave approach with multiple copies, just like in databases: we write to the master and propagate to the slaves. The downside is replication lag, which adds staleness for the user, so consistency is a little compromised. But that's one approach. And one more thing: that is fine while the trie fits on one machine, but what if we have to partition it? That also needs consideration.

So let's break it down. What are the problems with this particular approach? Let's write them all down, and then we'll discuss the probable solutions one by one. Can you identify the problems and tell me what they are?

The first problem, I'd say, is data partitioning. Storage.

Right: the entire trie we're building cannot fit into one machine, because we're storing the trie of every search. That's one problem. What's the other?

Should I also consider issues on the client side?

Not the client side; we're designing the backend, so let's focus only on that.

Then caching: should we cache some queries here, should we implement caching?

What kind of caching would that be? This is a trie, stored in memory, so we don't really require a cache.

I mean for the popular queries: we could answer those from a cache and write to the trie asynchronously, to improve latency. The main requirement of this system is low latency, because as users type, suggestions have to come back within milliseconds. So what if we keep the popular entries in a cache and apply their updates to the trie asynchronously?

How would you identify the popular ones? What you're really saying is that for some queries, the top ten or top hundred or maybe top thousand, you'd save the answers somewhere separate and serve the responses from there. But let's focus on the main problem: we need a consistent response for every query. Optimizing a few hot queries can be done later; let's optimize for the larger set. Apart from storage, do you see any other issue in this approach?

I'm thinking... the main thing to improve is latency, so what else can we do there?

Here's a hint. Think about updating those top fives when we update the frequency of "donald trump"...

...we'd be updating all the nodes. That's true.

Exactly, I'm coming to that.

When we update one terminal node, we have to traverse back up through many levels, updating the top five at each node along the way, maybe all the way to the root. Those updates are a real problem.
You said we need to update the terminal nodes, but I think we need to update every node on the path, right up to the top node. Why? Because even if a person has typed only d-o, there has to be a heap there that can hand back the node references.

That's true.

So we end up updating l, a, n, o, and so on, right up to d. For one read, I am actually doing multiple write operations.

Yes, that's true. So maybe we don't update them synchronously: we just return the result and update those nodes asynchronously. But that is still a problem; considering the scale and the number of requests, the updates will be enormous.
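To make that write amplification concrete, here is a hypothetical sketch of what the naive synchronous update has to do, building on the trie sketch above (update_top5 is an illustrative stand-in for maintaining a node's size-5 structure, not something from the discussion):

```python
def update_top5(node, terminal, k=5):
    """Sketch: keep the k highest-frequency terminals seen at `node`."""
    best = getattr(node, "top5", {})
    best[id(terminal)] = (terminal.frequency, terminal)
    if len(best) > k:                       # evict the lowest-frequency entry
        evict = min(best, key=lambda t: best[t][0])
        del best[evict]
    node.top5 = best

def on_click_naive(trie, word):
    """One click on `word` rewrites state at every node along its path:
    O(len(word)) writes for a single read."""
    node = trie.root
    path = [node]
    for ch in word:                    # assumes the word's path exists
        node = node.children[ch]
        path.append(node)
    node.frequency += 1                # bump the terminal node...
    for ancestor in path:              # ...then touch EVERY ancestor
        update_top5(ancestor, node)

# One click on "donald" means 7 node updates, each of which blocks
# concurrent readers; hence the move to batching below.
```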
Here's the basic issue. Suppose I have a database holding a value x = 5. Multiple reader threads can come and read that value. Now one thread wants to update it, to set x = 10. Until that value is written, all the reader threads have to wait; they cannot read it mid-write.

That's true.

So if I have to update the value at every node on the path, I cannot serve reads in the meantime either. My problem is how to make this a read-intensive, R-much-greater-than-W kind of system: I want reads to be much, much greater than writes. What is a way to do that?

Okay, so the question is how to optimize the writes: can I shape the system, accept some design trade-off, that makes it more read-friendly, so that writes become very small compared to the reads. What if, instead of one big trie, we chunk it by ranges and only update the relevant chunk, reducing the blast radius?

I'm not sure how that would affect the number of writes. I'll give you a hint: right now, whenever a person clicks on a suggestion, we update that frequency immediately. Do you think it makes sense to update the frequency every single time a user clicks? (For anyone wondering what the question is: we are designing a search type-ahead.)

If we don't update at all, the subsequent reads won't get the correct top five... ah, you're saying that instead of updating immediately, we update later: store the clicks somewhere and apply them at regular intervals.

Yes, you're going in the right direction. Where can we store them, and how?

Whenever the user clicks, instead of updating the trie directly, we store the events in a message queue, and then we have consumers that consume it at regular intervals, maybe every hour, and apply the updates.

Even the queue has a problem: some consumer reads each message and then comes and updates the trie. Suppose I push a "donald" message, then another, then another: whoever consumes them will update the trie the same number of times the messages were pushed, so I haven't decreased the writes at all. Instead of that, why not a hash map? Can you use a hash map to accumulate them somehow, and only after some threshold apply the complete frequency update?

Yes: we keep a map holding the frequency of each word accumulated over that interval, and then we update the trie in one shot.

Right. Once the count for a keyword crosses, say, a thousand (and against a billion daily search queries, a thousand is a very, very small number), you apply it. We can afford an error margin of 1000; this doesn't need to be a perfectly accurate count. So keep updating the frequencies in the hashmap, and when a keyword reaches the threshold, walk the trie once for that keyword and add the accumulated thousand to whatever was there earlier. Cool, so my write problem is solved: reads now far outnumber writes, which inherently gives me room for more parallelization, caching, and all sorts of optimization.
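A minimal sketch of that write-batching idea (the threshold is the figure from the discussion; the class and the flush callback are illustrative):

```python
from collections import defaultdict

FLUSH_THRESHOLD = 1000   # accepted error margin from the discussion

class ClickBatcher:
    """Accumulate clicks in memory; flush a word's count in one shot
    once it crosses the threshold."""
    def __init__(self, flush):
        self.flush = flush              # callback: apply +count to the trie
        self.pending = defaultdict(int)

    def on_click(self, word: str) -> None:
        self.pending[word] += 1
        if self.pending[word] >= FLUSH_THRESHOLD:
            self.flush(word, self.pending.pop(word))

# One trie write per ~1000 clicks instead of one per click:
batcher = ClickBatcher(lambda w, c: print(f"apply +{c} to '{w}' in the trie"))
for _ in range(FLUSH_THRESHOLD):
    batcher.on_click("donald")          # prints exactly once
```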
But we still have the storage problem: I cannot push the entire trie into a single box. How do we deal with that?

Let me iterate through some partitioning possibilities and we can look at the pros and cons. One is range-based partitioning: all the a's on one machine, all the b's on another. That will not scale well, because the distribution is skewed; words starting with x or w are far fewer than words starting with other letters. The alternative is to partition by prefix: take prefixes, not of length one but maybe two or three characters, and shard by them, so "ab" goes to one shard, "ac" to another, "ad" to another, and so on. A single prefix like "cat" can alone have millions of words under it.

So you're basically creating buckets. Suppose with two-character prefixes I get 26 x 26 combinations. It's not necessarily true that I have exactly that many machines; I might have more, I might have fewer. How would you map these x prefixes onto the n machines that I actually have?

We can do hashing: with a hash function over the prefix, we won't store all the prefixes on one machine; some k prefixes will land on each machine, spread across all of them.

The question there is: for a given prefix, say "aa", how do you decide which machine it should go to?

These machines will sit behind a load balancer, and the load balancer will maintain that mapping and route to the right machine.

Okay, so a load balancer is sending requests to one of the available machines. But most load balancers are primarily stateless, and here you need a stateful kind of load balancer. Do you know how load balancing is done in a stateful system? I'm looking for one specific term; Anand Kumar in the comments has already used the right one.

Consistent hashing. I was coming to that; when I said hashing, that is what I had in mind.

Right, so we will use consistent hashing. You still haven't answered what the shard key would be. These "aa", "ab", "ac" values, what will you call them? You've used the term already.

The prefixes.

Prefixes is what I was looking for: the prefix becomes my shard key. Once I have the prefixes ready, I can push them onto a consistent hashing ring, and it will distribute the load evenly and take care of however many servers there are at any time. So we have solved the storage problem correctly.
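As a compact sketch of consistent hashing with prefixes as shard keys (a bare-bones ring without virtual nodes; names are illustrative):

```python
import bisect, hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers):
        # Place each server at a point on the ring, sorted by hash.
        self.points = sorted((_hash(s), s) for s in servers)

    def server_for(self, shard_key: str) -> str:
        """Walk clockwise from the key's hash to the first server."""
        h = _hash(shard_key)
        i = bisect.bisect(self.points, (h,))
        return self.points[i % len(self.points)][1]

ring = HashRing(["server-1", "server-2", "server-3"])
for prefix in ["aa", "ab", "don"]:
    print(prefix, "->", ring.server_for(prefix))
```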
Any other problems that you see?

The scope was the top results by popularity; popularity is solved and the top five is solved, and we can keep applying optimizations on top, but I don't see further problems with the basic design.

We have introduced sharding, though. Suppose this particular application server is holding "aa", "ab", "ac" and some more data. What if this server goes down? How would you ensure that even if a server goes down, your system still works?

I'd need to keep those keys on another system as well... this is nothing but replication, right? We keep replicas on other machines: each of the prefixes can be replicated to multiple servers.

Right, and that is how we define the replication. Let me quickly ask you a question on consistent hashing, then. Suppose these are my servers on the ring, with ids 1, 2, 3 and so on, and here is an object, key k1. On which server will k1's data be stored? You go around the ring either clockwise or anti-clockwise; let's choose clockwise as the convention, so server 2 is holding my data. Now suppose I set the replication factor to 2, meaning two servers need to hold this data. What would be the next choice of server: should I push this data to 1, to 4, or to 3? Server 2 is the ideal candidate and already has it; out of the other three available, which is the best candidate for the copy?

If we assign it to 3... since we are going clockwise, if server 2 goes down, k1 will be routed to 3. So storing the replica on 3 means that when 2 fails, the node that takes over already has the data. (If 3 also goes down, that's a different problem.)

Exactly. We don't need to transfer any data from 2 to some other node; the newer queries get routed to 3, and 3 already has the data. So 3 becomes the natural choice. Cool, I think we've discussed enough on this.
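Extending the ring sketch above, replica placement can follow the same clockwise rule: the copies go to the next distinct servers on the ring, so a failed node's successor already holds the data (again an illustrative sketch, not a production scheme):

```python
def servers_for(ring, shard_key, replication_factor=2):
    """Primary = first server clockwise from the key; replicas = the
    next distinct servers clockwise, so failover needs no data transfer."""
    h = _hash(shard_key)
    i = bisect.bisect(ring.points, (h,))
    owners = []
    for step in range(len(ring.points)):
        server = ring.points[(i + step) % len(ring.points)][1]
        if server not in owners:
            owners.append(server)
        if len(owners) == replication_factor:
            break
    return owners   # e.g. ["server-2", "server-3"]: if 2 dies, 3 serves k1

print(servers_for(ring, "aa"))
```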
There is one more way to decrease the number of write operations. Can you, or anyone who has joined us, tell me one more approach to make this property hold? Right now, if I do nothing, reads and writes are the same, and I want a system where reads are much, much greater than writes. One way we have discussed is the hashing approach; can you think of one more way?

Similar to the hashing idea, maybe something MapReduce-like: map jobs, and then reduce jobs that write the counts down.

What is your mapper step here?

In this case the mappers would take the words and map them into an object, similar to a hash map: transform them into a number of occurrences, a frequency, and pass that to the reducer.

I still don't see it. Suppose the term is "donald" again (abusing the same name): what operation is the mapper performing here?

Actually, I think this isn't possible here. The map step applies a transformation to one object at a time; we can't combine across multiple objects in it. So no, it's not a solution.

Alright, I'll take options that people are typing in the comments as well. Jaikrit is saying we can use a NoSQL DB. Any time you appear for a system design interview, you should refrain from saying things like "I'll use a NoSQL DB and that will solve the problem for me." Why? The Google search type-ahead problem can be solved using Elasticsearch; it gives everything I want out of the box. But the point of a system design interview is to judge whether, given a new problem, you can design it on your own. If your answer to a question is "I'll use a NoSQL DB," my next question will be which DB, and then I'll dissect whether you actually know how that particular database is architected. Unless you know the architecture, do not use these terms in an interview; naming a random technology without knowing the architectural decisions behind it is a red flag. Try to stick to the basics. If you're using caching, don't say Redis; say "I'll use a cache," and have it clear in your mind what the key is going to be, what the value is going to be, and what your retrieval pattern is. That is very much essential during an interview. Sorry, over to you.

Yeah, I can't think of another way.

Anyone who has done machine learning would get the hang of this faster, but it's really a simple statistical tool, used well beyond machine learning: sampling. The idea is that if something is popular, you don't need to count everything; just skip 99 requests and take the 100th, and it will very likely be a request for the most popular term. For example, if an India versus New Zealand final is going on right now and people are searching "ind", then even if I discard 99 requests and randomly keep the 100th, it will almost certainly be for india versus new zealand. Because I have that kind of scale, I can let go of 99% of my requests and keep just one. Of course there will be some error threshold; it won't be exact, but it will be a very close-to-accurate guess.
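A tiny sketch of that sampling idea on top of the batcher above (the 1-in-100 rate is the figure from the discussion; the function name is illustrative):

```python
import random

SAMPLE_RATE = 100   # record roughly 1 of every 100 clicks

def on_click_sampled(batcher, word):
    """Drop ~99% of click events; scale the survivors back up.
    Popular terms dominate the sample by the law of large numbers."""
    if random.randrange(SAMPLE_RATE) == 0:
        batcher.pending[word] += SAMPLE_RATE   # one click stands in for ~100

# Frequencies become estimates with some error, which the design already
# accepted (an error margin of ~1000 was fine for the batching step).
```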
In fact, this sampling approach is used in many applications. To summarize what I just said: if something is very popular (for example, if I type "ind", "ia" gets auto-suggested because india has been searched by so many people), then by the law of averages, even if I drop the first 99 requests, the probability that the 100th request is the popular one is really high.

I'll give you a small example. Suppose I have four slots: 1, 2, 3, 4. The probability of picking 1 at random is 1/4, 25%, and the same for every other number in the array. Now let's make it more interesting: replace one of the numbers with another 1, so 1 occurs twice among the four. Now there is a 50% chance that a random pick gives 1. Swap in one more 1, so it occurs three times among the four numbers, and the percentage shoots to 75%. That is the law of popularity: out of the four numbers, 1 is the really popular one because it occurs three times, so a random draw from the batch gives it 75% of the time. For truly popular systems, the occurrence of the hot term is above 99%. So just sample your requests: create a batch of, say, 100, pick one element from that batch at random, and the chances are you will end up picking the most frequently searched term. That's a small idea about sampling that you can go back and apply in other systems as well. I hope it leaves you with a thought. Anything else you want to add on the problem we've discussed?

On the backend, I think that's it. Just one thing for the client side: we should add throttling there, what is called debouncing. If I type "a" and then pause for some interval, say 600 milliseconds, only then should the client send a request; if I haven't stopped typing, it should not send one. That debouncing we'd add on the client. Other than that, I think that's it from my side.
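Debouncing would normally live in the browser, but as a language-agnostic sketch of the idea (600 ms is the figure from the discussion; the names are illustrative):

```python
import threading

class Debouncer:
    """Fire `action` only after `delay` seconds with no new keystrokes."""
    def __init__(self, action, delay=0.6):    # 600 ms pause, per the discussion
        self.action, self.delay = action, delay
        self._timer = None

    def on_keystroke(self, text: str) -> None:
        if self._timer:
            self._timer.cancel()              # still typing: reset the clock
        self._timer = threading.Timer(self.delay, self.action, args=(text,))
        self._timer.start()

debounce = Debouncer(lambda q: print("GET /suggest?q=" + q))
for prefix in ("d", "do", "don"):             # rapid keystrokes: only "don"
    debounce.on_keystroke(prefix)             # triggers a request, 600 ms later
```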
Alright, cool. We've walked through a full system design problem together, and we're about at the one-hour mark that is typically given for a system design interview. Before leaving, I'll give you a thought about the typical challenges people face in these interviews. Pacing yourself is quite important, and I think the pace shown here was spot on; we were able to cover a lot of ground in 45 minutes. There are things we deliberately didn't touch. We didn't go deep into how you'd implement a trie, but that wasn't our objective. We didn't discuss rate limiting, and we didn't discuss how you'd persist a trie to a database: everything here lived in memory, so how would you handle a server restart? We also didn't discuss the length of the prefixes you would put on the ring: would you add "a" alone as a prefix into the consistent hashing ring, or require a minimum length, and what are the pros and cons of each? Within the given time frame, though, this was a decent discussion. There are other aspects too, for example personalization; typically, if you are interviewing for a senior role, you would be asked to go deep into one such peripheral area. Those peripheral features are primarily there to test whether your approach is extensible: if you now want to add personalization to this system, how much of the ecosystem do you have to change? If you would need to change the entire logic, your system is not extensible, and extensibility is a major component of what's being tested.

I'll jump to the questions in the comments now. If you have other engagements, feel free to drop off; I've already taken an hour of your time. Do you want to stay?

I'll drop off; my mom is waiting with dinner for me.

Alright, thank you so much, thanks, have a nice day. Bye.

Okay everyone, I'll take up the questions in the chat window. If you still haven't typed your question, please do. And if you want me to do a mock interview with you as well, please mail your resume to youtube@scaler.com; I'll type it out so it's visible for everyone.

Now to the questions. Someone asks about the profile of the interviewee: as mentioned at the beginning, the candidate has six years of experience, is currently an SDE-2 at Akamai, and prior to that worked at BookMyShow. I hope that answers it.

Javan has a question about yesterday's short-URL design: why use a user-id-keyed table, when searching the whole DB for a short code and returning the actual URL is time-consuming? How about a key-value pair? The schema I wrote there was for a system designed for a very small number of users, not for massive scale; a simple key-value store is the one that will
work best for a URL-shortener kind of design.

Can the trie be partitioned based on location? I'd say that is not a good approach here. Breaking the trie down by location would effectively be adding personalization, but what we designed today is a search type-ahead that is accessed globally, with frequencies that also change globally. The prefix-based approach we discussed at the end is the correct one.

Arthur says: you shouldn't block reads; how does it matter if the value goes a little here and there? Exactly. What he's referring to is that whenever we update the frequency of a particular record, the write takes a lock on that record and the reads have to wait, so we should avoid that at all costs. We discussed two approaches to avoid it: one is the hashing (batching) approach and the other is sampling, and either one cuts down the writes that block reads.

I see many people saying that caching is the best way to do this; I'd like to differ. Caching for a search type-ahead is not a very good idea. We are effectively already caching quite a lot at each node: we just traverse to the node and read off the answer. Understand that caching by itself is not a solution for latency; a hash map maintained in memory is faster than fetching the data from Redis. So whenever you are appearing for interviews, instead of just saying "I'll use caching," say where you'll cache: a distributed cache, the client side, the load balancer, a global cache, or a DB-side cache, and specify what the key is going to be. Here, in this design, I'm not sure what the key of such a cache would even be, or how many prefixes you could realistically cache, and the frequencies are a problem too: if I cache "don" and save its top five words, how would you keep those cached top-five frequencies updated? So I hope everyone attending the session now sees that caching is not the best way to go here.

Vishnu says NoSQL is a good option. I've already spent time on this: don't jump straight to a NoSQL database; identify the requirements such a system needs and then discuss the trade-offs. Jumping directly to NoSQL is a red flag, especially if you are interviewing for a senior profile; for an SDE-2 it may still be okay, but for an SDE-3 it is a definite red flag.

"It's already in memory, so what will you cache?" Exactly; that is what I was hinting at: what would even be the criterion for caching? And yes, we will shard based on the prefixes.

Vineeth asks what you need to prepare before doing an interview with me.
For this particular mock interview, I had told the candidate beforehand that I'd be asking him to design Google search type-ahead, so the question was given to him before he came. Whenever I do a mock interview with you, I'll likewise give you a heads-up on the topic; the sole purpose of sharing it beforehand is to have a meaningful discussion during the mock session. In an actual interview you of course won't know the topic, so it's important to understand the system design concepts and prepare from those. So if you want a mock interview with me, I can give you a topic and we can do a mock on it.

Any stream for low-level design? The system design interview is a superset of the low-level design interview. It rests on two important pillars, one of which is low-level design: schema design, which a system design interview needs, is part of LLD, and as an interviewer I might ask you to write some classes and code for the business logic I'm asking you to implement. For a more experienced candidate I'd spend more time discussing the systems; for a less experienced one I'd ask for class design. Take the same question, "how would you go about designing chess": I'd ask it differently to a college fresher or someone with two years of experience than to someone with ten. For the ten-year candidate I'd drill deep into how to make it a two-player online game, how to bake in artificial intelligence, and how to ensure there's no cheating happening in a chess tournament; that is, the system design aspects. For the two-year candidate I'd probably spend time on how to design the Queen class and the Pawn class, how one extends the other, and how to abstract out the common functionality.

"I've learned a lot, thank you." I'll take that as a compliment, thanks.

"An in-memory hash map is not scalable; it could be a distributed server." Yes, you can have a Redis cache for that: in case you are limited by the memory your application servers have, you can take this component out into a separate system, so that wouldn't be a problem.

"Is it possible to explain trie implementation at the DB level in your upcoming YouTube masterclass?" Unfortunately the system design masterclass will not use any concept of a trie. What we can do, since we also take masterclasses on data structures, is pick this topic up in one of those instead. And as far as I know there are no DBs that will allow you to persist a trie natively; you have to store it as an object and retrieve it as an object.

Pratik suggests: for R much greater than W, keep a cache on the servers and use a hybrid of write-through and write-back policies to update the data. We can do that, but understand one thing about where the time actually goes.
Tries are also stored in memory, so by adding a separate cache layer you are not actually gaining anything: the time you spend reading from the trie will be closely similar to the performance you'd get out of the cache, because both systems are reading data from memory. To find the top k for a prefix you traverse at most three, four, maybe five characters and you have the complete suggestions, so you are not traversing a lot of data when reading from the trie.

On Redis being single-threaded and whether it can slow the whole system down: Redis is an event-driven system and not necessarily slow. Even though it is single-threaded, the latencies you can get out of a Redis server are on the order of one to five milliseconds, so a single thread can serve a very large number of requests per second, and you can have a multi-threaded application interacting with that server, which gives the system a further boost. With that kind of setup I've had instances where a single Redis server served a million QPS. Redis is blazingly fast even though it is a single-threaded application.

A viewer adds: Redis is event-driven and not necessarily slow; drawing a parallel with a Node.js server, CPU-intensive applications do require a multi-threaded system, but not ones that are mostly network-bound, where most of the time is spent on IO. Exactly, and the term I was reaching for is context switching: there will be multiple threads reading data, and the running thread will switch context to give a waiting thread its chance to get into the running state.

"Sharding, please." I'll be uploading a video on sharding as part of the system design series, so please stay tuned. And if you're interested in any other system design concept or topic, feel free to comment on this video.

I'll end this video now. Tomorrow I'm going to talk about how to answer coding questions, DSA questions, in an interview; I'll invite one more speaker who has taken multiple coding interviews and has himself cleared coding interviews at many different companies, including FAANG companies. Tomorrow will primarily be focused on how you approach those coding interviews, and the day after tomorrow I'll take one mock coding interview, in a format similar to what we did today. So I'll see you all tomorrow. Hopefully this made some sense for you.
Please subscribe to the channel if you haven't already. I'll see you all tomorrow.
Info
Channel: SCALER
Views: 149,382
Keywords: system design full course, system design, system design interview, system design tutorial, system design basics, system design course, software interview, interview preparation, system design instagram, system design whatsapp, system design for beginners, system design interview questions, system design full course for beginners, caching, consistent hashing, system design fundamentals, scaler, horizontal scaling vs vertical scaling, cache, load balancing, SCALER
Id: P_eh1b6vE-4
Length: 202min 17sec (12137 seconds)
Published: Sat Mar 19 2022