Whatsapp System Design: Chat Messaging Systems for Interviews

Captions
Hi everyone, this is GKCS. This is a video on designing WhatsApp. It's a chat-based application, so once you know how to design WhatsApp, you will be able to design any chat-based application to a large extent. The special things about WhatsApp are that they have group messaging and they have these read receipts, so those are the two key features that people look for in a normal system design interview. But there are also other features that we'll be talking about, and we'll be talking about the features that we should probably not take up during an interview, basically choosing the things we do so that we can actually finish in the hour that we have.

Now, amongst all the features that you can ask your interviewer about (would you like this, would you like that), you should probably start simple, and you should start with things that you already know, because I have noticed that the interviewer usually says yes to the first feature that you ask for. So one of the things I'm comfortable with is group messaging. WhatsApp has groups; at most 200 people can enter these groups, and so group messaging is something that I understand to a good extent. Image sharing is another good question to ask: are images going to be shared in these messages? An almost obvious answer is yes, we will allow image sharing. Video sharing is also a good question. Then there is something that you'll know about if you have used WhatsApp: sent, delivered, and read receipts. You have those tick marks coming in based on what stage the message is at.

The final two things are not critical to an application in terms of features, but they're nice to think of in an engineering way. The first one is: is the person online, and if they're not, when was the last time that they were seen on the chat? The second is: are the chats temporary or are they permanent? If you have a look at Snapchat, or even at WhatsApp, in a way
they are much more temporary than a lot of the office messaging applications. The reason for this is that you want a lot of privacy; you want to give the user a lot of power. It also saves a lot of storage space if you think about the chats being stored only in the user's application. But if there is any sort of compliance that you need, or if there is any official communication, then you want that message to be stored somewhere forever. So that's another thing that we'll be asking, although WhatsApp gives you, so to speak, only temporary chats: if you delete the app and your friend also deletes the app, those chat messages are lost forever.

One thing I'd like to say is that image sharing has already been taken up on this channel. If you want to have a look at how this is done, have a look at the Tinder video; it explains how images can be stored, retrieved, and so on in a sensible engineering way. So you're left with four features for this video, and the first one we'll be picking up is group messaging. Before we get to group messaging, we need to first talk about how one person sends a message to another person. That is one-to-one chat, and that is our requirement: one-to-one chat. All right, this is what we're coming to.

Okay, let's take this step by step. A lot of the things that I'll be discussing here are in the system design playlist, so have a look at that when you're looking for things like load balancing or messaging queues. I'll be using those things as abstractions, as structures to meet all the features that we have talked about; if you want any detail, you can always go there. Single point of failure is also something pretty important in the WhatsApp architecture, so have a look at those.

Now let's start. You have the application installed on your cell phone, and you connect to WhatsApp on the cloud. The place that you're connecting to is called a gateway. The reason for this is that
you will be using an external protocol when you're talking to WhatsApp, but WhatsApp might be talking in a different language with its internal services. The main reason is that you don't need that much security; you don't need those big headers that HTTP provides when you're talking internally, because a lot of the security mechanisms are taken care of on the gateway itself.

So once you do connect to the gateway, let's assume that you're sending a message to person B. You are person A and you're sending it to person B. Person A connects to the gateway, and the gateway needs to send the message to person B somehow. You could store the information as to which users are connected to which box in the gateway itself. In that case you would need some sort of user-to-box mapping. The gateway service, which is a microservice itself, needs to store the information that this user ID is currently connected to box number two. So if these are boxes number one, two, and three, then there needs to be information saying that B is connected to two and A is connected to one.

When you have this kind of information being stored on the boxes themselves, it's going to be an expensive thing. Why is it expensive? Because maintaining a TCP connection itself takes some memory. What you want to do is increase the maximum number of connections that you can hold on a single box, and you don't want that memory to be wasted by keeping information about who is connected to which box. The second thing is that this information is being duplicated on all three servers: either it's being duplicated, or there's some caching mechanism, or there is some database which is handling this. This is transient information, so there are going to be a lot of updates going on over here, and this is not nice. There's a lot of coupling that I can see in this system. So what you want to do is keep a dumb connection. This TCP connection should be dumb in the
sense that it just takes information and gives information; it doesn't know what it's doing apart from that. The one you want to be asking for information on who is connected to which box is a microservice in itself, and this microservice can be the sessions microservice. What does the sessions microservice store? Well, who's connected to which box. Just that information that we were storing over here, and that was being handled by the gateway, has been decoupled from the system and sent to the sessions microservice. You can see that there are multiple servers, to avoid a single point of failure.

Okay, so when user A is sending some message, it asks for sendMessage with the user ID for B. When the gateway gets this message, it's pretty dumb; it doesn't know what to do, so it just sends it to the session service. This session service is indirectly a router. When it gets this request of sendMessage to user B, it figures out where user B exists, that is, which box user B is connected to, and then routes the message, basically sending it to gateway 2 to send on to user B.

Now what's happened is A has sent a message to B. Interesting: how can A send a message to B if the server is sending this? This final bit, where gateway 2 is sending a message to B, can't be done using HTTP. HTTP is a client-to-server protocol: the client sends requests and the server gives responses. You cannot send a message from the server to the client; you can only send requests from client to server. There are many ways to get over this using HTTP itself. One of them is long polling, in which case every minute or so B can ask, "Hey, are there any new messages for me?" and then the gateway, or the sessions management service, or whichever one you would like, can send it the message. Of course, this is not real time, and if you want something
real time, especially for chat applications, where it's very important to have the real-time thing, then HTTP is not something that we can use, and we need another protocol over TCP. The thing that we're really looking for is WebSockets. WebSockets are super nice when it comes to chat applications, the main reason being that they allow you peer-to-peer communication: A can send to B, and B can send to A. There are no client or server semantics over here. With that, the server can literally send a message to the client B.

Okay, so we are happy; we got the message. What now? Well, B got the message, so that means it has been delivered. At this point user A should be notified that the message has been delivered. There's one place that I missed out on. When the message actually gets to the gateway and then to the session service, the session service can send a parallel response to gateway 1 saying, "Okay, I got the message; now it's going to be sent to user B when it's possible." Let's say the message is stored in a different database for the chat. Because it's stored in the database, it's safe, it's persistent, and the service will keep retrying the message till user B gets it. So A is guaranteed that B is going to get the message, and A should get the sent receipt. So just give a response saying, "Okay, I got the message." Gateway 1 is now going to send that to user A, so sent is taken care of.

When this entire flow is completed, when B gets the message for the first time, how do we give a delivery receipt? Once you send the message to B and B actually got the message, it should respond; it should again go to gateway 2 and say, "Got the message." All right, that's an acknowledgement, a TCP acknowledgement. When gateway 2 gets this, it sends it on to the session service saying, "Hey, this message was received." The message is going to contain a "to" and a "from" field. So what the session service can do is say: okay,
the message has been received by the person who was tagged over here in "to", which is B, so the person who sent the message, the "from", which is A, should get a delivery receipt. So sessions again finds out where A exists, that is box number one, and sends a delivery receipt. A gets a delivery receipt. And of course you can think about how read is going to work. The moment a person opens the application and opens this chat tab, they send a message saying "read", and the exact same flow takes care of read also. All right, so that's a lot to digest. If you like, you can go through this a little more. This is the very first feature, of sending and basically delivering receipts to the sender.

Okay, the second feature we are talking about is quite simple. It's about last seen, or whether the person is online right now. At huge scale, when there are millions of users, everything gets complicated, but one of the principal architectural things that we can do over here is this. Simply put, B just wants to know when A was last online. This information has to be stored somewhere. What the server could do is ask A, but that would be stupid. So instead, A is not even in the picture now, and the only messages which will be sent and received are between B and the server. So B asks the server, "When was A online last?" There needs to be some information in some table saying that this user was last online at this time, so some timestamp, and A will have some entry over here with a particular timestamp.

The only question which remains is how this row is maintained: the last seen timestamp for a particular user, this key-value pair. Whenever user A does an activity, basically sending a message or reading a message, any kind of request to the server, it should be logged as an activity, and the current timestamp should be persisted in this table. In that way we can say that whenever A did anything, they were definitely online, which means that the last seen timestamp
needs to be updated. Based on this, B can be told whether A is online or not. One of the key features over here is that if A was online three seconds ago, then B shouldn't be told that they were online three seconds ago. Instead, the tag shown should be "online"; probably they just haven't done any activity in the last three seconds. You can keep this threshold at anything that you like, maybe 10 seconds, maybe 15 seconds, but the important thing is that they are either online or they were last seen at least, let's say, 20 seconds ago.

The last seen tag is a little tricky to update even after taking in all activities. So what I'll be doing is, whenever a user sends a request to the gateway, I'll have a microservice, which is the last seen microservice, and what it does is user activity tracking. Any time there's an activity, they definitely send a message to the gateway. When they send a message to the gateway, I'm going to say that they were last seen at this point.

Now, interestingly, there might be some requests which are not being sent by the user but by the application itself. For example, when you poll for certain messages: maybe you're offline, you're not using the app, but you want your application to notify you whenever there's a message. Or a delivery receipt, for example; that's not an activity by me. So the request should be smart, in the sense that the client should be smart, saying that this is a user activity and this is something that the application itself is doing. So there are two types of messages being sent by the client: one type is user activities, and the other is, let's say, system-generated or app requests. This can be a flag in the request itself. If it's an app request, don't send it to the last seen service. If it's a user activity, send it to the last seen service; it'll go and update the last seen timestamp for this user. In that way, user B can tell whether the user is online, or at least that they were last seen
at this timestamp, by querying the service. So feature three is also done.

All right, so we are very close to actually completing this chat messaging application. As you can see, it's a pretty complicated diagram, but we'll get to everything one by one. Certain things that I'd like to skip over, so to speak: one is the load balancer, because we have already talked about this, so I won't be talking about how the load balancer balances the load across the system. There's one interesting thing which we have not talked about in the series, which is service discovery, or heartbeat maintenance, and that will be taken up in a separate video. But it's pretty interesting; you can have a look at some blogs, and I'll probably post them in the description below. The authentication service is another thing that I'll be talking about later, the main reason being that it's quite simple, but it's something worth talking about as a basic principle, so that will also be taken up later. As you can see, these four services are things which are not really relevant to WhatsApp, so to speak: the profile service, which is a very generic service, the image service, sending emails, and sending SMSes.

Okay, then what is core to the chat application? Sending messages. Now you can see that there are five users that I have drawn over here. The red guys are in one group; the green guys are in the other group. So whenever a user from the red box sends a message, it should go to all other red boxes, and this is the feature of group messaging. This red user is connected to gateway 1, while we have the other red users connected to gateway 2. So let us assume that we send a group message through this user. The problem here is that if the session service stores all the information for all groups (let us say the red group has these three users and they are connected to these three boxes), it's too complicated for the session service to handle. I mean, it's something that you can decouple, so that's what we have done: we have decoupled the information for who is
existing in which group, in a group service. Now the session service, when it gets a message from a red user, is going to ask the group service, "Who are the other members in this group?" The group service can then respond saying that 10 members with these user IDs exist in this group. Now the session service runs through its own database. Usually this information is going to be cached as much as possible, but it can figure out where these users are connected through its database. For those 10 users it has a mapping of user ID to connection, and that connection tells you which box, which gateway, the user exists in. With this information it can then route the messages to each of these users one by one.

What if the group has too many members? Too bad. WhatsApp actually gives you a maximum limit of 200. There are a lot of chat applications which try to contain that to 500 or 600, the main reason being that you'd otherwise be fanning out the request too much. If you've seen the Instagram design video, what happens there is that when a celebrity posts something, it's effectively sending messages to sometimes millions of people, and that's not practical. So you have to either batch process them or wait for these guys to pull them. In a chat application, because you want the messages to be as real-time as possible, you can't really have too much of a pull mechanism. So instead what you do is limit the number of people in a group. 200 is a reasonable number compared to millions; yeah, it's a very reasonable number. So what we are going to do is limit the number of users in a group to some number x, and we are going to assume that the sessions service can handle WebSockets sending these messages to the relevant users.

Okay, now let's get into the details of this mechanism. We have the bare-bones thing, how it's going to work, but the details are important. The first thing that I would note in this architecture is that, because a lot of
users are going to be connecting to my gateways, these gateways are going to be starving for memory. That's the reason why we have separated out the session service; that's one good way to reduce the memory footprint. The second thing you can do concerns parsing the message. Maybe the message is sent over HTTP, it's a JSON message, and so on and so forth. You don't really want to parse the message, convert it into an object, do some smart things on it, and find out whether it has been authenticated or not on the gateway itself. You want to push as many of those responsibilities as you can away from the gateways, because those are WebSockets, those are expensive, those are actual users connected to your box. So I would send an unparsed message to the session service, or to whoever I am sending it to.

One smart way to send an unparsed message to any service that you want is to have this unparsed message go through a parser microservice. You don't really need too many servers here; just two are enough. So I'll just call it parser and unparser. What the parser microservice is going to do is take the unparsed message and convert it into a sensible message. So if your internal protocol, instead of HTTP, is something you've written over TCP, or you have something like Thrift, which is used by Facebook internally (so I would say Thrift), then you can parse the message over here itself. What is the advantage? Let me just reiterate: you get an unparsed message over here and you send the unparsed message forward; there's no work that you're doing on the gateway itself. This unparsed message will be converted into a sensible programming-language object by this parser/unparser, and that will then route it to the right place. Okay, so that's one way to reduce the memory footprint over here.

What are the other concerns or key areas that we should focus on? Group ID to user ID, and this is a one-to-many mapping: one group can have
many user IDs. To reduce a lot of the duplication of information that you have, we go for something called consistent hashing; we should have a look at that. Consistent hashing helps you reduce the memory footprint across servers by delegating only some information to some boxes. Have a look at the video in case you're not sure what this is. Consistent hashing is going to allow you to route the request to the right box. What should it be routed on? The group ID. If you have the request routed on the group ID, then the box can tell you, for this group, which users belong to it. All right, so that takes care of the routing mechanism we have.

Any time the group service fails, say you sent the message to the box and it failed, what do you do? You can retry, but you can only retry if you know what request you needed to send next. One of the mechanisms for this is message queues. We have discussed this in the playlist, so I won't be getting into too much detail, but message queues are nice in the sense that once you give a message to the message queue, it ensures that the message will be sent, maybe now, maybe 10 seconds later, maybe 15 seconds later. Those are configurable options, as is how many times you're going to retry; all of this is configurable in the message queue. If the message queue fails to send the message even after five retries, it can tell you that it's failed. You propagate the failure all the way to the client, saying, "No, I couldn't send this group message." That's also fine, but the client needs to be told whether it's failed or it's cleared.

Interestingly, when the group service gets this message, it can send a response: "Yes, I got the message." Sessions then sends a response to the gateway, and the user who sent the original message gets a sent tick mark. Group receipts, when it comes to delivered or seen, are pretty expensive, the main reason being that everyone needs to say, "Yeah, I got the message, I got the message," and then finally
it has to come back to this guy. So we won't be getting into that. Many chat applications actually don't even have that, so it's fine.

The final few interesting things when it comes to chat messaging, or group messaging especially: you need idempotency. There's an entire video I made on retries and idempotency, again taking the Tinder messaging example, so you can have a look at that for the technical details. This architecture is actually very resilient, and as a chat system it's going to do pretty well.

There are some tips and tricks over here that you can get to know only if you have worked on messaging systems, so I'll give you a few examples. For example, I was just reading in a blog that Facebook Messenger de-prioritizes messages in case there's a huge event, like New Year's or some festival like Diwali in India. There are going to be a lot of messages; everyone's going to be wishing each other happy Diwali or happy New Year, and that's going to put a lot of load on the system. So all the principles of rate limiting come in here, where you don't take messages which are not very important, or sometimes you just drop messages. Instead of dropping, the best thing to do is to de-prioritize messages. Things like last seen can be ignored; the entire feature can be ignored. Has this message been delivered, has it been received: those are not as important as actually sending the message to the user. The first thing, the server getting the message and the acknowledgement, is all the user needs to know. That's more important than seeing whether the person has read the message or not. So by de-prioritizing unimportant messages, you're keeping the system's health good, and you're performing okay instead of not performing at all.

So do check out the course; it's really useful when you're designing systems like these. Of course, this takes care of the last requirement that we had, which
was to send group messages. Yeah, that is requirement number one, taken care of in the end.

All right, thank you so much for listening, and thank you so much for going through this system design. If you have any doubts or suggestions, you can leave them in the comments below. If you liked the video, then hit the like button, and if you want further notifications, then hit the subscribe button. I'll see you next time. Oh, and I'll be posting a poll, so vote for what you want to see next time. See ya!
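
The one-to-one flow described in the captions (dumb gateways, a session service that owns the user-to-box mapping, and sent, delivered, and read receipts routed back to the sender) might be sketched roughly as below. This is a minimal in-memory illustration, not WhatsApp's implementation: the class names `Gateway` and `SessionService`, the method names, and the dictionary payloads are all assumptions made for the example, and `push` merely stands in for writing to a WebSocket connection.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Hypothetical dumb gateway: holds connections, keeps no routing state."""
    name: str
    delivered: list = field(default_factory=list)  # (user_id, payload) pairs

    def push(self, user_id, payload):
        # Stand-in for a server-to-client push over a WebSocket.
        self.delivered.append((user_id, payload))


class SessionService:
    """Hypothetical session service: owns the user -> gateway mapping."""

    def __init__(self):
        self.user_to_gateway = {}

    def connect(self, user_id, gateway):
        self.user_to_gateway[user_id] = gateway

    def send_message(self, sender, recipient, text):
        # Accepting the message is what earns the sender a "sent" receipt.
        self._push(sender, {"type": "sent", "to": recipient})
        self._push(recipient, {"type": "message", "from": sender, "text": text})

    def ack(self, sender, recipient, kind):
        # kind is "delivered" or "read"; it is routed to the original sender.
        self._push(sender, {"type": kind, "to": recipient})

    def _push(self, user_id, payload):
        gateway = self.user_to_gateway.get(user_id)
        if gateway is not None:  # an offline user would be retried from storage
            gateway.push(user_id, payload)


sessions = SessionService()
g1, g2 = Gateway("gateway-1"), Gateway("gateway-2")
sessions.connect("A", g1)
sessions.connect("B", g2)
sessions.send_message("A", "B", "hi")   # A gets "sent", B gets the message
sessions.ack("A", "B", "delivered")     # B's acknowledgement reaches A
```

Persistence and retry until B receives the message, which the talk relies on for the sent receipt, are left out here to keep the routing visible.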
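
The last seen feature from the captions (update a timestamp only on user activity, ignore app-generated requests, and report "online" below a threshold) could look something like this. It is a sketch under stated assumptions: `LastSeenService`, `record`, `status`, and the injectable `clock` are hypothetical names, and the 20-second threshold is just the figure mentioned in the talk.

```python
import time


class LastSeenService:
    """Hypothetical last seen tracker keyed by user ID."""

    def __init__(self, online_threshold_s=20.0, clock=time.time):
        self.online_threshold_s = online_threshold_s
        self.clock = clock          # injectable so the example is testable
        self.last_seen = {}         # user_id -> timestamp of last user activity

    def record(self, user_id, is_user_activity):
        # App-generated requests (polling, delivery receipts) carry
        # is_user_activity=False and must not touch the timestamp.
        if is_user_activity:
            self.last_seen[user_id] = self.clock()

    def status(self, user_id):
        ts = self.last_seen.get(user_id)
        if ts is None:
            return "unknown"
        if self.clock() - ts <= self.online_threshold_s:
            return "online"
        return f"last seen at {ts}"
```

The flag is set by the client, as the talk suggests, so the gateway can forward only user activities to this service without inspecting the request body.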
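
For the group messaging part of the captions, the group service is routed on the group ID with consistent hashing, the group size is capped, and the session service fans the message out to the other members. A rough sketch follows; `HashRing`, `GroupService`, `fan_out`, the MD5-based ring, and the 100 virtual nodes per box are illustrative assumptions, and the cap of 200 is the number quoted in the talk.

```python
import bisect
import hashlib

MAX_GROUP_SIZE = 200  # limit quoted in the talk


def _hash(key):
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, replicas=100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    def node_for(self, key):
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


class GroupService:
    """Hypothetical group service: each box stores only the groups it owns."""

    def __init__(self, nodes):
        self.ring = HashRing(nodes)
        self.shards = {node: {} for node in nodes}  # box -> {group_id: members}

    def add_member(self, group_id, user_id):
        shard = self.shards[self.ring.node_for(group_id)]
        members = shard.setdefault(group_id, [])
        if user_id in members:
            return
        if len(members) >= MAX_GROUP_SIZE:
            raise ValueError("group is full")
        members.append(user_id)

    def members(self, group_id):
        shard = self.shards[self.ring.node_for(group_id)]
        return list(shard.get(group_id, []))


def fan_out(group_service, group_id, sender, text):
    """Return one (recipient, payload) pair per group member except the sender."""
    payload = {"group": group_id, "from": sender, "text": text}
    return [(m, payload) for m in group_service.members(group_id) if m != sender]
```

In the design from the talk, each returned pair would then be routed by the session service to the gateway holding that member's connection.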
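
The captions mention retries (up to five, with the failure propagated to the client) and the need for idempotency once retries exist. A small sketch of why the two go together is given below. It assumes each message carries a unique ID chosen by the sender; `IdempotentReceiver` and `send_with_retries` are hypothetical names, and a real message queue would add delays between attempts rather than retrying immediately.

```python
class IdempotentReceiver:
    """Hypothetical receiver that applies each message ID at most once."""

    def __init__(self):
        self.seen_ids = set()
        self.inbox = []

    def receive(self, message_id, text):
        if message_id in self.seen_ids:
            return  # duplicate caused by a retry; safe to ignore
        self.seen_ids.add(message_id)
        self.inbox.append(text)


def send_with_retries(send, message_id, text, max_retries=5):
    """Make one attempt plus up to max_retries retries.

    Returns True on success and False if every attempt failed, so the
    caller can propagate the failure to the client.
    """
    for _ in range(1 + max_retries):
        try:
            send(message_id, text)
            return True
        except ConnectionError:
            continue
    return False
```

If the first attempt is delivered but its acknowledgement is lost, the retry resends the same message ID and the receiver drops the duplicate, so the user sees the message once.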
Info
Channel: Gaurav Sen
Views: 1,565,122
Keywords: system design, interview preparation, interviews, software interview, coding interview, coding, coding question, problem solving, design interview, computer science, programming interview, gaurav sen
Id: vvhC64hQZMk
Length: 25min 14sec (1514 seconds)
Published: Tue Jan 22 2019