System Design: Tinder as a microservice architecture

Captions
So, I forgot to say this before we started, but it's important: as prerequisites, you need to know the system design concepts we have talked about in this entire series. Stop looking at this diagram, because you're not going to understand it until we actually start discussing it. Each of the concepts we're going to discuss now can be broken down and gone deeper into, so all of them require prior knowledge, which you can get from the videos I made earlier and, of course, from lots of links on the internet. It's your choice; just make sure that you have your basics and fundamentals clear before you jump into actually designing a system. Okay, let's start.

Hi everyone. We're finally here: we are finally talking about Tinder and its architecture. Let's say you enter the interview room, you meet the interviewer, you sit down, and you're asked to design Tinder. In my experience, most candidates get really, really into the game. They start thinking about what services they are going to use and what kind of databases they are going to use. My suggestion would be to take a step back and think about this system in a very logical, calm way. If you had been inventing this kind of app, what would come up in your mind is: what kind of features will I provide to this person? With those features, you can actually think about how your system evolves.

There are two approaches you can start off with. The first is to start with the ER diagram, which we are taught often enough in college: think about how the data is going to be modeled, then think about how your services are going to consume it, and finally think about what the clients will be doing to call the services. That kind of thinking is a little too constrained. It's also too abstract, because you're thinking about how to model your
data without thinking about what your users need. The second approach is to go from front to back: think about what your users need as features, think about how your services are going to be broken down so that you can fulfill these features, and then think about the individual data requirements per service. That way your system is far more flexible, and it's also a lot easier to start building the system immediately, feature by feature.

So the features that we are picking up in Tinder: first, storing profiles. You won't just write this down. What you could ask your interviewer is: "So you're definitely going to be storing profiles, right?" That's an obvious question, but it's best to get it out of the way. In that profile there are going to be images, which is really important for any dating site. So one thing to remember here is that images will be stored in the profile. A follow-up question can be: "How many images per user do you want?" That could be five, so I'll just note that down: five images per user. Is there something else you want to think about for images? No, so we can move to the next feature that might come up, which is: how are we going to recommend matches?

If you think of this as a storyline, what happens is: you go to any dating site and you make a profile, and that's how you think about storing profiles. Then you start accepting or rejecting people based on your preferences, so that would be the recommendation system, some sort of recommending matches. In that case, what are the questions you can think of? "How many active users do you have?" is a good question to ask. So: number of active users. Is there something else you want to ask? Maybe whether there are certain countries where the population is very high, that kind of thing. But don't get into too much detail. If you're running behind too many questions per feature, that also
shows that you're getting into too much detail per feature. So keep it fluffy, keep it flying in the air for now.

Then you have the third feature, which is the best one: when you match with someone. If you match with someone, you need to note that down, and you need to do something between the two people. So one of the things you want to do is note down matches, in which case the number of active users is kind of enough: you can take a percentage of that as the number of matches you'll have per day. That's going to be an assumption. I'm going to assume typical Indian match rates here, which is that for every swipe you have a 0.1 percent chance of matching with someone. So the number of matches you'll have per person is going to be 0.1 percent, which makes it the number of active users times 10 raised to the power minus 3 matches.

The fourth one: once you have matched with someone, of course you need to chat with that person. So there's direct messaging, which will be a feature of Tinder once you two match. In direct messaging, what kind of questions should you ask? Well, we'll get to that.

For now we have four features. Avoid taking too many features, because it's going to be an hour-long interview at most, and more often than not you're going to be getting into the details that the interviewer wants you to get into anyway. You don't need to pull out more and more features just for the sake of showing that you can implement them. So start with four or five features.

Let's start with storing profiles. In fact the previous video, on designing Instagram, also talked about this, and a lot of you felt there was a lot of jargon in it. Storing images has only one important question in it, and that is: how are you going to store the images? There are going to be a lot of images. As you can see, the number of active users is large, and per user you
have about five images, which is still a constant factor of the number of active users. The question of how you are going to store these images is something which has been debated for a really long time in computer science: do you want to store the images as files, or do you want to store them as blobs? A blob is a binary large object. Those of you who don't know about this, get back to your database classes, because this is something which is taught in engineering. There's also another one, the CLOB, character large object, which is entirely useless here, so ignore that. We are going to be having the argument of file versus blob.

Images are typically large in size, and you can't store them as a VARCHAR or something like that. That's the reason why you have a binary large object, which is specifically for large objects in databases. Now, you might think that databases have a lot more to offer when it comes to specialized storage compared to files, but my argument is: not really. The few extra guarantees that a database gives you are these. The first is mutability: you can easily mutate a row, basically one data entry, in that database. The second one is transaction guarantees. The third one is indexes, which are mainly there to improve your search capabilities.

Let's start with mutability. Are you ever going to be changing the image? You could, yes, but why would you ever want to do that? Why not just create a separate file? An update to an image is not going to be just a few bits; it's going to be the entire image. So why not store that in a separate file, get rid of mutability and make it immutable? This is an unnecessary feature that the database is giving us.

The second thing is transaction properties. These are again not required for an image, because you're not going to be doing an atomic operation on
the image, so you can get rid of this. There's another feature, actually, and I'd better note that down. That will be the fourth feature, and that is access control.

So let's get to the third one: indexes. Indexes are good for searching. They allow your data to be sorted according to a particular field. Let us say you have a profile table; in that, the name can be indexed, so anyone who's searching for "Gaurav" will find that entry quickly, because it binary searches on the name. For a binary large object that's useless, because you're never going to be searching on the content of the file. That's going to be ones and zeros, right? So you can get rid of that.

And finally, access control. This is very important, and it's one of the arguments for databases using binary large objects. But I would say that you can get the same access control mechanisms using a file system. It is a little tedious maybe, but setting up a secure file system is almost as tedious as setting up a secure database, so they are nearly equal.

The good things about files: first, they're cheaper to store, and that makes a big difference. Second, they're built for this. They're built for storing a file, and an image is a little file. That's one good thing, but it's a little abstract. They are also faster in a way, because the large objects are stored separately. You can do that in a database using something called vertical partitioning, where the profile ID is going to be over here, the image ID is going to be over here, and then you store the image somewhere else. But if you're doing that, then why not store it in a file? That's not just cheaper, it's also less likely to turn into a SELECT * nightmare, which is what you get if you do a SELECT * on a table holding the images. So instead go for the file system; it's going to take care of that. I mean, you won't do a by-mistake SELECT * on the
address and the ID. Okay, this is a little bit of a flimsy argument, but in practical, real-life systems SELECT * is used a lot, so it's one of the good things. The third thing is that these files are static, so you can easily build a CDN, a content delivery network, over them. You can read up on this, maybe in the description below, or just read up on it in general. A content delivery network allows fast access.

One point in favor of the database remains: at the end of the day, your database is going to be storing all your data, so it has to have some reference, some address, to your file, and that is going to be the file URL. This will be stored in the database. So we are going to have an image ID, an image URL and a profile ID per image: an image stored by the user with this profile ID, in location so-and-so. Where are you going to store the file that this URL points to? It's going to be in a distributed file system. So your distributed file system is going to be handling requirement number one, and that's how you argue for files versus blobs.

Now, if your interviewer is really, really pushing you for blobs, you could take it. It's not really going to spoil your design. It's only that it's good to have these arguments with you when you're logically talking about what would be better in an interview. But don't push too hard. If they say, "Blob is what I want," go for it.

So, taking this first option, let's start designing our system. The first thing to note here is that we have a client application on the mobile. A user clicks a button to send us a request, and over here we have our profile service. What does the client need to do? It needs to register itself with our profile service. It needs to say, "Here's my username, here's my password," and the profile service then stores that in the database. Of course there are going to be multiple authentication mechanisms. There's going to be a password sent to the email, so there might be an
email service which you're using here, but for the sake of brevity I'm just going to assume that the profile service can send emails and can do the two-step authentication stuff.

Once a user is stored in the database, the user then asks for something. It's probably going to be "update profile", because they're going to be adding their photos. So in "update profile" with the username, how do you make sure that this is an authenticated request, that the person who's claiming to update that profile is the person the profile belongs to? There are multiple ways to do this. The monolithic way is that the profile service itself has the authentication mechanisms, making sure that yes, this update has come from the right user. So they'll send the username and password, and if it is authenticated, a successful response goes back saying that yes, there's been an update.

Of course there are multiple issues here. The first one is that sending the username and password is a little too insecure, so instead we can send a token. These are all security mechanisms. If you mention a token, you should be good. If your interviewer gets too deep into cryptography, then best of luck. But if you are sending a token to the profile service, it should be able to authenticate you and send back a response.

Here's the problem. Today there's a profile service which is authenticating you and sending back responses. If a new service comes up tomorrow which doesn't have information about tokens and usernames, then that service is going to have to talk to the profile service every time, and that logic is going to be duplicated in the third service which comes up, because it also needs to authenticate the user. So every time a user sends a request, there's going to be a lot of duplicated code being run. One of the things you can do here, a standard approach which most places use, is to use a gateway. The clients always talk to the gateway; no one talks to
the clients except the gateway. So this is the gateway service, and the gateway essentially does one thing: it takes this request, with the username and the token, and asks the profile service whether this is an authenticated request or not. The profile service, of course, has information about the user, and it says yes or no. That yes-or-no response tells the gateway whether it should respect this request or not. If it should, the gateway directs the request to the correct service; otherwise it fails the request. That's all that the gateway does. And of course, once it directs the request to the correct service and gets a response, it has to forward that response to the client.

What you have done here is decouple the systems. You have taken the requirement of talking to the user, authenticating, and then sending the request to the correct service away from the profile service, and moved it to a gateway. Another good thing about this is that if you are going to be using some of the messaging protocols that we'll get to once we are at direct messaging, then you have separated out the protocols over here from the protocols over here.

So now that we have our profile service, we are just going to be updating the profile. We are going to be changing the description, maybe changing the name, and so on and so forth. What about images? Do you really want to store images in the same place, in the profile service? There's no hard yes or no, but logically you would probably want to store the images in a separate service, because tomorrow there may be another service which just needs the images of the user, maybe for machine learning, or maybe it just needs images to send them across. You could do that effectively by using an image service. There are also more common scenarios where you just need the details of that person's profile, maybe just a description or just the age, and the profile service can send those across easily, while the image service is used for heavy
computations when you need all the images of that user. So the image service is, of course, going to have a distributed file system in which it stores all the images. It's also going to have a database in which you have the profile ID, the image ID, and the URL of the image stored in the distributed file system. So these are references.

Okay, first requirement done. Yay! We've taken care of the first requirement, so what do we do next? The second requirement was recommendations. I wouldn't go for that just now, because it's a little complicated to get into. What I would suggest is going for direct messaging, which is chat. For that, one of the things you can do is think about how to connect from this client to another client, which is over here. So this client wants to talk to this client. Imagine that you just matched with someone and you're going to send a direct message. It means that you're going to be telling the gateway, "Hey, I want to send a message to user ID X." So instead of the update, it's going to be "message to user ID 1 from user ID 2". So this guy is user ID 2 and this person is user ID 1.
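
Editor's sketch: before the transport question, here is how the gateway step described above might look in code. This is a minimal illustration under stated assumptions, not the design from the video. The class names, the `service` routing key, the in-memory token table and the status codes are all hypothetical, and a real system would validate signed tokens rather than compare strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Request:
    user_id: str
    token: str
    service: str   # hypothetical routing key, e.g. "profile" or "message"
    payload: dict


class ProfileService:
    """Stand-in for the profile service: it alone knows which token belongs to which user."""

    def __init__(self) -> None:
        self._tokens: Dict[str, str] = {}  # user_id -> token issued at login (assumed)

    def issue_token(self, user_id: str, token: str) -> None:
        self._tokens[user_id] = token

    def is_authenticated(self, user_id: str, token: str) -> bool:
        # Plain equality keeps the sketch short; it is not production-grade checking.
        return self._tokens.get(user_id) == token


class Gateway:
    """The only component clients talk to: authenticate first, then route."""

    def __init__(self, profiles: ProfileService) -> None:
        self._profiles = profiles
        self._routes: Dict[str, Callable[[Request], dict]] = {}

    def register(self, service: str, handler: Callable[[Request], dict]) -> None:
        self._routes[service] = handler

    def handle(self, request: Request) -> dict:
        # Ask the profile service for a yes/no answer on this user and token.
        if not self._profiles.is_authenticated(request.user_id, request.token):
            return {"status": 401, "error": "not authenticated"}
        handler = self._routes.get(request.service)
        if handler is None:
            return {"status": 404, "error": "unknown service"}
        # Forward the downstream response back to the client.
        return {"status": 200, "body": handler(request)}
```

With this shape, a new service only registers a handler with the gateway and never needs to know about tokens, which is the decoupling argued for above.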
Now the question is: how do we send a message to this user? A lot of people asked last time what XMPP is. If you know about HTTP, the Hypertext Transfer Protocol, it's mainly a protocol, a way of talking between two machines, and in this way of talking there's always going to be a client and there's always going to be a server. A client talks to a server, and the server responds to that request. It's not that the server goes to the client and says, "Hey, can you give me some data?" So when you have a client-server communication protocol, you cannot effectively have chat, because if there's a client and a server, the only way this user is going to get the messages that are sent to them is to poll the server, to say every five seconds, "Hey, are there any messages for me? Hey, are there any messages for me?" That is extremely inefficient from the overall app side.

So you don't want to poll the server; you want messages to be pushed to you. If you want them to be pushed to you, there are multiple ways. You can do it with HTTP also, to some extent, but the better way would be to use a different protocol, a peer-to-peer style protocol where everyone is equal. So there's no client and server now. This is a machine and this is a machine; they are peers, and if the server needs to send any message to the client, it can. One of the protocols doing this is XMPP, and one of the client-server protocols, of course, is HTTP. Make sure you mention this, because it's pretty important to know how clients and servers talk if there's a chat application. So this message is probably also going to be sent through XMPP.

We have discussed how we are going to send the message and get the message, but what's going to happen internally? Internally, one of the things that could happen is that XMPP is going to be taking a connection, and that's a
WebSocket connection. A lot of this might sound like jargon, so study. In a system design interview you're expected to know a few things about this. Instead of WebSocket, you could also simplify and say, "Hey, I'm going to be writing a protocol of my own, but it's going to be over TCP," because you make a connection and you maintain it. So TCP. TCP, happy happy.

Now, with these connections you can actually talk to the clients. That's good. But who's going to be maintaining the information on these connections? For every connection ID, you need to know which user is using that connection. There's a set of connections on the gateway service, and each of these users needs to use their connection to be able to talk to other users. You need to find out where a user belongs, I mean which connection that user is listening on. That could be done by the gateway service, but again I suggest you decouple the system as much as possible. So you take the responsibility of maintaining connection info away from the gateway service by putting it in another service which can handle sessions. This service is going to be handling sessions in your overall architecture. If you are having direct messaging, this is more than enough, because it's going to be storing connection information: user ID to connection. With that, you can figure out where the other user is, I mean which connection they are using, and then send the message to that socket. So direct messaging is possible if you do match. Things are looking good.

Now, of the four requirements that we had, two have been taken care of. The two remaining are noting matches and, what is the second one, recommending people to you. Noting matches: let's quickly go over it. Let's say that you have the client,
right? It can store information. It's an app; it can store information. So why not have all of the information about who you're matched with, or who you have asked to be matched with, stored on the client? Are there any pros and cons? Are there any cons to this? Well, one of the cons is that the server should be the source of truth. You should have the server knowing everything, and you can then rebuild from that. In case a client loses all the information, in case they uninstall the app, the server has all the information.

But what kind of information are you losing when you're noting down matches? If you match with someone, send that to the server. Say, "Hey, I matched with this person." That's probably going to be sent to the profile service, which could maybe handle it, or there could be a matcher service which just keeps a table of user ID to user ID, meaning this user has matched with this person. Indexes, like we talked about, are going to be put on the user ID over here. In fact you can put them on both columns, but I'll just put one over here, and you can duplicate the records: A matched with B means B also matched with A, and you can keep it that way.

Now the matcher is going to be checking whether you are matched with a particular person. That can tell the session service whether you're authorized to actually send a direct message to that person. So there will be some communication between the matcher and sessions. In case there's a message being sent, it's first going to be sent to the matcher, which will confirm that yes, you have been validated to send this message. It sends it to sessions, which then sends back the information about which connection you have to use, and then you can always send it to the right place. It's a pretty long process. You might find a better way to do it, but in general this looks fine.

Okay, so we are talking about noting matches. If we
need to note down the matches, the matcher service can note down all the matches that you have. If the app is uninstalled, then when it is reinstalled it will pull all your matches from the matcher, making sure that you can chat with them. Your profile is going to be pulled from the profile service, and the only information you are going to lose is the people you swiped left or right on. Is that critical information? No, because you should get a chance to swipe left or right on a person again in case you've already done that earlier. So you're going to get re-recommendations of the same people.

That takes care of requirement number three. We are storing all information relevant to the matches you have had in the matcher service, and all information relevant to who you have swiped left or right on inside your cell phone. In case you uninstall the app, too bad, you will do it again. That's not that big a loss. So three requirements are taken care of. That's good.

The final requirement we are going to be looking into is a little complicated. It's about recommending people to you, so let's see how we can put it into this architecture. The biggest problem with recommendation will be figuring out who the users close to me are. I can figure out quite easily which genders I'm interested in and which age group I'm interested in using just indexes. But when you look at the number of users, say a million active users, you have to figure out, per user, which people are close to them. That is the core of all recommendations in this system.

The profile service would have in its database the name of the person and all those things, and it would also have the age. That's good. We also need the gender, and we need one more thing, which is the location. Based on these three things, age, gender and location, we need to make decisions. Now, a lot of people would probably come to the conclusion: why not put indexes on all three? This is a common misconception.
You cannot have multiple indexes working together; basically, you cannot have the data sorted in multiple ways in one database table. So if there is an index by age, one by gender and one by location, then when you make a query it's typically going to use only one of those indexes. It might use the gender index. Maybe I am interested in females, so the female category will be picked up by the database. That part will be efficient: only females will be picked up in just one shot, because there's a binary search going on over there. But within that, I'll have to search for people within a particular age range, and within that I'll again have to search for people within a particular location. So it depends on what the database picks as the single index to use, and because it depends on the database's query optimizer, it's pretty much out of your control. I mean, you can give hints to the query optimizer, but what I'm getting at is that you need to optimize on multiple parameters, and you can do it effectively on just one.

So in this case you need to use a NoSQL database like Cassandra, which is really good at querying this kind of data. You just replicate the data in multiple places, and depending on the query, you build a table for that query, and then you can have an efficient query. So one option for the recommendation database is a distributed database, something like Cassandra or Amazon Dynamo. That's the first solution.

The second solution, if a person is not very comfortable moving to a distributed database, is to use the same concepts, kind of, on a relational database. That requires something called sharding, also known by the veterans as horizontal partitioning. Horizontal partitioning means you take some property of the data, basically you set ranges on one column, and you direct data to a location
based on that range. That was a lot of fluff, so take the name as an example. All users whose names start with A to J go to database node number 36. All users whose names start with K to P go to database node number 79. You see what's happening here: I'm partitioning the data to different locations based on its value. So when I'm querying the data, I can easily figure it out: what's the name of this person? Oh, then it must exist in database node number so-and-so.

Partitioning as a concept is really useful, and one of the ways you can partition data is sharding, or horizontal partitioning. I would suggest you have a look at the consistent hashing video anyway, because consistent hashing is going to be critical to keeping your servers functioning. After that you can also have a look at sharding, which I'll probably make a video on sometime. But that's what sharding is: you do horizontal partitioning, and based on your value you go to a particular node.

As usual, what about the single point of failure? What if this node crashes? Are the requests for all users from K to P going to fail? No. You can have a master-slave architecture, so if the master fails, the slave comes up. If the slave also fails, well, both of them failing at the same time has a really low probability, and you can bring up a new node in between. So that is sort of how you are going to be doing the horizontal partitioning: per partition, you can have a master and a slave, or multiple masters and slaves. But then you need to convince your interviewer why you're choosing sharding, which is a little complicated, over a database like Cassandra or Dynamo, which is going to give you all of those features in one shot.

Okay, now why am I using sharding, why am I using Cassandra, why am I talking about all this? I'm so sorry. The
reason I'm doing this is that you need to shard the data based on the location that the person is in. It doesn't necessarily need to be a city; it could be chunks of that city. You can figure out that a person at this location is within this chunk, and if they are within this chunk, they are sharded to a particular node based on that chunk. Therefore you can easily pull the data out too: you pull out all users within that chunk, and then search among them on the age and gender variables. Each of these databases can have the data sorted by age, so you query on the age, and then finally you just filter out the genders that the person is not interested in. So this is the kind of stuff that you could do to improve your recommendation engine.

Your recommendation engine is going to be simple enough. It's going to be a recommendation service. All it does is pull out all the relevant people, maybe from the same profile service, or it could just be storing the user IDs and their current locations. This current location can be updated every hour, every two hours, every three hours. It depends on the client, which can push it. The number of pushes it makes doesn't matter: you are going to update the location only once an hour has passed, and based on this location you're going to be serving recommendations for that particular user.

If any of this is seeming too complicated, it's fine. It may not be the right time to start off with building systems, but you can get the general gist of what's happening. If you're able to break this system down into pieces, if you're able to partition and partition and remove single points of failure, and figure out how these features are going to be done using interactions between these services, then you're doing well. Good job. And that's pretty much it. I think that takes care of the second
point also, of recommending people; you can recommend using this approach. That takes care of all four points, in fact. Tinder is one of those services which seems quite simple and is, because there's not much of a news feed that you need to take care of, and there aren't really a lot of social interactions going on. It's not group messaging, it's just direct messaging. So it's a nice system to start with, and I felt it was one of the interesting systems that I should start with.

If you have any suggestions, or if you feel there was something that we missed, definitely leave them in the comments below. I think we are going to have a really fruitful discussion in the comments after this. If you have any doubts, feel free to ask. I'll try to post as many relevant sources as I can in the description. If you liked this video, you can hit the like button and also subscribe to the channel, because I'll be posting on similar services all the time. That also brings me to the question of which service you want to see next. Do you want to see Instagram, Twitter? It's up to you. Just leave a comment, or maybe I'll take a poll; that will be easier. Just make sure that you subscribe so you get a notification for the poll, otherwise I'll have to ask every time. I'll see you next time.
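
Editor's sketch: the location-based sharding for recommendations described in the talk can be illustrated roughly as follows. This is a single-process toy under stated assumptions, not the speaker's implementation. The 0.5-degree chunk size, the `Profile` fields and the `RecommendationStore` name are hypothetical, each dictionary entry stands in for one database shard, and users in neighboring chunks are ignored for brevity.

```python
import bisect
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

CHUNK_DEGREES = 0.5  # hypothetical chunk size; a real system would tune this


@dataclass(frozen=True)
class Profile:
    user_id: str
    age: int
    gender: str
    lat: float
    lon: float


def chunk_of(lat: float, lon: float) -> Tuple[int, int]:
    """Map a location to the chunk (shard key) that contains it."""
    return (math.floor(lat / CHUNK_DEGREES), math.floor(lon / CHUNK_DEGREES))


class RecommendationStore:
    def __init__(self) -> None:
        # chunk -> rows kept sorted by age, standing in for one shard with an age index
        self._shards: Dict[Tuple[int, int], List[Tuple[int, str, str]]] = defaultdict(list)

    def add(self, p: Profile) -> None:
        bisect.insort(self._shards[chunk_of(p.lat, p.lon)], (p.age, p.user_id, p.gender))

    def recommend(self, lat: float, lon: float, min_age: int, max_age: int,
                  genders: Set[str]) -> List[str]:
        # 1. Location picks the shard. 2. Binary search narrows the age range.
        # 3. Gender is a plain filter over what is left.
        shard = self._shards.get(chunk_of(lat, lon), [])
        lo = bisect.bisect_left(shard, (min_age,))
        hi = bisect.bisect_left(shard, (max_age + 1,))
        return [uid for _, uid, gender in shard[lo:hi] if gender in genders]
```

Only one attribute per shard is kept sorted (age), which mirrors the point made earlier that a query can be optimized effectively on one parameter; location is handled by the shard key instead of an index.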
Info
Channel: Gaurav Sen
Views: 1,121,872
Keywords: System Design, System Architecture, Tinder System, Interview Question, Image storage, blob vs file, file vs blob, Direct messaging, xmpp, chat system, messaging protocol, decoupling, sharding, horizontal partitioning, noSQL, database, scalability, request handling, system availability, partitioning, design question, system design question, gaurav sen, load balancing
Id: tndzLznxq40
Length: 36min 41sec (2201 seconds)
Published: Sun Jul 01 2018