Google Docs System design | part 2 | System components explanation, microservices architecture

Captions
Now it's time to look at the system diagram for collaborative document editing. I'm going to give you a high-level overview of the system as I have designed it. This is always debatable; you can come up with an even better approach or system design for collaborative document editing, and this is what I have come up with. It is not exactly microservices and it is not monolithic; you can think of it as more of a service-oriented architecture. I don't want to show it in a fully microservice way because that would just clutter the diagram, so I have kept it very high-level and simple for the sake of understanding. I'll give you an overview now and then go through each and every component.

On the left side you can see, for representation purposes, three connected clients. They interact with the API gateway for any operation to be performed: to get the comments made on the Google Doc for any given line or section, to send a notification, to grant permission, to ask for permission, or anything else. It always goes through the API gateway. Once the API gateway gets the request, it does the request composition, and it calls the authentication service to check whether your request is authenticated, and all those sorts of things.

Here you can see there is a separate service that takes care of authentication, authorization, and permissions, and another component that takes care of the comments. It is directly connected to the NoSQL store, because we are expecting a lot of comments on the document and we also want a hierarchical view of the comments. I'm expecting that data to be large, so I think it is better to save it in the NoSQL DB. Over here we have email, GCM, or APN notification; I think of that as a service in itself, which takes care of all the notification-related stuff such as GCM or APN. Over here we
have the app servers or services, which provide all the different APIs to land on the main Google Docs page. If you want to export the document to PDF, HTML, or any other format, or if you want to upload a new document and convert it to a Google Doc, all of these things are handled here. We have an RDBMS because certain things need to be consistent and need ACID properties, such as the users, the document IDs, and the references to the document and all the different resources. So we have an RDBMS, we have NoSQL, and we have a time series DB, all used over here. The time series DB, as I have already mentioned, is mainly used to store all the historical operations performed by individual users on a given document.

Now coming to the main part, the very important part: the Node.js WebSockets, the operations queue, and the session server. The first time the user connects or opens the doc, he hits the API gateway and loads the document. As long as he has permission, he gets a session established once he goes into edit mode, and then he gets a Node.js WebSocket connection, which is established from the browser to the server and is always on. Now he can efficiently keep sending small operations as and when he wants, and receive the updates from other clients from the server as soon as they are broadcast. This design assumes I'm using operational transformation, so I'm expecting the message size to be very small. WebSockets are best for that kind of operation, and Node.js handles them well.

All these operations should be ordered in a queue, because if two people send an operation on the same line or the same character at the same time, the service should still prioritize one operation over the other. So it's better to put them in the queue and keep ingesting them as and when
the operations from all the clients for that particular document keep coming in. So we have our operations queue here, where all the operations are queued. The session server keeps ingesting those operations and keeps its own state of the document, which acts as the single source of truth. It also keeps saving a copy of the operations history into the time series DB, so that if you want to revert to a particular version, or check who edited a particular line or word, we can easily get that information from the time series DB. I'm using a time series DB, but if you want you can use NoSQL, or if you want a tree or graph representation of how a particular line or the document changed, you can use a graph DB as well.

There is also cloud storage, if you want to save the document as it is. When the user converts the document to PDF, HTML, or any other format and you want to give them a link to get the document, you first need to upload the converted file to cloud storage, and that link is then provided to the user via email or some other way for them to download. So you can use cloud storage to save the exported document.

Now let us talk about WebSockets. Most of you know why we need WebSockets and that they have several advantages, but for the newcomers I'm going to explain it anyway. You can use HTTP Ajax to send the updates to the server and receive updates in the response itself. Basically we would need to keep polling frequently to get and send the information. That is not very efficient, because every time you make an HTTP call it may have to establish a TCP connection, do a handshake, and then send the information and receive the response. This is time-consuming and it is also overhead. So to keep the connection
lightweight, we should definitely go for WebSockets. Other than HTTP Ajax, we could use long polling or a Comet-like implementation, but those are not very efficient, and every time we make an HTTP call the header size and everything else is really high. So we need WebSockets: the connection is established once, and once it is, we can send and receive messages seamlessly in real time. This lets us keep sending information as and when the user modifies the document; we don't need to keep polling periodically. That is where we save bandwidth and also overhead on the browser. We have a connection open: only if the user is editing something do we send it, and only if there are updates to be received do we receive a message. We don't need to ask the server whether it has any information; while the connection is open, the server keeps sending the updates to us. These are the advantages of having WebSockets.

On the server side we can use Node.js, a well-built asynchronous server that handles lightweight messages and is built for these kinds of use cases. Also, when we use WebSockets along with Redis, we can provide a lot of cool features. We can add a chat feature for the users who are editing the document, which can be easily implemented with WebSockets and Redis. We can also show in real time where exactly each user is editing, with their cursors in different colors, as and when users join to collaborate on that document. This is all possible because of the WebSockets' real-time communication.

Now let me explain a little about why I have chosen a microservice kind of architecture for this application. If you look, there are different kinds of services: there is a
notification service, there is a comment service, there is a session service, there is an operations service, and there are different APIs. I can't put all of these services into one service. I don't want to make it a monolithic service, because if one part goes down it pulls everything down along with it. So I have identified the very important services, and I can deploy them as separate services, so that the failure of one service will not impact the others. For example, if our comment service goes down, our doc editing, saving, exporting, importing, and everything else is still functional; only the comment service will not be available. That's fine. Instead of taking down the whole app, it is fine to have one service down while all the others are up and running well. That is the reason I am going with the microservice architecture.

First of all, it gives you simplicity and modularity across the different services; we can maintain one service without affecting the others. The second thing is that it is very easy to spin up, and we can develop any service faster without affecting anyone. From the developer's and the management's perspective, managing a small service separately is much easier than managing the whole monolithic service. Also, if you are a new developer who has just joined the company and you are given the code for the whole of Google Docs as a monolithic system, it is very difficult to understand. If it is a microservice instead, you can just be given the comment service code to understand; it is much easier to understand the whole code, and you can easily debug anything or add any service. If it were monolithic, there would be a lot of dependencies between the different parts, and we can't just separate those things. Even if you build it as modules, it will not be that
easy. As a service, it is much easier to understand, develop, maintain, and deploy. The third thing is the freedom to use different technologies. For example, for all the APIs I can use Python as the programming language, and I can use C++ for handling the operational transformation itself, because C or C++ is generally much faster when a lot of operations are involved. I have the freedom to implement operational transformation in C or C++ and implement all the APIs, the comment service, and everything else in Python or Java. That's one more advantage. If it were monolithic, you would be tied to the old code stack; even if you want to add a new service, you are locked down to the old technologies and have to use the same old tech. In this case it's not like that: you can use a different stack for each service and deploy each one independently, and they can talk to each other using RPC, protobufs, REST, or anything like that.

The fourth thing is that scaling is easy. You can scale up just the service that is getting more traffic. For example, if people are not using the chat service, we can scale it down and save a lot of resources, and we can scale only the comment service, or scale the APIs. We have granular control over which services we want to scale up or down. So these are the advantages of having microservices.

Now let's learn about the API gateway. When I say the Google Docs system is designed as a microservice architecture, you must be thinking: there are so many services, so how will the clients know about all of them, and how do we deploy them? For all of those things, an API gateway is the answer. All of the microservices are usually deployed in
a Docker-like fashion, right? Each and every microservice is built into a Docker image, and whenever we want to scale, we increase the number of Docker containers deployed. This can all be handled by one system, for example Kubernetes or Docker Swarm; there are many other technologies available. Kubernetes can also act as an API gateway. As the name indicates, the API gateway is the main entry point for all of the backend services. This is the gateway the clients' requests hit: when a client makes a request, the request first lands at the API gateway, and the API gateway decides which microservice, or which instance of the microservice, to hit, and then gets the response and sends it back to the client.

These are the advantages of having an API gateway. The first one is a single entry point. The clients never need to know which service to contact, or which static IP or host name to use. There is always one entry point, one IP, which is easy to configure.

Along with the single entry point comes API composition. This is just an example: to get the product information on an Amazon product page, we might need product-related information, recommendations, reviews, ratings, the number of items left in the inventory, and so on. So there can be about 10 to 15 API calls we might need to make. If we don't have an API gateway, because all of those are different services, we have to call all of the different services to get all of the information. If we have an API gateway, we just need to make one or two calls to the API gateway. What the API gateway does is this: when the request lands at the API gateway, it internally does the API
composition. What is API composition? Say I am requesting the product API. The gateway knows internally, based on its configuration, that it should call five to six different services and which services those are. Automatically, asynchronously, and in parallel, it makes those five or six calls, whether over HTTP REST, RPC, AMQP messaging, or any other protocol, to reach the respective services and get their responses. It calls the product API, the recommendation API, the inventory API, the reviews API, and the ratings API, collects all the information into one response, and sends it back. How cool is that? We made just one call, and the API gateway took responsibility for calling all the different services, collecting all the results, and sending them back. That makes it much easier for the client side to get information from all the different services. So that is one.

The second one is security. Developers will keep adding services, so if we don't have an API gateway, how do we make sure that all of them are properly protected, that they are authenticating and checking permissions? If we have an API gateway, we only need to enforce it in one place: for all requests, authentication is a must. The services are safely hidden inside the network behind the API gateway, so no one from outside can access them. From the customer or client perspective, only the APIs that are exposed can be accessed; all of the different microservices are hidden inside. That way we can maintain security easily. In the case of Google Docs it is the same thing: we shouldn't let someone see the comments on a document he doesn't have access to, and he shouldn't be able to comment on or edit a document he doesn't have access to. All of these things can be handled easily in an API gateway.

The third one is dynamic service
discovery. Service discovery is a different and difficult problem when we have a microservice kind of architecture. When you have a number of microservices in the backend, with autoscaling up and down, different versions of the same service being deployed, and everything else happening, we can't have static IPs for all of the services. So how does the client know where these services are deployed, what the IP is, or what the DNS or domain name is? That is difficult, but it is solved easily with an API gateway, because the client only needs to remember the API gateway's IP or domain name. The API gateway automatically keeps talking to the service registry, which has the record of which service is at which dynamic IP, so it hits the respective IP, gets the information, and gives it back to the client. That way service discovery is much easier.

The fourth one is that service partitions are hidden. As I mentioned, the different microservices are hidden behind the API gateway; that's one thing. Let me take a different example, the Amazon product page itself. Today I might have ratings and reviews together as one service, but tomorrow I might decide to split that into two parts, with ratings as one service and reviews as a separate service. If I didn't have the API gateway, that would be a tedious process, because I would also have to change the client code to make two different calls, one for ratings and one for reviews. Because of the API gateway I don't need to do that at all. I just split the service into two and reconfigure the API gateway to make one more API call, so that it gets the ratings and the reviews separately and combines them. That way service partitions are also hidden. I already spoke about hidden microservices. One more thing in the system is very
important: circuit breaking. There is a module developed by Netflix called Hystrix; you can take a look at that. It is a very good library for handling circuit breaking. What does circuit breaking mean? Say I'm making a product API call, and it in turn makes about five different calls. For some reason my ratings API is overloaded. In this case the product API call keeps waiting until that request is served. I have got the responses from all the other services, and I am just waiting for a response from this one microservice, the ratings service. This is blocking, right? Because just one microservice is not responding, the whole thread is blocked, and in turn we are not able to send back the responses we already have to the client.

We can solve these problems using the circuit breaker pattern. First, we can set a timeout on each and every microservice call. When the API gateway makes calls to all of these different microservices, if any of them doesn't respond within the given timeout, just return whatever responses you have so far. You can also set a priority: if you have the product information, just send it back, but if you have all the other information and not the product information, discard everything and send a 500 or some other error. You can easily configure that kind of composition.

Another important thing: when the product API understands that the ratings microservice is not responding the way it is supposed to, it can stop making requests to that service at all. It stops making requests because it understands that this service is overloaded. If it keeps on making requests, the requests will just keep cascading and the
service will never recover. So it stops sending requests to the ratings API, which lets the ratings API recover and become healthy. After some time, maybe after 10 minutes, it makes one call to check whether the ratings API is up and running and healthy. If it gets a response, it considers the ratings API to be up and running, and it goes back to calling the ratings microservice whenever there is a product API call. If it doesn't get a response, it concludes that the service is still down and does not resume calling it. This is what is called circuit breaking, and it is essential when you have a microservice architecture.

It matters in the case of Google Docs collaborative editing as well. If the operations handling part itself is down, the whole thing is down anyway. But if operations handling is up and other things such as comments, the cursor statuses, or the online users list are down, that's fine for us; the document data and the operations are what is very important. We can let the API gateway return the response and wait for the other services to recover.

We are almost at the end of the video. I'm not really the right person to talk about the front end, but here are some ideas on how we can implement the Google Docs front end. You can use Angular or any other front-end technology. The trick is that you can't render an actual doc file in the browser. Whatever you see as the Google Doc when you edit it is actually an HTML page. Whenever you add a new page, you are actually adding a new div, and whenever you add a new line, you are adding a div, a span, a paragraph, or something like that. So you are not editing a doc file in the browser at all; keep that in mind. It is all an HTML or JSON representation. So it is all HTML that we are editing. It
doesn't matter whether you are implementing Google Docs, an Excel sheet, rich text, or just plain text; it is all HTML. The features we support in the front end should be only the features that are supported in the actual doc format. We can't support something fancier in the front end and then, when the user wants to download it as doc or docx, just say that it's not supported, right? So we have to provide only the features that are supported by doc or docx, and then we add the option for the user to download whatever he has written in the browser as doc, docx, HTML, or any other format we would like.

We can also use JS web workers in the background to run operations in parallel threads: one that takes care of sending the operations, receiving, acknowledging, waiting for the server to confirm, and everything. As we know, JavaScript is single-threaded, so if you give all of the tasks to one thread it will be heavily overloaded. So we should use JS web workers: one worker that handles the operations, transformations, and acknowledgments, and another that handles the actual doc features that are supported and shown on the page. This could also be rendered using an HTML5 canvas, but I don't suggest that; it's a totally crazy thing to do. It's better to use HTML. There are also a lot of open-source implementations of collaborative editing available, like ShareJS; you can use one of those as a front end and build on top of it as well.

I think I've explained most of the information related to collaborative editing. If you liked this video, hit the like button, and comment if you have any solutions. As usual, please like, share, subscribe, and tell your friends about the channel. Thanks a lot.
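
As a rough illustration of the circuit-breaking behaviour described in the captions (stop calling a service that is not responding, wait, then send a single probe call and resume only if it succeeds), here is a minimal Python sketch. It is not Hystrix's API and not the speaker's implementation. The names CircuitBreaker, CircuitOpenError, and get_ratings_or_none, the failure threshold of 3, and the injectable clock are all assumptions made for illustration; only the idea of a roughly 10-minute rest period comes from the talk.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    """Hypothetical breaker wrapping calls to one downstream service."""

    def __init__(self, failure_threshold=3, recovery_seconds=600.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed value
        self.recovery_seconds = recovery_seconds    # "maybe after 10 minutes"
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.recovery_seconds:
                # Still resting: do not hit the struggling service at all.
                raise CircuitOpenError("service is resting; call skipped")
            # Rest period is over: fall through and let one probe call go out.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if (self.opened_at is not None
                    or self.failures >= self.failure_threshold):
                # Open the circuit, or re-open it after a failed probe.
                self.opened_at = self.clock()
            raise
        # Success: the service is healthy again, so close the circuit.
        self.failures = 0
        self.opened_at = None
        return result


def get_ratings_or_none(breaker, fetch_ratings, product_id):
    """Return ratings, or None so the caller can send a partial response."""
    try:
        return breaker.call(fetch_ratings, product_id)
    except Exception:
        return None
```

In a gateway-style composition, a None result here would mean "return the product information without ratings" rather than blocking the whole response. A per-call timeout, which the talk also mentions, would live inside fetch_ratings and is left out of this sketch.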
Info
Channel: Tech Dummies Narendra L
Views: 36,671
Keywords: Amazon interview question, interview questions, interview preparations, algo and ds interview question, software interview preparation, google interview question, Technical interview question, software architecture, system design, learn System design, system design interview questions, system design basics, system design tutorials, google docs system design, google drive system design, google wave system design, microservice architecture example, API gateway example
Id: U2lVmSlDJhg
Length: 28min 12sec (1692 seconds)
Published: Thu Jan 03 2019