Design YouTube - System Design Interview

Captions
Let's design the high-level architecture of a YouTube-style application. By the way, this video is taken from my system design interview course, which you can check out on neetcode.io.

First, let's go over the background, even though I'm sure you're familiar with how YouTube and other video sites like Netflix work. Compared to Netflix, YouTube is a bit different in that users can actually upload videos, and it's free: pretty much anyone can upload videos. And of course, if we can upload videos, we can also choose to watch videos as users. When it comes to YouTube, this is the core functionality, though that doesn't mean it's simple to implement; there's a ton of complexity in reaching the scale YouTube does with even just these two features.

But this is not all YouTube is capable of doing. You can obviously search for videos, and you can have videos recommended; designing that recommendation system could be its own design problem, and even then it could not be fully described and designed in a 45-minute interview. Users can also comment and interact with videos by liking or disliking them. There's a ton of analytics that goes into reporting views, and I'm sure there's bot prevention for comments (even though there have been a lot of bots in the comments lately), and advertising; the list could go on and on. The point I'm trying to make is that with an ambiguous design prompt like this, there are many different directions we could go in, and of course we can't explore all of them.

Moving on to the functional requirements: let's say the main features we want to focus on are uploading videos and watching videos, both from a user's perspective. These are the two main functional requirements. If we have time at the end, maybe we can explore how to extend our design to handle additional functionality, but these are the main things we want to focus on.

When it comes to non-functional requirements, the first that comes to mind is reliability. You never want to run into an issue where somebody uploads a video and that video is somehow corrupted or deleted. Even though YouTube is free, we wouldn't want a video to just disappear, so the videos need to be extremely reliable, at least in terms of storage.

As for the scale we'll be handling: even a single video can have potentially thousands of concurrent viewers, and of course we're going to have a ton of users. Let's assume we're designing YouTube to handle a billion daily active users, which I think is about accurate. Let's say each user watches five videos per day, and that 100 users are watching videos for every one user uploading a video; this is effectively the ratio of reads to writes for videos. If a billion users each watch five videos per day, that's five billion video views per day. With a 100-to-1 ratio, the number of uploads per day is one percent of five billion, which comes out to 50 million videos uploaded per day, a massive amount of throughput.
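As a quick sanity check, here's that back-of-the-envelope arithmetic as a runnable sketch; the numbers are the assumptions stated above, not measured figures:

```python
# Back-of-the-envelope capacity estimate for our YouTube-style design.
DAILY_ACTIVE_USERS = 1_000_000_000   # assumed: 1 billion DAU
VIEWS_PER_USER_PER_DAY = 5           # assumed: 5 videos watched per user per day
WATCH_TO_UPLOAD_RATIO = 100          # assumed: 100 views for every 1 upload

views_per_day = DAILY_ACTIVE_USERS * VIEWS_PER_USER_PER_DAY
uploads_per_day = views_per_day // WATCH_TO_UPLOAD_RATIO

print(f"{views_per_day:,} views/day")      # 5,000,000,000 views/day
print(f"{uploads_per_day:,} uploads/day")  # 50,000,000 uploads/day
```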
Now, the good news is that among these 50 million daily uploads, most videos probably aren't going to get a ton of views. If I had to guess, I'd bet the top five percent of videos account for something like 90% of the views. That's just off the top of my head, but I think we can design this assuming most videos won't be getting views, even though they still have to be stored and can't be lost. In most cases, doing a bunch of complex math isn't super important; it's about coming to the right conclusions, which we are.

We also have to keep in mind that we want to favor availability over consistency. What do I mean by that? Every time you go on YouTube and refresh the home page, you expect to see a bunch of videos. Every time you make that request, you should get a correct response, an HTTP 200, and things should load; it's okay if we have to sacrifice consistency to achieve that. What does sacrificing consistency look like? Suppose somebody you're subscribed to uploaded a new video one second ago, and you just refreshed your home page. You see a bunch of videos in your subscription feed, but none of them is the one uploaded a second ago. Hypothetically this could happen if we have multiple storage systems, and the one you happened to read from when you refreshed did not yet have the most up-to-date data, while another one did. Eventually that video will be replicated to the other storage nodes; it just takes a few seconds, so in the meantime you're getting stale data. Our data storage is not favoring consistency; it's favoring availability. The worst that happens in this case is that you wait a little longer: maybe five seconds after a new video is uploaded before you can actually see it, or in the worst case maybe ten. Is that really a big deal? I think it would be a lot worse if you refreshed the page and it didn't return anything at all.

Lastly, we obviously want to minimize latency as much as possible. When you click to watch a video, ideally it should start playing immediately, even if the entire video isn't loaded yet, and with a good internet connection you shouldn't experience any buffering or waiting for the video to load.
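To make that trade-off concrete, here's a toy model of the stale-read scenario just described: a write lands on a primary store and reaches a read replica asynchronously, so a refresh during the replication window still succeeds but returns slightly stale data. This is a sketch for intuition, not a real replication protocol:

```python
# Toy model of favoring availability over consistency with async replication.
primary = {"subscriptions:alice": ["old_video"]}
replica = {"subscriptions:alice": ["old_video"]}  # serves reads; lags the primary
pending = []                                      # stand-in for async replication

def upload_video(feed, video_id):
    primary[feed] = [video_id] + primary[feed]    # write hits the primary first
    pending.append((feed, video_id))              # replica catches up later

def refresh_feed(feed):
    return replica[feed]                          # always answers, possibly stale

def replicate_once():
    feed, video_id = pending.pop(0)
    replica[feed] = [video_id] + replica[feed]

upload_video("subscriptions:alice", "new_video")
print(refresh_feed("subscriptions:alice"))  # ['old_video']  <- stale, but a 200
replicate_once()
print(refresh_feed("subscriptions:alice"))  # ['new_video', 'old_video']
```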
Now let's start with the high-level design, beginning with the user journey of uploading a video, since uploading is probably more complicated than watching a video and will give us a better sense of the infrastructure involved in our design. Because we're dealing with such massive scale, 50 million uploads per day, we probably can't handle it with a single server, so we'd most likely have a load balancer sitting in front of a bunch of application servers so we can scale horizontally. This is a pretty generic pattern, and for now it doesn't really matter which application server a given user hits, so I'm going to simplify the design and draw it as the user making an upload request to the application server, even though under the hood we know it will of course be load balanced.

Even the act of uploading a video is not as simple as it might sound. What happens if there's a brief internet connection breakage, even for just a second, while we're uploading a file that's over a gigabyte and we're already halfway through? Would we have to restart, or could we resume where we left off? Let's assume resumable uploads are not a direction we want to explore, and just say that once a video is uploaded, it's stored in some object storage; this is where we'll keep the raw files users upload. We're using an object store because it's much better suited to storing media and large files like videos; we probably don't need to store those in a relational database, for example. Also, object storage services like AWS S3 or Google Cloud Storage handle replication for us, so we can safely assume that anything we store there won't be lost. That's generally how cloud file storage works; products like Google Drive are actually built on top of object storage. So at a high level, we can assume our reliability requirement is covered.

Storing the videos there is fine, but what about the metadata associated with every video? Going over what the upload API would look like: it would obviously have a title, a description, and the actual video content itself, maybe an mp4 file, which is what actually gets stored in object storage. There could be a bunch of other fields, like tags, but knowing every single field we'd store with a video isn't the important part. Most importantly, we want to associate every video with a user, because every time you watch a video on YouTube, underneath it are the profile picture and username of the person who uploaded it. This isn't Netflix, where you just have shows; on YouTube, people, the content creators, are making the videos. So every time we show a video to a user, we'll have to combine the video's metadata with the information of the user who created it.

Long story short: every time a video is uploaded, we'll store metadata associated with that video, along with user information, in a database. I'm choosing a NoSQL database because we're going to have so many videos uploaded, and we'll probably need to read this metadata very frequently. In this database we can store a reference to the video file in the object store, and that should be fine. Let's say we use something like MongoDB, which, if you don't recall, doesn't store things in tables and rows like a SQL database; it stores them in a JSON-like format. The terms are: we have collections of documents, where a document is pretty similar to a JSON object and very flexible. So one collection is videos, where each video document holds all the information we need about a single video, and another collection is users, with all of their information.
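To make the write path concrete, here's a minimal sketch of an upload handler under the assumptions above: the raw file goes to object storage (S3 here) and a metadata document holding a reference to that file goes to MongoDB. The bucket, collection, and field names are hypothetical, not a real YouTube API:

```python
import uuid

import boto3
from pymongo import MongoClient

s3 = boto3.client("s3")
db = MongoClient("mongodb://localhost:27017")["youtube"]  # hypothetical connection

RAW_BUCKET = "raw-videos"  # hypothetical bucket for raw, un-encoded uploads

def handle_upload(user_id: str, title: str, description: str,
                  video_bytes: bytes) -> str:
    """Store the raw file in object storage and its metadata in MongoDB."""
    video_id = str(uuid.uuid4())

    # 1. Raw file goes to object storage; replication is handled by the store.
    s3.put_object(Bucket=RAW_BUCKET, Key=f"{video_id}.mp4", Body=video_bytes)

    # 2. Metadata (plus a reference to the raw file) goes to the videos collection.
    db.videos.insert_one({
        "_id": video_id,
        "creator_id": user_id,
        "title": title,
        "description": description,
        "raw_file": f"s3://{RAW_BUCKET}/{video_id}.mp4",
        "status": "uploaded",  # later flipped to "encoded" by the encoding service
    })
    return video_id
```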
You might be thinking: if a user wants to watch a certain video, don't we then have to perform a join with the users collection? Not necessarily. With NoSQL databases like MongoDB, we can keep our data denormalized (that's the correct term). Normalized, in SQL, basically means you don't store duplicate data: you have separate tables, and if you want to aggregate or combine information, you join those tables. In MongoDB we don't have to do that; we can store duplicate information. So in every video document we'd also store the relevant user information. We know that when a user goes on YouTube to watch a video, they see the creator's profile picture; that's the example I'll use here. That profile picture is probably also stored in object storage somewhere, so the user document will hold a reference to it, but we'll also store that reference in every video document created by that user. We'll have duplicate references to it, but that's okay in NoSQL because, at the very least, it improves read performance: we don't have to perform joins.

Now, what happens if a user actually updates their profile picture? Yes, we'd have to update the user document, but then we'd also have to update every video document where that person created the video; maybe they have 100 videos, maybe 1,000, and we'd have to update all of those documents. In this case that's okay. First of all, they're probably not going to update their profile picture very frequently; uploading a video is more frequent, and watching a video more frequent still, so reads are what we're favoring over writes. Second, we can update all of those video documents asynchronously; we don't have to do it immediately. Is it the end of the world if somebody sees this user's old profile picture for a few minutes, or maybe even an hour? Probably not. These are details we could discuss further, but they're below the high level, so let's continue with the rest of the design.

When it comes to videos, encoding is actually a big part of the system. As users upload raw video files to YouTube, YouTube does a lot of video encoding and compression to get the size of those videos down, and encoding a video is not something that happens in a second. It's definitely an asynchronous task: it typically takes on the order of minutes to encode a file, and for really large files (I think YouTube will even let you upload a 24-hour video) it can take hours. That's the reason we're using a message queue here. There's a lot of domain knowledge needed to really understand video encoding, and that's not what we want to dive into, so let's keep it high level: as raw video files are uploaded, we store them, but we also add them to a queue so they can be sent to another service that handles the encoding, and it's probably not going to be a single server doing that; we'll have a ton of servers. After the videos are encoded, they're stored in object storage as well, because they're still videos and we still want to make sure they're reliably stored and replicated.
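Here's a sketch of what that denormalization and the asynchronous fan-out might look like with pymongo, continuing the hypothetical schema from the earlier sketch; the field names are assumptions for illustration:

```python
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["youtube"]  # hypothetical connection

# Denormalized shape: each video document embeds the creator info it needs to
# render, so reading a video never requires a join against the users collection.
example_video_doc = {
    "_id": "abc123",
    "title": "Design YouTube",
    "creator": {                       # duplicated from the users collection
        "user_id": "u42",
        "username": "NeetCode",
        "profile_pic_url": "s3://avatars/u42.png",
    },
}

def update_profile_picture(user_id: str, new_pic_url: str) -> None:
    # Cheap, immediate write: the single user document.
    db.users.update_one({"_id": user_id},
                        {"$set": {"profile_pic_url": new_pic_url}})

    # Expensive fan-out: every video this user created. In a real system this
    # would run asynchronously (e.g. from a background job), since it's fine if
    # viewers see the stale picture for a few minutes.
    db.videos.update_many(
        {"creator.user_id": user_id},
        {"$set": {"creator.profile_pic_url": new_pic_url}},
    )
```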
Videos are also immutable, so we don't really need something like a Hadoop file system; object storage is probably good enough. We're not going to be updating a video; we'll update the metadata associated with it, but the video itself we either upload or possibly delete, and that's pretty much it. You're not going to be editing the video in place.

That's how a video can be uploaded, but what about actually watching one? We want the reads to be as fast as possible and the latency as low as possible, so anywhere we can add caching is going to be really, really helpful. We know users aren't going to be reading raw video files; they'll be reading encoded video files, and we want those distributed around the world, stored as close as possible to end users. We can use a CDN for this, which does exactly that: it distributes static files geographically. So when a user wants to watch a video, the video file itself is loaded via the CDN, which pulls from object storage. The metadata associated with a video is fetched from our database, and to speed that up, since we know a small fraction of videos will be getting most of the views, we can add a cache in front of the database. That cache will of course be an in-memory cache; that's the whole point of speeding things up, since disk is slower than memory. It probably can't hold the metadata for every video, so we'll need some way to evict entries, and since newer videos are most likely the ones getting more views, we can probably implement something like an LRU cache here.

Now, finally, let's start digging into some of the details, and the first thing I want to talk about is the encoding part. We said we could have 50 million videos uploaded per day, so my question is: how many encoding workers do we need? Assume the workers can encode videos in parallel; this is a pretty easy service to scale horizontally, at least at a high level (I'm not saying video encoding is an easy topic to understand), and assume one worker encodes one video at a time. So if one person uploads multiple videos, or ten people upload videos at the same time, those videos are added to the queue and then reach the encoding service before being encoded and written to storage; the point is that multiple videos can be encoded at the same time, with no dependencies between them. Let's also assume every video takes one minute to encode, which is probably too small; it would probably take longer on average, but let's say these workers have really good resources and most uploaded videos are pretty short.

In terms of capacity planning: 50 million uploads per day, and assuming roughly 100,000 seconds in a day (the real number is 86,400, but round numbers are fine for estimation), dividing 50 million by 100,000 gives roughly 500 videos uploaded per second. So the first thing on your mind might be: can we just have 500 workers? No, that's pretty naive, because remember, we said it takes one minute on average to encode each video. If we only had 500 workers, then in the first second 500 videos are uploaded, and each worker starts encoding one. One more second goes by and 500 more videos are uploaded, but every worker is busy, so we add those 500 to the queue; another second passes and we add 500 more, and this keeps happening until a minute goes by, the first 500 finish and are stored, and the workers can pick up 500 more. By that point our queue would be backlogged pretty hard, and at this rate we'd never get through the backlog. We need more than 500 workers. If you do the math, there are 60 seconds in a minute, so multiply 500 by 60 and you get 30,000 workers, which is roughly the answer I'd personally be looking for. With video encoding it's probably pretty hard to get an accurate estimate, and I'm not sure whether one worker could actually handle multiple videos at once; maybe that's the case. But the important thing I'd be looking for if I asked this is the realization that we definitely need more than 500 workers; we need more than the number of videos uploaded per second, that's for sure.
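That back-of-the-envelope reasoning is essentially Little's law (work in flight = arrival rate × service time); here it is as a runnable sketch using the assumptions above:

```python
# Capacity estimate for the encoding fleet, using the assumptions above.
UPLOADS_PER_DAY = 50_000_000
SECONDS_PER_DAY = 100_000          # rounded from 86,400 for easy estimation
ENCODE_SECONDS_PER_VIDEO = 60      # assumed: 1 minute average encode time

arrival_rate = UPLOADS_PER_DAY / SECONDS_PER_DAY          # ~500 uploads/second
# Little's law: to keep up, in-flight encodes = arrival rate * service time.
workers_needed = arrival_rate * ENCODE_SECONDS_PER_VIDEO

print(f"{arrival_rate:.0f} uploads/sec -> ~{workers_needed:,.0f} workers")
# 500 uploads/sec -> ~30,000 workers
```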
Another interesting part of this problem is actually watching a video, so let's talk about some details of how that can be optimized; the best way is by looking at an example. Right now I'm on YouTube, on my channel specifically. I'm going to open up the dev tools and focus on the Network tab; I'm also going to filter on XHR requests, and I'm going to click one of these shorter videos (you'll see why in just a second). The first thing you see here is how much of the video has buffered: you can see this portion of the video has buffered. When we watch a YouTube video, we don't need to wait for the entire video to download before we watch it; we start at the beginning, so presumably only the beginning needs to be loaded. But watch what happens when I click over here: if I skip to this part of the video, it just loaded a little bit more. Now I'm going to skip over here, and watch what happens: see, it immediately buffers. That's what we want. We don't need the entire video to be loaded, but it's true that some people might skip around; they might skip to this part, which seems to be popular, and this part of the video has not loaded, only this part has. So what happens when I click here? That part gets loaded, and what's actually happening, if we scroll down in the requests, is that the most recent request loads that portion of the video. We are not using some special streaming protocol to do this; we're actually making HTTP requests to load chunks of the video. If I expand this here, you can see a request was made, and the response looks like gibberish to us because it actually is that portion of the video. Going back to the headers and scrolling down to the response headers, you can see that this one was actually not the video: the content type is audio. So I'll hit the second one over here, and scrolling down to its content type, we see that this one was the video. So it looks like the audio is being fetched separately from the video: this response is probably the video, and the earlier one was probably the audio, not that they look any different to us. I'm going to go ahead and refresh and walk through it one more time.
Pausing this, we can see a portion of the video has loaded here. Scrolling down to the requests, the videoplayback requests are the ones actually loading the video itself, and as I click here, a new chunk of video gets loaded; let's scroll all the way down to see that one. There are multiple requests here, and you can see some are larger than others, but the point is that one megabyte of data is easier to transfer than the entire video, which might be, I don't know, 20 or 30 megabytes. This is the technique for lowering latency: loading a video via smaller chunks.

While rendering and loading videos is also a domain-knowledge-heavy topic, I still think it's worth mentioning, because the technique we just went over, small chunks of video, is a pretty simple concept to understand at a high level: we don't need to send the entire video to the user before they can start watching it; we can just send small chunks of the portion they're actually watching.

Another relevant question is what protocol we should use for sending videos. By the way, what we just talked about is called video streaming, not live streaming: we know the video is already stored, it's not a live feed, but the video is being streamed, meaning it's sent in small chunks. Contrast that with downloading a video, which is not streaming: that's taking the entire stored file, sending it to your computer, and storing it there. With video streaming, I believe those small chunks are held in your computer's memory, which is also why you would not want the entire video taking up all of your memory. Most likely there's client-side code handling that and freeing memory, because it's pretty easy to write client-side JavaScript that takes up all your memory and crashes your browser; that's something you might want to keep in mind as a front-end developer. If we were watching a 10-hour-long video, which definitely exists on YouTube, we wouldn't need the entire video buffered in memory; we could still skip around it.

Going back to the protocol question: at a high level there are two options, UDP and TCP, and since we want latency to be as low as possible, you might favor UDP for video streaming. But UDP is probably the better choice for live streaming: as a sports game is going on, if you miss one second of it, you don't want to go back to that second; you want to keep up with the most up-to-date information and see what's happening in real time. That's what UDP favors, and that's what you'd want when watching a live stream. With an actual stored video, we want to watch the entire thing: if you're watching a movie on YouTube, you don't want to miss two seconds of it, because that might be the actual plot point. So TCP is favored for its reliability: it ensures we get the entire video with no missing gaps. Sure, it might take longer, but as long as we send the video in small chunks, it should be okay, and that's exactly what we saw happening on YouTube: it was sending HTTP requests, which are built on top of TCP. I think that's another important question in the context of YouTube compared to a lot of other system design problems.
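As a sketch of chunked fetching over plain HTTP/TCP: one common mechanism for this is the HTTP Range header, which asks the server for a specific byte range of a file. YouTube's player uses its own request format, so this illustrates the idea rather than their actual API, and the URL here is hypothetical:

```python
import requests

VIDEO_URL = "https://cdn.example.com/videos/abc123.mp4"  # hypothetical CDN URL
CHUNK_SIZE = 1_048_576  # 1 MiB per request, similar to what we saw in dev tools

def fetch_chunk(offset: int) -> bytes:
    """Fetch one chunk of the video starting at the given byte offset."""
    headers = {"Range": f"bytes={offset}-{offset + CHUNK_SIZE - 1}"}
    resp = requests.get(VIDEO_URL, headers=headers, timeout=10)
    resp.raise_for_status()  # expect 206 Partial Content from a Range-aware server
    return resp.content

# A player would fetch only the chunks near the current playhead; e.g. if the
# viewer skips to ~20 MB into the file, start fetching from that offset:
chunk = fetch_chunk(offset=20 * 1_048_576)
```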
There's also a lot more we could explore, especially when it comes to uploading videos. Keeping things at a high level, we'd probably want to rate limit uploads; we don't want somebody to be able to upload an infinite number of videos. That could be implemented in the load balancer itself, which we omitted from this design but know exists. Also, for YouTube's recommendations, and even search, we'd probably want other auxiliary services that read from our metadata, and we'd probably want to store a history of what types of videos each person watches and likes, so we can build recommendations for them. Searching videos could be its own topic; it's a bit like designing Google search, because there's a lot of indexing you can do. You'd probably want to incorporate recommendations into search as well, along with metadata like the description, the title, and how many views a video has, to decide which videos are most relevant for a given query, and you could build autocomplete on top of that too. Those systems would most likely be built on top of, or separately from, the core functionality here.

Now, one last thing I want to cover, because I think it's always interesting to understand how this type of service was actually built. YouTube did not actually use a NoSQL database; they used MySQL, a relational database management system. You might be wondering why they didn't use NoSQL, and I definitely don't know the details, but one guess is that YouTube was created in the early 2000s, I think 2004 or 2005, and MongoDB did not exist at that point; they also probably didn't need to handle the scale they do now. As time went on, though, they found they did need to scale their database. I think the first thing they did was add read-only replicas, because this is of course a read-heavy system, so reading is more common than uploading new videos. Even then they ran into issues, so next they sharded their MySQL database, and they ended up with a lot of complex code in their application servers to properly route each user request to the correct shard. I'm not exactly sure which shard key they used, but that's what they did. Eventually, the long-term solution they found was to build a new engine called Vitess (I'm actually not sure how it's pronounced). It was created at YouTube, and the idea is basically to decouple the application layer from the database layer: the application layer should not have to know how the database is sharded. Vitess was added as a middle layer between the application servers and the database, at least at a high level, and that's where all the logic for sharding and correctly routing requests lives. This is how they were able to take even a relational database like MySQL and scale it up. Maybe if they could go back in time, they would have started with a NoSQL database, or some other type of database, in the first place, but they did find a way to make MySQL work. Vitess was later open sourced, and it's a very popular project
that's still being used today. It's very modern and very powerful, and it's used by newer companies like PlanetScale, which take MySQL, add Vitess on top of it, and sell that as a product, of course adding more functionality. This kind of shows you that when you reach problems in distributed systems, it can breed a lot of ingenuity and resourcefulness, and you can overcome a lot of limitations that we might otherwise look at and dismiss: oh, MySQL won't hold up; if we're dealing with a lot of read scale and eventual consistency is fine, we can just use a NoSQL database. But they found a way to make MySQL work. If you found that interesting, you can read a brief history of YouTube, MySQL, and Vitess in the Vitess docs, and probably other places on the internet.
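To make the sharding pain concrete, here's a hypothetical sketch of the kind of application-side shard routing YouTube's servers had to do before Vitess, which is exactly the logic Vitess moves out of the application layer. The shard key and connection strings are invented for illustration; YouTube's actual shard key isn't public:

```python
import hashlib

# Hypothetical: four MySQL shards, each holding a slice of the video metadata.
SHARD_DSNS = [
    "mysql://shard0.internal/youtube",
    "mysql://shard1.internal/youtube",
    "mysql://shard2.internal/youtube",
    "mysql://shard3.internal/youtube",
]

def shard_for(creator_id: str) -> str:
    """Hash the shard key (here: creator_id) to pick a shard deterministically."""
    digest = hashlib.md5(creator_id.encode()).hexdigest()
    return SHARD_DSNS[int(digest, 16) % len(SHARD_DSNS)]

# Every query in the application has to be routed by hand:
dsn = shard_for("u42")
# connect(dsn).execute("SELECT * FROM videos WHERE creator_id = %s", ("u42",))

# Resharding (changing len(SHARD_DSNS)) silently remaps keys, which is exactly
# the kind of complexity Vitess exists to hide behind a single query endpoint.
```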
Info
Channel: NeetCode
Views: 103,148
Id: jPKTo1iGQiE
Length: 26min 4sec (1564 seconds)
Published: Tue Dec 06 2022