How to build a YouTube-like website with AWS?

Captions
Hey everyone, welcome back. In this video I'll be sharing how you can create a highly scalable video delivery infrastructure, and pretty much show you what we do at codedamn. If you've seen any codedamn course, you know we have videos embedded, along with interactive exercises. But if you look closely, the videos are not embedded from YouTube or any other third-party player; they are custom-optimized and custom-delivered. My motive in this video is to show not only how to deliver video but how to create the full pipeline, and that involves a lot of infrastructure architecture. A lot of you have been asking me about this for a very long time, so let's go ahead and see how we can build one. Also, if you're new here, make sure you like and subscribe to the channel; I'll be doing more videos and live streams like this, so hit the bell icon if you don't want to miss them. It helps feed the algorithm.

Okay, so video processing: where do we even begin? The first thing to realize is that whatever video you want to put in front of the user has to be optimized into a suitable format. The web currently does not support every video format, and raw footage is huge: this video, which I'm shooting in 4K at 60 fps, is about 4 GB for 10 minutes. That is definitely not what you're watching on YouTube; YouTube also processes the video into delivery formats, which is why it arrives so fast. Your internet connection probably can't afford 4 GB per 10 minutes, and even if it could, it wouldn't be worth it. So the first step is to take the user's video, say an MP4 (it could be any format), and upload it to the cloud. The very first thing you would want to do is
create a signed S3 URL. What is a signed S3 URL? AWS has enormous capacity for compute and storage, and S3 gives you a bucket that can store more data than you can realistically fill (I won't say unlimited, because nothing is unlimited). You ask S3: give me an address, a signed URL, and whatever I upload to that address goes straight into the bucket. The compute and the infrastructure are S3's, so you don't have to worry about the file size; it could be one megabyte or one terabyte. If you routed the upload through your own server instead, you would have to make sure your server could handle a file of that size, or stream it through to S3 carefully. So instead you create a signed URL from S3 and upload the file directly to the S3 bucket.

This is where the fun begins, because the file is now in the cloud. I'm going to stick with S3 and the rest of the AWS stack because that's how we've deployed our pipeline at codedamn, but you can choose pretty much any cloud provider; I believe most of them offer similar functionality. Now, S3 allows you to link a bucket to SQS, AWS's queueing service (it stands for Simple Queue Service). Suppose I have a bucket storing file number one, file number two, and so on. On the next upload, whatever it is (a video file or anything else), the bucket can add a message to SQS. SQS is just a simple queue: you push messages at the back and read them from the front, exactly how the queue data structure works.
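As a concrete sketch of that signed-URL step: with boto3, AWS's Python SDK, it could look roughly like this. The bucket name and key scheme are made up for illustration, and real AWS credentials must be configured for the signing call to succeed.

```python
def object_key(user_id: str, filename: str) -> str:
    # Namespacing uploads per user avoids collisions between concurrent uploads.
    return f"uploads/{user_id}/{filename}"

def make_upload_url(bucket: str, key: str, expires_in: int = 3600) -> str:
    # Lazy import: the module stays importable even without the AWS SDK installed.
    import boto3  # pip install boto3; needs credentials (env vars, ~/.aws, or an instance role)
    s3 = boto3.client("s3")
    # The browser then HTTP-PUTs the file straight to this URL, so the video
    # bytes never pass through our own servers.
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )

# Example: make_upload_url("raw-uploads-bucket", object_key("user-42", "lecture.mp4"))
```

The URL expires after `expires_in` seconds, so a leaked link is only briefly useful.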
Data structures are super important, by the way, and not only for competitive programming; I've made a video on competitive coding versus software engineering, so maybe you want to watch that as well. The main reason to use SQS over implementing your own queue is that SQS is highly scalable: it can absorb tons of messages every single second, which your hand-rolled queue would probably struggle with. So, every time an item is added to the S3 bucket, you want a message pushed to SQS. This is built-in behavior in AWS: you configure the bucket so that whenever a file is uploaded, a message gets added to the queue. Say I upload three videos together; the queue now holds three messages, and that's where S3's job ends.

AWS also provides a service called CloudWatch, and in CloudWatch you can set an alarm that performs an action based on the queue size. We need two alarms: one for when the queue size is greater than zero, and one for when it equals zero; I'll explain why we need both. The first alarm is responsible for launching an EC2 machine: whenever the queue size is greater than zero, we launch an EC2 server. It could be a powerful machine or a small one, but it has two jobs, and the first is to read a message from the queue.
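For the alarm itself, here is a sketch of what the CloudWatch side could look like; the alarm name, queue name, and scaling-policy ARN are invented. SQS publishes its backlog to CloudWatch as the `ApproximateNumberOfMessagesVisible` metric, which is the "queue size" I keep referring to.

```python
def queue_depth_alarm(queue_name: str, alarm_name: str, scale_out_policy_arn: str) -> dict:
    # SQS reports its backlog under the AWS/SQS namespace; this alarm fires
    # whenever the number of visible (unread) messages rises above zero.
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 60,                 # evaluated once per minute
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",  # i.e. queue size > 0
        "AlarmActions": [scale_out_policy_arn],        # e.g. an ASG scale-out policy
    }

def create_alarm(params: dict) -> None:
    import boto3  # lazy import: the actual call needs AWS credentials
    boto3.client("cloudwatch").put_metric_alarm(**params)
```

A mirror-image alarm with `LessThanOrEqualToThreshold` covers the queue-size-equals-zero case discussed later.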
Say we uploaded three files, so the queue holds messages three, two, one. The EC2 server reads the first message and starts processing that video file. The message contains the path to the object in the S3 bucket, so the EC2 machine knows exactly where to go to download that particular file. Then it processes it: we use FFmpeg, so the instance launches FFmpeg and converts the upload into three or four formats, for example 1080p, 720p, 360p. If you want adaptive streaming, HLS, and things like that, all of it can happen right here inside the EC2 instance, because it has access to the original MP4 via S3.

And this is highly scalable: there is no component here that breaks down or collapses under load, assuming AWS itself isn't down. Of course, this design does not have redundancy built in; if SQS is down you might lose a message, but we're not discussing that here. The question is: if AWS is up and we don't need redundant systems, what is the minimum setup? So SQS delivers the message to EC2, and assuming you chose a sufficiently large instance size that it doesn't bottleneck (you could also just reject files over, say, 10 or 20 GB), you process the file with a tool like FFmpeg and create the target resolutions. Next, you upload the results back to a production S3 bucket. Because you've processed a single video, you upload its renditions there and update any database entries you need; in our case, we link the output to a particular video in a course.
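Putting the worker's two jobs together, here is a simplified sketch. The S3 event shape is what S3 actually puts in the notification; the rendition list, file names, and production bucket name are my own assumptions, and a real worker would handle partial failures more carefully.

```python
import json

RENDITIONS = [1080, 720, 480, 360]  # target heights, matching the quality switcher

def parse_s3_event(body: str) -> tuple[str, str]:
    # An S3 upload notification carries the bucket and object key of the new file.
    # (Real notifications URL-encode the key; unquote it in production use.)
    record = json.loads(body)["Records"][0]["s3"]
    return record["bucket"]["name"], record["object"]["key"]

def ffmpeg_cmd(src: str, height: int, dst: str) -> list[str]:
    # scale=-2:H keeps the aspect ratio and an even width, which H.264 requires.
    return ["ffmpeg", "-y", "-i", src,
            "-vf", f"scale=-2:{height}",
            "-c:v", "libx264", "-c:a", "aac", dst]

def process_one(queue_url: str) -> None:
    import subprocess
    import boto3  # lazy imports: the real loop needs AWS credentials and ffmpeg installed
    sqs, s3 = boto3.client("sqs"), boto3.client("s3")
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        bucket, key = parse_s3_event(msg["Body"])
        s3.download_file(bucket, key, "input.mp4")
        for h in RENDITIONS:
            subprocess.run(ffmpeg_cmd("input.mp4", h, f"{h}p.mp4"), check=True)
            s3.upload_file(f"{h}p.mp4", "production-bucket", f"{key}/{h}p.mp4")
        # Delete only after everything succeeded; otherwise the message becomes
        # visible again after the visibility timeout and is retried.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```

Deleting the message last is what gives the retry-on-failure behavior described below.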
Now your processed video files live in the production S3 bucket, and that bucket can sit behind CloudFront, Amazon's CDN service. All the traffic for a video goes through CloudFront, where you can create signed URLs and cache the content at the edge for faster delivery, while CloudFront fetches it securely from S3. That is pretty much all it takes.

The important part is the lifecycle; here is how we've architected it. The alarm is evaluated periodically, say once a minute or once every five minutes, and each time the queue size is greater than zero, one EC2 instance gets launched. That instance takes one message from the queue and tries to process it. If it succeeds, the message is permanently removed. If it fails, the message goes back to the end of the queue and is retried after some time. If it keeps failing, there's another concept, the dead-letter queue (DLQ), which collects all the messages that repeatedly fail; if somebody uploaded a text file, for example, it will of course fail multiple times, because FFmpeg cannot process it. Meanwhile a second EC2 instance processes the second file, a third instance the third file, and so on. So essentially we have distributed the compute across multiple EC2 instances, and offloaded (not exactly distributed) the storage to S3, and S3 is solid: if I remember right, S3 is designed for eleven nines (99.999999999%) of durability, which is about as strong a guarantee as you'll get that your files stay available.
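The dead-letter behavior above isn't something the worker implements; it's an attribute of the source queue, called a redrive policy. A sketch of attaching one, with invented queue names and ARN: once a message has been received (and not deleted) more than `maxReceiveCount` times, SQS moves it to the DLQ automatically.

```python
import json

def redrive_policy(dlq_arn: str, max_receives: int = 5) -> str:
    # After max_receives failed processing attempts, SQS parks the message in
    # the dead-letter queue instead of retrying forever (e.g. a .txt "video").
    return json.dumps({"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receives})

def attach_dlq(queue_url: str, dlq_arn: str, max_receives: int = 5) -> None:
    import boto3  # lazy import: the actual call needs AWS credentials
    boto3.client("sqs").set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"RedrivePolicy": redrive_policy(dlq_arn, max_receives)},
    )
```

You can then inspect the DLQ manually, or alert on it, to find the uploads that can never be transcoded.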
Similarly, on the compute side, a single EC2 instance handles a single file; it can take five or ten minutes, finish, and deploy the output to S3, and that's fine. We use an Auto Scaling group (ASG) to create these multiple EC2 instances: the CloudWatch alarm is bound to the ASG, which increases the number of instances, and every new instance is launched from a template (a snapshot or launch template, whatever you want to call it) that already knows what it needs to do, thanks to the ASG. Once it's up, it reads a message from SQS, processes the file, and deploys the result to the production S3 bucket, which is then ready to serve end users. If you open any codedamn course page, you'll see an option to switch between 360p, 480p, 720p, and 1080p; that's because we transcode into those four formats and deliver them via S3.

So we've discussed queue size greater than zero; what about queue size equal to zero? The queue is zero if there is truly no video to process, but it's also zero while, say, two EC2 instances are busy processing two videos, because no unread message remains in the queue. When we hit this condition, we want to attempt to delete the instances: the ASG owns the pool of EC2 instances, and it should try to downscale, meaning destroy the instances that are present. But there is one more thing we can do here: an EC2 instance can be protected from deletion with a scale-in protection lock, so that the Auto Scaling group will not be able to destroy
it. We set the lock just before processing starts, and once the processing phase is over, we unlock the instance. The moment the video is processed and done, we unlock; the ASG sees there are no queue items, decides to downscale, sees the instance is unlocked, and destroys it. You get a scalable architecture that upscales and downscales depending on what your needs are at the time.

So that was a very high-level overview of the architecture and how it works. Obviously it's more complicated in code, in how you place all the architectural bits and pieces, but this is fundamentally what powers codedamn's video processing even today, and probably a lot of other sites as well, because this approach scales, and if you want to make changes you mostly just touch the EC2 scaling part. The best part is that this whole process is very, very cheap; let me quickly give you an idea why. Uploading to S3 costs zero dollars, because ingress bandwidth to AWS is free; AWS does not charge you for uploading data in. SQS has some fees, but they're relatively low. The CloudWatch alarm also has a small fee. And the EC2 instances you create need not be reserved instances; they can be spot instances. A spot instance is spare compute capacity that AWS sells at roughly 50 to 80 percent below the normal price: if an instance costs, say, 10 cents an hour, you can very well get it for two or three cents an hour. The only drawback is that AWS can terminate a spot instance at any time, with only about two minutes' warning.
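The deletion lock from a moment ago is EC2 Auto Scaling's scale-in protection, set per instance through the API. A sketch of the lock/unlock wrapper; the ASG name is invented, and discovering the instance's own ID (normally via the metadata endpoint) is left out.

```python
def set_protection(asg_name: str, instance_id: str, protected: bool) -> None:
    import boto3  # lazy import: the actual call needs AWS credentials
    boto3.client("autoscaling").set_instance_protection(
        AutoScalingGroupName=asg_name,
        InstanceIds=[instance_id],
        ProtectedFromScaleIn=protected,
    )

def run_protected(work, protect) -> None:
    # protect(True) before transcoding, protect(False) after, even on failure,
    # so a crashed job never leaves a locked instance the ASG can't reclaim.
    protect(True)
    try:
        work()
    finally:
        protect(False)

# On the worker:
# run_protected(transcode_job, lambda p: set_protection("video-workers", my_instance_id, p))
```

The `finally` block is the important design choice: the unlock must happen on every code path, or downscaling stalls forever.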
Spot interruption doesn't really matter for this workload: say the spot instance gets terminated, then the message it took is restored to the queue (its visibility timeout expires) and gets picked up by another EC2 instance later, so that's completely fine. You get big machines doing the processing at a relatively small cost. Downloading the files from S3 to EC2 is also free, because it stays inside AWS infrastructure, at least when the instances and the bucket are in the same region. Finally, uploading from EC2 to the production S3 bucket is free too, because, like I said, ingress to S3 costs nothing. The only price you realistically pay, the one you should actually plan for, comes when people start hitting your videos: the data transfer out. That's where Amazon makes its real money, because egress bandwidth is priced pretty high, but the rest of the stack is very affordable. I ran some numbers against AWS's all-in-one managed media offering, and while I don't remember the exact figure, building this pipeline yourself came out well over 10, maybe even 20, times cheaper. It does increase the difficulty of your architecture, but if you're like I was back in the day, with some free time and a wish to experiment, this would make a great system architecture project on your resume: a full infrastructure designed from scratch. I hope this video was beneficial to you; if it was, make sure you leave a like and subscribe to the channel. That's all for this one, and I'll see you in the next video really soon.
Info
Channel: Mehul - Codedamn
Views: 18,725
Keywords: web development, codedamn, mehul mohan, full stack development, full stack web development, Highly Scalable Video Processing Pipeline Explained!, How to build YouTube like website with AWS, How to Build YouTube, Build a YouTube Clone, Build YouTube from Scratch, mern stack project, amazon web services, aws, How video processing works, how to use aws, REACT JS for Beginners, How To Make Your Own Video Streaming Website Like YouTube, How to Build a Simple Web App Using AWS
Id: 1ecqqWRvgrU
Length: 16min 26sec (986 seconds)
Published: Sun Aug 22 2021