I made a *serverless* YouTube Bot in Python!

Captions
Hey everybody, this is Christian, and today will be the start of a brand new project that I'm working on right now, where I want to learn more about some of the most exciting and cutting-edge technologies in IT. In this video we will explore the world of serverless computing and how it's changing the way we build and deploy software.

I've been wanting to learn and study cloud computing and all the related technologies for a pretty long time now, but as someone who doesn't work as a software developer or a DevOps engineer, I've always found these technologies challenging to understand, especially because it's hard to come up with great ideas for projects that are interesting to do and not just a basic hello world function. But I finally discovered a cool and exciting project that has a practical use case for me, and that's what I want to share with you today. I've used a serverless function and the YouTube APIs to create my own bot, written in Python, that checks whenever I publish a new video on my channel and then sends out a message to a Discord channel, just to inform everybody: hey, there's something new you should watch.

Now, I know what you might be thinking: there are already free tools out there that kind of do the same thing. But that's not the point. The goal of this project was to learn something about serverless computing and to understand how these technologies work and how they can be used to create modern applications and microservices. This is, by the way, a really hot topic in IT right now, so whether you are a software developer, a DevOps engineer, or just someone who is curious about new and exciting IT stuff, just like me, I hope you will join me on this journey. I'll be walking you through each step of the process and explain everything that I've done in detail, so it will probably be a longer video. But anyway, let's just get started.

Okay, so before I start showing you the bot and go through the code and so on, I first want to talk a bit
about serverless computing, because that's actually the whole reason why I started this project. I wanted to learn what serverless computing actually is and how it's different from traditional applications that you would deploy on servers. Let's start with that. When we develop and deploy applications in IT, it is usually necessary to have a server somewhere. This could be a Linux server, a Windows server, or a container orchestrator like Kubernetes. Regardless of the setup, the application needs to run somewhere on an infrastructure that you are responsible for installing, configuring, and maintaining. Typically that involves things like installing an operating system, installing packages, and running updates. Even if you are using cloud providers with managed services or container orchestration, which takes away some of the burden from you, there is still a significant amount of work required to ensure everything is operational and scaled as needed, depending on how many requests your application needs to process. That can be pretty costly and time consuming, particularly in the cloud.

What serverless computing aims to do is solve this problem for you. If you want to develop and deploy an application or a service, you don't need to think about the infrastructure anymore. You can just write your code, go to a serverless cloud provider, and upload it, and whatever the application is doing in the background is processed, just magically, by the cloud provider. No matter if it needs to handle millions of web requests one day and just a few on another, it will always work, and the best part is that you only pay for what you really use. On most cloud providers you can even run millions of serverless requests per month completely for free. That's what got me most excited about it, because that sounds just perfect for a simple Python application like my YouTube bot. It should just process a couple of API requests, and I
don't need to set up a server for it. I don't even need to pay for it at all. But also in the IT industry, serverless computing is obviously a hot topic, and it's slowly taking over a certain part of application development, because it just makes sense. It makes server costs more predictable and reduces operational costs for infrastructure in general. It is a really nice architecture to run so-called microservices or APIs, and that fits perfectly into a modern and agile development process, at least from my point of view.

If you want to get started with serverless computing, most public cloud providers offer some kind of serverless feature these days. For my YouTube bot I wanted something that is easy to get started with and that comes with a great feature set as well, so I reached out to my favorite cloud partner, DigitalOcean. And by the way, thanks, DigitalOcean, for sponsoring this video. DigitalOcean has a new feature called Functions. That's what they call their serverless offering, which is now out of beta and runs very stable and smooth. If you want to test and try out DigitalOcean Functions, you can run many serverless requests per month completely for free. They also have some other great services, like Droplets, which are basically virtual machines, managed Kubernetes clusters, managed app deployments, and S3-compatible storage called Spaces. We are, by the way, using this technology in my project as well, with my code that you can find in the description. You can start off by receiving two hundred dollars of free credits for DigitalOcean, so just give it a try and find out how it works.

Once you've signed up on DigitalOcean and logged into your account, you can start creating new serverless projects in the Functions section. Let's just find out how easy it is. I will give you a short demonstration. I've just created a new namespace that is running in a Frankfurt data center. That's like a collection for all your serverless projects. Of
course you can choose any other data center location that is nearby or where you would like to run your serverless functions. In here you can also see the current pricing for the serverless functions in that namespace. In my example it costs 0.000185 dollars per GB-second. Now, that is something I probably should explain, because you have 25 GB-hours free per month in your account, but how should you know what your project actually costs and whether it exceeds the free quota or not?

On the official website of DigitalOcean there is a great FAQ section for the serverless pricing. Here you can see that GB-seconds indicates the amount of memory allocated to a task per second, so it depends on two factors: how much memory does your function need, and how long does it run? That's what this GB-second value describes. If that's a bit too abstract for you, there's also an example calculation to quickly find out how much you actually get for free. These 25 GB-hours, or 90,000 GB-seconds, of free functions quota you need to divide by the amount of memory in GB and by the function's runtime in seconds. That means if your function runs for 100 milliseconds and uses 256 megabytes of memory, you can run it 3.6 million times every single month without paying for it. That is huge.

We will also take a look later at my function, how much memory it needs, and how long it runs. Short teaser: it is a bit more than 100 milliseconds, and I also need a bit more memory, but not really much more. When I consider that my YouTube bot would need to run, let's say, every five minutes, and that it would run for about a second, I'm far away from exceeding this quota. That obviously will change when the bot gets bigger or when it needs to run more than one function, but that already goes a bit too far. Just keep in mind that you will get plenty of free resources for your serverless functions, and once your project grows you can calculate, based on the metrics you see in the
dashboard, how many resources you actually need to allocate and what that will cost you at the end of the month. For most people and small projects, don't think about it; for testing, that probably won't be an issue at all.

So let's go back to the project and finish creating this namespace. Once we've done that, it opens up the namespace environment. In here we can just create a new function, and before we can start adding our code, we need to tell the function which programming language we want. The most popular one will probably be JavaScript with Node.js, but there are also some other languages supported, such as Go, PHP, and Python. As you might know, I have no idea how these other programming languages work. I am most familiar with Python, even though I'm not really an expert, but you will see how easy it is to develop your own functions in whatever your preferred language is.

Let's just start with a short example, something like a hello world project. I also want to keep the web function checkbox enabled for now. This is needed when you want to execute your function with a web call, so it will get a public URL from DigitalOcean's servers that you can call with a tool like curl or from any other application to communicate with your function. Now let's create it first. You can see this opens up a kind of code editor in your web browser, and it's actually pretty smooth. There is a short example function that receives your name from an argument that is submitted to the function, and then it just prints out a greeting with a short text and your name. That's a nice example. You can also directly run this function and see what it does. If we do that, you can see it takes a short time to initialize and run our code, and then we get the result: hello stranger. Because we didn't provide any argument to this function call, it just greets us with stranger. You can also inspect some of the logs and the output of your call.
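
For reference, the sample function described here can be sketched roughly as follows. This is a minimal sketch, assuming the `main(args)` entry point and the `body` key in the returned dictionary that DigitalOcean Functions uses for Python; the exact template shown in the video may differ in wording.

```python
def main(args):
    # "name" is an optional argument passed in with the call;
    # without it the function falls back to "stranger".
    name = args.get("name", "stranger")
    greeting = "Hello " + name + "!"
    print(greeting)  # shows up in the function's logs
    # The returned dictionary becomes the result of the call.
    return {"body": greeting}
```

Calling `main({})` locally mimics running the function without arguments, and `main({"name": "Christian"})` mimics passing a name.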
So one thing that I wondered about was how long such a simple function actually needs to run the first time, and this is another important concept of serverless computing that I need to explain. Even though it is called serverless, it still needs to run somewhere on an infrastructure, of course. There's always a server in the background; you just don't have access to it. DigitalOcean needs to initialize a new environment for this code to run, so I guess it will probably deploy a small container somewhere on their cluster that loads the Python environment and executes the code. You can see that this task took about half a second, even for such a simple function. The good thing is that this only happens when you execute the function the first time. This is called a cold start in serverless computing, and it basically means that the environment to run this function doesn't exist yet and the cloud provider needs to initialize it first. Once this is done and we run the function a second time, cold start is set to false, and this function call was now much, much faster.

When we now scroll up and change anything in the function, for example, let's change this greeting variable and add a second line to it, or maybe do something a bit more complex, like a new function that needs to be called in Python and that adds a bit more output, you can see that when we execute this function again, it needs another cold start. Once we change the code, this will always be the case. It could also happen when you suddenly get a lot of requests in a short time and the cloud provider needs to spin up new instances to keep up with the increased load; that might need cold starts too.

Now, there are certain scenarios where you need to keep that in mind. If you are writing a function that is not changed often or that has a consistent load, it probably won't be an issue. But if you're writing applications that need to process
transactions in real time, like on a game server or on an online marketplace, it is important to develop your application in a way that the cold start doesn't add too much latency. For my small YouTube bot, or any other project like that, we can ignore it, because when the YouTube bot pushes out a new notification for a video and that happens a few seconds later, nobody cares.

So this is basically how serverless computing works. It's actually pretty simple and nice: you can develop your code in a web UI, run it, and inspect logs and metrics, all in a clean web interface. But of course, when you want to develop a bigger app, you don't want to do it in a web UI. At least I don't want to. I just need my VS Code with all the extensions and tools that I like to work with, and that's why DigitalOcean also provides a CLI client to interact with their cloud infrastructure. It is executed with the doctl command, and there you can work with the serverless module, which you first need to install, by the way. So we need to execute doctl serverless install. This will download and install the serverless packages to your local workstation. Then we also need to connect to the namespace that we're currently working in with the command doctl serverless connect, followed by the name of your namespace.

Okay, cool. We can now start developing our Python project. In my example I want to start developing my YouTube bot. I'll go into my videos project folder; I will upload all of the project files to my Git repo later. Of course I will leave you a link to this whole project in the video description, so you can use it as an example for your own YouTube or Discord bot. To initialize a new project for DigitalOcean Functions, we can use the serverless init command in the doctl tool. That will create a new template and some sample code for us, so we don't need to write it all ourselves. I'm just calling it youtubeify demo2, and
also don't forget to change the programming language to Python, because it's JavaScript by default. Then we can open this directory, and you can see the command has created three different files in here. The first one is the project.yaml file, which contains all the information about this project: the names of the packages and the functions, and which programming languages they are in. I'm going to clean up this file a little, because we don't need all of this sample data. What we have now is a base template that contains a package, where we can bundle multiple functions together. I'm going to change its name to youtubeify; that's a good name for this package. There's also a function called hello. I'm going to change that name to get latest video, so this will be the main function that gets the latest video from my YouTube channel and then pushes it to Discord. That's roughly the plan. I also want to disable the web function so that my function doesn't get a public URL from DigitalOcean, because otherwise anyone with this URL could send a simple web request to it and execute it. That's not what I want. I will show you later how the function will be called, by the way.

So this is our whole project template. Of course we need to create a function in the packages subfolder. First I need to create another folder for the package, called youtubeify. It always needs to have the same name that you defined in your project.yaml. In there I've created a new file that I've called get latestvideo.py, and here I can start writing my code. First we will start with a very simple test, so that I can make sure, before we run the actual code, that the function deployment works and we can invoke it. I've added a simple hello world output in here. To upload this whole thing to the cloud namespace, we need to use the doctl serverless module again, add the command deploy, and then the name of the project
directory where your project.yaml template and your files are located. This might take a few seconds until everything is deployed, but once it is ready we can invoke the function for a quick test with the same tool. Just change the command to serverless function invoke, followed by the name of your package, a slash, and the name of your function. As you can see, that is the result of our call. To validate that this was really processed in the cloud and not somewhere locally, we can open the functions namespace in the web UI and inspect the logs, where you can see our function call and get more information about latency, log files, and so on.

So these were the basics of serverless computing and how to develop a function and upload it. This is where most tutorials or videos would probably just stop and let you make your own mistakes. But I thought I'm going to show you this small YouTube bot project that I've started, because it's a great example for showing you things like how to handle API calls, how to store secret tokens, and how to schedule functions. When you want to do something useful with your serverless functions, utilizing public APIs is an incredible way to extend the functionality of your app. For example, if you want to do something with YouTube, Google has published the Data API for YouTube, where you can search for videos, channels, posts, and comments. And there are of course thousands, maybe millions, of public APIs out there that you can use to write a simple tool to automate something. There's so much you can do with it. In the case of the YouTube Data API, Google has created great documentation, where you can find out more about the parameters, values, queries you can do, and so on.

Just to summarize what I've done to access the YouTube API: I have added a new API token in my Google Cloud interface, selected the YouTube Data API, copied this API key, and inserted it into my code to authenticate the function and allow it to make
calls to the YouTube API. Of course I have not done this in clear text, because it's never a good idea to put tokens or authentication secrets into your code. There is another feature for storing such variables or secrets in serverless functions, and that's using environment variables. You can define environment variables in your project template: just start with environment, then pick a name for the variable that you want to use, and then put your value in here, just like with the YouTube API token.

What I also need is the channel ID of my YouTube channel, because the bot obviously needs to know which channel it should get the video from. Otherwise it would just pick something random on YouTube. You can find the channel ID for any existing YouTube channel when you open it on YouTube. For example, this is my YouTube channel, and the channel ID is the last part of this URL when you are on the channel's home page. I've copied this value as well and added it as a separate environment variable to my project template.

I can then access these environment variables in the Python code, and this is done by importing the os library. For the YouTube API I also needed to add the Google API client, by the way. Then I added two variables in my code where I'm going to store the values from these environment variables. That makes my code somewhat flexible: if I'd like to have the bot running for a second channel, I could clone this whole project and change the channel ID environment variable without making any changes to the actual code.

Okay, so that was how to use environment variables. Now let's start writing the bot logic. First I've added a new function that gets the latest video as an object from a specific channel ID. As you can see, I've used GitHub Copilot here to help me out a little with writing this stuff. The first statement will initialize the YouTube Data API with our API key
stored in the environment variable, and then I'm going to add a new search request. As parameters I'm going to add the channel ID, order it by date, and limit the output to one result maximum. This request will get us the latest video that was published on a channel, and from this video object I'm storing the video ID in a new variable. This is the unique identifier for that video. Now, I also wanted to have the ability to do something else with the details of this video, like getting the title, getting the thumbnail, querying other information, and so on. So I've added another request to get the video's metadata, with the video ID as a parameter. This will return the video with all the metadata as an object, and that is what I have added as the return statement of this function. So when I call it with the channel ID from the environment variables, this will get the latest published video from my YouTube channel, including all the metadata.

Before I start doing anything else in here, let's just return the video metadata in the output and do a quick test to see if everything works. Let's go back to my terminal and call the deploy command again. In the web UI you can see the new code has just been uploaded to the function. We can now invoke it again and test it. However, as you can see, it returned an error the first time, and after inspecting what's happening in the log files you can see that Python couldn't find the module named googleapiclient. This is because I've used a third-party library in Python that is not present in the serverless functions environment. There's just a small collection of Python libraries that you have access to, and if you want to add other modules or third-party libraries to your functions, you will need to build the function first, with all the required libraries, in the cloud. Only then is Python able to access them. So I've searched around a bit for a solution, and on DigitalOcean's GitHub account there were
some good examples of how to solve this problem. What you need to do is create a folder with your function's name instead of a single file, and then name your main function file just main.py. Then you can add a requirements.txt file, which you might be familiar with from Python. That's where you can define all the third-party libraries that you want to import. Don't forget to check which version you have currently installed on your workstation, and then add the same version to the requirements file as well. Then we also need to create a build script. I'm just using the exact structure that I found in the examples, which is basically a simple shell script that creates a virtual environment for Python and then uses the pip installer to download and install all the libraries that you have defined in requirements.txt. If that was a bit too fast for you, again, you will find all of this in the videos repo on my personal GitHub. Just go there, check it out, and don't forget to follow me, by the way.

Once you have done all of that, you can call the deploy command again. You just need to add a --remote-build parameter at the end. This tells DigitalOcean not to build it locally, but instead to upload everything to the cloud and run your build script in the cloud environment where the function is running. If you get an error here, that might be because you forgot to add execution permissions to the script. With the chmod +x command you allow anyone to run the script file, and then we can execute the deploy command again, and now it will work. It might take some time until everything is ready, but as you can see, that was now successful.

So let's check if we can invoke the function now. But as you can see, I got another error, which shows up in the function's log in the web UI. It's always trial and error in IT. That new error tells us that the action exceeded its time limit, and that might be the case when your
function needs more resources than you allocated. In DigitalOcean, all created functions have a default timeout of 3 seconds and 256 megabytes of memory allocated. This is like a protection against infinite loops and so on, but we can also raise it a little. Keep in mind that increasing the memory also means increasing the price per call, just like I explained at the beginning. But sometimes you just need a few more resources, and in my example I found out that the function call works great with 512 megabytes of memory and the same timeout. So I've changed this setting in the project template as well: I've added the timeout of 3000 milliseconds and increased the memory to 512. Of course, once you have made any changes like that, you must redeploy your function. Once that is finished, you can see I was able to invoke the function successfully, and now it runs perfectly. We now have the JSON response from the API request to YouTube, and you can see that this is the latest video that I've published.

By the way, if you are really still watching this video right now, why not give it a like and subscribe to my channel, if you haven't already done it, because that always helps me to make more of these free tutorials for you, and your support is highly appreciated.

Okay, back to the topic. We have already come pretty far: we are now able to query the YouTube API and get the latest video from a specific channel. Now it's time to start coding the logic to push a new video to Discord. What we need to do now is find out where we can store this video object, because when you think about it, once you execute the function and get the latest video from a channel, you can't push it directly to Discord. Then it would push the message every time you execute the function, and of course we only want to push a new message to the channel when the video has changed, so when the channel has uploaded a new video. That's why I've come up with the
following logic. Once our function has got the latest video from a YouTube channel, where we collect the ID, the title, the thumbnail, and all the other metadata, we need to find out whether this video was already published before. We can achieve that by storing the latest video ID from our last query and comparing it to the current one to find out whether this ID has changed, because only then, when there is a new video, do we want to push a message. Now, that might also happen when the channel deletes a video on YouTube, because then the ID also changes and you might get another push for the older video again. So there will be some high-level bugs in here, but unfortunately I haven't had the time yet to find a better logic and cover some of these edge cases. That will be a task for the coming weeks.

Anyway, to store the video ID from the last query, you could use any type of storage. You could use a database, or you could use cloud storage. What you can't do is store it somewhere in the code of the function, because the environment where the functions are running is always immutable. So we really need to store it somewhere outside the function logic, and what I've decided is the simplest solution for me is to use the Spaces feature of DigitalOcean. That's an object-based storage which is fully compatible with S3, by the way, and you can access it in Python very easily. Here we just store the latest video ID in a text file, and when we find out that this ID has been updated, which means the video we are currently querying hasn't been published before, we can send out a new message to Discord. After that we need to update the latest video ID in the storage file, of course. This function we can then invoke every few minutes with a scheduler that runs somewhere in the DigitalOcean infrastructure. There's a very simple way to do that, by the way; I will show you that in a minute. So this is the basic logic. Let's now start coding this. First I'm
going to create a new S3 bucket via DigitalOcean Spaces. I'm giving it the same name as my YouTube bot, and here I also need to add a Spaces access key to authenticate to the storage. Again, I will store the credentials in environment variables to keep it flexible, and then we just need to get the Spaces access key and secret key into variables in Python. We also need to import the boto3 and botocore libraries, which can be used to communicate with any S3-compatible storage. Then I'm going to add another function to get the latest video ID from it. This function will start by connecting to the Frankfurt data center of DigitalOcean. Of course I also need to add the environment variables here to access it, and then we will try to get an object from this youtubeify space. I just need to change the name of the space accordingly, because GitHub Copilot doesn't know it yet, and I will name the text file latestvideoid.txt, just to make it very simple. I also needed to add an exception block in here. This will catch an error, more specifically the NoSuchKey error, which occurs when the file doesn't exist in the S3 bucket. When we execute this function the first time, there is no file, of course, so it will throw this error, and we need to return the value None to the function caller. What that basically means is that the function will get the latest video ID from the S3 storage, and if it doesn't exist, it will return None.

Now let's minimize this function and use it in our main one. After getting the latest video from YouTube, we need to get the latest video ID from S3 and then do a simple comparison. If the latest video ID is not the same as the latest video ID from S3, we can push an update to Discord and also update the latest video ID in S3. I hope this logic somehow makes sense to you. I also needed a function to store the latest video ID in S3. Luckily, GitHub Copilot could write the entire function completely for
me, and it seems correct at first look. Now let's finish the main function and use this to store the latest video ID, so that when we invoke it we know whether the video was already published or not. Then we are almost finished. We just need one more thing: a function that will push the video with a message to Discord. That's also pretty straightforward. I've added a new webhook to my test Discord, which I've called youtubeify demo2. This should be posted in a testing channel, of course. Now we just need to copy this webhook URL, and let me quickly demonstrate what will happen when you make a web request to such a webhook. Actually, let's try to use the AI command search in the Warp terminal; it should come up with a good solution here as well. I've asked it to make a web request to send a text message to a Discord webhook. Wow, AI is so amazing, guys. Let's just change the webhook URL in here, and as you can see, that's our message in Discord. How awesome is that?

Okay, so that is everything that we need. Let's also add the requests library to the Python function, so that we can send such a simple web request. I'm going to add the webhook URL to the environment variables as well and store it in Python, and then we just need to create another web request to send a new message. Let's pick this one here: new video, followed by the youtube.com URL. The unique video URL can be generated by using the watch location, and the v parameter is where you put the video's ID. When you send this message to Discord, so just a URL, Discord should recognize it and add the thumbnail to it automatically, so we don't even need to query the thumbnail image and send it with our message. We can just send the plain URL, and then we are basically done with our function logic.

Okay, great. That was a lot of work. I know the code might not be very sophisticated; again, I'm not a software developer, but it should work as expected. So let's now deploy it once more to
DigitalOcean and then invoke the function. As you can see, there is a new video output, and on Discord we also have our message: new video, with the URL and the thumbnail. Everything is working great. When we now invoke the function again, you can see there is no new video, because we have stored the last video ID in the S3 bucket. Here is our last video ID text file that contains the ID, and if we delete that file manually, we can trigger another message. We are basically simulating that the video ID has changed, and the next time the function is called we get a new message.

Now we're almost done. There's just one last thing that I quickly wanted to show you, because I don't want to invoke this function myself all the time; I want it to be fully automatic. When I go back to DigitalOcean, there is a new menu item in the function settings called triggers, and this can generate an automatic call to this function. Let's give it the name scheduler and then call it with a cron expression. For example, let's call it every five minutes; that should be enough. Here you could also add an additional payload, such as variables and so on, but I don't need that, so I will just leave it as it is. Now I want to do one more test: I want to delete my S3 file in the bucket again, and then our bot should automatically send out a new message at the next scheduler call, which it has just done. So as you can see, all of this is working just fine.

Well, I'm done for today. That was really incredible, and I'm really excited about serverless computing. This is just a very simple example. Imagine: you can write any modern application with serverless computing, such as APIs and microservices in IT. You can use your own functions to do automation tasks and even communicate from one function to another, call it via web request, and so on. This has so much potential for real business applications, without the need to set up a server yourself
or to run and maintain a Kubernetes cluster. It's all automatically scaled, and you can just write your code, deploy it, let DigitalOcean do all the magic for you, and pay only for what you use. It's absolutely insane, and I hope it was interesting for you. Again, you can use my code as an example, extend it if you want, change it if you like. Be sure I will continue working on it and add a couple more features, and some day I will deploy this to my own Discord community. Then I won't need any third-party bots to push updates for my videos anymore. So it's going to be an interesting and exciting project, and this is just the beginning.

That should be all for today. We've done a lot of work, and I'm exhausted right now, as you might see. Please tell me what you think about serverless computing; just put it in the comments. And as always, thanks everybody for watching. I will catch you in the next video. Take care, bye bye.
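
The change-detection logic described in the captions (compare the newest video ID with the one stored from the last run, notify Discord only when they differ, then store the new ID) can be sketched as below. This is a minimal sketch, not the code from the video: the names `build_message`, `check_for_new_video`, `load_last_id`, `save_last_id`, and `notify` are hypothetical, and the storage and Discord calls are passed in as plain callables so the example runs without boto3, requests, or any credentials. In the video those roles are played by a text file in a Spaces bucket and a Discord webhook.

```python
from typing import Callable, Optional


def build_message(video_id: str) -> str:
    """Build the text to post; only the plain watch URL is sent."""
    return f"New video: https://www.youtube.com/watch?v={video_id}"


def check_for_new_video(
    latest_id: str,
    load_last_id: Callable[[], Optional[str]],
    save_last_id: Callable[[str], None],
    notify: Callable[[str], None],
) -> bool:
    """Notify and remember the ID only when it differs from the stored one."""
    last_id = load_last_id()  # None on the very first run (nothing stored yet)
    if latest_id == last_id:
        return False  # same video as last time, nothing to announce
    notify(build_message(latest_id))
    save_last_id(latest_id)
    return True


# In-memory stand-ins for the stored text file and the Discord webhook.
store = {"id": None}
sent = []


def load() -> Optional[str]:
    return store["id"]


def save(video_id: str) -> None:
    store["id"] = video_id


first = check_for_new_video("abc123", load, save, sent.append)
second = check_for_new_video("abc123", load, save, sent.append)
print(first, second, sent)
```

Resetting `store["id"]` to `None` and calling the function again corresponds to deleting the text file in the bucket, which is how a repeated notification is triggered in the video. As noted in the captions, a deleted video on the channel would also change the latest ID, so this simple comparison can announce an older video again.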
Info
Channel: Christian Lempa
Views: 8,081
Id: D_MUphj5tCM
Length: 33min 23sec (2003 seconds)
Published: Wed Jan 25 2023