How to Deploy ML Solutions with FastAPI, Docker, & AWS

Video Statistics and Information

Captions
This is the fifth video in a larger series on full-stack data science. In the previous video, I walked through the development of a semantic search tool for my YouTube videos. Here I'm going to discuss how we can take this tool and deploy it into a production environment. I'll start with an overview of key concepts and then dive into the example code. And if you're new here, welcome. I'm Shaw; I make videos about data science and entrepreneurship, and if you enjoy this video, please consider subscribing; that's a great no-cost way you can support me in all the videos that I make. When we think of machine learning, we probably think of neural networks or other models that allow us to make predictions. Although these are a core part of the field, a machine learning model on its own isn't something that provides a whole lot of value. In virtually all situations, for a machine learning model to provide value, it needs to be deployed into the real world. I define this deployment process as taking a machine learning model and turning it into a machine learning solution. We start by developing the model, which consists of taking data, passing it into a machine learning algorithm, and obtaining a model from that training process. Deployment can look a lot of different ways: it could simply be making predictions available to programmers and other developers, it could be using the model to power a website or a mobile application, or it could be embedding the model into a larger business process or piece of software. The key point is that the model that comes out of the training algorithm and sits on your laptop doesn't provide a whole lot of value; however, that same model integrated into a website, into a piece of software, or made available to end users through an API is something that provides value. A natural question is: how can we deploy these solutions? While there are countless ways to do this, in this video I'm going to talk about a simple three-step strategy for
deployment that is popular among data scientists and machine learning engineers. The first step is to create an API; in other words, we create an interface for programs to communicate and interact with our model. We take our model and wrap it in this API, represented by this box here, and then people can send requests to the API and receive responses from it. In the case of a model, the request will be the model's inputs and the response will be its outputs. Two popular libraries for doing this in Python are Flask and FastAPI. The next step is to take the API and put it in a container. Here, "container" is a technical word referring to a Docker container, which is a lightweight wrapper around a piece of software that captures all its dependencies and makes it super portable, so you can easily run that piece of software across multiple machines. Finally, we deploy the solution, and since we put everything into a container, it's now super easy to run that container on someone else's computer, on some server you manage, or, most commonly, in the cloud. The big three cloud providers, of course, are AWS, Azure, and GCP. With this high-level overview, let's see what this looks like in code. I'm going to walk through how we can use FastAPI, Docker, and AWS to deploy the semantic search tool that I developed in the previous video. This video is going to be pretty hands-on, so I'm not going to talk much about FastAPI, Docker, or AWS from a conceptual point of view, but if those are things you're interested in, let me know in the comments and I'll make some follow-up videos specifically about those tools. Here's the plan: we're going to create a search API with FastAPI, then create a Docker image for that API, then push that image to Docker Hub, and finally use that Docker image to deploy a container on AWS's Elastic Container Service. Let's start with the first one: creating the search API with FastAPI,
which is a Python library that was super easy for me to learn; they have a great tutorial for those just getting started, and I walked through it to make this example, which probably took me about an hour, so it's easy to pick up, especially if you've been coding in Python for a while. Coming back to the code, the first thing we do is make a file called main.py and import some libraries. We import FastAPI to create the API, and the rest of the libraries are so we can implement our search function like we did in the previous video: we use Polars to import data about all my YouTube videos, the Sentence Transformers library to compute text embeddings, and scikit-learn to compute the distance between a user's query and all the videos on my channel. I also have another file called functions.py containing the return-search-results function, which I define here. I'm not going to go into the details because it's not critical for the deployment process, but essentially it takes in a user's query and spits out the top search results for that query. Back in the main script, the first thing I do is define the embedding model we're going to use from the Sentence Transformers library. By default, the library will download the model when it's run for the first time, but to avoid that, I save the model locally. What that looks like: here we have the main Python file we were just looking at, and then I have a folder called data; in it there's a folder containing all the files for this embedding model, along with a parquet file that has all the data about my YouTube videos. We can then load the model from file like this, and load the video index using this line of code, the same way we did in the previous video. And then we
can import the Manhattan distance from sklearn, which I did like this. Again, since I talked about this at length in the previous video, I'm not going to get into the details of how the search tool works here, but you can check out that video if you're interested. Everything we just did had nothing to do with the API; this was just the implementation of the search function. To create the API, we create an object called app, which is a FastAPI object, and then we can simply create these API operations. Here I'm strictly defining GET requests, which allow users to send requests to the API and receive back responses. The other most common one is a PUT request, which is often used to send data to an API and load it in the back end; for example, if we wanted to update the parquet file in some way, we could use a PUT request to do that. Here I define three operations, using this decorator syntax: we're saying we want to define a GET request at a given endpoint, and it's going to operate based on the Python function below it. It's common practice to have the root endpoint be a health check: it doesn't take any input parameters, but anytime someone calls this endpoint, they'll receive back the string response "health check". Similarly, I created another endpoint called info, which just gives some information about the API: it takes no inputs but returns the name, which I called yt-search, and a description, "Search API for Shaw Talebi's YouTube videos". This one isn't strictly necessary, but you can imagine that if you have multiple users using the API, or maybe multiple APIs, an info endpoint can be helpful. The one we care most about is the search endpoint. Here we're defining a search function that takes in a query from the GET request and hands it off to the search logic.
The return-search-result-indexes function defined in functions.py takes the query, along with the video index, the embedding model, and the distance metric, and then we use the output of this function to return the search results. There's a lot of fanciness here, and maybe it's not super easy to read, but what's happening is this: we use the select method to pick out the title and video ID columns of the data frame (the video index); we use the collect method because we didn't actually load the data frame into memory, since we used scan_parquet instead of read_parquet; once we load it in, we pick out the indexes from the search result; and finally we convert that data frame to a dictionary. That dictionary will have two fields, one corresponding to the titles and the other to the video IDs, with up to five search results for each field. That's the code to make the API. It was super easy, which is great for me as someone who's very comfortable with Python and knows very little about other programming languages, especially ones that have to do with web development. Now we can run this API on my local machine and interact with it. First, make sure we're in the right directory: we see we have this app folder, and going into app we see our main.py file. To run it, we can do fastapi dev main.py, and now it's running on port 8000. I had to make a couple of changes to the main file: I had to add app in front of functions, and remove app from these paths, because it was running from a different directory than a previous version of the code, but it should be working now. We'll see a little URL here, which we can copy, and I have a notebook that lets us test the API. The URL is already there, at port 8000 by default, and we want to talk to the search endpoint of the API that we created.
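Testing from the notebook amounts to building a URL for the search endpoint and decoding the JSON text that comes back; a small stdlib-only sketch (the sample response below is made up for illustration, shaped like the one described in the video):

```python
import json
from urllib.parse import urlencode

# Build the request URL the same way the test notebook does;
# fastapi dev serves on port 8000 by default
base_url = "http://localhost:8000"
params = urlencode({"query": "text embeddings simply explained"})
url = f"{base_url}/search?{params}"
print(url)

# The raw response body is JSON text; json.loads turns it into a dict.
# (Hypothetical response shown here for illustration.)
raw_response = '{"title": ["Text Embeddings Explained"], "video_id": ["abc123"]}'
results = json.loads(raw_response)
print(results["title"])     # titles of the top search results
print(results["video_id"])  # corresponding video IDs
```

In the real notebook, `raw_response` would come from an HTTP GET against the running server.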
We have a query, "text embeddings simply explained", and we can pass that into our API and see that we get a response. It took about one second, which is actually pretty long, but maybe if I run it again it'll be faster; yeah, that first one was just slow, because the second time it's 76 milliseconds, and we can see the search results here. Taking a step back, the response in its raw form is just text in the JSON format, which is basically a dictionary, and we can use the json library to convert that text into a proper Python dictionary and access its different fields: these are all the titles from the top five search results, and we can also look at the video IDs. Now we've confirmed the API is working locally. Coming back to the slides, the next thing we want to do is create a Docker image for the API. The steps to make a Docker image from a FastAPI app are available in their documentation, and it's a few simple steps: create an app directory, create an empty __init__.py file, and create our main.py file. We've actually already done this: if we go back, the app directory already exists, and we have the main.py file and the __init__.py file. Taking one step out of that directory, we see that this app folder sits inside another folder with a few other files. We have this requirements.txt file, shown here; this is just your typical requirements file that you might have for any kind of Python code, and you can see all the different libraries we used in main.py: fastapi, polars, sentence-transformers, scikit-learn, and numpy. We also have the Dockerfile, which is essentially the instructions for creating the Docker image. This consists of a few key steps. We start by importing a base image; there are hundreds of thousands of Docker images available on Docker Hub, and the one we're importing here is the official Python image, version 3.10. We can see it on Docker Hub: it's an official image, it's called python, and there are all these tags for different versions of the image; we want 3.10, which is going to be this one. The next thing we do is change the working directory. Imagine you just installed Linux on a machine: the working directory starts at the root, and we change it to a folder called code. Next we copy the requirements file into the Docker image, taking it from our local directory and putting it into the image's code directory, and once it's there, we install all the requirements with this line. Then I have a line that adds code/app to the Python path; this might not be necessary because we actually changed the main.py file, so I'm going to comment it out and see if it still works. Next we add the app directory to the image, moving it from our local machine to the code subdirectory on the Docker image. Finally, we define a command that will run automatically whenever the container is spun up. To build the Docker image, we run docker build and specify a tag to give it a name; I'll call it
yt-search-image-test. Oh, and I forgot: we've got to specify where the Dockerfile is, which is the current directory, so we add that, and now it's building the Docker image. The image took about a minute to build; the run times are shown here, and the longest step was installing all the Python libraries. If we go over to the Docker Desktop app, we see the image under the Images tab. I have a previous version of the image, but the one we just created is called yt-search-image-test. Now we can actually run this image. Let me clear the terminal: we do docker run, specify the container name (I'll put yt-search-container-test), specify the port we want it to run at, 8080, because that's what I put in the Dockerfile, and finally specify the image. So the command is docker run -d --name, the container name, then the port, then the image. Oops, that's not the right image name; it's yt-search-image-test. Now the container is running locally. We can see that if we go to our image here: it says "in use", and this is the container using that image. Alternatively, we can go to the Containers tab and see all the containers saved locally. Here we can see that the container stopped running, which means something went wrong: the folder with the model in it is not a local folder, so it didn't run because the model folder wasn't on the Python path. We could add the data subdirectory to the Python path, but alternatively we can go back to main.py and add app to these directory names. The reason we need this is that we defined the working directory as code, so all the code runs relative to that directory; if the Python script is looking for this model path, you have to put app in front because it's running from code. Alternatively, you could add that Python path line back in the Dockerfile, but I don't want to do that; leaving it out keeps the Dockerfile simpler.
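Assembling the steps above, the Dockerfile described would look roughly like this (the exact base tag, serve command, and port are assumptions based on the walkthrough):

```dockerfile
# Base image: official Python 3.10 from Docker Hub
FROM python:3.10

# All subsequent commands run relative to /code
WORKDIR /code

# Copy in and install the Python dependencies first (better layer caching)
COPY ./requirements.txt /code/requirements.txt
RUN pip install --no-cache-dir -r /code/requirements.txt

# ENV PYTHONPATH "${PYTHONPATH}:/code/app"  # optional; paths use app/ instead

# Copy the application code (main.py, functions.py, data/) into the image
COPY ./app /code/app

# Start the API server whenever the container is spun up
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]
```

Building, running, and later tagging and pushing would then look something like `docker build -t yt-search-image-test .`, `docker run -d --name yt-search-container-test -p 8080:80 yt-search-image-test`, `docker tag yt-search-image-test shawhint/yt-search-demo`, and `docker push shawhint/yt-search-demo`.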
Now let's try to run it again. We'll build the image, which was nice and quick, and then run the container. Now the container is running: click on it and indeed it's running successfully, and we can see the URL it's running at. We can test this if we go over to the notebook where we tested the API locally, and now test the Docker container running locally instead. This should be the correct path, and yep, it's the same thing: we have the URL, then the endpoint name, we use the search operation, define a query, and make an API call. It ran slower because it's essentially talking to a different machine, but we can see the API response is the same. We can similarly call the info endpoint or the base endpoint, and see that they generate different responses. Now that we've created the Docker image, next we push it to Docker Hub. The reason we want to do this is that once the image is on Docker Hub, it's easy to deploy to any cloud service you like. Here specifically we'll be using AWS's Elastic Container Service, but Docker Hub integrates with other cloud providers too, not just AWS but also GCP, and I'm sure it connects with Azure as well, even though that's not something I've checked. To push the image to Docker Hub, the first thing we do is create a new repository. I already have one called yt-search, but let's create a new one from scratch: we'll call this one yt-search-demo, give it the description "demo of deploying semantic search for YouTube videos", leave it public, and hit create. The repository doesn't have a category, so let's set one; I'll call it Machine Learning & AI. That's it, the repository is made. Now we can go back to our terminal and list out the Docker images like this, and we can
see that these are all the images I have saved locally. What we want to do is push this one to Docker Hub. The first thing we need to do is tag the image so it matches the repository name on Docker Hub; essentially we're creating a new image with the same name as the repository we just created. If we go back, we see the repo is called shawhint, which is my Docker Hub username, then the name of the repo we just created, yt-search-demo. The next thing we need to put is the name of the local image, yt-search-image-test. Actually, I have it backwards: we need to put the local image name first, yt-search-image-test, and then the Docker Hub repo name. We've now created a new image, which we can see here, called shawhint/yt-search-demo, and now we can just push it to Docker Hub, which is really easy: docker push shawhint/yt-search-demo. We could add a tag as well, but that's not necessary, so it uses the default tag of latest, and now we can see it pushing up to Docker Hub. Once it's done, if we go back to Docker Hub and hit refresh, we see the image is there. Now that we've pushed the image to Docker Hub, the last step is to deploy a container on AWS using their Elastic Container Service. To do that, we go to our AWS account (if you don't have an AWS account, you'll need to make one for this tutorial) and then go to Elastic Container Service; we can just type in ECS and it should pop up. We come to a screen like this; you can see I already have a cluster running, but let's start one from scratch. The first thing we do is go over to task definitions and click create new task definition. I'll call this one yt-search-demo. We'll scroll down to infrastructure requirements and select AWS Fargate, as opposed to Amazon EC2 instances; the upside of Fargate is that you don't have to worry about
managing the infrastructure yourself; that's all handled behind the scenes, and you can just worry about getting your container running and using it as a service. The next important thing is selecting the operating system and architecture. This will depend on the system you're running: for Mac, ARM64 is the architecture, and Linux is the operating system of our image. Next is the task size: I'll leave it at one CPU but bump the memory down to 2 GB. For the task role, you can actually leave this as none if this is your first time, and it'll automatically create a task role called ecsTaskExecutionRole; since that already exists for me, I'll go ahead and select it. Now we specify the container details. I'll call this one yt-search-container-demo, and here we put the URL of the image, which we grab from Docker Hub; I'll grab that and add the tag latest. We'll leave it as an essential container, leave the container port at 80, leave all this other stuff as the default, won't add any environment variables or environment files, and leave logging at the default as well. Then there are a bunch of optional things we can set, like a health check, startup dependency ordering, container timeouts, and so on. We can also configure the storage: there's ephemeral (short-term) storage, which I'll leave at the default of 21 GB, and we can add external storage using the add-volumes option, which is good if you want to talk to some external data source. There's also a monitoring tab and a tags tab, but I'm not going to touch any of that; I'm keeping it super simple, and I'll hit create. Now the task definition has been successfully created, and we can see this new task definition here.
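For reference, the task definition configured through the console corresponds roughly to a JSON document like the following, which could also be registered from the CLI with `aws ecs register-task-definition --cli-input-json file://task-def.json`. The values come from the walkthrough; account-specific fields like the execution role ARN are placeholders:

```json
{
  "family": "yt-search-demo",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "1024",
  "memory": "2048",
  "runtimePlatform": {
    "cpuArchitecture": "ARM64",
    "operatingSystemFamily": "LINUX"
  },
  "executionRoleArn": "arn:aws:iam::<account-id>:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "yt-search-container-demo",
      "image": "shawhint/yt-search-demo:latest",
      "essential": true,
      "portMappings": [{ "containerPort": 80, "protocol": "tcp" }]
    }
  ]
}
```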
Now we can go over to clusters and hit create cluster. I'll call this one yt-search-cluster-demo; again we'll use AWS Fargate for the infrastructure, we won't touch the monitoring or the tags, and we'll hit create. Now it's spinning up the cluster, which might take a bit. Once the cluster has been created, we can click on it and see that it's running, but there's nothing running on it yet. There are a few things we can do: we can create services or we can create tasks. Services are good for a web service like this API we're creating; a task is better for something that's more of a batch process that runs once at a predictable time increment. Here we'll create a service. To do that, we click Services and then the create button. We'll use the existing cluster, yt-search-cluster-demo, click launch type and leave it as Fargate and latest, make the application type a service, specify the family of the task definition, and give the service a name; I'll call it YouTube search API demo. We'll leave the service type as replica, set the desired tasks to one, leave the deployment options and deployment failure detection as default, and skip Service Connect (I'm not sure what that is) and service discovery. For networking, we'll leave everything the same and use an existing security group. We could enable load balancing if we liked, but I won't do that here. With service auto scaling, we can automatically increase or decrease the number of containers that are running, and again we could configure that, but we're not going to touch any of it; we'll just hit create. Now it's deploying the search API. The API has been successfully deployed; it took about five minutes or so. If we scroll down and click this YouTube search API demo service, something like this pops up, and we can go over to tasks, click this task here, and see that it has a public IP address.
We can copy this public IP, and then I have another piece of code here; we'll just paste in the public IP and make an API call. It took just 100 milliseconds, so it's actually faster making the API call to AWS than locally, which is pretty interesting. This ran just fine here, but one thing I had to do yesterday to get it working was go to the YouTube search API demo service, click on configuration and networking, and go down to security groups. That opens the VPC dashboard, where I had to add a rule allowing all inbound traffic from my IP address specifically. So you'll have some default security group, you hit edit inbound rules, and you can add an additional rule that allows all inbound traffic from my IP; you can also specify custom IPs one by one, or allow any IPv4 or any IPv6 address. It wasn't working for me, but once I added this inbound rule, it worked just fine. Now that the API is deployed on AWS, it's a lot easier to integrate this functionality, this search tool, into a wide range of applications. To demonstrate that, I'm going to spin up a Gradio user interface that can talk to the API. I'll just run this whole thing; it's essentially the same thing I walked through in the previous video of the series, so if you're curious about the details, be sure to check that out. Now we can see that this user interface got spun up. We can search something like "full stack data science", and we see that search results are coming up. This is the great thing about running the core functionality on AWS: now we just have this lightweight front end that can interact with the API and return search results through a web interface. You can see that the other videos in this series are popping up in the search results. We can search other things like "fine-tuning language models" (I had a typo, but it doesn't matter) and see all the content on fine-tuning and large language models pop up.
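The front end mostly just turns the API's two parallel lists into something displayable; a stdlib-only sketch of that glue (the helper name and link format are my own, not from the video):

```python
def format_results(response: dict) -> list[str]:
    """Pair up the parallel title / video_id lists returned by the search
    API and render each hit as a markdown-style link for a simple UI."""
    titles = response.get("title", [])
    video_ids = response.get("video_id", [])
    return [
        f"[{title}](https://youtu.be/{vid})"
        for title, vid in zip(titles, video_ids)
    ]


# Hypothetical response shaped like the deployed API's output
sample = {
    "title": ["Full Stack Data Science", "Fine-Tuning LLMs"],
    "video_id": ["abc123", "def456"],
}
print(format_results(sample))
```

A Gradio interface would call the deployed endpoint with the user's query, pass the decoded JSON through a helper like this, and display the resulting links.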
I'll just call out that all the code I walked through is freely available on GitHub: if you go to my YouTube blog repository and the full-stack data science subfolder, you'll see all this code in the ML engineering folder, and you can check out the other videos in this series and all the Medium articles associated with it. This was supposed to be the last video of the series, but then I got a comment from cool worship 6704 on my video about building the data pipeline for this project, asking how you would automate this entire process. That's a really good question, and it wasn't something I originally planned to cover, but since this question came up, I assume other people have it too. Just to recap what we did here: we took the search tool, wrapped it in an API, put that into a Docker container, and deployed it onto AWS, so now users and applications can interact with the search tool. But one limitation of how I coded things here is that the video index, the set of videos available in the search API, is static; it's a snapshot from a couple of weeks ago, when I made the video on building data pipelines. The obvious next step would be to create another container service that automates the whole data pipeline on some sort of time cadence, whether that's every night, every week, or whatever it might be, and then feed the results of that process into the search API so that new videos get populated in the search tool. That's going to be the focus of the next video in this series. So that brings us to the end. This video was a lot more hands-on than a lot of my other content; I'm experimenting with a new format, so let me know what you thought in the comment section below. If you want me to dig deeper into any of the tools or technologies discussed in this video, let me know and I can make follow-up videos on those topics. And as always, thank you so much for your time, and thanks for watching.
Info
Channel: Shaw Talebi
Views: 4,032
Keywords: data science, full stack, ml engineering, deploy ML model, how to, how to deploy ML model, api, fastapi, docker, docker hub, aws, ecs, aws ecs, elastic container service, python, tutorial, code walkthrough, how to deploy ml model using fastapi, deploy machine learning model aws, machine learning engineering, how to deploy machine learning model in production, AI, data scientist, ML engineer, machine learning, semantic search, gradio, text embeddings, step by step, data engineering
Id: pJ_nCklQ65w
Length: 28min 48sec (1728 seconds)
Published: Sat May 18 2024