Run Stable Diffusion as an API on AWS SageMaker

Captions
Hey guys, how's it going? I'm going to walk you through how to host Stable Diffusion on AWS SageMaker and how to run inference using an endpoint that we get from SageMaker. I've created this notebook, and a lot of the code is based on an article written about a year ago by somebody who works at Hugging Face. I've turned that article into code, changed some of it as well, and I'm going to walk you through how to set this up.

First of all, I'm in AWS SageMaker Studio. You can actually do this on your own computer in VS Code or Cursor, or on Google Colab - you don't need a GPU to set this up - but I'm working in this environment because the checkpoint files are huge and I don't want to download them onto my computer.

To get started, you're going to want to pip install sagemaker and the Hugging Face library, so I'm going to do that, and then we need to install python-dotenv for this particular environment. Basically, when we use AWS we need to provide our access key, our secret key, and the region name. You can hard-code those, but I want to obfuscate them in this tutorial so you guys don't steal them and use them. What this code does is create a boto3 session and use it to create a SageMaker session - all it's doing is initializing the AWS SageMaker session, which gives us a bucket, a session, and a region that we'll use later on in this example.

Now we're going to create a directory named code, and in that directory we'll create a requirements.txt file with diffusers and transformers. Then, within the code directory, we'll create an inference.py script. This script handles the inference API call - we can change the parameters in here, but what we need to pass in is the prompt, and you can also pass in other variables like the number of inference steps. So let's run these.
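The setup steps above can be sketched in code. Everything below is a hedged reconstruction, not the video's exact notebook: the session lines follow the standard boto3/sagemaker API, and the inference.py body follows the `model_fn`/`predict_fn` convention of the SageMaker inference toolkit, which is what the Hugging Face article the video is based on uses. Parameter names like `num_inference_steps` and the `generated_images` response key are assumptions.

```python
import os

# The boto3/SageMaker session initialization looks roughly like this; it
# needs AWS credentials (access key, secret key, region) from the
# environment, so it is left commented out in this sketch:
#
#   import boto3, sagemaker
#   boto_session = boto3.Session()                      # reads keys/region from env
#   sess = sagemaker.Session(boto_session=boto_session)
#   bucket = sess.default_bucket()
#   region = boto_session.region_name

# Create the code/ directory with the two files the video describes.
os.makedirs("code", exist_ok=True)

with open("code/requirements.txt", "w") as f:
    f.write("diffusers\ntransformers\n")

# Hedged sketch of inference.py: model_fn loads the pipeline from the
# extracted model archive, predict_fn generates images and base64-encodes
# them so they survive the JSON response.
INFERENCE_PY = '''\
import base64
from io import BytesIO

import torch
from diffusers import StableDiffusionPipeline


def model_fn(model_dir):
    # Load the pipeline from the unpacked model.tar.gz contents.
    pipe = StableDiffusionPipeline.from_pretrained(model_dir, torch_dtype=torch.float16)
    return pipe.to("cuda")


def predict_fn(data, pipe):
    prompt = data.pop("inputs", data)
    steps = data.pop("num_inference_steps", 50)
    images = pipe(prompt, num_inference_steps=steps).images
    # Base64-encode each image for the JSON response body.
    encoded = []
    for image in images:
        buf = BytesIO()
        image.save(buf, format="JPEG")
        encoded.append(base64.b64encode(buf.getvalue()).decode())
    return {"generated_images": encoded}
'''

with open("code/inference.py", "w") as f:
    f.write(INFERENCE_PY)
```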
Let's make the code directory - it's going to appear here - and then create the requirements.txt and the inference.py script, which is our script to run inference. We've run those, and if we look in here we can see there's an inference.py script and a requirements.txt, which is perfect.

Going back to the notebook, this next code block downloads the model checkpoint from Hugging Face, so you're going to need to get a Hugging Face token and add yours in here. I've already put mine in an environment variable, so it's going to work for me, but keep that in mind. It'll take a couple of seconds - actually, it's already done, and it should appear here shortly. This is the model, which has the scheduler, the tokenizer, basically everything from Hugging Face that it needs to run.

Next, we copy the code directory, which has the requirements.txt and the inference script, into the model directory. It basically copies the contents of here into here, and we can make sure it worked by looking for the code directory - yep, you can see it's in there. Then this compresses everything in our model directory into a tarball, which is just a compressed file containing all of our files. Let's do that; it should create a model.tar.gz file here.

Then we upload the compressed tar file to S3 - that's what this code does. This may take a second... we can see it's busy, so I'm going to fast forward until it's done, because if I remember correctly this takes a couple of minutes. Okay, it finished uploading to S3, and this is the S3 URL it gave us. We can verify it by going to S3 in AWS and refreshing: we're looking for the SageMaker us-west-2 bucket we just uploaded to. Let's click it, and we can see we uploaded Stable Diffusion version 1.4 into the bucket, which is good.

Now this next block uses the bucket to create an endpoint on SageMaker, using an instance type that has one T4 GPU. Let's make sure everything lines up: where it says model_data, that's the path to the S3 file we just uploaded our model to, so it needs to match - it looks like it does, but you're basically going to want to copy this into here. And this is the execution role, basically a role saying we have permission to create the SageMaker endpoint. You can get it from your IAM console: just go to AWS, open IAM, and look for the role that has access to SageMaker. Let's run this - it should take about five minutes, and I'll return once it's done. Okay guys, it's finally done running - this took about five minutes - so the endpoint is created.
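The packaging and deployment steps can be sketched as follows. The tarball part runs as-is (on a dummy directory here, to avoid the multi-GB checkpoint download); the deployment arguments are collected in a plain dict because actually calling `sagemaker.huggingface.HuggingFaceModel(...).deploy(...)` requires AWS credentials. The framework versions and the `ml.g4dn.xlarge` instance type (the single-T4 instance the video appears to use) are assumptions, not taken verbatim from the video.

```python
import os
import tarfile

# Sketch of the packaging step: the code/ directory sits inside the model
# directory, and everything is compressed into model.tar.gz for S3. A dummy
# model directory stands in for the real checkpoint here.
model_dir = "model-demo"
os.makedirs(os.path.join(model_dir, "code"), exist_ok=True)
with open(os.path.join(model_dir, "code", "requirements.txt"), "w") as f:
    f.write("diffusers\ntransformers\n")

# SageMaker expects the archive contents at the archive root, so each entry
# is added with a relative arcname rather than its full path.
with tarfile.open("model.tar.gz", "w:gz") as tar:
    for name in os.listdir(model_dir):
        tar.add(os.path.join(model_dir, name), arcname=name)


def make_deploy_config(model_s3_uri, role_arn):
    """Hedged sketch of the arguments passed to HuggingFaceModel / .deploy().

    In the notebook these would be split between the HuggingFaceModel
    constructor (model_data, role, versions) and deploy() (instance count
    and type); version strings here are illustrative.
    """
    return {
        "model_data": model_s3_uri,  # e.g. the s3://.../model.tar.gz URL from the upload
        "role": role_arn,            # IAM execution role with SageMaker access
        "transformers_version": "4.28",
        "pytorch_version": "2.0",
        "py_version": "py310",
        "initial_instance_count": 1,
        "instance_type": "ml.g4dn.xlarge",  # single NVIDIA T4 GPU
    }
```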
Let's make sure the endpoint exists by going into AWS SageMaker. Right now we're in S3, so click SageMaker, go to Inference, and then Endpoints. It says no resources - that's because we're in the wrong region. Remember, we used us-west-2, so if we switch to it we should see an endpoint. Perfect, we just created this, so let's click into it to learn more. This is the name of the endpoint, and this is the actual URL you can invoke it through. It says here that this is real-time, which means this GPU instance is always running, always on - that can get pretty expensive, and I'm going to talk about it in a couple of minutes. But first, let's run this by copying this and going back to the notebook.

This last code block just calls the API; all you need to do is swap in this URL and run it. It times how long the call takes and shows the image, and the prompt is "a dog trying to catch a flying pizza art". So let's run it against our endpoint and see what we get, and then I'll walk you through the code a little more - it's pretty straightforward, but I'll walk you through it nonetheless. It took 7 seconds, and this is the image it gave us: a dog catching a pizza, which is pretty cool.

What the code does is pretty simple. By default, the response you get back is base64-encoded, so these are some helper functions to decode and convert it so we can actually visualize the image within the notebook. This part just handles timing. Here's the important part: we need to initialize boto3 with the SageMaker runtime, and if you're running all of this inside your own Python server - which you can do - you'll need to add your AWS access key ID, your AWS secret access key, and the region in here. So if you're running this inside your own server and you get that error, be sure to add those AWS values. Then this is our prompt, this is the number of images - we're basically just creating a payload.
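The client side can be sketched like this. `build_payload` and `decode_image` mirror what the video's helper functions do; `invoke` is the boto3 call and needs AWS credentials, so it is defined but not executed here. The response key `generated_images` and the default step count are assumptions carried over from the inference script sketch.

```python
import base64
import json


def build_payload(prompt, num_inference_steps=30):
    """JSON payload for the endpoint: the prompt plus optional parameters."""
    return json.dumps({"inputs": prompt, "num_inference_steps": num_inference_steps})


def decode_image(b64_string):
    """Turn one base64 string from the response back into raw image bytes."""
    return base64.b64decode(b64_string)


def invoke(endpoint_name, prompt):
    """Hedged sketch of the invocation call.

    Requires AWS credentials; outside SageMaker Studio, pass
    aws_access_key_id / aws_secret_access_key / region_name explicitly
    when creating the boto3 session.
    """
    import boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_payload(prompt),
    )
    body = json.loads(response["Body"].read())
    # Decode every returned image so it can be displayed in the notebook.
    return [decode_image(img) for img in body["generated_images"]]
```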
This is the endpoint, and here's the important part: we invoke the endpoint by name with the payload. The rest just processes the image so we can visualize it and calculates how long it took. If you want to use this, feel free to copy the code, make a few small changes, and you should be able to use it as an endpoint.

Now, a couple more things to talk about: pricing. We used one of the smaller, cheaper GPUs, and because it's always on, leaving this running is going to cost us about $300 a month, which is quite expensive. So whenever you don't want to use it, you really need to be sure you delete it - you can do that by checking the endpoint, going to Delete, and clicking Delete - otherwise you're going to get a bill for $300 at the end of the month.

If you're interested in Stable Diffusion and using it as an API within your app, I'm actually offering my calendar so you can book some time - we're running a service that runs a Stable Diffusion API for you guys, so if you're interested in making use of that and paying much less than $300, feel free to book a time. I'll put the link in the description as well. Let me know if you have any questions at all - I'm happy to answer them - and hopefully I'll hear from you soon. Thanks so much, bye.
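Two things from this section are worth pinning down: the quoted $300/month for an always-on real-time endpoint corresponds to roughly $0.42 per billed hour, and the console deletion step has a one-line programmatic equivalent. The `delete_endpoint` helper below is a hedged sketch that needs boto3 and AWS credentials, so it is defined but not called.

```python
# Real-time endpoints bill per hour whether or not they serve traffic, so
# the video's ~$300/month figure implies an always-on hourly rate of about:
monthly_cost = 300.0
hours_per_month = 30 * 24
hourly_rate = monthly_cost / hours_per_month  # roughly $0.42/hour


def delete_endpoint(endpoint_name):
    """Programmatic version of the console deletion step shown in the video.

    Requires boto3 and AWS credentials; endpoint_name is whatever name
    SageMaker assigned when the endpoint was deployed.
    """
    import boto3

    client = boto3.client("sagemaker")
    client.delete_endpoint(EndpointName=endpoint_name)
```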
Info
Channel: Tosh Velaga
Views: 1,849
Id: yC6kTYcjZdk
Length: 10min 36sec (636 seconds)
Published: Sun Nov 05 2023