How to deploy LLMs (Large Language Models) as APIs using Hugging Face + AWS

Video Statistics and Information

Captions
In this video I'm going to show you how to deploy your own instance of an open-source large language model as an API. For those of you who have used ChatGPT, you know the performance is pretty great, but there are many cases where you might rather deploy your own version of an open-source large language model, whether because of data privacy concerns or because of specific industry use cases for which open-source models might perform better.

In this tutorial I'm going to walk through one such open model, called Dolly. Dolly was released by Databricks a couple of months back, and as you can see in their blog post, they describe it as the world's first truly open-source, instruction-fine-tuned large language model. To deploy it we're going to use Hugging Face, where the model is hosted; in fact, Hugging Face hosts a great many open-source large language models. We'll then deploy it on AWS, specifically using SageMaker. Even if you don't know what Amazon SageMaker is, that's okay: it's a cloud-based machine learning platform, and it's great for end-to-end ML solutions hosted on the cloud.

On the model page it's quite easy: click Deploy, then Amazon SageMaker, and you'll see a block of deployment code. You then go to AWS SageMaker, where you've already created a notebook instance (it's fairly easy to create one), and run that code there. In my notebook you can see the code I've written, which deploys this Dolly v2 7-billion-parameter model, with a few parameters changed. One of them is the instance type: the Hugging Face snippet uses ml.g5.2xlarge, which is a GPU instance, but I did not have access to that instance type in my AWS account, so I used a different one that does the same job; I believe it's just an older generation of hardware. I also changed the container startup health check timeout to 3600 seconds so that the deployment doesn't time out. Now you can run the whole notebook; it takes a couple of minutes, so I'm going to pause the video and resume once it's done.

All right, it looks like the endpoint has finished deploying. Now we go back to the Amazon SageMaker console, open Inference, and click Endpoints; you can see the endpoint has been created and is currently in service. Next we go to the AWS console and click on Lambda to create a serverless Lambda function. Click Create function; I'll call it hugging-face-llm. For the runtime I'll choose Python; I checked whether a specific Python version is mentioned anywhere, and not really, so I'll just take the default, Python 3.10. I'm going to use an existing role I've used before, and then create the function. The function code is the important part: it's what calls the endpoint. I'm going to copy the handler function I already have, paste it in, and then go to Configuration to add the endpoint name as an environment variable. (Sketches of the deployment code and the handler are included below.)
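The deployment code isn't shown line by line in the video, but the snippet Hugging Face generates under Deploy → Amazon SageMaker looked roughly like the sketch below as of mid-2023, with the two parameters mentioned above adjusted. The container version string and the role-lookup fallback are assumptions and may differ from what was actually run.

```python
import json

import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Execution role for SageMaker; falls back to looking up a named role when run
# outside a SageMaker notebook (the role name here is hypothetical).
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

# Hub model configuration: pull Dolly v2 7B from the Hugging Face Hub.
hub = {
    "HF_MODEL_ID": "databricks/dolly-v2-7b",
    "SM_NUM_GPUS": json.dumps(1),
}

# Create the Hugging Face model using the hosted LLM inference container.
# The container version is an assumption; use whatever the Deploy snippet shows.
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="0.8.2"),
    env=hub,
    role=role,
)

# Deploy to a real-time endpoint. Swap the instance type for one you have quota
# for; the longer health check timeout gives the large model time to load.
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=3600,
)

# Quick smoke test against the new endpoint.
predictor.predict({"inputs": "My name is Julien and I like to"})
```

The Lambda handler is likewise pasted in from an existing script rather than typed on screen. A minimal sketch that matches what the video describes, reading the endpoint name from an environment variable, forwarding the JSON payload to the SageMaker endpoint, and returning the model's response, could look like this (the variable name ENDPOINT_NAME and the exact return shape are assumptions):

```python
import json
import os

import boto3

# Endpoint name is supplied via a Lambda environment variable (set under Configuration).
ENDPOINT_NAME = os.environ["ENDPOINT_NAME"]
runtime = boto3.client("sagemaker-runtime")


def lambda_handler(event, context):
    # With a plain (non-proxy) API Gateway integration, `event` is the request
    # body itself, e.g. {"inputs": "My name is Julien and I like to"}.
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(event),
    )
    # The endpoint returns JSON such as [{"generated_text": "..."}].
    result = json.loads(response["Body"].read().decode("utf-8"))

    # The returned value becomes the API response body with a non-proxy
    # integration; a proxy integration would instead need
    # {"statusCode": 200, "body": json.dumps(result)}.
    return result
```

The same {"inputs": "..."} JSON object is also what goes into the test event in the Lambda console in the next step.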
For the value I'll copy the endpoint name from the SageMaker console, paste it in, and save. What else do we need? Under General configuration I'll edit the timeout to three or five minutes, just to be safe, so the function doesn't time out. Then I'll go to Test. For the test event, look at the input format the model expects: a JSON object with an "inputs" field. I'll copy that, paste it in as the JSON event, name the event "trial", remove the trailing comma, and run the test. It fails at first, which is a common mistake: I hadn't deployed my changes, so you need to click Deploy first. Testing again, it succeeds: "My name is Julien and I like to write about software, hardware..." and so on. Great.

Now we need to create the final piece, an API Gateway, to connect to and call the Lambda function. I'm going to build a REST API, a new API (you can see I've already created a few); I'll just name it llm. Under Actions I'll create a method, a POST method, and of course integrate it with the Lambda function, then save. Now I'll deploy it; you can use any stage name, but you'd typically call it production, so let's call it production and save the changes. Make sure to copy the invoke URL shown here, because that's the URL we'll call.

Back in a notebook, I'll send a custom payload, just "hello", and delete everything else so it doesn't confuse us. The first attempt fails because I had not imported the requests package; after fixing that it works: "Hello, I'm a new user to the site and I'm looking for some advice, I'm single..." No, I'm not, but that's what the model generated. So now we can use this the same way we would use any other chat-completion API (a sketch of the call is shown below). I'm not going to get into performance, cost, and latency in this video, but I'll cover those in other videos, so watch out for that. Thank you for watching; if you found this useful, please like and subscribe, and I will see you all soon.
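For completeness, here is a minimal sketch of calling the finished API from Python with the requests package, assuming the non-proxy Lambda integration sketched above; the URL is a hypothetical placeholder for the invoke URL copied from the API Gateway stage.

```python
import requests

# Hypothetical placeholder: use the invoke URL copied from the API Gateway stage.
API_URL = "https://<api-id>.execute-api.<region>.amazonaws.com/production"

payload = {"inputs": "hello"}

response = requests.post(API_URL, json=payload)
response.raise_for_status()
print(response.json())  # e.g. [{"generated_text": "hello ..."}]
```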
Info
Channel: Data Science In Everyday Life
Views: 14,717
Keywords: AI, data science, aws, LLM, sagemaker
Id: a2A_CxrH3Ts
Length: 9min 28sec (568 seconds)
Published: Thu Jul 06 2023