Your Own Llama 2 API on AWS SageMaker in 10 min! Complete AWS, Lambda, API Gateway Tutorial

Video Statistics and Information

Video

Captions Word Cloud

Captions

so llama 2 has been out for a few weeks now and it's a compelling model to work with due to its open source nature and commercial license so that means that unlike open ai's chat GPT you can take it and run it on your own Hardware fine tune it and play with its larger context window I have a feeling meta is going to go from strength to strength in its open source releases so it's worth getting to know this model and playing with it right now you can test it on hugging face you can install it locally if you have a powerful enough machine and you can run it on services like Google cloud or run pod what I'm interested in doing however is setting up my own model so that I can access it in a via VIA a scalable machine in the cloud so this is critical if I want to add the power of llama to any web-based app or user interface I might develop for my Consulting clients in this case Amazon's sagemaker could be a good choice sagemaker is aws's platform for all things machine learning it allows you to deploy an existing instance of lava 2 and then use AWS Lambda serverless functions and the API Gateway to create a web-based API for the Llama model bear in mind that this can be costly depending on the type of machine instance you use in this tutorial I'm going to use a large instance you can check the pricing in the link in the description as it may change I'd recommend only running this intermittently or choosing the smallest possible instance for your needs for the tutorial I've gone large just to speed things up if you're an entrepreneur or product leader and are interested in how AI can be leveraged in your business or apps consider subscribing to my upcoming newsletter and I'll be adding some new courses over the coming months at boostling.com so once we've logged into AWS we want to find sagemaker so you can just type in sage in the search bar and you'll find it here and it should take you to a page that looks a bit like this so we want to create a domain so let's go to domains and let's click create domain so I'm going to call this llama Rob shocks and we need to create a user profile to work with this domain so we can go with the default here make sure that it's got Amazon sagemaker execution role and let's click submit and that can take a few minutes to get set up so go make a cup of coffee and get ready for the next step okay so after a few minutes it should be ready so just click on domains are already there you'll see llama Rob shocks is listed here let's just click on that and you'll see the user profiles and we want to launch the studio give this about two to three minutes and then it's going to start up for you so be patient okay from within the sagemaker studio we go down to sagemaker jumpstart and then model notebooks Solutions and here you can see under the various different models and solutions the Llama 2 7 billion chat model you can actually run the 70 billion as well but just bear in mind that it's going to need to run on a much larger instance and be a little bit more expensive so um we're just going to run with this for now so you can look in the deployment configuration you can see the instance that you're running and you can check the pricing on that as well separately if it's a concern so from here let's just hit deploy okay so the model is running and it's actually going about creating an end point so it says it might take five to ten minutes so you can sit back and relax so our endpoint is ready important to make note of the endpoint name and we can test it out by opening it in studio so let's give that a go so we've got a notebook starting based on that endpoint we're going to run through these commands once it's up and running so let it get started first okay so it took about two or three minutes for that note so once we've logged into AWS we want to find sagemaker so you can just type in sage in the search bar and you'll find it here and it should take you to a page that looks a bit like this so we want to create a domain so let's go to domains and let's click create domain so I'm going to call this llama Rob shocks and we need to create a user profile to work with this domain so we can go with the default here make sure that it's got Amazon sagemaker execution role and let's click submit bring you a couple of different examples here of inputs that you can use so we're just going to go with what's there for now um also important to note you might see some errors that come through around the acceptance of the EULA um so it's important that you've got that set to true and not set to false you can see it being passed in here as a custom attribute as well you may need to pass that through as a header later on when you're creating the API Gateway but if you do see that come up as an error in your response you'll know where to start troubleshooting so let's click this and run that's fine and then this is actually taking the dialogues from up above and running them through so this is basically running our final command so you're going to wait a minute or so to get your first response okay great so we have our output user what is the recipe of mayonnaise and the assistant is giving us the feedback and then it's also running through the other dialogues in the array that we had set up okay that's great so if you actually want to change input here and say what is the recipe of leek soup let's run this one again so that's saved we don't need to run this one again but we do need to run this and let's see what the output is and if it changes perfect I'm glad you're interested in learning about leek soup however I must inform you that I cannot provide you the recipe okay so this is a bit ridiculous the safety levels for llama are still quite High um I cannot provide you with a recipe that may cause harm or promote unethical practices it's leek soup Lama come on I'm hoping the censorship of this model changes uh over time and I think I'm starting to see feedback from meta that that is the case but anyway for now that's exactly what we wanted and we got the correct response so we're all good in terms of notebook setup so of course what we really want is an API endpoint so that we can access llama to be any kind of user interface that we want to create not just locked inside the sagemaker or jupyter notebook so in order to do that we're going to need two things we're going to need a Lambda serverless functions and we're going to need an API Gateway both from AWS so let's start with the Lambda serverless function so to find that go to your search bar and click Lambda and let's navigate straight there okay so click on functions and then let's click create function we want to author one from scratch and we're going to call this llama request and the model we're going to use is python 3.10 and the architecture is 86 and we don't need any other permissions let's click create function so if you haven't used Lambda before essentially what it is it's Amazon's version of serverless functions so instead of setting up your own server to handle input requests and outputs you actually just set up a serverless function by itself it sits on the AWS hardware and it only gets run whenever it's invoked so instead of it being constantly on a server that you're paying for that's running constantly all the time it only gets run whenever it gets hit so that's generally a big saving over the long term each function is basically setting up a little piece of logic that you want to run and this is an example case here okay so I've replaced the boilerplate with this code let me talk you through it we're importing border 3 and Json which is needed in the script and we're setting up our endpoint name that's here and remember we took that from sagemaker and you can go to end points here and this is your endpoint you can also put this in as an environment variable in the configuration it's better practice but I'm just using this to show you here how it all works so we're invoking the run time we're passing in that endpoint name and then we're taking the body from the Json that's being sent so we're going to post a uh a query to the endpoint and uh in the Json body there's going to be the information that we want for our prompt so we're going to be taking that and destructuring it uh sending it to the end point and then we're going to be sending back the result then again Okay so the next thing we need to do is make sure that we have the right permissions associated with the role uh the Lambda role of this function so we need to go to configuration and we've got permissions here and then we're going to click on Lambda request roll basically this was the default execution role that was assigned um when we set up the Lambda function but we needed one extra one which is giving Amazon sagemaker full access now this is actually Overkill you don't need to give this function full access at all but for the purpose of the tutorial I'm just giving you this role and this permission just because it'll give you the least amount of Errors when you're troubleshooting but at later stage you'll want to revoke a lot of this access and narrow it down to what is required by the function just for the best security so make sure you've got this and this so now we have our Lambda functions set up we want to be able to trigger that Lambda function lamma request from the API and that's where API Gateway comes in so here let's add a trigger and let's pick a source it's going to be API Gateway so we're going to create a new API but as a HTT Abi we're going to set the security to open and let's click add okay so the trigger has been successfully created and we can see it here and here is the API endpoint so let's copy our API endpoint and open Postman if you're not familiar with Postman basically it's a way to simulate API requests so let's create a new request here I'm going to set it to post I've pasted in the URL that we just copied and the other thing we want to do is put in our Json request so that needs to go into the body and set to Raw and also set to Json you'll find the link to this Json in the description so we're passing in our system prompt you're an expert copywriter and we're passing in the content which is write me a tweet about superconductors pretty topical at the moment and let's click Send so you see it's sending the request and here is our response um unlock the secrets of superconductors and here we have some emojis these materials can transfer energy with zero resistance Revelation revolutionizing industries from energy to Medicine perfect so there we go now once you have this response you'll want to be able to destructure it in your user interface or app and you can also set up what kind of an output you get by changing the Lambda function so it posts exactly what you want you can set it up so that you're posting in a lot of different parameters take a look at llama and the docs to see exactly what you can send you can change the roles you can send multiple prompts and you can receive your responses in various different ways I hope you found this tutorial helpful it can take days to create these tutorials so a simple like And subscribe would really give me the motivation to keep pumping them out if you run into issues drop your question in the comments and I'll do my best to help make sure to delete the domain when you're done or it's going to rack up costs that you don't want and make sure you're running a server instance at a size that's okay with your budget very important to check that if you're having trouble with the Lambda function I recommend stack Overflow and also inputting the code and errors into Chachi BT to get suggestions also don't forget to add some authentication and adjust your permissions to increase security if you move this into production so I hope all this helped and let me know what you're building in the comments if you're interested in more tutorials about AI subscribe

Info

Channel: Rob Shocks

Views: 11,218

Rating: undefined out of 5

Keywords: llama 2, llama 2 meta, artificial intelligence, aws, sagemaker, api, fine tune llama 2

Id: 3y_TcDNC0HE

Channel Id: undefined

Length: 14min 45sec (885 seconds)

Published: Thu Aug 10 2023