Run Llama 3 Model Locally with 9 Lines of Code Using Ollama, LangChain and Prompt Engineering (Basic)

Captions
Hello friends, welcome back to my channel. In this episode I'll be going over a longer version of the Llama 3 tutorial I did last time. In that video I covered how to use Llama 3 served through an Ollama instance, how to load the model onto your local machine and run inference locally, prompt engineering, and all that stuff. The feedback was quite positive, so thanks everyone for watching and giving me feedback. I did have to blur out the actual prompts and some of the data, because I signed an NDA with the vendor I worked with and didn't want to breach any contract, so apologies for the blurring. This time I'll slow down a little bit, go over every single step and what the tools are, and then dive into the details of what each specific piece does, step by step. That's the plan for today.

Before we get into the implementation details, I want to point out why you should use an open-source LLM like Llama 3 instead of OpenAI's APIs or ChatGPT. There are a lot of advantages to an open-source model over the commercial APIs like GPT-4 Turbo.

The first is cost. Open source means it's free for everyone, so you don't need to spend any money on it, and these large language models are often quite capable as well, especially with the innovations that have come out over the last several weeks. Llama 3 has been releasing updates on a regular basis ever since it launched last month, so it's been improving consistently over time, and I see very strong potential in Llama 3's performance versus state-of-the-art GPT performance. Open-source models don't necessarily perform worse than the commercial APIs; given the recent Llama 3 releases, I think Llama 3 performs quite well compared to the GPT APIs, and it's free. So why not save some money and get nearly comparable performance in return? Cost savings is one big factor to consider.

The second reason is data safety. Whenever you call the OpenAI API to fine-tune with your data, or to get a response from ChatGPT, you're making that API call over the internet to OpenAI's servers, sending all of your data and information over the network to wherever those servers live. You're risking your data in two ways. First, during that communication your data could be breached by bad actors intercepting it in transit. Second, you don't know how OpenAI is going to use your data. Sure, you can sign contracts beforehand limiting their usage of it, but they still have your data sitting on their servers, and they still have access to it if they ever want it.
So if you want to keep your data entirely on your local machine and make all of your large language model inference calls without any API traffic crossing the internet, running open-source models and doing all your inference and chat conversations locally keeps your data safe. Those are the two main reasons to consider open-source models.

The third reason depends on how much fine-tuning or modification you want to make to your model. If you'd like to further pre-train or fine-tune on top of an existing large model, or pre-train your specific task on top of it, an open-source model gives you the most control. There's a lot of flexibility in how you can use it: you can fine-tune it, take the last layer and use the embeddings for additional task-specific pre-training, take a middle layer and run experiments with it. You have full control. The commercial APIs do offer endpoints for fine-tuning and pre-training, but everything still goes over the internet, it gets very costly, and you don't know exactly what's going on under the hood, so you lose a lot of the control and visibility into how your model is developed.

So those are the three reasons to consider an open-source large language model instead of the commercial APIs from OpenAI: cost, data safety, and control (or visibility). Without further ado, let's dive into this episode on how to use the Llama 3 model locally, with a real-world example.

Here's my terminal. The first tool I'd like to introduce is the Ollama instance. You install it on your local machine, and once it's up and running it serves all of these models from the same instance, so you have access to Llama 3, Llama 2, Mistral, and more from the same server. It's super convenient, and you also get a conversational interface right in the terminal. If you run something like `ollama run llama3`, you get a conversation engine, a bot. Let's say "say hi to my audience", and it says hi for me: "How's your day going so far? Let's make it an awesome one together. What's on your mind? Got any questions, topics, or just want to chat? I'm all ears." So you have your own little chatbot sitting on your machine, served entirely locally, and you can interact with it like this.

Let's see if we can ask it about anything. What's the median house price in, let's say, California? I've been looking at houses the past few weeks, so that's what I'm interested in. And it does have an answer; of course, Llama 3 is trained on internet-scale data as well, so you can look things up, ask for information, ask questions, and talk to it as if it were Google or ChatGPT. Okay, around 2.4 million? That's way too expensive anyway. So that's the Ollama instance. I highly suggest you use it for your application; it's super easy and cool to use, highly recommended.
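To make that terminal demo concrete in code: here's a minimal sketch of talking to the same local Ollama instance from Python through its REST API, which Ollama serves on port 11434 by default. This is my own illustration, not code from the video; the prompt text is just the example from the demo.

```python
import requests

# Assumes Ollama is installed, running, and the model has been pulled,
# e.g. by running `ollama run llama3` in the terminal first.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",   # the same instance also serves llama2, mistral, ...
        "prompt": "Say hi to my audience!",
        "stream": False,     # return one JSON object instead of a token stream
    },
)
print(resp.json()["response"])  # the model's greeting
```

One local server, many models: that's what makes the instance so convenient.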
The next tool I'd like to introduce is LangChain. You've probably seen it in other videos out there; it's basically the go-to tool for using large language models these days. It has a lot of different products, but really all you need from it is about two lines, which I'll get to in my demo very soon. In the how-to guides, say the "build a simple application" page, you just need to pip install langchain, and you can ignore all the API-key setup. The example there uses OpenAI, which requires an API key, so don't follow that part; but the concept is the same. You initialize the model (you'll see that we use the Ollama instance for initialization, since it integrates with LangChain), and then you invoke the model with your prompt, which I'll show in a little while.

Llama 3 is the model I'll be demoing today. It's claimed to be the best open-source model to date, and it is quite powerful; in my own experience it performs a lot better than Llama 2. Let me go ahead and start my Jupyter notebook, which is an interactive environment. I've removed probably the most sensitive parts of the prompts and the code, so I should be fine demoing all of this without blurring anything this time, though I won't show the actual input and output since I'd have to blur those later anyway.

Here's the way to use it. First, make sure your Ollama instance is up and running. Then you just need to install langchain, which I believe includes langchain-community and langchain-core; there are other packages I use in this specific project, but Ollama and LangChain are the only two dependencies you need installed locally. Here's how you initialize the Llama 3 model: you use the Ollama class you're importing, and you set the model to llama3. The fun part, the convenient part, is that if you want to use Llama 2, you just change "llama3" to "llama2" and you're good to go; it's that simple. So: import those two lines, then set your llm to be Ollama with model llama3. That's how you indicate which model you want to use.
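As a minimal sketch of that initialization (this uses the langchain-community integration that was current in mid-2024; newer LangChain releases move it into a separate langchain-ollama package):

```python
# pip install langchain langchain-community
from langchain_community.llms import Ollama

# Point LangChain at the locally running Ollama instance.
llm = Ollama(model="llama3")  # change to "llama2" and you're good to go

# A one-off inference call, before any prompt template:
print(llm.invoke("In one sentence, what is a large language model?"))
```

Swapping the model name string is the entire migration story between models, which is the convenience being described above.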
Next, we're going to do some prompting. Let me remove the irrelevant code and show you how few lines you really need to get this app running. We'll use ChatPromptTemplate; if you look up its docstring in the API reference, it's a prompt template for chat models with system, AI, and human roles. First of all, you tell the system what it is: you prime the system to be what you want it to be, before you initiate any of the actual conversation. (The examples in the docstring aren't great, so ignore those; this part is all you need.) In our case, we tell the system: you are a bot that is going to help me transform my manufacturing-process instruction doc into a structured Excel spreadsheet with these and these and these columns. That's what you tell your bot before any real conversation. And make sure you have an empty line in between every sentence in that template string, because otherwise it does not work; you write this prompt as if you're talking to the bot, and if you're saying multiple sentences, you separate them with an empty line so it stays one clean string.

The second part is the user input, and it indicates the input file for this bot. After you prime the system through the system prompt, you give it your own input, and that goes in here; this is how the large language model knows what data to look at, while the system prompt is what we're telling it to do given that data. Then you chain the prompt together with your model: basically we're chaining this prompt on top of the llm, which is the Llama 3 model served by the Ollama instance.

Now, how do you run it? I do have some data, and I have a util function that takes a file path, reads it, and returns the string; I already have an example input, so let's run on that (my notebook has some additional dependencies because of those utils). It's running now, and notice that these models take a long time: they really are large language models, so running them locally can take anywhere from ten seconds to a minute per inference call, depending on your local machine's spec. Even if you download the model and try to serve it locally, a regular CPU machine usually won't be powerful enough for timely responses unless you deploy to the cloud or have a local GPU machine, so CPU is going to be a limiting factor when it comes to performance.

So I think this is all you need to run the Llama 3 model locally: import the dependencies; initialize your Llama 3 model like this through Ollama (with the Ollama instance up and running); write your prompt, with a system message telling the model what you want it to do with the input data and one parameter left for passing that input, so the model knows what to do given the input; chain the prompt to the model like this; and once you have your data, invoke the chain like this, with text being your input. That's it, that's all the code you need. Let's count: one, two, three, four (I'll count this as three lines), five, six, seven, eight, nine. Something like nine lines of code is all you need to run Llama 3 locally, with all the tools I mentioned: the Ollama instance, LangChain, Llama 3, and this model spec. I can certainly share these lines of code; I'll post the code in the comment section below.
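Putting it all together, here's a hedged reconstruction of those roughly nine lines. The system-prompt wording, the column names, and the instructions.txt path are placeholders I made up, since the real prompt and data were blurred under the NDA; the structure (template, chain, invoke) follows the steps described above.

```python
from pathlib import Path

from langchain_community.llms import Ollama
from langchain_core.prompts import ChatPromptTemplate

llm = Ollama(model="llama3")  # requires a running Ollama instance

# Prime the system, and leave a {text} parameter for the user input.
prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a bot that helps me transform a manufacturing-process "
     "instruction doc into a structured spreadsheet with step, tool, "
     "and duration columns."),  # placeholder wording, not the real NDA'd prompt
    ("human", "{text}"),
])

chain = prompt | llm  # chain the prompt on top of the LLM

text = Path("instructions.txt").read_text()  # stand-in for the util that reads the input file
print(chain.invoke({"text": text}))  # one inference call; expect ~10s to 1min on a CPU machine
```

Depending on how you count the template, that lands right around the nine lines claimed in the title.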
This is all original code from me, it's all my work, so I can share it if I want to. So that was the slower version of how to use the Llama 3 model locally in eight to ten lines: you import your dependencies, write your prompt using ChatPromptTemplate, chain it, and invoke it with your input data. That's all you need to run Llama 3 locally. And also remember, you have that really cool conversational engine that can be started locally in the terminal, which you can talk to whenever you want: your own private little bot, completely safe, sitting on your local machine. That's pretty cool to have.

All right, that's everything I have for you today. Tell me what you think, what you want to hear about, and what you want me to build with large language models; I can build it out, make a video about it, and share everything, the code, the data, all of it, because it's all my work. I hope you found this video helpful, and let me know what you want to hear more about next. All right, I'll see you next time. Ciao!
Info
Channel: Hujia(who-ja)
Views: 1,006
Id: LLLYceLbMkY
Length: 18min 57sec (1137 seconds)
Published: Fri May 31 2024