I Built an Interactive AI Talking Avatar Part 3 - Integrating Llama 3

Video Statistics and Information

Captions
Hello everyone, and welcome back to my channel. Last week Meta released their latest large language model, Llama 3, and I want to show you how I integrated Llama 3 into my AI Avatar project, so stick around. If you haven't watched any of my previous videos about this AI Avatar project, check out the description below.

I'm very excited to integrate Llama 3, but just to give everybody some context: what is Llama 3? It's a large language model developed by Meta AI, and it comes in two versions, an 8 billion and a 70 billion parameter model. Depending on how powerful your system is, you can run the 8 billion or the 70 billion parameter version; the 70 billion one is supposed to be much better than the 8 billion model.

There are a lot of reasons why you might want to run your own model locally. One is development and customization: if you want to train it further on your own data, you can do that. Another is privacy, since you keep your own data to yourself. You also get more control, and arguably better performance and reliability, because sometimes OpenAI goes offline or I just can't connect to it. But for me, the main benefit right now is cost. I've been running my project on the OpenAI GPT API, and over time it costs money to keep running and testing it. I'm really happy I was able to eliminate that part of the cost of this project; running the Llama 3 model locally will just save me some money. So let me show you how I did it.

The first thing to do is Google "node-llama-cpp"; it should be one of the top results in the search, and it will take you to the project's GitHub page. The nice thing about this project is that it's very well documented: there's a getting started guide, API references, and more help, and it's actively being developed; the latest version is 2.8.9, released March 21st. It's also very easy to implement. The easiest way is to click on the getting started guide, which gives you the installation instructions: you just need to run one npm command. That installs the package into your project, and if you go to your package.json file you'll see node-llama-cpp with the current version listed in the dependencies.

After that, you're ready to implement it. I created a file called llama-api.js (obviously you can call it whatever you like) containing a class whose constructor takes, among other parameters, a model path. The model path is just the location where you saved the model locally; the file has a .gguf extension, and you can get these models from different sources. A reliable one is Hugging Face. After you've set that path, the next thing to do is create a new instance, new LlamaModel, and pass in that model path. In the video you'll see that I also set gpuLayers, because I've enabled CUDA support, but it runs on just the CPU as well; I'll talk about CUDA support a little later.
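Here's a minimal sketch of that setup, assuming the node-llama-cpp v2 API from the getting started guide; the model file name and folder layout below are placeholders, not the exact ones from my project:

```js
// Install first: npm install --save node-llama-cpp
import path from "path";
import {fileURLToPath} from "url";
import {LlamaModel} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// The file name here is a placeholder; point modelPath at whichever
// GGUF model you downloaded (Hugging Face is a reliable source).
const model = new LlamaModel({
    modelPath: path.join(__dirname, "models", "llama-3-8b-instruct.Q4_K_M.gguf"),
    gpuLayers: 64 // optional; only takes effect once CUDA support is enabled
});
```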
Once you've created the model, the next thing is to create the context, and this is pretty straightforward: you create a LlamaContext and pass in the model. Then you create a session; you want a LlamaChatSession if you're building some sort of chat experience, and its parameter is just the context you created before. And that's done; if you just want to use it in your own project for a simple chat bot, this is all you need.

Now, for me, I have a getResponse function that I created, and this is the streaming version. There are two versions, non-streaming and streaming. The non-streaming version waits for the whole response to be completed before sending it back to you. The streaming version sends everything in chunks, which gives you the feeling of it running quicker, because you start getting the response well before it reaches the end. Imagine a 200-word response: if you wait for the whole thing it's going to take a while, but if it arrives in chunks, say every 10 words or so, the response feels faster and you can start reading or hearing it sooner. That's all this function does for me.

Regarding session.prompt, which is essentially the function that gives you the answer, you can pass additional parameters to it such as temperature, maxTokens, and topP; there are some other parameters available if you look at the documentation. Temperature controls how creative you want the model to be when it answers: at zero it's not creative at all and will give you pretty much the same response every single time for the same exact question, while raising the temperature to 0.5, 0.6, or 0.8 gives it a bit more variety in how it answers. maxTokens is the maximum number of tokens you want in the reply; I think for Llama 3 a token works out to roughly one word, so 256 tokens would be something like 256 words for the question you asked, which is a lot already. Anyway, you can take a look at all of these optional parameters and customize it the way you want. The rest of the code here is really just me parsing the response into the style I like; you don't have to do any of that. You'll see that this function calls this.context.decode(chunk); that's all you need, and it converts the tokens back into a string.
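As a rough sketch of the context, session, and streaming prompt (still assuming the node-llama-cpp v2 API; the exact getResponse signature and the onChunk callback are illustrative, not my actual code):

```js
import {LlamaContext, LlamaChatSession} from "node-llama-cpp";

// Continues from the previous snippet, where `model` was created.
const context = new LlamaContext({model});
const session = new LlamaChatSession({context});

// Streaming version: onToken fires as the model generates tokens, so you can
// start reading (or hearing) the answer before the full response is finished.
async function getResponse(question, onChunk) {
    return session.prompt(question, {
        temperature: 0.6, // 0 = same answer every time; higher = more variety
        maxTokens: 256,   // cap on the reply length, counted in tokens
        topP: 0.9,
        onToken(tokens) {
            onChunk(context.decode(tokens)); // decode token ids back into text
        }
    });
}
```

Calling getResponse("Who played Batman?", (chunk) => process.stdout.write(chunk)) would print the answer as it streams in, which is exactly the effect described above.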
Going back to the GPU layers: if you want CUDA support, and I highly recommend it, the main benefit I've seen is obviously speed. When I was running the 8 billion parameter model on just the CPU, the very first time it loads the model it takes a little bit of time; when I clocked it, it was between 4 and 6 minutes for the very first response to come out. After that, since everything is already cached in memory, asking it a question is pretty quick.

There were two problems getting CUDA working. First, although I had installed Visual Studio before, I didn't realize I didn't have the desktop development with C++ workload, and it wasn't until I saw that as an error that I installed it. Then there was another problem where the build couldn't find some files. I'll put the solution in the description too; it's very simple: you just copy some files over to another location (copying, not moving; I didn't try moving them). After that I just had to rerun the command, and I was able to enable CUDA support (there's a rough sketch of that rebuild command after the demo transcript below). From then on, running it is just way better.

And that's about it; that's all I did to integrate node-llama-cpp into my AI Avatar project. At the end of this video I'm going to show you some samples of how it performs, and then I'll probably make another video comparing running it with Llama 3 against the GPT API that I used before. Until then, thanks for watching. Peace.

Demo:

"Hi, how can I help you?"
"Who are the actors who played Batman?"
"That's an easy one. The actors who played Batman are Michael Keaton (1989 to 1992), Val Kilmer (1995), George Clooney (1997), Christian Bale (2005 to 2012), Ben Affleck (2016 to 2017), and Robert Pattinson (2022 to present). I'm only considering the live-action movies and TV shows, not animated or voiceover roles."
"Who's the most famous?"
"According to various sources, including box office results and popularity polls, Christian Bale is often considered the most famous Batman actor. He played the role in Christopher Nolan's The Dark Knight trilogy (2005 to 2012), which received widespread critical acclaim and commercial success. Based on general recognition and cultural impact, Christian Bale is often regarded as one of the most well-known Batman actors."
"Who's the least favorite?"
"This one's a bit more subjective. While opinions about the actors' performances can vary, Val Kilmer (1995) is often considered the least favorite Batman actor among fans and critics. His portrayal in Batman Forever received mixed reviews, with some finding his acting style not fitting well with the character."
"Thank you."
"You're welcome! It was my pleasure to help you with your Batman-related questions. If you have any more, feel free to ask."
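For reference, here is the CUDA rebuild step mentioned above, as a sketch based on the node-llama-cpp v2 documentation; treat the exact flag as an assumption and confirm it against the project's CUDA guide:

```
npx --no node-llama-cpp download --cuda
```

This fetches and rebuilds llama.cpp with CUDA enabled; once it succeeds, the gpuLayers option from the first snippet offloads that many layers to the GPU.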
Info
Channel: Rob Enriquez
Views: 469
Keywords: Llama 3, AI, open ai, chat gpt, microsoft, azure, neural, avatar, talking avatar, artificial intelligence, bing, bing chat, gpt 4, Elon Musk, LLM, large language model, local
Id: 1R3K5Q8SwLY
Length: 13min 46sec (826 seconds)
Published: Sun Apr 28 2024