AutoGen with Local LLMs | Get Rid of OpenAI API Keys

Captions
Hi guys, this is your host, Prompt Engineer. In my earlier videos we have seen how to run the AutoGen framework, and it was very interesting because AutoGen has so many use cases and the applications are huge: coding examples, GPT-4 with human users, chess playing, solving tasks in groups, visualization by group chat, and tons of other examples. But one thing is certain: the cost associated with running this framework is high. I have pointed out the cost in many of my videos, and my viewers have also pointed out that unless the cost factor is dealt with, this won't help in the long run. We have found a solution to that, and its name is LM Studio. In this video I am going to use LM Studio to run open-source LLMs locally instead of the OpenAI GPT-4 API, so instead of spending money on GPT-4, you can now run open-source large language models locally.

We will see this in just a second, but if we head over to Hugging Face and its Models page, we can see there are so many models, with new ones coming out every day, and you can run all of these open-source models. Some are base (pre-trained) models and some are fine-tuned; for example, Zephyr is a model fine-tuned on top of Mistral. What I'm saying is that there are different models built for different purposes, and you can test them all using LM Studio. In this video specifically, I am going to show you how to run AutoGen with LM Studio so you can use any open-source large language model. Let's get started.

First we download LM Studio for Mac; mine is an M2, and I have already downloaded it because I have been testing it out. Just click here and the download starts; let me stop it since I already have it. After downloading, install it, and once installed you will see this interface: this is LM Studio. If we go to Home, we can search for any model; for example, search "mistral" and press Enter, and you will see the most famous one, TheBloke's Mistral 7B Instruct. On the right you can download different quantizations; I have already downloaded the Q5 one, as it shows "Downloaded" here, and it is a 5 GB model.

Let me clear this result and download another model, but first let me check Hugging Face for a good model to show you. Maybe Zephyr; I think Zephyr should be good. Back in LM Studio, type "zephyr" and press Enter, and under TheBloke's Zephyr you can see different quantizations. Let me download the Q5: press Download, and the download starts; you can click here to see its status. Let's let it download, but as you can see I have already downloaded two models: the first is Mistral 7B Instruct and the other is Samantha Mistral. Let me minimize this and go to the AI Chat tab. Here you can load up a model; let me eject the current model first and start a new chat.
In the new chat we select a model to load; you can click on any of them, so let's go with Samantha Mistral. As you can see, this 5 GB model loads pretty fast on my Mac. Let's wait for it, and then we will see how to integrate it with AutoGen; but first let's see, at the chat level, how we can talk to different models in LM Studio. It's loaded, so let's say "Hi, how are you." It answers "I'm good, thank you for asking," and as you can see this is very fast. "Please tell me a long joke." Look at the generation speed, it's pretty good: "Once upon a time there was a tiny village near a deep, dark forest. The villagers were afraid of the forest because they believed it was haunted by monsters and ghosts. Despite this belief, one day a little boy named Jack decided to explore the forest. He ventured deep into the woods, encountering all sorts of creatures along the way: a talking fox, a wise old turtle, and even a magical..." I think you're not interested in my story, but you can see the generation speed is pretty fast, so I'm going to stop it. Now I'm going to say "I am sad," because this is Samantha, a very friendly model that has a little bit of feeling inside it and can reply like a human: "I understand it can be difficult at times, but remember, it's okay to feel this way."

Okay, enough of this; let me stop the generation and head over to Visual Studio Code. This is the code we have seen before for using AutoGen. Let me pull up a new terminal. First and foremost, you need to run pip install pyautogen (mine is already installed), and then pip install xgboost, which is also installed. After installing these two, you can run this code and get some output.

Let me walk through the code. From autogen we import AssistantAgent and UserProxyAgent; these are the two classes we need. This is the config list where we declare the model as gpt-4 along with the API key, and as you can see I have written a second config list, because this is the magic: this is the new config list we are going to use, which means we are not going to use GPT-4 here, so I'm going to comment the old one out. We will see how to hook it up to LM Studio in a moment, but first let's finish the code. llm_config wraps the config list, and we pass it in. We create an instance of AssistantAgent with the name "assistant" and this llm_config. Next we create an instance of UserProxyAgent named "user_proxy" with human input mode set to NEVER, so we are never going to interfere; we let the code handle everything. You can increase max_consecutive_auto_reply; I used to fear my API cost rising, but now I can put 100 here with no issue, since I'm not afraid of any loop anymore. Finally, the user proxy initiates a chat with the assistant and the message "write me a Python function to print the numbers 1 through 30."

That is the code we can now use, but let's look at how we did this: instead of using OpenAI ChatGPT or GPT-4, we are now using localhost. To make that work, go back to LM Studio; we have already seen the Chat tab, and now we need to go to the Local Server tab.
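For reference, here is a minimal sketch of the script walked through above. The config keys (api_type, api_base) follow the pyautogen 0.1-era format that was current when this video was made; the work_dir name is an illustrative assumption, not something specified in the video.

```python
# pip install pyautogen xgboost
from autogen import AssistantAgent, UserProxyAgent

# Old config list for the OpenAI API, commented out in the video:
# config_list = [{"model": "gpt-4", "api_key": "<your OpenAI key>"}]

# New config list pointing at the LM Studio local server instead.
# LM Studio serves whichever model is currently loaded, and the
# api_key only needs to be a non-empty placeholder.
config_list = [
    {
        "api_type": "open_ai",
        "api_base": "http://localhost:1234/v1",
        "api_key": "NULL",
    }
]

llm_config = {"config_list": config_list}

assistant = AssistantAgent(name="assistant", llm_config=llm_config)

user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",        # never stop to ask the human for input
    max_consecutive_auto_reply=100,  # no API bill, so loops are cheap
    code_execution_config={"work_dir": "coding"},  # illustrative work dir
)

user_proxy.initiate_chat(
    assistant,
    message="Write me a Python function to print the numbers 1 through 30.",
)
```

Running this file (python3 app.py) kicks off the agent loop; the remaining step is making sure the LM Studio server is actually listening on port 1234.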
In the Local Server tab we load up a model for the server; in this case, let's load Mistral. While it loads, we can see the earlier download still going on, a little slowly; maybe my internet is slow because I'm using multiple laptops here. The model loads and we can see the RAM usage of this 5 GB model. The server port is 1234 by default, but we need to keep the cross-origin option on so that, as the tooltip says, the server accepts requests from different origins; turn these two toggles on and start the server. Once the server is started and the model is loaded, we can access it. So when the config says the API type is open_ai and the API base is localhost:1234, you can see it right here: localhost, port 1234, version v1, with the /models, /chat/completions, and /completions endpoints; you get everything here. And the API key can be set to null.

Let's try it out: save the file and run python3 app.py. You can see the task "write me a Python function to print the numbers 1 through 30," and if you head over to LM Studio you should see output in the server logs. Let's wait for it; it is still running, and it has already started accumulating tokens, so you can watch the system work step by step. Okay, there's a timeout issue here, so let's cancel and try another prompt: this time the function should check whether a number is prime. Ctrl+S, run again, and this one was pretty fast. The assistant gave a prime-checking function to the user proxy, the user proxy executed the code, it checked whether 79 is prime, and it printed True; 79 is prime. Pretty fast. I have not tested this before, but let's see if it is able to generate a snake game; I suspect it won't manage, but let's run it. Okay, there is some issue, but I will make a detailed video on more such use cases. Still, this is how you get started: we have completely eliminated the need for GPT-4 here and used localhost instead.

In the chat you can now see the three models I have downloaded: Zephyr, Samantha Mistral, and Mistral 7B Instruct. You can go to the Home tab and download any models of your choice; right now it shows Mistral 7B, newer models like CodeLlama 7B Instruct, and OpenOrca. These have the highest downloads and are liked by users, which is why they show at the top. So this is LM Studio.

Now I would like to summarize everything we have done. We have been using the AutoGen framework, and to use it we need API keys, which carry a cost. So we thought of using open-source LLMs in place of GPT-4, and we found a piece of software, LM Studio, with which you can create an endpoint and access it with this code, eliminating the need for GPT-4 and using localhost instead. Of course, you need a good system to run this on; otherwise the results will be pretty slow.
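One practical note on the endpoint itself: if you want to verify that the LM Studio server is responding before (or while) debugging timeouts like the one above, a single direct request is enough. This is a sketch using the legacy openai<1.0 Python client, contemporary with this pyautogen release; the model string is a placeholder, since LM Studio ignores it and answers with whichever model is currently loaded.

```python
# Sanity-check the LM Studio server at http://localhost:1234/v1.
import openai

openai.api_base = "http://localhost:1234/v1"
openai.api_key = "NULL"  # LM Studio does not validate the key

response = openai.ChatCompletion.create(
    model="local-model",  # placeholder; LM Studio serves the loaded model
    messages=[{"role": "user", "content": "Check if 79 is a prime number."}],
)
print(response["choices"][0]["message"]["content"])
```

If this prints a sensible reply, the AutoGen script should work against the same endpoint.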
But again, if there is a big project you're working on that requires many trials, the associated cost is very high, and this will help you mitigate it. Let me know what projects you build, whether you were able to set this up on your system, and whether you were happy with the results you get; of course, it depends on the model being used. GPT-4 is a very good model, but small creators and testers like us don't need to incur so much cost right now. Having said that, this is your host, Prompt Engineer, signing off. I will be back with more videos on AutoGen and LM Studio, so stay subscribed to my channel. If you want to learn how to use RunPod for running open-source LLMs or other models, you can check out this video. In the next video I will try to use AutoGen with open-source models on RunPod, and that is going to be interesting. Stay subscribed and have a nice day. This is your host, Prompt Engineer. Bye-bye.
Info
Channel: Prompt Engineer
Views: 18,406
Keywords: openai, autogpt, autogen, ai, chatgpt, gpt-3.5, gpt-4, tutorial, handson
Id: xa5irTrK5n4
Length: 15min 33sec (933 seconds)
Published: Thu Oct 19 2023