Run MemGPT with Local Open Source LLMs

Video Statistics and Information

Captions
MemGPT just dropped a huge update: it now supports local LLMs, and this video will show you how to set it up correctly, so let's get started. MemGPT normally uses GPT-4 or GPT-3.5 through the OpenAI API, so in order to replace OpenAI's LLMs with a local LLM, you need to serve that model through an API of its own. In this video we're going to do that in three steps. In the first step, we'll use the oobabooga text-generation-webui to load and run a local LLM. In the second step, we'll use the API server built into text-generation-webui to host that LLM. And in the last step, we'll connect that API back to MemGPT. Since my last video on MemGPT there have been quite a few updates to the project, so I'm going to walk you through the whole process step by step.

First we install text-generation-webui. I'll create a new virtual environment for it with conda create -n text python=3.11 ("text" is the environment name) and accept the prompts. Next we activate the environment; you can tell it worked because "text" now shows up in the shell prompt. Then we go to the text-generation-webui repo, copy its clone link, and run git clone in the terminal with that link. If you want the clone in a specific folder, add a folder name at the end of the command; I'm calling mine textgen. We change into that folder with cd (change directory), and typing ls lists all the files in the repo.

Next we install the required packages. This part is tricky and you need to be careful: depending on your system, you have to pick the matching requirements file. I have an M2 Mac on Apple silicon, so I install my dependencies with pip install -r requirements_apple_silicon.txt. If you're on an NVIDIA GPU, use the plain requirements.txt; on an AMD machine, use the AMD requirements file; and there's also a CPU-only requirements file for machines without a GPU.
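For convenience, here are the setup commands from this step collected in one place. This is a sketch: the repo URL is the real oobabooga project, but the AMD and CPU-only file names are my best guesses for the version current at recording, so check the repo's README if they have changed.

# create and activate an isolated environment (the video names it "text")
conda create -n text python=3.11
conda activate text

# clone text-generation-webui into a folder named "textgen"
git clone https://github.com/oobabooga/text-generation-webui.git textgen
cd textgen

# install the requirements file that matches your hardware (pick one):
pip install -r requirements_apple_silicon.txt   # Apple silicon (M1/M2)
# pip install -r requirements.txt               # NVIDIA GPU
# pip install -r requirements_amd.txt           # AMD GPU (file name assumed)
# pip install -r requirements_cpu_only.txt      # CPU only (file name assumed)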
Once the installation is complete, we start text-generation-webui with python server.py. You'll see that the server is running on localhost on port 7860, so copy that address into a browser and it opens the web interface. Go to the Model tab to download a model. One thing to be careful about: you want a model that can handle function calling, because MemGPT depends on it. The example in the MemGPT docs uses Airoboros L2 70B, the model family their output parser is written for, so we'll use the same model. Go to Hugging Face and look for TheBloke, since his account hosts quantized versions of most of these models. I'm going to use the 70B model; you can try this with a 7B model instead, but I want to see how good it is at 70B. Copy the model ID from Hugging Face and paste it into the download field in the web UI. If the repo has multiple branches and you want a specific one, type a colon after the model ID followed by the branch name; I'm using main. To download a specific quantization, go to Files and versions and find the quantization level you want. I want the 4-bit quantized model, so I select it, copy the name, paste it into the text-generation-webui download field, and hit Download. It's a very big file, around 40 GB, so the download will take a while for me; a smaller model will be much quicker.

While that downloads, let's set up the virtual environment for MemGPT. I covered the MemGPT installation in a previous video (link in the description). First we clone the repo: open another terminal and run git clone with the repo link; I'm naming the folder memgpt. Then change the working directory into that new folder, and ls confirms everything is there. Create a new virtual environment with conda create -n memgpt python=3.10. In my case that environment already exists on my machine, but I'm going to recreate it from scratch. Just as with text-generation-webui, activate the environment, then install the dependencies with pip install -r requirements.txt.

To run MemGPT we call the main.py file with Python. When I run it, it finds my previous configuration files, so I tell it not to use them. By default MemGPT wants either GPT-4 or GPT-3.5; you can switch between them with the up and down arrows, so let's say GPT-4. I haven't set my OpenAI API key, so this will throw an error shortly, but let's continue for now.
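Here is the MemGPT-side setup from this step as one sketch, assuming the usual MemGPT repository URL; the model ID in the last comment is an example of the kind of Hugging Face ID you would paste into the web UI's download field, not a verified file name.

# in a separate terminal: clone and set up MemGPT
git clone https://github.com/cpacker/MemGPT.git memgpt
cd memgpt
conda create -n memgpt python=3.10
conda activate memgpt
pip install -r requirements.txt

# first run; without an OpenAI key this errors out, which is expected here
python main.py

# example model ID for the web UI's download field (assumed, check TheBloke's page):
#   TheBloke/Airoboros-L2-70B-2.1-GGUF  (pick a 4-bit file under Files and versions)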
There are a number of different personas for the model itself; we'll go with the first one, which is Sam. It also asks for a user persona, and there are two available: a basic one and a cs_phd one. If you really want to understand what these are, watch my previous video; for now let's select the basic one. Then it asks whether we'd like to preload anything into MemGPT's archival memory, and I'll say no. When we hit enter, it throws an error because we haven't set the OpenAI API key. But in this video, instead of using the OpenAI API, we're going to serve a local LLM through an API, so let me show you how to do that with the oobabooga text-generation-webui.

Next we load the local LLM we just downloaded into text-generation-webui, but this time we host it through the API. We do that with one long command. It looks more complicated than what we ran before, but don't worry, I'll walk you through it step by step. First, we start the text-generation-webui server with the API extension enabled and pin the blocking API port; this is the port our server will listen on, so all API calls need to target this specific port. Next, we tell it which model to load, which is the one I downloaded. This model supports the function-calling format MemGPT expects, and that matters because MemGPT uses function calling to perform its operations. We load the model with llama.cpp. Since I'm on an M2, I set the number of GPU layers to 1; if you're on an NVIDIA GPU, work out how many layers you want to offload and set that number here. For reference, a 70-billion-parameter model has 83 layers in total, so choose based on your GPU and hardware. The next item is the context length, which I set to 4096. The remaining parameters are also hardware dependent: the number of CPU threads to use and the batch size. I found this command on the MemGPT GitHub repo, and it has been very useful, because the project's written instructions are not very clear. All we need to do is run it; the full command is recapped below.

As you can see, the model is loaded. One thing to keep in mind if you're running on a GPU: make sure BLAS is set to 1 in the llama.cpp startup output, which confirms the GPU on your machine is actually being used. From now on we'll make calls to localhost on the port we picked. Also note that the graphical user interface has its own URL; the API itself is listening on port 5050 in our case. You can set this port to anything you like; I chose 5050 because, by default, I believe text-generation-webui uses port 5000 for the API.
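The command in question looks roughly like the sketch below. The flags reflect text-generation-webui options as of late 2023, but the model file name is an example stand-in for whichever quantized file you downloaded, and the exact loader string can vary between webui versions.

# serve the downloaded model through the blocking API on port 5050
# --n-gpu-layers: 1 on Apple silicon; on NVIDIA, offload as many of the
#   70B model's 83 layers as your VRAM allows
# --threads and --n_batch: tune to your CPU and hardware
python server.py --api --api-blocking-port 5050 \
  --model airoboros-l2-70b-2.1.Q4_K_M.gguf \
  --loader llama.cpp \
  --n-gpu-layers 1 --n_ctx 4096 \
  --threads 8 --n_batch 512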
Our API server is now up and running and listening for incoming traffic. Next we need to do a couple of things on the MemGPT side: set two environment variables (the exact commands are in the video description, and there's a recap at the end of this section). The first is the backend type, which in this case we set to webui. The second is the OpenAI API base; setting it effectively overrides the OpenAI API key (you can also unset the key if you want, though notice we never set ours in the first place). Its value is simply the IP address and port number where our API server is running, so we point it there.

Now, to run MemGPT with local LLM support, we use one more command. The first part is what we've seen before, python main.py. Then comes the model flag, where I pass a model name from the Airoboros family. This name is only used to select the output parser, so even if you're running something like a 7-billion-parameter model, you can still use this parser; if you don't specify one, MemGPT falls back to the default parser. They also recommend the no-verify flag, which tells MemGPT that it's working with local LLMs. In my experiments it still works without the flag, but let's include it and hit enter. You'll see a warning that you're running MemGPT with a model that isn't officially supported yet; simply ignore it and hit enter again.

If you look back at the API server we left listening, you'll see it has started receiving data from MemGPT. In my case, with a 70-billion-parameter model, it takes a while to generate responses; a smaller model will respond much faster, though it may run into problems, as I've seen smaller models sometimes fail to generate correct responses.

So we get the opening message: "Hello, my name is Sam. How can I assist you today?" I'll reply, "Hi Sam, do you know my name?" It should be able to retrieve my name from the user persona it was given, and indeed it answers that my name is Chad, which is the default user persona MemGPT ships with. Let's test the model a little more to see if it actually works. Here's the response it came up with: "I'm Sam, a digital companion. I don't identify as male or female, but my voice is soft and soothing. I'm curious, empathetic, and extraordinarily perceptive." So it thinks it has a voice, but this really does work with a local LLM. I'm using the much bigger 70B model because it gives far more coherent responses than smaller models.

Anyway, that's how you run MemGPT with a local LLM on your own machine, without needing OpenAI's models. I hope you found this video useful. If you run into any issues, put them in the comments below and I'll do my best to respond. Let me know what topics you'd like me to cover in the next video. Thanks for watching, and as always, see you in the next one.
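As promised, here is the final wiring step recapped as one sketch. The variable names and flags follow MemGPT's local-LLM instructions from around the time of this video; treat them as assumptions and check the current MemGPT docs, since the project has been updating quickly.

# point MemGPT at the local API server instead of OpenAI
export BACKEND_TYPE=webui
export OPENAI_API_BASE=http://127.0.0.1:5050

# run MemGPT: --model selects the output parser (Airoboros family here),
# --no_verify is the recommended flag for local LLMs
python main.py --model airoboros-l2-70b-2.1 --no_verify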
Info
Channel: Prompt Engineering
Views: 14,260
Keywords: prompt engineering, Prompt Engineer, GPT-4, memgpt, Memory GPT, memgpt paper, memgpt repo, ai memory, adding memory, memgpt with local llms, memgpt with open source models
Id: KxBWU96zfBY
Length: 15min 40sec (940 seconds)
Published: Fri Oct 27 2023