Mistral-7B with LocalGPT: Chat with YOUR Documents

Video Statistics and Information

Captions
Can you use Mistral 7B with LocalGPT? Let's find out. The quick answer is yes, and I'll show you how. If you're new here, LocalGPT is my own project that lets you chat with your documents securely, without compromising your privacy. The beautiful part is that everything runs locally, so you can be assured that no data ever leaves your computer. If you want to learn more about it, I have a series of videos which are linked in the GitHub repo, and I'll put a link to it in the description.

By default, LocalGPT uses the Llama 2 7B model in the GGUF format. If you have an NVIDIA GPU in your computer, I would highly recommend using a GPTQ-format file instead of GGUF. In this video, I'll show you how to replace this model with the new Mistral 7B model.

To download the model, we are going to use a quantized version, and for that we go to TheBloke's Hugging Face repository. If you scroll down, you will see that TheBloke has converted the Mistral-7B-Instruct model into the GPTQ format as well as the GGUF format. I'm running this on an M2 with Apple silicon, so I'm going to use the GGUF format, but if you have an NVIDIA GPU, I would highly recommend simply using the GPTQ format.

From the repository page, we need two things. The first is the repo ID, which is the model ID; we can select and copy it. Back in Visual Studio Code, I update the model ID in the constants.py file. Next, we need to select a specific version of the model itself. Here you will see a list of different quantizations of the model; depending on your hardware, select the quantization level that your system supports. In this case, I'm going to use the 8-bit quantized model, but you can select a 4-bit, 3-bit, or 2-bit one instead. We copy the file name, go back to Visual Studio Code, replace the model base name with it, and save the changes. (A consolidated sketch of these constants.py edits appears at the end of this transcript.) If you don't know what I'm doing right now, I would highly recommend watching the previous videos, which will give you a lot of context; I will put a link to a playlist of all the LocalGPT videos.

For testing purposes, we are going to use the Orca paper. The first step is to ingest this file: I have a SOURCE_DOCUMENTS folder, and I pasted the Orca paper into it. To ingest the file and create a vector database, we run the command python ingest.py. If you are running this on an M1 or M2, I would highly recommend passing mps as the device type (python ingest.py --device_type mps); if you're running this on a CPU, pass cpu as the device type; for an NVIDIA GPU, you don't have to do anything. The ingestion is complete, and we have a total of 193 chunks of text. If you are on MPS, simply ignore this warning.

Before running the model and asking questions about our Orca paper, let me highlight a couple of changes that I have made. To add support for the Mistral 7B model, I added a new command-line option called model type. The default value is llama, but it also supports mistral and non-llama. Basically, this defines the prompt template that the model is going to use. If you go to the prompt template utilities file, you will see that I added another option specifically for the Mistral 7B model.
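As a rough illustration of that change, here is a minimal sketch of what a Mistral-style template with a grounding system prompt can look like. This is not LocalGPT's verbatim code: the [INST] ... [/INST] wrapping follows the Mistral-7B-Instruct model card, while the function name and the system-prompt wording are illustrative assumptions.

```python
# Minimal sketch of a Mistral-style prompt template with a grounding
# system prompt. Illustrative only, not LocalGPT's verbatim code.

SYSTEM_PROMPT = (
    "Use the following pieces of context to answer the question. "
    "If you cannot answer from the context, just say that you don't know; "
    "do not try to make up an answer from your own knowledge."
)

def build_mistral_prompt(context: str, question: str) -> str:
    """Wrap the system prompt, retrieved context, and user question in
    Mistral's [INST] ... [/INST] instruction tokens."""
    return (
        f"<s>[INST] {SYSTEM_PROMPT}\n\n"
        f"Context: {context}\n"
        f"Question: {question} [/INST]"
    )
```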
As you can see, the Mistral template basically adds the instruction start and end tokens around the prompt. In the original Mistral prompt format there is no system prompt, but I added one, because it limits the model to answering questions related to the documents we have provided. In this specific application, we don't want the model to generate answers from its own knowledge, but rather to stick to the documents we are providing. Those are the couple of additional changes in LocalGPT related to the Mistral models.

Now let me show you how to run the model. To start a conversation with your documents using this new model, we run run_localGPT.py. Since I'm running this on MPS, I'll provide mps as the device type. By default the model type is llama, but since we want the Mistral prompt template, we simply pass mistral as the model type: python run_localGPT.py --device_type mps --model_type mistral. This is using LlamaCpp, and if we check the logs, we are indeed using the Mistral-7B-Instruct model.

Here's the question that I asked about the Orca paper: how does Orca compare to ChatGPT on the BBH benchmark? It actually gave me a pretty decent answer. In fact, it is generating the answer based on one specific chunk: if you look here, it says that Orca exceeds ChatGPT by 11% and 5.8% on Disambiguation QA and Snarks, respectively. So this is the chunk that was retrieved by the embedding model and then passed to Mistral 7B to generate the answer. It also added some extra information, which actually comes from the following paragraph; it seems the ingestion put these two passages together in a single chunk, and the model is using that information as well. The great thing is that the answers we are getting are accurate.

So this was a quick video on how to use Mistral 7B with LocalGPT. From my initial experiments, I'm really happy with the results. Keep in mind that the results you get depend heavily on the chunks your embedding model retrieves: the choice of embedding model, as well as how you decide to pre-process your documents, including how you split them into chunks, will play a crucial role in the performance of your retrieval-augmented generation system. I have a few videos on these topics if you want to learn more; links are in the description. Overall, it's a great model, and I would recommend everybody check it out. If you're using LocalGPT in your own projects and are looking for guidance, I do offer consulting services; check out the description of the video for more details. As always, thanks for watching, and see you in the next one.
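For reference, here is the consolidated sketch of the constants.py model swap promised earlier. The MODEL_ID and MODEL_BASENAME variable names match LocalGPT's constants.py, and the GGUF file name comes from TheBloke's repository listing; the commented GPTQ lines are an illustrative alternative for NVIDIA GPUs, so double-check the exact file names against the repo you pick.

```python
# Sketch of the constants.py edits for swapping in Mistral 7B.
# File names below come from TheBloke's Hugging Face repos; verify them
# against the repo's file list before use.

# GGUF variant (Apple silicon or CPU, served through llama.cpp):
MODEL_ID = "TheBloke/Mistral-7B-Instruct-v0.1-GGUF"
MODEL_BASENAME = "mistral-7b-instruct-v0.1.Q8_0.gguf"  # 8-bit; Q4/Q3/Q2 files also exist

# GPTQ variant (NVIDIA GPUs) would instead look something like:
# MODEL_ID = "TheBloke/Mistral-7B-Instruct-v0.1-GPTQ"
# MODEL_BASENAME = "model.safetensors"
```

After saving the file, run python run_localGPT.py as shown above; LocalGPT should download the new weights on the first run.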
Info
Channel: Prompt Engineering
Views: 39,053
Keywords: prompt engineering, Prompt Engineer, natural language processing, GPT-4, chatgpt for pdf files, ChatGPT for PDF, langchain openai, langchain in python, embeddings stable diffusion, Text Embeddings, langchain demo, long chain tutorial, langchain javascript, vectorstorage, chroma, train gpt on documents, train gpt on your data, train openai with own data, langchain tutorial, how to train gpt-3, embeddings, langchain ai, Mistral-7B, Mistral AI
Id: ASpageg8nPw
Length: 8min 14sec (494 seconds)
Published: Tue Oct 03 2023