LocalGPT: OFFLINE CHAT FOR YOUR FILES [Installation & Code Walkthrough]

Video Statistics and Information

Captions
This video is going to be a little different: we're looking at one of my own projects, which I'm calling LocalGPT. This project lets you chat with your own documents on your own device using open-source GPT models. Once you download the required embeddings and the LLM, no data leaves your computer and everything is 100% private.

This project is inspired by another amazing project called privateGPT, which implemented an information-retrieval system using local embeddings. There were two things I wanted to improve on in the original privateGPT implementation: it used LlamaCpp embeddings and, for the LLM, the GPT4All-J model. These are great choices, but both of them run on the CPU, which makes it extremely slow. In LocalGPT we replace both of those components. First, we use Instructor embeddings, which are state-of-the-art embeddings at the moment and run on the GPU. Then, for the LLM, we use Vicuna-7B, one of the best open-source LLMs available. The great thing about both is that they run on the GPU, so everything is much faster, and you can easily swap out the LLM as well as the embedding model for anything you want.

For the rest of the video I'm going to walk you through the installation process, and then we'll do a detailed walkthrough of the code to see how it's set up and how you can make changes if you want. Please keep in mind that this project is under active development, so things will change; however, the basic structure of the project is going to remain consistent, so you will be able to run it using the instructions in this video.

Before we walk through the code, let me show you the architecture of how the project is set up. The main question is: why would you want a local or private GPT? Apart from the privacy concerns, one great thing about these types of tools is that they let you use your own documents as a source of information for LLMs. You are essentially augmenting the knowledge base of the LLM, because it doesn't have access to your data. Here's the basic architecture I'm using; you might have seen this or a similar diagram before, so I'll walk through it quickly. There are two components. The first component is implemented in the Python file ingest.py: it ingests, or feeds, information from your local files into the knowledge base of LocalGPT. The second component is implemented in run_localGPT.py, and it lets the user interact with the information, or knowledge base, created in the first step. We're going to look at both of these files in more detail, but first let me show you how to set this up on your local machine.

Go to the Code button and click the copy button; this copies the GitHub repo location. Next, open an instance of Visual Studio Code. I have already created a local virtual environment that I'm calling localGPT, but if you want to create one you can do it by typing `conda create -n` followed by the name of the environment (in my case, localGPT). For this to work you will need to have conda installed on your local machine. In my case it's asking whether I want to remove the existing conda environment, since one with that name already exists, so I'm going to say no.
Now, to activate it, I just type `conda activate` followed by the name of my virtual environment. Next, we want to clone the repo with `git clone` (you will need to have git installed on your local machine for this). With that done, we have everything locally.

The first thing you want to do is install all the requirements. To install all the required packages we use pip: `pip install -r requirements.txt`. Once you run this command it will start downloading and installing all the required packages.

When you clone the repo, you will have a folder called source_documents, and you need to put all your PDF, text, or CSV files in this folder. Currently there is an example document, constitution.pdf, which is the Constitution of the USA, but you will need to replace it with your own files for this to work on your own data.

First, let's look at the ingestion part. If you are familiar with these concepts you can skip ahead; if you want a more detailed description, check out my earlier video. Currently LocalGPT supports PDF, text, and CSV files, and this list is going to expand with time. The basic idea is that we want to retrieve information from our own local files. For that, we split them into smaller chunks, because we cannot feed, say, a 100-page document into an LLM, and then we compute embeddings on top of them. An embedding basically converts a document into a vector of numbers instead of characters: you represent a specific chunk or document by a vector of numbers. Using these embeddings along with the original documents, we create a semantic index, which is essentially a database of the information you provided in your text or PDF files.

In terms of the code, we call the main function. It first prints the path of the source directory, saying that it's loading documents from there, then it calls the function load_documents. This function receives the path to the source directory as input, goes through the list of all the documents (you can have multiple different document types), and uses the function load_single_document to load each one. Currently, as I said, we only support three file types. For each file, load_single_document checks whether it's a text, PDF, or CSV file, and depending on the type it uses a different data loader: TextLoader for text files, PDFMinerLoader for PDFs, and CSVLoader for CSVs. So essentially you get a list of loaded documents.

Next, we need to split them into smaller chunks so that our LLM can process them. In this specific case I'm using a chunk size of 1,000 characters with an overlap of 200; I go over these choices in a lot more detail in my other videos, so check them out. We take the documents and pass them to the text splitter, and each document is divided into multiple chunks.
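Here is a minimal sketch of that loading-and-splitting step, assuming the LangChain loaders and splitter named above. The folder name, helper names, and exact signatures are illustrative and may differ slightly from the repo's actual code.

```python
import os

from langchain.document_loaders import TextLoader, PDFMinerLoader, CSVLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

SOURCE_DIRECTORY = "source_documents"  # illustrative; use the repo's actual folder name


def load_single_document(file_path):
    # Pick a loader based on the file extension, as described above.
    if file_path.endswith(".txt"):
        loader = TextLoader(file_path, encoding="utf8")
    elif file_path.endswith(".pdf"):
        loader = PDFMinerLoader(file_path)
    elif file_path.endswith(".csv"):
        loader = CSVLoader(file_path)
    else:
        raise ValueError(f"Unsupported file type: {file_path}")
    return loader.load()  # a list of Document objects


def load_documents(source_dir):
    # Load every supported file in the source directory.
    docs = []
    for name in os.listdir(source_dir):
        if name.endswith((".txt", ".pdf", ".csv")):
            docs.extend(load_single_document(os.path.join(source_dir, name)))
    return docs


documents = load_documents(SOURCE_DIRECTORY)
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = splitter.split_documents(documents)
```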
The next part is computing embeddings for each of the chunks. In this case we're using the Instructor embeddings. You can replace them with any type of embeddings you want, but I have found that the instructor-xl embeddings work well for most applications. Keep in mind that this is not the most efficient option, and you can tweak the Instructor embeddings even further for your application: think of them as a neural network that you can fine-tune.

Once the embeddings are computed, we need to create the knowledge base. For that we are using Chroma DB. It takes the chunked text documents and the corresponding embeddings, puts them in a vector store, and persists it, which means it is stored on your local drive so you can use it later. That's the first part: creating a vector store, or database, based on your own documents. When you run this code, you will see that it creates an index folder containing all the files related to your vector store.
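Continuing the sketch from above, the embedding and persistence step might look roughly like this, assuming the HuggingFaceInstructEmbeddings and Chroma wrappers from LangChain; the persist directory, device setting, and exact Instructor model ID are assumptions on my part.

```python
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma

PERSIST_DIRECTORY = "db"  # illustrative; the project persists the index to its own folder

# Compute Instructor embeddings on the GPU (switch to "cpu" if you have no CUDA device).
embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-xl",
    model_kwargs={"device": "cuda"},
)

# `texts` is the list of chunked documents produced by the splitter in the previous sketch.
# Build the Chroma vector store from the chunks and their embeddings, and write it to disk.
db = Chroma.from_documents(texts, embeddings, persist_directory=PERSIST_DIRECTORY)
db.persist()
```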
To run the whole ingestion step, you just type `python ingest.py`. It walks through this step-by-step process: it reads your documents, converts them into chunks, computes embeddings, and stores them in your local vector store. We'll look at an example later on, but these are the kinds of messages you would expect. And as you probably noticed, you don't need to set up anything at all: there is no .env file to configure and no paths to set, everything is taken care of automatically.

Now let's look at the second component. Say your knowledge base is ready and a user wants to interact with your data. The user asks a question in natural language; you compute embeddings for that question using the same embedding model you used for your documents, and run a semantic search on your knowledge base. That returns the most relevant chunks in your vector store, and you use the original documents corresponding to those chunks as context for your large language model, along with the question, and you get an answer.

Let's see how this looks in the code, in the run_localGPT.py file. Again, we start at the main function. The first step is to load the embedding model, because we need to compute embeddings for the user's question, so we load exactly the same model we used for computing embeddings on our documents. Next, we need the knowledge base we're going to be asking questions of, so we load our vector store using Chroma DB; this is the one we created in the ingestion phase and stored as the index. I'm not using this callback, so I'm going to comment it out. Then we need to load the LLM.

Let's see what's going on in the load_model function. We're loading a model, in this case Vicuna-7B, from Hugging Face. I'm trying my best to keep this modular, so if you want to try another model you can just replace this line with the model ID of that specific model. Let me show you an example: look at the user Tom Jobbins, aka TheBloke, who is a legend when it comes to LLMs on Hugging Face, and pick anything that supports the Hugging Face format, for example this model. You simply copy the model name, come here, and replace it. Currently this will only support LLaMA-based models, because we are using the LLaMA tokenizer, but you can easily adapt it to other types of models.

Next, we load the model using LlamaForCausalLM and create our pipeline. Since it's a decoder-only model, we use the text-generation pipeline and pass in our model, tokenizer, and the maximum length. Depending on the model, you can change this; in this case, I think Vicuna-7B only supports 2,048 tokens. I set the temperature to zero because I don't want creative answers; I want the model to extract information from my documents. There are a couple of other parameters you can play around with. We create the pipeline, pass it to the HuggingFacePipeline wrapper, and get a local LLM.

When we call this function we get our LLM, and then we use LangChain for information retrieval: we pass in the LLM, use the "stuff" chain type, pass in the retriever we created from our vector store, and ask it to return the source documents, so we can see where the information is coming from. So we're basically walking through those steps to get an answer.

Then the user asks a question. There is a while loop: if the user types "exit" it breaks the loop and you get out of the program; otherwise the loop keeps running. Let me quickly walk you through it; this part is purely based on the privateGPT code. First, the user enters a query. We run that query through our question-answering retrieval chain, which gives us the answer the LLM found as well as the source documents. Then we simply print the question and the answer, and after that we go through the source documents.
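Putting that retrieval side together, a minimal sketch might look like the following. It assumes the LangChain and Transformers APIs named above (HuggingFaceInstructEmbeddings, Chroma, LlamaForCausalLM, RetrievalQA); the model ID and persist directory are illustrative assumptions rather than the project's exact values.

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer, pipeline

from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.llms import HuggingFacePipeline
from langchain.vectorstores import Chroma

PERSIST_DIRECTORY = "db"                 # illustrative; must match the ingestion step
MODEL_ID = "TheBloke/vicuna-7B-1.1-HF"   # illustrative LLaMA-based model ID

# Load the same embedding model used at ingestion time and reopen the persisted index.
embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-xl", model_kwargs={"device": "cuda"}
)
db = Chroma(persist_directory=PERSIST_DIRECTORY, embedding_function=embeddings)
retriever = db.as_retriever()

# Load the LLaMA-based model and wrap it in a text-generation pipeline.
tokenizer = LlamaTokenizer.from_pretrained(MODEL_ID)
model = LlamaForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)
pipe = pipeline(
    "text-generation", model=model, tokenizer=tokenizer,
    max_length=2048, temperature=0,
)
llm = HuggingFacePipeline(pipeline=pipe)

# Stuff the retrieved chunks into the prompt and return the sources alongside the answer.
qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True
)

while True:
    query = input("\nEnter a query: ")
    if query.strip().lower() == "exit":
        break
    result = qa(query)
    print(result["result"])
    for doc in result["source_documents"]:
        print(doc.metadata["source"])
```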
That was a lot of code, so let's watch it in action. To run this, you type `python run_localGPT.py`, and it gives you the user interface; I'm going to switch to my other machine and run this part. Once you run run_localGPT.py, you're presented with a prompt where you can type your question. Since we're working with the Constitution of the United States, let's ask a couple of simple questions. I'm going to ask: "How many states are in the USA?" Here's the response from the model: there are 50 states in the USA. It also gives you the source documents, so you can go through the list of documents it used; for example, it's pulling information from these different articles within the Constitution.

As I said, the project is under active development, so I'm going to be adding a lot more features: there will be support for more models, and I plan to add a graphical user interface where you can simply drag in your documents and chat with them. A lot of cool updates are coming in the future. If you would like to contribute to the project, create a PR and I will integrate your contributions. If you would like to support my work, there are quite a few ways; check out the description of the video. If you have any questions or comments, put them in the comment section below. If you like the video, consider liking it and subscribing to the channel if you haven't done so. Thanks for watching, and see you in the next one.
Info
Channel: Prompt Engineering
Views: 59,586
Keywords: Prompt Engineering, Prompt engineer, privategpt, how to install privategpt, privategpt setup, chatgpt offline, gpt4all free chatgpt, local ai chatbot, run ai chatbot locally, run chatgpt locally, local chatgpt install, train chatgpt on your own data, train chatgpt on pdf, custom knowledge chatbot, how to install gpt4all, langchain tutorial pdf, langchain tutorial, chatgpt local install, Vicuna ai, vicuna ai 13B, LocalGPT, localGPT cahtbot, vicuna 13b installation
Id: MlyoObdIHyo
Length: 17min 11sec (1031 seconds)
Published: Mon May 29 2023