Private GPT4All : Chat with PDF with Local & Free LLM using GPT4All, LangChain & HuggingFace

Video Statistics and Information

Captions
Hey everyone, my name is Venelin, and in this video I'm going to show you how you can chat with a PDF file using only a local model. If you are privacy-aware and don't want to send your data to ChatGPT's API or any other API, you can use GPT4All along with Hugging Face embeddings and LangChain to chat with your PDF files. Let's get started.

This is the official page of GPT4All, and it says it is a free-to-use, locally running, privacy-aware chatbot, with no GPU or internet required. What this means is that the model is downloaded to your hard drive and you run it on your own machine. They have a pretty nice UI application running inference on an M1 Mac, which is great, but in this video we're going to look at the performance of a specific model running on an Ubuntu machine with high RAM, and you'll be able to compare how much faster or slower that is than what they're presenting here. They have installers for every platform, and these are the capabilities: personal writing, consistent code, understanding documents, etc. The installation instructions are pretty much non-existent, so I'm going to show you how to set everything up. There are links to the whole ecosystem: you can even train your own models, there are the Python bindings that we're going to use, plus documentation and chat. They also have a model explorer listing a lot of models, and in this case we're going to use the first one, the one based on GPT-J and created by Nomic AI; these are the guys providing GPT4All with the latest and greatest dataset they have. I have a running Google Colab notebook with high RAM; you can change the runtime type from "Change runtime type" and then select High-RAM, since GPT4All uses a lot of memory and we're going to need it.
Next, I download a PDF file that we're going to look at in a bit, and then the checkpoint that we're going to use: GPT4All-J (so with the GPT-J backend), version 1.3-groovy. The .bin file takes up 3.5 gigabytes of storage. This is the structure of the directory: we have the model checkpoint and the PDF.

Before running the imports, here are the packages we're going to install: Poppler, just to render the pages of the PDF; then langchain, chromadb, pypdf, pygpt4all (the Python bindings for GPT4All), and pdf2image to show you the pages of the PDF as images. Note that there is a newer version of LangChain that uses the newer bindings for GPT4All (the library is now called just gpt4all), but in my experiments so far the newer bindings don't work well with the new version of LangChain: I was unable to run anything through the model, waited 20 minutes or more just to get an output, and didn't get any. I might change this in the text tutorial that will follow this video, but for now the newer LangChain versions don't work well with GPT4All.

Here are the imports. Then I take the PDF file and convert it into images; this creates two images, since the PDF has two pages. The first one is a table of dividends: this is a financial statement from Microsoft, which I downloaded from Microsoft's official webpage. You can see the declaration dates, and the dividend per share was 62 cents for 2022 and 56 cents for 2021. That's the first page.
The second page is a stock performance graph included in Microsoft's annual report. They're comparing the S&P 500 and the NASDAQ to their own stock performance: if you invested $100 in June 2017, you would have about $397 by 2022. We're going to ask some questions about all of this using GPT4All.

The first thing I do is load the PDF file and convert it into text. This uses the PyPDFLoader, and recall that we installed pypdf: internally, LangChain uses it for this loader to load the file and extract the text, which is then converted into LangChain documents. If we look at the first document, you can see that it has page content and metadata, and the metadata contains the source and the page index. Looking at the page content, we are indeed loading the first page, and it looks pretty much like the table we saw; it is well formed, so the loader is doing a good job there.

Then I split the text further into chunks in order to maximize the amount of data we can feed to GPT4All, since our model has a limit of only about 1,000 tokens. In this case I split into chunks of 1,024 characters. This takes both of our pages and converts them into three texts, and again each text is a document. You can see that the first page was divided in two: the totals row, if you recall, sits somewhere around here, so the rest of that page ended up in another chunk. That's what the chunking of the data is doing.
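The chunking step can be sketched in plain Python. This is an illustrative stand-in, not LangChain's actual splitter implementation; the chunk size and overlap values are just examples:

```python
def split_into_chunks(text, chunk_size=1024, overlap=64):
    """Split text into fixed-size character chunks with a small overlap,
    so content cut at a chunk boundary still appears whole in one chunk."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap
    return chunks

# A page longer than one chunk gets split in two, just like the
# dividends page above (the totals row spilled into a second chunk).
page = "x" * 1500
print(len(split_into_chunks(page)))  # → 2
```

This is why two PDF pages can become three documents: any page longer than the chunk size contributes more than one chunk.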
In order to search the text using our questions, we need embeddings, and for this purpose we're again going to use something completely free: Hugging Face embeddings. This uses the sentence-transformers library to embed our texts. You can specify another model name here, but this one should be a good start; I think it is actually the default one, and it is based on MiniLM, which I believe was released by Microsoft. Running this downloads the embeddings model: you can see it fetching the PyTorch model and the tokenizer along with it.

Next, I use Chroma to compute and store the embeddings of the texts. When I run this, it tells us that it's creating the db directory we specified and building the index, and if I call db.persist(), it stores the data on disk. That's not really important for this video, but you might want to persist the data.

The large language model itself is again going to be GPT4All, and this flavor is trained on top of GPT-J, so it is freely available for commercial use. This is the model path, which points to the file we downloaded, and running this loads the model. I don't want it to be verbose, just in case that speeds things up, but you'll see in a bit that the performance of this model is quite low, at least on a CPU, and at the moment it isn't easy to run it on a GPU. Now that we have the model loaded, we can create a RetrievalQA chain.
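The idea behind the vector store can be sketched without any libraries: embed every chunk, embed the question, and rank chunks by cosine similarity. The three-dimensional vectors below are made-up toy embeddings, not real sentence-transformers output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "embeddings" for our three chunks and a dividends question.
chunks = {
    "dividends table":   [0.9, 0.1, 0.0],
    "stock performance": [0.1, 0.9, 0.1],
    "totals row":        [0.7, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # embedding of "How much is the dividend per share?"

best = max(chunks, key=lambda name: cosine(chunks[name], query))
print(best)  # → dividends table
```

Chroma does the same thing at scale: it indexes the chunk embeddings so that the nearest chunks to a query embedding can be found quickly.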
I pass in the GPT4All model and the database as a retriever, and I ask that for each result we also get the source documents. Running this happens pretty fast. Then come the actual questions we're going to ask: "How much is the dividend per share during 2022? Extract it from the text." I would expect this to give us $0.62, or 62 cents. Let's run it.

GPT4All is finally done; as you can see, it took about five and a half minutes, and if you know any ways to speed this up, please share them in the comments below. From what I've gathered so far, the GPU interface doesn't work with the bindings themselves; this might change in the future, and it could be quite a speed-up. Let's see the response: we get the query, then the result, and then the source documents used for the answer. Printing the response, it says that the dividend per share during 2022 is 62 cents. Let's double-check: yes, dividend per share, 62 cents, so it got this one correctly.

Let's try the next one: "How much is the investment amount in Microsoft on 6/2022?" You can see that my runtime just crashed because it used all the available RAM, so I rerun everything and prompt the model again. This is the response I get after six minutes of waiting (so yes, this model is pretty slow): the total investment amount in Microsoft on 6/2022 is 1 million dollars. Wow, it got this totally wrong. I've tried a lot of different prompts but haven't gotten it to extract the answer, which is right there within that table. Maybe you can try a couple of different prompts, or try ChatGPT or another model, and tell me in the comments below whether it does better than GPT4All.
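The RetrievalQA flow can be sketched in plain Python: retrieve the most relevant chunks, stuff them into a prompt, and hand that prompt to the LLM. The word-overlap scoring here is a crude stand-in for the embedding search, and the prompt wording is made up:

```python
def retrieve(question, chunks, k=1):
    """Rank chunks by naive word overlap with the question -- a crude
    stand-in for the vector store's embedding similarity search."""
    q_words = set(question.lower().split())
    return sorted(chunks,
                  key=lambda c: -len(q_words & set(c.lower().split())))[:k]

def build_prompt(question, context_chunks):
    """Stuff the retrieved chunks into a single prompt for the LLM."""
    context = "\n".join(context_chunks)
    return (f"Use the context to answer.\n"
            f"Context:\n{context}\n"
            f"Question: {question}\nAnswer:")

chunks = [
    "Dividend per share in fiscal year 2022 was $0.62.",
    "The stock performance graph compares MSFT to the S&P 500 and NASDAQ.",
]
question = "What was the dividend per share in 2022?"
top = retrieve(question, chunks)
print(top[0])                         # the dividends chunk ranks first
print(build_prompt(question, top))    # the prompt the LLM would receive
```

Keeping the retrieved chunks around is also what lets the chain return the source documents alongside the answer.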
That's it for this video. We've seen how you can chat with a PDF file using only a local model; in this case we used GPT4All with LangChain and Hugging Face embeddings provided by sentence-transformers. I'm going to include all the source code (the Google Colab notebook) in a link in the description below, and I'm also going to write a complete text tutorial that will be available for free on MLExpert, so you can read that one as well. Please like, share, and subscribe, and join the Discord channel linked in the description below. Thanks for watching, and I'll see you in the next one. Bye!
Info
Channel: Venelin Valkov
Views: 18,826
Keywords: Machine Learning, Artificial Intelligence, Data Science, Deep Learning
Id: k_aURLKTrvU
Length: 13min 49sec (829 seconds)
Published: Sat May 20 2023