Crazy FAST RAG | Ollama | Nomic Embedding Model | Groq API

Video Statistics and Information

Captions
Hello guys, welcome back. In this video I will explain, as I mentioned in my previous video, how to chat with a PDF, but this time I am going to use the nomic-embed-text model from my last video to do the embeddings, and we can use any of the models from the Ollama website. I will also be testing one of the models from the Groq API; they have two models right now, llama2-70b-4096 and mixtral-8x7b, so I will use Mixtral as well and show you how the answers from these two models compare. Let's get started.

As always, the code is on GitHub, so you can just go there and clone it: grab the SSH or HTTPS URL, copy it, and head to your terminal. I have already cloned it, as you can see; I am inside the chat-with-pdf project folder, and these are its contents. The first thing we need to do is create a virtual environment: python3 -m venv venv and then source venv/bin/activate, which creates and activates it, and as you can see (venv) now appears in front of the prompt, meaning the virtual environment is active. Next we need to install the necessary packages. If I run cat requirements.txt, you can see we are going to install chainlit, langchain, langchain-community, PyPDF2 (because we are going to work with PDFs), chromadb (to store the embeddings), groq (because we are also going to test the Groq models), langchain-groq, and ollama (to load the local models). I will run pip install -r requirements.txt so it installs all the necessary packages for us.

Once this is done we can create a simple Chainlit application, and I will show you how to quickly switch between different embedding models and different LLMs and compare their performance. As I already showed you how fast the nomic-embed-text model is in my last video (I hope you watch that one too), why not use the best embedding model out there instead of a random one? As you can see here, it is a large-context-length text encoder that surpasses OpenAI's text-embedding-ada-002 and text-embedding-3-small; those are the models we were actually using before, so why pay for them?

Back in the terminal everything is installed, so I will open the project in VS Code to explain it in depth. First, if you want LangSmith in your application, there is an example .env file; just copy it. I have already copied it into .env, and I can show it because I am going to delete the LangChain API key later. That is what enables tracing into LangSmith, and later I will also show you how to add the Groq API key there when we use it. Now let's go to the app.py file. There is import PyPDF2, which is for loading the PDF, and then there is OllamaEmbeddings, because we are going to create the embeddings via this Ollama embeddings class; by default it is llama2, but we are going to use nomic-embed-text in this case. There is also RecursiveCharacterTextSplitter, which, as the name and the docstring already suggest, splits the text by recursively looking at characters. A minimal sketch of this setup follows below.
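To make the setup concrete, here is a minimal sketch of the embedding and splitter configuration described above. The import paths follow the langchain-community package layout and the chunk overlap value is illustrative; the repo's actual code may differ.

```python
# Minimal sketch of the embedding setup (assumes the langchain-community
# package layout; the repo's import paths may differ).
from langchain_community.embeddings import OllamaEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

# OllamaEmbeddings defaults to llama2; point it at nomic-embed-text instead.
# The model must already be pulled locally: `ollama pull nomic-embed-text`.
embeddings = OllamaEmbeddings(model="nomic-embed-text")

# ~1,000-character chunks as in the video; the overlap of 100 is a free
# choice, not necessarily the repo's exact value.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
```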
Now we are going to use Chroma to store all our embeddings and the related data, and we have ConversationalRetrievalChain. What does it do? It is a chain for having a conversation based on the retrieved documents. Then there is ChatOllama, which we will use for the Ollama model; Ollama locally runs large language models. By default it is llama2, and we will be using mistral:instruct (we can even use llama2 in this case, just to test), and then I will also show you how to use the model from Groq via ChatGroq, for which we need to pass an API key, so we will do that later. There are also ChatMessageHistory and ConversationBufferMemory, so the chain knows your history and you don't need to provide all the context each time; if you hover over them, the docstrings say "Buffer for storing conversation memory" and "In memory implementation of chat message history". Chainlit is there to give us a simple chat UI. By the way, we don't even need to load the environment variables ourselves in Chainlit because it does that by default; if you are using something other than Chainlit, you need to uncomment those lines.

What I am doing here is showing you two things, as I said before. There is llm_local, meaning ChatOllama; by default it uses llama2, as shown here, but we can pass mistral:instruct, and we will test with that one. Then there is the Groq one, for which we need the model mixtral-8x7b, as I showed you before, and for that we need the API key. Let me actually go and grab the API key right away so it's easier: on the Groq website there is the playground with an API Keys page; I will create a new one and submit (a name must be provided, so I'll call it groq-rag) and submit again. I am showing you this because I am going to delete this key soon. I go to .env, paste it there, and that is done; let me go back to the app.

Next there is this on_chat_start, where we tell the Chainlit application what to do when we start it. It initializes the variable to store the uploaded files; while files is None, we ask the user to please upload a file, and you can pass accept=["application/pdf"] so it knows we are uploading a PDF, and you can play around with the max size (how big a file you want to allow) and the timeout. We send this, grab the first uploaded file, and print a "processing" message. Then, as you can see, with PyPDF2 we read the PDF file, extract the text from it, and split the text into chunks; the chunk size is 1,000 and you can pass whatever chunk overlap you want. We use the splitter to split the PDF text we extracted, create metadata out of it, and create a Chroma vector store. Here is the main thing: we are using nomic-embed-text (you can play around with llama2 here too, but as I showed in my previous video, nomic is quite fast, so let's not even bother with llama2). We take the texts, the embeddings, and the metadatas, and we create the docsearch. That's it. A rough sketch of this handler follows below.
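Here is a rough sketch of that on_chat_start flow, assuming the standard Chainlit file-upload API; the handler body and variable names (docsearch, and the text_splitter and embeddings from the sketch above) are illustrative rather than the exact repo code.

```python
# Illustrative on_chat_start handler; names and defaults are assumptions.
import PyPDF2
import chainlit as cl
from langchain_community.vectorstores import Chroma

@cl.on_chat_start
async def on_chat_start():
    files = None  # initialize the variable that stores the uploaded files
    while files is None:
        files = await cl.AskFileMessage(
            content="Please upload a PDF file to begin!",
            accept=["application/pdf"],  # only allow PDFs
            max_size_mb=20,              # tune as needed
            timeout=180,                 # seconds to wait for the upload
        ).send()
    file = files[0]
    msg = cl.Message(content=f"Processing `{file.name}`...")
    await msg.send()

    # Read the PDF with PyPDF2 and extract its text
    pdf = PyPDF2.PdfReader(file.path)
    pdf_text = "".join(page.extract_text() or "" for page in pdf.pages)

    # Split the text into chunks and attach simple per-chunk metadata
    texts = text_splitter.split_text(pdf_text)
    metadatas = [{"source": f"chunk-{i}"} for i in range(len(texts))]

    # Build the Chroma vector store from the nomic-embed-text embeddings
    docsearch = Chroma.from_texts(texts, embeddings, metadatas=metadatas)
```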
Then we initialize the history for the conversation and the memory for the conversational context. Here you can see memory_key="chat_history" and output_key="answer"; these are the usual settings that go into ConversationBufferMemory. Then we create the chain with ConversationalRetrievalChain.from_llm: we pass the LLM (llm_local first; we can change this to llm_groq when we want to use Groq), chain_type is "stuff" (you can play around with this too), the retriever is docsearch.as_retriever() so it retrieves information from the docsearch we created, memory is the memory we created above, and we set return_source_documents to True. After that we just say "processing done, you can ask questions" and store the chain in the user session with cl.user_session.set("chain", chain). That is the initial part of the Chainlit application. In the message handler, as you can see, we get the chain we set before, create an AsyncLangchainCallbackHandler, pass the message along with the callbacks, and get back the answer and the source documents; then we initialize a list to store the text elements and process the source documents if available. That is all there is to the code; a rough sketch of this wiring follows below.
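And a sketch of the memory, chain, and message handler just described, continuing from the handler above. It uses LangChain's legacy Chain API (acall), which matches the period of the video; llm_local is defined further below, and all names remain illustrative.

```python
# Illustrative chain wiring (this would sit at the end of on_chat_start)
# plus the message handler; uses LangChain's legacy Chain API.
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain_community.chat_message_histories import ChatMessageHistory

message_history = ChatMessageHistory()
memory = ConversationBufferMemory(
    memory_key="chat_history",   # where past turns are injected
    output_key="answer",         # which chain output the memory stores
    chat_memory=message_history,
    return_messages=True,
)

chain = ConversationalRetrievalChain.from_llm(
    llm_local,                            # swap in llm_groq to use Groq
    chain_type="stuff",
    retriever=docsearch.as_retriever(),   # retrieve from the Chroma store
    memory=memory,
    return_source_documents=True,
)
cl.user_session.set("chain", chain)

@cl.on_message
async def main(message: cl.Message):
    chain = cl.user_session.get("chain")
    cb = cl.AsyncLangchainCallbackHandler()
    res = await chain.acall(message.content, callbacks=[cb])
    answer, sources = res["answer"], res["source_documents"]
    await cl.Message(content=answer).send()
```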
Now we will run the Chainlit application. First, remember that we are using mistral:instruct as the model and nomic-embed-text as the embedding model. I go to the terminal and run chainlit run app.py; it loads the .env file and starts the simple Chainlit application. It asks us to upload a file, so I upload the GPT4All paper, and as you can see it takes just a fraction of a second to do all the embeddings. Now we can go ahead and ask questions. I have already asked several, so I will just ask "when was GPT-4 released?" and see whether it provides the answer, because I am using mistral:instruct in this case. There it is: it says the text does not provide any information on when GPT-4 was released, even though the information is there and it is clearly reading from the GPT4All text. Let me open the PDF so you know what I am asking about: this is the GPT4All paper, and I am asking about this part here; the answer should be March 14, 2023. Let me ask another question: "where was the collected data loaded?" It should say it was loaded into Atlas, because that is where they loaded it, but let's see whether it works; remember, we are using mistral:instruct. It says the context does not provide any information, which means the embeddings contain the information but mistral:instruct is not able to make use of it. I will close this, remove mistral, and try how llama2 performs; we don't need to pass any model name for that, since it is the default. I save, go to the terminal, and run chainlit run app.py again to see whether this one provides the answer.

The app is running, so I browse to it, upload the GPT4All paper (it loads quickly because the embedding model is already in place), and ask the same question: "when was GPT-4 released?" We are using Ollama with the default llama2 model this time, so as you can see it takes some time to produce the answer. It says: according to the text, GPT-4 was released in 2023, specifically on March 20, 2023. Is it March 20, 2023? Let me open the PDF again: it says March 14, 2023, so llama2 picked the date up from another part of the text; it produces an answer of sorts, but it is not correct. Let me see whether it answers "where was the collected data loaded?"; it should be Atlas. Yes, it gets it: according to the text, the collected dataset was loaded into Atlas. So which model you choose really matters.

Now let me cancel this and go back to the code. Instead of the local model we will use llm_groq, so I go down to the model parameter and change it to the Groq one; this is the one-line switch, sketched below. I save, go to the terminal, and try to stop the app; for some reason my terminal is not responding, so I open a new one, activate the virtual environment, and run chainlit run app.py. Now it will make the API call to Groq. I browse to the app, upload the same GPT4All paper, and it says "failed to load: undefined". What is the issue? The terminal says the address is already in use, because the old instance is still running; I Ctrl+C it, and since it still won't close, I cancel and run it again. Now it is running, so I browse to it, upload the GPT4All paper, processing is done, and I ask: "when was GPT-4 released?" It should be quite fast, as you can see, because we are now using the Groq model. But it cannot find it: it says it cannot provide the exact date for the release of GPT-4, as it was not mentioned in the provided context, even though it is mentioned in the paper. Let me check the sources: it is not in source 0, and looking through the others it is not in source 2 or source 3 either, so the date was simply not in the retrieved chunks, which is why it is not providing the answer. Let me ask the other question again: "where was the collected data stored?" Look how fast the inference is: it was loaded into Atlas; it gets the answer. You can go on and ask as many questions as you want; let's say "on which model was the original GPT4All model fine-tuned?" It should be LLaMA 7B, but it says the original GPT4All model was fine-tuned on the GPT-3.5-turbo model from OpenAI; that model was indeed used to generate the responses, but if you go to the PDF, the original model was actually fine-tuned from LLaMA 7B, so it is mixing up OpenAI and LLaMA here. It gets answers, but the real point here is how fast the inference is when using Groq and its models. For now it is free, which is why I am showing it; otherwise you can just use OpenAI's models.
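For reference, here is what the two model handles look like; the switch the video makes is just which one gets passed to ConversationalRetrievalChain.from_llm. The model ids are assumptions based on what is shown on screen, and ChatGroq reads GROQ_API_KEY from the environment.

```python
# The two LLM handles discussed above; model ids are assumptions based on
# what the video shows on screen at the time (early 2024).
from langchain_community.chat_models import ChatOllama
from langchain_groq import ChatGroq

llm_local = ChatOllama(model="mistral:instruct")      # or the default "llama2"
llm_groq = ChatGroq(model_name="mixtral-8x7b-32768")  # needs GROQ_API_KEY set
```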
As you can see, our embedding model is top-notch; we are using one of the best open-source embedding models out there. There are also plenty of other models on the Ollama website, so you can go there and play around. By the way, when we ask questions through the Groq API, if you look at the terminal you can see it making API calls to api.groq.com, so you know where the calls are going.

Thank you for watching. I hope you now have an idea of how to use an open-source embedding model along with open-source LLMs, and how to switch between two different models with just one line of code: the embedding model stays the same, and all we changed was the model, swapping in llm_groq, which routes the requests to a different API. As I also explained in my last video, the reason this matters is visible in this diagram. This part of the diagram is all about the embeddings: first there is a PDF, it is split into chunks, the chunks go through the embedding API (we use nomic-embed-text), that produces the embeddings, and from those we build the semantic index, which is the knowledge base. When we ask a question, the question is passed through the same embedding API to be embedded, a semantic search happens in the knowledge base, we get the top-ranked chunks (four in our case), and those are passed to the LLM. The LLM only comes in at the last stage: it synthesizes the answer, and we get our response (a rough code sketch of this flow is given below). By using open-source models, as I showed you here, you can quickly switch between different LLMs and different embedding models. I hope you learned something new; give it a try, and not only with a PDF: you can go to the official documentation and load CSVs, text files, README files, whatever you want. Thank you for watching, and see you in the next video.
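To make the diagram concrete, here is a conceptual sketch of that retrieval flow, reusing the hypothetical docsearch and llm_groq names from the sketches above; it is not repo code, just the embed-search-synthesize idea.

```python
# Conceptual retrieval flow from the diagram: the question is embedded with
# the same model as the chunks, matched against the knowledge base, and only
# the top-ranked chunks (four here) are handed to the LLM.
question = "When was GPT-4 released?"
top_docs = docsearch.similarity_search(question, k=4)        # semantic search
context = "\n\n".join(doc.page_content for doc in top_docs)  # top-ranked chunks
answer = llm_groq.invoke(
    f"Answer using only this context:\n{context}\n\nQuestion: {question}"
)
print(answer.content)  # the LLM only synthesizes at the final stage
```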
Info
Channel: Data Science Basics
Views: 1,635
Keywords: openai api, code, llm, chat, langchain, langchain tutorial, langchain explained, what is langchain, chat models, chain, langchain use case, chatgpt like ui, chainlit, open source model, ollama, groq, groq with langchain, language processing unit, chainlit groq, groq fast inference, Inference groq, LPU, groq insane speed, fastest ai chip on the world, nomic embed text, nomic ai, fast embedding model, embedding, open source embedding model, what is nomic ai, what is nomic embed text
Id: TMaQt8rN5bE
Length: 18min 22sec (1102 seconds)
Published: Mon Mar 04 2024