Easy RAG Setup - Load Anything into Context - Mistral 7B / ChromaDB / LangChain

Video Statistics and Information

Captions
"What is the price of the Apple Vision Pro?" "The Apple Vision Pro has quite the price tag: it comes in at a whopping $3,499." "How much does the Apple headset weigh?" "It's around 625 grams." "How many pixels does the display have?" "23 million pixels." "Now for the big question: is the Apple headset any good?" "It depends on what you're looking for in a headset. The Apple Vision Pro has some cool features, but it also comes with tradeoffs." "Tradeoffs like what?" "Some tradeoffs you might encounter are comfort issues from the weight on your face, potential makeup smudging, restrictions on picture taking, limited connectivity options, decreased performance in dark rooms, constant hand tracking, and potential limitations with the cameras and displays."

Okay, so that was me testing out the RAG setup: I put all the text from The Verge's Apple Vision Pro review into a document, and I think it worked quite well. So let me show you how I did this.

This is pretty much the setup, and it's pretty simple. We have a document, in our case the Apple Vision Pro review. We send that over to LangChain, which creates our vector embeddings, and those embeddings are stored in ChromaDB. Then we can take a user query, fetch the vectors most similar to it, and send those chunks to our chatbot, in this case Mistral 7B, as context. Mistral 7B can then use that context to give us an answer back. I have set this up so we can get the answer either as text or as speech; that's up to you. Overall it's a pretty simple RAG setup.

As you can see, it's about 90 lines of code, so it's simple to set up. We use LangChain and Chroma. We do have to use OpenAI, because I think we use ChatOpenAI to actually fetch the embeddings. You can see I have this running against a local LM Studio instance serving Mistral 7B, we set our OpenAI key, and we set the directory we want to load our documents from. We have a load-document function and a split-document function, because we want to chunk the text up: I'm using a chunk size of 500 with an overlap of 50 so we don't miss anything. Here we set how many chunks to feed into each query; I put that down to three, but you can adjust it of course. Then there's our Mistral 7B function with a system prompt that is basically: your name is Julie, you're a tech expert, always keep the response quite short and conversational.

And here is our while-True loop. What I wanted to show you is how we feed the context into the prompt. We get the answer back from the vector database and feed it into a prompt I created that says "context", followed by the user query, which is what we type in at the start of the loop. That means Mistral 7B can use the context to answer our query. That is basically the setup, and you can of course make this as advanced as you want. If you take a look at the speech version, it's a bit more involved but works exactly the same way: we feed the context in, with "from the context above, answer the user query", and the user query is our voice input transcribed with Faster Whisper. And that is basically the setup.
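To make that walkthrough concrete, here is a minimal sketch of the text pipeline as described, assuming early-2024 LangChain community integrations, ChromaDB, and LM Studio's OpenAI-compatible server on its default port 1234. The file paths, model name, and exact prompt wording are illustrative choices, not the video's exact code.

```python
# Minimal RAG sketch: LangChain + ChromaDB + local Mistral 7B via LM Studio.
# Assumes OPENAI_API_KEY is set for the embeddings, and LM Studio is serving
# a Mistral 7B model at http://localhost:1234/v1.
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from openai import OpenAI

SYSTEM_PROMPT = ("Your name is Julie. You are a tech expert. "
                 "Always keep the response quite short and conversational.")

# Load the document and chunk it: 500-character chunks with a 50-character overlap.
docs = TextLoader("docs/vision_pro_review.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# Embed the chunks with OpenAI embeddings and store them in ChromaDB.
vectordb = Chroma.from_documents(chunks, OpenAIEmbeddings())

# LM Studio exposes an OpenAI-compatible endpoint for the local model.
llm = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

while True:
    query = input("You: ")
    # Fetch the three chunks most similar to the user query.
    hits = vectordb.similarity_search(query, k=3)
    context = "\n\n".join(doc.page_content for doc in hits)
    prompt = f"Context:\n{context}\n\nFrom the context above, answer the user query: {query}"
    response = llm.chat.completions.create(
        model="mistral-7b-instruct",  # LM Studio serves whatever model is loaded
        messages=[{"role": "system", "content": SYSTEM_PROMPT},
                  {"role": "user", "content": prompt}],
    )
    print("Julie:", response.choices[0].message.content)
```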
The only thing I did differently on the speech side is that I log everything we talk about. The idea behind that is that we can store our chat log, and when we load it back into this system, we can get the chatbot to remember what we talked about last time. We're going to test that out. Of course I'm going to upload this to my GitHub; if you are interested, just become a member of the channel, follow the link in the description below, and you will get access to it. But this should be pretty easy to set up yourself.

So now let's test it more: let's upload different kinds of documents and see how well it performs. For the first test, I went over to the OpenAI blog and copied all the text from the blog post about the new embedding models and API updates. I put it into a simple text document, nothing fancy, dropped it into my folder, and now let's run it and see if we can ask very detailed questions about the blog post.

I wanted to see if we can pick out a very specific part of the blog post and question it. Let's compare the performance between ada-002 and the text-embedding-3-small model; in the post you can see the score has increased from 31.4 to 44. So our query is: how does the performance compare between ada-002 and text-embedding-3-small? Let's run that. The first thing we see is the prompt, the part we are feeding into Mistral 7B. If you zoom in a bit, you can see the context, which is what was fetched from the vector embeddings, and below it we just repeat the user query. When we feed all of this to Mistral 7B, it can respond: the performance of text-embedding-3-small is better than ada-002, with increased scores on both the MIRACL and MTEB benchmarks. We could have returned only the answer, but I wanted to show you what we are actually feeding into Mistral 7B, because I think it demonstrates very well how RAG works: we add additional context to our query.

Next, let's go over to the speech version and try a different, very general question about the updated GPT-4 Turbo preview. "Tell me about the updates to the GPT-4 Turbo preview." "We've made improvements in the GPT-4 Turbo preview, fixing bugs and enhancing performance for tasks like code generation. It's now more thorough and has reduced instances of laziness. There was also a fix for non-English UTF-8 generations. Overall, these updates aim to make it better and more reliable." I think that was pretty good. Again, you can see the part that was fed in as context and what we got back from Mistral 7B, in voice this time. This works well.

Now I wanted to go super specific. "What is the absolute lowest embedding size we can have?" "256 dimensions." "And what is the biggest embedding size?" "The biggest would be 3072." Well done: the smallest is 256 and the biggest is 3072. Perfect.

The final thing I wanted to show you is the logging system I created for this.
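The log itself can be as simple as appending each exchange to a text file at the bottom of the loop. A hypothetical helper; the path, speaker names, and timestamp format are my own choices, not taken from the video:

```python
from datetime import datetime

def log_exchange(user_query: str, answer: str, path: str = "logs/chat_log.txt") -> None:
    """Append one exchange with a timestamp so the log can be re-embedded later."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"[{stamp}] Chris: {user_query}\n")
        f.write(f"[{stamp}] Julie: {answer}\n")
```

In the loop from the earlier sketch, you would call log_exchange(query, answer) after each response.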
The idea is that you can give your chatbot companion a kind of long-term memory by using vector embeddings. Here is how it works. These were our conversations: I had a few fictional conversations with the chatbot; the first ended at 15:36, then we had another conversation later, and one more. What we're going to do now is send this log into the vector database and start querying to see whether the bot can remember what we talked about in our past conversations. Let's test it and see if it works.

As you can see from the chat log, I had asked to meet this fictional character at a restaurant called the Cozy Corner, so I thought that could be our first query: can it remember what the restaurant we went to was called? Let me fire this up. "Hi again Julie, it's Chris. I kind of forgot the name of the restaurant we went to a few days ago. Do you remember it?" "The place we went to is called the Cozy Corner." "Ah, that's right, now I remember. Do you remember what kind of food we ate there? I'm so forgetful." "Oh yes, we shared a lasagna dish, and it was really tasty." "Ah yeah, I remember now. What did we have for drinks?" "We had a bottle of red wine; it was a nice Cabernet Sauvignon." "What did we have to drink before we went to our table?" "That's correct, we had a couple of beers while waiting for our table to be ready."

I've got to say, that was pretty much a success. If we go back up: we got the name of the restaurant, the food we ate, what we drank, my favorite game, and Julie's favorite game. So this was a success. And if we scroll down and reload our log, you can see that everything we just talked about was also logged, which is a bit funny, but I'm happy with this.

That was basically what I wanted to share today. I will put up the simplest version of this; just follow the link in the description and you can find it on my public GitHub. If you want access to the other versions, become a member of the channel and you will get access to the community GitHub and the community Discord. Other than that, I hope you learned something, and I'll see you again on Sunday.
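For reference, giving the bot this kind of memory is just the same pipeline pointed at the saved chat log instead of a review document. A sketch under the same assumptions (and hypothetical file paths) as the earlier snippets:

```python
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Re-embed the saved chat log so past conversations become searchable context.
memory_docs = TextLoader("logs/chat_log.txt").load()
memory_chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(memory_docs)
memory_db = Chroma.from_documents(memory_chunks, OpenAIEmbeddings())

# A "do you remember...?" query now retrieves the relevant past exchanges,
# which get fed to Mistral 7B as context exactly as before.
for doc in memory_db.similarity_search("Which restaurant did we go to?", k=3):
    print(doc.page_content)
```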
Info
Channel: All About AI
Views: 8,133
Keywords: rag, retrieval augmented generation langchain, langchain, chromadb python, rag tutorial, rag tutorial llm, mistral 7b, open source llm, openai, vector embeddings, retrieval augmented generation
Id: ydU5L-OGhmc
Length: 10min 37sec (637 seconds)
Published: Thu Feb 01 2024