RAG Implementation Medical Chatbot with Mistral 7B LLM LlamaIndex GTE Colab Demo

Video Statistics and Information

Captions
This is Rithesh Sreenivasan and welcome to my channel. In this video I'll be explaining how you can chat with your medical data: basically, you are trying to create a medical chatbot using the latest Mistral 7B large language model. Mistral 7B is a state-of-the-art 7-billion-parameter large language model, and TheBloke has also released quantized versions of its instruction-tuned, fine-tuned variant. The advantage of quantized models is that they use far less memory than non-quantized models. I'll go to the demo and then come back and forth.

For this demo I'll be using LlamaIndex to index my PDF document. The document describes a medical condition called fibromyalgia, and I want to create a chatbot which answers questions from this PDF file. For that I instantiate a Colab instance; it is a GPU instance, even though I'm using a quantized model. I have uploaded the PDF file under a folder called data. If you start this Colab instance yourself, you also need to upload the data file under a folder called data.

Then run the code. I have to install pypdf and transformers, and I have to install llama-cpp-python because I'm making use of GGUF quantized models. For it to make use of GPU plus CPU there is a special installation command, which I got from the llama.cpp page; I'll be putting the link in the description of the video. llama-cpp-python is a Python binding for llama.cpp, and it supports inference for many LLMs on your CPU and GPU. After installing with that command, I install llama-index.

What is LlamaIndex? LlamaIndex is a library which supports retrieval-augmented generation for large language models; I've made a previous video on LlamaIndex, so you can check that out. The idea is that I have content in this PDF and I want a large language model, Mistral 7B, to answer questions from it. With LlamaIndex I create embeddings for the text present in the PDF; the text is split into chunks, the chunk embeddings are indexed into a vector store, and when I ask a query, LlamaIndex finds the relevant content in that vector store and attaches it to the prompt sent to the Mistral model to get an appropriate response. That is the workflow.

So first, from llama_index I import VectorStoreIndex, SimpleDirectoryReader, and ServiceContext. My documents are present within /content/data, so I use a SimpleDirectoryReader, which loads whatever files are present in that directory; here it reads the PDF, converts it into text, and that text is what ends up in the documents variable.
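Here is a minimal sketch of those setup steps, assuming the Colab layout described above (the PDF uploaded under /content/data) and the cuBLAS build flag from the llama-cpp-python install docs of that period; exact flags may differ for newer releases:

    # Install dependencies (Colab cells). The CMAKE_ARGS flag builds
    # llama-cpp-python with cuBLAS so layers can be offloaded to the GPU.
    !pip install pypdf transformers
    !CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
    !pip install llama-index

    from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext

    # Read every file under /content/data (here, the fibromyalgia PDF)
    # and convert it into plain-text Document objects.
    documents = SimpleDirectoryReader("/content/data").load_data()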
Then, because this model is in GGUF format (a quantized format supported by libraries like ctransformers and llama.cpp), I instantiate LlamaCPP from llama_index.llms. I also import messages_to_prompt and completion_to_prompt, because to use this instruction-tuned model you need a prompt template of the form [INST] ... your prompt ... [/INST], and that format is already built into the messages_to_prompt utility in LlamaIndex.

Here I instantiate the model with llm = LlamaCPP(model_url=...). One thing you need to take care of: if you go to TheBloke's release on Hugging Face and look at the files, there are several quantization variants available. I'll be using the medium-size 4-bit quantized model (Q4_K_M), which is listed as balanced quality and recommended. One way to get the link is to click on the file, copy the download link's URL, and paste it as model_url. I have not downloaded the model locally, so model_path is None, and these are the other parameters. If I set n_gpu_layers to -1, it offloads all of the model's layers onto the GPU; the available parameters are documented in llama-cpp-python, so you can look them up there. I do this because I want to make use of the GPU as well, but still with a quantized model.

Why did I want a quantized model here? When I tried the 7B Instruct model as such (unquantized) on this GPU Colab instance, it ran out of memory; that's why I'm using the quantized model, to use fewer resources. Note that the comment here says Llama 2, but it's not Llama 2; it's Mistral.
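A sketch of the instantiation, following the LlamaCPP example from the LlamaIndex docs of that period; the model_url below is my reconstruction of the Q4_K_M download link, so copy the exact URL from TheBloke's files page as described above:

    from llama_index.llms import LlamaCPP
    from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

    llm = LlamaCPP(
        # Download link copied from TheBloke's Mistral-7B-Instruct GGUF files
        # page (the medium, "balanced quality - recommended" 4-bit Q4_K_M file).
        model_url="https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_M.gguf",
        model_path=None,        # no local copy; download from the URL
        temperature=0.1,
        max_new_tokens=256,
        context_window=3900,    # kept a little below the 4096-token limit
        generate_kwargs={},
        model_kwargs={"n_gpu_layers": -1},  # -1 offloads all layers to the GPU
        # Wrap queries in the [INST] ... [/INST] instruction format
        # expected by the instruction-tuned model.
        messages_to_prompt=messages_to_prompt,
        completion_to_prompt=completion_to_prompt,
        verbose=True,
    )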
The context window for Mistral is also 4096 tokens, but the suggestion is to keep it a little lower, which is the default parameter here (3900). messages_to_prompt and completion_to_prompt are likewise defaults; I found that the instruction format for prompting this particular model is already part of messages_to_prompt, which is why I've included it as such. With that, the model is downloaded and instantiated.

Then, to convert your text into embeddings and index them in a vector store, you require sentence-transformers, so install that. Next, from langchain.embeddings.huggingface I import HuggingFaceEmbeddings, and from llama_index I import LangchainEmbedding and ServiceContext, and this is how I instantiate my embedding model. I have chosen a particular embedding model here called GTE-large (General Text Embeddings). Why didn't I use plain sentence-transformers embeddings? If you look at the leaderboard for text embeddings, GTE-large is doing better than the sentence-transformers models across many tasks, so I wanted to make use of it. You can use sentence-transformers as well and try out these various embeddings.

So now you have your LLM instantiated, your embeddings instantiated, and your text read. Next you need to create an index. Before that, you have to instantiate a ServiceContext, because you are using a customized LLM and a customized embedding model: by default LlamaIndex makes use of OpenAI, but we are using Mistral 7B as the LLM and GTE-large as the embedding model, and that is what the ServiceContext captures. Then you can index your documents directly with index = VectorStoreIndex.from_documents(documents, service_context=service_context), passing your documents (the PDF converted to text) along with the service context.

Once the indexing is done, you can create your query engine with index.as_query_engine() and call query_engine.query("What is this condition?"), which is the question I am passing to the query engine. Internally, it takes this query, creates an embedding for it using our embedding model, searches the vector store for the closest or most relevant content, attaches that content to the prompt, and asks the LLM for a response. That response is what you are seeing here: it says fibromyalgia is a long-term condition that causes pain and tenderness all over your body, thought to be caused by your nervous system in your brain and spine not being able to process pain signals properly, and you can verify from the PDF that this matches the relevant passage.
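Here is a sketch of those embedding, indexing, and query steps; "thenlper/gte-large" is my assumption for the GTE-large checkpoint name on Hugging Face, and the imports follow the LlamaIndex/LangChain APIs of that period:

    !pip install sentence-transformers

    from langchain.embeddings.huggingface import HuggingFaceEmbeddings
    from llama_index import LangchainEmbedding, ServiceContext, VectorStoreIndex

    # Wrap the GTE-large sentence-embedding model for use inside LlamaIndex.
    embed_model = LangchainEmbedding(
        HuggingFaceEmbeddings(model_name="thenlper/gte-large")
    )

    # The service context swaps in our custom LLM and embedding model
    # (the defaults would otherwise call OpenAI).
    service_context = ServiceContext.from_defaults(
        llm=llm,
        embed_model=embed_model,
    )

    # Chunk the documents, embed the chunks, and index them in a vector store.
    index = VectorStoreIndex.from_documents(documents, service_context=service_context)

    # Retrieve the most relevant chunks for a query and ask the LLM for an answer.
    query_engine = index.as_query_engine()
    response = query_engine.query("What is this condition?")
    print(response)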
So it has actually extracted the relevant content using vector search, given it to the LLM, and the LLM has produced this response. We can now ask further queries; basically, you can put it into a loop and keep asking questions (a minimal sketch of such a loop follows below). For example: what are the symptoms? You can see the symptoms of fibromyalgia include pain that spreads throughout the body, particularly in the neck, so it pulls that out and shows the response, again drawn from the symptoms section of the PDF ("pain you may feel in different parts of the body").

In this way you have created a medical chatbot. The content could be from any domain: change the PDF and you can create chatbots across various domains. This uses the Mistral 7B language model as your LLM for retrieval-augmented text generation, GTE-large (General Text Embeddings) as your embedding model, and LlamaIndex as the orchestrator connecting the embeddings, the vector store, and the LLM. I've made a previous video on Mistral 7B, so you can check that out to understand more about Mistral. I hope this video is useful to you. I'll be putting the link to this Colab in the description of the video along with all the other relevant links. See you in another video.
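For the loop mentioned above, a minimal sketch that keeps the chatbot running until the user quits (the prompt string and exit keyword are my own choices):

    # Simple interactive loop over the query engine.
    while True:
        question = input("Ask a question (or type 'exit' to quit): ")
        if question.strip().lower() == "exit":
            break
        print(query_engine.query(question))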
Info
Channel: Rithesh Sreenivasan
Views: 12,881
Id: 1mH1BvBJCl0
Length: 13min 30sec (810 seconds)
Published: Thu Oct 05 2023