Building a Multimodal RAG App for Medical Applications

Captions
hello everyone, welcome to the AI Anytime channel. In this video we are going to explore multimodal retrieval augmented generation, also called multimodal RAG. It's a very popular use case nowadays in generative AI, where people want to work with the multimodal nature of data: you have images and videos, which are different modalities with different dimensions, and you also have audio. Audio is fairly easy to work with; I don't count audio in the same bracket, because you have the transcription and can do a lot with it. Our focus will be on how we can look at tables, images, and (in the near future) videos, query that knowledge base, and retrieve information from these modalities. So far on my channel we have been working with text; I have more than 25 videos on RAG, covering both open-source and closed-source tech. The next three videos will focus on multimodal RAG end to end. In this video we'll use closed-source tech completely, with the GPT models. You can also replace that with Gemini, but I'm not a big fan of Google's models, so I'm going to rely on the GPT models here. The next video will be completely open source: we'll use LLaVA as the large vision model, or multimodal LLM. In this one we're going to develop an application, a FastAPI-based app where you can query your multimodal data. In the next video, with LLaVA, most things will remain the same (the underlying principles), with a few changes in the pre-processing, the post-processing, and the large vision model. The third video will cover deployment on AWS through EC2. Those are the three videos. Why is this important? Because so far we have only been looking at the text part of a PDF, a document, a text file, a markdown file, or whatever format you deal with. Now imagine you have images. I'll give you an example of a couple of use cases. Say you sell electronics: people query "give me a phone in a certain price range with 16 GB of RAM", and so on. How will you also return the image with it? Not only "this is a Samsung phone or a OnePlus phone that you can buy on Amazon", but also the images from your product guide or product manual. That's one use case; there are a number of others. We're going to use a few things in these videos. We'll use the unstructured library first, to see how we can deal with this kind of data; then we'll take an approach that I'll discuss while writing the code: we'll first do it in Colab and then build the app. Okay, let me first quickly show you what the end product is going to be; it should also motivate you to watch the complete video. If you look at my screen, I have something called a multimodal RAG app. It's a FastAPI-based, microservices-driven app. If you go to localhost:8000/docs it takes you to the Swagger UI, where you can see the specification. We have two endpoints: a GET, which serves index.html, and a POST, because you're making a POST request (asking questions) and then getting the answers back. You can also try it out right there. Or, if you
just need an API endpoint to integrate into your existing application, you can do that too, but I am more interested in showing you an app, so it can become your project: if you're working as an intern or want to do an internship, if it's your final-year college project, or if you want to build a proof of concept at your workplace to show the capabilities of a multimodal RAG application. You can take this code base, this entire app, make a few changes on the back end with your own data, and you're good to go. This is not a chatbot at the moment; it's more of a search that we've built. But it's easy to build a chatbot from it; you can just add a few lines of code for chat memory and such. Now, if you look here, I'm asking the question "what is gingivitis?". Why am I asking this? Because I've taken a different use case; let me show you the source data. You can see it's a PDF file about canine periodontal disease (PD) in dogs, by the United States Department of Agriculture. We're talking about a particular disease. It says: what is PD? Inflammation of the tissues and bone that surround and support the teeth, due to a bacterial infection. PD is sometimes referred to as a dental disease. So this is about dogs; we're talking about pets here, which makes it mainly relevant for veterinary doctors. Now imagine you have n number of PDFs like this, n number of documents. You can see it also contains images, and that's my focus: if I ask "what is gingivitis?", it should not only return the answer (redness and inflammation at the gum line), it should give me the image as well, to show the end user what gingivitis is. I hope you now see what I'm saying: if someone asks "what is severe PD?", which is on the right-hand side here (it says "note the missing tooth"), it should return that image, not only the text, and that's what we're going to achieve in this video. So let me ask "what is gingivitis?". Once I ask, it returns the answer and also the relevant image, and this is how it looks: a very beautiful UI. It even has dark and light themes, but you can make changes; I'm not a front-end guy, by the way. If you look here it says: gingivitis is a type of gum disease characterized by inflammation of the gums; it is typically caused by a buildup of plaque, a sticky deposit, and so on. You get the text output on the left-hand side, and on the right-hand side you get the exact image from the PDF; you can see "what is gingivitis". That is why this matters: with a multimodal RAG application it's not only about textual data. You can also return tables (we'll see how), images, and text. That's the end goal: once you complete this video you will have an app that does what you're currently seeing on the screen. We're going to use unstructured; we're going to use a vector store, probably Chroma or FAISS, though you're okay to use something like Weaviate as well, depending on what services you use; and we're going to use the GPT models (3.5, 4, 4 Vision; we'll see, they'll work in tandem). And we're going to use LangChain as the orchestration framework.
So let's get started and build this application, a multimodal RAG app: in this video with closed-source tech, and in the next video with open-source tech (that recording is already done, by the way). All right, to develop the RAG pipeline we're going to use Google Colab. We'll write all of our code in Colab, create a vector store, download that vector store, and then build the FastAPI app; I'll tell you why. The backbone of this entire pipeline is a library called unstructured: pathetic documentation, but a great library. Unstructured has a lot of issues when you work on Windows; it's not a library to use on Windows. It's a great library on a Linux distro, for example Ubuntu, and many of you have faced this; I know many of you have told me "I'm getting this error, I can't install it, partition_pdf is not working, I get a poppler error, I get a Ghostscript error" and whatnot. I'll explain each of those things. We're going to develop the pipeline here and then use the persisted vector store in a FastAPI application, so I'll show you how it works end to end. If you're on a Linux distro, you need to install tesseract-ocr, libtesseract-dev, and poppler-utils. Unstructured has two different backends you can work with. If you don't know unstructured: it's a library that helps you work with granular data, data with a lot of hierarchy (tables, images, other unstructured content), and you can use it to extract and parse that data. That's why we use unstructured; it's a very powerful library, with a LangChain integration as well. Its two backends are Ghostscript and poppler-utils, and we're going to use poppler-utils here. I'm also installing Tesseract, which handles the optical character recognition, because we're going to use something that gets us the bounding boxes of the images through a YOLO model; I'll show you. So: tesseract-ocr, then libtesseract (because we're on an Ubuntu machine), and then poppler-utils, which acts as the backend for unstructured. Let's install these; it will take a bit of time. If you're installing Tesseract yourself, you have to download it and put it on your path: go to the environment variables and set the path, because you have to use Tesseract from there. You can keep it in /usr/local or wherever; it depends on the machine you're working on. I've installed this, so let's come down. Next: I'm going to rely on the legacy version of LangChain, but you can also use the current update, version 0.1, where they've come up with three different components. One is langchain-core, which contains the core pieces, chains and whatnot. Another is langchain-community: all the open-source integrations created by the community, like PyPDF, live there. The third is langchain-openai, because I'm going to use OpenAI; they've created separate packages for the closed-source products and services we see, like Anthropic and so on. Now: pip install langchain, and for unstructured I'm installing the all-docs extra. You can also install it in bits and pieces, but I'll go with all-docs.
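For reference, here's a minimal sketch of those installs as Colab cells. The package list mirrors what's described above; exact package names for your platform may vary:

```python
# System dependencies for unstructured's PDF pipeline (Ubuntu / Colab):
# tesseract-ocr for OCR, poppler-utils as the PDF backend.
!apt-get install -y tesseract-ocr libtesseract-dev poppler-utils

# Python dependencies: unstructured with all document extras, plus the
# LangChain / OpenAI / FAISS stack used later in this notebook.
!pip install "unstructured[all-docs]" langchain openai pydantic faiss-cpu tiktoken opencv-python
```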
I'm also going to use Pydantic, which helps with validation and such; I have openai; I'll probably use FAISS, though you can also use Chroma DB or Weaviate, whatever; I need tiktoken because we're working with OpenAI; and then opencv-python. Let's install these. Once you've installed, you have to restart the kernel: go to Runtime and restart the runtime. It takes a bit of time. You can see I've already uploaded the PDF here; you can also keep it in your Drive if you're working on Google Colab, or work locally. It's easier to work locally if you have an Ubuntu machine; I'm on Windows, as you can see. The next thing is the imports. To save some time I've created a lot of gists that I'll copy-paste, but I'll explain them. We're going to need a few things. The first is base64. Why base64? Because we have to encode and decode: we get a base64 string for each image and have to convert it back to get the image, and GPT-4 takes a base64 string, so we need it for the encoding and decoding. Okay, you can see "restart session", "all variables will be lost"; that's fine, I'm restarting. Once you restart, you can see at the top right in Google Colab it says "restarting"; now it has restarted, so let's import everything. As I said, I'm using legacy LangChain; you can use the newest version too, but then you can't import the chat model from langchain.chat_models. You get it from langchain_openai and import it like that; the community integrations come from langchain_community, the core pieces from langchain_core, and the vector store import becomes "from langchain_community.vectorstores import FAISS". There are small differences, so go through the documentation to find them. I've imported everything successfully. Once the imports are done, the next step is the OpenAI key. There are a few ways to set it: define it directly here, create a .env file, or use Colab's built-in Secrets. If I go to Secrets, you can see I've already set my OpenAI API key there; the constant is named OPENAI_API_KEY, all in caps, and the value is what you get from the OpenAI dashboard. Once you've done that, come back here.
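And here's a minimal sketch of the imports this section walks through, using the newer split packages. Treat the exact module paths as assumptions, since they moved between LangChain versions:

```python
import base64  # encode/decode extracted images for GPT-4 Vision

from unstructured.partition.pdf import partition_pdf

# 0.1-style LangChain imports; on legacy LangChain these live under
# langchain.chat_models, langchain.embeddings, langchain.vectorstores, etc.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_community.vectorstores import FAISS
```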
From google.colab, import userdata; that's all you import. Then OPENAI_API_KEY = userdata.get(...), passing exactly the name you defined in the Secrets: OPENAI_API_KEY. Now we're okay to work with the API keys. With the keys set, let's move on to getting the elements. When you use unstructured, there are different element types you can extract; I'll explain. Let's call the result raw_pdf_elements, and I'm going to use something called partition_pdf, a function within the unstructured library, and pass it a lot of parameters. The first is the file name, a string value: come here and copy the path. Once you've copied the path, you can pass any number of other arguments. Next is extract_images_in_pdf, a boolean; set it to true, because we need images. Then infer_table_structure: yes, let's infer the table structure; also a boolean. Then chunking_strategy: you can chunk by title, by header, by subheader, whatnot; let's keep it by_title. Again, there's no rule of thumb that you must use what I'm using. It's very much trial-and-error experimentation, and it depends on the documents at your disposal; you have to look at the document closely and use your experience with natural language to decide how to chunk your data. Chunking done. Then max_characters: let's keep it a bit high, 4,000. Next, new_after_n_chars: if you've used LangChain's RecursiveCharacterTextSplitter you've seen the same ideas under different names, like chunk size and chunk overlap; let's keep it at 3,800, a bit under the max. Then combine_text_under_n_chars: say, half of your max characters, so 2,000. Finally, you define where the extracted images should go: extract_image_block_output_dir. If your PDF has 20 images and unstructured extracts all 20, where do you keep them? That's a path you define. So let's define an output path first (we may need it again later): output_path, where I'll just create an images folder in this runtime and pass that in.
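Putting those parameters together, a minimal sketch of the call might look like this; the file name is a hypothetical placeholder for the USDA PDF:

```python
output_path = "images"  # extracted figures land here in the Colab runtime

raw_pdf_elements = partition_pdf(
    filename="canine-periodontal-disease.pdf",  # placeholder path
    extract_images_in_pdf=True,        # pull figures out as image files
    infer_table_structure=True,        # keep tables as structured elements
    chunking_strategy="by_title",      # group text under section titles
    max_characters=4000,               # hard cap per chunk
    new_after_n_chars=3800,            # soft cap: start a new chunk a bit earlier
    combine_text_under_n_chars=2000,   # merge small fragments into neighbours
    extract_image_block_output_dir=output_path,
)
```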
So this is what we've defined: partition_pdf with the file name; extract images in PDF set to true; infer table structure set to true; chunking strategy by_title; max characters 4,000; new_after_n_chars 3,800, a bit less than the max; and combine_text_under_n_chars. I think we're okay; let's run it and see. It says "name 'output_path' is not defined"; I think I didn't run that cell. Okay, now I have. Let's run this; it will take a bit of time because it has to download a model, so let me pause the video and come back, and I'll show you that. As you can see, the images have been saved into this images folder. What have we done? We've extracted the images from the PDF and saved them into that folder on the runtime. How does unstructured do it? Simple: it first has to detect the images inside the document (here a PDF), and for that an object detection model is used. If you look here, it's using an ONNX model. ONNX has a lot of advantages, by the way: under compute limitations you can still run an object detection model as a compressed model with a much smaller size. You can see it's a YOLOX .onnx model; it downloaded a model of roughly 217 MB, and it identifies the bounding boxes wherever it finds images, extracts them, and saves them to the path you gave. That's the beauty of unstructured: getting the images out of a PDF. Look at the images we got; fantastic, I loved it. There's the USDA logo (you get all the images, logos and whatnot), and there's the gingivitis image, which is exactly the question we asked at the start of the video. And this one, the tartar or whatever they call it, looks a bit rough, but yeah. Now, once you have the images, the text, and the rest of the PDF, what's next? We'll take a summary approach: we'll generate summaries of the text and the tables. Since infer_table_structure is true, you'll have tables; summarizing them along with the text is the better approach, I feel, because LangChain has a summarization chain we can use by default. That's one approach. The other is to skip the summary and keep the complete extracted text as the context. Let's see how to get the text and table summaries. First, an empty list for text elements: text_elements = []; we'll append to it later. Next, table elements; when you work with unstructured, the unit is the element, by the way. So we have table_elements and text_elements; now create empty lists for their summaries too: text_summaries and table_summaries. (This came out capitalized; let me make it lowercase.) Now let's write a summary prompt.
We're going to use a GPT model here; you can use another, smaller model if you want. But I believe you shouldn't always go open source, open source, open source if your client isn't asking for it. It depends on the problem you're working on, and right now no model is better than GPT-4, hands down. Forget the hype that's going on: okay, some Mistral-flavored model has outperformed on a single benchmark; that's very easy to do, because if you're training an LLM you can just overtune on those evaluation datasets. The real benchmarking is real use cases the model hasn't seen in its entire training phase. So don't go by "this model outperformed on HumanEval or MMLU" or whatever; I'm not a big fan of those evaluation benchmarks, and you shouldn't take them too seriously either if you're working for an enterprise. GPT-4 is the best model out there right now, but we'll see more open-source LLMs go mainstream, because that's how it has happened in the past as well. Now, for the summary prompt, let me create a docstring: "Summarize the following {element_type}: {element}". There's an element_type placeholder, because we have tables and text, and then the element itself. Perfect; the summary prompt is fine. Next, a summary chain; we're going to use a LangChain chain for summarization, because it supports that. summary_chain = LLMChain, and into LLMChain I pass a few things. First the LLM: it's just ChatOpenAI, with model=; for summaries I can rely on GPT-3.5 Turbo. Yes, some models beat 3.5 Turbo (Mistral, some proprietary models, if you look at Elo ratings), but not GPT-4. So: gpt-3.5-turbo, then openai_api_key=OPENAI_API_KEY, and max_tokens; let's keep it at 1024, a very standard value. Cool, the LLM is set. Now the prompt: prompt = PromptTemplate.from_template, using the PromptTemplate we imported at the top, and into from_template I pass my summary prompt, as simple as that. You can also use the LangChain Expression Language, LCEL; I have a detailed video on LCEL as well. If you're following the newest LangChain update, 0.1, I recommend LCEL for chains that are more readable and easier to scale. So let's get the summary prompt in there and run this. Okay, the summary chain is done.
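A minimal sketch of that summary chain, assuming the legacy LLMChain interface described above:

```python
summary_prompt = """
Summarize the following {element_type}:
{element}
"""

summary_chain = LLMChain(
    llm=ChatOpenAI(
        model="gpt-3.5-turbo",          # a cheaper model is fine for summaries
        openai_api_key=OPENAI_API_KEY,
        max_tokens=1024,
    ),
    prompt=PromptTemplate.from_template(summary_prompt),
)
```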
What happens next is a for loop over raw_pdf_elements, the variable we declared above, with two conditionals: one for the composite elements, which are the chunked text in unstructured, and the other for the tables. So: if "CompositeElement" is in the element, that is, if "CompositeElement" in repr(e), it's a text element, and we do text_elements.append(e.text). (The code suggestions aren't working in Colab tonight, which is surprising; they're usually pretty fast.) Then the summary: summary = summary_chain.run(...). Note that in the latest LangChain update, run has been deprecated; you can't use it there. I pass element_type, which here is simply "text", and then element=e. Perfect; that's my summary, and I append it to the text summaries: text_summaries.append(summary). That branch is done. Next an elif: the composite element is done, so if "Table" is in repr(e), then table_elements.append(e.text) (the spelling was wrong there). Then the summary again for this conditional: summary_chain.run with element_type="table" and element=e. Fine; then table_summaries.append(summary).
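And a compact sketch of the loop itself; the repr-based type check follows how the walkthrough distinguishes unstructured's element types:

```python
for e in raw_pdf_elements:
    if "CompositeElement" in repr(e):      # chunked narrative text
        text_elements.append(e.text)
        text_summaries.append(
            summary_chain.run(element_type="text", element=e)
        )
    elif "Table" in repr(e):               # tables detected by unstructured
        table_elements.append(e.text)
        table_summaries.append(
            summary_chain.run(element_type="table", element=e)
        )
```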
Hopefully we've written this right; I think so. Let's run it. This might take a bit of time; it depends on how big your PDF is, basically how much text unstructured extracted. If it's big it takes longer, but for me it's already done, because there are only a couple of pages. Next we get the image summaries by passing the images to the GPT-4 Vision preview model. I've already written that code and I'll just explain it, to save some time, because I've covered the same thing in a couple of other videos. Let me grab it from the gist (raw, Ctrl+C), paste it, and explain what we're doing. If you look closely: we have image_elements, which starts as an empty list. We have an encode_image function because, as I said at the beginning, GPT-4 takes a base64 string when you pass it an image (you can also pass a URL directly if you have one handy; that's another input GPT-4 accepts). Then we have summarize_image. Inside it there's a system message telling the model: you are a bot that is good at analyzing images related to a dog's health, because that's what the data is about; a bit of instruction as a system prompt. The human message carries a text part, "describe the contents of this image", plus the image URL, where you pass the encoded image you got from encode_image. Then the response: we're using gpt-4-vision-preview, the OpenAI key has been passed, max tokens and so on, and you take response.content.
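A minimal sketch of those two helpers, assuming LangChain's chat-message interface for GPT-4 Vision; the message shapes have shifted between versions, so treat this as illustrative rather than the author's exact code:

```python
from langchain.schema.messages import HumanMessage, SystemMessage

def encode_image(image_path: str) -> str:
    """Read an image file and return it as a base64 string."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def summarize_image(encoded_image: str) -> str:
    """Ask GPT-4 Vision to describe one image, with a domain-specific system prompt."""
    chat = ChatOpenAI(
        model="gpt-4-vision-preview",
        openai_api_key=OPENAI_API_KEY,
        max_tokens=1024,
    )
    messages = [
        SystemMessage(content="You are a bot that is good at analyzing images "
                              "related to a dog's health."),
        HumanMessage(content=[
            {"type": "text", "text": "Describe the contents of this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"}},
        ]),
    ]
    return chat.invoke(messages).content
```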
That's what the function gives you: the summarized image. Then we take the output path, look for the .png, .jpg, and .jpeg extensions, and build image paths from all the images we extracted. That's what the for loop does: it goes into the image output folder, takes the images one by one, creates a summary for each, and appends to the two lists we have here, image_elements and image_summaries. Let's run it and see what happens; it will take a bit of time, because there are around 10 to 12 images to process. After this, we create the vector store, and for that we're using FAISS. Let me get the code from the gist again; this part is very easy, and if you've worked with RAG you've done it n number of times. Raw, Ctrl+C, paste, and I'll explain what's happening inside. We're creating the documents for the vector store: again the same kind of empty list, and we iterate over the text elements and text summaries. Each element and summary gets a unique ID, and we build metadata; you can see id, type, and original_content, because in the output we also need the original content back, for example to fetch the image. Then we append the Document. The same pattern repeats for the table summaries and table elements, and again for the images and image summaries. So, if you look carefully, we create documents for text, for tables, and for images, and they all go into one vector store. There are n number of approaches here; you could also create three different collections, three vector stores, if you want. That's again something for you to try out, trial and error, to see which performs better. You can see it ran successfully. Once the documents exist, we say vector_store = FAISS.from_documents(documents=documents, embedding=...), passing the embedding model, and we run it. Let's run this and see; it again takes a bit of time, and it's done. Now let's save it, so you can persist it on disk: vector_store is the variable I created, and I'll use the save_local function, passing "faiss_index". If we refresh (let me minimize the images folder) you can see the vector store we've created, and we're going to use this same vector store in the FastAPI app, because I'm on a Windows machine and I'm not going to install unstructured in my local environment. You could also do it on the fly; it's easy, just follow the same notebook and write a couple of functions. But I already have the vector store, so I'll just do inference against it in my FastAPI app. Let's try it out. Next, we define the embeddings separately.
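A minimal sketch of the document construction and indexing just described. The metadata keys mirror the walkthrough; using uuid4 for the IDs is my assumption:

```python
import uuid
from langchain.schema.document import Document

documents = []

# One Document per summary; the metadata keeps the original content so the
# app can return the raw text, table, or image alongside the answer.
for originals, summaries, doc_type in [
    (text_elements, text_summaries, "text"),
    (table_elements, table_summaries, "table"),
    (image_elements, image_summaries, "image"),  # image_elements holds base64 strings
]:
    for original, summary in zip(originals, summaries):
        documents.append(Document(
            page_content=summary,
            metadata={
                "id": str(uuid.uuid4()),
                "type": doc_type,
                "original_content": original,
            },
        ))

vector_store = FAISS.from_documents(
    documents=documents,
    embedding=OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY),
)
vector_store.save_local("faiss_index")
```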
So: embeddings = OpenAIEmbeddings, and I pass openai_api_key=OPENAI_API_KEY. (We probably could have skipped that by setting os.environ, or using os.getenv, at the top; but it's fine.) Now let's load it; I'll call it db. My offline task is done. Think of this as batch processing: you have hundreds of documents, you create a vector store, index it, and save it with whatever you use (if you're using Elasticsearch, you'd index it there). You've built a vector store, a vector database, offline, from all of your documents; that's fine if your data isn't changing every minute or hour, and if your data flows through a streaming pipeline you can also refresh it dynamically on a periodic basis. But imagine you have a thousand PDFs or other data: you build the vector database offline and your task is done. You just utilize that vector database and make a connection, whether it runs inside Docker or sits on disk. That's what we do here: db = FAISS.load_local, not plain load_local, because we want FAISS. You pass the folder name, which is faiss_index, and the embedding function, so let's pass embeddings. Once you run it, evaluating db shows a FAISS object; you can see it there. Now I'm going to write a prompt template, and a few things go into it. First, role-based prompting, instructions, and all of those things: "You are a vet doctor and an expert in analyzing a dog's health. Answer the question based on the following context, which can include text, images and tables." Then you give the context; my context variable goes here. Once the context is done, the next is the question: question is a variable we define the same way, and both go into the input variables. Let's also say: don't answer if you are not sure; decline to answer and say "Sorry, I don't have much information about it." And then: just return the helpful answer, in as much detail as possible, and the answer comes at the end. That's my prompt template. Once the prompt template is written, I'll create a QA chain. You could use LCEL to create it, but I'll go with the legacy LLMChain: inside the chain I pass llm=ChatOpenAI, with the gpt-4 model for better output, then openai_api_key=OPENAI_API_KEY, then max_tokens=1024, the very standard value, and then the prompt.
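Pulling that together, a minimal sketch of the load-and-query setup; the prompt wording is paraphrased from the walkthrough:

```python
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

# Recent FAISS wrappers also require allow_dangerous_deserialization=True here.
db = FAISS.load_local("faiss_index", embeddings)

prompt_template = """
You are a vet doctor and an expert in analyzing a dog's health.
Answer the question based only on the following context, which can
include text, images and tables:
{context}
Question: {question}
Don't answer if you are not sure; decline and say
"Sorry, I don't have much information about it."
Just return the helpful answer in as much detail as possible.
Answer:
"""

qa_chain = LLMChain(
    llm=ChatOpenAI(model="gpt-4", openai_api_key=OPENAI_API_KEY, max_tokens=1024),
    prompt=PromptTemplate.from_template(prompt_template),
)
```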
(A nice thing about LCEL, by the way, is that it also has output parsers by default, so you can use those as well.) So: prompt = PromptTemplate.from_template, and you pass your prompt_template into it. Run this, and the QA chain is ready. Now, I've written a function that gets you the output; it does a bit of post-processing, so let me go back to my gist and grab postprocessing.py. I'll open raw and explain: it just handles the metadata and such to assemble the output, nothing else. If you look here, the vector_store reference changes to db, because we now have a db variable loaded from disk. We have a context, which is basically the retrieved chunks, the similar paragraphs or whatever you call them, and the relevant images in a list. If there are no images, the list is empty; if there are, they get returned, and you always take the first image, the one with the highest confidence score. The loop says: for d in relevant_docs, look at d.metadata["type"] (text, table, or image, whatever is possible), build the context accordingly, then pass context and question into qa_chain.run and get the result plus the relevant images. You can get the tables back the same way. Let me run this. Once it's run, I do result, relevant_images = answer(...); those are the two things we need from the function, and inside I ask a question, something like "what is gingivitis". It's not guaranteed you'll always get a result, though; it depends on the vector store as well, how you created it. Maybe you created it previously and got an answer then but not now; so don't immediately say "I'm not getting the answer, something's wrong". Sometimes you might not get one, depending on how your vectors are placed in the n-dimensional space. Now the result: you can see it says gingivitis is a type of periodontal disease that primarily affects the gums, leading to inflammation, irritation, and potential infection. Fantastic answer; we got our answer, but what about the images? Let me check relevant_images and see if I got any. When I run it, you can see I have images: three different retrieved images. It's a list, so you can just index into it: relevant_images[0] (the index starts from zero), and you can see this is the image we have. How do you render it? If you're in a notebook, you use IPython display.
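A minimal sketch of that retrieval helper, under the assumption (not necessarily the author's exact post-processing) that image summaries feed the context while the raw base64 images are returned separately:

```python
def answer(question: str):
    """Retrieve relevant chunks, assemble a context, return (answer, images)."""
    relevant_docs = db.similarity_search(question)
    context, relevant_images = "", []
    for d in relevant_docs:
        if d.metadata["type"] in ("text", "table"):
            context += d.metadata["original_content"] + "\n"
        elif d.metadata["type"] == "image":
            context += d.page_content + "\n"   # the image's text summary
            relevant_images.append(d.metadata["original_content"])  # base64 string
    result = qa_chain.run(context=context, question=question)
    return result, relevant_images

result, relevant_images = answer("What is gingivitis?")
```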
So: display.Image, and inside it you decode the base64 string with base64.b64decode (I don't know why it's not suggesting it), passing relevant_images[0]. Let's run it and see if it gets you the gingivitis image; fantastic, there it is. Look at what we did here, guys: we got the text as the result, and then we got our relevant images separately. Text and relevant images, both. On the front end, if you've built an application or a chatbot, you can create one function that takes both the text and the image and returns them to the end user. That's what we did; now let's build this app locally in a FastAPI setup. For that, let me create a new folder. Actually, I don't have to, because it's already created; I showed you the demo, so I'll just walk you through it, because it's very easy. Now I'll explain the code. We have our faiss_index, the vector store we created and downloaded into this folder, and we've created a virtual environment, and I'm running everything within it. As I said, you need FastAPI. If you don't know what FastAPI is: it's a Python framework that helps you build REST APIs, microservices, and whatnot; a very scalable framework. Tiangolo is the creator of FastAPI; I love the guy, by the way. All these imports are the same as in the Google Colab notebook, and the entire code will be available on the GitHub repository; you'll get every line of code there, and I'll put the link in the description. What we're doing: from fastapi we import all the FastAPI things we need; we use Jinja2 templates for the HTML (I'm not a front-end guy, but I can explain a bit); and we have the JSON encoder, because we need to pass JSON output to the front end. I initialize the app, app = FastAPI(), with a Jinja2Templates instance whose directory name is "templates", which contains index.html. In index.html we have a lot of CSS styling; if you come down, there's a container (we're using Bootstrap 5.3, so it's a responsive app) with the multimodal RAG app markup, a loading indicator, and some CDN scripts for the JavaScript. And here's the key bit, a fetch call: let response = await fetch('/get_answer'). This is the only place you'd have to make a change. get_answer is nothing but the endpoint of the API we created in FastAPI: method POST, taking form data, because we're collecting from a form; when you submit input, it comes in as form data.
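A minimal sketch of that endpoint wiring, assuming the helper pieces defined earlier; the route and field names follow the walkthrough, while the CORS details are omitted for brevity:

```python
from fastapi import FastAPI, Form, Request
from fastapi.responses import JSONResponse
from fastapi.templating import Jinja2Templates

app = FastAPI()
templates = Jinja2Templates(directory="templates")

@app.get("/")
async def index(request: Request):
    # Serve the single-page front end.
    return templates.TemplateResponse("index.html", {"request": request})

@app.post("/get_answer")
async def get_answer(question: str = Form(...)):
    # Reuse the retrieval helper from the notebook; db and qa_chain are
    # assumed to be loaded at startup from the saved faiss_index folder.
    result, relevant_images = answer(question)
    return JSONResponse({"result": result, "relevant_images": relevant_images})
```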
Then we get our answer back over here, along with the relevant images, and render them; it's a very simple HTML page. On the app side: we've configured CORS, which helps when you're working on production applications; just ignore it for now. The OpenAI API key comes from the .env file I created (you'll probably need python-dotenv for that). We've defined our embedding model; you could also use CLIP, but that's for the next video. The DB is loaded locally (you can see the faiss_index folder), with the same prompt template you saw in Colab and the same QA chain. One route is for index.html, and this is the same answer function we wrote, with just a couple of lines changed, which I'll point out: the function gets a question, a string coming from a form setup (so, str and Form), then uses the relevant docs the same way as before, and finally returns a FastAPI JSONResponse, because we have to send the result to the front end, where we use vanilla JavaScript to pick up the output. To run it, let me open a terminal here; give me a minute. I cd into the project, activate the virtual environment I've already created, and run uvicorn app:app. You can also pass --reload, but I'm not doing that. I'll ignore all these new warnings (they say you should upgrade to the new version of LangChain), then go to localhost:8000. Ask the same question, "what is gingivitis"; I'll also ask a question where there's no image, so you can see what happens. Meanwhile, let's ask a question here too, because we have it handy. Let me ask "what is tartar" and see what it gets. You can see it gives the answer: tartar, also known as dental calculus, is a hard, crusty deposit that forms on the teeth; it's a result of plaque, a sticky film of bacteria (it's an infection that happens in dogs), and so on. It has given you the text answer. Sometimes it might not give you the image but only the text; that's something you have to evaluate. If you click on relevant images, in this case it has given you the relevant images; let's check which one, because you can also print out the similarity scores. You can see this is the tartar, which is the right one; it got the answer right, which is fantastic. We got gingivitis earlier; now let's ask "tell me about tartar" and see if it answers here, because that's a slightly different vector store I've been using, just to show you the capability. I hope you got the idea; it's very interesting. See, a fantastic answer; I loved it. You got the tartar, you got the answer. Whatever images we've extracted, if you ask questions related to them, it will return the answers. That's all; I think this is a very interesting tutorial, and I hope you now understand how you can build a multimodal RAG app. You can see it's a full app, and we're going to use the same UI, the same front-end screens, with an open-source LLM; we're going to use CLIP and LLaVA
to do that in the next video, and we'll also deploy this on AWS EC2 in the third video. So please stay tuned for the next couple of videos on multimodal RAG. I hope you liked this video; the code will be available on my GitHub repository. If you like the content, please hit the like icon, and if you haven't subscribed to the channel yet, please do subscribe, and share the video and the channel with your friends and peers. If you have any questions, thoughts, or feedback, please let me know in the comment box. You can also reach out to me through my social media channels; you'll find that information in the channel banner and the channel's About section. That's all for this video, guys. Thank you so much for watching; see you in the next one.
Info
Channel: AI Anytime
Views: 9,691
Keywords: ai anytime, AI Anytime, generative ai, gen ai, LLM, RAG, AI chatbot, chatbots, python, openai, tech, coding, machine learning, NLP, deep learning, computer vision, chatgpt, gemini, google, meta ai, langchain, llama index, vector database, multimodal RAG, Multimodal, Multimodal RAG, chat with videos, chat with video, chat with image, multimodal llama index, multimodal RAG langchain, GPT4V, GPT-4V, GPT 4 Vision, Chat with image, llava, idefics, medical, chatbot, health chatbot, doctor, mergekit
Id: fbbFrCfaF0w
Length: 48min 54sec (2934 seconds)
Published: Sun Jan 21 2024