Unlocking RAG Potential with LLMWare's CPU-Friendly Smaller Models

Video Statistics and Information

Captions
Hello everyone, welcome to the AI Anytime channel. In this video we are going to explore how a small language model can really perform well, mainly for RAG implementations, and we will also cover why domain-specific embedding models are better than general-purpose embedding models. We are going to implement RAG with smaller, domain-specific models that you can run on compute-limited hardware: a single GPU, for example, or even a CPU-only machine. When I say smaller language models, don't think 200 or 300 million parameters; that is not going to work at this moment. The scale has to be more than 1 billion parameters; models above roughly 1 billion have performed better, and that's what we have seen. There are some exceptions, though: LaMini, with roughly 768 million parameters or so, has performed well for summarization-type tasks.

In this video I'm going to look particularly at LLMWare. On my screen you can see the demo we are going to build, an insurance RAG demo for the insurance industry, and we are going to use the insurance-specific embedding model that LLMWare has created. LLMWare is an organization contributing really good work to the open-source community. On Hugging Face you'll find the LLMWare page, and they have a GitHub repository as well (I'll put the links in the description). They are doing good work: they have around 20 open-source models, mostly under the Apache 2.0 license so you can use them commercially, and they also have datasets you can use, for example an instruction RAG benchmark dataset for evaluating the RAG systems you build, along with some industry-specific models. They have BERT-based embedding models for contracts, SEC filings, asset management, and insurance. A BERT-based embedding model helps you extract features, which is the retriever part, the first stage of your RAG implementation.

Now you can ask: why use this model and not a general-purpose one? There are a lot of trade-offs when you work with RAG, and these are things you should know. It's very easy to build a RAG demo, two lines of code, but when you work for an organization, for an enterprise, a lot more is involved: cost is one factor, and security and compliance are others. Not everybody will use a closed-source model; some organizations prefer open source, with everything on premise, within their own infrastructure and network. If you work for that kind of organization, this video is helpful. If you only work on hobby or pet projects, then honestly, don't waste your time watching this one. This video is about understanding enterprise requirements: use a domain-specific model that helps you build a better retriever, then use smaller models targeted at one very specific use case, like asset management or underwriting risk. These are the things I'm going to cover in this video.

Now, looking at their page, they have the DRAGON models (Llama-, RedPajama-, and StableLM-based language models, mostly 6B and 7B, including a Deci 6B version 0), and then they have the BLING models. I'm going to cover BLING, specifically the BLING Sheared-LLaMA model; I'm not sure what "sheared" means there, by the way. BLING is an acronym for "Best Little Instruction-following No-GPU-required," which is really intuitive and funny as well: it says you don't need a GPU to run inference with this model, and that's exactly what we're going to look at.

Let me ask a question in the demo I'm already running. I've asked "What is group life insurance?", you can see it says "generating response," and I get my answer: group life insurance is a type of insurance sold to businesses that want to provide life insurance for their employees; it can also be sold to associations, and so on. Below that we have the top two documents (top-k equals 2), and you can see it gets it right: "most group contracts..." And this, by the way, is the source document we are using, a "Glossary of Common Insurance Terms," essentially a list of the terminology used in insurance; we are going to create a vector store out of it.

The embedding model we'll cover is industry-bert-insurance, again by LLMWare. The model card says it is part of a series of industry fine-tuned Sentence Transformer embedding models. I've already covered domain fine-tuning in my Meditron video (link in the description), where I used PubMedBERT, a medical embedding model that performed better than general-purpose models and is smaller in size as well. The card describes this one as a domain fine-tuned, BERT-based, 768-dimensional Sentence Transformer model intended as a drop-in substitute for embeddings in the insurance industry domain. Fantastic, right? When you work with deep learning on natural language, it all boils down to weights. When you fine-tune a model on a particular domain or industry corpus like insurance, the model understands the taxonomy better than a general-purpose model whose training corpus contains less in-domain text, and that is why this matters: domain-specific keywords get weights that reflect the domain, so in similar use cases the vectors for related terms will be very close, and the model will return similar vectors for a similar search.
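To make that "similar vectors" point concrete, here is a minimal sketch (not from the video) that embeds a few sentences with the domain-tuned model and compares them with cosine similarity. The model id is my reading of the industry-bert-insurance repo referenced above, so verify the exact name on the LLMWare Hugging Face page:

```python
# A minimal sketch, assuming the Hugging Face id of the insurance embedding
# model is "llmware/industry-bert-insurance-v0.1" (verify on the llmware page).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("llmware/industry-bert-insurance-v0.1")

sentences = [
    "What is group life insurance?",
    "Group contracts are sold to businesses that insure their employees.",
    "The weather is nice today.",
]
embeddings = model.encode(sentences)  # one 768-dimensional vector per sentence

# A domain-related pair should score noticeably higher than an unrelated one.
print(util.cos_sim(embeddings[0], embeddings[1]))  # related pair
print(util.cos_sim(embeddings[0], embeddings[2]))  # unrelated pair
```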
So this is very important. Now let's start building the app. The Insurance RAG demo you saw is going to be a Streamlit application using LLMWare models. The LLM is a 1.3-billion-parameter Llama-family model; compare it with Llama 7B and that is more than an 80% reduction in size, in parameters as well as in compute memory, which is why you can easily deploy it and even run inference on a CPU machine. We'll combine both LLMWare models today: the industry-bert-insurance embedding model and the BLING Sheared-LLaMA 1.3B LLM.

All right. To build this RAG implementation with the LLMWare embedding model and LLM, first create a folder (you can see mine is called "domain specific demo") and open it in a terminal. I already have a virtual environment; if you don't, create one with `python -m venv .venv` (you can also use conda or virtualenv). To activate on PowerShell you run the activate script under `Scripts`; on Linux you go to `bin` instead and run `source activate`. Then open the folder in VS Code. One thing to note: I'm using a Hugging Face API token (of course I will delete it after the video) because I'm calling the language model through the Hugging Face Inference API; I'll show that in a bit.

First, create a file called ingest.py. This is where we'll write the code for creating the vector store using the domain-specific LLMWare embedding model. I have a single data file; you can keep multiple files. If your document store or database is too big, you'll need a proper vector database rather than a local vector store: Chroma and FAISS are not that good when it comes to implementing solutions at scale, so look at Weaviate, Milvus, Pinecone, and the like.

Install everything first: torch, sentence-transformers, streamlit, langchain, chromadb, pypdf, huggingface_hub, and python-dotenv. All of this will be available in my GitHub repository, so you don't have to worry about it.

Now the imports in ingest.py. `import os` (I'm not sure I'll need it), then `from langchain.text_splitter import RecursiveCharacterTextSplitter`; I'm going to use the recursive character text splitter. Next I need a document loader, because my data sits in a directory: `from langchain.document_loaders import DirectoryLoader`. (If you have unstructured data, I recommend the unstructured library; go to my RAG playlist, I have more than 25 videos on RAG, and if you go through all of them you'll be a RAG expert.) Then `from langchain.vectorstores import Chroma`. For embeddings I'm not going to use the HuggingFaceEmbeddings wrapper here; I'll load this embedding model through sentence-transformers, so `from langchain.embeddings import SentenceTransformerEmbeddings`. And since our source is a PDF file, add `PyPDFLoader` to the same document_loaders import; no need to write a new line. Add a quick `print("successfully loaded")`, run `python ingest.py`, and if the message prints you've got the imports right; then remove the print.

Next, the embedding model: `embeddings = SentenceTransformerEmbeddings(...)`, passing the Hugging Face id of the insurance domain-specific embedding model. You can print the embeddings object if you want to look at the pipeline of this embedding model. Then we need a loader variable to load the files: a DirectoryLoader pointed at my `data` folder with a glob of `**/*.pdf` (if you have CSV, Word, docx, whatever, define the pattern like that; it's the better way of doing it), `show_progress=True` so we can track it in the terminal, and `loader_cls=PyPDFLoader` so the directory loader uses the PyPDF library to load and extract the data. Then `documents = loader.load()`; this is pretty much what you'll find in the LangChain documentation.

Once the documents are loaded, define a text splitter with a chunking strategy. Chunking is trial and error: you keep trying and see which strategy works well for you. Let's keep chunk_size around 700 and chunk_overlap at 70; I always keep the overlap at one tenth of the chunk size. Then `texts = text_splitter.split_documents(documents)`, passing in the documents, and print the result to check it.

Now the vector store. I'm using Chroma here, but you can use any other: `Chroma.from_documents(...)`, passing the texts and the embeddings, plus a collection_metadata of `{"hnsw:space": "cosine"}` because I want cosine similarity (the supported spaces include inner product, l2, and cosine; I've explained this a lot in my previous videos, which I recommend watching if you have some time). I'm also going to persist this on disk with a persist_directory, because I don't want to create embeddings at runtime: let's use a `stores` folder with an `insurance_cosine` subfolder. Finally, print "Vector store created."

Let me quickly explain what we are doing here, in case you're not familiar with the retrieval part. We've imported all the required libraries, and we have an embedding model by LLMWare, by Ai Bloks. I like what they are doing; I like people who work to strengthen the open-source community, because open source will win. Look at how AI has evolved: we still use BERT, we still use Transformers, and all these LLMs, even the closed-source ones, are based on the Transformer, which is an open-source idea. People like Yann LeCun advocate for exactly this, so let's back teams like LLMWare so they can build more LLMs, more embedding models, and more AI products for us. So: we have the embedding model; a directory loader that loads the files; a recursive character text splitter (LangChain also provides a plain character text splitter, a JSON splitter, tiktoken-based splitters, and so on, which you can use depending on your use case) with a chunk size of 700 and an overlap of 70; then we split the documents and create a vector store, persisting it with collection metadata.

Now let's run it with `python ingest.py`. And it errors: "positional argument: 2 were given." I forgot to pass model_name as a keyword argument. That feels a bit unfortunate (with a single string it could arguably assume model_name), but there are other arguments and kwargs as well, so the error makes sense. Fix that and run again; the first time it will take a while, because it downloads the model for you from Hugging Face.
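For reference, here is a condensed sketch of the ingest.py we just walked through. It uses the classic `langchain` import paths current at the time of the video (newer releases move these into `langchain_community`), and the model id and the `data/` and `stores/insurance_cosine` paths follow what's shown on screen, so adjust them to your setup:

```python
# ingest.py: a condensed sketch of the ingestion script built above.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.vectorstores import Chroma
from langchain.embeddings import SentenceTransformerEmbeddings

# Domain-specific embedding model by LLMWare (insurance industry).
embeddings = SentenceTransformerEmbeddings(
    model_name="llmware/industry-bert-insurance-v0.1"
)

# Load every PDF under data/ using the PyPDF loader.
loader = DirectoryLoader(
    "data/", glob="**/*.pdf", show_progress=True, loader_cls=PyPDFLoader
)
documents = loader.load()

# Chunking is trial and error; 700/70 keeps the overlap at one tenth
# of the chunk size, as used in the video.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=700, chunk_overlap=70)
texts = text_splitter.split_documents(documents)

# Build the vector store with cosine similarity and persist it on disk
# so the embeddings are not recomputed at runtime.
vector_store = Chroma.from_documents(
    texts,
    embeddings,
    collection_metadata={"hnsw:space": "cosine"},
    persist_directory="stores/insurance_cosine",
)
print("Vector store created")
```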
You can see it downloaded the model for me, with the transformer and pooling modules it uses, and once it finishes it says "vector store created." So our vector store has been created, and you can see it over here: a `stores/insurance_cosine` folder containing a SQLite database. Six months back Chroma was using DuckDB with Parquet; they have since migrated from DuckDB to SQLite, which is why you see this new Chroma DB structure. Inside you have everything: the bin files, the n-dimensional spaces where your vectors are stored, along with the metadata. Our vector store now lives inside this folder, and the ingestion is done.

Once ingestion is done, let's create retrieve.py and check that our retriever is working fine. I'll quickly copy the imports from ingest.py and keep only the two we need: Chroma and the embeddings. We need the embedding model again because when you submit a query, the corresponding embedding has to be computed and passed to the vector store, which then retrieves the similar chunks for you.

Let's load the vector store: `load_vector_store = Chroma(...)`. Note that you only use `from_documents` when you are creating the store; once it is persisted on disk, you just instantiate Chroma directly. Inside it, pass the persist_directory we are using as the vector store, `stores/insurance_cosine` (you can check the folder name), and the embedding function, which is your model; the parameter is called `embedding_function` in the Chroma class.

With the vector store loaded, define a query. I'm going to use the same query I have in my notepad, the one I was showing you; you can use any other question from your data, I've just pre-copied this one to save some time. Then `docs = load_vector_store.similarity_search_with_score(query=query, k=3)`: a similarity search that also returns a score, asking for the top three documents. Then a for loop to make it more readable: `for i in docs:`, unpack `doc, score = i` (Python is fun, right?), and print some key-value pairs: the score, the content, which is `doc.page_content`, and the metadata, which is `doc.metadata` (not page_metadata, just metadata).

Let's see if this works; if it doesn't, I'm doing something wrong. Run `python retrieve.py` (let me check the spelling: r-e-t-r-i-e-v-e). I'm expecting it to return something, and you can see we got our response. Let me bring it into the notepad and make it a bit bigger so you can read it, because we have the top three documents. Fantastic: it says most group contracts are sold to businesses that want to provide life insurance for their employees, that group life insurance can also be sold to associations to cover their members, and to lending institutions to cover the amounts of their debtors' loans. And this is exactly what we see in our data. Let me open the glossary and jump to G: there it is under "group life insurance," with most group contracts sold to businesses that want to provide life insurance, members, and so on. It's working fine. The embedding model has done its job: even a very small embedding model understands the taxonomy and the terminology of the insurance industry.
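Here is a minimal sketch of that retrieve.py smoke test, under the same assumptions about the model id and store path as above:

```python
# retrieve.py: a minimal smoke test for the retriever built above.
from langchain.vectorstores import Chroma
from langchain.embeddings import SentenceTransformerEmbeddings

embeddings = SentenceTransformerEmbeddings(
    model_name="llmware/industry-bert-insurance-v0.1"
)

# from_documents is only for creation; a persisted store is opened directly.
load_vector_store = Chroma(
    persist_directory="stores/insurance_cosine", embedding_function=embeddings
)

query = "What is group life insurance?"

# Top-3 chunks, each returned together with its similarity score.
docs = load_vector_store.similarity_search_with_score(query=query, k=3)

for doc, score in docs:
    print({"score": score, "content": doc.page_content, "metadata": doc.metadata})
```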
Now let's go back and write app.py. I'm going to copy a lot of things here, because I don't want to write the same code again and again. From one of my 25 other RAG videos I'll copy a simple prompt template, and I'll copy the load-vector-store snippet as well. So what we're doing: a `PromptTemplate` with `template=template` and `input_variables` of context and question; loading the Chroma vector store; then `import streamlit as st`. Next I'll wrap things in a function, a `qa_chain` function (there were some indentation problems, let me fix those), which builds and returns the QA chain. I'll explain it from top to bottom.

Now the LLM. From `langchain.llms` I'm going to import `HuggingFaceHub`; I'll show you how to get the Inference API. Look at the model we're going to use: it's around 80% smaller in size and gives you equivalent performance to 7B models, and that is why I'm using it. These LLMs are fine-tuned for RAG, so if you want to build a RAG system you can just use them. It's available through inference endpoints and also through TGI (text generation inference), and if you click Deploy and then Inference API on the model page, it shows you how to call it with an API token. You need a token from Hugging Face: go to your account settings, and if you don't have an access token, create one. Make sure it's a read-only token; write access is not required for this purpose.

Let me quickly set the page config for Streamlit: `st.set_page_config(page_title="Insurance RAG Demo")` and `st.title("Insurance RAG Demo")`. I also need dotenv, since I installed python-dotenv: `from dotenv import load_dotenv`, and `os` as well, because we have to read the environment variable. RetrievalQA is required too, so `from langchain.chains import RetrievalQA`; I think we're fine with that. Call `load_dotenv()`, then `os.getenv(...)` with the token variable name; I'll just copy the variable over, since I'm going to delete the token anyway. Now your API token has been loaded and your prompt template is set. You can increase the complexity of the prompt template to avoid hallucinations or make the output more structured, and you can also specify some examples in it just to retrieve better.

Next the embeddings: the same SentenceTransformerEmbeddings with the model name, taken from retrieve.py (that's why I wrote it that way). Then load the vector store with `Chroma(...)`; this snippet came from a previous repository, so change the store name to ours, `insurance_cosine`, and pass `embedding_function=embeddings`. Then the retriever: `retriever = load_vector_store.as_retriever(search_kwargs={"k": 2})`, using LangChain's as_retriever; search_kwargs is where you set the top-k documents and some of the other arguments you can define. Here I'm only looking at the top two documents. The retriever is done.

Now define a repo_id for the model, the LLM by LLMWare that you saw: BLING Sheared-LLaMA 1.3B, the first version of the model, 0.1. After the repo_id, `llm = HuggingFaceHub(repo_id=repo_id, model_kwargs={...})`. In model_kwargs you set your inference parameters. Note that these are inference parameters, not hyperparameters; hyperparameters are what you deal with when you fine-tune or train a model, not when you run inference. These are the smaller things you should know. You can set temperature, top-p, top-k, max new tokens, and so on. I'm passing a temperature of 0.3: if you keep it higher, the model tries to be more creative and might hallucinate a bit, but temperature is not strongly associated with hallucination. I see a lot of people in industry say "your temperature is higher, so the model hallucinates," and it's not like that; it has some role to play, but hallucination is a complex topic. For max_length, let's keep a smaller value of 500. If you want very long, detailed responses, this model is not for you; it's good for a chatbot or conversational interface.

After that, define chain_type_kwargs, which is nothing but your prompt: `{"prompt": prompt}`, the prompt template we defined earlier. Then `qa = qa_chain()`, and we'll use this QA chain in our Streamlit code. Let me write a main function, `def main():`, and the `if __name__ == "__main__": main()` guard (and of course the tab completion isn't suggesting it now that I need it). Inside main, the first thing is the text query, a text area: `text_query = st.text_area("Ask your question", height=100)`; 100 is fine for the height. Then a generate-response button: `generate_response_btn = st.button("Run RAG")`, for run retrieval-augmented generation. Then `st.subheader("Response")` so we already have a response subheader. Next: `if generate_response_btn and text_query:`, meaning the button has been clicked and there's a value inside text_query (you could also check `len(text_query) > 0`, or that text_query is not None). Inside, a spinner that spins with `st.spinner("Generating response...")`, and within it `text_response = qa(text_query)`, using the QA chain we built above. If there's a text_response, do `st.write(text_response)` and `st.success("Response generated")`; if not, just give an `st.error("Response not generated")`, as in "hey, I'm not able to generate the response." (Let me fix the variable naming as I go, by the way, since you'll be using the same GitHub repository.)

We're done with this. Let me quickly recap what we are doing. We import all the libraries; we use Streamlit, set the page config, and load the envs from the .env file. We have a prompt template with a couple of input variables: the context, which comes from the retriever part, from the vector database, and the question that the user asks. Both get combined in your prompt, and it goes to the language model to generate a human-like response. We have our embeddings, the LLMWare industry-bert-insurance model; we load the Chroma vector store from its persisted directory; we build a retriever with search kwargs; and the LLM is the Sheared-LLaMA BLING model via the Hugging Face Inference API, where no GPU is required. You can also download the model locally and run it through a Transformers pipeline or AutoModelForCausalLM, since it's a text-generation task; the model card shows the code for that in detail. Then we have the RetrievalQA chain with chain type "stuff", and the Streamlit bits to run it.

Let me check that we've imported everything: we have Chroma, os, RetrievalQA... we do not have PromptTemplate, so let's add it: `from langchain.prompts import PromptTemplate`. We have SentenceTransformerEmbeddings, HuggingFaceHub, RetrievalQA, and Chroma. Now run it: `streamlit run app.py`. It says "cannot assign to function call; maybe you meant ==". My bad: that one is not a method call; let me fix it. Now it's loaded. Let me close this and do a rerun; the first time it takes a bit. Let's use the same question quickly, click Run RAG, and you can see it generating the response. Fantastic: we are using the LLMWare embedding model and LLM, the query is "what is group life insurance," and it answers that group life insurance is a type of insurance that is sold to businesses that want to provide life insurance for their employees. This is great; I loved the response. Let me quickly ask one more question, "what is inflation protection," and see if it's able to generate a response.
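Here is a condensed sketch of the full app.py, under the same classic-langchain assumptions as before. The prompt text is a simple stand-in (the video copies one from an earlier project), the BLING repo id should be verified on the LLMWare page, and `HuggingFaceHub` picks up the `HUGGINGFACEHUB_API_TOKEN` environment variable loaded from `.env`:

```python
# app.py: a condensed sketch of the Streamlit RAG app assembled above.
import streamlit as st
from dotenv import load_dotenv
from langchain.chains import RetrievalQA
from langchain.embeddings import SentenceTransformerEmbeddings
from langchain.llms import HuggingFaceHub
from langchain.prompts import PromptTemplate
from langchain.vectorstores import Chroma

load_dotenv()  # expects HUGGINGFACEHUB_API_TOKEN in a .env file

st.set_page_config(page_title="Insurance RAG Demo")
st.title("Insurance RAG Demo")

# Simple stand-in prompt; tighten it to reduce hallucinations if needed.
template = """Use the following context to answer the question.

Context: {context}
Question: {question}

Answer:"""
prompt = PromptTemplate(template=template, input_variables=["context", "question"])

embeddings = SentenceTransformerEmbeddings(
    model_name="llmware/industry-bert-insurance-v0.1"
)
load_vector_store = Chroma(
    persist_directory="stores/insurance_cosine", embedding_function=embeddings
)
retriever = load_vector_store.as_retriever(search_kwargs={"k": 2})

# Inference parameters (not hyperparameters): temperature, max length, etc.
llm = HuggingFaceHub(
    repo_id="llmware/bling-sheared-llama-1.3b-0.1",
    model_kwargs={"temperature": 0.3, "max_length": 500},
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},
)


def main():
    text_query = st.text_area("Ask your question", height=100)
    generate_response_btn = st.button("Run RAG")
    st.subheader("Response")
    if generate_response_btn and text_query:
        with st.spinner("Generating response..."):
            text_response = qa(text_query)
        if text_response:
            st.write(text_response)
            st.success("Response generated")
        else:
            st.error("Response not generated")


if __name__ == "__main__":
    main()
```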
While that runs: you can also try this out for the more complex use cases where you see chances of hallucination. I wanted to test this model, and on a very initial experimentation I like both the model and the embedding model; they are very lightweight to work with and very easy to deploy as well. The code is available on my GitHub repository; you can find it there as the LLM RAG demo app. Now let me do one thing and open Streamlit Cloud. Let's sign in (it's a bit slow) and continue with GitHub; Streamlit Cloud is where you can deploy your apps. And meanwhile, you can see we got our inflation-protection answer as well: inflation protection is a type of insurance that pays a lump-sum amount of money if the insured's insurance policy is cancelled, insurance companies offer... wow, this is a fantastic response from a 1.3-billion-parameter model. I really loved it. I think the credit goes to LLMWare and the Ai Bloks team, and I'm more than happy to see the future LLMs and models they are going to build.

Now I'm going to create a new app. Let me see: what is the file name, where is the file... okay, I've pushed everything; the .env is not there, but that's fine, and we have app.py. I'll show you how, though probably not every step: this is how you can deploy if you have a Streamlit app. The domain is available in the advanced settings, you can add your secrets there, let me use Python 3.10, and let me paste the values from my .env into the secrets. I'll just try it out and see if this works. Save and deploy; maybe we'll have to add some requirements as well, but this is how you deploy an app. It gives you 1 GB of free space, so if your entire folder is bigger than 1 GB it won't fit; saving an LLM locally would of course rule out deploying like this on the free Streamlit Cloud, but since I'm using the Inference API I think I'll probably be able to deploy it. I'm not sure; let's check it out, and even if not, you've got the idea of how to deploy. You can see the app is working, and you can watch the deployment here: click "manage app" and it shows everything being installed. The problem is that you have to do a bit of a workaround here, because, for example, you just saw the NVIDIA CUDA libraries being installed, which I think we don't need; we should have just specified the CPU-only torch build so it would have installed only that. But it is fine. Anyway, you can deploy your Streamlit app here, the insurance RAG. Find the GitHub repository in the description, the models and the embedding model by LLMWare, and the LLMWare Hugging Face page; it's a fantastic repo. It says installing... successfully uninstalled... Let's wait a couple of minutes and see if it's able to deploy, because I believe it may exceed the 1 GB of space: the embedding model is around 400 MB in size, and then all the required dependencies will again be around 400-500 MB, because you have torch, sentence-transformers, and the NVIDIA packages being installed that are not required. Let's give it one more minute.
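On that note, here is a sketch of a slimmer requirements.txt; the package list follows what we installed earlier, and the extra index URL is the standard PyTorch CPU wheel index, which keeps the deployment small by skipping the NVIDIA CUDA packages (exact version pins are left to you):

```text
# requirements.txt: CPU-only torch to keep the Streamlit Cloud image slim.
--extra-index-url https://download.pytorch.org/whl/cpu
torch
sentence-transformers
streamlit
langchain
chromadb
pypdf
huggingface_hub
python-dotenv
```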
You can see it says "your app is in the oven"; I love that bit of the UX. They are presumably using some socket programming to stream the status from the build, so that once it's done it loads the app; it's a containerized application, they use containers to run it. Anyway, it's taking a bit of time. If it gets deployed successfully I will put the link in the YouTube description, and if not, I will not; please excuse me for that. This video was geared more towards covering the LLMWare models and embedding models, because a few of you requested it (I saw the comments), and I hope you loved it. I liked it a lot; you saw the performance, which is fantastic for chatbots, where you probably don't need longer responses. That's all for this video. If you have any thoughts, comments, or feedback, please let me know in the comment box. If you liked the video, hit the like icon; if you like the content I'm creating, share it with your friends and peers; and subscribe to the channel if you haven't yet. Thank you so much for watching; see you in the next one.
Info
Channel: AI Anytime
Views: 5,270
Keywords: ai anytime, AI Anytime, generative ai, gen ai, LLM, RAG, AI chatbot, chatbots, python, openai, tech, coding, machine learning, ML, NLP, deep learning, computer vision, chatgpt, gemini, google, meta ai, langchain, llama index, vector database, llmware, LLMWare, DRAGON, BLING, AI Bloks, RAG on CPU, RAG Chatbot, RAG video, llmware ai, llmware huggingface, Hugging Face, huggingface, llmware LLMs, small models, RAG production, namee oberst, CPU RAG Chatbot, vector DB, AI models for chatbots
Id: qXEUqhqjHdg
Length: 43min 40sec (2620 seconds)
Published: Sun Dec 10 2023