CPU-based SLMs for AI Agents and Function Calling by LLMWare

Video Statistics and Information

Captions
Hello everyone, welcome to the AI Anytime channel. In this video we are going to explore the SLIM models by LLMWare. SLIM stands for Structured Language Instruction Model; these are their newest innovation, fine-tuned specifically for function calling. If you are not aware of function calling, I will link a couple of videos in the description where I have already covered it. Function calling helps when you want to connect to external or internal data sources, plugins, or tools at the prompt level: the prompt an end user puts into your GenAI system is classified, and the respective function is invoked or triggered. It was popularized by OpenAI, and we have since seen function calling with both open-source and closed-source LLMs. It is especially useful when you want structured output, like JSON or another structured schema, to work with.

You may have seen LLMWare's models on Hugging Face. They specialize in RAG-driven models and small language models; they have created a number of language models, embedding models, and datasets, all available on their Hugging Face page, so they are really strengthening the open-source community. Today they released the SLIM models, and that is what we are going to look at. I will show you how you can use SLIM models to perform any number of natural language tasks: sentiment analysis, emotion detection, topic modeling, extracting tags from your text corpus, and also text-to-SQL, which I will cover in the next video.

You can perform all of these tasks on a CPU machine; all of these models are CPU-friendly, which is fantastic. These models can also be used when you want to do function calling in your LLM-powered applications, and I will give you a glimpse of that.

If you look over here (let me wear my glasses; I am a bit blind, by the way), you can see LLMWare AI on the screen, a US-based startup doing great work in the GenAI field, and their recent SLIM GGUF models. Right now they have around 49 models on their Models tab. The recent SLIM models come in two flavors: the base SLIM model, and a quantized version of it, a GGUF file in the 4-bit K_M (medium) variant. Let me open one of these, slim-sentiment, since the sentiment tool is already open in the other tab. The model card says slim-sentiment is part of the SLIM model series, consisting of small, specialized, decoder-based models fine-tuned for function calling. Slim-sentiment has been fine-tuned specifically for the sentiment-analysis function call. So if you are building an application where the end user enters some text and you need to invoke a sentiment-analysis function call, this is the model you can leverage.

All of these models are under 1 billion parameters, averaging around 660-670 million, and they are all quantized. If you go to the Files and versions tab, you will see the base slim-sentiment model is around 2.2 GB. When I say "base model", don't read that as purely pre-trained; it has been fine-tuned. The difference between slim-sentiment and slim-sentiment-tool is that the tool is the 4-bit K_M quantized GGUF version of slim-sentiment, providing a small, fast inference implementation, because when you do a function call inside an application you need low latency. It is also optimized for multi-model concurrent deployment, which is fantastic.

They have given some instructions on how to use it, but let me show you what we have built. I have built an application called "Perform NLP tasks on CPU", and I have leveraged all of their models except the SQL model, which will be a separate video on text-to-SQL. By default we have sentiment analysis, and we also have emotion detection and tag generation. Sometimes you want to extract tags from your text corpus to create rich metadata for your RAG-driven use cases, which is really important: when you enrich your metadata and combine vector search with keyword-based search like a BM25 retriever in a hybrid setup, retrieval performs even better. Tags can also be helpful for caching.
Tags help if you are building a cache-based mechanism to catch redundant or duplicate answers. Then we have topics: there are other natural-language algorithms you can use for topic modeling, but this uses a language model to do it. Then you have intent, so you can classify intents and things like that; you can also get ratings, get a category, and do named entity recognition (you may have worked your whole life with spaCy and similar models for NER). Finally there is natural language inference (NLI).

When I ran this, you can see there is a text; let me copy it and show it in Notepad. It is about Nokia: "Nokia said it would cut up to 14,000 jobs as part of a cost-cutting plan following third-quarter earnings that plunged. The Finnish telecommunications giant said that it will reduce its cost base and increase operational efficiency." (I first read "Finnish" as "finished", because I thought Nokia was finished, but no, it is the Finnish telecom giant.) Basically, they are doing layoffs. With sentiment analysis selected by default, you get a very structured response that you can use as JSON and parse depending on your application's requirements; the LLM response says the sentiment is negative, which is right.

Now if I select "identify topics" and click Analyze, I expect it to identify the topics for this text, and you can see how fast it is: it took around four to five seconds to give me the response, and I am running this on a CPU machine. It returns the topic "layoffs", which is fantastic, because the text is about layoffs.

You can also select multiple tasks. For example, if you want to perform natural language inference and the model is not there yet, it will fetch it first, so the first run takes a bit of time because it has to download the models from Hugging Face and keep them in its cache. You can see it identifies the topic "cost cutting" and performs the NLI; the logits come back as true here, though we could set that to false (by default it may be true). Let me remove NLI and bring in emotion; once I click Analyze, I expect it to also detect the emotion. All of this is running on a CPU. Imagine you are building an application that has to perform any number of natural language tasks: this is how you can use LLMWare's SLIM models. It is open source; you download it and set it up locally, and you don't have to worry about a lot of infrastructure because it runs fine on a CPU machine. For the emotion it says "sad", because we are talking about layoffs, and people are sad when you talk about layoffs; and you can see the topics as well. You can select any number of other options here, but I will stop there and show you how we built this. I also tried a notebook, which I will share as well.
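Because the response comes back as structured JSON rather than free text, downstream application logic becomes trivial. A small sketch, assuming a SLIM-style output shape like `{"llm_response": {"sentiment": ["negative"]}}` (the exact keys depend on which tool you run and on the llmware version, so verify against the model card):

```python
def route_on_sentiment(resp: dict) -> str:
    """Branch an application on a structured sentiment response."""
    label = resp["llm_response"]["sentiment"][0]
    # e.g. escalate negative feedback to a human, archive the rest
    return "escalate" if label == "negative" else "archive"

# The Nokia layoffs passage from the demo came back negative:
nokia_response = {"llm_response": {"sentiment": ["negative"]}}
print(route_on_sentiment(nokia_response))  # escalate
```

This kind of keyed lookup is exactly what you cannot do reliably with free-form LLM output, which is why the structured function-call format matters.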
First you do pip install llmware; I am on a CPU (you can see Colab CPU in the notebook) and it installs fine. Then you do from llmware.models import ModelCatalog. The ModelCatalog class exposes different functions, like get_llm_toolkit: when you call ModelCatalog().get_llm_toolkit(), it fetches all the SLIM models. There are 10 SLIM models available right now, delivered as small, fast quantized tools. Then you can run one of the tools to test it; each has some preconfigured sample text that generates a result (you can see one talking about Alaska Airlines). This is their GitHub repository, by the way; I will put all of these links in the description so you can check them out.

Now, here is how I built the app. I am not going to build the Streamlit application from scratch because it is pretty simple; I will just walk you through it. In requirements.txt you need two packages: llmware and streamlit. I created a file, llmware_module.py. (You don't need Prompt here, by the way, because we are not building full function-calling ability in this video, but you can also use that.) In that file I do from llmware.models import ModelCatalog and define nine different functions; I am leaving out text-to-SQL for the upcoming video. The first is classify_sentiment, which takes one text input (you could also define a bit of schema, e.g. declaring it a string). Inside, a sentiment_model variable uses ModelCatalog; the docs say ModelCatalog is the main class responsible for model lookup in the model card and finding the model class, and it has a lot of other functions available. I call ModelCatalog().load_model() with the sentiment tool, which fetches the 4-bit K_M quantized GGUF model from LLMWare's Hugging Face repository, and then I run it on the text, with get_logits=False and so on.

All of the functions below it stay the same; the only thing that changes (besides the function names, for readability) is the model name. I have slim-sentiment, slim-emotions, slim-tags, slim-topics, intent, ratings, category, NER (named entity recognition), and natural language inference: nine different functions. I wrote them in a separate Python file; you can keep them in the same file, but it is better to separate them, and you could also wrap them in a class if you want to follow more of an OOP approach. Then I write an app.py.
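The wrapper module described here can be sketched as below. The tool names follow LLMWare's Hugging Face naming and the calls follow what is shown in the video (ModelCatalog().load_model(...) and function_call(...)); treat the exact signatures as assumptions and check them against the current llmware documentation, since the API may differ between versions.

```python
# llmware_module.py -- sketch of the nine task wrappers described above.

# One entry per SLIM tool used in the app (SQL is deliberately excluded).
TOOL_NAMES = {
    "sentiment": "slim-sentiment-tool",
    "emotions": "slim-emotions-tool",
    "tags": "slim-tags-tool",
    "topics": "slim-topics-tool",
    "intent": "slim-intent-tool",
    "ratings": "slim-ratings-tool",
    "category": "slim-category-tool",
    "ner": "slim-ner-tool",
    "nli": "slim-nli-tool",
}

def run_tool(task: str, text: str) -> dict:
    # Imported lazily so this module can be read without llmware installed.
    from llmware.models import ModelCatalog

    # load_model fetches the 4-bit K_M quantized GGUF tool from Hugging Face
    # on first use and caches it locally; later calls reuse the cache.
    model = ModelCatalog().load_model(TOOL_NAMES[task])
    return model.function_call(text, get_logits=False)

# The nine named wrappers are thin aliases over run_tool:
def classify_sentiment(text: str) -> dict:
    return run_tool("sentiment", text)

def detect_emotions(text: str) -> dict:
    return run_tool("emotions", text)
```

The remaining seven wrappers follow the same one-line pattern, which is why the video notes that only the model name changes between functions.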
In app.py I import those functions from llmware_module (you can list them explicitly, as I do with classify_sentiment and the rest, or use an asterisk to import everything), and I import streamlit. I give the app a title, then take the user's text with st.text_area. Then there is an "analysis tools" multi-select, so you can select multiple tasks, with sentiment analysis as the default selection. I have an Analyze button which starts with an empty result, and then a set of conditions depending on what has been selected in the multi-select: if sentiment analysis is selected, trigger that function; if emotion is selected, trigger that one; and so on, so for however many features you select, the respective function gets invoked from the module. Finally I loop for tool, response in result.items() and print each response as JSON.

The reason we do function calling, guys, is that we need the structured output. It helps when you want to classify sentiment, detect emotions, generate tags, or do topic modeling: the prompt remains the same, but each tool does its job, and that is what we have built here. Once you run it you can see the output, and it works fantastically; all the responses were correct for the couple of prompts I tested, but you should try it out yourself. I will put the entire application in a GitHub repository (we could also host and deploy it somewhere), so try it and let me know what you are building on top of these SLIM models by LLMWare.

That is what I wanted to create in this video, guys. I hope you take this code and benefit from it if you are working on compute-limited hardware like a CPU or a consumer GPU. If you have any thoughts, feedback, or comments related to this project, please let me know in the comment box; you can also reach out to me through my social media channels, which you will find on the channel banner or the About page. If you like the content I am creating, please hit the like icon, and if you haven't subscribed to the channel yet, please do. That's all for this video; thank you so much for watching, and see you in the next one.
Info
Channel: AI Anytime
Views: 3,084
Keywords: ai anytime, AI Anytime, generative ai, gen ai, LLM, RAG, AI chatbot, chatbots, python, openai, tech, coding, machine learning, ML, NLP, deep learning, computer vision, chatgpt, gemini, google, meta ai, langchain, llama index, vector database, AI Agent, Function Calling, ai agent, function calling, AI Bloks, NLP tasks, SLIM models, openai function calling, function calling using open source, streamlit, streamlit project, CPU models, small language models, small language model, SLM, SLMs
Id: 0MOMBJjytkQ
Length: 16min 3sec (963 seconds)
Published: Sun Feb 11 2024