Unlock AI Agents, Function Calls and Multi-Step RAG with LLMWare

Video Statistics and Information

Captions
Hi everyone, welcome to today's session. We are thrilled to announce the launch of SLIMs. The objective of this video is to walk you through what SLIMs are, what you can use them for, and how they unlock some next-generation use cases at the nexus between agents, function calls, and multi-step RAG, all deployed on a CPU.

So what are SLIMs? A SLIM is a Structured Language Instruction Model. These are small, specialized, function-calling LLMs that have been carefully fine-tuned to provide structured outputs that can be handled programmatically. Instead of outputting standard text, they output Python dictionaries, JSON, and SQL: outputs that you can then handle programmatically in a more complex workflow.

Now, what would you actually do with this? Let's start with an example to motivate the problem we're trying to solve. On the left you see a very simple but very representative transcript, the type of input text that might flow into an LLM-based process. This could be feedback from a customer, it could be an earnings call, it could be any other form of text coming into the enterprise. What did we typically see in 2023? (It was so 2023; that was a long time ago, right?) In an example like this, it would be some form of: how do we create a prompt to generate some type of summary? The summary itself would typically be multiple bullet points, longer text, or a few paragraphs, and usually we'd be using some type of cloud-based LLM. This would be the proof of concept, the hello-world example of "look, we can summarize an incoming piece of text." But what's happened to all of us is that TL;DR is TL;DR: text coming in, text coming out. Even if it's a little less text, it's still a lot of text to process.

And so here is where we believe the market is really going. That was GenAI 2023; that's old school. GenAI 2024, we believe, is fundamentally about multi-step processes deployed through high-level APIs and conceptualized as agents. It's about thinking of LLMs as function calls providing structured outputs. It's about enabling not just another open-ended, bullet-pointed report, but delivering structured reports with well-identified keys that ultimately map to enterprise processes and enterprise data. And then, how do you do all of this privately, in private cloud? Where we believe things are in 2024, the output of that whole process is not another big report; it's something that looks a lot like this: decomposing the work through a whole set of structured steps, with specialized models going out, reading the text, identifying key things, and automatically packaging it all up into a really nice dictionary report like this.

To lay it out in a slightly bigger picture (in a minute we're going to take a look at the demo, but just to motivate it): you have an incoming text, some type of work item. You then want a whole series of different models that can be stacked together and orchestrated to form an end-to-end process going through multiple steps of analysis. The first model might do some type of extraction of key information, perhaps named entities or people. You then may want to pass that to a classifier model doing some soft-skills type of classification, perhaps the intent, sentiment, emotion, or overall topic category, and based on that, route it in different directions. Perhaps we then want to route this to a question-answering model and ask a bunch of questions, or perhaps, based on the classification we've received, do some additional lookup and run a query against an enterprise data store. And finally, we want to package all of this work up, write some type of report, and have that seamlessly integrate into the business's usual enterprise process.

Now, underpinning all of this is a scalable data pipeline of an AI-ready knowledge base: documents that have been parsed, text-chunked, and indexed, and vectorized by running through an embedding model. But it also increasingly includes SQL table data; most of the valuable data in an enterprise is SQL table data. How does that start to connect to these types of workflows? We see that as one of the keys to unlocking the potential of a lot of LLM-based automation. So this, in a nutshell, is the problem we're ultimately trying to crack, and the SLIM models are every single one of the diamonds on the chart. By orchestrating these models, we believe you can do some extremely complex and powerful multi-step automation, and you can do all of it running in local cloud.

So hopefully that motivates the problem. We're going to have a whole series of videos and examples that walk through the technical underpinnings and technical specifications of the SLIM models: how they work, how you get them, all of that. But what we want to do now is flip over and show you a running demo based on the scenario we've mapped out here.

OK, so now I've flipped over to the IDE. Let's take a look at a demo scenario using SLIMs in llmware to generate that kind of multi-key structured dictionary report. If you recall the PowerPoint slide we looked at, this was the customer transcript, and we've just passed it in as a string. We're going to call this method that we've created, the multi-step report, and I'm going to walk you through it quickly. Again, we have a lot of other tutorials on the nuts and bolts of how to use these models, but I want to at least orient you to some of the ways we're creating an agent framework that we think is a very simple way to start orchestrating these
function calls using SLIM models. So, as always, you'd load in from llmware; we're going to use primarily this LLMfx class. We're going to use that LLM function class to instantiate an agent, and we're going to load in the work from the customer transcript. We're then going to load up a bunch of tools, and each of these tools is a 4-bit quantized SLIM model; you can identify them here simply by name, by category. And just a reminder: all of these LLM-based models are going to be running locally on this machine.

We're then going to run the agent through a series of analyses. We're going to get the sentiment, the emotions, and the intent; we're going to call on the ratings; and we're then going to run through NER, topics, and tags. We just show a few different ways that you can easily call and invoke these. We're then going to ask a couple of questions: what is a short summary, and what is the customer's account number and username? We're going to unload some of these models (again, remember, they're all running locally on a CPU), then we're going to show you the report, the activity summary, and some of the output that we get from this.

So with that as a backdrop, you can see how intuitive and easy we've really tried to make this: a very nice high-level API with which you can do some powerful orchestration across multiple models. So let's dive in and see it. Let's go ahead and run. We've just loaded all the tools, and now we're off and running. Each of the tools is processing; we have a journaling capability that's been turned on, so as each tool runs, we actually see a running journal on the screen showing us, step by step, as the tool is loaded, as the tool generates an inference, and then the output. And we're done. Now I'm going to go back and show you the results.
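The load-work, load-tools, run-tools flow described here can be sketched with a small stdlib-only mock. This is not the real llmware LLMfx API; the agent class, tool names, stub outputs, and sample transcript below are all illustrative stand-ins so the orchestration pattern (shared state, journaling, structured dict report) is clear.

```python
# Illustrative mock of a multi-tool agent flow: each "tool" stands in for a
# 4-bit quantized SLIM model that returns a structured Python dictionary.
# Names and outputs are hypothetical, not the llmware API.

transcript = ("Hi, this is Michael Jones. I've called four times about a "
              "billing error on my account and nobody has responded.")

# Stub tools returning the kind of structured outputs described in the video.
def sentiment_tool(text):
    return {"sentiment": ["negative"]}

def emotions_tool(text):
    return {"emotions": ["annoyed"]}

def intent_tool(text):
    return {"intent": ["customer service"]}

def ratings_tool(text):
    return {"rating": [1]}

class MockAgent:
    """Holds shared state across tool calls and journals each step."""
    def __init__(self):
        self.work = None
        self.tools = {}
        self.journal = []
        self.report = {}

    def load_work(self, text):
        self.work = text

    def load_tool(self, name, fn):
        self.tools[name] = fn
        self.journal.append(f"loaded tool: {name}")

    def run(self, name):
        # Run one tool against the loaded work, journal it, merge into report.
        output = self.tools[name](self.work)
        self.journal.append(f"ran {name} -> {output}")
        self.report.update(output)
        return output

agent = MockAgent()
agent.load_work(transcript)
for name, fn in [("sentiment", sentiment_tool), ("emotions", emotions_tool),
                 ("intent", intent_tool), ("ratings", ratings_tool)]:
    agent.load_tool(name, fn)
    agent.run(name)

# The final report is one predictable, structured dictionary.
print(agent.report)
```

The key design point mirrored from the video: every tool returns a dictionary rather than free text, so the orchestrator can merge outputs programmatically instead of parsing prose.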
We just ran through nine different inferences using eight different tools, eight different models, all running locally, highly specialized, and delivering structured outputs. And what this generated for us is this right here: the report, all as a nice, structured, predictable Python dictionary. All those models were orchestrated, all the state was handled by that LLMfx class, and we got a really nice, distilled, structured report out of it, all running locally with small, specialized SLIM models under the covers.

Now let's take a minute to walk through what we've got here, because it's pretty cool. This is verbose mode; you can turn it off if you don't want to see all of this on the screen, but we've found it's actually a really good way to start conceptualizing some of these processes and all the different things that are happening. It also becomes a really powerful debugging tool as you work, to identify what the steps are, what's happening, and where a multi-step process is potentially breaking down.

To start with, we loaded all of those tools. Please note: if you run this locally, the first time it is going to download and cache those tools, so it will take maybe 30 seconds to a minute. Once the tools are loaded, we start running our function calls. We ran a sentiment classifier, and you can see its output is a nice Python dictionary of sentiment with a list of results; in this case, the one value we would expect, negative. You get all the details: the output type is a dictionary, plus the usage information around it. And then we analyze it. What we've actually done is build up a methodology around logit analysis for these types of structured outputs, so what you can see, color-coded, is the confidence the model had in the output it produced. We produce an overall confidence score, and you can then look and see several of the choices that were considered: in this case, 0.79 negative, 0.16 toward neutral, 0.04 toward positive. There was very, very little signal that this was going to generate a positive sentiment.

Then we move on to the next one: emotions. We saw that the emotion was "annoyed," but it's interesting to look at some of the other choices; you can see pretty clearly that "angry" and "furious" were also being considered. This color coding gives you a great way to see the overall confidence level the model had and how many other choices were being considered in delivering that answer. "Customer service" was the intent. The rating is on a scale of 1 to 5; our rating model, you could say, measures degree of sentiment, 5 being very positive and 1 being very negative. It was spot on here that this was a very negative call, but you can see the next choice, very sensibly at about 0.22, was a two. Again, this gives us a great indication of some of the choices the model made and some follow-on analysis we can do in post-processing.

The NER analysis extracted the key people and the location information; again, you see a really nice visualization of it. The topic in this case was not that helpful (it was a query that it generated). The tags are really interesting: this automatically generates tags for some of the key pieces of information being pulled out of the text. If we want to do follow-up queries, or embed this in some type of text index, here are autogenerated tags we can attach to it. And then we ask a couple of questions. We ask for a short summary: Michael Jones has been calling Mixco for the past four times and has not received any response. We then put all of this together along with the customer information query that we asked, and
you see this beautiful report that we've just been able to generate. You can see how simple the code was and how we've been able to orchestrate eight different models, all coordinated and generating this type of output. And what we actually get is this type of response analysis that tracks, step by step, what tool was used, the inference, and all of the metadata associated with it.

So we hope that you've enjoyed this example. Stay tuned: we've got a bunch more videos and a bunch more examples that break down these individual components so you can start writing and deploying code like this yourself. Thanks again, everybody, take care, and have a great day. Any questions, as always, please check us out on Discord.
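The per-choice confidence scores discussed in the walkthrough (0.79 negative, 0.16 neutral, 0.04 positive) are the kind of distribution you get by applying a softmax to a model's raw logits over its candidate output tokens. Here is a minimal sketch of that post-processing step; the logit values are made up for illustration and are not taken from any SLIM model.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution over choices."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

# Hypothetical logits for the three sentiment candidates.
logits = {"negative": 3.2, "neutral": 1.6, "positive": 0.2}

confidence = softmax(logits)
top = max(confidence, key=confidence.get)

print(top, round(confidence[top], 2))
# prints: negative 0.8
```

Because the probabilities always sum to 1, a flat distribution (e.g. 0.4 vs. 0.35) signals a low-confidence answer worth flagging for review, which is exactly the kind of follow-on analysis the color-coded journal output supports.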
Info
Channel: llmware
Views: 5,488
Id: cQfdaTcmBpY
Length: 11min 45sec (705 seconds)
Published: Sun Feb 11 2024