An Introduction to LLM Agents | From OpenAI Function Calling to LangChain Agents

Video Statistics and Information

Captions
What's up folks, welcome back to the channel. In this video I want to talk about agents. This is part of a presentation I've been doing for O'Reilly Media — I'm an instructor there and I do live trainings about building stuff with LangChain, with agents, with large language models, prompt engineering, etc. What I'd like to do now is go through a basic introduction to agents and some cool, interesting things you can do using LangChain to build really powerful and interesting agents. So let's get started.

This is my presentation. Quick intro: I'm a machine learning engineer, I'm also an instructor at O'Reilly, and I'm very curious about all things intelligence. Regarding agents, we're going to look at agents as the combination of thought and action; then we'll talk about the definition, about tools, about this idea of agents in three complexity levels, and quickly about OpenAI's function-calling API. Then we'll talk about LangChain as a framework, about building agents with LangChain, and what's important to think about there. Not in this video, but in the next ones, I'll be sharing some of the demos I've built for this presentation that I think are really cool — we'll skip the demos today but do them over the next two to three videos, and I also have a bunch of other videos where you can see how to build agents with LangChain.

So let's get started: what is an agent? I like to think about it in simple terms as this idea of combining thoughts and action. What do I mean by that? How do we do anything? We think and we act — kind of obviously. One example of that, and I use this example in my presentation, is the decision-making process for attending a live training. You might think, I
want to learn about agents; then you go online and research cool platforms where you can learn about agents; then you think, OK, O'Reilly has some really interesting courses and live trainings; then you take an action, which is looking at the courses; then you think, live trainings by instructor Lucas are awesome — no bias at all — and maybe you schedule a live training with me. Look at that.

Jokes aside, the idea of "thoughts" here is very reductionist — this is a very simplistic definition. It's all about thinking what to do and then planning ahead, meaning prioritizing and ordering the things you're going to do. "Action" in this context means the usage of tools, like searching or browsing the internet. So in essence, what is an agent? An agent is nothing more than the combination of an LLM plus tools. This is a very simplistic definition, and it lives in the context of the current surge of applications for large language models — obviously an agent can be defined in other ways.

If you're new to large language models, new to AI, and have no idea what I'm talking about: we're not going to go into the details of large language models here (I do that in other presentations), but an LLM is essentially a model that predicts the next word or next sentence based on some previous context. It calculates the likelihood of the next word and outputs a completion that makes sense given what it has seen so far, which we call the context. Here I'm using a very silly example: a model might take in a phrase like "I love eating" and the output of the LLM would be "pancakes" — that would be true for me; I don't know about you. A tool is just anything that allows this model — this reasoning engine, this LLM — to perform actions in the real world. For example, a pancake maker will allow you to make
pancakes in the real world. I know, I'm hilarious. One of the first papers I remember reading about this idea of combining LLMs with tools is a paper called Toolformer — a pretty cool paper — and in it they show that LLMs can teach themselves how to properly call external tools. There's an example where, to perform a completion on some text, the LLM understands that it needs to call a tool, so it calls a QA tool to get a response; that response gets fed back into the text, back into the completion, and then we have the result. They show examples with a calculator, with Q&A, and with Wikipedia search. Tools in this context are all Python functions — Python code the model can decide to execute — and they set up a system that allows the LLM to make that decision and call the function. It doesn't call the function directly: it just outputs a string indicating that the function should be called, and then there's some machinery around it so that decision gets translated into an actual action — searching Wikipedia, calculating some math problem, etc.

Another fundamental paper for understanding agents, one I think is paramount for the useful applications of agents we have today, is a paper called ReAct, where they discuss this idea of LLMs for reasoning and acting. Essentially, they discovered that you could prompt LLMs like GPT to interleave thoughts, actions, and observations, such that when the model is trying to solve a problem — like the example here — it begins by saying "I need to search X": it says, right here, "I need to search Apple Remote and find the program...". Then it takes an action, the search, and there's an observation from that action, which is: the remote is
a control introduced in... — that's what it discovered from the search — and it goes on and on, interleaving thoughts, actions, and observations in that manner. All of that to say: these two papers — and obviously there are many, many more; we're not going to dig into the papers, because this isn't a technical deep dive into what agents are — are just to give you an idea of what we're talking about when you hear about agents out there: what agents are, how they connect to LLMs, and what's interesting about them. Because they're getting very popular: this is a graph from a paper called "A Survey on Large Language Model Based Autonomous Agents", and you can see that over the last two years there's been a surge of papers about using LLM agents to do all sorts of stuff. That's interesting because, with the rise of ChatGPT and other LLMs that could actually do really useful things, people realized that if you put such a model as the reasoning engine at the center, with access to tools and the ability to manage a bunch of things — performing search, executing code, etc. — you could have a really powerful AI that has behavior, that can perform actions and not just output text.

There are many popular agent implementations; we're not going to go over all of them. One is BabyAGI, one of the first to do this thing of combining LLMs with tools, and they did really interesting work in keeping task prioritization, task execution, and task planning separated — you have separate modules for each. You have AutoGPT, which does long-running, open-ended goal pursuit — a famous repository that has gotten a lot of popularity and can do some pretty cool stuff. You also have GPT Researcher, one of my favorites; it was actually featured by Harrison Chase, the
creator of LangChain, in one of his videos about building a research assistant. I played around with it and read through the code, and GPT Researcher is one I really like because it's all about combining an LLM with tools that allow the LLM to perform research, so you end up with a really cool research assistant that can do some pretty cool stuff. You also have OpenGPTs, the open-source, customizable agents that are meant to be the open-source version of custom GPTs — the tools we have access to from OpenAI that let you customize your own ChatGPT for a particular purpose — and that's a really cool implementation as well. Andrej Karpathy has also talked about agents — and every time Karpathy says anything, I pay attention — and he makes a really interesting point, in a video I definitely recommend you check out, about why you should play around with agents. The image isn't showing very well here, but I put a link in the description.

So let's talk about agents in three levels of complexity. When I say complexity here, I mean complexity of implementation in a very rough sense, because I think it's cool to think about how you can connect a model that only knows how to read and output text to tools that perform actions in the real world — and the way that's set up is actually not that bizarrely complicated. The complicated part is how you train a model to have amazing general performance across many tasks. Level one comes from a video I did before that has gotten a lot of popularity — I'm really happy about that — and I integrated it into my presentation about agents because I think it's a very interesting and intuitive way to understand how they work. Level one is about putting Python functions — code — inside the prompt that you send to an LLM. That was inspired by Toolformer,
and I think there may be papers before Toolformer — I'm not sure, I might be wrong about that; if I am, please help me out in the comment section. So what's the basic idea? You might have some code like this: I have some functions to create directories, create files, and list files, and this is just the usual way you call ChatGPT. What we're going to do is put everything inside a prompt, like I'm doing here. I have a description of a task — create a folder called lucas-the-agent-master (because, you know, inside I'm 12 years old). Then I call ChatGPT, but in my prompt I not only give the task description: I add that the model has access to these functions, I actually put the functions in there with some quick, simple descriptions of what they do, and I add "your output should be the first function to be executed to complete the task, and the output should only be the Python function call — nothing else." I'm trying to help the model, to see if it can learn to just call that one function. When you run this, you actually get a good result — I have a repo with this implementation, link in the description — so the model can call the right function to execute the task, which is pretty cool.

Now all we need is a way to execute the function. You can use exec from Python, a built-in that allows you to execute Python code given as a string, and you have something that looks like this. (This should not be exec(model.output) — I apologize, it should just be exec(output); I have to change that.) And what happens is that you actually get to create the folder.
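The whole level-one pattern can be sketched end to end in a few lines. Note that `call_llm` here is a stand-in, an assumption for the sake of the sketch rather than a real API call — it simply returns what a well-behaved model should output, namely the function call as text and nothing else:

```python
# Level-one sketch: function descriptions go inside the prompt, the model
# replies with a single Python function call as text, and exec() runs it.
import os

def create_directory(name):
    """Create a directory with the given name."""
    os.makedirs(name, exist_ok=True)

FUNCTIONS_DESCRIPTION = """
create_directory(name): creates a directory with the given name.
"""

def call_llm(prompt):
    # Stand-in for the real ChatGPT API call (an assumption for this sketch):
    # it returns only the Python function call, nothing else.
    return "create_directory('lucas-the-agent-master')"

task = "Create a folder called lucas-the-agent-master."
prompt = (
    f"Task: {task}\n"
    f"You have access to these functions:\n{FUNCTIONS_DESCRIPTION}\n"
    "Your output should be the first function call to execute to complete "
    "the task, and the output should only be the Python function call."
)

output = call_llm(prompt)
exec(output)  # the "hack": run the model's text output as code
```

This is exactly the clunky part the transcript calls out next: exec() blindly trusts whatever string the model produced.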
(The model-dot prefix in an earlier version was because I had these functions as methods of a class.) So the model outputs the correct text to call the right function with the right parameters, and what you're doing is setting up a hack using Python's exec to execute that function call. Some of the limitations of that approach: probabilistic outputs make function calls unreliable, which is annoying; you need structured ways to prepare the inputs of the function calls; and putting entire functions inside text prompts is not exactly amazing — it's clunky, it's not scalable, it's weird. So what should you do? You could use OpenAI functions, for example. (Usually I would do a Q&A here, but since this video is for YouTube, no Q&A — ask anything in the comment section; I read all the comments and always try to answer everyone with relevant questions.)

OpenAI function calling is really cool: they trained a model to be good at calling Python functions, and they connected models to outside tools in a standard way. The steps — you can check them in the OpenAI function-calling docs, I'm just repeating them here — are: (1) call the model with a query and a set of functions defined in the functions parameter, so you have a standard way to define functions for the model; (2) the model can choose to call one or more of them, and if so, the content will be a stringified JSON object adhering to your custom schema; (3) you parse the string into JSON in your code and call the function with the provided arguments, if they exist; (4) you call the model again, appending the function response as a new message, and let the model summarize the results back to the user. Let's take a look at that in practice.
You would have a function like create_directory. Here is where we define the function for the model: essentially we say, this is a function, this is its name, this is what it does, and then the properties — it has a parameter called directory_name of type string, with a description of that parameter. Then you put it inside a list, because you could have more than one function. Then you write a little function called run_terminal_task — I made some modifications for a simpler example, but it's basically from the actual OpenAI function-calling documentation, which I'll put in the description. Essentially, you have the messages parameter, the traditional messages parameter you'd use with the chat API; you have the functions you created — the tool to create a directory is right here; you call the model with tool_choice set to "auto"; and you get a response. Then you check whether there were tool calls in the response — whether the model identified that it had to call a tool and called it. If that happened, you go to step three, which is to parse and execute the function: you have a set of available functions, you loop over the tool calls (because there might be more than one), you get the function by name, you load the arguments, and you call the function. Then you append the response to the messages parameter in a standard way: a dictionary with the tool_call_id; the role — in this case the role is not user, system, or assistant, it's tool; the name of the function; and the content of the response from calling that particular function.
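The parse-and-execute steps just described might look roughly like this sketch. The `response` dict below is a hand-written stand-in for the API's response object so the example runs without an API key — its shape loosely mirrors the Chat Completions tool-calling format, but check the official docs for the real objects:

```python
# Steps 3 and 4 of the function-calling flow: parse the model's tool calls,
# execute each one, and append the result as a "tool" message.
import json
import os

def create_directory(directory_name):
    os.makedirs(directory_name, exist_ok=True)
    return f"Directory '{directory_name}' created."

available_functions = {"create_directory": create_directory}

# Hand-written stand-in for a model response containing one tool call.
response = {
    "tool_calls": [
        {
            "id": "call_001",
            "function": {
                "name": "create_directory",
                # arguments arrive as a stringified JSON object
                "arguments": json.dumps({"directory_name": "demo_dir"}),
            },
        }
    ]
}

messages = []
for tool_call in response["tool_calls"]:  # there might be more than one
    fn = available_functions[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    result = fn(**args)
    # Append the result in the standard shape: the role is "tool",
    # not user/system/assistant, keyed by the tool call id and name.
    messages.append({
        "tool_call_id": tool_call["id"],
        "role": "tool",
        "name": tool_call["function"]["name"],
        "content": result,
    })
# A second model call with `messages` appended would then summarize
# the result back to the user.
```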
Then you summarize the results back to the user: you do a second call to the model so it can put everything together into a response, and you send that back. I've done videos on OpenAI functions before, so check those out on my channel.

Now let's talk about LangChain as the third level in this complexity ladder. How could we effectively perform tasks using agents? In a really cool article from LangChain — I think it was by Harrison Chase, I don't remember exactly who wrote it — called "OpenAI's Bet on Cognitive Architectures", they discuss this really interesting idea that OpenAI, with the Assistants API, is actually making a bet on a specific type of cognitive architecture, and why that's interesting, different cognitive architectures, etc. Usually any agent — and remember, when we talk about agents we're talking about LLMs plus tools — involves this idea: you have some user, the user calls the LLM, and the LLM can either send a response directly or call some tool and perform an action. When it performs an action, you get an observation from that action, and everything gets fed back to the LLM. That goes in a loop until you have some response; when you get a response, you send the output. That's the basic agent. A really interesting point they make in that blog post is that good agents are routers: they know how to route things well to different components. Useful agents implement a routing type of architecture more than an "actual agent" architecture — that's more what happens in practice. I kind of agree, because my experience building things that are actually useful usually involves building chains
where the LLM has a smaller participation, because there's uncertainty about the LLM's output — rather than just letting it decide and do everything. LangChain is a really cool framework where you can implement these routing procedures. So let's see how that works.

If you're not familiar with it, LangChain is a framework for building context-aware reasoning applications — essentially, a framework for building powerful applications powered by large language models. The main features are components, which are the tools and integrations you can use to work with large language models, and off-the-shelf chains: pre-built building blocks that let you do common, everyday tasks like text summarization or talking to a PDF — asking documents for stuff, something called RAG, which we're not going to cover in detail in this presentation but can discuss in the next video. Now, what are the core elements of LangChain? You have the model, the prompt, and the output parser. The model is nothing more than an abstraction over the LLM API, like the chat API. This is an example of how to call ChatOpenAI — they changed the import in the new version, so this actually comes from langchain_openai now that they separated the modules. Honestly, I'd like LangChain to make the import changes a bit simpler, and this code is a bit outdated in that sense because they changed it in the last version — so this is deprecated, but whatever, this is how you call it: you give the OpenAI API key, you set the model you want, and then you can use it. And
actually, this is also deprecated now — you would call invoke — but all we're saying here is that the model is an abstraction over calling some LLM provider. Another core element is the prompt template, which is nothing more than an abstraction over the text prompts you send to an LLM. Essentially, what LangChain is doing is taking everything related to interacting with LLMs and building infrastructure around it, transforming it into objects you have more control over — you want to have variables over all the aspects of working with large language models. Here is an example of using ChatPromptTemplate directly from a prompt. This would be the prompt — the slide cuts it off, but it ends with "show me five examples of this concept" — and the new thing is that the concept is now a variable, and the power is right there: it's dynamic. You can call this prompt with different names for the particular concept, just like you would with basic string formatting. The difference with a prompt template is that you can put it inside a bigger, more complex chain of building blocks to do interesting stuff — it becomes a Lego piece in a bigger Lego structure, whether you're building a house or a bridge or whatever you're building with Lego. The last one is the output parser, and that's just about parsing the outputs of the LLM. The LLM will give an output like "the answer to your question is five", but you just want the number 5 — why should you have all that text around it? You need to parse the output of the LLM, and there are many interesting ways to do that. Output parsing is a big thing with LLMs, and LangChain is very well integrated with Pydantic, a Python data-validation library that is super
powerful and allows you to have something called structured outputs — control over what comes out of an LLM and what goes into an LLM — which is pretty cool and pretty interesting. So output parsers are all about parsing the output of an LLM. The next thing is LCEL, the LangChain Expression Language, which puts everything together. Essentially it uses the pipe symbol — like the Unix pipe you use in the terminal — to combine things: it's an interface that leverages that symbol to compose components. This is a basic example: I have ChatOpenAI, the template, and a string output parser that parses the output of the model into a string — usually it would come back as an AIMessage object, if I'm not mistaken, so if you add the string output parser you get the string. Now you have prompt, model, parser, and boom — when you put it all together, this is called a chain. It's a building block, our Lego piece; this is where we start building interesting stuff. (I don't know why I keep trying to draw Lego pieces; I clearly don't know how to draw.) The only thing you have to pay attention to here is that when you invoke this chain, this building block, you have to pass the variable that was set in the prompt template. In this prompt template — "name five concepts related to this concept" — the variable was concept, and that's why it says concept in the invoke call. That's all.
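To make the pipe idea concrete without installing anything, here is a toy imitation of the composition pattern. These little classes are stand-ins written for this sketch — not LangChain's real Runnable, ChatPromptTemplate, ChatOpenAI, or StrOutputParser — and the model is faked so no API key is needed:

```python
# Toy sketch of the LCEL idea: components composed with the pipe symbol
# into a chain that you .invoke() with the prompt template's variables.
class Runnable:
    def __or__(self, other):
        first = self
        class _Chain(Runnable):
            def invoke(self, value):
                # run the left component, feed its output to the right one
                return other.invoke(first.invoke(value))
        return _Chain()

class PromptTemplate(Runnable):
    def __init__(self, template):
        self.template = template
    def invoke(self, variables):
        # fill in the {concept}-style variables, like basic str.format
        return self.template.format(**variables)

class FakeModel(Runnable):
    def invoke(self, prompt):
        # stand-in for a chat model: returns an AIMessage-like dict
        return {"content": f"MODEL ANSWER TO: {prompt}"}

class StrOutputParser(Runnable):
    def invoke(self, message):
        # parse the message object down to the plain string
        return message["content"]

chain = (PromptTemplate("Name five concepts related to: {concept}")
         | FakeModel() | StrOutputParser())

# invoking the chain requires the variable set in the prompt template
result = chain.invoke({"concept": "agents"})
```

The design point this mimics is exactly the Lego-piece idea from the transcript: each component has one `invoke`, so any of them can be swapped or piped into a bigger chain.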
Now let's talk about the agent loop. As I said before, this is the loop: the user sends something to the model, which can either produce an output or take an action; from the action you get an observation, which goes back into the loop, and you can repeat that in a sequence until you have an answer — when you have an answer, boom, you go to the output. Some key components for setting up these loops using LangChain: first, the schema. What is a schema? LangChain gives you a bunch of abstractions to make things easy to use, but for that use to be easy you need structured ways for the different components to interact — that's what they call schema. I don't know if that's the official definition, but it's my definition. You have things like AgentAction, which is just an object representing an action an agent should take; AgentFinish, which is just the final result you return to the user; and intermediate steps, the previous actions and outputs of the current agent run. Essentially, when an agent is trying to solve a problem it might make a bunch of calls to an LLM, call tools, etc., so there's a bunch of intermediary work you have to store somewhere — intermediate steps is where you store all that. Then you have the agent itself, which is just the chain responsible for deciding the next step, powered by a language model. What are the inputs to the agent? A key-value mapping, and it has to include the intermediate steps; in the output, you're always looking for either the next action or the final response. LangChain is changing quite a bit, so the specifics here could have changed, but even if something specific changed, the general idea holds: you have an LLM in a loop with tools — calling tools, or just thinking about how to solve the problem — until you have a response; when you have a response, you get out of the loop and give your output to the user. All of the stuff that happens before you give the final response is the intermediary work we're talking
about. And when you get the final response, you need a structured way to say so: AgentAction covers everything going on inside the loop — an action from the agent, meaning it's calling a tool, it's doing something — and AgentFinish is the way LangChain lets the model end the loop and return an output. It's how you say it's over: you need a stopping condition, and then you need a signal for that stopping condition. Since we're talking about the loop, we can look at it in code and see what it looks like: while the next action to take is not an AgentFinish, you observe — the observation is the output of running that particular next action (and the next action doesn't necessarily have to be a tool call, I'm not sure right now). The idea is: you fetch what the next action is, you run it, you record the observation, you evaluate what the next action should be, and you keep going in the loop until you're done and send the output. It's good to have this kind of runtime because you can handle things like errors when calling the tools, and log information about how the model reasons. LangChain is also really cool because they're setting up infrastructure for deployment: they have this tool, which I think is still in beta, called LangSmith, where you can look at the intermediate steps and evaluate everything going on with the agent until it solves the problem. There's probably a bunch of tooling that's going to be built over the next six months on how to efficiently evaluate whether an agent is being effective. So this is pretty cool.
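The loop just described — run actions, collect observations, stop on an AgentFinish — can be sketched like this. The `decide` function is a stand-in for the LLM-powered agent chain, and the schema classes are simplified imitations of LangChain's, written for this sketch:

```python
# Toy agent runtime: loop until the agent returns an AgentFinish,
# storing (action, observation) pairs as the intermediate steps.
from dataclasses import dataclass

@dataclass
class AgentAction:          # an action the agent should take
    tool: str
    tool_input: str

@dataclass
class AgentFinish:          # the final result to return to the user
    output: str

tools = {"search": lambda query: f"results for '{query}'"}

def decide(task, intermediate_steps):
    # Stand-in for the LLM-powered agent chain: search once,
    # then finish using the last observation.
    if not intermediate_steps:
        return AgentAction(tool="search", tool_input=task)
    _last_action, last_observation = intermediate_steps[-1]
    return AgentFinish(output=f"Answer based on {last_observation}")

def run_agent(task):
    intermediate_steps = []  # previous actions and outputs for this run
    next_step = decide(task, intermediate_steps)
    while not isinstance(next_step, AgentFinish):
        # the observation is the output of running the next action
        observation = tools[next_step.tool](next_step.tool_input)
        intermediate_steps.append((next_step, observation))
        next_step = decide(task, intermediate_steps)
    return next_step.output

answer = run_agent("Apple Remote")
```

A real runtime would wrap the tool call in error handling and logging — which is exactly why having this loop as explicit infrastructure, rather than ad-hoc code, is useful.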
Tools in LangChain are awesome — those are just functions that an agent can invoke. (This part of the slide is a little out of context; I should maybe change that.) Just so you all know, for a tool in LangChain you have an input schema and a function to run, and that's very important for building a working agent. LangChain also provides toolkits — groups of tools for achieving particular objectives, like the GitHub toolkit. I've played around with it; I've actually done a video on using Python's subprocess to call the actual terminal commands for git, which obviously assumes you have git installed on your machine, and it's not the most reliable approach. The GitHub toolkit is based on something called PyGithub, a Python package for interacting with the GitHub interface, and it's great. What I like about LangChain is that it's all about doing stuff — let's do things — and that's what I love and enjoy about it. I think they probably have to do a better job in the coming months of making it more intuitive to use, because even for people who develop on it, some of the ways you're supposed to build things like agents and chains can get a bit confusing. But what I do enjoy — and I follow them closely — is that they're all about performing actions in the real world: they're setting up a bunch of integrations, and they're thinking about the easiest way to make an LLM useful. That's what I really enjoy about LangChain, and why I trust it's a framework that's going to be really important over the coming years for building powerful, useful tools based on large language models. And yeah, they provide these toolkits, which is awesome. So these are some of the references for this presentation. I hope you
liked it. If you liked this video, don't forget to like and subscribe, and see you next time. Cheers!
Info
Channel: Automata Learning Lab
Views: 1,282
Id: ATUUd2bpRfo
Length: 31min 44sec (1904 seconds)
Published: Sun Feb 04 2024