LangChain Agent Toolkits - Pandas DataFrame, JSON and SQL Database Agents

Video Statistics and Information

Captions
Language models can be used to interrogate different types of files and data structures, for example CSV files, pandas DataFrames, JSON files and SQL databases. When working with language models, we can use a library called LangChain to make certain operations much easier, and what we're going to look at in this video is some of the toolkits that LangChain provides. These toolkits are designed for specific tasks, and we're going to look at three of them: firstly the pandas DataFrame agent, secondly the JSON agent, and thirdly the SQL database agent. So let's get started.

I'm on the LangChain documentation for toolkits, which I'll link below the video, and we have a list of the different toolkits that LangChain provides out of the box. You can see toolkits here like the Jira one, the JSON agent which we're going to use, and also toolkits for Azure Cognitive Services, Gmail and other tools as well. We are specifically going to look at the JSON agent, the pandas DataFrame agent, and the SQL database agent. You can see other agents at the bottom here for vector stores, and if you're working with Spark clusters, agents for Spark DataFrames and Spark SQL are also available directly in the LangChain library.

To get started, I'm going to go to a Jupyter notebook. I have installed the langchain and openai Python libraries in the notebook environment, and you can do that with this command here; put the exclamation mark at the start of the command if you're running it in a cell, and that's going to install those two libraries into your environment. LangChain provides the toolkits, and we're going to call the OpenAI language model, so we install the openai library that allows us to call the API and get back responses from the language model.

Once we've done that, we're going to make sure we have our .env file defined in our project directory, and we can set a value there of OPENAI_API_KEY, equal to whatever your API key from OpenAI is. Once you've defined your .env file, we import a function called load_dotenv from the dotenv library, and just below that we call that function; that's going to set any environment variables you've defined in the .env file into your operating system's environment. We can then import the os module from Python and get the API key by looking in os.environ, indexing in with the OPENAI_API_KEY key. Once we've executed that, the key is loaded into this variable, and we can then use it in our calls to the OpenAI API.

So that's just some setup. We can now go to the documentation, and we're going to look at the pandas DataFrame agent to begin with. This is an agent toolkit that you can use with LangChain in order to interact with pandas DataFrames and get information from those DataFrames using your typical prompts. Let's copy these imports into the environment: back in our code, I'm going to define a new cell and paste the imports. From langchain.agents we're importing the function create_pandas_dataframe_agent, from the langchain.chat_models module we're importing ChatOpenAI, and finally, below that, we're importing the AgentType construct; we're going to see what that means in a second.
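As a rough sketch of that setup, assuming the python-dotenv package is installed and a .env file containing OPENAI_API_KEY sits in the project directory:

    from dotenv import load_dotenv
    import os

    # Load variables from the .env file into the process environment
    load_dotenv()

    # Read the key back out of the environment for use in API calls
    api_key = os.environ["OPENAI_API_KEY"]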
Back in the documentation, we're working with the file titanic.csv. I've used that in previous videos, and if we go to the directory structure you can see I have copied that file to the local directory where the Jupyter notebook is running, which means I can load it into a pandas DataFrame directly from the current directory. If you need this file of Titanic data, I will leave a link below the video; you can grab it and copy it to your file system.

Back in the notebook, we're going to read the DataFrame in on this cell here. Importing pandas as pd, we call the pd.read_csv function, pass the name of the file to it, and store the result in a variable called df. We can then look at the head of that DataFrame and see the data we're getting back; the DataFrame head function returns by default five rows of data, and you can see some of that data just below here.

So we have the data. Let's say we want to ask a question of it, for example: what was the average age of the passengers on the Titanic? If we go back to the documentation, we can look at two different calls we can use to create the pandas DataFrame agent. At the top we have something called ZERO_SHOT_REACT_DESCRIPTION, and that's the default agent type we get when we call the create_pandas_dataframe_agent function. This is an agent that has no memory; it does not recall or know anything about previous calls to the OpenAI API, so all of these are just one-shot calls to the API with a particular prompt, and you get back your response, but there's no memory concept when you're using these. Below the zero-shot model, we also have the ability to use OpenAI functions, and with this one the call to create_pandas_dataframe_agent also takes a parameter called agent_type, this time set to AgentType.OPENAI_FUNCTIONS; this is an alternative to the zero-shot model. If we go to Twitter for a moment, we can see the LangChain account has tweeted that they recently updated key agent toolkits to use OpenAI functions. Previously the SQL, CSV and pandas agents all used the ZERO_SHOT_REACT_DESCRIPTION agent that we've just seen in the documentation, but this new one is often faster and better, so they have added this option.

Because of that, let's go back to the documentation and copy the code for the OpenAI functions model, go back to our Jupyter notebook, and paste that code into a new cell just below. We're calling the create_pandas_dataframe_agent function and passing in a ChatOpenAI model; we set the temperature to zero in this case, and we also set the particular model that we want to call on the API. The second parameter to the pandas agent is the DataFrame itself, and we also pass some other options: verbose set to True, and that OPENAI_FUNCTIONS agent type. Let's execute this code. In the cell below, we can use the agent that we get back from calling that function; it has a function called run defined on it, and we pass our prompt to the run function. The prompt is the one I just mentioned: what was the average age of the passengers? Let's execute that prompt.
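Pieced together, the cell looks roughly like this; the exact model name is an assumption (the LangChain docs of the time used gpt-3.5-turbo-0613):

    import pandas as pd
    from langchain.agents import create_pandas_dataframe_agent
    from langchain.agents.agent_types import AgentType
    from langchain.chat_models import ChatOpenAI

    df = pd.read_csv("titanic.csv")

    # Build the agent around a chat model; temperature=0 keeps answers deterministic
    agent = create_pandas_dataframe_agent(
        ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613"),
        df,
        verbose=True,
        agent_type=AgentType.OPENAI_FUNCTIONS,
    )

    agent.run("What was the average age of the passengers?")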
You can see the chain has been entered, and when we enter that chain, you can see that it's invoking what's called a tool in LangChain, the Python REPL AST tool, which basically executes some Python code. You can see the code that it's executing here: it takes the DataFrame, indexes in on the Age column, and gets the mean, which is exactly what you would write if you were doing this yourself in code. From that, you can see that the average age of the passengers is approximately 29.7 years, and if we copy that line of code and quickly paste it below, you can see we get a result that is basically 29.7.

This is quite similar in some ways to the PandasAI videos that I did before, but this time we're using LangChain: we're defining this pandas DataFrame agent, we pass it an OpenAI model and a DataFrame, and when we get back the agent, we can provide prompts to it using the .run function. Be aware that when you're running this code, when you define the OpenAI model, you must have that OpenAI API key in your environment, or you can pass it as an argument to that model; so if you're getting any errors there, make sure you have the OpenAI key in your environment.

Now I'm going to go below here and paste another prompt; let's get rid of this .mean() call first. This time we're going to ask: were males or females more likely to survive? Let's execute that. Again we're entering the chain, and we're getting back some more code that's been executed with the Python tool. This time the code is the DataFrame's groupby function: we're grouping by the Sex column, then indexing in on the Survived column, which is a Boolean column that tells us whether or not the passenger survived, and getting the mean from those two groups of male and female. That returns the series of data below here, where female and male each have a particular survival rate, and below that, the output from the language model is that females were more likely to survive than males: the survival rate for females was approximately 0.74, and for males it was approximately 0.19.

Now if we go below here, I'm going to paste some code in to check the survival rate for males. We pull out all of the entries in the DataFrame where the sex is equal to male, then we count the number of males that survived and divide that by the total number of males, and that gives us a survival rate of about 0.189, basically the 0.19 we got back from our model. So again, we're simply passing a prompt to the agent's .run method, and that allows us to interact with the OpenAI API and get back responses from the language model that take into account the data defined in our pandas DataFrame, all through this create_pandas_dataframe_agent. That's very handy if you're working with DataFrames and want answers based on textual prompts against that data.
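The verification cells amount to something like this plain pandas, with no LangChain involved:

    import pandas as pd

    df = pd.read_csv("titanic.csv")

    # Mean age the agent reported (~29.7)
    print(df["Age"].mean())

    # Survival rate per sex (~0.74 for females, ~0.19 for males)
    print(df.groupby("Sex")["Survived"].mean())

    # Manual check of the male survival rate
    males = df[df["Sex"] == "male"]
    print(len(males[males["Survived"] == 1]) / len(males))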
Let's do one last prompt just below here: we're going to get the name and the age of the oldest person on board, and that gives us back this line here; the name of the oldest person was Algernon Henry Wilson, and his age was 80. Again, when LangChain enters this chain, you can see that it's invoking the Python tool and executing code to pull out the oldest person: we're looking at the Age column, getting the max age in that column, and then indexing into the DataFrame to get back the single row for that person. So that's very handy; we can interact with pandas DataFrames using this agent toolkit.

Let's now move on to another one. If we go to the toolkits page, we're going to the JSON agent now. This is designed to interact with large JSON objects or dictionaries, and it's useful when you want to answer questions about JSON that's too large to fit in the context window of a language model; the context window is basically the maximum amount of text that can be considered by the language model at a single time. In this example, we're going to get an OpenAPI specification, which defines the specification of an API, and we're going to take that specification and ask questions of it using Python and LangChain.

I'm going to link this below the video, but we have a repository here containing a JSON file that defines the OpenAPI specification for an imaginary pet store API; you might have seen this if you've watched API programming examples before. If you're unfamiliar with these specifications, they describe your API in a particular way. For example, if we scroll down, you can see all of the endpoints that are available on the API under this paths key, and each object within that refers to a specific endpoint and contains data such as the HTTP method you can use with that endpoint. You can also see the parameters you can send to the API endpoint; in this case we have tags and a limit parameter that you can send in the URL in order to alter the response you're getting back. Below that, we can also see the response codes, for example 200 if all goes well, and there's an entry in this paths dictionary for every endpoint that's defined on your API. So if we scroll down here, we can see another endpoint: the /pets/{id} endpoint is defined, and that will return an entity based on its ID. There's an abridged sketch of this structure just below.

What we're going to do in this section of the video is use the JSON agent in LangChain. Again, the link to this data file is below the video; you can copy it into your directory. I've already got it here, and it's called petstore_simple.json. Going back to the documentation for this, I'm going to bring in a couple of imports. First of all, let's grab the JsonToolkit from LangChain and paste that here just below; this is defined in the agent_toolkits module. We're also going to grab JsonSpec, which is a particular tool that we can use, so let's copy that into the notebook just below this import. Before we go on, note that in the documentation, as well as the toolkits, we are also importing JsonSpec from the tools module. If we go to the tools documentation, you can see that these are interfaces that agents can use to interact with the world; they are functions that agents can use. These tools can be generic utilities, for example search; they can be other chains that are defined; and you can even have other agents as tools as well. The tool we are looking at here is the JsonSpec tool. So I'm going to go to LangChain's source code on GitHub, and you can see some of the tools that are defined there, for Azure Cognitive Services, Bing Search and so on; there are quite a lot of tools here that you can explore if you want, and if you're interested in a video on any of these tools, let me know.
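Before moving on, here is the abridged, illustrative Python-dict view of the pet store spec's structure described above; this is a paraphrase of the classic petstore example, not the exact contents of the file:

    # Trimmed, illustrative shape of the pet store spec (not the full file)
    petstore_spec = {
        "paths": {
            "/pets": {
                "get": {
                    "parameters": [
                        {"name": "tags", "in": "query", "required": False},
                        {"name": "limit", "in": "query", "required": False},
                    ],
                    "responses": {"200": {"description": "pet response"}},
                }
            },
            "/pets/{id}": {
                "get": {"responses": {"200": {"description": "pet response"}}}
            },
        }
    }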
We'll come back to this source code and have a look at some of the tools defined in this file once we're executing the code that we're about to use here. Let's start by loading in our pet store file, the API definition, into the notebook. We open the file, and it's that file there, opening it as f. We also need the json module from the Python standard library, so let's import that. We set data equal to the json.load function, pass the file object into that function, and we get back the data for that API, as you can see below.

So now we have the data. Let's go back to the documentation for the JSON agent and copy these three lines of code into the notebook. We're setting up a JsonSpec object and passing in the data that we've loaded as the dict_ argument here; this creates the JSON specification. Then we instantiate the JsonToolkit and pass that specification in as an argument. Below that, we're calling a function called create_json_agent, and we need to import that, so let's go back to the documentation and import it from the langchain.agents module; we can do that at the top of the file here. We're calling the create_json_agent function and passing our model to it as the llm argument, and also a toolkit. Now, rather than using the OpenAI completion model, I'm actually going to scroll up here and change that to the same ChatOpenAI model we used before; let's copy that line of code, which saves us having to import the OpenAI model as well, remove this line of code, and paste in the ChatOpenAI model. Executing this now gives us back a JSON agent executor, and just like the pandas DataFrame agent, we can use the .run function on the JSON agent executor in order to run a prompt against the JSON data.

Now, I'm going to use a prompt that asks what parameters are required or optional for this /pets endpoint. Back in the notebook, I paste that prompt in: what parameters can we provide for the /pets endpoint? Let's now execute the function. Again, we're entering our chain here, and you can see a specific action, in this case json_spec_list_keys, which lists out the keys at the top level of the JSON file, and you can see the thoughts that are occurring below that. For example, the agent sees that there's a key called paths, so it then goes into that sub-object for paths and lists out the keys that are available there. This is iterative; it keeps doing this, and eventually it gets down to this action here, where it actually gets the value from a particular sub-object in the JSON file. So it's smart enough to go into each nested level of the JSON until it finds something that matches what it's looking for. You can see two different types of actions here: json_spec_list_keys and also json_spec_get_value. That's the whole idea of these toolkits: they chain together different operations that you might want to perform on a particular source of data.
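Assembled, the JSON agent cells look roughly like this; the filename is as described above, and JsonSpec's max_value_length caps how much of any one value gets printed per step:

    import json

    from langchain.agents import create_json_agent
    from langchain.agents.agent_toolkits import JsonToolkit
    from langchain.chat_models import ChatOpenAI
    from langchain.tools.json.tool import JsonSpec

    # Load the OpenAPI spec into a plain Python dict
    with open("petstore_simple.json") as f:
        data = json.load(f)

    json_spec = JsonSpec(dict_=data, max_value_length=4000)
    json_toolkit = JsonToolkit(spec=json_spec)

    json_agent_executor = create_json_agent(
        llm=ChatOpenAI(temperature=0),
        toolkit=json_toolkit,
        verbose=True,
    )

    json_agent_executor.run("What parameters can we provide for the /pets endpoint?")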
The agent is smart enough to make decisions at each step about what to do based on the observations it's getting back. Now, if we go to the bottom here, you can see the final answer output by this particular operation, and the answer states that the parameters that can be provided for the /pets endpoint are tags and limit. If we go back to the OpenAPI specification for this endpoint and look at the list of parameters that's defined there, you can see that actually matches exactly what's defined: we have a parameter called tags and also a parameter called limit. So we were able to use our prompt to get information about this API specification, and we used LangChain's JSON agent toolkit in order to do that.

Now, going back to the GitHub source code I loaded up earlier, we have the JsonSpec in there, but we also have the tools just below: we have the JSON list-keys tool, which is a tool for listing out the keys in a JSON specification, and that's what was called, as you can see here in our output, at each step of this process in order to try and find the right data; and if we go down even further, you can see the JSON get-value tool, which is what was called to actually extract a value from a particular key in the data structure. So LangChain agent toolkits such as the JSON toolkit are often composed of multiple tools, and the agent chains together operations and makes smart decisions at each step based on what you're trying to achieve in your prompt.

We could then run further queries here, for example: is the limit parameter for the /pets endpoint required? Let's execute that and see what we get back. Once that's finished running, we can scroll down and see the output here, and it's returned just False, so basically it's telling us that the limit parameter is not required. If we go back to the specification, we can see the limit parameter here, and the required field for it is set to false, so that is returning the correct result. You could also check that field directly in Python without the agent; a minimal sketch follows.
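A minimal sketch of that direct check, assuming the spec follows the layout shown earlier:

    import json

    with open("petstore_simple.json") as f:
        data = json.load(f)

    # Walk the parameter list for GET /pets and print each parameter's required flag
    for param in data["paths"]["/pets"]["get"]["parameters"]:
        print(param["name"], "required:", param.get("required", False))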
So that's the JSON toolkit that you can use with LangChain. What we're going to do now is move on to the last example in this video, and that is the SQL database agent. This agent builds off the SQL database chain that's defined in LangChain, and it's designed to answer general questions about a database, as well as to recover from errors. We're going to download a SQLite database file, just for simplicity in this video, and I'll link this page below the video. The particular database we're going to use is this one here, just because it's very small; it doesn't contain a lot of data, it's only 0.1 megabytes. Let's download that file; it's data on inmates on death row in Florida. I now have that file in my local directory; you can see it here, it's the .sqlite file.

Now that we have that, let's go back to the documentation, and we're going to bring in these three imports at the top: from langchain.agents we import create_sql_agent; from the agent_toolkits module we import SQLDatabaseToolkit; and finally, from langchain.sql_database we import the SQLDatabase object. Back in the notebook, I paste those three imports in, and I'm going to create some new cells just below. Again, I'm going to create an llm object, so let's call that llm and instantiate our ChatOpenAI object; that's going to call the gpt-3.5-turbo model. Just below that, I set a reference to our file, which is the SQLite file that we're going to load and then analyze using prompts.

Going back to the documentation, we load the database using the SQLDatabase.from_uri function, and then we pass the database that we've loaded to the SQLDatabaseToolkit. Let's copy those lines of code and paste them into our notebook just below here. We need to change the file that's being referenced to our own, so I'm going to use a Python f-string and reference the file we've set in the cell above, the Florida death row .sqlite file. We load that file into a variable called db, and then we pass db as a parameter to the SQLDatabaseToolkit, along with the llm; I'm going to change this to pass in the llm that we have defined in the cell above. Let's execute these cells.

Just below that, going back to the documentation, we grab this agent here; we're going to use the ZERO_SHOT_REACT_DESCRIPTION agent. Let's copy the agent code into our notebook, again passing the llm directly rather than referencing the OpenAI completion model. We have a call to create_sql_agent, which creates that SQL database agent; as well as the llm, we're also passing in a toolkit, the SQLDatabaseToolkit we defined in the cell above, and this time we're setting the agent type to ZERO_SHOT_REACT_DESCRIPTION. Let's execute that code and create this agent executor object.

Now, let's have a look at the database and see what kind of data is stored in it. We have a file here called florida_death_row.sqlite, and I have a tool called DB Browser for SQLite installed. With the database open in that tool, we can see that there's one table in the database, a table called inmates. We can right-click and browse this table to see the data stored in it: we have, for example, a column for the inmate name, for the race and gender, for the dates of the offense and the sentence, and also the county where the offense occurred.

So that's the data. Now we know what it looks like, let's go back to our notebook, refer to the SQL agent, call the .run function on that agent, and pass it the prompt: how many rows are in the inmates table? That enters a new chain, and it goes through some observations and actions, for example the sql_db_list_tables action. Because we put that particular table name in the query, it can detect that it's one of the tables, and it's then able to dive into that table, analyze the schema, and get back some results. You can also see below an action called sql_db_query, which takes as input a particular SQL query, for example SELECT COUNT(*) FROM inmates, and that's exactly the query we want to get back the number of rows in the table. The output of that query is 492.
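Put together, the SQL agent cells look roughly like this; the .sqlite filename is my rendering of the file the video uses, and the model name is the same assumption as before:

    from langchain.agents import create_sql_agent
    from langchain.agents.agent_toolkits import SQLDatabaseToolkit
    from langchain.agents.agent_types import AgentType
    from langchain.chat_models import ChatOpenAI
    from langchain.sql_database import SQLDatabase

    llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")

    # SQLite connection string built from the local database file
    db_file = "florida_death_row.sqlite"
    db = SQLDatabase.from_uri(f"sqlite:///{db_file}")

    toolkit = SQLDatabaseToolkit(db=db, llm=llm)

    agent_executor = create_sql_agent(
        llm=llm,
        toolkit=toolkit,
        verbose=True,
        agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    )

    agent_executor.run("How many rows are in the inmates table?")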
What we can also do, if we scroll down, is execute another query: we're going to check which county has the most inmates. Let's execute that and see if it gets a valid result. Once that's finished executing, you can see we get back the county Duval. Above that, in the sql_db_query action, we can copy the SQL statement that was run and paste it into the SQLite browser, which has a tab called Execute SQL. Pasting it in and executing it gives us back Duval County, which had 72 inmates. Now, in order to get all of them, we can remove the LIMIT 1 clause that's been used in that SQL and re-execute it, and we get back all of the counties with the number of inmates in each county; you can see that Duval is the number one county in the database with the most inmates, followed by Miami-Dade. We're getting this output by using a SQL GROUP BY statement and then ordering by the number-of-inmates column, which is the alias given to COUNT(*); a sketch of running this grouped query from Python appears at the end of this transcript. So LangChain's SQL toolkit is smart enough to build up those SQL commands and execute them against the underlying database to return a particular result to you.

Now, I will emphasize that these tools are not perfect. They can be very brittle, and you sometimes need to format your queries in a way that makes it more obvious to the tools what they are actually trying to achieve and how to do it. Sometimes the output you get back from queries is not in a format that can be used immediately, and that can cause some issues with these types of operations as well. If you look at LangChain's documentation, under the Model I/O section there's a section on output parsers, and those allow us to get back responses from a language model and then change the format of those responses, or parse a response in a particular way. For example, you might want to output numbers instead of strings, or you might want to extract data from a textual output into a Python dictionary or something like that; output parsers can help with that. We're not going to cover them in this video, but it might be useful to use them with the agent toolkits to parse some of your output responses into a particular format.

There are lots of options with LangChain; it's a very versatile tool, and it's developing rapidly at the moment, so if you're interested in more videos on it, please let me know. Thank you for watching. If you've enjoyed this video, please like and subscribe to the channel, and we'll see you in the next video.
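As referenced above, here is a minimal sqlite3 sketch of the grouped county query; the filename, table name, column name, and alias are my assumptions based on what the video shows in DB Browser:

    import sqlite3

    # Filename, table and column names assumed from the video's DB Browser session
    conn = sqlite3.connect("florida_death_row.sqlite")
    rows = conn.execute(
        "SELECT County, COUNT(*) AS num_inmates "
        "FROM inmates GROUP BY County ORDER BY num_inmates DESC"
    ).fetchall()
    conn.close()

    for county, num_inmates in rows:
        print(county, num_inmates)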
Info
Channel: BugBytes
Views: 4,708
Id: BdH0XjbsXO4
Length: 23min 45sec (1425 seconds)
Published: Thu Jun 22 2023