Querying a database with OpenAI's ChatGPT and Langchain

Captions
In the previous video we did a very high-level overview of how LangChain and different LLMs, in this case ChatGPT, can interact with our relational database to answer certain questions. This time it's a deeper dive: we'll see the exact prompts that LangChain exchanges with ChatGPT, and how ChatGPT decides to make use of the different functions that LangChain provides. Jumping ahead, I must admit I'm not impressed. It's all very basic: we're still working with prompts, LangChain under the hood tries to generate the right prompt, and the results are not deterministic. But hey, that's just me, and maybe I've been too negative; decide for yourself. Let's get right into it.

As in the previous tutorial, we'll start with a very basic create_sql_query_chain. It takes a question as input and returns, as a response, the query that it thinks would answer our question most precisely. We're still working with the Chinook database, by the way; take a look at my previous video, where I described the structure of this database. If we ask LangChain and ChatGPT (in this case GPT-3.5 Turbo) "What is my data about?", it will return the query that it thinks best answers the question.

Now we'll ask LangChain to output all the intimate details by setting LangChain's debug setting to true. You might have noticed that many LangChain tutorials set the verbose attribute of the LLM or chain to true, but the verbose attribute is not really helpful because it doesn't go all the way up to the prompt level. I think that's because LangChain doesn't want you to see the prompts; it just tells you what LangChain does, and in my experience that is not very helpful. So if you need to understand what's happening, I would recommend setting the LangChain debug parameter to true.

Then we do the exact same thing: we pass in the question "What is my data about?" It outputs a lot of information, with some code going on in there, so I've parsed out the most helpful bits for you. To start with, LangChain goes into the database we pointed it to and looks for the tables that are present. It lists the tables first, then it lists the DDL (the structure of the tables), and then for each of the tables it lists the first three rows, so it gets an idea of what's there. Then it generates this horrendous prompt that gets passed to ChatGPT: given the input question, first create a syntactically correct Postgres query to run, then look at the results of the query and return the answer to the input question. It sets up a conversation structure that it expects ChatGPT to pick up, and it appends all the table information and the row information that it fetched from our database. ChatGPT then reads this horrendous prompt, and it goes further than what I'm showing: here I cut it to the first three tables, but it actually passes the entire table set, which I think is close to 15 tables. ChatGPT reasons about that and outputs a Postgres query, which LangChain then passes back to us. That is a very simple example, but it already gives you an idea that under the hood there is no magic happening: LangChain simply queries for the tables that are available in the database and leaves all the reasoning to ChatGPT, and ChatGPT, being very smart, figures out what to do with those table structures and that data format.
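For reference, a minimal sketch of the setup described above might look like this. It assumes a LangChain release from around the time of this video (late 2023; the import paths have since moved in newer versions) and a hypothetical Postgres connection string for a local Chinook database:

```python
import langchain
from langchain.chains import create_sql_query_chain
from langchain.chat_models import ChatOpenAI
from langchain.utilities import SQLDatabase

# Hypothetical connection string; point it at your own Chinook instance.
db = SQLDatabase.from_uri("postgresql+psycopg2://user:password@localhost:5432/chinook")
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Print every prompt and response LangChain exchanges with the model,
# which goes much deeper than the verbose=True flag on a chain.
langchain.debug = True

chain = create_sql_query_chain(llm, db)
sql = chain.invoke({"question": "What is my data about?"})
print(sql)  # the raw SQL query the model thinks answers the question
```

With debug enabled, the table list, DDL, sample rows, and the full prompt described above all show up in the console output.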
Now let's move on to a more sophisticated case. According to LangChain, the preferred way of interacting with the database is by using SQL agents. Here I'll walk you through the different parameters we pass to the SQL agent and explain why each one makes sense, so you can also understand what other options there are.

First is the choice of the model. Here we specify gpt-3.5-turbo-0613, which looks meaningless, but if you look up the documentation it will tell you that this snapshot was fine-tuned for working with functions: if you have a question and you also have a list of functions which may be helpful in answering it, this model is good at picking those functions and making use of them while answering your question.

Then we pass the so-called SQLDatabaseToolkit that comes from LangChain. That is essentially the set of functions LangChain can run against our database. Each one has a name, a description, and parameters, and they do things like "list the tables in the database" or "list the columns and data types present in each table". Those are the functions bundled into this SQLDatabaseToolkit. Verbose is set to false; it's useless here.

Then we specify an agent type. The agent type is basically a hint to LangChain about how it should operate under the hood: how it should work with the prompt and the responses it gets from the LLM. In our case it's OpenAI Functions, and if you go and read the docs on the different agent types, the one we're using is made to work with models that were fine-tuned to detect when a function should be called and to respond with the inputs for that function. So that's the agent type that should be used with this model, because the model was fine-tuned to work with functions. That's how it all comes together. One more thing I forgot to mention: when LangChain calls the OpenAI API it uses the chat completions endpoint, and that endpoint takes an array of functions as one of its parameters. This is exactly where the SQLDatabaseToolkit fits: LangChain pulls the functions it has and passes them in this functions array. Let's see how it's done under the hood.

Now we instantiate the agent, set the debug parameter to true, and ask the question we asked in the previous video: "What is my data about?" A very simple question. It produces a ton of output, which is quite tricky to read, but here is the gist. The prompt says "You are an agent designed to interact with a SQL database", then "given an input question, create a syntactically correct PostgreSQL query to run", then it lists our question, "What is my data about?", and then it passes the list of functions it got from the SQLDatabaseToolkit. Each of our functions has a name, a description, and the parameters it takes. That is the prompt that is passed to the OpenAI API. The response that comes back from OpenAI is the name of the function it thinks needs to be used; in this case it came back as sql_db_list_tables, and this is exactly what LangChain then executes on our end. It queries for the tables available in our database and comes back with the list of tables. Then it goes back to the LLM one more time and gives it the exact same prompt, the question and the functions, but with an addition: now we let OpenAI know that one of the functions, namely sql_db_list_tables, was run, and this is the response that we received.
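A minimal sketch of the agent setup being described might look like the following, again assuming a 2023-era LangChain release and the same hypothetical Chinook connection string:

```python
from langchain.agents import create_sql_agent
from langchain.agents.agent_types import AgentType
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.chat_models import ChatOpenAI
from langchain.utilities import SQLDatabase

# Hypothetical connection string; point it at your own Chinook instance.
db = SQLDatabase.from_uri("postgresql+psycopg2://user:password@localhost:5432/chinook")

# A model snapshot fine-tuned for function calling.
llm = ChatOpenAI(model="gpt-3.5-turbo-0613", temperature=0)

# The toolkit bundles the sql_db_* tools (list tables, fetch schema, run query, ...)
# that get sent to the model as the `functions` array of the completions call.
toolkit = SQLDatabaseToolkit(db=db, llm=llm)

agent_executor = create_sql_agent(
    llm=llm,
    toolkit=toolkit,
    agent_type=AgentType.OPENAI_FUNCTIONS,
    verbose=False,  # rely on langchain.debug = True instead
)

agent_executor.run("What is my data about?")
```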
The rest is pretty much the same. In the first call we give ChatGPT the hint that maybe it needs to look at the tables present in the database; on the second iteration we give it the hint that we ran that function and this is the list of tables we received. What OpenAI comes back with is: you should also run this sql_db_schema function for each of these tables. It basically reasons: okay, now I know the list of tables, now I need to know their schema. LangChain does that; it has the tool to look up the schema of a table. Then it generates yet another iteration of the prompt to OpenAI. The first part we already saw; then we say that we ran one function and this was its output, the list of tables; and then we say that we ran another function, sql_db_schema, and here is its huge output (I cut it for simplicity). Again we are hinting to ChatGPT: you recommended we run a function, we did run it, and this is what it returned. The rest is the same; we always also pass the list of functions we have in our toolkit. Then, after this third iteration, and only then, does ChatGPT get back to us with an answer: okay, I now know the tables that reside in your database, I know their schema, and now I can infer what this data is about. It says that, based on the tables, it appears the data is about a music store.

To sum it up: we went under the surface of LangChain, and you saw that under the hood it does multiple round trips, some internal calls to the database, then asking the LLM for help, and in the end all of that is expressed in a prompt. I don't think this video will still be accurate even six months from now; as I mentioned, the LangChain framework is changing literally every day. But it does give you a good idea of how tools like LangChain (and I promise you there will be many more coming out in the next few years) interact with a database. I hope it was a useful video. Let me know if you have any questions or if there is anything I could help you with; that will serve me as an idea for the next videos. Have a good day.
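To make the round trips concrete, here is a rough sketch of the loop the agent is effectively running, written directly against the OpenAI Python SDK in its 2023, pre-1.0 style rather than through LangChain. The run_tool helper and its canned return strings are hypothetical stand-ins for LangChain's sql_db_* tools, which would actually query the database:

```python
import json
import openai  # pre-1.0 SDK, e.g. openai==0.28

def run_tool(name: str, arguments: str) -> str:
    """Hypothetical stand-in for LangChain's sql_db_* tools; a real agent
    would query the database here instead of returning canned strings."""
    if name == "sql_db_list_tables":
        return "Album, Artist, Customer, Employee, Genre, Invoice, Track"
    tables = json.loads(arguments or "{}").get("table_names", "")
    return f"-- DDL and three sample rows for: {tables}"

functions = [
    {"name": "sql_db_list_tables",
     "description": "List all tables in the database.",
     "parameters": {"type": "object", "properties": {}}},
    {"name": "sql_db_schema",
     "description": "Return the schema and sample rows for the given tables.",
     "parameters": {"type": "object",
                    "properties": {"table_names": {"type": "string"}},
                    "required": ["table_names"]}},
]

messages = [
    {"role": "system",
     "content": "You are an agent designed to interact with a SQL database..."},
    {"role": "user", "content": "What is my data about?"},
]

while True:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=functions,       # the toolkit's tools go into this array
    )
    message = response["choices"][0]["message"]
    if not message.get("function_call"):
        break                      # plain-text answer: the model is done reasoning
    # The model asked for a tool: run it, then append both the request and the
    # result so the next round trip can see "we ran it, here is the output".
    name = message["function_call"]["name"]
    args = message["function_call"]["arguments"]
    messages.append(message)
    messages.append({"role": "function", "name": name,
                     "content": run_tool(name, args)})

print(message["content"])
```

Each pass through the loop corresponds to one of the iterations described above: list tables, fetch schemas, and finally answer in plain text that the data appears to be about a music store.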
Info
Channel: Denys on Data
Views: 1,305
Id: K2ykxPsh6Ys
Length: 11min 57sec (717 seconds)
Published: Wed Sep 20 2023