NL2SQL with LlamaIndex: Querying Databases Using Natural Language | Code

Video Statistics and Information

Captions
Hi everyone, in today's video we're going to explore the natural-language-to-SQL capabilities of LlamaIndex. We have already covered the basics of LlamaIndex in an earlier video, where we built a question-answering (RAG) style app, and before that I also covered the LangChain SQL agent, which lets you query your own SQL database in natural language. Recently LlamaIndex has been making really good progress with a lot of new features, and since I have been using its natural-language-to-SQL support in a couple of my recent projects, I thought this would be a good thing to explore. So in this video we're going to see how to connect to a database using LlamaIndex and ask queries in natural language, similar to what we saw with the LangChain SQL agent.

The first thing is to install LlamaIndex and a MySQL driver, because the database I'm connecting to is MySQL. Let me install those two packages, and then I'm adding some basic logging so that we can see what's happening under the hood for the LlamaIndex-specific calls.

Here are my database details. My database is hosted on Amazon RDS (Relational Database Service); whatever your database is, it will have its username, password, host (the URL of your database), and database name. I'm using the classicmodels database, the same one I used in the earlier example, because it has a nice, familiar e-commerce-style schema: customers, products, orders, payments, that kind of information. It's a good sample DB for tutorials like this, and since I'm going to share this notebook with you, I'll put the database link here as well, like I did in the earlier video.

So how does LlamaIndex work here? LlamaIndex has a wrapper class over your database called SQLDatabase, which takes an engine, and that engine is nothing but a SQLAlchemy engine. So first we need to create the SQLAlchemy engine that this SQLDatabase wrapper can use to connect to our database. From SQLAlchemy we simply import create_engine (and text, which we'll use for the test query in a moment). The first thing to create is the connection string, which carries all the authentication and URL details used to connect. For MySQL the connection string contains the database type (mysql), the driver or library used to actually connect to that database, and then the username, password, host, and database name. If you are using something else, say PostgreSQL, check what its connection string looks like for SQLAlchemy and substitute that here. Once you have the connection string, you simply use the create_engine function to get the engine, and to check that everything is wired up correctly, we can use that engine to connect to our database and fire a simple query, like selecting a few customers, to make sure the connection string and the engine are correct and we are able to connect.
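Here is a minimal sketch of that setup. The pymysql driver and the placeholder credentials are my assumptions for illustration; substitute your own driver and connection details:

```python
from sqlalchemy import create_engine, text

# MySQL connection string format for SQLAlchemy:
#   mysql+<driver>://<username>:<password>@<host>/<database>
# pymysql and the credentials below are placeholders; use your own.
connection_string = "mysql+pymysql://admin:password@my-db.example.rds.amazonaws.com/classicmodels"

engine = create_engine(connection_string)

# Sanity check: connect and fire a simple query to verify the connection works.
with engine.connect() as connection:
    result = connection.execute(text("SELECT * FROM customers LIMIT 5"))
    for row in result:
        print(row)
```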
Okay, it seems I hadn't run the previous cell yet; let me run that cell, and then we can run this one. So whatever connection string and engine I created, I'm able to connect to my own database, and that's why these results are displayed.

Next, here is a table-details dictionary: the name of each table and a one-line description of it. This is optional, because sometimes the table names you have are not self-explanatory; maybe you have a customers table but you've called it XYZ. So it's better to have a short description of each table that we can give to LlamaIndex, or to ChatGPT, so that it understands your tables better. I created this simple dictionary just for that mapping: table name to description.

Now, as we saw above, we need to create the SQLDatabase object. We import SQLDatabase from LlamaIndex, and it takes the engine we created above with SQLAlchemy. It can also take certain extra parameters. For example, when creating the SQLDatabase object we can explicitly specify the tables it should include: say I have many tables but only want to include a few of them, I can list exactly those tables. Also, whenever the SQLDatabase object gets created, it fetches the schema information, and it can fetch a few records from each table so that the model understands the nature of the data; you can control that with the sample-rows-per-table setting (the default is actually three). For now I'm going to let it take all the tables I have, with just two sample rows from each table, and see what happens.

So it connects to the database and pulls this information in. On the SQLDatabase object we can check all the tables it was able to retrieve, and you can see it got access to this many tables. You can also check the table info to see exactly what it picked up from each table: first it got the schema of the customers table, and then the two sample rows from that table. This is the information it can provide to ChatGPT so that ChatGPT understands the columns and the nature of the data; we can change that value to two rows, zero rows, five rows, whatever we want to pass. Then it has the next table, employees, and so on. This is how it extracts the schema and data information that it will use further on.

We are also going to need an OpenAI key, because ultimately it uses a large language model like OpenAI's under the hood to turn natural language into SQL. I'm just putting my key into an environment variable here; I will be deleting this key after recording this video, so you won't be able to use this particular key.
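Putting those pieces together, here is a sketch of this step, assuming the 0.8-era LlamaIndex import paths from around the time of the video; the one-line table descriptions are paraphrased examples, not the exact ones from the notebook:

```python
import os
from llama_index import SQLDatabase

# Optional one-line descriptions for tables whose names aren't self-explanatory.
# These descriptions are illustrative; write your own for your schema.
table_details = {
    "customers": "stores the customer data",
    "products": "stores the list of products",
    "orders": "stores customer orders",
    "payments": "stores customer payments",
}

# Wrap the SQLAlchemy engine. Take all tables, with 2 sample rows per table
# (an include_tables=[...] argument can restrict it to specific tables).
sql_database = SQLDatabase(engine, sample_rows_in_table_info=2)

print(sql_database.get_usable_table_names())          # tables it discovered
print(sql_database.get_single_table_info("customers"))  # schema + sample rows

# The query engine calls OpenAI under the hood, so set your API key.
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your own key
```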
Now, since we want to experiment with different approaches, I'm using a utility from LlamaIndex called the TokenCountingHandler. This is a kind of callback we can add so that we know how many tokens we consume. It's not something needed for natural-language-to-SQL, but it's good to have, and you can read more about it in the docs. How does it work? First we import the TokenCountingHandler, give it tokenizer information for the model we're going to use, which is gpt-3.5-turbo, and then we attach this token counter through a CallbackManager, and that CallbackManager can be passed into the ServiceContext. In LlamaIndex, the ServiceContext, as per my understanding, is something that takes configuration that the library then uses. So what configuration have we added here? You can see we have also imported OpenAI from LlamaIndex and we are creating a GPT-3.5 model as the LLM, so the ServiceContext holds two pieces of information: which LLM we are using, and which callback manager (or any other extras) we want to add. For example, if you are chunking your documents while creating an index, you can also specify the chunk size in the ServiceContext. It's nothing but a way of providing configuration.

Okay, so we have the ServiceContext, our configuration. Now it's time to actually create the natural-language-to-SQL query engine. In LlamaIndex, a query engine is a generic interface where you can ask natural-language questions of your data; that's all it is. Then you can have very specialized query engines, and the one specifically for natural-language-to-SQL over tables is the NLSQLTableQueryEngine. This query engine requires the database object we created plus the additional configuration. (Again, I forgot to run a couple of cells; let me run them: the TokenCountingHandler cell, then the ServiceContext configuration, and finally our query engine, which takes all this information.)

Now we can ask a query, something like "which customer has the highest orders", and execute it. Thanks to the logging we added, you can see that the moment we ask any query, the first thing it does is retrieve all the schemas it has: it has nine tables, and it puts all nine tables' information into the prompt for GPT-3.5, so that GPT-3.5 can understand which columns from which tables to use. It consumed this information and we got the response back. The customer with the highest number of orders is a particular customer, named in the answer, with 26 orders. You can also look at what SQL query was actually generated internally when this response was produced, and at what came back when that query was fired against the database.
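Here's a sketch of that pipeline end to end, assuming the ServiceContext-based API LlamaIndex had at the time (newer releases replaced ServiceContext with Settings):

```python
import tiktoken
from llama_index import ServiceContext
from llama_index.llms import OpenAI
from llama_index.callbacks import CallbackManager, TokenCountingHandler
from llama_index.indices.struct_store.sql_query import NLSQLTableQueryEngine

# Callback that counts tokens for every LLM call, using the model's tokenizer.
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
callback_manager = CallbackManager([token_counter])

# ServiceContext bundles configuration: the LLM plus the callback manager.
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0),
    callback_manager=callback_manager,
)

# This engine sends the schema of *every* table along with each question.
query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    service_context=service_context,
)

response = query_engine.query("Which customer has the highest number of orders?")
print(response)                              # synthesized natural-language answer
print(response.metadata["sql_query"])        # the SQL generated under the hood
print(response.metadata["result"])           # raw rows the database returned
print(token_counter.total_llm_token_count)   # roughly 900 tokens per question
```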
That result is what it got back from the database, so it used this information and eventually synthesized the answer that we can give to the user. Pretty simple, right? We simply used a wrapper, the NL SQL table query engine, which took the database connection object and some extra configuration details.

We can fire another query; say we ask when a particular order was shipped. Again it will use all the tables' information, generate the response, and give it back: it says it was shipped around the 10th of January, and from the database it got the date and the ID by forming the query. Now, I should have printed the token counts as we went, so the running total won't be the right figure; let's reset the counter, run a single query, and see how much that consumes. It consumed almost 900 tokens for one question.

Let's reset the counter again, because I'm going to use a different technique and we'll compare the consumption. If you observe what we have been doing: every time we ask a question, we give the whole schema to ChatGPT so it can figure out the column names and tables it should use. Here we only have a handful of tables, but what if you have hundreds of tables? You can't put hundreds of tables' worth of information into the prompt every time it runs. You need some way to first figure out, out of all the tables you have, which subset, maybe three or five tables, you should pass to GPT. That reduces your cost, and it may be the only feasible option anyway: put all that information in one prompt and you will hit the token-limit error.

So how do we do that? In LlamaIndex, we can create an index over this information: for each table we have the table schema and its description, so we first build an index over that, and then, before we query ChatGPT directly, we query this index to find the two or three tables out of all of them that we actually need to pass to GPT. That kind of index is called an ObjectIndex in LlamaIndex, because you put objects into it, in our case the schema or table objects. We require two things: SQLTableSchema, which takes your table schema and gives you a small wrapper or interface around it, and SQLTableNodeMapping. Let me tell you where this mapping is used. In LlamaIndex, whenever you create an index, your data gets stored in an internal representation called a node; at a high level, one node is one chunk of your index. Here, one node will be one SQL table schema, so we build the SQLTableNodeMapping from the SQLDatabase object and the list of tables we read earlier, and the resulting ObjectIndex will hold multiple such nodes, each containing one table's schema.
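A sketch of building that object index, again assuming the 0.8-era imports; the context strings come from the table_details dictionary sketched earlier:

```python
from llama_index import VectorStoreIndex
from llama_index.objects import ObjectIndex, SQLTableNodeMapping, SQLTableSchema

# Maps each table in the database to a node (one node per table schema).
table_node_mapping = SQLTableNodeMapping(sql_database)

# One SQLTableSchema per table, with an optional one-line description
# (context_str) to help narrow down which tables a question needs.
table_schema_objs = [
    SQLTableSchema(table_name=name, context_str=table_details.get(name, ""))
    for name in sql_database.get_usable_table_names()
]

# Embed the schema objects into a vector index so we can retrieve only the
# most relevant tables per question instead of sending all of them.
obj_index = ObjectIndex.from_objects(
    table_schema_objs,
    table_node_mapping,
    VectorStoreIndex,
    service_context=service_context,  # assumes extra kwargs are forwarded to the index
)
```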
That's exactly what we're doing here. First we iterate through all the tables and create an instance of SQLTableSchema for each table, and we also add some extra contextual information, which is nothing but the one-line descriptions I declared at the top. This is the way you can pass extra information to GPT, or rather to the index, that helps narrow down the search. So we collect all the schema objects in this list: we iterate through the tables we already have, and for each table we create an SQLTableSchema with the name of the table and the description we have for it. Once we have both the table node mapping and the schema list, we create the ObjectIndex: it consumes the table schema objects and the table node mapping, and from these it creates a VectorStoreIndex. We also pass the ServiceContext, which says which LLM to use and carries our token counter.

Let's create this. Ah, "table_details is not defined"; let me check the spelling... no, we just haven't defined it yet, so let's run the table_details cell, come back here, and check once again. Okay, now it's created, and you can see table_schema_objs holds multiple table schemas, one per table, each indicating the name of the table and the extra information we added (which, again, is optional). Now let's create the index using that same mapping and the schema objects, which gives us this VectorStoreIndex.

Once we have this index, we can use something called the SQLTableRetrieverQueryEngine. What were we using earlier? The NLSQLTableQueryEngine. This one is a retriever query engine. Why "retriever"? Because it first retrieves the tables it should use, and only then generates the corresponding SQL query. So whenever we ask something, say "how many customers do we have?", a simple question, you can see it is not taking all the tables we have: with top-k set to three, it took, I think, the customers, orders, and payments tables, and maybe only one of those tables was actually sufficient to answer. It consumed about 500 tokens, so this is a better way. I did ask a simple question; you could ask harder ones, and you need to be careful here: what if answering actually requires five tables? Then you can set the top-k to five, six, whatever number you want. But the whole idea is that the retriever pulls the top three tables from this object index.
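And a sketch of the retriever-based engine; the question and the similarity_top_k value mirror what's described in the video:

```python
from llama_index.indices.struct_store.sql_query import SQLTableRetrieverQueryEngine

# Retrieve only the 3 most relevant table schemas per question,
# instead of stuffing every table's schema into the prompt.
query_engine = SQLTableRetrieverQueryEngine(
    sql_database,
    obj_index.as_retriever(similarity_top_k=3),
    service_context=service_context,
)

token_counter.reset_counts()
response = query_engine.query("How many customers do we have?")
print(response)
print(token_counter.total_llm_token_count)  # ~500 tokens vs ~900 before
```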
We created this index because, let's say you have a hundred tables, or maybe 50, you can't put all 50 tables into each prompt request. That is the reason the first step is retrieving the relevant tables, and the second step is using those retrieved tables to generate the answer, and you can see it consumes fewer tokens than we were consuming before.

So this was a short tutorial to give you a glimpse of the natural-language-to-SQL capabilities of LlamaIndex, and hopefully you find it useful. If you haven't seen the earlier LlamaIndex videos, you can go and check them: we have the LlamaIndex basic introductory video, and we also have the LangChain SQL agent one, so you can compare the LangChain SQL agent with the LlamaIndex approach. Okay, thank you!
Info
Channel: Pradip Nichite
Views: 9,502
Id: ZRSI8LHpqBA
Length: 19min 51sec (1191 seconds)
Published: Wed Oct 04 2023