Query Your CSV using LIDA: Automatic Generation of Visualizations with LLMs

Captions
Hello everyone, welcome to the AI Anytime channel. In this video we are going to build an application that generates visualizations automatically, using LIDA, a new release from Microsoft. They have built a system that automates much of the data analysis process, powered by large language models. To be honest, I would call LIDA a PandasAI killer: I think the PandasAI team has been a little slow on active issues and on their roadmap. LIDA looks really promising. As I understand it, the name means something like "loved by the people", and when you see the demo I think you will agree it may get a lot of love from the community.
Let me first show you a quick demo of what we are going to build. I have a Streamlit application with two options: Summarize and Question based Graph. A good thing about LIDA is that it also lets you query granular data. Right now that means CSVs, with Excel and other formats on the roadmap. You upload a CSV file and generate different graphs from different queries or questions: it reads your query and then generates a graph. I'll explain at a high level how it does that in a bit.
Let's start with Summarize. I browse to a file and upload it. First, the data: it's the same file I used earlier to build a CSV chatbot with Llama 2 (I'll link that video in the description). That chatbot was completely open source but honestly didn't work that well, although we now have plenty of workarounds with PandasAI, CSV loaders and so on. If you work as a data analyst and want to augment your workflow with a large language model, that is what this video is about. The data is the 2019 World Happiness data by country. Each country has a happiness score out of 10, plus GDP per capita, social support, healthy life expectancy, freedom to make life choices, generosity and perceptions of corruption (I hope North Korea is near the bottom). These are the variables behind the final score. Finland tops the chart, and the Nordic countries (Finland, Denmark, Norway) do really well on happiness, at least according to this data. I want to generate visualizations or infographics from this data with LIDA, and that's what this tool helps with.
Once I upload the file, I get details about the dataset, much like the basic analysis we do with pandas: df.info(), df.shape, df.dtypes and so on. It starts with the first column, Overall rank, showing sample values and the number of unique values. Then Country, with samples like Bulgaria and Mongolia and 156 unique values, since the CSV covers 156 countries. Then Score, where we even get the standard deviation, minimum and maximum of the column. That's fantastic; it's what we used to do with df.info(), df.min() or df.max(). The remaining columns get similar information; I'll collapse this section.
Now the interesting part: it also gives you questions. I'll explain what a "goal" means in a bit. LIDA has four major modules (summarize, goals, visualize and one more) that we'll cover in detail. The first goal it gives me is "How does the score of countries vary with their GDP per capita?" Suppose I'm a beginner in data analytics, or even an experienced analyst seeing a dataset for the first time. I might not immediately picture which charts I could generate, which two or three variables to consider, or whether to draw a scatter plot or a bar plot between two columns. This gives you a quick, high-level overview of the possibilities, and with the app we're building you can upload your data and play around with it.
For the first goal it gives you the question, what the visualization will look like ("Scatter plot of Score vs GDP per capita", which becomes the title of your graph) and the rationale, which is really important. We have goal one and goal two, plus a chart, but only for the first goal, because in the code we take the first element of the list; you'll see that when we write it. For goal one, the rationale says this visualization will help us understand the relationship between a country's GDP per capita and its score. You shouldn't generate a chart at random: there should be some business logic behind it, some value or impact it creates from a decision-making standpoint, and the rationale gives you exactly that. The second goal asks which countries have the highest and lowest scores. You could imagine a pie chart or something similar, and here it suggests a bar chart of Country or region vs Score. We only get a chart for the first goal because we only use the first element, but a for loop would visualize the others too; I'll show that in the code. So if you have a large CSV with lots of rows, you can upload it here and get a quick summary, some visuals, and a set of questions. We have only two goals right now, but you can ask for any number, depending on your use case.
That's the Summarize part. Let me copy this question and go to Question based Graph. There's a subheader, "Query your data to generate graph". I browse to the same file and paste the question I copied. This is query-based graph generation, and it's why LIDA looks so promising: once you upload granular data like a CSV, it builds rich contextual information about the whole file and gives it to a large language model, which generates the chart. It's a multi-step generation process, not a single step, as described in their research paper, which we'll look at shortly. When I click Generate Graph, the app runs and I get the response from the visualization executor that works in the back end. I've kept it verbose so you can see that it first returns a base64-encoded string, which you have to convert to an image; that's what I've done. It also gives you the code, so you can reuse it or run it in a back-end process. And here is the image for our query. Imagine how much easier this makes things if I'm not strong in data analytics; and even if I'm a very good data analyst, it can augment my day-to-day work. That's the power of LIDA. You can also pass instructions in the back end, for example "make the graph red", and it will set the colors for you, so this kind of customization, or personalization, is possible if you care about colors and so on.
So this is the application we are going to develop. Before we go into the code, let me show you the research paper on LIDA. It describes the automatic creation of visualizations that understands the semantics of data, the contextual meaning behind granular data, which is really complex. A lot of people ask why Llama 2 doesn't perform well on CSV data, and to be honest, if you build a chatbot on it, it's still not there. Even instruction-tuned models aren't very capable here: you get wrong answers, hallucinations and gibberish. GPT-3.5 and other closed-source models at least come closer to what we expect on granular data. Systems like this (the paper calls LIDA a system rather than a framework or library) will really power LLM workflows on CSV and Excel data, which is why it matters. It understands the semantics of the data, first enumerates relevant visualization goals, and only then generates the visualizations. It doesn't generate directly: the paper frames this as a multi-stage generation problem with separate pipelines.
There are four modules. The first is the summarizer, which converts your CSV into a rich, contextual natural-language summary that an LLM can understand. Feeding an LLM a huge number of raw values from your data frame won't work; LLMs struggle with numbers and mathematics, there's no doubt about it. The next is the goal explorer, which creates the goals, that is, relevant questions you can ask to generate graphs. Then there's the visualization generator, which generates and refines the visualization code, and finally the infographer, which generates infographics. I'll put the link in the description. There's also a GitHub repository showing the same four modules: the summarizer combines rules with an LLM, while the goal explorer is LLM only. Once the LLM has the natural-language summary, it creates goals from that information, and then LIDA generates the graphs and infographics.
One of my subscribers suggested a video on LIDA just 30 minutes before I started recording. I've forgotten your name, but if you're watching, thank you for the suggestion. Now let's start developing the application. I'll also explain how to use models from Hugging Face, not only closed models, since I know some of you want to use local models.
I have a project folder for this demo, which I'll open in VS Code with "code .". There's a requirements.txt with a few packages. LIDA is compatible with four types of model providers: OpenAI, Cohere, PaLM models through Vertex AI (such as text-bison), and "hf", the Hugging Face provider. It doesn't support quantized models right now, so GGUF or GPTQ models won't work; you can only use non-quantized Transformers models, the kind you load with AutoModelForCausalLM. Streamlit is also in the requirements for the app itself. I'm not going to use a Hugging Face model here because inference would take a while, but I'll show you how to swap OpenAI for one; it's a single-line change. When you do, you might get an error saying the provider got an extra keyword argument, or something similar, because LIDA uses llmx under the hood; more on that later. I also have a .env file, so you need python-dotenv: I'm using an OpenAI model, GPT-3.5 Turbo, so I store the key in .env and load it with load_dotenv().
Let me create app.py. First, import streamlit as st, then from lida import Manager, TextGenerationConfig and llm. These are the LIDA modules we need: Manager manages your LLM, TextGenerationConfig holds the configuration for the large language model, and llm is where you pass the name of the LLM provider. I also import load_dotenv from dotenv, plus os and openai. I use PIL (Pillow) to handle the Seaborn chart image we get back, via from PIL import Image, then from io import BytesIO for in-memory byte buffers, and import base64. Then I call load_dotenv() and set openai.api_key = os.getenv("OPENAI_API_KEY"), which is the name I gave the key in the .env file.
Next, instead of writing a helper from scratch, I'll copy it from my GitHub gist (you can get the code for any of my videos from my GitHub). The function takes a base64-encoded string and returns an image, using BytesIO to turn the buffer data into an image. LIDA returns each chart as a base64 string, and I need an image to display in the Streamlit app, which is why I convert it. Then I create a variable, lida = Manager(text_gen=llm("openai")). We'll override some settings later anyway. If you're using OpenAI, you pass "openai" like this. If not, you call llm with a provider parameter: provider="hf", then model= followed by the model name, for example a Llama 2 32K model from Together Computer.
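The two setup pieces described above, decoding LIDA's base64 chart and choosing a text generator, could look roughly like this. This is a minimal sketch, not the helper from the video's gist, which isn't shown. All helper names here are hypothetical. It stops at an in-memory buffer rather than a PIL image so that it runs with the standard library alone. It also assumes the raster is a PNG. The llm(...) and Manager(text_gen=...) calls follow LIDA's README as I understand it, and "HF_MODEL_ID" is a placeholder, not a real model.

```python
import base64
import io

# PNG files always begin with this 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def base64_to_buffer(b64_string: str) -> io.BytesIO:
    """Decode a base64 chart raster into an in-memory binary buffer."""
    # validate=True raises on characters outside the base64 alphabet
    # instead of silently discarding them.
    raw = base64.b64decode(b64_string, validate=True)
    return io.BytesIO(raw)


def looks_like_png(buffer: io.BytesIO) -> bool:
    """Cheap sanity check before handing the bytes to an image library."""
    return buffer.getvalue().startswith(PNG_SIGNATURE)


def make_lida_manager(use_hf: bool = False, hf_model: str = "HF_MODEL_ID"):
    """Build a LIDA Manager backed by OpenAI (default) or a Hugging Face model.

    Needs `pip install lida`; the import is deferred so the helpers above
    work without it. "HF_MODEL_ID" is a placeholder, not a real model id.
    """
    from lida import Manager, llm

    if use_hf:
        # Non-quantized Transformers model; device_map="auto" is passed
        # through to the model loader to place it on available devices.
        text_gen = llm(provider="hf", model=hf_model, device_map="auto")
    else:
        # Assumes OPENAI_API_KEY is already set, e.g. via load_dotenv().
        text_gen = llm("openai")
    return Manager(text_gen=text_gen)
```

With Pillow installed, PIL.Image.open(base64_to_buffer(raster)) gives the same kind of PIL image that the video's helper returns.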
and then you can also pass a device map if you are if you have a limited compute okay so this is how you load a open source model okay but that will take a bit of time on inference as well if you have limited uh compute power like if you have a single GPU or if you have very limited uh CPU memory there now this is what I'm going to do now let's have a text gen config so how do you define a text gen config now so in text just text generation config what I'm going to do is let's write text generation config and here this is just for we are not going to use both of this that we are writing here but this is just to show you okay so here you can define a temperature and for example 0.5 and then model and the model is for example GPT this is model I'm just going to use 33.5 turbo and uh let's use a 0301 model of Turbo okay and then let's also use cach equal true okay so you can also pass it also has a caching mechanism so let's do use cach equal to true now this is how you define so if you're using a if you want to use an open source model these are two things that you have to worry about apart from that rest of your code will remain same okay the model name will not be model in that tation config will look different that's you have to pass your provider model etc etc now let's keep it like this now I'm just going to create a stream Le thingy here s. sidebar. select select box is it right yeah and then here let's write uh choose and choose an option or something okay and here let's uh it's a list so let me just write uh summarize and then the second one is uh question based graph now this is the two menu that I'm going to have okay now if menu so if menu equals equal summarize so let's if menu equals equal summarize then let's have a subheader here hd. 
subheader and let's call this as summarization of your data so a bit of you know decoration on the app okay and you can extend this further further guys okay now let's have a file uploader and in this file uploader what I'm going to do is we're going to use s. file uploader uh Define a label here in not enter upload your file and you can also Define a type I'm just going to uh let's keep csb for now okay so type csb or you can just not not it's not needed to provide a list because we only giving one value of tion here which is type equals csb but if you have multiple it's better to pass this as an list if you want want to give XLS or something like that now file uploader is not none let's check that there's a value inside that somebody has uploaded a file then only execute rest of the code now here we have to save the file guys because it except we have to get the value the F name that's where we have to pass now path to save so let's just do pass to save and here I'm just going to call it fin name. CSV and if you don't want to uh do the coding uh alongside me you can skip this part but in most of the my video I do coding and if you want to follow it together you can follow it otherwise you can just take the code from GitHub or you can just go in the last okay of the last part of the video uh path to save F name. CSV now let's save this so with open path to save path to save Ed Ed F or Ed file and then here what we're going to do we're going to write this so f. 
write that we write on the disk uh file appr do get value yes we're going to use get value ah not get attribute sorry I'm going to use get value and yes now this will get the value of it now this is we are okay with this so this will save the file now let's have a variable called summary and in the summary here we start using that module one of the module we'll use here so that's called leader do summarize this is the module we using so you can look at leader that we have defined on line 22 that's the leader that we're going to use here okay that's the basically the variable the the model that we are using it now leader. summarize and in this what I'm going to pass is so in this let's pass the file name first so this is the file name that we going to save so F name. CSV and once you save save it to going to use it has couple of methods so default means it just give you you can look at here right it gives you summary method default it gives it has couple of more where you can have different type of summary but I just need a high level summary of my CSV file that's what I'm going to use here so summary equals uh summary method equals to default and then just pass list let's just pass text gen excuse me okay let's pass a text gen config as text gen config that we have defined on top let me do an ALT G and then you also use Library equals Library so let me just do o okay now we are done with this guys okay now uh we we haven't defined the library yet we will we need this Library once we are plotting the chart not now okay so this will not this will library is part of uh the visualization not the summary okay summarization now this variable holds your summary okay the lead. summarize now let's uh do that so here let's write sd. 
write summary and this will just write the summary now once we write the summary next step is to define the goals so first was the summarizer and the next part is goals now the llm will have your summary okay uh in like a contextual Manner and now it can look at those summary and can create some questions for you the goals for the rest of the generation is step to generate the graph the next module which is visualization now in this goals what I'm going to do is I'm going to use leader. goal and inside goals it's not goal it's goal pass the summary you have to pass summary as an input parameter that's the summary and then number of rule number of goals that you want okay so let's keep it two for now because we are using uh open AI which is in a paid model right so you have to pay for it okay so Nal to 2 and text gen config equals text gen config that's what I'm going to write now this goal is there now once you have the goal at least as now we have two goals we have to use a four Loop here so for goal in goals and here we can just again do s. 
WR and just you can do goal that's what you can do here now this gives us the goal now we have summary now and we also have goal now the next step is to visualize get the graph out of it let's do that so what I'm going to do here is and this is the only uh thing that we have to write here guys most of the code for the next L if which is question based gra will be copy paste so I equals z and here I'm going to use Library it provides you different libr it has mat plotly or plotly also or D not plotly I think mat plot La c bond D3 it has D3 as well so for Simplicity let's keep c bond now in cbon now after that I'm going to again have a text gen config and text generation config here and then let's pass for that n equals to 1 only gives the one image to me I don't want so we are overwriting here guys okay so temperature and temperature let's keep 0.5 and then let's use cach equal true now the reason I'm not uh what I'm doing here I only need one image so nals to 1 now what I'm doing next is once I do uh Nal to 1 okay uh the next is charts so let's have a variable called charts and let's now visualize so we started with summary and then we saw goals and now we're going to visualize it so l. 
visualize and in this visualization I'm going to pass summary equals summary so we are passing the summary and we're going to pass goal equals goals and you're going to pass let's pass I here if you have n number of thingy uh the goals and then text generation config equals text generation config and then here we're going to pass our library which is cbon okay so let's pass the library now if you want to use D3 if you want to use M plot live you have feel free to use that as well now we have the charts now now this charts basically holds a a base 64 uh in coding as I said now we have to handle that so let's do it so I'm going to call it image base 64 and and image base 64 or let's call it uh image base 64 string a little bit of self-explanatory variable and then here I'm going to call it charts and we only have the uh and that's called R I have done that exercise so I know that's called that's has been called raster that holds your base 64 so I'm just iterating I'm just you know I have to look at the subscript of that uh Json that I have the list that I have okay instead of Json now charts. rest it's not a Json by the way it's it's a list okay now charts restor now here I'm just going to use image and you can see on top we have this function so let's utilize that function so base 64 to image and here I'm going to pass that IMG thingy uh B 64 string and then just st. 
image that's it and this will give me the image now we are done with the first part the first menu now the next is L if so let me just write it quickly uh L if menu equals equals uh it is question based so let's just write question based graph okay question based graph let me just copy this thing from uh I'll just copy it from at least this part okay now let's just copy it to make it little quick come down to okay and let's not call it summarization let's call it query your data to generate graph query your data to generate graph this is what I'm going to do and file uploader it looks good upload your CSV okay upload your CSV and file uploader is not n let's keep the same name there's no problem let's change this at least because it will get over written otherwise uh file name 1. CSV uh file this looks nice this looks nice this looks nice now let's come out of width okay and we'll have a text area and in text area I'm going to use h. text area and I'm going to write query e your data to generate graph and I'm let's have a height of that text area as 20000 looks nice and if St do button uh S button generate graph let's keep that and here if text area is not none or length of text area let's do if length of text area is greater than zero there should be a value there should be value inside it otherwise don't execute the rest of the code okay now s. info uh s. 
info your query your query and it's a concatenation of text so text area Okay this looks good now we have text area as well and now again the leader so let's have a leader which is manager let's let's copy this guys here okay where where we okay and we can just use it by the way also but we should have just called Le equals L leader manager uh text this looks nice text generation config uh text generation config let's use this text generation config okay I'm just going to use text gen config from here and I'm just going to paste let's call it 0.2 for this case probably let also keep this this really too high you know 0.5 0.2 and we are okay with that now the summary thingy so let's get the summary thing from here uh you can look at the summary thing so we first need the summary and this will become fin name 1. csb this is default is okay Tech gen config is fine and once we have the summary now let's have an user query so your user query is nothing but what is that text area right okay uh user query you can directly pass it as well now let's have the charts so in the charts I'm going to pass leader do visualize and here we have to make a change guys here summary will be summary so let's do that but your summary is summary but your goal is not now the goals which is like which is by default given by leader now this goal is an user based query okay so the query which user has given right so that's what you have to do here so your goal becomes user query that's what your goal becomes now and then text gen config becomes your text gen config that's it and we are done with now okay so let's do and this is how if you want to do it in terminal you can find out charts and the first of the uh list and then you have let's for example image base 64 and then you can e chart z. 
rest again uh just to get the B 64 uh string from there and then just have an image and image equal to let's use that function what the function name this is the function name and let's pass image base 642 and then hd. image thingy over here and in the S do image I'm going to pass the image that's it we are done with our code guys now let's run this and see uh if we are getting the response but let me explain you because after that I will cut short the video once I show you the demo I already have shown it what we are doing here guys we have two things one is summarize the other is question based graph now we are using a library called L which is like a system that provides you flexibility of multigeneration steps which generates automated visuals for your queries okay on top of your CSV file that's what it does okay and it has some both open source and close Source model supports now this leader gets open AI we have some configuration and then we have couple of conditionals in the first conditional what we are doing we are uh getting the CSV from end user and we are storing it getting the value of it and then we have three different modules that we are using apart from in Graphics we have used all the three modules rest of the three modules the first is the summary which provides you the contextual meaning the semantics behind your data and then it has goals the llm which looks at the semantics or the context of your data generat some goals which are the questions probably for the llm to again generate an image or chart and that's what it does the visualize that's what we're doing the first condition second condition is more on the user based query if I have some query that I want to generate some chart for I can use a second part of it now let's run this application and see if we are getting any error or it's working fine let me open the uh terminal and here I'm just going to do streamate run app.py now once I do streamate run app.py I'm expecting that it will 
open the same app I was showing at the beginning. Let me quickly give you a high-level walkthrough again. You upload the CSV, you can see it running, and it gives you the summary first. This is the summary step, and you can enhance this further, guys; we could add more decoration like headers and subheaders. You can look at the goals I've got, and you can also look at the chart I've got. This is really interesting, right? It really helps you make better decisions in your data analytics workflow. If you want more types of charts and graphs, you can set the number of goals higher: n equals 10 gives you 10 different charts, and a lower n gets you results more quickly. Then you have question-based graphs: you type the question you want to ask in the question box, and you can see it generating for you. Let's take this one and see whether it generates a graph for us. It may generate a graph, or it may be a messy one. Sometimes, when you generate a bar graph or a plot, you can see "list index out of range", which I guess is fine for me, because the index might be out of range for that particular chart. But you get the idea, I hope. The code is available on GitHub. I recommend that you also try out the open-source models; I'll put the link to the Colab notebook you can follow for that in the description. But you might get an error if you just do pip install lida. Let me quickly show you what I'm talking about. If you want to use open source, look here on the LIDA page; I want to show you the real dependencies behind this system that Microsoft has
built. You have to scroll down and search for the open-source part; you can see "with locally hosted LLMs", and it uses a library called llmx. There is an issue with llmx: you have to downgrade the llmx version to use Hugging Face models with LIDA. So if you're facing any error with LIDA, and you're watching my videos, I recommend downgrading llmx. I'll put the version you should try in my GitHub repo or the YouTube description, in case you're facing an error on a Windows machine, because right now I'm on Windows. So I'll give the link in the description; please go ahead and try it out. I really loved it because it's really promising, and I'm going to experiment with open source on some other use cases. So that's all I wanted to show you in this video. If you have any questions, thoughts, or feedback for me, please let me know in the comment box. You can also reach out to me through my social media channels; the details are on my YouTube banner and the channel's About page. If you haven't subscribed to the channel yet, please do subscribe and support the channel so I can create more videos like this. Please share the video and channel with your friends and others. Thank you so much for watching, see you in the next one.
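The summarize branch described in the walkthrough chains three LIDA stages: summary, then n LLM-proposed goals, then one visualization per goal. Below is a minimal sketch of that flow. It assumes `manager` behaves like a `lida.Manager` built as in the video. The keyword names (`summary_method`, `n`, `goal`, `textgen_config`) are as I recall them from the LIDA README, so check them against the version you install. The function name `charts_for_generated_goals` is hypothetical. Nothing here imports LIDA, so the flow can be exercised with a stand-in object.

```python
import base64
from typing import Any, List


def charts_for_generated_goals(manager: Any, csv_path: str, textgen_config: Any,
                               n_goals: int = 2) -> List[bytes]:
    """Summary -> n LLM-proposed goals -> first chart per goal, as decoded image bytes.

    Hypothetical helper: `manager` is assumed to behave like a lida.Manager.
    """
    # Stage 1: turn the CSV into a compact description the LLM can read.
    summary = manager.summarize(csv_path, summary_method="default",
                                textgen_config=textgen_config)
    # Stage 2: ask for n analysis questions ("goals") about the data.
    goals = manager.goals(summary, n=n_goals, textgen_config=textgen_config)
    images: List[bytes] = []
    for goal in goals:
        # Stage 3: generate and run chart code for this goal.
        charts = manager.visualize(summary=summary, goal=goal,
                                   textgen_config=textgen_config)
        # Skip goals that produced no chart instead of indexing an empty list.
        if charts and getattr(charts[0], "raster", None):
            images.append(base64.b64decode(charts[0].raster))
    return images
```

Raising `n_goals` (the "n equals 10" knob mentioned in the demo) means more LLM calls. The loop simply skips goals that come back without a chart rather than failing the whole page.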
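The question-based branch swaps the generated goals for the user's own query, then reads `charts[0].raster` as a base64 string. Below is a sketch of that branch that also guards the "list index out of range" case. It assumes, without confirmation from the video, that the error comes from indexing an empty `charts` list. As before, `manager` is assumed to behave like a `lida.Manager`. The function name `chart_bytes_for_query` is hypothetical.

```python
import base64
import binascii
from typing import Any, Optional


def chart_bytes_for_query(manager: Any, csv_path: str, user_query: str,
                          textgen_config: Any) -> Optional[bytes]:
    """Return the first chart for a free-text query as image bytes, or None.

    Hypothetical helper: `manager` is assumed to behave like a lida.Manager.
    """
    summary = manager.summarize(csv_path, summary_method="default",
                                textgen_config=textgen_config)
    # The user's question takes the place of an LLM-generated goal.
    charts = manager.visualize(summary=summary, goal=user_query,
                               textgen_config=textgen_config)
    if not charts:
        # charts[0] on an empty list is one way to hit "list index out of range".
        return None
    raster = getattr(charts[0], "raster", None)
    if not raster:
        return None  # chart code ran but no image was captured
    try:
        # validate=True rejects strings containing non-base64 characters.
        return base64.b64decode(raster, validate=True)
    except binascii.Error:
        return None
```

In the Streamlit app, a `None` result could become an `st.warning` instead of a traceback. Non-empty bytes could be wrapped in `io.BytesIO` and passed to `st.image`. That is an alternative to the PIL-based helper used in the video, not what the video itself does.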
Info
Channel: AI Anytime
Views: 15,506
Keywords: lida, lida microsoft research, lida llm, LLM, LIDA, langchain, query csv, chat with CSV, CSV Chat, generative ai, gen ai, llama index, ai models, how to build a chatbot, streamlit, streamlit app, tech, python, coding, india, ai videos, viral videos, 1littlecoder, sam witteven, prompt engineering, abhishek thakur, medical chatbot, mistral, chatgpt, gemini, question answering tool, RAG, lida for csv, microsoft, meta ai, gpt4, pinecone, vector database, fine tuning, training LLMs
Id: U9K1Cu45nMQ
Length: 39min 30sec (2370 seconds)
Published: Wed Oct 25 2023