Building an AI Data Assistant with Streamlit, LangChain and OpenAI | Part 1

Captions
Imagine accelerating your machine learning projects with an AI assistant that will save you hours and hours of work. Welcome to digiLab Academy, your go-to destination for in-depth courses and resources to help you master the world of data science and AI. I'm Anna, and in this series I'm going to show you how to build your own AI assistant using Streamlit and LangChain. I'll walk you through the entire process, from installing the required libraries to solving a machine learning project using AI. Make sure that you take the written tutorial that accompanies this video on the digiLab Academy website; you'll find the link in the video description below. Plus, you can download the final project and also the data set that we will be using.

Now let's break down what to expect in this video. In this project, Streamlit is the foundation for the user interface, allowing you to upload CSV files, visualize data and interact with the AI assistant. The beauty of Streamlit is its simplicity: even if you have limited web development experience, you can create dynamic and interactive data applications. While Streamlit is beginner friendly, it also offers customization options for those who want to create more sophisticated applications. Finally, Streamlit makes it easy to deploy your web apps to the cloud or share them with others, further enhancing its utility for collaborative data projects.

We'll start by structuring the Streamlit app with titles, headings, subheadings, captions and text formatting. Next, we'll implement dividers to segment your content, making it easier for users to navigate through your AI assistant. Uploading CSV files is essential for this data-driven app; we'll cover how to implement this. To enhance user interaction, we'll create a dynamic sidebar and expanders, and we will discuss how to display graphs for data visualization. We'll also explore text input, which allows users to interact with your AI assistant through text. Caching is a powerful technique for optimizing performance. We'll
explore this topic and show how to implement it effectively. Understanding session state and implementing non-stateful buttons are key to creating responsive applications; we'll cover this in detail.

Now let's shift our focus to LangChain. LangChain is a framework that can be used to build conversational AI systems that can understand and respond to user queries. Its main components are models, agents, tools, prompt templates, chains, memory and indexes. The project integrates OpenAI's GPT-3.5 Turbo large language model. LLMs are integrated into this Streamlit application, allowing you to have dynamic, real-time interactions with the AI assistant: you can ask questions, seek explanations and receive immediate responses. We'll start by setting the OpenAI key. Then we will explore how to load and use OpenAI large language models to generate information, opening up a world of possibilities for your AI assistant. With a pandas agent, you can answer specific predefined questions about your data frame or any variable of your choice; I'll show you how to set this up. You can also enable your AI assistant to answer specific questions chosen by the user. What you learn here can be applied to your own data analysis problems; for example, you might want to create an assistant to help you explore sales data or customer reviews. While we'll focus on one specific use case, you can easily adapt these techniques to your unique needs. So without further ado, let's start building our AI assistant. Hit that like button and don't forget to subscribe for more data science adventures. Let's get started.

Okay, so before we start building our AI assistant, there are two things that we need to do: the first is to set the API key and the second is to run Streamlit. To set the API key, you need an API key from OpenAI, and you need to put that into a different script in your same directory. What I've done is I created this API key file, and there what we need to do is store our key. So here you would put your key as a string, but in this
file called API key. The reason that I'm not sharing it is because this key is a secret key and it shouldn't be shared, so let's remove that. Just remember that that goes in your API key script. The second is to run Streamlit. For that, open the terminal, go to your directory and run Streamlit using this command: that is, streamlit run and then the name of your Python script where you're going to write the code for the AI assistant. Just press enter, and what's going to happen is that a new window is going to pop up in your browser, and there is where we're going to see your app.

Now we're going to import the packages that are going to provide the necessary functionality for our project. The packages that we're going to import are os, API key, LangChain, dotenv, Streamlit and pandas, so let's write that down. Okay, I'm going to close Explorer so we can see the code better. So as you can see, we have imported the required libraries. The os library provides a way of using operating-system-dependent functionality. Then we have API key, in order to load our API key correctly. Then Streamlit, which is the heart of our project really. Then pandas, a library fundamental for data manipulation and visualization. Then we have the package LangChain, which is specific to our project and incorporates OpenAI large language models; that is going to allow us to interact with the AI assistant. And then we have dotenv, a very important package to make sure that sensitive information such as the API key is securely stored.

Okay, so now we're going to start building the user interface a little, and I'm going to show you how to add titles, subtitles, headers and other things that are going to make the user experience a little bit more engaging. First of all, we want our AI assistant to have a title, and we also want to have a welcoming message, so let's write that. In Streamlit, in order to write a title, we do st.title, and then inside of the brackets we write the title that we want to add to our app. I'm going to call it AI Assistant for Data Science, and I'm actually going to capitalize that. Let's add a header now. In order to add headers in Streamlit it's just st.header, and we're going to add, for example, "Exploratory data analysis part". If you want to write a subheader, it's just st.subheader, so let's say, for example, "Solution". Okay, as you can see, nothing appears on the app; in order to see what we've written, we need to click Rerun. So let's do that and see what happens. This is running, and you can see now the title appears, and we have here the header and the subheader as well. As I said, we want to write a welcoming message, so for that we're going to just use plain text. In order to do that in Streamlit we use st.write, so let's write for now just "welcoming message" and see what happens when we rerun. We rerun and it appears here. I'm going to write something along the lines of "Hello, I'm your AI assistant and I'm here to assist you with your machine learning projects", so let's write something like that. Okay, let's see what happens when we run. So now our placeholder for the welcoming message has been populated, and now we have "Hello, I'm your AI assistant and I'm here to help you with your data science projects". I'm going to move the header and the subheader after the welcoming message.

To enhance user experience, something that we can do is add a sidebar on the side of our application, and to do that we can use with st.sidebar. I think something interesting that we can put in the sidebar is an explanation of what the user can expect from the app. For that, what I'm going to do is write a text using st.write that will appear in there, so it needs to be within the with statement. So let's write that: "Your data science adventure begins with a CSV file." And now I'm going to add some extra text. Okay, and I'm going to add that to make sure that this is actually a string. Okay, let's rerun. So as you can see, this has appeared here; let's click, and this is our sidebar with our text in the side, giving a little bit of explanation of what the user can expect from the app. We have: "Your data science adventure begins with a CSV file. You may already know that every exciting data science journey starts with a data set. That's why I would love for you to upload a CSV file. Once we have your data in hand, we'll dive into understanding it and have some fun exploring it. Then we'll work together to shape your business challenge into a data science framework. I'll introduce you to the coolest machine learning models and we'll use them to tackle your problem. Sounds fun, right?"

This looks all right like this, but we can make it a little bit better and a little bit more visually appealing, so what we're going to do is use some text formatting to make this look a little bit more organized. First of all, let's split the first sentence into a title, and then the rest is going to be the text. But instead of using title and write, we're going to use write and caption. Caption is used more for footnotes, to add text to images and things like that, but just because I like the formatting of this particular way of writing text, I'm just going to use it for this. So let's do that. I'm going to leave that in write, and then I'm going to write st.caption and then add this in brackets. So let's run and see what is happening. Brilliant, so now we have a small title at the beginning and then the text. I'm going to write this here; that is better. Perfect.

Something that we can do is add bold text and italics, so I'm going to show you how to do that. If we write here an asterisk and here another asterisk at the end of our string, and here add double asterisks, when we run we're going to see that the first text is now going to be italics and the rest is going to be bold. That means, as you can see, if we want to write text in bold we use double asterisks, and if we want to write text in italics we write just one asterisk.

Okay, let's continue talking about formatting and the appearance of our app. I think it would be cool if we could add a little line here, and we can achieve that by using dividers. In Streamlit, in order to add dividers, we can use the command st.divider. Just like this; if we run, we will see that this line has appeared here. And finally, something that I would like to add is just to say that I made this app, so I'm going to use caption. Okay, and if we run, we're going to see that that appears here. I think it would look better if we could make that centered. Luckily, in Streamlit we can use HTML in order to further format our text, and that is what I'm going to do now in order to make this text move to the center. We need to add another parameter, which is this one; the default is set to False, but in order to make the HTML work we need to set that to True. So let's do that. Okay, and when that is done we can rerun this, and there you can see that it has been centered.

There are other things that we can add to our application. Something that is quite interesting is expanders, so let's see how to do that with st.expander. Inside of the expander we're just going to write some placeholder text; let's run and see what happens. As you can see, this has appeared here, so the user can click and decide whether they want to look at that information or not. Something that we can also add are emojis, just to make it a little bit more fun, so I'm just going to add a couple there. Okay, let's rerun and see what happens. So now the emojis have appeared on the screen.

Okay, so now I'm going to show you how to add a button and how to manage what is called session state. Adding a button in Streamlit, and I'm going to add it just before the header, is as easy as this: it's with st.button, but because we want to trigger an action we're going to add an if statement, so if st.button, "Let's get started". When the user clicks the button, then this is going to happen. I'm going to move that there, and in fact let me just move all of this after the sidebar. Okay, so let's rerun, and we can see that now this button appears here; when we click, then we have the header and the subheader. So let's continue building our app, and I'll show you in a second why session state is so important.

As we said in the explanation, your data science adventure begins with a CSV file, so now I'm going to show you how to integrate a CSV uploader into your application. Streamlit provides a convenient function called st.file_uploader for adding file uploads to your application. It is as simple as adding the following: after the button we are going to add the following, and then we need to specify the type of file that we want the user to upload, so in this case it's going to be a CSV file. Oh sorry, this is an equals, right. Let's rerun. Okay, so now you can see that it appears there. Let's try it out. This is the data set that I'm going to use; it's Twitter stock market data. I'm going to upload this, and this appears here. Great. But we would like this to appear after we click the "Let's get started" button, so we're just going to move this and rerun. So now we click this, the headers appear here, and then we have the file uploader. We click, we select our file, but boom, it disappears.

Okay, so it turns out that buttons are not stateful, and that means that buttons return True only momentarily, during the page load immediately after their click, and then they revert to False. In order to work around this, Streamlit allows you to use session state, which is essential for maintaining information and interactions between different sections of your application. I'm going to show you how to implement this with the button that we have in our application. First of all, what we need to do is initialize the key in the session state. Following the notation that we've used for the if statement, I'm going to create this function called click. And finally, what we need to do is modify our button, and then under this if statement we can add the CSV uploader. So let's run and see what happens. Okay, so now we click, this appears; browse files, choose our file, we upload, and nothing disappears. So we have fixed this problem using session state.

Okay, so now that users can upload their CSV files, it is time to convert the uploaded file into a pandas data frame, which is the standard data structure for data manipulation and analysis in Python. For that, after the user uploads the file, what
we're going to say is: if the user CSV file actually exists, we're going to transform that into a data frame. Also, we're going to set low_memory to False, just because the default is to actually optimize memory, but just in case the file is really large we're just going to set that to False. Also, we're going to ensure that the file pointer is at the start of the file, just in case. So let's write all that down. So that ensures that the file pointer is at the start of the file, and then we're going to transform this into a data frame. Okay, and now the data frame is ready for analysis and exploration.

Our AI assistant relies on large language models to provide natural language understanding and generate responses. In this subsection, what we're going to cover is how to load and initialize the LLM for your Streamlit application. First of all, we're going to create an instance of the LLM, and we're also going to set the temperature parameter to zero. What does that mean? Well, the temperature controls the randomness of the model, so the higher the temperature, the more creative your model is going to be. For this particular project we're going to keep the temperature low in order to make the responses a little bit more deterministic. So let's write that down: we create an instance of the model, we're going to call it llm, and then we're going to set the temperature to zero.

The first thing we're going to use our model for is to generate some information, and we're going to add that to the sidebar. I think it would be interesting if we could add some information about what the steps of EDA are, and we add that into the sidebar; if the user wants to look at it because they find it useful, they just need to click and the information will expand. So let's do that. I'm going to move our LLM first here, and then what we're going to do is, in the expander: "What are the steps of EDA"
and what it is going to write now is the steps of EDA, but we're not going to manually type it down; the LLM is going to give us this information. In order to do that, we're going to say llm and then ask the same: "What are the steps of EDA?" So let's rerun. Okay, so very important, we need to add the OpenAI key, so let's do that very quickly. We're going to add it here, after we have imported the required packages, and something else that we need to do is load the dotenv, just to make sure that the variables are correctly read. Let's do that. Okay, so what we're going to do, and you'll see why later, is move this to the very end and add another with st.sidebar. Oh, and I saw another error, so that should be like that, right. Let's rerun. We're going to browse a file, upload, it's running, and let's check the sidebar. So we have the expander here, and if we click, we can see that the LLM has generated the steps of EDA.

In this subsection, we're going to cover how to create a pandas agent and enable it to analyze and provide insights about the data. First of all, what we need to do is create an instance of the pandas agent by passing our LLM and the data frame that we want to analyze. So here we have pandas agent, and then we use create_pandas_dataframe_agent, and we're going to pass the LLM, and we are also going to pass the data frame, and we're going to set verbose to True. The default is False, and that's just to see the train of thought of the agent.

The first thing we're going to use our pandas agent for is to answer specific questions about the data. So for example, let's create a question, that is: "What is the meaning of the columns?" And now what we're going to do is create a variable that is going to be called columns meaning, for example; that is going to be the response of our pandas agent. So for that we need
to say pandas_agent.run and then we're going to pass the question. Okay, so now we want to see the output in our application, so we write st.write, and we're going to write there columns meaning. Let's run. Cool, so the columns represent the date, opening price, highest price, lowest price, closing price, adjusted closing price and volume of the stock. So our pandas agent is able to answer this predefined question that we have passed to it.

What we're going to do now is create a function with certain predefined questions, so when we run this function, all these questions will be answered by the pandas agent. The type of questions that we're going to ask are very general EDA questions: for example, what do the rows of the data set look like; as I said, what is the meaning of the columns; how many missing values do we have; are there any duplicate values; a little bit about data summarization; and even a little bit of feature engineering, so we're going to ask the pandas agent if there are any new features that it would be interesting to create. So let's write down this function. We're going to leave the pandas agent outside, and then I'll just remove this for now, and we'll just create a function that's going to be function agent, and we're going to use the pandas agent to generate all the answers to these questions. Okay, so having written the function down, let's call the function, and then let's run the app and see what happens. Okay, we run. So this has appeared here: we can see what the first rows of our data set look like, then we have the meaning of the columns, we have the missing values, it's saying that there are no duplicates, information about the data summarization, correlations between variables, also the features that we could maybe create, and it is also talking about potential outliers.

Okay, now imagine that the user is interested in a very specific variable, and we want
the user to select the variable for further study. Well, we can do this using text input. This is a feature that Streamlit has where the user can actually write text in the application. So user question is going to be a text input, and we're going to ask: "What variable are you interested in?" We're going to run and see what happens. And as you can see, this is running again, and you would agree with me that it would be better if the information stays and it doesn't run again. Okay, so now, what variable are you interested in? Well, we haven't done anything with this, but let's say volume, enter, and again everything runs again. So how do we fix that, so that every single time we enter something it doesn't run again?

We can sort this issue using caching in Streamlit. Caching is an important feature that allows you to store and reuse the results of computationally expensive functions, which improves the performance and responsiveness of our AI assistant. We're going to use the cache decorator to make sure that the functions are not run again and again, and we avoid the app running everything again, so one thing appears after another. Let me show you how to do this. I'm going to create a couple more functions and organize our script a little bit more. First of all, we are going to move our function agent and our pandas agent earlier in the script, there after the LLM. So what we're going to do is add functions to the main script: here we're going to add this function agent, and then I'm going to move the pandas agent early on as well. So we are importing the libraries, we are setting the OpenAI key, and then after the OpenAI key we are going to set the title and welcoming message, so let's write here title and then welcoming message. Okay, and after we've done that we are going to put here the explanation sidebar, so let's grab that. So this is this part here, so
we're just going to move that; it's just necessary so things work fine when we cache the functions. So yes, as I said, here is where the explanation sidebar is going to be, and then we are going to move the button after this. So this is all about the button. We don't want the header and the subheader there; we will use them later for the sections that we are creating. So we are going to move all this after the sidebar. So yes, we have the explanation sidebar, and then we are initializing the key in the session state, adding the button, uploading the file, and when we upload the file and convert it into a data frame, then is when the game starts, right? Then is when everything is built and generated. So that means that we are actually going to move this into this if statement, and then our pandas agent is going to go here, and the same for the functions, and then we'll move this as well. So we have the pandas agent there. This is something that we already have, so we can remove this. Cool.

The first function that I'm going to create, after initializing the LLM, is a function for creating the steps of EDA, so that is going to be steps EDA, and then here is just going to be this bit, right? So the LLM is going to generate that, and that is not going to run again, because we're going to decorate that with the cache data decorator. Let me show you how to do that. So here we are going to add this, and this is going to be called steps EDA, and then that is what we are going to return. Excellent, so this will be computed and that's it; it won't be computed again. Amazing. Then we have the pandas agent, and then the function that is involving the pandas agent. So let's put here functions sidebar, and then this is functions main. Excellent. So all this is running, and then it's time to add a header. The first thing that we're doing is exploratory data analysis, right? So that will be a really
good header. And then we're going to move the sidebar that involves the LLM there, but now we're just going to write steps EDA, so what this function returns. Great. And then we are running the function agent, and then we have the user question. Let's add a subheader here, so let's say "General information about the data set", and then here let's say "Variable of study". Okay, and finally we're going to cache the function agent, so let's do that. Cool, so let's run and see what happens. Okay, so running function agent; there's the header and there's the subheader; the function agent is running. So now, hopefully, when we write volume here and press enter, nothing reruns. Brilliant, so we have sorted out this issue of things running again and again by using caching in Streamlit.

Okay, so coming back to this variable of study, what we're going to do is create a function that does some exploratory data analysis, but specific to this variable. I think the first thing that will be interesting to show is a graph of this variable. Streamlit has really good visualization properties, so we're going to make use of that to create a line chart. We are going to add a function there: first we cache the function, and then we're going to call this function. Okay, so first of all we're going to use the line chart from Streamlit, that is st.line_chart. What we're going to pass is our data frame and then the user question variable. So did I call it user question? Okay, so user question variable we're going to call it. So let's do this: if we actually run this, let's see what happens. Okay, I forgot that. Okay, run. Excellent, so you can see now a line chart of the variable volume.

So now I'm going to use the pandas agent, the same way I did before, to ask specific questions about this variable, but now, instead of just writing questions, I write questions and pass that user question variable, so we can answer specific questions about this
specific variable. For example, imagine that what we want is to know a summary of the statistics of the user question variable. So we're going to ask, summary statistics: for example, pandas_agent.run, and it's going to be "Give me a summary of the statistics of", and then we have the user question variable. And then we're going to use st.write to write this summary statistics. We're going to add other things, like for example checking for normality, checking for outliers, checking for trends, and add all this to our function. Okay, so let's rerun and see what happens. So this has changed, so it's running again, and great, it's given summary statistics of this variable. It's saying that the variable is not normal but is skewed to the right, and there is an outlier present. Great. And there is a cyclic pattern of increasing volume in the summer months, and there are five missing values of volume.

In order to avoid issues in case the user enters an empty string, or there's no response, and then we continue writing things, because that would definitely create a problem, what we're going to do is write a couple of lines of code to sort that issue. We're going to say that if the user question variable is not None, and also if it's different from an empty string, then in that case is when we're going to run our function, and then it will continue running our code after that. So that's all right.

And then for the next section we're going to add our header, that is going to be "Further study". Imagine now that the user wants to ask specific questions that are not predefined in our app. What we can do now is create another text input where the user can enter whatever question they want, and we're going to use this pandas agent to answer this specific question. So first of all, if the question variable exists, then we're going to say user question data frame, so we're going to say here: "Is there anything else that
you would like to know about the data frame?" Okay, so that was our user question data frame. And now again, I'm just going to copy and paste this: so if the question is not None, and also it's not an empty string, nor the case where they say "okay, I actually don't want to know anything else, I'm very happy with the information I've had so far", imagine that that happens, then what we're going to do is run a function that we're going to create in a second. This function is going to be called function question data frame, and in this function what we're going to do is ask the pandas agent to answer this specific question. So again, we're going to create this function called function question data frame. Okay, so what is going to happen here is this extra information is going to be stored in this variable, and we're going to use pandas_agent.run to answer specifically this user question data frame. And then I'm going to add a return, and I'm going to add st.write in order to actually write the answer to this question. Cool, and then we call this here. If the user question data frame is actually "no" or "No", then we're just going to add an empty string. Let's rerun. Okay, again the colon. Let's run. Okay, so: "Is there any strong correlation between some of the variables?" Let's see what happens. Okay, so there is a strong correlation between open, high, low, close and adj close, but not between volume and the other variables. "Does the variable close have many peaks?" Let's see what it says. Yes, the variable close has a peak of 77 something. Cool, amazing.

Okay, so that's all for this first video of the series, but what is next? You can continue your data exploration by asking more questions, selecting different variables and seeking additional insights from our AI assistant. In the next video of this series we will continue building our assistant. It will be able to help you in converting your business challenge into a data science framework, offering guidance on model selection, providing predictions and more. We will introduce the concepts of chains and tools, and we will be exploring other agents. Make sure to download the completed project and sample data on the digiLab Academy website, where you can also find more resources and courses on data science and AI. You can also find the written tutorial linked in the description. Don't forget to like and subscribe to be updated with the latest tutorials and upcoming content. Thank you for joining me on this journey, and I can't wait to see you in the next part. Bye.
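The session-state fix for the "Let's get started" button described in the captions can be sketched roughly as follows. This is a minimal illustration in plain Python, so it runs without Streamlit installed: an ordinary dict stands in for st.session_state, and the names init_clicked, clicked, naive_run and run_script are assumptions made for this sketch, not the code shown in the video.

```python
# Sketch of the stateful-button pattern; a plain dict stands in for
# st.session_state. All function names here are hypothetical.


def init_clicked(state, button_id=1):
    """Create the 'clicked' entry once, so later reruns do not reset it."""
    if "clicked" not in state:
        state["clicked"] = {button_id: False}


def clicked(state, button_id=1):
    """Callback: remember permanently that the button was pressed."""
    state["clicked"][button_id] = True


def naive_run(button_pressed):
    """One rerun WITHOUT session state: the uploader depends directly on
    the button's momentary return value."""
    shown = ["button"]
    if button_pressed:
        shown.append("file_uploader")
    return shown


def run_script(state, button_pressed):
    """One top-to-bottom rerun WITH session state.

    In a real app the callback would be attached to the widget, e.g.
    st.button("Let's get started", on_click=...), and Streamlit would
    call it before rerunning the script; here that is simulated inline.
    """
    init_clicked(state)
    shown = ["button"]
    if button_pressed:
        clicked(state)
    if state["clicked"][1]:
        shown.append("file_uploader")
    return shown


# The naive version loses the uploader on the rerun after the click.
naive_after_click = naive_run(button_pressed=True)
naive_next_rerun = naive_run(button_pressed=False)

# The session-state version keeps it across reruns.
session = {}
first = run_script(session, button_pressed=False)
second = run_script(session, button_pressed=True)
third = run_script(session, button_pressed=False)  # e.g. after picking a file
```

The design point is that the condition guarding the uploader reads a value stored across reruns rather than the button's own return value, which is only True for the single rerun that follows the click.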
Info
Channel: digiLab Academy
Views: 5,012
Id: CBRE_Me1IQ0
Length: 46min 48sec (2808 seconds)
Published: Mon Nov 20 2023