Building an AI Data Assistant with Streamlit, LangChain and OpenAI | Part 2

Video Statistics and Information

Captions

Welcome back to digiLab Academy. I'm Anna, and today we're diving into part two of our series, building an AI assistant with Streamlit, LangChain and OpenAI. If you missed the first video, don't worry: you can catch up by clicking the link in the description below. Before we get started, make sure that you visit the digiLab Academy website, where you can find the full written tutorial as well as the finished project and the dataset that we will be using in this video.

We're going to continue our journey to simplify your data science tasks. We'll pick up from where we left off, and I'll guide you through enhancing your AI assistant.

Now let's talk about what is on the agenda for this video. In this part of our series we're diving into the concepts of prompts, prompt templates, chains and tools in the LangChain framework.

A prompt is a set of instructions or input provided by a user, guiding the model's response. It aids the model in comprehending context and generating coherent language-based output, such as answering questions or engaging in conversation.

Prompt templates are predefined recipes for constructing prompts. These templates may include instructions, few-shot examples, and specific context and questions suitable for a given task. LangChain provides robust tooling to create and work with these prompt templates, allowing for the seamless reuse of templates across different language models. Typically, language models expect the prompt to be either a string or a list of chat messages.

Chains are a fundamental aspect of LangChain. They are logical connections between one or more LLM instances. Chains can vary in complexity, tailored to the requirements and the specific LLMs involved. We will be exploring two types of chains: simple sequential chains and sequential chains. The simple sequential chain comes into play when there is a single input and a single output between chains. On the other hand, the sequential chain is utilized when there are multiple inputs and outputs.

Prompt templates provide
the structure for constructing prompts, and chains use prompts as part of a logical sequence of interactions with language models. They work together to facilitate a systematic and organized approach to leveraging language models for various tasks.

Within the LangChain framework, a tool serves as a dedicated interface crafted to execute a particular task. It typically involves a function that takes a string as an input and produces a string as an output. In today's tutorial we're going to explore two different tools: the Wikipedia API wrapper and the Python REPL.

First, we will create a prompt template to store the specific business problem our users aim to address. This will allow us to streamline the process of converting a broad business challenge into a focused data science problem. Additionally, we will create another template to guide the user in selecting machine learning algorithms suitable for solving a given data science problem. This second prompt template expects information about the data problem and additional insights from the Wikipedia research.

With our prompt templates in place, we will then create two chains. These chains will utilize an LLM to interact with the prompts and provide information. We will connect these two chains using a sequential chain, enabling us to input a business problem and in turn receive a reframing of the problem in a data science framework and a curated list of algorithms suitable to solve the problem.

Finally, our AI assistant gains the ability to generate and execute Python code. To achieve this, we'll introduce a Python agent, a key component that interacts with the Python REPL tool. Users will be able to select an algorithm of their choice and use it to solve the problem they have presented.

We will continue using expanders to provide additional information and text inputs to gather user information. We will be introducing two new Streamlit elements: the text area input and the select box. By the end of this tutorial you'll have a more powerful AI assistant
ready to help you in your data science projects. If you have any questions or you need any clarification, don't hesitate to drop a comment in the comment section below. And without further ado, let's jump into the exciting content that awaits us in this video. Don't forget to like and subscribe for more data science adventures. Let's get started.

Okay, so we're going to continue from where we left off in the last video. The first thing that we need to do is import the libraries that we need to continue building our assistant. From the LangChain library we want to import PromptTemplate; then, for the chains, LLMChain, SimpleSequentialChain and SequentialChain. For the Python agent we need create_python_agent and also AgentType. For our tools we need PythonREPLTool and the WikipediaAPIWrapper. So let's write that down: we go to the start of the script and we start writing this up.

We're going to add a divider to separate the exploratory data analysis section from the data science section, and we're going to add a header with the text "Data Science Problem" to introduce our new section. We will also include some text to provide information about the importance of reframing the business problem into a data science problem. But first we need to add a conditional statement to check that the variable user_question_dataframe is not empty or None. If user_question_dataframe evaluates as true, then the code within the indented block will be executed. So let's write that down, just here.

Right, so the introductory text is: "Now that we have a solid grasp of the data at hand and a clear understanding of the variable we intend to investigate, it's important that we reframe our business problem into a data science problem." We're going to run it and see what happens. And there we go: we have the header and then our introductory text.

Now that we have our introductory section, we are going to write our first prompt. To do this, we're going to create a new
variable called prompt and set it equal to a text input, and include a label that says "Add your prompt here". Okay, let's run it and see what happens. We can see that this text input has appeared here. Nothing is going to happen if we enter anything, because we haven't connected it to anything; we haven't connected it to our LLM. So let's proceed to do that.

What we need is a way to send our prompt to our LLM. So if there's a prompt, we'll create a new variable for our response and pass the prompt to the LLM, and in order to render this back to the screen we will use st.write. So let's do that. As you can see, we're using the LLM that we used in the previous video.

What we're going to do now is ask the LLM if it can convert a specific business problem into a data science problem. So: "Convert this business problem into a data science problem", and the business problem is "I want to know what the volume of the Twitter stock market is going to be on January the 1st, 2025". I'm going to press enter and see what happens. Great. So, data science problem: predict the volume of the Twitter stock market on January 1st, 2025.

Currently we need to write the entire prompt, but ideally what we want is for our application to determine what should be generated based on a business problem that the user inputs. This is where prompt templates come in. We want to create a prompt template that takes in our business problem and generates a prompt asking to convert the business problem into a data science problem.

To create our prompt template, we're going to create a new variable called data_problem_template, and that is going to be equal to PromptTemplate. What we need to do is define the input variable and also the template. In this case the input variable is going to be the business problem, and then our template is going to be "Convert this business problem into a data science problem", and we're going to pass the business problem. So let's write that down. Okay, so
we have our prompt template with the input variable and the template, "Convert the following business problem into a data science problem", and we pass the business problem.

Now that we have a prompt template, in order to use it effectively we need to use an LLMChain. So we're going to create a data_problem_chain and we're going to set it equal to LLMChain. We need to pass our LLM through the LLMChain, so we're going to use the LLM that we used in the previous video: we're going to set llm to that LLM, and then we're going to set our prompt to be the data_problem_template that we just created. Instead of using the LLM directly, we're going to use the run method on our data_problem_chain, based on that prompt. We are also going to set verbose to True in order to see the thought process, if we want to look at that.

So let's write that down. After our prompt template we're going to write data_problem_chain, and we're going to set this to LLMChain. We're going to set our llm equal to our LLM, and then our prompt is going to be the data_problem_template, that is it, and then verbose is going to be equal to True. Excellent.

Then we also need to change this bit here. Now, instead of using the LLM directly, we are going to use our chain, so data_problem_chain in this case, dot run, and we also need to say that the business problem, this is the notation, our business problem is going to be our prompt. Let's run now and see what happens. I'm just going to copy this last bit so I can show you. Oh, I missed the five, but we can rewrite that, that is no problem. So let's rerun.

Okay, so now, instead of saying "Convert this into a data science problem", we're just going to add our business problem and it will output the data science problem. So we write that down: "I would like to know what the volume of the Twitter stock market is going to be on the 1st of January of 2025", with the five. We press enter and, boom: data science problem, predict the volume of the Twitter stock market on January
1st, 2025.

Okay, so for now we've only generated the data science problem, but we are also interested in knowing which machine learning algorithms would be suitable to solve this problem. Currently we're using just a single chain, and what we can do is chain several of these chains together sequentially to perform multiple tasks. So we're going to create another prompt template and another chain.

Let's go with the prompt template. We're going to copy and paste, and then we will modify our first template. The second prompt template is going to be called model_selection_template, and now our input variable is not going to be the business problem; it's actually going to be our data science problem. I'm going to call it data_problem. Our template is now going to be "Give a list of machine learning algorithms that are suitable to solve this problem", and then we're going to pass the data problem. So now it's going to be model_selection_template, cool; the input variable is now going to be data_problem; and now this is going to get me a list of suitable machine learning algorithms to solve this problem, so "Give a list", and then this is not business problem, this is data problem.

Now we're going to add another chain. Let's copy our data_problem_chain, and we're going to modify that to create a model selection chain: model_selection_chain. The llm is the same LLM, but now our prompt is not our data_problem_template, it is our model_selection_template, like that, and then verbose is set to True.

Now what we need is a way to link these chains together, because at the moment they're just working independently. Here's where we're going to use the SimpleSequentialChain. We're going to create an instance of our sequential chain and set that to SimpleSequentialChain. There's one positional argument that we need to set for a SimpleSequentialChain, and that is the chains positional
argument, which is a list of the chains, and the order is super important here. The first chain that we're going to run is the data_problem_chain, which generates our data science problem, and then we run the next chain. So the output of this chain, the data_problem_chain, will then get passed to the model_selection_chain.

After model_selection_chain we are going to write sequential_chain, and that's going to be equal to the SimpleSequentialChain. Okay. Another thing that we need to do is change the response. We are no longer using the data_problem_chain; we are using our sequential chain. So let's just write that down: sequential_chain.run, and then, this is just how the syntax is, so it won't throw an error, we're going to pass the prompt like this. Let's run and see what happens. We have a little error. Ah, so here in the input variables there was a comma. Great.

As you can see, we have an output, which is a list of machine learning algorithms. The simple sequential chain is only producing one output, the list of machine learning algorithms, and it is not actually outputting the data science problem. This limitation arises because a simple sequential chain manages only a single input and a single output.

In order to fix this, we're going to use a SequentialChain instead of a SimpleSequentialChain. What we need to do is swap the SimpleSequentialChain here for the SequentialChain. But sequential chains are a little bit more complex than simple sequential chains, so there are other things that we also need to change. First of all, we need to specify the output keys. The output key for data_problem_chain is going to be data_problem, and we also need to update our model_selection_chain, so its output key is going to be model_selection. Then we need to replace our SimpleSequentialChain with a SequentialChain and specify that our input variables for the
sequential chain will be purely the business problem, and the output variables will include multiple output variables: the data problem and the model selection. When using a sequential chain we need to pass in a dictionary, so that's something to take into account. Our dictionary is going to take the business problem as a key, with our prompt as its value. To get our data problem and the model selection, we can access them separately from our response.

So let's change all these things. First of all, we need to add output keys here in the chains. Let's do that: for this first chain it is going to be data_problem, and for the second chain it is going to be model_selection. Okay. Then, for the next chain, the output variables are going to be both the data problem and the model selection. So this is no longer a SimpleSequentialChain; this is a SequentialChain. Then the response is going to be the sequential chain, and now this works like this: we remove run and then say that the business problem is going to be the prompt. And now we're going to write, oops, so we're going to access the data science problem and the model selection from our response. Let's run.

Okay, another mistake, no problem. Let's check what this is. Input variables, oh right. What has happened here is that I've missed the input variables for the sequential chain, so we also need to specify that. Let's write that down: that is going to be the business problem. Okay, let's see if this is happy now. Great, so now we have two outputs: the data science problem and also the list of suitable machine learning algorithms.

What we're going to do in this video is import the WikipediaAPIWrapper tool, which allows us to make API calls to the Wikipedia API. Why is this useful? Well, it turns out that the model that we're using, which is GPT-3.5 Turbo, is trained on data up to September 2021. So if there are new algorithms released after that date, the LLM itself won't be able to access that
information. This API wrapper allows us to access information that was released after September 2021. So if there's any relevant, suitable algorithm that has been released after that date, our app will have access to that information, and also to other information that might not have been considered by our LLM.

Okay, so the first thing that we need to do is update our prompt templates to include this Wikipedia research. The template that we're going to use for the last prompt template is not just going to take into account the data problem but also the Wikipedia research. So what we need to do is just add "while using this Wikipedia research" and pass the Wikipedia research. We also need to create a WikipediaAPIWrapper instance, and because we are creating a lot of things already, we're going to start building our functions, and in the same way that we did last time, we're going to start to cache the functions as well.

So let's start doing that. Let's go back to the section where we were caching the functions. We're going to use st.cache_resource; we'll get that explained in a second. We're going to define a function that is going to be called wiki, and we're going to pass our prompt. What we're going to do is create a variable called wiki_research and set this equal to our WikipediaAPIWrapper, and we're going to run our prompt, and then this is going to return the wiki_research. Cool.

Then we're going to write a function for the prompt templates, and we're going to move our, yes, this thing, we're going to move it up there. And this is going to return our data_problem_template and our model_selection_template. And then, indentation, please. But not just that: what we need is to add to this template, as I said before, "while using this Wikipedia research", and then this is going to be, right.

The next thing that we're going to do is create a function for our chains, so let's do that. Let's move our chains, all of them; I'm going to move this here. Let's set these things up. So, the things that we need to modify from here: now our data_problem_template and our model_selection_template are actually the output of this function here, which is prompt_templates. So what we're going to do is change those things here. One is the first output of the function and the other is the second output of the function. So this is no longer going to be this; it's going to be prompt_templates, the first output, and then the second is going to be the second output. What it is going to return is our sequential chain.

Now I'm going to create a function that is going to take the prompt and the Wikipedia research, and it's going to use the chains to create the outputs. It's going to be called chains_output. Let's do that. Right, so we have loaded the function chains, and then we have passed, as a dictionary, our prompt and our wiki research, and then what we have done is extract from that our data problem and
the model selection, and then the function returns both the data problem and the model selection. Cool.

That being done, we also need to tidy up the prompt part of our code, so let's go there. This is how, whoops, this is how it looks for now. This is no longer going to be "Add your prompt here"; it's going to be "What is the business problem you would like to solve?" Okay. And we're going to change this to a text area, because this will allow more room for the user to write their business problem.

We also need to tidy up the if prompt statement. What we're going to do now is set wiki_research: it is going to call this function, and we're going to pass the prompt. And now what we want is to use the function chains_output to output the data problem and the model selection, and we're going to pass both the prompt and the Wikipedia research. So we no longer need the response; what we need to write now is my_data_problem and my_model_selection.

So let's rerun. Okay, we can see the text area here now. We're going to write the same business problem as before: "I would like to know what the volume of the Twitter stock market is going to be on the 1st of January 2025". Oh, I've written something wrong, absolutely no problem. Where is that? Cool. So now we get the data science problem and a more curated list of machine learning algorithms.

So we have used the cache_resource decorator before our wiki function instead of cache_data. It turns out that cache_data is designed to cache functions that return data, whereas cache_resource is designed to cache global resources. In this case it just works better to use cache_resource for our wiki function, but you can go to the Streamlit documentation to read a little bit more about it if you're interested.

Now that we have our list of suitable machine learning algorithms, it would be great if the user could select their choice of algorithm to do their predictions. So, in order to enhance the user experience, we're going to implement a selection box with
Streamlit. But for that we first need to format our list of machine learning algorithms into a tuple or a list. So what we're going to do is convert our list of outputted machine learning algorithms into a tuple that we can pass to our selection box in Streamlit.

Let's do that. What this function is going to do is get the list and assume that each line corresponds to an element of the tuple. It is going to remove the enumeration, and it is also going to convert each element to a string. What it is going to do as well is add a new element, "Select machine learning algorithm", and that is going to be at the start of the tuple. So let's write that down.

Now that we have this function, we can use it to format our list of algorithms and pass it as the options for the select box. So let's do that. Here, after writing the model selection, formatted_list is going to be list_to_selectbox, and we're going to pass our model selection, so that will transform it. What's going on? Okay. And then selected_algorithm is going to be equal to this select box, so st.selectbox, and we're going to say "Select machine learning algorithm", okay, and then we're going to pass the formatted list.

Let's see what happens when we run. Great, so here we have our selection box. We click, and then we can select whichever algorithm we want. But obviously, if we click now, nothing happens, because we haven't linked this to anything else. We're going to do that right now.

In order to solve our data science problem with our chosen machine learning algorithm, we're going to use the Python agent. The Python agent is going to be equipped with a tool, which is the Python REPL tool. In order to do that, we're going to create a function that initializes and configures this Python agent. You can go to the documentation to get a little bit more into the details of this Python agent. The great thing about this agent is that it is not only able to write Python scripts but also able to execute them. So now
that we have initialized our Python agent, we can use it to provide a solution to our problem using the algorithm chosen by the user. In order to do that, we're going to use the run method on our Python agent and request the generation of a Python script to address our data problem, and also to solve this problem as well.

To do that, we're going to create another function that is going to be called python_solution. What we're going to pass through run is "Write a Python script to solve this", and then we pass the data problem that we have generated from the chain; "using this algorithm", and we're going to pass the selected algorithm; and "using this dataset". Remember that user_csv is the CSV file that we loaded at the beginning, in the previous video. If you haven't watched part one, I would definitely recommend that you go and watch it; that's where we explain how to load the CSV files. So we're going to pass that as well.

Let's write all that down, and also the python_solution. Then we definitely need to take all of this, so we need to pass my_data_problem, the selected algorithm and also the user_csv. Let's do that: my_data_problem, selected algorithm, okay, and then this is going to be, right.

In order to integrate this function into our workflow, we are going to write a couple more lines of code here. Again, to make sure that things run in order, we need to say: if the selected algorithm is not None or is different from "Select algorithm", that's what we called it, right? So the first one is "Select algorithm", okay. If it's different from that, then we're going to add a header that is going to be "Solution", and then we are actually going to run the function python_solution, and then we will write down the solution. Okay, and then st.write(solution). And let's not forget to write the if statement here. Right.

So we choose linear regression and we get a solution. How great is that? If we have a look at the terminal, we can actually see the train of thought of our agent. It's importing the dataset, converting the date column to datetime, and then using a linear regression to predict the volume. It's reading the CSV file, splitting the dataset into train and test data, importing the linear model from scikit-learn, fitting the model, and then doing all the necessary adjustments to the data to make sure that everything is in the correct format in order to predict the output. It now knows what the final answer is, and it gives the final answer. Let's now try decision trees. So the predicted volume of the Twitter stock market on January 1st, 2025 is this value.

Okay, one last thing that we need to do, really quickly. To further enhance our user experience, we're going to add a couple more expanders, the same way that we did in the first video. I won't go into detail, because we already covered this in part one. If you need a little refresher, go back to part one and you can have a look at how to build your expanders using the LLMs as well.

What I'm going to do is write an expander here, after "What are the steps of EDA", that is going to be about the importance of reframing a business problem into a data science problem. That will appear when this section is run in the application. Then I'm going to add another box about the importance of selecting different machine learning algorithms to get the best results. So I'm going to add two different functions for the LLM to generate the information for these two separate expanders, and then I will add a sidebar block with an expander and include the function for the generation of the information. Let's do that. We go to the functions for the sidebar; I'm going to add that there. So we have the first one, "Write a couple of
paragraphs about the importance of framing a data science problem appropriately", and then the second one, "Write a couple of paragraphs about the importance of considering more than one algorithm when trying to solve a data science problem". Then we will add the sidebars within the flow of our main script. The first one is going to go here, before the prompt, and the second one is going to be after the model selection.

Right, so I have rerun everything from the beginning so you can see. Now this appears here, and we have the importance of framing a data science problem appropriately. We're going to write the business problem again, and then when we run this, let's just check that this appears: "Is one algorithm enough?", and then this is generated there.

And that concludes the second video of the series, but the fun doesn't end here. As you continue your data science journey, consider experimenting with the selection of different algorithms. If you want to go a step further, you can use your agent to output evaluation metrics, or use the generated code for manual execution. Exploring diverse machine learning algorithms and comparing various evaluation metrics is crucial in order to get the best results.

In the upcoming video we'll wrap up the development of our assistant. We'll look into the concepts of memory and indexes, and we will further enhance the user experience of our app. Memory will empower our model to retain past interactions with the user, while indexes will organize documents for the app to use.

Be sure to download the completed project and sample data from the digiLab Academy website. There you'll also discover additional resources and courses on data science and AI. The written tutorial is also linked in the description. If you liked this video, don't forget to like and subscribe to be updated with our latest tutorials and upcoming content. Thank you so much for joining me on this journey, and I can't wait to see you in the next part.
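
The captions describe the list-formatting helper for the select box without showing its code. A minimal sketch of that step follows, assuming the LLM returns one algorithm per line with a leading number or bullet. The function name list_to_selectbox is inferred from the spoken "list to select box", and the regular expression used to strip the enumeration is an assumption, not the video's actual implementation.

```python
import re


def list_to_selectbox(model_selection_text: str) -> tuple:
    """Turn an LLM's line-per-algorithm answer into select-box options.

    Hypothetical helper: the name and the exact cleaning rules are
    assumptions based on the spoken description in the captions.
    """
    options = []
    for line in model_selection_text.splitlines():
        # Remove a leading enumeration such as "1.", "2)" or a "-" / "*" bullet.
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned:  # skip blank lines
            options.append(str(cleaned))
    # The placeholder goes first so nothing is selected by default.
    return ("Select machine learning algorithm", *options)


formatted_list = list_to_selectbox(
    "1. Linear Regression\n2. Decision Trees\n\n3) Random Forest"
)
print(formatted_list)
```

The resulting tuple could then be passed as the options of st.selectbox, and comparing the user's choice against the first element is one way to tell whether a real algorithm has been picked yet.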

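The difference between the two chain types discussed in the walkthrough can be illustrated without any LLM at all. The sketch below is a plain-Python imitation of the data flow only: the functions reframe and pick_models are hypothetical stand-ins for LLM calls, and run_simple_sequential and run_sequential are illustrative helpers, not LangChain's API or its exact behaviour.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for the two LLM-backed steps.
def reframe(business_problem: str) -> str:
    return f"Data science problem: predict {business_problem}"


def pick_models(data_problem: str) -> str:
    return f"1. Linear Regression\n2. Decision Trees\n(for: {data_problem})"


def run_simple_sequential(steps: List[Callable[[str], str]], text: str) -> str:
    """Single input, single output: only the last step's result survives."""
    for step in steps:
        text = step(text)
    return text


def run_sequential(
    steps: List[Tuple[Callable[[str], str], str, str]],
    inputs: Dict[str, str],
    output_variables: List[str],
) -> Dict[str, str]:
    """Each step is (function, input_key, output_key); results are kept by name."""
    values = dict(inputs)
    for func, input_key, output_key in steps:
        values[output_key] = func(values[input_key])
    return {key: values[key] for key in output_variables}


problem = "the trading volume on 1 January 2025"
last_only = run_simple_sequential([reframe, pick_models], problem)
both = run_sequential(
    [
        (reframe, "business_problem", "data_problem"),
        (pick_models, "data_problem", "model_selection"),
    ],
    {"business_problem": problem},
    ["data_problem", "model_selection"],
)
print(both["data_problem"])
print(both["model_selection"])
```

With the simple version the intermediate data science problem is lost, which matches the limitation described in the captions; keeping named outputs in a dictionary is what lets both results be read back from the response.
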
Info

Channel: digiLab Academy

Views: 2,152

Keywords: ai, artificial intelligence, datascience, machinelearning, assistant, langchain, streamlit, open, chatgpt, python, python tutorial, nlp, model, largelanguagemodel, llm

Id: 4ChcBu0eY2Q

Length: 43min 21sec (2601 seconds)

Published: Thu Nov 30 2023