Analyze Custom CSV Data with GPT-4 using Langchain

Captions
So what is the overall price trend, bearish or bullish? Hi, my name is Venelin, and in this video we're going to ask GPT-4 whether it is bearish or bullish on the Bitcoin price. We are going to do that using LangChain and a custom CSV file that contains Bitcoin historical prices up to today, the 4th of April. Let's get started.

If you enjoy these videos and want to talk more about machine learning, AI, statistics, probability, etc., you can join the Discord server, which I'm going to link in the description down below.

The library that we are going to use today is called LangChain, and it allows us to connect large language models such as ChatGPT or GPT-4 with our own custom data. This is the official documentation of the library, which is available in both JavaScript and Python. Today we're going to have a look at the Python documentation. This is the official Getting Started page. They have very good documentation: you can see that they have use cases and references to the API, and they discuss each of the modules that the library contains. But the most important thing is probably here in the introduction: LangChain is a framework for developing applications powered by language models. One important thing about this is that the library is not tied exclusively to ChatGPT or GPT-4; you can use it with other language models. It's kind of plug and play: you can think of it as a set of components or modules that you can combine to achieve some result. So even if ChatGPT and GPT-4 are probably the state-of-the-art models right now, in the future you might want to try something else, and those other models might be free, so you might not have to pay OpenAI or another company. You can still use LangChain to create your own applications.

The data that we are going to use today is available on Kaggle, and it's called Cryptocurrencies Daily
Prices. It's provided by Stephen and it is very frequently updated; as you can see, this one was updated 11 hours ago. It contains Bitcoin and other cryptocurrencies since 2009, and you can see here that we have a lot of CSV files that contain the historical data. Currently I'm showing you the Bitcoin CSV file, which we are going to use in this video. Here we have the ticker, which is BTC for Bitcoin, then the date on which the price was recorded, and then the open, high, low, and close prices, so you could draw some candlestick charts.

I have an empty Google Colab notebook which is already connected, and this notebook doesn't need any GPUs. Let's start by installing some libraries. We are going to use LangChain, then I'm going to install the OpenAI API library, and finally I'm going to get python-dotenv, since I want to use my API key from OpenAI to run the experiments here. So first let's install the LangChain library; the current version is 0.0.131. Next I'm going to install the openai library, 0.27.3 (these are the most current versions of those libraries), and finally the python-dotenv library, which is version 1.0.

After the libraries are installed, I'm going to continue with getting the environment file, which holds the OpenAI API key. First I need to mount my Drive, and while this is happening I'm going to copy two files from my Drive: a .env file and an example env file, which I'm going to show you. Let's copy those files. Here are the files: yours should look exactly like this, but you must put your API key right here, since this is just an example file. Once this is done, I need to download the Bitcoin prices, the data that we're going to use. This is again from my Google Drive, but it will also be available to you.

Next I'm going to continue with adding some imports. I'm going to import some things from the LangChain
library. The first one is going to be create_csv_agent, then I'm going to import OpenAI (I have a typo here), and then I'm also going to import ChatOpenAI from the chat models module; we will use this for GPT-4. I'm also going to import display and Markdown, since I'm going to use a little Markdown to show you some things related to the LangChain library. Let's run this. Okay, the imports are very quick.

Next I'm going to load the API key using the dotenv library. I'm going to call load_dotenv, pass in the .env file, and set override to true, so this will load the API key into the environment.

The data that we're going to use is a somewhat pre-processed version of the Kaggle dataset. I took all of the Bitcoin prices for 2022 and 2023, at least up to this time, and if you want to see how I did the preprocessing, I'm going to include all of those steps in the text tutorial that will be available on MLExpert. Let's continue with loading the data. The file is the BTC daily price CSV; let's have a look at some rows from it. You can see that I've modified it quite a bit: I took the closing price, and from the date I've extracted a couple of components: the day, the month as the name of the month, the year, and the day of the week, again as a name. This is the data that we are going to pass in to ChatGPT or whichever agent model we are going to use. Let's see how many rows and columns we have here: five columns and roughly 460 rows of data, 458. Now that we have loaded the data, we need some way to tell LangChain which file we are going to use and connect it to GPT-3, 3.5, or 4.
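The preprocessing described above (keep the closing price, split the date into day, month name, year, and weekday name) can be sketched with pandas roughly as follows. This is only an illustration: the raw column names, the output column names, and the sample values are assumptions, not the actual file used in the video.

```python
import pandas as pd

# Hypothetical raw rows in the layout described for the Kaggle file
# (ticker, date, open, high, low, close); the values are made up.
raw = pd.DataFrame(
    {
        "ticker": ["BTC", "BTC", "BTC"],
        "date": ["2022-12-31", "2023-01-01", "2023-01-02"],
        "open": [16600.0, 16540.0, 16620.0],
        "high": [16640.0, 16630.0, 16760.0],
        "low": [16480.0, 16510.0, 16570.0],
        "close": [16540.0, 16620.0, 16680.0],
    }
)


def preprocess(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep the closing price and split the date into readable parts."""
    dates = pd.to_datetime(frame["date"])
    return pd.DataFrame(
        {
            "price": frame["close"],
            "day": dates.dt.day,
            "month": dates.dt.month_name(),    # e.g. "January"
            "year": dates.dt.year,
            "day_of_week": dates.dt.day_name(),  # e.g. "Sunday"
        }
    )


df = preprocess(raw)
print(df.shape)
```

Storing the month and weekday as names rather than numbers is presumably what lets plain-English questions ("during February", "which day of the week") map directly onto column values.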
So in order to do that I'm going to use the create_csv_agent function, which is provided by the LangChain library, of course. It is essentially a way to provide the CSV file and then interact with it using the openai library or another model. To create the agent we are going to use the create_csv_agent function, as I've already told you. This one wants an llm, which is a base language model, then a path, and then some keyword arguments. What I'm going to pass in here is the OpenAI model, and you can see right here in the documentation that the default model name is text-davinci-003. So in this case I'm going to use this model, which is an older one. I've also tried GPT-3.5, or ChatGPT, and it appeared to be breaking the library at the current time, so that's not very advisable. I've also tried GPT-4, which works much better, even compared to this model. I'm going to create agents for both of those models, but let's start with this one. For the temperature I'm going to pass in 0 in order to reduce the randomness of the model, then I'm going to pass in the path, which is this CSV file, and as a final parameter I want verbose equal to true, which will output a bit more information when we are interacting with it. I'm going to call this one agent; this is the default one. Then I'm going to create another one, which is going to be the GPT-4 agent. Instead of creating an OpenAI instance I'm going to create a ChatOpenAI instance. I'm going to keep the temperature. This one actually uses GPT-3.5, or ChatGPT, as the default, so I'm going to specify that I want it to be the GPT-4 model. I'm going to run these. Okay, everything appears to be working.

Now I'm going to show you a lot of data from the agent, the values that it contains. It's a bit hard to look through this, but essentially we have the model name, which confirms that we are using text-
davinci-003. One important thing that you can see right here is part of the data that we are giving it, so it already knows that we are working with this specific data. It appears that we are getting the columns and then some values from them. Another important thing right here is the fact that it is given some template, or prompt. We're going to have a look at this prompt in a second, but it looks really interesting.

To get to the prompt, I'm going to call agent.agent and then the llm_chain. Don't worry if you don't know what this means, at least for now. Here is the prompt; it's not well formatted, so I'm going to format the prompt template using Markdown. Let's look through it. This is the default prompt that the library is giving us: "You are working with a pandas dataframe in Python." So when we are working with this CSV file, the initial prompt to the model that we're using is going to be this one. It's already giving it the context that we are working with a data frame called df, and then it is providing tools "to answer the question posed to you". The tool that is provided is a Python interpreter, or REPL, and it is given the instruction that it should write some code, which is very interesting. Then: use the following format. Question: the input question you must answer. Thought: you should always think about what to do. Action: the action to take, should be one of: Python REPL. Okay, so it's getting a question, then thinking about it, and then deciding what code to write. Then it is given an action input and an observation, which is the result. This is a step-by-step process, and it can be repeated N times, as many as needed. Then, once it knows the final answer, it is told to output that answer. And then "Begin!", then the question, and then the scratch
pad, which is essentially going to be the run through the Python REPL that it is provided with right here. This is a really interesting way to do prompt engineering, and it essentially gives the model a way to apply some tools in order to do something. I say this because I've tried to load this data with LangChain another way, providing the CSV file and using prompts with embeddings, etc., and that didn't appear to work very well. What I'm going to show you now, though, works extremely well.

Now that we have the data loaded and our agent has an understanding of it, we can simply start asking questions about the data. We can start with a very basic question. To do that I'm going to call agent.run, and the first, very basic question is: how many rows and columns do we have within the data? This will tell us whether it knows what the data is at all. "How many rows and columns the data has?" Even with my imperfect English, you can see that it is taking an action, which is df.shape, and this is the output from df.shape, which is a tuple. It is essentially writing the code for that, and then it answers that the data has exactly the correct number of rows and columns. Let me double check this, just to make sure that we are actually doing something correct. So at least for now it looks like it is working, and it is writing code for us.

Let's try something a bit more advanced. I'm going to ask: what is the average price during February 2023? Recall that we have the data for 2022 and the data up until the 4th of April 2023. So, what is the average price during February 2023?
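The Thought / Action / Observation format described above, applied to the rows-and-columns question, can be sketched as a small loop. This is not LangChain's implementation: `scripted_model` is a hypothetical stand-in for the LLM so the example runs without an API key, and the tiny data frame and its column names are made up.

```python
import pandas as pd

# Made-up stand-in for the price data; column names are assumptions.
df = pd.DataFrame({"price": [16500.0, 16700.0, 16600.0], "year": [2023, 2023, 2023]})


def scripted_model(scratchpad: str) -> str:
    """Hypothetical stand-in for the LLM: returns the next step as text."""
    if "Observation:" not in scratchpad:
        return (
            "Thought: I need the number of rows and columns\n"
            "Action: python_repl\n"
            "Action Input: df.shape"
        )
    # Read the latest tool result back out of the scratchpad.
    observation = scratchpad.rsplit("Observation:", 1)[1].strip()
    return (
        "Thought: I now know the final answer\n"
        f"Final Answer: the data has shape {observation}"
    )


def run_agent(question: str, max_steps: int = 5) -> str:
    """Alternate between model steps and tool calls until a final answer."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        step = scripted_model(scratchpad)
        scratchpad += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        code = step.split("Action Input:", 1)[1].strip()
        # The "tool": evaluate the expression with df in scope.
        # Evaluating model-written code is unsafe outside a sandbox.
        observation = eval(code, {"df": df})
        scratchpad += f"Observation: {observation}\n"
    raise RuntimeError("no final answer within max_steps")


answer = run_agent("How many rows and columns does the data have?")
print(answer)
```

The scratchpad is simply the growing transcript of earlier steps, which is why the model can refer back to earlier observations when it decides it knows the final answer.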
It started to execute, and this is what it did: it did a filtering, or selection, to take the month of February and the year 2023, took the price column, and called mean on it. It got this number, and the answer it is giving us is that the average price during February 2023 is this number. Let's take this code and try to execute it. Yeah, we're getting essentially the same result, so again, very well done.

The next question is going to be: what is the difference between the price during February and March? This is somewhat more involved, and it shouldn't be a simple one-liner like the previous one. So let's continue asking questions about the data; this is mind-blowing, actually. What is the difference between the price during February and March 2023? Again a bit more involved, but okay. What it did: it took the mean price from February, then it took the mean price from March, and then it came up with this number, which is around 2K, or 1.8K, and it says that the difference is negative 1,800. Let's run this; it looks to be correct. Just to make sure: yeah, exactly this number, and it did the rounding for us too. Mind-blowing, actually.

The next question that I'm going to ask is: what is the percentage increase from 2023 to, sorry, from 2022 to 2023? A bit more complex question again. What is the price increase from 2022 to 2023?
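The monthly questions above come down to a boolean filter plus a mean, and the year-over-year question to a group-by. A minimal pandas sketch of that kind of code, using made-up prices and assumed column names ("price", "month", "year"), might look like this; it is not the exact code the agent produced.

```python
import pandas as pd

# Made-up prices; the column names "price", "month", "year" are assumptions.
df = pd.DataFrame(
    {
        "price": [20000.0, 24000.0, 26000.0, 30000.0, 40000.0],
        "month": ["February", "February", "March", "March", "June"],
        "year": [2023, 2023, 2023, 2023, 2022],
    }
)


def mean_price(frame: pd.DataFrame, month: str, year: int) -> float:
    """Average price over the rows matching one month of one year."""
    mask = (frame["month"] == month) & (frame["year"] == year)
    return float(frame.loc[mask, "price"].mean())


feb = mean_price(df, "February", 2023)
mar = mean_price(df, "March", 2023)
difference = feb - mar  # negative when March is higher on average

# Percentage change of the yearly average price, relative to 2022.
yearly = df.groupby("year")["price"].mean()
pct_change = (yearly.loc[2023] - yearly.loc[2022]) / yearly.loc[2022] * 100
print(feb, difference, pct_change)
```

Note that the sign of the difference depends on which month is subtracted from which; the video's negative result corresponds to February minus March.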
This is what it did: it grouped the data frame by year and took the mean price, and as you can see the mean price for 2022 was much higher, which is expected. Next it took the same for 2023, and then it gives the average price for each year and says that, from the average of 2022 to 2023, we are down roughly 18%, which is very interesting.

Let's do this for, let's say, January and February during 2023. I'm going to ask: what is the percentage change from the average price in January versus February 2023? So, the mean price. This is the output, or the thought process: first it took the mean price for January 2023, then it took the same thing for February, and then it calculated the percentage based on the January data. So on average we get a 15% increase. Very interesting, right?

Let's try something a bit more advanced still. Let's calculate the moving average price with a lag of one week during 2023. Note that we are continuously adding more and more constraints for the model. What is the moving average of the price with a lag of one week during 2023? Let's see what it's doing. It knows about the rolling mean, the combination of rolling and mean from pandas. It is converting this into this data frame, and then it takes the mean: "I now know the final answer." So how did it come up with this?

Let's see how GPT-4 responds to this question. GPT-4 agent, let's run this with the exact same question and compare whether we're getting the same number. Sometimes this errors out, but let's try it again, and a third time just in case. Okay, so GPT-4 is not giving us any response to this. That's all right, at least for now. It's interesting how the first agent came up with this number, but I'm going to continue with the next question. I'm going to ask it on which days of the week the price
dropped the most. What it is doing is essentially getting the mean value for each day of the week: Friday has the worst average price, so it "dropped the most". I'm not really sure that this is the case, because it's just taking the averages, so it's not really doing the correct thing here. Let's ask GPT-4 the same question and see if we get a different result. Hopefully it works this time; probably not. Okay, it appears to be working this time. Let's see. All right, as you can see it's a bit slower, but it is doing an entirely different thing. It is creating a new column called price difference, taking the diff between the current and the previous day, and then grouping by the day of the week and taking the minimum price difference. It appears that this hangs once in a while. Okay: "the days of the week with the most significant price drops are Friday, Monday, Saturday". So it is basically telling us that the biggest price drops happen on Friday and then on Monday. Very, very interesting, and hopefully it did this correctly. Let me run this and group it; I'm just taking the code from GPT-4. Yeah, we get exactly the same numbers: we're grouping by day of the week and then taking the minimum value of the price difference. Wow. So as you can see, GPT-4 is much more advanced when it comes to serious queries that require much more analysis of the data.

I'm going to ask essentially the same thing in another way: which day of the week is the best to buy? Let's run the original agent. What it is doing is again giving us the average for each day of the week, which is wrong. And then the GPT-4 agent; let's see what we get here: "I need to get the average price for each day of the week and find the day with the lowest average." Okay, so it's actually correct, even this one. You would essentially want to buy at the cheapest
possible price, and one way to approach this is to just take the average, so it appears to be doing just that. I'm fine with that. But let's actually run this once more through GPT-4 and see if we get another response; if we don't, I'm going to continue with the next one. Okay, so GPT-4 is not very responsive on this one.

Next I'm going to ask which day was the best one to buy. I'm going to just ask it on which day Bitcoin hit its minimum price, and it appears to be November 21st, 2022. Let's get this just to confirm that's the minimum price; let's run this and just take the row. Yeah, it appears that it is correct, and this is the price difference; the default agent also got it right here.

The next question is going to be: what date was the best to sell during 2023? Let's see what it's doing: "I need to find the highest price in 2023." All right: "the best day to sell during 2023 was the date with the highest price, which was..." So it doesn't appear to give us the date specifically. Let's run this with GPT-4. Okay, first it takes the year 2023: "now I have a data frame with only rows from 2023". Then it is looking for the max price row, and then it is taking the day, the month, and the year. But that's interesting: it is giving us essentially this template and not the actual day, month, and year. It went and got the correct response but failed at the final step. Let's run this. Okay, it appears to be working, so I'm going to just print those: 1st of April 2023. All right, this is the date with the highest price, so I agree with that.

One of the next questions is going to be: how much has the price increased, as a percentage, since November 21, 2022? Recall that this is the date on which the price was the lowest. Let's run this and look through the query: "I need to find the price on November and
compare it to the current price." All right, so it took November 21st; this is the correct price. "I need to find the current price", so it's taking the first price from the data frame. This is entirely wrong. It says that it's 200, but that's incorrect. So let's call GPT-4 and look through its response. Okay: "To answer this question I need to find the price on November 21, 2022 and the last available price in the data frame." Interesting. So it's getting the date with the correct day of the week, then it is taking the last price, which is on April 3rd, which is correct, and this is the price. It is subtracting the price on November 21st from the price in April and then normalizing: the price increased 76.24 percent. Very good compared to text-davinci-003.

Okay, so the final question, probably the most important one, and maybe I'll give you a bonus after it: what is the overall price trend, bearish or bullish? Let's see. text-davinci-003 is telling us that the overall price trend is bearish, and it is giving us this chart. I don't know what this chart means: we have days and then some prices, and I'm not sure what it is doing behind this. But this is essentially the thought: "I can see the trend in the graph." That's really interesting, it sees the trend within the graph. Wow. Let's ask GPT-4 and see its response: "To determine the overall price trend I will calculate the price change from the first day to the last day in the data frame." Okay, another entirely different approach. The price has dropped twenty thousand bucks from the first day to the last day, which indicates an overall bearish trend. So according to GPT-4, even after this simple calculation, the current data suggests that the trend is still bearish, at least for the Bitcoin price.

Okay, so I promised you a bonus, so let's ask this question: what do you think... let's formulate it in a slightly different way:
what will be the price of BTC on, let's say, May 31, 2023? Using the data you have, you must give me a single number. Okay, let's see. "I need to find the price of BTC on May 31, 2023", and then it found that we don't have such data. Okay, let's see what I get if I change the prompt. So what it did: it took 2023, got NaN, and then went to 2022 and found the mean value for May 2022. Let's see what happens if we ask GPT-4. So what it did is just take the mean value, which is probably a very reasonable estimate. Let's try another one. Okay, "to predict the price of BTC on May 31"; it started very interestingly, so here is the idea, although it didn't complete the response: "To predict the price of BTC we can calculate the average daily price increase or decrease and multiply it by the number of days between the most recent date in the data frame and May 31, then add the result to the most recent price." All right, so this is a strategy that GPT-4 is giving us in order to predict the price of Bitcoin for this particular date. You might want to have a look at how you can do this.

In this video we've seen how we can use the LangChain library with a custom CSV file that contains Bitcoin prices for 2022 and 2023, and we used all of this to create a CSV agent. Then we started asking questions to text-davinci-003 and GPT-4. We've asked a lot of questions about the data, and you've seen that those models perform incredibly well on some of the queries and fail on some of the others, even GPT-4. So take that for what you will, but those types of queries on custom data appear to be working very well, or at least are very interesting to me, and they appear to be performing incredible tasks. We just witnessed how those models can write pandas code, create queries, and then give us the results; they essentially did the planning, the execution, and the output parsing just for us. So that's pretty mind-blowing
according to me, at least. So thanks for watching, guys. I am going to leave a link to the Discord server in the description down below. If you have any questions or want to join the MLExpert community, please go there. Please like, share, and subscribe, and I'll see you in the next one. Bye.
Info
Channel: Venelin Valkov
Views: 13,718
Keywords: Machine Learning, Artificial Intelligence, Data Science, Deep Learning
Id: Ew3sGdX8at4
Length: 43min 5sec (2585 seconds)
Published: Tue Apr 04 2023