Automate Machine Learning with ChatGPT

Video Statistics and Information

Captions
In this video, I'm going to show you how you can automate machine learning using ChatGPT and Python. Now, let's imagine you're working as a data scientist and a bike rental company asks you to create a prediction model that can, for any given day, predict how many bikes are going to be rented out. This is a typical problem that you can solve using machine learning. And in this video, I will show you how we can set this up in VS Code, using a data set that I will provide to you as well (the link is in the description) and ChatGPT for generating the code to make our workflow really fast.
Okay, so let's now set up a Visual Studio Code workspace that we can use to start writing Python code. We first download the data set and put it into our directory. If you're new to working with VS Code and data science, you can check out these videos. By following them, you will have the exact same setup that I will be using. Then in Visual Studio Code, I make sure that the data set is in the data/raw folder. The next step is to open up an empty Python file called make_dataset that is in src/data. Next, make sure that you select the proper Python environment for this project.
Okay, with the setup completed, we are going to follow the data science lifecycle to create a prediction model. And along the way, we'll be asking ChatGPT to write code for the specific elements that we need. Okay, so let's get started. The first step, business understanding, we already know because of the problem definition over here: we want to create a prediction model to optimize revenue. Then second, understanding the data. For this, we're going to ask ChatGPT to import our data set. So I've given it the problem statement for some context, I've asked it to read the data from the path I specify, and I've asked it to write Python code to load it into a pandas DataFrame.
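The loading step described above can be sketched roughly as follows. This is a minimal illustration, not the code ChatGPT produced in the video: the file name, the column names, and the three sample rows are all made up so that the snippet runs without the real data set, and `load_rentals` is a hypothetical helper name.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_rentals(csv_path):
    """Load the rental data from a CSV file into a pandas DataFrame."""
    return pd.read_csv(csv_path)


# Stand-in for the real file in data/raw: a tiny CSV written to a temporary
# directory so the sketch runs anywhere. Columns and values are invented.
tmp_dir = Path(tempfile.mkdtemp())
csv_path = tmp_dir / "bike_rentals.csv"
csv_path.write_text(
    "season,temp,hum,cnt\n"
    "1,0.34,0.80,985\n"
    "1,0.36,0.69,801\n"
    "2,0.52,0.55,1349\n"
)

df = load_rentals(csv_path)
print(df.head())  # first rows, to eyeball the data
df.info()         # column names, dtypes, and non-null counts
```

In a real project you would pass the path of the downloaded file instead of the temporary one.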
Okay, so we get a nice piece of code there, even with some examples of how we can tweak it. But for now, let's just copy and paste this into our file and see if it works. So I'm going to fire up an interactive Python session over here that loads the data frame. All right, that is looking good. And we can have a look at the data frame over here. So that is great.
Now, to get a better sense of what we have to do in the data preparation, we need to understand the data a little more. So I'm going to take the columns from this data set, copy and paste them, and ask ChatGPT to explain what we're dealing with here. I'm just providing it with all the columns and asking how we can use these columns to create a prediction model that can predict rentals. Okay, so this is actually pretty amazing. It went ahead and created a train-test split and a linear regression model already. I was just asking it to explain the columns to get a sense of what we can do with this data. But then it said, okay, we can use certain columns to make predictions for rentals, we have to create a train-test split, and boom, here's how you do it.
So let's just go ahead and copy this code as well and come back to Visual Studio Code. I'm just going to put this over here without changing anything, and import LinearRegression and train_test_split. It knows that our data frame is called df, and we can split up our variables into an X and a y. So I think that looks really solid. Then we're going to create a train-test split. This also works. Then we create the linear regression model and fit it on X train and y train. Then we're going to score the model on the test set and print the R² score. And boom, we have a prediction. It's not perfect, but without changing a single character, we have a prediction model that can help us make predictions for the company. Now let's tweak it even further.
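A rough sketch of the baseline described above, assuming scikit-learn is installed. The data here is synthetic and the column names (`temp`, `hum`, `windspeed`, `cnt`) are assumptions chosen for illustration; the video uses the real rental data, so the score printed here says nothing about the scores mentioned in the transcript.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the rental data (assumption, not the real data set).
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame(
    {
        "temp": rng.uniform(0, 1, n),
        "hum": rng.uniform(0, 1, n),
        "windspeed": rng.uniform(0, 1, n),
    }
)
df["cnt"] = (
    2000
    + 4000 * df["temp"]
    - 1500 * df["hum"]
    - 1000 * df["windspeed"]
    + rng.normal(0, 300, n)
)

# Split the frame into features (X) and the target (y).
X = df.drop(columns="cnt")
y = df["cnt"]

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training rows, then score on the held-out rows.
model = LinearRegression()
model.fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # R^2 on the test set
print(f"R^2 on held-out data: {r2:.2f}")
```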
Okay, so typically in regression problems like this, it's best practice to convert the categorical features to dummy variables. Here you can see we have four numerical features. Those are fine as is, but we want to convert the categorical features to dummies. So I went ahead and asked ChatGPT how we can do this. I just copied and pasted all the categorical columns and asked it to convert them to categorical features and then create dummies in a pandas DataFrame. It provided the code, just perfect. So we can take this selection over here, and I'm going to insert that right at the top of my file, save it to let Black do its formatting, and then let's just run this and see if it works. Okay, so now if we look at df.info(), we should see that the columns we've provided now have the category dtype.
The next step is to convert them to dummies. This all came from the same single prompt, which also gave us the dummies. So we can copy this as well, come back, and store the result in our data frame again. Let's run this, save it again, and see what we currently have. We are now dealing with a data frame that has a lot of dummy variables based on all of the columns over here.
Now, in order to speed things up for the training process, I'm going to copy and paste this and put it down here. So we first load our data frame, then we split off our X variable, then our y, and then I'm going to change the df over here to our X. Now we apply everything to the X only, and using get_dummies, we can convert our X variable to all the dummy variables while keeping our original data frame as is. This should enable us to create a train-test split. And now we can create the same linear regression model again, fit it, score it, and print the result. Okay, so we can see an increase from the R² score of 0.66 that we just had, and we are now at 0.73. So that is a nice increase.
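The dummy-variable step above might look roughly like this. The column names and values are invented for illustration (the transcript does not list the actual columns); the point is only the pattern of casting to `category` and calling `pd.get_dummies` on `X` while leaving the original `df` untouched.

```python
import pandas as pd

# Tiny made-up frame; "season" and "weathersit" stand in for the
# categorical columns, "temp" for a numerical one, "cnt" for the target.
df = pd.DataFrame(
    {
        "season": [1, 2, 3, 4, 1, 2],
        "weathersit": [1, 1, 2, 3, 2, 1],
        "temp": [0.30, 0.50, 0.70, 0.40, 0.20, 0.60],
        "cnt": [900, 1500, 2100, 1200, 700, 1700],
    }
)
categorical = ["season", "weathersit"]

# Work on a copy of the features so the original data frame stays as is.
X = df.drop(columns="cnt")
y = df["cnt"]

# Mark the columns as categorical, then expand them into dummy columns.
X = X.astype({col: "category" for col in categorical})
X = pd.get_dummies(X)

print(X.columns.tolist())
```

Numerical columns such as `temp` pass through unchanged; each categorical column is replaced by one indicator column per category.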
But this is of course a very simple model. Let's now ask ChatGPT if we can explore some more fancy models. So let's come back to ChatGPT. Okay, so I said the regression model is too simple. Can you provide me with five algorithms and code to train, evaluate, and compare them? Okay, so we got a couple of algorithms that we can choose from. And this is actually really useful. But I want to go one step further and actually automate this process. So I've asked, can you combine all code into one function that tests all the algorithms and saves the scores to pick the best one in the end? Sure, here's an example. Here we can see that it imports all the various models, then creates a loop and scores everything.
So let's just see how this works. We can come back to the X and y. With our X and y defined, we are going to get rid of this block of code over here and then copy and paste this. We are going to put the imports at the top. Let me quickly install XGBoost. All right, that's also how you can install packages if you don't have them in your environment: just do a quick pip install in the terminal. And then we can come back to our function over here. Let's just make sure that everything is nice and clean up here, and make sure to run everything. We also have some duplicates over here, and we don't need those. So we have the imports, and now we have our compare algorithms function that takes the X and y. Let's just put this over here. So make sure we run this, and then we can run the function over here.
So now it's training, looping over everything, and we can print everything. And wow, look at that. It's giving us scores, but we have to check which metric: it's using the mean squared error in this case. And then, okay, it's trying to find the best model by minimizing the score, and then it prints the best model. Boom, random forest. How awesome is that?
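A comparison loop of the kind described above could be sketched like this. It is an assumption-laden illustration rather than the function from the video: the data is synthetic, the exact set of five models is a guess, and XGBoost is left out so that only scikit-learn is required. The name `compare_algorithms` follows the transcript.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def compare_algorithms(X, y):
    """Fit several regressors; return the best name and the MSE per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    models = {
        "linear_regression": LinearRegression(),
        "ridge": Ridge(),
        "decision_tree": DecisionTreeRegressor(random_state=42),
        "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
        "gradient_boosting": GradientBoostingRegressor(random_state=42),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = mean_squared_error(y_test, model.predict(X_test))
    # Lower mean squared error is better, so pick the minimum.
    best_name = min(scores, key=scores.get)
    return best_name, scores


# Synthetic, mildly non-linear stand-in data (assumption).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.uniform(0, 1, size=(300, 3)), columns=["temp", "hum", "wind"])
y = 3000 * X["temp"] ** 2 - 1000 * X["hum"] + rng.normal(0, 100, 300)

best_name, scores = compare_algorithms(X, y)
for name, mse in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: MSE = {mse:.1f}")
print("Best model:", best_name)
```

Because the metric is an error, the best model is the one with the lowest score, which matches the "minimizing the score" remark in the transcript.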
So with just a couple of prompts to ChatGPT, we have created all this code over here that can loop over various algorithms and compare the scores. Okay, now let's take it one step further. We know that the random forest is the best model out of all the algorithms over here. Now, let's try to tweak the hyperparameters to get an even better performance. Okay, so now I'm going to ask it to optimize the random forest. I ask, can you create a function to perform a grid search over the most important hyperparameters in order to tune the model? Use five-fold cross-validation, and use the mean squared error and the R² as evaluation metrics, so we can compare them both.
Okay, so here it goes. Tune random forest. It's giving us n_estimators, max_depth. This is actually quite an extensive grid search. And it is going to apply the cross-validation. All right. We have another beautiful function over here. So let's copy and paste this and see how it works. We have some imports again that I'm going to put at the top. The random forest regressor is already there. Let's just import the grid search, the K-fold, and the error metrics. Now I'm going to store this into memory, and then we're going to tune the random forest. Let's go. Okay, and after running for a while, the grid search completed, and look at these results. We have an R² score of 0.97 over here. So we are getting close to a perfect model.
So now let's take these best parameters and try to create the model, make predictions, and visualize the results. Okay, so I'm just going to ask it: hey, these are the best parameters. Can you now create a random forest model with these parameters, create predictions for y, and then visualize the result? Here it goes. All right. And it's finishing up. It's now providing a description of what it has just created, and it looks really good. So let's copy and paste this and see what we get.
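The tuning step above can be approximated with scikit-learn's `GridSearchCV`. What follows is a minimal sketch under stated assumptions: the parameter grid is deliberately tiny and hypothetical (the grid in the video is described as more extensive), the data is synthetic, and choosing R² as the metric used for refitting is my own choice. The name `tune_random_forest` follows the transcript.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold


def tune_random_forest(X, y):
    """Grid-search a random forest with 5-fold CV, tracking MSE and R^2."""
    # Small illustrative grid (assumption); widen it for real use.
    param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
    search = GridSearchCV(
        RandomForestRegressor(random_state=42),
        param_grid,
        # scikit-learn maximizes scores, so MSE is exposed as its negative.
        scoring={"mse": "neg_mean_squared_error", "r2": "r2"},
        refit="r2",  # with several metrics, one must be named for refitting
        cv=KFold(n_splits=5, shuffle=True, random_state=42),
    )
    search.fit(X, y)
    best = search.best_index_
    return {
        "params": search.best_params_,
        "mse": -search.cv_results_["mean_test_mse"][best],
        "r2": search.cv_results_["mean_test_r2"][best],
    }


# Synthetic stand-in data (assumption).
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.uniform(0, 1, size=(300, 3)), columns=["temp", "hum", "wind"])
y = 3000 * X["temp"] ** 2 - 1000 * X["hum"] + rng.normal(0, 100, 300)

result = tune_random_forest(X, y)
print("Best parameters:", result["params"])
print(f"Cross-validated MSE: {result['mse']:.1f}")
print(f"Cross-validated R^2: {result['r2']:.3f}")
```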
So here, underneath the last function, we are going to fill in the code over here. Then come back and add a few more imports. I think the only one we need is matplotlib, in order to create the visualizations. Then come back over here. We have to split the data again, and we can still use the X and the y. And now let's run this using the best parameters. Let's save this and run it. Okay, so these first two plots look really good, but this one is kind of messy. It's actually not ChatGPT's fault, because I know that the index here is probably messed up. When you get predictions, there is no index. So we should be able to fix that by calling just the values, which gets rid of the index, and then everything should line up correctly again. So let's visualize that one more time. And this is starting to look really good.
Here we can see a scatter plot. It created a scatter of the y test and the predictions. And we can also see that now, using the train-test split, our R² score is a little lower. It performed better when we did the K-fold cross-validation, but that's also why it's always good to create a proper train-test split: train on one part and then test on the other 20% in this case. But we have a really solid model over here. Okay, so I'm happy with how the model turned out, and I think we can help our client with this.
So now let's ask ChatGPT what the next steps are to bring this model into production. I ask how to export the model to a specific directory in my project, and also what the next steps are to put this into production. All right, awesome. We have some code over here that we can use to export the model. So let's come back over here, make sure the import is at the top, and import the joblib library. Then come back over here, and we are going to export the model. We're going to dump it, and then it should be in here. Awesome. And then we can also load that same model again using the joblib library.
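The index fix and the export step above can be sketched together. This is an illustration under assumptions, not the video's code: the data is synthetic, the hyperparameters are placeholders rather than the tuned values, the model is saved to a temporary directory instead of a project folder, and plotting is left out so that only pandas, scikit-learn, and joblib are needed.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption).
rng = np.random.default_rng(2)
X = pd.DataFrame(rng.uniform(0, 1, size=(200, 3)), columns=["temp", "hum", "wind"])
y = 3000 * X["temp"] - 1000 * X["hum"] + rng.normal(0, 100, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Placeholder hyperparameters, not the best parameters from the video.
model = RandomForestRegressor(n_estimators=50, max_depth=10, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)  # plain NumPy array, no index

# y_test still carries the shuffled row labels from the split, while the
# predictions have none. Taking .values drops the labels so both line up
# by position, which is what a side-by-side plot needs.
comparison = pd.DataFrame({"actual": y_test.values, "predicted": predictions})
print(comparison.head())

# Export the fitted model and load it back again.
model_path = Path(tempfile.mkdtemp()) / "random_forest.joblib"
joblib.dump(model, model_path)
loaded = joblib.load(model_path)
print("Reloaded model gives identical predictions:",
      np.allclose(loaded.predict(X_test), predictions))
```

Only load joblib files from sources you trust, since loading executes pickled code.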
And now it recommends that we put this model on a server and then start making predictions for the client. This is actually really awesome. Now, putting this model on an actual server might be something for a future video, but this is how you automate machine learning using ChatGPT and Python.
Now, this is pretty scary, right? Because you're probably wondering what's even the point of learning all this machine learning, data science, and Python if ChatGPT can do this. And this is only version three. What happens when we are at version 10? But don't worry. This is not going to replace you, but it is going to drastically change how you approach work. And not just for data science, but for every industry. I believe we're currently at a point in time where you're going to fall behind if you don't start to leverage AI for your work, because someone else will, and that person is going to be 10 times faster than you. Look at what we've just created in only a couple of minutes. And I'm now already fully utilizing the power of ChatGPT within my work, because I'm working as a freelance data scientist. That means the faster I can complete a project, the more projects I can take on and the more I earn within a given year. So this is huge for me, and of course for you as well.
So what I think you should do if you want to set yourself up for a successful career in data science and machine learning is focus on a holistic approach to solving business problems. You do this by getting a really good understanding of the data science lifecycle and all the individual steps that you need. Because what you've seen in this video is my understanding of the data science lifecycle, projected onto this problem, and then asking ChatGPT to fill in the nitty-gritty details. So it's not so much about understanding all the algorithms and really knowing the specifics and the syntax of the code.
But it's much more about understanding how to solve a problem as a whole and then asking the right questions to AI. And in my opinion, the only way to actually develop a sense of how you solve business problems using data is by getting your hands dirty and doing projects. So not just following textbooks and reading up on how all the algorithms work, but actually getting to work, coding, and solving problems. Now, when I was studying data science and machine learning, I found it very hard to find good examples of how to learn this. So that's why I created this series here in this playlist, free on YouTube, where I cover an entire machine learning project. So if you're learning data science and machine learning, I would highly recommend that you go check that out next.
Info
Channel: Dave Ebbelaar
Views: 4,178
Keywords: chat gpt, open ai, python, machine learning, automl, automate machine learning, artificial intelligence robot, data science chatgpt, machine learning chatgpt, chat, gpt, save time, tips, fast, freelance, predictive model, grid search, random forest, linear regression, chat gpt to make money, chat gpt prompts, artificial intelligence and machine learning, open ai chat gpt, vs code python setup, vs code tutorial for beginners, vs code, how to, artificial intelligence
Id: OmQiLvnY3WY
Length: 13min 45sec (825 seconds)
Published: Sat Jan 14 2023