Make Your First AI in 15 Minutes with Python

Video Statistics and Information

Captions
All right guys, today we are going to be building your first AI as fast as possible. The first thing you're going to want to do is go to colab.research.google.com, which is going to allow us to do all of our code within the browser, which is really, really cool. Now that we're here, we're going to create a new notebook, and this notebook will be where we create our AI. While that's loading, we're going to rename this notebook from Untitled0 to, let's just say, AI. Awesome. Here is where we're going to be writing all of our Python code, but first we have to connect to a runtime. If you click this Connect button, these are basically the GPUs that Google gives us for free to do all of our training, which is really, really cool.

Now we're going to go to Files, press Upload, and upload our dataset, which you will be able to find in the description. The name of our dataset is cancer.csv, so we'll upload that and I'll show you what we're working with here. If we double-click on cancer.csv, you'll see we have a diagnosis and then a bunch of numerical attributes relating to different tumors that may or may not be cancerous. What we're going to do here is make an AI figure out, based on these parameters, whether a tumor is malignant or benign: malignant meaning that the tumor needs to be removed, and benign meaning that the tumor is not cancerous. We see here in the diagnosis label that one means the tumor is malignant and zero means the tumor is benign. For radiologists this is a very important part of their job, because they have to make sure whether or not what they're seeing on the mammogram or whatever is actually going to be a malignant tumor. So having AI help radiologists in this effort is really important and can actually save lives, and we can do this in just a couple of minutes. So the first thing that we need to
do is import our dataset. So we're going to import pandas as pd. If you don't know what pandas is, it's a library that lets us do a bunch of cool stuff with data in Python. Then we're going to set our dataset equal to pd.read_csv, and the file that we are putting into our environment is called cancer.csv. Okay, so we'll run that. The way that you run things is, unlike other programs where you run the entire file all at once, in Google Colab you're running it cell by cell. In order to run this cell, you just press Shift+Enter and it'll create a new cell for you.

Now that we know that part of the code is working, next we need to set up our x and our y attributes. The y attribute that we're going to be wanting to predict is whether the tumor is malignant or benign, and the x attribute is going to be all of these features. What we're going to do is have the AI map the correlations between these two, and that will allow the AI to predict whether or not a tumor is malignant or benign. In order to do this, we're going to say our x is equal to dataset.drop, and when we're dropping, what we're basically doing is removing a column. So dataset.drop, and we are going to drop the column which is called diagnosis(1=m, 0=b). With that, all we've done is said that our x attribute is equal to everything in the dataset except for this diagnosis column. So we'll run that, and it says this isn't found in axis. That's because we need to say that columns, the columns we're dropping, is just going to be diagnosis(1=m, 0=b). So I'll run that. Then for the y column, all we're doing is taking the diagnosis column, so y is going to be equal to dataset, and the column is going to be diagnosis(1=m, 0=b). So run that. Okay, so now
we have our y column and we have our x columns. All that's left is to split up our data between a training set and a testing set. This is something that's really, really important in artificial intelligence, because oftentimes you'll find algorithms are overfitting, which means that they do really, really well on the data that they've already seen, but when you give them new data they just fall apart. In order to mitigate this, we're going to set part of our dataset aside to be tested on later. The algorithm will be given data that it's never seen before, and then we'll use how well it does on that data to evaluate it, because that's really what we're looking for: not how well the algorithm understands the entire dataset, but more the problem in general, which is cancer diagnoses.

So what we're going to do is say from sklearn.model_selection. Scikit-learn is actually a really popular machine learning library; we call it sklearn when we import it, and I have an entire tutorial series on how to use it, which you can check out in the card. Today we're mainly going to be using another library, but we're still going to use scikit-learn just to split our dataset between a training set and a testing set. So we'll say import train_test_split, and then we're going to say our x_train, our x_test, our y_train, and our y_test are equal to train_test_split. We're going to do the split on x and y, and the test size is going to be equal to 0.2, so basically 20% of our data is going to be in the testing set. That's actually a pretty normal way to divide things up: 80/20.
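The loading, column-dropping, and splitting steps described so far can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the video's exact notebook: a tiny made-up DataFrame stands in for cancer.csv so the snippet runs on its own, the feature names `radius_mean` and `texture_mean` are hypothetical, the label spelling `diagnosis(1=m, 0=b)` is inferred from the spoken description, and `random_state=0` is added only to make the split repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# In the video the data comes from: dataset = pd.read_csv("cancer.csv")
# Assumption: this small made-up table stands in for that file.
label = "diagnosis(1=m, 0=b)"  # assumed spelling of the label column
dataset = pd.DataFrame({
    label: [1, 0, 1, 0, 0, 1, 0, 1, 0, 0],
    "radius_mean": [17.9, 11.4, 20.5, 12.1, 13.0, 19.6, 9.5, 18.2, 12.4, 11.8],
    "texture_mean": [10.3, 14.2, 17.7, 15.1, 18.4, 21.2, 12.4, 20.3, 15.7, 16.9],
})

# dataset.drop(label) would look for a ROW with that name and fail with
# "not found in axis"; columns=... tells pandas to drop a column instead.
x = dataset.drop(columns=[label])
y = dataset[label]

# Hold back 20% of the rows so the model can be scored on unseen data.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

print(x_train.shape, x_test.shape)
```

With ten rows and `test_size=0.2`, eight rows end up in the training set and two in the testing set.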
Awesome. Now it's actually time to build our artificial intelligence, and we're going to be doing this using TensorFlow's Keras, which is really, really popular. We're actually going to be building a neural network, which is also a very popular form of AI that can be used in almost any problem, and it just sort of adapts to it. In order to do that, first we're going to import tensorflow as tf, and we're going to say our model is equal to tf.keras.models.Sequential. Okay, so we have that and everything looks good there.

Next we need to start adding layers to our model. If you look up a neural network, what we have is an input layer, which is our x values, all the different attributes of the cancer dataset; then it goes through a hidden layer; and then finally there's an output layer, which would tell us if our tumor is malignant or benign. So what we're going to do is say model.add, and we're going to add tf.keras.layers.Dense. A Dense is just one of these standard, vanilla, sort of default layers of neurons that you get in Keras. For the number of units we want, and you can see here it says to input the units, we're just going to give it some random power of 2.
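To make "dense layer" and "units" a little more concrete, here is a rough plain-Python sketch of what such a layer computes. It is an illustration only, not how Keras is implemented: the helper names `dense` and `random_layer`, the tiny layer sizes, and the random weights are all assumptions for the example, and the sigmoid used as the squashing step is the activation the walkthrough introduces next.

```python
import math
import random


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One dense layer: each unit takes a weighted sum of ALL inputs,
    adds its bias, and passes the result through the activation."""
    return [
        sigmoid(sum(w * v for w, v in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def random_layer(n_in, n_out):
    """Hypothetical helper: untrained random weights for n_out units."""
    weights = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


random.seed(0)
n_features, n_hidden = 4, 8  # tiny sizes for readability; the video uses 256 units

hidden_w, hidden_b = random_layer(n_features, n_hidden)
out_w, out_b = random_layer(n_hidden, 1)

sample = [0.5, -1.2, 3.0, 0.1]             # one made-up row of input features
hidden = dense(sample, hidden_w, hidden_b)  # input layer -> hidden layer
output = dense(hidden, out_w, out_b)        # hidden layer -> single output

print(len(hidden), output)
```

The number of units is simply how many outputs a layer produces, which is why the final layer of a yes/no classifier needs only one.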
So I'm going to make this neural network really, really big, bigger than it needs to be, just to see how powerful we can get with this dataset. We'll say 256, and then since this is actually our input layer, we need to say that the input shape is equal to x_train.shape. Basically what we're saying is we're going to input something with the size of x_train.shape, which is to say all of these x features right here, and then the output will be 256 neurons. Most importantly, we also need an activation function, which we're going to set to sigmoid. Basically all the sigmoid function does, if you see on your screen now, is take all the values from the neural network and map them to values between zero and one, and this is going to be very helpful for reducing model complexity and also making the model more accurate.

Now let's add another layer. We'll say model.add tf.keras.layers.Dense, and let's get another 256 neurons, and the activation function will also be sigmoid. Then finally let's make our output layer. We'll say model.add tf.keras.layers.Dense, and we'll just have one dense neuron with the activation function sigmoid, because again our final value is just going to be one single value between zero and one for the diagnosis, so you can understand why the sigmoid function might be helpful in that scenario. Okay, so we'll run this, and it looks like everything ran okay.

Next it's time to compile our model, so we're going to say model.compile, and what we need is an optimizer. The optimizer that we're going to be using is called Adam. You can see more information about how this works, but this is basically how the machine learning algorithm is being optimized, how the weights of the neurons are being fine-tuned to fit the data. Next we're going to have a loss function, and because we're doing binary classification, we
use a metric called binary cross-entropy, and this is really good when we're talking about categorical stuff like malignant or benign, just discrete values like that. Then finally, the metric that we're going to be looking at is accuracy, because we want to correctly classify as many tumors as possible. So we'll run that. Awesome.

Now it's time to fit our data, so we'll say model.fit, and we're going to fit our model on x_train and y_train, and we're going to set the number of epochs to, let's just say, a thousand. What the epochs mean is how many times our algorithm iterates over the same data. Especially with a smaller dataset like this, only just under 600 entries, one run through the entire data usually isn't enough; the algorithm needs to keep learning by going over the same data over and over again. A thousand epochs is kind of a lot, but since our dataset is small, that's okay. Most likely, given the size of our neural network for the size of the problem, this is a bit overkill, but it's better to be safe than sorry. So we're going to run that, and while that's running I'm going to go do some push-ups.

A few moments later...

All right, so now that I got some quick reps of push-ups in, let's see how well our model did. If I scroll down, you can see we have the loss, the accuracy, and also the epoch that it's on. As we're nearing the end, it looks like the accuracy fluctuates quite a bit, but it stays consistently above 95 percent for the testing set, or rather the training set. Now the final thing we need to do in evaluating our algorithm is say model.evaluate, and we're going to evaluate with x_test, that's our testing set, and y_test. Basically what this is doing is comparing what the model thinks y_test should be versus what y_test actually is. So we'll run that, and you see our accuracy is 97%, which means that on new data our machine learning
algorithm, our AI, can correctly classify cancer tumors with 97% accuracy, which is amazing considering the fact that this algorithm never went to college, didn't go to medical school, and doesn't even know what a tumor is. But from this data it was able to find a pattern, and from that it can be used to help radiologists and doctors. Yeah guys, that's pretty much it for this video. If you have any questions, let me know, and I'll see you in the next one.

Sweet. Thank you guys so much for making it to the end of this video. As always, if you enjoyed it, make sure to hit that like button. Also, if you have any questions, comments, or concerns about what we did in this video, please leave them down below. Now, if I could have just one more minute of your time, I would like to tell you about a service that I've been using for over a year now called Scribd. Just as a side note, Scribd did not sponsor me to make this video; I just wanted to tell you about it. Put simply, Scribd is a lot like Audible, except instead of being $15 a month it's only $9, and instead of only having two audiobooks per month you get unlimited access to a plethora of audiobooks, ebooks, documents, and even sheet music and magazines. So for me this was obviously a no-brainer. Right now, if you use the link in the description, you get 60 days free of Scribd, and I get one month if you sign up using my link. So Scribd didn't officially sponsor this video; I'm just telling you about it so that I can get some free months and continue learning, and you can also continue learning with your 60-day free trial. So thank you guys so much for making it to the end of this video, and I'll see you in the next one. [Music] Peace. [Music]
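The model-building, compile, fit, and evaluate steps described in the transcript can be sketched roughly like this. It is a sketch under stated assumptions rather than the video's exact notebook: random NumPy data stands in for the cancer.csv split so the snippet runs on its own, the number of epochs is cut from the spoken 1000 to 20 to keep it quick, and the input size is declared with `tf.keras.Input` using the per-sample feature count (`x_train.shape[1]`) instead of the spoken `input_shape` equal to `x_train.shape`, because Keras expects the shape of one sample without the row count.

```python
import numpy as np
import tensorflow as tf

# Assumption: made-up data in place of the real train/test split.
# 200 samples with 30 numeric features and a 0/1 label.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 30)).astype("float32")
y = (x[:, 0] + x[:, 1] > 0).astype("float32")
x_train, x_test = x[:160], x[160:]
y_train, y_test = y[:160], y[160:]

model = tf.keras.models.Sequential()
# One sample has x_train.shape[1] features; the row count is left out.
model.add(tf.keras.Input(shape=(x_train.shape[1],)))
model.add(tf.keras.layers.Dense(256, activation="sigmoid"))  # first hidden layer
model.add(tf.keras.layers.Dense(256, activation="sigmoid"))  # second hidden layer
model.add(tf.keras.layers.Dense(1, activation="sigmoid"))    # output in (0, 1)

# Adam adjusts the weights; binary cross-entropy is the loss for a
# two-class problem; accuracy is reported alongside the loss.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The video trains for 1000 epochs; 20 keeps this sketch fast.
model.fit(x_train, y_train, epochs=20, verbose=0)

# Score the model on rows it never saw during training.
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
predictions = model.predict(x_test, verbose=0)
print(f"test loss={loss:.3f}, test accuracy={accuracy:.3f}")
```

Because the data here is synthetic, the printed accuracy says nothing about tumor classification; only the shape of the workflow carries over.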
Info
Channel: Khanrad
Views: 965,701
Keywords: khanrad, adam eubanks, online course, learn to program, python tutorial, make an ai tutorial, keras, deep learning, machine learning, ai doctor, artificial intelligence, ai, scikit-learn, make an ai fast, make ai for beginners, build ai, make an ai
Id: z1PGJ9quPV8
Length: 16min 37sec (997 seconds)
Published: Fri Jan 29 2021