Introduction to Neural Networks in Python (what you need to know) | Tensorflow/Keras

Captions
Hey, how's it going everyone, and welcome back to another video. In this video we're going to talk all about neural networks in Python. We'll start with an overview of the important concepts; with neural networks I really think it's important to understand how they work at a high level, so we'll walk through the basics, along with some information on network architecture, hyperparameters, and activation functions. Once we're done with that, we'll jump into some code. The first coding section walks through the basics of writing neural networks with the Keras library, with some rapid-fire examples of actually building models. The second section will be a real-world problem: building a neural network to automatically classify images as rock, paper, or scissors.

Before we get started, I want to give a quick shout-out to this video's sponsor, Kite. Kite is a code completion tool for Python that uses machine learning to find the best suggestions. Kite's completions are sorted by relevance instead of popularity or the alphabet, and it can significantly speed up your development time by completing up to full lines of code. Kite integrates with the most popular Python editors, like Atom, VS Code, Sublime, Vim, PyCharm, and Spyder, and the best part is that it's completely free to download; I've left a link in the description. I've been using Kite for about three or four months now and it's been a lot of fun to use, so I definitely recommend giving it a shot.

Alright, to get started, let's talk a little bit about why we use neural networks in the first place. I think this is pretty well explained through a couple of visual examples. Imagine you have a graph with red dots and blue dots, and we're trying to build a classifier to automatically classify them correctly. This first example is straightforward: we can simply draw a line between the two groups and get perfect classification. For a slightly more complicated example, imagine two sets of curved points; it's definitely not as trivial as the line, but with a quadratic curve we can once again pretty easily separate the red dots from the blue dots perfectly. That leads us to the really significant use case of neural networks: in reality, our data is often not so nicely separated. It often looks more like red dots all over the place, seemingly at random, with blue dots scattered between them. Looking at that graph visually, we could draw some lines to separate the red from the blue, but training a classifier to do that automatically is not so trivial a task. Neural networks can do it: they can find patterns and groups within the data and pull them out, and that's why they're so powerful.

So let's get into what a neural network looks like and some of the basics, using that last graph as an example. Imagine we're trying to classify blue and red points properly. All neural networks start with an input layer, and in this case the input layer would be two-dimensional: just the x coordinate and the y coordinate of the point we're trying to classify as red or blue.
Next, all of our neural networks will have some hidden layers. In this example we have two hidden layers, each of four neurons. All of these neurons communicate with the input layer: the values from the input layer are passed through weights on the connections, then passed further on to the next layer, and finally to an output layer. In this case the output layer determines whether a dot is red or blue, with a certain degree of confidence. Really, what a neural network is doing is updating the weights on these connections to hopefully be able to properly classify a graph like this.

At the start, the network has no idea how to classify anything. It might just draw a line and say everything to the left is blue and everything to the right is red. That's not going to be very accurate, but it gives us a starting point. Values then come through the network and produce predictions with a certain degree of confidence. Say we're looking at a red example: the network might predict a 55% chance it's red and a 45% chance it's blue. What we're trying to do is get that confidence value as close as possible to the actual value; for a red example, the red output should be a one and the blue output a zero. We see that we weren't at the 1 and 0 we were supposed to be at, so there's some loss involved, and we tell the weights to update accordingly. That gives us a new separation, and as we see more and more examples the values keep updating and the network gets more and more confident about what's coming in. If there really is a separation in the data, the weights converge to better and better values, and ultimately, if we train enough, we get something where the data is fit well by our network and new values that come through are confidently and correctly predicted as red or blue. If you want a really good visual explanation of how these neural networks work, I definitely recommend checking out the video series 3Blue1Brown did on the topic; he animates it beautifully, and it can help drill in some of these high-level concepts before we move into actually coding networks.
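As a rough numerical sketch of that forward pass (illustrative NumPy, not code from the video; the weights are random here, biases are omitted, and training is what would tune them):

    import numpy as np

    x = np.array([0.5, -1.2])        # one (x, y) input point
    W1 = np.random.randn(2, 4)       # weights: input layer -> hidden layer of 4
    W2 = np.random.randn(4, 2)       # weights: hidden layer -> 2 outputs (red, blue)

    hidden = np.maximum(0, x @ W1)   # weighted sums, passed through a ReLU activation
    logits = hidden @ W2             # raw scores for "red" and "blue"

    # softmax turns the raw scores into confidences that sum to 1,
    # e.g. [0.55, 0.45] -> 55% red, 45% blue
    probs = np.exp(logits) / np.exp(logits).sum()
    print(probs)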
Next, let's talk about the hyperparameters of this network, the different aspects we're able to adjust. The most obvious is the number of hidden layers and the number of neurons per layer. Another is the batch size: how many data points do we pass through the network at each update step? Usually we're not passing in a single point; we might pass in 16 points, or 32, or 64, and the batch size determines that. Then there's the optimizer, the algorithm that decides how the network learns and updates its weights; one thing I want to note is that you can usually use Adam as a pretty safe bet for your optimizer. That leads us to the learning rate: how much do the weights update each time we see a batch of inputs? Set it higher and updates have greater magnitude; with a lower learning rate, updates are smaller. So that's another hyperparameter we can play around with.

Another important hyperparameter is dropout. One thing we find helps networks generalize better is randomly disconnecting nodes with a certain probability. If we're dropping out nodes randomly, the rest of the network has to step up and do more; it can't rely on a single node to learn everything. This helps because our training data won't cover everything we'll see in the wild, and dropout helps simulate some of those conditions. One more important hyperparameter to know is epochs: how many times do we go through our data while training? That's another value you can adjust.

A question that gets asked a lot is: how do we choose the layers, the neurons, the hyperparameters? The biggest thing I would say is to use your training performance to guide your decisions. If you're getting high accuracy on training data but not on a validation set, you're overfitting to your training data and should probably reduce the number of parameters. If you're getting relatively low accuracy on training and you think you can boost it up, you might be underfitting the data, and maybe you should increase the number of parameters. I'll also mention that there's no exact science to this; it's going to be a lot of tweaking numbers and values, and that's just the nature of building neural networks, so don't worry if you don't feel confident about everything you're doing. A lot of it is just playing around and testing things. Another way to choose hyperparameters is with automatic search methods that test a lot of different values at once and ultimately choose the best combination; with sklearn you can use GridSearchCV to help do that, and I think we'll get to that in the actual coding section of the tutorial.
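To make those knobs concrete, here's a sketch of where each hyperparameter shows up in Keras (the values are placeholders, not tuned ones):

    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Dense(32, input_shape=(2,), activation='relu'),  # neurons per layer
        keras.layers.Dropout(0.2),                  # dropout: randomly disconnect 20% of nodes
        keras.layers.Dense(32, activation='relu'),  # adding more Dense layers = more hidden layers
        keras.layers.Dense(2, activation='softmax'),
    ])

    # the learning rate is set on the optimizer (Adam is the safe default)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # batch size and epochs are set when fitting:
    # model.fit(train_x, train_y, batch_size=16, epochs=5)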
One thing I haven't mentioned yet, but which is very important to how our neural networks function, is activation functions. Activation functions introduce non-linearity into the network's calculations. That might not mean anything to you yet, but here's what it comes down to: in our example, each node takes its input values, multiplies them by weights, and sums those products to get the node's output. What an activation function does is transform that node value, so instead of just outputting inputs times weights, the node adds non-linearity, and with it complexity, to the values it passes on. To sum up: the activation function is what allows us to fit our neural networks to more complex data and do some really exciting things; it lets us fit data that looks more complex, more easily.

Another question I hear a lot is: what activation functions should I use? In general, I'd say it's a pretty safe bet to go with ReLU in your hidden layers. There's a concept in neural networks called vanishing gradients, which affects how far back through the network you can update the weights and learn from your training data, and the ReLU activation function helps avoid that vanishing gradient problem; so for hidden layers, ReLU is a safe bet. For your output layer, though: if you're classifying with a single label, say you have red, blue, yellow, and green and you just need to classify each point as one of those, softmax is a good bet. But if things could be red and blue at the same time, i.e. multi-label classification, the sigmoid activation function is a pretty good bet.
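As a quick reference, those recommendations translate to Keras roughly like this (the layer sizes are arbitrary):

    from tensorflow import keras

    # hidden layers: ReLU is the safe default (helps avoid vanishing gradients)
    hidden = keras.layers.Dense(16, activation='relu')

    # single-label output (each point is exactly one of N colors): softmax
    single_label_output = keras.layers.Dense(6, activation='softmax')

    # multi-label output (a point could be red AND blue at the same time):
    # sigmoid, so each output unit is an independent probability
    multi_label_output = keras.layers.Dense(6, activation='sigmoid')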
The last thing I want to get into before we jump into code is a quick overview of TensorFlow/Keras versus PyTorch. In this tutorial we'll be using Keras, which is great for getting started quickly and for rapid experimentation with neural networks; you'll find as you get more advanced that it lacks the complete control and customization that PyTorch and the fuller version of TensorFlow have. TensorFlow has historically been the most popular framework in industry, but I'd say it can get pretty complicated, and its documentation isn't always consistent; personally, I'm not the most experienced with TensorFlow. What I usually use for more complex neural network work, and what I used for my master's thesis when I was working on different types of networks, is PyTorch, which has for a while been the favorite of the research and academic community. It has a very Pythonic syntax, and you can easily access values at any point in the network.

To start off the coding section of this tutorial: I'd say the best way to learn is by doing, so we're going to jump straight into some examples of building neural networks, and through that you should build up the fundamentals of what you need to know with TensorFlow and Keras. If you go to my GitHub page, github.com/KeithGalli/neural-nets (linked in the description), I've left some examples there. We're going to build a neural net for each one of the examples in that folder, and the tasks are just like what I introduced at the start of the video. We'll start with the linear example and write a simple neural net to properly classify the red and blue points. So let's download this GitHub repo; there are two ways to do it. I'd recommend forking it and cloning it locally, and I have instructions on how to do that right in the repo, but the other option is to simply download the zip and extract it wherever you want to write the code. The last thing I'll say before we start writing code is to make sure you have TensorFlow installed; probably the easiest way to do that is by installing the Anaconda distribution, and I'll have a link in the description on how to install that.

Alright, I'll be using Sublime Text as my editor here. Go into the folder where you extracted or cloned the files, then into examples, then linear, and save a file called network_linear.py. When we're writing a neural network, the first thing is almost always to import the TensorFlow library (or the PyTorch library if you're using that); specifically we're using Keras here, so we'll import Keras from TensorFlow: from tensorflow import keras. Then we'll also want some helper libraries; the ones that will be important right now are import pandas as pd and import numpy as np. One thing I want to quickly mention is that I'm using the Kite Copilot over here; it's a really nice feature that follows my cursor as I type and pulls up documentation on the associated code.

The other thing to note real quick is that in that same linear folder, alongside the picture of the example and the file we just saved, there are two data files: a training data set and a test data set, which produce the graph you see here. We need to load that data. From our file, we go into the data folder and load train.csv to start (we'll do the same with the test set later). We load the CSV with pandas, pd.read_csv('./data/train.csv'), and save it as train_df, our training data frame. We can confirm it loaded by printing train_df.head(). Cool: we have an x point, a y point, and then the color, which here is just a 0 or 1.

Now we can actually build a neural network around this training data. To do that, we start by defining a Keras Sequential model; Sequential lets us list, in order, the different layers of our network. To access layers in Keras you type keras.layers. and everything suggested there is a different type of layer you can add. We're focused on a fully connected feed-forward network, which is defined by the Dense layer. There are a couple of different things we can pass into Dense (you can see the keyword arguments in the Kite Copilot window on the right), but the only required one is the number of units. What you'll want to do is define your first layer as your first hidden layer, and you'll see why in a second; say we want four neurons in that first hidden layer. Next, looking at the documentation to guide me a little, we pass the input shape into this layer: our input, remember, is x and y, so it has a single dimension of two, input_shape=(2,). Another thing we can pass to this Dense layer is our activation function, and as I mentioned at the start of the video, a safe bet is ReLU. There we go: we've now defined an input layer of two neurons that feeds into a hidden layer of four neurons, and that hidden layer has a ReLU activation. Because the data is very simple, let's make this first example very simple too and feed this one hidden layer straight into our output layer. The output layer has two units, because colors can be either red or blue (0 or 1 in our data). And right there is our first neural network: two input neurons, to four hidden neurons, to two output neurons.
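Collected together, the script so far looks something like this (the paths assume you're running from the repo's examples/linear folder):

    from tensorflow import keras
    import pandas as pd
    import numpy as np

    # load the training data and peek at it
    train_df = pd.read_csv('./data/train.csv')
    print(train_df.head())

    # 2 inputs (x, y) -> hidden layer of 4 ReLU neurons -> 2 outputs (red/blue)
    model = keras.Sequential([
        keras.layers.Dense(4, input_shape=(2,), activation='relu'),
        keras.layers.Dense(2),
    ])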
Now let's actually fit our model to the data. First we want to compile the model, which tells it how to train. We'll want the Adam optimizer; you'll see with compile (I often forget the exact syntax of these Keras calls, so I'm checking the docs over here) that we need to pass in an optimizer, and the safe bet is 'adam'. Next we define a loss function for the network, via keras.losses. We have a couple of different options there, and with losses specifically I think it's nice to get a bit more information; clicking through doesn't tell us too much about each type of loss, so let's go to the losses page of the TensorFlow documentation. There are a bunch of options, and the two popular ones we saw suggested were categorical cross-entropy and sparse categorical cross-entropy. It's unclear just from the names what the difference is, so it's good to check the docs. First, CategoricalCrossentropy: it computes the cross-entropy loss between the labels and predictions; for example, predictions of 0.9, 0.05, 0.05 against actuals of 1, 0, 0, and we compute the loss on the difference between them. That sounds pretty good, but the one issue is how the labels are encoded: they're in what's called one-hot representations, where [1,0,0] means label 1, [0,1,0] means label 2, and [0,0,1] means label 3. In the data we're looking at, the label is just a single value of 0 or 1. So let's check what SparseCategoricalCrossentropy is: it says the same thing as the last loss, but the key difference is "use this cross-entropy loss when there are two or more labels, and we expect the labels to be provided as integers" rather than one-hot representations. That's good for us: we can pass in integers and don't have to one-hot encode anything. So we'll define the loss as keras.losses.SparseCategoricalCrossentropy(). To be honest, the autocomplete suggested formatting it differently, and I don't know the exact difference; I just know that in all the examples I've worked through, I define it with this representation. The last thing we want to set here is from_logits=True; if you go back to the losses docs, that's one of the options on SparseCategoricalCrossentropy. If you're curious what from_logits=True means, I always recommend doing a quick Google search; the TensorFlow API docs use the keyword "logits", and a nice answer explains that it simply means the function operates on the unscaled output of earlier layers, and in particular that the sum of the inputs may not equal 1. That's what we want, because we're using values that aren't necessarily between 0 and 1 (look at our input values). Finally, we want to keep track of a metric, and we'll use accuracy, so we can see how the network does when we evaluate it. There we have it: we've compiled how the network is going to be trained and learn; it uses the Adam optimizer and updates the network based on the sparse categorical cross-entropy loss function.
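So the compile step we just walked through looks like this:

    model.compile(optimizer='adam',
                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])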
Now that we've done that, we can actually fit the training data to our network. model.fit expects an x, a y, and a batch size. Our x values are the x and y coordinates of each point; the y value is the color, 0 or 1; and the batch size is ours to set, so let's say batch_size=16 to start. So what are x and y exactly? Looking at the documentation, the argument x is the input data, and it expects a NumPy array or a list of arrays (you could also pass a TensorFlow tensor or a list of tensors, and there are a couple of other options, but we'll focus on the NumPy array). Right now our data is in a data frame, which obviously isn't a NumPy array, but what's really nice is that we can easily convert a pandas column to NumPy by doing train_df.x.values. I can show that by printing train_df.x.values[0:5] and wrapping it in type(): these are our x values in NumPy form, and the type is a NumPy array, so that's correct. We could do the same with the color label and the y values. For our y argument, the color, we can fill in train_df.color.values to get it in NumPy form. The x argument is a little trickier, because it's not only the x column: it's the x and y columns together, since both influence whether a point is red or blue. So we have to stack those columns together so they're paired up: x = np.column_stack((train_df.x.values, train_df.y.values)). What this does is pair the x and y columns row by row, so each row becomes one input point, and now we can just pass x to fit.

Before we run this, let's do one last sanity check on the code to make sure we didn't miss anything: we load in the data, we build a model around it, we set up the loss for the network, and we prepare our x values. One thing I notice we forgot to do, and this is very, very important whenever we're training a deep neural network, is to shuffle our training data.
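And here's that data preparation plus the first fit call:

    # pair the x and y columns so each row is one (x, y) input point
    x = np.column_stack((train_df.x.values, train_df.y.values))

    # y is the color label (0 or 1); batch_size is ours to tune
    model.fit(x, train_df.color.values, batch_size=16)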
Shuffling matters because, as the data stands, all of the 0-labeled colors are right in a row, and when we update the network we'd have highly correlated examples right next to each other; the network wouldn't get a real idea of what the data looks like in the wild. The easiest way to shuffle is np.random.shuffle, and we can pass in train_df.values directly to shuffle everything in the data frame. This shuffle method works in place, which means we don't need to reassign train_df to the result; it updates behind the scenes. We can confirm it's shuffled by rerunning train_df.head(): before, we had all zeros, and now we have ones mixed with zeros, so the order has changed. That's good. Anything else we missed? One other thing: let's add an activation to our output layer. As I mentioned, for a binary classification task we can use either sigmoid or softmax; let's use sigmoid here (it won't really matter for this specific example). With that, I think we're good.

OK, it just went through the data one time, and as you can see it classified 70% of the examples correctly. Let's see what happens if we decrease the batch size: it does a lot better when it's learning on a smaller number of examples at a time. Another thing we could try is making the batch size bigger, and as you can see it doesn't learn as quickly when we make it bigger, so let's reset it to 4. These are the kinds of hyperparameter tests we're going to keep doing. In addition to batch size, let's specify a number of epochs, say epochs=5, and look at that: with the current settings, the network classifies 100% of our training examples correctly. That makes sense given what the data looks like; it should be pretty easy to classify.

The last step is to evaluate on the test data, which we load very similarly to the training data: test_df = pd.read_csv('./data/test.csv'), and then test_x = np.column_stack((test_df.x.values, test_df.y.values)). It doesn't actually matter whether the test data is shuffled: when we evaluate a model after it's been trained, we're not updating the network anymore, so the order doesn't matter; it's just performing classification with the weights that have already been set. Then, to evaluate, we use model.evaluate(test_x, test_df.color.values). It trains again, and the final step is the evaluation; the bottom line of output is the evaluation run, which I can confirm by printing a label before it. And as you can see, it gets 100% accuracy on the test examples.
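Putting the whole linear example together, the finished script is roughly:

    from tensorflow import keras
    import pandas as pd
    import numpy as np

    train_df = pd.read_csv('./data/train.csv')
    np.random.shuffle(train_df.values)   # shuffle in place before training

    # sigmoid on the output, with from_logits=True on the loss, is what
    # this example ends up using
    model = keras.Sequential([
        keras.layers.Dense(4, input_shape=(2,), activation='relu'),
        keras.layers.Dense(2, activation='sigmoid'),
    ])
    model.compile(optimizer='adam',
                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])

    x = np.column_stack((train_df.x.values, train_df.y.values))
    model.fit(x, train_df.color.values, batch_size=4, epochs=5)

    # evaluation order doesn't matter, so the test set isn't shuffled
    test_df = pd.read_csv('./data/test.csv')
    test_x = np.column_stack((test_df.x.values, test_df.y.values))
    print("EVALUATION")
    print(model.evaluate(test_x, test_df.color.values))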
Cool, so that was example one. Let's rapid-fire through some more. The easiest way to move to the next example is to save this file somewhere else: instead of the linear folder, go to the quadratic example and save it as network_quadratic.py. This is the exact same network as before, but now it's training on the quadratic data.

Right off the bat, one nice thing to check is how well it does with the current network setup; it's running on different data now because we're running the file in that different folder. As we see, it only got 78% accuracy on this new data; with this graph, we've added more complexity to the classification. So what we probably want to do is add more neurons to the hidden layer of the network. What happens if we bump it from 4 to 16? When we do, it gets 90% accuracy, significantly better than before. What else could we do? We could honestly keep going up: with 32 it's classifying 95% of the training examples correctly, and maybe it would learn even more if we bump this up to ten epochs, so we run through the training data more times. With that we're near perfect, at 98% accuracy, and this is the type of task we should be able to get 100% on, because there's such a clear separation in the data. As one additional thing, maybe we add a Dropout layer; what we need to pass in is the probability with which nodes are randomly dropped out, so let's say 0.2, since 20% is a common value for dropout (don't forget the comma). Now the data goes from the hidden layer, and 20% of that 32-node hidden layer gets dropped out. OK, that actually didn't improve our model. Another option would be to add another fully connected layer, layers.Dense, say another layer of 32 nodes, also with a ReLU activation; maybe that gets us to 100% classification, and honestly we're very close to what we're looking for. And look at that: it didn't get every training example correct, but on the thousand test examples it got all of them. So that seems like a pretty good setup for the quadratic example.
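The quadratic model we ended up with, as a sketch:

    # wider hidden layers plus dropout for the curvier data
    model = keras.Sequential([
        keras.layers.Dense(32, input_shape=(2,), activation='relu'),
        keras.layers.Dropout(0.2),                 # drop 20% of the 32 hidden nodes
        keras.layers.Dense(32, activation='relu'),
        keras.layers.Dense(2, activation='sigmoid'),
    ])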
Let's keep this going. Next, I recommend saving this file into the clusters example as network_clusters.py. Looking at the graph for the clusters data, the big thing to note is that instead of just red and blue dots, we're classifying six different colors. I'd recommend trying this one on your own and seeing if you can tweak the network to get it working for all six colors; note that you might have to actually dig into the data a little, because there's a slight nuance in this example. So maybe pause first and try it yourself.

Right from the get-go: we've saved the file in the correct directory, and last time we could immediately run the file in its new spot and it just worked; here, we have an issue. What could it be? Let's look at the data real quick, and here's the big difference from last time: now the colors are written out as strings, so when it gets down to the fitting, it won't know how to handle that. We need to convert the strings in our training data into numbers, so we'll do some sort of mapping from strings to colors, which is not too hard. We can print train_df.color.unique() to get all the different colors we have: red, blue, green, teal, orange, purple. So all we'll do is build a dictionary mapping from string to integer: color_dict, with red mapping to 0, blue to 1, green to 2, teal to 3, orange to 4, and finally purple to 5. Once we have this dict, the next step is to apply it to our data frame: train_df['color'] = train_df.color.apply(...), with a lambda saying that every cell x in the color column gets replaced by color_dict[x]. When we print it out again, the colors are now integers and the unique colors are 0, 1, 2, 3, 4, 5. That looks good.

With that, we can uncomment our model and should be able to run it. Oh, interesting, one more issue: "received a label of 5 which is outside the valid range of 0 to 2". The other thing we have to change is the output layer: before we were only outputting two labels, so now we need to change that last layer to 6, because there are six different labels we could have. We also have the same issue with the test data frame: the same processing we did on the training data frame, we also want to do on the test one. Now it gives us test performance: 97% accuracy on the test data, which is pretty dang good.

The last thing I want to show here is that, in addition to evaluate, it's sometimes nice to just predict the output for a single point, using model.predict. The expected type is a NumPy array of two-dimensional points, so let's pass in a point: looking at the chart, (0, 3) should be a purple point, so it should map to the number 5 from the color_dict we just built. Ultimately, when we're using these neural networks out in the wild, this is what we'll be doing: data comes in, we predict the value, and as long as our model is trained well, we can treat that prediction as the answer and use it however we want in our applications. OK, the raw output gives us a lot of information that might be a little hard to read, so we can wrap it in np.round to round to the nearest integer; whichever position is closest to one is ultimately the prediction. As we can see, the prediction is not the first, second, third, fourth, or fifth value; it's the sixth value, which is exactly what we were looking for: purple.
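A sketch of the pieces that changed for the clusters example:

    # map color names to integers so the loss function can use them
    color_dict = {'red': 0, 'blue': 1, 'green': 2,
                  'teal': 3, 'orange': 4, 'purple': 5}
    train_df['color'] = train_df.color.apply(lambda c: color_dict[c])

    # the output layer needs 6 units now, one per color:
    #   keras.layers.Dense(6, activation='sigmoid')

    # after training: predict a single point (note the nested list,
    # since predict expects a batch of points)
    print(np.round(model.predict(np.array([[0, 3]]))))   # -> purple, index 5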
Cool, that looks good; let's move on to the next one. For this example, save the file as network_clusters_2.py, meaning two categories, and I think it's nice to look at the data to see what I mean by that. Looking at the figure: in addition to having a color, each point also has a marker. What if we wanted our neural net to predict not only the color but also the marker, like a plus sign here, a star here, triangles here? Now we're predicting two labels instead of one, and that's going to change what our network looks like. Again, feel free to try this on your own, but this one is definitely trickier, so it might not be as straightforward. The data now has x, y, a color, and some sort of marker, so we'll have to convert the color and the marker into some sort of vector representation and then be able to predict two things at once instead of just one.

It might not make sense to use the color_dict approach anymore, because we need to predict two things at once. What I recommend is that instead of passing in integer labels, we stop using sparse categorical cross-entropy and use plain categorical cross-entropy. The difference is that before it expected integer labels like 3; now we'll use labels that are vectors. What we want is for the first six positions of the label to represent the six different colors we can output, and the last three positions to represent the marker in the graph. Pandas actually has a function called get_dummies that converts unique labels into exactly this kind of one-hot encoding representation, so we'll utilize that. I'll delete the color_dict, and now we get our labels: one_hot_color = pd.get_dummies(train_df.color).values; get_dummies returns a data frame, but we want NumPy, hence .values. Just to show what that looks like: one-hot is the kind of encoding where a single 1 marks the position of the true label, and each color value is encoded that way. We do the same thing for the marker, pd.get_dummies(train_df.marker).values, and then we need to concatenate the two, appending the three marker values after the six color values: np.concatenate((one_hot_color, one_hot_marker), axis=1). This will be our labels.

It's now a bit trickier to do the shuffling we were doing before with train_df.values, because our labels are separate from the data frame. So I'm not going to shuffle here; we'll shuffle later, down where we get our x values, and shuffle both x and the labels.
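That label preparation, sketched out:

    # one-hot encode color (6 columns) and marker (3 columns),
    # then join them into one 9-wide label vector per point
    one_hot_color = pd.get_dummies(train_df.color).values
    one_hot_marker = pd.get_dummies(train_df.marker).values
    labels = np.concatenate((one_hot_color, one_hot_marker), axis=1)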
The one thing to be careful about here is that if we shuffle x and the labels separately, we need to make sure they're shuffled in the same order. It would be terrible if we shuffled our x values and our labels in different orders: they would no longer match up with the truth, and we'd have a really hard time building a neural network around that. So we use a random state with seed 42; it doesn't matter what we set the seed to, but this ensures that the same shuffling happens in both places. Now we can go ahead and fit our data.

Hmm: "a target array with shape (6000, 9) was passed for an output of shape (None, 6) while using loss categorical_crossentropy". That error points at our output layer: now that we've concatenated not only our color but also our marker values, we need to make the output layer 9, so the network predicts a color in the first six cells and a marker in the last three. Brutal, look at this: that's really not good, we got 12% accuracy on the task. Obviously we want to do better, and when you do something this badly, you know something's wrong with the network, so it's a matter of going through and figuring out what. One thing to check is that the labels look like they should: labels[0] looks right, so that's good. What else could be going wrong? Another possibility is the shuffling, making sure inputs and labels are shuffled together; if they're not shuffled right, it's going to be really hard to learn anything, because the data is no longer truthful, just random pairings of things. But we seeded everything, so that looks good too. And here is the issue: we're using the categorical cross-entropy loss, and it expects exactly one position to be the label, rather than allowing multiple positions to be on at once. As a result, all we have to do to fix our network is change the loss to binary cross-entropy. What binary cross-entropy does is make the output layer predict each position independently of the other positions, so multiple positions can be 1 or 0 at the same time, one for the color and one for the marker. Let's see how that fixes the model, and look at that: way better accuracy, 90%, significantly better.

For the test set, we don't need the color_dict anymore; we do the same processing we did on the training labels, copying it for test_one_hot_color and test_one_hot_marker from test_df, building test_labels, and evaluating on those. Because we only got 90% accuracy, I think we could use some more parameters, so I'll bump the hidden layers up to 64 neurons each and see if that helps, since there's more to learn here. We'll also do a fun prediction: (0, 3) should be purple and a star. If we really wanted to utilize this in the wild, we'd have to do one additional step of converting the one-hot encodings back to strings, but I'm going to skip over that for now.
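Sketching those two fixes, the seeded shuffle and the loss change (the batch size and epochs are just example values, and this assumes the 9-unit output layer uses a sigmoid activation so the outputs are already probabilities):

    # shuffle inputs and labels with the same seed so pairs stay matched up
    np.random.RandomState(seed=42).shuffle(x)
    np.random.RandomState(seed=42).shuffle(labels)

    # each of the 9 output positions is predicted independently, so one
    # color position and one marker position can both be "on" at once
    model.compile(optimizer='adam',
                  loss=keras.losses.BinaryCrossentropy(),
                  metrics=['accuracy'])
    model.fit(x, labels, batch_size=4, epochs=5)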
Pretty good: 93% classification, and I'm sure bumping things up more could help. Notice we also got this prediction down here, and if we want to sanity check that things are working, we can classify points in three different spots: (0, 3) should be purple and a star, (0, 1) should be an arrow and red, and (-2, 1) should be a plus and green. Passing those three points in: we don't have the string representations, but we can see they're each classified as a different color, and in the last three positions they each have a different marker, so things look good, and our accuracy tells us the same. For the sake of time, the natural next step would be converting these back into their string representations, but that's more of a Python task than a specific neural net task, so I'll leave it for you to try.

As a final example, and to bring this video full circle, the last thing we'll do is build a neural net to classify the data that was introduced at the start of the video. Open the quadratic example again, since this problem is most similar to the quadratic one, and save it as network_complex.py. Right off the bat, let's see how our quadratic network does on the more complex data: it actually classifies about 80% (79%) of those points correctly. How can we do better? Sometimes it's as easy as increasing the number of parameters: let's add a dropout layer, add another Dense layer so we have three hidden layers, and increase the batch size so it runs a bit quicker; maybe make the layers something like 256 to really increase the parameters. Let's see how that does. Come on, Keras... OK, 81%, that's better. I wonder if increasing this even more would help; how many parameters do we really need? And here's the balance: we don't want to overfit either, so we want our test accuracy to stay high. The test accuracy here was 81% and training was 80%, so we didn't lose any generalizability. Hmm, that next change actually decreased performance, so let's drop those back down, and instead try making the dropout a bit higher, say 0.4, and go through the data more times with this higher rate of dropout; maybe that helps us generalize more. And look at that: with this added dropout, that's the best we've done. We didn't overfit too much, and we learned more because we dropped out nodes randomly and forced the rest of the network to do more on its own. I'm pretty happy with 84% on the test set. As a last thing, I'll add dropout after the final hidden layer as well and see if that does anything; it doesn't seem to improve anything, so we'll keep it as is. I think this is a pretty good solution; we might never get 100% on this last test, but I do recommend tuning these parameters yourself and seeing if you can do better and better.
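For reference, the setup we ended on, as a sketch (the batch size and epoch count here are assumptions; the video only says "bigger" and "more times"):

    model = keras.Sequential([
        keras.layers.Dense(256, input_shape=(2,), activation='relu'),
        keras.layers.Dropout(0.4),
        keras.layers.Dense(256, activation='relu'),
        keras.layers.Dropout(0.4),
        keras.layers.Dense(256, activation='relu'),
        keras.layers.Dense(2, activation='sigmoid'),
    ])
    model.fit(x, train_df.color.values, batch_size=256, epochs=10)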
But I think that's going to be the ending point of this video. A couple of things before we conclude: this took longer than I expected, so we won't have time for the rock-paper-scissors example in this video; I'll break that out into a part 2 and make it its own real-world example video. In that next video we'll also look at how to automatically select some of these parameters instead of manually setting them. As for next steps to take your neural network skills to the next level: in addition to the fully connected layers we worked with here, look at other types of networks, like RNNs and convolutional neural networks, and try building networks around those. Another thing you could do is take all the examples we went through today and, if you want to learn PyTorch, try implementing these solutions with PyTorch instead of Keras.

Alright, that concludes the video. Thank you all for watching; hopefully you enjoyed it. If you have any questions, let me know in the comments, and if you haven't already, make sure to throw this video a thumbs up and subscribe to the channel. I also want to mention real quick: if you enjoyed watching me use Kite, definitely check out Kite and download it from the link in the description. Feel free to also check me out on my socials, Instagram and Twitter. That's all we have; until next time, everyone. Peace!
Info
Channel: Keith Galli
Views: 37,580
Rating: 4.9674072 out of 5
Keywords: Keith Galli, tensorflow, pytorch, numpy, python 3, python programming, data science, neural networks, neural nets, neural nets in python, keras, tensorflow 2.0, tensorflow examples, machine learning, ml, beginner neural net tutorial, how to write neural nets, practice, exercises, neural net practice, pandas library python, numpy library python, scikit learn, sklearn, data analysis python, data science python, machine learning python, build neural nets, cnn, rnn, activation function
Id: aBIGJeHRZLQ
Length: 60min 36sec (3636 seconds)
Published: Sun May 10 2020