Single-Layer Perceptron: Background & Python Code

Captions
What's happening, everyone. This video will be covering the single-layer perceptron, which is the most fundamental of all neural network elements, and a thorough understanding of it is certainly required before you get to the more complex topics in the field. Note that "single-layer perceptron" is essentially synonymous with simply "perceptron," so I'll be using both names interchangeably throughout the rest of this video. The element found in nature most similar to a perceptron is a neuron, so many times you'll also hear those two words used together.

The beginning of this lesson will briefly cover the history of the perceptron, then move into the basics of how the element actually works in practice. Around the halfway mark we'll switch over to a coding editor, actually implement a single-layer perceptron, and train it on some data using the Python coding language. In the description of this video I'll post a link to a pretty amazing free machine learning textbook, if you're interested; it just came out this past year and covers a wide range of topics. You also have the option to purchase a paper copy on Amazon if you'd like to support the authors. As always, if you find the video interesting or informative, consider throwing me a thumbs up, and think about subscribing to my channel if you'd like to stay up to date on the rest of my Python coding tutorials.

So, to start things off: the perceptron algorithm was invented way back in 1957 by Frank Rosenblatt while working at the Cornell Aeronautical Laboratory, with funding from the US Office of Naval Research. Soon after, the algorithm was built into custom hardware called the Mark 1 Perceptron, a machine designed for the purpose of image recognition. It contained an array of 400 photoreceptors which were randomly connected to the perceptrons, or neurons, of the computer. Although the perceptron initially seemed promising, researchers quickly proved that perceptrons could not be trained to recognize a very large variety of different data patterns. We'll look into why that is later on, but for now, suffice it to say this realization caused a pretty major slowdown in the area of AI research. Fairly soon after, the funding picked back up again once it was shown that far more data patterns could be recognized if we just stacked perceptrons into what we now call multi-layer perceptrons. We won't get into too many specifics of MLPs in this video, but stay tuned, because I'll be coming out with a video covering those in the coming weeks.

Even though it may sound complicated, the perceptron is actually an amazingly simple element, as we can see on the left-hand side of the picture here. The first step is to feed any number of inputs into the perceptron. We nearly always set the first input to a static 1 value, called the bias input. The reason why is a little more complicated, but essentially, by providing a single static input we give the perceptron the ability to move its decision boundary away from the origin, if the data the perceptron is being fit to requires that. The next thing that happens is that the input values are fed through weights, a single weight per input. The weights act by multiplying the inputs by their current weight value, creating a new value. For example, if we had an input value of 1/2 being fed into a weight with value 1, our output would simply be 1/2 times 1, or 1/2. After the inputs have been weighted, they're sent to the third step, the weighted sum, which simply adds them all up and creates a single output value. Now, in the final stage, called the step function in the diagram, we perform a check to see if that single value is above a certain threshold: if so, we output a 1; if not, we output a 0.
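As a quick illustration of those four steps, here is a minimal sketch of a single forward pass in Python. The specific input and weight values are made up for the example; only the structure (bias input, per-input weights, weighted sum, threshold check) comes from the description above.

# Minimal sketch of one perceptron forward pass (illustrative values).
inputs  = [1.0, 0.5, 0.25]   # first entry is the static bias input
weights = [0.2, 1.0, -0.4]   # one weight per input (values are made up)
threshold = 1.0              # threshold used by the step function

# Weighted sum: multiply each input by its weight, then add everything up.
a = sum(x * w for x, w in zip(inputs, weights))

# Step function: output 1 if the weighted sum clears the threshold, else 0.
output = 1 if a > threshold else 0
print(a, output)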
And that, right there, is everything that happens inside of the perceptron. If you don't think you really have a grasp on it yet, you could pause the video now and go through the steps as I've written them on the right half of this slide, but on the next slide we're going through the mathematical representation of the process, so that should also help to clear things up.

So, once again, the first thing we're doing here is creating the inputs and the weights. In a real application the inputs could be, for example, x, y, z coordinates, and they could be trying to predict whether or not the coordinates were pointing to a car or a plane. The car and the plane would be enumerated such that a car output would be signified as a 1 and a plane output signified as a 0, or vice versa. On the next line we're creating a new placeholder variable called a, which will be set equal to the weighted sum of all the inputs multiplied by their respective weights. On the last line we're producing our final output by comparing a to the threshold value of 1: if a is greater than 1, we output a 1, signifying a car in our example, or a 0, signifying a plane. Take note that during the training process it's the job of our perceptron to learn the weights which will cause it to accurately predict outputs when provided a set of input values. After the perceptron has been trained, the idea is that it will have the ability to accurately predict outputs even when provided with inputs it has never seen before, so inputs that were not in the training set.

The major downside to the simplicity of a single-layer perceptron is that it can only accurately differentiate between linearly separable sets. For example, in a single dimension this would allow you to differentiate numbers larger than a certain value from those smaller. In two dimensions, this means you can accurately tell apart data only if it can be separated by a single straight line, and in three dimensions it's pretty much the same, except the line turns into a flat plane. So in these examples here on the right, you can see that we wouldn't be able to draw a straight line to separate these two sets of data: in this case we would have to draw a circle if we wanted to separate the red dots from the green dots, and in this case we'd have to draw some sort of hyperbolic curve to separate the two sets. If we did wish to write a classifier able to differentiate the circular or hyperbolic sets, we would need to use something a little more complex, such as a multi-layer perceptron.

If you take a look at our plane example, we can see that on this slide we would be able to differentiate between the planes and the cars, because you could separate them with a single flat sheet placed anywhere between the lowest-altitude plane and the highest car. But on this slide, we can see that one of the cars is on top of a mountain, making it higher in elevation than the lowest plane. In this case, if this were the training data, we would never be able to get 100% accuracy, because we would never be able to draw a flat sheet separating all the planes from all the cars; even if you tried putting the sheet at an angle, you still wouldn't be able to separate them.
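To make the flat-sheet idea concrete, here is a tiny sketch of that check in Python. The altitude values are invented for illustration; the logic is just the rule from the example above, restricted to a horizontal sheet: one works exactly when every car sits below every plane.

# Sketch of the cars-vs-planes separability check described above
# (altitude values are made up for illustration).
car_altitudes   = [0.0, 0.1, 0.3]    # z coordinates of the cars
plane_altitudes = [2.0, 3.5, 5.0]    # z coordinates of the planes

# A flat horizontal sheet (the plane z = c) separates the two sets
# exactly when the highest car is below the lowest plane.
if max(car_altitudes) < min(plane_altitudes):
    c = (max(car_altitudes) + min(plane_altitudes)) / 2.0
    print("Separable by the sheet z =", c)
else:
    print("No horizontal sheet separates the cars from the planes.")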
The next couple of slides will talk a bit more about the training process and how we figure out how to update the weights, and then we'll get to coding the actual perceptron.

All the inputs and desired outputs are provided to the perceptron during the training process, normally in a single matrix where each row is a single training sample and the last column of each row signifies the desired output. When training the perceptron, we first initialize all the different weights; the weights may be initialized to 0 or to some small random value. Then, for each example in our training set, we perform the following steps over each input and desired output: first, we calculate the predicted output given the inputs and our current weight values; next, we calculate the error, which is the difference between the desired output and our predicted output; finally, we update each of the weights by adding to it the product of the overall error and the input feeding into that specific weight.

In the picture on the right we can see how the perceptron's hyperplane separating the two sets is slowly changed as more training samples are presented. So in the beginning here, we just have a simple straight line separating the single cat and the single dog, but then, as we add another dog sample, we can see the slope of the line decreases a little bit to accommodate the newer dog, and the same thing continues down in these two examples, except it gets a little more specific as we add more and more samples. Note that if we were to add another dog sample somewhere around here, we would no longer be able to differentiate between the dogs and the cats, because we would need a curved hyperplane. It's somewhat difficult to explain what's going on in the training process using just math equations, so you'll get a much better understanding once we move over to a coding editor and actually implement the training function itself.

So, now that we have our coding editor open, the first thing we're going to be doing is covering the import statements. We'll be importing the print function from __future__, which effectively allows you to use the Python 3 print function in Python 2.7. The second import is going to be the sys module; this is going to allow us to use sys.stdout.write, which is an alternative to using the print function. The third import is going to be pyplot from the matplotlib library; this allows us to plot our data points along with our decision boundary, or hyperplane. The last import is going to be numpy, which we won't actually be using directly in this tutorial, but it will be used in the plot function that I'll just be pasting in. If you'd like to take a closer look at the plot function, I'll be posting a link in the description to my GitHub repository, which has the entirety of the source code, so definitely take some time to look over that if you're wondering what's going on inside the plot function.

The next thing we're doing here is writing out all of our input data. The first column pertains to the bias input; as you can see, every single bias input is set to 1.0. The second and third values, so on the first row 0.08 and 0.72, are considered the two actual inputs into the neuron besides the bias input; you could say these are x and y inputs, and as you'll see soon enough, once we plot the data, you'll get a better idea of how they all line up. Then the fourth and final value in each row is the actual classification of the data, so that's the output that we're trying to fit our weights to. So now we've pasted in our plot function and passed in our input matrix as the parameter, and we can see our data set plotted out on an x-y plane. As you should recognize immediately, this is linearly separable data.
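Putting the imports and the data matrix together, the setup might look something like the sketch below. Only the first row's values (bias 1.0, inputs 0.08 and 0.72) are given in the video; the remaining rows are made-up stand-ins, and the plot_data name is a hypothetical placeholder for the plot function pasted in from the GitHub repository.

from __future__ import print_function  # Python 3 print() in Python 2.7
import sys                              # for sys.stdout.write
from matplotlib import pyplot as plt    # plots the points and the hyperplane
import numpy as np                      # used only by the pasted-in plot function

# Each row: [bias input, x input, y input, desired output].
# Only the first row comes from the video; the rest are illustrative.
data = [
    [1.0, 0.08, 0.72, 1.0],
    [1.0, 0.26, 0.58, 1.0],
    [1.0, 0.45, 0.15, 0.0],
    [1.0, 0.60, 0.30, 0.0],
]

# plot_data(matrix, weights) stands in for the repository's pasted-in helper:
# it scatters the samples and, given weights, draws the current hyperplane.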
You can tell because you could just draw a straight line, basically a line of slope negative 1 starting in the top left and going down to the bottom right, and you'd be able to accurately split all the red dots from all the blue dots.

After writing in our three randomized weights, we're going to write out our predict function. It is provided our input values, as well as the weights currently in our perceptron, and it iterates over each one of the inputs and calculates the predicted output, following the same equations we talked about earlier. Notice the threshold function is accomplished on the last line, where we return 1 if our total activation was above the threshold, and otherwise return 0. The last function we'll implement, before we get to our train_weights function, is an accuracy function. This is also provided our input matrix along with our current weights, and it returns the percentage correct: it goes through and calculates what our prediction was for every single one of the inputs, then compares that to the actual desired output. This function will be used inside our train_weights function to figure out if we can stop early because we've already reached 100% accuracy.

Now that we've got all the helper functions out of the way, we can begin to write our actual train_weights function. The parameters are: our data matrix; our current weights; the number of epochs, which is essentially the maximum number of times we're going to go through and try to train the weights; and then the learning rate, which we set to a default of 1.0. At 1.0 it shouldn't really affect anything, but if you used a lower learning rate, you'd be changing the weights by a smaller amount on each iteration, so it may take longer to eventually get to the correct outcome; if you increased it, the weights would change more each time, but that might also slow things down if it overshoots by adjusting the weights too much. The next parameter is do_plot, which is set to False by default; if it's set to True, then on each epoch we plot our current progress in separating the two data sets. When I run this the first time, I'll show you an example with it set to True, so we can see the progression as our hyperplane begins to separate the two sets of data. The next parameter, stop_early, is set to True by default: if we ever reach an accuracy of 100% before hitting our maximum number of epochs, we just quit out early instead of running through the rest of the epochs, because they're not going to change anything. And the final parameter, verbose, when set to True, prints out more information to the terminal, such as how much each weight is being updated on each epoch, what the current accuracy is, and so on.

So now we're going to create a for loop that iterates for the number of epochs given in the input. The first thing we do inside the for loop is calculate our current accuracy using our accuracy helper function; we then print that out, along with the current epoch number and our current weight values. We then check whether we have both 100% accuracy and stop_early set to True, in which case we break out of the for loop. Next we check if do_plot is set to True, in which case we plot our current hyperplane. After that, we enter a nested for loop, where we iterate over each one of the samples in the data set.
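Here is roughly what those two helper functions could look like, reconstructed from the narration above rather than copied from the repository, so names and details are approximations:

def predict(inputs, weights):
    # Weighted sum of the inputs, followed by the step function
    # (threshold of 1, as on the earlier slide).
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 if activation > 1.0 else 0.0

def accuracy(matrix, weights):
    # Fraction of rows whose prediction matches the desired output,
    # which is stored in the last column of each row.
    num_correct = 0.0
    for row in matrix:
        prediction = predict(row[:-1], weights)
        if prediction == row[-1]:
            num_correct += 1.0
    return num_correct / float(len(matrix))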
For each sample, we get our predicted value using our predict helper function, then calculate the error by subtracting our predicted value from the actual desired value at that row in the matrix. If verbose is set to True, we also print a statement saying we're training on the data at index i, so that's just which row in the matrix we're currently training on. Then we iterate over each one of the weights in our perceptron and update it based on what our error was; the actual equation is: weights[j] = weights[j] plus the product of the learning rate, the error, and the input to that exact weight. On that last line, again, if verbose is set to True, we write out the actual weight change that just occurred, printing what the initial weight was and what the final weight was after making the change. At the end of the entire run, whether we broke out because we had 100% accuracy or because we used up all of our allotted epochs, we print out our final results with the title of "Final Epoch," and at the very end we return the weights that we found to best fit the input matrix.
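Reconstructed from the narration (and continuing the earlier sketches, so it assumes the predict and accuracy helpers, the imports, and the hypothetical plot_data stand-in), the training function might look roughly like this:

def train_weights(matrix, weights, nb_epoch=10, l_rate=1.0,
                  do_plot=False, stop_early=True, verbose=True):
    for epoch in range(nb_epoch):
        cur_acc = accuracy(matrix, weights)
        print("Epoch %d: accuracy %.2f, weights %s" % (epoch, cur_acc, weights))
        if cur_acc == 1.0 and stop_early:
            break  # perfect separation reached, skip the remaining epochs
        if do_plot:
            plot_data(matrix, weights)  # hypothetical pasted-in plot helper
        for i, row in enumerate(matrix):
            prediction = predict(row[:-1], weights)
            error = row[-1] - prediction  # desired output minus prediction
            if verbose:
                sys.stdout.write("Training on data at index %d...\n" % i)
            for j in range(len(weights)):
                if verbose:
                    sys.stdout.write("\tWeight[%d]: %0.5f --> " % (j, weights[j]))
                weights[j] = weights[j] + l_rate * error * row[j]
                if verbose:
                    sys.stdout.write("%0.5f\n" % weights[j])
    print("Final Epoch: accuracy %.2f" % accuracy(matrix, weights))
    return weights

# Example call matching the walkthrough, with weights randomized beforehand,
# e.g. weights = [random.random() for _ in range(3)]:
# train_weights(data, weights=weights, nb_epoch=10, do_plot=True, stop_early=True)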
So now I'll move back down to the main function and test out the train_weights function. We'll obviously be passing our data as the input data, using our randomized weights as the weights, using 10 as the maximum number of epochs, using the default learning rate, setting do_plot to True, so on each epoch we get a plot showing our progress in developing the hyperplane, and setting stop_early to True as well, so that once we hit 100% accuracy, if we do, we won't run through all the extra epochs.

Now we can switch back over to the terminal and try running the code. We can see that after the first epoch our initial randomized weights have gotten us an accuracy of 50%, and with the plot that opens up we can see our hyperplane placement. As soon as we close out of the plot, it starts the next epoch. Now we can see we still have 50% accuracy, but it achieved that just by guessing that everything was of one single class, instead of splitting the data down the center and getting 50% wrong that way. Next we can start to see the blue classification coming back into play in the bottom left; same idea here again, except it moved a little farther toward the bottom left. And here we can see this is going to be our last epoch, because we have 100% accuracy, and we've placed our hyperplane going from about 0.8 on the y-axis down to roughly 1.5 on the x-axis. With this placement we have 100% accuracy on all of the training data, so assuming our training data was representative of something in the real world, and all the elements in the real world actually fit into this situation where you could split them down the center, we would then be able to go on into the future and predict on data that we hadn't seen during training. Just as an example, if we set do_plot to False here, we can run it again, and we'll see that we skip through all the intermediary plots and just print out the final epoch.

Guys, I think that's going to be it for this video. If you enjoyed it or found it informative, be sure to throw me a thumbs up. In the coming weeks I'll probably do a video on the multi-layer perceptron, as well as continue working on the Python data structures playlist that I've already started; I think we have four or five videos in that playlist so far. So I hope you guys enjoyed the video, and I'll see you in the next one.
Info
Channel: Brian Faure
Views: 70,874
Rating: 4.8635254 out of 5
Keywords: single layer perceptron, single-layer perceptron python, single layer perceptron python, single-layer perceptron, perceptron python, neuron python, perceptron history, python, machine learning, python neural network, python neuron, python perceptron, perceptron implementation, single layer perceptron history, neural network basics
Id: OVHc-7GYRo4
Length: 18min 41sec (1121 seconds)
Published: Thu Sep 14 2017