Neural Networks 1 - Perceptrons

Captions
In order to talk about neural networks, we need to know about their building blocks: perceptrons. You can think of a perceptron as a very simple form of neural network, or as a building block of larger networks, because a single perceptron is very similar to a single neuron. The little cartoon depicts a neuron as you would see it in a typical multicellular organism. From a biological perspective there is a lot going on in a neuron that is irrelevant to us as AI practitioners, but I will point out the special outgoing portion of the neuron, called the axon, and the various little tendrils that act as incoming pathways, called dendrites. We typically don't use this terminology when speaking of artificial neural networks, but consider the point where the dendrites connect to other neurons: the axon of one neuron leads to some tendrils, which touch the dendrites of another neuron, and the small gap across which information is transferred is known as a synapse.

So how do we abstract this for the sake of artificial intelligence? The simpler diagram shows an individual perceptron (also called a neuron or node) that could be part of a larger neural network, but let's analyze it on its own first. We have several inputs, which are similar to the dendrites of a biological neuron, and an output, which is something like the axon. For our artificial perceptron, all calculations are done with numerical values, so several numbers are input, labeled $i_0, i_1, i_2, i_3$. I've used subscripts to denote the individual numbers, but we can also think of this collection of inputs simply as a vector of some length. Then there is an output, which I'll label $o$. So we have four incoming values in this example and a single output value.

Now, what use is it to take a bunch of numbers and produce a number as output? How do we do this, and what can we actually do with it? Another component of the perceptron is the set of synaptic weights, one associated with each incoming link, which I'll label $w_0, w_1, w_2, w_3$. These weights model the strength of a connection in a biological neuron: higher weight values mean that the numerical input is propagated more strongly, weight values close to zero reduce the strength of the input, and negative weights flip the input's sign. What happens in the neuron is that we sum up all the incoming values, and the actual incoming values are the products of the inputs and their corresponding weights, so we compute $\sum_{n=0}^{3} i_n w_n$. This sum is a momentary calculation, not the final output. There is one additional step: the sum gets processed through something called an activation function, $a$. There are many different activation functions, and the choice of activation function can be quite important in different network models, but for this simple perceptron our activation function is also very simplistic: a threshold function, which plotted on an xy-axis has a jump at $x = 0$. In other words, if the sum is less than or equal to 0, then the output is 0, but if the sum is greater than 0, the output is 1 (an input of exactly 0 falls on the zero side, so the plot has an open gap at the jump). So the output is the activation function applied to the sum over the inputs and weights: $o = a\left(\sum_{n=0}^{3} i_n w_n\right)$.

We have one more thing to consider that is currently missing from this model: the bias. We typically have an extra input, or simulated input, to every neuron with a constant value of 1, and the weight on this link is a special weight known as the bias. You can also model this differently and say that we input a bias value directly, but since $b \times 1 = b$, the two are equivalent; you can treat the bias as an extra weight or as a separate component. So this is how we take a collection of numbers, in this case four of them, and use some math to get an output; we're just computing a simple function. For those familiar with linear algebra, you'll recognize this as a dot product: if $\vec{i}$ is the input vector and $\vec{w}$ is the weight vector, then this formula is equivalent to $o = a(\vec{i} \cdot \vec{w} + b)$. These are different ways of writing the same calculation; you get the same result either way.
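This computation is simple enough to express directly in code. Here is a minimal sketch in Python; the function names `threshold` and `perceptron_output` and the sample values are my own, not from the video, but the logic follows the threshold activation and bias handling just described:

```python
def threshold(x):
    """Threshold activation: 1 if strictly greater than 0, else 0."""
    return 1 if x > 0 else 0

def perceptron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through the threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return threshold(total + bias)

# Four inputs and four weights, as in the diagram.
print(perceptron_output([1.0, 2.0, 3.0, 4.0],
                        [0.5, -0.25, 0.1, 0.0],
                        bias=-0.2))  # prints 1, since 0.5 - 0.5 + 0.3 - 0.2 = 0.1 > 0
```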
Now the question is: what is this actually useful for? Why would we do all these calculations? It's a nice, albeit simplistic, model of a biological neuron, but what does it actually do for us in terms of artificial intelligence? A common goal in artificial intelligence is to be able to classify things. Humans are very good at classifying things, and we would like computer algorithms that can also classify things. So what sort of things will it classify? Let's start with a simple example: points in a two-dimensional space. I have a perceptron here that takes two inputs, the x and y coordinates, plus a bias, and since we only have these two inputs I've labeled the weights $w_x$ and $w_y$ to make clear what they're associated with. If I put some points into a standard 2D space, I would like to be able to determine whether each point belongs to group A or group B. A human can see that there is a clear division between these points, but how can this perceptron perform the classification for me? A perceptron essentially defines a linear decision boundary.

How does it do that? Let's assign some specific values to these weights and see what happens. To keep things simple to start with, say the bias is zero, $w_x$ is $-0.5$, and $w_y$ is $0.5$. Given these weight and bias settings, what does this perceptron actually do? The sum computed before we pass through the threshold activation function is $w_x x + w_y y + b$, which for these specific values is $-0.5x + 0.5y$. Keep in mind that the threshold activation function has its jump at the zero point: if the value we're computing is greater than zero we output a 1, otherwise we output a 0. So this formula is most interesting at the points where it equals zero. Since we have an x and a y, we can move the variables around to get the equation of a line: if I add $0.5x$ to both sides of $-0.5x + 0.5y = 0$, I get $0.5y = 0.5x$, and of course the 0.5 cancels on both sides, so this is just the very simple line $y = x$. If we draw that line in this space, then for any point above it, plugging the point's x and y coordinates into the perceptron gives an output of 1 from the activation function; similarly, for any point below it, or exactly on it, the perceptron outputs 0. In other words, I take a point, put it into the perceptron, and it says either 1 for points on one side or 0 for points on the other. So this line is a decision boundary that separates the two classes of points, and that is the use of a perceptron: it allows me to do simple classification with a linear, straight-line decision boundary.
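To make this concrete, here is a small self-contained sketch (the sample points are my own) checking which side of the line $y = x$ each point falls on:

```python
weights = [-0.5, 0.5]   # w_x, w_y
bias = 0.0              # decision boundary: -0.5*x + 0.5*y = 0, i.e. y = x

for x, y in [(1.0, 3.0), (2.0, 5.0), (4.0, 1.0), (3.0, 3.0)]:
    s = weights[0] * x + weights[1] * y + bias
    label = 1 if s > 0 else 0   # threshold activation
    print((x, y), "->", label)
# (1, 3) -> 1 and (2, 5) -> 1: above the line y = x
# (4, 1) -> 0: below the line; (3, 3) -> 0: exactly on the line
```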
This generalizes to more dimensions: in three-dimensional space the decision boundary would be a plane rather than a line, and in a higher-dimensional space it would be a hyperplane. That becomes nearly impossible to visualize in any sensible fashion, but it is still possible to make distinctions between points.

Let's do one more example, with slightly more complicated weight and bias settings for the same simple perceptron we were just discussing. We want to figure out what the decision boundary is for a perceptron with this set of parameters. First we set up the same equation we had before, $w_x x + w_y y + b = 0$, because that is where the decision boundary will be. Now we can put in the real values and simplify; it's fairly easy, I just move the y to the other side. Any perceptron with two inputs and a bias can be simplified in this way, and we get essentially the slope-intercept formula of a line in two-dimensional space (the sketch after this section shows the same conversion in code). Here, when $x = 0$, $y$ equals 0.5, so the bias is basically the y-intercept; when $x = 1$, we have $0.5 + 0.5 = 1$, and that is enough to draw the line, but I'll do a few more points just to make it clear: when $x = 2$, that is $1 + 0.5 = 1.5$, and when $x = 3$, that is $1.5 + 0.5 = 2$. So we get the line $y = 0.5x + 0.5$. Examples on one side of this line are considered positive, so for this particular formula those are the values with an output of 1, and examples on the other side of the line are negative, with an output of 0.
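The rearrangement shown on the board can be packaged as a small helper; this is a sketch with my own function name, and since the exact on-screen parameter values are not in the captions, the example weights below are one reconstruction that produces the boundary $y = 0.5x + 0.5$ discussed above:

```python
def boundary_line(w_x, w_y, b):
    """Convert perceptron parameters into the slope-intercept form
    y = m*x + c of the decision boundary, by solving w_x*x + w_y*y + b = 0
    for y (assumes w_y != 0, i.e. the boundary is not vertical)."""
    m = -w_x / w_y
    c = -b / w_y
    return m, c

# One choice of parameters whose boundary is y = 0.5x + 0.5
# (the actual values used in the video are not in the captions):
print(boundary_line(0.5, -1.0, 0.5))  # (0.5, 0.5)
```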
There is a simple learning algorithm associated with this, which I'll show you, but I do want to point out right now the major weakness of perceptrons, of a single neuron: we cannot define a curved decision boundary. Say we have one cluster of points that we'd like to group together, and another set of points sort of inside of it that should be a separate cluster. There is no way to draw a straight line that completely separates those two clusters, and that is why there are more complex network models, which I'll discuss in a later video. First, however, let me describe the perceptron learning algorithm, because it's actually simple enough that you can go through a few steps of it by hand.

Here is the update rule for basic perceptron learning: $\vec{w} \leftarrow \vec{w} + \alpha\,(t - P(\vec{i}\,))\,\vec{i}$. There are several variables in this formula. The first is $\alpha$, the learning rate. A larger $\alpha$ means we make larger changes to the weights every time we do a learning update; however, if the changes are too large, we might oscillate back and forth rather than actually homing in on a solution. Sometimes $\alpha$ changes over time during the learning process, but we'll keep a fixed $\alpha$ for our simple example. Here $\vec{w}$ is the weight vector. We use $t$ to indicate the target for a given sample; if we're being really technical, this is $t_{\vec{i}}$, where $\vec{i}$ is the input vector, since every specific input vector has a particular target, and the targets are always 0 or 1 because there are two classes we're distinguishing between. If we make $\vec{i}$ a parameter to a function we'll call $P$, then $P(\vec{i}\,)$ is the output of the perceptron; the perceptron is basically a function that takes an input vector and gives an output. Technically the value computed by the perceptron also depends on the weight vector, but that is not shown in the notation; it's just assumed.

Given all of this, here is what happens: we have several labeled training examples, each consisting of an input vector and a target, and after seeing each one we modify the weights in accordance with this formula. The current weights are updated to be the current weights plus the learning rate multiplied by the difference between the actual target and the perceptron's output, multiplied in an element-wise fashion by the input vector values. Note that $\vec{w}$ and $\vec{i}$ are vectors, but the quantity $\alpha\,(t - P(\vec{i}\,))$ is a scalar, a single number. The idea is that for each input value there is a corresponding weight, and that weight increases by this scalar multiplied by its corresponding input; the next weight increases by the same scalar multiplied by the next corresponding input, and so on. So this is an element-wise update to the weight vector, and we'll see an example of this in a moment.
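Before the worked example, here is the update rule as a short Python sketch; the function names are mine, and the threshold convention matches the one used in the video, where an input of exactly 0 yields 0:

```python
def predict(inputs, weights, bias):
    """Perceptron output P(i): threshold of the weighted sum plus bias."""
    s = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if s > 0 else 0

def update(inputs, target, weights, bias, alpha):
    """One step of the perceptron learning rule:
    w <- w + alpha * (t - P(i)) * i, with the bias treated as the
    weight on a constant input of 1."""
    error = target - predict(inputs, weights, bias)   # scalar: -1, 0, or +1
    new_weights = [w + alpha * error * i for w, i in zip(weights, inputs)]
    new_bias = bias + alpha * error * 1
    return new_weights, new_bias
```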
Using this learning rule, I'm going to walk you through three updates of the parameters of a perceptron. The perceptron has to have some initial weight and bias values; I've simply set my initial weights and my initial bias to zero, but they are often randomly initialized to other small values, and in more complex networks the way you initialize weights and biases can be very important. It won't matter for our simple example here. So our starting weights and bias are all zero, and this is once again a perceptron with only two inputs, corresponding to x and y, plus a bias. One column shows the coordinates of points in the 2D plane, and next to them are the target values: the targets here are 1, 0, 1, so these are two separate classes, A, B, A; the labels are arbitrary. Let's see what happens as these examples get exposed to the perceptron.

If we take the first point, $(1, 1)$, we first have to compute $P(\vec{i}\,)$, the actual perceptron output for that input. We have $1 \times 0$ (because the weight of the x input is 0) plus $1 \times 0$ (because the weight of the y input is 0) plus $1 \times 0$ (because the hidden bias input is the constant 1 and the bias weight $b$ is 0), which equals 0. Now we pass that through our activation function, and this is where small decisions are important: we said this is a threshold activation function where an input less than or equal to 0 gives an output of 0, and only an input strictly greater than 0 gives an output of 1. So passing the 0 through the activation function, we get an output of 0, which is different from the target of 1, and so we have to do a learning update by applying the formula. Our target was 1 and our perceptron output was 0, so $t - P(\vec{i}\,) = 1 - 0 = 1$. We also need to pick a suitable learning rate for these examples, so I'll define $\alpha = 0.1$; then $0.1 \times 1 = 0.1$, and we multiply by each individual input value on a position-by-position basis. In this case it's easy, because the coordinates are 1 and 1 and the hidden bias input is also 1, and all the weight values are the same, so we simply add 0.1 to each of them. Just to note, this was an example where the perceptron output did not match the target, so I'm putting an X there to indicate that.

That's one update to the weights and bias; let's do it again with the second point, $(2, 0.4)$, whose target is 0. Again we have to compute the output, $P(\vec{i}\,)$: the threshold activation function applied to $0.1 \times 2 + 0.1 \times 0.4 + 0.1 \times 1 = 0.2 + 0.04 + 0.1 = 0.34$, and because 0.34 is greater than zero, the output is 1. But in this case the target was 0, so we're wrong again, another strike, and we update the weights. The target was 0 and the output was 1, so $t - P(\vec{i}\,) = 0 - 1 = -1$, and we have our $\alpha$ of 0.1 times $-1$ times the individual input values, added to the individual weight values. For $w_x$, the input is a 2, so we compute $0.1 - 0.1 \times 2 = 0.1 - 0.2 = -0.1$. For $w_y$, we start at 0.1 and subtract $0.1 \times 0.4$, which comes out to 0.06. For the bias, $0.1 - 0.1 \times 1$ just cancels out, so the bias is back to 0.

Let's do one more, just to finish up. As always, we first compute what the perceptron's output would be for the given input; here the output is simply 1, and in this case it matches the target of 1. What happens now is that the target is 1 and the perceptron output is 1, so the quantity $t - P(\vec{i}\,)$ is 0, and the magnitude of the update is 0. In other words, whenever the target matches the prediction, there is no update. I could go through the math, but I would just be adding 0 to each of the weights, so we get the exact same weights after that update step. When the perceptron is making the correct prediction, we do not change the weights at all. If we had more inputs in our training set, we would keep cycling through them until eventually all of our predictions were correct, and as long as our points are actually linearly separable and our learning rate is small enough, this is guaranteed to eventually happen. However, as we saw earlier in this video, there are many cases where the points you want to classify are not linearly separable, and a perceptron simply cannot handle those, so we need more complex neural network models to handle more interesting problems.
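To close the loop, here is a short script reproducing these three steps. One caveat: the captions give the first two points, (1, 1) with target 1 and (2, 0.4) with target 0, but not the third point's coordinates, so the (0, 1) below is a hypothetical stand-in chosen to reproduce the behavior described, where the perceptron already outputs the target of 1 and no update occurs:

```python
# Reproducing the three hand-computed updates from the video.
# The third training point's coordinates are not in the captions;
# (0, 1) is a stand-in that the current weights already classify as 1.

def predict(inputs, weights, bias):
    s = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if s > 0 else 0

alpha = 0.1
weights, bias = [0.0, 0.0], 0.0

for inputs, target in [((1.0, 1.0), 1), ((2.0, 0.4), 0), ((0.0, 1.0), 1)]:
    out = predict(inputs, weights, bias)
    if out != target:  # a correct prediction leaves the weights unchanged
        error = target - out
        weights = [w + alpha * error * i for w, i in zip(weights, inputs)]
        bias += alpha * error
    print(inputs, "target", target, "output", out, "->", weights, bias)
# After the first point:  weights [0.1, 0.1], bias 0.1
# After the second point: weights [-0.1, 0.06], bias 0.0 (up to float rounding)
# After the third point:  unchanged, since the prediction was correct
```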
Info
Channel: Jacob Schrum
Views: 44,564
Rating: 4.9288435 out of 5
Keywords: Neural Network, Perceptron, Artificial Neural Network, ANN, NN, Activation function, linear algebra, dot product, deep learning, Artificial Intelligence, AI, Neuron, Node, Perceptron Learning, linear model, classification
Id: aiDv1NPdXvU
Length: 27min 1sec (1621 seconds)
Published: Thu Aug 16 2018