Lecture 5 "Perceptron" -Cornell CS4780 SP17

Captions
All right, hello everybody. Last time we finished off the curse of dimensionality and then talked about k-nearest neighbors and the perceptron. A few questions were asked afterwards that I want to quickly relay.

One person asked: why is it so bad if everything is far away from everything else? Put differently, how many points would you need if you actually want a small neighborhood? We had that little box of side length l, and we showed that as the dimensionality grows, the box has to become larger and larger until it basically takes up the entire volume. Now ask the reverse question: if I want l to be small, say l = 0.1, how many points do I need? It's easy to show that the number of points you need grows exponentially — it scales as 10^d, where d is the dimension (up to a constant). If you have a thousand dimensions, 10^1000 is absolutely gigantic; that's more than there are electrons in the universe, and the moment you need more points than there are electrons in the universe, the method is useless — it's basically infinite. So that's another way of looking at it.

Another question someone asked: with the curse of dimensionality, here's my test point, the nearest neighbor may be over here and the second-nearest neighbor over there, and they're about the same distance away. But if I have a really, really precise computer, I can still find out which one is the nearest neighbor — say it's an X and the other is an O — so why couldn't I still assign an X to my test point? The reason is that this really doesn't tell you much about the data. All you have is the assumption that nearby points have similar labels: around that X it's all X's, around that O it's all O's, and somewhere in between it switches from O's to X's — but you have no idea where it switches. It could switch right next to the X, in which case your test point sits squarely in O territory, or the X's could extend almost all the way over to the O, in which case you're squarely in X territory. You simply cannot infer the label of your point from "way over there are X's and way over there are O's" when you have no idea what happens in the middle. Does that make sense? Raise your hand if that makes sense. Awesome, OK, good. Those are the two little thoughts I wanted to convey about the curse of dimensionality.
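To put a rough number on the first of those questions — this is just an illustration, not something from the lecture — covering the unit cube with boxes of side l = 0.1 takes about (1/l)^d of them, so you need on the order of 10^d points before small neighborhoods contain any data:

    l = 0.1
    k = round(1 / l)                       # 10 boxes of side 0.1 along each axis
    for d in (2, 10, 100, 1000):
        n = k ** d                         # ~10^d boxes, so ~10^d points needed
        print(f"d={d}: roughly 10^{len(str(n)) - 1} points")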
Quick thing about the projects. Some people reported problems with Vocareum; the most common problem seems to be that people click "commit" without clicking "save" first. I guess that's not really expressed well in the user interface: you have to save first and then commit, otherwise you submit the previous version to the leaderboard, so please be conscious of that. Also, project 0 is due today, project 1 is still running, and project 2 will be shipped out either tonight or tomorrow morning — it will be on the perceptron. Any questions about the projects or logistics? No? OK, good.

So, after I scared you with the curse of dimensionality — oh, my handwriting, it's amazing; the board is pre-filled, and that's exactly what I want to talk about. We talked about the perceptron. The idea of the perceptron is that you have data points that are either X's or O's, and the assumption you're making is that there is a hyperplane you can squeeze between the two classes. In this case we assume there are only two classes — call them minus one and plus one — so this is the binary case. It may seem restrictive to assume you can find a hyperplane between these point clouds, but in high-dimensional spaces that's actually not very restrictive; in fact it's very, very likely. In some sense you're taking advantage of the curse of dimensionality. OK, let me erase this part.

Last time we defined the perceptron and I showed you that it's just a hyperplane. When we have a point x, we compute w^T x + b, which tells you how the point x stands in relation to the hyperplane defined by w and the offset b: if it's negative you're on one side, if it's positive you're on the other. Raise your hand if that still makes sense. OK, good. So the perceptron algorithm assumes that such a hyperplane exists and provides a very efficient algorithm to find one.

One trick we did: we take our x and just add a constant dimension of one, and we define a new weight vector — call it w-bar. So x-bar is our original x with a 1 as one additional dimension, and w-bar is our original w with one more dimension that we're learning. The nice thing is that the inner product between w-bar and x-bar is exactly the same as w^T x + b, so if we learn w-bar we can just look at its last dimension and that gives us b — we can decompose w-bar back into w and b. Why are we doing this if it's the same thing? That's exactly the point: it's exactly the same thing, but the derivation becomes a lot easier because you don't have to futz around with that b. We'll prove something in a few minutes, and it's just annoying to always carry the b around with you. The same goes for the code — it's annoying to update b separately — so this very simple trick gets rid of all the updates to b.
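In code, that trick is just appending a constant feature — a minimal sketch with made-up numbers, assuming NumPy:

    import numpy as np

    x = np.array([2.0, -1.0, 0.5])       # original feature vector
    w = np.array([0.3, 0.8, -0.2])       # original weight vector
    b = 0.7                              # original offset

    x_bar = np.append(x, 1.0)            # x with a constant 1 appended
    w_bar = np.append(w, b)              # w with b as one extra learned weight

    # Both scores are identical, so we can forget about b and only learn w_bar;
    # its last entry is the offset.
    print(w.dot(x) + b, w_bar.dot(x_bar))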
OK, yeah? So the question is: this assumes the data is linearly separable — why not just use k-nearest neighbors, which can go around curves and things like that? Good point — k-nearest neighbors also has a decision boundary; let me explain. When you do k-nearest neighbors — and I showed you that example with the pink and blue regions — you can ask, for every single point in the space, if a test point landed here, would it be classified as an O or as an X? So k-nearest neighbors implicitly defines a decision boundary too, and that boundary can wiggle however it likes between all the O's and all the X's; you could even have a little region carved out like an island. The perceptron makes a very, very restrictive assumption about the decision boundary: it has to be a hyperplane. That's restrictive, although it turns out to be sufficient in high-dimensional spaces. The advantage is that you only have to store one vector and the bias of the hyperplane, whereas for k-nearest neighbors, in order to encode that decision boundary, you have to store every single training point and compute distances to all of them — a slow and memory-inefficient algorithm. So by simplifying to a hyperplane you get a lot of memory savings and a lot of speed, but there's a trade-off: you can't model a curvy boundary like that anymore. Good question. Any other questions?

And by the same token: for k-nearest neighbors we had that nice theorem saying that as n goes to infinity, we make at most twice as many errors as the Bayes-optimal classifier. That's impossible to prove for the perceptron. Why? Why couldn't you possibly ever prove something like that for the perceptron? I'll give you a minute to think about it — discuss it with your neighbor.

OK, any suggestions? I would like to show that as the training set becomes infinitely large, my perceptron makes very few errors. Why can't I possibly prove this — when does it break down? Yeah, that's exactly right. You could have a data set whose distribution is such that you have positive points here, negative points there, positive points here, negative points there, and so on — imagine the data is sampled from some kind of sine wave. There is absolutely no hyperplane that separates the O's from the crosses, and more data won't help: your error will not improve. Telling you there are really a lot of crosses here and a lot of O's there doesn't help, because there is no hyperplane that separates them. The moment you put your hyperplane anywhere, you get all of these points wrong, and more data just samples more points in the same proportions, so the error doesn't change. But k-nearest neighbors works on any such data set — k-nearest neighbors would work here and the perceptron would not. Does that make sense? Raise your hand if that makes sense. OK, awesome.

So here is the perceptron algorithm. It's very simple.
You start out with w = 0 — initially my weight vector is all zeros, which means it predicts zero for everything; that's neither positive nor negative. And just a reminder: I assume my y_i's are either +1 or -1, so every data point has a label of plus one or minus one. Then I loop forever — I know, not great coding. What we're doing is repeatedly going over the entire data set and counting how many points we misclassified; m here is the counter of how many points I misclassified during one pass through the data. Initially I say: so far I haven't misclassified a single point. Now let's go over the data set, try every single data point, and see whether it lies on the right side of the hyperplane. The algorithm is basically: every time I get a point wrong, I adjust my hyperplane, and then I keep looping over the data set. The moment I make a full pass without a single mistake — all the crosses lie on one side of the hyperplane and all the O's on the other — I know I've found a hyperplane that separates the two classes, and I can stop. That's the high-level intuition.

So m is the count of how many mistakes I've made, and then we say: for every (x, y) in D — that's a for loop, in notation you're familiar with — classify this point. Does it lie on the right side of the hyperplane? If it does, I don't do anything. If it does not, I increase my counter, because I just made a mistake, and I adjust my hyperplane so that this point is hopefully correct now.

How do I check it? I say: if y times w^T x is less than or equal to zero. I claim that y (w^T x) <= 0 holds if and only if I get the point wrong. Can anyone explain why? I have a point x with label y, and I want to know whether it lies on the right side or the wrong side of the hyperplane. What I compute is y times w^T x, and I claim that if that's less than or equal to zero I must have gotten the point wrong, and otherwise I got it right. Who can explain it? Yes — that's exactly right. These points should have a positive label and those a negative label. On this side w^T x is greater than zero, on that side w^T x is less than zero. For the positive points I want w^T x > 0, and for the negative points I want w^T x < 0 — I want the sign of the prediction and the sign of the label to align. So I multiply them: if I get a negative point right, w^T x is negative and y = -1, so the product is positive; if I get a positive point right, w^T x is positive and y = +1, so the product is again positive. So if the product is less than or equal to zero, I know I got the point wrong.

And then the update is very simple: w becomes w + y x. What does that mean? y is either +1 or -1, so if it's a positive point I'm adding x to w, and if it's a negative point I'm subtracting x from w.
You can think about it in the following way: if it's a positive point, then I'm adding it to w, which means the next time around the inner product is going to be a little larger — so hopefully it becomes positive. And if it's a negative point, then I subtract it from w, which means next time the inner product is going to be a little smaller — so hopefully it becomes negative, and negative times the label minus one is positive. The other thing I do is increase my counter, m = m + 1, which counts how many mistakes I've made. And at the end, after the loop over the entire data set, I say: if m == 0 then break, and that's it.

So one more time, what we're doing: we initialize w to zero, which gets everything wrong. Then we loop forever, and in every pass we set the mistake counter to zero and go through every single data point in our sample, asking: which side of the hyperplane do you lie on? The positive points should lie on the positive side, the negative points on the negative side. If that's not the case, we update the w vector — we push the positive points toward larger inner products and the negative points toward smaller inner products — and we increase the count of points we got wrong this time around. We keep doing this until we make a full pass over the data set without getting a single point wrong. What does that mean? If we went through every single data point and every single one was on the right side of the hyperplane, then we didn't make any updates, so w must now be a separating hyperplane. Any questions?

Yeah — let me get to that in a few minutes, but in some sense you can think about it this way: after the update, w^T x becomes (w + x)^T x, which equals the old w^T x plus x^T x — so you've increased the inner product by exactly that amount. Any more questions? One second, let me just bring the projector down.

Yes — this is inside the loop; m is reset inside this loop. You go through the whole data set; if you made a mistake, you go over the data set again and reset m to zero at the start of the pass. The moment you make a mistake you update immediately, so you could already be partway through the data set every single time you update. You only stop once you haven't made an update for n points in a row. Does that make sense?

Yeah — in a few minutes we will prove that this actually converges very quickly, so this wasn't just made up; it has a very well-motivated derivation. Sorry — that depends on how many data points you have. I mean, come on, this is 1957 — let's be glad we're getting something right. And all of machine learning works like this: you train by minimizing the error on the training set, and there are other mechanisms against overfitting — which, by the way, was not even a concept yet at the time. We will get to this. Any more questions? All right, good.
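For reference, here is roughly what that loop looks like in code — a minimal sketch, not the course's reference implementation, assuming labels in {-1, +1} and a NumPy data matrix X whose rows already carry the appended constant-1 feature:

    import numpy as np

    def perceptron_train(X, y):
        w = np.zeros(X.shape[1])              # start with the all-zeros vector
        while True:                           # "loop forever"
            m = 0                             # mistakes made in this pass
            for xi, yi in zip(X, y):
                if yi * w.dot(xi) <= 0:       # point lies on the wrong side
                    w += yi * xi              # pull the hyperplane toward it
                    m += 1
            if m == 0:                        # a full pass with no mistakes:
                return w                      # w now separates the training set

    # Toy usage: two positive and two negative points (last column is the 1s).
    X = np.array([[1.0, 2.0, 1.0], [2.0, 1.5, 1.0],
                  [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
    y = np.array([1, 1, -1, -1])
    w = perceptron_train(X, y)
    print(np.sign(X.dot(w)) == y)             # all True once it has converged
    # Note: if the data is not linearly separable, the while loop never ends.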
All right, so here's a little demo. What I can do now is draw a bunch of data points and then train the perceptron to classify them. Let's first draw some positive points — I think positive comes first, I'm not sure — and these are negative. OK, whatever. Let's hope that's linearly separable; if it loops forever, it's not linearly separable.

Let's see. The first thing it does: initially it has the zero vector, so you can't see anything. Then it encounters this point here — these are the positive points, those the negative points — it picks an arbitrary point first, realizes it gets it wrong, so what does it do? It adds that vector x. What you see here, the blue arrow, is the weight vector after that update, and it points exactly to x — why? Because my initial vector was zero, and 0 + x is exactly that vector. Does that make sense? This is the origin, x is the vector from here to there, and I add it. So now I actually have a hyperplane — great — everything on this side is classified positive, everything on that side negative. This red thing is my current weight vector w. Now there's another point, this one down here, and I get it wrong: it's positive, it should be on the red side, but it's on the blue side. So what do I do? I add that point to my vector: this vector plus this vector becomes this vector. That's my new hyperplane. With the new hyperplane I look at this one — oh, I get this one wrong too, it's on the wrong side — so I again add x to my current weight vector and end up here. Now I get this one wrong again — I already had it right once, but now it's wrong again — so I keep adding, and eventually I end up with a hyperplane where all the positive points are on one side and all the negative points on the other. And we'll prove in a little bit that whenever such a hyperplane exists, the algorithm converges to one. Any questions about the demo? Any data set you want to see?

Yeah — there are infinitely many separating hyperplanes, and wars have been fought over which ones are the best. I'll expose you to a little bit of that later on. But this was the first algorithm that could find any separating hyperplane, which at the time was amazing, and the moment that happened people were like, "well, mine is better than yours" — so there is a lot of work on this, and it turns out there are better ones. One thing you can see — let me make a really easy example, say three points on each side with a really large margin — is that it finds the hyperplane very quickly, here with a single update. The algorithm takes a lot longer when there is very little wiggle room, because you keep overshooting over and over again; that will make sense once we look at the proof. Any more questions?

I can show you another example, but before we do, I just want everybody to have a moment of silence while we think about the time it took me to make this demo. [Applause]

All right, so here's what we have: handwritten digits, zeros and sevens. I only take zeros and sevens — it turns out that's linearly separable, because they really don't look very much alike. These are 16 by 16 pixel images, so each one lives in a 256-dimensional space. This here is my entire training set — a small training set — and what I'm showing you in this box
is the weight vector. The weight vector lives in the same space, so I can visualize it as an image: it's a 256-dimensional vector, so I can plot it as a 16 by 16 image — why not. Right now it's all zeros; gray means zero, white is positive, dark is negative, so all zeros shows up as gray.

Now I go through my first image, this 7, and I misclassify it — zero times whatever these pixels are is zero, so I get this point wrong. So what do I do? I add this point to the weight vector — actually I subtract it, sorry, because it's a negative example — and this is my new weight vector. Does that make sense? Raise your hand if the demo makes sense. OK.

Now I reclassify everything. These are the new classifications — currently I'm classifying everything as negative, which makes sense: so far I've only seen one example, and it was negative, so if I take the inner product of this weight vector with every image in the data set, every example comes out negative. The y-axis here shows, for every single image, how it's classified: up here is positive, down here is negative; before, when w was zero, everything was classified as exactly zero, which is why all the points sat in the middle. Now the next example: this one here is classified as negative — as a 7 — but it's actually a 0, so I got it wrong. What do I have to do? Add it to my weight vector. And this is what happens: now the weight vector looks half like a 7 and half like a 0. If I reclassify, you can see this one is now classified positively. Let's go to the next one — oh, everything is already correct. Well, that was pretty easy. Let me do it one more time — you've got to make it worth it.

OK, one more time: the first one, I get it wrong, so I subtract the 7 — here's my weight vector — and the points get reclassified. Now I go to this 0, it's classified correctly; the next 0, correct; the 7, correct; everything is correct — feeling pretty good — and now this one... oh, what's going on? I have to update my vector, and I move this one up — oh, that was too much, because now I get the 7 wrong, so I subtract it again... and do we have it? I think we got it. All right, good. Does that make sense? [Applause]

Yeah, question? So the points with negative label — the 7s — have y = -1, so those get subtracted, and the 0s have a positive label, so those get added. What I do is w becomes w + y x, and y is either +1 or -1; that's why I add these and subtract those. Good question. Any other questions?

Yes — if it's not linearly separable, it's just going to loop forever. That's the problem: there's no check of whether it's even possible; you just assume it's possible. And that's right, yes — y is the y of that particular data point, absolutely.

Yes — we have evolved a great deal since 1957, so there are much, much more efficient ways to do this now; this was the first algorithm that could do it at all. Later in the course I'll take exactly this intuition and show you much, much more efficient ways of doing it. It's a good question, though — as it stands, this is extremely inefficient.
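If you want to play with something like this demo yourself, here is a rough stand-in — not the demo's actual code — using scikit-learn's built-in 8 by 8 digit images instead of the 16 by 16 images shown in lecture, together with the perceptron_train sketch from above:

    import numpy as np
    from sklearn.datasets import load_digits

    digits = load_digits()
    keep = np.isin(digits.target, (0, 7))             # zeros and sevens only
    X = np.hstack([digits.data[keep] / 16.0,          # pixel values scaled to [0, 1]
                   np.ones((keep.sum(), 1))])         # constant-1 feature appended
    y = np.where(digits.target[keep] == 0, 1, -1)     # label 0 -> +1, 7 -> -1

    w = perceptron_train(X, y)                        # loops forever if not separable
    print("training mistakes:", np.sum(np.sign(X.dot(w)) != y))
    # w[:-1].reshape(8, 8) can be shown with matplotlib's imshow to watch the
    # half-zero, half-seven pattern the weight vector develops.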
Any more questions? All right, let me close this.

OK, good. Now I thought we'd spend a few minutes on the geometric intuition with a little quiz. There are two parts to it — you have the printout in front of you, it's the same thing. Question number one: you have a point x — here's my origin and here's my weight vector w. Remember that the hyperplane always goes through zero. Why does it always have to go through the origin? Who knows why? Yeah, that's right: we added the constant 1 and absorbed b, so we forced it to be that way — we incorporated the offset as an additional dimension. So here's the picture: I have x and w, and this point is misclassified — it should be positive. I want you to draw the new w after an update; that's the first part, just to visualize it. The second thing I want you to think about: imagine I encounter the same point over and over again in a row — how many times could I get it wrong? Does that make sense? Please spend three minutes with your neighbor and figure these two things out.

All right, please raise your hand if you've got both answers. All right, keep thinking... OK. The first question: initially my hyperplane is defined by this w; the hyperplane goes through zero and is orthogonal to the vector w — so this here is my hyperplane. Now along comes this point x and I get it wrong: it should be positive but it's classified negative. What do I do — what does the hyperplane look like after the update? Can anyone show me what to draw? I see people drawing it — OK, good. Here's what I do: this is the vector x, and w becomes w + 1 times x, since it's a positive point. Vector addition works by translating x and adding it, so w plus x is this new vector, and then I draw the hyperplane orthogonal to it — this is my w_{t+1}, that was w_t. Thank you. Raise your hand if that's what you got. All right, good.

The second question: after one update, this point doesn't actually have to be correct. It could still be wrong — there's no guarantee that after one update you get the point right. So imagine we visit the point again: how many times could we possibly get the same point wrong? Could it be that we just keep updating on this one darn point and never get it right? Yeah — that's exactly right; let me just derive it. After one update, w^T x becomes (w + x)^T x. We know that before the update w^T x <= 0, because this is a positive point that we got wrong. If we make an update and it's still wrong, we make another update — and that would just be adding x one more time. So let's say after k updates we've added x to w k times, and we're still getting it wrong. What does that tell us about k? It means w^T x + k (x^T x) <= 0, and we can
solve that for k: k <= -(w^T x) / (x^T x). Since w^T x is negative here, the right-hand side is some positive number — so this shows there's an upper bound on how many times we can get this point wrong, and after that many updates we have to get x right. There's only a finite number of times we can get this particular point wrong, which is very encouraging: it means we're doing something right. And you're exactly right — this looks at just one data point, to simplify the argument; the proof we'll do in a few minutes is over the entire data set. But at least we showed something: if I give you one data point and show it to you repeatedly, eventually you will get it right. A minimal requirement, I admit, but it means the algorithm is learning something.

Yeah — what if x is exactly the opposite, say x = -w? What happens if you add that? You get the zero vector back, and then the next time you'd get exactly this vector again. And if the inner product is zero, you hit exactly the equality case here, and we say it's still wrong: if something lies exactly on the hyperplane, you still consider it wrong. In part you need that because you initialize with zero, so initially everything gets score zero. It's actually a common mistake when people implement the perceptron: you must use less-than-or-equal, otherwise it immediately "converges" — although that's easy to detect.

Any more questions? Sorry — no, you don't have to; the origin is always zero. The added dimension just puts you in a higher-dimensional space: there's one extra dimension, say going into the blackboard, in which the vector x is offset by one. You could have two-dimensional data, you could have one-dimensional data — in this drawing it's basically one-dimensional data with a second dimension added.

All right, who's ready for the proof? At least one person — good. The convergence proof: this is what made Rosenblatt famous, and the reason you're sitting here at Cornell half a century later is that it changed the world we live in. The question is: why does the perceptron algorithm always give us a separating hyperplane, if one exists? We won't get through the entire proof today; we'll get started with the setup, and what I really want you to do is read through it before Wednesday. Take one evening — maybe not Valentine's Day — read through the proof and make sure you understand the different steps. It's OK if you don't understand everything, that's what the lecture is for, but it helps a lot if you've looked at it beforehand.

So here's what we assume. The assumption of the perceptron is that there exists a w that separates the data; call it w*. There exists a w* such that for every (x, y)
in our data set, y (w*^T x) > 0. That's our assumption. We're not saying we will learn that particular w*; we will learn some separating hyperplane. What we'll show in the proof is that we get closer and closer to w*, and eventually we're so close that we classify everything correctly — that's the idea. So far this is just the assumption, the foundation of the perceptron algorithm.

Now, we know this w* exists. One thing we don't want to deal with is the scale of w: if w* is a separating hyperplane, then 5 times w* is also a separating hyperplane — you can rescale it by any positive constant and it's still separating. So if one separating hyperplane exists, there are infinitely many, which isn't very interesting. In the picture: here are our X's, here are our O's, here's our hyperplane; one version has a normal vector like this, another has a normal vector twice as long — it's exactly the same hyperplane. So we fix one scale: we rescale w* such that its norm is one, ||w*|| = 1. That's not a limitation — we can always take any w* and divide it by its own norm. So we simply assume ||w*|| = 1, and we know such a w* exists because a separating hyperplane exists.

Now we can do something else: we can also rescale our data. That is somewhat of a trick, and it's just to simplify the proof later on. Just as we multiplied w* by a constant, we can take our data and multiply it by a constant. What we assume is that for every x_i in the data set, ||x_i|| <= 1 — all our feature vectors have norm at most one. Can you always do this? That seems fishy, right? Do you buy this one? Raise your hand if you buy it. All right — some people have concerns, and you should have concerns, but it is actually totally kosher. Here's basically what we can do: think of it as taking the entire data set, shrinking it by multiplying with a small constant, finding a separating hyperplane for the shrunken data, and at the end expanding everything — the points and the hyperplane — back again. We multiply by alpha and then by 1 over alpha, so one is just a rescaled version of the other; it doesn't change anything. Raise your hand if you buy it now.

Another way of looking at it: take the x with the largest norm, say omega = max_i ||x_i||, the largest norm of any of my data points, and replace every data point x_i by x_i times 1/omega. Now the constraint is satisfied: the largest x has norm exactly one, and everybody else has a smaller norm.
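As a small sanity check, here is a minimal sketch of that setup — not part of the lecture — assuming a NumPy data matrix X with one row per point, labels y in {-1, +1}, and some vector w_star that already separates them:

    import numpy as np

    def rescale_for_proof(X, y, w_star):
        w_star = w_star / np.linalg.norm(w_star)    # now ||w_star|| = 1
        omega = np.linalg.norm(X, axis=1).max()     # largest input norm
        X = X / omega                               # now ||x_i|| <= 1 for every i
        assert np.all(y * X.dot(w_star) > 0)        # still separated: rescaling by
        return X, w_star                            # positive constants changes nothing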
Why is this kosher? Because 1/omega is a positive constant: I have this one separating hyperplane, every point lies on the right side of it, and if I multiply every point by 1/omega that still holds, since I'm multiplying by a positive constant — I'm not actually changing anything. So in the same way that I can rescale w*, I can also rescale my x's, and just to make things simpler I take my data and rescale it so that everything has norm at most one. Another way of thinking about it: I take my X's and my O's and rescale them so that they all sit inside a circle of radius one. We're only doing this because it's convenient during the proof, and it does not infringe on the generality of the result.

Let's leave it here. Please read over the proof — maybe you can even make a Valentine's date of it. Think about it: it can be very romantic if it's read the right way.
Info
Channel: Kilian Weinberger
Views: 23,649
Rating: 4.9794869 out of 5
Keywords: machine learning, cornell, course, perceptron
Id: wl7gVvI-HuY
Length: 49min 56sec (2996 seconds)
Published: Mon Jul 09 2018