Lecture 1 | The Perceptron - History, Discovery, and Theory

Captions
I normally ask people to make little name tags and keep them in front of themselves so I know their names, but since this classroom doesn't have desks we can't do that. So if I point at you, I expect you to identify yourself, introduce yourself, not just today but for the next several classes, because with probability one I'm going to forget everybody's name as soon as you tell me. I have to keep practicing until I begin remembering your names. I expect all of you to know who I am; my name should be familiar. Me and some of the TAs... do we have any of the TAs over here? Wendy and one other person. Currently the only people in class allowed to keep their laptops open are the TAs and today's volunteers, Veda and one other, who will be following Piazza in order to forward any questions posed by people who are following the video stream online.

All right, so let's begin. I assume all of you have looked at the logistics lecture, but I'm going to go over some of it anyway. Even before that, as a quick intro, here's something we're all familiar with; you've probably seen this slide before. Neural networks are pretty much taking over every aspect of AI these days. In most problems you can think of, the state of the art has currently been established by neural networks. So here are some historical highlights.

Speech recognition: we've been trying to perform automatic speech recognition since the 50s, maybe even earlier, and the performance of automatic speech recognition systems until about 2015 left a lot to be desired. Although we did have things like Siri, which were pretending to perform speech recognition, more often than not they were more useful for humorous purposes than for anything useful. But then something happened in 2016. Microsoft came up with this; note the date, October 20, 2016. We had this article which said Microsoft AI beats humans at speech recognition. This was on a specific task, it did not generalize, but it's been three years since then, and guess what, things have gotten exponentially better. These days an automatic speech recognition system can outperform you on most tasks.

Machine translation: observe the date over here, November 15, 2016. These things are coming really close together. If you used Google Translate in October of 2016, translated something from English to Spanish, took the Spanish string that came out and translated it back to English, what came out would have no relation to what went in. It used to be very humorous; in fact we used it for playing jokes. And then something crazy happened in November 2016. One fine day you just woke up, you tried Google Translate, and voila, the thing actually worked; it gave you useful translations. These days you can use online machine translation systems (do you mind shutting your laptop?) and they do an excellent job; in fact very often they outperform human translators. So what happened in November 2016? Google switched from their statistical machine translation systems to something based on neural networks.

Similarly with image segmentation. This is a picture that I got from the web, and obviously everything you get from the web is true, so this is correct. Now, trying to identify all the objects in an image, or even to segment out all the objects in an image, was incredibly hard just a few years ago, maybe two or three years ago.
And trying to identify every one of those objects, on top of segmenting them out, was quite impossible. But today something like this, where an automated system not only segments out pretty much every object in the scene but can assign a tag to it, these things exist, and they are powered by neural networks. Here's a beautiful little demo that I got; I don't know if this will work, and again, I got it from the web so it must be true, but this is powered by neural networks: you can see a system that's detecting and tracking automobiles on the street as they drive by, live, and not only is it finding automobiles, it's actually telling you which model each one is. Pretty amazing, huh?

Then what about games? How many of you know where the world's first chess machine that beat a human grandmaster was built? Yes? Anybody else? Down the hallway. Right, so this was the PhD thesis of a guy called Thomas Anantharaman, who was actually not working on chess at all; he was trying to build processors for dynamic programming. He had to try it on something, so he decided to try it on chess, and the next thing he knew he had a chess machine that beat a grandmaster. It was called Deep Thought. Then he went to IBM, it became Deep Blue, and Deep Blue is the one that actually beat Garry Kasparov. So for a very long time the notion of what it meant to be intelligent, what it meant to be smart, was identified with these really cerebral games like chess, and it was assumed that computers could never really beat the best humans at chess. And then along came the late 1990s, and of course you had AI systems beating pretty much everybody all the time.

But then there was a tougher game, Go, and Go is exponentially harder than chess. In a typical chess game there are ten raised to about 120 possible games, and I think Go has something like ten raised to 200, so the state space is much larger. It was expected that a machine could never really learn to explore that kind of state space, that you need human smarts. And then, around 2016, you had AlphaGo, which beat the human champion Lee Sedol at Go for the first time. Those charts over there tell you AlphaGo took quite a lot of training; it had to be taught how to beat the human. But this was a neural network based system. These days you can pick up AlphaZero, and AlphaZero will start from scratch, knowing essentially nothing beyond the rules of the game, and in a few hours it will teach itself to beat the best automatic players out there. It can beat AlphaGo and AlphaGo's successors, which are even better than AlphaGo, and again this is powered by neural nets.

So you can see where this is going: pretty much everything. This is an old paper; these are all images, and you can see captions underneath the images. Those captions are provided by neural networks, and you can see they're pretty amazing. "Girl in pink dress is jumping in air": that's accurate, and it has been generated by a network. "Black and white dog jumps over a bar." "Man in blue wetsuit is surfing on a wave." Pretty accurate, and all of these have been done by neural networks. So the point is that these systems are almost magic on a variety of problems, including the ones you've seen, down to astronomy, healthcare, the finance market; neural networks are the way to go. I like to joke that it's a common fallacy amongst people that you cannot rub your stomach and pat your
head at the same time; but you can, and that too is powered by the neural network in your head. Anyway, the biggest factor here is this: four years ago, if you knew how to code a neural network, it was a hot item on your resume. These days, if you don't know how to code a neural network, and make it really sing, you're not going to find a job. So it's gone from being a plus to being essential; you had better be good at this.

So what are the objectives of our course? We're going to try to understand neural networks; we're going to comprehend the models that do all of the previous tasks. If they look like magic now, they will not look like magic three months from now, and you may even be able to build some of these. You will be familiar with a lot of the terminology. There's a link over here on the slides (the slides are up on the course page), and this blog maintains a list of all of the latest neural network architectures. I expect that by the end of the course you will be able to understand what each of those architectures is, and maybe even implement them. Basically, the idea is that by the end of the course you will be fearlessly able to design, build, and train networks for fairly complex tasks. Again, keep in mind that this course's title is Intro to Deep Learning, so it's an introductory course, but don't let the name fool you: it's an introductory course only at the beginning. By the time the course ends you will have gotten in so deep you won't believe it; it will be quite amazing to you how far you have come, and at that point the course will not be an introductory course anymore. You will actually be, if not an expert, fairly close to one.

The instructor: that's me. My e-mail, at cs, and my phone number, 9826, are on the slide. I have not actually listed office hours for myself on the website, because I tend to be extremely undisciplined, and part of this lack of discipline means that if you walk into my office I'll talk to you; I'm always accessible. If there are people waiting in my office, you just walk in and sit down; very often my room is so packed that you have to sit on the floor. Just wait, I'll get to you.

We have an army of TAs, at least two of whom are here; I see Wendy, and I just saw another one come in. We have 15 TAs scattered across three campuses, and these are your supermen and superwomen who are actually going to help you through the course. Most of them have been through the course themselves, and all of them have been through an equivalent of this course; they know their stuff, so they'll be able to help you. TA office hours are going to be on the web page. The course page, of course, is deeplearning.cs.cmu.edu, so it's going to be easy for you to find it, and all information will be on the course page. Most relevant logistics, including the schedule, are going to be on the course page. Aishwarya, one of our TAs, has put up a video with the logistics. I assume most of you have seen it; if you have not, I recommend that you do, because all the rules for the course are in that video, so you really must see it. (The lady over there, do you mind shutting your laptop? This is strictly a closed-laptops class.) And if you have not seen the logistics video, some of this may be new, but hopefully everything I'm going to say in the next five minutes is something you're
already familiar with.

We have in-class sections; this class has several sections. Typically the classroom tends to be much larger than this, but I chose a smaller classroom, because one of the magical things about the fall semester is that as the semester progresses the days get shorter and it gets colder. We also have online sections. All the lectures are being streamed; the streaming link is on the course page. The lectures are also recorded; the videos are going to be on the media server (Panopto) and they will also be on YouTube, which means you can watch these videos offline. Now, I do not recommend it; in fact I strongly do not recommend it. I recommend that you come to class, every single class. What I expect is that most of you are going to ignore my recommendation, because nobody wants to wake up at 8 a.m. So you'll maybe crawl out of bed at 8:45, I'm imagining, and roll over to your computer and watch the video. Except that's not what happens. You wake up at 10 o'clock, say "oops, there was a class," go to the web page and try to find the video, decide it's too long, play it at 4x speed with me squeaking, and then you go and attempt the quiz, and you fail the quiz, because if you don't actually watch the videos you will not be able to pass the quizzes. So it's important that you actually watch the lectures, even if you think you know the topic.

We have 13 recitations; we may have a fourteenth if required. There's a tentative list of all the topics on the course page. I'm saying tentative because we tend to be nimble on our feet: depending on how the course is progressing, we might change the schedule or add recitations. These will cover implementation details and basic exercises. It is important that you actually attend all of these recitations. Again, the idea here is that you become near experts on the topic, and if you miss the recitations that's not going to happen. The topic list is on the course schedule.

Quizzes: we have 14 quizzes, one every week, and we will retain the best 12. Each quiz is a multiple-choice test that will be up on Canvas. This means that if you manage to score perfectly on the first twelve, you don't need to take the last two; but nobody does.

There are four homeworks. Each of these has two parts: one is on Autolab, the other is on Kaggle. You compete with each other on Kaggle, which means: don't cheat. If you help your friend cheat, your grade goes down, because you're competing, remember. We recommend collaboration, we recommend discussion, but at the end of the day we want everybody to do their own work. It's perfectly okay to share all kinds of information; what is not okay to share is work. You have to do it yourself. We've also put up some practice homeworks over the summer; hopefully at least some of you have tried those. If you haven't, I recommend that you do, because they'll get you warmed up. And hopefully you've also seen recitation zero, which is in three parts; the last of them went up just yesterday. Please go over all three, because they will really, really help.

Now, the slides that I show in class are not the entire set; the actual slide deck has many more slides, more often than not. So even after class you must download the slides and go over them to
cover the entire material. Now, I'm very cunning and evil, so most of the questions in the quizzes will be from the slides that I don't present in class. Not all, because I want you to be in class, so some of those questions will be on stuff that I say in class but that is not on the slides.

And here's my chicken slide; you've probably all seen it. The course is a lot of work. Just to emphasize: it's a lot of work. In fact, it's a lot of work. It's not for chickens; that's how tough it is. And we pile it on with the best of intentions. You'll hate me two months from now; you'll love me four months from now, simply because if you stay with it, at the end of the course you're going to be really happy with where you are.

The evaluation is based on mastery. We have timelines, we have deadlines, like every other course, but timelines and deadlines are for administrative purposes: if you don't get things done on time, I will not be able to score you and grade you before the semester ends, and then nobody would actually pass the course. So I recommend that you follow the stated timelines as strictly as you can, but our objective is to make sure that you end up really good at the material. (Do you mind switching off your phone, you in the middle? Yeah, thank you. I'm going to pick on you if you do this in class, so remember this.) Last semester I think half the class, maybe slightly more than half, got some version of an A. Nobody got a free grade; it was hard, and every one of the students who got an A deserved it. My point in saying that over half the class got some version of an A is to tell you that it's possible, provided you put in the effort, and I expect that you will.

If you have any questions, please post on Piazza. Piazza is monitored pretty continuously. Our TAs, who as I've told you are supermen and superwomen, have done a wonderful job: we've maintained an average response time of less than 12 minutes over four semesters, and TAs have won awards for their work in this course. So even more than me or the content of the course, our TAs are famous. Which means: use Piazza freely, and you can expect to get a response.

And before I close the intro, I'd like to end with this little quote from Rosenblatt's 1962 book, where he quotes a book called Perception and the Representative Design of Psychological Experiments by Egon Brunswik, which tries to define perception, which is at least part of what we're going to be looking at. Can anybody read it? It's really long and complex: "Perception, then, emerges as that relatively primitive, partly autonomous, institutionalized, ratiomorphic subsystem of cognition which achieves prompt and richly detailed..." blah blah blah blah. And I think The New Yorker actually summarized it really nicely: "That's a simplification. Perception is standing on the sidewalk, watching the girls go by."

Anyway, that was just the introduction; we're going to start off with the topic for the class. Any questions? Anything on Piazza? No? Anybody here? The mic? There is no mic in the class, so you're asking me to yell louder; okay, I'm happy to do that. This thing is only going over the streaming link, so yes, I'll speak louder. Can you hear me? Perfect. Okay, anything else?

All right. So what we've said so far is that there's this magical box called a neural network which does all kinds of amazing things.
It can recognize speech, it can play games, it can assign captions to images. What is really happening in each of these cases? There's a box; something goes in, something comes out. A voice signal goes in, the transcription comes out. An image goes in, the caption comes out. The current game state goes in, the next game state comes out. So what we are really trying to do is figure out what's in these boxes. What is really in these boxes?

Now, before we begin, the first thing we have to realize is that all of these tasks are fundamentally human tasks. Playing clever games: very human. Speaking and recognizing speech: very human. Assigning captions to images: very human. These are all human tasks that are powered by the human brain. So here is one common architecture, the brain, which performs this magical diversity of tasks. If we want some kind of common framework which can do all of these things, maybe we should begin by trying to understand what the brain really does, or even before that, what cognition means and how cognition works.

Now, do any of you know what that statue is? I've been very helpful, I put a caption on the slide: this is Auguste Rodin's The Thinker, and that statue represents a specific character. Anyone know who that is? That's Dante Alighieri, of the Divine Comedy. Okay, so here's what human cognition can do. Humans can learn, humans can solve problems, they can recognize patterns, they can create. You can be having a shower, and you can think, and you can come up with ideas completely randomly; you can cogitate. These are all beautiful aspects of cognition that are worthy of emulation. In fact, when we begin speaking of AI, these are the kinds of things we would like our systems to do. So this is definitely worthy of emulation, but for this we first have to understand how humans work. And the problem, of course, is the age-old problem: if the brain were simple enough to be understood, we'd be too simple to understand it. It's a very beautiful quote by Marvin Minsky, but that doesn't stop us from trying.

So we've been trying for hundreds, no, thousands of years. It turns out the earliest attempts at understanding how cognition works go back to about 600 BC, and it was in or around 400 BC that Plato came up with the first theories of cognition. He said it's all based on forming associations; he came up with the theory of what was later called associationism, and that was the dominant theory of how cognition works for over two thousand years, till well into the early 1900s. There are several famous names associated with it, including Ivan Pavlov, who of course ran the famous dog experiment that pretty much everybody has heard of.

So what is this business of associationism? The claim is that our ability to think, our ability to recognize and act on recognitions, to come up with inferences, is based on associations. Here's a simple example: lightning is usually followed by thunder. You observe these things happening together, and you form an association between lightning and thunder. Thereafter, if you observe some lightning, you expect to hear thunder; if you hear thunder, you believe lightning must have struck someplace, even if you didn't see it. You've actually formed an association. And if you think about modern machine learning techniques, this
sounds very crude, but pretty much everything that we do is about forming associations: you associate an input with an output. These associations may be arbitrarily complex, but they're still associations; functions are associations. So the idea itself is pretty smart, but it's still very naive: just saying you're forming associations doesn't quite explain it. For a start, even within the brain, sure, you have all these associations, but where are they stored, and how? We need to understand that.

So let's go back to our favorite exemplar of something that does a really good job of intelligence: the human brain. We have to look into the human brain to figure out how these associations are stored and how these inferences are made. By the mid-1800s people pretty much understood how the brain was constructed, because by that time we had figured out how to build fairly powerful microscopes: you could take slices of the brain, you could look into it, and what they found was that the brain is a mass of neurons. You have many, many, many neurons. How many, we didn't know for a long time; we just knew it had neurons. And these neurons connect to one another: every neuron is connected to by several neurons, and every neuron connects out to several neurons. So clearly we have to look at this structure to figure out how the whole mental process works.

But just knowing the structure doesn't quite help you; you still need a computational model. And it turns out that the first computational model for how the brain operates was proposed almost 150 years ago, in 1873. The first proposal for an artificial neural network came from Alexander Bain, who wrote a beautiful book called Mind and Body in 1873 where he proposes this idea. Now, why was this business of figuring out how the brain worked so complicated? You had this one small object, just one object, but it was performing a plethora of different operations; you had the same circuit doing different things. It was inconceivable to people early on that you could have one little computational mechanism which would perform completely unrelated kinds of computations. And Bain came up with the idea that it's all in the connections: if you connect these units just right, then they can perform different computations; they can produce different outputs for different inputs. These days we just take that for granted, but in 1873 this was a revelation.

So here is an example of a circuit he came up with: if A and B fire, then X fires; if A and C fire, Z fires; if B and C fire, Y fires. It's one common circuit, but depending on what the inputs are, you have different outputs. He came up with even more detailed models, where he said the strength of the input can modify the output. For example, over here, if the input is relatively weak then only Y will fire, because the input is being added three times into Y, but if it's strong, even X will fire. So just the level of the input can change the output. Again, these are things we take for granted now, but in 1873 it was pretty remarkable, and the fact that he hypothesized this model meant he had to face a lot of detractors who thought the whole idea was very hokey. And Bain not only came
up with the idea that the information is in the connections and in how the units are connected, he also came up with hypotheses for how a raw brain could learn to form these connections. He said: "When two impressions concur, or closely succeed one another, the nerve currents find some bridge or place of continuity, better or worse, according to the abundance of nerve matter available for the transition." He was actually predicting Hebbian learning; we will look a bit at Hebbian learning in a few minutes. So it was a brilliant idea, way ahead of its time.

And here's a famous quote by Bertrand Russell: "The fundamental cause of the trouble is that in the modern world the stupid are cocksure while the intelligent are full of doubt." This still holds true. Now, in 1873 Bain postulated that there must be one million neurons and five billion connections just so that you can store two hundred thousand acquisitions, two hundred thousand of these perceptions or concepts. But then, after ten years or so, he realized that if there's a learning process, then there are partially learned concepts, partially learned acquisitions; where do you store those? That's going to need more than one million neurons and five billion connections. The more he thought about it, the stupider the idea sounded to him, and there were a lot of people telling him he was stupid, so by 1903 he decided he really had been wrong. He said: I was wrong, I'm sorry, please forgive me. And he died. Except he was right.

So fortunately the idea stayed, this idea that the magic lies in the connections: the so-called connectionist framework, connectionist models of computation. Again, the earliest hypothesis that the brain is a connectionist machine, where the information lies in the connections, comes from Bain in 1873; there were other proposals later on. The idea is that neurons connect to other neurons, the processing capacity of the brain is a function of these connections, and connectionist machines emulate this structure.

Now, why was Bain wrong? It turns out that he simply, completely underestimated the scale of the brain. Does anybody know how many neurons there are in the brain? Anyone? There are about 80 billion neurons. That's a lot of neurons. How many connections? Anyone? Trillions. So that little thing you've got on top of your neck has on the order of a trillion connections and eighty billion neurons. Now imagine building this in silicon. If you were building a processor with 80 billion units and a trillion connections, how much power would it take? Anybody want to guess? You're probably going to have to build your own little power station just to power it. But you power that massive computer in your head on a slice of bread. Stop and think about it. That's what Bain completely underestimated.

But now we have come around to the idea that connectionist machines are a really good idea. What are connectionist machines? These are networks of processing elements; all world knowledge is stored in the connections between the elements. And connectionist machines differ from the standard computer in your smartphone, your desktop, your laptop. What is the architecture of the processor in your laptop? Anybody? It's the von Neumann architecture. What is that? Anyone else? Anyone who can tell me what a von Neumann architecture is? How many
of you are electrical engineers, and how many of you are CS grads? Come on, only four? What about the rest of you; what is your major, or do you just not like raising your hands? I think you don't like raising hands. Okay, anyway, if you don't answer my questions I'm going to stick my finger out, point, and say "you," and then you're going to have to answer my question; just get used to the idea.

So here's how the modern computer works. We use something that's called a von Neumann architecture, or not quite a von Neumann architecture but something very similar, in which the processor is separate from the memory. You have a processor, you have a memory, and you can store programs and data in the memory. Even within that you have different designs: the so-called Princeton architecture had one common memory for both the programs and the data; the Harvard architecture, which is more efficient, has a separate memory for programs and a separate memory for data. The machines we actually use these days are some kind of hybrid: within the processor it's a Harvard architecture, in that you have a separate cache for instructions and a separate cache for data, but when it goes out to main memory it's actually the Princeton architecture, because it's one common memory where everything is stored. The point is, this architecture, which separates the memory from the processing, is what gives you the extremely versatile computer that you currently have: you can take the same block, change what's in memory, get different programs, get different kinds of computation. The thing with connectionist machines is that the machine is the memory, the machine is the program. If you have to change the program, you have to change the machine, because the program is in the connections. It's fundamentally different in the way it does its operations.

So, a quick recap of everything we've looked at so far. Neural network based AI has taken over most AI tasks. Neural networks originally began as computational models of the brain, or more generally as models of cognition. The earliest model of cognition was associationism. The more recent model of the brain is connectionist: neurons connect to neurons, and the workings of the brain are encoded in the connections. Current neural network models are connectionist machines which emulate this architecture. Questions? Anybody? Yes: how do connectionist models actually represent associations? We'll get to that. As we will see in the next few minutes, these are actually functions: an association is just a function, you give it an X and it must associate a Y with it, and this connectionist architecture can learn these kinds of representations. Hopefully I will answer your question with more clarity in the next few minutes.

But let's go back to connectionist machines. This is a network of processing elements; all world knowledge is stored in the connections between the elements. Now, to understand how these would store associations, or how they actually perform computations: what is an association? If X happens, Y must happen. This tells me that there must be some connection between X and Y, and triggering X must trigger Y, or vice versa. But in order for this to happen, we
have to understand what the individual elements are, and to understand them let's go back to our favorite model, the brain. The basic unit in the brain is the neuron, and it looks something like this. The neuron has a basic cell, the main body of the cell, the soma, which has the nucleus, and a bunch of little tendrils going out called dendrites, through which you have incoming connections; signals from other neurons come into this neuron through those dendrites. If the cumulative signal coming in from other neurons exceeds a threshold, the neuron sort of trips and it fires, and this information goes down a long connection called the axon (there's only one axon per neuron), which eventually connects out to a bunch of other neurons. So obviously the most important part of the neuron is the axon: snip it and the neuron is not doing anything. The axon is protected by a little sheath of fat called the myelin sheath, which is formed by so-called glial cells. And here's something really magical: your head is mostly fat. In fact, the more fat there is in your head, the smarter you are, so if somebody calls you a fathead, say thank you. They actually stored Einstein's brain, and people tried to figure out why he was smarter. It turns out he doesn't have more neurons than the average person; what he has is more fat in his head, more glial cells. So being called a fathead is actually a compliment.

Here's something else that's really interesting: neurons don't divide, they don't undergo cell division, which means whatever neurons you're born with, that's pretty much it. You may have a small number of additional neurons formed from neuronal stem cells very early in childhood, but beyond maybe two years of age you're pretty much fixed. After that, the only thing that happens in your brain is that your neurons die, which is a good thing, and we'll see why.

Okay, so we have this little neuron, and we need a computational model for it. The first team to come up with a decent computational model for it were these two guys, Warren McCulloch and Walter Pitts. One of them was a neurophysiologist; the other one was a homeless guy, a hobo, who wanted to be a logician. I have both their pictures on the screen. Who wants to guess who's the homeless guy and who's the neurophysiologist? Anyone want to take a guess? Yes, the one to the right is the homeless guy? No, that's Warren McCulloch; he was the neurophysiologist. The guy to the left, who went on to MIT, is Walter Pitts; he was the homeless guy. He ran away from home at the age of 15 and ended up at McCulloch's door. He used to exchange mail with Bertrand Russell and other mathematicians; he was a homeless guy who hadn't gone through formal schooling, but McCulloch let him in, and they worked on modeling the brain. Pitts was 20 years old; he was a mathematical genius. They came up with this paper which I cannot understand even today, what, almost 80 years later; I think most people who read the paper cannot understand it. But the take-home lesson from this little paper was the model they came up with: a neuron can be modeled as shown in the figure, with incoming synaptic connections,
and if the total signal coming down these synaptic connections exceeds a threshold, the neuron will fire, unless there's a second kind of connection, the inhibitory synapse: if there's a signal on the red line, the inhibitory synapse, then regardless of what else comes in, the neuron will not fire. And with just this very simple model, they showed you can actually compose all kinds of boolean functions.

So here, for example, in every single neuron in the figure the threshold is 2. In the figure to the top left, you have a signal coming in from unit 1, which connects to the second neuron through two synapses, so any time 1 fires, a little later 2 fires, because the total incoming signal is 2; this is a delay. In the one to the top right, again the threshold is 2, and each incoming signal is connected to the neuron by two synapses, so if either of the two signals fires, the total incoming signal is 2 and the neuron is going to fire. Which boolean gate is that? It's an OR. What about the one to the bottom left, what is that? That's an AND. And, since you have begun answering questions, the bottom right, what is that? One AND NOT two, exactly right: 2 must not fire, and 1 must fire. So you can compose fairly complex boolean circuits using these basic units.
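As a small illustration (not from the lecture slides), here is a minimal Python sketch of such threshold-logic units; the helper name mcp_unit and the encoding of "connected through two synapses" as a repeated input are my own illustrative choices.

```python
def mcp_unit(excitatory, inhibitory=(), threshold=2):
    """McCulloch-Pitts style unit: any active inhibitory input vetoes firing;
    otherwise fire iff the sum of excitatory inputs reaches the threshold."""
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= threshold else 0

# The four circuits described above, each with threshold 2.
delay    = lambda x1:     mcp_unit([x1, x1])                    # unit 1 connects in through two synapses
or_gate  = lambda x1, x2: mcp_unit([x1, x1, x2, x2])            # either input alone contributes 2
and_gate = lambda x1, x2: mcp_unit([x1, x2])                    # both inputs needed to reach 2
and_not  = lambda x1, x2: mcp_unit([x1, x1], inhibitory=[x2])   # "1 AND NOT 2"

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "OR:", or_gate(x1, x2),
              "AND:", and_gate(x1, x2), "1 AND NOT 2:", and_not(x1, x2))
```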
They went overboard, though, McCulloch and Pitts; they made tall claims. They claimed not just that their nets could compute a small class of functions, but that if you provided a tape, the net would become equivalent to a Turing machine; they claimed it's Turing complete, and obviously it's not, because it's a finite state machine. These were just claims; they didn't prove any of these results. And they didn't provide a mechanism whereby a network of this kind could learn to perform specific operations. For that we had to wait a few years, for Donald Hebb.

Now, Donald Hebb: it turns out that people in those days tended to be pretty eclectic. This guy was a Canadian; he started off wanting to be a novelist, then he decided to become a hobo, then a farmer, then a schoolteacher, and then he gave it all up and decided to do a PhD in psychology. And he came up with this idea of Hebbian learning, which says: "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." It's a very complex way of saying that if a neuron A repeatedly excites a neuron B, then the connection between A and B will be strengthened. He also said it very succinctly: neurons that fire together wire together. It's a beautiful way of saying it.

So how does that work? Here's what the connection between neurons, a typical synapse, looks like. The neuron that's connecting to a second one has, at the end of its axonal connection, a little bulb, and that bulb connects to the synapse of the second neuron. They don't really touch; they have chemicals flowing between them. And if neuron X repeatedly triggers neuron Y, the synaptic knob gets larger and the connection gets stronger, which means it becomes easier for X to trigger Y. You can think of this as the weight of the connection between X and Y, and Hebb's idea gives you a nice little formula: every time X and Y fire together, the weight increases a little. In math, w = w + eta * x * y, where x * y is 1 only if both x and y are 1, that is, only if both of them fire together.

Now, there's a problem with this learning mechanism; can anybody tell me what the problem is? Yes, the connections only strengthen. They keep getting stronger and stronger, nothing ever weakens, and there's no notion of competition, so it's an unstable learning mechanism: eventually the whole brain is going to saturate. (What's your name? You'll have to write it down for me.) There were modifications later on which tried to account for the fact that this process is fundamentally unstable, like the so-called generalized Hebbian learning rule by Sanger, which allows for multiple outputs, and so on. The point is, the basic Hebbian rule, while it's a very nice idea and it will keep making its appearance again and again over the course, in its raw form is not very good.
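To make the rule and its instability concrete, here is a small numerical sketch (my own illustration; the learning rate eta = 0.1 and the specific inputs are arbitrary choices, the lecture only states the update w = w + eta * x * y):

```python
import numpy as np

def hebb_step(w, x, y, eta=0.1):
    """Basic Hebbian update: when input x_i and output y fire together, w_i grows by eta."""
    return w + eta * x * y

w = np.zeros(3)             # weights from three input neurons to one output neuron
x = np.array([1, 0, 1])     # two of the inputs fire...
y = 1                       # ...and so does the output
for step in range(5):
    w = hebb_step(w, x, y)
    print(step, w)          # the co-active weights only ever grow; nothing weakens them,
                            # which is exactly the saturation problem described above
```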
So, to improve on this and come up with a real formal model with nice mathematical properties, we had to wait a few more years, for this guy whose name we've seen already; this is the guy who came up with the big quote which The New Yorker summarized very nicely: Frank Rosenblatt. He was, like all of them, a psychologist and a logician. He also came up with the number 42; in other words, he came up with the answer to everything in the universe. He came up with the perceptron.

So what is the perceptron? It's a model for the neuron, and it sort of embodies what we saw about the cell; it also includes some of the aspects of what Donald Hebb proposed. He says that a cell has a number of inputs, and each input has a weight associated with it, so the total input the neuron gets is the weighted sum of all of its inputs: each input is multiplied by its weight, and all of them are combined. If this weighted sum exceeds a threshold, the neuron will fire. Mathematically, y = 1 if the weighted sum of inputs exceeds the threshold, that is, if the summation of w_i x_i is greater than or equal to T, or equivalently if the summation of w_i x_i minus T is greater than or equal to zero (greater than, or greater than or equal, depending on how you set it up); otherwise it doesn't fire. So it's a very nice, simple model, and he was really excited by it. He figured this little unit could perform any kind of computation, and he managed to convince the world: the Navy pitched a whole lot of money into him, and the newspapers went overboard. This is from 1958: "the embryo of an electronic computer that the Navy expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence." Or the Oklahoma Times: "Frankenstein monster designed by Navy that thinks." This little circuit down here. Obviously it's kind of an overstatement, and we'll see why.

Now, what Rosenblatt did, along with coming up with this model, is he also came up with a learning mechanism. If I want this unit to learn a specific function, I can keep providing it examples of input-output pairs, and any time the output is not what I want it to be (mind you, this is a boolean output: it either fires or it doesn't), I can scale the weights by the difference between what the output must be and what the output really is. This is the famous perceptron learning rule, and we will see it again in a few classes; we basically update the weights whenever the perceptron output is wrong (there's a small code sketch of this rule below). He proved mathematically that this will actually converge: if a boolean function can be represented by this unit, then the rule will learn it; it will converge to the correct solution.

And obviously this can model boolean gates. In these figures, the number inside the circle is the threshold, and the number on top of each line is the weight of a connection; I'm using greater-than-or-equal-to for my logic. The figure to the top left: that neuron will only fire if both X and Y fire, so that the total input is 2, which matches the threshold; if either of them is zero, it's not going to fire. That's an AND. The one to the bottom has a threshold of one, so it fires if either of the two inputs is one, since the total input will then be at least one; that's an OR. The one to the right: the weight is now minus one (observe that the weight is allowed to be negative), so if the input is zero, the total input is zero, the threshold is zero, and it fires; but if the input is 1, the total input is minus 1, which is below the threshold of zero, so it does not fire. This is a NOT gate. So clearly this single unit can implement all kinds of boolean gates.
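Here is the promised sketch of that learning rule (an illustration, not code from the course; folding the threshold T into an extra always-on bias input, and the choice eta = 1, are my own):

```python
import numpy as np

def train_perceptron(X, d, eta=1.0, epochs=20):
    """Rosenblatt's rule: whenever the output is wrong, move the weights by
    eta * (desired - actual) * x.  The threshold is absorbed into a bias weight."""
    X = np.hstack([X, np.ones((len(X), 1))])     # constant input of 1 -> its weight acts as -T
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, d):
            y = 1 if w @ x >= 0 else 0           # threshold activation
            w += eta * (target - y) * x          # no update when the output is already correct
    return w

# AND is representable by a single unit, so the rule converges to a separating weight vector.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
w = train_perceptron(X, d=np.array([0, 0, 0, 1]))
print(w, [1 if w @ np.append(x, 1) >= 0 else 0 for x in X])
```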
Unfortunately, this single unit doesn't solve everything. Not very long after Rosenblatt made his really tall claim that he had solved all of the world's problems, Minsky and Papert came up with a book, with a chapter showing that this threshold unit cannot capture the simple XOR function; so it's not a universal computing unit. But you can perform an XOR if you network them. The individual elements are weak computational elements, but if you network them you can perform stronger computations. For the XOR, for example, you connect them like so: now it's not a single unit performing the XOR, it's a combination of three units. The first unit, at the top, computes X OR Y; the unit at the bottom computes NOT X OR NOT Y; and then you AND the two, and the final output is an XOR (there's a small code sketch of this construction below).

Here we introduce some terminology: the outputs of these intermediate units are not seen; what you're really seeing is the output of the final unit at the end. So the units in the middle are what we will call the hidden units, or the hidden layer. The key points we've introduced are the notion that networked units can perform powerful computations, and that these networks can have hidden elements whose outputs are not really seen. And once you get to this, you realize you can compute any boolean function. This is what we call a multi-layer perceptron: you can stack these units up in layers, you can do ANDs, you can do ORs, you can do NOTs, you can do XORs; guess what, you can do pretty much anything. So here, for example, is a little network which computes a hideous function on top that I won't even bother to read, but the network will compute it.

So, the summary of everything we've done so far: neural networks began as computational models for the brain. They are connectionist machines, which comprise networks of neural units. The early model was the McCulloch and Pitts model, where neurons were boolean threshold units; it models the brain as performing propositional logic, but has no learning rule. Hebb actually came up with a learning rule, neurons that fire together wire together, but his rule is unstable. Rosenblatt's perceptron, which is a variant of the McCulloch and Pitts neuron, has a provably convergent learning rule, but it's not universal. Once you begin networking these units, though, you can compose arbitrary boolean functions, pretty much any boolean function at all. Questions? None on Piazza? No? Yes: I think Rosenblatt came up with this model around '61, and Minsky and Papert's critique came at the end of that decade. It depends on the presentation. With 20/20 hindsight the limitation seems obvious, because we know exactly what this model does, and we will see it in a few minutes. But back in the day, this threshold unit was presented as a gate; when I went to college we learned about threshold units, and it was astonishing to me the kinds of things a threshold unit could perform, because it was presented as: here is this magical threshold, here are these weights, it's a high-dimensional space, you can compute all kinds of functions. That kind of presentation hides what is really going on, which is that it's just a linear threshold. So it wasn't immediately apparent. Anything else?

Okay, so we've seen that these things can be connected together and can perform all kinds of magical boolean tasks. The problem is, our brain is not boolean. The brain has real-valued, continuous-valued inputs, and we make non-boolean inferences and predictions; it's not a one-zero inference or a one-zero prediction. So if you want real inputs, the perceptron with real inputs looks something like this: now the inputs x1 through xn are real-valued, the weights are also real-valued, and the neuron will fire if the weighted combination of these real inputs exceeds a threshold. Now, there's a way you can think of this somewhat differently. I can say the summation of w_i x_i is greater than or equal to T; that's the logic. I can also say the summation of w_i x_i minus T is greater than or equal to zero, which is the same logic. Or I can write this as y = theta(sum of w_i x_i + b), where theta is a threshold function with theta(z) = 1 if z is greater than or equal to 0, and b plays the role of minus T. These are all the same statement, but once I begin writing it in this manner, you realize that you can modify the theta: I could, for example, replace it by a sigmoid, and if I use a sigmoid you can think of the output as the probability of firing rather than a binary fire-or-not. You can also replace it with other, more generic functions; we will go over these later in the course. For now, for the next few minutes, let's stay with the threshold unit, because it's very easy to interpret what's really going on with threshold units.
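As a sanity check on the three-unit XOR construction described above, here is a tiny sketch (the specific weights and thresholds are one of many valid choices, not necessarily the ones on the slide):

```python
def step(z):
    """Threshold activation: fire iff the weighted sum minus the threshold is >= 0."""
    return 1 if z >= 0 else 0

def xor(x, y):
    h1 = step(x + y - 1)       # hidden unit 1: X OR Y           (threshold 1)
    h2 = step(-x - y + 1)      # hidden unit 2: NOT X OR NOT Y   (fires unless both are 1)
    return step(h1 + h2 - 2)   # output unit: AND of the hidden units (threshold 2)

for x in (0, 1):
    for y in (0, 1):
        print(x, y, "->", xor(x, y))   # prints 0, 1, 1, 0
```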
So let's go back to our perceptron. The perceptron fires if the weighted sum of inputs exceeds a threshold. Now, what is the boundary between when it fires and when it doesn't fire? Anyone? What is the actual formula? The summation of w_i x_i = T is the boundary: when the summation of w_i x_i is greater than T it fires, and when it's less than T it doesn't. That's a hyperplane; that's the equation for a hyperplane, and in two dimensions it's a line. So if I have two-dimensional input, the actual function you get looks like this: there's a line, which is the boundary; on one side of the line the output is zero, then it jumps up, and on the other side it's a one (left and right depending on where you look at it from). The network's output looks like this function, so if you look at it from the top you get the figure to the top right: there's a line, on one side it's one, on the other side it's zero.

The moment you see this, you will know how you can model boolean functions using this unit. How? The inputs are boolean; if I have two inputs, they are going to be (0,0), (0,1), (1,1), or (1,0). If the boundary, the hyperplane which separates the ones from the zeros, falls in the region between (0,0) and the diagonal, as in the figure to the top left, which boolean gate is that? That's an OR: the only time it doesn't fire is when both inputs are 0. What about the one in the middle? That's an AND: both inputs have to be 1, otherwise it doesn't fire. And this is just your perceptron. The one to the right is a NOT. Why can you not do an XOR? For an XOR, you have these points: these would be zero, these would be one, and no single line separates them; that's not feasible, and that's why this unit cannot do an XOR.

So anyway, once you figure out that you can get a linear boundary, magic happens. How does magic happen? Two dimensions: I want a function that gives me a 1 if the input lies inside the pentagon, and a 0 outside. How can I do this? Anybody? Yes, exactly right. Here's what I can do: I can have one neuron which captures this boundary, a zero below the line and a one above it; the second one gets me this boundary, the third one this boundary, the fourth one this boundary, the fifth one this boundary. If I add their outputs with a weight of one each, their sum is going to be 5 inside the pentagon and something less than 5 outside. If I use a threshold of 5, what is the output going to be? Exactly: a 1 inside the pentagon and a 0 outside (there's a small code sketch of this below).

Now I can also do this: here's a crazier function, I want a 1 inside either of two pentagons and a zero outside. Guess what, I'm going to have one subnet generating one pentagon and another subnet generating the other pentagon; the two outputs are summed, and with a threshold of one the output fires regardless of which pentagon the input is in. But now this is a much crazier boundary, right? And if I can do this boundary, I can do these things. How would I do a boundary like this? Come on. Yes: I don't even need a lot of layers, I can do something much simpler. I have my crazy boundary, I chop it up into little polygons, I have one subnetwork for every polygon, and then I OR the lot, and I get a boundary that actually looks like a human being. I can do pretty much any boundary once I figure this out, which means I can get even more complex decision boundaries.
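Here is a small sketch of the pentagon construction (my own coordinates and edge normals; the lecture only describes the idea): five first-layer units, one per edge, feed a second-layer unit with threshold 5.

```python
import numpy as np

# Edges of a regular pentagon centred at the origin: one half-plane test per edge.
angles = np.linspace(0, 2 * np.pi, 5, endpoint=False)
normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # outward normal of each edge

def edge_unit(point, normal, offset=1.0):
    """First-layer perceptron: fires when the point is on the inner side of one edge."""
    return 1 if normal @ point <= offset else 0

def in_pentagon(point):
    p = np.asarray(point, dtype=float)
    votes = sum(edge_unit(p, n) for n in normals)
    return 1 if votes >= 5 else 0    # second-layer unit: fires only when all five edge units fire

print(in_pentagon([0.0, 0.0]))   # 1: the centre is inside
print(in_pentagon([2.0, 0.0]))   # 0: outside at least one edge
```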
Now think of a standard classification problem. You have some high-dimensional space, and in this high-dimensional space you're trying to compute a decision boundary. But what is the decision boundary? It's some arbitrary outline, and you want a one inside this outline and a zero outside. We just saw that you can model pretty much any boundary using this kind of network, which means the network can actually give you these kinds of functions.

So, the story so far: MLPs are connectionist computational models. Individual perceptrons are computational equivalents of neurons, and the multi-layer perceptron is a layered composition of many of these units. They can model boolean functions, where individual perceptrons act as boolean gates and networks of perceptrons are boolean functions; but they are boolean machines that represent boolean functions over linear boundaries, and they can give you arbitrary decision boundaries. So they can be used to classify data regardless of how complex the decision boundary itself is, because the network is capable of modeling it.

Now what about continuous-valued functions, something like this? How do you use the same network structure to model a continuous-valued function? (We will revisit these topics in the next lecture.) For this example I'm going to start with a function of one variable: I want to compute f(x), where x is a scalar and f(x) is a scalar. Let me start with this little basic network, which has two threshold units; one of them has a threshold of t1, the other a threshold of t2, and I sum the outputs of these units. What happens if t1 is less than t2 and the weights are 1 and -1 respectively? As the input increases, until it exceeds t1, the output stays 0. The moment it exceeds t1, the first neuron fires but the second doesn't, so the output becomes 1. When it exceeds t2, the second neuron also fires and cancels out the output of the first, so the output goes back to 0. So you get this nice little pulse. And that little pulse is enough: give me an arbitrary function, I can put a whole pile of these underneath it, and I can model pretty much any scalar function to arbitrary precision using these units.
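A minimal sketch of this construction (the bin width, the range, and the target function sin(2*pi*x) are my own choices for illustration):

```python
import numpy as np

def step(z):
    return np.where(z >= 0, 1.0, 0.0)

def pulse(x, t1, t2):
    """Two threshold units with weights +1 and -1: output is 1 only for t1 <= x < t2."""
    return step(x - t1) - step(x - t2)

def approximate(f, x, n_pulses=50, lo=0.0, hi=1.0):
    """Tile the input range with narrow pulses, each scaled by f at the bin centre."""
    edges = np.linspace(lo, hi, n_pulses + 1)
    out = np.zeros_like(x)
    for t1, t2 in zip(edges[:-1], edges[1:]):
        out += f(0.5 * (t1 + t2)) * pulse(x, t1, t2)
    return out

f = lambda u: np.sin(2 * np.pi * u)
x = np.linspace(0.0, 1.0, 400)
print(np.max(np.abs(approximate(f, x) - f(x))))   # the error shrinks as n_pulses grows
```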
So, continuing with our story: multi-layer perceptrons are connectionist computational models; they are classification engines, they can identify classes in the data. What I didn't get to is that individual perceptrons can act as feature detectors; we will get to this in a later class. And the MLP can also model continuous-valued functions. There are some bullets over here which I didn't think I'd have time to go over, but the slides are there.

Questions? We've seen how versatile these things are at this point. What else can they do? So far we've looked at things that go strictly left to right: you give it some input, units compute stuff, the outputs are fed to more units which compute something, those outputs are fed to more units which compute something; it's a feed-forward computation. But then Lawrence Kubie, in 1930, actually showed that if you had units of this kind and connected them in loops, they gave you memory: networks that will remember. Which means that if you have a network of this kind and you trigger it with part of a memory, it can recover the entire memory. We will not get to this topic until much later in the course. So these are not just simple machines that can compute boolean functions, classification boundaries, and real-valued functions; they can even remember. They can represent probability distributions: probability distributions over integers, probability distributions over real-valued or complex-valued inputs; they can model a priori and a posteriori probabilities. All of these we will see in class. The point is, the versatility of these models is pretty immense, and yet the basic unit is very simple: it's just a threshold unit, or some modification of it, and you connect them up.

So in AI, as far as we are concerned, the neural network is a function. When we started the class we said: here are all these boxes, something goes in, something comes out. What is it that relates an input something to an output something? It's just a function. And we've also seen that these little networks of threshold units, or of things that aren't necessarily thresholds, can model functions; they can model almost any function at all. So a network is just a function: given an input, it computes the function layer-wise to predict an output; more generally, given one or more inputs, it predicts one or more outputs. The tasks that we began the class with are functions. Every one of them is a function: an image goes in, a caption comes out, that's a function; a voice signal goes in, a transcription comes out, that's just a function; a game state goes in, a prediction for what move to make comes out, that is also just a function. And in fact that is how we model all of these: we are going to model every one of these things with a neural network which takes in some appropriate representation of the input and gives you an appropriate representation of the output, from which you can perform downstream tasks.

And so, our story so far; I'm coming to an end, this is my second-to-last slide. MLPs are connectionist computational models; they are classification engines; they can also model continuous-valued functions; and most interesting AI tasks are functions that can be modeled by a network. In the next class, which is on Friday, we will continue in the same vein: we will talk about networks as functions, we will talk about the limitations of networks when they try to model functions, and, since we are speaking of deep learning, we will ask where this notion of "deep" comes in and why it is important. We will also cover that topic. Questions?
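To recap the "a network is just a function, computed layer by layer" view in code, here is an illustrative sketch (the layer weights below re-encode the earlier XOR network and are not from the slides):

```python
import numpy as np

def forward(x, layers, activation=lambda z: np.where(z >= 0, 1.0, 0.0)):
    """Evaluate the network layer by layer: each layer is a (W, b) pair, and the whole
    network is simply a function from an input vector to an output vector."""
    h = np.asarray(x, dtype=float)
    for W, b in layers:
        h = activation(W @ h + b)
    return h

layers = [
    (np.array([[ 1.0,  1.0],
               [-1.0, -1.0]]), np.array([-1.0, 1.0])),   # hidden layer: X OR Y, NOT(X AND Y)
    (np.array([[ 1.0,  1.0]]), np.array([-2.0])),        # output layer: AND of the hidden units
]
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, forward(x, layers))                          # again computes XOR
```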
Info
Channel: Carnegie Mellon University Deep Learning
Views: 21,330
Rating: 4.9430199 out of 5
Keywords:
Id: VO5vKowfMOQ
Length: 69min 12sec (4152 seconds)
Published: Wed Aug 28 2019