Lecture 1 | Introduction

Captions
Welcome, all, welcome again to 11-785, Introduction to Deep Learning. This is my second attempt at recording this lecture, because the recording this morning in class had some audio problems. Because this is attempt number two, I will skip over the first couple of slides, which deal with logistics only, hoping that you have already gone through homework zero, that you have set up your AWS accounts, and that you're registered for Piazza. Just one note on the slides: for this lecture, and throughout the course, the slide decks will have many hidden slides that will not be shown during the lecture, but they will nevertheless be featured in your weekly quizzes. So it's important for you to download the slide decks and go through them, even if you have been at the lecture, because they contain information that was not presented in the lecture.

So, without further ado, let's begin. Why are we in this course? Because neural networks are taking over. By this time they have become the main approach for AI problems. They have been successfully applied to various pattern recognition, prediction and analysis problems, and in many of these they have established the state of the art, often beating benchmarks set using earlier machine learning techniques by large margins. In some cases they have even managed to solve problems that you simply could not solve using earlier machine learning methods.

So what are some examples of these amazing achievements of neural networks? Speech recognition: researchers have been working on the problem of automatic speech recognition since the 1950s, and yet, after almost 70 years of research, the state of automatic speech recognition in 2016 was pretty abysmal. You typically used Siri more to make fun of the speech recognition output than to actually do something useful with it. Then, in late 2016, we had a breakthrough when Microsoft announced that, for the first time, they had managed to develop an automatic speech recognition system that outperformed human beings on a speech recognition task. This was the Switchboard task, which consists of people calling each other up and speaking about random topics. Since then the technology has only improved; these days ASR systems generally outperform humans on most speech recognition benchmarks and have become a standard user interface. And that's because these systems now use neural networks; this was not possible without neural networks.

Or machine translation. Machine translation has been one of the holy grails of artificial intelligence: you always wanted systems where you could enter English and get its translation in French, or enter German and get its translation in Chinese, and these systems have been really hard to build. As late as 2016, if you went to Google Translate, entered some English and translated it to Spanish, took the Spanish translation, put it back into Google Translate and re-translated it back to English, what went in and what came out would not even look similar. And then something surprising happened in the middle of November 2016.
One fine day you went to bed after having used Google Translate and gotten absurd results, and then you woke up in the morning and Google Translate was suddenly doing a near-perfect job. What happened? Google had suddenly switched to a neural-network-based machine translation system, and since then the technology has just continued to improve. These days even professional translators will first put their text into an automatic translation system to get a first cut that they then fine-tune. That's how good these systems have become, and that's because of the use of neural networks.

Here's yet another example: image segmentation and recognition. This is a rather complicated image; it's got all kinds of detail in it. It's got people, it's got bicycles, these people have bags, there are buildings, there's a push cart. And a neural network has analyzed this picture: it has managed to detect every one of the objects, and not just detect them but also identify them. So it knows that this is a bicycle, that this is a person, that this is a bag the person is carrying, that this is a person, there's a push cart, this is a building. This level of detail simply could not have been achieved, or we couldn't have come close to achieving a result of this kind, using earlier methods, but a neural network can do this. Now, to be honest, I'm not sure about this particular image, because I got it off Google, but this kind of result is readily achievable using the latest neural networks.

Or here is another example. This is from sighthound.com, and they claim they have a system that can recognize vehicles on the highway. So this one detects oncoming cars, and it doesn't just detect them, it also identifies them: this is an Infiniti, this is a Kia Sportage, here are all these cars being detected and recognized. And it's not just for oncoming cars; it works even for cars that are going away. Let's just wait a couple of seconds and you'll see that portion of the video. Here it comes. And it's not just on the main lanes over here; even on the far lanes it's able to detect cars and recognize them. That's how accurate the system is, and this is achieved using a neural network.

Or here is another example: games. For a very long time it was assumed that the real test for AI was thought-based games like chess or Go, and people devoted an extraordinary number of research hours to trying to build AI systems that could beat human beings at such games. In the 1980s, your typical automatic chess program was pretty bad; it could barely beat novices. Then, towards the end of the 1980s, there was a remarkable breakthrough here at Carnegie Mellon University, when Feng-hsiung Hsu and Thomas Anantharaman built the first automatic chess machine to beat a grandmaster. Thereafter Hsu went to IBM, where he went on to build Deep Blue, which beat the then world champion Garry Kasparov, and since then no world champion has ever managed to beat the best automatic chess programs. So we managed to beat chess in the 1990s, but Go was a bridge too far, and that's because while chess has only about 10^120 possible states, Go has about 10^180. So it was assumed for the longest time that you needed human-like intuition to play this game really well, and nobody imagined that an automatic player could beat the human champions at Go. And then in 2016 we had this event where AlphaGo beat the then champion Lee Sedol at Go, and subsequently the technology has only improved.
These days AlphaZero can start from scratch and learn the game on its own, and in just a few hours it can become so good that it will not only beat the human champion, it will also beat the previous best automated systems for Go. And this too is thanks to deep neural networks.

Or here's another one: image captioning. Here are a number of images, and these images have captions: "man in black shirt is playing a guitar", "construction worker in orange safety vest is working on road", "boy in blue wetsuit is surfing on a wave", "a man in blue wetsuit is surfing on a wave". All of these captions are essentially correct, and they have been generated using a neural network. The neural network worked just on the images and received no additional input. Now, this is from 2015 or 2016; the latest systems are far more accurate than this, and it's easy for you to imagine that these captions were actually generated by a human being.

So all of those were prediction or classification or description tasks, but neural networks can do a whole lot more. They have even become good at what used to be an essentially human phenomenon, which is imagination. Here are a number of images of faces that were downloaded from thispersondoesnotexist.com, and every one of these photograph-looking images is not a photograph: it's a picture of a person who does not exist, generated by a neural network. So neural networks can even create new pictures: they can paint imaginary faces, they can draw landscapes, they can generate music, they can write stories, and often they can do it better than a lay person can. That's how good they have gotten.

And it's the same in many other areas, ranging from astronomy to healthcare and even stock prediction: neural networks are amongst the best-performing prediction models used by investment firms these days, and in fact most investment firms have entire research divisions devoted to developing new neural network models to predict the stock market. So basically, neural networks can be applied to pretty much any intelligence task these days. There's a joke in India that the most difficult thing a person can do is to try to rub their stomach and pat their head at the same time. You should try it; it's really hard. And believe it or not, there are even competitions for this kind of silly game. But if you were to build a robot powered by a neural network, almost certainly it would learn to do this in no time at all. So neural networks are amazing things.

But here is the biggest reason why you should be taking this course. It used to be that if you knew how to build neural networks and how to apply them to business problems, this was a brownie point on your resume; it was a good thing. These days, knowledge of neural networks has become an essential skill: if you do not have it, you will have trouble finding any kind of data science or machine learning based job. So it has actually become an essential skill.

Anyway, so what are these magical neural networks? What are these strange machines that just performed all of the tasks we saw? From where we stand at this point in the course, they're just magical boxes where things go in and other useful things come out. For example, a voice recording goes in and its transcription comes out; an image goes in and its text caption comes out; a game state goes in and a recommendation for the next move comes out.
But what's in the boxes? To understand what's in the boxes, let's begin here. All of the tasks we just looked at, recognizing speech, captioning images, playing complicated games, are fundamentally human activities powered by human intelligence. So perhaps the place to begin is with the seat of all human intelligence, the human brain, or even earlier, with our amazing capacity for cognition: our ability to think, to analyze and to create. So let's begin with that.

Here are all of the magical cognitive capacities we have: we can learn, we can solve problems, we can recognize patterns, we can create, we can communicate, we can just sit in one spot and think and reflect and imagine and dream without external stimulus. These are the abilities that make us who we are, so if we want to build a machine with human-like capabilities, these are the abilities we must emulate.

But then how exactly do we achieve all of this? How do we function? How does the brain perform all of these activities? The problem in trying to understand that is best described by this quote from Marvin Minsky: "If the brain was simple enough to be understood, we would be too simple to understand it." The issue is that the brain is an immensely complex organ; it has to be, in order to generate the kind of intelligence we have, and if it were simple, we simply wouldn't have the intelligence to understand it.

But that hasn't stopped people from trying. People have been curious about human cognition for thousands of years, and in fact the earliest models for it go back at least to the Greeks, to Plato and Aristotle. The earliest known model for human cognition is associationism, and we know that that theory goes back to Plato in 400 BC. This picture here is the famous School of Athens painting, which shows many of the scientific greats from Greek times until about the 15th century, around when it was painted, and these two characters in the middle are Plato and Aristotle. Associationism was in fact originally proposed by Plato, and it remained the primary model for human cognition well into the 20th century, with scientists like David Hume and Ivan Pavlov developing fairly complex theories around it.

Associationism posits that the process of cognition is fundamentally one of learning associations. So what are these famous associations? They are literally associations that we somehow form between percepts. By percepts I mean the various stimuli, the various concepts that we recognize, like the concept of lightning or the concept of thunder. Associationism states that we form associations between these percepts. From experience we find that lightning is generally followed by thunder, and we make the association, not just of the events but also of the temporal order of these events, and from these we make inferences. So if you hear thunder, you will assume that lightning just struck nearby; likewise, if you see lightning, you will assume that you're going to hear thunder. These are associations, and the hypothesis that all of our inferences are based on such associations was made 2000 years ago. A lot of the research, though, actually went into studying the associations themselves, like the famous experiments by Ivan Pavlov, and in general it was understood that we think through the process of association, and that the inferences we make are based on the associations we have formed.
It's actually a really good theory, and we still use it, explicitly or implicitly, in machine learning. But then how does this relate to the brain? Even if we agree that associations are how the brain works, the question now is: how and where are these associations stored in the brain? There were several, sometimes strange, theories on how the brain actually stores associations. For example, if you go through the slides, you will read about David Hartley's odd theory of vibratiuncles: that percepts are stored in the brain in the form of vibrations within the skull, and that associations are somehow represented by the interactions of these vibrations. There were many other such strange theories.

But things began to get more real once we learned about the structure of the brain itself. Based on almost 300 years of advances in microscopy, by the mid-1800s we had figured out that the brain is essentially a mass of interconnected cells, and by 1873 we even knew that the individual cells connected within the brain had a very special structure: the cells that we call neurons. Here's what the structure of the brain is now known to be like: it has a large number of neurons, and these neurons are all connected to one another. Each neuron has many neurons connecting into it, and each neuron connects out to many neurons. So basically, the brain is a network of neurons.

The first person to make the connection between this structure of the brain and how it works was this guy here, Alexander Bain, back in the 1870s. In 1873 he wrote a well-known book called Mind and Body, in which he postulated that the brain stores information through its connections: that all the information regarding associations between percepts, or generally between stimuli and responses, is stored in the structure of the connections between the neurons. This was in fact the first connectionist model, and Bain even came up with various computational models for his theory. The key idea was that the same connections, if appropriately arranged, could result in different outputs for different inputs. This seems like an obvious thing to us today, that one circuit can produce different outputs for different inputs, but back in Bain's time this was an astonishing idea; it was heresy.

Here's one of the examples Bain came up with. You have three inputs and three outputs, all of them binary, taking the value zero or one. Each of these circles represents a neuron, and a neuron fires if both of its inputs are one. You can see here that if A and B are both one, then X fires; if A and C both fire, then Z fires; and if B and C both fire, then Y fires. So you have one structure where the same circuit produces different patterns of outputs for different patterns of inputs. Bain even included models where a circuit's output could depend on the intensity of the input. For example, in this circuit you have only one input; the output Y obtains three copies of the input, while the output X obtains only two copies. So when the input has low intensity, Y will fire but X will not; if the input is high intensity, both X and Y will fire. So the output depends not just on the pattern of the inputs but also on their intensities. These are in fact very modern neural networks, only Bain came up with them 150 years ago.
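To make the intensity example concrete, here is a minimal Python sketch. It is my own illustration, not Bain's diagram: the unit threshold (3) and the "low" and "high" intensity values (1 and 2) are assumptions chosen so that the behaviour matches the description above.

```python
# Bain-style unit: fires if its total incoming signal reaches a threshold.
# A threshold of 3 is an assumed value chosen to reproduce the behaviour described above.
def bain_unit(total_input, threshold=3):
    return 1 if total_input >= threshold else 0

def bain_circuit(intensity):
    # One input line of the given intensity; Y collects three copies of it, X only two.
    y = bain_unit(3 * intensity)
    x = bain_unit(2 * intensity)
    return x, y

print("low intensity (1): x, y =", bain_circuit(1))   # (0, 1): only Y fires
print("high intensity (2): x, y =", bain_circuit(2))  # (1, 1): both X and Y fire
```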
Bain not only showed these circuit structures, he also proposed a learning mechanism whereby the brain, or any computational model of it, could learn these connections. Here is Bain's proposed mechanism for how neuronal connections could be made, in his own words: "When two impressions concur, or closely succeed one another, the nerve currents find some bridge or place of continuity, better or worse, according to the abundance of nerve matter available for the transition." This is literally Hebbian learning, which you will hear about shortly: he's saying that when two neurons that are close together fire together, the connection between them gets strengthened. This is the same theory that Donald Hebb came up with 75 years later, so Bain was 75 years ahead of even Donald Hebb.

Sadly, in Bain's time all of his ideas were considered just too outlandish, and his peers thought he was talking nonsense. They doubted him to the extent that eventually he began doubting himself. The issue, as the famous mathematician Bertrand Russell put it, is that "the fundamental cause of the trouble is that in the modern world the stupid are cocksure while the intelligent are full of doubt", and Bain doubted himself. In 1873, Bain postulated that there must be one billion neurons and five billion connections to capture about 200,000 acquisitions. This seemed like a very large number, but by 1883 he realized that he had not taken into account the number of partially formed associations, and accounting for these would need even more neurons and connections. So by 1903 he decided that our brain was simply not large enough, that his model would need too many neurons and connections to work, and he recanted and went back on all of his ideas; he said this just doesn't make sense. Sadly, this was where he was wrong: he was correct in 1873 and wrong at the end. Our brain is very large. It has about 80 billion neurons and about 100 trillion connections; that's comfortably more than any capacity Bain assumed was required, and his model is in fact more or less how we think our brain works today.

And so connectionism lives on. We know by now that our brain is a connectionist machine, as predicted by Bain in 1873 and by David Ferrier in 1876. It is a mass of neurons, a large collection of neurons which are all connected to one another: neurons connect to other neurons and are connected to by other neurons, and the processing and information capacity of the brain is a function of these connections. Modern artificial neural networks are connectionist machines which emulate this architecture. So this is what a connectionist machine is: a network of processing elements, where you have a very large number of elements connected to one another, and all the knowledge about the task the machine is trying to perform is stored in the connections between the elements. Modern neural networks are connectionist machines consisting of many little processing units connected to one another.

Now, this differs from the standard kind of machine that you use in your computers or on your smartphone. Those processors follow some variant of what is known as the von Neumann architecture, or the Princeton architecture. In the Princeton architecture you have a processor which is separate from the memory; programs and data are stored in the memory and are accessed by the processor to perform computations.
An extension of this is the Harvard architecture, where the program and data have separate regions in the memory, whereas the von Neumann architecture assumes one common memory for both. This difference is somewhat immaterial here; in fact, in modern processors the processor core tends to be Harvard architecture while the main memory is Princeton. In any case, the key feature is that the processor is separated from the memory that holds the program and data. If you want to run a new program, all you have to do is load a new program into this memory and voila, you're set to go. That's why your typical computer or smartphone is so versatile: it can do many different things, and to make it perform a new task, all you have to do is load the corresponding program into memory.

That is not the case for connectionist machines. For connectionist machines, the computer is the program, because the program and its memory are all encoded in the connections. If you want to change the program, you have to change the machine itself. That is why we don't generally go about building a new neural network machine for every task we want to perform; we usually emulate these connectionist machines on regular Harvard or Princeton architecture machines.

So, to recap: neural-network-based AI has taken over most AI tasks. Neural networks originally began as computational models of the brain, or more generally as models of cognition. The earliest model of cognition was associationism. The more recent model of the brain is connectionist, according to which neurons connect to other neurons, and the workings of the brain are encoded in these connections. Current neural network models emulate this structure; they are connectionist machines. And so again, here we are: connectionist machines are networks of processing elements, and all the information the machine uses to perform its computation is stored in the connections between these elements.

But what are these individual processing elements themselves? To understand this, let's return to the brain. The individual units in the brain are neurons, which look somewhat like this. The neuron has a head called the soma, within which the nucleus resides. The soma has several protuberances, which are called dendrites, and a long stem called the axon, which separates out into fibers at its end. When neurons connect to one another, the axon fibers of one neuron connect to the dendrites of the other neuron at a connection called a synapse. So the neuron receives incoming signals from other neurons through the synaptic connections of its dendrites to the axons of those neurons. The neuron generates spike trains, the rate of these spike trains depends on the total input signal, and these spike trains travel down the axon to other neurons. At a high level we can think of it this way: if the total input to the neuron from the other neurons exceeds a threshold, the neuron fires, and that signal travels down this long leg, the axon, from which it is transmitted to other neurons.

And here's an interesting factoid: neurons don't generally undergo cell division. Neurons are formed from stem cells, and the production of new neurons is minimal after birth, particularly in the brain. So whatever brain you're born with is more or less the maximum amount of brain you're ever going to have in your life.
And here's another interesting fact. The axon is protected by a sheath called the myelin sheath, which is made mostly of fat. This fat protects the axon, and so it protects the neuron; if you lose this fat, the neurons can no longer function. In fact, the more fat you have in your head, the more functioning brain you have in your head. So your brain is literally mostly just fat, and being called a fathead is actually something of a compliment, because it means you're really smart, you have a highly functioning brain. In fact, they studied Einstein's brain to find out how it was different from normal people's brains, and they found he had more fat in his brain than normal people did.

Anyway, so we looked at the neuron, and now we want to come up with a computational model for it. The first real computational model of the neuron was proposed by these two, Warren McCulloch and Walter Pitts. McCulloch was a neurophysiologist at the University of Chicago. Walter Pitts, when he met McCulloch, was a homeless kid in his late teens who hadn't really completed school and wasn't going to college, but was corresponding with mathematicians like Bertrand Russell. McCulloch took a liking to him, invited him to stay with him, and the two of them began working on the problem of trying to model the brain. They eventually came up with the paper "A Logical Calculus of the Ideas Immanent in Nervous Activity". The work was done when Pitts was just 19 and published when he was only 20, and even at that time it was considered something of a momentous paper.

So here is their actual model for the neuron. The neuron itself is shown by this triangle; the line going out represents the outgoing signal from the neuron, and the lines coming in represent incoming signals from other neurons, which connect to this neuron through synaptic connections. There are two kinds of synapses: excitatory synapses and inhibitory synapses. An excitatory synapse transmits inputs into the neuron, while an inhibitory synapse has the property that if any signal comes down it, the neuron immediately stops firing; regardless of what is happening on the excitatory synapses, the neuron is not going to fire. The property of the neuron is that if no inhibitory synapse is turned on, and if the total signal from all of its incoming connections exceeds a threshold, generally set to 2, the neuron will fire. You can see now how you can use this simple structure to compose various gates by appropriate choice of excitatory and inhibitory synapses.

So here, for example, in each case the final triangle represents the neuron; it will fire if the total input is at least two, and each line is assumed to have a unit time delay. In the first case, if neuron 1 fires, then one time instant later neuron 2 gets two inputs and fires; this is just a delay. Here, if either neuron 1 or neuron 2 fires, then neuron 3 gets two inputs and fires, so the output of neuron 3 is 1 OR 2. Here, for neuron 3 to get two inputs, both 1 and 2 must fire, so the output of neuron 3 is 1 AND 2. And here, for neuron 3 to fire, neuron 1 must fire and neuron 2 must not fire, because if 2 fires it will inhibit the output; so the output of neuron 3 is 1 AND NOT 2.
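Here is a minimal sketch of the McCulloch-Pitts unit and the gates just described, written as plain Python; the function names are my own, and the firing threshold of 2 follows the lecture.

```python
# McCulloch-Pitts unit: silenced by any active inhibitory input; otherwise it fires
# when the sum of its excitatory inputs reaches the threshold (2, as in the lecture).
def mcp_unit(excitatory, inhibitory=(), threshold=2):
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= threshold else 0

# Each connection carries a 0/1 signal; sending two lines from the same source
# lets a single input reach the threshold on its own.
def delay(a):       return mcp_unit([a, a])                    # neuron 2 fires one step after neuron 1
def gate_or(a, b):  return mcp_unit([a, a, b, b])              # 1 OR 2
def gate_and(a, b): return mcp_unit([a, b])                    # 1 AND 2
def and_not(a, b):  return mcp_unit([a, a], inhibitory=[b])    # 1 AND NOT 2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "| OR:", gate_or(a, b), " AND:", gate_and(a, b), " AND NOT:", and_not(a, b))
```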
So as you can see, using these basic boolean gates you can construct any boolean circuit, which means you can use the McCulloch-Pitts neuron to compose pretty much any boolean function. McCulloch and Pitts also showed that you can produce even more complex responses. For example, this circuit models our response to temperature. Typically, what you observe is that if you touch something that's really cold for a very brief amount of time, the immediate response is one of heat; you have to keep touching it for an extended period of time to actually sense cold. They came up with a model for this. Let me call this triangle here neuron number 5 and this one over here neuron 6. Suppose you have a very short cold stimulus: at time 0 this neuron fires, at time 1 the signal comes through and this neuron fires, but at either time the cold sensation neuron gets only one of its two inputs, because the stimulus is so short that in the first instant this one fires but not that one, and in the second instant that one does not fire but this one does. Either way the total input is one, so the cold sensor never fires. But then, two instants later, this neuron fires because it got a signal from here, and that makes neuron number 3 fire, and you get a heat sensation. So if you have a very brief cold stimulus, you're going to sense heat. But if the cold stimulus persists for some time, what happens? At time 0 this fires, at time 1 this fires, but because the cold stimulus is persistent, this one too fires, and so from time 2 on the cold sensor continues to fire: you sense cold. What about the heat sensor? Two time instants in, it gets an input from this neuron, but because the cold stimulus is persistent, it also gets an inhibitory signal from the cold receptor, so it never fires, and the heat sensation doesn't come on. So what you have is a model that explains why you sense heat for very brief cold stimuli, but for extended stimuli you only sense cold. It's a pretty powerful model, the McCulloch-Pitts model.

And yet they made several claims that garnered them a significant amount of criticism. McCulloch and Pitts claimed that their networks could compute a small class of functions, which is true, but they also claimed that if you provided the networks with tapes they would become Turing machines; in effect they claimed their machines were Turing complete, which is clearly not true, because these are finite state machines and cannot be Turing complete. They also didn't actually prove any results themselves, and they didn't provide any learning mechanism whereby the network could learn its connection weights from experience, from data.

For that we had to wait for another guy, Donald Hebb. Hebb was another interesting character. He was a Canadian who initially trained to be a novelist, then wanted to be a farmer, then wanted to be a hobo, then figured he'd be a school teacher, and eventually decided, having tried it all, to educate himself. So he went and got himself a PhD in psychology from Harvard and went on to work at a variety of places. In 1949 he published a book called The Organization of Behavior, in which he outlined a mechanism whereby neuronal cells in the brain could learn their connection weights from experience, from data.
Here is how he summarized the mechanism: "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." Basically, he said that as A repeatedly excites B, its ability to excite B improves. In other words, if neurons fire together, the connection between them gets stronger; we can state this succinctly as: neurons that fire together wire together.

Here is Hebb's model for how learning happens at a neuronal connection. Let's look at one connection between two neurons; I'll call them X and Y. Neuron X connects to neuron Y at a synapse, where the axonal ending, known as the synaptic knob, connects to a dendrite of the second neuron. Although we call it a connection, the two never really touch: what happens is that any time a stimulus comes down here, the synaptic knob releases chemicals called neurotransmitters that go and excite the dendrite. Now, every time a signal from X successfully triggers a response from Y, this synaptic knob gets a little bit larger, and that makes it easier for the knob to excite Y the next time around. So if neuron X repeatedly triggers neuron Y, X becomes more and more able to trigger Y.

We can write this as an equation. Let w_xy be the weight of the connection between neurons X and Y. In this model all neurons produce binary outputs, so both x and y are either 0 or 1. The Hebbian learning rule says that any time both x and y become 1, the weight of the connection increases: w_xy = w_xy + η·x·y. Observe that the second term is non-zero only if both x and y are 1, so any time both of them fire together, the weight increases by a small amount η. This is the famous Hebbian learning rule, and it is in fact still the basis of many learning algorithms in machine learning.

But the problem is that the Hebbian learning rule, as Donald Hebb defined it, is unstable. The term η is positive, so the weights can only increase, and they will keep increasing continuously. As a result of random activity in the brain, each of these weights will get stronger and stronger and eventually just saturate, and in the end you're going to have a mess which doesn't really do anything useful. So we need to account for this. Various proposals were made to modify the model, allowing for weights to be normalized, to decrease, and so on. For example, in generalized Hebbian learning, also known as Sanger's rule, an input x_i is allowed to contribute to many outputs y_j, which basically says that a neuron can connect out to multiple other neurons; but to determine how much the weight between an input and a particular output must be updated using the Hebbian rule, we first subtract out the contribution of x_i to the other outputs, so that only the residual is used to increment the weight of the connection between x_i and that specific output. There were many other such rules, but most of them continued to have problems.
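As a quick numerical illustration of the rule and its instability, here is a minimal sketch, with an arbitrary learning rate and random binary activity of my own choosing: because the increment η·x·y is never negative, the weight only ever grows.

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.1          # learning rate (arbitrary choice for illustration)
w = 0.0            # weight of the connection from X to Y

# Hebbian rule on binary activities: the weight grows whenever x and y fire
# together and never decreases, so under random co-activity it climbs without bound.
for step in range(1000):
    x = rng.integers(0, 2)   # activity of neuron X (0 or 1)
    y = rng.integers(0, 2)   # activity of neuron Y (0 or 1)
    w += eta * x * y         # w_xy <- w_xy + eta * x * y

print("weight after 1000 random steps:", w)
```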
The best and most lasting improvement on the Hebbian learning rule came with the introduction of a new model, proposed by this guy, Frank Rosenblatt. He was a psychologist and logician at Cornell, and in 1958 he proposed a neural model called the perceptron. We still use the model today; that's how good it was. But when he proposed it, he was so confident of its abilities that he thought it could solve all AI problems.

Rosenblatt's original proposal related to perception in the eye, so it included things like efferent and afferent connections (outgoing and incoming connections) and various other details; we have slides explaining the model in the slide deck, take a look. Mathematically, though, the model simplifies to this, which is called a perceptron. The neuron receives a number of inputs x1 through xN, and associated with each input is a weight, so the neuron computes a weighted combination of all of these inputs. If this weighted combination exceeds some threshold T, the neuron fires and the output becomes 1; otherwise the output is 0. We can write this as an equation: if the weighted sum of the inputs minus the threshold is greater than or equal to zero, the output of the neuron is 1; otherwise it is 0. The electrical engineers among you will recognize this model as threshold logic.

Rosenblatt originally assumed that, with an appropriate choice of weights, this model could represent any boolean function and perform any logic, and he actually managed to sell the popular media of the time on this idea, so there was considerable hype about the model and various articles about it. For example, here is something from the New York Times of 1958: "the embryo of an electronic computer that the Navy expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence". Or the Tulsa, Oklahoma Times of 1958: "Frankenstein monster designed by Navy that thinks". Clearly there were very high expectations for the model.

The key feature here is that Rosenblatt also provided a very nice learning algorithm with which we could learn the weights of the connections to perform a desired task. Here again, let w be the weight of the connection between an input x and a neuron with output y, where y computes some function of x. Then the rule says w = w + η·x·(d − y), where d is the desired output of the neuron and y is its actual output. Observe that this is very similar to Hebb's learning rule, except that in Hebb's rule you updated the weight by the product of the actual output and the input, whereas here you update it by the product of the input and the error. If the neuron is already doing the right thing, there is no need to update the weight; you only update when it makes mistakes. Rosenblatt not only proposed this rule, he also showed that if you have linearly separable data, meaning the data instances for which y is expected to be 1 can be separated by a hyperplane from the data instances for which y is required to be 0, then this learning rule actually converges after presenting only a limited number of training examples. So not only was it a powerful model, it also came with a learning algorithm.
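Here is a minimal sketch of Rosenblatt's update w ← w + η·x·(d − y) on a small, linearly separable toy dataset of my own construction; the threshold is folded into a bias term in the usual way.

```python
import numpy as np

# Toy linearly separable data: label 1 when x1 + x2 > 1, else 0 (my own example).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [0, 2]], dtype=float)
d = np.array([0, 0, 0, 1, 1, 1])

w = np.zeros(2)
b = 0.0            # bias; plays the role of the negated threshold
eta = 0.5

for epoch in range(50):
    mistakes = 0
    for x, target in zip(X, d):
        y = 1 if w @ x + b >= 0 else 0     # threshold activation
        w += eta * (target - y) * x        # update only when the output is wrong
        b += eta * (target - y)
        mistakes += int(target != y)
    if mistakes == 0:                      # converged on this separable data
        break

print("weights:", w, "bias:", b, "epochs used:", epoch + 1)
```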
And sure enough, it's relatively easy to see how these perceptrons can mimic boolean gates. Consider the examples in these figures. The circles represent neurons, the number within each circle represents the threshold, and the number on top of each line represents the weight of the connection. The neuron fires if the total weighted input is at least equal to the threshold. For the top-left neuron, only when both x and y are 1 is the total input going to be 2, so it will fire only if both x and y are 1; the output here represents x AND y. Here, if either x or y is 1, the total input is 1 and the threshold is 1, so this will fire if either of the two takes the value 1; the output is x OR y. Here, when x is 1, the weight is -1, so the total input is -1, which is less than the threshold, and the neuron will not fire; whereas if x is 0, the total input is 0, which matches the threshold, and the neuron will fire. So this perceptron just negates its input. You can compose boolean gates very trivially using perceptrons, and this is what led Rosenblatt to think that you could compose more complex functions, in fact any boolean function at all. That was the fallacy that was eventually shattered by Minsky and Papert, who showed that there is absolutely no setting of weights and threshold with which a single perceptron can compute the XOR function. We'll see why this is later, but in any case it meant that a single perceptron could not compute every boolean function; it was limited. This discovery led to so much dismay that research funding for perceptrons and neural networks more or less dried up for many, many years.

So we need to resolve this problem: the perceptron is a powerful engine, but not good enough, and yet it sort of makes sense as a model for the neurons of the human brain. How do we make it perform more complex tasks? To see this, we have to go back to the brain. The fact is, the brain is not just one single neuron; it is a network of neurons. A single neuron by itself cannot do very much; you need the entire network. And so for our computational models too: although individual perceptrons are rather weak, we can greatly extend what we can do with them by networking many of them. For example, if you just connect up three perceptrons like so, you can compute an XOR (a small code sketch of this network appears below). Here we've connected the three perceptrons in a layered manner: two perceptrons in the first layer operate directly on the inputs, and their outputs go into a third perceptron. The first perceptron computes x OR y, the second computes (NOT x) OR (NOT y), and the third ANDs the two, so this gives you x XOR y. Now, some terminology: this layer of perceptrons we will call a hidden layer, because the outputs of these perceptrons are not directly observed; what you observe is the output of the final unit. This entire layered structure is what we will call a multi-layer perceptron: this is the hidden layer, and the unit whose output you do observe will be called the output neuron, or more generally, as we will see later, the output layer.

Once you realize that you can connect up perceptrons into a network to compute more complex boolean functions, you realize that you can indeed compute arbitrarily complex boolean functions simply by connecting up perceptrons in the right way. In cognitive terms, it means you can compute any arbitrary boolean function over sensory inputs. Here, for example, this network operates on four inputs x, y, z and a, and computes this rather ugly function of the four inputs. So a network of perceptrons is in fact universal: a universal model for boolean functions.
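Here is a small sketch of the three-perceptron XOR network described above; the particular weights and thresholds are one possible choice (an assumption on my part) that realizes the OR, NOT-AND and AND units.

```python
def perceptron(inputs, weights, threshold):
    # Fires (1) iff the weighted sum of its inputs reaches the threshold.
    return 1 if sum(w * v for w, v in zip(weights, inputs)) >= threshold else 0

def xor(x, y):
    h1 = perceptron([x, y], [1, 1], 1)       # hidden unit 1: x OR y
    h2 = perceptron([x, y], [-1, -1], -1)    # hidden unit 2: (NOT x) OR (NOT y)
    return perceptron([h1, h2], [1, 1], 2)   # output unit: h1 AND h2

for x in (0, 1):
    for y in (0, 1):
        print(x, y, "->", xor(x, y))         # prints the XOR truth table
```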
So the story so far: neural networks began as computational models of the brain, or more generally as models of cognition; they are connectionist machines comprising networks of neural units. The McCulloch-Pitts model treated neurons as boolean threshold circuits and modeled the brain as performing propositional logic, but provided no learning rule. Hebb's learning rule was that neurons that fire together wire together, and this turned out to be unstable. Rosenblatt's perceptron is a variant of the McCulloch-Pitts neuron with a provably convergent learning rule, but individual perceptrons are limited in their capacity; that's what Minsky and Papert showed. Multi-layer perceptrons, however, can model arbitrarily complex boolean functions.

Everything we've discussed so far, though, was about boolean inputs and boolean outputs. Our brain is not a boolean machine: we obtain real-valued inputs through our various sensors and make real-valued inferences. So how does the perceptron square with that? There are two sides to it, the real-valued input and the real-valued output; let's take the first part first. Consider real-valued inputs. Here, the inputs x1 through xN are real-valued and the weights are also real-valued; the perceptron fires if the weighted sum of the inputs exceeds a threshold, so the output remains boolean. In math, again, the output of the perceptron is 1 if the weighted sum of the inputs minus the threshold is greater than or equal to zero; otherwise the output is 0.

We can write this in an alternate way. We can say instead that the perceptron first computes an affine combination of the inputs, which is a weighted sum of the inputs plus a bias term, and that this affine combination is then put through a threshold activation function, which outputs a 1 if its input is non-negative and 0 otherwise. Specifically, when you set the bias term to minus T, this becomes exactly the same as the perceptron model we saw earlier. Once we write things this way, we can easily extend the perceptron to have real-valued outputs. Instead of a threshold activation, we can have something like a sigmoid, which is a smooth version of a threshold and which operates on the affine combination; now the perceptron produces real-valued outputs between 0 and 1 instead of just a binary 0 or 1, and this real-valued output can be viewed as the probability of firing. Or we can replace the activation with any real-valued activation function operating on the affine combination of inputs; we'll see several of these in later lectures. The result is that the perceptron is now a mapping from real-valued inputs to real-valued outputs.

So we've seen how we can get from real-valued inputs to real-valued outputs. But for the purpose of building our intuitions, and for interpretability, we will continue for now to assume that the inputs are real-valued while the outputs are boolean. So the perceptron operates on real vectors but outputs a boolean value. Here is the real-valued perceptron again: it operates on a real-valued space of inputs, and if the weighted sum of the inputs exceeds a threshold, the output is 1, otherwise it's 0.
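A minimal sketch of the real-valued unit just described: an affine combination z = w·x + b followed by either a hard threshold or a sigmoid. The particular weights, bias and input are arbitrary illustrative values; with b = −T the threshold version reduces to the perceptron above.

```python
import numpy as np

def affine(w, x, b):
    return np.dot(w, x) + b                # z = w.x + b; setting b = -T recovers the threshold-T perceptron

def threshold_activation(z):
    return 1.0 if z >= 0 else 0.0          # hard threshold: boolean output

def sigmoid_activation(z):
    return 1.0 / (1.0 + np.exp(-z))        # smooth threshold: real output in (0, 1), a "probability of firing"

w = np.array([2.0, -1.0])                  # illustrative weights
b = -0.5                                   # i.e. a threshold of T = 0.5
x = np.array([0.4, 0.1])                   # illustrative real-valued input

z = affine(w, x, b)
print("affine combination:", z)
print("threshold output:  ", threshold_activation(z))
print("sigmoid output:    ", sigmoid_activation(z))
```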
The boundary is where the weighted sum is exactly equal to the threshold, that is, where Σ_i w_i x_i = T. That is the equation for a hyperplane; if the perceptron were operating on two-dimensional inputs, it would be the equation for a line. So the boundary is a line defined by Σ_i w_i x_i = T, and the perceptron outputs a 1 when the input is on one side of the line, the pink side, and a 0 when it's on the other side. Because the boundary between the two is a line, or more generally a hyperplane, which is again linear, this perceptron can operate as a linear classifier: it draws a linear boundary between two classes. If you visualize this perceptron fully, for example when it is working on two-dimensional inputs, a complete visualization requires a three-dimensional plot, where two coordinates represent the inputs and the third coordinate represents the output. This plot shows the response of the perceptron: when the input is on one side of the boundary the output is 0, and on the other side it becomes 1. So the overall response is a step function, also called a Heaviside function.

Once we see that the perceptron is actually performing a linear classification task, or that it's computing a step function, we can see how the perceptron models different boolean functions. When the inputs are boolean, they're not real-valued: with two inputs, the input to the perceptron can take only one of four possible values, (0,0), (1,0), (0,1) and (1,1). If the decision boundary, the hyperplane separating the one side from the zero side, were this line over here, then the output of the perceptron would be 1 for these three combinations of inputs, and you can immediately see why this perceptron is modeling x1 OR x2: the output is 1 when either x1 or x2 is 1. If the boundary is out here, then the perceptron outputs a 1 only when both x1 and x2 are 1.
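A quick check of this picture, with weights of 1 on both inputs (my own choice of numbers): the line x1 + x2 = T separates the four boolean points, giving OR for T = 1 and AND for T = 2, and a negative weight with T = 0 gives NOT.

```python
def threshold_gate(x1, x2, T, w1=1, w2=1):
    # Outputs 1 on the side of the line w1*x1 + w2*x2 = T where the weighted sum reaches T.
    return 1 if w1 * x1 + w2 * x2 >= T else 0

points = [(0, 0), (0, 1), (1, 0), (1, 1)]
print("T=1 (x1 OR x2): ", [threshold_gate(x1, x2, 1) for x1, x2 in points])
print("T=2 (x1 AND x2):", [threshold_gate(x1, x2, 2) for x1, x2 in points])
print("NOT x1:         ", [threshold_gate(x1, x2, 0, w1=-1, w2=0) for x1, x2 in points])
```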
So this is x1 AND x2. If this is the boundary, then the perceptron outputs a 0 when x1 is 1 and a 1 when x1 is 0, ignoring x2 entirely, so this perceptron is simply modeling NOT x1. And now you can see why this perceptron cannot model an XOR; if you can't, I encourage you to think about the problem and reason out why it cannot model an XOR, and if you can't figure out why, we'll see it in the next class.

But anyway, once you see how a single perceptron behaves, you can immediately see how to network them to construct arbitrarily complex decision boundaries, for example this pentagon. Say we wanted a model that would output a 1 for inputs that lie within this pentagon, within this yellow region, and a 0 outside. Here's what we'd do. We'd include one perceptron to capture the lower boundary of the pentagon, so that perceptron would output a 1 in the yellow region and a 0 here. We'd have one perceptron for this boundary, which would output a 1 in this yellow region and a 0 here; one perceptron for this one, one for this one, and one for this one. Eventually we have a collection of perceptrons where all five output a 1 for inputs that lie in the pentagon, while for inputs anywhere outside the pentagon at least one of the perceptrons outputs a 0. So now, if I just add a final perceptron at the top which sums up their outputs and compares the sum to a threshold of five, this perceptron is going to output a 1 for inputs within the pentagon and a 0 outside (a small code sketch of this construction appears after the recap below).

And once we figure that out, we know how to construct even more complex decision boundaries, like the strange figure to the left with the two pentagons. You'd have one subnet for the first pentagon, another subnet for the second pentagon, and then you'd feed their two outputs to a perceptron which ORs the two; now the overall network outputs a 1 if the input is within either pentagon and a 0 outside. Or you can have more complex boundaries, like this crazy figure here, or even a decision boundary that looks like this human, or something that looks like a horse. If you can't see how this is feasible, think about it, and we'll touch upon it again in the next class.

So now we know how to build multi-layer perceptrons that capture complex decision boundaries, which means they can perform complex classification tasks. Your inputs can be high-dimensional: for example, if you were building a network to recognize the digit 2 in the MNIST corpus, the digits in the MNIST corpus are 784-pixel images, so the input is 784-dimensional. All inputs live in a 784-dimensional space, and the region of the space within which the instances of 2 lie is going to be some region of this kind, shown illustratively as a peanut here. You want the network to output a 1 when the input lies within that region and a 0 when it lies outside, and we know how to construct exactly such a network.

And so the story so far: MLPs are connectionist computational models. Individual perceptrons are computational equivalents of neurons, and the MLP is a layered composition of many perceptrons. MLPs can model boolean functions: individual perceptrons can act as boolean gates, and networks of perceptrons compute boolean functions. MLPs are boolean machines; they represent boolean functions over linear boundaries, and they can represent arbitrary decision boundaries, so they can be used to classify data.
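Here is a small sketch of the pentagon construction, using a regular pentagon of my own choosing: each first-layer perceptron fires on the inner side of one edge (a half-plane), and the output perceptron fires only when all five first-layer outputs sum to at least five, i.e. only inside the pentagon.

```python
import numpy as np

# Vertices of a regular pentagon centred at the origin (an assumed example region).
angles = 2 * np.pi * np.arange(5) / 5 + np.pi / 2
verts = np.stack([np.cos(angles), np.sin(angles)], axis=1)

def edge_perceptron(p, a, b):
    # Fires (1) if point p lies on the inner side of the edge from vertex a to vertex b.
    edge = b - a
    inward_normal = np.array([-edge[1], edge[0]])     # interior is to the left of each CCW edge
    return 1 if np.dot(inward_normal, p - a) >= 0 else 0

def inside_pentagon(p):
    # First layer: one perceptron per edge.  Output perceptron: fires only if all five fired.
    total = sum(edge_perceptron(p, verts[i], verts[(i + 1) % 5]) for i in range(5))
    return 1 if total >= 5 else 0

print(inside_pentagon(np.array([0.0, 0.0])))   # centre of the pentagon -> 1
print(inside_pentagon(np.array([2.0, 2.0])))   # far outside            -> 0
```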
So we've seen how to go from boolean inputs to boolean outputs, and from real inputs to boolean outputs, which extends trivially to categorical outputs. But what about continuous-valued outputs? What about a situation where the input is real-valued and the output is also required to be continuous-valued, so that you want to model a function of this kind? To see how this can be done, consider a function of a scalar input: the input is a real-valued scalar, and you want the output also to be a real-valued scalar. Now consider this very simple little model. We have two perceptrons, each of which outputs a 1 when its input exceeds a threshold: this perceptron outputs a 1 when the input exceeds threshold t1, this one outputs a 1 when the input exceeds threshold t2, and assume that t1 is less than t2. These two are connected to a final additive unit, with weights 1 and -1 respectively. Now consider the overall response of this net. When x is less than t1, both of these units output a 0, so the overall output of this little net is 0. When x is greater than t1 but less than t2, the first unit outputs a 1 but the second outputs a 0, so the weighted sum of their outputs is 1. As you continue to increase x and x exceeds t2, both units output a 1, but since their weights are 1 and -1 respectively, they cancel each other out and the final output is 0. So the overall response of this net is: as x increases, the output is 0 until t1, 1 between t1 and t2, and 0 again after t2. It has this pulse-like response, and you can specify the thresholds.

Once we know how to compose this, we can compose a network to approximate any arbitrary scalar function of a scalar input. Say I want a network that approximately gives me this function, shown by the blue curve. I partition the axis into many ranges and use one subnet of two neurons for each range, which produces a step whenever the input is within that range; each pair over here represents one small range of the input. I scale their outputs by the height of the function within that range, and then I sum them all up. Now this network approximates the blue curve as the hull of all of these steps, and I can make the approximation arbitrarily precise by making the ranges narrower and narrower (a small code sketch of this construction appears below). In other words, an MLP can approximate any scalar function of a scalar input to arbitrary precision. This is just for scalar inputs, but it generalizes to multivariate or vector inputs; I'll let you think about how this could be done, and if you can't figure it out on your own, we'll see how it's done in the next class.

So here's the story so far: multi-layer perceptrons are connectionist computational models; they are classification engines, but they can also model continuous-valued functions. And in addition to the various things we've seen so far, here are other things neural networks can do: they can model memory, they can represent arbitrary probability distributions over real or complex-valued domains, they can model a posteriori and a priori distributions of data, and they can even generate data from complicated or even unknown distributions.
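Here is a small sketch of the staircase construction just described, using a target function, interval and number of ranges of my own choosing: each pair of threshold units produces a unit pulse between two adjacent thresholds, the pulse is scaled by the function's value in that range, and the scaled pulses are summed; the maximum error shrinks as the ranges get narrower.

```python
import numpy as np

def step(z):
    return (z >= 0).astype(float)              # threshold unit

def pulse(x, t1, t2):
    # Two threshold units with weights +1 and -1: output is 1 for t1 <= x < t2, else 0.
    return step(x - t1) - step(x - t2)

def staircase_approximation(f, x, n_ranges):
    # Partition [0, 1] into n_ranges slices; scale each pulse by f at the slice centre and sum.
    edges = np.linspace(0.0, 1.0, n_ranges + 1)
    out = np.zeros_like(x)
    for t1, t2 in zip(edges[:-1], edges[1:]):
        out += f(0.5 * (t1 + t2)) * pulse(x, t1, t2)
    return out

f = lambda x: np.sin(2 * np.pi * x)            # target scalar function (an arbitrary choice)
x = np.linspace(0.0, 1.0, 1001)

for n in (4, 16, 64):
    err = np.max(np.abs(staircase_approximation(f, x, n) - f(x)))
    print(f"{n:3d} ranges -> max error {err:.3f}")   # the error shrinks as the ranges narrow
```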
So while these networks can be very complex, from an AI perspective, as we just saw, neural networks are ultimately just functions: they take an input and compute the function layer by layer to predict an output. More generally, given one or more inputs, they predict one or more outputs. Going back to the tasks we began with, voice recognition, image captioning and so on: what are all of these magic boxes? These magic boxes, too, are functions. They take in an input like a voice signal and produce an output like a transcription; or they take in an image and produce a caption; or they take in a game state and produce a recommendation for the next move. These are functions, and when we perform these tasks using neural networks, all of these functions are modeled and approximated by neural networks.

And so the story so far: multi-layer perceptrons are connectionist computational models and classification engines; they can also model continuous-valued functions; and interesting AI tasks are functions that can be modeled by the network. This is the key takeaway from this lesson. And so we come to the end of this lecture. We will continue in the next lecture with the topic of neural networks as universal approximators, and the issue of depth in networks. Thank you.
Info
Channel: Carnegie Mellon University Deep Learning
Views: 1,415
Rating: 5 out of 5
Id: 3opRvP5a8oo
Length: 71min 13sec (4273 seconds)
Published: Fri Dec 18 2020