How are memories stored in neural networks? | The Hopfield Network #SoME2

Captions
The amount of random access memory - RAM - in a typical laptop is probably around 8 to 32 GB. That's the part that directly interacts with the CPU. Aside from that, your hard disk might have, say, another terabyte or so of memory. But how much memory do you - that's to say: the human brain - have? And can we even measure it in bytes? But maybe that's not even the first question we should ask. However much memory the brain has, where is it? Because every piece of memory in a computer has a physical location. To access a piece of data in RAM, for example, you have to know the binary address associated with that location. In fact, for the CPU this really comes down to turning on just the right wires to retrieve the bits at the desired location. Now imagine a different kind of memory. Instead of specifying the *where* of a memory, its binary address, what if we could specify the *what*, its content? A memory system where, if we provide an incomplete version of the memory, it just sort of ... autocompletes. Of course, with the right software your computer can already do this. But it's not how computer memory works at its most basic level. The point of this video is to convince you that autocompleting memories, also known as *associative memory*, is a kind of natural behavior of networks of neurons. With that it'll become clear that it doesn't really make sense to measure memory capacity in networks of neurons in the same way we measure computer memory. The biggest difference might be: computer memories have a place, a fixed location, but as we'll see, the memories in an associative network rather have - a time.

Computer memory is measured in bits, binary switches of ones and zeros. A string of eight such bits can represent anything from letters to integers. For our purposes, let's visualize them as patterns of this kind, like these 64 bits representing this 8x8 image of binary pixels. I always find that there's a piece missing from the story of bits as memory, and it's the following: how do I get to a memory once it's saved, say in RAM? Because on its own it doesn't do much. It's only when we retrieve it that it becomes useful. So how do we? Well, broadly speaking, and I'm glossing over tons of technical detail here, every piece of data in RAM is matched to a binary address. And this binary address eventually boils down to a set of wires, in this case eight, that are either turned on or off. Each piece of data is in a different physical location and can only be retrieved by knowing its address. How the reading and writing of memories is accomplished is really the meat of programming and is another story. What I want you to remember is the peculiar fact that memories are matched to addresses, and that's ultimately the only way to retrieve them. Contrast this with what we believe about the brain. There isn't a central orchestrator like a CPU, and there aren't any addresses. Rather, there is a constant buzz of activity of many independent units called neurons. In this video, we'll try to make some sense of the buzzing activity of networks of neurons by introducing a mathematical model called the *Hopfield network*, named after the author of this 1982 paper. And as much as this has to do with memory, more generally this video aims to be a lesson in modeling itself, which I always think of as the art of the essential.
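(The video only shows these patterns as pictures; as a small aside that isn't in the video, here is one way such an 8x8 binary image could be flattened into the vector of plus and minus ones that the network will work with later. The image itself is just a made-up example.)

```python
import numpy as np

# A made-up 8x8 binary image (0 = pixel off, 1 = pixel on), purely illustrative.
image = np.zeros((8, 8), dtype=int)
image[2:6, 2:6] = 1

# Flatten the 64 pixels into one vector and map {0, 1} -> {-1, +1},
# the coding the network's neurons will use later.
pattern = 2 * image.flatten() - 1
print(pattern.shape)  # (64,)
```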
This is a neuron. I'm almost certain you've seen something like this before. But it sometimes pays to remember why we always come back to this when we want to understand things about the brain. The reason is, I think, that it has a rather simple behavior: it integrates electrical signals from other neurons to determine its own activity, and then it broadcasts that activity back to the network. Mathematically the story goes something like this: there are electrical signals coming in from other neurons, which we will say are just some numbers. Then the synapses act as multipliers on these signals - another set of numbers - and then the activity of the neuron is based on the sum of the weighted inputs, and by "based on" I mean that it's fine to apply some function after computing the sum. And that's it.

So it gets interesting once we turn this into a network, connecting the outputs of neurons to the inputs of other neurons. This is a special type of neural network. It's a *recurrent network*, meaning that there are back-and-forth connections between the neurons. I haven't drawn them, but remember that any such edge is actually two edges, so that the two neurons influence each other. Okay, there are details here that we need to get into, but first, what does this have to do with memory? Well, it needs to be somewhere in here, doesn't it? Where? Remember the idea of an associative memory, which is the ability of a system to sort of "pattern-autocomplete". Let's try a definition of memory that's slightly wider than maybe what we're used to. Let a memory system be a system that, after having been in a certain state, a configuration, has the ability to return to that state later on. Now our computer memory from earlier actually has this property if we include the CPU in the memory system. Our network seems different though.

So let's get creative. There are other things in our everyday lives that fall under our definition of memory, and one might be - and hear me out on this - a simple plastic bottle. If it's crushed, in other words its configuration changed, it can sometimes return to its earlier state, which in that sense could be said to have been memorized. And the metaphor is not arbitrary. I actually do think that networks of neurons are kind of like that. What I mean is: a neural network is a system with a pattern of activity that dynamically evolves. If, somehow, we could construct our network such that it would have some preferred state and would return to that state over time if it was perturbed, then that could reasonably be qualified as a memory.

This is a network of 64 neurons that I cleverly constructed such that it memorized this pattern of 8x8 binary pixels. So what's going on? To describe what this model is actually doing, we need to take the following steps. Remember I said time was important? We need to describe how the activity of the network changes over time. And then there is the question of *learning*. How do we actually imprint memories into the network? And this will have to do with the connections between the neurons. Finally, we need to understand if and when the network converges to its memory states. The crucial ingredient of our network really is the fact that it is a dynamical system: its activity changes over time.
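(Before moving on to the dynamics, here is a minimal sketch, in Python and not from the video, of the single-neuron rule just described; the sign function stands in for the otherwise unspecified function applied after the sum.)

```python
import numpy as np

def neuron_activity(inputs, weights):
    # Weighted sum of the incoming signals, one weight per synapse,
    # followed by a function (here: sign) that sets the neuron's own activity.
    return np.sign(inputs @ weights)

# Example: three incoming signals and their three synaptic weights.
print(neuron_activity(np.array([1.0, -1.0, 1.0]), np.array([0.5, 0.2, 0.3])))  # 1.0
```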
By activity we mean that each of the now 16 neurons in the network is described by a number, and that this number is a function of time. And let's just assume that time moves forward in discrete steps. Furthermore, since we are interested in binary memory states, we'll assume that activity means that the neurons can only be in one of two states: inactive, say minus one, and active, say plus one. Anyway, this all leaves us with 16 minus or plus ones at any given time, which we will call the *state* of the network.

What actually happens if we increase time by one step? Imagine folding out the time dimension in space. Now we select one of the neurons at random and update its state according to the input of all other neurons in the network. The rest of the neurons stay the same. And this we simply continue.

But hold on, you might ask, how do we update the state exactly, and why update only one neuron at a time? Well, for the second question, we could, in fact, update all neurons at once, but the issue here is plausibility. Because that would require a global updating signal, almost like a clock, instructing all neurons to update simultaneously. It's slightly more realistic, although not too much to be honest, to let them update asynchronously. Okay, and for the other question - yes, what is the actual update equation? Well, it's remarkably simple. It's a weighted sum of the states of the other neurons, "weighted" meaning that each state is multiplied by the strength of the connection between the neurons. And since the connections in that sense "weigh" the inputs, from now on I'm actually going to call them "weights". But then, of course, to ensure that the result is plus one or minus one again, we'll make use of that function I mentioned earlier, f, and that's it. Those of you familiar with linear algebra will have recognized this as computing the dot product between the vector of neuron states and the vector of connection weights.

This is a network of 64 neurons, and I'm just going to tell you, without explaining how, that it has memorized this pattern. Starting it off in different initial states and then running the equations I just described, selecting one neuron at a time at random and updating its activity - and wait, let's just make this a little simpler - we can see that the network really has this intriguing property that it gravitates towards the memory pattern in all cases - or, well, it ends up with the anti-memory in some cases. We'll ignore that. It has to do with a certain symmetry in the network that we will get to. Moreover, once it's settled into that state, it doesn't change anymore. The memory pattern is what is called a *stable state* of the network.

So can we be sure that this always happens? Well, no. For example, things start to get complicated when there is more than one memory stored in the network. We'll get to that. But for the simple case of just a single memory - yeah, the network will converge to either the memory or its anti-memory or, as an edge case, to the completely inactive state. So let's recap: networks are just vectors that evolve in time, and we update their elements asynchronously. Given some configuration of weights, there are stable states in the network, which we call memory states, and we have seen, although not proved, that the network is kind of attracted to these states.
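(Here is a minimal sketch of these asynchronous dynamics in Python - my own code, not the video's - with an assumed weight matrix W; how W is chosen is exactly the learning question that comes next. Breaking the tie at a weighted sum of zero in favor of plus one is an arbitrary convention.)

```python
import numpy as np

rng = np.random.default_rng(0)

def step(state, W):
    # One asynchronous update: pick a single neuron at random and recompute
    # its state from the weighted sum (dot product) of the neuron states.
    # The diagonal of W is assumed to be zero, so a neuron never drives itself.
    i = rng.integers(len(state))
    h = W[i] @ state
    state[i] = 1 if h >= 0 else -1  # threshold back to +1 / -1
    return state

def run(state, W, n_steps=1000):
    # Repeat the single-neuron update many times; all other neurons stay put.
    state = np.array(state)
    for _ in range(n_steps):
        state = step(state, W)
    return state
```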
That leaves the following question: how do we make the network memorize a certain pattern? In other words, how can we design the stable states of the network? This amounts to setting the weights of the network, which turn out to form a matrix with as many rows and columns as there are neurons. For example, the state of this neuron is determined by the weights in this row of the matrix. At first, this might seem impossible, especially if we wish to store more than one memory in the network.

So I'm just going to tell you the magic rule, and then I'll motivate it. Given a desired memory state with, say, eight elements, we are looking for an 8-by-8 matrix. We will just say, seemingly out of nowhere, that the weight between two neurons, i and j in general, is determined by the product of their states in the memory. For all of you keen on linear algebra: this means that the matrix is an outer product of the memory vector with itself, except for the diagonal, which we set to 0 since we don't want any self-reinforcement. But why? Think of it this way: our whole approach was in some sense to build something interesting from many simple parts. So there really should be a way for the two neurons to determine the weight between them independent of the rest of the network. This is a principle called *Hebbian learning*, and it is exceedingly plausible, because remember that weights are supposed to be synapses? And synapses in actual neurons also would have no way of knowing what goes on in the broader network. And so why the multiplication? Well, here comes the reason I coded the binary states minus one and plus one. If you map out all four combinations of states of the neurons, you'll see that the weight will have a positive sign whenever the neurons in their memory state agree and a negative one if they disagree. And this should make sense, because a positive weight lets a neuron project its state onto other neurons, and a negative weight lets a neuron flip the state of other neurons. It's almost like the neurons behave like charged particles, maybe ... One last question before we can see it in action, and that's - how do we store multiple patterns at once in the same network? We do it by computing the outer products for all desired memory patterns. This gives us a matrix for each, and then we average those matrices. This gives finer and finer gradations of the weights.
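(A sketch of this learning rule in Python, again my own and not from the video: the weight matrix is the average of the outer products of the stored plus/minus-one patterns with themselves, with the diagonal zeroed out. The two 8-neuron patterns below are invented just to check that a stored pattern comes out as a stable state, i.e. that one full sweep of updates leaves it unchanged.)

```python
import numpy as np

def hebbian_weights(patterns):
    # w_ij = average over all stored patterns of (state of i) * (state of j),
    # i.e. the mean of the outer products, with no self-connections.
    patterns = np.asarray(patterns, dtype=float)
    W = sum(np.outer(p, p) for p in patterns) / len(patterns)
    np.fill_diagonal(W, 0.0)
    return W

# Two made-up 8-neuron memory patterns.
p1 = np.array([1, 1, -1, -1, 1, -1, 1, -1])
p2 = np.array([1, -1, 1, -1, 1, -1, -1, 1])
W = hebbian_weights([p1, p2])

# If no neuron would change its state, p1 is a stable state of the network.
print(np.array_equal(np.sign(W @ p1), p1))  # True
```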
Now, however, things start to become a little more complicated. There's no way to guarantee that all memory patterns are stable states. The memories will start talking to each other, fusing into new memories, which I find frustrating and super interesting at the same time. Here's the network with four memories again. It sometimes converges to one of the memory states, yes, but in other cases it converges to something in between, a merging of memories, which I find, well, almost a kind of human mistake.

And with this we are finally making some progress on our question from the very beginning of the video: how much memory do you have? Originally we might have answered this by giving some number of bytes, but now the question presents itself very differently. It's more like: how many memory patterns can we store in a recurrent network of a given size such that they are stable states of the network? And the answer is, at least for this model: not very many! The original paper showed that this model has only linear memory capacity. That's to say, the number of stable states grows as a linear function of the size of the network. Plus, there's a hidden assumption in this graph, too, which is that all memory states are uncorrelated, which for any set of pictures like this is totally not the case. I tried my best picking out some not-so-correlated images to really show what this network can do, and the results are ... well, see for yourself.

Realistically, that means that this model is maybe too simple after all, and that's not surprising given all the violence we did to these networks with our many simplifications. But the goal of this video wasn't to convince you that this model can be used in any practical sense anyway, although in a follow-up video I want to tell you what steps people took to make this model actually useful for deep neural networks. What I wanted to achieve with this video is, first, to warn you of false comparisons. Networks of neurons don't behave like USB sticks - and why should they? But secondly, it's to show you how, with modeling approaches, walking a very thin line between complexity and simplicity, we can sometimes start to conceive of the world differently than we otherwise would have. You might have set out picturing memory as something static, but I'm hoping now you're willing to consider that it might be something dynamic, the invisible stable states of a network buzzing with activity.
Info
Channel: Layerwise Lectures
Views: 684,367
Keywords: SoME2, Neural Networks, Memory, Hopfield
Id: piF6D6CQxUw
Length: 15min 14sec (914 seconds)
Published: Mon Aug 15 2022