The amount of random access memory - RAM - in
a typical laptop is probably around 8 to 32 GB. That's the part that directly interacts with the
CPU. Aside from that, your hard disk might have, say, another terabyte or so of memory. But
how much memory do you - that's to say: the human brain - have? And can we even measure
it in bytes? But maybe that's not even the first question we should ask. However much memory the
brain has, where is it? Because every piece of memory in a computer has a physical location. To
access a piece of data in RAM for example, you have to know the binary address associated with
that location. In fact, for the CPU this really comes down to turning on just the right wires
to retrieve the bits at the desired location. Now imagine a different kind of memory.
Instead of specifying the *where* of a memory, its binary address, how about we could specify
the *what*, its content. A memory system where, if we provide an incomplete version of the memory,
it just sort of ... autocompletes. Of course with the right software your computer can already
do this. But it's not how computer memory works on its basic level. The point of this video is
to convince you that autocompleting memories, also known as *associative memory*, is a kind
of natural behavior of networks of neurons. With that it'll become clear that it
doesn't really make sense to measure memory capacity in networks of neurons in
the same way we measure computer memory. The biggest difference might be:
computer memories have a place, a fixed location, but as we'll see, the memories
in an associative network rather have - a time. Computer memory is measured in bits, binary
switches of ones and zeros. A string of eight such bits can represent anything from letters
to integers. For our purposes let's visualize them as patterns of this kind, like these 64 bits
representing this 8x8 image of binary pixels.
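As a concrete illustration - a minimal Python sketch of my own, not anything from the video itself - such a pattern is just 64 numbers that we can reshape into an image:

```python
import numpy as np

# 64 bits, drawn at random here as a stand-in for the video's pattern.
# 1 = pixel on, 0 = pixel off (later we'll switch the coding to -1/+1).
bits = np.random.randint(0, 2, size=64)

# The very same string of bits, viewed as an 8x8 image of binary pixels.
image = bits.reshape(8, 8)
print(image)
```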
I always find that there's a piece missing from the story of bits as memory and that's the following: how do I get to a memory once it's saved, say
in RAM? Because on its own it doesn't do much. It's only when we retrieve it that it becomes
useful. So how do we? Well, broadly speaking, and I'm glossing over tons of technical detail here,
every piece of data in RAM is matched to a binary address. And this binary address really eventually
boils down to a set of wires, in this case eight, that are either turned on or off. Each piece of
data is in a different physical location and can only be retrieved by knowing its address. How the
reading and writing of memories is accomplished is really the meat of programming and is another
story. What I want you to remember is the peculiar fact that memories are matched to addresses and
that's ultimately the only way to retrieve them.
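To make that peculiar fact concrete, here's a toy sketch in Python (the addresses and contents are made up, and real addressing hardware is of course nothing this simple):

```python
# Address-based memory in miniature: every stored item sits at a fixed
# binary address, and presenting exactly that address is the only way
# to get it back.
ram = {
    0b00000000: "pattern A",
    0b00000001: "pattern B",
    0b00000010: "pattern C",
}

address = 0b00000001   # "turn on" just the right wires...
print(ram[address])    # ...and out comes "pattern B"

# What you cannot do here is ask: "which address holds something that
# looks roughly like pattern B?" That associative kind of lookup is
# exactly what this memory, at this level, does not offer.
```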
Contrast this with what we believe about the brain. There isn't a central orchestrator like a CPU and there aren't any addresses. Rather,
there is a constant buzz of activity of many independent units called neurons. In this video,
we'll try to make some sense of the buzzing of activity of networks of neurons by introducing a
mathematical model called the *Hopfield network*, named after the author of this 1982 paper.
And as much as this has to do with memory, more generally this video aims to be a lesson
in modeling itself, which I always think of as the art of the essential. This is a neuron.
I'm almost certain you've seen something like this before. But it sometimes pays to remember
why we always come back to this when we want to understand things about the brain. The reason
is, I think, that it has a rather simple behavior: it integrates electrical signals from other
neurons to determine its own activity and then it broadcasts that activity back to the network.
Mathematically the story goes something like this: there's electrical signals coming in from other
neurons, which we will say are just some numbers. Then the synapses act as multipliers on these
signals - another set of numbers - and then the activity of the neuron is based on the sum of the
weighted inputs, and by "based on" I mean that it's fine to apply some function after computing
the sum. And that's it.
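In code, that one-neuron story is only a couple of lines. This is my own minimal sketch with made-up numbers, taking f to be a simple sign function, though any threshold-like function would serve the story so far:

```python
import numpy as np

def neuron_activity(inputs, weights, f=np.sign):
    # Weigh each incoming signal by its synapse, sum them up,
    # then apply some function f to the weighted sum.
    return f(np.dot(weights, inputs))

# Hypothetical signals from three other neurons and the synaptic
# weights that multiply them:
inputs = np.array([1.0, -1.0, 1.0])
weights = np.array([0.5, 0.2, 0.3])
print(neuron_activity(inputs, weights))  # sign(0.5 - 0.2 + 0.3) = 1.0
```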
So it gets interesting once we turn this into a network, connecting the outputs of neurons to the inputs of other neurons. This is a special type of neural
network. It's a *recurrent network*, meaning that there are back and forth connections
between the neurons. I haven't drawn them but remember that any such edge is actually two edges
so that the two neurons influence each other. Okay there's details here that we need to
get into, but first, what does this have to do with memory? Well it needs to be somewhere
in here doesn't it? Where? Remember the idea of an associative memory, which is the ability
of a system to sort of "pattern-autocomplete". Let's try a definition of memory that's
slightly wider than maybe what we're used to. Let a memory system be a system that,
after having been in a certain state, a configuration, has the ability to return
to that state later on. Now our computer memory from earlier actually has this property if
we include the CPU into the memory system. Our network seems different though. So let's get
creative. There's other things in our everyday lives that fall under our definition of memory and
one might be - and hear me out on this - a simple plastic bottle. If it's crushed, in other words
its configuration changed, it can sometimes return to its earlier state, which in that sense could be
said to have been memorized. And the metaphor is not arbitrary. I actually do think that networks
of neurons are kind of like that. What I mean is: a neural network is a system with a pattern of
activity that dynamically evolves. If, somehow, we could construct our network such that it would
have some preferred state and would return to that state over time if it was perturbed, then
that could reasonably be qualified as a memory. This is a network of 64 neurons that I
cleverly constructed such that it memorized this pattern of 8x8 binary pixels. So what's going
on? To describe what this model is actually doing, we need to take the following steps. Remember I
said time was important? We need to describe how the activity of the network changes over time. And
then there is the question of *learning*. How do we actually imprint memories into the network? And
this will have to do with the connections between the neurons. Finally, we need to understand if and
when the network converges to its memory states. The crucial ingredient of our network really
is the fact that it is a dynamical system: its activity changes over time. By activity we mean
that each of the now 16 neurons in the network is described by a number and that this number is
a function of time. And let's just assume that time moves forward in discrete steps. Furthermore,
since we are interested in binary memory states, we'll assume that activity means that the
neurons can only be in one of two states: inactive, say minus one, and active, say
plus one. Anyway this all leaves us with 16 minus or plus ones at any given time,
which we will call the *state* of the network. What actually happens if we
increase time by one step? Imagine folding out the time dimension in space.
Now we select one of the neurons at random and update its state according to the input
of all other neurons in the network. The rest of the neurons stay the
same. And this we simply continue. But hold on, you might ask, how
do we update the state exactly and why update only one neuron at a time? Well
for the second question, we could, in fact, update all neurons at once but the issue here
is plausibility. Because that would require a global updating signal, almost like a clock,
instructing all neurons to update simultaneously. It's slightly more realistic, although not
too much to be honest, to let them update asynchronously. Okay and for the other question
- yes, what is the actual update equation? Well it's remarkably simple. It's a weighted sum
of the states of the other neurons, "weighted" meaning that each state is multiplied by the
strength of the connection between the neurons. And since the connections in that sense "weigh"
the inputs, from now on I'm actually going to call them "weights". But then of course to ensure
that the result is plus one or minus one again, we'll make use of that function I
mentioned earlier, f, and that's it. Those of you familiar with linear algebra
will have recognized this as computing the dot product between the vector of neuron states and the vector of connection weights.
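Written out as code, one time step of the dynamics might look like this - again my own sketch, where breaking the tie at an input of exactly zero toward +1 is just a convention:

```python
import numpy as np

rng = np.random.default_rng(0)

def update_one_neuron(state, W):
    # Select one neuron at random...
    i = rng.integers(len(state))
    # ...and compute the weighted sum of the other neurons' states:
    # the dot product of row i of the weight matrix with the state
    # vector. (With w_ii = 0, which is how the weights will be built
    # later on, including neuron i itself in the sum changes nothing.)
    h = W[i] @ state
    state[i] = 1 if h >= 0 else -1   # f: clamp back to -1/+1
    return state

def run(state, W, steps=2000):
    # The rest of the neurons stay the same at each step;
    # we simply keep updating, one random neuron at a time.
    for _ in range(steps):
        state = update_one_neuron(state, W)
    return state
```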
This is a network of 64 neurons and I'm just going to tell you, without explaining how, that it has memorized this pattern. Starting
it off in different initial states and then running the equations I just described, selecting
one neuron at a time at random and updating its activity - and wait let's just make this a little
simpler - we can see that the network really has this intriguing property that it gravitates
towards the memory pattern in all cases - or, well, it ends up with the anti-memory in some
cases. We'll ignore that. It has to do with a certain symmetry in the network that we will get
to. Moreover, once it's settled into that state, it doesn't change anymore. The memory
pattern is what is called a *stable state* of the network. So can we be sure that
this always happens? Well, no. For example, things start to get complicated when there is more
than one memory stored in the network. We'll get to that. But for the simple case of just a single
memory - yeah, the network will converge to either the memory or its anti-memory or, as an edge case,
to the completely inactive state. So let's recap: network states are just vectors that evolve in time, and
we update their elements asynchronously. Given some configuration of weights, there are stable
states in the network which we call memory states and we have seen, although not proved, that the
network is kind of attracted to these states. That leaves the following question: how do we make
the network memorize a certain pattern? In other words, how can we design the stable states of the
network? This amounts to setting the weights of the network, which turns out to be a matrix with
as many rows and columns as there are neurons. For example, the state of this neuron is determined by
the weights in this row of the matrix. At first, this might seem impossible, especially if we wish
to store more than one memory in the network. So I'm just going to tell you the
magic rule and then I'll motivate it. Given a desired memory state with, say, eight
elements, we are looking for an 8-by-8 matrix. We will just say, seemingly out of nowhere, that
the weight between two neurons, i, j in general, is determined by the product of their states in
the memory. For all of you keen on linear algebra: this means that the matrix is an outer product of the memory vector with itself. Except for the diagonal, which we set to 0 since we don't want any self-reinforcement.
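As a sketch (my own Python, with a hypothetical 8-element memory):

```python
import numpy as np

def hebbian_weights(memory):
    # w_ij = product of the states of neurons i and j in the memory:
    # the outer product of the memory vector with itself...
    W = np.outer(memory, memory).astype(float)
    # ...except for the diagonal, set to 0: no self-reinforcement.
    np.fill_diagonal(W, 0.0)
    return W

memory = np.array([1, -1, -1, 1, 1, -1, 1, -1])  # a made-up memory state
W = hebbian_weights(memory)
print(W)  # w_ij is +1 where neurons i and j agree in the memory, -1 where they disagree
```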
But why? Think of it this way: our whole approach was in some sense to build something interesting from many simple parts. So there really should be a way for
the two neurons to determine the weight between them independent of the rest of the network.
This is a principle called *Hebbian learning* and it is exceedingly plausible because remember
that weights are supposed to be synapses? And synapses in actual neurons also would have no way
of knowing what goes on in the broader network. And so why the multiplication? Well here comes
the reason I coded the binary states minus one and plus one. If you map out all four combinations
of states of the neurons, you'll see that the weight will have positive sign whenever the
neurons in their memory state agree and negative if they disagree. And this should make sense
because a positive weight lets a neuron project its state onto other neurons and a negative weight
lets a neuron flip the state of other neurons. It's almost like the neurons behave like charged
particles, maybe ... One last question before we can see it in action, and that's - how do we store
multiple patterns at once in the same network? We do it by computing the outer products for all
desired memory patterns. This gives us a matrix for each and then we average those matrices. This
gives finer and finer gradations of the weights.
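The whole learning rule then fits in a few lines - again my own sketch: one outer-product matrix per desired memory, averaged, diagonal zeroed.

```python
import numpy as np

def hopfield_weights(memories):
    # `memories` is a list of -1/+1 vectors of equal length.
    n = len(memories[0])
    W = np.zeros((n, n))
    for m in memories:
        W += np.outer(m, m)   # one matrix per desired memory...
    W /= len(memories)        # ...then the average of those matrices
    np.fill_diagonal(W, 0.0)  # still no self-reinforcement
    return W
```

Feeding this W into the asynchronous update from earlier, starting from a corrupted copy of one of the memories, is exactly the kind of experiment the animations here show.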
Now, however, things start to become a little bit more complicated. There's no way to guarantee that all
memory patterns are stable states. The memories will start talking to each other,
fusing into new memories, which I find frustrating and super interesting at the same time. Here's
the network with four memories again. It sometimes converges to one of the memory states, yes,
but in other cases it converges to something in between, a merging of memories, which I
find, well, almost a kind of human mistake. And with this we are finally making some
progress on our question from the very beginning of the video: how much memory do you
have? Originally we might have answered this by giving some number of bytes, but now the question
presents itself very differently. It's more like: how many memory patterns can we store in a
recurrent network of a given size such that they are stable states of the network? And the
answer is, at least for this model, not very many! The original paper showed that this model has only
linear memory capacity. That's to say the number of stable states grows as a linear function of
the size of the network. Plus there's a hidden assumption in this graph, too, which is that all
memory states are uncorrelated, which for any set of pictures like this is totally not the case. I
tried my best picking out some not-so-correlated images to really show what this network can do
and the results are ... well, see for yourself.
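If you want to watch the linear capacity limit bite for yourself, here's a rough experiment you could run - my own sketch, using random (and therefore nearly uncorrelated) patterns; for such patterns, the often-quoted breakdown point is somewhere around 0.14 memories per neuron:

```python
import numpy as np

rng = np.random.default_rng(1)

def is_stable(pattern, W):
    # A pattern is a stable state if no single update would flip any
    # neuron: sign(W @ pattern) must reproduce the pattern itself.
    return np.all(np.where(W @ pattern >= 0, 1, -1) == pattern)

n = 64
for k in [2, 4, 8, 12, 16]:
    memories = [rng.choice([-1, 1], size=n) for _ in range(k)]
    W = sum(np.outer(m, m) for m in memories) / k
    np.fill_diagonal(W, 0.0)
    stable = sum(is_stable(m, W) for m in memories)
    print(f"{k:2d} random memories in {n} neurons: {stable} still stable")
```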
Realistically, that means that this model is maybe too simple after all, and that's not surprising given all the violence we did to these networks
with our many simplifications. But the goal of this video wasn't to convince you that this
model can be used in any practical sense anyway, although in a follow-up video, I want to tell
you what steps people took to make this model actually useful for deep neural networks. What
I wanted to achieve with this video is, first, to warn you of false comparisons. Networks
of neurons don't behave like USB sticks and why should they? But secondly, it's to show
you how with modeling approaches, walking a very thin line between complexity and simplicity,
we can sometimes start to conceive of the world differently than we otherwise would have. You
might have set out picturing memory as something static, but I'm hoping now you're willing to
consider that it might be something dynamic, the invisible stable states of
a network buzzing with activity.