The Free Energy Principle originally
emerged from systems neurosciences as a way, a principled way, of understanding what the
brain does and how it does it. Subsequently, the principles proved to be so simple
and so powerful that they have been applied in a variety of contexts. So one could
almost regard the free energy principle as an organizing principle for any living system
that shows the characteristics of life. So, the reason I start like that is that there are
two roads to explaining or understanding the free energy principle. You can either start from the
perspective of people like Helmholtz in the 19th Century trying to understand unconscious inference
in the brain and build a story through analysis by synthesis and psychology through to current and
exciting developments in machine learning – things like Geoffrey Hinton’s Helmholtz machine. And
then how that has become contextualized in the enactivist or the embodied cognition
context. I’m generalizing these notions, and you end up with the free energy principle,
or you can start from the top and just ask very simple questions about what it is to be
alive. And, if you are alive and you exist, what sorts of behaviors must you show? And in
fact, if you answer those questions, you end up with exactly the same answers that you would
have gotten had you followed the historical route. For brevity, I’ll take the high road. I’ll go from
the minimalist assumption that things exist and then try and unpack that and show how one can get
to notions of the brain as an inference engine, sometimes called the Bayesian brain hypothesis.
The brain is one of the best examples of an organ that is actively constructing explanations
through its own sampling of the world. So, this enactive perspective is very important because
not only does the brain then have to explain all the sensory input, but it also has to choose
which sensory input to sample. It is in charge of gathering information and evidence for its own
predictions and own beliefs about the world. But I’ve jumped ahead, so now I have to explain to you
why it is that any system that exists will behave as if it has a model of the world and is trying
to gather evidence for its own model of the world. So, the story starts just by acknowledging
that if you want to talk about something, there has to be a separation between the thing
you are talking about and everything else. And, if there were no boundaries, there would be nothing
because there would be no distinctions between the thing and not that thing. Statistically speaking,
that distinction or that boundary is called a Markov blanket. It’s just a mathematical way of
separating the states of some system in the world (an organism, a culture, a form of life, a cell, a brain) into things
that are internal to the boundary that is owned by that system and things that are outside the
boundary that are external to the system. So, it could be a cell and its milieu; it could be
a phenotype; it could be me and my environment. Well, at any scale, there has to be this division.
Now, the very existence of that separation, that Markov blanket, in conjunction with the
assumption that that system exists over time, tells you something quite profound about
the behavior of the internal states and the states that constitute the Markov blanket.
This is a bit abstract, but it is actually quite simple. The Markov blanket has two bits to
it. There are the sensory states, defined simply by the fact that they don’t influence the external
states, but they do influence the internal states. So sensory information, for example, would be
mediated by sensory states as they get from the outside world into my internal world, my brain.
And there are active states that go in the other direction. So, they influence external states but
are not influenced by external states. They are actually dependent upon the internal states.
If I take myself as a model of my world, my active states would be how I am currently moving,
whereas my sensory states would be the activities of my photoreceptors and all the sensory organs
and sensory epithelia I have at my disposal. Let’s put that Markov blanket aside for one moment
and just think about what it means for a system to exist over periods of time. What that means is
that it is effectively resisting dispersion by random fluctuations. Perhaps the simplest
example would be if I dropped or placed a drop of ink in a cup of water, then almost immediately,
it would start to disperse as random fluctuations disperse all the molecules around. And, I
would not call that drop of ink a living drop of ink because it has dispersed. If, however,
I placed a drop of ink in some water and then, to your amazement, you saw it gather itself up,
then relax a bit, and gather itself up again, like it was breathing, as if time were reversed,
you would say there’s something very peculiar about that drop of ink. It’s almost as if it were
living, and you would quickly become convinced it was alive. And the only reason you would endow
it with the property of self-organized life, biotic self-organization, is that it’s not
dispersing. And the only reason it’s not dispersing is that all of its internal states and
its Markov blanket that separates it from the rest of the water are moving toward the center of the
drop. The flow of the molecules of the system is exactly countering the dispersive forces that
are trying to disperse it throughout the water. Now that flow, operationally or mathematically,
can provably be shown to be simply moving uphill on the probability distribution of where the
ink molecules should be. And that probability distribution, mathematically, is the same as
something called Bayesian model evidence. I don’t have time to go into it here, but
it is a beautiful observation that the defining dynamics of any system that does not dissipate
over time is that they, on average, will move or their states will flow so as to maximize model
evidence, Bayesian model evidence. So, that means that if a system exists, then it will appear to
maximize Bayesian model evidence; it will appear to be a little Bayesian engine. It will appear
as if it has a model of its world. Why? Well, to see that, let’s now go back to
the Markov blanket, which comprises the active and sensory states, and to the internal states
that are encompassed by it. The rule says that all of these states must maximize model
evidence, also known as the marginal likelihood; the negative log of the model evidence is, in
turn, upper-bounded by a quantity called free energy, hence the free energy principle. All of
those states have to maximize marginal likelihood, or equivalently minimize free energy,
including action. That means actions, sensations, and the internal states are all doing the
same thing. Which
means that we can understand the internal states, say of the brain, as modeling the world
because they are maximizing the Bayesian model evidence for a model of the world or me. At the
same time, my action is also trying to maximize the evidence for my model of the world.
So, put very simply, almost by definition, I am in the game of garnering information that
maximizes the evidence of my own existence, and that’s basically the free energy principle.
It’s a corollary or a consequence of being any system that doesn’t dissipate: such a system looks
as if it behaves by actively soliciting information from the environment and modeling
that information, building a model of the environment, to maximize the evidence for its own existence.
And that’s where we started with the long history of Helmholtz’s notion of unconscious inference
right through to modern-day machine learning formulations, for example, the Helmholtz
machine of Geoffrey Hinton and Peter Dayan. That can be unpacked at many, many different
levels, and it has provided a very useful framework within which to understand how that
free energy principle is complied with by the biology and the anatomy and the physiology
of the brain. What it tells you is that the anatomy of any system has to contain within it a
model of the environment in which that system is immersed. Which means that if we live in a
world that has some deep hierarchical structure, in which there is action at a distance, for
example, the color of objects around me is determined by the incident light as it comes almost
instantaneously to my eye, or a falling body is caused by gravity, then my brain must recapitulate
that causal structure, and of course, it does. The very fact that we have nerve cells
with long slender connections linking each other at a distance speaks exactly to the fact
that the causal architectures of the world we inhabit have this action at a distance
and this sparse connectivity. Furthermore, the hierarchical structure of the world is
recapitulated in the neuronal structures that constitute the hierarchies of the
connectome or the hierarchical disposition of functionally specialized brain areas.
You can go further; if the brain is truly a statistical model of the world it inhabits, can we
understand some fundamentals of brain organization, such as the distinction between what and
where streams in the brain? So a very powerful observation, a principle of functional
specialization, is that a ventral stream of brain areas is concerned with what things are,
while a more dorsal stream is concerned with where they are. That may be a simple reflection of the fact that
we live in a universe where different things can be in different positions so that we can
statistically separate the whatness from the whereness. If we lived in a universe where
whenever something moved, it also changed its nature, we couldn’t do that. So, just by looking
at the brain, I can tell you the sort of universe that you inhabit under the free energy principle,
under the assumption that your brain has become a model of the environment that it inhabits.
The free energy principle has been quite useful from my perspective and that of my
colleagues largely because it shows the connections between previous theories. There
are many global brain theories that have been brought to bear. For example, the principle
of minimum redundancy, maximum efficiency, notions of the brain extracting as much
information as it can from the environment. There are other theories that speak to how we
select and value certain behaviors. It’s useful to see how all of these become special cases
of one variational principle, namely the free energy principle. Which means that you
can now talk to different disciplines and see how one particular construct, theoretical and or
empirical evidence speaks to another theoretical construct and essentially see how they are
approaching the same problem from different perspectives. Because you’ve got a principled
framework, it also allows you to make a very particular hypothesis about the process
theories that would conform to the principle. So, all I’ve said so far is that,
in principle, every internal state, every action that I make, and every
sensation that I gather should be at the service of minimizing variational free energy
or maximizing the marginal likelihood. How? How do you do that? How does a brain do that? Well,
if you know what the objective function is, if you know what the process is, and what the
imperatives are, you can then cast it in terms of processes. For example, I can say: “well,
this minimization of variational free energy or maximization of Bayesian model evidence is a
hill-climbing or gradient descent algorithm.” So, I can now write down a differential
equation in which everything, every neuronal state and physiological variable in the
brain, becomes describable in terms of the other states in the brain. And
if that equation is true, then I can go and map its variables onto physiological processes.
And if one plays that game, you can go an enormous way in starting to understand not just anatomy but
also physiology, and you can also generate questions, because there are alternative processes that
conform to the same principle. For example, does the brain use sampling techniques to maximize
model evidence, or does it use hill-climbing, variational optimization schemes? So,
you start to generate a whole raft of testable hypotheses pertaining to process theories that
are all consistent with the overarching principle.
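The process theory sketched here, free energy minimization as gradient descent, can be made concrete with a toy example (my own illustration with made-up numbers, not code from Friston or colleagues): a single internal state descends the free energy gradient of a one-dimensional Gaussian model and settles on the exact Bayesian posterior mean.

```python
# Toy sketch: gradient descent on variational free energy for a 1-D
# Gaussian model. All numbers are illustrative assumptions.
mu_prior, s_prior = 0.0, 1.0   # prior belief p(x) = N(0, 1)
s_obs = 0.5                    # likelihood p(y|x) = N(x, 0.25)
y = 2.0                        # an observed sensory datum

def dF_dphi(phi):
    # Gradient of free energy for a point estimate phi: the sum of
    # precision-weighted prediction errors from likelihood and prior.
    return (phi - y) / s_obs**2 + (phi - mu_prior) / s_prior**2

phi = 0.0                      # internal state: current best guess about x
for _ in range(5000):
    phi -= 0.01 * dF_dphi(phi) # flow down the free energy gradient

# The flow settles on the exact Bayesian posterior mean:
posterior_mean = (y / s_obs**2 + mu_prior / s_prior**2) / \
                 (1 / s_obs**2 + 1 / s_prior**2)
print(phi, posterior_mean)
```

Minimizing free energy here coincides with maximizing model evidence: the internal state ends up exactly where the Bayesian posterior puts it.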
I believe the best way to think about the FEP is "one layer more meta than predictive processing" - it's a very general principle and you can derive useful things from it. This very meta-ness appears to be very attractive to philosophers (e.g. most of the predictive mind papers cover the FEP) but also makes it very difficult to apply practically.
In the video, Friston motivates the FEP with the reasoning that living creatures have maintained their boundaries over time. The reasoning goes like this (my summary, not his):

1. Every living creature is alive because it didn't die in the past (the same goes for its ancestors before reproduction). Most parts of the world and most situations are dangerous, so by just randomly fooling around, a creature probably dies.

2. There's a boundary between you and the world around you (and likewise for any other living creature). If this boundary gets destroyed, you die (e.g. if you accidentally run into a knife). This boundary also defines your ability to sense the world around you (biology: senses / systems theory: inputs) and your ability to manipulate the world (biology: action / systems theory: outputs).

3. Following from 1) and 2): there's an evolutionary drive to ensure survival, but the world is separated from you. Any living creature shaped by evolution adapted to this by developing mechanisms that internally mirror and model the world around it, to increase its chances of avoiding harmful situations. Hence the "avoidance of entropy" and "minimise a quantity called 'free energy'" (which amounts to "adapt your internal model to optimally represent and predict the world around you").
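The boundary story in 2) can be turned into a tiny simulation (entirely my own toy illustration, not a standard model): internal and external states never interact directly, only via sensory and active states, and the resulting action loop counters perturbations of the external state.

```python
# Markov-blanket dependency structure as a toy control loop.
# sensory <- external, internal <- sensory, active <- internal,
# external <- active: internal and external states never touch directly.
def step(external, sensory, internal, active):
    sensory = external                         # sensing (input)
    internal = 0.5 * internal + 0.5 * sensory  # leaky inference about the world
    active = -internal                         # act against the inferred state
    external = external + 0.1 * active         # action perturbs the world
    return external, sensory, internal, active

external, sensory, internal, active = 1.0, 0.0, 0.0, 0.0  # start perturbed
for _ in range(200):
    external, sensory, internal, active = step(external, sensory, internal, active)
print(abs(external) < 1e-6)  # the loop has pulled the perturbation back to 0
```

The point is only the dependency structure: the internal state "knows" the world solely through the sensory state, and it changes the world solely through the active state.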
So from this understanding, evolution has shaped brains to model the world around them as if they were following a general rule of "minimize surprisal" or "minimize free energy" or "maximize marginal likelihood". This is really difficult to comprehend, but I at least have a rough understanding of what Friston might mean with "Bayesian model evidence", so I'll briefly try to explain that. Note that this is a very technical thing and I'll try to explain it as best as I can in both engineering-related and layman terms.
There's the notion of filters (easiest version from math-side is a Kalman filter, non-linear versions are called generalized filters) that tune the internal variables of a model to incoming measurement data. Think of it like this: - you have a mathematical model of something, e.g. a pendulum (state is described by angle position and velocity) - you get input, e.g. from an acceleration sensor (measured acceleration) - the filter will automatically estimate the current position and velocity and automatically update as new data is received - most importantly, the filter is a probabilistic estimate - it includes not only a "best guess" estimate but also a range of uncertainty
Now, assume you have several possible models to use, maybe a very simple model vs. one that includes friction and nonlinearities. Which model should you use? This is where model evidence comes into play: we can run all models in parallel and ask each model to predict upcoming measurements, then shift trust between the models based on how well they are currently performing. This should lead to more trust being placed on the simple model in early tuning stages (as it'll zoom in on good parameter sets very early and "roughly get it right"), with the more detailed model slowly taking over as more data comes in and it has had enough time to fit its internal prediction structure. This process of "model comparison" is what Friston refers to with "Bayesian model evidence" (if I'm not missing something).
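A bare-bones sketch of that model comparison (again with my own toy numbers): two candidate models predict the same data stream, and Bayes' rule shifts trust toward whichever predicts better.

```python
import math

def gauss_pdf(x, mu, var):
    # Likelihood of observing x under a Gaussian prediction N(mu, var).
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

models = {"A": (0.0, 1.0), "B": (1.0, 1.0)}  # each predicts N(mean, variance)
posterior = {"A": 0.5, "B": 0.5}             # start with equal trust

for y in [0.9, 1.2, 1.0, 0.8, 1.1]:          # data stream favouring model B
    # Bayes: weight the current trust by how well each model predicted y ...
    evidence = {m: posterior[m] * gauss_pdf(y, mu, var)
                for m, (mu, var) in models.items()}
    # ... then renormalize so the trusts sum to one again.
    total = sum(evidence.values())
    posterior = {m: e / total for m, e in evidence.items()}

print(posterior)  # most of the trust has shifted onto model B
```

Real model comparison also rewards simplicity (a model with fewer parameters spreads its predictions less thinly), which is what makes model evidence more than a mere goodness-of-fit score.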
For more discussions on state variables and filters, see also the comment I wrote here https://www.reddit.com/r/PredictiveProcessing/comments/o17eel/eli5_what_does_state_mean_in_laymans_terms/ and the linked post I wrote on Kalman filters a few years ago: https://hmbd.wordpress.com/2017/01/21/a-kalman-filter-can-do-interesting-things-like-filtering-poll-results/
If you are only puzzled by the 'as if' notion: I assume this is just a modest and careful expression. We cannot know if the nervous system or living beings in general really follow the FEP and whether information-theoretical concepts and thermodynamics really line up. Yes, the formulas are the same. Yes, we can derive a heuristic proof (Karl's words) from simple, quasi self-evident principles. So it looks as if this works out, which suggests that FEP explanations are useful - especially in comparison to our previous models. If a system behaves as if it does X, you could also claim it does X. However, this requires further evidence and could even be a metaphysical question.
One additional example: Karl goes on to suggest that the nervous system recapitulates a deep hierarchical causal structure because the world is structured in this particular way. (This leads to the notion of a Markovian monism.) However, I like to think about this in the weaker 'as if' version too. There could be other reasons why our nervous system needs to model the external world in this hierarchical fashion (e.g., computational costs, or the encoding of complex probability distributions by simpler ones). So it appears as if the world is isomorphic to the way we model it. Now you have to decide if this is truly the case, and based on your own choice you will land somewhere on the realism-idealism spectrum.
That is why I like the 'as if' notion. You can decide for yourself to what degree you want to commit to the theory and still have something useful and relatable to work with.