Free Energy Principle — Karl Friston

Reddit Comments

(this is mostly copy-paste and I already posted this... um, somewhere, maybe on Astral Codex Ten, maybe here, but I can't find it?)

I believe the best way to think about the FEP is as "one layer more meta than predictive processing": it's a very general principle, and you can derive useful things from it. This meta-ness appears to be very attractive to philosophers (e.g. most of the predictive-mind papers cover the FEP) but also makes it very difficult to apply practically.

In the video, Friston motivates the FEP with the observation that living creatures have maintained their boundaries over time. The reasoning goes like this (my summary, not his):

1. Every living creature is alive because it didn't die in the past (the same goes for its ancestors before reproduction). Most parts of the world and most situations are dangerous, so by just randomly fooling around, a creature would probably die.
2. There's a boundary between you (or any other living creature) and the world around you. If this boundary gets destroyed, you die (e.g. if you accidentally run into a knife). This boundary also defines your ability to sense the world around you (biology: senses / systems theory: inputs) and your ability to manipulate the world (biology: action / systems theory: outputs).
3. Following from 1) and 2): there's an evolutionary drive to ensure survival, but the world is separated from you. Any living creature shaped by evolution adapted to this by developing mechanisms that internally mirror and model the world around it, increasing its chances of avoiding harmful situations. Hence the "avoidance of entropy" and the instruction to "minimise a quantity called 'free energy'" (which amounts to: adapt your internal model to optimally represent and predict the world around you).

So on this understanding, evolution has shaped brains to model the world around them as if they were following a general rule of "minimize surprisal", or "minimize free energy", or "maximize marginal likelihood". This is really difficult to comprehend, but I at least have a rough understanding of what Friston might mean by "Bayesian model evidence", so I'll briefly try to explain that. Note that this is a very technical thing and I'll try to explain it as best as I can in both engineering-related and layman terms.

There's the notion of filters (the easiest version from the maths side is the Kalman filter; non-linear versions are called generalized filters) that tune the internal variables of a model to incoming measurement data. Think of it like this:

- You have a mathematical model of something, e.g. a pendulum (its state is described by angular position and velocity).
- You get input, e.g. from an acceleration sensor (measured acceleration).
- The filter will automatically estimate the current position and velocity, and automatically update that estimate as new data is received.
- Most importantly, the filter's estimate is probabilistic: it includes not only a "best guess" but also a range of uncertainty.
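To make this concrete, here's a minimal sketch in Python. It's a scalar filter tracking a single hidden value rather than a full pendulum model, and all the noise settings are made up for illustration; the point is just that the output is a (mean, variance) pair, not a bare number.

```python
import random

def kalman_1d(measurements, q=0.01, r=0.25):
    """Minimal scalar Kalman filter: track a hidden value from noisy
    readings. q is the process-noise variance, r the measurement-noise
    variance. Returns the (mean, variance) estimate after each reading."""
    x, p = 0.0, 1.0                  # initial guess and its uncertainty
    history = []
    for z in measurements:
        p = p + q                    # predict: uncertainty grows between steps
        k = p / (p + r)              # Kalman gain: how much to trust the data
        x = x + k * (z - x)          # update the estimate toward the measurement
        p = (1 - k) * p              # uncertainty shrinks after each update
        history.append((x, p))
    return history

# Simulated noisy sensor around a true value of 1.0
random.seed(0)
true_value = 1.0
noisy = [true_value + random.gauss(0, 0.5) for _ in range(100)]
est, var = kalman_1d(noisy)[-1]      # best guess plus explicit uncertainty
```

Note how `var` keeps shrinking as data comes in but never reaches zero: the filter always reports a residual range of uncertainty around its best guess.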

Now, assume you have several possible models to use, maybe a very simple model vs. one that includes friction and nonlinearities. Which model should you use? This is where model evidence comes into play: we can run all models in parallel, ask each model to predict upcoming measurements, and then shift trust between the models based on how well they are currently performing. This should lead to more trust being placed on the simple model in the early tuning stages (as it will zoom in very early on parameter sets that "roughly get it right"), with the more detailed model slowly taking over as more data comes in and it has had enough time to fit its internal prediction structure. This process of "model comparison" is what Friston refers to with "Bayesian model evidence" (if I'm not missing something).
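Here's a toy sketch of that comparison in Python. Both "models" and all the noise levels are invented for illustration: each model assigns a predictive probability to the next data point before seeing it, and the accumulated log predictive probability is its log model evidence.

```python
import math
import random

def log_gauss(x, mu, sigma):
    """Log of a Gaussian density: the log predictive probability of x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

random.seed(1)
# A slowly drifting signal (stand-in for incoming measurements)
data = [0.1 * t + random.gauss(0, 0.3) for t in range(40)]

log_ev_simple, log_ev_detailed = 0.0, 0.0
mean_est = 0.0                          # the detailed model's running estimate
for z in data:
    # Each model predicts the observation *before* seeing it, then is scored
    # on how probable it found the actual value.
    log_ev_simple += log_gauss(z, 0.0, 2.0)         # vague "constant" model
    log_ev_detailed += log_gauss(z, mean_est, 0.5)  # sharper, adaptive model
    mean_est += 0.3 * (z - mean_est)                # crude online parameter fit

# The Bayes factor says how much more trust the data have earned one model
bayes_factor = math.exp(log_ev_detailed - log_ev_simple)
```

The model that keeps predicting well accumulates more evidence, so trust shifts toward it automatically; no one ever has to declare a winner by hand.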

For more discussions on state variables and filters, see also the comment I wrote here https://www.reddit.com/r/PredictiveProcessing/comments/o17eel/eli5_what_does_state_mean_in_laymans_terms/ and the linked post I wrote on Kalman filters a few years ago: https://hmbd.wordpress.com/2017/01/21/a-kalman-filter-can-do-interesting-things-like-filtering-poll-results/

👍︎︎ 4 👤︎︎ u/Daniel_HMBD 📅︎︎ Jun 19 2021 🗫︎ replies

If you are only puzzled by the 'as if' notion: I assume this is just a modest and careful expression. We cannot know whether the nervous system, or living beings in general, really follow the FEP, and whether information-theoretic concepts and thermodynamics really line up. Yes, the formulas are the same. Yes, we can derive a heuristic proof (Karl's words) from simple, quasi-self-evident principles. So it looks as if this works out, which suggests that FEP explanations are useful, especially in comparison to our previous models. If a system behaves as if it does X, you could also claim it does X. However, this requires further evidence and could even be a metaphysical question.

One additional example: Karl goes on to suggest that the nervous system recapitulates a deep hierarchical causal structure because the world is structured in this particular way. (This leads to the notion of a Markovian monism.) However, I like to think about this in the weaker 'as if' version too. There could be other reasons why our nervous systems need to model the external world in this hierarchical fashion (e.g., computational costs, or encoding complex probability distributions by simpler distributions). So it appears as if the world is isomorphic to the way we model it. Now, you have to decide if this is truly the case, and based on your own choice you will land somewhere on the realism-idealism spectrum.

That is why I like the 'as if' notion. You can decide for yourself to what degree you want to commit to the theory and still have something useful and relatable to work with.

👍︎︎ 3 👤︎︎ u/sweetneuron 📅︎︎ Jun 19 2021 🗫︎ replies
Captions
The Free Energy Principle originally emerged from systems neuroscience as a way, a principled way, of understanding what the brain does and how it does it. Subsequently, the principle proved to be so simple and so powerful that it has been applied in a variety of contexts. So one could almost regard the free energy principle as an organizing principle for any living system that shows the characteristics of life.

So, the reason I start like that is that there are two roads to explaining or understanding the free energy principle. You can either start from the perspective of people like Helmholtz in the 19th century trying to understand unconscious inference in the brain, and build a story through analysis by synthesis and psychology through to current and exciting developments in machine learning, things like Geoffrey Hinton's Helmholtz machine, and then how that has become contextualized in the enactivist or embodied cognition context. I'm generalizing these notions, and you end up with the free energy principle. Or you can start from the top and just ask very simple questions about what it is to be alive. And, if you are alive and you exist, what sorts of behaviors must you show? And in fact, if you answer those questions, you end up with exactly the same answers that you would have gotten had you followed the historical route.

For brevity, I'll take the high road. I'll go from the minimalist assumption that things exist and then try and unpack that, and show how one can get to notions of the brain as an inference engine, sometimes called the Bayesian brain hypothesis. The brain is one of the best examples of an organ that is actively constructing explanations through its own sampling of the world. So, this enactive perspective is very important, because not only does the brain then have to explain all the sensory input, but it also has to choose which sensory input to sample.
It is in charge of gathering information and evidence for its own predictions and its own beliefs about the world. But I've jumped ahead, so now I have to explain to you why it is that any system that exists will behave as if it has a model of the world and is trying to gather evidence for its own model of the world.

So, the story starts just by acknowledging that if you want to talk about something, there has to be a separation between the thing you are talking about and everything else. And, if there were no boundaries, there would be nothing, because there would be no distinction between the thing and not-that-thing. Statistically speaking, that distinction or that boundary is called a Markov blanket. It's just a mathematical way of separating the states of some abstract world system (organism, culture, life, cell, brain) into things that are internal to the boundary, owned by that system, and things that are outside the boundary, external to the system. So, it could be a cell and its milieu; it could be a phenotype; it could be me and my environment. Well, at any scale, there has to be this division. Now, the very existence of that separation, that Markov blanket, in conjunction with the assumption that the system exists over time, tells you something quite profound about the behavior of the internal states and the states that constitute the Markov blanket.

This is a bit abstract, but it is actually quite simple. The Markov blanket has two bits to it. There are the sensory states, which are defined just because they don't influence the external states but do influence the internal states. So sensory information, for example, would be mediated by sensory states as it gets from the outside world into my internal world, my brain. And there are active states that go in the other direction. So, they influence external states but are not influenced by external states.
They are actually dependent upon the internal states. If I take myself as a model of my world, my active states would be how I am currently moving, whereas my sensory states would be the activities of my photoreceptors, all those sensory organs and sensory epithelia I have at my disposal.

Let's put that Markov blanket aside for one moment and just think about what it means for a system to exist over periods of time. What that means is that it is effectively resisting dispersion by random fluctuations. Perhaps the simplest example would be: if I dropped or placed a drop of ink in a cup of water, then almost immediately it would start to disperse as random fluctuations disperse all the molecules around. And I would not call that drop of ink a living drop of ink, because it has dispersed. If, however, I placed a drop of ink in some water and then, to your amazement, you saw it gather itself up, then relax a bit, and gather itself up again, like it was breathing, as if time were reversed, you would say there's something very peculiar about that drop of ink. It's almost as if it were living, and you would quickly become convinced it was alive. And the only reason you would endow it with the property of self-organized life, biotic self-organization, is that it's not dispersing. And the only reason it's not dispersing is that all of its internal states and its Markov blanket, which separates it from the rest of the water, are moving toward the center of the drop. The flow of the molecules of the system is exactly countering the dispersive forces that are trying to disperse it throughout the water. Now that flow, operationally or mathematically, can provably be shown to be simply moving uphill on the probability distribution of where the ink molecules should be. And that probability distribution, mathematically, is also the same as something called Bayesian model evidence.
I can’t, I don’t have time to go into it, but it is a beautiful observation that the defining dynamics of any system that does not dissipate over time are that, on average, its states will flow so as to maximize model evidence, Bayesian model evidence. So, that means that if a system exists, then it will appear to maximize Bayesian model evidence; it will appear to be a little Bayesian engine. It will appear as if it has a model of its world. Why? Well, because of that system; let’s now go back to the Markov blanket that comprises the active and sensory states, and the internal states that are encompassed by the Markov blanket. The law, the rule, says that all of the states must maximize model evidence, which is also known as marginal likelihood, and whose negative logarithm is upper-bounded by free energy, hence the free energy principle. All of those states have to maximize marginal likelihood or minimize free energy, including action. That means actions and sensations and the internal states are all doing the same thing. Which means that we can understand the internal states, say of the brain, as modeling the world, because they are maximizing the Bayesian model evidence for a model of the world, or of me. At the same time, my action is also trying to maximize the evidence for my model of the world.

So, put very simply, almost by definition, I am in the game of garnering information that maximizes the evidence of my own existence, and that’s basically the free energy principle. It’s a corollary or a consequence of any system that doesn’t dissipate: it looks as if it has to behave as if it is actively soliciting information from the environment and modeling that information as a model of the environment, to maximize the evidence for its own existence.
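The bound being referred to here is the standard variational bound, which can be written out as follows (with s the sensory data, ψ the hidden external states, m the model, and q an approximate posterior; this is textbook notation, not a quote from the talk):

```latex
% Variational free energy F for model m, data s, hidden states \psi,
% and approximate posterior q(\psi):
F = \mathbb{E}_{q(\psi)}\big[\ln q(\psi) - \ln p(s,\psi \mid m)\big]
  = \underbrace{-\ln p(s \mid m)}_{\text{negative log evidence}}
    + \underbrace{D_{\mathrm{KL}}\big[q(\psi)\,\big\|\,p(\psi \mid s,m)\big]}_{\geq\, 0}
\quad\Longrightarrow\quad F \,\geq\, -\ln p(s \mid m).
```

Because the KL divergence is non-negative, free energy F upper-bounds the negative log evidence (surprisal); minimizing F therefore maximizes a lower bound on the log marginal likelihood ln p(s | m).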
And that’s where we started, with the long history of Helmholtz’s notion of unconscious inference right through to modern-day machine learning formulations, for example, the Helmholtz machine of Geoffrey Hinton and Peter Dayan.

That can be unpacked at many, many different levels, and it has provided a very useful framework within which to understand how that free energy principle is complied with by the biology and the anatomy and the physiology of the brain. What it tells you is that the anatomy of any system has to contain within it a model of the environment in which that system is immersed. Which means that if we live in a world that has some deep hierarchical structure, in which there is action at a distance, for example, the color of objects around me is determined by the incident light as it comes almost instantaneously to my eye, or a falling body is caused by gravity, then my brain must recapitulate that causal structure, and of course, it does.

The very fact that we have nerve cells with long slender connections, connecting each other at a distance, speaks exactly to the fact that the causal architectures of the world we inhabit have this action at a distance and this sparse connectivity. Furthermore, the hierarchical structure of the world is recapitulated in the neuronal structures that constitute the hierarchies of the connectome, or the hierarchical disposition of functionally specialized brain areas.

You can go further; if the brain is truly a statistical model of the world it inhabits, can we understand some fundaments of brain organization, such as the distinction between what and where streams in the brain? So a very powerful observation, a principle of functional specialization, is that ‘what’ processing is handled by a ventral stream of brain areas, roughly down here, and a more dorsal stream is concerned with ‘where’.
That may be a simple reflection of the fact that we live in a universe where different things can be in different positions, so that we can statistically separate the whatness from the whereness. If we lived in a universe where, whenever something moved, it also changed its nature, we couldn’t do that. So, just by looking at the brain, I can tell you the sort of universe that you inhabit, under the free energy principle, under the assumption that your brain has become a model of the environment that it inhabits.

The free energy principle has been quite useful from my perspective and that of my colleagues, largely because it shows the connections between previous theories. There are many global brain theories that have been brought to bear: for example, the principle of minimum redundancy, maximum efficiency, and notions of the brain extracting as much information as it can from the environment.

There are other theories that speak to how we select and value certain behaviors. It’s useful to see how all of these become special cases of a variational principle, one of which is the free energy principle. Which means that you can now talk to different disciplines and see how theoretical and/or empirical evidence for one particular construct speaks to another theoretical construct, and essentially see how they are approaching the same problem from different perspectives. Because you’ve got a principled framework, it also allows you to make very particular hypotheses about the process theories that would conform to the principle.

So, all I’ve said so far is that, in principle, every internal state, every action that I make, and every sensation that I gather should be in the service of minimizing variational free energy or maximizing the marginal likelihood. How? How do you do that? How does a brain do that?
But, if you know what the objective function is, if you know what the process is and what the imperatives are, you can then cast it in terms of processes. For example, I can say: “Well, this minimization of variational free energy, or maximization of Bayesian model evidence, is a hill-climbing or gradient descent algorithm.” So, I can now write down a differential equation where everything, every neuronal state and physiological variable in the brain, now becomes describable as a differential equation given other states in the brain. And if that equation is true, then I can now go and map the variables to physiological processes.

And if one plays that game, you can go an enormous way in starting to understand not just anatomy but also physiology, and you can also generate questions, because there are alternative processes that don’t conform to the same principle. So does the brain use sampling techniques to maximize model evidence, or does it use hill-climbing optimization schemes and variational schemes? So, you start to generate a whole raft of testable hypotheses pertaining to the process theory that are all consistent with the overarching principle.
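As a toy illustration of that "gradient descent on free energy" idea (a sketch under invented numbers, not Friston's actual scheme): a single internal state μ descends the gradient of a free energy built from two precision-weighted prediction errors, one against a sensory sample and one against a prior expectation.

```python
# Toy quadratic free energy: two precision-weighted squared prediction errors.
# s is a sensory sample, prior the prior expectation; all values are made up.
s, prior = 2.0, 0.0
sigma_s, sigma_p = 1.0, 1.0          # sensory and prior variances

def free_energy(mu):
    return (s - mu) ** 2 / (2 * sigma_s) + (mu - prior) ** 2 / (2 * sigma_p)

def dF_dmu(mu):
    # analytic gradient of the expression above
    return -(s - mu) / sigma_s + (mu - prior) / sigma_p

mu, dt = 0.0, 0.05
for _ in range(500):
    mu -= dt * dF_dmu(mu)            # Euler step on d(mu)/dt = -dF/dmu
```

With equal variances the state settles midway between prior and data (μ = 1.0 here), which is the Bayes-optimal posterior mean for this toy Gaussian case, so the differential equation and the inference story coincide.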
Info
Channel: Serious Science
Views: 116,732
Keywords: science, lecture, Serious Science, brain, neuroscience, energy, cell
Id: NIu_dJGyIQI
Length: 15min 12sec (912 seconds)
Published: Fri Jun 16 2017