The Free Energy Principle originally
emerged from systems neurosciences as a way, a principled way, of understanding what the
brain does and how it does it. Subsequently, the principles proved to be so simple
and so powerful that they have been applied in a variety of contexts. So one could
almost regard the free energy principle as an organizing principle for any living system
that shows the characteristics of life. So, the reason I start like that is that there are
two roads to explaining or understanding the free energy principle. You can either start from the
perspective of people like Helmholtz in the 19th Century trying to understand unconscious inference
in the brain and build a story through analysis by synthesis and psychology through to current and
exciting developments in machine learning – things like Geoffrey Hinton’s Helmholtz machine. And
then how that has become contextualized in the enactivist or the embodied cognition
context. I’m generalizing these notions, and you end up with the free energy principle,
or you can start from the top and just ask very simple questions about what it is to be
alive. And, if you are alive and you exist, what sorts of behaviors must you show? And in
fact, if you answer those questions, you end up with exactly the same answers that you would
have gotten had you followed the historical route. For brevity, I’ll take the high road. I’ll go from
the minimalist assumption that things exist and then try and unpack that and show how one can get
to notions of the brain as an inference engine, sometimes called the Bayesian brain hypothesis.
The brain is one of the best examples of an organ that is actively constructing explanations
through its own sampling of the world. So, this enactive perspective is very important because
not only does the brain then have to explain all the sensory input, but it also has to choose
which sensory input to sample. It is in charge of gathering information and evidence for its own
predictions and own beliefs about the world. But I’ve jumped ahead, so now I have to explain to you
why it is that any system that exists will behave as if it has a model of the world and is trying
to gather evidence for its own model of the world. So, the story starts just by acknowledging
that if you want to talk about something, there has to be a separation between the thing
you are talking about and everything else. And, if there were no boundaries, there would be nothing
because there would be no distinctions between the thing and not that thing. Statistically speaking,
that distinction or that boundary is called a Markov blanket. It’s just a mathematical way of
separating the states of some system in the world (an organism, a culture, a form of life, a cell, a brain) into things
that are internal to the boundary that is owned by that system and things that are outside the
boundary that are external to the system. So, it could be a cell and its milieu; it could be
a phenotype; it could be me and my environment. Well, at any scale, there has to be this division.
Now, the very existence of that separation, that Markov blanket, in conjunction with the
assumption that that system exists over time, tells you something quite profound about
the behavior of the internal states and the states that constitute the Markov blanket.
This is a bit abstract, but it is actually quite simple. The Markov blanket has two bits to
it. There are the sensory states, defined simply by the fact that they don’t influence the external
states, but they do influence the internal states. So sensory information, for example, would be
mediated by sensory states as they get from the outside world into my internal world, my brain.
And there are active states that go in the other direction. So, they influence external states but
are not influenced by external states. They are actually dependent upon the internal states.
If I take myself as a model of my world, my active states would be how I am currently moving,
whereas my sensory states would be the activities of my photoreceptors and all the sensory organs
and sensory epithelia I have at my disposal. Let’s put that Markov blanket aside for one moment
and just think about what it means for a system to exist over periods of time. What that means is
that it is effectively resisting dispersion by random fluctuations. Perhaps the simplest
example would be if I dropped or placed a drop of ink in a cup of water, then almost immediately,
it would start to disperse as random fluctuations disperse all the molecules around. And, I
would not call that drop of ink a living drop of ink because it has dispersed. If, however,
I placed a drop of ink in some water and then, to your amazement, you saw it gather itself up,
then relax a bit, and gather itself up again, like it was breathing, as if time were reversed,
you would say there’s something very peculiar about that drop of ink. It’s almost as if it were
living, and you would quickly become convinced it was alive. And the only reason you would endow
it with the property of self-organized life, biotic self-organization, is that it’s not
dispersing. And the only reason it’s not dispersing is that all of its internal states and
its Markov blanket that separates it from the rest of the water are moving toward the center of the
drop. The flow of the molecules of the system is exactly countering the dispersive forces that
are trying to disperse it throughout the water. Now that flow, operationally or mathematically,
can provably be shown to be simply moving uphill on the probability distribution of where the
ink molecules should be. And that probability distribution, mathematically, is the same as
something called Bayesian model evidence. I don’t have time to go into it here, but
it is a beautiful observation that the defining dynamics of any system that does not dissipate
over time is that they, on average, will move or their states will flow so as to maximize model
evidence, Bayesian model evidence. So, that means that if a system exists, then it will appear to
maximize Bayesian model evidence; it will appear to be a little Bayesian engine. It will appear
as if it has a model of its world. Why? Well, to see that, let’s now go back to
the Markov blanket, which comprises the active and sensory states, and to the internal states
that are encompassed by it. The rule says that all of these states must maximize model
evidence, also known as the marginal likelihood; the negative log of the model evidence is, in
turn, upper-bounded by a quantity called free energy, hence the free energy principle. All of
those states have to maximize marginal likelihood, or equivalently minimize free energy,
including action. That means actions, sensations, and the internal states are all doing the
same thing. Which
means that we can understand the internal states, say of the brain, as modeling the world
because they are maximizing the Bayesian model evidence for a model of the world or me. At the
same time, my action is also trying to maximize the evidence for my model of the world.
So, put very simply, almost by definition, I am in the game of garnering information that
maximizes the evidence of my own existence, and that’s basically the free energy principle.
It’s a corollary or a consequence of being any system that doesn’t dissipate: such a system looks
as if it behaves by actively soliciting information from the environment and modeling
that information, building a model of the environment, to maximize the evidence for its own existence.
And that’s where we started with the long history of Helmholtz’s notion of unconscious inference
right through to modern-day machine learning formulations, for example, the Helmholtz
machine of Geoffrey Hinton and Peter Dayan. That can be unpacked at many, many different
levels, and it has provided a very useful framework within which to understand how that
free energy principle is complied with by the biology and the anatomy and the physiology
of the brain. What it tells you is that the anatomy of any system has to contain within it a
model of the environment in which that system is immersed. Which means that if we live in a
world that has some deep hierarchical structure, in which there is action at a distance, for
example, the color of objects around me is determined by the incident light as it comes almost
instantaneously to my eye, or a falling body is caused by gravity, then my brain must recapitulate
that causal structure, and of course, it does. The very fact that we have nerve cells
with long slender connections linking each other at a distance speaks exactly to the fact
that the causal architectures of the world we inhabit have this action at a distance
and this sparse connectivity. Furthermore, the hierarchical structure of the world is
recapitulated in the neuronal structures that constitute the hierarchies of the
connectome or the hierarchical disposition of functionally specialized brain areas.
You can go further; if the brain is truly a statistical model of the world it inhabits, can we
understand some fundamentals of brain organization, such as the distinction between what and
where streams in the brain? So a very powerful observation, a principle of functional
specialization, is that a ventral stream of brain areas is concerned with what things are,
while a more dorsal stream is concerned with where they are. That may be a simple reflection of the fact that
we live in a universe where different things can be in different positions so that we can
statistically separate the whatness from the whereness. If we lived in a universe where
whenever something moved, it also changed its nature, we couldn’t do that. So, just by looking
at the brain, I can tell you the sort of universe that you inhabit under the free energy principle,
under the assumption that your brain has become a model of the environment that it inhabits.
The free energy principle has been quite useful from my perspective and that of my
colleagues largely because it shows the connections between previous theories. There
are many global brain theories that have been brought to bear. For example, the principle
of minimum redundancy, maximum efficiency, notions of the brain extracting as much
information as it can from the environment. There are other theories that speak to how we
select and value certain behaviors. It’s useful to see how all of these become special cases
of one variational principle, namely the free energy principle. Which means that you
can now talk to different disciplines and see how one particular construct, theoretical and or
empirical evidence speaks to another theoretical construct and essentially see how they are
approaching the same problem from different perspectives. Because you’ve got a principled
framework, it also allows you to make a very particular hypothesis about the process
theories that would conform to the principle. So, all I’ve said so far is that,
in principle, every internal state, every action that I make, and every
sensation that I gather should be at the service of minimizing variational free energy
or maximizing the marginal likelihood. How? How do you do that? How does a brain do that? Well,
if you know what the objective function is, if you know what the process is, and what the
imperatives are, you can then cast it in terms of processes. For example, I can say: “well,
this minimization of variational free energy or maximization of Bayesian model evidence is a
hill-climbing or gradient descent algorithm.” So, I can now write down a differential
equation in which everything, every neuronal state and physiological variable in the
brain, becomes describable in terms of the other states in the brain. And
if that equation is true, then I can go and map its variables onto physiological processes.
And if one plays that game, you can go an enormous way in starting to understand not just anatomy but
also physiology, and you can also generate questions, because there are alternative processes that
conform to the same principle. For example, does the brain use sampling techniques to maximize
model evidence, or does it use hill-climbing, variational optimization schemes? So,
you start to generate a whole raft of testable hypotheses pertaining to process theories that
are all consistent with the overarching principle.
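The process theory sketched here, free energy minimization as gradient descent, can be made concrete with a toy example (my own illustration with made-up numbers, not code from Friston or colleagues): a single internal state descends the free energy gradient of a one-dimensional Gaussian model and settles on the exact Bayesian posterior mean.

```python
# Toy sketch: gradient descent on variational free energy for a 1-D
# Gaussian model. All numbers are illustrative assumptions.
mu_prior, s_prior = 0.0, 1.0   # prior belief p(x) = N(0, 1)
s_obs = 0.5                    # likelihood p(y|x) = N(x, 0.25)
y = 2.0                        # an observed sensory datum

def dF_dphi(phi):
    # Gradient of free energy for a point estimate phi: the sum of
    # precision-weighted prediction errors from likelihood and prior.
    return (phi - y) / s_obs**2 + (phi - mu_prior) / s_prior**2

phi = 0.0                      # internal state: current best guess about x
for _ in range(5000):
    phi -= 0.01 * dF_dphi(phi) # flow down the free energy gradient

# The flow settles on the exact Bayesian posterior mean:
posterior_mean = (y / s_obs**2 + mu_prior / s_prior**2) / \
                 (1 / s_obs**2 + 1 / s_prior**2)
print(phi, posterior_mean)
```

Minimizing free energy here coincides with maximizing model evidence: the internal state ends up exactly where the Bayesian posterior puts it.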
I believe the best way to think about the FEP is "one layer more meta than predictive processing" - it's a very general principle and you can derive useful things from it. This very meta-ness appears to be very attractive to philosophers (e.g. most of the predictive mind papers cover the FEP) but also makes it very difficult to apply practically.
In the video, Friston motivates the FEP with the reasoning that living creatures have maintained their boundaries over time. The reasoning goes like this (my summary, not his):

1. Every living creature is alive because it didn't die in the past (the same goes for its ancestors before reproduction). Most parts of the world and most situations are dangerous, so by just randomly fooling around, a creature probably dies.

2. There's a boundary between you and the world around you (and likewise for any other living creature). If this boundary gets destroyed, you die (e.g. if you accidentally run into a knife). This boundary also defines your ability to sense the world around you (biology: senses / systems theory: inputs) and your ability to manipulate the world (biology: action / systems theory: outputs).

3. Following from 1) and 2): there's an evolutionary drive to ensure survival, but the world is separated from you. Any living creature shaped by evolution adapted to this by developing mechanisms that internally mirror and model the world around it, to increase its chances of avoiding harmful situations. Hence the "avoidance of entropy" and "minimise a quantity called 'free energy'" (which amounts to "adapt your internal model to optimally represent and predict the world around you").
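The boundary story in 2) can be turned into a tiny simulation (entirely my own toy illustration, not a standard model): internal and external states never interact directly, only via sensory and active states, and the resulting action loop counters perturbations of the external state.

```python
# Markov-blanket dependency structure as a toy control loop.
# sensory <- external, internal <- sensory, active <- internal,
# external <- active: internal and external states never touch directly.
def step(external, sensory, internal, active):
    sensory = external                         # sensing (input)
    internal = 0.5 * internal + 0.5 * sensory  # leaky inference about the world
    active = -internal                         # act against the inferred state
    external = external + 0.1 * active         # action perturbs the world
    return external, sensory, internal, active

external, sensory, internal, active = 1.0, 0.0, 0.0, 0.0  # start perturbed
for _ in range(200):
    external, sensory, internal, active = step(external, sensory, internal, active)
print(abs(external) < 1e-6)  # the loop has pulled the perturbation back to 0
```

The point is only the dependency structure: the internal state "knows" the world solely through the sensory state, and it changes the world solely through the active state.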
So from this understanding, evolution has shaped brains to model the world around them as if they were following a general rule of "minimize surprisal" or "minimize free energy" or "maximize marginal likelihood". This is really difficult to comprehend, but I at least have a rough understanding of what Friston might mean with "Bayesian model evidence", so I'll briefly try to explain that. Note that this is a very technical thing and I'll try to explain it as best as I can in both engineering-related and layman terms.
There's the notion of filters (easiest version from math-side is a Kalman filter, non-linear versions are called generalized filters) that tune the internal variables of a model to incoming measurement data. Think of it like this: - you have a mathematical model of something, e.g. a pendulum (state is described by angle position and velocity) - you get input, e.g. from an acceleration sensor (measured acceleration) - the filter will automatically estimate the current position and velocity and automatically update as new data is received - most importantly, the filter is a probabilistic estimate - it includes not only a "best guess" estimate but also a range of uncertainty
Now, assume you have several possible models to use, maybe a very simple model vs. one that includes friction and nonlinearities. Which model should you use? This is where model evidence comes into play: we can run all models in parallel and ask each model to predict upcoming measurements, then shift trust between the models based on how well they are currently performing. This should lead to more trust being placed on the simple model in early tuning stages (as it'll zoom in on good parameter sets very early and "roughly get it right"), with the more detailed model slowly taking over as more data comes in and it has had enough time to fit its internal prediction structure. This process of "model comparison" is what Friston refers to with "Bayesian model evidence" (if I'm not missing something).
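A bare-bones sketch of that model comparison (again with my own toy numbers): two candidate models predict the same data stream, and Bayes' rule shifts trust toward whichever predicts better.

```python
import math

def gauss_pdf(x, mu, var):
    # Likelihood of observing x under a Gaussian prediction N(mu, var).
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

models = {"A": (0.0, 1.0), "B": (1.0, 1.0)}  # each predicts N(mean, variance)
posterior = {"A": 0.5, "B": 0.5}             # start with equal trust

for y in [0.9, 1.2, 1.0, 0.8, 1.1]:          # data stream favouring model B
    # Bayes: weight the current trust by how well each model predicted y ...
    evidence = {m: posterior[m] * gauss_pdf(y, mu, var)
                for m, (mu, var) in models.items()}
    # ... then renormalize so the trusts sum to one again.
    total = sum(evidence.values())
    posterior = {m: e / total for m, e in evidence.items()}

print(posterior)  # most of the trust has shifted onto model B
```

Real model comparison also rewards simplicity (a model with fewer parameters spreads its predictions less thinly), which is what makes model evidence more than a mere goodness-of-fit score.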
For more discussions on state variables and filters, see also the comment I wrote here https://www.reddit.com/r/PredictiveProcessing/comments/o17eel/eli5_what_does_state_mean_in_laymans_terms/ and the linked post I wrote on Kalman filters a few years ago: https://hmbd.wordpress.com/2017/01/21/a-kalman-filter-can-do-interesting-things-like-filtering-poll-results/
If you are only puzzled by the 'as if' notion: I assume this is just a modest and careful expression. We cannot know if the nervous system or living beings in general really follow the FEP and whether information-theoretical concepts and thermodynamics really line up. Yes, the formulas are the same. Yes, we can derive a heuristic proof (Karl's words) from simple, quasi self-evident principles. So it looks as if this works out, which suggests that FEP explanations are useful - especially in comparison to our previous models. If a system behaves as if it does X, you could also claim it does X. However, this requires further evidence and could even be a metaphysical question.
One additional example: Karl goes on to suggest that the nervous system recapitulates a deep hierarchical causal structure because the world is structured in this particular way. (This leads to the notion of a Markovian monism.) However, I like to think about this in the weaker 'as if' version too. There could be other reasons why our nervous system needs to model the external world in this hierarchical fashion (e.g., computational costs, or the encoding of complex probability distributions by simpler ones). So it appears as if the world is isomorphic to the way we model it. Now you have to decide if this is truly the case, and based on your own choice you will land somewhere on the realism-idealism spectrum.
That is why I like the 'as if' notion. You can decide for yourself to what degree you want to commit to the theory and still have something useful and relatable to work with.