Welcome back. Today I'm really excited to tell you about an emerging field of machine learning where we learn coordinate systems and dynamics at the same time for complex systems. In particular, I'm going to talk about using deep learning to discover coordinate systems in which you get simple, or parsimonious, representations of the dynamics in those coordinates.

I'm mostly going to be talking about autoencoder networks. Here you might have some high-dimensional state x that describes your system; maybe this is a video you're recording of your system, and you're trying to learn the dynamics of a pendulum swinging, or of some other phenomenon you're observing. With deep learning we're going to discover a coordinate embedding phi and a decoder psi, so that you can map into a latent space z that carries the minimal essential information you need to describe your system. Specifically, we're going to train these autoencoder networks so that we can represent the dynamics in that latent space very efficiently, or simply. So in this latent space z, which is like the minimal description of your system through this autoencoder network, we're looking for a dynamical system z dot = f(z), where f is as simple as possible.

This is really an important dual problem. On the one hand you have the coordinate-system discovery, the phi and psi maps, and on the other hand you have the dynamical-system discovery, where we're trying to learn an efficient model f that accurately describes how the system evolves in time. I'm going to walk you through a number of key architectures for how you can do this with deep learning, what restrictions we can put on f, phi, and psi, and also tie this a little bit to the history of coordinate discovery in dynamical systems.
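To fix notation, here is the setup written out (this is my own summary of the picture described above, not a formula taken from the slides):

```latex
\[
z = \varphi(x), \qquad \hat{x} = \psi(z) \approx x, \qquad \dot{z} = f(z),
\]
```

where the encoder \(\varphi\), the decoder \(\psi\), and the latent dynamics \(f\) are all learned, and \(f\) is encouraged to be as simple as possible.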
Good. And I think this is a good example motivating what we would like to do. Maybe we have a video of some complex system; in this case it's just a pendulum in my lab. What you would like to be able to do, from this movie, this megapixel image evolving in time, is discover automatically, without a human expert in the loop, that there is a key variable theta that describes the system: the minimal descriptive variable for the motion of this pendulum is the angle theta. And you'd also like to discover that the dynamical system governing the motion of this pendulum, the evolution of theta, is given by a simple differential equation.
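For reference, the standard model for an ideal pendulum of length L under gravity g (the talk points to the slide rather than writing it out, so this is the textbook form) is:

```latex
\[
\ddot{\theta} = -\frac{g}{L}\,\sin\theta .
\]
```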
So that's, at a very high level, what we'd like to be able to do. My colleague Nathan Kutz sometimes calls this "GoPro physics": you'd like to be able to have just a movie, watching the world, and learn the physics. Learn how clouds are evolving, learn how pendulums swing, learn how balls drop according to F = ma. That's what we want to do, but it is a difficult task to learn these coordinates and to learn these dynamics.
Good. So let me zoom out a little bit and talk about the general challenges we have when we model dynamical systems x dot = f(x). Most of the world around us is dynamic; it's evolving in time and in space, whether you're thinking about a fluid flow, an autonomous vehicle, the state of your brain (your collection of neurons), a weather system, or a disease model. These are all dynamical systems that change in time according to some rules on the right-hand side, some physics f(x). And we face some big challenges. One is that often we don't know this model f(x); we don't have a good model for the dynamics, so we need to discover it, and the brain is a good example of that. Nonlinearity is another key challenge: even a small amount of nonlinearity really hampers our ability to simulate, estimate, predict, and control these systems in the real world from limited measurements, and this is somewhere that finding good coordinate transformations can make a big difference in simplifying nonlinear dynamics. And the third challenge we very frequently face in the real world is high dimensionality. Oftentimes we do have this camera, this movie, with a very high-dimensional pixel space, or we have a simulation or a measurement of a fluid flow with thousands or millions of degrees of freedom. So we want to leverage the fact that patterns exist in that data to find those low-dimensional representations, those autoencoder coordinates I showed you, so that we can discover the dynamics and maybe even handle some of the nonlinearity. So that's a very high-level overview of some of the things we want to do with these autoencoder networks.
And again, I'm going to use the example of fluids; it's one of my favorite examples. We know that even in very high-dimensional systems like weather systems, or more classical fluid flows, even though we would require hundreds of thousands, millions, or billions of degrees of freedom to represent them in a computer or with our satellite images, there are low-dimensional patterns and a few degrees of freedom that matter. I really like this video, which shows that kind of low-dimensional pattern in a very high-dimensional system.

So I'm going to motivate these deep autoencoder networks more classically, through the singular value decomposition. This goes by many names in different fields: singular value decomposition, principal component analysis, proper orthogonal decomposition. If you have some complex system that you believe has low-dimensional patterns, you can compute the singular value decomposition of a data matrix and decompose this movie into a linear combination of a very small number of modes, or eigen-states, that capture most of the energy or variance of the system. The way I like to think about this, it is a data-driven generalization of the Fourier transform; it's essentially a Fourier transform tailored to this very specific problem, to the specific data you've collected.

Now, through the lens of modern neural networks, you can look at the SVD, or principal component analysis, as a shallow, linear autoencoder network. You have a high-dimensional state x, which might be my whole flow field stacked into a column vector, and I'm trying to find some latent representation, some minimal set of variables z, so that if I compress down to z, I can lift back up to the original state and recover as much information about that high-dimensional state as possible. Here the encoder and decoder are literally built from the truncated SVD modes (the leading columns of U and their transpose), and the latent space z would be the amplitudes of those modes, the POD modes. Now, in practice I would not recommend actually using a neural network to compute the SVD; it's way more efficient to compute it with a QR factorization or some other classical numerical linear algebra technique. But as an abstraction, we can think of the SVD as a very, very simple neural network.
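As a concrete illustration of that linear autoencoder view, here is a minimal NumPy sketch that builds a rank-r POD encoder and decoder from a snapshot data matrix; the data, sizes, and names are illustrative assumptions, not from the talk:

```python
import numpy as np

# Hypothetical data matrix: each column is one snapshot (e.g., a flattened video frame).
n_pixels, n_snapshots, r = 4096, 200, 3
X = np.random.randn(n_pixels, n_snapshots)  # stand-in for real measurement data

# SVD of the data matrix; the leading columns of U are the POD modes.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Ur = U[:, :r]                      # rank-r modal basis

def encode(x):
    return Ur.T @ x                # latent coordinates z = mode amplitudes

def decode(z):
    return Ur @ z                  # reconstruction in the original pixel space

x = X[:, 0]
z = encode(x)                      # r-dimensional latent representation
x_hat = decode(z)
print(z.shape, np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```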
So what we're going to do is generalize this linear coordinate embedding (and again, this is a coordinate system in which to represent your dynamics) by making it a deep neural network. Instead of one latent-space layer, now we're going to have many hidden layers for the encoder and many hidden layers for the decoder, and our activation units, our nodes, are going to have nonlinear activation functions. What this allows us to do, instead of learning a linear subspace where our data is efficiently represented, is learn a nonlinear manifold, parameterized by these coordinates z, where our data is efficiently represented. And often that means we can get a massive reduction in the degrees of freedom needed to describe this latent space z.

Good. So this is what the abstract autoencoder network is trying to represent: usually I have many, many hidden layers in phi and in psi, and I have nonlinear activation functions, so that z parameterizes a manifold. And we can learn these essentially by constructing this architecture in TensorFlow or PyTorch or whatever neural-network training framework you like, and cooking up a loss function so that the output minus the input has the smallest error possible, given that we've choked the information down to maybe three latent variables in z. So you fix the number of latent variables, you fix the architecture, and you train this so that it minimizes the error between the output and the input; it then learns the latent features of that input.
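To make that concrete, here is a minimal PyTorch sketch of such an autoencoder trained with a plain reconstruction loss; the layer widths, the three-dimensional latent space, and the synthetic data are illustrative assumptions, not the architecture from the talk:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_input=1024, n_hidden=128, n_latent=3):
        super().__init__()
        # Encoder phi: high-dimensional state x -> latent coordinates z
        self.phi = nn.Sequential(
            nn.Linear(n_input, n_hidden), nn.ELU(),
            nn.Linear(n_hidden, n_hidden), nn.ELU(),
            nn.Linear(n_hidden, n_latent))
        # Decoder psi: latent coordinates z -> reconstruction of x
        self.psi = nn.Sequential(
            nn.Linear(n_latent, n_hidden), nn.ELU(),
            nn.Linear(n_hidden, n_hidden), nn.ELU(),
            nn.Linear(n_hidden, n_input))

    def forward(self, x):
        z = self.phi(x)
        return self.psi(z), z

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
X = torch.randn(256, 1024)              # stand-in snapshots of the high-dimensional state

for epoch in range(10):
    x_hat, z = model(X)
    loss = ((x_hat - X) ** 2).mean()    # reconstruction loss ||x - psi(phi(x))||^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```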
And so, as I mentioned earlier, what we're going to do now is train these autoencoder networks not just to represent the data in x efficiently, but also to predict how the dynamics in that latent space evolve, accurately and efficiently, forward in time. So now we're going to be learning not just the encoder and the decoder, but also a representation of the dynamical system f that evolves the system forward in time. And this is becoming a big deal in the machine learning community, because now we're able to start understanding the physics of how our movies, our high-dimensional inputs, are evolving in time.

I like to give this example; I think it's extremely motivating. What you have here is this great visualization of the geocentric view of the solar system. The blue dot is Earth, the yellow dot is the sun, and the red dots are the other planets in the solar system. If we have a naive coordinate system where the Earth is at the center of the solar system, then you see that you actually get extremely complex dynamics. These are really complicated circles on circles, and there's no obvious simple law that describes the motion of all of these bodies in this coordinate system. But if we change our coordinates, so that now we put the sun at the center of the solar system and every other planet goes around the sun, this becomes a much, much simpler dynamical system, and it is amenable to discovering the basic laws of physics, F = ma and Kepler's laws of motion. So I think this is an extremely motivating example showing that when you have the right coordinate system, you have some hope of learning a dynamical system f that's simple. I could try to learn a dynamical system that describes the evolution in the geocentric frame, but that differential equation is going to be much more complicated, harder to interpret, understand, and generalize. So getting the right coordinate system is often essential to learning the right dynamics, and that has been borne out over and over throughout the history of physics.

Good, so that's what we're trying to do here: we're going to learn the coordinate system that simplifies the dynamics. Now I'm going to walk you through a few examples of how we've been doing this, and what some of the challenges and opportunities are.
One of my favorite networks here was developed by Kathleen Champion when she was a PhD student with Nathan and me. It essentially combines SINDy, the sparse identification of nonlinear dynamics, with an autoencoder, to learn a dynamical system in the latent space of that autoencoder, and I think it's a really clever network design. Essentially what Kathleen is doing is learning these encoder and decoder networks, which are big, deep neural networks, and there are additional loss functions in the neural network training so that the dynamics in z are a sparse linear combination of candidate nonlinearities in a library. We basically build a big library of the possible terms that could describe the dynamics in z, and then we try to find the fewest, the sparsest, combination of those terms that describes z1 dot, z2 dot, and z3 dot in the latent space.

To do this, Kathleen had to introduce a number of extra loss functions to make this SINDy autoencoder converge to the right solution. There's the original reconstruction loss: if you just remove the dynamics in the middle, you still want your autoencoder to faithfully capture your data and find a good latent space. But then we have these additional SINDy losses and the SINDy regularization. The L1 term at the end tries to make this set of coefficient column vectors as sparse as possible, so that we have as few terms in the dynamics as possible; that makes our dynamical system interpretable and simple, more like F = ma than some big complicated expression. And then these additional two loss functions take different parts of the network and make sure that, if I compute x dot here, it's the same as if I took z dot here and mapped it out, and so on. Essentially, you can compute chain rules on different sub-networks and make sure that the dynamics, the time derivatives, across those different pieces of the network are consistent.
That essentially makes sure you don't get into weird issues where you just try to shrink z as small as possible, which would in principle also shrink the coefficients. So this is a really cool network that Kathleen developed, and she was able to learn lots of very sparse, parsimonious dynamical systems in these latent spaces for complex systems; one of her examples was actually a movie of a pendulum, a synthetic movie of a pendulum.
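Here is a rough sketch of how the combined loss just described might be assembled in PyTorch. This is not Kathleen Champion's exact implementation; the quadratic library, the loss weights, and the use of `torch.autograd.functional.jvp` for the chain-rule terms are my illustrative assumptions:

```python
import torch
import torch.nn as nn
from torch.autograd.functional import jvp

# Toy setup: encoder/decoder MLPs, a 3-d latent space, and a quadratic candidate library.
n, d = 128, 3
phi = nn.Sequential(nn.Linear(n, 64), nn.ELU(), nn.Linear(64, d))   # encoder
psi = nn.Sequential(nn.Linear(d, 64), nn.ELU(), nn.Linear(64, n))   # decoder

def theta(z):
    # Library of candidate terms: constant, linear, and quadratic in z.
    quad = (z.unsqueeze(-1) * z.unsqueeze(-2)).flatten(-2)
    return torch.cat([torch.ones_like(z[..., :1]), z, quad], dim=-1)

Xi = nn.Parameter(torch.zeros(theta(torch.zeros(1, d)).shape[-1], d))  # sparse coefficients

def sindy_autoencoder_loss(x, x_dot, lam=(1.0, 1e-4, 1e-4, 1e-5)):
    z = phi(x)
    x_hat = psi(z)
    z_dot_sindy = theta(z) @ Xi                                   # latent dynamics from the library
    _, z_dot = jvp(phi, (x,), (x_dot,), create_graph=True)        # chain rule: z_dot = d(phi)/dx * x_dot
    _, x_dot_hat = jvp(psi, (z,), (z_dot_sindy,), create_graph=True)  # map latent dynamics back to x_dot
    recon = ((x - x_hat) ** 2).mean()                             # reconstruction loss
    dz = ((z_dot - z_dot_sindy) ** 2).mean()                      # SINDy loss on z_dot
    dx = ((x_dot - x_dot_hat) ** 2).mean()                        # consistency loss on x_dot
    sparsity = Xi.abs().sum()                                     # L1 regularization on coefficients
    return lam[0] * recon + lam[1] * dz + lam[2] * dx + lam[3] * sparsity

x, x_dot = torch.randn(32, n), torch.randn(32, n)   # stand-in snapshots and their time derivatives
loss = sindy_autoencoder_loss(x, x_dot)
```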
So this is a really nice way to learn those dynamics and coordinates. But what if we want to do more than just learn a sparse nonlinear model? What if we actually want to find a coordinate system that linearizes our dynamics? We know from Koopman operator theory that this is often possible. Now, there are lots of caveats, but there are often these special coordinate systems phi such that if I take my original dynamics and map them into the latent space z, I might even be able to expect a linear dynamical system, z dot = Lz. Now, why would I want this? Linear dynamics are generically solvable with textbook methods; we can get optimal linear estimators and controllers with one line of MATLAB or Python. So any time we can make our system linear, we gain a tremendous amount.

Now, the big, big caveat here is that this puts a tremendous amount of strain on the encoder and decoder networks. Finding a coordinate system like in the SINDy autoencoder, where we have sparse dynamics, is not that bad a job for the coordinate system: we can find coordinates where there are a few sparse nonlinear terms, and this is related to normal forms from dynamical systems theory. But if we want to actually remove all of those nonlinear terms and make the system truly linear, that's a much, much stronger set of constraints that we're putting on the encoder and decoder networks. Generally speaking, these are much harder to train, much more expensive, and phi might be a much nastier function in order to get these linear dynamics. The good thing is that, just like most neural network training, this is a big, expensive, offline training cost; once you've found these coordinate systems, you can use them forever online in a very fast and efficient way. You can do optimal estimation and control in this latent space once you've paid the upfront cost of learning phi.

Now, again, this could be a deep neural network with nonlinear activation functions, but it could also be a shallow neural network with linear activation functions, in which case this kind of linear-dynamics autoencoder would represent the dynamic mode decomposition. So if phi and psi were shallow with linear activation units, this would essentially encode DMD. DMD has been used at length in fluid mechanics and many other fields to get linear representations of nonlinear systems, like these periodic fluid flows, and essentially DMD is finding a linear regression model on those SVD coordinates I talked about earlier.
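To illustrate that last point (a linear regression on the SVD coordinates), here is a minimal sketch of one standard DMD formulation; the stand-in data and rank are illustrative assumptions, and this is not necessarily the exact variant meant in the talk:

```python
import numpy as np

# Snapshot matrices: columns are successive states, Xp is X advanced one time step.
X = np.random.randn(1000, 99)        # stand-in data; in practice, flow-field snapshots
Xp = np.random.randn(1000, 99)       # in practice, X shifted by one sample in time
r = 10

# Project onto the leading r SVD (POD) coordinates.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Ur, Sr, Vr = U[:, :r], np.diag(S[:r]), Vt[:r].T

# Best-fit linear operator in those coordinates: z_{k+1} ~ A_tilde z_k.
A_tilde = Ur.T @ Xp @ Vr @ np.linalg.inv(Sr)

# Eigendecomposition gives DMD eigenvalues and (projected) modes.
evals, W = np.linalg.eig(A_tilde)
modes = Ur @ W
print(evals[:3])
```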
And there's another cool paper here that connects this to Koopman operator theory. If you take that idea of the dynamic mode decomposition, which is an extremely popular and useful general-purpose linear modeling technique, and you take this neural network representation and make those encoders and decoders deep neural networks with nonlinear activation functions, what we're essentially doing is learning a nonlinear analog of the dynamic mode decomposition: we're learning a nonlinear coordinate system that will take an originally nonlinear system and make it look linear. And that's exactly what many, many groups in the community have done in the last few years. Here I'm showing you work by Bethany Lusch, who was a postdoc with Nathan Kutz and myself. This is her deep Koopman autoencoder, where she takes the input and, through some hidden layers, learns a coordinate system, an encoder, in which she can get linear dynamical systems. Again, she looked at systems like a pendulum or flow past a cylinder, where there is a continuous eigenvalue spectrum, so she needed to add a few innovations to this network to parameterize these linear dynamics by the frequency of the system. There have been many, many other groups that have developed similar deep Koopman autoencoder networks that essentially learn these coordinates where your dynamics can be simply, or linearly, represented.
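For intuition, here is a minimal sketch of the kinds of loss terms such a network can use, given snapshot pairs. This is not Bethany Lusch's exact architecture (her network additionally parameterizes the eigenvalues by frequency, which is not shown here); the sizes and names are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Ask the latent coordinates to evolve linearly, z_{k+1} ~ K z_k, for a learned matrix K.
n, d = 128, 3
phi = nn.Sequential(nn.Linear(n, 64), nn.ELU(), nn.Linear(64, d))   # encoder
psi = nn.Sequential(nn.Linear(d, 64), nn.ELU(), nn.Linear(64, n))   # decoder
K = nn.Parameter(torch.eye(d))                   # linear dynamics in the latent space

def koopman_losses(x_k, x_kp1):
    z_k, z_kp1 = phi(x_k), phi(x_kp1)
    recon = ((psi(z_k) - x_k) ** 2).mean()               # reconstruct the current state
    linearity = ((z_k @ K.T - z_kp1) ** 2).mean()        # latent coordinates evolve linearly
    prediction = ((psi(z_k @ K.T) - x_kp1) ** 2).mean()  # predict the next state in x
    return recon + linearity + prediction

opt = torch.optim.Adam(list(phi.parameters()) + list(psi.parameters()) + [K], lr=1e-3)
```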
And I want to point out that these different approaches to Koopman networks, these nonlinear analogs of dynamic mode decomposition, come in many shapes and sizes. We talk about this in a recent review paper we wrote on Koopman operator theory. This is kind of a beast of a paper, about 100 pages, and it's all about how you find these coordinate systems with neural networks and with other approaches. And there are two big philosophies the community has arrived at. One of them is the kind of constrictive autoencoder that I showed you before, where you have a high-dimensional state, you choke it down to a lower-dimensional latent space where you can model the dynamics linearly, and then you can decode that latent space back to your original state x. But what many people in the community have also done is take the system state and actually lift it to a higher-dimensional latent variable z. So even if you have a high-dimensional state x, you can lift it to an even higher-dimensional state, try to model the evolution in that higher-dimensional state with linear dynamics given by this matrix K, and then you can always map back down to x. These are two very, very different philosophies for how you can get embeddings that give you linear representations of your dynamics.

Now, I'll point out that this bottom approach is reminiscent of lots of things people already do in machine learning, like lifting and kernel methods: often, if you operate in a very high-dimensional space, nonlinear processes can look more linear. So that is very consistent with the existing machine learning literature. But I will also point out that oftentimes these constrictive autoencoders are more consistent with our physical intuition that, even in these very high-dimensional systems, there might be a few dominant coherent structures or patterns that we really care about, and we want to find out how those behave and evolve linearly. So these constrictive autoencoders might be more interpretable; you might be able to tease out relationships when you are working in a low-dimensional latent space z. Both are valid approaches, and I just wanted to point out that lots of methods explore both of these architectures.
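As a toy illustration of the lifting philosophy, here is a sketch in the spirit of extended DMD: lift the state into a dictionary of nonlinear observables and fit a linear operator there by least squares. The quadratic dictionary and the stand-in data are illustrative assumptions, not a method from the talk:

```python
import numpy as np

def lift(X):
    # Lift a 2-d state into a higher-dimensional set of observables:
    # [x1, x2, x1^2, x1*x2, x2^2]; just one illustrative dictionary choice.
    x1, x2 = X[0], X[1]
    return np.vstack([x1, x2, x1**2, x1 * x2, x2**2])

# Stand-in snapshot pairs (columns): x_k and x_{k+1}.
X = np.random.randn(2, 500)
Xp = np.random.randn(2, 500)

# Fit a linear operator K on the lifted observables by least squares.
G, Gp = lift(X), lift(Xp)
K = Gp @ np.linalg.pinv(G)          # g_{k+1} ~ K g_k in the lifted space

# To predict: lift the current state, advance linearly, and read off x
# (here the first two observables are the original state itself).
g_next = K @ lift(X[:, :1])
x_next_pred = g_next[:2]
```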
And since I'm already here talking about this Koopman review paper, I'll give a couple of teasers; there's lots and lots of stuff we talk about in it. The basic idea of Koopman is that if you have some original dynamical system, like this Duffing oscillator with three fixed points, you can essentially find new coordinate systems, coordinate transformations, that expand the region of linear validity: you expand where linearization is valid through a nonlinear coordinate system. So here, for example, I might take my local linearization around this third fixed point, and I might find a local coordinate system where all of the dynamics in this pink region are approximately linear. Another perspective on these systems is that there are global coordinates that appear linear, like the Hamiltonian energy of this Duffing oscillator. This phase portrait is given by a particle in this potential well, this double-well potential, and the actual energy level of the particle is itself one of these eigenfunctions. The Hamiltonian behaves linearly in time: the time derivative of the Hamiltonian is zero times the Hamiltonian, and that's linear dynamics.
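In symbols, using the standard undamped double-well form of the Duffing oscillator (an assumption on my part about which variant is on the slide):

```latex
\[
\ddot{x} = x - x^{3}, \qquad
H(x,\dot{x}) = \tfrac{1}{2}\dot{x}^{2} - \tfrac{1}{2}x^{2} + \tfrac{1}{4}x^{4},
\qquad \frac{dH}{dt} = 0 \cdot H .
\]
```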
And this fourth perspective here, I think, is really cool, and we're still exploring it: you can also rescale space and time so that nonlinear oscillators, like this oscillator here, appear to be more like linear oscillators, just sine and cosine pairs. That is something Henning Lange has recently looked into: essentially building deep neural networks that can rescale time and space to make your nonlinear oscillators look like a single linear oscillator. And in very complex examples, like this shear-layer fluid flow, he's been able to find really simple, low-dimensional representations that very accurately represent this nonlinear oscillator system.
OK, a couple of other examples before I conclude. Those deep Koopman neural networks I was telling you about were for ordinary differential equations; Craig Gin and Bethany Lusch recently extended that to partial differential equations. We know that even for spatiotemporally evolving systems, there are often coordinate systems that will make them look approximately linear. For example, for the nonlinear Burgers' equation, which describes the evolution and formation of shock waves in fluid systems, there is the classic Cole-Hopf transformation that turns the nonlinear Burgers' equation into the linear heat equation. And with our Koopman network applied to partial differential equations, we can essentially learn that linearizing transformation in an automated way, without knowing any first-principles physics or even having access to the governing equations. So just from data, we can learn these linearizing transforms in PDEs.
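For reference, the Cole-Hopf transformation being referred to, in its standard form with viscosity \(\nu\) (the talk does not write it out), is:

```latex
\[
u_t + u\,u_x = \nu\,u_{xx}
\;\;\xrightarrow{\;\; u = -2\nu\, v_x / v \;\;}\;\;
v_t = \nu\, v_{xx} .
\]
```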
We also have a method, this is with Craig Gin and Dan Shea, essentially to find these deep embeddings to discover nonlinear analogs of Green's functions. Green's functions are very, very useful for linear boundary value problems in partial differential equations, and essentially what we can do, through a very similar nonlinear autoencoder network, is learn these nonlinear analogs of Green's functions, which would have lots of applications in, for example, nonlinear beam theory, or aircraft wings that can deform massively, past where the linear approximation is valid.
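For context, in the linear setting a Green's function lets you write the solution of a boundary value problem by superposition (standard form with homogeneous boundary conditions; not written out in the talk):

```latex
\[
\mathcal{L}\,u(x) = f(x)
\quad\Longrightarrow\quad
u(x) = \int G(x,\xi)\, f(\xi)\, d\xi .
\]
```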
OK, so I want to tie this up and conclude. We've talked about how there is this joint problem of learning coordinates and dynamics. This is one of the most exciting areas of machine learning research for physics-informed machine learning, or for physics discovery with machine learning. Oftentimes I don't know what the right coordinate system is in which to measure my system; maybe I'm looking at the brain, or the climate system, or a disease system, and I don't know what the right coordinate system is to get a simple law, like F = ma or E = mc². So we can set up this dual optimization problem where we're simultaneously trying to learn the coordinate embedding phi, to find a latent space z where the dynamics in that latent space are as simple as possible. I've shown you two examples: one where f is sparse and nonlinear, using our SINDy approach, and another where you try to find a coordinate embedding where f is actually linear, just a matrix. Both of those are being widely explored in the literature; lots and lots of networks are being developed to learn those embeddings for lots of complex systems. So I encourage you to try it out, and to think about what other constraints you might put on f, phi, and psi. For example, maybe you have symmetries in your system and you want these embeddings to respect those symmetries, or you have some conservation law that you know is going to be satisfied, like conservation of momentum or energy; colleagues are already building networks that have those baked in. So there's tons of opportunity to put partially known physics into these networks, and to learn entirely new physics that we didn't know before. All right, thank you very much.