Deep Learning to Discover Coordinates for Dynamics: Autoencoders & Physics Informed Machine Learning

Captions
Welcome back. Today I'm really excited to tell you about an emerging field of machine learning where we are essentially learning coordinate systems and dynamics at the same time for complex systems. In particular, I'm going to talk about deep learning to discover coordinate systems in which you can get simple, or parsimonious, representations of the dynamics in those coordinates.

Mostly I'll be talking about autoencoder networks. Here you might have some high-dimensional state x that describes your system. Maybe it's a video: you're recording your system and trying to learn the dynamics of a pendulum swinging, or of some other phenomenon you're observing. With deep learning we're going to discover a coordinate embedding phi and a decoder psi so that you can map into a latent space z that carries the minimal essential information needed to describe your system. Specifically, we're going to train these autoencoder networks so that we can represent the dynamics in that latent space very efficiently, or simply.

In this latent space z, again the minimal description of your system obtained through the autoencoder, we're looking for a dynamical system z_dot = f(z) where f is as simple as possible. This is an important dual problem: on one hand you have the coordinate-system discovery, the phi and psi maps, and on the other hand you have the dynamical-system discovery, where we try to learn an efficient model f that accurately describes how the system evolves in time. I'm going to walk you through a number of key architectures for doing this with deep learning, what restrictions we can put on f, phi, and psi, and also tie this to the history of coordinate discovery in dynamical systems.

I think this is a good example of what we would like to do. Maybe we have a video of some complex system, in this case just a pendulum in my lab. From this movie, this megapixel image evolving in time, you would like to discover automatically, without a human expert in the loop, that there is a key variable theta describing the system: the minimal descriptive variable for the motion of this pendulum is the angle theta. You'd also like to discover that the dynamical system governing the motion of the pendulum, the evolution of theta, is given by a simple differential equation. That's, at a very high level, what we'd like to be able to do. My colleague Nathan Kutz sometimes calls this "GoPro physics": you'd like to have a movie watching the world and learn the physics, how clouds evolve, how pendulums swing, how balls drop according to F = ma. That's what we want to do, but it is a difficult task to learn these coordinates and these dynamics.
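To make that setup concrete, here is the pair of problems written out; the frictionless form of the pendulum equation is my assumption about the specific equation referenced above, stated from the textbook form.

```latex
% Joint coordinate discovery and dynamics discovery:
%   encode:   z = \varphi(x)      (latent coordinates)
%   decode:   x \approx \psi(z)   (reconstruction)
%   dynamics: \dot{z} = f(z)      (as simple as possible)
\begin{align}
  z &= \varphi(x), \qquad x \approx \psi(z), \qquad \dot{z} = f(z).
\end{align}
% For the pendulum movie, the hoped-for outcome is that z reduces to the
% angle \theta and f reduces to the familiar pendulum equation
% (frictionless form assumed here):
\begin{align}
  \ddot{\theta} = -\frac{g}{L}\,\sin\theta .
\end{align}
```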
Now I'm going to zoom out a little and talk about the general challenges we face when we model dynamical systems x_dot = f(x). Most of the world around us is dynamic, evolving in time and space, whether you're thinking about a fluid flow, an autonomous vehicle, the state of your brain and its collection of neurons, a weather system, or a disease model. These are all dynamical systems that change in time according to some rules on the right-hand side, some physics f(x). The big challenges are, first, that we often don't know the model f(x); we don't have a good model for the dynamics, so we need to discover it, and the brain is a good example. Nonlinearity is a second key challenge: even a small amount of nonlinearity really hampers our ability to simulate, estimate, predict, and control these systems in the real world from limited measurements, and this is where finding good coordinate transformations can make a big difference in simplifying nonlinear dynamics. The third challenge we very frequently face in the real world is high dimensionality. Oftentimes we have a camera, a movie in a very high-dimensional pixel space, or a simulation or measurement of a fluid flow with thousands or millions of degrees of freedom. We want to leverage the fact that patterns exist in that data to find low-dimensional representations, those autoencoder coordinates I showed you, so that we can discover the dynamics and maybe even handle some of the nonlinearity. That's a very high-level overview of some of the things we want to do with these autoencoder networks.

Again, I'm going to use the example of fluids, one of my favorites. Even in very high-dimensional systems like weather systems, or more classical fluid flows, where we would require hundreds of thousands, millions, or billions of degrees of freedom to represent them in a computer or with satellite images, we know there are low-dimensional patterns and only a few degrees of freedom that matter. I really like this video here that shows such a low-dimensional pattern in a very high-dimensional system.

So I'm going to motivate these deep autoencoder networks more classically through the singular value decomposition. This goes by many names in different fields: singular value decomposition, principal component analysis, proper orthogonal decomposition. If you have some complex system that you believe has low-dimensional patterns, you can compute the singular value decomposition of a data matrix and decompose this movie into a linear combination of a very small number of modes, or eigen-states, that capture most of the energy or variance of the system. The way I like to think about it, this is a data-driven generalization of the Fourier transform, essentially a Fourier transform tailored to the specific data you've collected. Through the lens of modern neural networks, you can view the SVD, or principal component analysis, as a shallow linear autoencoder network. You have a high-dimensional state x, maybe my whole flow field stacked into a column vector, and I'm trying to find some latent representation, some minimal set of variables z, so that if I compress down to z I can lift back up to the original state and recover as much information about that high-dimensional state as possible. This is literally the U and V matrices from the SVD, and the latent space z here would be the amplitudes of those modes, the POD modes.
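As a small illustration of that linear autoencoder view, here is a Python sketch of projecting snapshot data onto the leading SVD/POD modes; the snapshot matrix X and the rank r are placeholder assumptions, not data from the video.

```python
import numpy as np

# X: data matrix whose columns are snapshots of the high-dimensional state,
# e.g. each column is a flow field (or video frame) flattened into a vector.
X = np.random.randn(10_000, 500)   # placeholder data: 10,000 pixels, 500 snapshots
r = 3                              # number of latent coordinates to keep

# Economy SVD of the mean-subtracted data matrix.
X_mean = X.mean(axis=1, keepdims=True)
U, S, Vt = np.linalg.svd(X - X_mean, full_matrices=False)
Ur = U[:, :r]                      # leading POD modes

# "Encoder" and "decoder" of the shallow linear autoencoder:
Z = Ur.T @ (X - X_mean)            # latent coordinates z (mode amplitudes)
X_hat = Ur @ Z + X_mean            # reconstruction from the latent space

print("relative reconstruction error:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```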
Now, in practice I would not recommend actually using a neural network to compute the SVD; it's far more efficient to compute it with a QR factorization or some other classical numerical linear algebra technique. But as an abstraction we can think of the SVD as a very simple neural network, and what we're going to do is generalize this linear coordinate embedding. Again, this is a coordinate system to represent your dynamics, and we generalize it by making it a deep neural network: instead of one latent-space layer we now have many hidden layers for the encoder, many hidden layers for the decoder, and the activation units, the nodes, have nonlinear activation functions. What this allows us to do is, instead of learning a linear subspace where our data is efficiently represented, learn a nonlinear manifold, parameterized by the coordinates z, where our data is efficiently represented. Often that means a massive reduction in the number of degrees of freedom needed to describe the latent space z.

So this is what the abstract autoencoder network is trying to represent. Usually there are many hidden layers in phi and in psi, with nonlinear activation functions, so that z parameterizes a manifold. We can learn these by constructing the architecture in TensorFlow or PyTorch, or whatever neural network training framework you like, and cooking up a loss function so that the output minus the input has the smallest error possible, given that we've choked the information down to maybe three latent variables in z. You fix the number of latent variables, you fix the architecture, and you train it to minimize the error between the output and the input; it then learns the latent features of that input.

And, like I mentioned earlier, what we're going to do now is train these autoencoder networks not just to represent the data in x efficiently, but also to predict how the dynamics in that latent space evolve, accurately and efficiently, forward in time. So now we're learning not just the encoder and the decoder but also a representation of the dynamical system f that evolves the system forward in time. This is becoming a big deal in the machine learning community, because now we can start understanding the physics of how our movies, our high-dimensional inputs, are evolving in time.
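Here is a minimal PyTorch sketch of that idea: an encoder, a decoder, and a latent dynamics model trained jointly on a reconstruction loss plus a one-step prediction loss. The layer sizes, the discrete-time formulation, and the loss weight lam are illustrative assumptions, not the specific architectures discussed in the video.

```python
import torch
import torch.nn as nn

class LatentDynamicsAE(nn.Module):
    """Autoencoder with a latent dynamics model z_{k+1} = f(z_k) (discrete-time sketch)."""
    def __init__(self, n_state=4096, n_latent=3, n_hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_state, n_hidden), nn.ELU(),
            nn.Linear(n_hidden, n_latent))
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, n_hidden), nn.ELU(),
            nn.Linear(n_hidden, n_state))
        self.dynamics = nn.Sequential(          # latent model f
            nn.Linear(n_latent, n_hidden), nn.ELU(),
            nn.Linear(n_hidden, n_latent))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z), self.dynamics(z)

def loss_fn(model, x_k, x_kp1, lam=1.0):
    """Reconstruction loss + one-step prediction loss in the latent space."""
    z_k, x_k_hat, z_kp1_pred = model(x_k)
    z_kp1 = model.encoder(x_kp1)
    recon = ((x_k - x_k_hat) ** 2).mean()          # autoencoder fidelity
    pred = ((z_kp1 - z_kp1_pred) ** 2).mean()      # dynamics in the latent space
    return recon + lam * pred

# Usage sketch: pairs of consecutive snapshots (x_k, x_{k+1}) from the movie.
model = LatentDynamicsAE()
x_k, x_kp1 = torch.randn(32, 4096), torch.randn(32, 4096)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
opt.zero_grad(); loss_fn(model, x_k, x_kp1).backward(); opt.step()
```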
I like to give this example; I think it's extremely motivating. What you have here is a great visualization of the geocentric view of the solar system: the blue dot is Earth, the yellow dot is the Sun, and the red dots are the other planets. If we take a naive coordinate system with the Earth at the center of the solar system, you get extremely complex dynamics, really complicated circles on circles, and there is no obvious simple law that describes the motion of all of these bodies in this coordinate system. But if we change our coordinates, so that the Sun is at the center and every other planet goes around the Sun, this becomes a much, much simpler dynamical system, and it is amenable to discovering the basic laws of physics, F = ma and Kepler's laws of motion. I think this is an extremely motivating example showing that when you have the right coordinate system you have some hope of learning a dynamical system f that is simple. I could try to learn a dynamical system that describes the geocentric evolution, but that differential equation is going to be much more complicated, harder to interpret, understand, and generalize. Getting the right coordinate system is often essential to learning the right dynamics, and that has been borne out over and over throughout the history of physics. So that's what we're trying to do here: learn the coordinate system that simplifies the dynamics. Now I'm going to walk you through a few examples of how we've been doing this, and what some of the challenges and opportunities are.

One of my favorite networks here was developed by Kathleen Champion when she was a PhD student with Nathan and me. It essentially combines SINDy, the sparse identification of nonlinear dynamics, with an autoencoder, so that a dynamical system is learned in the latent space of the autoencoder, and I think it is a really clever network design. What Kathleen is doing is learning these encoder and decoder networks, which are big deep neural networks, with additional loss functions in the training so that the dynamics in z are a sparse linear combination of candidate nonlinearities from a library. We build a big library of possible terms that could describe the dynamics in z, and then we try to find the fewest, the sparsest, combination of those terms that describes z1_dot, z2_dot, z3_dot in the latent space. To make this SINDy autoencoder converge to the right solution, Kathleen had to introduce a number of extra loss functions. There is the original reconstruction loss: if you remove the dynamics in the middle, you still want the autoencoder to faithfully capture your data and find a good latent space. Then there are the additional SINDy losses and the SINDy regularization. The L1 term at the end tries to make the set of coefficient column vectors as sparse as possible, so we have as few terms in the dynamics as possible; that makes the dynamical system interpretable and simple, more like F = ma than some big complicated expression. And the other two loss functions take different parts of the network and make sure that if I compute x_dot here, it is the same as if I took z_dot and mapped it out, and so on. Essentially you can compute chain rules on different sub-networks and make sure that the time derivatives across those pieces of the network are consistent. That prevents weird issues where you just shrink z as small as possible, which would in principle also make the coefficients small. This is a really cool network that Kathleen developed; she was able to learn very sparse, parsimonious dynamical systems in these latent spaces for complex systems, and one of her examples was a movie of a pendulum, a synthetic movie of a pendulum.
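To give a flavor of the SINDy piece, here is a sketch of a polynomial candidate library and the sparsity-promoting part of the loss for a three-dimensional latent space. The library choice, the coefficient matrix Xi, and the weight lam_l1 are illustrative assumptions, not the exact losses from Kathleen Champion's paper.

```python
import torch

def sindy_library(z):
    """Candidate terms Theta(z) for a 3-dimensional latent state:
    constant, linear, and quadratic monomials (an illustrative choice)."""
    z1, z2, z3 = z[:, 0:1], z[:, 1:2], z[:, 2:3]
    return torch.cat([
        torch.ones_like(z1),            # 1
        z1, z2, z3,                     # linear terms
        z1*z1, z1*z2, z1*z3,            # quadratic terms
        z2*z2, z2*z3, z3*z3,
    ], dim=1)                           # shape (batch, 10)

# Sparse coefficients Xi: one column of coefficients per latent equation.
Xi = torch.zeros(10, 3, requires_grad=True)

def sindy_loss(z, z_dot, lam_l1=1e-3):
    """|| z_dot - Theta(z) Xi ||^2  +  lam_l1 * ||Xi||_1 (sparsity-promoting penalty)."""
    residual = z_dot - sindy_library(z) @ Xi
    return (residual ** 2).mean() + lam_l1 * Xi.abs().sum()

# Usage sketch: z and z_dot would come from the encoder and the chain rule
# applied to measured x and x_dot; random tensors stand in here.
z, z_dot = torch.randn(32, 3), torch.randn(32, 3)
print(sindy_loss(z, z_dot))
```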
So this is a really nice way to learn those dynamics and coordinates. But what if we want to do more than just learn a sparse nonlinear model? What if we actually want to find a coordinate system that linearizes our dynamics? We know from Koopman operator theory that this is often possible. There are lots of caveats, but there are often special coordinate systems phi such that if I take my original dynamics and map them into the latent space z, I might even be able to expect a linear dynamical system z_dot = Lz. Why would I want this? Linear dynamics are generically solvable with textbook methods; we can get optimal linear estimators and controllers with one line of MATLAB or Python. So any time we can make our system linear, we gain a tremendous amount. The big caveat is that this puts a tremendous amount of strain on the encoder and decoder networks. Finding a coordinate system like the one in the SINDy autoencoder, where we allow a few sparse nonlinear terms, is not that hard on the coordinate system; that is related to normal forms from dynamical systems theory. But if we want to remove all of the nonlinear terms and make the system truly linear, that is a much stronger set of constraints on the encoder and decoder network. Generally speaking these are much harder to train, much more expensive, and phi might be a much nastier function in order to get linear dynamics. The good thing is that, like most neural network training, this is a big, expensive, offline training cost: once you've found these coordinate systems you can use them forever online, in a very fast and efficient way. You can do optimal estimation and control in this latent space once you've paid the upfront cost of learning phi.

Now, again, this could be a deep neural network with nonlinear activation functions, but it could also be a shallow neural network with linear activation functions, in which case this linear-dynamics autoencoder would represent the dynamic mode decomposition. If phi and psi were shallow with linear activation units, this would essentially encode DMD. DMD has been used at length in fluid mechanics and many other fields to get linear representations of nonlinear systems like these periodic fluid flows, and essentially DMD finds a linear regression model on those SVD coordinates I talked about earlier.
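Here is a compact sketch of that DMD idea in Python: a least-squares linear model fit in the leading SVD coordinates of snapshot data. The snapshot matrices and the truncation rank are placeholder assumptions.

```python
import numpy as np

# Snapshot matrices: columns of X are states x_k, columns of Xp are x_{k+1}.
X = np.random.randn(10_000, 499)    # placeholder snapshots
Xp = np.random.randn(10_000, 499)
r = 10                              # truncation rank

# Project onto the leading r SVD coordinates of X.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Ur, Sr, Vr = U[:, :r], S[:r], Vt[:r, :].conj().T

# Reduced linear operator: best-fit linear map z_{k+1} ~ A_tilde z_k.
A_tilde = Ur.conj().T @ Xp @ Vr @ np.diag(1.0 / Sr)

# DMD eigenvalues and modes describe the linear dynamics in these coordinates.
eigvals, W = np.linalg.eig(A_tilde)
modes = Xp @ Vr @ np.diag(1.0 / Sr) @ W
print("leading DMD eigenvalues:", eigvals[:3])
```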
And there is another cool paper that connects this to Koopman operator theory. If you take the idea of the dynamic mode decomposition, an extremely popular, general-purpose linear modeling technique, and you take this neural network representation and make the encoders and decoders deep neural networks with nonlinear activation functions, what you are essentially doing is learning a nonlinear analog of the dynamic mode decomposition: a nonlinear coordinate system that takes an originally nonlinear system and makes it look linear. That is exactly what many groups in the community have done in the last few years. I'm showing you work by Bethany Lusch, who was a postdoc with Nathan Kutz and myself. This is her deep Koopman autoencoder, where she takes her input and, through some hidden layers, learns a coordinate system and encoder that give linear dynamical systems. She looked at systems like a pendulum and flow past a cylinder, where there is a continuous eigenvalue spectrum, so she had to add a few innovations to the network to parameterize the linear dynamics by the frequency of the system. Many other groups have developed similar deep Koopman autoencoder networks that learn coordinates in which your dynamics can be simply, or linearly, represented.

I want to point out that these different approaches to Koopman networks, these nonlinear analogs of the dynamic mode decomposition, come in many shapes and sizes, and we discuss this in a recent review paper we wrote on Koopman operator theory. It is a beast of a paper, about 100 pages, and it is all about how you find these coordinate systems with neural networks and with other approaches. There are two big philosophies the community has arrived at. One is the constrictive autoencoder I showed you before: you have a high-dimensional state, you choke it down to a lower-dimensional latent space where you can model the dynamics linearly, and then you decode that latent space back to your original state x. But many people in the community instead take the system state and lift it to a higher-dimensional latent variable z. Even if you have a high-dimensional state x, you can lift it to an even higher-dimensional state, try to model the evolution in that higher-dimensional state with linear dynamics given by a matrix K, and then map back down to x. These are two very different philosophies for getting embeddings that give linear representations of your dynamics. I'll point out that the second, lifting approach is reminiscent of things people already do in machine learning, like pooling, lifting, and kernel methods: often, if you operate in a very high-dimensional space, nonlinear processes can look more linear, so this is very consistent with the existing machine learning literature. But oftentimes the constrictive autoencoders are more consistent with our physical intuition that even in very high-dimensional systems there may be a few dominant coherent structures or patterns that we really care about, and we want to find out how those behave and evolve linearly. So the constrictive autoencoders may be more interpretable; you might be able to tease out relationships when you work in a low-dimensional latent space z. Both are valid approaches, and lots of methods explore both of these architectures.
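As a rough sketch of the linear-latent-dynamics constraint in the constrictive setting, here is a PyTorch fragment where the latent dynamics are a single learned matrix K and the loss rewards multi-step linear prediction. The sizes, prediction horizon, and loss terms are illustrative assumptions, not Bethany Lusch's exact architecture.

```python
import torch
import torch.nn as nn

class KoopmanAE(nn.Module):
    """Autoencoder whose latent dynamics are constrained to be linear: z_{k+1} = K z_k."""
    def __init__(self, n_state=4096, n_latent=8, n_hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_state, n_hidden), nn.ELU(),
                                     nn.Linear(n_hidden, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, n_hidden), nn.ELU(),
                                     nn.Linear(n_hidden, n_state))
        self.K = nn.Linear(n_latent, n_latent, bias=False)    # linear Koopman operator

def koopman_loss(model, x_traj, horizon=5):
    """Reconstruction + multi-step linear prediction loss over a snapshot trajectory.
    x_traj has shape (time, batch, n_state)."""
    z = model.encoder(x_traj[0])
    loss = ((model.decoder(z) - x_traj[0]) ** 2).mean()
    for k in range(1, horizon + 1):
        z = model.K(z)                                               # advance linearly in z
        loss = loss + ((model.decoder(z) - x_traj[k]) ** 2).mean()   # match future snapshots
        loss = loss + ((z - model.encoder(x_traj[k])) ** 2).mean()   # latent consistency
    return loss

# Usage sketch with placeholder data: 6 consecutive snapshots of a 4096-dim state.
model = KoopmanAE()
x_traj = torch.randn(6, 16, 4096)
print(koopman_loss(model, x_traj))
```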
Since I'm already here talking about this Koopman review paper, I'll give a couple of teasers; there is a lot of material in it. The basic idea of Koopman is that if you have some original dynamical system, like this Duffing oscillator with three fixed points, you can find new coordinate systems, coordinate transformations, that expand the region of linear validity: you expand where linearization is valid through a nonlinear coordinate system. Here, for example, I might take my local linearization around this third fixed point and find a local coordinate system where all of the dynamics in this pink region are approximately linear. Another perspective on these systems is that there are global coordinates that appear linear, like the Hamiltonian energy of the Duffing oscillator. This phase portrait is given by a particle in this double-well potential, and the energy level of the particle is itself one of these eigenfunctions; it is a Hamiltonian that behaves linearly in time. The time derivative of the Hamiltonian is zero times the Hamiltonian, and that is linear dynamics. And this fourth perspective, which I think is really cool and which we are still exploring, is that you can also rescale space and time so that nonlinear oscillators, like this oscillator here, look more like linear oscillators, just sine and cosine pairs. That is something Henning Lange has recently looked into: building deep neural networks that can rescale time and space to make a nonlinear oscillator look like a single linear oscillator. In very complex examples, like this shear-layer fluid flow, he has been able to find really simple, low-dimensional representations that very accurately capture this nonlinear oscillatory system.

A couple of other examples before I conclude. Those deep Koopman neural networks I was telling you about were for ordinary differential equations; Craig Gin and Bethany Lusch recently extended them to partial differential equations. We know that even for spatio-temporally evolving systems there are often coordinate systems that make them look approximately linear. For example, for the nonlinear Burgers' equation, which describes the evolution and formation of shock waves in fluid systems, there is the classic Cole-Hopf transformation that turns the nonlinear Burgers' equation into the linear heat equation. With our Koopman network applied to partial differential equations we can learn that linearizing transform in an automated way, without knowing any first-principles physics or even having access to the governing equations; just from data we can learn these linearizing transforms in PDEs.
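For reference, the classical transformation being learned in that Burgers' example can be written out; this is the standard textbook Cole-Hopf substitution, stated here in its usual form rather than in the paper's notation.

```latex
% The viscous Burgers' equation (nonlinear) is mapped by the Cole-Hopf
% substitution onto the linear heat equation:
\begin{align}
  u_t + u\,u_x &= \nu\,u_{xx}, \\
  u &= -2\nu\,\frac{\phi_x}{\phi} \;=\; -2\nu\,\partial_x \ln\phi, \\
  \phi_t &= \nu\,\phi_{xx}.
\end{align}
```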
We also have a method, with Craig Gin and Dan Shea, to find deep embeddings that discover nonlinear analogs of Green's functions. Green's functions are very useful for linear boundary value problems in partial differential equations, and through a very similar nonlinear autoencoder network we can learn nonlinear analogs of them, which would have lots of applications in, say, nonlinear beam theory, or aircraft wings that deform far past where the linear approximation is valid.

So I want to tie this up and conclude. We've talked about the joint problem of learning coordinates and dynamics. This is one of the most exciting areas of machine learning research for physics-informed machine learning, or for physics discovery with machine learning. Oftentimes I don't know what the right coordinate system is to measure my system; maybe I'm looking at the brain, or the climate system, or a disease model, and I don't know what coordinates will give a simple law like F = ma or E = mc^2. So we set up this dual optimization problem where we simultaneously try to learn the coordinate embedding phi and a latent space z in which the dynamics are as simple as possible. I've shown you two examples: one where f is sparse and nonlinear, using our SINDy approach, and another where you try to find a coordinate embedding in which f is actually linear, a matrix. Both of those are being widely explored in the literature, and lots of networks are being developed to learn these embeddings for many complex systems. So I encourage you to try it out and to think about what other constraints you might put on f, phi, and psi. For example, maybe you have symmetries in your system and you want the embeddings to respect those symmetries, or you have a conservation law that you know will be satisfied, like conservation of momentum or energy; colleagues are already building networks that have those baked in. There is a ton of opportunity to put partially known physics into your system and to learn entirely new physics that we didn't know before. All right, thank you very much.
Info
Channel: Steve Brunton
Views: 59,552
Id: KmQkDgu-Qp0
Length: 26min 56sec (1616 seconds)
Published: Fri Aug 13 2021