Data-driven model discovery: Targeted use of deep neural networks for physics and engineering

Captions
Hi, my name is Nathan Kutz. I'm a professor of applied mathematics at the University of Washington in Seattle, and I'm going to be talking today about the discovery of dynamics directly from data. Mostly this talk is about how we can take modern machine learning algorithms and AI infrastructures and bring them to data collected from physical systems, toward building models, and about how that can enrich our possibilities for scientific discovery. All of this work has been done in collaboration with Steven Brunton, who is in mechanical engineering and is my main collaborator here at the University of Washington, and I will feature much of the work of our fantastic group of students and postdocs, who have done a lot of the groundwork of building out algorithms for machine learning in physics and engineering.

Most of this talk is going to focus on the idea of coordinates and dynamics. Whenever we start taking recordings from a system, what we would really like to do is find a representational framework, a coordinate system, that is in some sense ideal for representing the dynamics of that system. The generic picture is this: I have input data, say from sensors on a system, and I want to learn a coordinate mapping to a new variable, a latent space, in which I can represent the dynamics in some efficient and potentially parsimonious way. We're going to talk a lot about that process during the talk, but this is the generic structure: get into a coordinate system where I can represent the dynamics, and then come back out. A lot of the talk will feature targeted use of neural networks to do that work for us in ways that are advantageous.

First of all, this idea of coordinates and dynamics is certainly not new; it has a long history, and many of the coordinates we have used in the past revolve around features we know about the data. For instance, special functions have been widely used in mathematical physics as coordinate systems: if you're in a system with polar or spherical coordinates, there are representations and special functions that are quite ideal for representing solutions. Many of the special functions of mathematical physics, like the Bessel, Hermite, and Laguerre functions, are essentially ideal coordinate systems for specific geometries, and you exploit that geometry in representing your solution. We also have expert knowledge: people with deep expertise in a physical system can leverage that expertise to understand the physics and find a good representation. More recently, there have been many methods using SVD-based approaches, in other words taking the data and finding a low-rank embedding in which to represent it. SVD-based methods include the proper orthogonal decomposition, which also goes under the names PCA, empirical orthogonal functions, and the Hotelling transform: lots of different names for the same thing, which is to take your high-dimensional state space into some low-rank feature space and build your model there. And now we have neural networks, which give us even more flexibility, because they allow nonlinear transformations into a coordinate system, whereas the SVD-based methods are all linear.
So this is one way to think about coordinates; we've been doing it for a long time, finding good representations. Once you're in this model space, this new coordinate system, you want to represent the dynamics, and there are a lot of different ways to do that as well. There are standard statistical methods for time series, like ARIMA, autoregressive moving average models. There are more modern things like dynamic mode decomposition and Koopman theory, which try to represent the dynamics in a linear fashion. One of the things we'll focus on here is SINDy, the sparse identification of nonlinear dynamics, which asks: once I'm in this space, can I discover the governing equations that produced the data? So there are these discovery methods that have emerged that we can now make use of. There are also approaches like representing the dynamics in terms of normal forms from dynamical systems theory. And when we think about time steppers, once you're in this coordinate system and have time series data, you can just use neural networks, LSTMs, recurrent neural networks, GRUs, echo state networks, for forecasting in the time series domain. So there are lots of options both for building coordinates and for building dynamics, and part of what we want to discuss is what are good ways to do this.

Obviously, when I say good coordinates, that is a very subjective term, and what I mean by good coordinates could be very different from what you mean. But I'm going to give some examples that I hope will highlight what I intend, and we start here. This is the night sky, and what you're seeing are snapshots taken over time: the retrograde motion of Mars in the night sky, and likewise the retrograde motion of Saturn. This is really remarkable; it is in fact the oldest of the physics problems, trying to understand the nature of the movement of the heavens. Celestial mechanics is, in some sense, the canonical physics problem, explored for centuries and generations, in fact millennia. What we want to do is understand whether we can write down predictive models for the trajectories of the planets. This was the big challenge problem, and it was of great interest to most of society. The earliest models go all the way back to Alexandria, Egypt and to Claudius Ptolemy, during the Ptolemaic dynasty, when Alexandria was the intellectual center of the ancient world, before Rome took over the Mediterranean region. Out of this came the Ptolemaic doctrine of the perfect circle, and the idea was that you could think about the retrograde motion of the planets as circles on circles. That was one way to represent the dynamics of those orbits and also to forecast where those orbits would be. It is essentially the earliest theory of this kind, and in fact it was very successful: it lasted for 1500 years, until the time of Kepler and Galileo, when it came down. It also represents a representational insight: it is, in some sense, the earliest Fourier transform.
You can think of this data as circles on circles: these are different frequencies, different orbits of these circles, which is very much like what you would do in a Fourier transform, representing things in terms of their frequency content. There are these amazing pictures from Renaissance Italy showing, in some sense, a map of the perfect-circle theory. This is one of the longest-lasting theories we've ever had: from the second century AD all the way to around the 1600s, this was the theory. It's like the Roman Empire of theories, one of the longest lasting, and it's highly unlikely we'll have theories last that long in human history again. And what brought it down? It was still a very accurate model, and it still is today; it's not a terrible model, it does a pretty good job. The big change was to understand the coordinate system. One way to understand the retrograde motion is to change the coordinate system: in those times everything was taken to revolve around the Earth, and then Copernicus made the suggestion that in fact everything was going around the Sun. This was backed scientifically by Galileo and Kepler, laying down the foundations for the shift to the heliocentric coordinate system. The amazing thing is that once you move to this coordinate system, you're set up for Newton to come along and produce F = ma, this beautiful theory of gravitation; but you have to be in the right coordinate system to push that theory forward. And of course it's a great theory, and it lasted for many hundreds of years before we realized there was a discrepancy between the data and those F = ma models, which Einstein used to build out his theory of general relativity. Undoubtedly, as we get more and more accurate measurements, one can imagine that we will also go beyond general relativity, but of course we would need the data to inform that physics-making process, just as Einstein had the data to build his model, and just as Galileo and Kepler finally had the data to build this model and set it concretely in place for future generations.

So this is what I mean by a good coordinate system: in a good coordinate system, the dynamics that come out are a nice parsimonious representation, F = ma, and you don't find that until you have that good coordinate system to work with. There's also something to be highlighted in the work of Kepler and Newton. Kepler was essentially using the data and regressing elliptic orbits onto it; he was fitting the data, fundamentally doing an interpolation task. Newton, on the other hand, with his F = ma, had an inherently extrapolatory tool. For instance, once you have F = ma, Newton could imagine things like launching someone from the Earth to the Moon and putting them in orbit. You've never observed it, but F = ma allows you to understand the conditions under which it could happen, whereas Kepler's model was a regression onto the data itself. This represents a tension between the two; in some sense Newton is the more famous of the two scientists, partly because he built a model that allowed us to go places we've never seen with the data.
That's exactly what I would like to do, and what I think a lot of physics and engineering practitioners want to do: understand some principles and then figure out how to engineer new and innovative systems that we've never collected data from before. By the way, these concepts of Kepler and Newton are alive today in some of our modern technologies, and I want to highlight the grand challenge problems we see today: robots, self-driving cars, autonomy. Notice what this robot is doing: it's a remarkable feat, a gymnastics routine; it's much more athletic than I am, and it's remarkable to watch the videos that come out of Boston Dynamics and what these robots are capable of. But it's equally amazing to watch self-driving cars that can navigate through construction sites and traffic, on highways and in cityscapes; it's remarkable that these cars can do that so successfully. The interesting thing about these two autonomy paradigms is that they are, in some sense, very different from each other. They're both autonomy, but one is completely physics-based: the robot technology is imbued with F = ma physics. It understands much of its world through physics models: it understands momentum, it understands angular momentum, and that's how it's able to walk and do its tricks. It has a physics engine inside of it, based upon all the physics we've built up to the modern time; one way to think about it is that it is imbued with a Newton-type architecture. The car, on the other hand, is not built this way. The car is built with a bank of sensors and massive amounts of labeled training data telling it what is happening in its environment. Given enough training data, it learns a representation of its world through deep neural networks. There's no physics per se, but it learns everything it needs to successfully navigate the roads, given rules about how fast to go, when to brake, and so forth. In some sense this is fitting a regression to the data. I'm not going to comment on which method is better; both have experienced a tremendous amount of success in our current society. It also reflects a tension that has always been there, which Breiman highlights in his "two cultures" of statistics: one culture is statistical learning, which tries to get at the model as the primary object, and the other is more machine learning, building a representation that simply functions well and works well, whether or not you get interpretability; interpretability would be nice, but it's beside the point, you just want the thing to work well. These two cultures are where we are right now. You can use different architectures; you can imbue them with physics or do straight machine learning. Where I want to spend time in this talk is the in-between spot: you still want to use all the power of machine learning, but bring it over to the physics world. How do you bring in all this machine learning technology while still getting interpretability, and maybe having a shot at generalizability, so that you get models you can take to new places and extrapolate?
Here's the mathematical formulation of what we fundamentally want to do. I have some model, a dynamical system, that produces the data I'm going to measure: f specifies the model, there are parameters θ for it, there may be stochastic effects, and x is the state-space variable. What I actually have access to is a measurement y at discrete times t_k, taken through a measurement model h, and there is noise on this data; there is always noise. Part of the goal is: simply from the measurements, can I in fact tell you what the measurement model was, what the state space x is, what the dynamics f are, and what the parameters θ are? This is a terribly ill-posed problem. Normally we teach all of our undergrads and grad students to avoid ill-posed problems, but as I'm starting to learn, the really interesting problems we're solving today are ill-posed in many ways, and the whole idea is that to solve them you need to bring in appropriate regularizations and constraints to take them from ill-posed to well-posed. So I want to talk about the constraints I want to impose on this mathematical formulation, because that's really what this is about. I'm going to take these measurements and dynamics, which form an ill-posed problem, and to make it well-posed I'm going to impose the requirement of an interpretable, parsimonious representation of the dynamics. In some sense, the ultimate physics regularization is to figure out the intrinsic dimension of the system you're measuring and the smallest number of terms, some kind of parsimonious or nominal representation of the dynamics. These ideas are old: they go back to William of Occam, who was a proponent of nominal models, include the things you need but nothing more, and this has been said over and over again through history since then, including by Pareto and by Einstein himself. So this is our physics regularization: in many of these architectures we promote sparsity, toward a nominal, parsimonious representation of the dynamics, which often leads to interpretability and the ability to extrapolate. That's where we're going: find a coordinate system in which this holds.

Let's start in a simple place. Maybe the simplest place is to find coordinate transformations, paired coordinate and dynamics transformations, in which linear models work well. Linear models are really nice because we can typically just write down their solutions; we can say a lot more about linear models than about nonlinear models, obviously. Let me give you an idea of where this is going: it leads toward Koopman theory, which is all about finding a coordinate system in which the dynamics becomes linear. We're not linearizing the dynamics; we're using a coordinate system to make the dynamics linear there. Here's an example: a two-by-two nonlinear ODE. Normally you might say that for a generic two-by-two nonlinear ODE you can't write down closed-form solutions, although you can look at fixed points and do phase-plane analysis. But notice that if I change variables to y1, y2, y3, given by x1, x2, and x1 squared, then the model becomes perfectly linear. All I did was make a coordinate transformation, and in this new coordinate system I have linear dynamics and can say everything I want about the system: I can look at eigenvalues and eigenvectors and write down the solution to this nonlinear differential equations problem by putting it in a different coordinate system. That's the generic idea behind building these linear models: figure out the coordinate system, or an approximation to one, that makes this work.
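For reference, a worked version of the kind of example described here. The particular system below, with parameters μ and λ, is the standard quadratic example used in the Koopman-embedding literature; the captions do not show the coefficients, so take the specific form as an assumption rather than the exact slide:

```latex
\dot{x}_1 = \mu x_1, \qquad \dot{x}_2 = \lambda\,(x_2 - x_1^2)
% change of variables: (y_1, y_2, y_3) = (x_1, x_2, x_1^2), so that
% \dot{y}_3 = 2 x_1 \dot{x}_1 = 2\mu y_3, and the lifted dynamics are exactly linear:
\frac{d}{dt}\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}
= \begin{pmatrix} \mu & 0 & 0 \\ 0 & \lambda & -\lambda \\ 0 & 0 & 2\mu \end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}
```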
Part of the way we do this in Koopman theory is by using a tool called dynamic mode decomposition (DMD) to make an approximation to a linear embedding. The idea is the following. I measure some complex system, so I have snapshots of that system; it could be high-dimensional, and generically it's some nonlinear evolution equation that produces the data. I form data matrices X and X': the columns of X are snapshots of the system over time, and the columns of X' are those same snapshots advanced Δt into the future. Those constitute my observables; I just take those measurements. Then I say: build a linear model that takes me from snapshot x_k to snapshot x_{k+1}. There should be some linear matrix A that does that, but I have a lot of snapshots, so I want the best linear matrix that carries me through all of them, from one to the next, Δt into the future. When you do this regression it's just a least-squares fitting procedure, and the matrix A that gives the best linear model through all the snapshots is X' times the pseudoinverse of X. That's dynamic mode decomposition: a regression to the best-fit linear model through the data snapshots. Koopman theory, on the other hand, says: why work directly with x? Maybe there is some observable, some change of variables g(x), so that I don't work with x directly but with an observable of x. I construct new matrices Y and Y', whose columns are the snapshots in this new observable g(x), and then I do DMD. The hope is that if I can find the right coordinate transformation, this linear model should work out nicely, and that's exactly the example I showed you: if you make a change in your coordinate system, you can get a linear model. You're not guaranteed to get a linear model, by the way; there's more to be said there. But the idea is that we're going to approximate it with the dynamic mode decomposition algorithm, and the question becomes: how do I come up with an observable set? We're going to use neural networks: neural networks are going to try to learn a coordinate transformation to do this heavy lifting for us.
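A minimal sketch of the DMD regression just described, in Python with NumPy. The data here is synthetic (two damped oscillators mixed into a ten-dimensional measurement), since the talk does not specify a dataset, and the rank truncation r is an assumption:

```python
import numpy as np

def dmd(X, Xprime, r=4):
    """Best-fit linear operator with X' ~= A X, computed via a rank-r SVD of X."""
    U, S, Vh = np.linalg.svd(X, full_matrices=False)
    Ur, Sr, Vr = U[:, :r], S[:r], Vh[:r, :].conj().T
    # Project A = X' X^+ onto the leading r POD modes: A_tilde = Ur^* X' Vr Sr^{-1}
    A_tilde = Ur.conj().T @ Xprime @ Vr @ np.diag(1.0 / Sr)
    eigvals, W = np.linalg.eig(A_tilde)            # discrete-time eigenvalues
    modes = Xprime @ Vr @ np.diag(1.0 / Sr) @ W    # DMD modes
    return A_tilde, eigvals, modes

# Synthetic snapshots: two damped oscillators mixed into a 10-dimensional state.
dt = 0.1
t = np.arange(0, 10, dt)
latent = np.vstack([
    np.exp(-0.1 * t) * np.cos(2 * t), np.exp(-0.1 * t) * np.sin(2 * t),
    np.exp(-0.2 * t) * np.cos(5 * t), np.exp(-0.2 * t) * np.sin(5 * t),
])
mix = np.random.default_rng(0).standard_normal((10, 4))
data = mix @ latent                                # shape (10, len(t))

X, Xprime = data[:, :-1], data[:, 1:]              # snapshots and shifted snapshots
A_tilde, eigvals, modes = dmd(X, Xprime, r=4)
print("continuous-time eigenvalues:", np.log(eigvals) / dt)  # ~ -0.1±2i, -0.2±5i
```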
Now, I want to highlight something about neural networks. This is from the first sentence of a paper by Stéphane Mallat on understanding deep convolutional networks: supervised learning is a high-dimensional interpolation problem. The key word is interpolation. Neural networks live in the space in which you sampled the data, and most of what we want to do in physics, I would say, is extrapolate. So you will have constraints when using neural networks: within the data you've collected you can build a model, but it tends to be over-parameterized, it tends to be brittle, and it does not generalize well. This is well known, and people are working very hard at finding ways to loosen that up and make neural networks a bit more interpretable and more generalizable.

So here's where we're going to start. We're going to try to do this linear embedding by seeing whether a neural network can figure out the coordinate system for us. You see what I'm doing here: I'm pairing a coordinate system with dynamics. The coordinate system is some transformation, and the dynamics I'm going to force to be linear. We're going to learn a neural network encoder that takes us from our original variable to a new variable y, and back. And what do I want to have happen in this new variable? When I go from snapshot y_k to y_{k+1}, I want there to be a linear map that does this; I want to learn a linear operator K in conjunction with the coordinate transformation, so that if I take two steps into the future it's K squared in this new variable. All this work was done with Bethany Lusch; she really pushed this forward and engineered architectures to do exactly this. We started by trying to embed the nonlinear pendulum in a linear coordinate system, and we failed. In retrospect it's quite obvious why: for the nonlinear pendulum, the frequency of oscillation changes as the oscillations get bigger. You can see it here: as you take larger and larger amplitudes of oscillation, the frequency shifts, and it does one other thing, it generates harmonics. This is well known from the asymptotics literature of the 1960s: take the Duffing equation and perturb it, and as the perturbation gets bigger the frequency shifts and you generate a third harmonic, a fifth harmonic (a little easier to see here), and a seventh harmonic. This is all due to the nonlinearity producing extra frequencies, and the shift in frequency is due to the nonlinearity as well. This highlights something very important. We were trying to embed the pendulum in a two-dimensional space, because we know the pendulum doesn't change dimension, it's still two-dimensional; but look how many frequencies you need to represent it. Because of the continuous nature of the spectrum, you can't do it that way: a two-by-two linear system can only have a single complex-conjugate pair, a single frequency, and this requires many frequencies. So instead we modified the neural network architecture: if you want to keep a linear model at the intrinsic rank of the dynamics, you learn a second network that parameterizes the linear operator as a function of the frequency. Once you build this linear model with a parametric dependency, everything works just fine. You can then embed the nonlinear pendulum, including dynamics at very large amplitudes where the oscillations are no longer sinusoidal and start to look more exotic, and you can take dynamics that live in a nonlinear space and embed them in a completely linear space. That's really nice: you can take very large-amplitude oscillations, embed them, and still make the dynamics linear if you just warp the coordinate system. These warped coordinates are, as it were, your eigenfunctions that make the transformation happen for you.
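Here is a minimal sketch, in PyTorch, of the general encoder/linear-operator/decoder structure being described: an autoencoder whose latent state is advanced by a learned linear map K, trained with reconstruction and linear-prediction losses. The layer sizes, loss weights, and training loop are assumptions for illustration, not the architecture of the Lusch, Kutz, and Brunton paper (which also parameterizes the operator and uses additional loss terms):

```python
import torch
import torch.nn as nn

class KoopmanAutoencoder(nn.Module):
    """Encoder phi, linear latent dynamics K, approximate decoder phi^{-1}."""
    def __init__(self, n_state=2, n_latent=3, n_hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_state, n_hidden), nn.ReLU(),
                                     nn.Linear(n_hidden, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, n_hidden), nn.ReLU(),
                                     nn.Linear(n_hidden, n_state))
        self.K = nn.Linear(n_latent, n_latent, bias=False)   # linear operator

    def forward(self, x_k):
        y_k = self.encoder(x_k)          # lift to latent coordinates
        y_next = self.K(y_k)             # one linear step forward in time
        return self.decoder(y_k), self.decoder(y_next)

def loss_fn(model, x_k, x_next):
    x_rec, x_pred = model(x_k)
    recon = ((x_rec - x_k) ** 2).mean()                       # reconstruction
    pred = ((x_pred - x_next) ** 2).mean()                    # prediction in x
    lin = ((model.K(model.encoder(x_k)) - model.encoder(x_next)) ** 2).mean()
    return recon + pred + 0.1 * lin                           # weights are assumptions

# Usage sketch: x_k, x_next are (batch, n_state) tensors of successive snapshots.
model = KoopmanAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_k, x_next = torch.randn(256, 2), torch.randn(256, 2)        # placeholder data
opt.zero_grad(); loss_fn(model, x_k, x_next).backward(); opt.step()
```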
We were very happy with this, so we went on to harder models, something like flow around a cylinder, where you have von Kármán vortex shedding off the back end of the cylinder. The same thing happens: if you look at the dynamics back there, there is low-rank structure, and you can take that time series, which is generically some nonlinear dynamical system, and embed it in a linear dynamic model. Here are the linearized dynamics and, essentially, the eigenfunctions of the transformation. So what we've done is take the flow around a cylinder in the von Kármán vortex shedding regime and find a coordinate transformation that makes it linear. That's the idea, in some sense, behind Koopman theory. We took this even further, in work with Craig Gin. I'm still surprised by this, but it's kind of amazing: we started looking at more complex PDEs. I didn't think this could be done, but the code is on GitHub and you can run it yourself. We take something like the Kuramoto-Sivashinsky equation, which produces spatiotemporal chaos, very complex dynamics, and by learning a neural network transformation, a bit more complicated than the original networks we used, we can learn an embedding that takes this PDE and makes it linear in the new variable. This is quite remarkable; it's like the inverse scattering transform, which linearizes the KdV or NLS equations, except here we can do it for, as far as we know, generic PDEs. If you can deal with Kuramoto-Sivashinsky, that gives you a lot of options, because it's a very difficult equation. Here are some of the results: you can run the full PDE, look at our linear embedding, and map back out into the original variables, and it works marvelously well. You can find that paper on the arXiv.

So that's the idea behind linearization. Of course, we don't have to make linear models; in fact, sometimes linear models are a little restrictive. So maybe we should think about making more general models and go on to discovering governing equations that can be nonlinear. We're going to relax the assumption of linearity, because nonlinearity is a fantastic way to parameterize dynamics, and we're going to use a method called the sparse identification of nonlinear dynamics, or SINDy. The idea is that this is just an Ax = b problem; that's the math, an over-determined Ax = b where we promote sparsity. Here's how it works. Suppose I give you data, in this case time series of x, y, and z, and the dynamics were generated by some ẋ = f(x). If I give you x, y, z, you can compute ẋ; that's b. The matrix A is a set of candidate right-hand-side functions: since ẋ = f(x) and f could be lots of different things, you can pick up all the physics books in the library and look at what kinds of models people have built over history. There are lots of different models, and all the different terms in those models can go in as candidate functions on the right-hand side. So you say: here is a library of potential right-hand-side terms, and I'm going to regress it against ẋ, which you computed.
Now, if you do a standard regression, like least-squares fitting, it will take all of those library elements and weight every one of them a little bit to try to fit the data, that is, the derivatives, the ẋ's. If instead you use sparse regression, it will find the smallest number of library terms possible to fit the data. We promote sparsity by sequentially thresholded least squares; we don't use things like the LASSO, because they tend to be fairly unstable algorithmically. If you do this, the nonzero terms that show up are in fact the Lorenz equations. Essentially, if I sample that data, I can tell you the dynamical system that produced it. We built on this architecture to go to spatiotemporal systems, that is, PDEs. This is work with Sam Rudy, who wrote a paper treating a bunch of canonical PDEs of mathematical physics, from KdV all the way down to Navier-Stokes. The question is: I just give you spatiotemporal data, snapshots in space and time of these systems; can you tell me the PDE that produced them? Sam's method basically uses the SINDy architecture on spatiotemporal data, so now you also have to compute spatial derivatives, and in fact you can discover all of these. Different systems can tolerate different amounts of noise, and that's also a study Sam was able to do, to see how robust the method is for discovering partial differential equations. So this is a nice discovery tool: if I take measurements of a system, you can tell me the governing equations. It's a really nice framework for coming to new systems, trying to understand the dynamics, and building models from them.

Where I think this is even more applicable, though, is in what's called discrepancy modeling. The idea is that there are very few problems we come to anymore where we know nothing, where we have no principles and don't understand anything about the physics. In most systems we have partial knowledge of the physics: maybe we don't know everything, maybe our model captures some things that are true but is missing a bunch of physics, but generally we do know something. Discrepancy modeling is a framework in which you come to the table with an imperfect model: I know some physics, it just turns out it's not good enough to represent the dynamics of my system, and what I really need is some kind of correction to the physics. The SINDy architecture I just showed you handles this trivially: if you know some physics f(x), you just move it over to the left, so you have ẋ minus f(x), and that is now the new vector b. When you do the Ax = b regression, b absorbs the known f(x), and you still regress against a library of terms to model what's missing. It's as simple as that. The SINDy architecture is very flexible; with Ax = b you can even impose constraints very nicely, enforcing things like conservation of mass or momentum. It's a flexible framework, and it's just Ax = b.
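A minimal sketch of the SINDy-style regression just described: simulate Lorenz data, build a polynomial candidate library, and run sequentially thresholded least squares. The threshold, the library choice, and the finite-difference derivative are assumptions for illustration (the released PySINDy package does all of this more carefully); the last comment notes the discrepancy-modeling variant.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

dt = 0.001
t = np.arange(0, 20, dt)
sol = solve_ivp(lorenz, (t[0], t[-1]), [-8.0, 7.0, 27.0], t_eval=t, rtol=1e-9)
X = sol.y.T                                    # states, shape (n_samples, 3)
Xdot = np.gradient(X, dt, axis=0)              # derivatives -> the vector b

def library(X):
    """Candidate right-hand-side terms: constant, linear, and quadratic."""
    x, y, z = X.T
    cols = [np.ones_like(x), x, y, z, x*x, x*y, x*z, y*y, y*z, z*z]
    return np.column_stack(cols)

def stlsq(Theta, Xdot, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares (the SINDy sparse regression)."""
    Xi = np.linalg.lstsq(Theta, Xdot, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(Xdot.shape[1]):         # re-fit only the surviving terms
            big = ~small[:, k]
            Xi[big, k] = np.linalg.lstsq(Theta[:, big], Xdot[:, k], rcond=None)[0]
    return Xi

Theta = library(X)                             # the matrix A of candidate terms
Xi = stlsq(Theta, Xdot)                        # sparse coefficients: the model
print(np.round(Xi, 2))                         # nonzero entries ~ Lorenz coefficients
# Discrepancy modeling: with known physics f(x), regress on Xdot - f(X) instead.
```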
Now, where does this play a role? Let me give you an example: these are pendulums on a cart, a double pendulum on a cart. If you give it only the Platonic knowledge of the physics, by which I mean perfect mass measurements, no friction, no air resistance, then it's very difficult for it to actually stabilize the pendulum. But if it can learn the mismatch between its model and the actual data, you can improve that model by building in the discrepancy and stabilize the pendulum. This is especially important for things like the digital twin. I'm showing here a digital twin, a CAD emulation of an actual robot. They look pretty close, but when you do precision manufacturing you are not afforded the luxury of being close; you have to develop a model that matches the physical system very closely. There are frictional forces and sticking in the joints; if you allow the system to learn a discrepancy model for every single robot, each one will develop its own discrepancy and give you a better match between the emulated model, the digital twin, and the actual experiment, the physical manifestation in the world. This architecture allows you to say: I know most of the physics here; I just need to correct it to match that specific robot, and the SINDy architecture is a flexible framework for doing exactly that. You can also use Gaussian process regression models and so forth, but those are just statistical estimates, whereas this is actually trying to build a model of what is creating the discrepancy.

In a lot of what I've talked about so far, I've assumed that I actually took measurements of the system and knew what the state space was. What happens if I take measurements but don't really know what the state-space variable is? Here I want to discover the coordinates and the dynamics jointly; notice that I even have to discover the state space, and this is what we talked about at the start. This is work with Kathleen Champion, who really solved and framed this problem in the following way. If I have measurements of a system, I want to learn a coordinate transformation such that, in that coordinate, there is a parsimonious representation of the dynamics. So we put a SINDy model inside an autoencoder structure and then map back out; this is roughly what the loss function looks like, and you can train the whole thing. This kind of architecture is exactly what I talked about at the beginning with celestial mechanics: if you observe the night sky, the first thing you have to do is recognize what the appropriate coordinate system is, the heliocentric frame, and then discover F = ma there. That's what this neural network does: it discovers the coordinate system and the parsimonious dynamics, and they come as a pair. Another good example: if I give you a video of a pendulum, so I film a pendulum, what I hand you as data is pixel space. There's no way for pixel space to know that there is a coordinate pair, θ and θ-dot, that represents the dynamics of the pendulum. What this architecture does is take pixel space, learn that it should map to a coordinate system embodied by θ and θ-dot, and then, within that coordinate system, discover that θ double dot equals minus sine θ. That's the idea behind this kind of architecture. It's really powerful, and it's exactly what we tend to do when we do physics: can I find that coordinate system and then get a good representation of the dynamics? This is doing both; you're learning them simultaneously.
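The captions mention the loss function without showing it. For reference, a loss of the general form used for this kind of SINDy autoencoder combines reconstruction, SINDy consistency in the latent and original variables, and a sparsity penalty. The exact terms and weights in the Champion et al. paper differ in detail, so treat this as a schematic, with φ the encoder, ψ the decoder, Θ the candidate library, and Ξ the sparse coefficients:

```latex
\mathcal{L} \;=\;
\underbrace{\|x - \psi(\varphi(x))\|_2^2}_{\text{reconstruction}}
\;+\; \lambda_1 \underbrace{\|\dot{z} - \Theta(z)\,\Xi\|_2^2}_{\text{SINDy in latent } z = \varphi(x)}
\;+\; \lambda_2 \underbrace{\|\dot{x} - \nabla_z \psi(z)\,\big(\Theta(z)\,\Xi\big)\|_2^2}_{\text{SINDy consistency in } x}
\;+\; \lambda_3 \underbrace{\|\Xi\|_1}_{\text{sparsity}}
```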
Here are some different examples of this, and I just want to highlight the bottom one, the nonlinear pendulum: if I just give it videos of the dynamics, it discovers θ and θ-dot, and it discovers that θ double dot equals minus sine θ. That's kind of remarkable, and it gives us a way forward when we think about the kinds of instruments we use nowadays to measure physics, which are often imaging techniques. The image itself doesn't necessarily give you the right coordinate, but this gives you an architecture in which you can get to the right coordinate and build the model you need from there. The other interesting thing is that the inner layer has the potential for generalizability. The coordinate system does not generalize: if I double the length of the pendulum in the video, that coordinate transformation no longer works; but the physics I learned, θ double dot equals minus sine θ, still holds. So maybe I can pin down this generalizable physics and just learn a new coordinate system when I change the length of the pendulum. These are nice things to think about: what is generalizable in a model and what is not.

We can build on these ideas and think about what else targeted neural networks can do for us, and I want to talk about a couple of other themes we've built on. One of them is time coordinates. What if we say: in this time coordinate I could build a dynamical model, a linear model or a nonlinear model, or maybe just regress directly, because really what I want to do is forecast some data. This is work with Henning Lange, and what we started building is Fourier and Koopman forecasting. The way this works: if I want to make a prediction a long time into the future, there are very few functions that, far into the future, neither blow up nor decay to zero, and sines and cosines are perfect because they persist to infinity. So take time series data; a simple thing to do is fit it with Fourier modes, but unconstrained, so that it doesn't have to satisfy a periodic boundary condition. We take one extra step: what if we take whatever data we have and make a coordinate transformation in time, so that the time series looks more sinusoidal, and then fit it with Fourier modes? That's exactly what Henning did: Koopman forecasting learns a coordinate transformation in which sinusoidal functions are ideal. This paper is appearing in the Journal of Machine Learning Research. You can take fairly exotic-looking signals, systems with periodic and quasi-periodic behavior, and do an amazing job of forecasting them a long time into the future. If you compare it against other standard techniques, which come from two sides, things like ARIMA from the time-series and statistics literature, and neural networks from the computer science side, this Fourier/Koopman forecasting beats all of them for long-term forecasting. For a very short time Δt into the future, many of those methods do quite well, but over long horizons this approach is almost unbeatable.
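A much-simplified sketch of the spirit of the Fourier forecasting idea: fit unconstrained sinusoids (frequencies not tied to the window length) to an observed window by nonlinear least squares, then evaluate them beyond the window. The actual Fourier/Koopman forecasting algorithm of Lange et al. optimizes the frequencies with a more robust procedure and, in the Koopman version, also learns the coordinate transformation; the initialization and number of modes below are assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def sinusoid_model(params, t, n_modes):
    """Sum of n_modes unconstrained sinusoids plus a constant offset."""
    out = np.full_like(t, params[-1])
    for k in range(n_modes):
        a, b, w = params[3 * k], params[3 * k + 1], params[3 * k + 2]
        out = out + a * np.cos(w * t) + b * np.sin(w * t)
    return out

def fit_and_forecast(y, t, t_future, n_modes=3):
    # Initialize frequencies from the largest FFT peaks of the training window.
    freqs = np.fft.rfftfreq(len(t), d=t[1] - t[0]) * 2 * np.pi
    spectrum = np.abs(np.fft.rfft(y - y.mean()))
    w0 = freqs[np.argsort(spectrum)[-n_modes:]]
    p0 = np.concatenate([np.column_stack([np.ones(n_modes),
                                          np.ones(n_modes), w0]).ravel(),
                         [y.mean()]])
    res = least_squares(lambda p: sinusoid_model(p, t, n_modes) - y, p0)
    return sinusoid_model(res.x, t_future, n_modes)

# Usage sketch on a quasi-periodic signal: train on [0, 50), forecast on [50, 100).
t = np.linspace(0, 100, 2000)
y = np.sin(1.3 * t) + 0.5 * np.sin(0.37 * t + 1.0)
train = t < 50
y_hat = fit_and_forecast(y[train], t[train], t[~train])
print("forecast RMSE:", np.sqrt(np.mean((y_hat - y[~train]) ** 2)))
```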
It also allows you to do a lot of things with model building, including building low-rank models without even simulating the system anymore. Here's the ground truth of some PDE: I can reduce it to a low-rank subspace with an SVD, look at the time series of the modal coefficients, fit them with Fourier modes, and get an almost perfect reconstruction, and then also a prediction of the future state, and I'm not even simulating the PDE anymore. I'm simply doing a projection, doing a long-term forecast, and rebuilding the system, which is much more efficient than almost any reduced-order modeling architecture, because you're not solving or projecting the dynamics onto some sequential time stepper; once you have the model, you forecast as far out as you want and rebuild. That's exactly what this does.
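A minimal sketch of that project-forecast-rebuild idea: take the SVD of a snapshot matrix, forecast each temporal coefficient with sinusoids at its dominant FFT frequencies, and reconstruct the future field. Fixing the frequencies to FFT bins is a simplification of the method described in the talk, and the rank r and toy data are assumptions.

```python
import numpy as np

def sinusoid_forecast(y, t, t_future, n_modes=3):
    """Fit y(t) with cos/sin at its dominant FFT frequencies, evaluate at t_future."""
    w = 2 * np.pi * np.fft.rfftfreq(len(t), d=t[1] - t[0])
    top = np.argsort(np.abs(np.fft.rfft(y - y.mean())))[-n_modes:]
    def design(tt):
        cols = [np.ones_like(tt)]
        for k in top:
            cols += [np.cos(w[k] * tt), np.sin(w[k] * tt)]
        return np.column_stack(cols)
    c = np.linalg.lstsq(design(t), y, rcond=None)[0]
    return design(t_future) @ c

def forecast_rom(snapshots, t, t_future, r=2, n_modes=3):
    """Low-rank forecast: SVD projection, per-mode sinusoid forecast, rebuild."""
    U, S, Vh = np.linalg.svd(snapshots, full_matrices=False)
    coeffs = np.vstack([sinusoid_forecast(S[k] * Vh[k], t, t_future, n_modes)
                        for k in range(r)])
    return U[:, :r] @ coeffs

# Usage sketch on a toy space-time field; train on t in [0, 50), rebuild on [50, 80).
x = np.linspace(0, 10, 128)[:, None]
t = np.linspace(0, 50, 500)
field = np.sin(x) * np.cos(2.0 * t) + 0.3 * np.cos(3 * x) * np.sin(0.7 * t)
t_future = np.linspace(50, 80, 300)
future = forecast_rom(field, t, t_future)        # predicted snapshots, no PDE solve
```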
You can also think about how you want to handle multiscale physics. There's a lot of flexibility there, including architectures such as this one by Yuying Liu, a graduate student who started exploiting transfer learning and multigrid-style architectures for computation. He uses a convolutional neural network to capture the coarse-grained features of the data, and then refines and learns new features only in the regions where it's necessary. So you have progressive refinement in the architecture: you start with a coarse-grained representation and go to a finer representation, but not everywhere in your domain, just where you need to. This lets you learn very rapidly, because you transfer the learning from one level to the next as you refine the model on a multiscale spatial architecture. Neural networks have amazing flexibility when you put them into innovative architectures like these.

You can also solve boundary value problems. We've been talking mostly about time, but you can take all of this technology and frame it around boundary value problems instead. The same architecture I've shown throughout the talk, an encoder into a latent space where you then do the dynamics, becomes an encoder where we do something with boundary value problems, and a lot of this was done by Dan Shea. Not only has he taken data from an unknown boundary value problem, transformed it, and found parsimonious representations for that boundary value problem, in other words discovered the boundary value problem, he has also found representations of the Green's function: learning a coordinate transformation that linearizes the process, so that inside you have a linear boundary value problem, you can construct the Green's function, and then come back out. These architectures address a lot of our classic problems; in fact, much of the methodology you learned in classes, the classic techniques, can be brought right over into these modern architectures and, with suitable adjustments and modifications of the neural networks, made to work in a very nice way using targeted neural network structures.

So I'm going to end here. My conclusion is that this parsimony idea, a nominal representation of the physics, is, I think, the ultimate physics regularization. I think it gives you the best shot at interpretability, and it gives you the best shot at extrapolation and generalization, partly because what you end up with is a model with a small number of parameters that parameterize it. The idea is to discover models from data, and this is the generic architecture: it's all about coordinates paired with dynamics. I'm going to leave you there. I hope you have found something interesting in this talk, and I don't think there can be much controversy around the idea that if you have a good coordinate system, that's probably going to be a really great way to represent your dynamics. All I've talked about is using some of these machine learning methods to help aid in that discovery process, which hopefully will help you do a better job of understanding many of the engineering and physics problems you're looking at.
Info
Channel: Nathan Kutz
Views: 7,712
Keywords: kutz, brunton, physics-informed machine learning, machine learning, data-driven discovery, neural networks, deep learning
Id: gUxBJU5n2Zs
Length: 45min 38sec (2738 seconds)
Published: Mon Jan 18 2021