Benjamin Peherstorfer - Physics-based machine learning for quickly simulating transport-dominated...

Captions
All right, great. Hi everyone, and welcome back to the Data-Driven Methods in Science and Engineering seminar. Today we're excited to have Professor Benjamin Peherstorfer from the Courant Institute of Mathematical Sciences. Professor Peherstorfer was a postdoctoral associate in the Aerospace Computational Design Laboratory at the Massachusetts Institute of Technology, working with Professor Karen Willcox. He received his BS, MS, and PhD degrees from the Technical University of Munich, Germany, in 2008, 2010, and 2013, respectively. His PhD thesis was recognized with the Heinz Schwärtzel Prize, which is jointly awarded by three German universities to an outstanding PhD thesis in computer science. Benjamin was selected for a Department of Energy Early Career Award in the Applied Mathematics Program in 2018 and for an Air Force Young Investigator Program award in computational mathematics in 2020. In 2021 Benjamin received a National Science Foundation CAREER award in computational mathematics. His research focuses on computational methods for data- and compute-intensive science and engineering applications, including scientific machine learning, mathematics of data science, model reduction, and computational statistics. So we're very excited to have you with us today, Benjamin, and I'm looking forward to the talk. The floor is yours.

Thank you very much for this really nice introduction, and especially thanks for the invitation. It's great to be here and to have this opportunity to present some of our research. Let me start by acknowledging two collaborators, Joan Bruna and Eric Vanden-Eijnden, who are both at Courant; we have collaborated on one of the core pieces that I'm going to present here today.

Okay, let me start with the motivation: why are we interested in quickly simulating transport-dominated physical phenomena? The reason, for me, is that I would like to solve outer-loop applications. In outer-loop applications we have a model that describes some, say, physical system, and we have an application that requires us to build a loop around that model, so that we need to simulate the model for many different inputs to compute the corresponding outputs — maybe thousands, maybe tens of thousands, maybe even millions of times. We see these kinds of outer-loop applications almost everywhere in computational science and also in engineering. For example, think of PDE-constrained optimization, where the model is the PDE and one has to numerically solve the PDE in each iteration step to, for example, find a descent direction. We have a similar situation in inverse problems, for example Bayesian inverse problems, where there is the forward model, and to draw samples from the posterior one has to evaluate or simulate the forward model many times in a row for different inputs, different parameters, different coefficients. In uncertainty quantification we have an input distribution that we would like to propagate through the model to estimate some quantities of the corresponding output random variable; these are often Monte Carlo based methods, where again we need to simulate the model many times in a row for different inputs and parameters. And there are similar situations in visualization, multidisciplinary coupling, and control. So why are these outer-loop applications challenging? Because if you think of classical numerical analysis, classical scientific computing, where we have a model for example coming from finite element analysis, then the runtime of an outer-loop application
looks something like this. We have here on the x-axis the runtime, and we need to compute in each iteration of this outer-loop application, for example, a solution q at a time t and some input mu_1; then we move to the second iteration and need to compute q at a certain time and a different input mu_2, and then for mu_3 and mu_4 and so on. For each of those iterations we have to reserve a certain chunk of runtime to get the corresponding approximation based on this model. These classical numerical analysis and scientific computing models are often based on partial differential equations. The approximate solution here is q at a spatial coordinate x in a domain, a time t, and an input or parameter mu. They derive approximations in spaces, typically, which means there are basis functions phi that depend on the spatial coordinate, and there are coefficients that change with time and inputs, and the corresponding approximation is a linear combination of these basis functions with the corresponding coefficients. These basis functions are typically obtained via some discretization — putting a lot of grid points in your spatial domain, for example based on finite element methods or finite volume methods — and one then gets an approximation space, call it V, that has a certain dimension, capital N. In each of those iterations, to compute one q for a different mu, one has to solve for the corresponding coefficients of that linear combination, and since capital N, the dimension of these finite element spaces, is usually high, this is typically too expensive to just plug into your favorite outer-loop application and do that in a tractable amount of time.

This is where surrogate models come into play. With surrogate models one changes the way to approach these outer-loop applications by splitting the computational time into two phases. The first phase is the training phase, a one-time high-cost training phase, where one generates, for example, data from this expensive model, extracts some patterns that are important in that data, and then derives a cheap-to-evaluate surrogate model; this is all done once and for all in the training phase. Then, online, one uses the surrogate model to very quickly make predictions of what these solutions look like. One does not get exactly the same solution but an approximation, say q tilde here, but one can much more quickly predict, or get a good approximation of, these high-fidelity model states. So this is the idea of surrogate modeling: a training phase, a one-time high cost to learn a model, and then an evaluation phase where one can very quickly predict corresponding solutions.

Okay, so there are multiple types of surrogate models, and I like to distinguish three different types. The first type is simplified surrogate models: for example, if you again say that the model you want to simulate is based on a PDE, then one could discretize that PDE on a coarser grid, or, if there is an iterative method hidden in how one obtains the numerical solution of the PDE — for example a Newton solve or some iterative linear solver — one could simply stop that earlier. This reduces the runtime and at the same time increases the error, but one can already see those kinds of things as some kind of surrogate model. The second type is data-fit surrogate models, which just view the input-output map from the mu's to the outputs as a black
box and try to fit some parametrization, some model, to that input-output map. This could be neural network regression, it could be classical response surfaces in very low dimensions, support vector regression, Gaussian process regression — if one does this in a black-box, purely data-driven way, one gets a data-fit surrogate model. And then the third type is projection-based reduced models, and those in some sense combine both worlds: here we have a purely physics-based world that does not use data, there we have a purely data-driven world that only looks at data, and projection-based reduced models combine both. First, important dynamics are extracted from data, based on, say, full-model states that have been computed in the training phase, and then in the evaluation phase one does not rely only on data but rather still goes back to the governing equations and solves them numerically, but now in a data-informed parametrization that one has extracted, or learned, or at least informed by the data assembled in the training phase. This is the spirit that we will follow in the following, and of course you are all very familiar with physics-based machine learning — projection-based reduced models are in some sense a very early version of physics-based machine learning.

Okay, so how do these typically work? They try to identify latent dynamics — this is what I meant by important dynamics — in the sense that if we look at a solution q at a time t and an input mu, then one can think of this as a function of x and look at the corresponding set of possible functions that one can reach over the parameter set, the input set, and the time range. The idea of classical model reduction, these classical projection-based reduced models, is that these solutions are not scattered all over the high-dimensional solution space V — think again of a finite element space — but that they form smooth and very low dimensional manifolds. If you come more from a linear algebra perspective, then empirically one can observe such a smooth and very low dimensional manifold by, for example, looking at snapshots — the coefficients of these approximations for certain times and inputs — putting those as columns into a matrix and looking at how quickly the singular values decay. If one samples right, this gives at least an indication that there exists a low-dimensional space that well approximates the columns of this snapshot matrix. So this is the notion, the idea of classical model reduction, where in the training phase one constructs a much lower dimensional space V_n, where little n is much smaller than capital N — capital N, think of the finite element space dimension — and this low-dimensional space is trained or constructed in the training phase once and for all. Then, in the evaluation phase, instead of seeking an approximation of the solution in the finite element space, one tries to find an approximation in this much lower dimensional space V_n. If the dimension of V_n is much smaller than the dimension of the finite element space — if one can choose it that way — then one can achieve really fantastic speedups with these kinds of classical model reduction methods, and I'm just listing here a few survey papers on that topic.
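To fix notation for what was just described, here is a minimal way to write the classical linear ansatz and the snapshot matrix whose singular values are inspected; the symbols follow the talk, while the snapshot-matrix shorthand Q is my own:

```latex
\tilde q(x; t, \mu) = \sum_{i=1}^{n} c_i(t, \mu)\,\varphi_i(x) \in \mathcal V_n \subset \mathcal V,
\qquad n \ll N = \dim \mathcal V,
\qquad
Q = \big[\, \mathbf q(t_1, \mu_1) \;\; \mathbf q(t_2, \mu_2) \;\cdots\; \mathbf q(t_K, \mu_K) \,\big].
```

A fast decay of the singular values of Q indicates that a low-dimensional subspace — for example the span of the leading left singular vectors, as in proper orthogonal decomposition — approximates the sampled states well.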
Okay, let's have a look at a toy example. This is the heat equation; I'm skipping all the details here, just think of a diffusive process, where we have here time and here the spatial domain — so it's the heat equation, just in 1D — and you can see the diffusive process. On the right plot I'm showing the index versus the singular value of the corresponding snapshot matrix: I've collected states over time and computed the singular values of that snapshot matrix, and you can see the singular values decay fairly quickly; with maybe 20 or so modes we can describe, up to machine precision, at least the data that we have sampled. Let's now have a look at another example, which is a linear advection equation, so pure transport. We have here again time versus the spatial domain, we start with a Gaussian bump as initial condition, and the only thing this linear advection equation does is transport this Gaussian bump through the spatial domain over time, here even with a constant speed. If we now look at the singular values of the corresponding snapshots, then we see — again index, or number of modes, versus the singular value — a much slower decay of the singular values. So this already indicates that going from diffusion to transport really seems to flip something that makes it much harder to find a low-dimensional space, and that makes it much harder for these classical model reduction methods to work well in transport-dominated regimes.
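To make this comparison concrete, here is a small self-contained sketch of that singular-value experiment; the grids, diffusivity, transport speed, bump width, and tolerance below are illustrative choices of mine, not the values used in the talk:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 400)   # spatial grid
t = np.linspace(0.0, 1.0, 200)   # snapshot times
nu, speed = 0.01, 0.5            # diffusivity and advection speed

# Heat equation with a smooth initial condition (Fourier sine modes with decaying
# amplitudes): each mode is damped by exp(-(k*pi)^2 * nu * t), so the snapshots are
# captured by a handful of spatial modes.
heat = np.array([
    sum(np.exp(-0.5 * k) * np.exp(-(k * np.pi) ** 2 * nu * tk) * np.sin(k * np.pi * x)
        for k in range(1, 101))
    for tk in t
]).T

# Linear advection of a narrow Gaussian bump: the bump only translates, which a
# single fixed low-dimensional spatial basis represents poorly.
advect = np.array([np.exp(-((x - 0.2 - speed * tk) ** 2) / 0.001) for tk in t]).T

for name, snapshots in [("heat", heat), ("advection", advect)]:
    s = np.linalg.svd(snapshots, compute_uv=False)
    s /= s[0]                                   # normalize by the largest singular value
    n_modes = int(np.argmax(s < 1e-4)) if np.any(s < 1e-4) else len(s)
    print(f"{name:9s}: modes needed to drop below 1e-4 ~ {n_modes}")
```

Running this typically shows the diffusive snapshot matrix dropping below the tolerance after a couple of dozen modes, while the advection snapshot matrix needs several times as many — the numerical symptom of the slow decay discussed next.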
Okay, these are of course just two toy examples, so I want to point out one real-world example that came up as an application in the AFRL/AFOSR center for multi-fidelity modeling of rocket combustion dynamics, a joint center involving AFRL and AFOSR, the University of Michigan, the Oden Institute, the University of Kansas, Purdue, and also us. In that center there is an interest, for example, in combustion engines; this is a single-injector combustion engine of a liquid-fueled rocket engine, so one part of that engine, and what I'm showing you here is the pressure and how it evolves in that combustion chamber. You can see that there is a wave traveling through the domain, so again there is transport happening here, and this is what makes this kind of application hard for traditional, classical model reduction. What engineers would like to have in that setting is the ability to very quickly predict the amplitude of these pressure waves and how it develops over time — I'm showing this here as time versus pressure, and this is the amplitude and how it increases — and one would like to find a design, and this is the outer-loop application: try to find a design, a length of the combustion chamber or some other properties that one can change, such that this growth is bounded, so that at some point it enters a limit cycle and the process is stable; otherwise it blows up and explodes. So this is just a motivation that there is a fundamental difference for model reduction between a diffusive process and a transport-dominated process, and that we see this kind of behavior in real-world applications, for example in these combustion problems.

So this is something to think about now, and it motivates the following. I will first talk a little bit more about the formalities: what does this exactly mean in terms of model reduction, how can we describe this behavior mathematically — that there is a difference between diffusion and transport — and what could we do about it. Then, and this is the collaboration with Joan Bruna and Eric Vanden-Eijnden, we want to propose one idea of how to use networks in a specific way to address this Kolmogorov barrier, this issue that classical model reduction does not work well for transport-dominated problems. And then I will show some numerical experiments to showcase that approach.

All right, so let's dive into describing a little bit more in detail what it means that classical model reduction cannot be applied. For this we need to understand the properties of the solution manifold, which, as I said, is the set of all these q's that we can reach over time and different inputs based on the PDE that we would like to solve. One measure to understand, in some sense, the difficulty of approximating elements of this manifold M is the Kolmogorov n-width. There are multiple versions of the Kolmogorov n-width; I'm showing just one of them here, this d_n, where little n is the dimension of the space. This Kolmogorov n-width tells us the best we can achieve — the best-case error, over all elements, that we can achieve with an n-dimensional subspace of this high-dimensional space V. So we pick the best subspace of dimension n and then the worst element, the one that maximizes the error, and that is the error this space achieves. If a space achieves this d_n, this n-width, then there cannot exist another space of that dimension that achieves anything better. In other words, if this Kolmogorov n-width decays slowly with the dimension n, or if it decays quickly with the dimension n, it means we either cannot find good spaces or we can find good spaces. There are some limited results on when the Kolmogorov n-width decays quickly. Here is an earlier result from 2002 — there are newer ones that are more general — but for some specific setups, for example a single-parameter, symmetric, coercive elliptic PDE, it has been shown by Maday, Patera, and collaborators in 2002 that the corresponding Kolmogorov n-width decays exponentially, and this is what we typically want to see in classical model reduction: an exponential decay as we increase the dimension of our subspace. A space that achieves such an exponential decay can also be found with greedy methods; that has been another result from a few years ago. So to summarize, roughly speaking, we see a fast, exponential decay of the Kolmogorov n-width for diffusion-dominated problems, and this agrees with the numerical observation we had earlier, where we looked just at the singular values and said, well, they decay fast. Clearly, singular value decay does not directly say anything about the n-width, but it at least gives us a numerical indication.
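Written out, the version of the Kolmogorov n-width referred to above is the standard one, with V_n ranging over all n-dimensional subspaces of V and M denoting the solution manifold:

```latex
d_n(\mathcal M) \;=\; \inf_{\substack{\mathcal V_n \subset \mathcal V \\ \dim \mathcal V_n = n}} \; \sup_{q \in \mathcal M} \; \inf_{v \in \mathcal V_n} \; \| q - v \|_{\mathcal V}.
```

A rapidly decaying d_n(M) means some n-dimensional subspace approximates every reachable solution well; a slowly decaying d_n(M) means no such subspace exists, no matter how it is constructed.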
Now, on the other hand, we looked at a transport problem — here linear advection. This is the linear advection equation with a speed mu, some boundary condition, and a step as the initial condition, and the only thing that happens is that, over the spatial domain, this step is transported to the right with constant speed. In 2016, Ohlberger and Rave showed that the corresponding n-width of that solution manifold cannot decay faster than one over the square root of n. So we go from the exponential decay that we see in diffusion-dominated problems to a decay no faster than one over the square root of n for this transport-dominated problem — and this is the Monte Carlo rate, so typically way too slow to be worth having this offline phase of finding a space and then solving in it. Similar results have been shown for other transport-dominated equations, for example the wave equation, by Greif and Urban in 2020.

Okay, so this just formalizes that, based on the Kolmogorov n-width, we can distinguish between two cases: first, problems where there is a fast decay, and then these classical model reduction methods work well; and second, problems where there is a slower decay, and then we know there cannot exist a classical model reduction method that does well, because there simply is no such space. This is good and bad news. It is bad news in the sense that these classical model reduction methods do not work in that case, but it is also good news in the sense that the Kolmogorov n-width only applies to the linear approximations that classical model reduction methods use — namely approximations in spaces where the basis functions are independent of time and inputs; they are fixed, computed once and for all, and then one has these coefficients to form the linear combinations. We can really think of these as linear models: if we think of the coefficient vector, we can compute the inner product with the basis, and what I mathematically mean by linear, besides this scalar product, is that the function space in which one derives the approximation — this low-dimensional space in classical model reduction — is independent of the element that one wants to approximate.

So one way out, and you can already see this, is going towards nonlinear approximations, and one step is localized model reduction. In localized model reduction one precomputes multiple spaces, V_1 through V_J, where each of those spaces is spanned by a different set of basis functions, and then, based for example on time and parameter, one selects a different space online, in the evaluation phase, while one solves the outer-loop application, depending on certain properties that one sees in the solution — I'm just giving time and input here as an example. So the approximation q tilde now depends on, or the space in which one approximates depends on, time and the parameter, and in that sense it is a nonlinear approximation, because now the approximation space changes depending on which kind of element I would like to approximate. These kinds of localized model reduction methods have been investigated quite extensively over the last ten years or so, and they are also closely related to dictionary approaches in compressed sensing, where you have a large dictionary and you want to pick a few elements that approximate your behavior well — and of course Nathan and Steve are experts in that setting; I'm just listing here the model reduction literature.

Okay, so then, moving from localized model reduction to adaptive model reduction, one says: I would not like to precompute multiple spaces, but I would like to change my space as I go — making the space V_n depend on time and the input, and correspondingly the basis depends on time and the input. Then one has to do two things in the online phase, the evaluation phase: one first has to find an approximation in that basis that changes with time and parameter, but then one also has to evolve the basis
forward in time during the evaluation phase — finding an update to the basis functions as one moves forward in time, so evolving the basis. The challenge is really to do this efficiently and in a stable way, and there has been quite a lot of work in that direction: for example, dynamical low-rank approximations by various people — Sapsis, for example; we have done some work on adaptation from sparse samples, looking only at a few samples of the solution to advance, to evolve, the basis; and Kevin Carlberg has done work on enriching subspaces via h-adaptivity, for example, as well.

Okay, so we have seen linear approximations, localized, adaptive. The next step, in some sense, is abstracting away from time and input — allowing the space to change not only with time and parameter but making this a little bit more abstract, and just saying that we would like the space in which we approximate to depend on some feature vector alpha, so that our approximation depends on beta and alpha, where beta is the linear coefficient that enters the linear combination and alpha decides how to change the representation. You can already see where this is going: of course this is going towards deep neural networks, where one changes the representation in which one wants to approximate the corresponding solution — one learns the features along the way. This has of course seen very active research over the last several years, and I'm just listing here three key questions. How to parameterize — how to choose the way phi depends on alpha; you can think of this as what architecture to choose, in some sense. What is the best approximation error that one can achieve — we have seen that for linear approximations there is the Kolmogorov n-width, and if one can upper- or lower-bound it, one knows the best one can achieve, at least asymptotically or in the sense of a bound. And then the numerics side: how can we build numerically efficient solvers based on parametrizations that look like this, where we have this linear coefficient beta and a feature vector that enters, in some sense, the basis functions and changes the space as we move forward. There is no way anymore to really list a comprehensive set of works in that direction, so I'm just calling out a few highly visible works from the last several years — by Nathan and Steve, of course, but also by Weinan E and Lars Ruthotto, and of course also the work by Perdikaris and Karniadakis on physics-informed neural networks.

Okay, so we are faced now with this situation. It was a bit of a lengthy introduction, but I really wanted to motivate why we need to look at nonlinear approximations like this: if we solve these classical diffusion-dominated problems, and they are in lower dimensions, then based on the Kolmogorov n-width there is not really much use in going to a neural network; but if one has this transport-dominated behavior, then we know that linear spaces are not enough and we need a nonlinear approximation. The question for us in the following is: how can we build a numerically efficient solver based on this parametrization, and how does this compare to other work that is out there?
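One way to collect the progression of approximations discussed so far in one place — this tabulation is my own condensation of the talk, not a formula shown on the slides — is:

```latex
\begin{aligned}
&\text{linear:}    && \tilde q(x; t,\mu) = \textstyle\sum_{i=1}^{n} c_i(t,\mu)\,\varphi_i(x),        && \varphi_i \text{ fixed once and for all},\\
&\text{localized:} && \tilde q(\,\cdot\,; t,\mu) \in \mathcal V^{(j(t,\mu))},                          && \text{one of several precomputed spaces},\\
&\text{adaptive:}  && \tilde q(x; t,\mu) = \textstyle\sum_{i=1}^{n} c_i(t,\mu)\,\varphi_i(x; t,\mu),  && \text{basis evolved online},\\
&\text{nonlinear:} && \tilde q(x) = \textstyle\sum_{i=1}^{n} \beta_i\,\varphi(x; \alpha_i),           && \text{features } \alpha \text{ learned as well}.
\end{aligned}
```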
All right, so this brings me now to the core of what I would like to talk about: this Neural Galerkin method, which, as I said, is collaborative work with Joan Bruna and Eric Vanden-Eijnden, who are both at Courant.

Okay, so what is the setup? We have a time-dependent PDE — here is the time derivative — with q as our solution, which depends on time and x, and a right-hand-side function f that again depends on time, on x, and on the solution. We assume in the following that we have suitable boundary conditions and suitable initial conditions, and I skip the input here: for now there is no parameter mu; I will come back to that a little bit later. The only question we are asking is that we would like to numerically solve such a time-dependent PDE, such an evolution equation. Now we parameterize this: we take the solution q at a time, which is then a function of x and lives in some space V — this could be a very high dimensional space — and we would like to parametrize it with some theta that depends on time, so that we can equivalently write q of theta(t) and x. This is an equivalence; there is no approximation at this point. There could be infinitely many parameters for now; it could be a function that changes with time and lives in some space Theta. The key that we want to achieve in the following, of course, is that we allow a nonlinear parametrization, so this theta(t) can enter nonlinearly into q. If this is a finite-dimensional parametrization, then you can think of theta(t) as being a vector of, for example, the weights in your deep network parametrization — but I want to point out that deep networks are just one kind of nonlinear parametrization that is compatible with what I'm going to present.

Okay, the question now is how these parameters theta are found, and one widely used approach is learning these parameters via collocation. This is, for example, done by the DGM method, or by physics-informed neural networks and variants of those works. The idea there is to draw samples t and x over the time-space domain and then fit the parameter theta by minimizing the residual at those sampled points, those collocation points. I call this a collocation approach because you draw many samples, you evaluate the residual at those samples, and you adapt or fit your parameter based on, for example, some optimization, some SGD approach. And I want to point out that this parameter is really the parameter representing the solution — we are not trying to learn the equation or anything like that, which is also done quite a lot with deep neural networks; we really want to approximate the solution, given the PDE that I would like to solve.

Okay, so we are interested in transport-dominated problems, as I said, and in transport-dominated problems we often have local features that travel over time. For example, here we have the spatial domain versus time, and there is again this Gaussian bump that travels through the spatial domain to the left over time. If you now think of these classical collocation methods, then what one has to do is sample the time-space domain, which here is just two-dimensional because we have a one-dimensional spatial domain — but already in two dimensions you can imagine it is quite challenging, based purely on sampling, to even find this very local feature that changes over time, that evolves over time. Over most of this domain nothing is actually happening, so one really has to sample this time-space domain extensively to find where the residual is not zero and to adapt or fit the parameters of the parametrization accordingly.
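In symbols, a minimal way to write this setup and the collocation-style fit just described (the sampled objective below is my generic sketch of a PINN/DGM-type fit, not the exact loss of any particular paper) is:

```latex
\partial_t q(t, x) = f\big(t, x, q(t, x)\big),
\qquad
q(t, \cdot) = Q\big(\theta(t), \cdot\big) \quad \text{(exact reparametrization; } Q \text{ may depend nonlinearly on } \theta\text{)},
```

```latex
\text{collocation: } \min_{\theta} \; \sum_{i=1}^{m} \Big|\, \partial_t Q(\theta; t_i, x_i) - f\big(t_i, x_i, Q(\theta; t_i, x_i)\big) \Big|^2 ,
\qquad (t_i, x_i) \ \text{sampled over the time--space domain},
```

where in the collocation setting a single parameter vector theta is fit over the whole time-space domain, whereas the Neural Galerkin viewpoint keeps theta = theta(t) and evolves it, as described next.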
And you can imagine that if you go to higher dimensions, this gets exponentially more difficult: if you are in higher dimensions and you have one local feature that moves over time, you have to sample that domain extensively to find that feature. This is something that we would like to address with this Neural Galerkin idea. There, we aim to find a solution by imposing dynamics on this parameter theta(t), rather than trying to discover via optimization where things are happening and then fitting the parameter based on collocation points. So we now really would like to derive an equation for how this theta(t) evolves over time — how, say, the weights of the neural network evolve over time — and the dynamics are given by the PDE that we want to solve.

Okay, so how are we doing that? Let's first look at the residual function. We take the time derivative — chain rule here — and we get a residual r that depends on theta, on theta dot, and of course on the spatial coordinate x; this is the residual, this is the time derivative, this is the right-hand side. Then we have an objective function, and this objective function depends on time t — so this is not a time-space approach anymore, where we sample over time and space, but an objective that depends on time t and changes with time t. Into this objective enter theta and theta dot, and we have the residual squared and a regularization to avoid an unbounded growth of the parameters, for example. One key aspect is how we formulate this integral, namely via a measure nu_t that depends on time. So the objective is time dependent, and it is time dependent also because the measure nu_t with which we compute this integral depends on time. Boundary conditions can be imposed either via penalty terms or encoded directly in the parametrization.

Having this time-dependent objective, we would now like to minimize it over all times: we would like to find a theta dot(t) that minimizes the corresponding objective at time t. This means that we, for example, want to set the corresponding gradient of this objective to zero, and that leads to the Euler-Lagrange equation — simply the gradient, set to zero — and you can see what enters here is the theta dot coming from the chain rule and the theta itself. If one looks at that more carefully, one can see that this equation is an ODE that describes how the parameters of the network evolve over time if we want to solve the corresponding PDE in a variational sense. It is an ODE with an operator M that depends on time and on theta(t), and a right-hand side, capital F, that also depends on time and on theta(t). This system of ODEs imposes dynamics on the parameters theta so that the corresponding nonlinear parametrization q of theta(t) solves the PDE in a variational sense. So this is again different from collocation, where one tries to minimize the empirical loss and in that way finds how to change the parameters over time.

All right, how about these operators — what do they look like? I'm just writing them down here: we have this M and we have an F, which both depend on time and on this parameter, and you can see here again the measure nu_t showing up. So this M changes with time, and also the way we can compute or estimate it later on will change with this measure nu_t, will change with time t, and the same for the F.
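Collected in one place, the objects just described read as follows; this matches the slides up to notation, with lambda my own symbol for the regularization weight:

```latex
r(x; \theta, \dot\theta) = \nabla_\theta Q(x; \theta) \cdot \dot\theta - f\big(t, x, Q(x; \theta)\big),
\qquad
J_t(\theta, \dot\theta) = \int \big| r(x; \theta, \dot\theta) \big|^2 \, \mathrm d\nu_t(x) \;+\; \lambda \|\dot\theta\|^2 ,
```

```latex
\min_{\dot\theta} J_t(\theta, \dot\theta)
\;\Longrightarrow\;
M(\theta, t)\,\dot\theta = F(\theta, t),
\qquad
M = \int \nabla_\theta Q \, \nabla_\theta Q^{\mathsf T} \, \mathrm d\nu_t ,
\qquad
F = \int \nabla_\theta Q \; f\big(t, x, Q\big) \, \mathrm d\nu_t ,
```

with the regularization contributing an extra lambda times the identity on the left-hand side if it is kept.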
What does this mean? If we now have a finite-dimensional approximation, just for a moment, we can interpret this M and F. Say we have a parametrization that looks like this, written in a general way: a q that depends on this vector theta and on time t, with a linear output layer depending on coefficients c, and then nonlinear units that depend on features. If we have a parametrization like this, then the corresponding M and F are operators that correspond to a Galerkin projection — this explains the name Neural Galerkin — because we have a Galerkin projection here with respect to a test space that is spanned by these phi's and by the c times the gradients of these phi's, and these phi's change based on the features. And then we have the residual, and again you can see the time-dependent measure. We have here n plus n·p equations in n plus n·p unknowns, so one can explain where this is coming from even via a Galerkin projection.

Okay, so what do we do now with this ODE? We have turned solving the PDE in this nonlinear parametrization into integrating an ODE, and in principle we can now pick any classical time integration scheme, apply it to that system of ODEs, and integrate it forward in time. Just one comment: there is related work that also has not only a collocation approach but some discretization in time, in a very special case where the number of layers increases with time — that is not the case here; independently of how many layers you have, you have different time steps here, and you can integrate forward in time with different time discretization schemes. Why is this critical for us? Because we can now, for example, choose a time step size that depends on time: with delta t_k we have a time discretization, and the time step size can change with time. Think again of our transport-dominated problems — it is really important there that, depending on how the dynamics evolve, we might want smaller or larger time steps, just as in classical numerical integration, where we have, for example, Runge-Kutta 4(5) schemes that adaptively choose how to move forward in time. This is hard to do with the collocation approach, where you have a time-space domain and you just sample over it. In the following I denote the approximation of the parameter at time t_k by theta_k.

So we can discretize the corresponding ODE in essentially two different ways, explicit and implicit, and the approach is compatible with both types of schemes. For example, if you have an explicit Runge-Kutta scheme, then in each time step we have to solve a system where M depends on the previous time step and on the parameters at the previous time steps, and F also depends on the parameters of the previous time steps — it is an explicit scheme. So in each time iteration we now always have to solve a linear regression problem; it is only a linear solve, for a linear regression problem, in each time iteration to move these parameters forward. If you do an implicit discretization in time, then we get, for example, something like this: this is an implicit Euler, and M and F now depend on the current time step, on the theta that we would like to find, and we then get nonlinear systems to solve — a non-convex problem that we have to solve in each time step with, for example, SGD.
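For concreteness, the two simplest representatives of these discretizations look like this (forward and backward Euler with step delta t_k; the talk also uses higher-order Runge-Kutta variants):

```latex
\text{explicit: } \; M(\theta_k, t_k)\,\frac{\theta_{k+1} - \theta_k}{\delta t_k} = F(\theta_k, t_k)
\quad \text{(a linear solve / least-squares problem per step)},
```

```latex
\text{implicit: } \; M(\theta_{k+1}, t_{k+1})\,\frac{\theta_{k+1} - \theta_k}{\delta t_k} = F(\theta_{k+1}, t_{k+1})
\quad \text{(a nonlinear, generally non-convex solve per step)}.
```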
What we have done here is to look at the time-continuous problem, derive the optimization problem, and then discretize. We could also turn this around and first discretize and then derive the corresponding gradient that we want to set to zero; we are working in that direction as well, to understand the benefits and tradeoffs of these two approaches.

All right, so the final piece that is missing now is how we get this M and F. Of course we cannot compute those analytically in general cases, so we have to resort to some estimation or approximation, and the natural thing is to do Monte Carlo: to replace M and F with Monte Carlo estimates M tilde and F tilde. This is also very similar to what one does in learning by replacing the population loss with the empirical loss, for example. One then gets an estimator M tilde that depends on the number of samples m that we want to afford to estimate that M, and the key thing is that these samples are now drawn from this time-dependent measure nu_t: we draw samples x_1 through x_m from nu_t at time t_k and then form, for example, this Monte Carlo estimator, and in a similar fashion for F. Now, because in each time step we can change this measure nu_t, we can also change the way we sample, and this goes back to the idea I mentioned earlier: we have these local dynamics, and we don't want to discover them by just sampling in an oblivious way, but would like to track them in a more adaptive fashion. We can do this by changing the measure nu_t, for example by sampling proportional to the square of the solution at the previous time step, and a careful choice of the units of the parametrization will, as we will see in the numerical results, greatly help us to do a simplified adaptive sampling. So we change the way we sample data during time integration. To summarize, this Neural Galerkin approach is adaptive in time, in the sense that we can adaptively choose time step sizes — making larger time steps where possible but resorting to smaller time steps in a dynamic fashion where necessary — and it is adaptive in terms of the sampling, so that we approximate or estimate these M and F operators depending on how the solution changes over time.
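Putting the pieces together, here is a deliberately small sketch of the scheme just summarized: one Gaussian unit, 1D linear advection, forward Euler in time, and Monte Carlo estimates of M and F with samples drawn from the mixture of the units. Every concrete choice below (unit width, speed, step sizes, sample count, regularization) is an illustrative assumption of mine, not the setup used in the talk.

```python
import numpy as np

s = 0.2      # fixed unit width (assumed)
a = 1.0      # advection speed for q_t = -a * q_x (assumed)
rng = np.random.default_rng(0)

def units(theta, x):
    # Q(x; theta) = sum_i c_i * exp(-(x - w_i)^2 / s^2), theta = (c, w)
    c, w = np.split(theta, 2)
    return np.exp(-((x[:, None] - w[None, :]) / s) ** 2), c, w

def grad_theta_Q(theta, x):
    # gradient of Q with respect to (c, w); shape (num_samples, num_params)
    phi, c, w = units(theta, x)
    return np.hstack([phi, phi * 2.0 * (x[:, None] - w[None, :]) / s**2 * c])

def f_rhs(theta, x):
    # right-hand side f(t, x, q) = -a * dQ/dx for linear advection
    phi, c, w = units(theta, x)
    return (phi * (-2.0) * (x[:, None] - w[None, :]) / s**2) @ c * (-a)

def neural_galerkin_step(theta, dt, m=1000, lam=1e-8):
    c, w = np.split(theta, 2)
    # adaptive sampling: draw x from the mixture of the current Gaussian units
    comp = rng.integers(0, len(w), size=m)
    x = rng.normal(w[comp], s / np.sqrt(2.0))
    G = grad_theta_Q(theta, x)
    M = G.T @ G / m + lam * np.eye(len(theta))   # Monte Carlo estimate of M(theta, t)
    F = G.T @ f_rhs(theta, x) / m                # Monte Carlo estimate of F(theta, t)
    theta_dot = np.linalg.solve(M, F)            # one linear solve per time step
    return theta + dt * theta_dot                # explicit (forward) Euler update

theta = np.array([1.0, 0.0])                     # one unit: coefficient c = 1, center w = 0
for _ in range(100):
    theta = neural_galerkin_step(theta, dt=0.01)
print(theta)  # the center should have moved by roughly a * t = 1.0
```

For this toy problem the exact solution is a translating Gaussian, which the parametrization can represent exactly, so the recovered dynamics are essentially a constant coefficient and a center moving with the advection speed.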
Okay, so this brings me now to the numerical experiments, to show this method on a few examples. The first one is a really classical benchmark, the Korteweg-de Vries (KdV) equation, where we have a third-order derivative and a nonlinear term. This is just 1D, so I'm skipping the adaptive sampling for these 1D examples — I have higher dimensional examples later on. We have periodic boundary conditions, we use an adaptive time integration scheme, Runge-Kutta 4(5), and just uniform sampling; again, it is one dimensional, so the sampling doesn't really pay off there — it is more important in higher dimensions. And we discretize this with just a shallow network based on exponential units that are made periodic to enforce the boundary conditions. You can see how the solution changes over time in the right plot. These are the results that we get: this is the truth — we have an analytic solution for that setup — spatial domain versus time; this is linear Galerkin, having basis functions fixed at an equidistant grid, spatial domain versus time, and you can clearly see it is not very well approximated; and then we have Neural Galerkin, which has the same number of degrees of freedom as this linear Galerkin — fewer nodes, but the same number of degrees of freedom — time versus spatial domain, and it approximates the truth much more accurately than the linear Galerkin.

Another 1D example that I wanted to show is the Allen-Cahn equation, where we have a quadratic potential and a time- and space-varying coefficient. You can see the solution here, spatial domain versus the solution, and it forms these two states over time; because we have this time-dependent and spatially dependent coefficient, it is a little bit more tricky to form those two steady states than in classical settings. Here we use a backward Euler time discretization and again uniform sampling, because it is just 1D, and we use a deep network with multiple layers and tanh units, where we make the input layer again periodic by composing it with a sine. These are the results that we get: this is linear Galerkin and this is Neural Galerkin, and you can see that linear Galerkin with the same number of degrees of freedom cannot really predict the right steady states, whereas the Neural Galerkin solution looks like how the solution should actually look — it evolves into the two steady states here, one in gray, one in red — while the linear Galerkin fails to do that and still has this kind of intermediate state. I'm also showing the relative error of the state versus time: blue is linear Galerkin, orange is Neural Galerkin with just Gaussian units and a shallow network, and green is Neural Galerkin with tanh units and three layers, which achieves about two orders of magnitude improvement for roughly the same number of degrees of freedom as linear Galerkin. Now, I said we impose dynamics on the coefficients and on the features, on how they move forward in time, and I want to show that here. This is linear Galerkin: time versus the coefficients that enter linearly, and time versus the features — in that case just the positions of the units, which are fixed, do not change over time, and are set equidistantly. In Neural Galerkin we have just two units, two nodes, and the corresponding coefficients change with time, but the features evolve as well, and the dynamics that are imposed are based on this Galerkin projection scheme that I mentioned.

Okay, now to the more interesting examples: higher dimensional transport — this is really where we want to be. We have here a high dimensional advection equation; high dimension here means only dimension five, but it is already quite challenging, in my opinion, because if you look at the marginals in dimensions one through five, you can see very local dynamics in this five-dimensional space — local features that evolve over time with different speeds and that have different shapes in each dimension. We use Runge-Kutta 4(5) for the time discretization scheme, and we would like to avoid having to uniformly sample this high dimensional space to discover these features, but rather would like to track them. How do we do the adaptive sampling in this case? It is based on the units that we choose: we make our life a little bit easier here and say we just have exponential units in our network, and then we can choose the time-dependent measure nu_t proportional to simply the mixture of those units, of those nodes. Because the features of those nodes change over time, so does the measure, and we can then sample from this nu_t at each time step and so estimate our M and F operators.
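As a formula, one simple choice consistent with "sampling from the mixture of the units" — the equal mixture weights below are an assumption on my part; the talk does not specify the weighting — is:

```latex
\nu_t(x) \;\propto\; \frac{1}{n} \sum_{i=1}^{n} \exp\!\Big(\!-\tfrac{\|x - w_i(t)\|^2}{s_i(t)^2}\Big),
```

so the sampling density follows the centers w_i(t) of the exponential units as they move, which is what lets a modest number of samples per step suffice in this five-dimensional example.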
These are the results. On the top row you can see the truth — we have an analytic solution in that case. On the middle row you can see uniform sampling, and you can see that nothing is happening there: it simply cannot discover these very local structures with just a thousand samples in this five-dimensional space. The last row is the adaptive Neural Galerkin approach, where you can see it tracks these local features, and only a thousand samples are sufficient because we adaptively sample in the right places based on this time-adaptive measure nu_t.

All right, I'm starting to run out of time, so the last example is a particle trap. This is eight dimensional: we have eight particles that are attracted to a trap. I'm showing here particle one versus four versus eight, the positions of those particles, and you can see that they approach this trap, get trapped there, and start to oscillate. The positions of these particles are governed by an SDE, and the corresponding density is d-dimensional and is described by a Fokker-Planck equation. You can again expect that we have local features in high dimensional spaces, because the particle density concentrates over time as these particles get trapped, and this is really hard to track with just uninformed sampling in a general case.
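For reference, a generic form of such an SDE/Fokker-Planck pair — the actual drift and any particle interactions used in the trap example are not spelled out here, so V below is just a placeholder potential — is:

```latex
\mathrm d X_t = -\nabla V(X_t)\,\mathrm d t + \sqrt{2\beta^{-1}}\,\mathrm d W_t,
\qquad
\partial_t \rho = \nabla \cdot \big(\rho\,\nabla V\big) + \beta^{-1} \Delta \rho ,
```

and it is the density rho, concentrating around the trap as time goes on, that the Neural Galerkin scheme evolves here.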
These are the results that we get. There are three projections here — particles one-four-eight, one-two-three, and six-seven-eight — and I'm showing in black the mean of the particles, which we can compute here based on other methods; in orange the Neural Galerkin approach based on adaptive sampling; and in blue Neural Galerkin based on just uniform sampling. You can again see that the adaptive sampling really is key here: with a thousand samples per time step we can very accurately track these particles and how they move, and you can see quite different dynamics depending on which projection you are looking at, whereas in all those cases pure uniform sampling simply fails to discover those dynamics and so cannot approximate the corresponding particles well. I also want to show some error plots: this is the relative error for the mean of the particles, and this is the relative error for the covariance of the corresponding density. In orange, this is Neural Galerkin based on our adaptive sampling; the mean is approximated with a relative error of about 10^-3, and blue shows Neural Galerkin with just uniform sampling — which is what one would have if one, for example, just sampled a time-space domain — and there you can see that it approximates the actual mean very poorly; the errors go way beyond one. And this is the relative error for the covariance; I'm not even showing what the uniform sampling gives, because it would distort the scale here, but I show what the adaptive Neural Galerkin gives: for off-diagonal elements we get about 10^-1 accuracy for this covariance matrix, and for the diagonal elements we get about 10^-2 relative accuracy.

All right, so that brings me to the conclusions. The take-away message, I think, is that we need nonlinear parametrizations for efficiently approximating transport-dominated problems, especially if you are interested in outer-loop applications; we would like to find low-dimensional latent dynamics that we can quickly approximate. This led to the question of how we should do that numerically. We can of course use a nonlinear parametrization such as a deep network, but then the question is how to solve for the corresponding parameters and features over time, and what we proposed is to impose dynamics on those parameters that are induced by the PDE and then integrate those forward in time, rather than trying to discover via sampling and collocation where the residual is high and trying to reduce the residual there. This adaptive sampling really is key, especially in higher dimensions — for example in this particle example, where features are local and where we don't really have much hope of approximating them by just discovering them through sampling. Next steps are clearly to connect this back to the outer-loop applications and the model reduction ideas, but I can mention that we will hopefully have a preprint out on this Neural Galerkin approach fairly soon, in the next two weeks. If you are interested more generally in nonlinear model reduction, there will be an upcoming, more educational article in the Notices of the American Mathematical Society in 2022, and we also have some other work on nonlinear model reduction methods based on adaptive spaces. With that, thanks for your attention, and I'm happy to take questions and comments.

Thank you very much for this amazing talk. Now we'll open the floor for questions. There is already a question in the chat, which I'll ask: on slide 12, does the same hold for a multi-linear approximation, a multi-index sum?

If this is referring to tensor-like approximations, then I would say no, because there you have a nonlinear composition in between, so you are already more in the nonlinear approximation type — adaptive model reduction, maybe.

So if anyone has any questions, you can raise your hand and I can unmute you, or just write it in the chat. There is a quick question from Jijon: did you do the Camassa-Holm equation?

We did not do that equation. We tried a few other ones, just internally, to make sure everything is working well, but we have not tried that particular one. It sounds interesting; I will definitely look into it.

He's asking if he can unmute. Yes — can you raise your hand so I can see it in the chat? Yes, we can hear you now.

Hi, this is In Chao. I have questions regarding the Camassa-Holm equation, and I heard your answer — well, thank you so much for addressing my question — but, you know, the Camassa-Holm equation has peaked
solitary waves, peakons, so if you have any computational scheme to discretize the Camassa-Holm equation, let me know. Thank you so much.

Sure, okay, we will definitely have a look; thanks for pointing that out.

Hey Ben, I'll jump in since we don't have any other questions in the chat. Great talk. I was just curious: the transport equations that you were approximating here have local features, and just a few of them. Do you think your method has benefits when there are a lot of distributed local features, like in a turbulent sort of model?

Yeah, that's a really excellent question. You can of course push this to the extreme and say you have, in a classical sense, a wave at each grid point — almost like turbulence, this randomization-type behavior — and then there is no way to really reduce that anymore, right, because it is randomized. So there has to be some structure, and the structure that we are exploiting is that there is some locality. It depends, though, on what locality means: here I focus more on spatial locality, but you could also have locality in some other sense that, with the right units, with the right parametrization, you could exploit in a similar setting. So that's a really excellent question, and it's something that we are still trying to get a mathematical grip on — how we could formulate something like this, where it would show up, how this locality would play a role.

That's great, thanks so much. So we have a question from Alexander — and one on YouTube... sorry, yes, not anymore — so, Alexander?

Yes, Benjamin, thank you for the nice talk. Can I see your talk as explaining to us why machine learning methods are successful in solving CFD problems, or problems with turbulence — that somehow machine learning methods do this nonlinear approximation for us? Can I see it like this?

I wouldn't say I have answered that grand question, but for these specific transport-dominated problems I think one can say that linear approximations simply suffer from this slowly decaying Kolmogorov n-width, and anything that is nonlinear can in principle break that barrier. From my perspective — my limited perspective, of course — deep networks can help in that situation because they adapt the feature representation, and that is really the key to why, for us, in this setting, neural networks are, I think, useful. But of course there are many other situations where they are really excellent, and that has nothing to do with the challenge that I'm describing here.

Okay, thank you. And maybe one last question from the chat: could, for example, stiff terms be handled similarly?

Yeah, because you can change the time discretization scheme with which you want to integrate these parameters in time, and if you have a stiff source term then most likely you need something implicit there — we even used implicit methods in these numerical experiments. So the challenges of stiff problems are not going away, you still have them, but I think you can address them with maybe similar tools as in classical numerics — implicit methods, for example.

All right, great, thank you very much. Thanks for asking questions, and thank you for this amazing talk; it was great having you. With this we will conclude. Next week we won't have a talk, but the week after we will, and I'm looking forward to discussing this with you again in the future. So thank you everyone for being here, and thank you, Benjamin, for
being with us.

Thank you very much. Thank you — great to see you. Good to see you.
Info
Channel: Physics Informed Machine Learning
Views: 469
Id: 9Cxgu9TuY2U
Length: 60min 37sec (3637 seconds)
Published: Wed Dec 08 2021