How Can Physics Inform Deep Learning Methods - Anuj Karpatne

Captions
All right, thank you everyone for coming. This is, I believe, the last talk of the day, so I'll try to keep it short. Today I'll be talking about how physics can inform deep learning methods in scientific problems, which is very much related to the theme of this workshop. Here is an outline of the talk. We will first try to understand why we even need physics to inform deep learning methods, and why black-box methods are not sufficient by themselves in scientific problems. I'll then discuss the emerging paradigm of theory-guided data science, which tries to integrate scientific knowledge with data science methods, and in this paradigm I'll discuss some of our recent work on integrating scientific knowledge with deep learning methods. Finally, I'll talk about some of the future research directions in this area.

We are all aware of the deluge of big data in the physical and life sciences, which is reflected in all the great talks we heard today: we have data from satellites and from models in climate science, particle physics, quantum chemistry, quantum physics, and so on. All of this data makes it possible to apply recent advances in data science, such as deep learning, to extract knowledge, patterns, and models automatically from large volumes of data. These methods are generally developed and deployed in a manner that is agnostic to the underlying scientific principles driving these variables, thus earning the name "black-box" models. They have been hugely successful in commercial problems in computer vision, speech, and language translation, as we have been hearing throughout the conference, and there is huge anticipation and growing promise in the scientific community around tapping the power of deep learning for scientific problems. This is reflected in a series of special issues in Nature and Science journals, to the extent that some have even referred to black-box data science as "the end of theory": the idea that, given large collections of data, we can completely discard the scientific, theory-based models that have been the cornerstones of scientific progress and rely entirely on the information contained in data.

Such tall claims generally fall flat when black-box methods are actually applied in practice to scientific problems, and a case in point is the rise and fall of Google Flu Trends. This was a system to predict the onset of flu purely from Google search queries, without using any physical knowledge of how diseases spread, and despite its initial success in the years that were used to train the model, it soon started overestimating by several factors, to the point that it was eventually pulled down. This is not an isolated story; there are many articles and reports that highlight the pitfalls of black-box applications of data science in scientific problems.

So what is so special about scientific problems that makes black-box methods fail, or at least unable to achieve similar levels of success? There can be many reasons. One of the primary ones is that in many scientific problems we are dealing with a large number of variables that interact in complex, non-stationary ways, and we often have a limited number of samples. For example, in climate science, even if we look at coarse-resolution data sets we have about 10,000 locations, or variables, across the world, and we only have good-quality data for the last 30 to 40 years, starting from the satellite era, or maybe 100 to 150 years if we go further back.
This makes it difficult to train data-hungry methods to achieve good training performance. Even more, when the sizes of both the training and test sets are small, standard methods for evaluating model performance break down. This is because, with a limited labeled set, however we split the data into training and test portions, it is easy to learn relationships that look good on both the training and test sets but do not hold well outside the labeled set. This is what actually happened with the Google Flu example. A more fundamental concern with black-box methods is the need for interpretability in scientific problems. Yesterday we had a great symposium on interpretability in machine learning, and a very lively debate, and I believe interpretability is even more important in scientific problems, for several reasons. A model that can be explained by existing scientific theories can form the basis of scientific advancements and can be used as a building block by other researchers. More directly, a model that can be explained by existing scientific theory stands a better chance of learning generalizable patterns and avoiding overfitting.

With that understanding, let us look at the dichotomy between theory-based models and data science models using a 2D view: on the x-axis we have the amount of data being used, and on the y-axis the amount of scientific theory being used. In this space, theory-based models reside in the region where they make ample use of scientific theory, ranging from closed-form solutions to numerical models of dynamical systems, and despite their huge progress they contain knowledge gaps in describing processes that are either too complex to understand or too difficult to observe directly. On the other side of the spectrum we have data science models, which make ample use of data but are agnostic to the underlying scientific theories. Both make ineffective use of the two sources of knowledge, scientific theory and data, and there is a need for methods that can use scientific theory and data on an equal footing. This is the paradigm of theory-guided data science, which tries to take full advantage of the unique ability of data science methods to automatically extract knowledge and patterns from data, but without ignoring the treasure of knowledge accumulated in scientific theories. Our recent perspective article on this topic tries to build the foundations of this paradigm by discussing several ways scientific knowledge can be integrated with data science methods in various scientific disciplines.

This idea of combining scientific knowledge with data science is already beginning to show promise in a number of emerging applications, and it is also reflected in all of the talks we heard today, which are all examples of this line of research. Here is just a short highlight of some of these works; it is in no way complete or exhaustive. We have some of our own work in earth science that I'll go through today, and there are approaches to build machine learning methods for turbulence modeling, for the discovery of novel materials, for new density functionals, and in many other domains.
There are also other conferences and industry initiatives focusing on this line of research; one example is the Physics-Informed Machine Learning workshop at Los Alamos National Laboratory.

To understand what theory-guided data science is, let's look at one of its overarching objectives, which is to learn physically consistent models. Traditionally, there is a notion of favoring simpler models to achieve generalizability, and one way to view this is by looking at the space of all possible models for a given predictive learning problem, where a star represents the true nature of the relationship, the true model, between the input and output variables, and we have three families of models, M1 to M3, with different levels of model complexity, shown as three curves. Any point on these curves represents a specific model that can be learned given a specific sample of training instances. What we can see is that models belonging to M1 are, on average, quite far away from the truth, so they have high bias, but they do not vary a lot with small changes in the training set, so they have low variance. On the other hand, models belonging to M3 are, on average, quite close to the true model, but they have high variance. We need to reduce both the bias and the variance, and one way to look at this is to say that any estimate of generalization performance has to depend not only on the training accuracy but also on the model complexity: we want to be as close to the true model as possible without having a lot of variance. This is at the heart of several machine learning algorithms that use different notions of model complexity rooted in statistical principles.

In scientific problems, however, there is another source of knowledge that can be used to make this trade-off, which is the physical consistency of the model. The basic idea is that if we can prune the space of models that are inconsistent with our available scientific knowledge, we can reduce the variance without likely affecting the bias. This results in a revised notion of generalization performance that depends not only on the accuracy and the model complexity but also on the physical consistency of the model.
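The talk states this revised notion only verbally; one way to write it down, using shorthand notation of my own rather than the speaker's, is:

```latex
% Traditional model selection: trade off training fit against model complexity.
\hat{f} \;=\; \arg\min_{f}\ \mathrm{TrainLoss}(f) \;+\; \lambda\,\Omega(f)

% Theory-guided revision: additionally penalize physical inconsistency,
% pruning physically implausible models and reducing variance without
% (ideally) adding bias.
\hat{f} \;=\; \arg\min_{f}\ \mathrm{TrainLoss}(f) \;+\; \lambda\,\Omega(f)
          \;+\; \gamma\,\mathrm{PhysInconsistency}(f)
```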
Introducing physical consistency in this way is, at a very high level, the goal of theory-guided data science. What I will talk about next is a specific framework for learning physically consistent neural networks, which is physics-guided neural networks (PGNN). This is work that is in submission at SDM, and a preprint is available on arXiv.

To illustrate the PGNN framework, let's look at an example problem of lake temperature modeling. The basic goal is, given a list of input drivers, such as the amount of incoming radiation and so on, to model the temperature of water at every depth in a given lake across time, so this is a 1D modeling of temperature over time. The standard approach is to use a physics-based model; the state of the art here is the General Lake Model (GLM), and just like many other physics-based models it involves parameters that have to be calibrated individually for every lake, and this calibration really improves the performance of the model. For example, for a given lake in Minnesota, if we use the uncalibrated model, which uses generic parameters that haven't been tuned for this specific lake but work reasonably well over a majority of lakes, we get an RMSE in the model predictions of 2.57 degrees Celsius. If we instead use the labeled data for this particular lake to fine-tune the model, we can achieve an RMSE of 1.26 degrees Celsius. But this comes at the cost of the amount of labeled data needed for calibration, and many of these calibration methods are computationally expensive. So the hope is: can machine learning be used to help in this process, to achieve better performance with limited training sizes and limited computational resources?

One naive solution could be to train a black-box model that takes us from the input drivers directly to the target output without using any physical knowledge. But in the presence of physical knowledge we can do more. The first and simplest way of using physical knowledge is to use the output of the physics-based model as an input to a neural network: along with the input drivers, we also feed the outputs of the physics-based model into the input layer of a neural network, followed by multiple fully connected hidden layers that finally give us the estimate of the temperature, Y hat, at the output layer. This is, in some sense, a hybrid physics-data model, where we are trying to see whether a neural network architecture can estimate the residuals of the physics-based model using the feature space of the input drivers. The traditional approach to train any such neural network would be to minimize loss functions that depend on the discrepancy between the model predictions and the actual observations, as well as on L1/L2 statistical notions of model complexity. But apart from these two losses, we can also use loss functions that measure the consistency of the model with respect to physical principles.

The basic idea is that any model that makes estimates of temperature should be aware that temperature is related to many other physical variables through physical relationships. For example, temperature is directly related to the density of water through a nonlinear physical relationship in which water has its maximum density at 4 degrees Celsius (ignoring the salinity of the water), and the density tapers off as we move away from 4 degrees Celsius. So any estimate of temperature can be converted, using this physical relationship, into an estimate of density, rho hat. Once we have these density estimates, we can exploit a key physical constraint between the density of water and the depth of water, which is that denser water sinks to the bottom of the lake and is thus found at greater depth: as we increase the depth, the density should monotonically increase. This simple physical relationship can be used to design a physics-based loss function as follows. For any two consecutive depth values, we can look at the difference in the density estimates at those two depths, shown here as Delta_i, the increase in estimated density going from the shallower to the deeper of the two. Physically, this Delta_i has to be greater than or equal to zero, so any negative Delta_i represents a violation of the physical constraint by the model predictions, and a very simple physics-based loss function is just the sum of all such violations that the model is making. This leads to an overall loss function in which, along with the training loss and the L1/L2 norms, we also use the physics-based loss function, weighted by an appropriate hyperparameter. A rough sketch of what this could look like in code is given below.
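To make this concrete, here is a minimal sketch of such a physics-guided loss, written in PyTorch-style Python. The network layout, the hyperparameter values, and all names (HybridPhysicsDataNet, water_density, lam_phy, and so on) are illustrative assumptions based on the description above, not the authors' actual PGNN code; the density formula is a standard empirical relation for fresh water with a maximum near 4 degrees Celsius.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def water_density(temp_c):
    """Approximate density of fresh water (kg/m^3) as a function of temperature
    (deg C): a standard empirical relation with a maximum near 4 deg C,
    ignoring salinity."""
    return 1000.0 * (1.0 - (temp_c + 288.9414) * (temp_c - 3.9863) ** 2
                     / (508929.2 * (temp_c + 68.12963)))

class HybridPhysicsDataNet(nn.Module):
    """Fully connected network whose input is the physical drivers concatenated
    with the temperature estimate of the physics-based model (e.g. GLM)."""
    def __init__(self, n_drivers, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_drivers + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, drivers, physics_temp):
        # drivers: (batch, n_drivers); physics_temp: (batch, 1)
        return self.net(torch.cat([drivers, physics_temp], dim=-1))

def physics_loss(temp_column):
    """Penalize violations of 'density increases monotonically with depth'.
    temp_column: (batch, n_depths) predicted temperatures ordered from the
    surface down to the bottom; no temperature labels are needed here."""
    rho = water_density(temp_column)
    delta = rho[:, 1:] - rho[:, :-1]   # Delta_i: change in density one depth deeper
    return F.relu(-delta).mean()       # nonzero only where density decreases with depth

def total_loss(pred, target, model, temp_column, lam_l2=1e-3, lam_phy=1.0):
    """Training loss + L2 complexity penalty + physics-based penalty."""
    train_loss = F.mse_loss(pred, target)
    l2_norm = sum((p ** 2).sum() for p in model.parameters())
    return train_loss + lam_l2 * l2_norm + lam_phy * physics_loss(temp_column)
```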
This is, in some sense, a working implementation of the high-level goal of introducing physical consistency into data science methods, and a nice property is that the physics-based loss doesn't require labels, so it can be applied in an unsupervised fashion.

Here are the results of this model. We already looked at the RMSE of the uncalibrated model, which is 2.57 degrees Celsius. If we instead apply a black-box neural network, we can achieve an RMSE of 1.77, which is somewhat better. But if we combine both of them and use the physics-based loss function, we can achieve an RMSE that is better than both; it is even better than the most finely tuned, calibrated GLM model. This demonstrates that physical consistency can help ensure generalizability. Another advantage is that it can produce results that are physically meaningful. To show this, here are the density profiles of four different models on this particular lake for a given time. The blue curve shows the density profile of the uncalibrated model, and the red curve shows the density estimate of the calibrated model; you can see that there is a bias in the uncalibrated model, which appears as a horizontal shift. If we use a black-box neural network, it is able to rectify this bias, but it produces physically inconsistent solutions, because the density first increases as we increase the depth but then starts decreasing, which is not physically meaningful. On the other hand, the black curve shows the PGNN framework, which is able to rectify the bias as well as capture more of the variation than is captured by the red curve, which goes almost flat in the upper and lower portions of the lake, and it does this while remaining physically consistent. This is just for one single time step, but we can look at many other time steps where similar results hold.

This work is part of the bigger goal of theory-guided data science, and there are several future research directions that need to be explored. We already talked about ways to use physics to guide the learning of data science models through loss functions, but there are many other approaches. We heard about the use of pivot constraints, which is another example of theory-guided learning. Theory can also be used in the design of the data science model, by choosing the right response functions, activation functions, and architectures, like the jet trees that were introduced in the morning session. Theory can also be used to refine the outputs, to post-process and prune the results. There is another emerging direction of creating hybrid models of physics-based and data science models, so that data science can augment or replace components of physics-based models that suffer from high systematic errors. And finally, there is a long-standing tradition of inference of model parameters where data science can help, and there were several posters and presentations that talked about this.

With that, I would like to conclude. We discussed the paradigm of theory-guided data science, which is gaining prominence in several scientific disciplines. Thank you, and please let me know if you have any questions. [Music]
So the question is, how do we know whether the model is extrapolating rather than interpolating? Well, if we are working in regions of the feature space where we do not have samples, we can look not just at the predictions but also at how confident the model predictions are in those parts of the feature space; if we do not have samples there, the confidence should be low. So I think combining model predictions with confidence scores is one way to look at that.
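The talk does not say how such confidence scores would be computed. One common option, shown here purely as an illustrative assumption and not something from the talk, is to use an ensemble of networks and treat the disagreement among their predictions as an uncertainty estimate:

```python
import torch

def predict_with_confidence(models, x):
    """Ensemble prediction: the mean across members is the prediction, and the
    standard deviation is a rough (inverse) confidence score. Large disagreement
    in a region of feature space with few or no training samples is a hint that
    the model is extrapolating and its prediction should be trusted less."""
    with torch.no_grad():
        preds = torch.stack([m(x) for m in models])  # (n_models, batch, 1)
    return preds.mean(dim=0), preds.std(dim=0)
```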
Info
Channel: Deep Learning for Physical Sciences Workshop NIPS
Views: 7,655
Id: POsGBquHFaY
Length: 21min 4sec (1264 seconds)
Published: Tue Feb 27 2018