Data-Efficient Deep Learning using Physics-Informed Neural Networks

Captions
So what I'm going to be talking about today is data-efficient deep learning using physics-informed neural networks, and I think it's better if I focus on a single application and go into a lot more detail than I usually do in my talks. The application is a brain aneurysm in a patient, which could rupture, and that could be life threatening. What are the treatments for an aneurysm these days? You can do open repair, which basically means you remove the aneurysm; you can do stenting, putting something where the aneurysm is living; and there are coiling and clipping. By the way, I cannot see the chat here, so at any time, if there are any questions, please feel free to stop me and ask; I'm going to need some help, since I don't see the chat. So the treatments are open repair, stenting, coiling, and clipping. But how do doctors decide these days whether to do surgery or not? There is the Hunt and Hess scale, from around the 1960s. At level one, if the patient is asymptomatic or has a mild headache, there is probably no need for surgery; with moderate to severe headache, the likelihood of the patient needing surgery increases; and the last level is the patient being in a deep coma. But this is from the 1960s. Can we actually do better? We are in the 21st century; can we actually do better? That's going to be the topic of this talk.

I usually give a talk on hidden physics models; that talk is available online, so I'm not going to give it again. It's about a series of papers we have been working on over the past few years under the umbrella of physics-informed machine learning, be it with neural networks or with Gaussian processes. Today I'm going to focus on this paper about hidden fluid mechanics, which is exactly trying to address the problem I just explained. What I'm not going to talk about today is convolutional neural networks and images; it's not about language, and it's not about generative adversarial networks. But why am I including this slide? The reason is that there is physics even in our images; there is inductive bias. If we take this car and shift it to the right, left, up, or down, it still needs to be classified as a car, and maybe because convolutional neural networks are, in a sense, invariant to these sorts of shifts, that's why they have been successful. A similar story holds for language: maybe the sequential nature of language is something we can build upon. People started with recurrent neural networks because at that time we didn't have enough data to train giant models like the transformers used in GPT or BERT these days. But the more data you have, the fewer assumptions you need to make about your system, because with more data you can actually learn those inductive biases, the physics of the problem, from your data.

I have a course, Applied Deep Learning; it's a two-semester-long course on all of these topics, the ones that are relevant and have an impact: computer vision, natural language processing, multi-modal learning where you combine images and text, generative adversarial networks, variational autoencoders, speech, reinforcement learning, graph neural networks, recommender systems, and computational biology or AI for science. What I'm going to talk about today is related to 3D, because you have a 3D geometry, and it's related to AI for science, but the course is available online if you're interested in other topics in deep learning. So let's go back to our problem.
The problem is about an aneurysm behind somebody's eye. This is the input data: you do a 4D MRI or a CT scan of the patient. Basically, you go to your radiologist, they inject dye into your arteries, and they do their imaging; in the end, what you get is a video, a 3D video. That's your input, and you want to design an algorithm that takes this imaging as input and outputs the corresponding velocity inside the aneurysm. As soon as you know the velocity and the pressure, you can compute shear stress, and this gives the doctors more quantitative information about how severe the aneurysm is and what the likelihood of it rupturing is. So that's the input data, and this is the output of the algorithm.

Let's take a closer look at the data. It's going to be a point cloud, and it's going to be time dependent: a set of points in space and time. For instance, at this point in time and at this point in space (x, y, and z), you know the corresponding concentration. You can think of this as similar to black-and-white images; the concentration is like an intensity, a value from 0 to 1. So we have t, x, y, z, and the concentration c, and then we have a set of these points; these are the measurements. The output of the algorithm, what we expect out of it, is again a time-dependent point cloud, and you want to know, at this point in time and this location in space, the velocity u of the blood in the x direction, the velocity v in the y direction, the velocity w in the z direction, and the pressure p. As soon as you know those, you can compute the shear stresses on the wall of the aneurysm; that is a post-processing step, so if we solve this problem, the other problem is solved. And we want to do deep learning, so let's put our deep learning hats on, look at the methodologies that are currently available, and try to use one of them and see whether it's going to be successful or not.

Maziar, there's a question, and possibly it's better to clarify it now. The question is: where do you get the pressure from when using non-invasive diagnostic imaging?

That's a great question, and it's about the data: where do you get the data? Actually, the entire talk is about that, how you get the pressure and how you get the velocity. But for the sake of illustration, you're right: the geometry is real, but if this were data coming from a real patient, you wouldn't know the exact ground truth to compare against; that's why we have to go back to simulations. Things will become clearer as I move forward.

So let's try to solve this problem using existing deep learning techniques. Somebody might say: how about 3D convolutions? We know we can put a box around this 3D object and voxelize it; a voxel is similar to a pixel. But if you do that, there is going to be a lot of white space, or empty space, so maybe 3D convolutions are not a good idea because they would be wasteful. That's why people came up with the idea of PointNet; I'm going to show you what PointNet is. But then somebody else might say there is a newer version of PointNet, PointNet++, so let's use that. And another person might say you can represent your mesh or your data using nearest neighbors, create a graph out of it, and then use graph convolutional neural networks or dynamic graph CNNs (DGCNNs).
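Whichever of these architectures we pick, the data it has to consume is the point cloud just described. Here is a minimal sketch of the per-patient arrays; the shapes and names are illustrative, not taken from the talk:

```python
import numpy as np

N = 100_000                # number of space-time measurement points (illustrative)
txyzc = np.zeros((N, 5))   # input per point: t, x, y, z, c (c = dye concentration in [0, 1])
uvwp = np.zeros((N, 4))    # desired output per point: u, v, w, p
# Wall shear stress is then a post-processing step on (u, v, w, p) at the aneurysm wall.
```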
But before we dive into choosing a method or a model, let's take a closer look at the data, and this goes back to the question: what is the data? This is the input data for that particular patient: you have t, x, y, z, and c, and this is a set of points for that patient. Then you somehow know the corresponding ground truth; maybe you get it from simulation, maybe you have it from expensive 4D MRI (I have a colleague here, and we are working on some real data), and then you have the corresponding velocities. Maybe you don't know the pressure, but you know the corresponding u, v, w. For now, let's say this is your data and you somehow obtained it. But you don't only need it for this particular patient, for this particular geometry; you need M of those, where M is the number of data sets you have, the number of different patients, and this is going to be your training data. So you are going to have a set of sets, a set of point clouds. As soon as you have that, hopefully, you will be able to train a PointNet.

This is the architecture of PointNet. It's a famous type of neural network. It takes a point cloud as input, and if you are interested in a classification task, which we are not, it gives you k classes, so we don't worry about that part. Our task is a regression task, so maybe we can borrow ideas from semantic segmentation; maybe that branch of the network is relevant to us. Here n is the number of points in one of these sets, which could differ from one patient to another. The original PointNet paper has three input dimensions; in this case we have five. The output of PointNet has the same number of points as your point cloud, corresponding to t, x, y, and z, and then you have m output dimensions per point, in this case four, so you are outputting u, v, w, and p out of PointNet.

The architecture is very simple. Let's forget about the input transformation, which is just a change of coordinates. It takes a point and pushes it through a neural network; it takes you from dimension 5 to dimension 64 to dimension 64, and it gives you n points in 64 dimensions. Then it does the same thing again, with the same weights and biases, processing each point independently. You do another transformation, a change of coordinates, and then you keep going: you take your points and push them one after another through your MLP, your multi-layer perceptron. In the end you want to extract some global features from the geometry of this particular patient, so you average, or you take a maximum: max pooling, a global max pooling along this dimension, and that's why the dimension n disappears. That gives you a very long vector; you take it and broadcast it here, and from a layer before you just copy and paste. Then you do the same thing again, pushing your points one after another through a bunch of MLPs which share parameters, and in the end you output your velocity and your pressure.
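To make the shared-MLP idea concrete, here is a minimal sketch of a PointNet-style per-point regressor. This is my simplification: the layer widths are illustrative and the input/feature transform networks of the original paper are omitted.

```python
import tensorflow as tf

class PointNetRegressor(tf.keras.Model):
    """PointNet-style per-point regressor: shared MLPs, a global max pool,
    and a segmentation-style head that concatenates local and global features.
    Input (batch, n, 5) for (t, x, y, z, c) per point -> output (batch, n, 4) for (u, v, w, p)."""
    def __init__(self):
        super().__init__()
        dense = lambda k: tf.keras.layers.Dense(k, activation="relu")
        self.local = tf.keras.Sequential([dense(64), dense(64)])      # shared per-point MLP
        self.higher = tf.keras.Sequential([dense(128), dense(1024)])
        self.head = tf.keras.Sequential([dense(256), dense(128),
                                         tf.keras.layers.Dense(4)])   # per-point (u, v, w, p)

    def call(self, pts):
        local = self.local(pts)                                        # (batch, n, 64)
        feat = self.higher(local)                                      # (batch, n, 1024)
        global_feat = tf.reduce_max(feat, axis=1, keepdims=True)       # max pool: n disappears
        global_feat = tf.tile(global_feat, [1, tf.shape(pts)[1], 1])   # broadcast back to every point
        return self.head(tf.concat([local, global_feat], axis=-1))

# preds = PointNetRegressor()(tf.zeros([1, 1024, 5]))  # -> shape (1, 1024, 4)
```

The global max pooling is what removes the n dimension, mirroring the segmentation branch described above; actually fitting such a model is exactly where the small-M problem discussed next comes in.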
Okay, perfect. Now, to train this we would need millions of patients, millions of geometries, but we are dealing with a small-M problem, a small-data problem. Why is that, and what is the scale of M we can hope to find? It's not even in the order of thousands; it's going to be in the order of 100. So it's not enough to train a PointNet; we have a problem here, and we need to deal with the small-M problem.

Okay, let's put our deep learning hats back on and come up with some ideas. Somebody might say: if you don't have enough data, don't use deep learning (that's not the correct way to go about it); use classical scientific computing if you can, and we are actually going to try that in the coming slides and see whether it is possible. Maybe use other types of machine learning algorithms: linear regression, Gaussian processes. But this is going to be a complex flow, the velocity inside the aneurysm, and we really need the function approximation properties of deep neural networks. So let's see if we can get around this small-M problem.

Somebody might say: collect more data; maybe wait for the technology to evolve, so that you have fancy MRI machines that give you more data, faster. Even if you wait a few years to collect more data, you still have a problem: this is the medical field, and in medical fields there is the issue of privacy. Maybe the patients are not willing to give you the data, because it is their medical record and they don't want to share it. To deal with the privacy concerns there are approaches like federated learning, where the idea is that rather than transferring data over the internet, you transfer the gradients of the loss function with respect to the parameters, just the gradients. That's one idea. Somebody might say that in the case where you have a lot of input data (maybe that's cheaper, maybe you can use CT scans), you can collect data that are unlabeled, where you don't know the corresponding u, v, w, or p, and then do unsupervised, self-supervised, or semi-supervised learning, the type of ideas you see in natural language processing for training GPT-style models: you first do unsupervised or self-supervised learning and then do transfer learning, transferring whatever you learned from that exercise to this particular problem. That's another idea.

Another idea is to use simulated data. Maybe it's expensive to collect real data; how about collecting simulated data instead? This is actually a promising idea, and we are going to pursue it and see whether it solves the problem or not. One argument against simulated data is that you end up with something called the reality gap: the simulated data has a different distribution than the real data, and there is going to be a gap between the distributions. To deal with that, there are methods like domain adaptation and domain randomization. Even if you manage all of that, there is still the question of trust: are you really going to trust the predictions coming out of a neural network to decide on surgery for somebody? How much do you trust them? And even if you trust them, are they robust? We know that neural networks are sensitive to adversarial examples; maybe there is an adversary shifting the input a little bit and causing a patient to go through unnecessary surgery.

Let's go back to the idea of simulation, which is promising. Where else do we see it? We see these sorts of ideas in robotics: when you want to work with a real robot in the physical world, it's much more expensive to collect data. How do people deal with that? They
collect data in a simulated environment, like OpenAI Gym, and they do domain adaptation. You see the same techniques in self-driving cars: you can sit in a car, or let the car drive itself, collect data from a left camera, a center camera, a right camera, perhaps a couple of lidars, store it, and then work with it. But that's dangerous; you cannot put a self-driving car on the roads on its own. It's expensive to begin with, and it's really dangerous. How about collecting data in a simulated environment and then doing domain adaptation?

So now somebody might say: if you are going the simulation route to collect data, how about doing the simulation only once? You extract the geometry from a patient and you mesh the geometry, and those of us who have worked with meshing know that you need an expert in the loop; it's not only a science, it's also an art to come up with conforming meshes. Even once you have your mesh, you run your favorite Navier-Stokes solver, maybe OpenFOAM, maybe Nektar++, and that solves your problem: extract the geometry, mesh it, run your Navier-Stokes solver, and it gives you the flow inside the aneurysm. Perfect. But there is a catch: what are the correct boundary conditions? What boundary conditions are we going to use at the inlet and the outlet? To see the complexity: this is the aneurysm behind the left eye of a patient, and you are cutting it out of the entire arterial system of a human being. Yes, the Navier-Stokes equations are based on first principles, like conservation of mass and momentum, but the boundary conditions you choose will end up being really ad hoc.

Somebody might say: treat this as an inverse problem. You have this data; now your problem is to adjust your boundary conditions at the inlet and the outlet. You keep adjusting them and doing multiple rounds of simulation so that the flow reproduces this pattern in the concentration; you keep back-propagating your errors so that your forward simulation matches this pattern. But let's see how many hours you would need. Even with the mesh in place, running this Navier-Stokes solver takes about 48 hours for one forward simulation, and to solve the inverse problem you need to solve the forward problem thousands of times. So it's thousands of times 48 hours (even 1,000 forward solves is already about five and a half years of compute), and that is certainly beyond my lifetime, or at least my patience. So we are not going down that route. The deep learning route didn't work; the scientific computing route is not going to work either.

Fortunately, we had been working on these ideas of physics-informed deep learning for a couple of years, and we were trying to devise a framework that you can use to solve forward problems, inverse problems, data assimilation problems, and model discovery problems, so a wide range of problems, and our problem is going to be related to the inverse problem setting. But what is the key observation, and what is going to be the physics here? The physics is going to be written in the form of differential equations, partial differential equations, and there is this key observation that the derivative of a neural network is another neural network. These two neural networks are going to share parameters; they're going to have
the same weights and biases. The activation function changes: if you're using tanh, it becomes 1 minus tanh squared, since that's the derivative of tanh, and the weights get transposed, but at the end of the day what you end up with is still a neural network; it has a computational graph. Why is this observation important? Because you can write a lot of partial differential equations this way: u_t, if the equation is time dependent, is equal to a nonlinear function of time, x, the solution, the derivative of the solution with respect to x, the second derivative with respect to x, and so on. That's a generic form. These derivatives we compute using automatic differentiation; they are neural networks, and the residual is another neural network. Based on this observation that the derivative of a neural network is a neural network, you create a physics-informed neural network f, and then you fit it to the data that you have: some data are on your solution, and some data are trying to enforce f to be zero.

Let's look at these classes of problems. For forward problems, you know your N exactly; you give it initial and boundary conditions as your training data, and you solve this minimization problem. For inverse and data assimilation problems, you know your N up to some parameters, or maybe you have observations on only one of the components of your vector u. For model discovery, you can say N is itself a neural network, or, if you want it to be explainable, you can say N is a linear combination of, say, polynomials or sines and cosines, and then you can discover your physics given enough data. There are two arrows, one going from left to right and one from right to left. Unfortunately, if you have small data you cannot learn your physics; you have to make some assumptions. That's just a fact.

So let's try to use this for our problem. Previously, PointNet was using fully connected neural networks, MLPs, and we are going to use those too. We had time, x, y, z, and the concentration as the input, and our neural network was outputting u, v, w, and p. We make a slight modification: we take c from the input and put it at the output. You see these sorts of ideas in deep Q-learning for reinforcement learning, where you take something that is conceptually an input and put it at the output. So far this gray box is very boring; there is nothing special about it. The catch is that you have data only on this variable, the concentration; you have no data on u, v, w, and p. To compensate for that, you know your laws of physics: you know conservation of momentum (e2, e3, e4), you know conservation of mass; these are first principles. And then you have an equation for the concentration, which is being advected by the flow, pushed by the flow, and diffusing. The Peclet number is related to the diffusion coefficient of the concentration, and the Reynolds number is related to the diffusion coefficient, the viscosity, of your fluid. You don't even need to know the Peclet number or the Reynolds number; they can be two additional parameters, on top of the parameters of your neural network, that you learn. e1, e2, e3, e4, and e5 end up being physics-informed neural networks; they share the same parameters as this gray box. And the cool thing is that you don't need data on e1 through e5, you don't need labeled data, because the label is going to be zero.
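For reference, the residuals on this slide can be written out as below. This is my reconstruction from the talk, following the usual non-dimensional incompressible Navier-Stokes equations plus passive-scalar transport; the exact notation in the paper may differ, and Pe and Re can be treated as learnable if unknown.

```latex
\begin{aligned}
e_1 &= c_t + u\,c_x + v\,c_y + w\,c_z - \mathrm{Pe}^{-1}\,(c_{xx}+c_{yy}+c_{zz}),\\
e_2 &= u_t + u\,u_x + v\,u_y + w\,u_z + p_x - \mathrm{Re}^{-1}\,(u_{xx}+u_{yy}+u_{zz}),\\
e_3 &= v_t + u\,v_x + v\,v_y + w\,v_z + p_y - \mathrm{Re}^{-1}\,(v_{xx}+v_{yy}+v_{zz}),\\
e_4 &= w_t + u\,w_x + v\,w_y + w\,w_z + p_z - \mathrm{Re}^{-1}\,(w_{xx}+w_{yy}+w_{zz}),\\
e_5 &= u_x + v_y + w_z .
\end{aligned}
```

Training pushes e1 through e5 toward zero at the collocation points, which is why no labels are needed for them.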
You have some data on the concentration, this one point cloud, and then you try to satisfy your equations, equation 1 being zero, equation 2 being zero, and so on, on some collocation points. That means you can have as many points there as you want, because you don't pay any labeling cost. Once this minimization is done, you end up with your flow. What did we just do? We collected zero external data; the training data was zero, the capital M is zero, and we are only testing our algorithm, so there is an optimization happening at inference time, at testing time. We had only one data set, for testing, and that came from scientific computing.

Why should people in deep learning care about this sort of first-principles thinking? We had a speaker from Google the other day, and he was trying to solve a video prediction task: given the previous frames, predict the next frame. We know that in deep learning these days people try to solve problems by collecting more data and training larger and larger models, so they had a small model and a big model trying to predict the future. Let's see what happens. In the next frame, the predictions are fine. In the frame after that, the shorts of this person are changing color in both models; the smaller model is losing the leg of the person, and the identity of this person is turning from a man into a woman. We keep predicting, and the other network is losing it too: the person is disappearing, and in the next frame the person disappears completely in the smaller model, while in the other one the person has turned into a woman with a different color of shorts. Predict further, and the clothing of that person changes again, even in the biggest model. So how about this: there could be an idea of conservation of pixels. A pixel is not going to appear or disappear out of nowhere; it just gets transported from one location to another. That could be the physics for this problem, and perhaps it could help. We were using exactly that kind of idea, conservation of mass: mass does not appear or disappear out of nowhere, it just gets transported from one location to another, and that was our continuity equation.

I think I will go through the source code for the topic I just explained. The code is HFM, hidden fluid mechanics. There are some utilities: we needed forward gradients rather than backward gradients, because we had more output variables than input variables. Backpropagation is useful when you have, say, a loss function, a single output variable, and a lot of input variables, like your weights and biases; it's useful there because in one shot you get the gradient with respect to every single parameter in your model. But in our case we were going from a low dimension, t, x, y, and z, to more variables at the output, concentration, u, v, w, and p. There is a trick you can use: you can get forward gradients by calling backward gradients twice, and it's as efficient as implementing the forward gradient yourself.
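Here is a rough sketch of that trick; the repository itself was written against the TF1 graph API (tf.gradients with a dummy tensor), so this tape-based version is my adaptation, not the paper's exact code.

```python
import tensorflow as tf

def fwd_gradients(f, x):
    # Forward-mode derivatives via two reverse-mode passes (the "double backward"
    # trick described in the talk). x is one coordinate tensor of shape (N, 1) and
    # f(x) returns per-point outputs of shape (N, m); the result is df/dx, shape (N, m).
    v = tf.ones_like(x)                                    # per-point direction for the JVP
    with tf.GradientTape() as outer:
        with tf.GradientTape() as inner:
            inner.watch(x)
            y = f(x)
        dummy = tf.zeros_like(y)                           # auxiliary cotangent
        outer.watch(dummy)
        g = inner.gradient(y, x, output_gradients=dummy)   # g = J^T . dummy
    return outer.gradient(g, dummy, output_gradients=v)    # J . v

# Current TensorFlow also offers tf.autodiff.ForwardAccumulator for the same purpose,
# but the two-pass trick above is what the talk refers to.
```

With only a handful of inputs (t, x, y, z) and several outputs (c, u, v, w, p), one such pass per input coordinate is cheaper than one backward pass per output.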
In TensorFlow, we then define a neural network; it has a set of weights and biases and some gammas, and it's a callable class, so we can call it. We couldn't use batch normalization. We know that for a lot of neural networks to train well you need batch normalization, but we couldn't do it, because batch normalization interferes with the physics of the problem, with the gradients you are taking, so we didn't have that luxury and we used weight normalization instead. We also couldn't use the ReLU activation function; we decided to use the Swish activation, a smooth approximation to ReLU. ReLU is an important contribution to deep learning in general, but it didn't work for us because we are taking derivatives of our functions with respect to the input, and it interferes with the physics. Say your problem is regressing a sine function, and the red dashed lines are the regressed function: the first derivative is perhaps fine, there is some noise that messes up your first-order derivatives a bit, but maybe you can live with that; the second-order derivative, however, just zeros out. We couldn't use ReLU because there are second-order derivatives in the Navier-Stokes equations, and if you kill them, they are not Navier-Stokes anymore.

So you create your neural network, which outputs concentration, u, v, w, and p; this is exactly the figure I showed. Then we call forward gradients seven times and collect the concentration, the derivatives of the concentration with respect to t, x, y, and z, the second derivatives, and so on, and then we return our equations; these are the Navier-Stokes residuals. What else? For TensorFlow 1 you need some placeholders; this is where the data goes in, so that part is not important. You create your neural network only once, this is the gray box, and you call it twice, on two sets of placeholders, so it's the same neural network with the same weights and biases, just called twice. Then you have your Navier-Stokes function, which returns your equations, and your loss function: you have some data on the concentration, and you take the mean squared error on those; for your equations you don't need any labeling, the right-hand side is just zero, and you take the mean squared error there too. You set a learning rate, you set your optimizer to Adam, and you do your minimization. You set your Peclet and Reynolds numbers, although you don't actually need to set them, you can learn them. You train, and then you do your prediction.

These are the predictions of the model, and these are the relative L2 errors in space; our problem is time dependent, so these are the errors you are making over time. For the concentration it's boring, there's nothing special, you have data on it; but the velocities and the pressure come out of the neural network with no data on them at all.
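Putting that walkthrough together, here is a compact TF2-style sketch of the objective. It is my paraphrase of the idea, not the repository's TF1/placeholder code: only the continuity residual e5 is spelled out, and the transport and momentum residuals e1 to e4 would be assembled the same way from first and second derivatives, with unknown Pe and Re as two extra trainable scalars in those terms.

```python
import tensorflow as tf

# Network maps (t, x, y, z) -> (c, u, v, w, p).
net = tf.keras.Sequential(
    [tf.keras.layers.Dense(200, activation=tf.nn.swish) for _ in range(5)]
    + [tf.keras.layers.Dense(5)]
)
opt = tf.keras.optimizers.Adam(1e-3)

def continuity_residual(txyz):                   # e5 = u_x + v_y + w_z
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(txyz)
        out = net(txyz)
        u, v, w = out[:, 1], out[:, 2], out[:, 3]
    return (tape.gradient(u, txyz)[:, 1]
            + tape.gradient(v, txyz)[:, 2]
            + tape.gradient(w, txyz)[:, 3])

@tf.function
def train_step(txyz_data, c_data, txyz_eqns):
    with tf.GradientTape() as tape:
        c_pred = net(txyz_data)[:, 0]                        # same network on the labeled points ...
        e5 = continuity_residual(txyz_eqns)                  # ... and on unlabeled collocation points
        loss = (tf.reduce_mean(tf.square(c_data - c_pred))   # data term: concentration only
                + tf.reduce_mean(tf.square(e5)))             # physics term: residual pushed to zero
    grads = tape.gradient(loss, net.trainable_variables)
    opt.apply_gradients(zip(grads, net.trainable_variables))
    return loss
```

The key point is that the same network is evaluated both on the labeled concentration points and on unlabeled collocation points, and only the physics term touches the velocity and pressure outputs.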
We had a reviewer of our paper asking us to do a systematic study with respect to the resolution of the point cloud. We are using a single point cloud, but how sensitive are we to its resolution? What if you have 201 time steps and 15,000 points in space? What if you reduce the number of points in space from 15,000 to 10,000 to 5,000 to 2,500? What if you reduce the resolution in time? What if you do both, reduce the resolution in space and in time: what happens, how sensitive is the algorithm, and when does it break? At that time I was at NVIDIA and I had a DGX-2 machine in front of me, so I said: I'm going to fire up 16 GPUs with different noise and different resolutions, run all of them, and at some point the algorithm will break; we will report the failure cases in the paper and the reviewer will be happy. This is 26 time steps and 2,500 points; we got here and the algorithm didn't break, so I had to fire up 16 more GPUs to break it, and at some point in that second exercise it broke, at around 13 time steps and 2,500 points in space. There was another request: study the robustness of the method to noise. Keep adding noise: one percent, two percent, three percent, twenty, forty, sixty, eighty, one hundred percent; at some point it's just pure noise compared to your signal. You can see the algorithm is not breaking; the relative error stays acceptable in all of these variables. To do that exercise I fired up 16 GPUs hoping it would break after adding 30 percent noise; it didn't break, I did another round, and another, and another, and it still didn't break. I gave up.

Why is it data efficient? Not only are we data efficient in the number of observations (we have only one observation, for one particular patient), we are also efficient in the number of points in the point cloud, and we are robust to noise, exactly because of the physics of the problem: many functions simply cannot happen, they are just not physical. There is a nice observation that a constant is a solution of the Navier-Stokes equations, so constants are the trivial solutions, the obvious local minima of the optimization. You can actually see during training that the loss on your concentration, where you have data, keeps going down, while your equation losses start very low, jump up, and then come down again: they escape the local minimum. The same thing happens for the other equations. We were doing internal flow; you can also do external flow and compute interesting quantities like lift and drag forces.

This is my last slide, and the next one is future work. What is the ultimate goal of machine learning these days, and deep learning in particular? It is to come up with an algorithm that performs best on some test data, and we usually don't care about the cost. Maybe it's time to start thinking about the cost. How does this compare to statistics? The goal of statistics, broadly, is to come up with a model that explains the data, and these days there is an emerging field of explainable AI trying to sit somewhere between the two. But let's take a closer look at the problem machine learning is trying to solve: that problem seems to be ill-posed. Why? We are looking at a performance metric, for instance accuracy, and you can see that every single day there is a paper coming out saying we are pushing the state of the art, we are getting a higher level of performance. That means there exist many algorithms that will give you that level of performance and above; that's just some evidence. Setting aside the question of choosing appropriate metrics, what are some of the other interesting questions? Are you really coming up with the simplest model you could? Are you using the least number of parameters? Are you using the least amount of data you can? Maybe it's time to start thinking about Occam's razor again: we are looking for a model that explains the data, that fits the data, and at the same time is the simplest model we can
come up with. Questions such as: how many training data are you using? How many of them are labeled and how many are unlabeled? How many validation data are you using? Can you reduce the amount of data you use? How much energy are you spending on training your models, how many GPU, TPU, or CPU hours? Can you explain your model and its predictions? Is your model robust to perturbations and adversarial examples? These days we are using more and more training data and increasing the model sizes, so there is a chance that our test data ends up being a subset of the training data, or very similar to it. Are you just memorizing your data or not? And how about the corner cases? If a car is driving on the wrong side of the road and your model makes some prediction on that data, should we just brush it off as an outlier, or is it actually telling us something about our model? I think I'm going to stop here and answer questions.

Thank you very much, Maziar, for the introduction to your work and also for outlining the future directions, because several questions actually go in the direction of the future work you outlined. But maybe we can start by coming back to the first question that was raised, on where you get the data from; maybe now you can explain whether you ever got data from real patients, or whether it was just simulation.

So let's go back to this slide. For this particular problem you need to know your ground truth. The ground truth for the velocity you can maybe get, and it will end up being very noisy if you use real data, but the pressure you are not going to get from real data. Now, you want to prove a concept: if you know your physics, and if you assume your physics is correct, will your algorithm actually give you back the corresponding velocity? The answer is yes, and for that you need to work with simulated data.

There's also a related question. You mentioned you're just using data from one patient, simulating the case for one patient, but did you also validate what happens if you transfer it to a patient with slightly different characteristics? How transferable is the model you trained?

Yeah, absolutely. What you can do in this setting, if you have a new patient, is warm-start these parameters: you don't have to start from random weights and biases, you start from the weights and biases that were good for the previous patient, and then there is learning happening on the fly.

Okay, so you would not take the trained model that you adjusted to the data from one patient and apply it directly to another patient, but rather start from a pre-trained one?

Yes, it's the idea of fine-tuning. Actually, if we had more data, if we had some training data, that would be the perfect scenario for us, because you can still put these kinds of constraints on your data for different patients and do your training, maybe train a PointNet. But the problem was that at that time I was in academia, and once you're in academia it's very hard to get access to data and compute.

Thanks for the answers. There's again a related question, on how much training data you would still require. Since you called your talk data-efficient deep learning, did you validate how few samples you can go down to such that your algorithm still generalizes well, and did you also evaluate the
generalization and extrapolation ability of your algorithm based on your training data?

Yeah, so our training data M is zero; the size of M is zero. For the test data, you have only one test case, and this is what you're testing your algorithm on. In terms of how sensitive the method is to the resolution, basically the number of points in your point cloud, these are the studies we did: 26 points in time and 500 points in space are enough to train this algorithm. So not only are you being data efficient in the sense that M is zero, you are also being data efficient in the size of your point cloud, 26 points in time and 500 points in space.

There's again a related question on the data, and there is another question in the chat now: in problems where we don't have the concentration of the fluid flow, how can we solve the problem? Which parameters, or how would you define the problem and formulate your architecture?

What was the first part of the question? In problems where we don't have the concentration of the fluid or the flow, because you were using this information basically for supervised learning; what if you are lacking even that information?

Yeah, so one way to know where you are on the landscape of what you can do, how many assumptions you need to make, and what type of method you need, is to look at this arrow from left to right for your data. Everything in deep learning starts with the data. In our case we had data on the concentration, so we had some data; we were landing somewhere on this arrow, and therefore you can make some assumptions about your system: you have some data on the concentration and you write down the equations for the concentration. Now, what if you don't have data on the concentration? Then you fall back here, your data is even smaller, and it means you need to make more assumptions about your system: maybe put some data on the boundary, and then that becomes a forward problem for you, a forward simulation given the inlet and the outlet. And actually this is counterintuitive: the less data you have and the more physics assumptions you make, the harder it is to train your neural network; the more data you have, the easier it is, because the network picks up information from the data.

Thanks, I hope that clarified the question, but if not, please feel free to raise your hand and ask again. There's also a question on the activation function, because you mentioned very briefly in your presentation that ReLU did not work for you, or that you did not use it. Maybe you could explain why it did not work out for you, and also why the activation function you chose was a good choice.

Actually, to be honest, we are not even using tanh; we are using Swish, the activation function x times sigmoid of x, which is a smooth approximation to your ReLU function. You need it to be smooth. People in deep learning can get away with ReLU because in the end there is a summation over their data of a loss function for every single data point, and as soon as you do that summation, the function you are minimizing ends up being as smooth as possible.
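A quick way to see the smoothness point (an illustration of mine, not code from the talk):

```python
import tensorflow as tf

x = tf.linspace(-3.0, 3.0, 7)
with tf.GradientTape() as t2:
    t2.watch(x)
    with tf.GradientTape() as t1:
        t1.watch(x)
        y = x * tf.sigmoid(x)      # Swish: a smooth approximation to ReLU
    dy = t1.gradient(y, x)
d2y = t2.gradient(dy, x)           # nonzero almost everywhere for Swish,
# whereas the second derivative of tf.nn.relu is zero wherever it is defined,
# which would kill the second-order terms in the Navier-Stokes residuals.
```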
Okay, thanks for this comment as well. There's another question: how do you build in constraints, and what if the parameters for the constraints can only take values in a certain range? How would you define those models and integrate that?

Yeah, that's a good question, and you will actually need that for some problems. Say you want to let this Reynolds number train, to learn it, and you're looking for a Reynolds number in the range of 100 to 1,000. You can put a tanh on it: something like 100 plus a tanh of a variable, say Reynolds-hat, times (1,000 minus 100). These are the ways you can enforce the parameter to be bounded, to lie in a particular range; see the small sketch after this exchange.

Okay, thanks for this clarification, and there's another question in the chat: PINNs work in an unsupervised way, so you can simulate Navier-Stokes starting from some locations in space and some points in time and get the corresponding solution. What is the performance of the method with respect to the number of input points, in terms of generalization, and to what extent do neural networks help compared to running a standard DNS?

I think I answered the second part; that's the systematic study I showed with respect to resolution and noise: you are really robust with respect to noise and resolution. The first part of the question is about not using deep learning at all and using pure scientific computing to solve this problem. The problem there is that you don't know your inlet and outlet, which means you need to keep back-propagating your errors and adjusting your inlet and outlet boundary conditions, so thousands or millions of simulations to match this pattern, and each round of forward simulation takes 48 hours. That number times even 1,000 is a lot of days to wait. This algorithm, on the other hand, converges in 48 hours, the entire thing, matching this pattern. So basically a patient comes in, we do a CT scan or MRI on them, they go back home, they come back two days later, and we report the shear stress on the aneurysm.

There is a question from Luca: if the question was not answered, please unmute yourself and clarify. No thanks, I think the question was answered, thank you.

Thanks for that as well. There's another question: can this method also be used to solve turbulent flows?

The more complex you make this function, the deeper and wider a neural network with more capacity you will need, and at some point, I think, given the sizes of our GPUs and TPUs, we are not yet ready to train a neural network that can approximate turbulent flows.
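For the range-constraint question above, a minimal sketch of the reparameterization; this is my illustration of the idea, and the exact affine form used in the talk may differ slightly.

```python
import tensorflow as tf

# Keep a learnable physical parameter inside a prescribed range, here Re in (100, 1000),
# by training an unconstrained variable and squashing it with tanh.
re_hat = tf.Variable(0.0)                                   # unconstrained, trained as usual

def reynolds(lo=100.0, hi=1000.0):
    return lo + 0.5 * (tf.tanh(re_hat) + 1.0) * (hi - lo)   # always lies in (lo, hi)
```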
Okay, thanks for that answer as well. There's another question: as only point clouds are considered, there is no concept of context; do you think graph networks would be able to solve this problem, even if gradients are expensive?

I'm not sure I understand the question. The point is that you're not considering neighborhoods in the architecture; the question is whether you believe graph networks, which would support this additional link, would help overcome some of the challenges you faced. Ah, now I understand the question; it's about which architecture we end up using: a 3D CNN, PointNet, PointNet++, or graph neural networks. To be honest, for this talk and this paper we were trying to keep this part of the neural network very boring; we were trying to emphasize the physics of the problem. But you're absolutely right: what if, rather than feeding in a single point, you look at its neighborhood and make those the inputs to your algorithm? Would that improve the training time, maybe the accuracy, your error rates? That's a good point, and it's open for future research: the idea of turning this gray box into something more colorful, something fancier, maybe a DGCNN or PointNet or PointNet++.

Did you explore that a bit, or do you have something in the pipeline where you're working on that?

Not yet, but that's really promising: designing better architectures for the physics of the problem.

Okay, thanks for this clarification. I think at the very beginning, while you were still presenting, this may already have been clarified, but one participant asked if you could roughly explain what is meant by hidden physics versus explicit physics, namely first principles like conservation of mass versus physical models that are used implicitly, for example the Navier-Stokes equations.

Yes, so this is exactly the hidden physics of the problem: there is conservation of momentum, expressed in terms of partial differential equations rather than their integral forms, and then there is conservation of mass, and this is how you represent it.

Okay. Possibly we can even stay on that slide, because there are two questions going in that direction. The presented network predicts u, v, w based on the concentration, the change in contrast due to the colored dye. Can this be related to scalar transport of a passive tracer?

Yeah, exactly: this concentration is a passive tracer, so it's tracing the flow; it's being advected by the flow, and it has its own diffusivity. Some of the dye we inject will diffuse faster, some slower, and we don't actually want to make assumptions like that, so you can learn the diffusivity of the dye that is being injected.

And there was a follow-up question: what if the fluid flow itself changes speed during the measurement, for example a change of contrast due to diffusion of the dye, or a change of contrast in MRI due to flow?

Yeah, that could happen. Maybe you're measuring the temperature, and the temperature has some effect on the energy; then conservation of energy comes into the Navier-Stokes system. We haven't explored that yet. Again, when it comes to physics-informed neural networks, you can contribute on what type of neural network to choose and push the state of the art in that direction; you can think about your physics and consider more complex types of equations; you can study the data, more data, less data, data on different variables; you can think about the loss function, because maybe mean squared error is not the right loss function to work with and there are better ones; and you can look at more complex geometries and more complex flows, like turbulent flows. So there is a lot of work to be done.

Very true. There is a question in the chat again: can we train this neural network in spherical coordinates?

In spherical coordinates? Yes, absolutely; then your equations change, which should be fine.

Okay, thanks for that as well. There's a last question
on that slide: can you please explain the pixel conservation concept a bit more deeply? Because the place is conserved but the pixel values are not, so how do you define this pixel conservation concept, and how did it help you for this problem?

That was just futuristic thinking: if you have conservation-of-pixels ideas, like we have conservation of mass, then you can try to be a little more data efficient and maybe get away with smaller models. The idea is that the color of a pixel should not change from one frame to the next out of nowhere; that color shouldn't appear or disappear, it just gets transported from one location to another. In that case you would get rid of problems like the shorts of this person suddenly changing color to green; that shouldn't happen out of nowhere. You can write a differential equation for it: you start with the integral form, you take some small area, you let that area go to zero, and you get the partial differential equation (the resulting equation is sketched a bit further down).

Thanks for that. I think we have answered all of the questions that were in the chat, so maybe some live questions. Since you also mentioned trust and robustness: do you believe that adding equations adds trust to the models? How would you define trust in this case, and how would you present the information from these models to the domain expert?

I'm going to trust this algorithm if I trust that mass is conserved and momentum is conserved. If I don't trust that, then I'm not going to trust this algorithm, and it means I need to change my way of thinking. So the physics of the problem matters, and the physics is what brings the trust, because whatever function you end up with here will satisfy the constraints you put in. And maybe your constraints are bad, maybe Navier-Stokes is a bad model; we don't know.

Did you evaluate how much physics you really need to add? For example, you mentioned that the more complex the problem gets and the less data you have, the more physical constraints you need to add. Did you perform any thorough analysis of this, for which types of problems did you evaluate it, or did you focus on very specific problems where the physics was already given, so that you did not face the question of adding enough additional physics to the models?

Yeah, actually I have a paper here in the big-data regime where, given enough data, you can actually learn your equations, and your equations could be in the form of a neural network or a linear combination of a bunch of simple functions, and the algorithm works. The framework is still this framework: you write down the loss function, you have some data, you have some equations, and you try to minimize. And you're right: in some cases maybe you know your advection and diffusion but you don't know the correct reaction terms; those reactions you can learn from data, and that's the case of data assimilation. Again, this arrow is really helpful: it tells us where we land, what method to use, and how many assumptions to make.

Okay, I think we are well over time, but I still wanted to finish the questions that were actually asked.
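Coming back to the pixel-conservation idea above: the constraint it suggests is essentially the brightness-constancy (advection) equation familiar from optical flow. This is my paraphrase, not an equation shown in the talk: for an image intensity $I(t, x, y)$ carried by a velocity field $(u, v)$,

```latex
\frac{\partial I}{\partial t} + u\,\frac{\partial I}{\partial x} + v\,\frac{\partial I}{\partial y} = 0 .
```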
There was one last question that was just asked: once you introduce obstacles, the equations to solve stay the same, but would the pre-trained PINN still be able to solve it, or does every change involve retraining?

For every change you make, every new patient that comes in, there is going to be an optimization. I wouldn't call it training, because these are not your training data; that optimization happens when you do inference, when you do testing. But if you have data, if you have M patients, you can actually train this PointNet successfully, make it obey the laws of physics, maybe put some constraints there; then, when a new geometry comes in, you just push it through this neural network once and it gives you the corresponding flow, with no need for optimization. So as soon as you have more data, your life becomes much easier.

Okay, thanks. We don't have any more questions and no raised hands as far as I can see, so thank you very much for taking the time, and also for staying so long with us even though we planned for just one hour. Thanks for sharing the results; we are always looking forward to your new research results and the new explanations you're going to give us.

Thank you so much for the invitation again.
Info
Channel: IMS Chair
Views: 968
Id: lYgFRCit8Xw
Length: 77min 31sec (4651 seconds)
Published: Fri Aug 27 2021