Nathan Kutz:"Data-driven Discovery of Governing Physical Laws"

Captions
[Host] ...and sciences, and you will present to us today on the data-driven discovery of governing physical laws in engineering physics. Awesome, thank you.

All right, thank you for coming to the talk. I've had a really nice day here today — lots of orange and yellow and red trees out there, so a lovely walk around campus. My job is to give you a half-interesting talk so at least you don't feel like you wasted your time here. So let's see if I can do that. I'm very interested in data science, in particular how it can help me with what I like best, which is dynamical systems — modeling physical systems. For the kind of physical systems I want to model, I think about time evolution, I think about engineering-physics type things, and I want to talk about how we can integrate some of these emerging machine learning techniques with the traditional engineering-science modeling techniques we already use. I've broken the talk into three main pieces. Model discovery — we'll talk about that. Then I'm going to talk a little bit about neural networks, because a lot of people are interested in neural networks and I have a few things I want to say about them, in case you're interested: perspectives on manifolds and learning coordinate systems, and also on fast learning.

All right, let's start off. It's going to start with a super heavy slide — probably most of you are going to get lost, but that's okay, we'll cover it — because this is what my talk is mostly about. We typically overlook how powerful Ax = b is. I thought I knew a lot about Ax = b; the last few years I've been very humbled, because I actually didn't know that much about it. What do I mean by that? When you take linear algebra, you sit there solving it — by the way, that was the most boring math course I ever took; maybe not for you. And don't get me wrong, it's not like I struggled: fine, I can solve systems, I just had no idea what all these things were for. Later you start to realize what they're for. But there was a very big lie in that class: n equations and n unknowns. That's not the real world. The real world is data matrices: I either have massively over- or under-determined systems. That's the Ax = b I want to solve. In one case you have no solutions; in the other you have infinitely many. And I don't even have to think about it, because when I open MATLAB and hit backslash, I get an answer. Did you think about what it did? I didn't. Backslash it, who cares, there's my solution. I knew there were supposed to be infinitely many solutions, or none, but MATLAB gave me one — what did it do for me? We're going to start asking this question more carefully now, because it really matters how you go about Ax = b. And in fact backslash is not your only option in MATLAB — you see pinv, lasso, and others, and they're all solving Ax = b. What's the difference? If you're going to solve this equation when there are no solutions or infinitely many, you have to impose some constraint or regularization. So really, you don't want to just solve Ax = b.
In the modern data-science world, you're going to solve Ax = b subject to something, and what you subject it to really matters. Normally, when you hit backslash and there are infinitely many solutions, one of the things it will do is say: how about I give you the one with the smallest l2 norm? That's what backslash is doing for you — did you know that? So the "subject to" line matters. I'll tell you what I'm going to look for: a "subject to" line that promotes sparse solutions — I want most of the components of x to be zero. So instead of an l2 penalization it's l1; l1 is a proxy for sparsity. That will matter for me later, but most of this talk is going to be Ax = b. Whatever you walk away from here thinking — did he do some fancy math? No. I did Ax = b, and I'm proud of it, not ashamed.

In the last part of the talk — say the last 40 percent — I'm going to switch it around and solve a more general set of equations instead: minimize some f subject to some g. All of machine learning, I would argue, is an f and a g, where the unknowns are some parameters — often the weights of a neural net, but it doesn't have to be a neural net, and it can be large. You specify something you want to match, your fit function f, and you specify a "subject to" line g: how you are going to regularize the solution. Because you have either infinitely many solutions or none, unless you put that in there you have an ill-posed problem. That's the broader framing, and like I said, that is all of machine learning in two lines: neural nets and everything else you do is an f and a g, an applied optimization problem.

Okay, here's my specific view of what I want to do. Here's the generic framework. I have a set of dynamics, dx/dt = f(x, t; beta): some right-hand-side function f, the state x, a dependence on time and on some parameters, and maybe some stochastic effects. That's my dynamics. I have a measurement of that dynamics which gives me y; y is measured at certain time points, and there's a measurement model — I don't necessarily measure x, I measure some function h of the state and time, and maybe there's some measurement noise. Here's the goal: if I only had y, could I give you everything else? I don't know what f is, I don't know what h is, I don't know what x is — I need to discover all of it. Where does this kind of application come up? When you get to really hard problems, ask what kind of problem could have such terrible properties. Biology — that's the answer to all of it. Which part of biology? Every part of biology. Look, I work on biology, but it's really hard, so I make sure to work on other things too so I don't get too depressed. There's a lot of interesting stuff there and a lot of good data collection. So this is the goal, and it seems like an impossible goal — it's so underdetermined, how are you actually going to solve this inverse problem? You can't, unless you put some priors on it. The prior I'm going to bring to the table: you see that right-hand-side term, this f? I believe — and here's my whole belief system on this — that there are only two or three terms over there that matter, generically.
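To make the Ax = b point above concrete, here is a minimal sketch (not code from the talk) of an underdetermined system solved two ways: the minimum-l2-norm answer a least-squares solver hands back, and an l1-penalized answer that promotes sparsity. The matrix sizes, seed, and lasso penalty are arbitrary illustrative choices.

```python
# Underdetermined A x = b: the answer you get depends entirely on the regularization.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200))           # 50 equations, 200 unknowns
x_true = np.zeros(200)
x_true[[3, 77, 150]] = [1.5, -2.0, 0.8]      # only three components actually matter
b = A @ x_true

x_l2 = np.linalg.lstsq(A, b, rcond=None)[0]                 # minimum-l2-norm solution
x_l1 = Lasso(alpha=0.01, max_iter=10000).fit(A, b).coef_    # sparsity-promoting solution

print("nonzeros (|x| > 1e-3):  l2 ->", np.sum(np.abs(x_l2) > 1e-3),
      "   l1 ->", np.sum(np.abs(x_l1) > 1e-3))
```

The l2 solution spreads small weight over essentially every component, while the l1 solution concentrates on a few — which is exactly the "subject to" distinction the talk keeps returning to.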
Okay, so let's go to model discovery. This is work with Steve Brunton, who was a postdoc of mine and is now faculty in mechanical engineering at UW, and Josh Proctor, who's at the Institute for Disease Modeling. We started building this framework to think about the problem: I want to discover a dynamical system, so I have the f that comes in and a measurement g. Now, when I was in grad school, here's how the world worked for me: I went to a class, there was a textbook, and that textbook told me what f was. I didn't have a choice about it; I couldn't make up my own f. Someone gave me a book on quantum mechanics and there was Schrödinger's equation — that was my f. Someone gave me a book on electrodynamics and there were Maxwell's equations — that was my f. I did not have the choice you have — well, mostly you don't pick your own f either; those textbooks tell you what f is. You're given the system of equations, or an empirical law somebody derived, and then you simulate it, do some asymptotics, look at the kinds of behavior, interrogate it. Then there's some measurement of the system and you say: oh look, they're consistent. I might do some modeling in between, but in the end I match these up.

Now, what's changed for you? Well, this is an iPhone 6 — not even that modern; I'm four years behind and should just chuck it against the wall and buy a new one. The reason I hold it up: on the front there's an HD camera, on the back another HD camera, and these are less than a buck apiece now. I can buy a laser pointer for less than a dollar — I have to be in Hong Kong to do it, but the point is these are sensor technologies. Each frame of HD, or the 4K video that's coming next, is 24 million pixels streaming at 30 frames a second. Do you know how much data I can collect in a very short amount of time? This is the world you live in: the measurement g here is enormous. You can sit there and record off systems — how many terabytes do you want? I can get that for you in a minute on some systems. The point is you're going to have that data, and you're often going to measure a system where you don't know f. What kinds of systems don't you know f for? Biology, some multiscale systems — lots of systems where we don't know f. We posit empirical models, but we need to improve them.

So here's what we're going to do: run it in reverse. What could f be? Let's make up a library of possible terms. It could be lots of things: a constant column (that's what the 1 represents), linear terms, quadratic terms, cubics, sines, cosines — I can put anything in here I want, exponentials, logs, just make up stuff. This is a library of potential right-hand-side terms; put in everything you've ever seen, I don't care how many — I could put thousands of library elements in there. These are all candidates. Each column is a candidate function; each row is a time-snapshot measurement: x at 0, delta t, 2 delta t, 3 delta t. So the rows are time, the columns are candidates. This is my matrix A in Ax = b.
I'm going to emphasize again the belief, the prior, that I bring to the table: only a few of those columns matter out of all these candidates. I want to select the chosen ones — the Harry Potters of the right-hand side. You saw Harry Potter, right? Not even a crack of a smile for that one. I know my jokes don't always work.

All right, let's do an example. Here's a very simple system, the Lorenz system — top-left corner there, x-dot, y-dot, z-dot, a simple three-dimensional system. I do a simulation of the system in the fully chaotic regime, and what I give you is the time-series data x, y, z. So I basically hand you a matrix with three columns and a bunch of time measurements, and I say: here's your data, tell me what dynamical system produced it. Could you reconstruct Lorenz? Here's how we're going to do it. If you give me the data x, y, and z, I can take time derivatives of it to produce x-dot, y-dot, z-dot — that's what this represents: take the time series, differentiate it, and that gives me b. A is all the candidate functions. Ax = b, backslash. If you just hit backslash right now, it's a least-squares fit, and a least-squares fit says: I'll give you a solution with a small l2 norm, so everybody gets to be important — every single library element gets a little bit of weight, because the l2 penalty squares things, and if a coefficient is small, squaring makes the penalty even smaller. It says every term matters. And my bias says no, only a few terms matter. So what I really want is to solve Ax = b while promoting an l1 norm. Promoting an l1 norm is just the standard lasso — the lasso algorithm is all about l1 optimization. Now, the problem with l1 is that it's highly unstable and not very robust. If you've played with it you know this; if you haven't, don't walk away thinking l1 is great — it's not. Everybody who plays with it knows you don't do l1 straight up. Look at all this insider knowledge you're getting for free. So the lasso doesn't work that well. What we did instead is called sequentially thresholded least squares: do a least-squares fit, throw away everything that's really small, fit again, threshold again, and keep going. What are you left with? The nonzero components: in the first equation you have x and y; in the second, x, y, and the mixing term xz; in the third, xy and z. You basically recover the model right there — the coefficients for those columns are nonzero — and it gives you back exactly the Lorenz equations. Not only that, look at the values: this was supposed to be 28, and you're within about 10 to the minus 3 of it. So you recover the model and the parameters are extremely accurate — you've rediscovered the Lorenz system.
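Here is a minimal sketch of that sparse-regression idea (SINDy with sequentially thresholded least squares). It is a simplified stand-in rather than the talk's actual code: it uses exact derivatives rather than numerically differentiated noisy data, a small quadratic library, and an arbitrary threshold.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0/3.0):
    x, y, z = s
    return [sigma*(y - x), x*(rho - z) - y, x*y - beta*z]

t = np.arange(0, 20, 0.002)
sol = solve_ivp(lorenz, (t[0], t[-1]), [-8.0, 7.0, 27.0], t_eval=t, rtol=1e-10, atol=1e-10)
X = sol.y.T                                    # rows are time snapshots of (x, y, z)
dX = np.array([lorenz(0, s) for s in X])       # "measured" derivatives (clean here)

# Candidate library: constant, linear, and quadratic terms
def library(X):
    x, y, z = X.T
    return np.column_stack([np.ones_like(x), x, y, z, x*x, x*y, x*z, y*y, y*z, z*z])
names = ["1", "x", "y", "z", "xx", "xy", "xz", "yy", "yz", "zz"]
Theta = library(X)

# Sequentially thresholded least squares: fit, zero out small coefficients, refit
def stls(Theta, dX, threshold=0.1, iters=10):
    Xi = np.linalg.lstsq(Theta, dX, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(dX.shape[1]):
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], dX[:, k], rcond=None)[0]
    return Xi

Xi = stls(Theta, dX)
for k, lhs in enumerate(["dx/dt", "dy/dt", "dz/dt"]):
    terms = [f"{Xi[j, k]:+.3f} {names[j]}" for j in range(len(names)) if Xi[j, k] != 0]
    print(lhs, "=", " ".join(terms))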
You also find that you can handle a little bit of noise — and this is important: why only a little bit? We can handle about one or two percent noise here, and part of the reason is that it's really hard to get accurate derivatives from noisy measurements. Here's something else that's interesting: if you come up with a really great algorithm for producing derivatives — fast, robust, handles noise — everybody will love you, because we need it and differentiation is hard. We think that problem is solved; it isn't, because most of what we're doing now is "I have really noisy data and I want an accurate derivative." Good luck. Here we have to use TV (total-variation regularized) differentiation — a fancier derivative scheme — which gets us to one or two percent noise; with finite differences you can hardly put any noise on this at all.

Okay, let's do a harder problem: fluid flow around a cylinder. This is a classic of applied mathematics — if you're going to study applied math you'd better at least know flow around a cylinder. What happens? You start increasing the Reynolds number, you get a bifurcation, and you get vortex shedding on the back end of the cylinder. Now, this is a pretty high-dimensional system, because it's a PDE solve with a discretization in x and y — even at a hundred grid points each way you're at a ten-thousand-degree-of-freedom system. But it doesn't look like ten thousand degrees of freedom to me; it looks like fluid just doing this. So what we do instead is take the data and run it through an SVD — the singular value decomposition — which looks for low-rank structure, and when you do that for flow around a cylinder, you find three modes that matter. The SVD gives you three matrices, U, Sigma, and V. The first few columns of U are the dominant modes right here — the POD modes, or SVD modes — and the V matrix tells you what those modes are doing in time. So I have a time series of what these modes are doing, and they dominate all the dynamics. Run that through the SINDy algorithm — the sparse regression — and out comes a three-dimensional system of governing equations for the dynamics.

What was interesting about this: we just put the data in, got this model out, and started looking at it — wait a minute, this model has actually been derived before. It was an open question for a while in fluids: how does Navier-Stokes, which has a quadratic nonlinearity, produce a route toward turbulence that seems to go through a Hopf bifurcation, which is cubic? The nonlinearities don't match. This was proposed all the way back around 1971, and then essentially all the way down to 2003, when Noack and collaborators, with some very careful asymptotics, showed that it is basically a multiscale interaction — fast and slow variables interacting — that effectively produces the cubic even though the underlying nonlinearity is quadratic. So thirty years later you have the derivation; we got it by sampling data. And right over there is the full simulation versus the identified system. We took a harder problem, a fluid-flow problem, and this thing discovers the underlying normal-form dynamics of that system. All I did was Ax = b.
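Here is a minimal sketch of the SVD / POD step described above, run on synthetic data rather than an actual cylinder-flow simulation: a spatiotemporal field built from three space-time modes plus a little noise. The singular values reveal the rank, U gives spatial modes, and Sigma*V gives the time series you would hand to the SINDy regression.

```python
import numpy as np

x = np.linspace(-5, 5, 400)           # space
t = np.linspace(0, 4*np.pi, 200)      # time
Xg, Tg = np.meshgrid(x, t, indexing="ij")

field = (np.exp(-Xg**2) * np.cos(2*Tg)
         + 0.5 * Xg * np.exp(-Xg**2) * np.sin(3*Tg)
         + 0.2 * np.exp(-(Xg - 1)**2) * np.cos(5*Tg))
field += 0.01 * np.random.default_rng(1).standard_normal(field.shape)

U, S, Vt = np.linalg.svd(field, full_matrices=False)
energy = S**2 / np.sum(S**2)
print("energy fraction in first five modes:", np.round(energy[:5], 4))

r = 3
spatial_modes = U[:, :r]                      # dominant POD modes (columns)
time_coeffs = (np.diag(S[:r]) @ Vt[:r]).T     # their amplitudes in time
print("time-coefficient matrix shape:", time_coeffs.shape)  # (n_t, 3): the input to SINDy
```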
I'm just telling you: I haven't done anything beyond Ax = b yet. Let's keep doing Ax = b, but at a bigger scale. I'm going to start discovering partial differential equations directly — no reduction this time, just take the full PDE. It's back to flow around a cylinder; this is work with Sam Rudy, a finishing grad student. I collect snapshots of the field and build my Ax = b, but now my library of candidate functions has spatial derivatives — derivatives in x, Laplacians, all of it. Make a big library. It's bigger, and the state of the system is huge now because I'm working with the full PDE data, but it's still just Ax = b. Regress. What do I find? I discover the Navier-Stokes equations. No pillbox derivation, no conservation of mass, no conservation of momentum — the things you'd normally use to derive it — just straight-up regression on the data, and you get back Navier-Stokes. The other interesting thing: you could say that's a really big regression problem, that matrix A is massive, I'd like to make it smaller. You can substantially subsample this in patches, make a much smaller matrix for the regression, and you still get Navier-Stokes. You do need to do it in patches, because you still have to compute spatial derivatives locally — you need your neighbors to compute them. Again, a good place to apply better differentiation techniques.

So you can do PDEs and ODEs. A couple of other interesting points. You can use a Lagrangian measurement system: you can drop in a sensor that isn't fixed but moves with the dynamics. Why is this important? If you want to measure ocean dynamics, for instance, and there's some place you want to derive an equation for, you drop a buoy in and let it evolve with the dynamics. So here you attach a sensor to a random walker, have the walker do a bunch of trajectories, collect the histogram of their behavior, and then regress — and with enough measurement data from these walkers, you find the heat equation. That's a classic derivation in applied math, but the point is that your sensors don't have to be fixed in the flow, which is really hard for some of these problems — it's really hard and expensive to fix a sensor at a location in the middle of the ocean, but it's pretty easy to drop one in and watch it evolve. So this is a really nice way to think about that.

The other thing: if you saw a wave move across this room, I could ask what kind of equation would produce that, and you'd come up with a bunch of them. Which one is it? You have to be able to disambiguate between models. Here, for instance, are two waves. If I just showed you one wave moving across, the most parsimonious representation is the simple one-way wave equation. But if I use different initial data — waves of different heights, multiple initial conditions — you'd say: hold on, it can't be that model, because the tall one is moving faster than the small one. So what could it be? It disambiguates and gives you back KdV — those are KdV solitons.
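Here is a minimal sketch of PDE discovery by regression on spatiotemporal data, in the same spirit as the examples above but using an exact heat-equation solution as synthetic data (the examples in the talk used simulated Navier-Stokes fields and random-walker statistics). Build a library of candidate spatial-derivative terms, compute u_t, and regress; the candidate terms and threshold are arbitrary choices.

```python
import numpy as np

nu = 0.5
x = np.linspace(0, 2*np.pi, 256, endpoint=False)
t = np.linspace(0, 1.0, 101)
Xg, Tg = np.meshgrid(x, t, indexing="ij")
# Exact two-mode solution of u_t = nu * u_xx, so the "right" answer is known
u = np.exp(-nu*Tg)*np.sin(Xg) + 0.5*np.exp(-4*nu*Tg)*np.sin(2*Xg)

u_t = np.gradient(u, t, axis=1)
u_x = np.gradient(u, x, axis=0)
u_xx = np.gradient(u_x, x, axis=0)

# Candidate right-hand-side terms, flattened over all (x, t) samples
library = {"u": u, "u_x": u_x, "u_xx": u_xx, "u*u_x": u*u_x, "u^2": u**2}
Theta = np.column_stack([v.ravel() for v in library.values()])
b = u_t.ravel()

# Sequentially thresholded least squares, as in the ODE case
xi = np.linalg.lstsq(Theta, b, rcond=None)[0]
for _ in range(10):
    xi[np.abs(xi) < 0.05] = 0.0
    keep = xi != 0
    xi[keep] = np.linalg.lstsq(Theta[:, keep], b, rcond=None)[0]

for name, c in zip(library, xi):
    if c != 0:
        print(f"u_t ~ {c:+.3f} {name}")        # expect roughly +0.5 u_xx
```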
So you start eliminating candidates out of a potential set of PDEs: if you give it a diversity of initial data, you can nail down the PDE that should be there. Here's a bunch of canonical PDE models that I learned in grad school and that are pretty standard — laptop-level computing, systems we know a lot about: KdV, Burgers, Schrödinger, NLS, Kuramoto-Sivashinsky, reaction-diffusion, Navier-Stokes. We can discover them all: sample spatiotemporal data and you get the PDE. It also tells you how much noise each one can take, and the noise restrictions are pretty serious — you can't put much noise on there before it becomes hard to discover, and there are underlying reasons for that which I won't go into now.

All right, time for a real experiment. I've been walking around your buildings and noticing a few things. I don't know why they do this — it's the same in the mechanical engineering building at UW — how do you decide: hey, I've got an idea, let's get some off-white cinderblock and set it off against an off-white linoleum floor, set off by an off-white foamy ceiling? But there you go — you can do experiments, and you'll feel no guilt drilling holes into that wall. So, state-school-style experiment: we set up a pendulum. Here are some trajectories. Can you see the little blue light on the end there? Let me tell you what it is: a little Arduino, collecting accelerometer data. We just whip this thing around and collect data. These are some trajectories — raw data out of the accelerometer — and you run it through the SINDy regression, and here's the model you get. I wanted to show you the actual output in Python; here it is, a little Python notebook: dx0/dt = x1, and dx1/dt = (some small number) times x1, minus about 4 sin(x0). Boom — the pendulum equation, from real data. Interestingly, if I don't have some of the trajectories go over the top, it will instead give me terms like x cubed or x to the fifth — a Duffing-type equation, polynomial corrections. Once the pendulum goes over the top, polynomials can't do that, so it has to switch to the sine, which is in the library, and it picks it up. So with fairly clean data it works nicely. By the way — I'll say this now and again later — all the code, all the PDE solvers, everything we did here is available online. You can download all of it; everything here is reproducible. I'm not bullshitting you, everything's there. Now, you'll notice we have pretty clean data — this is a pretty clean experiment.
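A simulated analogue of that pendulum experiment (the real one used accelerometer data from the Arduino): include sine terms in the library alongside polynomials, swing the pendulum over the top, and let the same thresholded regression as in the Lorenz sketch pick out the sin(x0) term. The parameter values, initial condition, and threshold here are made up.

```python
import numpy as np
from scipy.integrate import solve_ivp

def pendulum(t, s, omega2=4.0, damping=0.05):
    x0, x1 = s                                  # angle, angular velocity
    return [x1, -damping*x1 - omega2*np.sin(x0)]

t = np.arange(0, 20, 0.01)
sol = solve_ivp(pendulum, (0, 20), [0.1, 4.5], t_eval=t, rtol=1e-10)  # starts by going over the top
X = sol.y.T
dX = np.array([pendulum(0, s) for s in X])

x0, x1 = X.T
Theta = np.column_stack([x0, x1, x0**2, x0*x1, x1**2, x0**3, np.sin(x0), np.cos(x0)])
names = ["x0", "x1", "x0^2", "x0*x1", "x1^2", "x0^3", "sin(x0)", "cos(x0)"]

Xi = np.linalg.lstsq(Theta, dX, rcond=None)[0]
for _ in range(10):                             # sequentially thresholded least squares
    Xi[np.abs(Xi) < 0.02] = 0.0
    for k in range(2):
        keep = Xi[:, k] != 0
        if keep.any():
            Xi[keep, k] = np.linalg.lstsq(Theta[:, keep], dX[:, k], rcond=None)[0]

for k, lhs in enumerate(["dx0/dt", "dx1/dt"]):
    print(lhs, "=", " ".join(f"{Xi[j, k]:+.3f}*{names[j]}"
                             for j in range(len(names)) if Xi[j, k] != 0))
```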
Now I want to talk about some challenges — right away, the things that can go wrong. I don't want to sell this as "you can solve all problems with this"; clearly I've already shown you that noise is an issue. (Oh, and you'll have to edit out the swearing — I forgot I'm on camera. When I think I'm off camera I feel a little more loose, but I'll try to rein it in.) Here are some tough challenges. Limited data and noise — that combination is hard. One of the biggest things neural nets often assume (we'll come back to neural nets) is "I've got all this data." A lot of systems, you don't. What kind of systems? Biology, for instance. You're looking at tumor growth — how many snapshots in time have you got? Five. You can't build a model with five snapshots. And the reason there are five is that every time they look at the mouse they have to kill it, so it's five different mice at five different times. You're not measuring at 20 kilohertz — it's a mouse. Again, biology has all of this. So: measurement noise. Multiscale physics — you're not discovering just one physics; there's heterogeneous physics going on, coupled together, all part of the thing you're measuring, and you have to disambiguate it. Parametric dependencies — what if the whole system is changing slowly in time because the temperature in the room is changing? It's the same system, but the parameters are varying, and data-driven discovery really suffers unless you can disambiguate what is the dynamical system and what is parameter variation. Stochastic systems — let's leave those off the table for now; that's an open challenge, though we think we might eventually be able to take a baby step there. I'm going to try to show you that we can make progress on some of these, one at a time.

Multiscale systems. Take two coupled systems — say a van der Pol and a Duffing, one super fast, one slow, driving each other. You can come up with a sampling strategy where you measure the fast and slow jointly — and by the way, you don't get a measurement of each one separately, they're mixed. It turns out that if you measure on a very fast scale, and you have good scale separation — that's the caveat, so this is not going to hold for turbulence, where you have a cascade — then on the fast scale the slow variables look constant, and I can discover the fast scale. With that as a prior, I can then discover the slow scale, which I measure on much longer time scales. So we can do this; there are technologies for it. There's a paper up on the arXiv now with my student Kathleen, who is fantastic — she did some of this, and she also did work on latent variables.

What are latent variables? Go back to the x-dot, y-dot, z-dot Lorenz example. Now I'm not going to give you the x, y, z time series — just x. That's pretty challenging. First of all: you measured this thing and got a time series x, and I didn't even tell you there's a y and a z — because in a lot of systems you may not know how many variables there are. What kind of systems? Biology — yes, exactly, thank you all for answering at once, in your heads at least. When you measure a system, what do you measure? Whatever you have a device for. I have a thermometer, so I measure temperature; here's a pressure gauge. Those may not be the right variables at all — it's just that you had those instruments.
In this case there's a fantastic thing you can do if you have just a measurement of x: time-delay embeddings. You take the time series and build what's called a Hankel matrix: the next row is the same time series shifted over by one delta t, and the next by another delta t, and so on. You take the SVD of that matrix and look at the singular values — the rank — and for this example you find that three modes dominate. That's an indication that there must be a y and a z. It's not going to give you y and z themselves, but it tells you: I can now work in proxy variables for a three-dimensional dynamical system that I know exists. I didn't know there were three variables; I do now, and I can build models in that space. This turns out to be a really nice way to go, and there's a lot more to be said about time-delay embeddings. They come out of what's called Takens' embedding theorem, which goes back to the 1970s, and it's one of those unicorn math things — people said, look at this abstract math, it's really cool but it'll never be useful — and now it's coming back as completely useful. Sometimes you have to wait. So, for anybody who's into high-impact science: if you think you know your work is high-impact at the moment you're writing and submitting it — that's probably not the truth. The truth is you don't know whether your work is going to be high impact, maybe for fifty years. So you can see how I feel about high-impact metrics: I think we should do things that are cool and try to get results, regardless of whether we think they'll be high impact.

Parametric systems: how do I disambiguate the dynamics from parameter variation? For instance, if I were measuring that flow while the Reynolds number was changing, the method would have a very hard time discovering the physics. But if I do the regression in bursts — like a group lasso, where every burst must have the same sparsity pattern but the coefficients can change — I can pull it apart. We actually don't use group lasso, because it doesn't work well; we use a group version of sequential thresholding. And then there's this other parameter doing this — and that's the Reynolds number. You can use the same idea to discover spatial dependencies in PDEs. You can also add constraints: if you know there's conservation of energy or mass in the system, enforce it. It's just optimization — it's f and g, and you get to put whatever you want in g. Even if you don't know the dynamics, if you know conservation of mass, put it in there; that gives you a great way to discover models with those constraints. And you can bring in control: you can tie the whole control structure together — sampling data, discovering the physical law, and building an MPC controller on top of that architecture.
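As a concrete illustration of the time-delay embedding mentioned above, here is a minimal sketch: measure only x(t) from the Lorenz system, stack delayed copies into a Hankel matrix, and inspect the singular values to see how many latent variables are hiding in the scalar measurement. The number of delays and the integration settings are arbitrary choices, not the talk's.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0/3.0):
    x, y, z = s
    return [sigma*(y - x), x*(rho - z) - y, x*y - beta*z]

t = np.arange(0, 50, 0.01)
x = solve_ivp(lorenz, (0, 50), [-8.0, 7.0, 27.0], t_eval=t, rtol=1e-9).y[0]  # only x is "measured"

n_delays = 100
H = np.column_stack([x[i:len(x) - n_delays + i] for i in range(n_delays)])   # Hankel matrix
U, S, Vt = np.linalg.svd(H, full_matrices=False)

energy = S**2 / np.sum(S**2)
print("energy fraction, first 10 singular values:", np.round(energy[:10], 4))
# A sharp drop after a handful of singular values signals that a low-dimensional
# system generated the scalar time series; the corresponding columns of V are the
# proxy coordinates in which you can then build a model.
```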
And this approach works better than the alternatives — I want to show this slide because I'm about ready to throw some rocks at neural nets. When you do something like this, we can train the controller with our SINDy model, and because the model is so parsimonious and small, it actually does much better at generalizing — and in actual performance — than neural nets. That's us; that's the neural nets. We're better. What I'd say about neural nets is that they have a little too much going on; let's come back to that in a minute. So you can do control with this, and there are lots of other innovations going on — here's a list of other people who have started working in this area, doing things around UQ sampling strategies, integral formulations, corrupt data, things of that nature. I'll also make one comment about sparse regression: we recently developed a sparse regression framework that we think is much better than the sequentially thresholded least squares we used, and much better than lasso. This sparse relaxed regularized regression (SR3) beats the state of the art in lasso, compressive sensing, matrix completion, and TV differentiation. It's super easy to implement, the code is online, it's really robust, and it follows from a very simple idea: relax the optimization into two variables — one that tries to fit the data, and one that carries the sparsity you want. The two don't have to match; they just have to stay proximal to each other. By separating the penalties like that, you do a much better job; it's a much nicer framework.

Okay, here we go: neural nets. I have a lot to say about neural nets — I'm a little down on them, and I'll tell you why in a moment. There's a whole zoo of neural nets. I made that chart, by the way, so I hope you're super impressed, because it took me forever; I had to move those dots into place in PowerPoint by hand. I'll let you admire it one more second. Here's what's kind of interesting. I'm just going to build a neural net, and this is the main code for doing it. This code does a simulation of the Lorenz model: the first few lines set up Lorenz, run it from time 0 to 8 in small steps, and then I save snapshots as input-output pairs — this is x at t, this is x at t plus delta t — into a matrix called input and a matrix called output, and I do a hundred different simulations with a hundred random initial conditions. That's my training data. It looks something like this; it's so easy to run, there's nothing here, it's just ODEs, ode45. And the initial conditions are off the attractor — I actually want to sample off the attractor, because if I don't, the network doesn't know what to do off the attractor; I give it a diversity of data off the attractor and let it settle down. Now here's my code for training the neural net, and it's even smaller. I just made up a random feed-forward net, three layers of size ten each, and I made up the transfer functions essentially at random — logsig, radial basis, purelin — and then: train, input, output, net. Here's the danger of neural nets: TensorFlow, Keras, all these platforms — including MATLAB — make this trivial to do. It was much easier to write the training code than to write the integrator. And what I get out of this is a model. How good is the model? Pretty awesome, actually.
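Here is a Python stand-in for the MATLAB snippet described above (not the talk's actual code): train a small feed-forward net to map x(t) to x(t + dt) on Lorenz trajectories, then iterate the learned map from a fresh initial condition and compare against a high-accuracy integration. The layer sizes, activation, trajectory count, and step size are arbitrary choices.

```python
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.neural_network import MLPRegressor

def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0/3.0):
    x, y, z = s
    return [sigma*(y - x), x*(rho - z) - y, x*y - beta*z]

dt, T = 0.01, 8.0
t = np.linspace(0, T, int(T/dt) + 1)
rng = np.random.default_rng(0)

inputs, outputs = [], []
for _ in range(20):                                      # random initial conditions
    s0 = rng.uniform(-15, 15, 3) + np.array([0, 0, 25])
    traj = solve_ivp(lorenz, (0, T), s0, t_eval=t, rtol=1e-8).y.T
    inputs.append(traj[:-1]); outputs.append(traj[1:])   # (x_k, x_{k+1}) pairs
inputs, outputs = np.vstack(inputs), np.vstack(outputs)

net = MLPRegressor(hidden_layer_sizes=(10, 10, 10), activation="tanh",
                   max_iter=2000, tol=1e-7, random_state=0)
net.fit(inputs, outputs)

# Roll the learned map forward and compare against a high-accuracy integration
s0 = np.array([-5.0, 8.0, 22.0])
t_test = np.arange(0, 4, dt)
truth = solve_ivp(lorenz, (0, 4), s0, t_eval=t_test, rtol=1e-11, atol=1e-11).y.T
pred = [s0]
for _ in range(len(truth) - 1):
    pred.append(net.predict(pred[-1].reshape(1, -1))[0])
pred = np.array(pred)
print("error after 1 time unit :", np.linalg.norm(pred[100] - truth[100]))
print("error after 3 time units:", np.linalg.norm(pred[300] - truth[300]))
```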
It's a feed-forward net. A lot of people say you should use an LSTM — it's built for time series, after all — but I've seen no evidence that an LSTM works better than a feed-forward net here, even though everybody tells me it does. Is there a chart somewhere that shows it? There never is. If someone can point me to the one that shows it, I'll believe it; nobody has shown me that yet. In fact, I bet I could take any of the networks in that zoo and get about the same performance with this trivial thing. That's the danger: did I learn anything? I'm not sure I did. I do have a great predictor, though. In fact, here's what's kind of remarkable about what you're seeing over there: there are two new dots — I made up two new initial conditions — and I run one of them forward with a really high-accuracy integrator, ode45 at something like 10 to the minus 10 tolerance, and then I say: now run my network forward — put the initial condition into the network, get an output, and keep cycling it through. And look at that — they track each other amazingly well. They eventually fall off, and the step size between those points is 0.1; that was the delta t. Now here's what's interesting: if I ran ode45 with a fixed step size of 0.1, it would have fallen off way before my neural net did. This neural net learned an integrator that was better than ode45 at that step size. How did it do it? I don't know — I just threw in these random transfer functions, and I could have made up a whole bunch of other neural nets that would have done just as well. Neural nets are universal approximators; you can do almost anything with them and claim almost anything. That's the danger I find with neural nets. So what I like to use neural nets for is very targeted applications. Is it true that if you gave me a bunch of snapshots I could build a predictor for you? Absolutely — with a neural net you can do anything if you have training data. Do you understand why it works? No. And interpretability and generalizability matter a lot to me personally, because I want to actually engineer a system, and if I'm going to generalize anything, I have to be able to engineer a better system.

So let's talk about how I want to use them — particularly for manifolds and embeddings. I started this whole talk with the idea: give me time-series data and I'll discover a nonlinear dynamical system for you. That's roughly what I said, right? Pretty close. (You're not sure you want to talk to me — that's the danger of sitting in the front row; that's why everyone who came in was looking for seats up there.) Now I'm going to change that on you all of a sudden. Maybe, instead of discovering this nonlinear dynamics — and this is going to change your spin on what it means to discover physics — maybe I should just give you a better coordinate system. In particular, this goes back to the concept of Koopman, which says: give me a finite-dimensional nonlinear dynamical system, and there exists a function space — functions g of the state x — in which the dynamics is linear. That's Koopman, in 1931: he has no computer, he tells you this, and then he doesn't tell you how to get g. Nobody tells you how to get g, still, today — except I'm going to tell you how to maybe get g in a principled way.
Now, by the way, it's very easy to get g by just projecting into an infinite-dimensional space — but then let's be honest about what you're doing. If you're going to commit to infinity, fine, go to infinity, but don't pretend you're doing some cool math: all you're doing is building a universal approximator, and let's be fair and call it that rather than "I'm doing something fancy." That's one person's view — and I do have a Doctor of Philosophy, so I'm allowed to philosophize a bit up here.

Let me give some examples of how this might work. Here's a Boyce-and-DiPrima-level problem — look at that simple little nonlinearity. What would you normally do? Phase planes, find the fixed points, qualitative theory of differential equations. Instead, suppose I work in these variables: y1 = x1, y2 = x2, y3 = x1 squared. If I make the transformation to these variables, the thing becomes linear. I took a nonlinear dynamical system in two dimensions and went to a three-dimensional system that is perfectly linear. Why do I like linear systems? Because I can do everything with them: I can solve it by hand, write the solution down on a piece of paper, and ask why you would even simulate it — I have the solution for you.

Here's another famous example: Burgers' equation — if you like PDEs, you know this one. Here's Burgers with diffusive regularization, a fully nonlinear PDE with dynamics something like this. In 1950 and 1951, two papers came out — by Cole and by Hopf, discovered at essentially the same time — giving the transformation, the Cole-Hopf transform, such that in the new coordinate system the dynamics is linear. That's a Koopman embedding: you went from one infinite-dimensional space to another infinite-dimensional space, and it begs the question of what it means to discover physics. I was after parsimonious representations — normally I'd just go after that: Burgers is only a couple of terms, u_t equals two terms on the right-hand side, and if I can discover that from data I'm in pretty good shape. But there's this other, funky-looking coordinate system — and isn't it even more parsimonious? It's linear; I can solve it by hand. So this is partly what I'd like to do: figure out a way to learn these transformations, because if I can learn them, it transforms the way I can attack the problem. Can I give you a coordinate system in which everything you're doing is linear? That would be kind of awesome, for two reasons. One, you can write down the solution with pen and paper. Two, all of control theory — the guarantees and the theorems — is based almost exclusively on x-dot = Ax + Bu; if I can get to a linear system — truly linear, not an approximation — I can do control on all these complex problems.
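Here is a minimal sketch of that polynomial lifting, using the standard textbook example dx1/dt = mu*x1, dx2/dt = lambda*(x2 - x1^2) — I'm assuming this is the system on the slide, and the parameter values are arbitrary. In y = (x1, x2, x1^2) the dynamics are exactly linear, dy/dt = K y, so the flow is a matrix exponential.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import expm

mu, lam = -0.1, -1.0

def nonlinear(t, s):
    x1, x2 = s
    return [mu*x1, lam*(x2 - x1**2)]

# Lifted coordinates y = (x1, x2, x1^2) obey dy/dt = K y:
#   dy3/dt = 2*x1*dx1/dt = 2*mu*y3
K = np.array([[mu,  0.0,  0.0],
              [0.0, lam, -lam],
              [0.0, 0.0, 2*mu]])

x0 = np.array([1.5, -0.5])
t_final = 10.0
x_t = solve_ivp(nonlinear, (0, t_final), x0, rtol=1e-10, atol=1e-12).y[:, -1]

y0 = np.array([x0[0], x0[1], x0[0]**2])     # lift the initial condition
y_t = expm(K * t_final) @ y0                # closed-form linear evolution

print("nonlinear simulation :", x_t)
print("linear (lifted) flow :", y_t[:2])    # first two components recover x(t)
```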
So that's what I'm after. How am I going to do it? This is where I use neural nets in a targeted way. This is work with Bethany Lusch — hey, Bethany — who did her undergrad at Notre Dame. (I was going to do a cool hand sign like we do at Washington football games, the W — does Notre Dame have one? Fighting Irish? Okay.) Here's what we're going to do: we learn a map from the input space to a new variable y and back. It's just an autoencoder — you make a transformation into the new space and come back out. What do I want to happen in that new space? I want to go from y at time step k to y at time step k+1 with a linear map. That's my f and g: the constraint g says this has to be a linear map.

The key to solving this problem — which, embarrassingly, took us forever to figure out — is obvious after the fact. Think about a pendulum for a moment. We think we know how to solve the pendulum: the linearized pendulum is an oscillator, theta-double-dot equals minus omega-squared theta, a two-dimensional system, sines and cosines. What happens when the oscillations get bigger? All of a sudden you start seeing more of that sin(theta) you left out — you get a theta-cubed contribution from the nonlinearity. And what does the nonlinearity do — in any system, not just the pendulum? It shifts the eigenvalues, the frequencies, and it creates harmonics. We've solved these problems with asymptotics: you do an asymptotic expansion and you get sin(3 omega t), sin(5 omega t), the frequency shifts, and you can compute those shifts asymptotically. As the oscillations get bigger, more and more Fourier terms show up in the expansion. But how many degrees of freedom does this thing have? Two — I'm just using a really poor representation if I stick with those fixed frequencies. So what you have to learn is not only that the system pinches down to two dimensions, but that if you do not account for the frequency shift, you commit to an infinite series expansion — there's no way to get it down to two terms. So I parameterize that linear operator — the Koopman operator — by the frequency, and as soon as I learn that parametrization, everything unfolds.

So back to the pendulum and what we do with it. As you go to larger and larger oscillations — here are theta and theta-dot — the trajectories stop looking like sines; you can clearly see the frequency shift, these are not sines and cosines. If you represented this in the usual way, you'd need an infinite series of sines and cosines. But with the transformation I learn, I take this system and transform it to a linear one: those are the coordinates that get me there, these are the eigenfunctions that take me to the new space, and the dynamics there runs along circles — you just hit it with a matrix A, a linear map that steps you delta t into the future, parameterized by how far you are from the origin. So I've built a Koopman embedding, a linear model, for this fully nonlinear system. You can also do this for flow around the cylinder — the three-dimensional nonlinear system I wrote down earlier — learn the linear embedding of that, and it becomes fully linear.
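Here is a heavily simplified sketch of that autoencoder idea. This is not the Lusch architecture described above — there is no frequency-parameterized Koopman operator and no auxiliary network, just one constant matrix K — so treat it as a toy illustration; the latent dimension, layer sizes, and training data are arbitrary.

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.integrate import solve_ivp

def pendulum(t, s):
    return [s[1], -np.sin(s[0])]

dt = 0.02
t = np.arange(0, 20, dt)
trajs = [solve_ivp(pendulum, (0, 20), [a, 0.0], t_eval=t, rtol=1e-9).y.T
         for a in np.linspace(0.2, 1.0, 8)]
X = torch.tensor(np.vstack([tr[:-1] for tr in trajs]), dtype=torch.float32)
Xnext = torch.tensor(np.vstack([tr[1:] for tr in trajs]), dtype=torch.float32)

latent = 3
encoder = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, latent))
decoder = nn.Sequential(nn.Linear(latent, 32), nn.Tanh(), nn.Linear(32, 2))
K = nn.Linear(latent, latent, bias=False)      # the single linear map in latent space

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters())
                       + list(K.parameters()), lr=1e-3)
for epoch in range(2000):
    opt.zero_grad()
    y, ynext = encoder(X), encoder(Xnext)
    loss = (nn.functional.mse_loss(decoder(y), X)            # reconstruction
            + nn.functional.mse_loss(K(y), ynext)            # linearity in latent space
            + nn.functional.mse_loss(decoder(K(y)), Xnext))   # one-step prediction
    loss.backward()
    opt.step()
print("final training loss:", float(loss))
```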
Okay, so I can transform this to a linear setting. The thing we're working on now is tying this embedding structure together with the SINDy regression for discovering nonlinear dynamics. Enforcing linearity is a really stringent requirement, so now we're asking: instead of making the latent dynamics linear, what if we make it a parsimonious nonlinear dynamical system, a bit like what we discover with the SINDy regression? You change the loss function to account for that plus the fit, and you get the best of both worlds: a really good coordinate system plus a representation that's really parsimonious. Parsimonious representations typically have only a couple of parameters, which gives you great flexibility in generalizing the model — unlike a neural net. Notice the targeted use of the neural net here. More generally, if you have different parameter regimes — different dynamical regimes — you can learn different Koopman embeddings around each of them and connect them together; you don't need one Koopman embedding for all the dynamics, you can have several for the different dynamical states of your system.

Next: noise. Noise is always a killer; noise will destroy everything you do — at some point you get enough noise and all your methods fail. I speak the truth. So here's what I want to do: denoise the signal. If I take measurements like that — remember, I'm a little frustrated because I can only handle about one or two percent measurement noise, and if you can't get past that, the technique isn't that useful, because in real life constraining yourself to such small measurement noise probably means limited application. So we're trying to figure out how to denoise. It's a crowded picture, but let me give you the skinny. I take my data and I separate it: the data I measure is a deterministic dynamical system plus random noise, at every time point and every spatial point. So at each time point, part of the signal is the dynamical system and part is noise. And the dynamical system — what do you do to solve a nonlinear dynamical system? You put it through something like fourth-order Runge-Kutta. So the idea is: whatever part I keep as my dynamical system has to satisfy a Runge-Kutta scheme — it has to, it's a deterministic dynamical system. I keep splitting my data and updating the split based on the requirement that whatever I keep as the dynamical state must satisfy some integrator, and in doing this training process you split the data into noise and dynamics. So I can take really noisy data — now I can go to something like 20 percent noise — and this thing learns the noise model, pulls it out, and says: this is noise, this is dynamics, and the dynamics satisfies an integrator like fourth-order Runge-Kutta. If you have a stiff system, you can enforce an implicit solver there instead. I was honestly kind of amazed this works — and it does, and the code is available; remember, everything is downloadable.
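Here is a bare-bones sketch of that splitting idea: treat the clean state as an unknown, require consecutive clean states to be consistent with a Runge-Kutta step of the dynamics, and keep the clean state near the noisy measurements. To stay short I assume the dynamics (Lorenz here) are known; in the method described above the dynamics are being discovered or learned at the same time, and all weights and iteration counts below are arbitrary.

```python
import numpy as np
import torch
from scipy.integrate import solve_ivp

def lorenz_np(t, s):
    x, y, z = s
    return [10.0*(y - x), x*(28.0 - z) - y, x*y - 8.0/3.0*z]

def lorenz_torch(s):
    x, y, z = s[..., 0], s[..., 1], s[..., 2]
    return torch.stack([10.0*(y - x), x*(28.0 - z) - y, x*y - 8.0/3.0*z], dim=-1)

def rk4_step(f, s, dt):
    k1 = f(s); k2 = f(s + 0.5*dt*k1); k3 = f(s + 0.5*dt*k2); k4 = f(s + dt*k3)
    return s + dt/6.0*(k1 + 2*k2 + 2*k3 + k4)

dt = 0.01
t = np.arange(0, 10, dt)
clean = solve_ivp(lorenz_np, (0, 10), [-8.0, 7.0, 27.0], t_eval=t, rtol=1e-10).y.T
noisy = clean + 0.2*clean.std(axis=0)*np.random.default_rng(0).standard_normal(clean.shape)

Y = torch.tensor(noisy, dtype=torch.float64)
X = Y.clone().requires_grad_(True)             # the clean-signal estimate to be learned
opt = torch.optim.Adam([X], lr=0.01)
for it in range(3000):
    opt.zero_grad()
    loss = (torch.mean((rk4_step(lorenz_torch, X[:-1], dt) - X[1:])**2)   # integrator consistency
            + 1e-3*torch.mean((X - Y)**2))                                # stay near the data
    loss.backward(); opt.step()

est = X.detach().numpy()
print("RMS error, noisy vs truth   :", np.sqrt(np.mean((noisy - clean)**2)))
print("RMS error, denoised vs truth:", np.sqrt(np.mean((est - clean)**2)))
```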
Okay, final comments. That was another principled, targeted use of neural nets. The final one is about fast learning. A lot of people talk about learning, and fast learning is really important for a lot of systems. Say you're building a control system, especially for autonomous vehicles: I have a flight model, something happens — a disturbance, a disturbance in the Force — and you've got to make adjustments. A UAV is flying, or you're flying in an airplane, and half your wing comes off — it doesn't matter how you lost it — how quickly can you recover the flight of that thing? You've got a few seconds, and that's about it. You have to build a model; you don't get to say, let me take a bunch of data, can you hang on for a couple of hours while I build you one? You've got a couple of seconds. This actually happened: an Israeli fighter pilot lost half a wing, and the guy figured out how to use his side thrust and fly the thing back. That's awesome — he built a model in real time, without dying. The better call would have been to just eject — I totally would have bailed — but no, he figured it out and landed the plane. It turns out he got arrested when he got back, because the way he lost the half wing was under questionable circumstances and he never should have lost it in the first place — but still, that's a power move.

So fast learning is really important, and you don't have to build a great model — you just have to build a pretty crappy model quickly. What inspired us here was insect olfaction. You have all these amazing neural network architectures for learning, and yet a moth can learn with one trial: you give it an odor and a reward, and boom, it's in memory. One trial. We're nowhere close to that. How does it do it? So we built a model — and by the way, for everyone who says biology doesn't have data: here we go, tons of data. This is the architecture: it has an antennal lobe and what's called a mushroom body, they're connected, and we can pin down most of the neuron counts. The main thing is that octopamine is the reward structure: squirt octopamine when a new odor arrives, and the moth can build a code for that odor. I'm out of time to go through all of it, so trust me for now — I just want to finish up quickly. What we did with this architecture — biologically based, pinned down to real recordings of neurons — is say: okay, with this model, let's teach this moth to read; let's have the moth learn MNIST. So that's the moth learning to read MNIST; here's the architecture, and here are the results and comparisons. The blue line is ours: number of trials — one trial — and it's almost at 70 percent. It builds a very low-fidelity model with one trial; notice the model doesn't improve much after that, while these others — the state of the art — take many more trials. In fact, when this got written up in MIT Tech Review and Quanta Magazine, a Google engineer saw it, and of course — they're Google engineers, they know better than everyone — he said, well, you really need to compare against this, this is actually the state of the art, and he gave us all the specs for the state-of-the-art comparison. So we worked with him for a little bit.
"No, no, you've got to do this or else I won't believe you." So we did everything — fine, okay, I believe you, those methods are fine. But look, we still showed them: done that way, it takes about 20 samples, and yes, those curves continue on up to 99 percent. Does a moth need 99 percent? The moth flies through an incredibly noisy environment — a polluted atmosphere with all kinds of other things in it — so 99 percent is ridiculous; 70 percent is about all it needs, and it can learn that in one shot. It's kind of remarkable. So if you want a neural net that learns fast, go to biology — this is where biology actually shines. You know how I was downplaying biology earlier; now I'm uplifting it. Biology is awesome. "Bio-inspired" — say things like that and people love it: I have a bio-inspired model.

Okay, so overall, final thoughts. I really think you have to bring a diversity of strategies to the table if you're going to do data-driven discovery. I just showed you Ax = b; neural nets play a role; and there's a lot in between that I haven't talked about, all meant to handle these systems. But maybe the first question we should have asked in the first place is: what is the nature of your data? Here's a little chart. One axis is how much of the data you actually observe — are you observing the full state, or just, say, a temperature that may not even be the right variable when there's a whole bunch more going on? Then: how much data do you have (biology's five snapshots in time is not much), and how much noise is in it? Depending on where you sit: if you have a lot of good, clean, highly observed data, neural nets work pretty well; way up in the other corner, with crappy data and very little of it, you might do dynamic mode decomposition; and there's everything in between. So I think you need a diversity of techniques depending on the data you have, and a lot of it is connected together through theory — that's what these links are meant to represent, not a molecular structure. And by the way, neural nets don't just sit there as an input-output box; they can play a targeted role in all of it. A little self-promotion — sorry; if I made a bunch of money off these books that would be one thing, but the check I get covers a trip to Chipotle. Another one is coming out soon. The main thing is this: we have a ton of YouTube resources and open-source code; everything is out there, everything we do we film. I've developed a course on a lot of these techniques and a lot of the lectures are up, so if you're interested, go to my website, look at the open lectures, look for the codes — everything can be downloaded, and there's a ton of code in the books that we make available. That helps you come back to this picture and think about the diversity of techniques. I'll stop there — thank you very much.

[Applause]

Questions? Yes. Yeah, so the short answer, of course, is what you just said: we haven't gotten to touch on that yet. We were just so pleased, because even the denoising I showed we only put up on the arXiv a few weeks ago. By the way, I should mention, in case you're interested in this kind of work: if you actually know the dynamics —
This is a little self-promotion, sorry. If I made a bunch of money off these books it would be one thing, but the check I get might cover a trip to Chipotle. This one is going to come out soon, and the main thing is this: we have a ton of YouTube resources and open-source code; everything is out there. Everything we do, we film and put up. I've developed a course on a lot of these techniques and a lot of the lectures are online, so if you are interested in this you can go to my website, go to the open-source lectures, look for the codes; everything can be downloaded. There is a ton of code in these books and we'll make it all available, and that helps you come back to this picture and think about the diversity of techniques. I'll stop there. Thank you guys very much. [Applause]

Questions? [Question.] Yeah, so the short answer, of course, is what you just said: we haven't even gotten to touch on that yet. We were just so pleased, because even the denoising we have here we only put up on the arXiv a few weeks ago. By the way, I should mention, in case you're interested in this kind of work: if you do know the dynamics, for instance if I take Kuramoto-Sivashinsky here, I can load it with something like 250 percent noise and pull it all out, if I actually know the governing equations. If I don't know the governing equations, it's more like 20 percent. Knowing the governing equations lets me pull out massive amounts of noise, so it actually works better than a lot of these unscented Kalman filter approaches and the like; we've kind of shown that, and it's kind of amazing. Now, we can do different distributions of noise, not just Gaussian: colored noise, fat tails. Have we dealt with correlated noise? No, not yet; we hope we can. What's probably going to come into that is making more assumptions, fixing up that G: with the G you might be able to say, okay, how do I put a prior on there, what goes into the G. But we haven't done anything like that yet; so far it's been Gaussian white noise. That's the standard: everybody, when they see noise, says oh, Gaussian white noise, and probably none of it is really Gaussian white noise, but that's fine; we're super happy if we can do that case.

Yeah, good question. This is a question about what happens if the right term is not in your library, or in your biblioteca. Here we go: if I have that library and the term is not there, the nice thing about this SINDy architecture is that it fails in a very stereotypical way. What it will do is say, hey, I'm not seeing what I need, so the only way to represent this is to use a bunch of terms. The key hallmark signature is that instead of getting any kind of sparsity pattern, it has to light up a bunch of terms; it almost tries to build a Taylor series expansion out of all the other terms to get at that missing term. That's typically a dead giveaway. Now, the question is what you do about it. We've been working on an architecture that puts a genetic algorithm on top of this. You give it a library of starting terms that are pretty good, and it starts adjusting them as it goes, so it can build new terms: okay, I got good hits here and here, these don't seem to do anything, but these aren't quite right, so let's make some children out of them and keep building. We've had some success building a genetic algorithm structure that starts adapting on the fly and making new terms, with the hope of converging to the right terms even if you didn't start with them. So we hope that will work.
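As an editorial aside, here is a minimal sketch of the kind of library regression being described, using a simple polynomial library and sequentially thresholded least squares on a made-up one-dimensional system. The system, library, and threshold are illustrative assumptions, not the examples from the talk; the point is the diagnostic just mentioned: with the true cubic term in the library the recovered model is sparse, and with it removed the regression smears the cubic across the remaining terms.

```python
import numpy as np

# Minimal SINDy-style sketch: polynomial library + sequentially thresholded
# least squares on a toy system x_dot = -2*x + 0.5*x**3 (illustrative only).

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 400)
x_dot = -2.0 * x + 0.5 * x**3 + 0.01 * rng.standard_normal(x.size)

def library(x, include_cubic=True):
    """Candidate library Theta(x); optionally leave out the true x^3 term."""
    names = ["1", "x", "x^2", "x^3", "x^4", "x^5"]
    cols = [x**0, x, x**2, x**3, x**4, x**5]
    if not include_cubic:
        names.pop(3), cols.pop(3)
    return np.column_stack(cols), names

def stls(Theta, dxdt, threshold=0.05, iters=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    xi = np.linalg.lstsq(Theta, dxdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        if (~small).any():
            xi[~small] = np.linalg.lstsq(Theta[:, ~small], dxdt, rcond=None)[0]
    return xi

for include in (True, False):
    Theta, names = library(x, include_cubic=include)
    xi = stls(Theta, x_dot)
    model = {n: round(float(c), 3) for n, c in zip(names, xi) if c != 0.0}
    print("cubic in library:" if include else "cubic missing:   ", model)
# With x^3 present: a clean sparse model, roughly {x: -2, x^3: 0.5}.
# With x^3 missing: the fit compensates with the other odd powers (x picks up
# the wrong coefficient and x^5 lights up) -- the tell-tale failure mode.
```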
Let me just make this comment, because "why do you think this will work?" is a question I often get. When I look at this, at least from the perspective of the physics I grew up with, I don't think what we're discovering is the real, full physics. What we're discovering is the dominant balance physics. This idea of dominant balance is really important; it goes back to the asymptotics of the 1960s. We made a lot of progress in fluids by looking at different regimes. The Reynolds number is such an important parameter because it lets us look at different asymptotic regimes. How many non-dimensional parameters are there in fluids? The Prandtl number, the Reynolds number, and so on; these are all asymptotic reductions where we can say: in this regime these terms dominate and I can understand that physics, and in that regime those terms dominate. What we're actually discovering is just the dominant balance. So my hope is, and by the way this is why I think polynomials work so well, that even though many of these systems are undoubtedly much more complicated than this, the cubic is somehow representing the dominant piece of a Taylor series expansion of something much more complicated, and all the other stuff is so small that I can model the whole thing with that dominant piece. That's one way I'm thinking about it, and that's why I think polynomials are so effective.

Oh, great question. Some slides I cut out because I was over time, sorry about that, my apologies. We actually have all of this set up with model selection on the back end. Typically this can generate a combinatorially large set of models for you, and instead of just taking one model, you can look at all the models that live near the Pareto-optimal front. It's a really nice way to do model selection, because the regression lets you consider a combinatorially large set of candidates that you would never consider in classical model selection; you simply don't have the bandwidth to run all of them. Instead, you look at the ones that sit near the Pareto-optimal front, run those, get AIC and BIC scores, and rank them according to whether they are strongly supported, weakly supported, or not supported. That gives you a model ranking, so if you have two competing models, you take the one with the best information score.
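As another editorial aside, here is a minimal sketch of the information-criterion ranking just described: take a handful of candidate models (here, made-up stand-ins rather than models from the talk), score each with AIC in its standard least-squares form, and rank by the AIC difference relative to the best.

```python
import numpy as np

# Minimal sketch of AIC-based ranking of candidate models near a Pareto front.
# The data and the three candidate models below are illustrative stand-ins.

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 300)
x_dot = -2.0 * x + 0.5 * x**3 + 0.05 * rng.standard_normal(x.size)

# Each candidate: (number of free parameters, predicted x_dot as a function of x).
candidates = {
    "linear  -1.6 x":           (1, lambda x: -1.6 * x),
    "cubic   -2 x + 0.5 x^3":   (2, lambda x: -2.0 * x + 0.5 * x**3),
    "quintic -1.5 x + 0.1 x^5": (2, lambda x: -1.5 * x + 0.1 * x**5),
}

def aic(n_params, residuals):
    """AIC for a least-squares model: n*log(RSS/n) + 2k."""
    n = residuals.size
    rss = float(np.sum(residuals**2))
    return n * np.log(rss / n) + 2 * n_params

scores = {name: aic(k, x_dot - f(x)) for name, (k, f) in candidates.items()}
best = min(scores.values())
for name, s in sorted(scores.items(), key=lambda kv: kv[1]):
    # Common rule of thumb for delta-AIC: roughly <2 is strong support,
    # much larger values (e.g. >10) essentially no support.
    print(f"{name:28s}  AIC = {s:8.1f}   delta = {s - best:7.1f}")
```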
Well, that's a good question; I don't know that it will tell you that. Normally what we'd say is that you want a big diversity of initial data. If you just live on an attractor and you only sample the attractor, sometimes it's hard to discriminate. For instance, what if your attractor is just a traveling wave? A lot of models will give you a traveling wave, so you need to see multiple sets of initial data to really start discriminating and make sure you have the right model.

[Question about control.] Okay, control is an interesting subject to me. The control journals are mostly about proofs and guarantees; it's really rigorous, and it mostly has to be framed in terms of a linear system, because the only way you're going to say something rigorous is with a linear model; for a nonlinear model it's very hard to make any guarantees at all. Even in the SINDy-MPC architecture we have, we solve the nonlinear model with actuation, so do we have control guarantees? No, I don't think so. It works, but okay. One of the things we also like about this sparse regression, and the SR3 framework I told you about, is this: if you do sparse optimization, people in the statistics world especially point out that there's a proof about the lasso. Even if no data ever satisfies the assumptions of that proof, and in practice they never hold, people still like the fact that there is a proof. And then we tell people, you do sequential thresholding, it's awesome, it works great, so much better than lasso, look at all these examples, and they say: but where's your proof? So that's partly why we worked on that, and guarantees are what we're working on next, if we can get any; we hope to. We don't have performance guarantees yet.

Parametric dependencies we can handle through that architecture I talked about, where you set up an algorithm that jointly discovers the time-dependent coefficients as well as the dynamics, so that kind of parametric dependency we think we can find. For control? No, I don't know. I think we're pretty early: this MPC controller hasn't even appeared in print yet, it's only on the arXiv, so we're just starting in that area. I think over the next few years there's going to be a lot of progress, I hope, but we just have the first step that says, hey, this kind of works, and we have all these examples to show it works, but we don't have any guarantees or theorems around it. Oh, good, because then I don't have to answer the hard questions; we can talk afterwards. All right. [Applause]
Info
Channel: Scientific Computing and Artificial Intelligence
Views: 1,937
Rating: 5 out of 5
Keywords: machine learning, Bayesian inference, Monte Carlo, approximate inference, uncertainty quantification, Gaussian processes, HMM, state-space models
Id: UnEuRYpqFM4
Length: 72min 51sec (4371 seconds)
Published: Thu Nov 01 2018