Tutorial: Julia for geoscience

Captions
Live streaming is on. All right, welcome everybody to the second-to-last tutorial for Transform 2022. Today we're going to look at Julia for geoscience. Before we get to that: Transform 2022 is an online conference around digital geoscience, with everything as openly accessible as we can make it, so the tutorials are given as live streams on YouTube. If you're catching this now, hello; if you're watching in the future, awesome. Many thanks to curvenote.com for helping us out and supporting Transform. You can find the schedule, so you can see which tutorials you've missed if you haven't caught them live, at softwareunderground.org/transform. If you want to get involved with the community in general, you can join our Slack (softwareunderground.org will get you there) and look for the t22 channels. Remember you can pause and rewind these videos if you miss something; even if you're watching live you can still rewind a little bit. With that, I'll hand over to Francis Yin, who will take us through some Julia for geoscience. I'm quite excited for this one.

Great, thanks Martin for the introduction. My name is Ziyi, and people usually call me Francis because it's easier to pronounce. I'm a third-year PhD student at the Georgia Institute of Technology, advised by Professor Felix Herrmann, and my research interests are, broadly, inverse problems, scientific computing, deep learning, and their applications to time-lapse seismic monitoring of carbon storage and sequestration. Today I'll talk about Julia for geoscience, which is something my research is based on every day. When I joined the PhD program at Georgia Tech I came from a math background; I didn't know anything about geoscience or geophysics. When the group members talked about seismic migration or full-waveform inversion, I had zero idea what they were talking about. But I'm very glad, and appreciative, that the members of our research lab built these Julia software stacks that present the seismic tutorials and seismic inversion software in a very abstracted way, so that I don't have to be overwhelmed by solving seismic migration or seismic imaging problems: with the right Julia software abstractions, I'm literally just solving Ax = b. That's the idea I want to convey in today's tutorial: how can we use Julia to abstract the physics behind geophysics and geoscience applications, and how can we incorporate that with deep learning frameworks and so on.

Okay, let me share my screen now. I hope people can see it. Today's tutorial lives in this GitHub repository of SLIM tutorials; we actually have a bunch of them, numbered from 00 to 11.
This is the software stack our group uses, and today I'll mainly talk about five of the notebooks. First, an intro to Julia; as promised, this will be a beginner-friendly tutorial on Julia for geoscience. Then an intro to JOLI, which is a package for linear algebra abstractions. Then an intro to JUDI, which is a software abstraction framework to run Devito from Julia: JUDI is a package for large-scale seismic inversion and seismic imaging, and it calls Devito underneath, so if you attended the Devito tutorial earlier this week (on Tuesday, I think), it would be good to pick up your memory from there and then come to JUDI, which really just runs Devito from Julia in a quite simple and clean way. Then, since this tutorial is for the geophysical community, I'll talk about how we handle SEG-Y files in Julia. SEG-Y is one of the file formats geophysicists use a lot, to store seismic data for example, and we have a quite clean way to handle it in Julia that can talk directly to JUDI, the seismic inversion framework. In the end I'll talk about basic normalizing-flow training and sampling, which is about deep learning. A normalizing flow is a type of invertible network; if you've never heard of that before, it's totally fine, I'll introduce the details in those scripts, so don't worry too much about them.

We're going to run these tutorials as .ipynb files, so we'll do them in Jupyter notebooks; there is a Julia kernel for Jupyter, and the "Ju" in Jupyter actually stands for Julia. If you're in a time crunch and don't have time to install the packages or Jupyter, we provide Docker images with a lot of these packages already built in, so you can do a docker run, which basically gives you a link you can copy and paste into a browser to run the Jupyter notebooks. If you're more interested in running things in a terminal, you can also use the Docker image to get a terminal; that's fine too. This is the Docker image to use, and if you want to install the packages yourself, I'll talk about that later.

Okay, so let's start by introducing Julia. Julia is a fairly new language; it was born in 2009 out of work by some MIT researchers. In a nutshell, Julia is designed to be fast, dynamic, reproducible, composable, general, and open source. From my user experience, Julia wants to be a language that is as fast as C or Fortran and as easy to use as Python; it tries to get the best of both worlds. Installation is quite simple, and it's open source, not like MATLAB, which requires a license; you just go to the downloads page. There are a lot of versions of Julia, and Julia keeps updating itself quite frequently; the current release is version 1.7.2, which was released on February 6, 2022.
So quite recent. You can follow the instructions there to install it on your system; for example, I'm using a Mac, so I click the macOS installer, it downloads, I open it, and in the end Julia is installed on my Mac just like any other application. You can use the other installers for your own system. Julia also has a long-term support release, version 1.6.6. Long-term support means that if you find a problem, for example some matrix-vector product really doesn't work as you expect, you can reach out to the Julia community and that version will keep being supported and fixed. And there's an upcoming release, 1.8; a beta came out on March 29, so we can pretty much expect the full release in the next few months.

Okay, after you install Julia you can run it in several ways. You can run Julia in a terminal: I open the terminal, make it big, type julia, and it brings up the Julia REPL, an interactive environment where you can run commands line by line. You can see my version right now is 1.7.2. Then you can run a command like println("Hello, this is Julia"), great, and you can call exit() to leave the interactive REPL. You can also write a script: for example, touch script.jl, put the same println("Hello, welcome to Julia") in it, so the script is a one-line Julia program, and then run julia script.jl; perfect, the script runs and prints "Hello, welcome to Julia". You can also start julia and include the script you just wrote, include("script.jl"), and it does the same thing; this could be a very long Julia script and you just include it. And you can run short Julia snippets with julia -e: the -e evaluates whatever is inside the quotation marks, so you can do julia -e 'println("hello julia")', same thing.

You can also run Julia in Jupyter notebooks through its Julia kernel. The way you get that kernel is to install a Julia package called IJulia. So let me talk about Julia's package management a little bit here, since I assume a lot of the audience will be new to Julia, which is no problem. In Julia there are different ways to call the package manager. If you press the right square bracket ] in the REPL (just the bracket, no space), it brings up the package management mode, and then you can run the command add IJulia to install the IJulia package, which gives you the Julia kernel for Jupyter notebooks. This add command is how you add a Julia package to your system. Staying in this mode, you can also do rm to remove a package; I could remove IJulia, but I won't do it this time because I really need it. You can also run shell commands from this interactive environment: press the semicolon on your keyboard,
and it drops you into a shell prompt, where you can do pwd, for example, to show my current directory, and ls to list a lot of things. Okay. And what else: if you press shift and the question mark in the Julia REPL, it brings up help mode, so you can look up functions you're not sure about. For example, if I'm not sure how println works, I just do help on println, hit enter, and it tells me that println prints its arguments xs followed by a newline; so you could do println("hello world"). So shift plus question mark drops you into help mode, where you can search for functions.

Okay, I think that covers the Julia interactive part; let's go to Jupyter notebooks to check out the tutorials. You can run jupyter notebook and it opens the file browser; my desktop is kind of messy, but my tutorials are right here in the SLIM tutorials folder, and I'll first go through the intro to Julia.

All right, in this notebook I'll talk about a few things. First, I already alluded just now to how you add, remove, or develop a package with Julia's package management system. Second, I'll show how basic linear algebra runs in Julia, since this is beginner friendly. Then I'll introduce the just-in-time compiler that Julia uses, which is quite different from some other languages. And at last I'll talk about multiple dispatch and type stability, which make Julia run very fast in a smart way.

Okay, let's begin with package management. When we want to use a package in Julia, which is like import in Python, we just write using and then the package names: LinearAlgebra, Test, BenchmarkTools (we'll use BenchmarkTools a little later). This using is the analogue of import in Python; let's run the line, hope it runs, okay, it's okay. To add or remove a package, we already did this in the interactive REPL, but we can also do it in scripts: we do using Pkg, because the package manager is itself a package, and then Pkg.add a package. This does exactly the same thing as typing add IterativeSolvers in the ] package mode: it adds the IterativeSolvers package to our system. Similarly, Pkg.rm removes a package from your system, as we showed just now, and you can also do that interactively. What else: you can install the same IterativeSolvers package at a specific version, so you specify version 0.9.2 and tell Julia to install exactly that version, and it will. And finally you can add a package by its GitHub repository link (oh, it just shows the output), so this is the link for IterativeSolvers, and you can also specify a branch of that repository, and Julia will install whatever is on that branch. So these are the several ways to install packages in Julia, and you can do all of this with the square-bracket mode too: add a package, remove a package, add a GitHub repository; same thing. So that's what Julia's package management system looks like.
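As a quick reference, here is a minimal sketch of the package-management commands mentioned above; the repository URL and branch name are placeholders, since the exact ones aren't shown in the recording:

```julia
using Pkg

Pkg.add("IterativeSolvers")                         # add a registered package
Pkg.add(name="IterativeSolvers", version="0.9.2")   # pin a specific version

# add a package straight from a GitHub repository (substitute the real URL and branch):
Pkg.add(url="https://github.com/<org>/IterativeSolvers.jl", rev="<branch>")

Pkg.rm("IterativeSolvers")                          # remove it again

# Equivalent REPL modes: press ] for `pkg> add IterativeSolvers`,
# ; for a shell prompt, and ? for help mode.
```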
The next part is linear algebra in Julia. The way you set up a Julia vector or matrix is actually quite similar to the way you set them up in MATLAB: if you've ever used MATLAB, you also use square brackets to denote matrices or vectors, and you use the semicolon to change rows. So I can write a matrix A as [1 2; 3 4], where the semicolon changes the row: the first row is 1 2 and the second row is 3 4. That's how you set up a matrix in Julia. Julia stores multi-dimensional arrays, including matrices, in column-major order, meaning that when we read this matrix we read 1 2 3 4, but Julia actually stores it column by column in memory, so the storage order is 1 3 2 4. If you vectorize the matrix A with A[:], it shows you exactly this 1 3 2 4 layout that Julia uses to store A. That's one way to vectorize a matrix, and vec(A) is another function that does the same thing as that cell. Okay, let's run this, all right.

Then you can also write a vector, say x = [1; 2]. Julia always assumes the vectors you write are column vectors, so if you take the size it's just length 2, but it is treated as a column vector. There are also the built-in basic linear algebra operations, like in MATLAB or Python: you can compute the inverse of a matrix and things like that; you can do matrix-vector multiplication, so in this case A is 2 by 2 and x is a column vector of length 2, and A * x gives you a two-element vector. You can also do A \ x; this follows the same syntax as MATLAB, where the backslash solves the system A times something equals x, in the least-squares sense, so A \ x basically gives you inv(A) * x. You can also take the determinant of the matrix A. So we have all the usual linear algebra functionality in Julia. That's the second part, about linear algebra.

The third part is one of the quite interesting things about Julia: Julia uses just-in-time compilation, which is quite different from C or Python. If you think about running a C program, you have to compile it first, from the first line to the last, and only after compiling the entire program do you run it; so for C you compile, then run. Julia instead uses just-in-time compilation, so let's see how that works. Suppose I want to do a matrix-vector multiplication where A is a random Float32 matrix with this many rows and columns, and b is a Float32 column vector with this many entries. We can run A * b as before, and we can use the @time macro, which tells you how much memory and how much time an operation uses. Let's do this: you can see the operation takes 0.11 seconds; this is the first time we run the line A * b. If we do a second run, you see a much shorter time, and you can repeat it multiple times; it's always about 0.02 seconds, not something like 0.11.
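Here is a minimal sketch of that experiment; the array sizes are illustrative stand-ins, since the exact dimensions used in the notebook aren't spelled out in the recording:

```julia
using LinearAlgebra

A = rand(Float32, 1000, 2000)   # random Float32 matrix (sizes are illustrative)
b = rand(Float32, 2000)

@time A * b    # first call: the time includes JIT-compiling the Float32 method
@time A * b    # second call: the method is already compiled, so much faster

# The same thing happens again for a brand-new element type:
A64 = rand(Float64, 1000, 2000); b64 = rand(Float64, 2000)
@time A64 * b64   # compiled again, because Float64 * Float64 is a new method instance
@time A64 * b64   # fast
```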
So what happened? When Julia tries to run this matrix-vector multiplication, it has never run this particular multiplication before, so the first time, Julia has to compile the multiplication method and only then run it. You can see it takes that long because it compiles the function that multiplies the matrix and vector, but on the second run, since the function is already compiled from the cell above, you don't have to compile it anymore; you just run it, so it takes much less time and much less memory allocation.

This might seem a bit weird, because we already ran a matrix-vector multiplication earlier in this notebook: in the basic linear algebra part we set up a 2-by-2 matrix and a length-2 vector and multiplied them. You would think that function has already been run, so why does Julia still think it needs to compile it in this cell? Let me explain, and we'll discuss it more in the next section on multiple dispatch. Basically, Julia considers the earlier matrix-vector product different from this one, because the matrix A defined as [1 2; 3 4] has integer element type, and similarly the vector [1; 2] is integers; so that earlier product was a multiplication of an integer matrix with an integer vector, which is a very specialized multiplication. Julia thinks: okay, I've already compiled integer multiplication, but I haven't compiled anything for Float32 yet, so this is a brand-new method for me to run and I need to compile it first. We can do another experiment: initialize A as a random Float64 matrix, again a type Julia hasn't seen in this notebook, and b as a Float64 vector, then run @time A * b. Again you see a very long compilation time on the first run. Why? Because Julia has done this matrix-vector multiplication for Float32 and for integers, but it has never seen anything with Float64.
So in the end Julia still thinks this multiplication is very specific, something it has never seen before, and it has to compile it first; that's how Julia runs. If you run the same cell again, you see a very short time this time, because Julia has already compiled that multiplication, so it has no problem running very fast afterwards. The takeaway from this small experiment is that Julia uses a just-in-time compiler: when Julia sees a statement, it asks, have I run this method before, with exactly these types? If so, just run it; if it sees a new type, it considers the function not yet compiled for that type, compiles it, and then runs it. That's how Julia's just-in-time compilation works. It's not a silly thing; I'll show later why it's actually a smart thing.

The next thing is BenchmarkTools. We just discussed the @time macro, which measures the memory allocation and time of a single run, the current one. If we want to benchmark the runtime of a function properly, we should use the @benchmark macro from the BenchmarkTools package, which gives you a very nice histogram over, say, 200 samples of the multiplication with one evaluation each, and in the end you get the median runtime, the mean runtime, things like that. Okay, that's how Julia's just-in-time compilation works.

Now I'll talk about multiple dispatch, which is one of the core things that make Julia different from other languages. What does multiple dispatch mean? As we just saw, Julia uses a different multiplication method for differently typed input: if you give it two Float32 arrays, that's different from giving it two integer arrays; Julia still does the multiplication, but with a different method. We can do a small experiment with scalar multiplication, 1 * 2, and use the @which macro to ask Julia which method it ran. Let's run this: Julia tells you that to compute 1 * 2 it used the method *(x::T, y::T) where T is a union of the integer types, which means Julia used integer multiplication for 1 * 2. What about this cell? We still multiply one by two, but as Float64 values, 1.0 * 2.0. Then @which tells you that Julia now uses a different method, the one for two Float64 arguments, which is different from the previous one. And then you can run 1.0 * 2, a Float64 times an integer.
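A compact sketch of those @which experiments; the exact method signatures printed will vary slightly between Julia versions:

```julia
@which 1 * 2        # *(x::T, y::T) where T<:Union{Int8, ..., Int64, ...}  -> the integer method
@which 1.0 * 2.0    # *(x::Float64, y::Float64)                            -> the Float64 method
@which 1.0 * 2      # *(x::Number, y::Number)                              -> generic fallback via promotion
```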
If you ask Julia which method it used for that multiplication, it tells you it used a generic method: it assumes x and y are both generic Numbers, with no assumption on the concrete type. This is one of the situations we want to avoid when we run things in Julia, because Julia is very good at compiling functions when the specific types are known. For example, in the previous cell Julia knows both arguments are Float64, so it has a compiled method just for Float64; but if you provide two generic instances, Julia doesn't have a specialized way to compute the multiplication, so it falls back to the most generic method, which is not good, because you want things to be specialized so Julia can find a very fast, specialized way to do the computation. So this is the kind of situation we want to avoid in Julia.

We also have another example, a foo function that works for different kinds of input, so let's think about multiple dispatch again. I define a foo function that takes two strings and prints "my inputs are strings"; you can see foo is a generic function with one method, and foo on two strings gives the correct output. But if I call foo on two integers, Julia complains: there is no method to run foo for two integers, because we only provided a method for strings. However, you can add another method to foo that works for two integers; you just define that line, and now foo(3, 4) works because that method exists. You can see the output from Julia: foo is now a generic function with two methods, and those two methods are foo for strings and foo for integers. You can still run the previous call, foo("hello", "hi"), the one we ran before, and that method did not get overwritten; it's still there. It's a different method, so it doesn't get overwritten even though it lives under the same generic function. There's another example here that I'll skip, because I think we already understand how Julia's multiple dispatch works, but feel free to try it out to understand a little more about how to write Julia functions in a smart way.

The next thing is type stability. By type stability I mean we should always write type-stable code, so that Julia can easily infer all the variable types in a function; then Julia can generate very specific compiled code and run very fast. Suppose we want to define a function that takes a sequence of numbers; the three dots here are the splatting syntax, which gathers the sequence of arguments. We want to compute the product of the whole sequence, so we define an intermediate variable c that copies a[1], the first element, and then for every element of the collection we do c = c * a[l], where a[l] is that element; that computes the product of the entire sequence. Okay, let's run it.
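A minimal sketch of that type-stability experiment, with the stable and unstable inputs side by side; the function name is illustrative, since the notebook's own name isn't audible in the recording:

```julia
using BenchmarkTools

# The splatted arguments a can mix integers and floats.
function prod_all(a...)
    c = a[1]                 # c starts with the type of the first argument
    for l in 2:length(a)
        c = c * a[l]
    end
    return c
end

@benchmark prod_all(1, 1.5, 2, 2.5, 3)           # mixed Int/Float64 input: type-unstable, slower
@benchmark prod_all(1.0, 1.5, 2.0, 2.5, 3.0)     # all Float64: type-stable, faster
@code_warntype prod_all(1, 1.5, 2, 2.5, 3)       # red Union/Any entries flag the instability
@code_warntype prod_all(1.0, 1.5, 2.0, 2.5, 3.0) # clean output: everything is inferred
# Fix for the mixed case: start with c = float(a[1]) (or 1.0) so c keeps one type throughout.
```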
Then we can benchmark it on the sequence 1, 1.5, 2, 2.5, 3: it takes some number of nanoseconds, very short, great. Then we run the next cell, which is a different input x but essentially the same numbers, and the benchmark there takes a noticeably shorter time than the first one. So what happens? All the entries in the second array are Float64, whereas in the first array the 1, 2, and 3 are integers, and this is one of the situations Julia doesn't really like. You can use the @code_warntype macro to see the problem: if you run it on the function with x1, the input that mixes integers and Float64, you see a lot of red in the output, meaning Julia warns that there is a type conversion on several lines, which Julia hates to do. If you do the same for x2, there is no warning, meaning the code is type stable.

Let's go back and interpret what happens. You first initialize c as the first element of your sequence, so if you run with x1, c is initialized to 1, which is an integer, and then after c = c * a[l], c eventually becomes a Float64, because there are floating-point numbers later in the sequence. So Julia has to do a type conversion, and because Julia isn't sure what type c will end up being, it can't generate specialized code for this function; it falls back to a generic compilation, and in the end it isn't fast. But if you initialize c as a Float64, like 1.0 at the start, then c is always a Float64, and everything runs without any type warnings. That's what type stability means: we want to write type-stable code where it's easy for Julia to infer every variable in the function, so Julia can compile the function in a very efficient way.

Okay, that's it for the intro to Julia. I'll go directly to the next one, which is the intro to JOLI, and grab a coffee. After the intro to Julia, I think people understand Julia a little bit: it has multiple dispatch, so when you define a function it can take different kinds of input, and each of those is a different method under the same function; and you want to make sure things are type stable so Julia has a specialized way to compile your code. This next notebook is an introduction to the JOLI.jl package; you can find it under the SLIM group on GitHub, where you can find most of the packages used today. JOLI is a way to construct matrix-free linear operators with explicit domain and range type control and apply them in basic linear algebra, matrix-vector multiplications and so on. Julia really likes this explicit domain and range typing, because it's always good for Julia to know what types of variables you're running under the hood. In this notebook we'll talk about three things: first, how to build a linear operator in JOLI and verify its correctness; second, how to solve a linear inverse problem Ax = b, where A is not a matrix but a linear operator defined by JOLI; and third, how JOLI interacts with machine learning toolboxes like Flux in
Julia (Flux is sort of the PyTorch and TensorFlow of Julia combined, because Julia only has the one framework) and with the optimization toolbox Optim.jl, via ChainRules. There are a lot of details here, but basically JOLI is a linear algebra package that inherits the same idea as SPOT, if you're familiar with MATLAB: SPOT is another matrix-free linear algebra package, designed for MATLAB, and JOLI follows a similar idea.

Okay, let's begin by loading a few packages and trying to build a linear operator. A good example that geophysicists are interested in is deconvolution, so we'll do a convolution example in this notebook and then try to invert it, which is deconvolution. Let's start by building a convolution kernel and a convolution operator with the JOLI toolbox, this linear algebra framework. First we set up the convolutional kernel: this is the time interval, the length in time, and the time vector, and this is the kernel itself, and you can plot it. Julia uses the PyPlot package for plotting signals or images, which calls the matplotlib library in Python under the hood, so the functionality is essentially the same as matplotlib. You can see the convolutional kernel we just built, two seconds long.

Now we want to build a matrix-free linear operator that performs the convolution, so that after building it I can multiply it with a vector x; that's the end goal. But we don't want to construct an explicit matrix, because this convolution matrix could be dense. This is a very small-scale example, but imagine a convolution kernel with a billion entries: your dense matrix would be a billion by a billion, which is something you simply don't want to store. So the whole idea of JOLI is to design linear operators that are not explicit full matrices, but matrix-free linear operators that provide the forward and adjoint evaluations of the underlying linear map.

Let's talk about convolution a little. If people are familiar with the Fourier transform, the convolution of two signals is easily computed in the frequency domain: you take the Fourier transform of both signals, do an element-wise multiplication, and then take an inverse Fourier transform; that gives you the convolution of w and x, where w is the convolutional kernel. We can write this in a very abstracted way with JOLI. Our end goal, again, is a linear operator that convolves with the kernel w, and we can write it very abstractly as shown here (wait, did I run this? yes, I did). We can define a Fourier transform operator with joDFT in JOLI; this is a linear operator F that is not a full matrix but is equipped with forward and adjoint evaluations, so it's matrix-free. The second thing we define with JOLI is joDiag, which builds a diagonal operator with a given diagonal. In this case we want an element-wise multiplication, so we define a diagonal
operator whose diagonal is the Fourier transform of w; the joDiag then applies this element-wise multiplication to the Fourier transform of x. So in the end, to define the convolution, we compose three objects, three linear operators: first a Fourier transform, which takes x to the frequency domain; then an element-wise multiplication with the Fourier transform of w, defined as the diagonal operator; and finally an inverse Fourier transform, so that we get the convolution of w and x. This is quite clean, because you see it written exactly the way you would interpret the linear algebra (adjoint operator, operator, operator), although these are not actually matrices; JOLI treats them as matrix-free linear operators. In this way we can store the operators in much less memory, because we don't have to store the explicit entries of a full matrix, and we can use them, as shown below, in iterative solvers and things like that.

In case you want to build a linear operator that doesn't already exist in JOLI (although I would strongly suggest checking the JOLI reference guide first: a lot of operators are already built in, such as restriction operators, compressive sensing matrices, and Fourier, wavelet, and curvelet transforms), you can build one yourself. Suppose we want to wrap a full dense matrix as a JOLI operator; it's a simplistic, dummy example that doesn't make too much sense by itself, but it shows how to build a JOLI linear operator. Given the matrix M, we construct a JOLI operator M1 with the joLinearFunctionFwd constructor: you provide the input and output sizes, both n here; then the forward evaluation and the adjoint evaluation of your operator, both as functions (this one says, given input x, I output M times x; this one says, given input x, I output the adjoint of M times x: forward and adjoint evaluations); and you provide the domain and range types and a name for your operator. With that, M1 is a JOLI linear operator, and you can query its size, domain type, and range type; these are all native Julia-style functions, overloaded in JOLI.

Then you can run linearity and adjoint tests on this operator M1. You can check here: we pick two random vectors, and the linearity test checks whether applying the operator to each of them and summing gives the same output as applying it to their sum. For the adjoint test we use a dot test: we check whether the dot product of a2 with the forward evaluation of M1 on a1 gives the same value as the dot product of a1 with the adjoint of M1 applied to a2. These are the linearity and adjoint tests that make sure the operator you defined really is a correct linear operator.
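Here is a minimal, self-contained sketch of the linearity and adjoint (dot) tests. A dense random matrix stands in for the operator so the snippet runs on its own, but the point is that the identical two checks apply unchanged when A is a matrix-free JOLI operator (for example the convolution operator built above), as long as the test vectors match the operator's domain element type:

```julia
using LinearAlgebra

n = 256
A = randn(n, n)              # stand-in operator; swap in a JOLI operator of the same size

a1, a2 = randn(n), randn(n)

# Linearity test: A*(a1 + a2) should equal A*a1 + A*a2 up to rounding.
@assert isapprox(A * (a1 .+ a2), A * a1 .+ A * a2; rtol=1e-10)

# Adjoint ("dot") test: <A*a1, a2> should equal <a1, A'*a2>.
@assert isapprox(dot(A * a1, a2), dot(a1, adjoint(A) * a2); rtol=1e-10)
```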
After these tests pass (JOLI actually also provides testing utilities that generate these tests for you and just give you the result, so if nothing throws a warning, the tests passed), you can trust your operator. Okay, so what's the point of building these linear operators? Think about it again: we built a linear operator in JOLI with a specific input and output size, domain and range types, and forward and adjoint evaluations. The reason we want to use JOLI is that these forward and adjoint evaluations don't have to be full matrix-vector multiplications; they can be abstract functions. For example, if we define a joDFT, the forward evaluation is an FFT of x, which comes from a call into Julia's FFT library, and the adjoint evaluation is an inverse FFT, because the adjoint of the Fourier transform is its inverse. These function calls are very cheap to store in memory; you don't need to store all the entries of a matrix. So your JOLI linear operator can be stored cheaply while you still have access to its forward and adjoint. And if an operator is equipped with forward and adjoint evaluations, that's already enough to solve most inverse problems with fancy gradient-based optimization algorithms.

Next I'll show how to solve an inverse problem with the JOLI operator we built. Remember we built a JOLI operator A that corresponds to convolution with the kernel w. Now we can do a deconvolution: we observe some convolution result b and want to infer which signal x was convolved with w to produce b. That's deconvolution; it's an inverse problem of the form Ax = b, and we'll show that you can use Julia's native iterative methods to solve it without any complications and without any extra overloading, because things work out of the box.

Let's first play around with this small-scale example. You have a ground-truth x; in this notebook I made x very sparse so we can try out different algorithms. Again, this notebook is not a comparison of different algorithms; it's just a notebook to show that we have access to these algorithms and can apply them with ease. So I set up a sparse ground truth x of this length and pick 20 non-zero entries: I put 20 random non-zero entries at random locations in the signal, with random amplitudes. Makes sense, right? Then I plot the ground-truth x, shown here; this is the x in your inverse problem, which you never know in real life. Then you convolve this x with the kernel w by applying A to x, and you get the convolved data b. If you look at this line, it's very plain linear-algebra code: you generate a noise term of a certain amplitude times a random vector, and you generate the observation b as A*x + e. Everything is pretty much plain linear algebra; Julia doesn't even notice that A is not a full matrix, that it's a JOLI linear operator, because the JOLI operator already provides the forward and adjoint evaluations. Even though Julia doesn't know the entries of this "matrix", it knows how to compute A times x, because the operator is equipped with a forward evaluation.
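A compact sketch of that data-generation step and the LSQR solve described in the next paragraph; the signal length and noise level are illustrative stand-ins, and A is assumed to be the JOLI convolution operator from above (with a matching domain type):

```julia
using Random, IterativeSolvers

n = 512                              # signal length (illustrative)
x = zeros(Float64, n)
x[randperm(n)[1:20]] .= randn(20)    # sparse ground truth: 20 random spikes

e = 1e-2 .* randn(n)                 # additive noise (the level is an assumption)
b = A * x .+ e                       # "observed" data; A only needs to support *

# Deconvolution with LSQR: the solver only ever calls A*v and A'*v,
# so it cannot tell whether A is a dense matrix or a matrix-free operator.
x_lsqr = lsqr(A, b; maxiter=500)
```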
Okay. Now that we have this noisy measurement, we can try out different algorithms from native Julia packages. For example LSQR: LSQR is a very good algorithm for least-squares inverse problems; it finds a minimum two-norm solution. This lsqr lives in the IterativeSolvers package, and you just give it A, which again is a JOLI linear operator and not a full matrix, and b, the observation we just generated, and run it for 500 iterations. (Sorry, did I define that above? Did I run this? Oh, I didn't run this cell; okay, I don't want things to crash. Okay, that makes sense now.) So you run lsqr with your predefined JOLI linear operator and the observation b. Notice that A is not even a full matrix; in a sense we fooled Julia by giving it a linear operator instead of a full matrix. Julia doesn't even notice, but it still runs the LSQR iterations: in each iteration it evaluates the forward of A, which we provided, and the adjoint of A to take the gradient, which we also provided. So in the end you can just use the native Julia lsqr method to invert this linear system and get a solution. We can plot the solution: although there are some spikes in it, there are also a lot of oscillations that look like white noise, because your ground-truth signal is sparse while LSQR looks for the minimum two-norm solution; this is what LSQR can get you.

Then we can try a different algorithm, SPGL1, from another package. You again provide the linear operator A, which could be a matrix or a linear operator as designed above, the observation vector b, and some options: how many iterations, whether to be verbose. You can run this SPGL1 algorithm very easily, because the operator A is equipped with forward and adjoint evaluations, so in each iteration Julia knows how to compute what it needs, even though A is not a full matrix. In the end the SPGL1 solution is sparse, which makes sense, because SPGL1 promotes sparsity in the solution instead of a minimum two-norm like LSQR, so you find a very sparse signal in your solution, which is great. So we've seen that after building a linear operator in JOLI, we can use iterative methods to solve the inverse problem in which the JOLI operator is the forward operator, the A.

Now we can play around a little with automatic differentiation and machine learning toolboxes, in this case with the package ChainRules. When we work with machine learning, we build a network, give it an input, and the input passes through all the layers of the network; we get an output, compute an objective function, and then frameworks like PyTorch or TensorFlow use backpropagation to propagate the objective's residual back through every layer. When it has been propagated all the way back to the
input, what comes out is the gradient with respect to the input, and you can then update your current estimate with that gradient using any gradient-based optimization algorithm. A similar thing can happen here, and we can make it as fancy or complicated as we wish. Let's minimize the two-norm of Ax minus b, which we've definitely seen before, plus a penalty term, lambda times the one-norm of x; this should also promote some sparsity in the solution, because we added a one-norm penalty. Now things get a little complicated: if you want to solve this with an iterative algorithm you have different choices, and you can't take a plain gradient with respect to x because the one-norm isn't differentiable everywhere; but you can use native Julia packages like Optim.jl, which provides a lot of gradient-based optimization algorithms, and as long as you provide a function value and tell it how to calculate the gradient, you can use its L-BFGS algorithm to solve this inverse problem.

So let's do a small example. We set lambda equal to one; let's run that cell first. Then we write the loss function, telling Julia the objective we're trying to minimize: one half of the squared two-norm of the residual, plus the penalization parameter lambda times the one-norm of x. Then you also tell Julia how to compute the gradient. This is an in-place gradient function, which the L-BFGS algorithm needs at every iteration: g is the gradient, and the exclamation-mark convention means it's filled in place. Given the current x, we take the gradient of the loss function defined above with respect to x (the index 1 picks out the gradient with respect to the first argument), assign it to g, and return the loss. So this function overwrites g with the gradient and outputs the objective value. After defining these two functions, one for the objective and one for the gradient, you call optimize from the native Optim package, providing the function, how to calculate the gradient, the initial guess (zeros), an algorithm (LBFGS), and some options, for example 500 iterations. This line works perfectly because you've provided the gradient and the function value, so L-BFGS knows how to optimize for x. You can plot the solution, which also looks okay.

But there is something magic happening here, and you might already have a question about this line: how does Flux know how to compute the gradient of this loss with respect to x? The loss is a complicated function. I understand automatic differentiation can differentiate through a one-norm or a two-norm, but when it sees A, which is not a matrix but a JOLI operator, how does it know how to compute the gradient? It's complicated; nobody taught Julia how to differentiate through that before. The answer is that we overload JOLI with ChainRules.
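To make the mechanism concrete, here is a simplified sketch of what such a ChainRules rule looks like. This is not JOLI's actual code, and MatrixFreeOp is a hypothetical toy operator type, but the structure (forward evaluation in the primal, adjoint applied to the incoming cotangent in the pullback) is the idea being described:

```julia
using ChainRulesCore

# Hypothetical matrix-free operator: wraps forward and adjoint closures, like a toy JOLI operator.
struct MatrixFreeOp
    fwd::Function
    adj::Function
end
Base.:*(A::MatrixFreeOp, x::AbstractVector) = A.fwd(x)
Base.adjoint(A::MatrixFreeOp) = MatrixFreeOp(A.adj, A.fwd)

# The rrule for y = A*x: the pullback just applies the adjoint of A to the cotangent ȳ.
function ChainRulesCore.rrule(::typeof(*), A::MatrixFreeOp, x::AbstractVector)
    y = A * x
    times_pullback(ȳ) = (NoTangent(), NoTangent(), adjoint(A) * ȳ)
    return y, times_pullback
end
```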
If you check the relevant script in JOLI, it's quite simple. The rrule is a piece of functionality in the ChainRules package, ChainRules.jl, and the idea is: when you compute A times x, the forward evaluation of your JOLI operator, you just have to tell it that when you backpropagate from the next layer, you apply the adjoint of the linear operator A to the incoming residual; that's it, because taking a gradient through a linear operator just means applying its adjoint, one of the simplest rules for computing a gradient. Given this rrule, we've taught Julia how to differentiate through a JOLI linear operator, and since Julia already knows how to differentiate through norms and subtraction, it now knows how to differentiate through everything in this loss function; that's why this works so smoothly and you can use L-BFGS to solve the inverse problem. I'm not claiming this is the best algorithm for this kind of problem; we just want to show that it works very smoothly.

The next thing to discuss is the AD capability, meaning we can combine these JOLI linear operators with other, complicated networks. If you read the machine learning literature, there's a method called deep image prior, proposed in 2018, and in our research group we've used this deep prior idea as a regularization for seismic imaging; you can check our publication for details. The basic idea is that instead of solving Ax = b, we solve A f_theta(z) = b with a fixed random input z, and we minimize the residual with respect to theta; after a lot of iterations we get a theta that fits the data, and f_theta(z) is our solution. We can implement this with Flux, the machine learning toolbox in Julia. We define a very simplistic network, a Dense layer that takes an input of size 2n and outputs size n, and we initialize z as a fixed input. Then we put the network f in train mode, and theta is Flux.params(f), the network parameters, i.e. the weights and bias of this Dense layer. It's a simplistic network; if you want the full deep-prior functionality you should make f a U-Net or something, but this is just to show how to run things. You choose an optimizer from Flux, namely ADAM with a learning rate, and then for 500 iterations you take the gradient: the loss is one half of the squared residual norm of A f_theta(z) minus b, plus a one-norm term (you can play with a lot of choices here, but this promotes sparsity in the result f_theta(z)), plus another penalty on the two-norm of theta to keep theta small, like white noise. You can choose different parameters here.
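A minimal sketch of that training loop, using Flux's implicit-parameter API; it assumes A, b, and n come from the earlier cells, and the learning rate, penalty weights, and the plain Dense layer are stand-ins for choices that would need tuning in practice:

```julia
using Flux, LinearAlgebra

f = Dense(2n, n)                        # deliberately simple "deep prior" network
z = randn(Float32, 2n)                  # fixed random input
θ = Flux.params(f)                      # network weights and bias
opt = ADAM(1f-3)                        # learning rate is an assumption

λ1, λ2 = 1f0, 1f-4                      # penalty weights (assumptions)
loss() = 0.5f0 * norm(A * f(z) - b)^2 +
         λ1 * sum(abs, f(z)) +          # one-norm term: promotes sparsity in f_θ(z)
         λ2 * sum(abs2, f.weight)       # two-norm penalty keeps the weights small

for it in 1:500
    gs = Flux.gradient(loss, θ)         # differentiates through the network and through A
    Flux.Optimise.update!(opt, θ, gs)
end

x_dip = f(z)                            # the deep-prior solution
# For layers with dropout or batchnorm you would toggle Flux.trainmode!/testmode! around this.
```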
The takeaway message is that Flux knows how to take the gradient of this complicated function value: we already taught Flux how to differentiate through A, Flux knows how to differentiate through f because it's a network, and Flux knows how to differentiate through these norms because they're standard functionality, just as they are in PyTorch or TensorFlow. So Flux can differentiate through everything here to get a gradient, and then you update the parameters theta, the network weights, with Flux's optimizer update. If you run the 500 iterations, you get a solution f(z): you set f to test mode and evaluate f(z), and that's your solution. You can plot it; it doesn't look that good, and I definitely believe there are a lot of parameters and network choices you'd have to pick carefully, but the point is that this also works: you can combine these JOLI operators with very complicated networks, at different layers, apply a network on the left-hand side of a JOLI operator, it doesn't matter; Flux will know how to differentiate through the complicated composition because we've taught it how to differentiate through the JOLI operators.

If you want to understand more about JOLI, look at the reference guide. There are also other applications of JOLI, for example loop-unrolled seismic imaging, where the seismic imaging operator is defined as a JOLI operator and intertwined with a fairly complicated network, and you can read the publication that describes the basic concepts and ideas behind these software abstractions in JOLI and some application scenarios. Okay, that's the tutorial for JOLI. We can take some questions if there are any, but I'm happy to proceed to the next one if there aren't. Okay, let's then proceed to JUDI; these names are easy to pronounce, right?

JUDI is a framework built for large-scale seismic modeling and inversion, and it can scale to very large 3D industrial-size problems; our research group has collaborated with a lot of people to make that happen for large-scale 3D problems, for example. I want to stress that JUDI is based on Devito: when you do seismic modeling and inversion you solve a lot of wave equations, and JUDI interfaces with Devito, meaning the wave equations are solved in Devito, which JUDI reaches through Python calls (in Julia you can call Python packages and functions very easily). On top of that, JUDI adds linear algebra abstractions, so the modeling and migration operators are defined as linear operators and you can use them in a very abstracted way. For example, even if I don't understand exactly how seismic migration or full-waveform inversion works internally, I can still work with these abstracted operators and try out different algorithms to solve the problem.

Let's go into some details. The JUDI software is published in the Geophysics journal, and you can also check the JUDI reference guide. In this tutorial I'll talk about how to set up the acquisition geometry for a small seismic experiment and how JUDI does its software abstractions so that you can write code in a very clean way, and also a little bit about
parallelization in Julia; we'll set up a mini cluster to run some of these experiments. Let's start by loading the packages and talking about the structure of JUDI. We first want to set up a physical grid for the simulations, because we're working with real physics; as the Tuesday tutorial showed, Devito uses finite-difference operators to solve the wave equations. So here we set up a grid, which entails a shape (you have this many grid points in your model), a spacing (in this case 10 meters by 10 meters), and an origin, which I typically set to (0, 0), because we can always assume the model starts at 0 meters lateral position and 0 meters depth. This is the metadata you want to set for your model: you have a velocity model and you want to propagate waves through it, so you have to know the spacing of your discretization and the shape of the model.

The next thing to set up is a physical object. JUDI defines a few types to handle the physical objects you see in seismic modeling and inversion; one of them is used to set up a model, and some others, including judiVector, are also quite interesting. The type PhysicalParameter is an abstract vector which behaves like a standard vector and is used to store velocity models, or squared-slowness models if you work with the acoustic wave equation. Being an abstract vector, it works the same way as a standard vector, but it carries some metadata. So we can first set up a velocity model like this: the values are in kilometers per second, with water velocity in the first layer; this is a 2D example, so the first dimension is x, the lateral position, and the second is depth z; the second layer is 2 km/s and the third layer is 2.5 km/s. Then we construct a PhysicalParameter from vp, the P-wave velocity, with the spacing we defined above (10 meters) and the origin.

Then we can use the plotting tools in Julia to plot the velocity. First we plot vp, which is a plain Julia matrix, and then we plot the capitalized Vp, the PhysicalParameter defined by JUDI, which carries metadata; you can use the same syntax for both and they give you the same picture, because the PhysicalParameter behaves exactly like a plain Julia array. What's nice about Vp is that, as a physical parameter, it doesn't just contain the data, the velocity; it also contains the size, origin, and spacing, that metadata, so it's a vector equipped with metadata. The lower-case vp, the plain Julia matrix, is just the matrix itself: it doesn't tell you the grid spacing, where the model starts, or what it's for; it's just a Julia matrix and you have no idea what it stores. But the capitalized Vp is a physical parameter: you know it comes with an origin and a spacing, and that it's a model of the Earth's subsurface, a water layer and two sedimentary layers. That's the benefit of these abstract vectors.
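A hedged sketch of that model setup, including the Model structure described in the next paragraph; the grid size and layer boundaries are illustrative stand-ins (the recording doesn't give the exact numbers), and the constructor arguments should be checked against the JUDI documentation:

```julia
using JUDI

n = (201, 101)                 # (x, z) grid points, illustrative
d = (10f0, 10f0)               # 10 m spacing in both directions
o = (0f0, 0f0)                 # origin at 0 m lateral position and 0 m depth

v = fill(1.5f0, n)             # water velocity (km/s) down to the ocean bottom
v[:, 67:84]  .= 2.0f0          # second layer (layer indices are assumptions)
v[:, 85:end] .= 2.5f0          # third layer

Vp = PhysicalParameter(v, d, o)    # velocity plus grid metadata (spacing, origin)

m = (1f0 ./ v).^2                  # squared slowness, JUDI's wave-equation parameterization
model = Model(n, d, o, m; nb=80)   # nb = absorbing-boundary points (value is an assumption)
```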
If you want to know more about abstract vectors, I also suggest looking at Julia's documentation. As we just discussed, you can do all sorts of things with these abstract vectors: VP is a PhysicalParameter, but it is still abstractly a vector, so you can compute norms, take the maximum and minimum, do element-wise multiplications and squares; you can do all of these linear-algebra operations on VP even though it contains metadata, and it knows how to handle them. You can then set up a Model structure that wraps all these details: the model contains a shape, a spacing, an origin, and here I pass squared slowness, because that is how JUDI parameterizes the wave equation; nb is the number of padding grid points added around the sides so you don't get boundary reflections. Now that we have a model, we can generate shot records, i.e. seismic data. In this script I'll generate the shot records on the fly by setting up the source and receiver positions; in the next one I'll talk about how to do this with SEG-Y I/O files. For a 2D simulation the wave propagates in the 2D plane, but we still set up a y coordinate, which is always zero because the model is just a slice. In this case we set up 11 sources, equally spaced on the surface at 12.5 meters depth. JUDI treats these sources as cell arrays, so you convert the source coordinates into cells with convertToCell, because the sources fire independently; if they fired simultaneously you wouldn't convert to cells. Let's run that cell and then set up the ocean bottom: we have 101 receivers in this case, which we evenly distribute along the ocean bottom. Again the y coordinate is zero because this is a 2D case; JUDI can handle 3D, but this tutorial doesn't leave enough time to run 3D simulations, let alone inversions. Then you set the receiver depth to the grid spacing times 66, which is the grid index of the ocean bottom, so the ocean-bottom nodes sit on the ocean bottom. The next thing is to set the recording time and the time sampling rate of your shot record, which here are two seconds and four milliseconds. After we've set up the source and receiver positions, we use the Geometry objects in JUDI to wrap them up: one call sets up the source geometry from xsrc, ysrc, zsrc, the sampling rate, and the recording time, and another sets up the receiver geometry. In this case we assume all the ocean-bottom nodes are seen by all the sources, so we pass nsrc equal to the number of sources, 11, meaning every source sees the same receivers. After we run this (ah, I think I forgot to run a cell; that happens sometimes when you work with Jupyter notebooks) there shouldn't be any error. Now we can visualize the geometry on top of the model: this is the model we just discussed, with the receivers and the sources plotted on it.
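To make the acquisition setup concrete, here is a minimal sketch in the spirit of the notebook. The counts (11 sources, 101 receivers), the 12.5 m source depth, the ocean-bottom index 66, and the 2 s / 4 ms recording follow the description above; the nb value and the exact keyword names are as I recall them and should be checked against the JUDI reference guide.

```julia
# Sketch of the Model and the source/receiver geometry described above.
m = (1f0 ./ v).^2                      # squared slowness from the velocity array
model = Model(n, d, o, m; nb=80)       # nb: absorbing boundary points (value assumed)

nsrc = 11                              # 11 sources firing independently
xsrc = convertToCell(range(0f0, stop=(n[1]-1)*d[1], length=nsrc))
ysrc = convertToCell(zeros(Float32, nsrc))            # 2D slice, so y is always 0
zsrc = convertToCell(12.5f0 .* ones(Float32, nsrc))   # sources at 12.5 m depth

nrec = 101                             # 101 ocean-bottom receivers
xrec = range(0f0, stop=(n[1]-1)*d[1], length=nrec)
yrec = 0f0
zrec = (66*d[2]) .* ones(Float32, nrec)    # ocean bottom at grid index 66

timeR = 2000f0    # recording time in ms (2 s)
dtR   = 4f0       # sampling interval in ms

src_geometry = Geometry(xsrc, ysrc, zsrc; dt=dtR, t=timeR)
rec_geometry = Geometry(xrec, yrec, zrec; dt=dtR, t=timeR, nsrc=nsrc)
```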
Great, now we can set up the source wavelet at the source locations. We use a Ricker wavelet, which is also defined in JUDI (you can use different ones), set its central frequency, and plot the wavelet; it's just a bump. Now we can set up a judiVector: judiVector is another structure that stores data, in this case the wavelet, together with metadata. If you construct a judiVector from the source geometry and the wavelet, it will tell you it is a judiVector with 11 sources; you can access the wavelets through judiVector.data, one wavelet per source, and it also carries the geometry, so you can access the source locations through its geometry field. Again, it is not a pure Julia array: it stores its data as Julia arrays, but it comes with the geometry attached. Now we can write the wavefield simulation as linear operations. If you're familiar with seismic, data is generated like this: you have a source, you inject it into the whole domain with Ps', the adjoint of the source projection, which is injection; you solve a wave equation, A(m)^-1, where A is the wave equation parameterized by squared slowness; and after applying this inverse of the wave equation you get the wavefield everywhere, which you restrict with the projection operator Pr onto the recording times and the receiver locations. These are all linear operators; A(m) is not linear with respect to m, but it is linear with respect to the source. We can build this as we speak: Pr is a judiProjection onto the receiver geometry; Ps is another projection, onto the source geometry, whose adjoint injects the source q into the subsurface; and A_inv is judiModeling, which takes the model, the squared slowness plus its metadata, and represents A^-1. Once these are defined we can check the math: Pr * A_inv * Ps' * q generates the data. The whole idea here is that Pr, A_inv, and Ps are abstract linear operators in the JOLI sense: they have metadata, adjoints, and forward evaluations, and Julia doesn't even need to know they are fancy operators; it just sees the expression and knows how to run it. After you run this you get the data, d_obs, which is also a judiVector holding the shot records from the different sources along with their metadata. You can also combine the three operators into a single operator F, a forward modeling operator that maps the input source directly to the recorded data. Running all sources takes a while, so instead you can run only one of the source experiments by indexing: index 6 means the sixth of the 11 sources, the middle one, so F[6] and q[6] run the forward modeling for that source only, and the result is the same as taking the sixth entry of d_obs. Then we can plot the data, and at the different receiver locations you can see two reflections, a first one and a second one.
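Pulling these steps together, here is a minimal sketch of the wavelet, the judiVector, and the abstract modeling operators, continuing from the geometry sketch above. The Ricker central frequency is an assumed value; the operator names follow the JUDI API as described in the notebook.

```julia
# Source wavelet, judiVector, and the abstract modeling operators.
f0 = 0.01f0                                   # assumed 10 Hz central frequency (time is in ms)
wavelet = ricker_wavelet(timeR, dtR, f0)
q = judiVector(src_geometry, wavelet)         # source term with geometry metadata

Pr   = judiProjection(rec_geometry)           # restrict wavefield to the receivers
Ps   = judiProjection(src_geometry)           # adjoint Ps' injects the source
Ainv = judiModeling(model)                    # action of A(m)^-1, the wave-equation solve

dobs = Pr * Ainv * Ps' * q                    # d = Pr A(m)^-1 Ps' q, all 11 shots
F    = Pr * Ainv * Ps'                        # combined forward modeling operator
d6   = F[6] * q[6]                            # model only the sixth (middle) source
```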
Okay, the next thing I want to talk about is parallelization. You may have noticed that since the sources fire independently, you can also fire them in parallel on the computer, because they don't interact with each other. For this we use the built-in package Distributed, and you can set a number of workers in Julia: here we first read the number of threads, decide on two workers, display the relevant environment variables, and bind the threads to the workers. I have 12 threads on this laptop, so the first six bind to the first worker and the rest to the second. addprocs adds two worker processes to your session, and you can see the workers are numbers 2 and 3, because the main process is number 1. Then you tell Julia how to bind your threads to the workers: threads 0 through 5 go to the first worker (not core, worker), and threads 6 through 11 go to the second. Now that you have workers 2 and 3, you use @everywhere so that they all load the Distributed and JUDI packages. We then expect the modeling to run in parallel: with 11 sources, maybe the first worker runs five of them and the second runs six, since they are independent. Once the two workers are set up, JUDI's routines distribute the sources across them so things run in parallel. You do have to have enough memory, though, especially for migration: if there isn't enough memory to run two jobs at once, the two workers compete for it and in the end you don't get any parallelism, but if you have enough memory, migration runs in parallel. The next thing we can show is reverse time migration, RTM, which is a seismic migration method that maps the recorded events in the data, as in the figures above, back to the reflectors in the earth's subsurface. The way we set this up is with a judiJacobian built from the forward modeling operator (remember F is Pr * A_inv * Ps', the combination of those three operators) and the source judiVector q. Its forward evaluation is the linearized (Born) modeling, and its adjoint is reverse time migration. So again you can write this as a very clean algebraic abstraction: this is your data, you want a migration, so you just apply the adjoint of the Jacobian to d_obs; that's the migration, written so abstractly that if you understand linear algebra you don't have to understand seismic migration to run an RTM. This might take a while because there are 11 sources. It produces a PhysicalParameter, again with metadata, and you can plot the reverse time migration image: these are the source imprints, this is the first reflector in your image, and there's also a second one, which is very dim but is right there.
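For reference, a minimal sketch of the parallel setup and the RTM step just described, continuing from the operators above; thread-to-worker binding is machine specific, so only the basic worker setup is shown.

```julia
# Two Julia workers plus the main process; each worker loads JUDI.
using Distributed
addprocs(2)                       # workers get ids 2 and 3 (the main process is 1)
@everywhere using Distributed, JUDI

# Reverse time migration as the adjoint of the linearized modeling operator.
J   = judiJacobian(F, q)          # forward: Born modeling; adjoint: RTM
rtm = J' * dobs                   # migration image, returned as a PhysicalParameter
```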
Okay, so that's the tutorial for JUDI. The takeaway message is that you can set up the source and receiver geometries, the wavelet, and judiVectors in JUDI by following this notebook, and that the seismic data generation can be written very cleanly with linear operators parameterized by the discretized wave equation. Our philosophy is that since you can write this in linear algebra, you should write the same thing in your code, and that's our ultimate goal: code that is as clean and as informative as possible, so that when you read it you know what's going on, because these are linear algebra abstractions. That was the introduction to JUDI; next I'll talk about SEG-Y I/O. In the previous notebook we discussed how to generate seismic data from a source and receiver setup; another way to get data is to load it from a SEG-Y file through the SegyIO package. This is a package maintained under the SLIM group, but you can find different implementations of this kind of SEG-Y I/O, for example there is also a segyio package in Python. It is a Julia package for reading and writing SEG-Y files, which are commonly used by geophysicists and which have fairly complicated file headers and so on. We're going to show how to interface these SEG-Y files with JUDI. Let's begin by loading the packages JUDI, SegyIO, and PyPlot, and then run this script; this is again running shell commands from Julia, to download the data and unzip it onto your system. I've already downloaded it, so I don't have to do it again. Then there is a function called segy_read that reads the SEG-Y file into a variable, here called shot. The result is a SeisBlock, a structure that contains the data plus metadata, namely the file headers: you have, for example, the sampling rate and the number of time samples, a lot of this metadata, as well as the individual trace headers. You can access the file header by running shot.fileheader, and shot.traceheaders[10] gives you the tenth trace header in your SEG-Y file, which makes sense: it holds the source position, the receiver position, the sampling rate, and so on. We can also extract headers from the SEG-Y container directly: get_header with the SourceX key pulls the source x positions out of all these complicated headers, and similarly you can get the receiver positions and the time sampling interval; the number of time samples times the sampling interval gives you the recording time. So you can extract these headers easily with the get_header function. Finally, you can plot the seismic data stored in the SEG-Y file by calling .data on the SeisBlock that SegyIO loaded: shot.data holds the seismic traces, and Float32 converts them into plain Julia arrays. You can see the source is at a certain lateral position in meters, time increases downward, and the source fires somewhere around here. You can also see the reflection structures; this is BP data, so it's very large: the recording is eight seconds long, and I've personally never worked with recordings that long.
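A minimal sketch of the SegyIO calls described above; the file name is a placeholder for whichever SEG-Y file the notebook downloads.

```julia
# Read one SEG-Y file into a SeisBlock and inspect its headers and data.
using SegyIO

block = segy_read("shots.segy")          # placeholder file name
block.fileheader                         # binary file header (dt, ns, ...)
block.traceheaders[10]                   # header of the tenth trace

sx  = get_header(block, "SourceX")       # source x position for every trace
dt  = get_header(block, "dt")[1]         # sampling interval in microseconds
ns  = get_header(block, "ns")[1]         # number of time samples per trace
t_rec = (ns - 1) * dt * 1f-3             # recording time in ms

data = Float32.(block.data)              # traces as a plain Julia matrix
```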
We also have segy_scan. The scanner is designed to be scalable, meaning your workflow stays the same regardless of the size of your data. You do a directory scan: this is a directory containing a lot of seismic data, and you give a file filter, because there are many SEG-Y files named bptti-something, plus the header keys you want access to after the scan. By running this line you scan all the files in the folder, bptti_1200, bptti_1201, and so on, for those keys and with that filter, and you can see it found all the files in the directory; you can also check how many shot records were scanned in that time, here 111 shot records. Then you can extract global information, such as the source positions, from all of these SEG-Y files: this is the lateral x position and the depth, and since we always fire the source at the top of the model, the depth is zero. Next we can wrap the SEG-Y container into a judiVector and use it for inversion. This is quite simple: we just take the container produced by the scan, and notice that we haven't read the data into the Julia session yet. We don't want to, because seismic data is very large and with this many sources it would blow up your memory really fast; judiVector can take the container, which represents the SEG-Y files whose traces have not been loaded into memory yet. Now d_obs is a judiVector: it holds the number of sources, the geometry, and references to the data, so it is a judiVector with metadata whose data is not loaded yet; I'll explain that in a little bit. You can set up the source geometry by applying JUDI's Geometry to the container with the source key, and set up a Ricker wavelet based on that, so q will be a judiVector built from the source geometry of your SEG-Y files. The interesting part is that we still haven't loaded all the data into the Julia session, because these SEG-Y files are pretty large, but the way you work with it is to load one shot at a time, for example shot number 47 out of the 101, and call get_data so that the judiVector loads that shot record into your Julia session. If you load them one by one it doesn't use much memory, but if you loaded all the sources at once your Julia session would have a very hard time fitting everything into memory. After you get the data, you can read off the source location, because this shot is again a judiVector equipped with its geometry metadata: you can get the source locations and receiver locations quite easily. Then you can plot it: since the data is now loaded into memory, you can plot it with the correct labels and so on.
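Here is a minimal sketch of this scan-then-load-lazily workflow; the directory, the filter string, and the wavelet frequency are placeholders, and the scan keys are typical trace-header fields rather than the notebook's exact list.

```julia
# Scan a directory of SEG-Y files, wrap the container in a judiVector,
# and load one shot at a time.
using SegyIO, JUDI

keys = ["GroupX", "GroupY", "dt", "SourceSurfaceElevation", "RecGroupElevation"]
container = segy_scan("data/", "bptti", keys)    # scan files matching "bptti"

d_obs = judiVector(container)                    # lazy: geometry only, traces not loaded
src_geometry = Geometry(container; key = "source")
q = judiVector(src_geometry,
               ricker_wavelet(src_geometry.t[1], src_geometry.dt[1], 0.005f0))

shot47 = get_data(d_obs[47])                     # load a single shot into memory
shot47.geometry                                  # source/receiver positions of that shot
```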
So again we plot the same piece of data; the first time we loaded the data directly into memory, and the second time we scanned a lot of SEG-Y files and loaded only one of them, but we still get the same shot record because it's at the same location. That's it for SEG-Y I/O; if there are no questions, I'll go into my last tutorial, about normalizing flows. The idea of normalizing flows came up quite recently, within the last decade, and our research group noticed that normalizing flows are very interesting because they are powerful generative networks that can be used for geophysical applications: for example, when you solve an inverse problem, a normalizing flow, as a deep learning tool, can provide you with a learned prior based on your assumptions about the model or on a training dataset. Our group has a couple of publications on using normalizing flows for seismic imaging, ultrasound imaging, and photoacoustic imaging, many of the imaging and inversion problems that come up in the inverse-problems community. In this script I'll show how these normalizing flows are abstracted in the package called InvertibleNetworks.jl. I think it is very well documented, and it lists a lot of publications that use this invertible-network idea, such as seismic imaging and uncertainty quantification, so feel free to browse it and check out the latest work. Let's start by loading the packages, as we should always do; there's also a JuliaCon 2021 presentation on YouTube if you're interested in what's going on here, and this isn't limited to geoscience, it can also be applied to other communities like medical imaging. So first, don't worry too much; let's discuss what a normalizing flow actually is. A normalizing flow is a type of invertible network, built from a series of invertible layers. By invertible network I mean: suppose you have an x, you apply the network g_theta, and you get a z; since the network is invertible, if you apply g_theta inverse to your output z, you get the same x back, up to numerical precision. That's why we call it an invertible network, and neither the forward nor the inverse evaluation of g_theta should be expensive; both should be cheap. Why are normalizing flows important? Because you can use them to learn the distribution of something. For example, if you have a lot of cat images, different cats captured in different poses, and you want to learn the distribution of all cat images, so that given an image you can say how likely it is that there's a cat in it, then you want to learn the distribution of cat images. The idea of the normalizing flow is that you train it to map a cat image x, through the flow g_theta, to a latent variable z that follows a normal distribution; so conceptually you map the cat distribution to a Gaussian distribution. Then, after you've trained this network well, you can sample new cat images from the Gaussian distribution,
because you can easily sample an instance from the Gaussian distribution, namely white noise, apply the inverse of your normalizing flow, and it gives you something in the cat distribution: a cat you've never seen before. That's the idea of the normalizing flow. Our group has also extended this idea to seismic imaging: you have different earth subsurface structures, and you might want to learn a prior distribution over them, so you provide a lot of these reflectors and layers, train the normalizing flow, and afterwards you can draw samples of subsurface structures by passing white noise through the inverse of g_theta, your normalizing flow. So that's why we want to train a normalizing flow, and in this notebook we're going to do a very fast training run on a small-scale example: we work with the 2D Rosenbrock distribution, because training on all those cat images or seismic images would easily cost you thousands of dollars. We set the number of training points to 60,000 and call this the banana distribution; formally it is the 2D Rosenbrock distribution, but we call it the banana distribution because it looks like a banana: most of the points are clustered over here and they follow this shape. It's just a probability distribution in 2D, and we can plot it. Now, the normalizing flow is trained to map the points of this distribution to white noise in the latent space; that's the whole concept, it tries to normalize its input, which is why it's called a normalizing flow. The training objective comes from the change-of-variables formula and an expectation, which you approximate with Monte Carlo samples; there's a fair amount of math behind it, but I know this isn't a math conference, so the short answer to what we're training for is this: you take random samples from your target distribution, the banana distribution, pass them through the normalizing flow, and you want the output to look like white noise, i.e. like samples from a standard Gaussian. That gives you the objective function: a squared two-norm on the output of the normalizing flow, which corresponds to the Gaussian assumption, plus a log-determinant term that comes from the change of variables; there is more math behind it, but I won't go into too much detail. This is how you write the loss function in Julia: you call forward on the invertible network G with the batch X, which gives you the output Z in latent space together with the log-determinant term; your l2 loss, which assumes the output is Gaussian, is the squared two-norm of Z; and then you backpropagate the loss through dZ and Z, which sets up the gradients of G with respect to its network parameters. So this loss function returns the l2 loss under the Gaussian assumption on Z plus the log-determinant term.
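In code, the loss just described looks roughly like the following sketch, using the forward/backward convention of InvertibleNetworks.jl; the batch is assumed to be stored as a 4D array with the two coordinates in the channel dimension, and the normalization by batch size is my choice rather than a quote of the notebook.

```julia
# Maximum-likelihood loss for a normalizing flow G on a batch X:
# l2 term from the Gaussian assumption on Z, minus the log-determinant.
using LinearAlgebra

function loss(G, X)
    batchsize = size(X, 4)
    Z, logdet = G.forward(X)                  # latent output and log-det term
    l2 = 0.5f0 * norm(Z)^2 / batchsize        # negative Gaussian log-likelihood (up to constants)
    dZ = Z ./ batchsize                       # gradient of the l2 term with respect to Z
    G.backward(dZ, Z)                         # fills the parameter gradients inside G
    return l2 - logdet
end
```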
Next we can set up the network structure (oh, did I run this cell? yes, okay). We have two input channels, because this is a 2D example, a number of hidden units, a number of flow steps, some other architectural details, and we pass all of these to the Glow constructor. So this G will be the normalizing flow that we want to train, and its parameters theta are hidden inside it. Then you set up some training parameters: the maximum number of iterations, the learning rate, and the optimizer, in this case Adam. And then you can train: in each iteration you pick a random batch from your training set, you compute the loss function, you append it to the training log and print it, and then, since the loss call has already set up the gradients of the network parameters, you simply loop over every parameter p in G and update it. Let's run this now; it might take a little while. It prints every 50 iterations during training, and you can check the training log to see whether you're converging. There are various ways to monitor the training: the convergence of the objective, and also the normality of your output, because the normalizing flow aims to output samples from a normal distribution, so you can check whether passing a training sample through the flow produces something that looks normally distributed. This training is quite fast because we only need 400 iterations; I think it's already done, since a 2D example should train quickly. We can plot these metrics to see whether we converged; again these plots come from PyPlot, which uses the matplotlib library in Python. You can check the l2 norm of the output of your normalizing flow and see that it is roughly one, which is what you want, because you want the flow to output Gaussian noise, and for standard Gaussian noise this normalized norm is about one. You also have the log-determinant term and the full objective; the full objective goes down, which is good. Then you can test the normalizing flow: you draw 500 test samples generated with randn, which samples from a Gaussian distribution, and you can plot them first to check that they really are Gaussian samples: this is z1, the first dimension, and z2, the second dimension, and the bump looks like a Gaussian bump. At test time we pass these Gaussian samples through the inverse of the normalizing flow; remember we have an invertible network, so by passing samples from the Gaussian distribution through the inverse of the network we should get samples from the target distribution. Let's try that: G.inverse on the test samples we plotted above generates samples, x_test, which should follow the banana distribution. We also have a nice figure that shows how the samples are matched to each other: this is white noise from the Gaussian distribution, you pass it through the inverse of the network, and it generates something in the domain where you should see the target distribution, so the figure shows how the two sets of points map to each other. And then you can check whether the generated samples of x actually work: the blue dots are the true samples from the distribution, and the yellow dots are generated by sampling white noise from the latent distribution and passing it through our trained normalizing flow.
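Putting it together, here is a minimal sketch of the Glow setup, the training loop (reusing the loss function sketched above), and the sampling step. The hyperparameters (hidden width, number of scales and flow steps, learning rate, batch size) are illustrative, the training data is a random stand-in for the banana samples, and the 2D points are assumed to be stored as a 1x1x2xN array; check the notebook for the exact constructor arguments.

```julia
# Glow network, Adam training loop, and sampling by inverting the trained flow.
using InvertibleNetworks, Flux

n_in, n_hidden, nscales, nflow = 2, 32, 1, 4     # assumed architecture choices
G = NetworkGlow(n_in, n_hidden, nscales, nflow)

opt = ADAM(1f-3)
batchsize = 100
X_train = randn(Float32, 1, 1, 2, 60_000)        # stand-in for the banana samples

for it in 1:400
    idx = rand(1:size(X_train, 4), batchsize)
    f = loss(G, X_train[:, :, :, idx])           # forward + backward, fills p.grad
    for p in get_params(G)
        Flux.update!(opt, p.data, p.grad)        # gradient step on every parameter
    end
end

# Sampling: white noise through the inverse of the trained flow.
Z_test = randn(Float32, 1, 1, 2, 500)
X_test = G.inverse(Z_test)
```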
You can see that these yellow dots, the generated samples, fit the distribution very well, so in this case we trained a very good normalizing flow. You can apply the same idea to image-to-image mappings: you map a cat image to an image of white noise, and if you train that well, passing white noise from the latent space through the inverse of the network gives you cat images, and those generated samples of cat images also look very good. These ideas have been applied to photoacoustic imaging, seismic imaging, and wave-equation-based inversion, things like that. One of the nice things about invertible networks is what happens during backpropagation. Normally, with a very large network, the forward evaluation passes the input through every layer and you have to store the intermediate state of each layer, because during backpropagation you compute the gradient from the stored value at that layer before propagating it further; the memory can blow up if the network is very large and has a lot of layers. For an invertible network, since every layer is invertible, you don't have to store any of the intermediate states: during backpropagation you can recompute the state at each step, because everything is invertible, and still propagate the gradient back to the input. So while training the network you only need the input and the output; you don't have to store anything in the intermediate layers. That's one of the benefits of training a normalizing flow, and if you use the InvertibleNetworks.jl package you get that benefit in your results automatically. Okay, that concludes all of my tutorials; I think I pretty much landed on time. I'm happy to take any questions on Slack, or you can reach out to us through our GitHub repositories: if you find anything interesting in our software stack, check the repositories, and if you find any issue you're welcome to open issues or pull requests and talk to us. Thanks again to Software Underground and Agile for giving me this opportunity to share a lot of the research work in my group; I hope I still have everyone with me. All right, thank you very much; if you can stop your share. Cool, thank you very much, that was definitely a bit of a whirlwind tour with lots of interesting, cool stuff in there, so if you haven't seen Julia before, hopefully this has whetted your appetite. Do reach out to the SLIM group, and you can also have a look on the Software Underground Slack; the links are down below. If you found this useful, please do feel free to share it; if you're watching this live, these videos will stay up afterwards, so they're not going to go away, and feel free to share them with people who might be interested. We have one more talk later today, and there are still some other things related to Transform 2022 happening, but this was the second-to-last of our tutorials, so do go and have a look at the ones that
have happened throughout the week; there's been some really good stuff. Otherwise, thank you again to Francis for a very interesting talk, and we will see you around for something else. Have a good day everyone, cheers.
Info
Channel: Software Underground
Views: 1,678
Keywords: conference, tutorial, julia, geoscience, programming
Id: HyWfp3NzIbg
Length: 120min 55sec (7255 seconds)
Published: Wed Apr 27 2022