Numba - Tell Those C++ Bullies to Get Lost | SciPy 2017 Tutorial | Gil Forsyth & Lorena Barba

Captions
So, I'm Gil, and that's Prof. Barba. I'm a graduate student at George Washington University, sorry, THE George Washington University, in Washington, DC; they get angry at us if we don't use the "the". Welcome, we're here today to learn about Numba. Has anybody here ever used Numba before? Show of hands... okay, a few people, awesome, but mostly new to it. That's good. If you have used it, feel free to help out; that's never a problem. And if you haven't done the setup already, start right now; it takes a little while to download all this stuff, and otherwise you may only catch the second half of the tutorial.

When I write teaching notebooks, I am very explicit in my imports. For anybody with as much muscle memory as we usually have, this can be confusing. I'm kind of sorry, but not really; I think it's better for teaching. Just be aware that if you type "np" you're going to have to rewrite the imports at the top of the notebooks, which is your prerogative, but consider yourself warned.

So what is Numba? Numba is a JIT compiler for Python (like, I just read the bullet points). It uses LLVM to compile down to nicely optimized machine code, it's well integrated with the rest of the scientific Python stack, and it's really fast most of the time. It's an incredibly useful tool. It's important to know that Numba is not, like PyPy, a full JIT replacement for everything: it doesn't JIT your entire code unless you tell it to. It works more on a function-by-function basis, and it's really targeted at math. It's not going to make your dictionary lookups fast or anything; it makes your math code faster.

On top of this, because of the way LLVM works, it will do better or worse depending on your computer. The number of cores you have will come into play a little later, and timings will vary depending on what instruction sets your CPU has available: SSE3, SSE4, all that stuff. LLVM will take advantage of as many of those as it can. If you didn't understand a word I just said, don't worry, that's fine. Just know that you'll see slight differences between you and your neighbor, even on the same machine, depending on whether the Slack client is eating three gigs of your RAM right now, and that's normal. What you want to see is an improvement over your vanilla Python performance; if you don't see that, raise your hand and we'll take a look at it.

Do as I say, not as I do: you really want to write tests, especially before you start using Numba or really any kind of high-performance tweaking or code generation. With Numba, as we'll see, when you have errors in a function you've written, the traceback is going to be inside of Numba, which is not going to help you at all. They've gotten much better at tracking down exactly which line of your code is causing the problem, but it's not perfect. So, generally, always write tests; it's a really good thing to do.

Also, we're going to do some very unpythonic things here. Especially after you've spent all these years learning how to write nice array operations in NumPy, we're going to undo a lot of that, and it might feel really strange. Okay, having said that, I will attempt to exit this. Cool, all right.
If you read the title of this tutorial: if you like C++, that's great, you're more than welcome here, there's no problem with that. C++ is incredibly powerful. I suck at C++, just to put that out there, and I think it's devilishly difficult and really easy to get things wrong. What I specifically mean is that a lot of us here write code in Python: we write something, we prototype something, it's awesome, but now that it's time to actually make it run for real, well, now you have to go rewrite the expensive parts in C++ or Fortran or whatever. What I'm hoping to show you is that that is no longer the case. My prototypical C++ bully is the person who turns their nose up at you and says, "well, you're never going to get much performance out of that." So forget that. Those are the bullies; everyone else, welcome, we're happy to have you.

Okay, so if everyone would like to go to the notebooks folder, we're going to open this first notebook here. People probably can't read that very well; can I make it bigger? Okay. So to begin with Numba, we're not going to look at Numba at all: we're going to start with profiling. Who here has ever used a profiler in Python? Okay, cool. The idea is that you need to actually know what's slow in your code before you beat it up. You don't want to waste your time squeezing performance out of something that was only taking a microsecond anyway; you'd just waste a bunch of your time. You need to know what is running slow before you start trying to optimize your code, you need to know what to optimize, and that's where profiling comes in.

We're going to look at three tools. There are many others; this is just a bare-bones introduction. They are cProfile, line_profiler, and timeit. timeit is built in to Python and also to the notebooks, and cProfile and line_profiler you have hopefully installed; if not, you can do one of these two things. Can everybody read this okay? Can anybody not read this? I guess you have it in front of you anyway. All right, I'm going to hide this thing here. Please raise your hands if you have questions, and feel free to interrupt me as well.

Okay, so here is some really horrible code that I wrote. It's not the worst code ever written, but it's close. I have three functions. The first function just sleeps for half a second. The second function sleeps for a full second, which is not very performant. The third function is called sumulate, which is not a word, and it performs a 1000-by-1000 matrix multiply for no reason and just discards the return value, then iterates over a number for no reason and sums all the integers, and then it calls the two sleep functions and returns. You can probably look at this and gather that there are a couple of bottlenecks; this is a situation where you might not even need profiling because you can spot them, but it's in the interest of demonstration. So if we run this and sum the first 150 integers, it's slower than it should be. I think we know why, but let's dig a little deeper.

If we're going to use cProfile, the built-in Python profiler, it gives us a function-by-function breakdown of how many times each function was called, the amount of time it took each time it was called, and the cumulative contribution it made to the total runtime of your program.
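A sketch of that deliberately awful example and the cProfile call (function names and bodies are my reconstruction from the description, not necessarily the notebook's code):

import time
import cProfile
import numpy as np

def bad_call():
    time.sleep(0.5)

def worse_call():
    time.sleep(1)

def sumulate(foo):
    # a pointless 1000x1000 matrix multiply whose result is discarded
    a = np.random.random((1000, 1000))
    a @ a.T

    # a pointless loop that sums the first `foo` integers
    ans = 0
    for i in range(foo):
        ans += i

    bad_call()
    worse_call()
    return ans

# function-by-function breakdown: ncalls, tottime, percall, cumtime
cProfile.run('sumulate(150)')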
You just import cProfile and pass it, effectively, an eval string of what you want to run, and then, thanks to the Jupyter folks, you get a nice inline report. There's a bunch of stuff happening in there, and it's made slightly more confusing in the notebook because some of the code it refers to is IPython cells, which makes it a little clunkier to look at. The thing you really want to notice is the ncalls column, the number of times each function or method was called, and the total time taken by those calls on that function or method; then it gives you a per-call breakdown and how each contributes to the cumulative time. Looking at this we can actually see the culprit right here: this is what's taking the vast majority of the time, which is to be expected.

I wanted to mention: if you have a bigger program, cProfile's output can be hard to read through. SnakeViz, which I'm not going to show you right now, is a really cool tool for visualizing that in the browser, so I highly recommend it if you want to profile larger things.

cProfile gives you this function-by-function breakdown, but sometimes you have a function that is slow and you don't know which parts of it are slow; most of our functions aren't one line long like they are in this case. In that case we want to look at line_profiler. To start, in the notebook, you need to load the extension to make it available, so you just run this line. (Give me one second here. Anybody have any questions while we wait? None at all? Okay, we're back.)

So once you've actually installed line_profiler and followed your own instructions to the letter, you can profile a function this way. The syntax is: you have the %lprun magic, which becomes available when you load the extension; each -f flag names a function you want to profile, so here I want to profile bad_call and worse_call; and then you give it something that will trigger those functions, something that causes them to run, because you have to run them to actually profile them. So I'm not actually profiling sumulate here; I'm only profiling these two functions that are called by sumulate. You run that and you get this output. It's hard to read on this projector, so I'll open it in a separate window.

This is the sort of thing you get. It's a silly example because there's only one line in each of these functions, but the idea is that for each function, for bad_call here and for worse_call here, it will go line by line through the function and show you how many times each line was hit, how much time that took, and the percentage of time it took. In this case we're pretty sure we've narrowed down which lines in each of these functions are causing us trouble.

Yes, great question. Some things you can do: you can also write the report out to a text file. If you give it a -T flag and run it, it will still show the report, but it will also write it down to a text file if you want to save it. There's also a -q flag, which I think I have a little further down, which will quiet the screen output.
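In notebook form, the line_profiler incantations being described look roughly like this (the -f and -T flags are the standard %lprun options; the trigger value is arbitrary):

# make the %lprun magic available
%load_ext line_profiler

# profile bad_call and worse_call line by line; sumulate(13) is only the
# statement that triggers them, it is not itself being profiled
%lprun -f bad_call -f worse_call sumulate(13)

# -T additionally dumps the same report to a text file
%lprun -T timings.txt -f bad_call -f worse_call sumulate(13)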
Right, okay. The other thing you can do (this is in the notebook, but most of the time when we're profiling we're probably not in the notebook, or maybe we are; you work however you want to work) is to run it from the command line. In that instance... hmm, this is not the right way to do this. All right, I'm going to come back to this at one of the breaks, because I'm missing a file for some reason. Not a huge deal.

And then finally we have timeit. timeit is not perfect at all, but it is helpful; it's a good way to get a rough feel for how your code is performing, especially when you're making big changes like you might be with Numba, to see how things change as you go. I think this is actually different now, but generally it only runs the benchmark a few times, and it disables garbage collection, so there are things it might not accurately represent about how your code really runs. I don't have a huge problem with that, but Victor Stinner has written a module, perf, which you can use instead, and which allays many of those concerns, so look at that as you see fit.

You can use timeit as a line magic, where everything has to be on the same line, and it will run the statement many, many times and then give you the best results, the average with a standard deviation. You can also run it as a cell magic, which means everything in that cell will be timed, and again you get the mean run and the standard deviation. There are also a number of flags you can pass to timeit to quiet the output and to shunt the output into a variable, which is handy if you want to compare performances without copying and pasting or typing in numbers by hand. This used to disagree with OS X, so if it yelled at you, just remove the -q flag. Once that works, you have this variable, you can access all the runs that were done, and you can look at the best run and the worst run; that's how you can compare results. Anybody have any questions about that right now? Okay, cool.

There was a question about trying to optimize things: actually understanding your code is the most important step. So many times you see people go off to optimize something without really understanding what needs to be optimized. First understand the code before trying to do anything, know where the hot spots are, and then make a judgment about whether it's worth your time to transform your code in a way that gets you a speed-up. This is really the most important step. It's not the fun step, I'll point that out, but it's really important.

And now we've done it, so we can go on to the fun stuff. If there are no questions, we'll open this intro-to-JIT notebook. Now that we understand profiling a little bit, we know how to find hot spots, so we need to figure out how to improve their performance. Also, sorry, I will continue to talk faster and faster unless someone stops me, so if it becomes completely unintelligible, just tell me.

Okay, so this first function is a naive function to sum all of the elements of an array. You would not do this if you know that NumPy exists, it's not the right way to do it, but we're doing it anyway: we get the dimensions of the input array, we initialize a variable to hold the cumulative sum, we iterate over every element adding it on, and then we return the total.
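A sketch of the naive summation function being described, plus the %timeit pattern for keeping the result around (the variable name plain follows the narration; the exact body is my reconstruction):

import numpy as np

def sum_array(inp):
    I, J = inp.shape
    mysum = 0
    for i in range(I):
        for j in range(J):
            mysum += inp[i, j]
    return mysum

arr = np.random.random((300, 300))
sum_array(arr)

# -o stores the TimeitResult in a variable for later comparison;
# adding -q would silence the printed output as well
plain = %timeit -o sum_array(arr)
plain.best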
Then we'll import NumPy, start with a 300-by-300 array, and try it out. If you got the same sum as I did, I think that means the world just ended. Then we can use timeit to see about how long that took on a 300-by-300 array. Okay, we got 13.7 milliseconds, so not bad; it's faster than I can do it. Notice I'm also saving the result in this variable, plain, but I don't use a -q flag, so it still gets printed to the screen. This is just so I can compare later.

All right, so now that we've got this code, we want to see if we can speed it up. We're going to import jit from Numba (from numba import jit), and there are a few ways to use jit, in terms of how you apply it to functions. The less common way, but good to know about, or the less natural way, is to use it as a function call. Here I'm calling it on sum_array; there's this weird double-parentheses thing, which I'll get to in a second. What I'm doing is calling jit on sum_array, and I want the result of the jitting to be returned into a function called sum_array_numba. The reason you generally want to do this when you're experimenting is... I'll come back to that. Let's just see how it performs.

So we run sum_array_numba and we get, I'm pretty sure, the same answer. Great, first test passed, and then we can see what the performance is like. timeit is sometimes a little counterintuitive: when the code runs faster, it runs it many more times, so it seems like it takes longer and makes you nervous when you're presenting. But we can see that this actually took 74 microseconds, so if we take the difference of the best times between our vanilla sum and the jitted one, that's about a 180-times speed-up, for one line of code. This is a simple example, but this is the sort of thing that Numba is very, very good at. You're not always going to get a 180x speed-up, but this is the kind of power that's accessible here, and this is why we don't need C++ anymore.

The more common way, and certainly the prettier way once you've decided to use jit on something, is to use it as a decorator. The fact that, as a decorator, it can take arguments is why there's the extra-parentheses thing, and we'll come back to that. You just add the @jit decorator on top of your function definition. Is anybody not familiar with decorators? Cool. Just really quickly: a decorator is a special Python function; effectively, you pass a function as an argument to the decorator function, it does something either before or after calling that function, and then the result is saved back into the original function name. So really all it's doing is what I did up here with sum_array_numba, except it's saying sum_array = something(sum_array): it overwrites the original name, so you no longer have access to the original function. It's just a way to tweak what functions do, and in this case a way to transform them. (Yes, I will come to that very shortly.) So in the decorator form you just add @jit on top, you run that, again we get the same answer, and we should hopefully see very similar performance.
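The two ways of applying jit that are being described, continuing from the sketch above (names follow the narration):

from numba import jit

# function-call form: jit() builds the compiler, which is then applied to
# sum_array; the original function is left untouched for comparison
sum_array_numba = jit()(sum_array)

# decorator form: the name is bound straight to the compiled version
# (in the notebook the decorator goes on sum_array itself)
@jit
def sum_array_jit(inp):
    I, J = inp.shape
    mysum = 0
    for i in range(I):
        for j in range(J):
            mysum += inp[i, j]
    return mysum

jitted = %timeit -o sum_array_numba(arr)
plain.best / jitted.best   # roughly the ~180x figure quoted above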
Again, about 74 microseconds, give or take. You might be wondering how this compares to numpy.sum, which would have been the right thing to use. NumPy is faster, which shouldn't be a surprise, but I'll point out it's only about two times faster, give or take, which is pretty good. NumPy is really well developed, highly optimized, calling into Fortran and C, and we're still about half as fast; that's really not bad.

When does Numba compile things? Here's the answer to that question: it actually compiles the first time you run the function. So here, when I define the function using the decorator and then run that cell, Numba hasn't done anything yet; it's just primed to compile it the first time the function gets called. Similarly, up here, sum_array_numba isn't actually going to do any compilation until the first time you call it. For bigger and heavier bits of code, you'll see timeit sometimes throw a warning that one of the runs was several times slower than all the others, and that's usually just the overhead of that initial compilation step. It tends to be pretty negligible, and there are other things, which we'll look at in a little while, that can remove that from your execution pipeline. Just be aware that nothing is compiled until you call the function at least once. Any questions about what I've shown you so far? Okay.

So with that, we'll have our first exercise. You can click on the "your turn" link. This is just to generate a Mandelbrot fractal, because who doesn't like those. There are two functions here: mandel and create_fractal. The code in the next block is what will actually run them; I left it commented out, because if you run it before you use Numba it takes a really long time, and that's just to keep you from having to kill your kernel and start over. I'll give you about five minutes on this. All you want to do is use what you've learned about Numba so far to accelerate either or both of the functions in this cell, mandel and create_fractal, see how it runs, and hopefully get a pretty picture.

Okay, so hopefully everyone's about finished; one more minute. (That's one of those nbextensions, I think it's called codefolding or something like that; in the README for the tutorial it's got a name I can never remember.) So hopefully you all have a pretty picture in front of you. To solve this, I'm going to do a live coding demo. Thank you, thank you. It really is just: you jit both of those functions, and you get a pretty picture. I'm running this on my lab machine in DC because my laptop is older, so this is on a recent six-core i7, and it takes about 15 seconds without the jit, and about 43 milliseconds with it, so again a reasonable speed-up; not too bad.
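For reference, the decorator solution amounts to adding @jit to both functions. The bodies below are my reconstruction of the standard Numba Mandelbrot example, so the notebook's exact code may differ a little:

from numba import jit
import numpy as np

@jit
def mandel(x, y, max_iters):
    # count iterations until the orbit of z = z*z + c escapes
    c = complex(x, y)
    z = 0j
    for i in range(max_iters):
        z = z * z + c
        if (z.real * z.real + z.imag * z.imag) >= 4:
            return i
    return max_iters

@jit
def create_fractal(min_x, max_x, min_y, max_y, image, iters):
    height, width = image.shape
    pixel_size_x = (max_x - min_x) / width
    pixel_size_y = (max_y - min_y) / height
    for x in range(width):
        real = min_x + x * pixel_size_x
        for y in range(height):
            imag = min_y + y * pixel_size_y
            image[y, x] = mandel(real, imag, iters)
    return image

image = np.zeros((1024, 1536), dtype=np.uint8)
create_fractal(-2.0, 1.0, -1.0, 1.0, image, 20)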
If you wanted to do it without the decorators: if those @jit lines weren't there, what I would do is call jit directly, something like mandel_jit = jit(mandel), and the same for create_fractal. That turns out to be a little more involved than I was expecting, because one of the functions calls the other, so I'd need to change the call inside it as well. But if you were going to rewrite it like that, you would just assign the jitted versions to different names and run those instead, if you wanted to compare the performance of the two. Decorators are much easier to use, and they're definitely what you're going to want most of the time. Really the only time you wouldn't use them is when you want a direct comparison between before and after without a lot of copying and pasting.

So what on earth is happening? This is pretty cool, and sometimes we just want to trust that it's magic, but that's usually not a good idea. So, just for the purposes of exploring what Numba is doing, we're going to define a very trivial function that adds two variables, a and b, and returns the result. We run it once, because we actually need to run it to compile it, and we get two back. Great. Once you have run the Numba-compiled function at least once, you have a method available on it called inspect_types, which you can run, and it shows the typing information. One of the things Numba does is try to infer the types of the inputs and outputs, because it has to hand types to LLVM to actually get it to compile. There's going to be a bunch of stuff in the output, but the important thing is that you see int64 here: I gave it a one and a one, and it figured that out. Obviously adding two numbers together is never going to take very long, but when you see proper variable types in there, that's a sign that things have gone well.

Now, what if we call it again, but instead of giving it integer ones we give it float ones, with a dot after them? We get 2.0 out, so it knows we're dealing with floats, and if we call inspect_types again you can see there's still the int64 listing, the same thing we saw before, and if we keep going down there's another one that says float64. What Numba is doing is that it's not going to compile the function for every single variable type preemptively, because mostly you don't need all of that right away; it only compiles specializations as it needs them.

There is a caveat to be aware of: if I run this first using floats, and then I give it something like an integer, it will not recompile; it will just cast the integer to a float. So if you want all of those specializations to be available, you start with int32, then int64, then float32, then float64, and so on. Just be aware that if you start with the less precise types (I hesitate to use that word), the more general specialization will take precedence. Once you do have both of these compiled, if you give it two integers it will give you an integer back; it's not going to change that. (In answer to a question: I don't think so. I think this predates type hinting, though they may use it now; I can't say for certain.)
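A minimal sketch of that type-inspection workflow:

from numba import jit

@jit
def add(a, b):
    return a + b

add(1, 1)        # first call compiles a specialization for (int64, int64)
add(1.0, 1.0)    # this call compiles a second one for (float64, float64)

# prints the function source annotated with the Numba types it inferred,
# one listing per compiled signature
add.inspect_types()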
Oh, things I forgot to say: I don't work for Continuum, and I have no affiliation with Numba. I really love Numba; I like it and I'm a fan of it, which is why I like to tell people about it and why I'm here. I have looked at the codebase, I fixed one error once, and that's about the limit of my forays into it. I'm pretty sure they do beautiful, horrible introspection of the bytecode to pull this stuff out, but that part is basically black magic to me.

All right, so what does the generated code actually look like? I keep saying it's generating code, and if you want to check, you can call this inspect_llvm method. It returns a dictionary, so we can look at the keys and values, and this is what the generated code looks like, and it just keeps on going. Remember, we're adding two things together. This is mostly just to impress on you how cool this project is, the fact that this is possible. (To be fair, that's a double definition, since we compiled it twice.)

So we've seen it works for ints and it works for floats; we got those nice int64 and float64 markers in inspect_types, and everything is working well. But there is a caveat, says the header. What if we do something that seems very natural in Python but doesn't really have a sound mathematical basis, like adding strings together? In Python we just concatenate them, but that's not, strictly speaking, a mathematical operation. So what happens if we do that? (Here I'm keeping it separate and not using the decorator, for reasons I don't remember.) If we run this... it works, great, everything's fine; it turns out that adding strings together is a pure mathematical operation. No. If we now call inspect_types on this, you'll see it's a little bit different: we don't have string or char or anything, we have pyobject, and what's happening is that this function has been compiled in what's called object mode.

This is where things get a little confusing, just in terms of terminology, but we're going to power through it. There are basically two modes Numba has for compiling functions. One is called object mode, and that's the mode where everything will work: it makes things work by falling back onto, basically, the Python object system, so it will run, but potentially very slowly. The other option is called nopython mode (yes, the naming is what it is), and that's a subset of Python that Numba knows how to deal with and knows how to make very, very fast: essentially all the mathematical operations, almost all of NumPy (there are some parts I think it still doesn't quite handle, though I could be wrong), but there are things it can't do. You can't jit a class, for instance, and have it in nopython mode. When you have something in object mode, that's when you potentially aren't getting the performance improvements you're expecting: if there's even one thing in your function that isn't compatible with nopython mode, then those nested loops you just re-learned how to write, because I told you to, are going to be plain Python nested loops again, and they're going to be really slow, because the function has to fall back to object mode.
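As a small sketch of the fallback just described (assuming the same trivial add function):

from numba import jit

def add(a, b):
    return a + b

add_jit = jit()(add)       # lazy jit, no nopython requested
add_jit('a', 'b')          # runs fine and returns 'ab'
add_jit.inspect_types()    # the listing now shows pyobject instead of
                           # int64 or float64: this compiled in object mode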
[Audience question, roughly: when you want nopython mode, you want to know when Numba is going to fail; you want to be alerted to the situation where you're getting these pyobjects that you're not going to be able to speed up, rather than it just quietly working without the speed-up.] Yes, let me come back to that right after I show you this bit, because it'll be easier to understand then.

In an effort to understand what is failing here, I've created this cell, which writes a file called nopython_failure.py. In it I'm doing some mathematical operations, which are mathy, and then I'm doing the string add, giving it the strings 'a' and 'b'. The reason I'm creating this as a file is that there's a command-line tool that Numba comes with. If you prepend an exclamation point in IPython or in a notebook, the line runs as a shell command, so that's all I'm doing: it's as if I'm running this on the command line. I give it --annotate-html, then an output name, which I've called fail.html, and then the file I want it to run.

If you do that, it generates this nice annotated description of the code, and you'll notice there are regular lines, there are green lines, and there are red lines. The red line is basically telling us: this is where Numba is falling back to object mode, this is what I didn't know how to speed up. But at the same time it says: hey, while I was doing that, I noticed you had this loop here that was all numbers, and I did speed that up. So even if you fall back to object mode, depending on what's in the function, Numba can still help you out; it's not worth totally abandoning it, it may still be able to provide those speed-ups, but it's good to check on these things. This is what's called loop lifting: it lifts the loop out and makes that part fast. If we look at the Numba intermediate representation, you can see there are some int64s in here, which is the lifted loop, and then if we get down to where we're concatenating the strings, those are pyobjects.
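Very roughly, the two notebook cells being described look like this (file and output names are the ones mentioned in the narration; treat the exact bodies as my guess):

%%file nopython_failure.py
from numba import jit

@jit
def add(a, b):
    # a purely numeric loop that Numba can lift and compile
    acc = 0
    for i in range(100):
        acc += i
    # the string concatenation that forces object mode
    return a + b

add('a', 'b')

and then, run as a shell command from the notebook:

!numba --annotate-html fail.html nopython_failure.py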
Back to the question about whether you can jit a method: yes, but the self is never going to be in nopython mode, it's always going to be a pyobject, so you usually aren't going to want to jit a method unless you have a loop in there that you want lifted. If there's something inside that's a pure math operation you can speed up, then possibly, but usually it doesn't work very well. (And yes, they're basically extending what nopython mode covers all the time; they've rewritten a bunch of stuff to make more of it work.) As for whether it will tell you when this happens: that's up to us, and I'll show you how to avoid it happening when you're unaware of it right now, actually.

As Lorena was saying a second ago, if nopython mode is, with exceptions, the thing that gives us the speed-ups we want, then very often you want to be able to tell Numba: no, I really want this to run in nopython mode, and if it can't, I don't want you to pretend that everything is okay. Because that is the default behavior: it will work, it will not break your code, which is very nice, but it will revert to object mode and you're not going to get the speed-up you're expecting. If you want that not to be the case, if you want it instead to say "no, I can't do this," then you can pass this nopython=True keyword argument. You can see this is where those double parentheses come in, and it works the same way with the decorator; you just write @jit(nopython=True).

If you do this, you get an error. I have toggled my tracebacks off, so if you haven't, you might be seeing a much longer traceback right now; this is the very bottom of it. You should effectively ignore the traceback, because it's not going to tell you anything except where, in Numba's introspection of your code, it decided to bail. But it does give us a reasonably helpful message: it failed at the nopython frontend, which means "we can't make this whole function run in fast nopython mode," and it tells us it's an invalid usage of + with parameters (str, str). It's telling us it doesn't know how to quickly add two strings together, and it very helpfully lists all the different things it does know how to add together, if you want to use those instead. It knows a lot of stuff, just not strings.

This happens so frequently (very often, if you're trying to make stuff fast, you want it to actually be fast) that they added a shortcut for it: you can also do from numba import njit. njit works exactly the same way as jit, you can use it as a function or as a decorator, but it always passes nopython=True; it's just a shortcut. So if I run this with njit instead, I get the exact same error, and again it gives me the list of signatures it would have accepted.

There are a few other compilation flags you can pass to both jit and njit that are more or less useful depending on your use case (I guess that's true of everything in the world). You can pass cache=True. Before, I said it compiles the first time you run a function, and that's true; then, in this case in a notebook, for the life of the kernel it's compiled and ready to go. But if you restart your kernel, it's gone; it was being stored in memory, and once it's no longer needed, garbage collection throws it away. So if you're actually running this in code that you maybe run on more than one day a year, and you don't want to pay that compilation overhead every time, you can pass cache=True. That writes something akin to a .pyc file into the __pycache__ folder, and it will save you the compilation overhead the next time you run it. (Yes, you can also run njit with the second set of parentheses, I know.) And Numba is very smart about that: if you make any change to the function and you have cache=True set, Numba will detect that the function is different and recompile it, so it's never going to let you run something you don't expect to run, unless you just made a mistake.
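A compact sketch of the flags just discussed (function names are mine):

from numba import jit, njit

@jit(nopython=True)
def add_strict(a, b):
    return a + b

# njit is just shorthand for jit(nopython=True)
@njit
def also_strict(a, b):
    return a + b

add_strict(1, 2)        # compiles happily in nopython mode
# add_strict('a', 'b')  # would raise a TypingError instead of silently
                        # falling back to object mode

# cache=True writes the compiled result next to the source (a __pycache__
# entry) so a later run of the same script skips the compilation overhead
@njit(cache=True)
def cached_add(a, b):
    return a + b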
The other flag: everyone talks about releasing the GIL. I really hate all the talks about killing the GIL and all that; part of that is that my name is Gil, and Larry Hastings has said some horrible things about me, though I don't think it's personal. If you want to release the GIL, you can: you just pass nogil=True. This does nothing to help you by itself, and it's not going to save you from yourself at all. You now have to figure out what a thread-safe operation is; use concurrent.futures, don't roll your own. Having said that, there are better ways to make Numba work in parallel, which I will show you. So it's there if you need it; I'm telling you about it because it exists, but I've never really needed it. Okay, it is 8:53, so why don't we take a quick break; I think there's coffee and stuff outside, and we'll come back right at 9:00.

All right, let's get started again. I had a question over the break that I wanted to address: somebody asked if I could speak a little more about how you might use Numba with classes. You can't, really, but nobody is stuck. What you do is this: if you have something in a method that's expensive and you want to speed it up, you break it out of the class. You define it as a separate function, and then the method just calls the jitted function, because a method can call something that's been jitted just fine. It's the jitting of the method itself, because of the self object getting passed in, that messes it all up. It makes the code slightly less pythonic and slightly less pretty, and that's the price you pay for speed, but we'd be doing the same thing if we were using, say, Cython or f2py or whatever. (I don't know the answer to that one; there will be someone around this week who can definitely answer it, it's just not me.)

Okay, so now we're going to go to this fourth notebook, direct summation. This is going to be the first thing that's closer to an actual application. N-body problems are everywhere: they show up in astrodynamics, in electromagnetics; you can usually use Green's functions to make all kinds of things look like N-body problems, so it's a common enough thing. I have an equation here, just a generic kernel: we have a set of pairwise interactions between every possible pair in the system. It's something where you have to loop over a bunch of particles and say, for each one of these, look at every other one and compute something. It's an O(N^2) operation; if you have to loop over all the particles in the field, it's going to cost you. There are certainly algorithmic approaches that make this faster; we're not going to do those right now. We're just going to brute-force it, because we can, and also because the other stuff is much harder and would take longer than thirty minutes.

So to start with, so that we have a comparison point, I have a class called Point. A Point is initialized with three random coordinates, x, y, and z, and it has one method, distance, where you pass it another point and it returns the distance to that point (the word I was looking for just left my head; someone got it, taxicab).
I have another class, Particle, which inherits from Point and also sets a mass and a potential, phi, which defaults to zero. If you're going to do something like this, you might just want to experiment: create, say, a thousand particles and play with them. We can do that here, where I define n as a thousand, and when I create the thousand particles the mass of each one will be 1/n. If I look at the result, I have this long list of particles.

Now I want to sum the potential at each particle. To do that, I loop over every particle, and then, for every other particle (not including the one I'm currently examining), I calculate the distance from my target particle to the source particle, and I add to the potential of the target the source's mass divided by that distance. So we can run that, and actually run it. Okay, so it's reasonably quick; it takes a little while, you can see the busy indicator for a second, it's not instantaneous. Let's see how long it takes with timeit: 792 milliseconds. Your mileage will vary, probably slower, unless laptops have gotten really good.

So we know it's slow; now let's load line_profiler and see what is taking so long, what the pain point is. Again we use %lprun; the function we want to look at is direct_sum, and then we call direct_sum(particles) to trigger the run. Let's take a look at what we got. This is an example of using line_profiler on something that isn't just a one-line sleep function, which is good. You can see the number of hits; the lines in the inner loop have been hit more than the lines in the outer loop. (Sorry, I forgot to mention: the default time unit, and I'm sure there's a way to change it but I don't know what it is, is microseconds, so when it says "time" with just a number, that's microseconds.) But really the column that's usually most informative is the percentage of time: out of the run time of what I just did, how much was spent on each of these lines. We can see there's definitely a standout that's costing us: when we calculate the distance between the target and source particles, that's where the hotspot is. (Twenty percent on the other line isn't great either, but eventually things just balance out, because time exists and you can't get away from that.)

So we know this isn't as fast as we want it to be, and we know the distance function is what's hurting us, so we want to make it faster.
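For reference, a reconstruction of the pure-Python setup being profiled here (class and function names follow the narration; the exact bodies are my guess):

from math import sqrt
import numpy as np

class Point():
    def __init__(self):
        self.x, self.y, self.z = np.random.random(3)

    def distance(self, other):
        return sqrt((self.x - other.x)**2 +
                    (self.y - other.y)**2 +
                    (self.z - other.z)**2)

class Particle(Point):
    def __init__(self, m):
        super().__init__()
        self.m = m
        self.phi = 0.0

def direct_sum(particles):
    # O(N^2) pairwise sum: each particle accumulates m / r from every other
    for i, target in enumerate(particles):
        for j, source in enumerate(particles):
            if i != j:
                r = target.distance(source)
                target.phi += source.m / r

n = 1000
particles = [Particle(m=1.0 / n) for _ in range(n)]
direct_sum(particles)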
But I just spent ten minutes telling you that you can't jit classes, so what do we do? I should say there is a jitclass structure in Numba. It's still in early development, it's a little bit tricky, and it often doesn't work the way I expect; I'm sure it will get better, but I'm not showing it to you here. Numba's development cycle is so fast that anything I tell you now is going to be old news in two months, if not sooner. The way I go about this instead: it's nice to have attributes, a kind of literate-programming record, where you can read things like "well, this is my x, this is my y," instead of code that relies on "I know my first column is the x's and my second column is the y's"; it's easier to make mistakes that way. One of the things you can do to get that literate style back, even when you're not using classes, is NumPy custom dtypes, so we're going to look at that a little bit here.

This is an example of a dtype, which I've named particle_dtype, because I am not creative. I'm giving it basically five fields (I'm going to call them attributes, but they're really just floats): x, y, z, m, and phi, to match the attributes I had in the Particle class, and then you have to specify a data type for each of them, which is what I've done here. Once we've done this, we can create an array, in this case just using ones, and pass it this custom dtype, and you'll see that each entry of the array is five values, corresponding to the x, y, z, m, and phi we specified. (Has anybody used custom dtypes before? Just curious.) That lets us access things by their field names: in this case, if I want to take the first particle in my array of particles, the zeroth, and change its x value to 2, I can do it this way. It still takes a lot of brackets, but my_array[0]['x'] = 2 sets that x value. It's an easier way, first off, to know what you're referring to when you're assigning or calculating things.

I'll show you my solution; let me make sure this works first... someone just ran it, it works, thank goodness. Okay, here is how I did this. I'm using njit. You don't have to raise your hands for this, but when I started with this I kept forgetting to actually add the jit decorator, so do a quick double check and make sure you put it in there. It won't be super slow anyway, since we're dealing with a relatively small problem, but always remember to actually jit the function. What I did is: there's the array initialization here, and then, in a very non-pythonic way, I'm iterating over every element in the array and assigning each field individually. Depending on the number of years you've spent with NumPy, your fingers will reject this: why would you ever write a for loop here? That's part of the unlearning involved with Numba, and it's hard for all of us. In these particular places you want to take the naive approach, because it's easier for Numba to understand, and so it will make it faster. As a rule of thumb that's not always true, but it's almost always true.

One other thing I'll point out, because I just really like it, is a slick little feature: notice that my assignments here are not written with the bracket style. That style definitely works, as most of you just found out, but in a jitted function, and only in a jitted function, if you are using a Numba-aware dtype, you can actually access the fields the way you would attributes on a class: you can use dot access. So if you want your code to look this way, because it's a syntax we're all used to, it's available to you. The caveat is that if I take the decorator away, this is invalid.
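A sketch of the custom dtype and the jitted initializer being described (field names follow the narration; the body is my reconstruction of the solution):

import numpy as np
from numba import njit

particle_dtype = np.dtype({'names': ['x', 'y', 'z', 'm', 'phi'],
                           'formats': [np.double] * 5})

@njit
def create_n_random_particles(n, m, domain):
    """Create n particles of mass m with random coordinates in [0, domain)."""
    parts = np.zeros(n, dtype=particle_dtype)
    for p in parts:
        # attribute-style field access only works inside a jitted function
        p.x = np.random.random() * domain
        p.y = np.random.random() * domain
        p.z = np.random.random() * domain
        p.m = m
        p.phi = 0.0
    return parts

parts = create_n_random_particles(1000, 1.0 / 1000, 1)
parts[0]['x']   # bracket-style access works everywhere, jitted or not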
So that's a balancing act: you can choose how much you like the pretty syntax, but if you're trying to debug, you're definitely going to want the bracket form. Something to be aware of. (I don't know the answer to that; I don't know whether Numba supports record arrays in that way. Just so the video knows that I don't know: I was answering people who are off camera.)

So we look at the result: just looking at three of the particles, we get the coordinates and the mass, and they shouldn't have any potential yet, because we haven't done anything with them. Okay, so we've now created the particles; awesome, we did it. The next thing that the class we're replacing did was calculate the distance between two particles, so the next exercise, much like the first one, is to replace that class method with a standalone function that does the same thing. I've given you a little skeleton; it should take two particles and return the distance between them. The link to the exercise is in the same notebook, so as long as you ran exercise one, you already have the parts array defined in here and don't need to worry about setting that up. I'll give you about five or six minutes for that.

Okay, I'll show you my solution now. Again, it's pretty straightforward, and again I'm using the record-style attribute access with the elements: we square the difference in each coordinate, take the square root of the whole thing, and return it. We can try it out to make sure it works, and see how long it takes for two particles; it should be pretty fast. Nanoseconds, always a nice number. Okay, great.

And now, if you've guessed it or have scrolled down the notebook, we take the last step. We had a function above called direct_sum (I lost it somewhere up here) that operated on the list of Particle objects, and now we want to change it so that it operates on our NumPy array of particles instead. I'll give you about seven minutes for that; if you click on the exercise, I've copied the function again for reference so you don't have to keep flipping tabs back and forth.

Again, just to show you my solution (there are other solutions, of course), I approached it in a really naive way. Normally you don't put an if statement inside a double for loop if you can avoid it; you'd get away with it by concatenating arrays or something instead. But with Numba you don't actually have to worry about that, because it's still going to be blazing fast. So I'm just looping over the whole set of particles twice, and as long as I'm not comparing a particle with itself, I calculate the distance using the distance function we wrote and add the contribution to the potential, and then I return the array of particles. We can check how this runs in comparison to the original version, which, remember, was using NumPy under the hood and wasn't written in a super-naive way; it just happened to use objects, so we had a list of objects. This took 7.6 milliseconds, which is about a hundred times better.
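My reconstruction of the two solutions just described, building on the parts array from the previous sketch:

from math import sqrt
from numba import njit

@njit
def distance(part1, part2):
    return sqrt((part1.x - part2.x)**2 +
                (part1.y - part2.y)**2 +
                (part1.z - part2.z)**2)

@njit
def direct_sum(particles):
    # same naive O(N^2) double loop, now over the record array
    for i, target in enumerate(particles):
        for j, source in enumerate(particles):
            if i != j:
                r = distance(target, source)
                target.phi += source.m / r

direct_sum(parts)   # roughly the hundredfold speed-up quoted above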
There are other ways to make this faster, of course. As I was saying, there are algorithmic improvements we could make instead of naively looping over a bunch of particles. But if you think about return on developer time, we just spent maybe fifteen or twenty minutes rewriting an admittedly simple class, and we got a hundred-times speed-up. That is nothing to sneer at. (Sure, it modifies things in place, really.)

One thing I do want to mention, which I meant to mention earlier and didn't, and you've probably already discovered this: you can call jitted functions from other jitted functions. Similarly, you cannot always call non-jitted functions from jitted functions; Numba needs to be aware of these things. So if you have a function that you want to call from within a jitted function, it's always best to @jit it too, even if you're not expecting a speed-up from it. That basically just makes Numba aware of it; otherwise it comes across a name it doesn't really understand. So if you get a weird error, like a NameError, and you're wondering why it's there, the fix is usually just adding a jit somewhere. (I don't know the answer to that; I think so, but I can't say for certain. I'm going to redirect you to the Numba developers who are around later this week, and you can ask them all these questions.)

Yes, I tested that a few months ago, and what I think I remember is that there really is no difference, because effectively, in a weird way, this is not the code that we're running; it's the representation of the code that we're feeding to Numba, which then generates code. To Numba, one form says "look at the nth element of this array," and the other form says the same thing, so when it gets optimized and compiled down it's effectively running the same thing. I don't think there's a difference beyond jitter in your CPU clock or something.

Okay, it is 9:45. The next two notebooks are kind of face-punchy, in a good way, but they're dense, so why don't we take about a ten-minute break; be back here at 9:55 and we'll start on the next bunch.

[Lorena Barba takes over.] Okay, ready everyone? We're going to give you the fastest introduction to the Navier-Stokes equations for computational fluid dynamics. How many people in the room have some background in partial differential equations? All right, perhaps more than half the room; that's good for you, and for the others, bear with us and you'll get some ideas. It's going to be a really fast introduction, with the purpose of giving you just some clue of what the application is for the next part of the tutorial. The purpose of this tutorial was to move very fast from a simple toy function, adding two numbers, to something that looks like a real application, where you get a feeling for how Numba might be used in a real context: you have a complicated piece of code, you want to explore it, find the hot spots, and then optimize it with Numba. For that reason we're going on with this application and just trying to give you the gist of it.
Okay, and this is the famous, loved-by-some and reviled-by-others, set of Navier-Stokes equations. It is a partial differential equation up there, a nonlinear partial differential equation, which makes it special, and it is also vectorial. It's written as one equation, but we have the vector u, the velocity of the fluid, which in three dimensions has three components, so you really have three equations there. Those three equations are for the three unknown components of the velocity, u, v, and w, in 3D space. But there's an additional unknown, the pressure p, in the first term on the right-hand side, so we really have four unknowns with three equations, which makes it an unclosed system. If we are solving in the context of compressible flow, we have an additional equation from thermodynamics, the equation of state, and that closes the system; but in incompressible fluid dynamics we're kind of stuck, because we don't have an equation of state.

What we do in that context is use the conservation of mass. The equation of conservation of mass for an incompressible fluid takes this very simple form, this partial differential equation here, which expresses that the divergence of the velocity is equal to zero (also called the dilatation of the velocity being zero). It simply expresses that a volume of fluid in an incompressible scenario cannot change its total volume: if you compress it one way, it has to expand another way. That's the physics of it; that's incompressibility, for something like water, which you cannot really compress.

So that simple equation representing incompressibility can be used with the original Navier-Stokes equation to close the system, in the following way: we take the divergence operator, that nabla-dot, of the first equation. Well, it's really three equations, so we could write them out as three, but for the context of this tutorial, and most of the time for pedagogical purposes, you start with just two dimensions to make our lives easier. So let's write it down as only two equations, thinking of a fluid velocity that has only two components, u and v, in two dimensions. We have two equations; the pressure, of course, is still there, and we have the gradient of the pressure in these two terms.

Now we have these two equations for a two-dimensional fluid, and we're going to apply the divergence operator to them. When we do that, we can work out the combined differentials in pretty much a sheet and a half of paper. There's a little bit of effort to simplify terms, and you can reuse the continuity equation, the fact that du/dx + dv/dy = 0: if you expand everything out in two dimensions, terms where du/dx + dv/dy appear together can be combined and cancelled, because they are zero. The result is that you can simplify all of that into the equation you see below. You can recognize, for example, that if I take the derivative with respect to x of the whole first equation, I get a second-order derivative of the pressure with respect to x, which is what appears right here at the bottom, and if I take the derivative with respect to y of the whole second equation, I get a second-order derivative of the pressure with respect to y, and that's how that term comes about, and so on.
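In LaTeX, the equations being manipulated here are, in their standard incompressible 2D form (the notation on the slides may differ slightly):

\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y}
  = -\frac{1}{\rho}\frac{\partial p}{\partial x}
    + \nu\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right)

\frac{\partial v}{\partial t} + u\frac{\partial v}{\partial x} + v\frac{\partial v}{\partial y}
  = -\frac{1}{\rho}\frac{\partial p}{\partial y}
    + \nu\left(\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2}\right)

\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = 0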
The way we do that is the following: we take the divergence operator, that nabla-dot, and apply it to the first equation. Well, it's three equations really, so we could write them out as three, but for the tutorial, and most of the time for pedagogical purposes, we start with just two dimensions to make our lives easier. So let's write it as two equations, thinking of a fluid velocity with only two components, u and v. The pressure is of course still there, and the gradient of the pressure appears in those two terms. So now we have these two equations for a two-dimensional fluid, and we apply the divergence operator to both. When we do that, you can work out the combined differentials in about a sheet and a half of paper. There's a little bit of effort to simplify terms: you reuse the continuity equation, the fact that du/dx + dv/dy = 0; when you expand everything out in two dimensions, the terms where du/dx + dv/dy appear together get combined and cancelled because they're zero. The result is the equation you see below. You can recognize, for example, that taking the derivative with respect to x of the whole first equation gives a second-order derivative of the pressure with respect to x, which is what appears at the bottom, and taking the derivative with respect to y of the second equation gives a second-order derivative of the pressure with respect to y, which is how that term comes about, and so on. The simplifications then leave only these terms involving the velocity, and this is how the famous pressure Poisson equation comes about: take the momentum equation, apply the divergence operator, cancel terms like crazy, and after a sheet and a half of paper, depending on how big you write, you get to this beauty. The Poisson equation is pretty cool: it's called an elliptic equation and it has several nice properties. One property of elliptic partial differential equations in general is physical: something typical of a Poisson equation is a membrane, like the membrane on a drum. If you push the membrane somewhere in the middle, the whole shape of the membrane adapts; if you change the boundary, the whole membrane adapts to that change too. That is a physical manifestation of the properties of an elliptic equation, and it allows us to solve a Poisson equation by starting from an initial guess, say p equal to zero everywhere, and then using the boundary or any internal sources and just iterating with the Poisson equation over a pseudo-time, letting the function adapt to the boundary conditions or the sources inside. It's this physical property of elliptic equations that lets us use an iterative method to get a solution. Below it's written in a simpler form, where all the terms involving velocity differentials are bundled into the letter b: everything on the right-hand side is just a new variable b, and that will be reflected in how we write the code. To explain the iterative method we'll use to get the pressure, imagine a grid of points; we're in two dimensions, so we have a set of points on a grid. How do we write a second-order partial derivative of p with respect to x on that grid? A partial derivative is the change of p in the direction of x, so to represent it I only need points along a horizontal line. If it were a first-order derivative I could represent the change with just a gradient, the difference of two values of p divided by the difference in x; that's the first-order finite-difference representation. But we have a second-order derivative, and the finite-difference formula for that, written here, divides by the square of the distance between two points in the x direction and needs not two points but three: instead of the slope we need the curvature of the function, and for the curvature we need the point in the middle, p[i,j], the point before it, p[i-1,j], and the point after it, p[i+1,j]. With those we can write a discrete form of the second-order derivative in the x direction. Similarly in the y direction, I would only need, hold on, I have to scroll the other way
around; this computer is driving me crazy, I really can't do it. Okay, so we have this center point, and again I need three points to represent the curvature of the function, but now in the y direction, for the partial derivative with respect to y: I need p[i,j-1], p[i,j], and p[i,j+1]. With those three points I can write a formula that approximates the second-order partial derivative of p with respect to y. So this is where these formulas come from: a finite-difference representation of the continuum partial derivatives that appear in the equation, and with these discrete forms we can write code to approximate the continuum problem. The subscripts i and j denote spatial locations in x and y on our two-dimensional grid, and they're also going to be our indices in the code. The formula right here is the discrete Laplacian; the Laplacian is the second-order derivative with respect to x plus the second-order derivative with respect to y in 2D (in 3D you'd add a second derivative with respect to z). It's a very typical partial differential operator, and this is its discrete form, the partial derivatives expressed as a formula on a grid. We're going to take the separation between points in the x direction to be equal to the separation in the y direction; that's our choice, just to simplify our lives, so those two things are equal and we can write the formula in an easier way. That's what we're saying right here: we assume the computational mesh is uniform in both directions. With this discretized form we use a very neat trick, an iterative method called the Jacobi iteration, the simplest iterative way to solve an elliptic equation: not very fast, not very efficient, but easy to understand and easy to code. What we do is the following: we isolate the center point of this little piece of grid, p[i,j], and put all the other terms, the ones with i-1, i+1, j-1, and j+1, on the other side. That's this equation here. This part is only the Laplacian, the finite-difference representation of the pressure part, and over here is b, all the terms with velocity derivatives, written out with their finite-difference representations as well. So we've moved from the continuum world into a discrete world by writing finite-difference representations for every partial derivative in the equation, and then we've isolated p[i,j] and moved everything else to the other side. What we normally do is start with an initial guess: the initial guess is the value of p at every point of the mesh, and the values of u and v at every point of the mesh, which we take to be zero. Then we use the formula on the right-hand side to compute the new value of p, that is p[i,j] at iteration k+1; the superscript k+1 on top represents the next iteration.
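Written out, the centered differences and the resulting Jacobi update look like this, assuming the uniform grid with equal spacing described above; depending on how the code defines its velocity term b, the grid-spacing factor may already be folded into it:

```latex
% Centered second differences at grid point (i, j):
\frac{\partial^2 p}{\partial x^2}\bigg|_{i,j} \approx
  \frac{p_{i+1,j} - 2\,p_{i,j} + p_{i-1,j}}{\Delta x^2}, \qquad
\frac{\partial^2 p}{\partial y^2}\bigg|_{i,j} \approx
  \frac{p_{i,j+1} - 2\,p_{i,j} + p_{i,j-1}}{\Delta y^2}

% Isolating p_{i,j} with \Delta x = \Delta y gives the Jacobi update,
% where the superscript k counts iterations:
p^{k+1}_{i,j} = \frac{p^{k}_{i+1,j} + p^{k}_{i-1,j}
  + p^{k}_{i,j+1} + p^{k}_{i,j-1} - \Delta x^2\, b_{i,j}}{4}
```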
So we use the values of our first guess to calculate the new values of p at the next iteration, and that new value can then be used, with all of these terms on the right-hand side, to update p again, and so on and so forth. We can iterate on the values of the pressure, with the velocity held fixed, to solve the Poisson equation, to solve the elliptic system as it relaxes to its steady state. It's a very neat trick for elliptic equations, and the good thing is that we can write an explicit, iterative way to obtain a solution for p; once we have p, we can obtain the velocity components again, and the circle goes around. You have to stop at some point: this is really an iteration toward a steady state, and you have to write into your code some criterion that says when the value at iteration k and the value at iteration k+1 are close enough together, and you have to come up with a judgement about what "close enough" means. We say something like: the sum of the differences at every point, well, perhaps not the sum but the sum of squares, or the square root of the sum of squares, some norm that measures the distance between two solutions, should be smaller than, say, ten to the minus six. You have to have a sense of what a small difference between two iterates is for your solution, and then you exit the iterations. That's how we solve the Poisson equation. In CFD there are some subtleties: when you apply the divergence to the whole Navier-Stokes equations a lot of terms come out, and you have a choice of eliminating them using the continuity equation or leaving some in, and people have developed a sense for better and worse ways to get to a discrete Poisson equation, but this is the most typical form, in 2D at least. The Jacobi method for the pressure is the slowest iterative method we know, but it's simple to explain and to code, and a lot of people still use it. We're going to show you how to write code for solving the Poisson equation, and give you pre-written code for this part over here, where the velocity is updated with the new values of the pressure. The point is to give you a code that's more realistic, where you can see the effect of Numba on a real application. It's still pedagogical because it's two-dimensional, so imagine that a real application would be 3D, but it is a real application: people do solve the Navier-Stokes equations this way. What else? Yes, this is notebook 5, and then we have some optional notebooks. Does that give you an idea of where we're going with this? Any questions? Okay. You never knew CFD was so simple, guys. Actually, if you're going to get a ten-minute intro to CFD, that's a good way to do it. Okay, so now that we all understand these equations deeply and intuitively: this is one of the classic validation cases in CFD, the lid-driven cavity. You have a square cavity, and the idea is that the three walls, the bottom and the two sides, are stationary, they're just walls, and the top is moving along at a steady velocity.
Because of the no-slip condition between the wall and the fluid, the moving lid induces a circular motion in the fluid, and over time that reaches a sort of steady state. It's always moving, but it settles into a steady pattern, and you end up with slightly higher pressure in the top right corner and slightly lower pressure in the top left corner because of the way the fluid is flowing. This case goes back to, I don't know, the early eighties or something, when the definitive benchmark computations were first established. Now, in terms of how we actually turn this into code: here is the pressure Poisson equation, and I know it looks extremely nasty, because it is, but that's fine. One really nice thing, especially when we're iterating on the pressure, is that if you look below the first line (ignore the top line for the moment and just look at the other three), everything in there is either delta x, which we know, delta t, which we know, or a u or a v. All of those values stay the same at a given instant. So when we're calculating the pressure iteratively, we only have to compute those bottom three lines once; the only thing that changes as we iterate in pseudo-time, looking for the relaxed pressure solution at a given time step, is the top line. The rest stays the same, so we don't have to recompute it, just the top; it's not as bad as it looks. In the interest of doing that, we have provided it for you: the velocity term, as I'm calling it. Remember, before we had nabla-squared p equals b; this entire function is just the bottom three lines of that equation, it's just b, that's all it is. Okay, I'm going to run that. Now we get to the pressure field, and this is what's actually happening: we give it an initial pressure field, our guess, either zeros or random values because it's the first time step, or whatever we have from the previous time step. We also pass in b, because again we don't want to recalculate that every iteration; we compute it once, hand it off, and don't worry about it. Then there's this variable called l2_target, which is what Professor Barba mentioned at the end: you have to pick a point where close enough is good enough. We take the L2 norm of the pressure field at one iterate and at the previous iterate, and if the difference between those is smaller than the l2 target, we say close enough, good, let's move on to the next thing. So you can see here we have a while loop: while the iterative difference is greater than the l2 target. I've also added, for the sake of not burning up people's laptops if there's a mistake, that we just bail after five hundred iterations; otherwise this can run for a very long time. It shouldn't for this particular setup, but if you misplace an i or a j or something, it will just bail out after five hundred iterations. We copy the existing pressure field, so we have it around to compute the updated one, and then comes the update line itself.
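Put together, the loop being described might look roughly like this; the names pressure_poisson and l2_target, and the details of the norm, are guesses reconstructed from the description, not a copy of the notebook:

```python
import numpy

def pressure_poisson(p, b, l2_target):
    """Iterate the Jacobi update until successive pressure fields agree."""
    iter_diff = l2_target + 1.0   # force at least one sweep
    n = 0
    while iter_diff > l2_target and n <= 500:   # bail out after 500 sweeps
        pn = p.copy()
        # Jacobi update on the interior points, written with array slices;
        # b is assumed to already carry the grid-spacing factors.
        p[1:-1, 1:-1] = (0.25 * (pn[1:-1, 2:] + pn[1:-1, :-2] +
                                 pn[2:, 1:-1] + pn[:-2, 1:-1])
                         - b[1:-1, 1:-1])
        # (the boundary conditions discussed below would be enforced here)
        if n % 10 == 0:
            # Relative L2 difference between successive iterates;
            # in practice you would guard against an all-zero pn.
            iter_diff = numpy.sqrt(numpy.sum((p - pn)**2) /
                                   numpy.sum(pn**2))
        n += 1
    return p
```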
And that update line really is just the equation: we've got 0.25 times the four neighboring pn values, minus b, and if we look back at the equation it's the same thing, 0.25 times the four surrounding p values minus b. We're not going to worry about the i's and j's of b here; we're not making you derive b, it's dead to us. Then the only other thing you have to do, after learning all of this in fifteen minutes, is apply some boundary conditions. After each update to the pressure, we enforce at the boundaries the conditions specified by the problem setup. In this case, as we said above when we looked at the cavity, three walls are just walls: they don't move, and fluid can't flow through them; and the ceiling, the top of the cavity, is moving at a constant velocity. That's represented here by saying that, for instance, at y equals zero, along the bottom of the cavity, the change in the pressure in the y direction is zero; there's a wall there, it just stops. Similarly, at the two sides, the change in the x direction is zero, and at the top we set the pressure equal to zero. Then, in the interest of making this a little speedier, rather than checking the difference between iterates on every single loop (because it takes a fair number of iterations to get there), I only check every tenth time. So it's essentially guaranteed that at some point we'll do a few more iterations than strictly necessary, but it's better than computing the norm every single time; that's what this last if statement is. Okay, I hesitate to ask: any questions? Don't worry about grokking everything that's happening here. The point is that this is our starting point: we have some arrays that we're updating by combining other arrays in an iterative fashion, and now we're going to see what we can do to speed that up. [Lorena] In class, getting to this point would probably take about four weeks, so what we wanted was to give you a hand-wavy presentation so the code isn't a complete mystery; be patient with us, there's a lot of background we're compressing here. [Gil] Okay, let's do it. So we define the pressure Poisson equation, which I wrote here "in the interest of brevity," which is hilarious, but we're only dealing with the Poisson solver. You're welcome, at your leisure, if you spend your free time like I do reading code for discretizing partial differential equations, to check out, in the snippets folder, a file called ns_helper, which contains all the boilerplate for running this: setting up the velocity fields and so on. All the setup is handled in the background; we're not really looking at that, we're focusing on the Poisson solver, and that's what this cavity function does. We also import pickle here, because I wanted everyone to start with the same initial conditions, so I created them and pickled them for you; you just load them up, and that's baked in here. Then we have this function called run_cavity, and this is how we specify the actual problem. Our discretization is pretty coarse: 41 points in the x direction, 41 points in the y direction, and we run for a thousand time steps, each advancing by a small delta t measured in
time units you could call seconds, though probably not. Then this last argument being passed in is that l2_target we talked about, so we can make it stricter as we go, and normally we would, but for this example we're only going down to ten to the minus four, because otherwise we'd all be sitting around waiting. These runs are a little slow; they're going to be faster soon, just wait. Okay, now that that's all defined we can run it: u, v, p are what gets returned, the velocity in x, the velocity in y, and the pressure field, and we call run_cavity. Nope, you actually have to hit enter. Okay, it doesn't take that long. Then I've got this quiver-plot function set up with all the various colors and things, and you can see that it worked: this is cavity flow. The quivers, the arrows, show the velocity at each point; I've thinned them out because otherwise there are too many arrows to see what's happening. The color in the background is a contour map of the pressure in the cavity, so you get a high spot in the top right and a low spot in the top left, which maybe makes intuitive sense: as fluid is drawn away from somewhere the pressure drops, and where it's pushed toward something the pressure rises. We see the expected behavior, so our code is working, which is great: we did the first thing, we got an answer, hopefully the right one, and now we're going to make it fast. Those are the three steps. So now, just so we have something to compare against whatever we do next, we save the u, v, and p that we've calculated, just dump them to a pickle file so we can load them up later. Okay, so what do we do if we have something we think is slow and we want to make it faster? What's the first step? Profile it. Awesome, this whole tutorial is a success if that's the one thing that gets across. All right, so first we'll run %timeit; this is the naive profiling, we just want to see how long the whole thing takes. Remember, this is a thousand time steps at 41 grid points in x and y, or whatever number I said. Okay, 450 milliseconds, let's say; we squint, that's fine, and this number will go up really quickly if you make the grid larger. Now we can profile it properly too, with the line profiler, and we're going to look at cavity_flow, which is here. Sorry, I lied, it's here; I didn't show you what it was, so let me open it up. Here's one of those functions hiding in the background: we define the velocities and initialize them to zeros, we define our initial b, and then we just iterate through time; at each actual time step we call this pressure Poisson function, we calculate that velocity term b, and then we update the velocity here, which we're not going to worry about. Now look at where the time is going. This is a better example of real profiling: in an incompressible CFD simulation, most of the time goes into solving the pressure Poisson equation. If you've seen incompressible flow before, you know this is what always happens; it's always the pressure Poisson equation, and this is actually better than usual, it can be more like ninety percent of your compute time spent on that one thing.
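In a notebook, the profiling steps just described look roughly like this; these are IPython magics rather than plain Python, %lprun comes from the line_profiler extension, and the names run_cavity and cavity_flow follow the description above:

```python
# Naive timing of the whole solver
%timeit run_cavity()

# Line-by-line profile of the driver to see where the time actually goes
%load_ext line_profiler
%lprun -f cavity_flow run_cavity()
```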
So if we were going to pick one hot spot to speed up in this code, it's probably there, and by "probably" I mean it's there. So we're going to do that. You can click on this exercise, speed up the PPE, and here's where we get into unrolling loops, and not in the compiler sense that Numba does for you: we are going to unroll the loops by hand. We have this equation, and at each iteration of the pressure Poisson solve we make a copy of p into pn, and now I want to update the value of p at every i and j, setting it equal to a combination of those four neighboring p values and the b that's given to us, updating p everywhere; that's what goes in this bit here. We took care of the boundary conditions for you, because they're fiddly to think about and I got them wrong about five times, so we're not going to torture you with those. So take, let's say, ten minutes to start with, and feel free to pop back to the earlier notebooks. What you're doing, remember, is this: before, we had an array operation; now we want to make it not an array operation, which again feels weird and counter-intuitive. We loop over the i's, loop over the j's, and at each point update p as a combination of the previous p values. Definitely ask questions. One more thing I didn't mean to leave out: please, for the sake of your computer, don't forget to add the decorator before you test this, it will take a really long time otherwise; just add the @jit decorator up top. So, my solution is that same skeleton I gave you, and what I added is just this: I'm skipping the zeroth and last element in each direction because those are handled by the boundary conditions. If I didn't skip them it would still be fine, because the boundary conditions would overwrite them with the correct values anyway, so it's not a big deal. We iterate over all of i and all of j, and at each (i, j) point we recalculate p as a combination of pn, the previous iterate, and we add in the b values, which are the velocity calculations. Let's run this, and it's always good to make sure you get the same answer. Yes, good: no errors, and the same plot; if you get the same answer but not the same plot, something's really wrong. Little spot checks like that are important.
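A sketch of the kind of unrolled, jitted update being described; the function and argument names are guesses, and the convention that i indexes rows and j indexes columns is an assumption:

```python
from numba import jit

@jit(nopython=True)
def pressure_poisson_step(p, pn, b):
    # Skip the first and last row/column: the boundary conditions
    # overwrite those values anyway, as discussed above.
    ny, nx = p.shape
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            p[i, j] = (0.25 * (pn[i, j + 1] + pn[i, j - 1] +
                               pn[i + 1, j] + pn[i - 1, j])
                       - b[i, j])
    return p
```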
I forgot what the first time was, so let's check: I had 447 milliseconds for the initial run, and this time, counter-intuitively, %timeit takes longer to finish because the function itself is faster: 180 milliseconds. Granted, that's not a hundred-times speed-up, but the code we started from wasn't bad either: if you're going to implement Jacobi in Python with NumPy, there's nothing in that version that's obviously slow; to the best of my knowledge it's about the fastest way to do it with array operations. And I'm not checking the L2 norm every single iteration; there are lots of little bits I could have made slower to make this look more impressive. But if anybody here would not like a roughly three-times speed-up on their code, raise your hand and you can go, it's fine. I always find it useful, after doing this, to profile again: take another look and see how the percentages have changed. The whole application is faster, and we only sped up one part of it, the part that used to be seventy-some percent of the runtime; now it's down to something like 29 percent of the total, and the other parts take up relatively more. Which raises a more philosophical question: when do you stop? We dropped this down to about 30 percent; okay, what about the velocity term, which was taking 20 percent? We haven't done anything to that, maybe we can speed it up too. I know it sounds silly, but just remember that it's always going to add up to 100 percent; you can go crazy with this. In some weird perfect world I guess every line would take the same share of the time, I don't know. I stopped at a certain point; take a deep breath, it's okay. Usually you end up optimizing one more thing than you had to: you optimize that last thing, see essentially no change, and that's when you know you can stop, if you're really grinding for performance; maybe this was already plenty. In the interest of showing you, though, I did unroll that whole b term as well (I'm not going to make you do it), and if we run this again we go from 180 to 143 milliseconds, so we can eke out some more. Look at the percentages again: the velocity term is down to 4 percent, and because that went down, the pressure Poisson share went up again. This is about where I would call it quits, but it's good to check how all the parts of your program are behaving relative to one another in execution time, because sometimes something new shows up and sometimes it doesn't. [Lorena] And this is precisely the scenario we wanted to show with a real application: in the previous example you just had an isolated function, but this is the reality of real work. You have a complex code, you find the hot spot and speed it up, but the code has so many other parts that speeding up that one function, even many times over, doesn't speed up the whole application by the same factor. You just have to realize that that's the reality: speeding up one function gives you only a limited application speed-up. [Gil] Yeah, welcome to real life. Now, the part where I lose the room, if I haven't already: this didn't used to be possible, but it is now, and I'm showing it to you in the interest of full disclosure. The other way to do this is to just add the @jit decorator to the original array version I gave you, which does work, but it's not as fast. If you're trying to eke out performance, you should take the time to unroll those loops. I know it's not always fun, it can be really painful, there are two hard problems in computing, and off-by-one errors, and all that, but the best performance you'll get is if you unroll the loops and make it look like C, effectively, like a compiled language. If you just want a little boost, a one-off thing, you can try just adding the @jit decorator as a first step and see if it works; it won't always, sometimes it'll break, but it's worth a look. Does it handle yield? It does not handle yield, I don't think, unless that's on the docket for them to add. They added recursive functions last year because I
begged them to, because it broke one of my examples and it was a whole thing; the application example I was building toward, the one I promised a little while ago, was a recursive function and it didn't work. So it supports simple recursive functions now: basically, the types being passed recursively need to stay consistent, and as long as the function isn't mutating what it passes to itself you're fine; if those types change for some reason, boom, I don't know. Okay, there are a bunch of notebooks I've labeled optional; I'm just going to tell you what they are and you can look at them in your spare time. They're things I think can be very useful, more about admin stuff, packaging, things like that, that I'm not going to slog through with you right now because it's not that much fun, but you'll have examples of how it all works. A question that gets asked a lot is whether Numba is as fast as, faster than, or slower than Fortran, or Cython, or C. I have a comparison notebook of this same Poisson problem with all of those. My Cython version is okay, and people have already made tweaks to improve it, so others have made it faster; the Fortran, I don't know, take a shot at it, I can't make it any better. Basically they come out about the same: depending on any number of factors one might be slightly faster than the other, but Numba gets you roughly the same performance, sometimes better, as Cython or as Fortran loaded through f2py, where there's some extra overhead managing the way the memory is aligned and so on. So if you want to check it out and convince yourself that it really is okay to never write Fortran again, it's there; you can look at it in time. Also, in terms of sharing code with other people, and I think this is incredibly cool, you can compile a Numba function into something that's distributable, something you can include in a package, and I have the code in there to do that, plus how you might install it and check it out. There's a caveat and a really awesome upside. The caveat is that when you do this, it's only platform to platform: Linux to Linux, OS X to OS X (actually maybe Linux and OS X can play together, because they're both POSIX-y enough), but Windows needs to be compiled on Windows for this to work. And any processor-specific speed-ups that LLVM would normally target, you lose, because the result has to be generic enough to work on a range of processors. The upside is that the people on the other end using your code only need NumPy to run it: they don't need Numba installed at all, and they still get that pretty good performance without installing what is, give or take, a 70-megabyte dependency with LLVM and everything that comes with it. So if you want to bundle this into a package you have and you don't want to tack on a huge dependency (although conda has made that easier for us), be aware that those options are there and you can use them.
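The ahead-of-time compilation being described goes through numba.pycc; here is a minimal sketch with made-up module and function names (the exported function fills in its first argument rather than returning a new array, just to keep the example simple):

```python
from numba.pycc import CC

cc = CC('compiled_helpers')   # name of the extension module to build

@cc.export('jacobi_sweep', 'void(f8[:,:], f8[:,:], f8[:,:])')
def jacobi_sweep(p, pn, b):
    # Same unrolled update as before, compiled ahead of time so that
    # users only need NumPy (not Numba) to import and call it.
    ny, nx = p.shape
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            p[i, j] = (0.25 * (pn[i, j + 1] + pn[i, j - 1] +
                               pn[i + 1, j] + pn[i - 1, j])
                       - b[i, j])

if __name__ == '__main__':
    cc.compile()   # writes compiled_helpers.<platform suffix> next to this file
```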
Okay, cool. So now let's open up notebook number seven. This is where Numba gets really cool; this is the part I really like. Ufuncs are universal functions; they've been in NumPy basically the whole time, for a long time. A ufunc is anything that operates on an array element by element: adding two arrays is a ufunc, and there are all sorts of them. The NumPy documentation actually walks you through how to write your own ufunc, and the example they use is f(a) = log(a / (1 - a)). The code they show for that is this. Okay, that is how you do it, but only for doubles: you have to repeat it for every variable type you want your ufunc to accept, and you saw that list of supported data types. The NumPy devs did this for a lot of functions, so this is the moment where I say thank you to the NumPy devs for doing all of this for years and years, because it's hard work, and also thank you to the Numba devs for making it so we never have to do it again, because it's terrible. All right, so how do you make ufuncs without writing all of that? We use vectorize. To start off, I'm importing NumPy, that's fine, but I'm also importing math, and I'm defining a function called trig (again, I'm not clever with naming). This only takes scalars, mind you: the math library only operates on scalars. It takes the sine of a squared, multiplies it by the exponential of b, and returns the product. It's meaningless gibberish, it doesn't mean anything, but you can run it and get 2.287; that's what it does. If we want to operate on arrays, though, and we try this, we get an error, because again, math only works on scalars. So what do we do? This is where we can use vectorize. We import vectorize from numba, and again you can see I'm doing the same thing with the weird extra parentheses, so you should be able to extrapolate that you can also use it as a decorator; for the moment we'll do it as the function-call form, and I'm naming the result vec_trig. Now a and b are arrays, and it works. Done. That's what vectorize does: you write a function that operates on scalars, you add vectorize, and now it operates on arrays. (It is slightly different under the hood, and I don't remember exactly how, but yes.) Because we didn't give it a signature, this is called a dynamic ufunc, just so you know the name. So how does it compare? Obviously, if you were doing this without Numba, you wouldn't have written that function with the math library, you'd have used NumPy. So if instead we write the same operation, sine of a squared times the exponential of b, using NumPy: for reference, our vectorized version takes 30.5 milliseconds on my machine and the NumPy version takes 33 milliseconds. So we're actually faster than NumPy, not by a lot; yours might come out a little slower, they should be about even, and that's roughly where it lands for something like this.
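A sketch of that pattern; the array sizes are arbitrary, and the vectorize()(...) call form mirrors the "extra parentheses" style mentioned above:

```python
import math
import numpy
from numba import vectorize

def trig(a, b):
    # Written against the math module, so on its own it only takes scalars.
    return math.sin(a**2) * math.exp(b)

# No signature given, so this builds a "dynamic" ufunc that broadcasts
# the scalar function over whole arrays.
vec_trig = vectorize()(trig)

a = numpy.random.random((1000, 1000))
b = numpy.random.random((1000, 1000))
result = vec_trig(a, b)                      # arrays are fine now
reference = numpy.sin(a**2) * numpy.exp(b)   # the plain NumPy equivalent
```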
Now, it is possible when using vectorize to specify the type signature, to tell Numba which variable types to expect. In that case we say vectorize and, in those inner parentheses, we give it a string. The string here says that the float64 on the outside is the type of the return value, and inside the parentheses are the types of the a and b we're passing in; then you pass the function you want to vectorize. If we run this, what kind of speed-up do we get? Why do I show you this? The reason, I promise, is that there's a positional argument you need to fill, and the positional argument it expects is a type signature. We give it one because, once you have that, you can add a keyword argument called target, and one of the targets is called parallel. So, same thing: this is still the same trig function written with the math library, I have my signature, target equals parallel, and I'm giving it a different name (roughly as in the sketch below). And now it takes 8.15 milliseconds. That's the ball game right there; automatic parallelization is supposed to be a joke that doesn't actually work, and here it is. There is also a GPU target, target='cuda', which works a surprising amount of the time if you have a capable GPU. This is, to my knowledge, the fastest and easiest way to parallelize things in Python; let that sink in. It will automatically use as many cores as you have on your machine, it'll take everything, which is something to be aware of, but it works. Again, there is overhead: it uses threading in the background, and there's always a little setup cost, so if your vectorized function is just adding two numbers together, doing this will probably be slower, because the overhead of setting it up cancels out the gain unless the problem is sufficiently large. It's good to check both. Also, about signatures: if you want multiple signatures, multiple types your vectorized function can accept, they need to be listed in order from most specific to least specific, or whatever the right word is: int32 has to come before int64, which has to come before float32, and so on. If you do that, your function returns the same kind of type you give it, so integers get treated as integers and you get an integer back, as opposed to when you give it floats. And you can always check how many type signatures your vectorized function has by looking at its ntypes attribute. Anybody have any questions about that? Okay, cool.
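Roughly, the signature-plus-parallel-target version looks like this; the function name is made up and the array sizes are arbitrary:

```python
import math
import numpy
from numba import vectorize

# The explicit signature fills the positional argument so that the
# target keyword can be passed; 'parallel' threads across all cores.
@vectorize(['float64(float64, float64)'], target='parallel')
def vec_trig_parallel(a, b):
    return math.sin(a**2) * math.exp(b)

a = numpy.random.random((1000, 1000))
b = numpy.random.random((1000, 1000))
result = vec_trig_parallel(a, b)
```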
So let's do an exercise now: clipping an array. There is already a numpy.clip, so we shouldn't have to do this, but it's an exercise. What you should do is write a function, using vectorize, that clips the values so that there's no value less than a_min (I guess; nobody left the name in) and no value greater than a_max; you give it an array, or a value, and the bounds. Take about five minutes for that. Everyone get there, more or less? Okay. So this is what I have: I called it truncate. You have to think about things in a scalar way and let that be expanded to the non-scalar case. You write the function to operate on one element at a time, because that's all you're being given: vectorize takes care of the broadcasting and makes sure the function gets applied to each element of the array in turn, but what we write is a function that operates on scalars. We give it a scalar, a scalar min, and a scalar max, and it gives us back something. (Technically speaking, mine is mathematically imprecise, since I wrote strictly greater-than where it probably should be greater-than-or-equal, but what are you going to do.) Just in the interest of showing how performance might go, I've vectorized mine twice, once using the parallel target and once not, so let's see what happens. I have an array of length 5000, and random always gives values between 0 and 1, so I'm clipping between 0.2 and 0.6, which should be in range. The serial one took six microseconds; my bet is that the parallel one is not going to be much faster, because six microseconds is already pretty fast. And yes, it's slower. Okay, it's like 20 microseconds slower, so not a lot, but it's a way of saying that sometimes parallel is not the right choice: if something is relatively small, and five thousand elements is pretty small as these things go, especially for an operation that's just an if and a return, which is pretty light, the threading overhead dominates. So be aware that parallel is not always the best choice; often it is, but not always, so test things out and check after you change something rather than assuming it'll work well. And we have the comparison: always a good idea to check what the NumPy built-in does. Oh, that was unexpected; I think last time NumPy was still faster. Again, when you're talking about thirty-some microseconds, CPU jitter comes into it, so figure they're about equal. And then, to show where I found the parallel version starting to become worth it, it can depend on the size of the data you're passing in: in serial, with an array a hundred thousand elements long, versus in parallel, I actually get a pretty good speed-up. So it just depends on how big the array or the data you're operating on is, and also on your computer, how many cores it has, and probably a bunch of other things I'm not aware of. What does NumPy do here? I can never tell when NumPy is doing things in parallel or not, because MKL is weird and great, but it's hard to track.
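One possible version of the clipping ufunc along the lines just described; truncate and the bound names are my guesses, and numpy.clip is the built-in to compare against:

```python
import numpy
from numba import vectorize

@vectorize(['float64(float64, float64, float64)'])
def truncate(a, amin, amax):
    # Scalar logic only; vectorize handles broadcasting over the array
    # (and over the scalar bounds).
    if a < amin:
        return amin
    elif a > amax:
        return amax
    return a

values = numpy.random.random(5000)
clipped = truncate(values, 0.2, 0.6)
assert numpy.allclose(clipped, numpy.clip(values, 0.2, 0.6))
```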
Okay, so if you recall, we started this little ufunc journey with the example that looked like f(a) = log(a / (1 - a)). Now I want you to write that, but in a cleaner way: the second exercise on that page is just to write a vectorized ufunc that returns the log of a over 1 minus a. I'll give you a few minutes for that and then wrap it up. So, my logit ufunc looks like this: return math.log(a / (1 - a)). You write the scalar operation that you see, let Numba do the rest for you, and it works. A question about whether you can pass the types as objects rather than strings: yes, I think that works here because somewhere earlier in the notebook I imported them, I hope. If they're imported, it works that way, and you can import them from either numba or numpy and pass whichever. I switched these all over to strings just so there wouldn't be missing imports, but yes, both forms work. The last thing I want to show is stopping being true in either the most recent release of NumPy or the upcoming one, so I'm about to find out, because I forgot to check earlier; we'll see. If I create a vectorized function that calculates the discriminant, b squared minus 4ac, and I make my a, b, and c sufficiently large, say ten thousand elements each, then the vectorized version of the function takes 611 microseconds, and here I'm doing it in plain NumPy, just writing b**2 - 4*a*c. Now, I know there have been recent improvements in NumPy to alleviate some of this, but one of the things that happens, or at least used to happen, in NumPy is that each of these operations creates a temporary array: you get an array equal to a times c, then you multiply that by four, then you get an array equal to b squared minus 4ac, and those copies, while relatively cheap, do detract from performance a little. I think that's mostly been eliminated at this point, but if the temporaries are large enough to, say, blow out your cache, you'll see this kind of performance difference. Numba doesn't do that: because it has already parsed the whole expression, it doesn't have to materialize those intermediates, so you can get somewhat surprising results like this, where you'd expect a simple operation to perform the same either way but you actually get a bit more speed out of Numba a lot of the time. Also, if you ran that, I would run del a, b, c afterwards, because these arrays are kind of big and otherwise you might run short on memory. All right, I have a section on using generalized ufuncs that I'm going to skip, for a reason that should become clear soon: I think they're no longer as useful, and I'll explain why shortly. Next I want to go to this little tips-and-FAQ notebook. You're welcome to run through the cells; I'm going to do it in sort of a demo mode, explaining some helpers, things to be aware of, and potential pitfalls. So: can a vectorized function call a jitted function? If you wanted to do that, I forget the answer, yes. Good things to know. Can a jitted function call a vectorized function? I've defined a vectorized add, and add_back calls add; yes, okay, cool, both of those work. Numba is usually aware of itself. There was a question earlier about error checking, about raising errors: I didn't think you could raise errors inside a jitted function, I didn't think that worked, but I could well be wrong. Okay, Numba is amazing: Numba can raise errors inside jitted functions. That's incredible; I've learned two things today, maybe three, I've
lost count. That's preposterous, but great news; good on the Numba devs for that. What you cannot do, or at least used to not be able to do, is an isinstance check, if you want to see whether something is the type you're expecting. One of the ways around that is a thing called generated_jit, which is yet another decorator, and it's a little bit confusing. In the case I'm thinking of, say you want a function that sums all the elements of an array: if you just write that with @jit and call .sum(), then passing in an integer makes it unhappy, because that method doesn't exist on an integer; it doesn't understand it. So if you want a little more, what do we call it, defensive coding, there we go, you can use generated_jit, and inside generated_jit you can use isinstance. In this case, if the argument is of the Array type, you return, and this is the confusing part, a function that does the thing you want, not the result itself; I usually use lambdas for this (there's a small sketch of the pattern below). So if it's an array, I return a function that sums the array, and if it's not, I return a function that hands back the thing itself. There are still ways to break this six ways from Sunday, but it lets you run this sum function on an array or on an integer. It's one of those little niche bits that you might occasionally need; I've used it maybe once in actual code, but it does come up, so it's good to know about. Oh right, I mentioned this before: we were adding strings and things earlier, and one way to handle wanting different functions called depending on the type of your inputs is that you can do that, as long as those functions are themselves jitted. They can be jitted in object mode, so they can handle things like concatenating strings, but they need to be known to Numba. In this case I have this add function: I jitted it in nopython mode for numbers, and I jitted it plainly for strings, and then depending on what I pass in I can have it return either the string concatenation or the numeric addition. If you want to disable JIT compilation for testing, or if something's broken and you don't know where, you can set an environment variable (NUMBA_DISABLE_JIT); with a notebook you need to set it and then restart your notebook server, it has to be in the environment for it to take effect. That lets everything still run, but stops JIT compilation from actually happening, so things get slower, but if you need to find out where something broke, it's a good way to find the bugs. The one time I tried it, it didn't quite work that way for me, but things change.
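Circling back to generated_jit, the pattern sketched above looks roughly like this; safe_sum is a made-up name, and the key point is that generated_jit sees the Numba types of the arguments and returns the implementation to compile:

```python
import numpy
from numba import generated_jit, types

@generated_jit(nopython=True)
def safe_sum(x):
    # Here x is a Numba type, not a value, so isinstance checks work.
    if isinstance(x, types.Array):
        return lambda x: x.sum()   # implementation used for arrays
    else:
        return lambda x: x         # anything else just passes through

print(safe_sum(numpy.arange(10)))  # 45
print(safe_sum(7))                 # 7
```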
Also, remember that functions are compiled, so if you have global variables, they're only global up to the point where you compile something, and then they're frozen. If a equals 5 and you have a jitted function that adds a to its input, that works, you get 12; but if you then change a, the jitted function still uses the old value, because a got baked in at compile time. So the solution, as I've written here, is: don't use globals, don't do that, it's a terrible idea. But in the event that you have to, you can force a recompile of a jitted function: I've updated a to 2547, I call recompile, and now I get 2554. Good questions, all. All right, we have about 30 minutes left; I'm not going to take that long, so you'll get out a little early, but I want to show you this. Oh yes, absolutely: you could write a jitted function that does what vectorize does, where you just loop over all the elements, for i in range(len(a)): c[i] = a[i] + b[i]. It's just simpler, usually, for me to write the vectorized version if I know the function is going to be applied to each element of an array individually; if it's element-by-element like that, I usually use vectorize, with the added benefit of that parallel target being an option when the work is expensive enough to parallelize. Having just said that, I'm about to contradict myself. I use vectorize for simpler things; it can be really hard to take a complicated operation and turn it into something vectorize-friendly, so usually going to @jit is a good place to be, and sometimes vectorize turns out to be a little bit faster. This part really is show and tell, because it only works on the development version of Numba right now, although, in Continuum's continuing quest to make me look silly, they released a release candidate of this about six days ago, so it should be out and about in the near future. You can install the release candidate later today if you want (don't do it now, it'll kill the Wi-Fi) from the numba conda channel; installing numba from that channel will give you the release candidate, version 0.34. And they added something. Here I have something similar to what we did with vectorize: this do_trig function, sine of x squared plus cosine of y, and I define two arrays, each about a thousand by a thousand: 31.4 milliseconds. Seems reasonable, and honestly we probably won't beat it by much; jitting it plainly is a little bit slower but about the same. NumPy is fast; that's sort of what we've taken away from all of this. But they just added this thing, with help from Intel, who have a project called ParallelAccelerator, and it's a parallel keyword for @jit, and it works really well. Really, all I did, and I wrote this in more lines than I had to, is put @jit with parallel equals true on the function (roughly as in the sketch below), and it automatically parallelizes it, with caveats I'm about to go into, but it works very well. I've played with it a lot over the last two weeks; it's quite fast, and I recommend it. Now, we saw before that if you unroll operations you get more speed out of them, right? So this version is unrolled, again not parallel, just unrolled: 28.2 milliseconds compared to 31.4, a very minor speed-up. You can parallelize the unrolled one, and, a little counter-intuitively, it's not faster. I know, what's happening? Okay, here's the deal, and I've written it here: automatic parallelization is a pretty hard problem.
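A sketch of the new parallel option as described; this assumes the 0.34 release-candidate behavior, and the name do_trig and the array sizes follow the description above:

```python
import numpy
from numba import jit

# parallel=True asks the ParallelAccelerator machinery to parallelize
# the NumPy array expressions inside the function automatically.
@jit(nopython=True, parallel=True)
def do_trig(x, y):
    z = numpy.sin(x**2) + numpy.cos(y)
    return z

x = numpy.random.random((1000, 1000))
y = numpy.random.random((1000, 1000))
z = do_trig(x, y)
```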
This automatic jit-parallel thing knows about NumPy: basically, it parallelizes around NumPy array operations. That's why it can make the first example faster: because I'm just writing numpy.sin(x**2) + numpy.cos(y), it knows that, assuming x and y have compatible shapes, the same shape or broadcastable shapes, it can break the work down, chunk it up, send it out, and run it in parallel. That's brand new, really brand new, and it's very powerful. Sometimes. So, I have a little FAQ that I made up; no one has actually asked me these questions, but I predicted some of them. Why didn't I tell you about this before? It really is brand new; Numba keeps releasing things five to ten days before I run a tutorial, so maybe I need to shift my schedule a little earlier. Is regular @jit just dead now? Maybe. Honestly, I haven't seen a performance hit from using this yet: it either doesn't do anything, which is fine because @jit is already fast, or it makes things faster. We may soon all end up with a parallel-jit decorator because everybody's tired of typing parallel equals true; we'll have to see, it's early days, but it could happen. Is all the stuff about vectorize useless now? No, it's not, otherwise you'd really be angry with me right now. Remember we did this before with math: 20.4 milliseconds, versus what we saw from NumPy. If you run the vectorized version in parallel, and again it's a little more work to write it that way, that's 6.38 milliseconds, versus the twelve-ish I quoted up there, so it's still a factor of two or so faster. So, from my experiments over the last week or so: for things you can express with vectorize, you should still use vectorize, and use the parallel target, so long as the work is big enough to see the benefit; otherwise just use @jit, for everything, all the time, because it's super fast. And guvectorize, that generalized-ufunc thing I skipped, is for things that aren't easily vectorizable, like a stencil where you need elements i-1 and i+1, but it's basically being made redundant by this parallel option, so it's not worth the pain of figuring out how it works when it ends up slower than this. Then here again is that discriminant we did before: NumPy takes about 6.9 milliseconds; if we vectorize it, it takes two milliseconds, or is it four, sorry; what size are these, maybe a thousand by a thousand. If we parallelize that using vectorize, it's probably a little slower here; that fix is on another branch that's going to get merged shortly, I didn't want to merge in changes the night before people arrived, but it's all available, or will be. So actually the vectorized-parallel version is a little bit faster, and if we just @jit it with parallel, that's 2.24 milliseconds, which is pretty good: not quite as fast as the vectorized version, but pretty close. So that's the new development in Numba-land, which is pretty amazing. That's all I have for you; I'm happy to take any questions
you have, and otherwise thanks for sitting through this. I'm glad you came. [Applause]
Info
Channel: Enthought
Views: 31,436
Rating: 4.9799333 out of 5
Keywords: SciPy, SciPy 2017, Numba, C++
Id: 1AwG0T4gaO0
Length: 145min 38sec (8738 seconds)
Published: Tue Jul 11 2017