Brute Force Processing

Captions
Hello! In this video I want to try and harness all of the power of my PC to solve a problem, but before we get started, three quick points. Number one: this could be considered quite advanced for this channel, but don't worry, I'm only going to look at some things in light detail; if you are interested in the specific details, the source code is of course available and we can discuss it on the Discord and in the comments. Point two: I've no doubt some of the armchair experts out there are going to poke holes all through this. That's OK, because really I'm just taking a very broad look at some of the techniques available to us. And point three: yes, the lockdown goatee is progressing just fine, thank you very much. So let's get started. All programmers, at some point during their programming careers, encounter a problem where they really wish they had more CPU power, and I'm going to demonstrate this by rendering the Mandelbrot fractal. Now, if you're not an expert on fractal geometry, and I am certainly not an expert on fractal geometry, don't worry; what it really means is we're going to be creating some really cool imagery, and we're going to go and explore it with this tool that I've created in the Pixel Game Engine. The idea of a fractal is that it's got lots of self-repetition, and those self-repetitions can go on and on and on, and you can just keep on zooming in on particular locations. This tool allows you to explore things as if it was a Google map of the Mandelbrot fractal. As I keep zooming in, I need to perform more and more calculations, and we'll look at the basic calculations of the Mandelbrot fractal in a minute. We can keep zooming in until we actually run out of precision on my computer; I'm going to be using 64-bit floating-point double types, and I think we're approaching that point now. Yes, there you see things start to get a bit wavy and wobbly, then pixelated, and ultimately completely collapse, so I'll just zoom back out.
And the nice thing is it zooms to the cursor, so we can really go and explore these patterns and see what's going on. I'm also quite aware that this is exactly the type of video that YouTube doesn't like to show, because there's lots of noisy pixels going on; it can't compress it very well and it will require more bandwidth, and during this period of lockdown, bandwidth is at a premium. So apologies in advance if it does look a little bit blurry; however, I'll upload the binary alongside the source code, so you can run it on Windows at least. Let's go and have a look here: lots of detail going on in this particular section, and the tool is quite interesting, because we can actually choose the amount of detail we want things to render to, and by increasing this level of detail we require more and more computations. So what I want to do in this video is compare various different approaches to solving this rendering problem, and I hope to take you on a journey from simple levels of computation to extreme levels of parallelism. Now of course, if we're doing a video that is primarily going to be showing fractals, it's useful to know what a fractal is, but they are quite a complicated thing, and if we look here on Wikipedia at the formal definition, the Mandelbrot set is defined as the set of values of c in the complex plane for which the orbit of 0 under iteration of the quadratic map z(n+1) = z(n)^2 + c remains bounded. Indeed, we've not dealt with complex numbers on the channel before, and I've no intention of starting now, but we can take a rather simplistic look at it. The complex plane really consists of two axes, the real axis and the imaginary axis, and we can simply consider them as x and y. The objective is that for every point on that plane we repeat a calculation until that calculation satisfies some constraint, and when we visualise the Mandelbrot, what we are actually visualising is the number of iterations it took
before that constraint became satisfied. This means we can start to reason about how to draw these things. We know we've got a screen, and in this case, because I really want to test the processing capabilities of my computer, I'm actually using a high-definition Pixel Game Engine application: I've got 1280 pixels across and 720 pixels down. The screen is of course in pixel space, or screen space, but I want the boundary of the screen to be represented in the fractal space, and so at all times my screen is actually going to reflect a small window into the entire domain of the complex plane, or the 2D plane that represents our fractal. Just for illustrative purposes, we can assume up here, for example, I've got (0, 0), and down here I might have something like (2, 4), so what I'm actually drawing in the background is in a world space. We've frequently done screen-space to world-space transformations before on the channel, and in fact, if you do want to look into that, you can look at my panning and zooming video, because I'm using code from that video to implement, well, the panning and zooming. As we move around the world, we effectively change these coordinates, so we could end up in a situation where the entire distance across the screen is really, say, 0.7 in the domain of our fractal, and going down might be something like 0.5. So for every pixel in my screen space, I can work out the position in world space by taking this distance and dividing it by the number of pixels; therefore I can create a mapping that will take any pixel in my screen space and give me the precise floating-point coordinate in world space, and what we will do then is calculate what's necessary to tell us what the fractal looks like at that coordinate. One of the problems we'll encounter with fractals is that as we're zooming in, these numbers get smaller and smaller; in fact it's not uncommon to start seeing numbers like this, because fractals can just keep on going in certain regions.
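The pixel-to-world mapping just described can be sketched like this. The names here are illustrative only, not taken from the actual source code or the pan-and-zoom video:

```cpp
// A minimal sketch of the screen-space to fractal-space mapping:
// divide the fractal-space extent by the pixel extent to learn how
// much world space one pixel covers, then scale and offset.
struct Mapping {
    double worldLeft, worldTop;   // fractal coords of the top-left pixel
    double xScale, yScale;        // fractal units per pixel
};

Mapping makeMapping(int screenW, int screenH,
                    double fracLeft, double fracTop,
                    double fracRight, double fracBottom) {
    return { fracLeft, fracTop,
             (fracRight - fracLeft) / double(screenW),
             (fracBottom - fracTop) / double(screenH) };
}

// Any pixel (px, py) maps to a precise double coordinate in world space.
inline void pixelToWorld(const Mapping& m, int px, int py,
                         double& wx, double& wy) {
    wx = m.worldLeft + px * m.xScale;
    wy = m.worldTop  + py * m.yScale;
}
```

For a 1280 by 720 screen spanning the region (-2, -1) to (1, 1), pixel (0, 0) lands on (-2, -1) and pixel (1280, 720) lands on (1, 1), with every pixel in between getting its own unique coordinate.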
We're going to run out of floating-point precision, as I demonstrated in the introduction when everything fell apart. There are of course techniques to try and remove this limit of precision, but we're not going to be looking at that today; all of my floating-point calculations are going to be C++ doubles, which are 64 bits, to give us the most conventional precision we can get. The Wikipedia article goes on to detail some pseudocode for what is called the escape time algorithm for plotting a fractal, where for every pixel on the screen we do the following calculation, and what we can see fundamentally is that there is a calculation containing a loop with an iteration counter. As I said before, we're going to be visualising the number of counts there are before this condition becomes true, and internally in that loop the maths is actually very simple. I guess this is what is so wonderful about fractals and why people enjoy them: how can such simplicity deliver such complexity? But that's definitely a topic for a different channel, I think. The point I'm trying to make is it doesn't really matter if you understand fractals or not; what's important is that it's a problem to solve, and a computationally tricky problem at that, and we're going to look at developing this application with various different processing techniques to try and solve it more effectively. So this is a skeleton Pixel Game Engine application; well, not quite a skeleton, it's actually fleshed out with a few things already, and you can see I'm creating it at 1280 by 720 with a one-to-one pixel ratio. This means it will be quite a nice high-definition application, and it also means we've got a lot of pixels to process. We'll just have a quick look at what the code is doing: in OnUserCreate I'm allocating some memory that's going to store the count of the iterations per pixel, so this array is the same size as the screen, but the action happens in OnUserUpdate. The first thing I'm
going to do is get the location of the mouse, and then I'm going to implement precisely the same code I used in the pan and zoom video that will convert the mouse location into the world location and handle zooming with the mouse wheel or the Q and A keys. Once I've got my transformation from screen to world, I can work out where I am in fractal space, and I do that by taking the top-left pixel and the bottom-right pixel, which of course are (0, 0) and (ScreenWidth, ScreenHeight), and I use the ScreenToWorld function, again from the pan and zoom video, to translate those into this world coordinate system, so somewhere in fractal space. I've then got a little bit of user input: I'm going to use the numeric keys to switch between processing methods, and the up and down arrow keys will change the number of iterations we're going to calculate. And here is the first important part: instead of using frames per second as a performance metric, I'm actually just going to time, with a high-resolution counter, how long it takes to call this function, CreateFractalBasic, to which I give the pixel at the top left, the pixel at the bottom right, the fractal's world top-left coordinate, the fractal's world bottom-right coordinate, and the number of iterations. We'll have quite a few of these functions, all with the same set of arguments; each is basically saying, please generate the fractal for this region. Once the fractal generation has completed for that frame, I then need to draw it to the screen, because so far all I've got is a 2D array of integers which count the number of iterations that each particular pixel required to satisfy that constraint. So I'm going to have a pair of nested for loops that iterate through every single pixel, and I'm going to colourise those pixels. A big shout-out to Erickson, because what he doesn't know about fractals simply isn't worth knowing; he provided me with this little colourisation function, which can take the number that's taken from the fractal
array and, using some magic maths, converts it into a nice colour, and that's how we got those nice smooth transitions as we were zooming through the fractal at the beginning, so thank you Erickson. Once we've drawn the fractal, I'm then going to render some information about how long it took and how many iterations, and the rest of the class just contains these WorldToScreen and ScreenToWorld functions, which handle the panning and zooming. There are two reasons I'm not using frames per second as a performance metric. Firstly, I'm uninterested in how well the Pixel Game Engine is performing in this task; I really just want to time how long it takes to generate that array of iterations which tells me what the fractal looks like. But secondly, and this is a trap for new players, frames per second can be a little bit misleading and may lead you to try and optimise your programs in an incorrect way. We know that frequency is related to one over the period of time, the duration of the frame, and so if we plot this frequency, or let's consider it frames per second, versus time, we get a relationship like this. Up here, let's say we've got a really high frame rate, 500, and we make a change to our algorithm and notice that it drops to 400 frames per second; naturally we can calculate the time difference here simply by looking at the two values, and that was for a drop of 100 frames per second. Now let's assume we have 200 frames per second dropping to 100: again we've only dropped 100 frames per second, but the algorithm is taking much, much longer, and this is why it can be a little bit deceptive to consider frame rate as a metric of performance. It is accurate, but it's not very intuitive to think of it in this way; dropping 100 frames per second from 200 frames per second is not the same as dropping 100 frames per second from 500 frames per second, and this is why I prefer to simply take frequency out of the equation and measure time, because if something takes twice as long as something else, it takes
twice as long as something else; there's no other way to interpret it. So here I use the rather wordy chrono library to give me two time points using a high-resolution clock, and the only thing I've got going on between those two time points is the generation of my fractal. So let's take a look at this function; this is my first-pass, most basic, naive implementation of the Mandelbrot fractal, CreateFractalBasic. Now, to all of these functions I'm passing in the rectangular region of pixels, which in this case is going to be the full screen, then I'm passing in what region that corresponds to in the sort of numerical fractal domain, and then I'm telling it the maximum number of iterations it can go up to, because I do actually want it to finish; I don't want it to just keep computing forever and ever and ever. I can work out the scaling factor based on the number of pixels, as I mentioned in the slides at the start, and then I just simply iterate through every single pixel in the required range, and I'm going to be using the standard complex number type, because we're working with complex numbers, to implement the Mandelbrot calculation. I initialise one of the values based upon its location in the 2D space, and I initialise the other value to zero, and then I sit in a rather tight loop modifying this z value while the absolute value of z is less than two and I've not gone beyond the number of iterations I've allowed for this calculation. Either way, eventually this loop will stop, and I can store the number of iterations in my big fractal array. This is basically a direct implementation of the pseudocode provided on the wiki page, the only difference being that I've used this inbuilt complex number type provided by the standard template library. So let's take a look. Right, well, it's actually taken quite a while to start, so I'll just use the middle mouse to pan over the fractal and... oh dear. As you can see, it's very slow; it's taking about 2.4 seconds in this case. Now we've got
more information going on here, it's taking almost five seconds to update the frame. Hmm, time for the first stage of optimisation, and some of you may laugh, because you've probably noticed what the problem is: have you actually compiled your program to run in release mode and not debug mode? A lot of new people, particularly those just coming to Visual Studio, will click that play button and not realise that by default it's actually going to run in a specialised debugging mode, which is a lot slower than the release code you would give out to customers or end users of your software. So let's see what difference that makes; on Linux systems it's not quite the same, but you can specify a compiler flag to achieve the same result. So now we're building in release mode, 64-bit release mode in this case; let's see what happens now. OK, that's a lot more sensible: we've got a degree of real-time control over what's happening, and we can see that, given this particular set of equations to solve, it's taking about one quarter of a second to generate the 1280 by 720 pixel image, allowing up to 64 iterations maximum per pixel. So I can zoom in with the scroll wheel, and we see the time doesn't change with the zoom, but eventually we just sort of run out of things; the fractal seems to just stop, and that means we need more iterations if we want to keep on zooming. So I can press the up key and give it another 64 iterations of calculation; this has of course significantly affected update time, it's now taking about half a second per frame, and again the fractal has stopped prematurely, so really we want 192. Let's just keep going: 256 iterations, 320... it's grinding to a halt now, almost about one second per frame, which does start to make things a little bit unusable. However, what we can see is we are rendering a Mandelbrot, and to all intents and purposes, and to my fully ignorant eyes, it looks OK. So for now we can assume it's taking about one second to do three hundred and twenty iterations, and if
we look at this little graph over here on the right, it's fully saturated one of my CPU cores. Now, I have a four-core processor, and that has another four virtual cores, so there are eight cores available, and I've completely filled one of them. But before we worry about parallelism, are there any optimisations we can make already? Well, if you take a naive look at the code, and perhaps you're just starting out and haven't built up any experience yet, there are certain things you might think will help you out. For example, for every single pixel we're calculating this multiplication and this multiplication, and we're doing one down here, and these are pretty much always going to be the same, so are there things we can pre-calculate to take out of this tight loop? We're also doing other things like initialising this x variable every single time. So let's create a second function where we can explore pre-calculating as much as we can. The scaling side of things hasn't changed, and I'm going to stick with my loops, but I'm going to try and declare everything before we enter the loops; perhaps that'll make a difference. Here are my variable types, and here are my two loops. Looking at what's going on with this calculation here, there's a linear relationship between the pixel position and the value I'm constructing this complex number with; in that case, instead of calculating it each time, I can perform a running accumulation as we iterate across a particular row or a particular column. So I'm going to store two additional variables which represent the start of my fractal's rectangular region in space, and I know that the value of my x and y scale variables is the equivalent of how much fractal space we change by if we move one pixel along one axis. So as I iterate across the screen in the x direction, I can simply accumulate the x scale value to give me the equivalent x position in fractal space, and I can do the same outside in my y loop. I must remember, though, that as I'm iterating through the Ys,
I need to reset my Xs. This means in the loop I can now initialise my variables directly; there's no calculation required, so we have reduced this repeated calculation for every single pixel into simply an addition. There isn't a great deal I can do to change the nature of the fractal calculation just yet, so that remains the same, but we do need to store the result, the number of iterations, for that particular location. In the original, this is where we were doing that, and we can see we've got another multiply in there; perhaps we can pre-scale so we don't have to do that calculation every pixel. I'll add in two more pre-calculated variables: one is called y offset, and one is the size of a row in pixels, because every time I move along an entire row on the screen, I effectively move down one in the y axis. Therefore, for each y iteration in my loop I'll increment my y offset, and this means I can store the number of iterations in my fractal array using a simple addition. I'll now quickly add the code into our framework to call this function instead, so let's take a look. So here I have the original method; first I'm just moving the Mandelbrot in so it's actually got things to calculate, and for sixty-four iterations, using the naive approach, it's taking about 0.25 seconds, maybe a little bit less. OK, so I'll press the 2 key and we'll move to our new pre-calculated method, and it's taking... well, about 0.25 seconds, a little less. I.e. we've really not achieved anything at all by doing this. Well, that's quite disappointing, but it's not entirely unexpected; I naively assumed I could do a better job than the compiler at optimising the code, and indeed I couldn't. The code performs pretty much the same; it probably is marginally quicker, because the way we're doing things is slightly different, but it's barely measurable. It's no good, and indeed, if anything, all we have successfully achieved is making the code more difficult to
understand. So there's a little lesson to be learned here: don't try and outwit the compiler unless you really know what you're doing. We've gone from reasonably simple code, where it's quite easy to see what it's doing, a loop with some calculation going on in it, into something, well, a lot more fragmented and nowhere near as intuitive to understand. I know what the problem is: it's that stupid standard template library rubbish, isn't it? Everybody says it's really slow and bad, so I'll get rid of this std::complex and do it myself. Yes, that's got to be it; nobody likes the standard template library, so that must be the source of all my problems. So let's create another function, and we'll hand-code the mathematics this time. I'll call this function CreateFractalNoComplex, again with exactly the same arguments. Even though the previous code was a little trickier to read, it probably did actually buy us a slight optimisation; if nothing else, it got rid of some unnecessary multiplies, so I'm going to keep a similar structure, and I'll start by simply copying the code we've got already. It's my intention to remove the std::complex component; to do that would require you to know a little bit about what a complex number is, but fortunately, the only thing we're doing with the complex numbers here that is slightly strange is working out the absolute value and squaring them, and we'll look at that in a minute. For now, I'm going to replace them simply with direct double values that represent the real and the imaginary components of the variables we had before. Inside the loop, I'm going to replace this line with this exploded maths: complex numbers have their own set of rules for doing certain operations, and I can assure you I'm doing nothing particularly fancy here other than implementing those rules directly. What was z is now z-real and z-imaginary, but we also need to change the nature of our condition: we want to work out the magnitude of our complex number, for which we normally call the abs
function, but I can replace that with the following. Now all that's left to change is our initialisations. So how have we managed to go from "the absolute value of z is less than two" to "this calculation is less than four"? For all intents and purposes, we can consider a complex number to be a 2D vector: it has an x component and a y component, and when we think of it as a vector it's far less scary, because we know that the magnitude of a vector, its length, is the square root of x squared plus y squared; it's Pythagoras. The constraint in our fractal's while loop is fundamentally checking to see if the magnitude of the complex number is less than a certain value; in effect, I'm looking to see if the following is true. Well, given that x squared plus y squared must always result in a positive number, I've absolutely no need to do this square root at all if I square the other side of the inequality; therefore I can quite happily check for x squared plus y squared being less than four. This is a nice optimisation: we've removed the potential for a square root calculation. Everything else in my CreateFractalNoComplex function is the same as before, so I'll add it into the program and we'll take a look. Starting with the naive method, I'll just bring the fractal back in to give us some more data, and it's taking about 0.21 seconds; if I press the 3 key, we'll bring in our no-complex implementation, and... great! Oh, we must be so much better than the standard library, we've halved the time required! Perfect. So that means, in theory, we can do twice as many iterations now and maintain the same level of performance that we started with. Well, as much as I'd love to lay claim to being a better programmer than the countless experts who have provided the standard library, unfortunately I can't; I don't believe that taking out the std::complex components is what optimised this code, and I'll show you why. It is very
tempting to suddenly think that you're a much better programmer than the rest of the world, but in reality we've not changed anything here at all: the standard complex operators will simply resolve to what we've already got, so there's no real gain; the compiler is smart enough to do that. The one thing we changed was the nature of this condition: we've changed it to two multiplies, an addition and a comparison, and if we look at the original implementation, what we were actually looking for was the absolute value of z being less than 2. For our needs, conceptually, we have optimised that function; however, what we have done is sacrifice the generic ability of the absolute function to do its thing, and that's what the standard library is pretty good at. That function guarantees it provides a sufficient level of precision and precisely the right answer in all situations; our method doesn't do that, it simply gets directly the result we need in order to satisfy the fractal calculation. We can test for this by using the Peek Definition feature in Visual Studio: if we have a look at the definition of the absolute function, and have a look at that, and have a look at that again, once we get through all of these different layers, fundamentally it calls the hypot function. It's quite useful to look at the standard hypot function; it's part of cmath and is specified by the C standard, and of course right now I'm using the Microsoft Visual Studio compiler, so whatever implementation it's using for that function I can't readily access. But what I can do is look at how the GNU implementation works, and this is suddenly very revealing: whereas we had a pair of multiplies and an addition, the actual hypot function is considerably more sophisticated, with all sorts of checks, and it still has the potential for a square root as well. There is a reason experts tend to write these functions, and there is a very
good reason you should use them at all times; but occasionally it is useful to know that if you can get away with some specific reinterpretation of the data you are using, you can actually gain some processing performance, though in turn we have sacrificed the flexibility of the functions. So, in effect, all we have done so far is get rid of the square root in this tight loop, and we've doubled our performance; that seems like quite a lot of effort to get to a rather unsatisfying conclusion. I think it's time to bring out the big toys to play with, and we're going to look at the extensions provided by most modern-day processors that allow us to manipulate numbers in parallel, and I don't mean threads just yet. So far we've been using the fully bog-standard arithmetic logic unit component of our processor: this allows us to provide two numbers and an operation, and it spits out a result, and we've been asking it to do that as fast as it possibly can whilst we're generating the fractal. What you may have noticed is that right now there's no need for data to communicate back with any memory; we simply read from memory, perform a computation upon it, and then ultimately write the result back to memory, and as we're generating the fractal we're doing that one pixel at a time, all the way along, down, and then all the way along again. Games developers have pushed CPU manufacturers for the last 30, almost 40, years now to develop better and faster hardware, and one of those things happens to be called a vector processor, and these have existed in various forms over the last 20 to 30 years. We used to have what was called the Streaming SIMD Extensions set, or I've probably got the order of that the wrong way around, and these days we have something called AVX; specifically for this video I'm going to be looking at AVX2 with 256-bit registers. These are quite useful because they're very flexible, and they allow us to implement quite fine-grained parallelism in our programs without a great deal of effort. Let me explain
why it's a bit different. They are literally a processor extension, and so conceptually, what I'm going to draw here is an extension to our ALU, and what I want us to imagine is this: instead of just having the one ALU, now we have four of them, and each one of these ALUs can take in a pair of doubles, and each one of them can spit out a result, but they all perform exactly the same operation. So if the operation is an add, all the ALUs will perform an addition. Specifically, the vector extension sets work with memory of a specific size; I'm going to be using AVX2, or as I call it, AVX-256, because that means it has a single register which is 256 bits long. So we'll start, naively, with bit 0 over here and bit 255 somewhere over here. One of the nice features provided by this ecosystem is that I can use these bits partitioned in various ways: I want to process doubles, and my doubles are 64 bits each, so I know I can fit four doubles into a single register for this extension: 64, 128 bits, 192, 256. I could, however, work with less precision and use 32-bit floats, which means I can operate on eight numbers simultaneously, all with the same operation but fundamentally all with unique data. In fact you can go even further: if you're working with integers and you have a 16-bit integer type, then naturally you can start fitting more of those into this vector processor. This vector processor follows what is known as SIMD, single instruction, multiple data: regardless of how we've partitioned up the vector, the individual data elements inside the vector remain distinct, but they all have to operate under a single instruction, hence single instruction, multiple data. If I now implement our fractal routine using AVX2, I'll be able to process four 64-bit doubles simultaneously. I'll also be able to take advantage of some bonus functions you get as part of this extension; there are instructions, for example, that allow me to multiply and accumulate at the same time, and all of these instructions are
guaranteed to happen fairly quickly in silicon; they could in many cases be far faster than the standard ALU way of doing things, so it's probably a good idea, when we're working with large data like this but with simple instructions, to think about using the vector coprocessors on your CPU. To start using such weird and wonderful components of your processor, you need to include the right header, and for C++ it's the intrinsics header, because I'm not going to be writing this directly in assembly. Fortunately, this library provides some convenient C and C++ wrappers for the individual functions, so they do actually compile down to the most minimal assembly you need, but remain quite convenient to use in regular C++ programming. So I'm going to add in yet another function, CreateFractalIntrinsics, and as before it's got a similar start, and I'm going to simply cut and paste in the intrinsic equivalent of the fractal generation algorithm. Kaboom! Yes, this is one of those rare occasions where I'm just going to ask you to take my word for it; I don't really want to go into detail in this video about what's going on here. If you are interested in how I've gone about converting the very simple fractal routine into using these intrinsics, then do shout out in the comments below and I'll make a video specifically about that conversion process, but what I want you to take from this is that it does work: it's converted over to intrinsics, and I am indeed going to be operating on four pixels simultaneously with each instruction. So yes, I've no plans on detailing precisely what's going on here, but if you are curious and do look at the code, what happens is our 256-bit registers are specified as being comprised of four doubles, all in parallel, and so all of the calculations on those doubles happen in parallel too. In this instruction, I'm multiplying four x position offsets by four x scales and storing the result back in the x position offsets; you can think of it each time as working with a little array of four.
Although things do start to get tricky, and that's why it needs to be a specific video rather than going into the detail here: you can't implement "if". The condition that our fractal stops on is significantly more complicated now, requiring a little bit of binary mathematics to implement. You can see the loop is actually here, using this goto and "repeat" label, and once it's finished looping we unpack the four values in our special 256-bit register into the fractal array at four adjacent locations. But like I say, worthy of its own video if there is enough interest, so if you are interested do shout out in the comments below. So, time to add in that method; let's take a look. Here is our naive approach again; bring it to the middle: sixty-four iterations, about 0.25 of a second. If I now compare that to using the AVX instruction set, suddenly we're an order of magnitude better, so this has been a real performance change; in fact, you should always try to measure performance in orders of magnitude. This gives us the real ability now to start navigating the fractal, and we can see we've very quickly run out of iterations, so I'll ask it to compute some more, and it can happily do that without sacrificing too much performance. This is very nice: as we start to zoom in and the demands become greater, we add in a few more iterations, and we do see that it takes a bit longer each time. I've changed this now to 1,024 iterations, and we're looking at updating the entire image in about 150 milliseconds. In fact, at this level it's quite possible to start reaching the end of our floating-point precision, but as we zoom in and things require more and more iterations we do start to slow down, so now we're at about a quarter of a second; this was originally the time for our naive method to do sixty-four iterations. Let's go back to our naive method and see how it performs. I'm just waiting for it to do the computing, and now it takes about 1.9 seconds to actually update the
screen; two seconds, in fact, in that instance. That's 1,024 iterations, and if I go back to the AVX implementation: yeah, a good order of magnitude better, so it's still quite usable for exploring the Mandelbrot fractal, and you'll see we're still only using one CPU's worth. That's actually quite a good improvement, but I do have more than one CPU core, so let's have a look at using our very fast vector-extension implementation, but in multiple threads. Naively, this is quite an easy thing to implement now using some of the more modern C++ features. Threading has always been quite tricky, but the newer features (I say newer, I really mean as of C++11) do actually make your life a bit easier in this instance, and so here's another function, create fractal threads. At the moment we're trying to process the full screen using a single thread. Threads allow us to do things in parallel, and they'll occupy more and more CPU cores in order to do that. One of the advantages we have working for us here with fractals is that we never need our threads to actually communicate any information with each other, nor do they interfere with each other's memory, so in reality working with fractals and threads is quite trivial. I'm going to take my entire screen and then partition it into the number of threads I want to test. I'm going to partition it in columns, so if I had four threads working for me I'd have thread 0 over here working on this column, then threads one, two and three, and they'll all go away and chew on their data. They might not all finish at the same time, but I'll wait until they have all finished and then I'll proceed, and that's going to be the bit that we time. Since we already have a function for calculating a region of the fractal, all I need to calculate is the sub-region that I'm going to allocate to each thread, and I'm going to add a max threads value to the program; do that at the top; I'm going to specify 32 threads. Now you might think, oh, you can't have
32 threads if you've only got eight virtual cores. Well, I can, because my threads are not all going to finish at the same time, and we'll see why when it starts rendering. In my function I'll create a small array of threads, and then for each subsection of my screen I'm just going to create a thread calling the create fractal intrinsics function we created earlier; the only thing I need to change is the arguments to that function, so we can specify that it only computes a particular region of the fractal. It's kind of amazing how these things come together, isn't it? By virtue of creating the thread, it will also start to compute, so I'll have a second loop which waits for all of the threads to finish, and that's it; we can now add this into the program. So let's take a look, and I'll start by going to our AVX2 implementation so I can move things around a lot more smoothly than I could before, and I'll press the 5 key now to enable threads. Well, we've got a small performance increase (go back to vectors, and this is threads), but it's not a huge one; wonder why that could be? Let's zoom out a little bit and find somewhere else to look at. I think we might need to zoom in quite a bit, so let's add some more iterations; I'll stick it back on threads mode as well, that would help, wouldn't it? We'll give it 1,024 iterations; I think that should now be our benchmark. So, 1,024 iterations, and I'll just quickly compare that with the vector method: 0.04 versus 0.02, so potentially we've doubled our performance by moving to threads. Now, the reason it's not consistent is that in this particular instance, for example, there's not really much to do, and a lot of our threads will just hit the end condition, whereas over here there might be a lot more to compute; in fact it's spatially dependent. We're partitioning our screen into vertical columns processed per thread, so some threads are definitely going to finish before others.
This means that, depending on how busy one particular column of the screen is, that column will actually be the slowest part of our algorithm, and everything else will be dependent upon the complexity of that column. Let's keep zooming; I love fractals, and you never get bored of this because you can always find something different or some interesting pattern. Maybe we're getting to the point where adding more iterations is worthwhile. I mean, we can still keep computing these iterations, but of course our while loop will exit at some point, so we might not actually always compute that many. That's one of the difficulties in benchmarking something like this: you really do need to understand what the fractal is doing, and which regions are going to require more processing than others, in order to do a fair comparison, but for our simple needs this is good enough. In fact, we can now reach the limits of precision, and you might not be able to pick this up in the YouTube video, but you certainly can on here: see the vertical columns appearing in the fractal data; in fact there will be 32 of them, one per thread. That's because we're getting to the point now where we have limited precision on our double values, and so some of them are duplicating, and there's a little bit of error propagating from one region to the next. It's purely spatial error; don't forget the threads do not communicate with each other, but there can be duplication of computation, and so that limited precision is highlighting the boundaries of our threads, because we can go into sort of oblivion here and it just completely falls apart. Let's zoom all the way out, because it's nice; I like watching these things. That's one of the reasons I've also mapped the Q and A keys to the zoom, because doing this with the mouse wheel makes your fingers a bit tired. It's also interesting to look at how the performance changes over time: you see how it's actually getting quicker when we're not quite as far into the fractal, and now for some
reason it's got really, really slow. That's because there's a large region of the fractal that's doing all of this computation but never hitting the constraint, so this is potentially our worst-case scenario: we've got half the screen here on the left that's literally just maxing out; it's always hitting the maximum number of iterations. If I increase that number, you can see the color changes to tell us that it's hitting that maximum number of iterations, and the time taken is increasing. I'll just compare that with the vector method now, and the vector method, see, doesn't actually buy us anything at all; it's considerably slower than the threaded method. Let's go back; so threads with vectors is working very well for us. We can even look at the CPU chart here and see the difference. If we go to a purely vector-based implementation, this is really chewing through the numbers, because even though we don't see any detail here, it's had to calculate everything up to the maximum number of iterations we allow; with vectors it's saturating the one core, but if I go over to threads (hold down the button... we've moved over to threads) it's now starting to occupy a lot more of my processor. This is good; we're finally getting value for money out of my CPU. There is something else worth noting here, and that's this little window at the bottom (and I apologize if the One Lone Coder logo obscures what I'm looking at): here we're constantly creating new threads, and those new threads are exiting, and this does have a performance overhead. Spawning a thread these days is a lot faster than it used to be, but there is still an overhead attached to it, so what I'm going to do next is create another function which uses a thread pool. A thread pool is a collection of threads that are already running and just waiting to be started; they're literally itching, really trigger-happy, to get going computing some data, and they're very useful for just chewing through large amounts of data if you know all of
the constraints in advance. So for my thread pool I'm going to have an array of threads, and all of these threads are going to be stuck in a while loop waiting to be given a trigger to start. This means the threads are always in existence: I don't need to request that the operating system create a new thread handle for me, pass it over, give me the appropriate stack, and do all of the things it needs to do when normally creating a thread and cleaning it up. Instead, I do it once at the start of my program and I just tell each thread to wait until I need it to do something, so we're going to remove all of that overhead. And so now I'm going to add our final implementation, and it is probably the most complex implementation, and again I won't go into too much detail regarding some of the thread-handling mechanics that are going on. Create fractal thread pool is going to be a bit different to all of the functions we've had so far: it needs some stuff set up in advance, and just to keep it distinct, so I don't contaminate the rest of the program, I'm going to create a structure called worker thread, which contains the parameters we've typically been specifying as arguments to the other functions up there. It also contains a handle to a thread and some pointers to the outside world; again, not going into too much detail. When I want the thread to start (and I don't mean start as in OS start, I mean start actually crunching through the numbers of the fractal) I'm going to call the start function, which quite deliberately looks exactly like all of the other functions we've been using, except this time it populates the structure's local variables with the correct data and triggers a condition variable, and that's a piece of multi-threading magic which tells a thread to actually get going with something, because the actual creation of the fractal data is going to be occurring in this thread function, and this is going to be happening all the time. This is the while loop I was telling you about in the slide: our while
loop effectively sits waiting for the condition variable to be told to start, and once it receives that command it's then going to go and run the intrinsic function; I've just copied it in verbatim here. This means that before I can use any of the threads in my thread pool, I need to have an array of them, and I also need to have initialized some of their properties, so I'm going to use the same max threads variable I was using before, and I'm just going to iterate through that array of worker threads, set up some of their information, and start their threads. I'm going to call this initialize thread pool function in on user create. Now, in the create fractal thread pool function, as before, I need to allocate a particular section to a particular thread in that pool, but instead of creating the thread explicitly I just call the start function that is part of my worker thread structure. But I now need to wait until all the threads have finished, and whereas before I would wait for a thread to literally expire and see when I can join with it again, I don't have that option now, because my threads will never close down unless I explicitly kill them. To implement this, I'm going to add in a standard atomic integer (which is, sort of, a thread-safe variable) called n worker complete. Every time a worker thread has completed working on its part of the fractal, it's going to increment that variable. Therefore, in my function, I set that value to 0 before starting the workload, and then I simply sit in a very tight while loop waiting for that workload to complete; I mean, this loop literally does nothing other than wait for that value to indicate that all of the threads have completed their work. No doubt there is a more elegant way to do this, but I'm just conscious of video length. And so, for the last time, time to add that function to the rest of our program: create fractal thread pool. Now, I've also added some code just to clean up after myself; all those threads need shutting down and memory needs freeing.
That's all available via the GitHub link in the description below this video. But for the last time today, let's take a look at what's happening. So here we've got the naive method on the starting screen (not even going to move it): sixty-four iterations, about 0.05. I'll move straight to the vector extensions, so that's 0.06; if I go to threads, well, threads is a little bit slower at the moment for the reasons we just discussed; but if I go to the thread pool, the thread pool is very fast, about 1 millisecond per update using the thread pool approach. So let's throw some more iterations at it. We've been settling on around 1,024, which I think is a good number; it seems fine for everywhere in the fractal. We'll compare that back to regular threads: regular threads are now taking about a hundred milliseconds, the thread pool about 60 milliseconds. So we can zoom in and start enjoying exploring the fractal. See, at any point we can actually reduce the number of iterations and see what effect that has; it does require more iterations as you're zooming in. I will just keep going; it's a nice fast zoom. And now, this is interesting: could we have hit a limit here? Let's get that right back up to 1,024 iterations. I think with this application we can confidently explore a fractal to the maximum of the 64-bit precision of the double that we have available, and we can also explore these things at a pleasing, interactive framerate. Let's go and have a look at what's going on down here; that just looks like a never-ending spiral. I'm sure YouTube is hating me for this. Tell you what, let's go and explore one of these arms, because these look a bit different. See how fast it is now, zooming in anywhere; we'll just keep going, keep going... yes, that was really nice. It took no time at all for us to get to the point of oblivion; lots of repetitions of the Mandelbrot itself, of course. Fractals are great; I mean, I've been
playing with this program all week and I've still not got bored of finding lots of interesting different shapes and phenomena. It's really nice just zooming around and playing with it, and trying to experiment with the colors as well; and again, a big thank you to Eriksson for providing the coloring code on the GitHub. Unusually for me, I'll upload the Windows binary of this if you do want to play with it without compiling it yourself; of course, just be careful: scan it, and make sure you're happy to be running software which is unsigned, because I don't have a signature for these things. There is probably still more performance to be found, for example in how vast amounts of iteration data get mapped to a particular color; that's the beauty of optimizing like this: there is always something else you can do. And so that's that: a fairly whirlwind tour of trying to brute-force process your way through a problem. Lots of detail has been overlooked in this instance, and we haven't looked at the micro-optimization side of things either: what if we did everything in assembly language to begin with, could we actually outwit the compiler? Lots of interesting questions to try and answer. Anyway, if you've enjoyed this video, a big thumbs up please, and come and have a chat on the Discord; the source code and the binary for this are available from the link below in the description, on the GitHub. And I guess stay safe during the lockdown, take care, and I'll see you next time!
Info
Channel: javidx9
Views: 125,896
Rating: 4.9773226 out of 5
Keywords: one lone coder, onelonecoder, learning, programming, tutorial, c++, beginner, olcconsolegameengine, command prompt, ascii, game, game engine, pixelgameengine, olc::pixelgameengine
Id: PBvLs88hvJ8
Length: 52min 49sec (3169 seconds)
Published: Sat May 16 2020