C# LINQ Performance Tips #4 - Branch Elimination

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Captions

hi everyone so today we're gonna be talking about link performance tips this is video four out of who knows how many and we have a single point on our agenda for this video so we're gonna be talking about branch prediction and branch elimination and link in reality is just a vehicle to be able to show these mechanisms and apply them in link all right let's move on let's see some code so what we're going to be looking at today is a very simple function at least from now that's called called count odd and as you can see it's not linked yet so maybe let's explain a couple of things so while it is branch prediction branch prediction or much more branch predictor is a component in the cpu that has the responsibility to predict branches so what's a branch if statement is a branch and why this is needed in a cpu so as it turns out when we're executing code we're actually exiting instructions in the pipeline and these things uh waiting on like evaluation of a branch is expensive and we have to stall the entire pipeline so cpus have built branch predictors in order to be able to pick a branch ahead of time and based on some like data that we have from the past and that would allow us to not wait and you know schedule these instructions immediately the problem becomes when we miss that branch so we have a branch misprediction we pick the the incorrect branch so we have to then throw the whole pipeline away and load up a new one and that of course is expensive because pipelines are pretty big right now in modern cpus and uh it's a performance cost so it would be good to not even you know to do it correctly to pick the correct branches or even to force the cpu to be able to do that and second of all um there's a technique called branch elimination where we could not have any if statements in this for example code here of course this makes sense if that branch elimination is not is cheap because sometimes eliminating all of the branches can be expensive and that's gonna eat away at our like gains from branch prediction so that's that all right so let's test some things so let's test this function what we're gonna have is we're gonna have a sequence uh that's you know um will go from 0 to 20 and then it will roll back so it's a simple you know increasing sequence and then it's restarting and let's see how that performs so it took 17 ish milliseconds let's just run again 16 around 17 milliseconds all right so um that's i think pretty good but now let's test the same function but let's take that sequence and let's randomize it so now what we're going to do is we're going to switch the indexes of the values so it's a it's a random sequence now so it doesn't have a clear pattern or easy like to be easy detectable pattern so let's run it again and let's measure performance so as you can probably tell the performance got you know a bit it's not what it used to be and it's not great so it's like what four times three times slower something like that so um as you can hopefully see with this simple example um branch prediction is important for performance first of all a second of all branch prediction if you have a branch in your code it's going to be the performance of your function is going to be determined by your data if you don't do anything with the branch okay but let's now try and eliminate the branch from this function so we have another function with no branching in it and how it's built so each time you have an odd number in in like bit terms if you if you you know take the number and convert it into a bit code then what you're gonna see is um the first significant bit of any number if it's odd it's gonna have a one if it's even that's gonna be zero so we can use that and do an end operation with the item and if the significant bit is one then this is going to evaluate to one if it's zero it's going to evaluate to zero so we know how to count odd numbers without any branches so we can use this function here and let's test now if you know random or sequential data is going to have any effect on on the performance so let's start with the increasing and receding sequence and it takes roughly 10 milliseconds 9 to 10 and now let's do a random sequence again and let's see if it got worse no it didn't so it's actually what we expect so there's no performance differences between these two you know data sets now so that function is consistent it's not you know data does not affect its performance so that's good because it's going to be very stable in time and sometimes you want that characteristic in your code and yeah it's overall good but how does this have to what this has to do any in any way with link well let's see a function where we could for example apply um you know branch elimination and see let's see if it's gonna be you know working for us so we have a function called count and we have a list and we're going to count items that are greater than some number and that's some number i just picked at random it's nine so this function will will do accounts and let's quickly see how it's implemented just to be sure so count um you know takes an i enrolled and we're not going to attach any interfaces because we already discussed you know the cost of virtualization of these if statement new checks uh in previous videos so now we're gonna just touch um upon this this year code so it's gonna do a bunch of new checks and it's gonna just get a for each loop and it's gonna you know evaluate the predicate and if the predicate is true then it's gonna count pretty simple pretty simple function okay so uh let's let's run it and let's see how it performs so it takes 134 milliseconds roughly yeah give or take and let's now see how random completely random sequence is it have any like effect on this function so it has indeed an effect it's a bit slower it's not you know terribly slower but it is slower you can you can tell so uh what's the you know what's the solution can we have something that will eliminate this branch here can we do something about this well let's see so um i have a custom count link version here where um you know it does all of the same things but the implementation is a bit different i do these new checks as well and what i do here that you know it's not being done in that function is that i take the predicate and i do a little trick in that net because there's no other way to do it in dot net at least not for now where i'm gonna take the boolean and i'm going to convert it into an int and now i'm going to get the volume so there's no easy way of converting a boolean to indian.net but if you can take the pointer to it then you can convert that boolean to an endpointer and you can get the value and you're gonna get one or zero if it's true or false so um we can use that and eliminate the branch this of course is expensive it's not a cheap operation to do so this this is something that we have to keep in mind but uh at least you know we have we've eliminated the branch that would mean that our functions should be stable and with performance okay so let's um test this version so um let's run this version on us you know increasing sequence 117 milliseconds i believe it's a just a tiny bit faster than link who cares and now let's test it on a random data set so the random data set is pretty consistent let's run it again yeah 129 ish milliseconds so let's compare that to link yeah so as you can see it's faster it is noticeably faster okay so um i hope that you see the benefit of like branch prediction branch elimination this was like an obviously very simple and small example the last thing that we can do is we can you know come use something more robust than this measure function here and we can do benchmark.net where benchmark. has the added benefit of having a uh you know a bunch of configuration flags where you can do hardware counters so we can measure branch miss predictions and branch instructions in general so the test is pretty much the same thing we have two sequences random and increasing and they're global so they're gonna be set once so that's kind of not what we wanted but we can run with it for now and we just have two categories so sequential and random and we're gonna be counting the link version and our custom link version so uh let's run the test and let's see what our results gonna be okay so here are the results and you know they got a bit mangled so um let's like do this do this okay so here are the results so um the mean for the sequential versions in like custom link and you know just link is pretty much the same so the interesting bit to look at is branch instructions per operation and branch mispredictions per operation so there is a difference here but it's probably because you know count makes like an if statement somewhere like a new check that i might miss or something like that and you know still it's not much of a difference so that doesn't make a lot of difference but here if we have the random sequence then this whole thing is like tells another different story right so the difference is noticeable here it's around you know 30 percent and branch mispredictions as you can see it's like 150 to almost 2000 so the performance between the sequential version and the random version for our custom link is consistent because it's like 120 to 115 well the random version here is 127 165 so let me show you this maybe on this slide because i did did have a run again of this function so um as you can probably tell there is a difference there isn't like a measurable difference and it's the bigger the difference the more the branches you can optimize like that or you can arrange them in such a way that the branch prediction will guess correctly so you have to work with the branch predictor so in fact even if we knocked down like you know uh virtualization of interfaces uh structs not being on the heap we've optimized all of these things there's still things that we can do with link to make it even faster those are extreme but like i said the link in this video at least is a vehicle to tell you about these mechanisms and how you can apply them so we started with just odd numbers and we successfully applied that to link as well so um yeah i hope that you have now just a tiny bit better understanding of what it is uh i mean branch prediction and branch elimination um that's all for this video i plan to do more videos about link and there's gonna be probably a whole separate video about branch prediction and branch emulation and how it works and you know what not right um leave a if i made a mistake leave a comment if you liked the video you know leave a thumbs up subscribe if you think that this material is you know worth your time and that's all for this video so have a nice day and thank you bye you

Info

Channel: LevelUp

Views: 1,954

Rating: 5 out of 5

Keywords: C#, dotnet, dotnetcore, performance, linq, memory, allocations, csharp, lecture, programming, tips, tutorial, computer science

Id: gO3t9kauCwg

Channel Id: undefined

Length: 15min 3sec (903 seconds)

Published: Wed Aug 19 2020