C# If Else Internals

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments

One small correction in (17:26)

It turns out that you don't need a function, but anything that *can be resolved at compile-time*. So const int Flag will work as well.

Proof Here: https://twitter.com/badamczewski01/status/1311271077935546368

👍︎︎ 5 👤︎︎ u/levelUp_01 📅︎︎ Sep 30 2020 🗫︎ replies
Captions
but well why why is it working hi everyone and welcome to another video so today we're gonna be talking about if else internals we're gonna be looking at how jit handles if else statements and expressions and on top of that we're gonna be looking at four different examples of if else statements and expressions and see how they perform and we're gonna see some techniques how you can write your c sharp code that contains if else statements and make them faster so let's check it out let's do the code and let's start with a very simple example so we have a function called c1 here and it takes a single argument and let's just say that if that argument i is equal to one then we're gonna return one which we'll call an invalid state and that invalid state is not for an exception now because perhaps maybe we could do something with this minus one i don't know we're not just gonna throw the exception yet this is a invalid state but this is a valid state so you might imagine that this is not going to be called very frequently but this is going to be our super duper happy path so this path probably is the thing that we want to optimize for so let's see what jit generated for this case and we're going to compare our i to 1 and if this is not equal to 1 then we're gonna jump and then we're going to return i plus 1. otherwise we're going to return -1 so already you can see that something is strange here a bit because we have to jump in order to execute i plus one because this will jump if this is not equal to one so this condition got sort of rewritten into not equals really and now in order to not execute the jump we have to be equal to one so we have to go here and that's a problem so you might ask yourself why is that even a problem so it's just a jump right well the problem is that modern cpus have a component called branch predictor and then branch predictor has a certain default implementation that it follows and in order to be able to explain that just in a short bit i'm we're gonna have to introduce two sorts of branches so there's just two most of the time so this is a what we can call a forward branch because we have jump down in the instruction set and also there's a different kind of branch which is a backward branch which will for example take the code from address here and we're gonna jump up so if statements are a forward branch but for example for loops while loops and forged loops are a backwards branch because you know you might imagine that would have to spin right so we have to jump up and then go down and jump up again now let's get to the you know problem of this so modern cpus and modern branch predictors assume most of the time that the default state for forward branches should be not taken if they're taken then we're gonna slow things down and we're gonna probably have a misprediction already so it would be good to rewrite this if this is a hot path to not take this branch here so how can we do it we can reverse this condition and say that if i is not equal to one then we're gonna return i plus one otherwise we're gonna return minus one and now as you can see this is correct because now we're gonna compare and if we're equal to one we're already gonna go to the minus one so we're gonna jump but if it's not then we're never gonna jump and we're just gonna do return i plus one which is what we wanted by the way so now perhaps let's put that to the test because let's just see if the performance is going to be worse or even is if there's going to be any difference in performance so i have two functions here called account simple one and count simple two they're actually identical but the only difference here is this function c1 and c2 and these functions are the same functions essentially that we just wrote so this is the function where i is equal to 1 and this is the invalid state and in c2 we just reverse the condition and now the valid state is with inside if statement and invalid state is outside so let's see how they perform you might also notice that they're um they have an attribute called no inlining and we're gonna get to that in a minute why is this but for now let's just measure performance so we're gonna test this on a big array and the first example took 24 ish milliseconds let's just run it again just to be sure 23.2 milliseconds now let's see what's going to happen if we have our critical state within the if expression and also if statement itself let's just run it and it takes 19.5 milliseconds so it's already way better and it's pretty stable so it the performance you know characteristic didn't change so it is faster as you can see so now let's go and let's rewrite this in another way because as you saw we have this no inline argument there and let's now see why is that so i have my function here and let's add the loop on top of that and let's see what's going to happen if my function is inlining here and let's go here and let's check it out so i'm gonna skip you the details but we're gonna just see where this got inlined so this got inlined specifically here so this is the this is the code that got inlined and perhaps you can see it already there's a problem even though although we have our hot state in the if expression we're not gonna take this jump we're gonna go here but we're gonna take this jump which is an unconditional jump so the unconditional jump is still better than a conditional jump in terms of performance but it's a jump nevertheless so that's a problem so we optimized one jump but we created another so this is effectively like a sort of like an almost else statement and this is here because it's really difficult to have these two states all together in one simple like code base this could be split into like two multiple sections almost but the jit compiler doesn't do it so it is what it is really and probably there's good reasons for that but i'm i don't know the reasons by the way because this could be rewritten in a fashion that would probably incorporate these two states without any jumps but it's not being done like that so that's a problem and you might ask yourself well why we did the first example because there's no different benefits because we have two jumps now we replaced one jump but we did the other so hey that's a problem right well no no we can still use that and let's move to another example that we have that could solve our problem so what we have here is we have a counting function now and that counting function does something interesting so we're going to start with the slow version so the hot path is placed within the jump here as you can see but now what we're gonna do in this function is if the array element equals to one then we're gonna not count this element otherwise we're gonna count the element and let me just you know modify this to count by one and we could rewrite this function in the following fashion so if the array is not equal to one then what we can do is we can set the counter to one and then just count otherwise we're gonna always set this to zero so these two are equivalent but the difference is that this is a jump but this isn't a jump and quite frankly this will probably get optimized as well so right now we're seeing sort of the same thing it's going to get us the same results but this will use the hot path and will not generate any jumps but this will jump and let's measure maybe the performance of these two so let's measure the performance of our count function one and let's add console right result just to be sure that they're equivalent so let's write out the result let's do count one which again will jump so it takes roughly 12 milliseconds and let's do another round so it takes around nine milliseconds all right let's do count two so count two takes around nine 9.7 milliseconds and let's just do another one so it takes 9.1 milliseconds so that's good because we can already tell that this is probably faster and i've went ahead and did a benchmark.net run on a different data set and on a different data set we got these results so as you can probably tell there is a performance benefit to this so it's data dependent so if you have a non-uniform data versus you know from there you're gonna have different results but as you can see there's results and count two is faster by almost fifty percent so that's good so let's for example change this around and let's use our random array this time around to check if we're gonna have any sort of benefits or you know maybe it's gonna be worse i don't know so it takes 23.2 milliseconds and this time around it takes 24 and let's go to count 2 which should be the more optimized version and it takes 19 milliseconds and let's run it again just to be sure 20 milliseconds so as you can see there's performance benefits of of using this this technique but now let's let's switch to yet another example because right now as you're probably imagine representing invalid state in that way is not really good this sort of count that we had here so this count here has its uses or you can actually find a lot of cases like that in nlp where you just don't have a invalid state but you have a state that changes another state but we would for example when we have a invalid state we would like to throw an exception for example or return something else or log something else so let's now see how we can change this around and let's see what's going to happen when we're going to have a example which just has a return statement so we we're going to have pretty much the same loop here we're gonna have pretty much the same counts and if the array element is equal to minus one we're gonna just return zero and just you know let someone else hold that and now what's going to happen is a something really interesting because all of our conditions all of our branches got effectively reversed so now we're gonna check against minus one and we're gonna see if this is equal to minus one and if it's equal to minus one then just return zero so our jumps our branches got rewritten in the inverse order so now this is a hot path and this is not a hot path so that's good because that's what we wanted initially right and now we have it although this sort of returns euro is still probably not the thing that we would like to have so let's for example do a fro let's throw a new exception and as you can hopefully see again this state got rewritten correctly so if we're equal to minus one then we're gonna go to the exception like exception block and this exception block creates the exception and throws the exception so that's really good because again that worked in our favor so now this is our hot path but there's there's certain problems here so we increase the stack space by a lot and we did additional things here because we have to reserve certain stack space for this exception to be able to happen and since this can be called almost like zero times in a program so we're just increasing the stack space and doing certain extra work for nothing so there's additional technique that we can we can sort of pull off here and it's called a fro helper so a referral helper is a special function which has to have a very special signature so it has to return a void it can take any sorts of arguments that it could you can for example like but you have to return a void otherwise it's not a fro helper and it's not gonna work so what i mean by that so if we have a fro function here and let's throw something so let's throw a new exception again so this is a throw helper so now if we're gonna replace that with a throw and it has to be static so now our function got written in such a way that we still have the correct signatures and everything but we have uh this call you know here and all of the things that are related to exception throwing and the stack counting and stack you know increasing r in this sort of block here so now we have a nifty function that basically is capable of like handling our exceptions but let's just you know see what's going to happen if we do like the following so now it has to return an end and let's do a return zero although we're gonna still throw an exception it doesn't really matter now because all of the branches got rewritten incorrectly this time around and this has become a hot path again and this is the cold path again so this is the preferred path so as you can see the fro helper has to be extremely specific so it doesn't no returns basically what i'm trying to say you and it has to be inlineable so if you do like a no inline r parameter attribute on it it's not going to work because if you have a known inline attribute then the jit will not analyze this function at all so it will not determine that it's a throw helper first of all and second of all if you have like a very complicated fro helper you can you can have a complicated one you can even have multiple if statements with different sort of exceptions depending on the state that you're in that's all good but if you make it too big and it couldn't be in line that's probably going to cause you problems because that will not be a throw helper so kind of keep that in mind all right so let's now do another example so this is the last example and i think this one is the most interesting one because it's a really cool optimization technique that you can do with the jit but you have to know that it even exists so let's go into another example and let's copy this bit of code here and this bit of code is super interesting for a lot of reasons so um i'm gonna have a flag here which is a let's just transform it into a field for now so let's do like flag equals one and depending on the flag let's let's actually do four depending on the on the value of the flag we're gonna do and we're gonna pick the correct like branch here and we're gonna write either 100 or 200 so what we generated right now is two console write lines so we have effectively incorporated two branches so we have to account for both but there's a clever technique to be able to like eliminate automatically branches in your code which is good because it decreases the code size there's no jumps the branch predictor has less work to do and the performance will be better so what you can do is you can create a for examples function out of this so we can convert this to a function and as you can see if this function returns a const argument so a thing that will never change what you now effectively did is you optimized the entire branch so this call this code got just collapsed to console write line 100 because we know that we're going to pick this path and this could be like a static function starting function starting read-only function it can be a property it can be an expression and it can be a function it can even be a function which takes arguments but it has to resolve to something that's known at a compile time so you can do a lot of things with it so like i said you can for example do and x we can do return four we can do something with that x but we cannot add this to h4 because it will not resolve correctly so for example we can do this there's no problem with this it's going to generate but as soon as we we're going to modify this and this will not be a const well we're going to have a problem so now as you can see we did we were actually eliminated all of this code because we don't even believe that this is going to resolve to either four or eight but if we would do like three for example then it still works which is surprising but well why why is it working all right so it apparently it's working so that's cool oh because this is still a cons and these two will get folded into a4 because this is a cons okay so that makes sense but if we would change that to a y argument and we would pass that y around well now there is a problem and like i said it could be any number of things but it has to be resolvable at compile time if it is reasonable resolvable at compile time then it's all good again it has to be inlineable and without inlining this function it's never going to be analyzed correctly so it has to be unlikable that's that's the that's the reason so how can we use this to our advantage because it's very cool so for example what we can do and let me copy another method here we can add a for example logging function of sorts where we're gonna create a sum again and let's add a in flag and let's set that flag to 4 for example so now as you can see we have this big function we have our flag here and our flag is set at compile time so we can sort of enable logging by just having this flag here so now we're logging stuff in this loop so for example we're logging arrows but if we turn this around and set this flag to eight then we're just not logging anything and we have a simple and compact loop so we can sort of create this dynamic where depending on the configuration flags that we have in our app or on certain different things we can eliminate certain branches and that's a very powerful optimization technique okay so if you liked the video leave a like possibly subscribe and leave a comment if i missed something if there's a mistake here and perhaps leave me some feedback about what you would like to see in the next video because i have ideas but maybe you have better so yeah thanks for watching and see you next time bye
Info
Channel: LevelUp
Views: 2,551
Rating: undefined out of 5
Keywords: C#, csharp, JIT, performance, dotnet, compilers, programming, computer science, tutorial, C# tutorial, clr, clr internals, just in time, just in time compiler
Id: XLcPTf0efPk
Channel Id: undefined
Length: 22min 41sec (1361 seconds)
Published: Wed Sep 30 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.