C# What JIT Generates? - Loop Cloning

Captions
Hi everyone, and welcome to yet another video called "What JIT Generates" in C# and .NET. Today we're going to discuss three things. First of all, we're going to discuss loop cloning: what it is, how to apply it, how to optimize it, and certain other things. Second, while we're doing loop cloning, we're going to discuss what variable hoisting is. But before all that, we're going to compare if statements to if expressions and check whether they're equivalent. So let's jump right in.

Let's start with if statements versus if expressions. I have a method here, empty right now, which takes a Boolean and returns an int value. Let's just say that if the Boolean is true we return one, otherwise we return zero. As you can see in this assembly code, we're checking if this is true, and if it's true then we assign one and return; otherwise we assign zero and return.

Now let's compare that to an if expression, because people will often tell you that if statements are equivalent to if expressions, but that's not exactly true, at least not in the generated code. Let's check it out. I have an if expression here, and if we compile that and decompile it to assembly code, what we see is that the order of the checks is reversed. In the first one, in B1, we're checking if b is true; here, on the other hand, we're checking if b is false, and then we return one, otherwise we return zero. So the order of these checks is reversed, which means that they're not really equivalent.

Let's move on to the primary thing we're going to be talking about today, which is loop cloning. Let me rename this method to M and do a bunch of things. Let me have a method which takes an array as an argument and then two additional arguments, x and y, and what we're going to do is assign the result of x
plus y to each index of the array. So I have a for loop here which checks if i is less than 100, because my array is going to be 100 elements, and in each index, like I said, I'm going to assign x plus y.

Right now an interesting thing happened: this loop got cloned. But let's not jump to conclusions yet; let's verify that this happens. First of all let's find our loop in this big chunk of assembly code, and then see what else is interesting here. My loop is here, and I can verify that because what I'm seeing is that I'm assigning a register to an index, then incrementing i, then checking if i is less than 100, then doing a jump, and I start over.

An additional interesting thing happened here, because this computation got hoisted. What I mean by that is that it got pushed out of the loop into a temporary variable. How can we verify that? If this is our loop and we're spinning here, then as you can see we're assigning to each array index the value from the r10 register, and r10 is being set here: r10 is loaded with r8 plus r9, which is x plus y. So we compute x plus y, assign that to r10, and then just use r10 from now on, which is very good because we just optimized that.

So I have this loop and a bunch of checks, and if I scroll down, you're going to see that we have another loop here. Again we're incrementing i, checking if it's less than 100, then doing the jump, and that jump lands here. So this is my loop. It's a bit bigger, as you can see, because now we're doing a bounds check, and if the index is out of bounds we need to throw this exception here. And as you can see, we're loading r8 plus r9 into r10 in each iteration of the loop, which is slow and unnecessary.

So now let's try to
figure out why this even happens. Why do we have two loops? Let's go here and check, because there need to be a bunch of conditions which lead to one loop or the other. What are they? The first condition is here: this does a binary AND on the rdx register, which effectively means that we're checking if the length is zero, and if so we jump to the second loop, which is slow. Then we check if the length is less than 100, and if it is, we jump to the slow version of the loop. But if neither condition is met, we go to the fast version of the loop.

Let me show you what the generated code would look like in C#. I have the loop-cloned version here, and it's pretty much this: if the array length is 0 or is less than 100, we go to the slow version, which does a bounds check in each iteration. If neither is true, I hoist my computation out of the loop and just assign in each iteration; there's no bounds check here, and I return. That's it.

So why do we need two versions? As you can probably tell by now, if the array length is less than 100 we're going to crash here. That's not good, because why would you need a loop which is slow and, on top of that, crashes? Well, the reason is that if we have a loop and we assign something to an array which we got from somewhere, someone else might have a try/catch somewhere else. We're going to crash, yes, that's true, but the program might not terminate, and the array might still get used somewhere, and it might be beneficial that this computation actually computed some of the elements. That's one of the reasons why we just cannot do these optimizations, because we're kind of not trusting this code. Probably that's also why the computation is not being hoisted there: there's no point, because we're going to crash anyway. So that's one of the
reasons. And if we're good to go, if there aren't going to be any problems with this array, then we just go to the first version, which means that most of the time we go to the first version of this loop, which is good.

Okay, so still you might think: why do I need that? I know that my array is never going to crash, and I don't want to generate all of these checks and all of this code, because these are branches, and maybe they're going to slow me down a bit. So I would like a more compact version of this code without any checks. Well, it turns out that if your array is 100 elements and you don't need to do anything else, you just iterate over all of the elements of the array, you can just use Length. And we just generated a version which doesn't have any checks aside from checking if the length is zero. Other than that we don't have any special checks, and our computation gets hoisted out of the loop. That's good, because we start from here and this is our loop, so we do the computation once and then we assign it. That's really beneficial, that's really cool, and we solved the problem without having to resort to loop cloning. Of course, if we wanted, for example, just the first 10 elements or half of the array, then we would have to go through that optimization technique, but that optimization technique is still pretty good because it is fast.

All right, let's do another trick. If you want this fast and optimized version, you need to be mindful of certain optimizations that you might do yourself: you cannot be too clever with it. What do I mean by that? Let's just say that you plan to use this length somewhere else in this method. Let's say this is not the end of the method, you have something else to do, and that length is necessary for it. So you might think: okay, I'm
going to assign this to a local and then just use that local, because I'm going to use it elsewhere in this method, somewhere down below, and that might be beneficial for me. But as you can see, we just generated a loop-cloned version plus extra checks that we have to do now, so we no longer have a very compact version of the code. If you want the best possible version of the code, you cannot do these local assignments, because although we still know that this is not going to crash, in the current implementation the JIT cannot figure this out on its own, and that's why you're seeing these effects. So that's a problem.

Okay, let's move to a different example. Let's say that this array is a field in a class, or a property in the class, because it often happens that we have a class that has its data, and we have some methods that do something with the data. Now if we do this experiment, we generate a slow version of this method, because we're doing a bunch of checks. First of all we're checking if the length is zero, then we're doing a bounds check. Let's verify that it's true: this is our loop, we do the jump to 13, so our loop effectively is this. We're doing a bounds check here. Of course our computation got hoisted, and that's good, so that isn't a problem, but we're doing all of this work here, and maybe that's not really necessary for us. It's because the JIT cannot really determine whether these checks can be optimized away, because this is global: it might be used by someone else, who knows.

So if you want to skip these bounds checks and extra checks, what you have to do is assign this array to a local. If we assign this to a local and change every use of a to l here, then
we're going to have a compact version again. This is a really neat optimization trick: if you know that you have a global array, for example in the class, then you just assign it to a local, and if you have a tight loop then that may be beneficial to you, because you're going to do fewer checks. So that's a very interesting thing to do.

Let me switch now to some performance tests. What I have on this slide is the loop-cloned version versus the loop version tests, and as you can see the loop version is slightly faster. It's not something that you can be super excited about, but you can still tell that the loop version is faster. But there's a catch. The catch is that the method, the loop-cloned version, has to be called all the time, or it has to be in a very tight loop, in order to tell that the loop version without any cloning is better. Otherwise they're pretty much the same. But if you have a very tight loop that you want to optimize, that's still a thing to look at; it might be worthwhile. Same with this version: if you have something that needs to run as fast as possible, perhaps it would be good to copy this to a local.

Okay, if you got value out of this video, like and subscribe. Leave a comment if there are some problems you see with this video, and perhaps leave a comment if there's something you want to see in the next video. That's all for this video, so see you next time. Thank you, and bye.
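
As a rough illustration of the loop cloning discussed in the captions, the sketch below writes out by hand the loop from the video, the two clones the speaker describes, and the two Length-based variants. The names LoopCloningSketch, MCloned, MLength and MLengthLocal are made up for this example. The guard condition follows the speaker's description; it is not actual JIT output, and the real guard may differ (for instance, it may test the array reference rather than its length).

```csharp
public static class LoopCloningSketch
{
    // The loop as described in the video: store x + y into the first 100 slots.
    public static void M(int[] a, int x, int y)
    {
        for (int i = 0; i < 100; i++)
        {
            a[i] = x + y;
        }
    }

    // Hand-written approximation of what the speaker says the JIT produces.
    // Assumption: the guard mirrors the description in the captions.
    public static void MCloned(int[] a, int x, int y)
    {
        if (a.Length == 0 || a.Length < 100)
        {
            // Slow clone: every store is bounds-checked, so a short array is
            // partially filled before IndexOutOfRangeException is thrown.
            for (int i = 0; i < 100; i++)
            {
                a[i] = x + y;
            }
            return;
        }

        // Fast clone: x + y is computed once (hoisted). The guard has shown
        // that indices 0..99 are in range, which is what allows the JIT's
        // version of this loop to skip the per-iteration bounds check.
        int sum = x + y;
        for (int i = 0; i < 100; i++)
        {
            a[i] = sum;
        }
    }

    // Iterating up to a.Length: the shape the speaker says needs no cloning.
    public static void MLength(int[] a, int x, int y)
    {
        for (int i = 0; i < a.Length; i++)
        {
            a[i] = x + y;
        }
    }

    // Caching the length in a local: the "too clever" shape that, per the
    // video, made the JIT fall back to a cloned loop with extra checks.
    public static void MLengthLocal(int[] a, int x, int y)
    {
        int length = a.Length;
        for (int i = 0; i < length; i++)
        {
            a[i] = x + y;
        }
    }
}
```

All four methods produce the same array contents for a 100-element array; they differ only in the code the JIT is reported to generate for them, which is best confirmed by inspecting the disassembly on the runtime version you actually use.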
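
The field-versus-local trick from the captions can be sketched the same way. This is a minimal example under assumptions: the class ArrayHolder, the field _a, and the method names FillViaField and FillViaLocal are hypothetical, and whether the local copy actually removes the checks depends on the JIT version.

```csharp
public sealed class ArrayHolder
{
    // Hypothetical field standing in for the "global" array in the video.
    private int[] _a = new int[100];

    // Exposed only so the result can be inspected.
    public int[] Data => _a;

    // Indexes through the field on every iteration. Per the video, the JIT
    // keeps the bounds check here because the field could be changed elsewhere.
    public void FillViaField(int x, int y)
    {
        for (int i = 0; i < _a.Length; i++)
        {
            _a[i] = x + y;
        }
    }

    // Copies the field to a local first, then loops over the local. Per the
    // video, this gives the JIT a stable reference and a more compact loop.
    public void FillViaLocal(int x, int y)
    {
        int[] l = _a;
        for (int i = 0; i < l.Length; i++)
        {
            l[i] = x + y;
        }
    }
}
```

Both methods leave the array in the same state; the local copy only changes what the JIT can prove about the array while the loop runs.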
Info
Channel: LevelUp
Views: 933
Keywords: C#, csharp, JIT, performance, dotnet, compilers, programming, computer science, tutorial, C# tutorial, clr, clr internals, just in time, just in time compiler
Id: zxcHkEu6aTY
Length: 14min 24sec (864 seconds)
Published: Wed Sep 02 2020