C# LINQ Performance Tips #6 - Value Delegates

Video Statistics and Information

Captions
Hi everyone. Today we're going to be talking about LINQ performance tips. This is video six out of who knows how many, and I have something interesting to talk to you about today, because we're going to be talking about value delegates. So what are value delegates? They behave just like normal delegates, but they're completely stack allocated; they're not heap allocated at all. If we use a lambda and capture some variables, we're first of all going to land on the heap: we're going to generate a display class to be able to capture these locals, and we're going to do a bunch of allocations which are not really necessary. They could be necessary, but they're probably not in our context. So that's why I'm super excited to be talking about them.
Okay, so let's move on and see some code. In the previous videos we talked about how to optimize certain LINQ operations: by implementing a better version of the algorithm; by using structs instead of classes, because we don't want to land on the heap; then by devirtualizing interfaces, because every time we use a struct through an interface we're going to box, and we don't want that; and lastly we looked at branch elimination and branch prediction, which is a really good optimization as well. But all of these things still left us with a bunch of bytes on the heap, so let's verify what happens. We have a list here and we have a function, and we're going to use the plain LINQ version now. So we use a Where operator where our x is greater than some number, and our number is 90, and then we're just going to do a FirstOrDefault. So let's fire up our trusty debugger and see what's going on in allocation world and in the heap. Let's dump the heap, and what we're going to see is a bunch of stuff
that you probably didn't want on the heap. First of all we have our WhereListIterator, but we know how to deal with that because we optimized that away in the previous videos. What we didn't deal with is the display class here that got created; that should contain our value 90, which it does. On top of that we're going to have our Func, which takes an Int32 and returns a Boolean. That's one of the things that we probably don't want on the heap as well, because it just takes up space, and that space is governed by the garbage collector, and it can move, and other stuff can happen. So we would like to have everything packed nicely on the stack if we don't have a specific need to be on the heap.
Okay, so how do we deal with this problem? There's a couple of ideas that we can use. First of all, let's fire up our simple measure function and see how the classic LINQ performs: it's around 15 milliseconds. Then let's fire up our custom implementation, which we did in the previous videos, where we're going to have a struct, our custom Where enumerable. It's going to take the list, create an enumerator struct, and have a custom signature of GetEnumerator, and pretty much that's it. And we're going to implement our custom FirstOrDefault version, so we're going to use that enumerator here. As you can see, all of the signatures are just for the enumerators and the enumerables, so we're just passing structs, because we don't want to do accidental boxing. To be fair, you could use interfaces and still not box, but that's tricky, so we opted for the simplest version for now. Let's see how this version performs: this version takes 30 milliseconds.
Okay, so what can we do to speed things up even more, and not have heap allocations at all? Because we know that we still have heap allocations. So what we can
do is we can have a custom Where, where we now have two type parameters: one is an int, and the other one is a struct that's called GreaterThan90. That struct is implementing an interface called IPredicate, and that IPredicate takes a struct T as well (it could really be a class or a struct, but for now we're just going to roll with the struct). It has a single function called Invoke, which takes the T and returns a bool. So this is our struct here, and what we're doing is implementing that by saying that x has to be greater than 90, and that's it. So it's a thing, but if we're going to use that interface anywhere, we're still going to box. But there's a cool trick that we can do in order to get around this problem, so let me show you the trick.
Let me show you the Where implementation. The Where has a TSource and a TPredicate, and the TPredicate in this example has a narrowing sort of signature, where we say that the TPredicate has to be a struct and has to implement the interface. And now we're not going to accept IPredicate in our arguments; we're just going to accept the TPredicate, because we already know what it is. This will allow the JIT compiler to generate code for a struct, not for just a generic interface, because we know that we have a struct. So we're not going to box, and that's pretty awesome.
Okay, so let's just create our struct; it's an empty struct, it doesn't have anything. Let's pass it along by ref all the way down to the enumerable and to our enumerator as well, and let's just use it. Since we have a predicate and we have to filter out certain values in MoveNext, we're just going to do a while loop as we did before, and we're going to check if the predicate is true, and that's it; that's all that we have to do. So as you can see, it's the same sort of use case as we would use
our lambda here. And there's something commented out here on the predicate, because sometimes, if our predicate has certain fields that get used, we can end up with a defensive copy, and this could in theory allow us to bypass that defensive copy. But in this case we know that we have a very simple struct, so we don't have to use that.
Okay, so let's measure the performance of this guy: that takes 20 milliseconds. So that's awesome, right? It's super interesting that we could optimize it even further. Let's now see if indeed there's nothing allocated on the heap. Let's fire up our debugger, restart, dump the heap, and as you can see there's nothing here. So we didn't allocate anything really; that's awesome, that's really cool.
Okay, so let's verify that this really holds, because we might have missed something, and this measure function is not the greatest in the world. What we can do now is use BenchmarkDotNet, where we're going to do two things. First of all we're going to use a memory diagnoser, so that will tell us what's on the heap and what's not, and how many bytes are on the heap. The second thing is a disassembly diagnoser, so we're going to look at the assembly from these functions, because there's one interesting thing that these value delegates can do which unfortunately lambdas currently can't do, and that thing is (let me comment this out) that they can be inlined. If we have a display class, we just cannot inline that; no matter what we do, there's no way to inline these functions. But with value delegates there's a way to explicitly say that we want this guy to be inlined, and that makes a world of difference in high-performance scenarios. So we're going to verify that this indeed happens. Let's run our BenchmarkDotNet; it's the same thing that we did before, so there are three functions to test, and let's see the performance. All right,
so let's fix the formatting a bit. Yeah, okay. So in terms of performance, Where LINQ is obviously the slowest one. It has a bunch of gen 0 allocations, worth like 160 bytes. Then we have our devirtualized function without value delegates; it's somewhere between 40 and 50 percent faster. We still have some allocations because we have delegates, but it's like half of the allocations. And lastly we have our value delegates function with all of the optimizations applied to it, so it's around sixty percent faster; sorry, not sixty percent, it's actually two times faster than the Where LINQ. So that's awesome. Actually it's even more than two times faster, but that's not really what matters; what matters is that we have absolutely zero allocations. Great, cool.
All right, so let's see if our disassembly will tell us that we optimized much further and inlined our function code. Let's look at the Where LINQ first. 5A means 90 in hex, so as you can see we're loading up our value here for that Where LINQ function, and it will do a bunch of calls. But what we're interested in is that one of the things that it generated is a display class, and there's a function to check if the value is greater than 90.
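To make the display class concrete, here is a rough hand-written sketch of what the C# compiler does with a capturing lambda. The names ClosureSketch, DisplayClass, WithLambda and Lowered are made up for illustration; the real generated class has a compiler-chosen name, and its exact shape can differ between compiler versions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosureSketch
{
    // Hand-written stand-in for the compiler-generated display class
    // (hypothetical name; the real one is chosen by the compiler).
    private sealed class DisplayClass
    {
        public int number;                           // the captured local
        public bool Predicate(int x) => x > number;  // the lambda body
    }

    // What we write: a lambda that captures the local `number`.
    public static int WithLambda(List<int> list)
    {
        int number = 90;
        return list.Where(x => x > number).FirstOrDefault();
    }

    // Roughly what that turns into: two heap objects are created
    // before Where is even called.
    public static int Lowered(List<int> list)
    {
        var closure = new DisplayClass { number = 90 };          // heap object 1
        var predicate = new Func<int, bool>(closure.Predicate);  // heap object 2
        return Enumerable.FirstOrDefault(Enumerable.Where(list, predicate));
    }

    public static void Main()
    {
        List<int> list = Enumerable.Range(0, 100).ToList();
        Console.WriteLine(WithLambda(list)); // prints 91
        Console.WriteLine(Lowered(list));    // prints 91
    }
}
```

Both methods return the same result; the second one only spells out the display class and the Func that show up in the heap dump.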
So this function here has to be called a bunch of times, and as you can see there are multiple calls to multiple things, because certain things could be inlined but sometimes inlining doesn't make sense. But at least this method here cannot be inlined, at least not for now, because the compiler does not support this.
Let's look at the last implementation, the third function, where we really just have a single method: all of the code is inlined. And because it's inlined, the JIT compiler can do additional optimizations, because it sees that this code is together, so it will do a bunch of crazy stuff. Let's try to find where our value delegate is here, if it is here. So we're going to jump here, and this bit of code is the while loop; it's the MoveNext basically, where we're just going to do a while loop and check the condition. The condition is being checked here, so we're going to jump to this label, and in this label what you can see is that this is our value 90 here, and this is where we're doing the comparison, and we're going to take this branch here. Good. So I'm not an expert in assembly, but what I can say, at least from looking at this code, is that indeed everything got inlined, and we could use some extra optimizations thanks to that. So that's good.
All right, that's all for this video. If you got value out of this video and learned something new, like and subscribe, because that obviously helps, and I hope to see you in the next video. What I meant to say is I hope that you see the next video. Please like and subscribe, and I'm hoping that I'm going to have something even better to raise the bar, and the bar has been set very high because this is a super cool feature. Thank you and see you next time. Bye.
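The value delegate idea from the video can be sketched roughly as follows. This is not the code shown on screen: the shape of IPredicate, the GreaterThan90 struct body, and the ValueDelegateSketch class are assumptions reconstructed from the description, and for brevity the sketch iterates a List<T> directly instead of going through the custom Where enumerable from the earlier videos.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Assumed shape of the interface described in the video.
public interface IPredicate<T>
{
    bool Invoke(T value);
}

// An empty struct that plays the role of the lambda `x => x > 90`.
public struct GreaterThan90 : IPredicate<int>
{
    // Explicitly asks the JIT to inline this call where it can.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public bool Invoke(int x) => x > 90;
}

public static class ValueDelegateSketch
{
    // The parameter is typed as TPredicate, not IPredicate<T>. Because
    // TPredicate is constrained to be a struct, the JIT compiles a separate
    // body for each predicate struct, so the struct is never boxed and
    // Invoke can be called directly.
    public static T FirstOrDefault<T, TPredicate>(List<T> source, ref TPredicate predicate)
        where TPredicate : struct, IPredicate<T>
    {
        // foreach over List<T> uses its struct enumerator, so no iterator
        // object is allocated here either.
        foreach (T item in source)
        {
            if (predicate.Invoke(item))
            {
                return item;
            }
        }
        return default(T);
    }

    public static void Main()
    {
        var list = new List<int>();
        for (int i = 0; i < 100; i++)
        {
            list.Add(i);
        }

        var predicate = new GreaterThan90();
        Console.WriteLine(FirstOrDefault(list, ref predicate)); // prints 91
    }
}
```

Passing the predicate by ref mirrors the "pass it along by ref" step in the video; for an empty struct like this one, passing it by value would work just as well.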
Info
Channel: LevelUp
Views: 1,809
Keywords: C#, dotnet, dotnetcore, performance, linq, memory, allocations, csharp, lecture, programming, tips, tutorial, delegates, computer science
Id: L64BSzRwaHw
Length: 15min 7sec (907 seconds)
Published: Mon Aug 24 2020