C# LINQ Performance Tips #2 - Structs vs Classes

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hi everyone so today we're going to be talking about link performance tips this is video 2 out of who knows how many videos and today we're going to be talking about type information and why losing type information is bad for performance we're gonna be looking at how link handles trucks versus classes and finally we're gonna be correcting some stuff from from my previous videos because some of you left comments that not all the things are accurate and some of the information is now wrong so i'm gonna be correcting that information as well and let's move to the corrections first i guess because um why not okay so in the previous video i've talked about the last operator from link and what i told you is that link just uses the i and numerable so it will enumerate until it gets the last element and then it will return and i've implemented the last operator myself using the indexer and i told you that you know that's why it's faster but that information was true but link changed the implementation and now the current implementation actually does check if we're having a list so let's see the source code for that let's do the last so we have the last here and what it will do is it will do a nil check it will check if it's castable to ilist if it is then we're gonna do some additional checks but we're gonna return the last element using the indexer which is good which is fine which is fast but in our tests that we did last time um we were still faster so let's run the link last version which takes roughly what 15 milliseconds 14 milliseconds something like that right okay and this is our version of the last so our version of the last will just take the list and it will just return the last element but uh unsurprisingly or surprisingly depends on how we're gonna treat it this version is way way way faster and why is that because link actually does the same bit is it the checks well yeah the checks are expensive they're especially expensive in terms of branch prediction so if we have a bunch of if statements and the processor will make a decision to go ahead and take the branch and it will be wrong it has to like throw that pipeline away and get the new one and it's going to be slow but most of the time in this call it will not happen because it we will always take the crave branch because we will always have this operation here in the loop right so there is no risk of taking the incorrect branch most of the time here so what happens so it turns out that interfaces are expensive and what i mean by that so if we have a a innumerable here then the problem becomes that this is an interface and we have to have a vlookup uh table so there's a there's a virtual uh lookup virtual dispatch table basically um it used to be like that but now there's a new mechanism actually which got implemented not so long ago called vsd which is a virtual stop dispatch this is a new mechanism that is more robust it has additional features and it should be for instance methods it should be faster just just a bit so roughly what it will do is it will create a stop and each time we need to call an instance method it will look up the code ins from it's like lookup table it will do a bunch of things and i don't want to talk about them in great detail yet but it will then generate that code and the next of the calls will be just a simple like cache lookup because those values will get cached and it should be faster than the old implementation of course if we have multiple implementations of the interface and it we will have a cache miss because of that then it will fall back to the to the slow path version but in our case we should always have the fast path but still this is a cost that's associated with it even if it's cached and we have to take that cost when we're using just interfaces but it's it's good i'm just trying to tell you that this mechanism is pretty good but there's still some costs there's always costs so that's why uh we're slower so there's a bunch of extra checks and we're just operating on plane interfaces and that's that's why so let's measure the vsd performance actually because we can so i have a class called vsd here and let's change that implementation here to vsd and what we'll do is we're just going to have a class that implements an interface that's empty method do and for one example we're just gonna call the interface and for uh the second example we're just gonna call the class and let's check the performance so 32 milliseconds 25 it's gonna default to 25. okay so 25 milliseconds for the interface implementation so now let's go to the concrete class 3.5 4 milliseconds 3.1 yeah so roughly speaking this is the overhead that's associated with this and like i said there could be more costs because we're just using the single implementation of a single interface it's gonna it's going to be a bit worse where we have like class b class c class d and they all implement the ei interface and they all have nothing do but this method is completely different from each other so that's going to be a bit slower in in in this situation but we don't have this cost here all right so let's move now to uh link performance so let's look at preserving type information because we just learned a bunch of stuff about interfaces and about concrete classes and if you wanted to know more about vsd there's this article here about vsd so virtual top dispatch so if you just type into google virtual tab dispatch or vsd and you know at c-sharp you're going to get this implementation vsd is a technique not exclusive to clr or net there's a couple of different implementations of this technique in a couple of different languages runtimes and what have you but this is specific to dotnet so if you want to know how this works in that net go here and by the way vtable is still here um because vsd cannot handle currently virtual stubs but you know maybe someday okay so moving on moving on to the problem why um interfaces are bad as well so what we have here is find first method uh we have the same list here we still gonna do first or default um we're gonna check if the item is greater than this guy here so we're gonna put a nine and hopefully we're gonna return this element so all good right okay so now let's check how it performs and in my previous video i told you that this is generally a bad idea because we were actually discussing a different method but this method uses i enumerate or so um that enumerator of our list will get boxed so what i mean by that let me give you up just a tiny more um context here so this uses i enumerable and let's look how the first or default is implemented so this is first or default but we're actually interested in this method here so again a bunch of new checks uh we're doing the for each here um and we're checking in against our predicate and if our predicate is true we're gonna return the element otherwise we're gonna return to default so what's the problem here the problem is that a list has a enumerator which is a struct it you know implements all the good all of the interfaces are disposable a numerator and a numerator of t and unfortunately for us this will get boxed we don't want it to get boxed but it will get boxed anyway and in order to be able to you know show you that let's comment this out for performance because we don't want to waste time on this and let's fire up our debugger and let's see if we have uh you know a list enumerator that's on the heap in order to do that we're gonna dump heap and we should see a generic list enumerator so there's an enumerator here but let's verify that it's actually ours so our list is a very simple one which has elements from 1 to 10. so what i'm doing right now i'm following the pointer to that implementation so i have the implementation here so this this is my enumerator and it has a field called list and right now we're seeing our list and our list has a array you know under the hood so that's our array let's print it out let's see the elements remember we have elements from one to ten very simple list one two three and let's maybe look at the tenth element ten so it's our list so that's what i'm trying to say so it's our list there's no like mistake here probably and you know we have unfortunately put an enumerator on the heap and this is not what we want because we want to retain remain on the stack because stack is faster has better better their locality it's more predictable there's no garbage collection here so our pointers will not get moved ever and um it's better for like cash locality as well so we want to preferably we would like to have our whole implementation on the stack not on the heap so how do we solve this problem for link well what we have to do is we have to retain the correct type information so what i did here is i reimplemented the first or default and i could just re-implement this interf this signature here but i just wanted to show you a different version so i have a custom suffix in my function otherwise it's exactly the same and the implementation here is exactly the same as well as the linked version so what we're going to do extra here is we're going to check if the source is a list and notice that we're not checking against ilist and the reason for that is because we would have the same problem as before we have to have a com correct type information concrete type information so we're gonna do that we're gonna cast a concrete type and then we're just gonna have the same code as we have before and that's pretty much it and now let's fire up version two so 11 milliseconds even nine i'd say yeah nine milliseconds as compared to 20 19 yes so it's around 19 milliseconds um two times faster right is it two times let me check fourteen nine yeah it's around two times faster i'd say okay so it worked now let's verify if our enumerator is on the heap or perhaps it's you know nowhere to be found on the heap which is what we want by the way so let's go here let's break let's dump the heap dump heap and yeah no interfere no not interfaces no enumerate terms here so that's good that's pretty good so why this version is around two times faster so the first first of all uh we're not doing any uh allocations on the heap we're just operating on the stack that's good the second bit is that we don't have any you know interfaces so we're not going to go we're not going to go for vsd um this is obviously faster as well and all of these things uh combined give us the better performance so it's all good right well no so we have to be very careful when doing these checks because if we have a system and we have a bunch of checks like that in every overwritten link method it's gonna bog down performance significantly because these are expensive and um i've talked about this a bit before about about branch prediction so the more stuff you put in here the the more likely that branch prediction will not hit the correct branch and this will cause a significant performance downgrades so you have to be pretty conservative with this and you have to add this if you know that your dominant type into your system is for example a list or maybe a dictionary then you can add these two ifs and have good confidence that you're gonna hit these branches like all of the time if you don't have that confidence then you're just gonna slow things down so you have to be careful and you have to have data on hand and in fact now that i think of it the link version that i was referring to when having the last operator could end up the same way because they didn't have that check they only added that check later because probably they had data on hand that everyone is using the i list so or something that is a list and something that has the inductor so it was like word file to add this check here although we're not always going to hit that branch and if we don't hit that brand then maybe that's a problem but uh it's like highly likely that we're going to hit that branch so that's why they did it okay so moving on this is like i said you have to be conservative with these checks and you have to measure the performance measure the performance of the system and yeah you have to be basing these checks on statistics all right so let's move to structs so let me comment this out not yet this is not a good time so i have a struct called point here and it has a bunch of fields uh it has a string which is a pointer and a struct and this is interesting you know to me if you know what happens if we have a pointer in a struct for example what happens to the truck but anyways let's see uh what's going to happen so i have a bunch of points uh they have a bunch of coordinates and the example is the same as before we're doing first or default where x is greater than one which will return the last element of our array and i have three versions of this code here so we're gonna do first or default the second version will do a for each and the third version will just do the for loop and use the indexer to access the item and check okay so let's run 70 milliseconds 70 yeah 70 milliseconds the second version will take 33 millisecond 27 milliseconds something like that and the third version will take five milliseconds yeah it's around five milliseconds okay so as you can see there's a big performance improvement between this and this and this uh this version versus this version has significant performance improvements from like o2 to o3 right and why is that happening so that's not good and let's check if we had a class for example what's going to happen if we're going to have a class here let's see here it's 39 so it's a bit faster and this version is 10 and this version is two and a half so that's pretty good what happens if we remove this guy oh no so for the class that shouldn't make any difference that's 10 [Music] it's 39 yeah so no difference here what about the struct 40 milliseconds and now we have 19 milliseconds so it got faster it got a bit faster and now we have 4.2 which is more or less the same right so what we did here is we deleted a pointer in a struct and if we have a list like this and we have a point so it's gonna be still on the heap uh nothing really changes because we have an array of points under the hood so we're gonna land on the heap what's going to be a bit different is that you know when we're doing our for each operation which is this case and this case what we're doing is we're calling that enumerator and that enumerator what what we'll do is it will do a defensive copy because we're you know having a we're operating on instructs so that defensive copy is something that we would not want to have and preferably we would want you know we would like that to be faster because we're doing a bunch of stuff that we don't want to do so how to solve this problem well we can use a data structure called ref list and that data structure behaves like a list with one difference and that one difference is that it has a ref enumerator and that enumerator is again your plane enumerator instead it uses the current version which has a ref return so now what we're doing is we're doing ref returns so that means we're we're not going to be probably doing a defensive copy of that truck we're just going to return it by reference so okay let's let's test the performance now with this version let's start with link because we're interested in like this sort of abstraction so it takes now 14 milliseconds so it's around two to three times faster already yep so the performance is pretty good for link uh the performance for for each should be pretty good as well it actually should shouldn't be indistinguishable from version with the for loop and as you can probably tell it's indistinguishable in performance right now so uh our ref return enumerator is actually a benefit for us because all of those all of these free versions are actually at a level of performance which i would call acceptable in high performance scenarios so even the link version is pretty good and there's one catch here because as you can tell probably this does not implement i enumerator of t and it does not implement i anor variable of t which links requires so what i had to do here is i had to implement my own first or default which operates on a concrete type of ref list and that concrete type is used to just do a forage which is the same almost the exact same thing as here so what we're gonna do is we're gonna do four each check the predicate if the progress is true then we're going to return it otherwise we're going to return the default version so that's that's it yeah and this is an example where you can sort of tweak link in these sorts of ways um i would not recommend doing it for like all of the all of the things there are frameworks like that actually one of them is i believe called hyperlink which uses all of these things that i just mentioned not not those things but it does other things to make link faster so it will re-implement operations of link to make link faster it will add new operators in to be able to express link in like you know more high performance ways as well so um i've been putting um the the hyperlink for framework in my uh you know video description so you should definitely check that out okay so that's all for now um if you have any questions if you think that i made a mistake post that in the comments i'm gonna try to correct each and every mistake if it's a mistake and if you like the video leave a thumbs up subscribe hopefully and see you guys and girls you know and really everyone next time in the next link video so yeah thanks and bye [Music] oh [Music]
Info
Channel: LevelUp
Views: 3,280
Rating: undefined out of 5
Keywords: C#, dotnet, dotnetcore, performance, programming, linq, tips
Id: atUpAx_0lqg
Channel Id: undefined
Length: 24min 36sec (1476 seconds)
Published: Sat Aug 15 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.