C# What JIT Generates? - Struct Devirtualization

This video is an extension of this Twitter thread, where the JIT optimized an entire class after just a single method was changed...

https://twitter.com/badamczewski01/status/1299416623720402944?s=20

You can find the entire series of JIT videos here:

https://www.youtube.com/playlist?list=PLzQZKn8ki7X1ImuZfa0IzAHTcbIKIJGeD

👍︎︎ 3 👤︎︎ u/levelUp_01 📅︎︎ Sep 14 2020 🗫︎ replies
Captions
Hi everyone, and welcome to another JIT video. Today we're gonna be talking about struct devirtualization. This feature is very cool, because not many people know that .NET supports automatic struct devirtualization out of the box. When we talk about devirtualization, we usually mean type devirtualization through generic arguments, but that's not really it here. Struct devirtualization is, unfortunately, a very fragile feature, because you have to fulfill a very specific set of criteria for it to work at all. So there are some problems with it, there are some good things that we can do with it, and there are some things that will totally surprise you, because they're not expected and they look like bugs, but they're probably not bugs. We're gonna see what's up with that, and try to explain the problem and solve it. So let's jump in.

Let's start with devirtualization. Let me just paste this code here. What we have here is a struct which implements an interface, and that interface has a single method called Get. We're gonna implement that interface and just return 4, so a very simple method. And of course we're gonna have a method called M, which we'll use to do the following: we're gonna create an interface variable called a, instantiate it with S, and return a.Get().

So what's going to happen here? If we jump to the IL code, you're gonna see that the C# compiler actually generated a box instruction here, because it knows that we have to deal with a value type, and that value type has a virtual call on it, so unfortunately it has to get boxed under the current implementation. But what the JIT compiler did is it inlined this function, and while doing that it noticed that it's a const, so the const can just be returned without any problems, and there's no boxing in this situation. So that's
very good, because that actually proves that we effectively devirtualized the struct without any problems. Awesome, right? Very cool.

Let's see now what else we can do with it. When we add one, for example, as you can see it still got folded into a const, and this got evaluated at compile time, so it's 5 now, and it's all good, because these kinds of folds can be implemented very efficiently. But let's see what happens when we add one from the other side, so now the const 1 is added on the left-hand side and not on the right-hand side. It's not really a huge difference in the source, but in the generated code it's a drastic difference, because now we're going to box no matter what. This is a boxing call, and we're gonna have our S, our struct, on the heap instead of on the stack, or anywhere really, because when this gets folded it's really just in a register and doesn't have any presence. But now what you're gonna see is that we box.

So what's the problem here? Well, the problem is that when compilers do certain optimizations, possibly this optimization as well, they usually have something that we call idioms. An idiom is a very specific pattern in code. When we look at the IL code now, we have this construct here and this construct here, and when you click on this, as you can see, this is a block. An idiom in this case means that each of these instructions on its own means something different than when they're in this sort of group. In this group they mean just an instantiation and a box, and this call here to Get means something different, and if we combine these two then we have an idiom that we can detect and transform the way the compiler does. But when we introduce something like plus one here, we cannot effectively detect this pattern, possibly because there's something getting in the way. When we have the optimal version here, as you can
see, we have a box, and that box is followed by a call to an instance method. When we add from this side, for example, well, this still holds true. But when you reverse the order it doesn't hold true anymore, because we have to introduce a load of a const here, and now the pattern breaks. I'm not sure if this is the way that it's implemented in code, but looking at different compilers you could tell that it's probably what's happening here. It might not be true, but there's still some variation of this pattern happening here.

To make matters worse, if we introduce an empty and unneeded call like this here, the pattern is still going to break, because we've now introduced a duplication, and that duplication means that this pattern cannot be applied. If we look at the JIT assembly, we have a problem again. So that's one thing. Like I said, this is a very fragile feature, because even the smallest change can break it, and that's a problem.

Okay, let me leave this plus one here, because we're gonna use it later. So how could we do something about it? Let's do something different. Let's have another method that we're gonna change a bit. Let's use our struct again, but now let's not use the interface for the variable; instead, upon returning, let's say that our a will be an actual interface, so it's going to be cast to IA, and our IA will call the Get method. So it's pretty much the same, but the implementation and the resulting code will be different, because what's really happening now is that we're allocating this guy on the stack, and this got effectively devirtualized. Although we're asking for an interface implementation here, the JIT compiler detects that this is a struct, and instead of boxing, what we're going to do is allocate this on the stack, then call the method Get, and then return. A very interesting bit is the nop operation here, because this is a release build, so we're doing this all in release mode in x64,
and for some reason the compiler emitted a no-operation here, which is strange, but okay, we can live with that. So this implementation is better, although in production situations the interface would probably be an argument here, and we couldn't deal with it in such a way, right? We would have to take that interface and do something with it, and the devirtualization would not work if we introduced some extras here. So it's very fragile in that sense.

So let's see another example, which is bizarre, and a very surprising thing is gonna happen now. I have these two methods: this one effectively boxes, but this one does not box. What we're gonna do now is look at the method N and keep a close look at its generated code, because magic is going to happen. Let me remove this sub-expression from M. Remember, we're looking at N. Let's remove that, and surprise, surprise: everything got fixed. Everything got folded into a const now, although M and N are not related to each other in any way. They don't share data, they don't share anything at all, and as you can see we just optimized everything by fixing a single method, M, although we're looking at N. Why is that? That's very surprising, and I was extremely surprised when I saw this the first time. I showed it to a bunch of people in the .NET community and they were surprised as well; then I showed it to a bunch of people from Microsoft and they were surprised as well.

So it has to do with something called JIT tiered compilation. Tiered compilation could be enabled in .NET Core 2.1 and 2.2, I believe, it's on by default in 3.0 and 3.1, and it's gonna be enabled by default from now on. It used to be that the JIT had, like, a hot and a cold path, and there were multiple JITs before, but what we have now is tiered compilation. Tiered compilation means that in tier 0 we're just gonna
compile stuff quickly, and we're not going to optimize it very well, because what we're trying to do is speed up the load times of the parts of the application that we care about. The code will not be optimal, but under certain conditions, where we know that we have a hot path for example, we're going to go back to that method and recompile it using all of the optimizations that we can. So the thing that I'm showing you now is related to that. But we're using a tool called SharpLab, and we don't really know if it's using tier 0 or tier 1, and perhaps the disassembler is broken in some way, because these two methods shouldn't affect each other, right? Well, the compiler is just fine, the disassembler is just fine, everything is fine, but we have to prove it. So let's prove now that this is related to tiered compilation, and we're gonna see how we can abuse that feature.

All right, let's jump into some real code. I have the same example here: these two methods, M and N, doing pretty much the same thing. I'm just going to create a new instance of Program, call M, and then call N, just to be sure that the JIT will compile them, and then return. We're going to use a different disassembler now, called WinDbg, because WinDbg has support for .NET types and we can totally disassemble our application. So let's compile this first and see what's going to happen in WinDbg.

Let me switch to WinDbg. Let's load the application, let's go, let's break, and let's dump the heap with the type Program, because what we're trying to do is get the method table of the class in order to be able to see its method descriptions. So we have our method descriptions here, and as you can see we have a bunch of methods that are JIT-compiled already. Let's look first at M. As you can see, it's quick jitted, and the
quick JIT version is a very slow version that actually boxes. It's not really the performant version of the method. We're not using devirtualization here because, like I said, we're gonna box, and this is the proof that we box here. But let's go and look at method N. Method N is again quick jitted, and the implementation is a bit different from what we saw in SharpLab, because it's allocating stuff on the stack, but we're still gonna box here. So it's not the same implementation as we had when we were testing in SharpLab. Well, maybe that's because SharpLab already compiles at tier 1, and we have to verify that first.

So let's detach, and let's see how we can force the compiler to compile something at tier 1, because that's not very obvious. We'd have to make this a hot code path, and that could require loop iterations and different things, and that may not be feasible. But there's a way to enforce it by using AggressiveOptimization; then they're gonna end up in tier 1 anyway. So let's do that now, recompile, and see what the result is going to be.

Let's restart, and again let's dump the heap with the specific type and copy the method table information. So now I have the two methods, M and N, and now we're gonna see a change, and the change is "optimized tier 1", as you can see. Now when we go to the code we get an optimized version, which is much shorter. The allocation call here is a fast alloc, so it's a fast way to allocate stuff on the heap, but it's still on the heap. Now let's look at N, because that got allocated on the heap before, in tier 0, when it was quick jitted, but in the tier 1 compilation it's allocated on the stack. So that's very good.

But now we have to somehow explain the situation: why does everything fix itself if we remove this? Because it totally will fix itself in WinDbg as well. Well, let's look at something different now. Let's go here and do the following trick:
let's switch these methods around, and now, as you can see, N got optimized by default but M didn't. So as it turns out, tiered compilation and fast compilation really depend on the ordering of the methods in the class. That's kind of a problem, but it's also kind of cool, because you can do certain interesting things with it. One of the things you can do, if you have a problem like that and you want N to possibly get optimized away, is call N first. So you can imagine that you have a certain startup or bootstrap method which calls certain methods in a certain order so that they can be compiled in the most efficient way. It's not ideal, it should just work, but it's not working currently, so this is what we have to do sometimes.

And just to be sure, let me prove to you that this is not a bug in SharpLab. Let's do the following: let's switch places, so let's call N first and then call M. That will effectively fix the method N, and it would fix every single method that we have in that class, so every single method that can get optimized will get optimized. So if you call M, which is the slowest version, first, nothing will get optimized, but if you do it in the reverse order, everything that can be optimized will get optimized instead. That's something you have to keep in mind. It's probably related to tiered compilation, and probably to inlining as well, because in order to be able to fold like that you have to inline the function Get.

But let's test it out. We switched places, so now we're calling N and then M. What's gonna happen now? Let's dump the heap again, get the method table, and go to M first. As you can see, M is unchanged; nothing really changed in M. But now let's look at method N, and as you can see it got optimized and folded into a const, which is awesome. So this
is a very interesting feature. Currently it's not all that useful, because it's sort of hidden. It should just work, but the algorithm that detects it, the conditions that you have to fulfill, are very tricky, and currently you have to be very aware of what you're doing in order to use this feature correctly. Perhaps in the future it's going to be implemented a bit better, and that will be awesome, but for now you have to be aware that this exists, and of all of these situations that I've just shown you, because otherwise you cannot really use it effectively. And it's really effective when you think about it, because it can fold stuff, it can optimize stuff, it can devirtualize stuff, so it's something that you want in performance-critical scenarios. That's something to consider.

So if you liked the video, leave a like, possibly subscribe, and leave a comment if you have any suggestions. If you found some bugs, we're gonna try and fix them in the description. That's all for this video, so thanks for watching, and see you next time. Bye.
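The code shown on screen is not part of the captions. What follows is only a minimal sketch of what the types and methods described in the video might look like, assuming the names S, IA, Get, M, N and Program as they are spoken. The method MRight is a hypothetical name added here to show the "plus one on the right" variant next to the "plus one on the left" variant. The comments repeat the JIT behavior as reported in the video; actual output likely depends on the runtime version and may well differ on newer releases.

```csharp
using System;
using System.Runtime.CompilerServices;

public interface IA
{
    int Get();
}

public struct S : IA
{
    public int Get() => 4;
}

public class Program
{
    // Interface-typed local: the C# compiler emits a box followed by a
    // virtual call. Per the video, the JIT folded this shape to a constant.
    // (MRight is a hypothetical name, not taken from the video.)
    public int MRight()
    {
        IA a = new S();
        return a.Get() + 1;
    }

    // Same logic with the constant on the left. Per the video, this shape
    // kept the box (a heap allocation) in the generated code.
    // AggressiveOptimization asks the runtime to skip the quick-JIT tier.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public int M()
    {
        IA a = new S();
        return 1 + a.Get();
    }

    // Struct-typed local, cast to the interface only at the call site.
    // Per the video, the optimized code kept the struct on the stack.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public int N()
    {
        S a = new S();
        return ((IA)a).Get();
    }

    public static void Main()
    {
        var p = new Program();
        // Call order as in the final experiment of the video: N first, then M.
        Console.WriteLine(p.N());      // 4
        Console.WriteLine(p.M());      // 5
        Console.WriteLine(p.MRight()); // 5
    }
}
```

All three methods return the same values whichever way the JIT compiles them; the differences discussed in the video are visible only in the generated assembly, for example in SharpLab or in WinDbg.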
Info
Channel: LevelUp
Views: 1,004
Rating: 5 out of 5
Keywords: C#, csharp, JIT, performance, dotnet, compilers, programming, computer science, tutorial, C# tutorial, clr, clr internals, just in time, just in time compiler
Id: EUbOaUthJPk
Length: 19min 23sec (1163 seconds)
Published: Mon Sep 14 2020