C# Switch Case Internals #1

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments

If the jump table is slower with only a few cases, why doesn't the compiler compile a simple switch to the same assembly as the corresponding if/else statements?

Seems like a pretty common case which you would like to optimise for.

👍︎︎ 3 👤︎︎ u/Vis-Knut 📅︎︎ Sep 24 2020 🗫︎ replies
Captions
hi everyone and welcome to a brand new video series called c-sharp clr internals and today we're gonna be talking about switch case this is going to be a multi-part video because the switch case has a lot of features as an instruction so it's a very big topic to to talk about and that's why it's going to be split into multiple videos but today we're just going to focus on simple switch case examples with mostly numbers so let's jump in and if we have a code like that and for example we're trying to check for something um if x is for example equal to two then we're gonna return two and if x is equal to one then we're gonna return one what you're gonna see that the compiler will generate a bunch of um comparing instructions so first of all we're gonna compare if x is equal to two and if it's not equal we're gonna jump to one and if this isn't equal then we're gonna return zero otherwise we're just gonna return two or one so this is a very simple if else statement and by comparison let's see how we can rewrite this in a switch case manner right but first of all let's ask ourselves the questions why would we want to rewrite this in a switch case scenario for example well most of the time the community and microsoft will recommend that you use a switch case keyword when you have a bunch of if else if statements because what they claim is that the performance of a switch case will be faster and second of all you have much more readability and much more control because the switch case it's much it's like a simpler instructions instruction you can read it um without any problems it's not so deep it's not so nested and you have a bunch of additional features under your toolbelt so we're gonna put the performance claim to the test today so let's try and rewrite this in a switch case manner so let's do switch x um case two return two and let's do now case one return one so almost the same code got generated but as you can probably tell it's a bit different so now we're comparing our x to one and if it's equal uh then we're gonna return and otherwise we're gonna compare it to two and if it's equal then we're gonna return to otherwise we're gonna return zero so as you can see these branches got sorted which is undesirable from high performance scenarios well because what's happening in high performance nerves so um there's there's this component called branch predictor and it will do speculative branch analysis and it will take branches speculatively so in order to increase performance by the way but in order to like work with it and you know make it fast what you have to do is you have to order your branches in a very specific way in high performance scenarios so that sorting operation heal will completely destroy that process so for example intel recommends that the most likely branches should be at the top and uh the less likely branches should be at the bottom of this sort of if else if chain right so that's one of the examples that you can sort of have but there's more optimization techniques around like setting up branches in a very specific way but this will completely trash it so it's not very good for like high performance but still this is a very simple example it's just two branches so maybe if we keep adding branches more interesting things are going to happen and let's see so let's do case 0 now and let's return 0. so as you can tell probably this is completely different now so this code um got changed effectively to a jump table so the correct terminology is the offset jump table and this is the part where we have our jump table so we're going to load a table from this address here then we're going to pick the correct index of the table and then we're going to load our address the first address of this method and having all of this so let's add the offset to our base and this will actually create a jump to the correct instruction here so that table contains relative addresses of this method here so now what we have is we have a jump table which is interesting and it's really cool and you might ask yourself why is this even here well the obvious answer is because of the performance because the jump table should be faster than a bunch of like if else statements that this sort of generated in the previous example so let's put that to the test now so let's go to some code and let's measure the performance of our if else statement so we just have two statements as before and let's measure the performance and let's do a bunch of measurement and then let's average them out so let's see here so it took 80 ish milliseconds right by comparison let's see how our simple switch case will get handled which sorts branches so our simple switch case runs in about 90 milliseconds so um the performance is a bit worse uh as compared to if else statement but like microsoft and people recommend if you have a lot of them then you should consider using a switch case because otherwise there's no benefits so let's test out the free case scenario so we have now three cases here and let's do an if else check so unsurprisingly that didn't take much longer because well first of all this data set is not uniform it's increasing so the branch predictor will have an easy job to optimize this and take the correct branches always but let's see how our switch case performs which now will generate a jump table so let's see how it performs and perhaps surprisingly it's two times worse than the last switch case and the last if else statement so um why is that well if we look at the jump table code then what you're gonna see is that these instructions will have a cost associated with them and that cost is a const cost but it's not free so we're paying for that so perhaps if this is a const cost maybe if we have a really big you know if else chain that would be faster so let's check it out so now what i have here is i have a big else uh else if chain and let's see now how will this perform so now that takes 280 milliseconds so let's now check the same switch case example you know that this we're gonna have the same checks but we're gonna be using a switch case which means that we're gonna be using the jump table now and like i said the cost should be pretty much the same and it is so it takes roughly 126 milliseconds as the first example of jump table so that means indeed the switch case is probably faster in most scenarios as opposed to a chain of if else statements because as you can see the jump table has a static cost associated with it there's other ways that this is really interesting because that table is very robust and that that whole feature is very robust and we're going to see now what else it can do to improve performance so let's jump to um clr code and let's um let's actually see how this is implemented so this is called a lowering this this feature here so a lowering in a compiler it means you know we're gonna take a complicated abstraction for example a complicated um set of instructions and then we're gonna rewrite them in order to be more simple in a way so this is what it is maybe it's not more simple in this case but it's much more optimal at least so as you can see this is a gigantic feature this is one method by the way and this contains like something like eight hundred to one thousand lines of code if we would do the accounts correctly so it's a very big features with a lot of branches a lot of things that can optimize the switch case in different ways so let's see how else we can optimize this so for example if we have just two jump targets and by jump target i mean if our method has because this return by the way is going to be a part of this switch even if it's not write it out as default it will be implied that this is the default path in the switch case because we don't have any code here so let's see what's going to happen if we just have to jump targets so this is a jump target and this is a jump target so we still sort of generated right now our you know jump table but let's see what's going to happen if we keep adding cases so let's add another case and let's just call this case like 4 for example and now what we have is we generate a bit test so that bit test will be much faster than the jump table and why we can actually generate a bit test well because what we have here is just a bunch of instructions and that bit test is totally doable here so this is faster and we can go even faster if we don't have any gaps in our cases because the bit test got generated only because then our numbers aren't really without any gaps so if we don't have any gaps now we can reduce it to a simple just just a simple check because what we're gonna do now is we're gonna check if we're within this range and that's that's going to be it because remember we have just one jump target but if we have some simple gaps then we can do a bit test to do the same thing so that's two interesting cases where this table got optimized away but it got optimized in a very specific and interesting ways so let's fix our jump targets and let's do the following so as you can see now this code generates a jump table again but we have this compare with four so what we're going to do is we're going to check if we're in this range if we're not then we're going to jump to zero and remember that this array that got generated the table is index bar by our x here so what's gonna happen if we're gonna have a big gap let's let's for example say that our gap is going to be five that's still good but what about nine well if we have nine then we check if we're within this range and we're gonna generate a jump table for this range exclusively and if we're not in this range we're gonna jump to 1d and we're just going to emit um what we can call an if statement so now this is a jump table plus an if statement okay so what's going what's going to happen if we add more so let's add like 10 and let's correct it to 9 and 10 for example the same thing happens but now we have two checks so this is a jump table and two ifs okay why does it happen like that why isn't it a part of jump head because if we have an if this is a this is an index of the table what we're gonna do is we're gonna have a continuous block in memory which will have a lot of wasted memory because this is a big gap and that big gap is not going to be allocated by anything so that's just waste of memory and that's why we don't do it like that so let's add a third item let's add a third case and now as you can see we're gonna uh we're gonna have a offset table so jump offset table and we're gonna generate another table just for these two so as you can see um the compiler is really smart here because if the gap is big enough and there's you know enough items between these two gaps so we're actually splitting them into this thing set divided by the gap then we can generate multiple jump tables and it's going to have really good performance as well because that check and that offset that we need to add in order to be able to set ourselves correctly here is not that expensive as compared to um you know just generating a big table because that will consume a lot of memory so that's um that's one of the ways that the compiler optimizes ends so if you like the video leave a like possibly subscribe if you think that this has bring you some value and in the next video we're going to see how this handles boolean you know boolean expressions and possibly string expressions as well because that's going to be a completely different optimization technique so stay tuned and that's all for today thank you and bye
Info
Channel: LevelUp
Views: 1,798
Rating: undefined out of 5
Keywords: C#, csharp, JIT, performance, dotnet, compilers, programming, computer science, tutorial, C# tutorial, clr, clr internals, just in time, just in time compiler
Id: soDX_IeZsqM
Channel Id: undefined
Length: 14min 45sec (885 seconds)
Published: Thu Sep 24 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.