Performance Improvements in .NET 8, ASP.NET Core, and .NET MAUI | .NET Conf 2023

Captions
[Music] Hey everyone, we're going to talk about .NET performance. I have my best friends Stephen and Jonathan here, and we're going to cover performance improvements in .NET, .NET MAUI, and ASP.NET Core. I believe Jonathan is going first, so take it away with MAUI performance improvements.

Sure. My name is Jonathan Peppers, and I work on the Android workload for .NET, but I also work on .NET MAUI. So I'm the opener. Let's get going.

For .NET MAUI in general, our focus for .NET 8 has been fundamentals and quality. We really want to start with a good foundation for everyone to build their apps on, so we focused on bug fixes, and performance wasn't our top priority — it's still important, and we wanted to be at least slightly better than .NET 7. But when it came time for me to write the yearly blog post about the work we did, I was surprised at how much there was to talk about. This is going to be a whirlwind tour of a couple of those things, but if you want the details, check out the post at aka.ms/mauiperf8.

Here's a visualization of where we are. This is Android startup time on a Pixel 5. On Android we focus on improving startup, while iOS has a different problem and we focus on app size there, and we're just trying to get progressively better each release. The .NET Podcast app here is a sample app we've had for a few years; it was written in MAUI, so we don't have a Xamarin.Forms number, but you can see what you'd anticipate gaining by moving from Xamarin.Forms to .NET 8. Likewise for iOS app size, we've moved things a little bit — but the big winner here is Native AOT on iOS, a new experiment we have available now. That's a much smaller bar.

So let's talk about what that is and how it works. You turn on Native AOT the same way you would for a console app or a web service: you set PublishAot to true, and for now we'd recommend you try it on a release build of your iOS or Mac app. The thing to mention is that when you turn this on, you're no longer using Mono as the runtime of your app, so things are going to behave differently. Native AOT has to fully trim your app — all the managed code — and ahead-of-time compile everything; that's why it's so much smaller. Trimmer warnings are going to be a big focus of .NET in the future: to make this the stable option everyone uses, we want to solve all of those trimmer warnings. We also want to experiment with how we could bring this to Android. We don't know 100% — it's probably not going to happen immediately — but we're working towards those goals.

Okay, here's the first improvement I'll talk about. It's really a quality issue, but it impacts performance a lot. On the top right there's a Swift code example: imagine you create a parent view and a child view and call Add. The parent has a strong reference to the child, and the child has a reference to the parent. If you do this on a reference-counted platform like iOS or Mac, that's a memory leak — and .NET developers are not particularly knowledgeable about this problem.
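To make that cycle concrete, here's a minimal C# sketch over the UIKit UIView binding. The MyView/MyChildView types are made up for illustration; only the UIView API is real:

using UIKit;

class MyView : UIView
{
    public MyView()
    {
        var child = new MyChildView { Parent = this }; // child -> parent (strong)
        AddSubview(child);                             // parent -> child (strong)
    }
}

class MyChildView : UIView
{
    // A strong back-reference completes the cycle; under reference counting,
    // neither object's count can reach zero, so both leak. Storing a
    // WeakReference<MyView> instead is one way to break the cycle.
    public MyView? Parent { get; set; }
}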
This is all new to .NET developers — it isn't a problem if you're just using vanilla C# objects. What we did to combat it was create a Roslyn analyzer, with the goal of turning it on in the MAUI code base, finding all of these leaks, and fixing them. We made a lot of progress in .NET 8 using it, and we hope that maybe one day we'll put it in the iOS workload so that if you're coding against UIKit, you get the same benefits we're getting in the MAUI code base. If you want to learn more, we have a wiki page (linked here) that goes into this problem and other memory-related issues that may interest MAUI developers.

Now let's talk about an Android improvement. Probably the most basic control on Android is TextView. It was originally written in Java (you can also call it from Kotlin), but the way we provide it to .NET developers is that instead of exposing the strange GetText/SetText methods and a CharSequence type (CharSequence is an abstraction over Java.Lang.String), we offer a Text property of type System.String. That's how we've done things even back in the Xamarin.Android days — this binding hasn't changed — and it's very usable for C# developers.

If you look one layer underneath to see what's happening, what we previously did was create a Java.Lang.String, assign it to the underlying TextFormatted property, and then dispose it immediately. That wasn't great: it allocates an object on the heap, and we have to do bookkeeping to tie the .NET object to the Java object. In some cases we were able to move to a new pattern — some of the code here is what you would have seen if you looked at that TextFormatted property — and the interesting part is that we now call JniEnvironment.Strings.NewString, which does a P/Invoke into Java to create the string and hands back a JniObjectReference, a struct that is just a handle to the string. So we don't have that wrapper object floating around — the handle lives on the stack instead of the heap — and we just dispose it at the end, without all that bookkeeping between the C# and Java worlds.

This does feel kind of low-level, but imagine every label in your MAUI app calling this property — and if that's in a ListView or a CollectionView, it's getting called while you scroll. That's why we try to make these kinds of improvements at a lower level.
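Here's a hedged before/after sketch of that Text setter. JniEnvironment.Strings.NewString and JniObjectReference are real Java.Interop APIs, but the property bodies are simplified for illustration:

// Before: allocate a managed Java.Lang.String wrapper on the heap, then dispose it.
public string Text
{
    set
    {
        using var s = new Java.Lang.String(value); // heap allocation + cross-VM bookkeeping
        TextFormatted = s;
    }
}

// After (conceptually): create the JNI string handle directly — a stack-held
// struct, with no managed wrapper object to track.
public string Text
{
    set
    {
        var jstr = JniEnvironment.Strings.NewString(value);
        try
        {
            // ... call the underlying Java setter with the handle ...
        }
        finally
        {
            JniObjectReference.Dispose(ref jstr);
        }
    }
}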
One more thing to talk about: diagnostic tooling. dotnet-trace lets you profile your app and view the results in Speedscope or PerfView — your choice — and dotnet-gcdump similarly lets you capture a memory snapshot of your app. On mobile, the way you use these tools is through another tool called dotnet-dsrouter, which acts as an intermediary that talks to your app on the mobile device, and then dotnet-trace talks to dotnet-dsrouter. It sounds complicated when I explain it, but we did take a pass at making it simpler than it was — you used to have to forward some ports on Android and iOS, and we've simplified that. The other new thing is gcdump support, which we didn't have before and do have today. Setting this up is still a little tricky, so we have a wiki page, aka.ms/profile-maui, with a link for each platform — because the steps on Windows are completely different from profiling or getting dumps on Android or iOS. So check out that page as well.

That's all I have. These are the links I mentioned before — make sure you check out the MAUI blog post if you want more details; I couldn't mention everything, those were just the highlights. I'll hand it back to David, and let's see what he has to say about ASP.NET Core, if he's ready.

Jonathan, that was pretty awesome — I learned some stuff just now. I think it's pretty amazing: you don't think about how an individual label works, but when you have a thousand of them on a big page, you go, "oh crap, we're doing this thing a million times," and that adds up. So I'm going to talk about ASP.NET Core performance improvements, and then we'll go to Stephen to close us out with the .NET performance improvements and the massive blog post he's done every single year that I love to read. Let's get to it: performance improvements in ASP.NET Core 8.

As usual, the overall team spends a lot of time on performance improvements, and this year is no different. I want to start with the blog posts. Every year we have a blog post for pretty much every part of the stack — Stephen Toub's epic, multi-page blog post that you can print out, but also a blog post for ASP.NET Core and one for MAUI. Brennan, one of our engineers who works on a lot of the performance improvements himself, wrote the ASP.NET Core post this year. Give it more likes, give it more comments, give it more love — it's a lot of work the engineering team does every year to make things go faster. What always impresses me is that we find ways to make things faster even though they're already really fast. A lot of the low-hanging fruit has been discovered or handled, so we're finding more creative ways to get perf gains as we go through each cycle.

I stole this slide from the keynote. These are our TechEmpower benchmarks; we track them every release. Every new release we get together as a team, talk about goals and how we're going to improve various benchmarks, and a subset of those happen to be the TechEmpower benchmarks. Last year there were a lot of rumblings about whether ASP.NET Core is really fast, whether the benchmarks were cheating — so we spent a lot of time this year making sure our benchmarks for idiomatic ASP.NET Core applications were as fast as we thought they were. There are benchmarks comparing us to other frameworks — I don't have a slide for it, but definitely go check out the ASP.NET Core benchmarks. We spent a lot of time making minimal APIs, MVC, middleware — the entire normal pipeline, as well as the low-level benchmarks — faster. Some of these gains aren't just from the platform; they're from the benchmarks themselves, and we often find issues in the platform through the benchmarks. We also benefit from improvements in the runtime itself — dynamic PGO, JIT improvements, GC improvements — which bubble up and combine into this 18%. So the improvements I show don't account for all the gains you see in the benchmarks; it's a culmination of investments across the whole stack.
Let's start with header parsing. Digging into how header parsing works in Kestrel: we use pipelines — this awesome tool we built back around .NET Core 3 — and the way it works is we grab data from the socket, allocate 4K blocks, and stick the data from the socket into those 4K blocks. For the most part, if your request fits into a 4K buffer, it's really fast. But we ran a couple of benchmarks and saw allocations in places we didn't expect, and it turned out that whenever you got bigger headers, or request sizes that landed on the boundaries of this linked list of 4K buffers, we would allocate an array to make parsing easier. You have 4K, 4K, 4K blocks, and you parse, parse, parse — and when a header crossed the boundary between two 4K buffers, we would grab that whole region, copy it into a new array, parse it, and move on.

One of our engineers, Brennan, spent time on this specific issue and optimized it to avoid the new heap allocation that was happening essentially per request, and he got an 18% performance improvement. The change was rather small, and the technique is one we use all over the stack, so it wasn't brand new — but it's one of those edge cases that won't happen super often, and as we added more benchmarks doing more and different kinds of requests, we saw that it did matter in some cases. The before is about 1.7 million operations per second, and the after is about 2 million — an 18% improvement with no allocations in most cases.

And the allocation savings: if I zoom in — on this little thing I'm hovering over that you can't see; okay, I can zoom in, there you go — you see this ToArray call. That's the difference between these two profiles. This is PerfView, its beautiful UI showing that this ToArray call happens every time you parse in this benchmark. We got the allocations down from 7.8 GB to 2 GB. What's always funny about looking at gigabytes per second is that these benchmarks are doing millions of iterations and requests per second — it's not your typical application, but we run these apps in this stressful environment to see what the worst-case scenario is. So we got a huge gain here.

And just to show the kind of code we had to write to make this work: before, `header` was the header buffer, which might span multiple buffers, and we used to call ToArray — you saw that code. Now we ask: is this header tiny? If it's less than 256 bytes, stackalloc a byte buffer and copy into it; otherwise, rent an array from the ArrayPool and use that as the storage. You still end up copying and slicing data, but we end up not allocating in the vast majority of cases. There are cases where headers are massive — I've seen a 16K header before — but for the very common case of a small header crossing the 4K boundary, we handle it pretty well. And don't forget to return the array to the pool, to make it effective. You can use these techniques in your own apps too, if you're trying to hyper-optimize low-level details — this is how we do it in Kestrel, and you can copy the same approach into your apps.
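A minimal sketch of that pattern, assuming a hypothetical ParseHeader helper — the shape and the 256-byte threshold mirror the description above, not Kestrel's actual source:

using System.Buffers;

static void ParseHeaderSpanning(in ReadOnlySequence<byte> header)
{
    if (header.IsSingleSegment)
    {
        ParseHeader(header.FirstSpan); // fast path: header fits in one block, no copy
    }
    else if (header.Length <= 256)
    {
        // Tiny header crossing a block boundary: copy to the stack, no heap allocation.
        Span<byte> buffer = stackalloc byte[256];
        header.CopyTo(buffer);
        ParseHeader(buffer[..(int)header.Length]);
    }
    else
    {
        // Large header: rent from the pool instead of allocating a fresh array.
        byte[] rented = ArrayPool<byte>.Shared.Rent((int)header.Length);
        try
        {
            header.CopyTo(rented);
            ParseHeader(rented.AsSpan(0, (int)header.Length));
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented); // returning is what makes the pool effective
        }
    }
}

static void ParseHeader(ReadOnlySpan<byte> header)
{
    // ... hypothetical parsing logic ...
}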
All right, more improvements. We didn't just improve Kestrel; we have other servers in ASP.NET Core, and a lot of them are used by internal teams who are Windows-based. They find these issues and send us an email or file issues on GitHub, and we invest there as well — the team cares about the performance of everything. So we improved performance for IIS and HTTP.sys even though our primary focus is Kestrel (you should use Kestrel, for sure), and we spent a fair bit of time improving response buffering and the throughput of HTTP.sys applications.

One of the big changes — I actually made the first one — was to avoid double dispatching through the thread pool. There was a case where we would queue a thread pool work item on a new request and then do it again, and I believe it happened because the code was written in a way that made it hard to avoid. We untangled it and got a 15% performance improvement — it turns out thread hopping is pretty expensive.

One of the other big ones came from an internal team seeing a performance issue, and the funny thing about that change is that it was a simple flag we had to set. HTTP.sys lives in the Windows kernel, and whenever you call write on the response, it can either write to the socket directly or buffer your write and choose how to optimize it later. That technique is also used in Kestrel at our layer, but you can have the kernel do the same thing by setting this flag. So we have this new option you can turn on so that not every single write goes to the underlying network — it's a bool, very simple: EnableKernelResponseBuffering, set to true. I believe it's on by default in .NET 8 — no? Someone can tell me if that's right or not.

The sample here just shows a minimal API returning a path to a large file — there's a big file somewhere. The scenario we had was a client in West US and a server in Sweden, and we tested a 212 MB download where the round-trip time was 200 ms. Without this feature — so .NET 7 and before — the file download took 11 minutes; with the feature, 30 seconds. Huge, huge value. This also goes to show that you need the right benchmarks to surface the right kinds of throughput issues: we wouldn't have caught this if that team didn't have this big round-trip gap between client and server. We're thankful for teams that hit this in real life and could give us a benchmark so we could verify that the change made a big difference.
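A hedged sketch of wiring that up — EnableKernelResponseBuffering is the real HttpSysOptions property named above, while the endpoint and file path are made up for illustration:

var builder = WebApplication.CreateBuilder(args);

// HTTP.sys is Windows-only; let the kernel buffer response writes instead of
// sending every individual write straight to the network.
builder.WebHost.UseHttpSys(options =>
{
    options.EnableKernelResponseBuffering = true;
});

var app = builder.Build();

// Minimal API returning a large file, as in the download scenario above.
app.MapGet("/download", () => Results.File(@"C:\data\largefile.bin"));

app.Run();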
More server improvements for HTTP.sys. We use GC handles because HTTP.sys lives at the kernel layer, and every single time you write a response or read a request, you're doing interop. There was a lot of fragmentation because we kept allocating GC handles over and over, and it showed up in a couple of our profiles, so we ended up using native memory for this small part of response writing to reduce pinning and fragmentation. This isn't something I'd recommend everyone just do in their apps, but it showed a benefit in our benchmarks when we were testing response writes.

Another super interesting piece of low-hanging fruit: I think we wrote this code forever ago, and it affected both IIS and HTTP.sys, because they share the header implementation — under the covers, IIS does use HTTP.sys, so we share that code between the two servers. The header Count did a crazy LINQ statement and was super expensive; now it's free. But that took someone using it, seeing the perf issue, and sending a PR — I believe this came from the community. Before — and it's kind of embarrassing — counting a single header allocated 176 bytes, and counting large headers had this big 9 KB cost; afterwards it's zero-allocation and much faster. So there you go: you can now count headers much faster in .NET 8 with HTTP.sys.

All right, we're almost at the end. I want to talk about Native AOT — we spoke about it briefly in the keynote. Native AOT enables you to build small, fast-to-start binaries that embed the .NET runtime, including the GC — there's no JIT, but the GC is included — and it trims the code super aggressively. The team spent the entire release making a subset of the platform trim-friendly, and the funny part is what you learn when you use it. We did minimal APIs as the first framework because it didn't have a lot of dependencies — but even minimal APIs had a dependency graph where the team had to learn how to make assemblies trimmable, and we had to make, I want to say, 60% of the framework trimmable just to make minimal APIs function. So we got a lot of experience trying to make sure APIs are pay-for-play: when you turn on one feature, it shouldn't pull in all these other features. To make this work end to end — where you get a small, fast-to-start binary — you basically have to make everything opt-in, so we added new APIs and new features to make it work.

The other part is reflection and dynamic code generation. A lot of ASP.NET Core is classes and discovery and dispatch, and those patterns don't gel with AOT, so a lot of this work was figuring out how to make the entire framework work while getting the perf gains of fast start time and reduced size, without making throughput worse. Go to Damian Edwards' talk on Wednesday about AOT, where we dive into more details about how we made it work, the specifics, and what it means for you. Here I just want to talk about the performance improvements we saw as a result of this work.

So here's one of the APIs we ended up designing as a result of the AOT goal: how do you build a super tiny ASP.NET Core application without dependencies? When we looked at the dependency graph of the default project — because WebApplicationBuilder brings in authorization, authentication, regex — the graph was huge, massive, and we had this goal of getting below 10 MB by default — for no features, not with features. CreateEmptyBuilder is the most minimal app you can build, with no dependencies. It doesn't even have a server; it isn't usable — it's there to show you how far we got on the journey. And I have this really cool animation: this is like 50 MB, and as you add more features, you pull in more stuff and more content. So just be aware that when you're trying to AOT an application, you have to worry about every dependency — how big it is and what it might bring in.
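A minimal sketch of that baseline — WebApplication.CreateEmptyBuilder is the real .NET 8 API named above, but note that nothing is registered by default, so the bare server opt-in shown here is an assumption about how you'd make it runnable:

var builder = WebApplication.CreateEmptyBuilder(new WebApplicationOptions { Args = args });

// Nothing is wired up by default: no server, no logging, no config providers.
// Opt back in piece by piece — e.g., a bare Kestrel so the app can actually listen.
builder.WebHost.UseKestrelCore();

var app = builder.Build();

// The point of the empty builder is the tiny published footprint, not features.
app.Run("http://localhost:5000");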
But yes, there are new APIs to enable scenarios you didn't have before, so there are more tools for making your app small.

All right — one of the coolest features of .NET 8, in my opinion, is the minimal API Request Delegate Generator, which we call RDG. If you've ever seen minimal APIs, you call app.MapGet, MapPost, MapPut — whatever the HTTP verb is — you can do binding in functions, OpenAPI, and various other features. Minimal APIs today use dynamic code generation a lot. The reason they're so fast is that when you call MapGet or MapPost, we build this super efficient function that does only what's needed based on your arguments — it can hyper-optimize for a zero-allocation method that will parse the int and pass an int into your function without boxing or allocating. It's super efficient, but it relies on runtime code generation for that efficiency.

What RDG does is take all of that logic that happens today at runtime and do it at compile time, using a new compiler feature called interceptors, which you'll probably hear about later in the conference. It can look at calls to MapGet, and when you turn on PublishAot, it will replace that MapGet/MapPost/MapPut call with one optimized for AOT. So instead of examining all the arguments and generating a function at runtime, it spits out source code that you can read and debug into: if you declare an integer as your input, you can see the int.Parse being called and the result being passed to your function. It makes it really transparent how it all works — and it's actually a really cool way to understand the framework, because the dynamic codegen is no longer an invisible part of your application; it's static. And it moved a ton of the startup cost from first run to compile time, which has its own trade-offs.

So here's an example: a time-to-first-request measurement. I start the app cold, make a request, and time from booting the app to the first response. With two routes, the difference between having RDG enabled and not is tiny — but with a thousand routes it's huge, almost 10x. That shows that shifting work from startup to compile time has this big benefit on first request, which will make a huge difference on serverless, function-style platforms where you're constantly cold-starting your application. The idea here is that with AOT plus source generators moving the performance cost from start time to compile time, you end up with these kinds of big wins — not a throughput win, but a startup win.
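A small sketch of the kind of endpoint RDG specializes — CreateSlimBuilder, MapGet, and the PublishAot project property are real; the route itself is made up. With PublishAot enabled, the generator intercepts the MapGet call at compile time:

var builder = WebApplication.CreateSlimBuilder(args);
var app = builder.Build();

// RDG emits a readable, compile-time request delegate for this call; in the
// generated source you can see "id" being bound via int parsing directly —
// no runtime code generation at startup.
app.MapGet("/todos/{id}", (int id) => Results.Text($"todo {id}"));

app.Run();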
There are a lot more improvements. One of the big things that came along with Native AOT is that the GC got this really cool feature called DATAS — Dynamically Adapting To Application Sizes — where server GC can grow and shrink heaps on the fly. It's a really cool feature, and it addresses one of the big complaints we've had in .NET since the beginning of time: that server GC uses too much memory because it's lazy. That's one feature I care a lot about that's coming up. So read more about the improvements in ASP.NET Core and .NET on our various blog posts — here are some links: the performance blog, etc. And now I'll hand it to Stephen to close us out with his awesome .NET performance blog post.

Awesome, thanks David, that was great. One of the things I love about .NET 8 and previous .NET releases is how these performance improvements, up and down the stack, are done coherently: ASP.NET seeing that certain things would really help performance, working with folks on the runtime and core libraries and in the community, making those changes lower in the stack, taking advantage of them higher in the stack — and you get these amazing end-to-end performance improvements. That was great to see. So I want to spend the next 20 minutes talking about performance improvements in .NET 8, in the runtime and the core libraries.

As some of you may know — it was mentioned that Jonathan writes this great MAUI post every year, and there's a great ASP.NET performance post every year — I've also been writing "Performance Improvements in .NET" posts going back to .NET Core 2.0: posts on .NET Core 2.0, 2.1, and 3.0, and on .NET 5, .NET 6, and .NET 7, and about two months ago I published Performance Improvements in .NET 8. To give you a sense of how much goodness there is in these releases: if you were to print out this post, it comes out to about 220 pages — hopefully 220 pages of enjoyable reading. But that also means there's no way I can cover 220 pages of performance material in 20 minutes, and I'm not going to try. Instead, I've pulled out just three changes that I particularly like. I'm not going to talk about AVX-512, or how async gets faster, or how formatting and parsing get faster, or how crypto gets faster — any of that. Just three changes, and we'll dive into those to give you a taste of what awaits you when you upgrade your applications and reap the benefits of .NET 8.

So with that, let's dive in. David mentioned this word briefly before, but this is by far my favorite improvement in .NET 8 from a performance perspective. When you upgrade to .NET 8, this has the greatest chance of being the thing that significantly moves the needle for your applications and services: dynamic PGO. We shipped a preview of it in .NET 6 — off by default, with some rough edges; we were just getting experience with it. We shipped a preview again in .NET 7, again off by default, to get real-world experience from people willing to opt in. But now, if I had to pick a single PR in .NET 8 — one PR that is the most impactful — it would be the one that changed a single character. That was the entirety of the pull request: a single character, from a zero to a one, enabling dynamic PGO for every .NET application and service running on .NET 8. It is a game changer.

Now, I've said "dynamic PGO" probably five or six times and haven't actually said what it is. To understand it, we have to go back a few versions, to .NET Core 3.0, and understand tiered compilation. Tiered compilation is the idea that the just-in-time compiler — the JIT — can compile a method once, at startup, the first time the method is invoked, and basically eschew optimizations: it just tries to get in and out as quickly as possible and produce usable code.
That code isn't necessarily particularly efficient, but that's okay, because the vast majority of methods in a typical .NET application are only ever invoked once, or maybe a small handful of times, and if the JIT spent a whole lot of time optimizing them, it would spend more time on the optimizations than it would actually save. So the JIT first compiles methods just as quickly as it can, gets in and out, gets the app up and running — and then the runtime keeps track of how many times those methods are invoked. Once a method trips over some threshold, say 30 invocations, the JIT kicks in again, recompiles the method, and throws at it all the optimizations it can muster to make that code super efficient. So you get really great startup and great throughput.

The really interesting thing is that there's information we can learn from that initial fast compilation, which we call tier 0, that we can then apply to the optimized tier 1 compilation. For example, if you have a method making virtual calls or interface calls, those are less efficient than a direct call — but the runtime can track, during that tier 0 compilation, the concrete type most commonly used at every virtual or interface call site. Then, when it comes time to generate the tier 1 code, if the JIT sees that a particular call site had a very dominant type — say you had an IList<int>, and 99.9% of the time it was a List<int> — it can choose to specialize the tier 1 code for List<int> and keep a general fallback, as it would have had in the past. This lets it significantly improve amazing amounts of .NET code across the board.

So let's take a look at this in a small demo. I have a trivial little main program here: I'm calling a method IsEmpty a billion times. It takes an IList<int> and simply makes an interface call on that IList<int> to get the Count and compare it to zero — a pretty trivial thing.
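A hedged reconstruction of that demo program as described — the loop count matches the talk, while the timing scaffolding is an assumption:

using System.Diagnostics;

IList<int> list = new List<int>();

var sw = Stopwatch.StartNew();
for (int i = 0; i < 1_000_000_000; i++)
{
    IsEmpty(list);
}
Console.WriteLine(sw.Elapsed);

// One interface call per invocation — exactly the pattern dynamic PGO learns to devirtualize.
static bool IsEmpty(IList<int> list) => list.Count == 0;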
Now I'm going to build this — doing it from the command line, because we'll be interacting with it from the command line — and I'm building it for both .NET 7 and .NET 8 with two target framework monikers. Running the .NET 7 version, that billion invocations of IsEmpty, a billion interface calls, takes about three to three and a half seconds. Now I'll run the exact same app, the exact same command, just against the .NET 8 build — and instead of three and a half seconds, it takes about 1.6 or 1.7 seconds, a little better than 2x. And we can see exactly why.

One of the really cool features the JIT compiler shipped in .NET 7 is an environment variable, DOTNET_JitDisasmSummary, which I'm going to set to 1. It tells the JIT to print a line to the console for every method that gets compiled. If I run the exact same application again, we see a bunch of output, and something interesting: a line where the JIT compiled IsEmpty at tier 0 — that initial unoptimized implementation. Then the runtime noticed IsEmpty was getting invoked quite a bit, because we call it a billion times, so it compiles it again — still unoptimized, but with additional instrumentation in the method to learn information about it: what's hot, what's cold, where are my branches going, what am I calling, what do I need to devirtualize, and so on. After it's gathered enough of that information, it compiles it yet again, this time at tier 1 with dynamic PGO, taking full advantage of everything it learned.

We can drill down even further. Let me delete that environment variable so it doesn't clutter the output, and add a different one: instead of JitDisasmSummary, DOTNET_JitDisasm, specifying the method I want to look at. This tells the JIT to print not just that it was compiling, but what it actually generated — the actual assembly code. Don't worry, you don't need to know assembly; I'll highlight the relevant pieces, but it's really instructive to see exactly what the JIT is doing here. We saw IsEmpty compiled three times, so we expect three blocks of assembly code, and that's exactly what we see.

Scrolling up, the first compilation is tier 0, and we see this call to ICollection<int>.Count — that's the virtual, or interface, call being made. Great; nice and simple. Then we move on to the second compilation — the instrumented tier 0: we still see the call to ICollection<int>.Count, but we also see a call to this other function, CORINFO_HELP_CLASSPROFILE32. This is the JIT instrumenting the function to build up a histogram of the concrete implementations: what was that IList<int>, actually? Was it a List<int>, or something else? After it's gathered up enough of that information, it does the tier 1 compilation. We still see the interface call, but now it's at the bottom of the method — it's no longer the primary path. The most complicated part of this assembly to understand is this: the number here is the value the runtime uses to represent List<int>. The code loads that number and compares it to the number stored in the object being passed in — it's checking whether this object is a List<int>. If it's not, it falls back and does the same thing we did previously: it calls ICollection<int>.Count. But if it is, it calls List<int>.Count directly — and since List<int>.Count is simple, the JIT devirtualizes and inlines it. All it was doing was loading a field, so the entire interface call becomes a simple move operation, which is how we were able to see that 2x improvement in the results.
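In C# terms, the tier-1 code the JIT generates behaves roughly like this hand-written guard — a sketch of guarded devirtualization, not actual JIT output:

static bool IsEmpty(IList<int> list)
{
    // Guard: cheap type-handle comparison against the profiled dominant type.
    if (list.GetType() == typeof(List<int>))
    {
        // Devirtualized and inlined: List<int>.Count is just a field load.
        return ((List<int>)list).Count == 0;
    }

    // Cold fallback: the original interface call.
    return list.Count == 0;
}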
So just by upgrading to .NET 8, dynamic PGO is able to kick in and do these kinds of optimizations across your entire codebase. It's incredibly exciting. All right — that was my first favorite improvement, so let's take a look at another one. Again, there are hundreds upon hundreds; you should go read the blog post. But we'll just highlight a few.

My second one is about System.Text.Json. There were actually a ton of improvements, both functional and performance, in System.Text.Json in .NET 8; we'll call out one in particular. There's a whole lot of work System.Text.Json needs to do to serialize and deserialize types: it has to understand the shape of those types, and historically, serializers and deserializers would use reflection at runtime to gather information about the types being serialized and deserialized. Just like David described ASP.NET minimal APIs doing all this work with reflection and reflection emit at runtime and replacing it with a source generator that does it all at build time, the System.Text.Json library has a source generator that takes all the reflection work it was doing at runtime and instead does the analysis at build time, generating all of that code at build time so that no reflection is actually necessary.

And while it's doing that, it can also generate additional fast paths that optimize things for a dedicated type. If all it had done was gather up data on, say, a Rectangle type — great, it no longer needs reflection to find what properties are on Rectangle — but it would still go through a generic code path: for each property, do some work to serialize it out. Instead, since it's generating code anyway, it can generate an optimized serialization routine that just says: write start of the JSON object, write X, write Y, write Width, write Height, close the JSON object, done. And that's way faster.

That's great; the problem is this functionality historically wasn't used if you were doing streaming — calling JsonSerializer.Serialize to a stream — or doing async, like SerializeAsync to a stream or otherwise. That's a big problem, because that's ASP.NET's primary mode of operating: it wants to serialize to streams and not buffer everything, and it wants to go asynchronous and not block the world. So there was a disconnect — until now. In .NET 8, these fast paths apply everywhere: they apply to streaming, they apply to async, and the impact is dramatic. If we take a little benchmark — I've got my Rectangle here, setting X, Y, Width, and Height, and the benchmark just calls JsonSerializer.SerializeAsync, serializing to a stream that throws the data away, asynchronously — on .NET 8 it's about three times faster than it was on .NET 7, because it can take the optimized path, and it allocates 10 to 20 times less, going from something like 600 bytes per serialization to about 30 bytes. Really dramatic changes; very exciting.
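A sketch of that benchmark's shape, assuming a simple Rectangle type and a source-generated context — the names here are illustrative, not the talk's exact code:

using System.Text.Json;
using System.Text.Json.Serialization;

var rect = new Rectangle { X = 0, Y = 0, Width = 100, Height = 50 };

// Async, streaming serialization — the case that now hits the generated fast path.
await JsonSerializer.SerializeAsync(Stream.Null, rect, AppJsonContext.Default.Rectangle);

public class Rectangle
{
    public int X { get; set; }
    public int Y { get; set; }
    public int Width { get; set; }
    public int Height { get; set; }
}

// The source generator analyzes Rectangle at build time and emits a fast-path
// writer: start object, write X/Y/Width/Height, end object.
[JsonSerializable(typeof(Rectangle))]
public partial class AppJsonContext : JsonSerializerContext { }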
The last change I want to highlight — again, one of hundreds — is a new type called SearchValues. This is one of my favorite features in .NET 8, and I think it speaks to our general philosophy about how we do performance. The previous changes you saw — dynamic PGO, the JSON serializer — you just upgrade and it just gets faster, which is an awesome way of operating. But there's another approach we take as well: in addition to things just getting faster when you upgrade, we also expose more primitives, more fundamental components — things you can use where, if you tweak your code a little bit, you can get even more improvement. And then we try to take advantage of those in our own code and get even more benefit. SearchValues is a great example of that.

SearchValues is a type that lets you efficiently search for arbitrarily large sets of bytes or characters. If you wanted to search for a set of bytes or characters in the past, you would have used IndexOfAny — and IndexOfAny is really highly optimized, but today only for small numbers of bytes or characters. If you wanted to search for 'a' or 'b' or 'c', IndexOfAny has been awesome; but if you wanted to search for six, or ten, or twenty, or fifty, or a hundred different things, the perf with IndexOfAny falls off a cliff. Until now. SearchValues lets you make minor tweaks to how your code is structured: rather than caching a char array of, say, the six things I want to search for — "abcxyz" — I can instead cache a SearchValues: SearchValues.Create("abcxyz"). My call sites look identical — there's now an overload of IndexOfAny that takes a SearchValues, in addition to the ones taking a char array or a span.

The power of this is that the call to Create happens once. When you call Create, we're able to analyze the set of things you've passed in and choose from 15 or 20 different implementations behind the scenes — a set we can grow in the future — picking the most optimized implementation for that particular set of characters or bytes. Then we can go a step further and compute all the relevant tables, or vector maps, or bitmaps, or whatever we need to make that search efficient; we do it once, we cache it, and then every call to IndexOfAny can dive right in and immediately take advantage of those optimizations. This can lead to massive throughput improvements, and as a result we now use this throughout the stack — at last count, something like 50 different places in the core libraries and in ASP.NET. In fact, there was code in ASP.NET doing custom vectorization using Vector128, Vector256, Vector<T> and so on, and we've been able to rip out a lot of that code and replace it with SearchValues, which takes care of all that for you. So we built this functionality; it's used throughout the stack — HttpClient, WebSockets, Uri, ASP.NET — and you can also tweak your own code to take better advantage of it. And we take advantage of it in source generators too.
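The pattern looks roughly like this — a minimal sketch where the class, field, and method names are made up:

using System.Buffers;

static class Scanner
{
    // Created once: Create analyzes the set and picks an optimized implementation,
    // precomputing whatever tables or bitmaps that implementation needs.
    private static readonly SearchValues<char> s_targets = SearchValues.Create("abcxyz");

    // The call site looks just like the char-array overload of IndexOfAny.
    public static int FirstTarget(ReadOnlySpan<char> text) => text.IndexOfAny(s_targets);
}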
As an example of that, I want to highlight the regular expression source generator, which we shipped first in .NET 7 but have made even better in .NET 8. Let me bring up Visual Studio again. Here I'm using the regex source generator just by having a method that returns a Regex and putting this GeneratedRegex attribute on it, which tells the source generator to fill in the method implementation with a fully custom implementation of Regex for this pattern.

Currently the pattern is empty — you can see the XML comment here saying it matches the empty string — and if I drill into the implementation, which Visual Studio shows me live as the source generator outputs it, we can see the generated implementation recognizes this just matches the empty string and doesn't do anything interesting. But if I change the pattern to "abc", the code updates live, and what it's actually doing now is an IndexOf for "abc". If I change it to "abcdef", the code updates live again as the generator recognizes my pattern changed and does something else.

Now, if I go back to "abc" but change it from searching for the string abc to a character class, [a-c], the code updates to take advantage of another new method in .NET 8, IndexOfAnyInRange — it's now efficiently searching for anything 'a' through 'c'. Change it to [a-f] and it updates live: now it searches for anything 'a' through 'f'. But if I choose something that isn't contiguous, like adding x, y, z — [a-cx-z] — the code changes again: it's searching for a character class of everything 'a' through 'c' and 'x' through 'z', and the source generator has recognized that the best way to generate this search is with SearchValues. So there's a field here that is a SearchValues — you can see the comment saying it searches for anything in (or not in) "abcxyz" — and if I go to its definition, it looks just like the slide: SearchValues.Create, getting cached, and the source-generated implementation takes advantage of it.

And it can take advantage of it in a variety of ways. For example, if I tweak my regular expression to be a* followed by that same character class, we still have this IndexOfAny here, but the actual match may require backtracking: if we had a whole string of 'a's, and the expression needs to end with this character class, it's possible the loop would have matched all the 'a's and then needs to back off in order to find something that allows the rest to match. So there's a whole implementation generated here for that more complicated matching routine, including the backtracking — and to do that backtracking, to go backwards through the input, if we scroll right we see a LastIndexOfAny. So not only is there IndexOfAny; there's LastIndexOfAny, IndexOfAnyExcept, and LastIndexOfAnyExcept, all of which allow efficient searching using one of these SearchValues instances. So not only are we using this throughout the core libraries, not only are we using it in ASP.NET, not only is it exposed for you to use in your own code — things like a source generator are able to take advantage of it to generate really efficient code into your application.
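The demo's shape, roughly — a sketch where the class and method names are made up, while the GeneratedRegex attribute and partial-method pattern are the real API:

using System.Text.RegularExpressions;

public static partial class Patterns
{
    // The source generator fills in this partial method at build time; for this
    // pattern it emits SearchValues/IndexOfAny-based scanning plus the
    // backtracking loop described above.
    [GeneratedRegex("a*[a-cx-z]")]
    public static partial Regex Demo();
}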
So, with my 20 seconds left, I'll say this is just a glimpse into the performance improvements available in .NET 8. After this is done — after you've had an exhausting day of watching all these amazing .NET Conf videos and demos — I would highly encourage you to read the ASP.NET blog post, the MAUI blog post, and Performance Improvements in .NET 8, which you can find just by searching in your favorite search engine, Bing, for "performance improvements in .NET 8" — then grab a cup of coffee and hopefully settle in for a really good read. And I think we have time for some questions.

All right — the best part about those changes is we get to delete code in ASP.NET Core. Exactly.
Info
Channel: dotnet
Views: 7,616
Keywords: .NET
Id: YiOkz1x2qaE
Length: 48min 30sec (2910 seconds)
Published: Wed Nov 15 2023