High-performance code design patterns in C#. Konrad Kokosa .NET Fest 2019

Captions
hello everyone. I'm really happy to be here today, and I'm really happy that you are here, because we will be talking about performance, my beloved topic, so I'm glad that so many people are interested in it. My name is Konrad Kokosa. A single slide about me: I'm the author of this book, about a thousand pages on .NET memory management; it weighs two kilograms, and since writing it I measure the quality of books in kilograms. I'm also a co-founder of the Dotnetos initiative, which spreads knowledge about .NET internals and performance, and I'm currently involved in the development of a card game for .NET developers. How many of you like playing card games? If you are interested, I have a prototype with me; we can play after the talk, during a break, or at the after-party.

The talk is about high-performance code design patterns in C#, so the very first question is: what is high performance in C#, and should we even care about it? For me it is very important. I have been a C# developer for many years, involved in the .NET ecosystem and in projects written in .NET, so I really want the possibility to write high-performance code in C# — sorry, there is a Freudian slip on the slide here; I was just going to say that I don't want to write in C++, but instead in C#, because we do have such possibilities. Having said that, this talk is not about general application performance monitoring, it is not about front-end or architectural scalability, and it is not about caching, CQRS, or databases. I'm only talking about performance in terms of C# code, at the code level. I'm also not giving trivial advice like "avoid memory leaks" or "don't use LINQ on hot paths" — everyone probably knows that we should avoid heavy things. This is about the low-level code that we write in C#. That slip on the slide will follow us through the rest of the talk.

As a person involved in high-performance work in the .NET ecosystem, I'm seeing some recurring design patterns, and I thought it would be interesting to list them — what we can do to write high-performance code in C#. All the design patterns I'm listing today are my own attempt to catalogue what we are actually using. A design pattern is important because it is a common solution to a very common problem: just as we have design patterns in object-oriented programming, here we will have design patterns for performance, for problems that occur very frequently in different contexts. Such a pattern captures expert knowledge and good practices for solving the problem and, very importantly for us, it helps us avoid reinventing the wheel — we have a problem and are trying to solve it while the solution already exists, so let's just reuse it. More importantly, such design patterns are based on more general, fundamental principles, exactly as in object-oriented programming, where the patterns rest on SOLID principles, modularity, loose coupling, high cohesion — a lot of principles underlie the object-oriented design patterns.
The same is true for performance: there are more fundamental principles underneath these patterns. But this talk is only 50 minutes; when I tried to put a description of those fundamentals into it, it took two hours, and I don't have that much time. There is a longer version of this talk which contains all the principles, and a link to it will be provided at the end of the presentation, so I invite you to look at those slides. For now I'm jumping directly to the design patterns, and the spirit of those principles will follow us through the whole description.

Exactly as with object-oriented design patterns, each pattern here has a name, a description of the problem it tries to solve, a solution, the benefits of using it besides solving the main problem, and the consequences — drawbacks and caveats. With this structure I will describe some common patterns that I have found in the wild and in my own experience. This is not a comprehensive list — again, that would take two hours — just the most typical ones, and the names are my own invention.

The very first pattern is the frugal object. It tries to solve the problem of storing a set of data efficiently when that data can take various forms. A very typical case is a collection that can have zero elements, one element, two elements, or more. How can we store it more efficiently than with a plain collection? We can do it in the form of a kind of discriminated union, and an example is probably the best way to describe it. This pattern is applied in the ASP.NET Core code base in a type called StringValues, which stores zero, one, or many strings. Instead of using a list or an array, it has a single field of type object, because it is designed for the case where the collection most typically contains zero or one element. In that case we can store the single string directly in this field instead of allocating an array containing a single element. This may seem counterintuitive and quite strange, but it gives better data locality: we have one less reference to follow, because the single string is held directly instead of going through an array. If we want to store more than one element, the field simply holds an array underneath. It is a somewhat counterintuitive way of solving the problem, but, as always in high-performance code, the fastest code is ugly — sorry for that. A minimal sketch of the idea follows below.
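As a rough illustration (not the actual ASP.NET Core source — the real StringValues is more elaborate), a frugal collection might look something like this minimal, hypothetical sketch:

    // Hypothetical, simplified sketch of a "frugal" string collection.
    // A single object field holds either null (empty), a string (one element),
    // or a string[] (many elements) - similar in spirit to ASP.NET Core's StringValues.
    public readonly struct FrugalStrings
    {
        private readonly object _values;   // null, string, or string[]

        public FrugalStrings(string value) => _values = value;
        public FrugalStrings(string[] values) => _values = values;

        public int Count => _values switch
        {
            null => 0,
            string _ => 1,
            string[] array => array.Length,
            _ => 0
        };

        public string this[int index] => _values switch
        {
            string s when index == 0 => s,      // fast path: single element, no array hop
            string[] array => array[index],
            _ => throw new IndexOutOfRangeException()
        };
    }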
As Dmitri noted recently on Twitter, most code bases that care about performance will have such a collection, tuned for having only one or two elements, because we often know that in our application the collection typically contains zero, one, or two elements. He also provided nice examples of this pattern: in the JetBrains code base there is a CompactList, a list designed for typically holding a single element — if there are more elements, they are stored in a real underlying list — and, also from JetBrains, a list that typically contains no more than two elements. It is a rather strange pattern — feel invited; in the .NET performance world we have strange patterns — and I have to speed along because I have more patterns to cover, so if you have any doubts about why this is beneficial, I strongly invite you to come and ask me after the talk.

So this is the frugal object: an object designed for a specific case, such as having a small number of elements in a collection. Another benefit is that it gives better opportunities to tune the code, because for a single element the access will be much faster than going through an array — so there are benefits from many perspectives. The consequence is that the API becomes a little more complex, because instead of a simple list or array you are now using some strange type, and the code becomes a little uglier; but that is the usual price of performance. That was the very first pattern, just to warm you up.

The second pattern is pooling, which is a tremendously popular way of speeding up C# code right now. The problem it solves is that we create a lot of temporary data — objects that we create very often. The cost may be direct, like the cost of allocating the object, or indirect, like the fragmentation introduced by creating many objects and then deleting them. The idea behind the solution is quite simple: instead of creating an object every time we need one, we reuse objects from a pool. A tremendously popular way of doing this in .NET today is ArrayPool — how many of you have used ArrayPool? As you see, it is quite an easy approach: instead of creating an array, we rent one from the pool, consume it somehow, and then return it to the pool. Quite easy from the API perspective, but it provides a very big performance benefit. The Shared instance has some predefined array sizes and a predefined number of arrays in each of those buckets of different sizes; that is an implementation detail — the important thing for us is that instead of creating an array every time we need one, we rent it, use it, and return it.

Let's imagine a very simple case: we have a method that we would like to benchmark with BenchmarkDotNet. It contains two stages: there is a list of items, a field we can treat as the input; we need to pre-process the input and send it further to some ProcessBatch method, so we need a temporary array — a temporary collection — to represent our pre-processed data. It doesn't matter what the processing looks like here; the important thing is that we need a temporary collection for a short time, to be passed to a further API, maybe a REST call or something. A minimal sketch of the rent/use/return flow is shown below.
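This is a minimal sketch of what the two variants might look like (the exact benchmark code from the slides is not reproduced here; ProcessBatch and the pre-processing are placeholders):

    using System;
    using System.Buffers;

    public class Preprocessing
    {
        // Baseline: allocate a fresh temporary array on every call.
        public void WithNewArray(int count)
        {
            var buffer = new int[count];
            for (int i = 0; i < count; i++)
                buffer[i] = i * 2;              // pretend pre-processing
            ProcessBatch(buffer, count);
        }

        // Pooled: rent, use, and return the buffer instead of allocating it.
        public void WithArrayPool(int count)
        {
            int[] buffer = ArrayPool<int>.Shared.Rent(count);   // may be larger than count
            try
            {
                for (int i = 0; i < count; i++)
                    buffer[i] = i * 2;
                ProcessBatch(buffer, count);
            }
            finally
            {
                ArrayPool<int>.Shared.Return(buffer);           // always give it back
            }
        }

        private void ProcessBatch(int[] data, int length) { /* e.g. call a further API */ }
    }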
This is the method we start from: we create a new array every time we need one, and the result is our baseline — for now we don't know whether it is fast or not, it is just the reference point. We can change it very simply to use ArrayPool: instead of new-ing up the array we rent it, consume it by populating the data, do the processing, and after everything we return it to the pool. As you can see the solution is easy and the API didn't change much, but the result is important for us: the code is almost twice as fast, just because we avoid creating the array. Moreover, as you can see in the Allocated column, we allocate almost nothing, because we keep reusing the same arrays — only fifty-two bytes for some enumerator — while the first version allocated four kilobytes per batch. So we have tremendously reduced the number of allocations here.

Pooling is a very popular way of solving performance issues in .NET. There is an abstract MemoryPool<T>, which represents a pool of memory and can have various implementations, and ArrayPool is this kind of memory pool; you can also look at SlabMemoryPool, which is the memory pool used by Kestrel hosting. The question arises whether it is beneficial to make a pool of individual objects — strings, or maybe your Customer class, or whatever you use. Most often it is not necessary, because the cost of handling the pool will outweigh the benefits of pooling itself, so we should be really careful about pooling small objects. Pooling big objects is beneficial, because we get rid of the cost of initializing and allocating them, but for small objects we should think twice about whether it will be good for us. You can, for example, copy-paste the ObjectPool code from the Roslyn code base and use it, because there is currently no public API for object pooling in .NET; there was an issue to add one, but unfortunately it was closed — maybe in the future such an API will be provided (a minimal sketch of the idea follows after this pattern's summary).

So this is pooling. Remember that pooling is one of the best solutions to reduce GC overhead, to reduce allocations, and to provide better data locality — a very important principle is to operate on the same memory again and again, because the CPU caches will be populated with that data, so reusing the same objects is very beneficial from the performance perspective. There are caveats: the trimming strategy may not be trivial — when to trim the pool and reduce the number of arrays in it has no trivial answer — and the API becomes a little more complex, because this is a kind of manual memory management: we have to remember that we rented an array and that we need to return it, because without returning it the pool doesn't make sense. It makes memory handling a bit more complex, but the benefits are huge.
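Since there is no object pool in the BCL itself (the suggestion in the talk is to copy one from the Roslyn code base), here is a deliberately minimal, hypothetical sketch of the idea — a real pool such as Roslyn's ObjectPool is more careful about allocation-free fast paths and trimming:

    using System.Collections.Concurrent;

    // Minimal object-pool sketch: reuse instances instead of allocating new ones.
    public sealed class SimpleObjectPool<T> where T : class, new()
    {
        private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();

        public T Rent() => _items.TryTake(out var item) ? item : new T();

        public void Return(T item) => _items.Add(item);   // caller must reset the item's state first

        // Usage:
        //   var pool = new SimpleObjectPool<List<int>>();
        //   var list = pool.Rent();
        //   ... use list ...
        //   list.Clear();
        //   pool.Return(list);
    }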
OK, the third pattern — the time is ticking, so we need to move along — is zero-copy design, which I call that because it is all about avoiding copying memory, a very costly operation: every memory copy is a cost for our application and a performance hit. The zero-copy design pattern is all about working with sub-data. For example, Substring: when you have a string and call Substring, it creates a new string with the content of part of the original string, so we allocate a new string and copy memory from the original into the smaller one. That is a complete disaster in terms of performance. The same happens when you want a kind of sub-array or sub-list — all of these copy memory, which is very bad.

The solution in .NET is also tremendously popular: a special kind of type that provides slicing capability, and here our beloved Span<T> comes in. How many of you have used it so far? Span<T> is a really powerful thing, the most popular thing in performance work, and every piece of highly performant code will use Span<T> in one way or another. Span<T> is in fact a simple type: it contains a so-called managed pointer and a length, so a span says "I represent memory starting at this place, with this length". A very simple idea, but it provides very powerful capabilities, because now we can slice memory just by operating on the span: if we have an amount of memory represented by a span, we can slice it into a smaller part just by creating a span that points somewhere else with a different length — we only manipulate the managed pointer and the length. Moreover, a span can represent many kinds of data: regular arrays, stack-allocated data (which I will describe later), unmanaged memory — in fact any pointer, so it is helpful when talking to unmanaged code — and strings. No matter what kind of memory the span represents, we can always slice it: we can slice arrays, stack-allocated data, strings, and all of this happens without copying memory; slicing just creates a span representing a smaller part of the original memory.

It is tremendously popular, for example, in parsing — the whole Kestrel hosting is based on spans — because parsers mostly work by interpreting smaller and smaller parts of the original data. In Kestrel, TCP data arrives, we need to parse it to find the lines of the HTTP protocol, and then we interpret specific lines to get the URL, the headers, and the header values. Every time we need to interpret a smaller part of something, Span<T> is very beneficial; for parsers it is invaluable. A small slicing sketch is shown below.
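As a small illustration of slicing without copying, here is a hypothetical parsing helper (not code from the talk) that splits an HTTP-style header line into name and value, where both results simply point into the original memory:

    using System;

    public static class HeaderParsing
    {
        // Splits "Name: value" into name and value without allocating substrings.
        // The returned spans are just views over the original characters.
        public static bool TrySplitHeader(ReadOnlySpan<char> line,
                                          out ReadOnlySpan<char> name,
                                          out ReadOnlySpan<char> value)
        {
            int colon = line.IndexOf(':');
            if (colon < 0)
            {
                name = line;
                value = ReadOnlySpan<char>.Empty;
                return false;
            }

            name = line.Slice(0, colon);              // no copy
            value = line.Slice(colon + 1).Trim();     // no copy, just a new offset and length
            return true;
        }
    }

    // Usage:
    // if (HeaderParsing.TrySplitHeader("Content-Length: 42".AsSpan(), out var name, out var value)) { ... }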
There is only one caveat related to Span. It is a special type: as I said, it contains a managed pointer, and because of that it cannot appear on the managed heap in any way — that would simply crash the runtime — so we have the guarantee that a Span<T> will never be boxed. The problem is that this contradicts the use of async, because async uses so-called state machines underneath, and those state machines can be boxed and can live on the managed heap. In other words, we simply cannot mix async and Span: an async method cannot use a Span as an argument or as a local variable — it will simply not compile, because it is prohibited. There is a quick solution to that problem: the more general Memory<T> type, which you will also meet when talking about performance. Memory<T> is, let's say, a brother of Span<T>, a little less flexible because it can only represent arrays and similar memory (with strings covered by ReadOnlyMemory<char>), but the big benefit is that it can live on the managed heap and can be used in normal C# code, including async code. So you will very often meet the pattern where an async method accepts a Memory<T>, and then the synchronous part of that method gets a Span from it — we split the processing so that the synchronous part can use a span.

For example, in .NET Core 3.0 there are pipelines, which are all about zero copying: a pipe is a type that introduces producer/consumer — writer/reader — communication, everything is buffered, and both the reader and the writer look at the buffers via spans, exactly the zero-copy approach. There is also ref returning, and refs in general, which I will not describe here; the slides about them are in the bigger version of the talk that I link at the end of the presentation. So I really invite you to look at Span — this is the future of high-performance C# programming, and everyone is using it because of this slicing capability. The only caveat I see is that in high-performance code, instead of strings or arrays we now start using these strange types, Span or Memory, so the API becomes a little less clear, because we use very specific types instead of the generic ones. A hedged sketch of the async/Memory split mentioned above is shown below.
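A hedged sketch of that async/Span split — the method names and the checksum computation are invented for illustration:

    using System;
    using System.Threading.Tasks;

    public static class Checksums
    {
        // Async methods cannot have Span<T> locals or parameters,
        // so the async entry point takes ReadOnlyMemory<byte> instead...
        public static async Task<int> ChecksumAsync(ReadOnlyMemory<byte> data)
        {
            await Task.Yield();                 // stand-in for some real awaited work
            return Checksum(data.Span);         // ...and the hot path gets the Span from it
        }

        // All the span-based work happens in a synchronous helper.
        private static int Checksum(ReadOnlySpan<byte> data)
        {
            int sum = 0;
            foreach (byte b in data)
                sum = unchecked(sum * 31 + b);
            return sum;
        }
    }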
OK, the fourth pattern, struct of arrays, also addresses processing: we have a lot of data to process, and we want to lay out the memory in a way that is efficient for the processor, simply because normally we don't care about it at all. The solution is a special kind of data organization called struct of arrays. The "struct" in the name doesn't mean a C# struct; it is a struct understood as a type — it may be a class or a struct, it doesn't matter — so we can translate it as "type of arrays". What does it mean in practice? If you were designing normal data processing it would probably look like this: well-designed, let's say, object-oriented code. We have a Customer class which encapsulates all its fields, there is a public method operating on those fields, there is a repository which contains a list of all customers, and we have an UpdateScoring algorithm, very important for us, which processes the customers one by one. Obviously this looks nice and clean and everyone understands the design, but in terms of memory access performance it is a complete disaster, because what we really have here is references to references: List<T> internally just has its storage in the form of an array of T, so we have a reference to an array of customers, that array is an array of references to customers, and the customers themselves are separate objects in memory. We have no guarantee at all how those customers are laid out in memory, so when we iterate over them one by one we will in fact be jumping between various memory regions more or less at random, which is a complete waste of memory access. We don't need and don't want random memory access, because nothing can be cached, nothing can be prefetched and read in advance — random memory access is just evil.

Moreover, when the processor accesses memory it does so in so-called cache lines; a cache line is 64 bytes, so even if you ask for a single byte or two bytes, the whole 64-byte cache line is read from memory into the cache. We really want a design where, when we read a cache line, we read data that is important for us — and that is not the case here. The Customer object has some fields with automatic layout; we have no control over the order in which those fields end up in memory, and we can be sure the access will not be optimal. Also, our algorithm uses only some of the fields, so we waste memory bandwidth: we read cache lines containing customer fields that are not interesting for us at all.

How can we fix it? We benchmark a baseline first — we don't know whether it is fast or not, it is just a number. We can improve it in a simple way by taking advantage of structs and better data locality: I change the class to a struct. Structs are not very popular — they are a little tricky and we are often not sure whether they are a good fit — but for this case they are: I have a struct, and I use sequential layout, explicitly saying in what order the fields should be laid out in memory. Moreover, the list now contains customer values — it contains structs — which is beneficial because in the .NET implementation it means the values of those structs are inlined into the array inside the list. So instead of references to customers spread around all of memory, all the customers are inlined into one array, which is much better for performance because we will be reading them sequentially. A small sketch of this change follows below.
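Roughly what the class-to-struct change looks like; the field names follow the attributes mentioned in the talk (scoring, earnings, smoking), the rest is a simplified sketch:

    using System.Runtime.InteropServices;

    // Before: a class - List<CustomerClass> stores references, and the objects live anywhere on the heap.
    public class CustomerClass
    {
        public decimal Earnings;
        public bool Smoking;
        public float Scoring;
        // ... more fields ...
    }

    // After: a struct with an explicit, sequential field order.
    // List<CustomerStruct> inlines the values into its internal array, so iteration is sequential.
    [StructLayout(LayoutKind.Sequential)]
    public struct CustomerStruct
    {
        public decimal Earnings;
        public bool Smoking;
        public float Scoring;
        // ... more fields ...
    }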
Sequential access is really, really good for the CPU, for the operating system, for every caching mechanism. What is the difference? The code becomes almost twice as fast, just by replacing the class with a struct. So it is better, but this is still an array of structs, and that is only our starting point — array of structs is a kind of anti-pattern here. What I'm describing is the struct-of-arrays pattern, which is the inverse of it, and it looks, I agree, very ugly; it violates every good object-oriented design principle. I get rid of the Customer entity completely: I no longer have a Customer, I have a struct of arrays, meaning a structure containing arrays that represent the fields of all customers. This is rough, but it is really good for performance. Why? Because now every attribute is in a separate array, and when I access the attributes that interest me they are read completely sequentially; moreover, every cache line we read contains only interesting attributes — I care about scoring, earnings, smoking and so on, each of these arrays contains just those attributes, and every cache line is full of useful data. I'm not wasting a single byte of memory access here. The difference in performance is ten times faster than the original, just because I completely redesigned the way the data is represented.

Obviously I don't want you to leave with the feeling that you now have to change every design in your code from good object-oriented design into flat arrays. But there are those small parts of your application where high performance is really important, and that is where you can think about making such changes — ten times faster is, for me, quite a big benefit. There is also the so-called entity component system: anyone involved in game development will recognize it, because it is a very popular pattern there, and it is built around this concept of struct of arrays — I have some more slides about it in the bigger version of the talk. So as you see, this is a very ugly pattern which gives us much, much better memory access performance; the main trade-off is that we end up with a much worse design, completely not object-oriented, but again, it can be beneficial in those small parts of our application. A sketch of the reorganized, struct-of-arrays data follows below.
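A minimal sketch of the struct-of-arrays layout, again using the talk's example attributes; the update rule itself is invented, since the real algorithm is not shown in the transcript:

    // Struct of arrays: one array per attribute instead of one object per customer.
    // Iterating over Scoring/Earnings/Smoking touches tightly packed, sequential memory.
    public class CustomerRepository
    {
        public float[] Scoring;
        public decimal[] Earnings;
        public bool[] Smoking;

        public CustomerRepository(int count)
        {
            Scoring = new float[count];
            Earnings = new decimal[count];
            Smoking = new bool[count];
        }

        public void UpdateScoring()
        {
            for (int i = 0; i < Scoring.Length; i++)
            {
                // Hypothetical scoring rule, purely for illustration.
                Scoring[i] = (float)Earnings[i] * (Smoking[i] ? 0.8f : 1.0f);
            }
        }
    }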
OK, another pattern — without a better name I simply call it stack-based data; maybe we will find a better name for it. It is about using the stack, which is another tremendously popular way of writing high-performance C# code today. The problem we are trying to solve is that we allocate a lot of small temporary objects, which puts high pressure on the GC: we need to allocate them and then clean them up, which is not good from the performance perspective. The simplest way to solve it is just not to allocate on the managed heap at all, and in C# we have several possibilities to do that. Obviously we have structs, which have been with us from the very beginning of C#: they provide good data locality, there is no metadata, no object header, so they are more compact, and the JIT can even enregister them — put the struct's data directly into CPU registers — so they can live on the stack and be even faster. Unfortunately structs can be boxed, and the scenarios in which they become boxed are so complex that I'm probably not even aware of all of them, so using structs is a little tricky and we are a little afraid of them. Still, there are popular structs introduced recently, like ValueTuple, which represents a tuple but as a struct, or ValueTask, which is a task but as a struct — interesting things to look at. Those are the, let's say, obvious things; there is also more advanced stuff I would like to show you: the stackalloc operator, ref structs (which I will not talk about here — there are slides in the bigger version of this presentation), and fixed-size buffers.

So, stackalloc: after Span, it is my second beloved finding in C#. How many of you have used stackalloc? I would give you a prize, but I don't have any — maybe two or three hands. This is an operator that allows you to allocate memory on the stack directly. Normally we allocate on the managed heap and the GC takes care of the objects; but we also have the stack: when a method is called, a stack frame is created with the local variables, everything out of the box, and we have this powerful possibility to stack-allocate a kind of array directly in the stack frame — all the data we stack-allocate just lives in the stack, making the stack frame a little bigger. The operator has been available from the very beginning, but it was not popular, because the result of stackalloc is a pointer, so for many years using it required unsafe code. Since C# 7.2, a Span can represent the result of a stackalloc, so the code can now be safe — sometimes we cannot deploy unsafe code at all on certain restricted deployments, but now we can use stackalloc safely via Span. Unfortunately, there is one really big risk: we have no control over whether it will kill our application, because when stack-allocating we use the stack, the stack has its own size limit, and we can hit a StackOverflowException — one of the two exceptions in .NET that cannot be caught, so by triggering it we just kill our application. Quite a big risk, so we should be really careful about how big the buffers we stack-allocate are: the size should be small, and there is no official definition of "small" — even Microsoft is not sure; it says something like "less than one kilobyte is okay".

Coming back to our benchmark: we were using an array, then we changed it to an array pool, and now my pre-processed intermediate collection is represented by stack-allocated data. It is also a very simple change: I use stackalloc, I have the data represented in this stack-allocated array, I populate it with data, and then use it in the further API call. A minimal sketch follows below.
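A minimal sketch of the stack-allocated variant (same caveat as before — ProcessBatch and the buffer size are placeholders, and the buffer is deliberately kept small):

    using System;

    public class PreprocessingStack
    {
        public void WithStackalloc()
        {
            const int Count = 64;                      // keep stack buffers small (well under ~1 KB)
            Span<int> buffer = stackalloc int[Count];  // lives directly in this method's stack frame

            for (int i = 0; i < Count; i++)
                buffer[i] = i * 2;                     // pretend pre-processing

            ProcessBatch(buffer);                      // no GC allocation anywhere in this method
        }

        private void ProcessBatch(ReadOnlySpan<int> data) { /* e.g. call a further API */ }
    }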
The result of the benchmark is nice for me, because it is, let's say, over 50 percent faster than the version with the array pool; moreover, I again have no allocations at all, because I'm just using the stack for everything, so there is no GC overhead. And you will see a lot of articles and libraries recently that reduce allocations precisely by using the stack more and more — parsers, various network stacks and so on will be using the stack more and more with the help of stackalloc. Here, most of the remaining cost comes from zeroing the stack memory: the runtime guarantees that when we access stack-allocated locals, the memory has been zeroed, and that is not good for us. I don't want to zero this memory — as you see, I stack-allocate a buffer and then immediately fill it with data, so why would I need it zeroed? I don't care what was there before. We can probably fix that with a locals-init setting of false — we are going really deep here, because we are saying: I'm using the stack, but I don't want its memory zeroed. With such an attribute, the result is even better: the code becomes faster again because we don't have the zeroing overhead, and as far as I know this is currently the fastest possible solution to this problem. This attribute doesn't exist in .NET yet, but it exists in Fody — I'm not sure whether you are aware of this code weaver, a library that lets you manipulate code at the intermediate-language level — and there is a LocalsInit plugin for Fody which lets you apply this setting to a particular method; the IL is then rewritten to say "do not zero my locals in this method". So it is possible, and there is an ongoing process of adding such an attribute to .NET and C# itself.

Also, fixed-size buffers: when we want good data locality and we have a struct — which will probably live on the stack — but we still need some kind of array, a normal array field in the struct is just a reference to an array allocated somewhere on the GC heap, so not everything is on the stack. We can use fixed-size buffers instead. This is also quite an old feature, but it is used more and more recently: it allows us to inline an array into a struct, and if the struct is on the stack, the whole array is on the stack too. With a regular struct holding a regular array reference, we have a struct with a reference to an array somewhere else — not so good from a memory perspective, because we have to follow that reference to some place on the GC heap — but with fixed-size buffers everything is in place: the array is inlined next to the other fields, everything is packed and dense. This is very useful when talking to unmanaged APIs — you can find such examples in the CoreFX library, where a buffer that will be passed when calling unmanaged code is inlined like this — a very nice optimization, and moreover such a buffer may not even need pinning when it lives in a ref struct. There are a lot of possibilities here; what matters for us is to remember that fixed-size buffers give much better data locality and that the data will also live on the stack. A small sketch of an inline buffer follows below.
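A small sketch of an inline fixed-size buffer; note that fixed-size buffers require an unsafe context, and the struct and field names here are invented rather than taken from CoreFX:

    using System.Runtime.InteropServices;

    // The fixed-size buffer is inlined into the struct itself:
    // no separate array object on the GC heap, no reference to follow.
    [StructLayout(LayoutKind.Sequential)]
    public unsafe struct NativeMessage
    {
        public int Length;
        public fixed byte Payload[64];   // 64 bytes stored directly inside the struct
    }

    public static class NativeMessageExample
    {
        public static unsafe NativeMessage Create(byte fill)
        {
            var message = new NativeMessage { Length = 64 };
            for (int i = 0; i < 64; i++)
                message.Payload[i] = fill;   // direct access; no separate array to pin
            return message;
        }
    }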
So, stack-based data: embracing the stack and avoiding the GC heap is one of the most important techniques today when designing high-performance code. There is one small caveat — StackOverflowException and the possibility of killing our app — but when done wisely, operating on small buffers, it should be safe.

The last pattern I would like to describe is the buffered builder, because we also meet it very often in high-performance code. The problem it solves is that we generate a lot of temporary data when we operate on immutable data, like strings. A string is immutable: when we "add" something to a string, we in fact create a new string with the bigger content. So we need some kind of builder that helps us manipulate this immutable data in a mutable way, so to speak. The ultra-popular example is StringBuilder, which has been available in .NET for as long as I can remember; a builder is the typical solution here. I created a very simple example to show you the difference: we have two actions in a controller. The left one is a disaster in terms of performance, because it creates a lot of temporary strings — every line uses string concatenation, which means we create a new temporary string that will be replaced in the very next line, and so on. The right side uses StringBuilder: exactly the same code, exactly the same result, producing some XML, exactly the same logic — the difference is that it is a buffered builder. With string concatenation we create a lot of temporary data; with StringBuilder we operate on an internal, pre-allocated buffer, just appending data into it, so the GC doesn't have to care for a long time.

The load test results (the slide is in Polish, but maybe we understand some words) look like this: on the left is the load test with string concatenation, and the orange line — or yellow, depending on how you see colors — is "% time in GC", which on average is around 70%, so 70% of the process time is consumed by the GC, because we allocate so many temporary strings that it just has to take care of them. On the right is the result when using StringBuilder: the GC overhead is at the level of about 10% on average. So we have a huge difference in the overhead of the GC. A sketch of the two variants is shown below.
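Roughly what the two controller variants contrast — a simplified sketch; the real example in the talk produces a larger XML document:

    using System.Text;

    public static class XmlRendering
    {
        // Left side: every '+=' creates a brand-new temporary string.
        public static string WithConcatenation(string[] names)
        {
            string xml = "<items>";
            foreach (string name in names)
                xml += "<item>" + name + "</item>";   // new temporary string per iteration
            return xml + "</items>";
        }

        // Right side: same output, but appended into one internal, growable buffer.
        public static string WithStringBuilder(string[] names)
        {
            var builder = new StringBuilder("<items>");
            foreach (string name in names)
                builder.Append("<item>").Append(name).Append("</item>");
            builder.Append("</items>");
            return builder.ToString();                // single final string allocation
        }
    }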
From the perspective of the customer the difference is also huge: with string concatenation, during the load test I was able to process, let's say, nine or ten requests per second, and just by changing the code to use StringBuilder I was able to process 150 requests per second — simply because I got rid of the GC overhead by reducing the number of allocations, by reducing the operations on immutable data.

You will probably see more and more of this approach in the future. StringBuilder is the tremendously popular example, but there will be others — for example the BigInteger type, which is also immutable and represents an integer of any size (there is no maximum value for BigInteger). Because it is immutable, when we add or divide BigIntegers we keep generating temporary BigIntegers, so there is a proposal for a BigInteger builder with operations like add, divide and multiply, where only after doing all those operations on some internal buffer do we call a ToBigInteger method and get the resulting value. And here is also an example of a ValueStringBuilder, my proposal for doing this kind of thing on the stack. I don't have time to describe everything, but what is important is that it is a ref struct, a special kind of struct with the guarantee that it will always be on the stack: it will never be boxed and will never land on the managed heap. It accepts an initial buffer as a Span in its constructor — you will understand why in a moment — and it has an API very similar to a regular StringBuilder; using it looks exactly like using StringBuilder. The tremendously popular trick is exactly that such builders accept an initial buffer in the constructor, and thanks to that we can stackalloc a buffer in our method and hand it to the builder. It is very fast, because all this code in fact allocates nothing until we call the unfortunate ToString at the end — we do need the resulting string eventually — but all the other lines allocate nothing; we are just operating on the initially stack-allocated buffer. The difference in raw speed is not spectacular, let's say, but the difference in allocations is significant: we have reduced the number of allocations by a factor of four. So if you use such a string builder on the hot paths of your application, you will cut allocations fourfold, which is really good for us. Such builders will be popular, especially combined with stack-allocated data — everything is about using the stack and avoiding the GC. Obviously the API is now much more complex: we have to think about builders and stack-allocated buffers, and all of this has its own overhead, but this is the world we live in when it comes to .NET performance. A sketch of such a value builder is shown below.
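A hedged, heavily simplified sketch of the value-builder idea; the speaker's actual ValueStringBuilder (and the internal one in the .NET runtime) also handles growing past the initial buffer, which is omitted here:

    using System;

    // Ref struct: guaranteed to stay on the stack, never boxed, never on the managed heap.
    public ref struct TinyValueStringBuilder
    {
        private readonly Span<char> _buffer;   // caller-provided, e.g. stackalloc'd
        private int _length;

        public TinyValueStringBuilder(Span<char> initialBuffer)
        {
            _buffer = initialBuffer;
            _length = 0;
        }

        public void Append(ReadOnlySpan<char> value)
        {
            value.CopyTo(_buffer.Slice(_length));   // sketch only: assumes the buffer is big enough
            _length += value.Length;
        }

        // The only allocation happens here, when the final string is materialized.
        public override string ToString() => new string(_buffer.Slice(0, _length));
    }

    // Usage: nothing is allocated until ToString().
    //   Span<char> scratch = stackalloc char[256];
    //   var builder = new TinyValueStringBuilder(scratch);
    //   builder.Append("<item>");
    //   builder.Append("value");
    //   builder.Append("</item>");
    //   string result = builder.ToString();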
There are also other patterns that did not fit into this presentation — I have one minute left, so it is the perfect time to mention them: lightweight wrappers for memory, metaprogramming that allows us to avoid virtual calls, and obviously popular things like caching, batching, and using SIMD to vectorize operations on the CPU. Those are the slides that would fill the second hour of this talk; some of them are described in the longer version of this presentation, the one I was referring to before — it has the "long" suffix because it contains the principles and a little more detailed description of the patterns I was describing today.

OK, it seems that is all. Thank you — and now we have four minutes for questions. Great, there is one.

Question: Thank you for your talk. You mentioned that you cannot store Span<T> on the heap, so it cannot be a member of a class. If I have an application with heavy interop with native code and I want to implement a pooled array that gives me chunks of native memory, but wrapped in a safe context — there will be a huge buffer and I want to say "you can rent this piece from 8 to 16" — can I do it in a safe way?
Answer: Yes, you can use Span or Memory in this context, I believe. As far as I understand your question, it is completely doable with spans. What I said was a bit of a simplification: a span can also represent unmanaged memory, and then you have to do a little more plumbing — define some memory pool type — but it is possible. If you like, contact me and I will provide some details.

Question: Thank you very much, Konrad, for your presentation. A kind of philosophical question regarding these optimizations: is there a point at which we should stop optimizing .NET code and just write a nice .NET-compatible C++ library?
Answer: Yes, I was expecting this question — I even have a slide about it. This question always comes up: where is that point? For me it is very far away, because I believe we can still write really efficient code in C#. And the question for me is practical: I have a company, I have a lot of .NET developers, C# developers, people who know the .NET ecosystem. Obviously I could go and write something in C or C++, but then I would need to hire a C/C++ developer, only for this small part of my application I would need to do such strange things, and then I would need to glue it in with interop. For me it is much better to keep it in C#, which can be nearly as efficient as C++, than to go in that direction — it's just much better to have the possibility to write it in C# than to look for a solution in C++. I also invite you to look at a very interesting academic project (ixy, about user-space network drivers) implemented in very different languages, including C, Java, JavaScript, Rust, C# and C++; they showed that the C# version is almost exactly as fast as the C++ code, and they have a lot of very interesting material to read regarding your question.

Question: Maybe we are just too obsessed with classes — is this optimization useful in typical code, not just for small objects?
Answer: This is an interesting question: when should we think about changing classes to structs, and is it maybe just good to replace every class with a struct in our source code? For me, a class is good enough for almost everything; when I have highly performant code where I really care about performance, that is the place where I start thinking about structs. Structs are tricky: they can be boxed, they have non-obvious behaviors, defensive copies may be created, values may be passed by copy — so in fact the performance can become worse than with a class, which is passed by reference.
So in general, using structs is tricky and we really need to understand what we are doing. For me, I will stay with classes, and in that small part of our application — the two or three percent that we really want to tune — we start thinking about classes versus structs. Even though structs sometimes seem to be faster, as I showed you, I could also give a talk about why structs are slower in some scenarios, so we should be really careful. We probably have time for one more — it is the last question, because of the time; if you have other questions, I will be around today and tomorrow, so feel free to ask.

Question: Can you advise the best approach for searching in strings? For example, you have a list or array of objects with some string fields, and you want to search them without using special third-party libraries like Lucene.
Answer: Yes — we can do it in many ways. I would probably go in the direction of using Span<T>, which allows us to interpret and slice those strings. There is a great talk by Steve Gordon — you can find it — where he describes how he applied Span to parsing and finding elements in JSON, for example; that would be a direct answer to your question. There is also ongoing work on using SIMD intrinsics to find substrings in a vectorized way — a totally brain-exploding thing, let's say, but it is possible, and it would be the fastest possible way of searching text. I don't have links for that, but you will surely find them; SIMD-based text-search algorithms exist — ultra-specific, but also ultra-fast, if you really need the speed.

OK, I believe we don't have any more time. If you have more questions I will be around; remember about the card game, and have a nice conference. Thank you.
Info
Channel: Fest Group
Views: 21,035
Rating: 4.955862 out of 5
Keywords: Konrad Kokosa, code design, .NET Fest, C#, .NET 5, NET Core 3.0, Span of T, patterns c# programming, design patterns c#, common design patterns c#, design patterns c# example, .net design patterns c#, design patterns c# tutorial, design patterns c# with examples, code performance, производительность кода, архитектурные паттерны, architectural patterns
Id: 3r6gbZFRDHs
Length: 57min 45sec (3465 seconds)
Published: Fri Jan 24 2020