The best way to create a string in C# that you shouldn't use

Captions
Hello everybody, I'm Nick, and in this video I'm going to show you the fastest and most memory-efficient way to create a string in C# without having to deal with any unsafe, low-level code. Like the title says, you should never really have to use this in your day-to-day programming life; this is here just for educational purposes. I am going to show you a total of four ways to solve the problem I'm going to create in the beginning. The first three of them are applicable and easily usable. However, the last and most efficient one, which this video is about, is here purely for educational purposes. There might be a 0.00-something percent of you who will ever need that level of micro-optimization, but for the rest of us any of the other three solutions will do just fine. If you like the content and you want to see more, make sure you subscribe and ring the notification bell to get alerted when I upload a new video.

But before I move on, let me tell you about the sponsor of this video, the NDC Conferences. Many of you reach out on Twitter and LinkedIn and ask me, "Hey Nick, how are you finding ideas for your videos? How are you learning all those things you're showing, and how can I learn them as well?" The answer is always: I'm attending conference talks, previously in person, now online, and hopefully in the very near future in person again. My favorite ones to attend as a .NET engineer are the NDC Conferences. In case you don't know, they are conferences that happen around the globe, mostly focused on .NET but covering other tech as well, and they happen every year. The biggest one is in Oslo, there's one in London which is pretty big, there's one in Sydney and one in Porto, and you can click the link to their website down below to check them out for yourselves. Just to understand how influential those talks are: Blazor was a talk given by Steve Sanderson at NDC, Jon Skeet's Abusing C# is an NDC talk, Jimmy Bogard's Domain-Driven Design: The Good Parts is
an NDC talk, and the legendary The Art of Code by Dylan Beattie is also an NDC talk. So you can understand how much value you can get out of an NDC conference, seeing these experts giving these talks. This year I will be at NDC Oslo and NDC London, so if you want to come and watch these talks with me, please click the link in the description, sign up, and buy a ticket. You can also get your employer to buy the ticket for you, which is what I've done in the past. And let's have a beer.

So, I have this clear value, which is "Password123!", and all I want to do is write a function that keeps the first three letters and masks the rest with asterisks. That's it. What I would normally do, in a very naive approach, is say firstChars here and do clearValue.Substring from 0 to 3, so I get the first three characters. Then I would get the remaining length, so the total number of characters minus the three I extracted: clearValue.Length minus 3. Then I would write a for loop whose upper limit is the remaining length, and on every iteration I would stick an asterisk at the end of this firstChars variable. Then all I would do is say Console.WriteLine and print it, and that's it. If I go ahead and run this, you should see that I get two identical things: one for what I want and one for how I did it with my code.

Now the problem here, and there are a few, but the main one if you look at this from a critical standpoint, is this operation here. Because strings are immutable, you cannot really change the value of a string; you allocate a new string every time you change it. What this means is that for the eight asterisks, or however many it is, nine, we're going to create nine different strings, which means we're going to allocate a lot of memory that ultimately we shouldn't need. Let me just
quickly change that to a var. So that we can measure this and get an actual representation of how our code performs in terms of speed and memory, I'm going to go ahead and add BenchmarkDotNet here. I'm going to search for BenchmarkDotNet and add this package. Now, with BenchmarkDotNet, I'm going to quickly create a benchmark class, so I'm going to say public class Benchy, and I'm going to say that this is a MemoryDiagnoser benchmark class, meaning it will collect memory metrics such as invocations of the garbage collector and the memory that the method we're benchmarking allocates overall. Then I'm going to say public string, because we're going to return the string we want, which is effectively the masked-out version of that value. MaskNaive is what we're going to call this. We're going to take all that code, stick it in here, and return firstChars. I'm going to take that clear value as well and put it at the top of this class so we don't have to reallocate it every single time. Then we're going to say BenchmarkRunner and run this Benchy benchmark class. I'm going to stick a Benchmark attribute here, change the configuration to Release, and run it. By running it we'll see how it performs, and we'll get a baseline for all our following tests.

Results are back, and as we can see we have a mean execution time of 98 nanoseconds, some garbage collection in Gen 0, and 500 bytes allocated. Not terrible, but for that sort of operation we can definitely do better. So how can we optimize this? Remember when I said that every time we iterate over this we're going to allocate a new string, because appending this character allocates a new string, since strings are immutable? Well, I know a better way to do this. We can use a StringBuilder, and a StringBuilder is a
class, and we're going to create it here. I'm going to say new StringBuilder and put the first characters in as the initial value. Then, instead of appending to firstChars, all I'm going to do is say stringBuilder.Append with this character. Now, instead of allocating a new string every single time, the StringBuilder will not build the string internally until we call ToString. It won't allocate a new string every time we append something, because the string itself doesn't exist until you invoke ToString. So let's go ahead and run the benchmark now with these two tests instead of just one.

Results are back, so let's see what we have. As you can see, the naive version has a mean execution time of 102 nanoseconds and 400 bytes allocated. The StringBuilder approach takes 39 nanoseconds, less than half, and allocates 184 bytes, again less than half. So this is roughly 60% more efficient in both speed and memory, which is awesome, because we didn't really change much in our code. That's a very practical and usable example. You should be using StringBuilders when you can, especially in scenarios like this, because they decrease your memory allocations, and it is faster because you only build the string at the end and in the meantime you only append. This is one of the approaches you should really be using in your code.

Now, I think we can still do better than this. I'm going to give it a proper name once I have the solution, but for now, I don't think we need the StringBuilder. In C#, the string has constructors, so you can say new string. One of those constructors is very interesting, because it lets us have a repeated pattern of a specific character. So let's say var asterisks: I can say new string, asterisk, then a comma and a number, and for us the number is the count of asterisks that we want. What this will do is repeat this character as many times as the value of the length, so
that means all we need to do is return firstChars plus the asterisks, and that's it. I'm going to call this MaskNewString. This is another one that is very realistically usable and applicable. There are many scenarios where you might need repeating patterns, and this new string constructor can really help you; new string has usages even outside of this repeated-pattern approach.

Results are back, so let's see what we have. We were at 40 nanoseconds with the StringBuilder, which was 60% faster than the original version, and now with new string we're at 21 nanoseconds, almost half of the StringBuilder approach and four times faster than the naive approach, the original. It is also very memory efficient: we're down to 120 bytes, which is very low and very efficient.

However, this is still not the most efficient way to create this string and solve this problem. The next one will be the most efficient way, and we're going to do it with the new, or relatively new, string.Create method. Let's see how that works, because it's pretty interesting, and this example is carefully selected to showcase how using it benefits us. I'm going to call it MaskStringCreate; that's the ultimate way of going about this problem. We don't need any of this. All we need to say is return string.Create, so let's see what this method takes. First it needs a length, which is the length of the final string we're going to return. The reason is that this works with a Span behind the scenes. A Span is an arbitrarily sized chunk of memory handed to you to use, and because it is a ref struct it only goes on the stack, it doesn't go on the heap, meaning it won't cause those expensive allocations. It also kind of breaks the rule that strings are immutable: within this delegate, which you're going to see in a second, where we're going to be writing characters into a span, you won't be constrained by this idea that strings
are immutable. You are effectively able to mutate it, even though it's not a materialized string in managed memory at that point. So let's see what that looks like. First we need the final length, and the final length is the length of the clear value, because it doesn't change. Then you need the value that you're going to tinker with inside the delegate, so for us it's the clear value; that's the value we are effectively passing into the string.Create method. Then we're going to say span, and I'm going to call the other parameter value. value is the same thing as clearValue, so we can work with it, and span is what our string's value will finally be. But at this point it is a span, not a string. Remember, we're still effectively dealing with a chunk of memory.

I think this will be way clearer if I just debug this for a second. Let me remove this BenchmarkRunner and instantiate the class: I'm going to say benchy equals new Benchy here, then benchy.MaskStringCreate, and I'm going to debug this, so change the configuration to Debug and call it. Let's have a breakpoint outside and a breakpoint inside and see exactly what's happening. Outside you can see the clear value, the full thing. We need the length because we need to allocate that size of memory, and then we have the clear value. Then we go inside, and inside you deal with the span, which is what you're finally going to spit back as a string. That's why you see what effectively looks like an array of 12 char locations; that's where your characters will be written for the new string. Then you have the value, which is the clear value. Now, why don't we use the clear value from outside? Because I would suspect that would cause a closure for this delegate, which is not efficient memory-wise; if we do that, it kind of defeats the purpose. So we have the value in here as a parameter to work with. Then all we're going to do, and look how simple that is, at least for our use case, is we're going to
say value.AsSpan, so we're going to take it as a span. Whatever starts with As in C# is casting, not reallocating. For example, if you call ToList or ToArray, those things are reallocating to create that list or array. If you get something as something else, AsMemory, AsEnumerable, AsParallel, you're casting; you're not creating something. So AsSpan, and we're going to copy that to the span. What does that mean? What does that look like? Let me debug again and show you exactly what that is. Here we have the span, which is currently just empty, and then we have the original value. If I step over that and go to the span, the span has now been written with the values from the raw text, the clear value. So currently the value of the thing that this will create, the new string, will be the same thing: "Password123!".

Now what we want to say is span from three, so pick a range, and then fill the rest of it with asterisks. It looks like we're going to be overwriting the rest of that string from the fourth character onwards with asterisks, but really, because we are within this scope here, writing to the span is not reallocating any memory. Let me debug again and see exactly what this is doing. Now we have the span here with all the characters, and if I do the Fill, you see that I have "Pas" and then all the asterisks. So that is the most efficient way to solve that problem. It's not the easiest to understand if you're not familiar with Span, but it is the most efficient. Let me show you how efficient it is by reversing the commenting-out, switching to Release, and running all the benchmarks again, and let's see what the final outcome is.

Results are back, so let's see what we have. That is absolutely insane. Compared to the first, naive approach, the string.Create approach is 10 times faster: it is almost 10 nanoseconds
compared to 100 nanoseconds. Memory-wise it's almost 10 times more memory efficient, at only 48 bytes. Now, does this mean you can use this anywhere? No, it doesn't. One of the bigger limitations of this approach is that you have to know the length of the final string up front, which sometimes is not the case, so it becomes hard to use. However, the approach with new string, and even the StringBuilder, is significantly more efficient than the naive, reallocating approach of creating a new string every time. So as long as you're careful about that, any other solution you come up with should be fine. If you do have a use case for string.Create, feel free to use it, but also explain to your teammates exactly what you're doing, because it is relatively new and relatively low level, and even though it's a really cool feature it can be hard to understand. So don't prematurely optimize something that you don't need to optimize just to use the feature.

That's all I have for you for this video. Thank you very much for watching. Special thanks to my Patreons for making these videos possible; if you want to support me as well, you're going to find a link in the description down below. Leave a like if you liked this video, subscribe for more content like this and ring the bell as well, and I'll see you in the next video. Keep coding.
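For reference, the four approaches described in the captions can be sketched roughly as follows. This is a reconstruction from the spoken description, not the code shown on screen: the class name Masking, the VisibleChars constant, and the exact casing of the sample password are assumptions made for illustration, while the method names follow the ones mentioned in the video. string.Create is assumed to be available (.NET Core 2.1 or later), and the range syntax assumes C# 8 or later.

```csharp
using System;
using System.Text;

// Hypothetical container class; the video puts these methods in its Benchy benchmark class.
public static class Masking
{
    // Number of leading characters left visible (three in the video's example).
    // Inputs shorter than this will throw in every method below.
    private const int VisibleChars = 3;

    // Approach 1: naive concatenation. Each += allocates a brand-new string.
    public static string MaskNaive(string clearValue)
    {
        var firstChars = clearValue.Substring(0, VisibleChars);
        var length = clearValue.Length - VisibleChars;
        for (var i = 0; i < length; i++)
        {
            firstChars += '*';
        }
        return firstChars;
    }

    // Approach 2: StringBuilder. The final string is only built by ToString().
    public static string MaskStringBuilder(string clearValue)
    {
        var firstChars = clearValue.Substring(0, VisibleChars);
        var length = clearValue.Length - VisibleChars;
        var builder = new StringBuilder(firstChars);
        for (var i = 0; i < length; i++)
        {
            builder.Append('*');
        }
        return builder.ToString();
    }

    // Approach 3: the string(char, int) constructor repeats one character.
    public static string MaskNewString(string clearValue)
    {
        var firstChars = clearValue.Substring(0, VisibleChars);
        var length = clearValue.Length - VisibleChars;
        var asterisks = new string('*', length);
        return firstChars + asterisks;
    }

    // Approach 4: string.Create. The final length must be known up front.
    // clearValue is passed in as state so the lambda does not capture it.
    public static string MaskStringCreate(string clearValue)
    {
        return string.Create(clearValue.Length, clearValue, (span, value) =>
        {
            // Copy the original characters into the new string's buffer...
            value.AsSpan().CopyTo(span);
            // ...then overwrite everything after the visible prefix.
            span[VisibleChars..].Fill('*');
        });
    }
}
```

With the sample value used here, each of the four methods returns the same result: Masking.MaskStringCreate("Password123!") gives "Pas*********", three visible characters followed by nine asterisks. The timings and allocation figures quoted in the video come from the speaker's own BenchmarkDotNet run and are not reproduced by this sketch.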
Info
Channel: Nick Chapsas
Views: 35,839
Rating: 4.9406595 out of 5
Keywords: Elfocrash, elfo, coding, .netcore, dot net, core, C#, how to code, tutorial, development, software engineering, microsoft, microsoft mvp, .net core, nick chapsas, chapsas, clean code, what is span in c#, span in .net, span and memory in c#, dotnet, .net, The best way to create a string in C# that you shouldn't use, string.create, new string, strings in .net, strings in c#, fast strings c#
Id: Kd8oNLeRc2c
Length: 16min 40sec (1000 seconds)
Published: Fri Sep 03 2021