Moving Faster: Everyday Efficiency in Modern C++

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments

I just wanted to say, that the idea of that C++ flame logo (I guess the thumbnail of the video) is dope.

๐Ÿ‘๏ธŽ︎ 27 ๐Ÿ‘ค๏ธŽ︎ u/tsojtsojtsoj ๐Ÿ“…๏ธŽ︎ Jan 26 2021 ๐Ÿ—ซ︎ replies

Good talk, but confused (30:25) why this is this more costly

vector<string> vec;
vec.reserve(...);
vec.push_back("Hello");
vec.push_back("World");

than this

vector<string> vec;
vec.reserve(...);
vec.emplace_back("Hello");
vec.emplace_back("World");
๐Ÿ‘๏ธŽ︎ 11 ๐Ÿ‘ค๏ธŽ︎ u/PRSprogrammer ๐Ÿ“…๏ธŽ︎ Jan 26 2021 ๐Ÿ—ซ︎ replies

At 22:53, he says that taking the vector by reference is fully optimal, yet I get better performance using a return type: https://quick-bench.com/q/JIkVuaXAVSrvJulIjfg6-eGLzIQ

In general I think writing clear code using return types is more important than messing up your interface with out parameters, whilst also typically not hurting performance much if at all with modern C++ move semantics.

EDIT: On further inspection, it seems like the improved performance only occurs with Clang using libstdc++. MSVC, GCC and Clang (using libc++) all have identical performance between returning by value and passing by reference (which is what you'd expect).

๐Ÿ‘๏ธŽ︎ 19 ๐Ÿ‘ค๏ธŽ︎ u/Ayjayz ๐Ÿ“…๏ธŽ︎ Jan 27 2021 ๐Ÿ—ซ︎ replies
Captions
so I was chatting with Howard Henan ideas for a talk like this and he suggested the name moving faster so thank you Howard he also told me a story his wife Teresa can type 90 words a minute which I gather is kind of par for the course for a professional typist it sounds scary to me but and thirty years ago her computer could keep up with her so let's look at that a minute 90 words minutes four hundred and fifty characters per minute right seven and a half characters a second so thirty years ago I had a Mac two on my desk had one thread running at 16 megahertz so let's see that's per character two million instructions right per character of course it could keep up you can do a lot in two million instructions so we got frozen for 30 years and they've just thought us out and now it's 2018 well I just bought this thing a couple of weeks or a couple of months ago and it's a yeah it's that it's got eight threads running at two gigahertz when it's calm it goes faster than that so what's that everybody two billion instructions per character so you all know where this is going right Howard's wife Theresa her computer today doesn't keep up with her typing what are we doing wrong I mean that's ridiculous well I don't know all the answers but here's some things that I think that we might be doing wrong and I'm going to talk about some specific things that maybe we can do better so when does efficiency matter typing really seriously the world record for typing speed according to Wikipedia is 216 words per minute set by this person in 1946 on an IBM electric typewriter that was pre ball not a Selectric okay computer can't do that I mean and over the last thirty years a thousand times increase in speed I mean sure we're doing more but are we doing a thousand times more stuff well there's a lot of you know there's a lot of reasons for this and I'm only going to talk about some of them but I want to give you a couple of examples though of when efficiency matters company I used to work for they hired me to work on a program that extracted data from their database and made a product with us is a digital map company so their data was pretty big and their system took three weeks to run and that one of the things they wanted me to do is make some intermediate stopping points so that you know if something went wrong they didn't have to go back all the way to the beginning of the process well I didn't do that I rewrote it to run and five maybe it was eight something like that anyway overnight about a hundred times increase I mean this was a game changer business why as you can well imagine the current company I work for we have a test automation system for our rail simulator and 2,700 tests we have at the moment take about an hour and ten minutes to run today we're rewriting it different design reducing that to about twenty seconds so that's also going to be a game changer for test-driven development so my point here is that efficiency matters efficiency always matters this kind of efficiency is something that you can get I mean this kind of efficiency change is something that you're going to get with design changes which is not what I'm here to talk about today but once you've made those big design changes now all of a sudden your code is running much faster and now the bottleneck may be things that aren't so obvious subtler things in the implementation and that's what this talk is about so where does efficiency matter the rule of thumb that I grew up with was don't optimize your code right at the nice clean way go find the five percent of the code that's spending ninety five percent of the time and fix that well you may throw things at me but I don't think that's the whole story I think that's necessary but not sufficient and I think that there are many programs that do not spend all their time in a small big small bit of code and that if you write a big huge program without thinking about the performance anywhere you'll end up with a big huge slow program that you can't really fix actually the rail simulator is an example of this because if you run it electrically with the electrical traction power simulation on it actually does spend almost all of its time in a very small bit of code but if you run it with the electrical simulator off it actually spends its time all over the place and there's no one spot that you can fix so I say right almost everything optimally but optimize code is ugly right I think in C++ optimal code can be elegant and that in fact optimal code is typically the most correct code and so therefore kind of by definition the most elegant code I think that our idioms should be for optimal code and our idioms are the kind of by definition the elegant way to do things and if something really ends up being ugly you can hide it in C++ behind a very elegant zero overhead abstraction often this is done in a standard library in fact the date-time library does this all over the place matrix arithmetic libraries do this so what about compilers well compilers today are very smart and they can't fix design problems that's pretty obvious but they also can't get around all coding in efficiencies they'll fix that right don't have to worry about that but they won't fix this and we're going to go into this and what's wrong with it so I grew up with simple architectures I didn't know what a cash was floating-point was super expensive this was back in the the old days all right the golden rule used to be reduce FP divides count your FP divides and get those down as far as possible if you wanted your code to go fast well floating points faster than integer now or certainly as fast I didn't actually realize that until not that long ago when I suddenly did some experiments and realized wow you know this this rule that I've been holding in my mind it's just not true so caches caches are big and they're much faster than main memory but they're not big compared to main memory and this faster they get the smaller they get there's more than one right so locality of reference is critical which means that size matters the fact that you've got 32 gear 16 32 gigabytes of RAM doesn't mean you can just arbitrarily write huge code because it won't perform because of caches so cache misses cache misses are the new floating-point divides so theorem 1 avoid cache misses dynamic allocation heap allocations are the most expensive things we do I mean we're not talking about going out on the internet for something and deallocations are the second most expensive so we need to reduce these as much as possible we need to count our allocations don't worry about counting floating point count our allocations reduce them ideally to zero can't always do that standard containers they use heap allocation but heap allocations are not only expensive but the locality of reference is terrible they're all over the place right so there's a double whammy so theorem number to avoid allocations the stack on the other hand is contiguous so it does great locality of reference and it's used constantly so it's always in cash so if something isn't too big and its local should be on the stack a reason to put it in heat nothing's faster than a register right and there so the the compilers gonna try and put things into registers there's lots of them it's gonna use them in fact lots of variables are only in a register ever but if you take the address of a variable or of course if it's involves dynamic allocation it's not going to be only in a register all right so that and by the way I didn't mention but interrupt with questions any time so those are some basic principles maybe most of you already know these things let's talk a little bit about embedding objects so size matters and locality of reference matters and dynamic allocations matter so you've got a contact object do you implement the phone numbers as a vector or as an array well it's a trade off right if you do it as a vector you're gonna end up with two different allocations and bad locality of reference if you do it as an array well you've got a limited size and array doesn't actually have a size so you're gonna have to have some way of putting phone numbers into it and telling that the rest of them aren't valid so this is your situation but this lower approach has a potential to run an awful lot faster we don't today have a fixed capacity vector or a short string optimization what I call it SSO it's probably really a short buffer optimization vector both of these are ideas that are being floated for the standard library but we don't have them today this would be a perfect case for the short string optimization right because you usually just have one or two phone numbers and sometimes you've got ten well we can go better right embedding objects what about if we embed on job jex on top of each other how can we share space well unions in C++ 98 unions were kind of useless you can't really put anything in a union that means much of anything that's got fixed in C++ 11 so you can use unions they're quite useful they're still dangerous if you write one thing and read another you're doing an undefined behavior don't do that but you can really save space and that saves time but unions aren't really ever safe so also in C++ 17 we have variants the variant is a safe Union and they're great should certainly use them but they add some extra space because they have the the type tag so you can't for example save 4 bytes when you want to superimpose two intz by using a variant whereas you can do that with the Union before I go on another thing I don't have a slide for but it's quite important be sure and squeeze all the air out of your objects - meaning order them by size so that the compiler doesn't put in a bunch of padding bytes we saw really this morning's talk by Arthur was had a great example of that where he had this tremendous speed increase because he just got something a little bit smaller and it ended up fitting in the cash so now I'm going to talk about passing things around because I mean one of the reasons I wanted to do this talk was that I'm always asking myself um you know how do I pass this how do I pack get this back from a function what's gonna be most efficient I wasn't sure the answers so here's some of the answers past simple things by value value semantics are a good thing anyway and C++ has all kinds of great support for value semantics but remember that you're making a copy and sometimes a copy is expensive if you need a copy anyway then cost nothing pass by value generally speaking we pass other things by Const reference even shared pointer so Neko had a great talk here forget what year it was three years ago maybe where he pointed out that passing shared pointer by reference when all you want to do is read it is a lot faster than passing it by value which is kind of a natural thing to do so pass by constant reference also if you're writing generic code and you want to know okay what should i what how should I take this parameter take it by constants it doesn't matter if it's a built-in type it'll be just as fast I measured this it's just as fast there's no penalty if you're passing an int and you're taking it by conference not a problem but of course you know all the usual caveats where the reference you have to be concerned about lifetime and what and there's some efficiency considerations so watch Nico's presentation now I know that your mother told you not to speak to strangers and not to pass things by Const reference by non conference it makes code much harder to understand reason about depending on what it is that you're sending in value semantics may be just fine move gives us a huge benefit depending on what your objects like but sometimes it's necessary and sometimes it's much more efficient so I'm going to give you an example of when it's much more efficient so here all I want to do is with this one vector I'm gonna go through a loop some number of times and every time I'm going to clear it I'm gonna load some stuff into it and down here I didn't write any code I'm gonna do something with it and I'm gonna come back and do it again lots of times okay so let's think about what happens the second time through the first time through you know the vector I'm passing it by value here write the first first time through it's empty nothing really happens a bunch of allocations happen here as it fills it up it gets returned right it gets returned by value which means and we're going to talk about this in a little bit but it means that it's going to have the rvo so there won't be any particular cost to that all good right what happens the second time through the second time through what happens here thoughts questions anyone let's look the second time through the key size is a thousand right and capacity on this thing or whatever I did it on happens to be 1066 the library was using so we do a clear the size of zero the capacity is 1066 right we do it we call load numbers we do a copy constructor right now it happens size is zero right we didn't move anything there aren't anything out there but the capacity is zero right because I've constructed a new one with an empty one I'm not going to get that capacity right it's not going to do any allocation system until I actually do something with it and I have my whatever 10 allocations again every time through the loop I'm gonna do all those allocations this is gonna go super slow so let's pass it by Universal ref that's not the right word anymore it's forwarding reference right yeah yeah yeah yeah right right exactly I get those confused so we're gonna move it here right we're gonna take simple our value reference so what happens so this fixes it right sure no problem we we move it here and there's no constructor at all happening here right we got our capacity we fill it up zero allocations all good is there a problem here anyone notice anything that's true but it's worse than that what's this number is not the same what happened to our 1066 now it's a thousand that's because it got returned by a copy constructor which is kind of what you're saying but not only did we knock at the our vo we did not get automatic move here so we're copying it every time still pretty slow not as slow but still pretty slow so what can we do about this well we have to do the move explicitly and now we have no rvo we have two moves here and here but we have zero allocations and zero copies so we're doing pretty good can we do better that's pretty good I mean notice we got our capacity back where it should be what happens if we pass it by good old-fashioned non-contrast well thoughts well when we pass it nothing happens the size is what it is the capacity is what it is no allocations occur when we return nothing happens because there is no return no return statement all right this is fully optimal we're not paying for moves we're not paying for anything so it doesn't get better than this so for situations like this passing by non contra it's a good thing even though your mother said not to myself I'd pass bike by non construct before pointer but that's I mean that eliminates certain yeah same same it does the same thing all right so how do we return so return by value unless you're writing an accessor you're gonna get copy elision return value optimization in many circumstances everyone I'm assuming everyone knows what that is but just briefly it means that there's nothing happens at all because the object that you're actually working on in the function is the object at the call site and this is man day actually mandated in C++ 17 I believe under certain circumstances if copy elision does not occur then an automatic move may occur basically what happens is the compiler tries returning the expression as an r-value and if it can't it returns it as an l-value but an R via and r vo is better than a move because of the r vo nothing happens the move actually has to do the move so there's some rules here don't do don't do a move in your return statement you notice I had to do that before to get it to do it but if you do you will inhibit the copy elision so don't do it if you don't have to the function return types got to be exactly the same type as the type you're returning or it can be a local variable excuse excuse me or a but a vie by value parameter if you have multiple return statements that will often inhibit the r vo I gather that's because it's hard to implement so it's gonna depend on your compiler i don't know that data of what compiler does what here but multiple return statements will not stop the automatic move from happening conditional expressions on the other hand will often prevent the automatic move so either do one of these okay don't do this without the moves so here's some examples of good return statements all of these are going to be are gonna be fine and here's some examples of bad return statements these all have the potential to inhibit your efficiency questions I'm going fast so I with it hoping not to run out of time too badly the third one now no I know it's doomed you yeah I mean you could do an explicit move no you can't you can't you can't you're doomed yeah yeah maybe really what you wanted to do is pass by non-contract I'm assuming everyone knows quite a lot about moving so I'm not going to belabor moving right anyone want me to spend time on move the the main issue is coming up move semantics solves a whole lot of problems and makes a lot of code go a lot faster because you're not actually copying stuff around that you don't need to be works very well however there's lots of objects that absolutely move semantics do nothing for and it's very easy I've found to say oh well it'll be fine because because we've got move well it'll only be fine because of move if your objects manage resources but if your objects just contain resources which is what you want them to do in general as we talked about earlier you want to embed things in your objects to make them smaller and to and to limit allocations well those objects are not going to get no benefit from moving at all the contact object that we talked about moving and copying the same thing okay a standard string is the same thing if it contains 15 characters or less from in most most libraries right because the standard string has a short string optimization which is great and the cost of copying the string when it's 15 characters is the same as the cost of doing a move okay I measured this it's the same cost but it's still a cost and if you're doing a whole lot of them that cost might matter I threw this in here mostly to get you to watch Howard's discussion of move oops sorry wrong button in fact if you take away nothing from this talk but the resolution to go watch Howard's talk and Nico's talk fine they're great the big thing here is you often don't need to write any of these because they get written for you and they're often correct all right let's look at an interesting case of doing a move so what's happening here this good bad and different anyone boy is everyone asleep what's going on here is that you're getting a copy that's what push back does push back takes a const con stress and it was going to do a copy of this string and let's say this is your latest russian novel that just happens to begin with hello world you're gonna copy all of that so that's not so good so let's fix it does this fix it sure right in place is great right that fixes it does this fix it this is exactly the same code in fact if you step in to the to the library that I use you'll find that pushback calls in place back ok so why doesn't this fix it well because in place backs just calling that same constructor what about this does this fix it hmm question yes no yes what's that I think as well as it can for this code now we're going to get a move of course down here if you look at s you're going to be in trouble right so this isn't necessarily ideal but you know this is pretty good we've gotten rid of the copy we aren't copying your Russian novel anymore ok what about this situation though we've you've decided that instead you're you're writing a very short poem well does this help no we just discussed that right it's a short string optimization it's not going to go any slower than the code we just saw even though the code we just saw how to rush a novel hanging off it but it's not going to go any faster that's true oh you're saying you're saying that it's going to write a null right so so it might be slightly slower okay so what do we do does L know the same thing okay how can I thought we could use in place to fix this somehow can we why yes like this this is how in place is meant to be used okay we're going to construct the thing in the string in place in the first place with our initial piece and then we're going to act on that particular string and there's no other string involved I'll stir that requires 17 only be cut but only because of this here you can do this and it you can do the same thing it'll work just as well it's just one more line of code because you can't in place back prior to 17 did not return anything and so you would have another line that says Auto reference s equals V dot back but that won't cost you anything thank you for pointing that out okay questions how about this situation well we're just okay no we're in the same we're back in a similar situation here because we now have to construct an R value right which can be moved from for sure and will be but the move is going to cost you something maybe more than you think this does absolutely nothing because they were already our values move is just to cast our value this however does exactly what you want okay I have to take a little sidebar to talk about perfect actually it's more than a little sidebar now isn't it yes I'm going to talk about perfect forwarding a little bit this is a little bit different but it still leads into a discussion of efficiency in case you aren't sure because it took me a long time to be sure in fact I think it was sometime last Tuesday what standard forward actually does it's used only in generic contexts and it ensures that you retain the r-value Nisour l valueless whatever you're passing down into another function and you need to use a this is now the correct use of the term forwarding reference for this so I want to look at a little case study of you actually using this so I've got this foo which I construct with some stuff okay and I want to call call it with this so what's the problem here we just discussed this at length right what happens here in efficiency right same inefficiency we've just talked about so what can we do to fix this well it's actually pretty easy you make this a template you make this a forwarding reference you call forward and pass that on to your string and now this is fully efficient this directly constructs the string in place and all as well now let's look at a slightly more interesting situation supposing I have this class bar and I want bar to have a vector of foods so how should I write that what's the best way to write that ad I want to I want an ad because I don't want to expose this to the public but I do the public knows that I like hold foods so the public wants to stick Foos in so this is my first attempt well this is terrible for the construction of the foo is good we just fixed that but now I'm gonna copy the foo right entirely including the string I'm not going to get moved to happen here and the call sites really ugly so my first thought is okay I can fix the call site by just doing this I'll pass in the you know what I want and then I'll push back and I'll put the foo here well that's nice it cleans up the call site but it's no more efficient it's doing exactly the same thing so I know I'll use modern C++ this help no this does exactly the same thing is it no I don't think so oh yeah right that the thing is that right the thing is that food doesn't doesn't implement move string implements move but food doesn't that's true so you would get that is true you are getting the free move good good point so if we move of a short string which isn't free good point yes this one oh yeah so this is something I do I don't know I should but I do it for very simple I almost always named my member variables with an M underscore so I can tell their members unless it's a really simple struct where it's it just seems pointless like my point doesn't say M underscore X M underscore Y it just says X Y and when I do that I actually do this because yes you can it's totally legal and in my way of thinking it's not we wanted something this simple it's not confusing but I could one could argue that easily the other way that's true that is true there's another problem with this by the way there's a subtle correctness problem here what's that someone here surely can come up with this that's right the correctness problem is that we haven't constrained STR at all and that's easy to do well it's kind of easy to do today not that easy but in this company pretty easy it's gonna be way easier to do it C++ 20 when we can use it requires but I'm just pointing that out it's not germane to our conversation but it is true all right so some good good points here so we do get the free move but it's not really I mean remember the string isn't the whole story here we're moving some other stuff here and these don't have move semantics okay this is just bike copying so what do we do does this help assuming a pointer to character well we know that it stores it as a stood string but well that's true regardless of what food does exactly this only accepts string literals suddenly I broke Mike what's that that's right or other it only accepts char star it does not accept strings so I just broke half my code so what do we really want to do well turns out it's the same oh we want to do this well no that gets us right back to the same situation in fact what we want to do is this now this suffers from the same correctness thing we just talked about but that aside this is really awesome because now not only is this optimal from a runtime point of view because we are calling the constructor directly with this of foo directly with this in place but now when foo changes that gets some some more constructors adds some more parameters got some default parameters doesn't matter bar doesn't change it just works questions yes I think the question is if if we change this from a queue to an R does that make another ad get get instantiated the answer's no because the type of quote our quote and the type of quote queue quote are the same and this is a this is a type parameter that's correct that's correct that's right because they're different lengths and that is that is true if you were going to call this a bunch with a bunch of different ones of these you may want to do something about that but if you don't I mean if it's if it's if you don't really have that many literals I mean how many different length literals are you gonna have right before it starts to be some other way you're doing it a single parameter of what type Oh be because I'm passing in four things what because it and we don't want to construct a food down yeah question okay now different topic there's a reason why we have all of these different containers and why we'd like some more they all have a purpose and they all have different trade-offs between these three things speed space and convenience but it's really really important to pick the right container for the job and we're gonna look at that in some detail so the first question that really comes up is when should I use vector right actually I'm gonna claim that that's the wrong question I think this is the better question when shouldn't I use vector it's the go-to container the other containers can have vastly better performance or the other sometimes vector won't work at all we're going to talk about that or vastly greater convenience and maybe performance just really doesn't matter there are those situations but start by analyzing weather vectors the right thing to use so vector why is it great contiguous memory fastest traversal random access right all of these things because it has contiguous memory on the other hand growth and validates everything if it grows but you don't have to have a vector grow you can pre reserved if you know if you either know how many things you have or you know how what is a number that's larger than how many things you're gonna have you can pre reserved and you won't get any thrashing and you won't get any movement so now all your references and iterators will not get they are never going to get invalidating if you're trying to do large amounts of data you cannot use a vector because it grows geometrically if you've got four gigabytes of RAM and you've got a two gigabyte vector that's full and you want to add one integer to it you cannot do that so vector doesn't work for large things array array is great it's just like a C array only a cleaner fancier it's great when you want to embed something in your object so how do I choose well if the size isn't fixed you kind of need to use vector B really nice if we had a vector that has a fixed capacity but we don't yet I personally use C arrays for very simple situations you following the principle that use the simplest tool for the job and that's and that's a very good point I mean that's that's a good argument against what I'm saying and I you know I go back and forth on this I tend to use you know if I have three thing if I have three numbers that I need access to you know in the function this you know half a page long I generally use a see array but as soon as you start trucking in it you know around your program then that argument really starts to cut in I'm sorry good point thank you the question was or that the comment was why that that CRA has pointer decay characteristics that are surprising and annoying and that if you use standard array you don't get that so Dec is great I really like deck because deck has clumps of contiguous memory and that gives you pretty fast traversal and you get random access growth invalidates only iterators not your pointers and references but it has linear growth so you can use it for large amounts of data on the other hand if you have a small amount of data decks are very wasteful and they were slower and lard much larger so simple rules right the thing is pushing the whole point of deck is you can push to the front it's faster to push to the front of a vector until your vector gets pretty large so other other factors are more important the question was what what do I mean by small and large that's coming yes almost not quite list what are lists for their node based so the overhead is very high the locality of reference is terrible right because you have to allocate a separate node for every single element and they're all over the place it gets slow or certainly slower traversal you get no random access on the other hand they're perfectly stable growth that growth anywhere never invalidates anything and they have perfectly linear growth like per element so you can make them very large on the other hand they use up an awful lot of space per element so you know unless your if your element is very large and then the overhead of the list ism unimportant but if your element is an INT the overhead of the list is quite important if they're if you have a small one there's almost no good argument for them so vector is much much smaller and much faster for anything up to I mean insertion anywhere in a vector is much faster until you get pretty large and we're going to talk about that in a minute the big read a big reason to use lists is when you can benefit from splicing so let's look at some numbers what I did here this this fine print basically says this is for when you read the slides later is that I created 10 million I had I had a table of 10 million things I used a map that's not involved in the discussion my thingies had a list I'm gonna call it a list I don't mean it had a kind of container that contains some elements and I implemented that three ways list deck and vector when the container size for each of my elements of the larger thing was 10 and I had 10 million of them look at the times this is the time spent just what I did was I pushed that happen to see it yeah I pushed them into the front of the list and then I removed them from the front of the list okay so worst case for four-vector and look at the numbers I mean vector blew everybody away I mean incrementally so over deck but way faster than the list and look at this memory used by all this okay so list is just way better there's no no no argument I mean excuse me vectors way better no argument I went up to a hundred ok deck has pulled ahead speed wise but I mean look at the memory usage so okay it's a little faster to use a deck but do I really want to expend that much more memory depends on your situation list is still way behind now when I make the interior container a thousand well I didn't use list at all because that just ran too slowly to measure in this situation because I had 100k of them and I wanted to keep the numbers in the same ballpark deck is now you know really really fast and vectors now quite a lot slower but it's still a lot smaller questions set and map so this slide should look familiar it's the same slide as for lists the same principles apply your nodes it's a node based container it's just a matter of how you hook them up so arbitrary insertion is faster until the size is quite large this shouldn't be surprising at this point maintaining order is faster until the size is quite large if you need to maintain order binary search on vector works just as well as searching in a tree in fact it's probably faster for small containers linear search is faster than binary search you don't need to keep it in order if all you're doing is gonna find it it's actually faster to do a linear search so let's look at how I tested this so I created an element that was going to keep a set of integers conceptually so I a unique ordered list of integers and I said okay that's what I need in this thing and I threw in something else just to make it somewhat realistic and I implemented this two ways this is one way this is the other way they work identically the only difference is that this uses a vector and I wrote this probably fairly dumb insert that just does a ordered insert okay and then I ran the comparison so again at ten million objects of size ten vectors faster and way smaller now mind you what it says here in the fine print is that I put the elements in backwards so that there was worse case for vector again the vector had to slide I put him in and then I put him in again and it and in the first case the vector had to slide every time it never got the advantage of oh it goes in the end it had to actually make room every time so worst case for vector at a hundred well vectors even fit I mean there's you know tenth as many of them but vector is the the improvement of vector over said is more at size a hundred I would have thought that you know it would get worse and worse no it's actually more so at size 100 and far more savings in once we get to a thousand vectors now behind set in terms of time but I mean look at the memory usage and it's not way behind right it's just incrementally slower does this all make sense questions it's not as bad right I wanted to make it worst for vector yeah that's a good point so theorem three right is vector except of course when it's not the right choice my real point is you have to make this choice every time sure okay so interesting so okay so the comment was that under some circumstances with a good hash function an unordered map hash map of some sort but presumably mean the standard one yeah and if you don't care about order is even better than vector I'd be interested to see if that's true as the size gets small because the overhead of that is going to be tremendous for small containers right I mean I didn't include the unordered maps and I probably should have because it they're definitely different but I think that this gets the point across which is for every use you need to evaluate which is the correct container for this situation not not you noted every other situation is different right see if I can paraphrase that you're saying be sure and benchmark exactly what you're doing the set of operations yeah that because it's something different is going to behave differently yeah okay so now we come to the spiritual portion of the talk we're going to have a little eastern philosophy here and we're going to talk about how you can move even faster by not moving at all so we've already seen that objects may not get any benefit from being moved right what moving is just as expensive as copying furthermore construction and destruction may have side effects that you don't want to repeat if your object launches a rocket in its constructor and blows it up in its destructor you probably don't want an accidental copy in there that might not go well and that can be true of things that are less dramatic right like a database connection and you may want objects to be stable in fact some objects want to be permanently stable for their entire lifetime okay node-based containers are stable vectors can be used in a stable way but you need to get things into them and out of them okay and sometimes esoteric examples aren't necessary sometimes you have a simple little object it's got 128 bytes it's your you know your contact object it's got a couple of phone numbers and you whatever but you've got zillions of them so it's just too expensive to mess around constructing destructing copying etc etc so and we've this is this is somewhat repeating things that I've already said but basically you want to use in place to construct things in place so that they never have to move or copy okay here's some reasons why you'd use in place this is one of my favorite little idioms okay and yes you can do this and C++ 14 and earlier by just with two lines of code but I really like it now I've got my vector let's say I call him place back i I'm asking for the default constructor because I don't want it to have anything because I'm about to read it from a file I just call read I pass it the file object boom I've just read element number one when not to use it the question was that the default constructor might also be expensive sure but it's hard to imagine how you would well yeah absolutely so so there's a reason I use this but you you also another idiom that would solve that problem is to have a this is assuming that there isn't a constructor a read constructor on the object but if there's a read constructor on the object you just take this read and a couple of parentheses out and you call them place back with the file object and you've eliminated that problem good question when not to use in place well if it's a copy anyway like for built-in types principle of using the simplest tool for the job is says use push back not in place there's another little more subtle one if you want to ensure that explicit constructors are not called if you call push back you will ensure that if you call someplace back it will call explicit instructors because that's what you're doing is calling the constructors of the object Alistair okay you have to know more about okay so they yeah so allister's comment was that you could argue the simplest tool for the job the other way and say that it's simpler to call in place back because then that's what you always use and you don't have to think about well should I be calling push back or in place back how simple is simple you just always use that and it's not as we've seen it's not gonna matter at runtime because it's gonna call them place back anyway so the question the comment was that with aggregate construction you mean with braces yeah with aggregate yeah with aggregate construction you can't call in place back I don't know about that is that true does anyone else I mean you and Wyatt why is that I think yeah okay that's that's interesting so aggregate the comment was that aggregate types require a curly brace initializer and that that gets lost in the process of passing it down through in place back to the constructor right yeah he said there's an it's a non deduced context right okay so Alistair says that that it's maybe going to get fixed for 20 and the other thing was that it works now in certain special cases right okay okay interesting I don't do much with other than to just use them I don't do much with aggregate initialization myself all right so now let's talk about splicing here's another way of not moving or copying out things stay put everyone knows that splicing is right let's have a splice method in fact it's one of the few reasons to use lists is that you can splice it nothing's invalidated it's super fast you're just twiddling pointers all right we all know this right you go from here to here okay until c++ 17 there's nothing you couldn't do anything like this to a map or a set alright and furthermore you couldn't change the key of an element anything like that that you wanted to do you had to create new elements delete old elements construction copying all this stuff we're trying to avoid with C+ 17 we have node extraction with node extraction you can do all these things and more you can change the key of an element you can transfer elements around even between maps and multi maps you can merge one container into the other you can take an element out and hold it for later use I'm going to show an interesting example of this and you can move an element out of a set I actually didn't realize this until very recently but until node extraction you could put move only elements into a set and then presumably observe them there but you couldn't move them out of the set which seems like a major problem so let's look at how node extraction works let's say you want to change the key of an element and all the other things sort of follow the same pattern so let's say we want to change this guy from X to Y well what we do is we remove it from we extract it from the map or set from the container and we hold onto it with a node handle then we can change the value because it's no longer a member of the container you see it's not the same color anymore so we can we can change the key without messing up the container the containers happy it doesn't have that element anymore and now we can put it back in you'll notice the element never moved memory all we did was twiddle the pointers and change the the key value so what's the code looked like for this well it's awfully simple here's my map all right I call extract pass it a value or or a an iterator and I get this node handle thing which we have to use Auto for because it's a one of these you know secret library types it's not secret it's in the standard but it's nothing that you really want to mess with but you do want to use its API - whoops sorry to access the key it gives you a non Const or the key don't inquire too hard as to how that happens and then you just call insert now this node handle thing is a move only type right for good reason you wouldn't want to make a copy of it that's the whole point is that you're trying to move this thing and not copy it not I mean you're trying to transfer this thing I don't watch my terminology without moving it or copying it so you can't copy it but you can move it the move the node handle mind you know move happens to the element and you just call insert with this thing and you're done and of course if your code is such that this all happens in a our value context then you don't even have to call move another example let's say I want to merge these two sets well that's one line of code just call merge notice what happened here right it's a set not a multiset there's two fives here so we were merging source into destination what when we're done we get destination with all of source except that extra five and that's left in source so no element ever gets dropped on the floor if you attempt to insert a node handle and it doesn't work the insert returns an extra parameter that is a node handle of the thing that didn't get inserted so it doesn't get dropped on the floor now if you ignore it it'll get destructed by the destructor of the node handle but and by the way that'll happen using the correct allocator a copy of your allocator but nothing's ever dropped on the floor unless you want it to be you also can use this this is a slightly more interesting example here to make incredibly efficient factories now sometimes I mean it's great to be able to in place you know things in place but sometimes a factory is really a pattern that you want to use so how do we make an efficient Factory so this factory function takes a car star and it it's gonna create a new a new record in my map of int and string right and it's gonna manage the ID for me so that's that's why I want a factory here okay so how do I do that well down here I just call insert new record with whatever and it just works so how does that work there's a temporary map I'm going to emplace the the thing into the temporary map so that's fully efficient constructs it once in the temporary map I'm then going to extract that node and return the node handle notice it says auto here okay and that's now here is going to be an R value so I can directly insert it into my new table and all it does is fiddle with the pointers the thing in memory never moves and I think that's it that's gonna be questions discussion thoughts yeah so the short the really short answer is I don't know because I'm not concurrency expert by any stretch of the imagination so concurrency I took a slide out where I was going to talk about this but there's a lot of ways to get speed out of with modern computers modern programming probably the single most important is is concurrency right we have multiple processors we need to keep them busy that's a whole topic on its own it's I'm not the person to talk about it and when you talk about concurrent container containers I don't know how those stack up this is how you write individual threads and make them fast so one of the things that I didn't mention earlier give me a minute I have to bring it back any other questions while I'm thinking of what I was gonna say okay so his comment was that moving shared pointers can really improve your concurrency the performance characteristics yeah so what I was gonna say is that what the whole point of this talk is to say that writing efficient code isn't just for people who work for you know National Laboratories or that guy down the hall who never opens his blinds and has a editor that's green on black it's it's for everybody all the time you you know I like to always be thinking about is the code I'm writing optimal and efficient and when you do that you end up we after you've written a big program with a big program that runs fast that and and is correct and you don't have to worry about going back and tuning it unless you run into one of those situations where it's spending a whole bunch of its time in one little spot Alistair's point is a very good one efficient isn't just fast it's it's energy efficient I learned something oh this was some years ago now that really surprised me and there's people in the room who I'm sure can talk about this all day but the cost of a typical computer over its lifetime is something like at the time it was something like 25% the equipment and 75% the power to run it for its lifetime I mean the cost of power is huge and if you don't need as much computing power right if you don't need as much memory and all these things cost us money and and if you can if you can run it for less time to get a certain amount of work done that's you're saving money other questions thoughts comments suggestions I know everyone's thinking beer and hot dogs I'm sorry I I can't accept I have been unable to get invited to the slack Channel I've been it's been blocked and not working but I'm sure that the slides are going to be made available because I and I and I'm pretty sure that the video is gonna be made available so you will it will be available sure are you talking about new record sure if I wanted it to take something other than a char star right oh oh you're saying in this actual code okay so the question was should this be a forward right okay so so so you're saying change change the new record to be able to take more than just a char star and in which case we would want to if we're gonna make it a template we'd want to perfect forward what whatever came in into the end place that's correct I wrote this to show something else so I wasn't really worried about that but if we wanted this to be you know more Universal we would want to do that for sure yes you're right that's a very good point the way this is implemented this particular you know which you'd never hopefully never do like this or maybe that would be exactly the the soso allister's point was that as soon as you start making more more of these because it's a template every single one of them has its own static in ID so its own ID starting from zero and since they're all going in the same place that's almost certainly wrong so the question the comment was that you would have to use piecewise construct here in the in place call if you use a very attic that's a really interesting comment I don't know the answer I would have thought not I would hope not I hate that anyway a lot but it wasn't the original design um but it it may be the case and I don't know the answer I don't know if anyone's compiler and in their head can do that that's right that's definitely true if there's more than two things here you're definitely going to have to use it yes because you're not going to make this a very attic right you're gonna make this to take a single string template parameter right yeah other questions comments thoughts anyone I have a question okay I'm sorry good yes okay I like that very much thank you so the comment was instead of saying optimize everything or write everything optimally the topic really should have been don't pesum eyes at all anywhere I agree I mean I think it's it's somewhat a matter of semantics but it's a good way to think about it right it's like if it doesn't matter or if you know if you're writing code where it just doesn't matter you know how the normal way to write it can't be improved on which is lots of code or you're in a situation where it really genuinely doesn't matter and I'm gonna say like user interfaces but then I'm gonna remind you of my opening story typing is a user interface phenomenon right so it's not always true that user interfaces it doesn't matter then just right normal code but when you are in a situation where it could matter or where you're doing something where how you write it matters be sure you're not Pesa maizing at all yeah I agree that's excellent I had a question does anyone disagree that that we need to think about these things and that there's some sort of a problem with a thousand times faster computer not keeping up with typing that a thousand times slower computer could keep up with even given that it's doing lots of other stuff I know there's a lot of fluff in there but a thousand times anyone disagree okay the comment was we need OS driver developers in the room to hear those things fair enough there's a lot of factors absolutely but I've certainly run into a lot of of the kind of thinking I'll oh oh did someone turn there we go thank you I've run into a lot of attitude of the attitude that it just doesn't matter I mean you know I not to pick on Java but I'd love to pick on Java in Java and and I'm not job expert so tell me if I'm if I'm incorrect but in Java you've got little int and big in right and they're very fundamentally different but all of your user-defined types are like big and they all are objects which means they're all dynamically allocated so you can't embed things in you can't embed one user object in another directly and that just has mungus space and time implications and I was involved in a project some years ago now where that was really they tried to do a Big Data project in Java and it failed miserably and that I'm pretty sure I'm pretty convinced that that kind of thing that was a lot of the reason different design patterns we used fly weights a lot more in Java where you can stock using object identity to avoid having these plans and things yeah I know the comment was used the design patterns that work for the language you're using which is certainly a a wise thing to say but I think that in C++ using the correct design idioms you end up with code that can't be improved on particularly even if you were writing an assembly code and that's the whole point of C++ right so it is exactly 6 o'clock any other questions comments thank you [Applause]
Info
Channel: Coding Tech
Views: 48,048
Rating: 4.8958707 out of 5
Keywords: c++, performance, fast c++, software development
Id: LFv7XwgsdLY
Channel Id: undefined
Length: 88min 56sec (5336 seconds)
Published: Sun Nov 04 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.