CppCon 2018: Alan Talbot “Moving Faster: Everyday efficiency in modern C++”

Captions
- So I've been thinking about this talk for a while, and a while ago I was talking to Howard Hinnant about it. He actually proposed the great title, Moving Faster. But he also told me a story. Apparently his wife can type 90 words a minute. I guess for professional typists that's kind of normal. Seems a little scary to me, but. The point was that 30 years ago, on her computer, she could type 90 words a minute and her computer kept up with that. Hello. Oh, I think I pushed the wrong button. Ah, there we go. So let's look at that a minute. 30 years ago I had a Mac something-or-other on my desktop. Oh sorry, first of all: 90 words a minute is 450 characters a minute, or seven and a half characters a second, right? So 30 years ago I had this on my desk. So how many instructions did the thing execute per character? Two million instructions per character, right? Of course it kept up. You can do a lot in two million instructions. Why wouldn't it? So let's say we all got frozen and thawed out 30 years later, and now I have this thing, which is that. So what does it do? Don't all talk at once. Two billion instructions per character. Actually a little more, right? Does everyone know where this is going? Howard's wife's computer can no longer keep up with her typing. (audience member talking in the background) What's that? I don't think she's become faster. 90's pretty fast, try it. So we must be doing something wrong. I don't know all the answers, but I have a few ideas. Here's a couple of 'em. Thinking it doesn't matter. Designing like it doesn't matter. Coding like it doesn't matter. So today I'm gonna talk about the last thing and some of the second thing. So when does efficiency matter? Typing; the computer can't keep up with her typing? The world record for typing speed is 216 words per minute, according to Wikipedia, set by this person in 1946 on an IBM electric typewriter. Now this was not the Selectric, this was before the ball. Computers can't do that? 
The company I used to work for, they hired me to work on a program that extracted data from a database. It took three weeks to run. And in fact they asked me to work on it 'cause they wanted me to put some breakpoints in, so in case something happened, they didn't have to go all the way back to the beginning. You can understand why. I didn't do that; I rewrote it so that it ran in more like five hours or so, which was a business game changer for them. The company I work for now, we have a test automation system. We use it to do test-driven development and such. Currently we're running 2,700 tests, takes about 70 minutes. We've just finished rewriting it and we're expecting it to take about 20 seconds to run those same tests. That's gonna make a big difference. My point here is, if this thing would work, performance always matters. Sometimes you don't actually have to do anything to get good performance. And when people say, well, performance doesn't matter, I think what they really mean is, it's already good enough, we didn't have to do anything special. But it always matters, for everything. So there's this rule that I have always heard: tune, don't optimize. Measure it, find the 5% of the program where it's spending 95% of its work. I claim that this is necessary, but not sufficient. This was formulated to keep programmers from writing squirrely messes with the idea that, oh, we'll make it go faster, when they didn't need to. But I feel that in most cases, if you write a big complicated program without paying attention to performance, you'll end up with a big program that doesn't perform very well and that you can't really fix. I think you have to always be thinking about efficiency in your coding. And in fact, in the keynote a couple of days ago, Mark Elendt said this really nicely at the end of the talk; he said, "performance is hard to retrofit." I said, yeah, exactly. But optimized code is ugly, right? 
In C++ I think optimal code not only can be elegant, but is the most correct code, and so is the most elegant by definition. And I think the most elegant correct code should be our idiom for the way we should be writing code. And C++ gives us the tools that if something really is ugly, you can hide it behind a very elegant abstraction. The chrono date-time library does this. This happens all over the place; matrix operations. So another way to say this, by the way, rather than "always write optimal code," is: never pessimize. Be sure that you're not pessimizing your code, ever. So let's start by talking about memory usage a little bit. Why does this matter? So I grew up thinking that floating point operations were slow. The golden rule used to be, reduce floating point divides; count your floating point divides. Get as many of 'em out of there as possible. Well, somewhere along the way, I actually did some measurements and discovered, whoa, this isn't true anymore at all. Floating point's just as fast as integer, if not faster. It's not an issue anymore. What is an issue is caching. Caches are big these days and they're much faster than main memory, but they're not big compared to main memory. And they get faster, but when they get faster they get smaller. So locality of reference is very, very important in modern computing, which means that size matters. So cache misses are the new floating point divides. So theorem one: avoid cache misses. Dynamic allocation is the most expensive thing we do, if we're not talking about disk operations or internet operations, and deallocations are probably the second most expensive. So we wanna reduce our dynamic allocations as much as possible. Count your allocations. So we're not counting floating point divides anymore, we're counting allocations. Most of the standard containers, and almost any use of a smart pointer, of course, is using the heap, a dynamic allocation. 
We need to think about these things as expensive. Furthermore, heap allocations tend to be all over the place, so the locality of reference is terrible and we get cache misses. So theorem two: avoid dynamic allocations whenever possible, and you can't always avoid them, of course. Static allocation, on the other hand, is awesome, right, because when you put things on the stack, for example, the stack is contiguous, so it has great locality of reference, and it's always in use, so it's always in the cache. Unless something is pretty big, if it has local scope and fixed size and all that, it should be on the stack. Use the stack. And I took out, I had a slide about registers. The compiler generally takes care of that for you though, so I'm not gonna talk about that. Obviously, registers are the fastest of all. So something that C++ makes easy, and that certain other languages one might mention make impossible or hard, is embedding things. So if we want good locality of reference, we want small size, we want to create objects by composition, not by reference to other allocations. So for instance, you have a contact object, you want a list of phone numbers. Do you use a vector, which means a second allocation, or do you use an array, which means that it's internal to the object? Well, it would be worth wasting some space in an array that you're not always gonna use, as long as you can pick a maximum number of phone numbers to embed in the object. So in other words, the lower thing, not the upper thing. Because you're gonna have much better locality of reference, much faster allocation and deallocation, 'cause there's only one, much less space because of the overhead of allocation. Sadly we don't have a fixed-capacity vector, which would be perfect for this, or what I call a short-string-optimization vector. But that's something that is being discussed for the language, so we may someday. And of course, squeeze the air out. You don't ever do this, do ya? 
Don't ever do this. Sure, that's the order of members that makes the most sense when you read the source code, and looks pretty, right, but really what you want is this: down from 56 bytes to 32. And share space. So unions used to be kinda useless before C++11, but they got fixed. You can put almost anything in a union now. And you can share space very easily. Now, unions aren't safe, but they're reasonable if, for example, you have a situation where which member of the union is active is never gonna change for the life of the object. And that happens quite a lot, and that allows you to put things on top of each other, which is even better than putting them right next to each other. If you want safe, we now have, in C++17, variant, which is a safe union. Works very nicely, it's a wonderful tool. But you're trading a little bit of space for that safety, so you're not going to save four bytes by superimposing two ints using a variant, 'cause it has a type tag that's probably gonna be eight bytes in addition. But it's a great tool if you're talking about cases where that much size doesn't matter, or you've got large things in it. So one of the reasons I wanted to do this talk was that I'm always wondering how to pass things. Like, what's the most efficient way? What's gonna produce the tightest code? So, simple things: pass by value. By simple things, I mean built-in types, for example, or your simple types, small types. Value semantics are great, just remember you're making a copy. But for simple things, that doesn't matter. Also, if the function is gonna modify the thing anyway, well, pass it by value and then modify the argument. No reason to pass it by reference and then assign it to a local variable. That's doing the same thing in more source code. Or pass shared_ptr by value to share ownership. But if you're not sharing ownership, pass by const reference. Pass most other things by const reference. 
There's no penalty for passing, like, an int, a built-in type, by const reference. I measured this. It's just as fast. So if you're writing generic code, pass by const reference, don't worry about it. It'll be just as fast as passing an int by value, and it'll do the right thing when you're passing your giant objects. So Nico had a great presentation a few years ago where he pointed out that passing shared_ptr by const reference is much faster than the mechanism that will occur when you actually pass it by value and make another share of that thing. So if it's read-only access and there's no ownership, so all you wanna do is observe this thing that you have a shared pointer to, pass it by const reference. Obviously with any reference you have to think about the lifetime of the referenced object. I highly recommend watching that video, by the way. Now, your mother told you, don't talk to strangers and don't pass by non-const reference, right? Outies are bad, right? We don't want out parameters. Well, there's good reasons for those rules. Makes code harder to understand and reason about. Value semantics may be fast enough anyway, and they're much easier to understand. But sometimes it can be much more efficient to pass by non-const reference. And I'm gonna show you an interesting example of this. So what we have here is, we have a vector and we're gonna go through this loop, and each time through the loop we're gonna clear the old stuff, we're gonna load it up with some numbers, and then we're gonna do something with it that I didn't include in the slide. And the loading of the numbers is just gonna put the numbers from one to 1,000 in it. So the simple way to write this, the obvious way to write this, is pass it by value, count on move semantics to not make a copy, and all is well, right? Well, let's look at what happens here. What we're gonna look at is what happens the second time through the loop, through the lower loop. 
The first time through the loop, I mean, what more can you do, right? You're passing an empty vector, so nothing really happens. You're gonna do this push_back 1,000 times, so you're gonna get log-something-of-1,000 allocations as it grows and reallocates, and then it's gonna return it. And because you're returning a parameter that was passed by value, you're gonna get the automatic move on return. But what happens the second time through the loop? Well, this is what happens. We started out with size 1,000 and capacity whatever my library happened to make it. And then we clear it, so size is zero, capacity's still there. And then we pass it by value, which means it copies into this new vector, the parameter of the function, and now the size is zero and the capacity is zero, because when you copy an empty vector you don't copy the capacity. The whole point is that it's empty, right, so you're not gonna keep this allocation that's out there that has nothing in it. So you have to do 10 or whatever it is more dynamic allocations. And then you're gonna move it back with a move constructor and a move assignment. And the next time through, you're gonna do it again and you get 10 more allocations. This is gonna run really slowly. So let's try something different. Let's pass it by r-value reference and let's move it in. So what happens here the second time through? Well, the size is zero after the clear. We move it in, the size is zero. We've still got our capacity. Now we have zero allocations; this is good, right? But is there a problem here? Does anyone notice that the capacity is not 1066 any more, it's 1,000? Why is that? Well, because we no longer have a by-value parameter, we're not getting the automatic move out, we're getting a copy. And the copy creates a new copy with a fitted capacity. 
So this is not gonna be as fast as it could, since we have a copy construction there on the return. If we do an explicit move, now we've got a move construction, now our capacity is retained, and we're now getting pretty close to optimal, but the code's a little bit messy, and I mean, it's a little bit odd, isn't it, to call a function, moving out of something in the function call, and then assign back to that same something? Someone might think that was a little weird. And it's still not as optimal as it could be, because these moves are not free, and we're gonna talk more about that later. What happens if you pass it by non-const ref? Well, now your call site is as simple as it can be. And what happens, again, the second time through? Well, at the call, nothing happens, because there's no construction, it's just a reference, so of course you keep the capacity. You do zero allocations, and there is no return, no return statement, so nothing happens there. So this is optimal, but it does use a pass by non-const reference. So I'm gonna gloss over the return rules a little bit. There've been other talks this week about this that have been very good, and it's complicated, and I don't really have time to talk about it today, but here's quickly some rules I want you to keep in mind for returning values. You get the automatic move under some circumstances. You get copy elision under some circumstances. This is an area worth studying. And the key thing that we were looking at just now was that you can return a local variable or a by-value parameter, but not a by-reference parameter, to get the automatic move. Don't ever move on your return, because that will inhibit RVO. And you don't need to, 'cause you get the automatic move. So let's talk about moving a minute. Now, I'm not gonna introduce moving, 'cause I assume anyone who's here already knows about move and uses it all the time. 
But, (sighs) what I've discovered is that, and this is like common knowledge these days, right, ever since C++11. What I've discovered is that people who probably should know better will say things like, oh, that's alright, it'll get moved, so there's no efficiency issue. Well, that's only true in a certain class of situations, because many objects get no benefit whatsoever from moving. Simple example: if you've got an object with 1,000 ints in it, it's gonna copy 4,000 bytes whether you call it move or copy. Move does nothing for you. How about a string? Well, string is the perfect poster child for move, right? You have a string that's attached to your Russian novel and you get a move instead of a copy, that's a huge win. But what if your string contains a zip code? Well, the string is gonna store that in the short string optimization. And with the short string optimization, the move and the copy of a string of that size have exactly the same performance. I've measured it. So the short string optimization's a great thing. A copy isn't any worse than the move would be, but it does cost something. And what if you have lots and lots and lots of 'em? Well, you're gonna have to pay for that cost. So saying, oh, it's getting moved, well, that doesn't mean anything. You're still paying for the copy, 'cause that's what it is. So let's look at an example. So I'm gonna build up the string, it gets lots of other stuff, gets all kinds of characters, your Russian novel, and I'm gonna do a push_back into the vector. Well, this is a problem, because this is going to do a copy. Excuse me, it's gonna do a move, but it's... See what's next. Now it's gonna, no I'm sorry, it's gonna do a copy. Yeah, yeah, no, I had it right, because it can't do a move, because the string is local, right, we can't eviscerate the string, so it's certainly going to do a copy. And calling emplace_back doesn't help. In fact, push_back calls emplace_back. They do exactly the same thing. So how can you make this efficient? 
You can call move explicitly. So now, of course, after this, s is in this weird moved-from state. So this isn't great coding style, but maybe this is what you need to do. Now you have a move, and now it's much, much more efficient. But what about this case? In this case, the move, as we just discussed, doesn't help at all. It's pretty fast, but if you've got lots of 'em, it might be an issue. This, as we just said, doesn't do anything different, doesn't help. So is there a way to use emplace_back in such a way that you get an improvement here? Yes, there is. This is the right way to use emplace_back. We're going to emplace it in the first place. We're not gonna have a local variable. And then we're going to modify it as necessary. So this is optimal. Can't do better than this. You're constructing the string from the char* directly, in place, in the container. What about this situation? Well, this is more of the same, right? This here, if you've got a lot of 'em, it's not gonna go as fast. This doesn't help, as we just discussed. It's already an r-value, move is just a cast to r-value, so this did nothing. But this will make it go significantly faster, potentially. So now for some Eastern philosophy: you can go even faster by not moving at all. And we've already hinted at this. Some things shouldn't be moved, because they're too big or because construction and destruction have side effects. If your constructor launches a rocket and your destructor blows it up, you probably don't want any accidental copies and constructions and destructions. You may want the thing to be stable in memory so you can have all these references to it, pointers to it, whatever. If you have lots and lots of objects, even if they're small, simple objects that don't have strange side effects, it may be very expensive to construct them more than once. So you'd rather not move them, and again, emplace is the way to do that. You want to construct them in place. 
Another thing you can do with emplace is insert a default-constructed element without copying it. And if it's default-constructed, the chances are pretty good that move and copy aren't gonna have different performance, right? One of my favorite idioms is this one, this emplace_back(file). You've got a serialized set of objects and you're trying to read them back; you call emplace_back with your file object, and it calls the read constructor, which constructs the element in place and reads it. No extra copying. You don't even have to call the default constructor. So, when not to use emplace. Well, if it's a copy anyway, like built-in types, using push_back, for example, is a simpler tool; it's been around longer. You could argue that you should use the simplest tool for the job. Emplace will call explicit constructors; that's what it's for, it's for calling constructors. push_back will call implicit constructors but not explicit ones, because it has to implicitly construct an object. So if you give it something that is not the object, it will call implicit constructors to convert it, but not explicit ones. You also can't use aggregate construction with emplace. I think this is gonna get fixed. I've heard this is gonna get fixed, but I don't know the details of that. Another way to avoid moving, avoid moving or copying of any sort, is splicing. So you all know about splice, right? Nothing's invalidated, you just move things around between different lists, let's say. All right, you all know how this works. Go from this to this, very easy. But until C++17 there was no way to do this with sets and maps, which meant a lot of things. It meant you couldn't change the key of an element in a map. There was no way to do that prior to 17 without removing it and putting a new one in, which means allocation and deallocation. You couldn't transfer an element to another container. You always had to create new elements and delete old ones. 
You also could not move an element out of a set. You could move an element into a set, but then it was stuck there. You couldn't move it out, you could only make a copy of it. Node extraction fixes all of these things. You can change the key of an element, you can transfer an element to another container, you can merge containers. You can take an element out and hang onto it. The container can die. Another container comes along, you put it in. And I'm gonna show you an example of factories in a minute. So the way this works is, let's say we have a set or a map here and it's got an object with key x and we wanna change that to y. The first thing we do is extract it. Now notice that it's not a member of the container anymore. It's being held by this node handle thing. Now we can access the key in a non-const way. We can change the key to y, and now we can re-insert it into the container, and of course it may do rebalancing or whatever, but no memory got allocated or deallocated, nothing got moved around in memory, just the pointers were changed. This is the code for that, it's very simple. Don't ask too closely how that key accessor works. That's a little tricky, but it does work, and you can do something like this, it's very simple. Now, you need the move there because this node handle type is move-only, so you have to move it back in when you insert it. Mind you, I'm talking about moving the node handle, I'm not talking about the element. The element never budges, but the handle is a move-only type; otherwise it wouldn't work, because we can't make a copy of it, 'cause then we'd have to copy the element, which is the whole point of this, to avoid that. And often you don't need to say move, because the code is such that you're in an r-value context anyway. Here's an example of merging sets, incredibly simple, one line of code. Notice the result. 
The merge is going to put all of the things from source into destination except, since this is a set, not a multiset, except the extra five, and it left that in the source. Nothing ever gets dropped on the floor when you use these tools. And in fact, if you go to insert into a set or map and the insert fails, the return value of the insert includes an extra member that's a node handle holding the node that was constructed, and you can choose to ignore it and it'll get deleted with the appropriate allocator, or you can do something with it, but nothing ever just magically disappears on you. So this is kind of a neat pattern. I haven't actually used this, but this facility lets you make very efficient factories for elements for maps and sets. So let's say I want to have this factory managing the creation of the ID for something, and my map is a table. I've got an ID and a value, which is a string in this case. I can create this factory, and what it does is it has a temporary version of the container. I'm gonna emplace the ID and the string, in this case. So that's perfectly optimal. And now I have the element, and it belongs to this temporary. I'm now gonna extract the element from this temporary and return it. The temporary gets destroyed. I've got this handle which is returned, and of course it's an r-value, and then I just call insert with that into my actual table, and all it does is link up the pointers. So I think there's gonna be lots of interesting uses for this sort of thing. I'm just gonna go through this quickly because this is, I thought, quite interesting. This is a little bit of a sidebar, but it uses some of these techniques. I had this situation where I had a foo object, so I wrote it in the most naive way. Well, this is a problem, because, as we've already seen, we're gonna have to construct a string temporary and then move out of that string into the string that's in struct foo. 
We can improve on that by making it a template, and now doing perfect forwarding of whatever is passed in, directly to the string in the struct, and it will call the appropriate constructor, in this case for the char*. Some of you may notice that there's a correctness problem with this, in that I'm not constraining my STR template parameter in any way, but that's easy enough to do, especially with concepts in C++20, hopefully. But now I have a situation where I have this class bar that has a vector of these things, and I want an add function, 'cause I don't wanna expose the vector directly to the public. I just want people to call add, and I want 'em to pass in the foo thing. Well, I started out with this approach, but this is neither optimal nor pretty. The call site is ugly, and of course it's going to do a complete copy of all this, because, I mean, sure, the string will be moved, but all the other things don't have move behavior. And the string in this case is a short string, so as we've discussed, it doesn't really have move behavior either. So this is not only inelegant, but inefficient. Well, this makes it more elegant. Here I've just moved the foo construction into the add function, but that hasn't helped the efficiency at all. I can do it this way. That looks all modern, but it's not any different. It's doing the same thing. So what's the right way to do this? Well, I can call emplace_back. Now this is fully efficient, and it's gonna do the right thing, but now it only works for half of my use cases, 'cause I only take a char*, I don't take a const string reference anymore. So that's no good. So I could write two of 'em, but I don't wanna do that, so what do I do? Well, this doesn't help, because now I'm back to the same problem of inefficiency, so I do this. 
So this uses variadic templates, and the beauty of this is that it's fully efficient, it's very elegant, the call site looks the way I want it, and best of all, when I change foo, I don't have to change this add function. So that was a little bit of a tangential discussion, but I thought it was interesting. All right, containers. It's really important to choose the right container for your job, every time. Every container has a purpose, and the containers have different trade-offs between speed and size and convenience. The committee is looking at new containers that offer different choices of speed, space, and convenience. The point is, there are no fixed rules like, you should use this instead of that. You always have to pick the right container for the job. But the question is, how do you do that, what's the right container? Well, you have to measure. So let's look at this. The first thing everyone asks is, when should I use vector? Well, I actually claim that's probably not the right question. I think the right question is, when shouldn't I use vector? Vector is the go-to for dynamic storage. Other containers offer special properties, better performance sometimes, more convenience sometimes. But you should always start by asking, shouldn't I be using vector here? And we're gonna see why. Vector has contiguous memory, so it's fast traversal, it's random access, excellent locality of reference. But growth invalidates everything, right, if it grows. If you know the final size, or you know a number that is guaranteed to be more than the final size, you can pre-reserve. And if you pre-reserve and you are sure that you're not going to go beyond that value, then it's not gonna reallocate. So suddenly vector has all of the best properties, 'cause it's not gonna reallocate, nothing's gonna get invalidated. And by the way, you can always shrink it later, which may or may not do anything. 
I mean, shrinking is kind of a hint in terms of actual memory use for your operating system. The biggest problem with vector is you can't use 'em for large data. If you've got four gigabytes of RAM and you've got a two-gigabyte vector and it's full and you wanna add one more thing to it, you can't. So that's a problem for certain applications. Array, like a C-array, has all the best possible properties, but it's fixed size, right? The biggest thing about array is it's local. So how do you choose? Well, if the size is gonna change, you gotta use a vector, at the moment anyway. I use a C-array for really simple little situations. And I use std::array if I want container-like behavior. Another reason is that with std::array, you don't get the pointer decay behavior that you do with C-arrays. Deque, I love deques. Deques are great. They have clumps of contiguous memory, so the traversal's pretty fast. They have random access. Growth only invalidates iterators, not pointers and references that you keep to your stuff, and they have linear growth behavior, so they work very well for large data. However, they have a tremendous amount of overhead if the size of the thing is small. So they're useless for small things. So it's pretty obvious: if you need big data and you want those other behaviors, use a deque. However, pushing into the front of a vector is faster than pushing into the front of a deque, until the size of the container is quite large. This is surprising. We're gonna see some numbers in a minute. And vector's much smaller. List, there's cases where list is useful, but not many. It's node-based, so the locality of reference is terrible, there's all this overhead, slow traversal, no random access. However, the nodes never move, so growth invalidates nothing, growth anywhere, middle, ends, right? 
And they're perfectly linear, they grow one element at a time, so you can fill 'em up until you hit the end of memory, but their size is large, so whether that's really gonna be a better solution is not clear, depends on your situation. And again, if it's small, they're very wasteful. So let's look at some numbers. So what I did was, I had an element that contained a container, and the inner container was either 10, 100, or 1,000 in size, and I filled it up from the front, which is the worst case for vectors. And I wanna say that I then read, I then refilled, I can't remember. It says in the fine print. That's for when you read the slide later, but I made it the worst case for vectors. And then I did 10 million of size 10, one million of size 100, et cetera, et cetera, to keep the numbers in the same range. So look at this: vector blows away deque and list, mind you, filling it from the front, so every time you put something in, it's gotta slide the rest of it down. The size is much smaller, and the access time is noticeably better than deque, and much better than list. Go up to size 100, now deque has pulled ahead a little bit, but look at the size difference. It's only pulled ahead a little bit, and list is still trailing. We get up to 1,000, now vector is starting to do a little bit worse than list and way worse than deque, but the size is still very small. So this is, I mean, 1,000. Think: if somebody asked you, would you use a vector for something that was 1,000 long and pushed to the front of it, you'd say, well, no, of course not. But maybe you would. Maybe the difference in performance isn't as important as the difference in size. This is why you always have to figure it out for each situation. Set and map. This should look familiar, it's the same slide as for list. It has all the same considerations, because they're node-based. 
So what's interesting is, arbitrary insertion into a vector is faster until the size is quite large and maintaining order is faster until the size is quite large. You can use binary search on a vector, it's just as good as looking something up in a set. In fact, it's probably faster. But what's really interesting is for smallish containers, which are very common I find, linear search is faster, faster than binary search, which means you don't have to keep it in order. If you have a list of 10 things, don't keep it in order, put it in a vector, and just do a linear search. It will be faster than any of the other ideas you might have, like oh, we gotta look things up in it, it's gotta be a set. Use vector. So the test I did here, I had an element which looks like this for testing set and I just threw in an id for something just to make it a little more realistic. And then I had an element which looks like this for testing vector, and I created this insert which does an ordered insert. And then the numbers I ran, again, I put things in in the opposite order, in reverse order, so it was the worst case for the vector test, vector's worst case. And again, at size 10, vector blows away set in time and of course in memory. What's weird is that at size 100, the difference was even greater. I would have thought it would get progressively worse, but at size 100 vector was an even better solution. And of course the size is still, is actually, I guess about the same ratio, maybe a little better. So you have to measure. This stuff is not intuitive. Get up to 1,000, now vector is slower, but look at the memory difference. It's not much slower. It's not even twice, so you have to decide. So theorem number three. Vector is usually the right answer for an awful large class of problems, except when it's not, and then, you know, there may be a case where vector would be impossible, or much, much, much slower than using a map or a set.
Or maybe performance doesn't, really doesn't matter in that case and you've measured to prove that and the map or the set or the list or whatever is much more convenient. But again, you gotta check. And so we have some time for questions. (audience applauds) Just yeah, line up at the mics. - [Man] So in one of the first few slides you said, prefer stack allocation when objects are smallish, but you never defined smallish. - Well that, I suppose, depends to some extent on how big your stack is. I mean, I've put some pretty big things on the stack. I'm sure you can look up what your stack size is. It depends on your environment. - [Man] Is it a ratio of the maximum size of your stack? - The question was, is there a ratio? You know, I don't actually know. I've never ended up pushing, I've never ended up pushing the stack to the point where I had to go figure that out. So for my environment I don't know, but I'm sure you can find that out for any environment. And you know the stack, if you make the stack frame big enough, you're gonna have caching issues, right, as you go back and forth to the function. So you have to sort of think about that. When I talk about small things, I usually mean small as in bytes, kilobytes. Your stack is unquestionably megabytes at least. And you know, maybe someone else will come up and answer that, 'cause I don't really know the right answer. - [Howard] I've made a terrible mistake. My wife informs me that she types 125 words per minute, not 90, so. - Oh my gosh. (Howard laughs) I gotta redo the whole talk. - That's right. Nice talk, thank you. - Thank you. - [Audience Member] Two quick points. One is beware in Microsoft environments, the default stack size is one megabyte.
And I work in a native C++ library that's called from .NET code and of course .NET's never heard of the stack and we learned the hard way that you gotta be really parsimonious with stack space there because we unfortunately could not get the default stack size overridden in our environment, just due to political factors I suppose. Anyway, my question is, you mentioned that fixed-sized vectors are in the pipeline at some stage. I got the impression it was kind of preliminary. Are we talking maybe in C++20 or 23 or for some n greater than 23? - I'm pretty sure not in 20. I hope that there's gonna be some new containers in 23. I'd like to, there's one I'd like to do, because I think that there's just not enough flexibility. Vectors, you know, if you extrapolate this, you realize, oh, there's some other things one can do that'll make it go even faster and even more useful and more flexible and we just can't do that with vector. - [Audience Member] Well, doing away with the reallocation possibility is just so helpful for stability and pointers and references and things. So, okay, thank you. - Sure. - [Participant] Have you considered comparing with an unordered set, an unordered map in addition to just set? - Thank you, you know I should have, I got asked this last time I did this and I should have put it in the slide, but I didn't measure unordered, because I wanted to make a point, not give a complete... The point is you have to go see. So I didn't measure unordered. They're going to have the same relationship to vectors for small sizes. It's not node, well, (laughs) standard std unordered containers are node-based, so you're gonna have the same problems that you get. If you have a custom one that's not node-based, you're gonna have different performance characteristics. And again, you've really, you've got to measure for your situation, is really the point. But I didn't measure the unordered maps. Because they're node based, I suspect it would be similar.
- Thank you. - Sure. - [Male] Hi, I was looking at your factory example for putting things into, what is it, a map? - Yes. - [Male] One of the advantages of C++17 is, with something like try_emplace, if you pass the map in by reference, you can even make it a static generic function or something, you can do try_emplace and that way you don't actually have to, say if you increment the id number, and then it turns out that there's already something like that in the set, you're sort of committed, and the idea is with try_emplace you can either go forward and do something efficiently in the factory or you can decide that it was inefficient and then you don't lose the thing that got moved into you. You can pass it back out again, sort of like you were talking about for the other case. - Right-- - [Male] Emplace in maps is really a weird thing. That's why they have these irregular methods, like what is it, insert_or_assign is another one like that, that's tentative. - Right, yep. That's solving a slightly different problem than I was addressing here, but that's true. I haven't actually used try_emplace. - [Male] Well emplace has been sort of questionable all along, especially for move-only things, because unless you're in a vector where there's no possibility of that sort of thing happening, you never know whether or not you're gonna use the thing you just constructed with an emplace, and a lotta times it ends up being worse, because you end up creating an item, and then because there's a duplicate, throwing it away. - Well sure, but if you didn't use emplace, that would still be true. - I suppose. - It's not that emplace made it worse, it's that it didn't make that issue better. - [Male] Well it does help with move-only, because a lotta times they can give you the move-only parameter right back, and you can do something else as you like. - Yeah. - Anyone else? 25 seconds. Anyone else? Thank you. (audience applauds)
Info
Channel: CppCon
Views: 24,057
Rating: 4.6784315 out of 5
Id: EovBkh9wDnM
Length: 59min 42sec (3582 seconds)
Published: Sun Nov 11 2018