- So I've been thinking
about this talk for a while and a while ago I was talking
to Howard Hinnant about it. He actually proposed the
great title Moving Faster. But he also told me a story. Apparently his wife can
type 90 words a minute. I guess, for professional typists,
that's kind of normal. Seems a little scary to me, but. The point was that 30 years ago on her computer she could
type 90 words a minute and her computer kept up with that. Hello. Oh, I think I pushed the wrong button. Ah, there we go. So let's look at that a minute. 30 years ago I had a Mac
something or other on my desktop. Oh sorry, first of all. 90 words a minute is
450 characters a minute or seven and a half
characters a second, right? So 30 years ago I had this on my desk. So how many instructions did the thing execute per character? Two million instructions
per character right? Of course it kept up. You can do a lot in two
million instructions. Why wouldn't it? So let's say we all got frozen and thawed out 30 years later and now I have this thing, which is that. So it does? Don't all talk at once. Two billion instructions per character. Actually a little more, right? Does everyone know where this is going? Howard's wife's computer no longer can keep up with her typing. (audience member talking
in the background) What's that? I don't think she's become faster. 90's pretty fast, try it. So we must be doing something wrong. I don't know all the answers, but I have a few ideas. Here's a couple of 'em. Thinking it doesn't matter. Designing like it doesn't matter. Coding like it doesn't matter. So today I'm gonna talk
about the last thing and some of the second thing. So when does efficiency matter? Typing, computer can't
keep up with her typing? The world record for typing
speed is 216 words per minute according to Wikipedia set by this person in 1946 on an IBM electric typewriter. Now this was not the Selectric,
this was before the ball. Computers can't do that? The company I used to work for hired me to work on a program that extracted data from a database. It took three weeks to run. And in fact they asked me to work on it, 'cause they wanted me to
put some breakpoints in so in case something happened, they didn't have to go all
the way back to the beginning. You can understand why. I didn't do that, I rewrote it so that it ran in more
like five hours or so, which was a business
game changer for them. Company I work for now, we
have a test automation system. We use it to do test driven
development and such. Currently we're running 2,700
tests, takes about 70 minutes. We've just finished rewriting it and we're expecting it to take about 20 seconds to run those same tests. That's gonna make a big difference. My point here is, if
this thing would work, performance always matters. Sometimes you don't
actually have to do anything to get good performance. And when people say, well
performance doesn't matter, I think what they really mean is, it's already good enough, we didn't have to do anything special, but it always matters for everything. So there's this rule, that I have always heard, tune, don't optimize. Measure it, find the 5% of the program where it's spending 95% of its work. I claim that this is
necessary, but not sufficient. This was formulated to keep
programmers from writing squirrely messes in the idea that, oh we'll make it go faster
when they didn't need to. But I feel that in most cases, if you write a big complicated program without paying attention to performance, you'll end up with a big program that doesn't perform very well and that you can't really fix. I think you have to
always be thinking about efficiency in your coding. And in fact, in the keynote
a couple a days ago, Mark Elendt said this really
nicely at the end of the talk, he said, "performance
is hard to retrofit." I said, yeah exactly. But optimized code is ugly, right? In C++ I think optimal code
not only can be elegant, but is the most correct code, and so is the most elegant by definition. And I think the most elegant
correct code should be our idiom for what is the way
we should be writing code. And C++ gives us the tools that
if something really is ugly you can hide it behind a
very elegant abstraction. The chrono date-time library does this. This happens all over the
place, matrix operations. So another way to say this by the way, is rather than always write optimal code is you could say, never pessimize. Be sure that you're not
pessimizing your code ever. So let's start by talking about
memory usage a little bit. Why does this? So I grew up thinking that floating point operations were slow. Golden rule used to be,
reduce floating point divides, count your floating point divides. Get as many of 'em out
of there as possible. Well somewhere along the way, I actually did some
measurements and discovered, whoa, this isn't true anymore at all. Floating point's just as fast
as integer if not faster. It's not an issue anymore. What is an issue is caching. Caches are big these days and they're much faster than main memory, but they're not big
compared to main memory. And they get faster, but when they get faster they get smaller. So locality of reference
is very, very important in modern computing, which
means that size matters. So cache misses are the
new floating point divides. So theorem one, avoid cache misses. Dynamic allocation is the
most expensive thing we do if we're not talking about disk operations or internet operations and deallocations are probably
the second most expensive. So we wanna reduce our allocations, our dynamic allocations
as much as possible. Count your allocations. So we're not counting floating
point divides anymore, we're counting allocations. Most of the standard containers and almost any use of a
smart pointer of course is using the heap as a dynamic allocation. We need to think about
these things as expensive. Furthermore, heap allocations
tend to be all over the place, so the locality of reference is terrible and we get cache misses. So theorem two is avoid
dynamic allocations whenever possible, and you can't always avoid them of course. Static allocation on the other hand is awesome right, because when you put things on
the stack for example, the stack is contiguous, so it has great locality of
reference, it's always in use, so it's always in the cache. Unless something is pretty big, if it has local scope and
fixed size and all that, it should be on the stack. Use the stack. And I took out, I had a
slide about registers. The compiler generally takes
care of that for you though, so I'm not gonna talk about that. Obviously, registers
are the fastest of all. So something that C++ makes easy and that certain other
languages one might mention make impossible or hard
is embedding things. So if we want good locality of reference, we want small size, we want to create objects by composition not by reference to other allocations. So for instance, you
have a contact object, you wanna list of phone numbers. Do you use a vector, which
means a second allocation or do you use an array which means that it's internal to the object? Well it might be worth wasting, it would be worth wasting
some space in an array that you're not always gonna use, as long as you can pick a
maximum number of phone numbers to embed the object. So in other words, the lower thing, not the upper thing. Because you're gonna have much
better locality of reference, much faster allocation and deallocation, 'cause there's only one, much less space because of
the overhead of allocation. Sadly we don't have a
fixed capacity vector, which would be perfect for this, or what I call a short
string optimization vector. But that's something
that is being discussed for the language, so we may someday. And of course, squeeze the air out. You don't ever do this do ya? Don't ever do this. Sure that's the order of members that makes the most sense when you read the source
code and looks pretty, right, but really what you want is this, down from 56 bytes to 32. And share space. So unions used to be kinda
useless before C++ 11, but they got fixed. You can put almost
anything in a union now. And you can share space very easily. Now unions aren't safe,
but they're reasonable if for example you have a situation where which member of the union is active is never gonna change for
the life of the object. And that happens quite a lot and that allows you to put
things on top of each other, which is even better than putting them right
next to each other. If you want safe, we now
have C++ 17, variant, which is a safe union. Works very nicely, it's a wonderful tool. But you're trading a little
bit of space for that safety, so you're not going to save four bytes by superimposing two ints using a variant, 'cause it has a type tag that's probably gonna be eight bytes in addition. But it's a great tool
if you're talking about cases where that much size doesn't matter or we got large things in it. So one of the reasons I
wanted to do this talk was that I'm always
wondering how to pass things. Like what's the most efficient way? What's gonna produce the tightest code? So simple things, pass by value. By simple things, I mean
builtin types for example, or your simple types, small types. Value semantics are great, just remember you're making a copy. But for simple things,
that doesn't matter. Also, if you're gonna modify, if the function is gonna
modify the thing anyway, well pass it by value and
then modify the argument. No reason to pass it by reference and then assign it to the local variable. That's doing the same thing in
more code, more source code. Or, of course, pass shared_ptr by value to share ownership. But, if you're not sharing ownership, pass by const reference. Pass most other things by const reference. There's no penalty for
passing, like an int, a builtin type by const reference. I measured this. It's just as fast. So if you're writing generic code, pass by const reference,
don't worry about it. It'll be just as fast as
passing an int by value and it'll do the right thing, when you're passing your giant objects. So Nico had a great
presentation a few years ago where he pointed out
that passing shared_ptr by const reference is much faster than the mechanism that will occur when you actually pass it by
value and make another share of that thing. So if it's a read only access and there's no ownership, so all you wanna do is observe this thing that you have a shared pointer to, pass it by const reference. Obviously with any reference
you have to think about the lifetime of the referenced object. I highly recommend watching
that video by the way. Now your mother told you,
don't talk to strangers and don't pass by
non-const reference, right? Outies are bad, right? We don't want out parameters. Well, there's good
reasons for those rules. Makes code harder to
understand and reason about. Value semantics may be fast enough anyway and they're much easier to understand. But sometimes, it can
be much more efficient to pass by non-const reference. And I'm gonna show you an
interesting example of this. So what we have here is we want to, we have a vector and we're gonna go through this loop and each time through the loop, we're gonna clear the old stuff, we're gonna load it up with some numbers, and then we're gonna do something with it that I didn't include in the slide. And the loading of the
numbers is just gonna put the numbers from one to 1,000 in it. So the simple way to write this, the obvious way to write
this is pass it by value, count on move semantics to not make a copy and all is well, right? Well let's look at what happens here. What we're gonna look at is what happens the second time through the
loop, through the lower loop. The first time through the loop, I mean, what more can you do right? You're passing an empty vector,
so nothing really happens. You're gonna do this push_back 1,000 times so you're gonna get log-of-1,000-or-so allocations as it grows and reallocates and then it's gonna return it. And because you're returning a parameter that's passed by value, you're gonna get the
automatic move on return. But what happens the second
time through the loop? Well this is what happens. We started out with size 1,000 and capacity whatever my library
happened to make it this. And then we clear it, so size is zero, capacity's still there, and then we pass it by value, which means it does a move into this new vector, that's the parameter, the argument of the function, and now the size is zero and the capacity is zero, because when you do a move, the whole point of the move there is it's empty, right, so you're not gonna keep the capacity, you're not gonna keep this allocation that's out there that has nothing in it. So you have to do 10 or whatever it is more dynamic allocations. And then you're gonna move it back with a move constructor
and a move assignment. And the next time through
you're gonna do it again and you get 10 more allocations. This is gonna run really slowly. So let's try something different. Let's pass it by r-value reference and let's move it in. So what happens here
the second time through? Well the size is 1,000, the capacity, we move it in, this size is, I mean the size is zero after the clear. We move it in, the size is zero. We've still got our capacity. Now we have zero allocations,
this is good right? But is there a problem here? Does anyone notice that the capacity is not 1066 any more, it's 1,000? Why is that? Well because we no longer have a by-value parameter, so we're not getting
the automatic move out, we're getting a copy. And the copy creates a new
copy with a fitted capacity. So this is not gonna
move as fast as it could if we have a copy constructor there on the return. If we do an explicit move, now we've got a move constructor,
now our capacity is retained, and we're now getting
pretty close to optimal, but the code's a little bit messy and I mean it's a little
bit odd, isn't it, to move from, to call a function, to move out of something
in the function call, and then assign back
to that same something? Someone might think
that was a little weird. And it's still not as
optimal as it could be, because these moves are not free, and we're gonna talk
more about that later. What happens if you pass
it by non-const ref? Well now, your call site
is as simple as it can be. And what happens, again
the second time through? Well, the call, nothing happens because there's no construction,
it's just a reference, so of course you get the capacity. You do zero allocations
and there is no return, no return statement, so
nothing happens there. So this is optimal, but it does use a pass
by non-const reference. So I'm gonna gloss over the
return rules a little bit. There've been other talks
this week about this that have been very good
and it's complicated and I don't really have
time to talk about it today, but here's quickly some rules I want you to keep in mind
for returning values. You get the automatic move
under some circumstances. You get the copy elision
under some circumstances. This is an area worth studying. And the key, the thing that
we were looking at just now was that you can return a local variable or a by-value parameter, but not a by-reference parameter, to get the automatic move. Don't ever move on your return, because that will inhibit RVO. And you don't need to, 'cause
you get automatic move. So let's talk about moving a minute. Now I'm not gonna introduce moving 'cause I assume anyone who's
here already knows about move and uses it all the time. But, (sighs) what I've discovered is that, and this is like common
knowledge these days right, ever since C++ 11. What I've discovered is that people who probably should know better
will say things like, oh that's alright, it'll get moved so there's no efficiency issue. Well that's only true in a
certain class of situations because many objects get no
benefit whatsoever from moving. Simple example, if you've got an object with 1,000 ints in it, it's gonna copy 4,000 bytes
whether you call it move or copy. Move does nothing for you. How about a string? Well string is a perfect
poster child for move, right? You have a string that's
attached to your Russian novel and you get a move instead of a copy, that's a huge win? But what if your string
contains a zip code? Well the string is gonna store that in the short string optimization. And with the short string optimization, the move and the copy of a string of that size have exactly the same performance. I've measured it. So, the short string
optimization's a great thing. Doesn't make it any worse
than the move would be, but it costs something. And what if you have lots
and lots and lots of 'em? Well you're gonna have
to pay for that cost. So saying, oh it's getting moved, well it doesn't mean anything. You still are paying for the copy, 'cause that's what it is. So let's look at an example. So I'm gonna build up the string, get lots of other stuff, gets all kinds of characters,
your Russian novel, and I'm gonna do a
push_back into the vector. Well this is a problem because
this is going to do a copy. Excuse me, it's gonna do a move, but it's... See what's next. Now it's gonna, no I'm sorry, it's gonna do a copy. Yeah, yeah, no I had it right, because it can't do a move because the string is local, right, we can't eviscerate the string, so it's certainly going to do a copy. And calling emplace_back doesn't help. In fact push_back calls emplace_back. They do exactly the same thing. So how can you make this efficient? You can call move explicitly. So now of course after this,
s is in this weird state. So this isn't great coding style, but maybe this is what you need to do. Now you have a move and now
it's much, much more efficient. But what about this case? In this case the move as we just discussed,
doesn't help at all. It's pretty fast, but if
you've got lots of 'em, it might be an issue. This as we just said, doesn't
do anything different, doesn't help. So is there a way to use emplace_back in such a way that you
get an improvement here? Yes there is. This is the right way to use emplace_back. We're going to emplace it in the first place. We're not gonna have a local variable. And then we're going to
modify it as necessary. So this is optimal. Can't do better than this. You're constructing the
string from the char* directly in place in the container. What about this situation? Well this is more of the same, right? This here, if you've got a lot of 'em, it's not gonna go as fast. This doesn't help as we just discussed. It's already an r-value, move is just a cast
to r-value, so this did nothing. But this will make it go
significantly faster, potentially. So now for some Eastern philosophy, you can go even faster
by not moving at all. And we've already hinted at this. Some thing shouldn't be moved because they're too big or because construction and
destruction have side effects. If your constructor launches a rocket and your destructor blows it up, you probably don't want
any accidental copies and constructions and destructions. You may want the thing
to be stable in memory so you can have all
these references to it, pointers to it, whatever. If you have lots and lots of objects, even if they're small simple objects that don't have strange side effects, it may be very expensive to
construct them more than once. So you'd rather not move them and again emplace is the way to do that. You want to construct them in place. Another thing you can do with emplace is insert a default-constructed
element without copying it. And if it's default-constructed, the chances are pretty
good that move and copy aren't gonna have different
performance, right? One of my favorite idioms is this one, this emplace_back(file). You're trying to read, you've got a serialized set of objects and you're
trying to read them back, you call emplace_back
with your file object, calls the read constructor, builds it, it reads, constructs
it in place and reads it. No extra copying. You don't even have to call
the default constructor. So when not to use emplace. Well if it's a copy anyway, like built-in types, using push_back for example is a simpler tool, it's been around longer. You could argue that you should use the simplest tool for the job. Emplace will call explicit constructors, that's what it's for, it's for calling constructors. Push_back will not call explicit constructors, because it has to implicitly construct an object. So if you give it something that is not the object it will call implicit constructors to convert it, but not explicit. You also can't use aggregate
construction with emplace. I think this is gonna get fixed. I've heard this is gonna get fixed, but I don't know the details of that. Another way to avoid moving, avoid moving or copying
of any sort, is splicing. So you all know about splice, right, nothing's invalidated,
just move things around between different lists, let's say. All right, you all know how this works. Go from this to this, very easy. But until C++ 17 there
was no way to do this with sets and maps, which meant a lotta things. It meant you can't change the
key of an element in a map. There was no way to do that prior to 17, without removing it and
putting a new one in, which means allocation and deallocation. You couldn't transfer an element to another container. You always had to create new
elements and delete old ones. You also could not move
an element out of the set. You could move an element into a set, but then it was stuck there. You couldn't move it out, you
could only make a copy of it. Node extraction fixes all of these things. You can change the key of an element, you can transfer an element
to another container, you can merge containers. You can take an element
out and hang onto it. The container can die. Another container comes
along you put it in. And I'm gonna show you an
example of factories in a minute. So the way this works is, let's say we have a set or a map here and it's got object of value x and we wanna change that to y. The first thing we do is extract it. Now notice that it's not a
member of the container anymore. It's being held by this node handle thing. Now we can access the
key in a non-const way. We can change the key to y and now we can re-insert
it into the container and of course it may do
rebalancing or whatever, but no memory got allocated or deallocated, nothing got moved around in memory, just the pointers were changed. This is the code for
that, it's very simple. Don't ask too closely how
that key accessor works. That's a little tricky, but it does work and you
can do something like this, it's very simple. Now you need to move it
because this auto, this node handle object, this
node handle type is move only so you have to move it
back in when you insert it. Mind you I'm talking
moving the node handle, I'm not talking about the element. The element never budges, but the handle is a move only type, otherwise it wouldn't, we
can't make a copy of it, 'cause then we'd have to copy the element, which is the whole point
of this, is to avoid that. And often you don't need to say move because the code is such
that you're in an r-value context, or you can make an r-value context, anyway. Here's an example of merging sets, incredibly simple, one line of code. Notice the result. The merge is going to put
all of the things from source into destination except, since this is a set, not a
multi-set, except the extra five and it left that in the source. Nothing ever gets dropped on the floor when you use these tools. And in fact, if you go to insert into a set or map and the insert fails, the return value of the insert statement includes an extra member
that's a node handle holding the node that was constructed and you can choose to ignore
it and it'll get deleted with the allocator, the
appropriate allocator, or you can do something with it, but nothing's ever just
magically disappears on you. So this is kind of a neat pattern. I haven't actually used this, but this facility lets you
make very efficient factories for elements for maps and sets. So let's say I want to have this factory managing the creation
of the ID for something and my map is a table. I've got an ID and a value,
which is a string in this case. I can create this factory and what it does is it has a temporary version of the container. I'm gonna emplace the ID
in the string in this case. So that's perfectly optimal. And now I have the element and it belongs to this temporary. I'm now gonna extract the element from this temporary and return it. The temporary gets destroyed. I've got this handle which is returned and of course is an r-value and then I just call insert
with that into my actual table and all it does is link up the pointers. So I think there's gonna be lots of interesting uses
for this sort of thing. I'm just gonna go through this quickly because this is, I
thought, quite interesting. This is a little bit of a side bar, but it uses some of these techniques. I had this situation
where I had a foo object so I wrote it in the most naive way. Well this is a problem
because we're going to have to construct, we've already looked at this, we're gonna have to
construct a string temporary, and then move out of that
string into the string that's in struct foo. We can improve on that
by making it a template and now doing perfect forwarding
of whatever is passed in directly to the string in the struct and it will call the appropriate
constructor in this case for the char*. Some of you may notice that there's a correctness problem with this in that I'm not constraining it, my STR template parameter in any way, but that's easy enough to do, especially with concepts
in C++ 20 hopefully. But now I have a situation
where I have this class bar that has a vector of these things and I want an add function, 'cause I don't wanna expose the vector directly to the public. I just want people to call add and I want 'em to pass in the foo thing. Well, start out with this approach, but this is neither optimal nor pretty. The call site is ugly and of course it's going to do a complete copy of all this because, I mean, sure the string will be moved, but all the other things
don't have move behavior. And the string in this
case is a short string, so as we've discussed, it doesn't really have
move behavior either. So this is not only
inelegant, but inefficient. Well this makes it more elegant. Here I've just moved the foo construction into the add function, but that hasn't helped
the efficiency at all. I can do it this way. That looks all modern, but
it's not any different. It's doing the same thing. So what's the right way to do this? Well, I can call emplace_back. Now this is fully efficient and it's gonna do the right thing, but now it only works
for half of my use cases 'cause I only take a char star, I don't take a string reference, constant string reference anymore. So that's no good. So I could write two of 'em, but I don't wanna do
that, so what do I do? Well this doesn't help, because now I'm back to the
same problem of inefficiency, so I do this. So this uses variadic templates and the beauty of this is that it's fully efficient, it's very elegant, the call site looks the way I want it, it's fully efficient, and best of all, when I change foo, I don't have to change this add function. So that was a little bit
of a tangential discussion, but I thought it was interesting. All right, containers. It's really important to
choose the right container for your job, every time. Every container has a purpose and the containers have
different trade offs between speed and size
and all that, convenience. The committee is looking at new containers that offer different choices of speed, space and convenience. The point is, there are no fixed rules. You should use this instead
of that, instead of that. You always have to pick the
right container for the job. But the question is, how do you do that, what's the right container? Well you have to measure. So let's look at this. First thing everyone asks is, when should I use vector? Well I actually claim that's not probably the right question. I think the right question is, when shouldn't I use vector? Vector is the go-to for dynamic storage. Other containers offer special properties, better performance sometimes,
more convenience sometimes. But you should always start by asking, shouldn't I be using vector here and we're gonna see why. Vector has contiguous memory,
so it's fast traversal, it's random access, excellent
locality of reference. But growth invalidates
everything, right, if it grows. If you know the final size, or you know a number that is guaranteed to be more than the final
size, you can pre-reserve. And you pre-reserve and you are sure that you're not going
to go beyond that value, then it's not gonna reallocate. So suddenly, vector has
all of the best properties, 'cause it's not gonna reallocate, nothing's gonna get invalidated. And by the way, you can
always shrink it later, which may or may not do anything. I mean shrinking is kind of a hint in terms of actual memory use
for your operating system. The biggest problem with vector is you can't use 'em for large data. If you've got four gigabytes of RAM and you've got a two
gigabyte vector and it's full and you wanna add one more
thing to it, you can't. So that's a problem for
certain applications. Array, like a C-array has
all the best possible things, but it's fixed size, right? The biggest thing about
array, is it's local. So how do you choose? Well, if the size is gonna change, you gotta use a vector, at the moment anyway. I use C-array for really
simple little situations. And I use std::array if I want container-like behavior. Another reason is that with std::array, you don't get pointer decay behavior such as you do with C-arrays. Deque, I love deques. Deques are great. They have clumps of contiguous memory, so the traversal's pretty fast. They have random access. Growth only invalidates iterators, not pointers and references
that you keep to your stuff and they have linear growth behavior, so they work very well for large data. However, they have
tremendous amount of overhead if the size of the thing is small. So they're useless for small things. So it's pretty obvious. If you need big data and you want those other
behaviors, use a deque. However, pushing into the
front of a vector is faster than pushing into the front of a deque, until the size of the
container is quite large. This is surprising. We're gonna see some numbers in a minute. And vector's much smaller. List, there's cases where
list is useful, but not many. It's node-based so the locality
of reference is terrible, there's all this overhead, slow
traversal, no random access. However the nodes never move,
growth invalidates nothing, growth anywhere, middle, ends, right? And they're perfectly linear, they grow one element at a time, so you can fill 'em up until
you hit the end of memory, but their size is large, so whether that's really
gonna be a better solution is not clear, depends on your situation. And if again, if it's small,
they're very wasteful. So let's look at some numbers. So what I did was I had a element that contained a container and the inner container was either 10, 100, or a 1,000 in size and I filled it up from the front, which is the worst case for vectors. And I wanna say that I then read, I then refilled, I can't remember. It says in the fine print. That's for when you read the slide later, but I made it the worst case for vectors. And then I did 10 million of size 10, one million of size 100,
et cetera, et cetera to keep the numbers in the same range. So look at this, vector
blows away deque and list, mind you filling it from the front, so every time you put something in, it's gotta slide the rest of it down. The size is much smaller
and the access time is considerably better than, is noticeably better than deque,
and much better than list. Go up to size 100, now deque
has pulled ahead a little bit, but look at the size difference. It's only pulled ahead a little bit and list is still trailing. We get up to 1,000, now
vector is starting to do a little bit worse than list and way worse than deque, but
the size is still very small. So this is, I mean 1,000,
think if somebody asked you, would you use a vector for a 1,000, for something that was 1,000 long and pushed to the front of it, you'd say, well no, of course not. But maybe you would. Maybe the difference in
performance isn't as important as the difference in size. This is why you always
have to figure it out for each situation. Set and map. This should look familiar, it's
the same slide as for list. Has all the same considerations,
because they're node based. So what's interesting is, arbitrary insertion
into a vector is faster until the size is quite large and maintaining order is faster until the size is quite large. You can use binary search on a vector, it's just as good as looking
something up in a set. In fact, it's probably faster. But what's really interesting
is for smallish containers, which are very common I find, linear search is faster, faster than binary search, which means you don't
have to keep it in order. If you have a list of 10 things, don't keep it in order, put it in a vector, and just do a linear search. It will be faster than
any of the other ideas you might have, like oh, we
gotta look things up in it, it's gotta be a set. Use vector. So the test I did here, I had an element which looks
like this for testing set and I just threw in an id for something just to make it a little more realistic. And then I had an element
which looks like this for testing vector, and I created this insert which does an ordered insert. And then the numbers I
ran, again, I put things in in the opposite order, in reverse order, so it was the worst case for the vector test, vector's worst case. And again, at size 10,
vector blows away set in time and of course in memory. What's weird is that at size 100, the difference was even greater. I would have thought it would
get progressively worse, but at size 100 vector was
an even better solution. And of course the size
is still, is actually, I guess about the same
ratio, maybe a little better. So you have to measure. This stuff is not intuitive. Get up to 1,000, now vector is slower, but look at the memory difference. It's not much slower. It's not even twice,
so you have to decide. So theorem number three. Vector is usually the right answer for an awfully large class of problems, except when it's not, and then, you know, there may be a case where
vector would be impossible, or much, much, much slower than using a map or a set. Or maybe performance doesn't, really doesn't matter in that case and you've measured to prove that and the map or the set
or the list or whatever is much more convenient. But again, you gotta check. And so we have some time for questions. (audience applauds) Just yeah, line up at the mics. - [Man] So in one of the
first few slides you said, prefer stack allocation
when objects are smallish, but you never defined smallish. - Well that, I suppose
depends to some extent on how big your stack is. I mean, I've put some pretty
big things on the stack. I'm sure you can look up
what your stack size is. It depends on your environment. - [Man] Is it a ratio of the
maximum size of your stack? - The question was, is there a ratio? You know, I don't actually know. I've never ended up pushing
the stack to the point where I had to go figure that out. So for my environment I don't know, but I'm sure you can find
that out for any environment. And you know the stack, if you make the stack frame big enough, you're gonna have caching issues, right, as you go back and forth to the function. So you have to sort of think about it. When I talk about small things, I usually mean small
as in bytes, kilobytes. Your stack is unquestionably
megabytes at least. And you know, maybe
someone else will come up and answer that, 'cause I don't really know the right answer. - [Howard] I've made a terrible mistake. My wife informs me that she types 125 words per minute, not 90, so. - Oh my gosh.
(Howard laughs) I gotta redo the whole talk.
- That's right. Nice talk, thank you.
- Thank you. - [Audience Member] Two quick points. One is beware in Microsoft environments, the default stack size is one megabyte. And I work in a native C++ library that's called from .net code and of course .net's
never heard of the stack and we learned the hard way that you gotta be really parsimonious with stack space there because we unfortunately
could not get the default stack size overridden in our environment, just due to political factors I suppose. Anyway, my question is, you mentioned that fixed-sized vectors are in the pipeline at some stage. I got the impression it
was kind of preliminary. Are we talking maybe in C++ 20 or 23 or for some n greater than 23? - I'm pretty sure not in 20. I hope that there's gonna be
some new containers in 23. I'd like to, there's
one I'd like to do too, because I think that there's
just not enough flexibility. Vectors, you know, if
you extrapolate this, you realize, oh, there's
some other things one can do that'll make it go even faster and even more useful and more flexible and we just can't do that with vector. - [Audience Member]
Well, doing away with the
is just so helpful for stability and pointers
and references and things. So, okay, thank you.
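The stability point in that exchange can be sketched with plain std::vector: reserve() doesn't do away with reallocation in general, but within a known bound it gives the same guarantee, that pointers and references to elements stay valid. This is a minimal sketch, not code from the talk; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch only: within a known upper bound, reserve() removes
// reallocation, so pointers and references to elements remain valid
// as long as size() never exceeds the reserved capacity.
// `make_pinned` and `first_element_stable` are illustrative names.
std::vector<int> make_pinned(std::size_t bound) {
    std::vector<int> v;
    v.reserve(bound);  // one allocation; no reallocation below `bound`
    return v;
}

// Fill up to the reserved bound and check that the address of the
// first element never changed (the standard guarantees it can't,
// because no reallocation may occur while size() <= capacity()).
bool first_element_stable(std::size_t bound) {
    std::vector<int> v = make_pinned(bound);
    v.push_back(0);
    const int* first = &v.front();
    for (std::size_t i = 1; i < bound; ++i)
        v.push_back(static_cast<int>(i));
    return first == &v.front() && v.size() == bound;
}
```

A true fixed-capacity vector, as discussed in the question, would enforce this bound at the type level instead of by convention.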
- Sure. - [Participant] Have you considered comparing with an unordered set, an unordered map in addition to just set. - Thank you, you know I should have, I got asked this last time I did this and I should have put
it in the slide, but I didn't measure unordered, because I wanted to make a point, not give a complete... The point is you have to go see. So I didn't measure unordered. They're going to have the same relationship to
vectors for small sizes. It's not node, well, (laughs) the standard unordered
containers are node-based, so you're gonna have the
same problems that you get. If you have a custom one
that's not node-based, you're gonna have different
performance characteristics. And again, you've really, you've got to measure for your situation, is really the point. But I didn't measure the unordered maps. Because they're node based, I
suspect it would be similar. - Thank you.
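The small-container advice from earlier, that a linear search over an unordered vector often beats any kind of set lookup, boils down to something like this minimal sketch. The helper name is an assumption for illustration, not the speaker's test code, and whether it actually wins is exactly the kind of thing you have to measure.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// For a smallish vector, a plain linear search over contiguous
// memory is often faster in practice than a tree (or hash) lookup,
// and the vector doesn't even need to be kept in order.
// `contains` is an illustrative helper, not code from the talk.
bool contains(const std::vector<int>& v, int value) {
    return std::find(v.begin(), v.end(), value) != v.end();
}
```

For larger containers the crossover point moves, which is why the talk keeps repeating: measure for your situation.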
- Sure. - [Male] Hi, I was looking
at your factory example for putting things into, what is it, a map?
- Yes. - [Male] One of the advantages of C++ 17 is, with something like try_emplace, if you pass the map in by reference, you can even make it a static
generic function or something, you can do try_emplace and that way you don't actually have to, say if you increment the id number, and then it turns out that there's already something
like that in the set, you're sort of committed and the idea is with try_emplace you can either go forward
and do something efficiently in the factory or you can
decide that it was inefficient and then you don't lose the
thing that got moved into you. You can pass it back out again, sort of like you were
talking about for the other case. - Right-- - [Male] Emplace in maps
is really a weird thing. That's why they have
these irregular methods, like, what is it, insert_or_assign is another one like that, that's tentative. - Right, yep. That's solving a slightly
different problem than I was addressing
here, but that's true. I haven't actually used try_emplace. - [Male] Well emplace has been sort of questionable all along, especially for move-only things, because unless you're in a vector, where there's no possibility
thing you just constructed with an emplace, and a lotta
times it ends up being worse, because you end up creating an item and then, because there's a
duplicate, throwing it away. - Well sure, but if
you didn't use emplace, that would still be true.
- I suppose. - It's not that emplace made it worse, it's that it didn't
make that issue better. - [Male] Well it does help with move only, because a lotta times they can give you the move only parameter right back, and you can do something else as you write.
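The try_emplace behavior being discussed can be sketched like this. It's a minimal illustration under assumed names, not the questioner's actual factory code: on a duplicate key, try_emplace does not use the mapped-value arguments, so a move-only value you passed with std::move is handed back to you intact.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// C++17 std::map::try_emplace: if the key is already present, the
// mapped-value arguments are NOT used, so a move-only value passed
// with std::move is not consumed and remains usable afterward.
// (Plain insert/emplace may construct a node and then discard it.)
// `Registry` and `add_if_absent` are made-up names for illustration.
using Registry = std::map<std::string, std::unique_ptr<int>>;

// Returns true if the value was inserted; on a duplicate key,
// `value` still owns its object when this returns false.
bool add_if_absent(Registry& reg, const std::string& key,
                   std::unique_ptr<int>& value) {
    return reg.try_emplace(key, std::move(value)).second;
}
```

This is the guarantee the questioner is pointing at: you can try the insertion and, if it was a duplicate, pass the moved-in thing back out and do something else with it.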
- Yeah. - Anyone else? 25 seconds. Anyone else? Thank you. (audience applauds)
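The vector tests described in the talk, filling a container from the front (vector's worst case) and keeping a vector in order with an insert at the lower bound, might look roughly like this. This is a sketch under assumptions, not the speaker's actual benchmark code; wrap calls like these in your own timer to reproduce the comparisons on your platform.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <list>
#include <vector>

// front_fill pushes every element into the front of the container,
// which is the worst case for vector (everything slides down) and
// the easy case for deque and list.
template <class Container>
Container front_fill(int n) {
    Container c;
    for (int i = 0; i < n; ++i)
        c.insert(c.begin(), i);  // front insertion for all three containers
    return c;
}

// Sorted-vector insert: binary-search for the position with
// lower_bound, then insert there, keeping the vector ordered.
template <class T>
void ordered_insert(std::vector<T>& v, const T& x) {
    v.insert(std::lower_bound(v.begin(), v.end(), x), x);
}
```

Timing front_fill over vector, deque, and list at sizes 10, 100, and 1,000, and ordered_insert against a std::set, is the shape of the experiment behind the numbers in the slides.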