All right, it's 1:30, it's time. Thank you everyone for coming
back from lunch through rain. I'm here today to talk about bugs. My name is Louis Brandy,
I work at Facebook. I have brought to you a selection of bugs. I want to give a tiny bit
of context here before, just to give you some
background on why it is I have opinions on recurring bug patterns. One of the questions I get a lot, I've been getting since
I've come to CppCon, is does Facebook actually
have a lot of C++? Like, don't you guys just use PHP? It turns out we have an
enormous amount of C++, and we've had enormous amounts from effectively the beginning; Facebook infra has been built on C++ effectively from the beginning. It's the second most used language behind our PHP/Hack codebase, and our PHP/Hack codebase runs on something called HHVM, which is a C++ process that sort of lives in our repository. There's other languages and infra at Facebook. Python, notably, is next, then Java, and then everyone's favorite language, et cetera. There's actually quite a
long tail of languages. Let's see, clicker. So our team is effectively the
core C++ group at Facebook. There are two groups in this group, two logical sections. One is very tool-focused,
one is very codebase focused. So the tool part of our
team works a lot on things like the compilers, the sanitizers, the build systems, and so forth like that. The code-based part of our team deals a lot with the core libraries, folly, we literally have a
directory called common, right. And so because of these two things, like the sanitizers are one half, the core libraries are another half, we're often the escalation point for both. Me personally, I'm involved often in our site event review meeting. So when something crashes the site, especially if it involves a C++ bug, I see it. So I have a very rich data source. Sometimes it's, this crashed the site; sometimes it's, hey, is this an Address Sanitizer false positive? It's like, no, no it's not, you have a bug in your code and it's subtle and it's here. And so this is my data
source for this talk. So obviously my agenda is bugs. I brought some bugs with me. I had some criteria for the
bugs I wanted to show you. I want them to be scary, so like a syntax error or whatever
is not interesting. Okay, so these are things that are scary. I want them to be common and recurring. So these are things that keep happening, over and over, despite our best efforts. They're also not the craziest things I've ever seen. Okay, that's a different talk. And they're concrete, so I
have bugs that fit on slides. Like I'm going to show
you C++ that is broken. So it's not going to be, like, "libraries with bad ownership models cause lots of bugs", which is true, they do, but that's abstract and
not super interesting. I'm going to talk about mitigation, and all the tools we have. I didn't intend for this to become a recruiting event for ASAN. But it turns out that ASAN
is actually highly impactful on almost every bug we're
going to talk about today. Almost every bug. So we'll talk about ASAN a fair bit, but also all the other tools. And, just as fair warning, especially since there's going to be a lot of talk about sanitizers here, gcc/clang Linux bias from me, especially in the tooling part. Those of you that use other tool chains, hopefully have equivalent technologies. If not, you should get some. Okay. All right, when you
give a talk about bugs, it's actually kind of terrifying, because there's two different groups of people you might be talking to. One is audience number one, I show you something you
have never seen before, okay? I save you from a terrible
day, with any luck, like if I'm interesting enough for you to be paying attention. Right, so I'm gonna show, I'm hoping I'll show at least everybody one thing that they've never seen before. If you've seen it, you forgot, and maybe we'll flesh it out a bit. For some of you, like if you're a student or something, maybe I'm
going to show a bunch of stuff you've never seen before, and you might actually, I might save you like a week, the worst week of your career. Audience number two, I'm going to show you a bug
you've seen a hundred times. Almost always you see other people making it; you're the one in the code review going, whoa, whoa, whoa, whoa, whoa, there's something you
don't understand here. And of course if you're in audience number two and you're an expert, you never make these mistakes yourself. Never, never, never. This never happens. But you and I are having a
slightly different conversation. So our conversation is why
do these keep happening? What can we do about it? Can we fix the language, can
we make the tools better, can we do education better, right? But what can we do about it to make some of this stuff go away? All right, so enough gibberish. Enough philosophy. It would be intellectually dishonest to not mention the single biggest cause of bugs at Facebook, okay. It is the square bracket
operator of a vector, okay? I put a bunch of stuff in this bucket, because frankly the C array index operator is the same thing, and frankly pointer dereferencing is roughly the same thing. This is probably the single biggest cause of problems in every codebase. Right, it's the most
important bug most of us have. But for today's purposes, I'm going to declare this uninteresting. Only for today's purposes, okay. And the reason I declare
it uninteresting is that it's not really a C++ self-inflicted wound. Computing has this problem in general. Every language is built, or not built, around it. And so I think this is actually a hard, inherent problem in what we do. Most of the rest of the things I'm going to talk about are
not really inherent; they were self-inflicted. So yeah, if you want to crash the site, don't check the bounds of your vector, just start indexing into it. You'll definitely crash the site eventually. Or better yet, iterate through two vectors and assume they're the same length. So, mitigation.
Let's practice mitigation. I've declared this uninteresting, but let's talk about it. Static analysis has been studying this problem for a really
really really long time. It's sort of the queen mother of hard static analysis problems. They've made a lot of progress, but the good static analysis
to find this is expensive. Interprocedural, et cetera, et cetera. A lot of languages, and
C++ is headed in this way, change the abstractions. You don't wanna do for loops, you wanna do range-based algorithms. If you do range-based algorithms, often times you can hoist any bounds checking outside the loop. And of course you can
do things at run time; you can literally bounds-check every single array access. This is what, like, Ada does, and has done for a really
long time for safety. But as far as I'm
concerned, at least in C++, your best chance of finding
these is Address Sanitizer. Address Sanitizer was built for this. If you really really really don't want out of bounds problems in your code, you need to have fuzzing
with Address Sanitizer. I'm not going to talk about it, there's many talks to go talk
about this particular problem. But if you take nothing away from today, what you need to take
away is -fsanitize=address. Clang and GCC, again, specific, but Address Sanitizer is probably the most important thing that's happened, at least in our tool set, in a long time. All right. So, bug number two, the
greatest newbie trap, in my opinion, I'm
nominating, in all of C++. I'm curious, by the way,
before I tell you what it is, if you have a personal nomination and you think you've got me
beat, come find me after. I want to hear what you think is the greatest newbie trap, in all of C++. In my opinion, the greatest
newbie trap in all of C++, so again, before I do the big reveal, I'm going to sort of
increase the chrono ... I'm going to put these
in chronological order, so this is sort of the order in which you will probably encounter
them in your career. So hopefully, many of you
have seen this already. We'll end on something a little harder. I believe it to be map's square bracket operator. Ooh, I heard whispers. Yes, I've called it out. So, I hope most of you
know where I'm headed, but if you don't, let's be clear. Map has this property that ... So in this case I've
set hey to 12 in my map. And then I try to print
it out and I typo it. Okay, I've typo-ed it, so I'm trying to print out a key that
doesn't exist in the map. Most of us know what happens here. It prints out zero, right? And you're used to that,
that seems normal to you. Show this to a programmer
who doesn't know C++ well and ask them what they think this does. The actual answer was no higher
than third on their list. This doesn't crash, this doesn't print out uninitialized memory, this
doesn't throw an exception. What this does is print out zero. And it actually inserts something into the map, so the size of the map at the end of this is two. Okay? If you don't think that's weird, you've been doing C++ for a really long time. Yes. We won't think twice about using this. This is an occurrence count, right, like a really dopey small one. Loop through the string, and in my map, increment a count for every character I see. Okay, you might write something like this.
The Java version of this is horrible, because you have to initialize your variables, right? And C++ just gets kind of,
the integers get initialized, they just get initialized somehow. Don't let a beginner C++ programmer stare at this for too long, they start asking really
complicated questions. Like, wait a minute, you don't
initialize any of your ints. Nobody sets anything to zero. How does that work? Does that work with normal ints? No, no no. No no, it only works because of value, oh I don't want to talk
about value initialization. Right, not with someone
who's falling into this trap. I want no part of like, well see it's like a default
construction of an int, even though ints don't
have default con ... Okay, never mind, never mind. Just, next slide, next slide. Okay. So, we actually protect our sort of beginner programmers from this because we drill into
them const correctness. Be const correct, it
will save you from bugs, this is one of those
bugs that saves you from. This code will not compile, okay? You have a settings map that
you're trying to read from, with a square bracket operator, but the settings map is const,
so this won't compile, okay? Because the settings map's square bracket operator might need to insert, therefore it won't compile, right? No, yeah, I ruined my own joke. How does a newbie fix this code? If you paid attention, you know. How does a newbie fix this code? - [Audience Member] Remove const. Yes, remove const, thank you. Yeah, you remove const. Guess what, it works. I know he told me to be const correct, but it works.
Yeah, so I was sitting with someone
why is this not const? I already kinda knew why. And they're like, well, it
doesn't work when I put in const. I'm like, yes, let's put it in, let's look at the error message. This is the error message: error: passing 'const std::map<...>' as 'this' argument discards qualifiers. So as a C++ expert, you
know what that means. You can interpret that. A newbie sees "error
passing const," right? The error literally says, there's an error passing this
const map, into this thing. So I should remove it, it works, right? Now of course only newbies
fall for this mistake, right? Of course. Twice, twice as far as I
know, major site incidents have occurred in exactly this pattern. Experienced, very experienced C++ teams had a widget, in this case it
was like a service handler, in its constructor, it
took a map of settings, which is already gross, already gross. And they assigned the data member settings to the settings that gets passed in, and then they print their settings. Server initialized with timeout, whatever. Well, what happens in a refactor if you pass in a map that
doesn't contain timeout? Well, we've just helpfully, in our print statement, set the timeout to zero. In most socket libraries, a timeout of zero means infinite, yes? So it's a really excellent way of making, like, one service is dying,
and now this service, all of its threads are
in infinite timeouts. But of course, you know, everyone tests all their timeout code
paths really extensively, so this doesn't happen, yeah I know. So here's the bad news
about this newbie trap. It's actually the one we're
the most powerless against. I have nothing to help
you, I have nothing. Like people have legitimately
suggested we just ban it. Just ban the square
bracket operator of map. Const correctness helps to
some degree, but not always. So I got nothing, if
you have a clever idea, let me know, come find me, okay? I'm interested. (audience member shouts indistinctly) What? We'll call at? Yeah, but, that's good if you know that, but they don't. People complain about at for different reasons; that's a different talk, okay. So, good, at is one solution to this problem, he mentioned at, because at will throw if the key's not there. Most languages have an equivalent, or rather an alternate, mechanism
which is extremely common, which is this sort of get with default. So you call it on, you can call
the const map with the key, and if the key's not
there, instead of throwing, return the default argument. C++ doesn't have one of these, turns out it's actually kind of tricky. And this is the official
actual implementation in GHC Haskell, of this exact idea. This is a real programming talk now. We have Haskell in this talk. We've put away the toys,
this is Haskell now, right. Welcome. This is the actual
implementation in Haskell, it says, findWithDefault, default k m, they're in reverse order in Haskell, okay. Says look up the key in the map. If you find it, or rather
if you find nothing, return the default, if you
find an x, return the x. That's what it looks like in the prelude, or rather in the GHC standard library. This is what the folly version of this exact thing looks like. So we have respectively
this same function in folly. It's not super fair to compare this, because most of that noise is type information and Haskell has type inference. But this is a template that takes a map and a key, and the code down at the bottom is really what you'd expect: find the key; if it's not the end iterator, return it; otherwise return the default. That's what the last two lines look like. It's very straightforward.
There is something in this slide that C++ programmers hate. They hate it, they hate it, they hate it. (audience member shouts indistinctly) Oh, I love this man, okay. So I had a whole separate
slide to build it there, which is, I moved all the template noise to make it more obvious, I said let's get rid of all
the template nonsense, and I said let's just do
the string version of this. So this is the exact same slide except with string substituted. So I made it much more obvious now. It returns a string by value, that means there's a copy
here, an unnecessary copy. And when I was proposing this talk, I was saying, hey, give me ideas for bugs that have been recurring,
so many people mentioned, oh, having extraneous copies
is a really common bug. Oh, it is, isn't it? Well how do you fix it if
you hate extraneous copies? Easy, right? We know how to fix it, we return
a reference to the string. This works great if the key is in the map, I don't have to copy anything
out of the map, right? I don't have to anymore, I'll just return a reference to what was in the map, and there's no more extra copies. This is hopelessly broken. This code is hopelessly broken. Do not copy and paste this code. It turns out, right, oh yeah, this code is hopelessly broken. Congratulations, we
just crashed this site. It turns out that people
love default temporaries. They love default
temporaries, this is in fact the way this function
is almost always used. On the right, you're creating a default string, and if you return that by reference because the key's not in the map, v is now a reference to a temporary that has already been destroyed. Yes?
This patch that I just showed you has been submitted like four times, and someone rejected it three of those times. One of those times it got in, and we're like, oh no no no. I need to do negative
caching of diffs and say, if anyone submits this
to just auto-reject it. Okay, people want to make
this change so badly. But this is a broader class of problems, this is a much bigger
problem that's well known. There's many creative ways to do this, to pass temporary to an object and then somehow return a reference
to it, to the caller. This is a really hard problem to fix, I think, in the language. There's some library
tricks you can do often to make it hard, harder to do this. But at some point the
language has to fix it, and I don't know if it can or will. But it can be mitigated, so again, Address Sanitizer catches
this, catches it cold. The only problem is, you
often need this extra flag. Default Address Sanitizer, at least up to some point, didn't have this on by default. I believe it's now turned on by default in trunk; I don't know which version that's actually going to ship in. But when I turned it on,
I had to add this flag. And when I turned this flag on for ASAN, we found dozens of bugs
of this form, dozens. It was bad. All right, bug number four. So this is intermission. How are we doing on time? So the intermission is interesting, because I wanted to
tell a story that works, so this is not a recurring bug. This is a bug that is not recurring. This is a bug we killed. It was horrible, it was a
constant source of fighting, and we actually fixed it. It involves the worst
keyword in all of C++. Who knows it? - [Audience Member] Auto. Auto, oh no, what else? - [Audience Member] Goto. Goto, oh, that's a good one, but no, not goto. What else? - [Audience Member] Volatile. Volatile. Volatile, oh. Who knows what volatile does? That's an honesty check. Oh, look at all these liars ... Okay, I'm just kidding. Yeah, volatile's a really scary one. Lots of people think
they know what it does. At least historically, and
use it really really wrong, really really wrong. We introduced what is actually
in practice a terrible idea. We throw a paragraph at you,
if you write the word volatile. Like our linter literally spews a paragraph of gibberish at you. It's kind of mean, but that's fine. It's not actionable, which makes it a terrible lint rule, but I don't care, if you use volatile I'm
going to lecture you. Almost every use of
volatile in our codebase had to do with threading and
trying to synchronize threads. Now there are legitimate
use cases of volatile that don't involve threads. They just don't exist in our
codebase for the most part. So you might live in a codebase where you actually have memory-mapped IO and you need to do volatile. But we don't. So I had a really bad lint rule
by most lint rule standards, which said, if you write the word volatile and it's not in a comment I'm going to lecture you
on your diff, automatically. This really angered a lot
of people, it was fun. We'd get a post that was like, actually, I'm using it correctly, I've one thread, and there's a boolean,
and he's writing to it, and then the other thread's trying to read the boolean, and I don't want the compiler to optimize the read away, so I make the bool volatile; we've been doing this for 20 years. You know, this kind of stuff.
We had so many of these arguments, but the thing that's interesting is that we won. We won the arguments. Like people have been at
Facebook for a few years, they don't remember volatile arguments. The Great Volatile Wars of
2010, 2011, they're gone. People stopped using volatile for threads, and I think if you pause for a second, it's kind of interesting, like, why? How did that actually happen? It turns out people wanted atomic flags or mutexes or yet higher-order abstractions. And once we started giving those to them, somehow volatile just vanished, and we sort of trained the
old guard who had learned. But the new people who wander into the language must be just finding what they want, the right thing. The use of volatile as a synchronization mechanism seems to have mostly vanished. And I'm not exactly sure why, but I think it's an
interesting case study, and sort of, people really want to do x, and they'll find a way to do x. And maybe if you just
give them a solution, they'll use it, turns out. All right, bug number five. This is the worst question in all of C++. This is the one that
makes my blood run cold. The way this usually works is someone posts in our internal group. They tell a story, the
story is always the same. It goes like this. We've been having an
intermittent crash for a month. It's really hard to reproduce, I can't reproduce it. What we've ended up having to
do is bisect it in production. We build a binary, we
run it for six hours, if it crashes then that's bad
and if it doesn't, that's good. We've been bisecting for like three weeks. We finally found the diff, and we don't know why this diff is broken. We've been staring at it and
ripping it apart and we don't, there's something here we
clearly don't understand. But we finally came up with a hypothesis. And then they ask you the question. It's only three words. Is ... Shared_ptr ... Thread-safe? (laughs) Is shared_ptr thread-safe? What's the answer? - [Audience Member] No. No, I like that. Any other answers? (audience member shouting indistinctly) I'm sorry, I couldn't
hear, it's not this ... You hear answers like this,
like yes, no, it depends. People love this answer, which
is totally useless but true. "It's exactly as thread-safe
as a normal ptr." Technically true, but useless. If you go to Google and you type in is shared_ptr, the top result. Is shared_ptr thread-safe? By the way, the third
result, is shared_ptr slow. No, no shared_ptr is not slow. If it's in your profile, fix your code. Is shared_ptr null is fourth. Yes. Yes, shared_ptr is null. Put a check in. Okay. But we're here about
thread safety, all right. Is shared_ptr thread-safe? So I've had a lot of time
to come up with an answer. I've had a lot of practice. I believe the answer to, is shared_ptr thread-safe, to be no. That is technically, I
believe, to be a lie. But I'm answering a slightly different question than the one you're asking. You asked, is shared_ptr thread-safe, but what you meant to ask was, is my code correct? And the answer, if you have to ask, is no. No, no. So here's my teaching moment, for those of you who might have no idea what we're talking about. You might be in audience one right now, going, what are these people talking about, what do you mean, shared_ptrs and thread-safe? Let me explain it to you. A shared_ptr, this is a mental model, this is not an implementation,
it's a mental model. It's three things. There's the actual shared_ptr struct, which is a pair of pointers, that's what it is under the hood. Doesn't have to be, but in our mental model that's what it is. It points to two things. One pointer points to the shared T, one points toward the
shared reference count. This is where the C++ people
get really uncomfortable. They're like, oh, it's
not a reference count. There's a lot of other stuff in there. It's technically called a control block, and it contains, you know, the weak count, the deleter, dah dah dah dah dah. But for mental model purposes, it's a reference count, okay? And the answer, or rather,
yeah, when you copy it, all you do is you copy the
struct with the two pointers, and they point to the
same two shared things, the shared reference
count and the shared T, and then we increment the reference count, all right? That's how shared_ptr works under the hood; that's the mental model.
The thread-safety kind of approximately looks like this. Anything that touches T is unsynchronized. The shared_ptr will not
help you synchronize the T. That's on you. If you want synchronization
at your T, do it yourself. The shared_ptr will, however,
manage the control block, and in particular the reference count. So if multiple threads
destroy shared_ptrs, they will both atomically
decrement a reference count. That will happen. But the one that trips people up tends to be this thing on the left. For some reason, people
want to read and write to a shared_ptr unsynchronized. It doesn't work. For the exact same reason a pair of pointers wouldn't
work from two threads. This is a thing that for some
of us seems super obvious, so I was testing this
with people at Facebook. Like, do you know this? Is this obvious, does
everyone know this already? And one person literally
went, that's the crash. That's the crash. We have a map, and the
shared_ptrs, that's the crash. And I was like, okay, I guess
that's going in the talk. I've changed the definition
of the colors now. I want to talk about our intuition now. So now I'm moving towards audience two. You hire lots of programmers and you're training them over time and you should know this
about their intuition. They get the right hand side correct. They naturally assume
that T is unsynchronized when playing with shared_ptrs,
they get it, just intuitive. They naturally assume
that the control block, the reference counting,
happens for them automatically. The thing they seem to get
wrong is the one on the left. People really want to read and write to shared_ptrs unsynchronized. The pattern that comes up over
and over and over at Facebook and I think that this is an artifact of the fact that we
write a lot of services, is this, it's a buggy form of RCU, and it's actually a degenerate
but very common form of RCU. RCU, by the way, is Read-Copy-Update; it has a Wikipedia page if you want to go read about it. I'm not going to talk
about it too much here. But in our most common use
case it's actually very simple. You have a service, it handles requests, you have one thread who's
responsible as the updater. So one thread wakes up every
20 minutes or whatever, goes and reads some database, populates some in-memory efficient version of whatever's in the database, and then stores that into a shared_ptr as the one true most recent
state in the universe. The worker threads go into that one true shared state at
the start of their request; they pull one out, grab a reference count to it, and say, okay, I'm going to use this state of the universe for this request. And then on the next request, I'll grab a new state, so you won't update me whilst I'm in a request, but I'll pick up the
update after I'm done. And this is really, this has a lot of lovely properties
in a service, right, because updates are really cheap. You're not really blocking
anything to do your update. It has the downside, of course, that you might have
multiple states in flight, like whilst you're updating one state, someone might be using the old one, and you've gotta be able to tolerate several states in memory at once, to have this work effectively. But, and the implementation
here is actually really easy: all you've got to do, at the heart of that handoff, is have a shared_ptr and a mutex, with an atomic get and an atomic set, and you're done. It's literally the easiest thing to write.
But what's interesting is our intuition fails us. People keep writing the wrong thing, and then they have a really bad month. Finding that bug takes a month, if they don't see it or know it. So, mitigation. Thread Sanitizer finds this,
if you run Thread Sanitizer. Address Sanitizer often finds it as well, but it's not super helpful, because what ends up happening is you end up corrupting the shared_ptrs, and you destroy the wrong thing, and it looks like memory corruption, and it's not always clear. Address Sanitizer will
tell you something's wrong, but it won't necessarily
aim you right at it. Now we get to the
library part of the talk. There have been atomic shared_ptr operations since C++11. On paper, this solves the problem. Of course, most implementations haven't had one. And then there was the Concurrency TS that added atomic_shared_ptr, spelled like this. This here, atomic_shared_ptr, spelled like this, and then like this. And then we went back to atomic_shared_ptr spelled like this, like I heard something about a vote in Toronto. I asked someone in SG1. So, I tend to hang out in the Evolution Working Group of the committee; I don't really hang out in SG1, which is where they talk about this. I was like, can you give
me the history of this, because I'm going to give a talk, and I'd like to at least tell people what the deal is about atomic_shared_ptr. They're like, eyes glazed over, and apparently this is
a big long interesting, I don't know about interesting. This is apparently a big
long topic of back and forth of what's going to happen
with atomic_shared_ptr. I don't actually know its current state, I suspect we'll get something
like it at some point. It does solve my problem, but to be fair I don't need a highly performant atomic_shared_ptr
to solve my problem, I can get away with a mutex and a lock. And in fact, there is
also another proposal if this is interesting to you, about a cell type of an abstraction, which is effectively exactly what I want. That's also before the committee. Oh, bonus bug about shared_ptrs. We have this one all the time too. I don't know why people write this. I can't explain it. Some function returns a shared_ptr, they just dereference it, immediately, and they grab a reference to
whatever's at the other end. And now you call ref.boom(). So the problem here of
course is everything that happens after that,
the shared_ptr's gone. It's been destroyed,
because you dereferenced it. And you basically, you don't have the protection of a reference
count anymore, right? So all your code is now running without a reference count protecting you. So this one's come up a number of times, but again Address Sanitizer
All right. Let's see. All right, last one,
this one's my favorite. Bug number six. This is code review, this
is going to be interactive. At least a little bit. I'm going to show you a series of slides with very little code on it. Very little code. And I have two questions for you. Question one, does it compile? Everyone's favorite C++ game. Will this code compile? Question number two,
if it doesn't compile, what error message would
you expect or even like, if you're being ambitious? So, does it compile, what kind of error message do you expect? Here is the code. Include a string. Void f. Standard string, parentheses foo, close parentheses, semicolon. Raise your hand if you
think this code compiles. So, for the purposes of the recording: maybe 10%. Maybe 10%. But I point that out because that's Richard Smith, right there. You're, what, the editor
of the C++ standard, right? Yes. The editor of the C++ standard
says this code compiles. Who's changed their mind? (audience laughing) Richard, unsurprisingly, and
surprisingly, is correct. This code compiles. Almost everybody in here got it wrong. This code compiles, what? What? Okay, take a minute, pause
the video, if you're on video, think really hard about how this code can possibly compile. How this code can possibly compile. And then think a little bit
more, like more existentially, like what are we doing, why are we here? I feel like I'm wasting my life a little. Okay, this code absolutely compiles, and the answer, and I'm not
an expert in the standard, but my understanding is,
the way the standard works, at least in this part of the standard, if something looks like a declaration, it is a declaration, no
matter what else it might be. And that looks like a declaration. So, this, and this, are
basically the same line of code. Those parentheses are optional. So what you're seeing, on the top, is a declaration of a string called foo. Okay? Now, that seems like trivia. That seems weird, like, okay, how. Oh yeah, here's me proving it, by the way. So you put it into Godbolt, and you can't really see this, but what you'll see in
Godbolt with no optimizations is the default construction
of an empty string, and then the destruction of the string. So you'll see the creation and destruction of an empty string, foo. So, okay, but the question
becomes like, so what? How is this possibly one of the most common and horrifying bugs in all of C++? And I posit to you that it
is, just not in this form. It's in a slightly different form. This is often what it looks like. You have an object that
has an update method, and this object is meant to
be used from multiple threads. So it needs to be internally synchronized. And so you of course have a
data member that's a mutex, and on any mutating method you have to lock your mutex, right? That's the way it works. So you lock your mutex, and
then you do the mutation. Right? Everyone sees the bug, right? Do you see it yet, who sees it? Only half of you, okay. Yeah, so I didn't name
my lock here, right? That unique_lock is intending to be like a lock guard, right? I'm intending to, on construction, lock my mutex, and on
destruction unlock my mutex. That's what I'm hoping this code does. That is not what this code does. No, what this code does is it declares the existence of a
unique lock, named my mutex, that shadows the mutex data member that you were trying to lock. And to be clear, nothing gets locked. There is no lock that occurs here. It looks like it's locking,
what you want it to lock. But it's definitely not locking what you want to lock. And the fix is super easy. You've seen it a million times. You have to name the lock guard. You have to give it a name. Now, this is a unique_lock named g, whose construction locks the mutex and whose destruction unlocks the mutex. So your code is now protected
by a lock guard, right? Do a code search for this. This is the pattern, do it. unique_lock<mutex>, if you have code that's using relatively recent C++, do a code search for unique_lock<mutex>. GitHub search won't let me do it, so I couldn't do it on GitHub, but I figured since I'm giving this talk and I'm telling all of
you to search your code, I should search mine. Do you think I found a bug? Yes. This is what the actual code looks like, std::unique_lock<std::mutex> (mtx). That is not doing what
it thinks it's doing. That is not a lock guard. That is a shadowing unique lock that is not locking the
mutex it tried to lock. I didn't just find one bug, I found two bugs in our codebase. Here's the sad thing about this. I only found two bugs, which is okay, but we actually have a lint
rule to protect against this. So this is two bugs that got into our code despite the linter telling
them not to do this. Imagine how many we would have if we didn't have a linter
telling people not to do this. All right, now we're going to do the worst C++ quiz of all, all right? I've taught you everything
you need to know, there's no more tricks, I've taught you everything
you need to know to understand what happens next. So does this compile? So remember, this compiles, this compiles. Shockingly, perhaps, this compiles. Does this compile? Raise your hand, does this
compile, if it does compile, yes. Only half of you. So this compiles as well. This compiles as well. The first one, as we mentioned, is the declaration of a string named foo. The second one is a constructor. It constructs a temporary
string from a char star. All right, let's make it worse. Does this compile? I wrote it twice, it's copy and paste. Raise your hand if you
think this compiles. No, no this doesn't compile. (laughing) No, this is a redefinition error. This is a redefinition error, because if it looks like a declaration, it is a declaration, so this is the same as
writing std::string foo twice, and you can't declare two variables with the same name in the same scope. So this is a redefinition error. All right. Does this compile? Look closely, what? We have a question? (audience member speaks indistinctly) It could be interpreted that way. It could be interpreted the second one is a copy of the first. But the rule is if it's ambiguous, and it could be a declaration, then it is a declaration. So this is a redefinition error. To be clear, I'm slightly
out of my league here, because now you get into
super-parsing of the standard, but as far as I tested
nobody will compile this, and I believe that's why. Because the rule says if it's
ambiguous with a declaration, then it is a declaration, period. And that's why you get
a redefinition error. Right, does this compile? All right, yeah, most of you've got it. Yes, this does compile. The first one is a declaration
of a string named foo, and the second one is a temporary copy construction of the string. It's calling the string's copy constructor with the first foo, which is fine. All right. And then finally, just
for symmetry purposes, you grab the squiggly braces
and you pull them out. Does this compile? Yeah, all right, most of you, yes. Most of you say yes, this compiles. So the reason this compiles, this is again a second declaration of foo, but it's now in an inner scope. So the squigglies are a scope, and it's now shadowing the outer foo. You have two declarations of variables with the same name but one of them's in an inner scope
shadowing the outer one. Right, that's what's happening here. So this works. So, some super important notes. There's actually a subtlety here that's critical to understand. There are two things that
make unique_lock conspire to have this really pathological bug-prone behavior, as
far as I'm concerned. The first is that it's an RAII type. If you have the exact same situation with a non-RAII type, you tend
to notice very quickly, because you didn't name it or anything, so you'll try to use it or something and you'll be like whoa,
okay, I forgot the name. But RAII types, their mere declaration is often all the functionality you ... That is the functionality. I declare it, right, and
that means I've got one. So you're much less likely to name it, because you don't actually need a name, you never need to reference it. The second part that's
actually really important is this only works because std::unique_lock has a default constructor. Because I'm default constructing it in this case; I have to use the default constructor. So unique_lock is affected,
but lock_guard is not. So std::lock_guard will not compile, in this case, if you were to swap it in. Now, if you're using
standard types that's great, because standard unique_lock has this weird property you can
lint on it fairly easily. If it's like our codebase, you have lots of reimplementations of lock_guards from over the years, right? Like, dozens of them. And very few people fully
appreciated this problem. So we found, I don't know, tens, maybe 100 cases
of our other lock_guards that did have default constructors being bugged in this fashion. So if you have a lock_guard, run, don't walk, to see if it has a default constructor. By the way, unique_lock does. C++17 has a thing called scoped_lock, and it's entirely unclear to me if it suffers from this problem or not. This is complicated. As of yesterday, when I put it into Godbolt, Clang would not compile it and GCC would. I'm not sure. So scoped_lock may or may not have this problem in C++17. So anyway, the TL;DR on all this is if you have an RAII type and it has a default constructor, you're in a very dangerous
situation, potentially, where people will try to wrap something and forget to name the object instead of what they're wrapping, and they'll end up declaring a shadowing thing. So that's my other takeaway for you. All right, so mitigation. So, -Wshadow, at least in GCC and Clang, is a warning that's been around a while. It absolutely detects this. -Wshadow tells you about shadowing warnings. That's when you have a variable in an inner scope that shadows a variable in an outer scope; it will warn you. If you put this particular pattern in and you turn on -Wshadow, it
will absolutely find this bug. The problem, of course, if
you've ever used -Wshadow, is it's extremely pedantic. It almost always gets turned off. It's full of noise and things that you probably don't want to fix, and fixing them makes the code maybe worse. There have been these hybrid flags, so there's -Wshadow-compatible and -Wshadow-compatible-local. These are meant to be much higher-signal versions of -Wshadow: -Wshadow-compatible only tells you about shadowing warnings if the two types are compatible with one another, and -Wshadow-compatible-local only if they're local in the same function and they're compatible with one another. I strongly encourage you to turn on at least -Wshadow-compatible-local. You will find all kinds of, if
not broken code, smelly code. Like there's an outer loop using i and a bunch of inner loops using i, and it's like let's just not do that. Even if it's not bugged, you don't want it to be, right? So -Wshadow-compatible-local is good, you should turn it on, but it does not catch this. Only the full -Wshadow catches this bug. So, I'm sorry. The story there isn't great. But, you know, this means this
is easily detectable, right? Our extendable Clang based
linter has a check for this. It's actually fairly easy to detect. But it's actually not widely detected. If you don't have a tool
that finds this pattern, which is to say, unnamed lock guards that sort of immediately wink out of existence, then I suspect you have a lot of these, especially if you have
default constructors on your lock_guards, or you use unique_lock in your code. Again, I'm not exactly
sure what the best ... I have some extra stuff here that I didn't want to get into in too much detail. There's a related class of problems which isn't this declaration issue, but it's more like you create something that exists for one line of code only when you expected it to live forever. That's a place where a nodiscard attribute for constructors might
be helpful eventually. So it's like two flavors of this, but the more common one
is the declaration one. So again, we have a Clang based linter, so something like clang-tidy, for example, could very easily find this. All right, so conclusions. C++ is hard. That's not fair, it's
kind of a value judgment, but we should just admit it, I think. C++ is hard. Like a lot of this stuff here, we've had some of the best C++ people spend a week finding a bug, and at the end track it down to, like, their lock_guard is missing a g. And like, that stinks. And the consequences of that are really, it's a painful thing to be bugged. It's an incredibly painful
day to find that bug. And it's a one character change. The patch is one character. Tools are our best teaching weapons. So, lectures are great. I would love to put
every Facebook programmer through a lecture that just taught them about all the bugs that
they should never write. Like, does it actually work all that well? The sooner after you write the
bug I tell you, the better. Everyone would pretty
much agree with that. If your IDE tells you you just wrote a bug, that's fantastic. The compiler tells you you just wrote a bug, that's pretty fantastic. If your dev review tool runs some expensive asynchronous thing and comments on your change set and says, oh that's definitely a bug, that's pretty good, that's pretty good. If you get an e-mail two days after you committed it because some monstrosity, like some supercomputer did some interprocedural static
analysis and e-mails you a bug, that's still not terrible. Not great, but not terrible. If I have a talk like this and two years after you spent a week debugging
it, I explain it to you, that's borderline useless. So really, for a lot of this, for my money, I want better tools. I'll talk in a minute about this, but the sort of Clang extendable linters have been really powerful internally in terms of, for a particular data type, having specific rules we can check, checking certain kinds of semantics. I don't want lock_guards that
shadow mutexes, for example. That's a really good rule. Clicker. Yeah I just said this. So yeah, extendable linters. So Clang-tidy is an example of this. We have one internally called how to even. Because you can't even,
they're clever, they're clever. And Address Sanitizer. Yeah, like I didn't really
intend for Address Sanitizer to be the solution to all
of my problems in this talk, it just worked out that way. I've already ... One cannot speak highly
enough of Address Sanitizer, and the amount of bugs
it finds in a codebase. If you live in a codebase that
compiles with Clang or GCC, and you've never turned
Address Sanitizer on, leave. Go do it right now. Go to one of the desks. I hope all of them are turning on Address Sanitizer right now. It's really, it's life altering. You'll find a dozen bugs in your codebase, running your test suite,
with Address Sanitizer turned on for the first time. You know, don't tell anyone
until you've asked for a raise, but then teach your team
about Address Sanitizer. You're welcome. Okay, questions? I'm done. (audience applauding) If there's no questions, I'm going home. All right. No questions. Oh, no no no. - [Audience Member] What
bugs didn't make the cut? What bugs didn't make the cut. Everyone kept trying to
say unnecessary copies, and I was like, I have a
huge bug that I keep having because you all hate
unnecessary copies so bad. So that one didn't make the cut. A lot of people want to do small things, like pass by reference
instead of pass by value. Just going to pick all the ones Nicholas just submitted, a few minutes ago. The other one that's really common, so I actually think one of
the most important bugs in C++ that we haven't talked about
that's actually really hard is asynchronous programming and lifetimes. So we have Folly Futures, and we have fibers, and we
have a lot of lifetime problems related to asynchronous programming. And other languages that have solved this, like have asynchronous programming frameworks have garbage collectors, which help them with
all the lifetime issues. So we have a lot of
problems related to that. That was a very popular nomination, but we didn't ... It doesn't fit on the slide. But I do think that one's
actually really critical. Asynchronous programming and
coroutines will help with this, and lifetimes become a
really tricky balance. So that's a short answer. Okay, who's next? - [Audience Member] Early on,
you had a slide about a diff that people kept wanting to
submit on your get_default. Yeah. - [Audience Member] Why
wouldn't you put a comment there saying, you might expect this
to be this, but don't do it. Yeah, I mean, so I have a
skipped slide, I won't go to it. No, what we ended up doing was making an overload of that function that takes the rvalue default, and marking it = delete. That solves most of the problem, and then put a big comment there. That's what we ended up doing. The actual more complicated lie here is that we've, or rather
the more complicated truth from the simple lie that I told. We have more than one of these. So the folly one we fixed, but there was a different one in our STL utils that was some ancient code, and that one had a bug in it that got committed without us noticing. So, that's what happened. - [Audience Member]
Will returning an optional to a value save my map index operator? Will returning an optional from the map index operator? - [Audience Member] Returning an optional to a reference to a value. Will it save this index operator? Yes, do you ... Yeah, so square bracket operator on map returning an
optional, on a const map. So I don't hang out in library, my suspicion is that that
particular thing is ... Everyone would like something
like that, eventually. It's complicated. For reasons that, oh,
optional's another good one. Just corner a library person and ask them about optional and all the
ways that it gets hard. But yes, I agree, that is, that's what Haskell does,
right, it returns a maybe type. So that's what you want, you want an optional as your
square bracket operator. So I agree, that's the right direction to head in eventually. - [Audience Member] Thank you. - [Audience Member] But standard optional doesn't support references. Someone who would know
better than I, yeah. - [Audience Member] So
my actually question is, will C++17 rules regarding
mandatory copy elision simplify or complicate the
concerns about too many copies? I don't know, what do you think? - [Audience Member]
That's why I asked you. (audience laughing) I think we're rightly
worried about copies, so mandatory copy elision is good. I just, you know, not all APIs, especially ones that take references are, can return references, right? That's part of ... I don't know. I don't think so. I think copy elision is good. On average, it'll be good. I think the broader problem of, it's really easy to smuggle a reference out of a function, is its own problem. I don't know how or if we can fix it. Do you think we can fix it? - [Audience Member] I'm
wondering if people will be less scared about returning copies so we just don't run into the problem, or if we start talking too
much about whether this copy happens, getting into a more complicated special language? Yeah, well I mean, so the
example of get_default, you have no choice, because
you have more than one source. You can't elide the copy. If it's in the map, you
have to copy it out. So in that case, it won't help, but on average you're probably right. It'll make it much less likely for people to be paranoid about copies. Dmitri. - [Dmitri] These linters
look really useful. Do you have plans to open source them? What looks ... - [Dmitri] Linters, like the Clang-based ones. So, we built ... This is a long story too,
we built how to even, which is effectively clang-tidy, right before we had a clang-tidy that we could use. So we sort of built our
own version of this thing, and there's been great long
term discussion of, like, if we open source it, are we
just bifurcating the space. Should we convert all these checks to clang-tidy checks and go that way? So this is an ongoing discussion. Yeah, I want to open source stuff, like we should open source all
these tools, and we often do. We got to figure that out,
that's the real issue. It really is just a simpler version of clang-tidy, that's what it is. So I'm not sure how valuable it would be, in its current state. - [Audience Member] This is more coming for the audience actually, so when Facebook copies
100 bytes too many, it costs them one million dollars. Me and you, it costs us zero. You have so much bigger problems than just copying a string one too many times. I just had to say it out loud, sorry. It's true. But the problem is if
you're a library author, you have users who care. Occasionally. Often times, even the ones
who do care, care too much. It's like, no, I don't see
it in your profile, go away. But fair point. Tony? - [Tony] Have you brought up
the scoped_lock constructor? So I've discussed this with people in SG1, and they said to do a DR, and
I said I'm not actually very-- - [Tony] Library evolution,
library, talk to those guys. Oh okay, yeah. Yes, I've had several
conversations with these people, but I'm not actually sure if it's broken. Or, I don't know if
broken is the right word, but I'm not sure if it
suffers from this behavior. I have to spend some more time. I originally thought it did, and then I was going
to go ahead and do it, but then on second glance it interacts with some other weird stuff. Because it's a variadic template, so it actually, and then there's argument deduction now, so it's complicated. So I'm not actually sure if it suffers from this behavior or not. So if someone wants to-- - [Audience Member]
Well, there's people here I think could probably
answer that for you. All right, I'll ask them. Who should I ask? - [Audience Member] Well you got, you got a few extra tools ... (audience member talks indistinctly) I know, poor Richard, he didn't know what talk he was coming to. Okay, any other questions? We all set? All right, thank you, everyone. (audience applauding)
Hey, speaker here. I have a little update.
After I gave this talk, both gcc and clang devs in attendance implemented warnings for bug #6. I've not kept super close tabs on the patches to know when/if they've been committed or on track to be released.
But in theory they will both have warnings for code of the form:
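The snippet itself didn't survive the transcript; based on the talk, it's presumably the unnamed-guard shape from bug #6, sketched here with an owns_lock probe to show that nothing actually gets locked:

```cpp
#include <mutex>

std::mutex mtx;

// Bug #6's shape: parsed as a declaration of a default-constructed
// std::unique_lock named mtx that shadows the global, so nothing is locked.
bool unnamed_guard_locks_anything() {
    std::unique_lock<std::mutex>(mtx);
    return mtx.owns_lock();  // 'mtx' here names the shadowing local lock
}
```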
I learned something very useful from this talk:
When a presenter "quizzes" the audience with C++ questions, sit where you can see Richard.
Wow, I didn't know about bug #6; it's crazy that this code compiles:
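The snippet from the talk, reconstructed since it didn't come through in the transcript; the follow-up lines show why it compiles — foo really is a declared local afterwards:

```cpp
#include <cstddef>
#include <string>

// The line from the talk: parsed as a declaration, so foo is usable after it.
std::size_t vexing() {
    std::string(foo);  // same as: std::string foo;  (the parens are optional)
    foo = "hello";     // proof that foo is a named local variable
    return foo.size();
}
```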
It's not a very often recurring bug, but I encountered a bug once where I found a language feature I didn't even know existed before (similar to std::string(foo); ).
We had a line of code similar to this one:
std::vector<int> ints((20,255));
What would you expect to be in the ints vector? 20 integers with value 255? The answer is that the comma in (20,255) is the comma operator, which evaluates the left argument first and then returns the value of the right argument. (20,255) evaluates to 255, and the vector ends up with 255 value-initialized elements (instead of 20 elements with value 255).
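A quick check of the commenter's point (the variable names here are mine):

```cpp
#include <vector>

// Two near-identical spellings with very different meanings:
std::vector<int> twenty(20, 255);     // 20 elements, each equal to 255
std::vector<int> surprise((20, 255)); // (20, 255) is the comma operator: it
                                      // yields 255, so this is vector(255):
                                      // 255 value-initialized (zero) elements
```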
I don't understand the discussion about fixing map::operator[]. Doesn't map::find() already solve the problem of looking up without adding entries? Rather than change the semantics of map::operator[] as was mentioned in the Q&A, it seems we could just educate our developers on the current semantics of the container.
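A sketch of the commenter's suggestion; get_or is an illustrative helper, not a standard function:

```cpp
#include <map>
#include <string>

// Look up without inserting: find() works on a const map, and a miss
// falls back to the caller's default instead of default-inserting an entry.
int get_or(const std::map<std::string, int>& m,
           const std::string& key, int dflt) {
    auto it = m.find(key);  // never modifies the map
    return it == m.end() ? dflt : it->second;
}
```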