CppCon 2017: Louis Brandy “Curiously Recurring C++ Bugs at Facebook”

Reddit Comments

Hey, speaker here. I have a little update.

After I gave this talk, both gcc and clang devs in attendance implemented warnings for bug #6. I've not kept super close tabs on the patches to know when/if they've been committed or on track to be released.

But in theory they will both have warnings for code of the form:

unique_lock<mutex>(m_mutex)

👍 20 · u/lbrandy · Nov 03 2017

I learned something very useful from this talk:

When a presenter "quizzes" the audience with C++ questions, sit where you can see Richard.

👍 9 · u/tvaneerd · Nov 03 2017

Wow, I didn't know about bug #6; it's crazy that this code compiles:

#include <string>
void f() {
    std::string(foo); // compiles: this is a declaration of a std::string named foo
}
👍 10 · u/FbF_ · Nov 03 2017

It's not a bug that recurs very often, but I once ran into one where I found a new language feature I didn't even know existed (similar to std::string(foo); ).

We had a line of code similar to this one:

std::vector<int> ints((20,255));

What would you expect to be in the ints vector? 20 integers with value 255? The answer is that (20,255) uses the comma operator, which evaluates the left argument, discards it, and then yields the value of the right argument. So (20,255) evaluates to 255, and the vector ends up with 255 elements (instead of 20 elements with value 255).
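A minimal sketch contrasting the accidental comma-operator call with the intended two-argument constructor (the variable names here are just for illustration):

    #include <cassert>
    #include <vector>

    int main() {
        // (20, 255) is a comma expression: 20 is evaluated and discarded,
        // so this calls vector(size_type) with 255 and creates 255 zeroed ints.
        std::vector<int> surprise((20, 255));
        assert(surprise.size() == 255 && surprise[0] == 0);

        // The intended call: 20 elements, each initialized to 255.
        std::vector<int> intended(20, 255);
        assert(intended.size() == 20 && intended[0] == 255);
    }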

👍 8 · u/konanTheBarbar · Nov 03 2017

I don't understand the discussion about fixing map::operator[]. Doesn't map::find() already solve the problem of looking up without adding entries? Rather than change the semantics of map::operator[] as was mentioned in the Q&A, it seems we could just educate our developers on the current semantics of the container.
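For reference, a minimal sketch of the find()-based lookup the comment is describing (the settings/timeout names are just for illustration):

    #include <map>
    #include <string>

    // Read-only lookup on a const map: find() never inserts, unlike operator[].
    int timeout_or(const std::map<std::string, int>& settings, int fallback) {
        auto it = settings.find("timeout");
        return it != settings.end() ? it->second : fallback;
    }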

👍 8 · u/patatahooligan · Nov 03 2017
Captions
All right, it's 1:30, it's time. Thank you everyone for coming back from lunch through rain. I'm here today to talk about bugs. My name is Louis Brandy, I work at Facebook. I have brought to you a selection of bugs. I want to give a tiny bit of context here before, just to give you some background on why it is I have opinions on recurring bug patterns. One of the questions I get a lot, I've been getting since I've come to CppCon, is does Facebook actually have a lot of C++? Like, don't you guys just use PHP? It turns out we have an enormous amount of C++. We've actually had enormous amounts from effectively the beginning. Facebook infra has been built on C++ effectively from the beginning. It's the second most used language behind our PHP hack codebase, and our PHP hack codebase runs on something called HHVM which is a C++ process, which sort of lives in our repository. There's other languages and infra at Facebook. Python, notably, is definitely second, Java, third, and then everyone's favorite language, et cetera. There's actually quite a long tail of languages. Let's see Clicker. So our team is effectively the core C++ group at Facebook. There are two groups in this group, two logical sections. One is very tool-focused, one is very codebase focused. So the tool part of our team works a lot on things like the compilers, the sanitizers, the build systems, and so forth like that. The code-based part of our team deals a lot with the core libraries, folly, we literally have a directory called common, right. And so because of these two things, like the sanitizer is one half, the core library is another half, we're often in escalation of point of both. Me personally, I'm involved often in our site event review meeting. So when something crashes the site, especially if it involves a C++ bug, I see it. So I have a very rich data source. Sometimes it's crashed the site, sometimes it's, hey, is this an Address Sanitizer false positive? It's like, no, no it's not, you have a bug in your code and it's subtle and it's here. And so this is my data source for this talk. So obviously my agenda is bugs. I brought some bugs with me. I had some criteria for the bugs I wanted to show you. I want them to be scary, so like a syntax error or whatever is not interesting. Okay, so these are things that are scary. I want them to be common and recurring. So these are things that keep happening, over and over, despite our best efforts. They're also not the list of the craziest thing I've ever seen. Okay, that's a different talk. And they're concrete, so I have bugs that fit on slides. Like I'm going to show you C++ that is broken. So it's not going to be like libraries that have bad ownership models cause lots of bugs, which is true, they do, but that's abstract and not super interesting. I'm going to talk about mitigation, and all the tools we have. I didn't intend for this to become a recruiting event for ASAN. But it turns out that ASAN is actually highly impactful on almost every bug we're going to talk about today. Almost every bug. So we'll talk about ASAN a fair bit, but also all the other tools. And, just as fair warning, especially since there's going to be a lot of talk about sanitizers here, gcc/clang Linux bias from me, especially in the tooling part. Those of you that use other tool chains, hopefully have equivalent technologies. If not, you should get some. Okay. 
All right, when you give a talk about bugs, it's actually kind of terrifying, because there's two different groups of people you might be talking to. One is audience number one, I show you something you have never seen before, okay? I save you from a terrible day, with any luck, like if I'm interesting enough for you to be paying attention. Right, so I'm gonna show, I'm hoping I'll show at least everybody one thing that they've never seen before. If you've seen it, you forgot, and maybe we'll flesh it out a bit. For some of you, like if you're a student or something, maybe I'm going to show a bunch of stuff you've never seen before, and you might actually, I might save you like a week, the worst week of your career. Audience number two, I'm going to show you a bug you've seen a hundred times. Almost always you see other people making it; you're the one in the code review going whoa, whoa, whoa, whoa, whoa, there's something you don't understand here. And of course if you're in audience number two and you're an expert, you never make these mistakes yourself. Never, never, never. This never happens. But you and I are having a slightly different conversation. So our conversation is why do these keep happening? What can we do about it? Can we fix the language, can we make the tools better, can we do education better, right? But what can we do about it to make some of this stuff go away? All right, so enough gibberish. Enough philosophy. It would be intellectually dishonest to not mention the single biggest cause of bugs at Facebook, okay. It is the square bracket operator of a vector, okay? I put a bunch of stuff in this bucket because frankly the C index operator is the same thing and frankly pointer dereferencing is roughly the same thing. This is probably the single biggest cause of problems in probably every codebase. Right, it's the most important bug most of us have. But for today's purposes, I'm going to declare this uninteresting. Only for today's purposes, okay. And the reason I declare it uninteresting is it's not really a C++ self-inflicted wound. We, computing has this problem, in general. Every language is built or not built around it. And so I think this is actually a hard inherent problem in what we do. Most of the rest of the things I'm going to talk about are not really inherent. They were self-inflicted. So yeah, if you wanna crash the site, don't check the bounds of your vector. Start indexing into it. You'll definitely crash the site eventually. Or better yet, iterate through two vectors and assume they're the same length. So mitigation. So let's, we'll practice now, we'll practice mitigation. I've declared this uninteresting, but let's talk about it. Static analysis has been studying this problem for a really really really long time. It's sort of the queen mother of hard static analysis problems. They've made a lot of progress, but the good static analysis to find this is expensive. Interprocedural, et cetera, et cetera. A lot of languages, and C++ is headed in this way, change the abstractions. You don't wanna do for loops, you wanna do range-based algorithms. If you do range-based algorithms, often times you can hoist any bounds checking outside the loop. And of course you can do things at run time, you can literally bounds check every single array access. This is what, like, Ada does, and has done for a really long time for safety. But as far as I'm concerned, at least in C++, your best chance of finding these is Address Sanitizer. Address Sanitizer was built for this. 
If you really really really don't want out of bounds problems in your code, you need to have fuzzing with Address Sanitizer. I'm not going to talk about it, there's many talks to go talk about this particular problem. But if you take nothing away from today, what you need to take away is -fsanitize=address. Clang and GCC, again, specific, but Address Sanitizer is probably the most important thing that's happened, at least in our tool set, in a long time. All right. So, bug number two, the greatest newbie trap, in my opinion, I'm nominating, in all of C++. I'm curious, by the way, before I tell you what it is, if you have a personal nomination and you think you've got me beat, come find me after. I want to hear what you think is the greatest newbie trap, in all of C++. In my opinion, the greatest newbie trap in all of C++, so again, before I do the big reveal, I'm going to sort of increase the chrono ... I'm going to put these in chronological order, so this is sort of the order in which you will probably encounter them in your career. So hopefully, many of you have seen this already. We'll end on something a little more hard. I believe it to be map's square bracket operator. Ooh, I heard whispers. Yes, I've called it out. So, I hope most of you know where I'm headed, but if you don't, let's be clear. Map has this property that ... So in this case I've set hey to 12 in my map. And then I try to print it out and I typo it. Okay, I've typo-ed it, so I'm trying to print out a key that doesn't exist in the map. Most of us know what happens here. It prints out zero, right? And you're used to that, that seems normal to you. Show this to a programmer who doesn't know C++ well and ask them what they think this does. The actual answer was no higher than third on their list. This doesn't crash, this doesn't print out uninitialized memory, this doesn't throw an exception. What this does is print out zero. And it actually inserts something in the map. So the size of the map at the end of this is two. Okay? If you don't think that's weird, you've been doing C++ for a really long time. Yes. We won't think twice about using this. This is an occurrence count, right, like a really dopey small one. Like loop through the string, and in my map, increment for every character I see. Okay, you might write something like this. The Java version of this is horrible, because you have to initialize your variables, right? And C++ just gets kind of, the integers get initialized, they just get initialized somehow. Don't let a beginner C++ programmer stare at this for too long, they start asking really complicated questions. Like, wait a minute, you don't initialize any of your ints. Nobody sets anything to zero. How does that work? Does that work with normal ints? No, no no. No no, it only works because of value, oh I don't want to talk about value initialization. Right, not with someone who's falling into this trap. I want no part of like, well see it's like a default construction of an int, even though ints don't have default con ... Okay, never mind, never mind. Just, next slide, next slide. Okay. So, we actually protect our sort of beginner programmers from this because we drill into them const correctness. Be const correct, it will save you from bugs, this is one of those bugs it saves you from. This code will not compile, okay? You have a settings map that you're trying to read from, with a square bracket operator, but the settings map is const, so this won't compile, okay? 
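A minimal sketch of the behavior being described, using the same toy keys as the talk (the settings alias at the end is illustrative):

    #include <iostream>
    #include <map>
    #include <string>

    int main() {
        std::map<std::string, int> m;
        m["hey"] = 12;

        // Typo in the key: operator[] default-constructs the missing value,
        // inserts it, and returns a reference to it. This prints 0.
        std::cout << m["hye"] << "\n";

        // The lookup grew the map: it now holds {"hey", 12} and {"hye", 0}.
        std::cout << m.size() << "\n";

        const std::map<std::string, int>& settings = m;
        (void)settings;
        // settings["hey"];  // does not compile: operator[] is non-const,
        //                   // because it might need to insert.
    }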
Because the settings square bracket operator might need to insert, therefore it won't work, right? No, yeah I ruined my own joke. How does a newbie fix this code? If you paid attention, you know. How does a newbie fix this code? - [Audience Member] Remove const. Yes, remove const, thank you. Yeah, you remove const. Guess what, it works. I know he told me to be const correct, but it works. Yeah, so I was sitting with someone who was struggling with something else, they were relatively new to Facebook, I saw code that was exactly like this, and I said, whoa whoa whoa, why is this not const? I already kinda knew why. And they're like, well, it doesn't work when I put in const. I'm like, yes, let's put it in, let's look at the error message. This is the error message. Error: passing const std:: map dot dot dot, as this argument discards qualifiers. So as a C++ expert, you know what that means. You can interpret that. A newbie sees "error passing const," right? The error literally says, there's an error passing this const map, into this thing. So I should remove it, it works, right? Now of course only newbies fall for this mistake, right? Of course. Twice, twice as far as I know, major site incidents have occurred in exactly this pattern. Experienced, very experienced C++ teams had a widget, in this case it was like a service handler, in its constructor, it took a map of settings, which is already gross, already gross. And they assigned the data member settings to the settings that gets passed in, and then they print their settings. Server initialized with timeout, whatever. Well, what happens in a refactor if you pass in a map that doesn't contain timeout, well, we've just helpfully, in our print statement, in our print statement, set the timeout to zero. In most socket libraries, a timeout of zero, infinite, yes? So it's a really excellent way of making, like, one service is dying, and now this service, all of its threads are in infinite timeouts. But of course, you know, everyone tests all their timeout code paths really extensively, so this doesn't happen, yeah I know. So here's the bad news about this newbie trap. It's actually the one we're the most powerless against. I have nothing to help you, I have nothing. Like people have legitimately suggested we just ban it. Just ban the square bracket operator of map. Const correctness helps to some degree, but not always. So I got nothing, if you have a clever idea, let me know, come find me, okay? I'm interested. (audience member shouts indistinctly) What? We'll call at, yeah but, so, that's good if you know that, but they don't. People complain about at for different reasons, that's a different talk, okay. So, good, at, at is one solution to this problem, he mentioned at, because at will throw if the key's not there. Most languages have an equivalent or rather an alternate mechanism which is extremely common, which is this sort of get with default. So you call it on, you can call the const map with the key, and if the key's not there, instead of throwing, return the default argument. C++ doesn't have one of these, turns out it's actually kind of tricky. And this is the official actual implementation in GHC Haskell, of this exact idea. This is a real programming talk now. We have Haskell in this talk. We've put away the toys, this is Haskell now, right. Welcome. This is the actual implementation in Haskell, it says, findWithDefault, default k m, they're in reverse order in Haskell, okay. Says look up the key in the map. 
If you find it, or rather if you find nothing, return the default, if you find an x, return the x. That's what it looks like in the prelude, or rather in the GHC standard library. This is what the folly version of this exact thing looks like. So we have respectively this same function in folly. It's not super fair to compare this, because most of that noise is type information and Haskell has type inference. But this is a template that takes a map and a key and the code down at the bottom is really like what you'd expect, like find the key, if it's not the end iterator return it, otherwise return the default. That's what the sort of last two lines look like. It's very straightforward. There is something in this slide that C++ programmers hate. They hate it, they hate it, they hate it. (audience member shouts indistinctly) Oh, I love this man, okay. So I had a whole separate slide to build it there, which is, I moved all the template noise to make it more obvious, I said let's get rid of all the template nonsense, and I said let's just do the string version of this. So this is the exact same slide except with string substituted. So I made it much more obvious now. It returns a string by value, that means there's a copy here, an unnecessary copy. And when I was proposing this talk, I was saying, hey, give me ideas for bugs that have been recurring, so many people mentioned, oh, having extraneous copies is a really common bug. Oh, it is, isn't it? Well how do you fix it if you hate extraneous copies? Easy, right? We know how to fix it, we return a reference to the string. This works great if the key is in the map, I don't have to copy anything out of the map, right? I don't have to anymore, I'll just return a reference to what was in the map, and there's no more extra copies. This is hopelessly broken. This code is hopelessly broken. Do not copy and paste this code. It turns out, right, oh yeah, this code is hopelessly broken. Congratulations, we just crashed the site. It turns out that people love default temporaries. They love default temporaries, this is in fact the way this function is almost always used. On the right, you're creating a default string, and if you return that by reference, because the key's not in the map, v is now a reference to a temporary that has now been destroyed. Yes? This patch that I just showed you has been submitted like four times, and I've rejected it 3 of those times. One time, I shouldn't say I, someone rejected it three times. One of those times it got in, and we're like, oh no no no. I need to do negative caching of diffs and say, if anyone submits this to just auto-reject it. Okay, people want to make this change so badly. But this is a broader class of problems, this is a much bigger problem that's well known. There's many creative ways to do this, to pass a temporary to an object and then somehow return a reference to it, to the caller. This is a really hard problem to fix, I think, in the language. There's some library tricks you can do often to make it hard, harder to do this. But at some point the language has to fix it, and I don't know if it can or will. But it can be mitigated, so again, Address Sanitizer catches this, catches it cold. The only problem is, you often need this extra flag. So default Address Sanitizer, at least up to some point, didn't have this on by default. I believe it's now turned on by default in Trunk, I don't know which version that's actually going to ship in. But when I turned it on, I had to add this flag. 
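A minimal sketch of the two versions being discussed; this is illustrative, not folly's actual get_default signature:

    #include <map>
    #include <string>

    // Safe by-value version: copies the mapped value (or the default) out.
    std::string get_default(const std::map<std::string, std::string>& m,
                            const std::string& key, const std::string& dflt) {
        auto it = m.find(key);
        return it != m.end() ? it->second : dflt;
    }

    // The tempting "no extra copy" patch: hopelessly broken, because callers
    // almost always pass a temporary default, and the returned reference
    // dangles as soon as that temporary is destroyed.
    const std::string& get_default_broken(const std::map<std::string, std::string>& m,
                                          const std::string& key,
                                          const std::string& dflt) {
        auto it = m.find(key);
        return it != m.end() ? it->second : dflt;
    }

    void caller(const std::map<std::string, std::string>& settings) {
        // "cold" is a temporary that dies at the end of this statement, so if
        // "mode" is missing from the map, v dangles from here on.
        const std::string& v = get_default_broken(settings, "mode", "cold");
        (void)v;
    }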
And when I turned this flag on for ASAN, we found dozens of bugs of this form, dozens. It was bad. All right, bug number four. So this is intermission. How are we doing on time? So the intermission is interesting, because I wanted to tell a story that works, so this is not a recurring bug. This is a bug that is not recurring. This is a bug we killed. It was horrible, it was a constant source of fighting, and we actually fixed it. It involves the worst keyword in all of C++. Who knows it? - [Audience Member] Auto five. Auto, oh no, what else? - [Audience Member] Go to. Go to, oh, that's a good one, but no, not go to. What else? - [Audience Member] Volatile. Volatile. Volatile, oh. Who knows what volatile does? That's an honesty check. Oh look at all these liars ... Okay, I'm just kidding. Yeah, volatile's a really scary one. Lots of people think they know what it does. At least historically, and use it really really wrong, really really wrong. We introduced what is actually in practice a terrible idea. We throw a paragraph at you, if you write the word volatile. Like our linter literally spews a paragraph of gibberish at you. It's kind of mean, but that's fine. It's not actionable, which makes it a terrible lint rule, but I don't care, if you use volatile I'm going to lecture you. Almost every use of volatile in our codebase had to do with threading and trying to synchronize threads. Now there are legitimate use cases of volatile that don't involve threads. They just don't exist in our codebase for the most part. So you might live in a codebase where you actually have memory-mapped IO and you need to do volatile. But we don't. So I had a really bad lint rule by most lint rule standards, which said, if you write the word volatile and it's not in a comment I'm going to lecture you on your diff, automatically. This really angered a lot of people, it was fun. We'd get a post that was like, actually, I'm using it correctly, I've one thread, and there's a boolean, and he's writing to it, and then the other thread's trying to read the boolean and I don't want the compiler optimization, so I make the bool volatile, we've been doing this for 20 years. You know, this kind of stuff. We had so many of these arguments, but the thing that's interesting is that we won. We won the arguments. Like people have been at Facebook for a few years, they don't remember volatile arguments. The Great Volatile Wars of 2010, they're gone, 2011. People stopped using volatile for threads, and I think if you pause for a second, it's kind of interesting, like, why? How did that actually happen? It turns out people wanted atomic flags or mutexes or higher, yet higher order abstractions. And once we started giving it to them, somehow volatile just vanished, and we sort of trained the old guard who had learned. But the new people who wandered into the language must be just finding what they want, the right thing. The use of volatile as a synchronization mechanism seems to have mostly vanished. And it's, I'm not exactly sure why, but I think it's an interesting case study, and sort of, people really want to do x, and they'll find a way to do x. And maybe if you just give them a solution, they'll use it, turns out. All right, bug number five. This is the worst question in all of C++. This is the one that makes my blood run cold. The way this usually works is someone posts in our internal group. They tell a story, the story is always the same. It goes like this. We've been having an intermittent crash for a month. 
It's really hard to reproduce, I can't reproduce it. What we've ended up having to do is bisect it in production. We build a binary, we run it for six hours, if it crashes then that's bad and if it doesn't that's good. We've been bisecting for like 3 weeks. We finally found the diff and we don't know why this diff is broken. We've been staring at it and ripping it apart and we don't, there's something here we clearly don't understand. But we finally came up with a hypothesis. And then they ask you the question. It's only three words. Is ... Shared_ptr ... Thread-safe? (laughs) Is shared_ptr thread-safe? What's the answer? - [Audience Member] No. No, I like that. Any other answers? (audience member shouting indistinctly) I'm sorry, I couldn't hear, it's not this ... You hear answers like this, like yes, no, it depends. People love this answer, which is totally useless but true. "It's exactly as thread-safe as a normal ptr." Technically true, but useless. If you go to Google and you type in is shared_ptr, the top result. Is shared_ptr thread-safe? By the way, the third result, is shared_ptr slow. No, no shared_ptr is not slow. If it's in your profile, fix your code. Is shared_ptr null is fourth. Yes. Yes, shared_ptr is null. Put a check in. Okay. But we're here about thread safety, all right. Is shared_ptr thread-safe? So I've had a lot of time to come up with an answer. I've had a lot of practice. I believe the answer to, is shared_ptr thread-safe, to be no. That is technically, I believe, to be a lie. But I'm answering a slightly different question than the one you're asking. You asked, is shared_ptr thread-safe, but what you meant to ask was, is my code correct? And the answer, if you have to ask, is no. No, no. So here's my teaching sense, so you might have no idea what we're talking about. You might be in audience one right now, going what are these people talking about, what do you mean thread shared_ptrs and thread-safe, let me explain it to you. A shared_ptr, this is a mental model, this is not an implementation, it's a mental model. It's three things. There's the actual shared_ptr struct, which is a pair of pointers, that's what it is under the hood. Doesn't have to be, but in our mental model that's what it is. It points to two things. One pointer points to the shared T, one points toward the shared reference count. This is where the C++ people get really uncomfortable. They're like, oh, it's not a reference count. There's a lot of other stuff in there. It's technically called a control block, and it contains, you know, meet pointer, dah dah dah dah dah. But for mental model purposes, it's a reference count, okay? And the answer, or rather, yeah, when you copy it, all you do is you copy the struct with the two pointers, and they point to the same two shared things, the shared reference count and the shared T, and then we increment the reference count, all right? That's how shared_ptr works under the hood, that's the mental model. The thread-safety kind of approximately looks like this. Anything that touches T is unsynchronized. The shared_ptr will not help you synchronize the T. That's on you. If you want synchronization at your T, do it yourself. The shared_ptr will, however, manage the control block, and in particular the reference count. So if multiple threads destroy shared_ptrs, they will both atomically decrement a reference count. That will happen. But the one that trips people up tends to be this thing on the left. 
For some reason, people want to read and write to a shared_ptr unsynchronized. It doesn't work. For the exact same reason a pair of pointers wouldn't work from two threads. This is a thing that for some of us seems super obvious, so I was testing this with people at Facebook. Like, do you know this? Is this obvious, does everyone know this already? And one person literally went, that's the crash. That's the crash. We have a map, and the shared_ptrs, that's the crash. And I was like, okay, I guess that's going in the talk. I've changed the definition of the colors now. I want to talk about our intuition now. So now I'm moving towards audience two. You hire lots of programmers and you're training them over time and you should know this about their intuition. They get the right hand side correct. They naturally assume that T is unsynchronized when playing with shared_ptrs, they get it, just intuitive. They naturally assume that the control block, the reference counting, happens for them automatically. The thing they seem to get wrong is the one on the left. People really want to read and write to shared_ptrs unsynchronized. The pattern that comes up over and over and over at Facebook and I think that this is an artifact of the fact that we write a lot of services, is this, it's a buggy form of RCU, and it's actually a degenerate but very common form of RCU. RCU, by the way, is Read-Copy-Update, has a Wikipedia page that was to talk about it. I'm not going to talk about it too much here. But in our most common use case it's actually very simple. You have a service, it handles requests, you have one thread who's responsible as the updater. So one thread wakes up every 20 minutes or whatever, goes and reads some database, populates some in-memory efficient version of whatever's in the database, and then stores that into a shared_ptr as the one true most recent state in the universe. The worker threads go into that one true shared state at the start of their response, they pull one out, grab a reference count to it, and say, okay, I'm going to use this state of the universe for this request. And then on the next request, I'll grab a new state so you won't update me between whilst I'm in a request, but I'll pick up the update after I'm done. And this is really, this has a lot of lovely properties in a service, right, because updates are really cheap. You're not really blocking anything to do your update. It has the downside, of course, that you might have multiple states in flight, like whilst you're updating one state, someone might be using the old one, and you've gotta be able to tolerate several states in memory at once, to have this work effectively. But, and the implementation here is actually really easy, all you gotta do is at the heart of that handoff, have a shared_ptr and a mutex. And have an atomic get and an atomic set and you're done. Right, it's not even that. It's literally the easiest thing to write. But what's interesting is our intuition fails us. People keep writing the wrong thing, and then they have a really bad month. Finding that bug takes a month, if they don't see it or know it. So mitigation, yeah. So thread sanitizer finds this, if you run thread sanitizer. Address Sanitizer often finds it as well, but it's not super helpful. Because what happens up happening is you end up corrupting the shared_ptrs and you destroy the wrong thing and it looks like memory corruption, and it's not always clear. 
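A minimal sketch of the handoff being described, a mutex guarding the one shared state with an atomic get and set (the class and member names are illustrative):

    #include <memory>
    #include <mutex>

    // One updater thread calls set() with a freshly built state; worker threads
    // call get() once per request and use their snapshot unsynchronized.
    template <typename State>
    class CurrentState {
    public:
        std::shared_ptr<const State> get() const {
            std::lock_guard<std::mutex> g(mutex_);
            return state_;             // copying bumps the refcount under the lock
        }

        void set(std::shared_ptr<const State> next) {
            std::lock_guard<std::mutex> g(mutex_);
            state_ = std::move(next);  // old state is freed when its last reader finishes
        }

    private:
        mutable std::mutex mutex_;
        std::shared_ptr<const State> state_;
    };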
Address Sanitizer will tell you something's wrong, but it won't necessarily aim you right at it. Now we get to the library part of the talk. There has been a thing called atomic_shared_ptrs in C++11. On paper, this solves the problem. Of course, most implementations haven't had one. And then there was a Concurrency TS that added atomic_shared_ptr like this. This here, atomic_shared_ptr like this, and then like this. And then we went back to atomic_shared_ptr like this, like I heard something about a vote in Toronto. I asked someone in SG1. So I tend to hang out in the evolution working group of the committee. I don't really hang out in the SG1 which is where they talk about this. I was like, can you give me the history of this, because I'm going to give a talk, and I'd like to at least tell people what the deal is about atomic_shared_ptr. They're like, eyes glazed over, and apparently this is a big long interesting, I don't know about interesting. This is apparently a big long topic of back and forth of what's going to happen with atomic_shared_ptr. I don't actually know its current state, I suspect we'll get something like it at some point. It does solve my problem, but to be fair I don't need a highly performant atomic_shared_ptr to solve my problem, I can get away with a mutex and a lock. And in fact, there is also another proposal if this is interesting to you, about a cell type of an abstraction, which is effectively exactly what I want. That's also before the committee. Oh, bonus bug about shared_ptrs. We have this one all the time too. I don't know why people write this. I can't explain it. Some function returns a shared_ptr, they just dereference it, immediately, and they grab a reference to whatever's at the other end. And now you call ref.boom(). So the problem here of course is everything that happens after that, the shared_ptr's gone. It's been destroyed, because you dereferenced it. And you basically, you don't have the protection of a reference count anymore, right? So all your code is now running without a reference count protecting you. So this one's come up a number of times, but again Address Sanitizer does catch this one. Very easily. All right. Let's see. All right, last one, this one's my favorite. Bug number six. This is code review, this is going to be interactive. At least a little bit. I'm going to show you a series of slides with very little code on it. Very little code. And I have two questions for you. Question one, does it compile? Everyone's favorite C++ game. Will this code compile? Question number two, if it doesn't compile, what error message would you expect or even like, if you're being ambitious? So, does it compile, what kind of error message do you expect? Here is the code. Include a string. Void f. Standard string, parentheses foo, close parentheses, semicolon. Raise your hand if you think this code compiles. So, for the purpose of the audience, maybe 10%. Maybe 10%. But I point that out because that's Richard Smith, right there. You're, what, the editor of the C++ standard, right? Yes. The editor of the C++ standard says this code compiles. Who's changed their mind? (audience laughing) Richard, unsurprisingly, and surprisingly, is correct. This code compiles. Almost everybody in here got it wrong. This code compiles, what? What? Okay, take a minute, pause the video, if you're on video, think really hard about how this code can possibly compile. How this code can possibly compile. 
And then think a little bit more, like more existentially, like what are we doing, why are we here? I feel like I'm wasting my life a little. Okay, this code absolutely compiles, and the answer, and I'm not an expert in the standard, but my understanding is, the way the standard works, at least in this part of the standard, if something looks like a declaration, it is a declaration, no matter what else it might be. And that looks like a declaration. So, this, and this, are basically the same line of code. Those parentheses are optional. So what you're seeing, on the top, is a declaration of a string called foo. Okay? Now, that seems like trivia. That seems like that seems weird, like, okay, how. Oh yeah here's me proving it, by the way. So you put it into Godbolt, and you can't really see this, but what you'll see in Godbolt with no optimizations is the default construction of an empty string, and then the destruction of the string. So you'll see the creation and destruction of an empty string, foo. So, okay, but the question becomes like, so what? How is this possibly one of the most common and horrifying bugs in all of C++? And I posit to you that it is, just not in this form. It's in a slightly different form. This is often what it looks like. You have an object that has an update method, and this object is meant to be used from multiple threads. So it needs to be internally synchronized. And so you of course have a data member that's a mutex, and on any mutating method you have to lock your mutex, right? That's the way it works. So you lock your mutex, and then you do the mutation. Right? Everyone sees the bug, right? Do you see it yet, who sees it? Only half of you, okay. Yeah, so I didn't name my lock here, right? I am not, that unique lock is intending to be like a lock guard, right? I'm intending to, on construction, lock my mutex, and on destruction unlock my mutex. That's what I'm hoping this code does. That is not what this code does. No, what this code does is it declares the existence of a unique lock, named m_mutex, that shadows the mutex data member that you were trying to lock. And to be clear, nothing gets locked. There is no lock that occurs here. It looks like it's locking, and you want it to lock, but it's definitely not locking what you want to lock. And the fix is super easy. You've seen it a million times. You have to name the lock guard. You have to give it a name. Now, this is a unique lock named g, whose construction locks the mutex and whose destruction unlocks the mutex. So your code is now protected by a lock guard, right? Do a code search for this. This is the pattern, do it. Unique_lock<mutex>, if you have code that's using relatively recent C++, do a code search for unique_lock<mutex>. GitHub search won't let me do it, so I couldn't do it on GitHub, but I figured since I'm giving this talk and I'm telling all of you to search your code, I should search mine. Do you think I found a bug? Yes. This is what the actual code looks like, std::unique_lock<std::mutex> (mtx). That is not doing what it thinks it's doing. That is not a lock guard. That is a shadowing unique lock that is not locking the mutex it tried to lock. I didn't just find one bug, I found two bugs in our codebase. Here's the sad thing about this. I only found two bugs, which is okay, but we actually have a lint rule to protect against this. So this is two bugs that got into our code despite the linter telling them not to do this. 
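A minimal sketch of the shadowing bug and the fix being described (the class is illustrative; the member is named m_mutex to match the pattern above):

    #include <mutex>

    class Widget {
    public:
        void update_broken() {
            // Bug: this declares a default-constructed std::unique_lock named
            // m_mutex, shadowing the data member. Nothing is locked.
            std::unique_lock<std::mutex>(m_mutex);
            // ... mutation runs unsynchronized ...
        }

        void update_fixed() {
            // Fix: name the guard, so it actually locks the member m_mutex.
            std::unique_lock<std::mutex> g(m_mutex);
            // ... mutation is protected until g goes out of scope ...
        }

    private:
        std::mutex m_mutex;
    };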
Imagine how many we would have if we didn't have a linter telling people not to do this. All right, now we're going to do the worst C++ quiz of all, all right? I've taught you everything you need to know, there's no more tricks, I've taught you everything you need to know to understand what happens next. So does this compile? So remember, this compiles, this compiles. Shockingly, perhaps, this compiles. Does this compile? Raise your hand, does this compile, if it does compile, yes. Only half of you. So this compiles as well. This compiles as well. The first one, as we mentioned, is the declaration of a string named foo. The second one is a constructor. It constructs a temporary string from a char star. All right, let's make it worse. Does this compile? I wrote it twice, it's copy and paste. Raise your hand if you think this compiles. No, no this doesn't compile. (laughing) No, this is a redefinition error. This is a redefinition error, because if it looks like a declaration, it is a declaration, so this is the same as writing std::string foo, std::string foo, and you can't declare two variables with the same name in the same scope. So this is a redefinition error. All right. Does this compile? Look closely, what? We have a question? (audience member speaks indistinctly) It could be interpreted that way. It could be interpreted the second one is a copy of the first. But the rule is if it's ambiguous, and it could be a declaration, then it is a declaration. So this is a redefinition error. To be clear, I'm slightly out of my league here, because now you get into super-parsing of the standard, but as far as I tested nobody will compile this, and I believe that's why. Because the rule says if it's ambiguous with a declaration, then it is a declaration, period. And that's why you get a redefinition error. Right, does this compile? All right, yeah, most of you've got it. Yes, this does compile. The first one is a declaration of a string named foo, and the second one is a temporary copy constructor of the string. It's calling the string's copy constructor with the first foo, which is fine. All right. And then finally, just for symmetry purposes, you grab the squiggly braces and you pull them out. Does this compile? Yeah, all right, most of you, yes. Most of you say yes, this compiles. So the reason this compiles, this is again a second declaration of foo, but it's now in an inner scope. So the squigglies are a scope, and it's now shadowing the outer foo. You have two declarations of variables with the same name but one of them's in an inner scope shadowing the outer one. Right, that's what's happening here. So this works. So, some super important notes. There's actually a subtlety here that's critical to understand. There are two things that make unique_lock conspire to have this really pathological bug-prone behavior, as far as I'm concerned. The first is that it's an RAII type. If you have the exact same situation with a non-RAII type, you tend to notice very quickly, because you didn't name it or anything, so you'll try to use it or something and you'll be like whoa, okay, I forgot the name. But RAII types, their mere declaration is often all the functionality you ... That is the functionality. I declare it, right, and that means I've got one. So you're much less likely to name it, because you don't actually need a name, you never need to reference it. The second part that's actually really important is this only works because std::unique_lock has a default constructor. 
Because I default construct mutex, in this case. I have to use the default constructor. So unique_lock is affected, but lock_guard is not. So if your standard lock_guard will not compile, in the case of, if you were to swap that out. Now, if you're using standard types that's great, because standard unique_lock has this weird property you can lint on it fairly easily. If it's like our codebase, you have lots of reimplementations of lock_guards from the years, right? Like, dozens of them. And very few people fully appreciated this problem. So we found, I don't know, tens, maybe 100 cases of our other lock_guards that did have default constructors being bugged in this fashion. So if you have a lock_guard, run, don't walk, to see if it has a default constructor. By the way, unique_lock does. C++17 has a thing called scope-lock, and it's entirely unclear to me if it suffers from this problem or not. This is complicated. As of yesterday, when I put it into Godbolt, Clang will not compile it and GCC will. I'm not sure. So there's, scope-lock may or may not have this problem in C++17. So anyway, the TL;DR on all this is if you have an RAII type and it has a default constructor, you're in a very dangerous situation, potentially, where people will try to wrap something and forget to name the object instead of what they're wrapping and they'll end up shadow ... Declaring a shadowing thing. So that's my other takeaway for you. All right, so mitigation. So, Wshadow is a, at least in GCC and Clang, is a warning that's been around a while. It absolutely detects this, Wshadow, so shadow tells you about shadowing warnings. That's when you have a variable in an innerscope that shadow a variable on an outerscope, it will warn you. If you put this particular pattern in and you turn on Wshadow, it will absolutely find this bug. The problem, of course, if you've ever used Wshadow, is it's extremely pedantic. It almost always gets turned off. It's full of noise and things that you probably don't want to fix, and makes the code maybe worse. There have been these hybrid flag suites, so there's shadow-compatible and shadow-compatible-local. These are meant to be much higher signal versions of Wshadow, so shadow-compatible only will tell you about shadowing warnings if the two types are compatible with one another. And shadow-compatible-local is only if they're local in the same function, and they're compatible with one another. I strongly encourage you to turn on at least Wshadow-compatible-local. You will find all kinds of, if not broken code, smelly code. Like there's an outer loop using I and a bunch of inner loops using I, and it's like let's just not do that. Even if it's not bugged, you don't want it to be, right? So shadow-compatible-local is good, you should turn it on, but it does not catch this. Only Wshadow catches this bug. So, I'm sorry. The story there isn't great. But, you know, this means this is easily detectable, right? Our extendable Clang based linter has a check for this. It's actually fairly easy to detect. But it's actually not widely detected. If you don't have a tool that finds this pattern, which is to say, unnamed lock guards, this sort of immediately wink out of existence. I suspect you have a lot of these, especially if you have default constructors on your lock_guards, or you use unique lock in your code. Again, I'm not exactly sure what the best ... I have some extra stuff here that I didn't want to get into in too detailed. 
There's a related class of problems which isn't this declaration issue, but it's more like you create something that exists for one line of code only when you expected it to live forever. That's a place where a nodiscard attribute for constructors might be helpful eventually. So it's like two flavors of this, but the more common one is the declaration one. So again, we have a Clang based linter, so something like clang-tidy, for example, could very easily find this. All right, so conclusions. C++ is hard. That's not fair, it's kind of a value judgment, but we should just admit it, I think. C++ is hard. Like a lot of this stuff here, we've had some of the best C++ people spend a week finding a bug, and at the end track it down to, like, their lock_guard is missing a g. And like, that stinks. And the consequences of that are really, it's a painful thing to be bugged. It's an incredibly painful day to find that bug. And it's a one character change. The patch is one character. Tools are our best teaching weapons. So, lectures are great. I would love to put every Facebook programmer through a lecture that just taught them about all the bugs that they should never write. Like, does it actually work all that well? The sooner after you write the bug I tell you, the better. Everyone would pretty much agree with that. If your IDE tells you you just wrote a bug, that's fantastic. The compiler tells you you just wrote a bug, that's pretty fantastic. If your dev review tool runs some expensive asynchronous thing and comments on your change set and says, oh that's definitely a bug, that's pretty good, that's pretty good. If you get an e-mail two days after you committed it because some monstrosity, like some supercomputer did some interprocedural static analysis and e-mails you a bug, that's still not terrible. Not great, but not terrible. If I have a talk like this and two years after you spent a week debugging it, I explain it to you, that's borderline useless. So really, for a lot of this, for my money, I want better tools. I'll talk in a minute about this, but the sort of Clang extendable linters have been really powerful internally in terms of, for a particular data type, having specific rules we can check, checking certain kinds of semantics. I don't want lock_guards that shadow mutexes, for example. That's a really good rule. Clicker. Yeah I just said this. So yeah, extendable linters. So Clang-tidy is an example of this. We have one internally called how to even. Because you can't even, they're clever, they're clever. An Address Sanitizer. Yeah, like I didn't really intend for Address Sanitizer to be the solution to all of my problems in this talk, it just worked out that way. I've already ... One cannot speak highly enough of Address Sanitizer, and the amount of bugs it finds in a codebase. If you live in a codebase that compiles with Clang or GCC, and you've never turned Address Sanitizer on, leave. Go do it right now. Go to one of the desks. I hope all of them are turning on Address Sanitizer right now. It's really, it's life altering. You'll find a dozen bugs in your codebase, running your test suite, with Address Sanitizer turned on for the first time. You know, don't tell anyone until you've asked for a raise, but then teach your team about Address Sanitizer. You're welcome. Okay, questions? I'm done. (audience applauding) If there's no questions, I'm going home. All right. No questions. Oh, no no no. - [Audience Member] What bugs didn't make the cut? What bugs didn't make the cut. 
Everyone kept trying to say unnecessary copies, and I was like, I have a huge bug that I keep having because you all hate unnecessary copies so bad. So that one didn't make the cut. A lot of people want to do small things, like pass by reference instead of pass by value. I'm just going to pick on all the ones Nicholas just submitted, a few minutes ago. The other one that's really common, so I actually think one of the most important bugs in C++ that we haven't talked about that's actually really hard is asynchronous programming and lifetimes. So we have Folly Futures, and we have fibers, and we have a lot of lifetime problems related to asynchronous programming. And other languages that have solved this, like have asynchronous programming frameworks, have garbage collectors, which help them with all the lifetime issues. So we have a lot of problems related to that. That was a very popular nomination, but we didn't ... It doesn't fit on the slide. But I do think that one's actually really critical. Asynchronous programming and coroutines will help with this, and lifetimes become a really tricky balance. So that's a short answer. Okay, who's next? - [Audience Member] Early on, you had a slide about a diff that people kept wanting to submit on your get_default. Yeah. - [Audience Member] Why wouldn't you put a comment there saying, you might expect this to be this, but don't do it. Yeah, I mean, so I have a skipped slide, I won't go to it. No, what we ended up doing was making an overload of that function that takes the rvalue default and is equals delete. That solves most of the problem, and then put a big comment there. That's what we ended up doing. The actual more complicated lie here is that we've, or rather the more complicated truth from the simple lie that I told. We have more than one of these. So the folly one we fixed, but there was a different one in our STL utils that was some ancient code, and that one had a bug in it that got committed without us noticing. So, that's what happened. - [Audience Member] Will returning an optional to a value save my map index operator? Will returning an optional from the map index operator? - [Audience Member] Return an optional to a reference to a value. Will it save this index operator? Yes, do you ... Yeah, so square bracket operator on map returning an optional, on a const map. So I don't hang out in library, my suspicion is that that particular thing is ... Everyone would like something like that, eventually. It's complicated. For reasons that, oh, optional's another good one. Just corner a library person and ask them about optional and all the ways that it gets hard. But yes, I agree, that is, that's what Haskell does, right, it returns a maybe type. So that's what you want, you want an optional as your square bracket operator. So I agree, that's the right direction to head in eventually. - [Audience Member] Thank you. - [Audience Member] But standard optional doesn't support references. Someone who would know better than I, yeah. - [Audience Member] So my actual question is, will C++17 rules regarding mandatory copy elision simplify or complicate the concerns about too many copies? I don't know, what do you think? - [Audience Member] That's why I asked you. (audience laughing) I think we're rightly worried about copies, so mandatory copy elision is good. I just, you know, not all APIs, especially ones that take references are, can return references, right? That's part of ... I don't know. I don't think so. I think copy elision is good. 
On average, it'll be good. I think the broader problem of, it's really easy to smuggle a reference out of a function, is its own problem. I don't know how or if we can fix it. Do you think we can fix it? - [Audience Member] I'm wondering if people will be less scared about returning copies so we just don't run into the problem, or if we start talking too much about is this copy getting into a more complicated spatial language? Yeah, well I mean, so the example of get_default, you have no choice, because you have more than one source. You can't elide the copy. If it's in the map, you have to copy it out. So in that case, it won't help, but on average you're probably right. It'll make it much less likely for people to be paranoid about copies. Dmitri. - [Dmitri] These linters look really useful. Do you have plans to open source them? What looks ... - [Dmitri] The linters, like the Clang based ones. So, we built ... This is a long story too, we built how to even, which is effectively clang-tidy, right before we had a clang-tidy that we could use. So we sort of built our own version of this thing, and there's been great long term discussion of, like, if we open source it, are we just bifurcating the space. Should we convert all these checks to clang-tidy checks and go that way? So this is an ongoing discussion. Yeah, I want to open source stuff, like we should open source all these tools, and we often do. We got to figure that out, that's the real issue. It really is just a simpler version of clang-tidy, that's what it is. So I'm not sure how valuable it would be, in its current state. - [Audience Member] This is more a comment for the audience actually, so when Facebook copies 100 bytes too many, it costs them one million dollars. Me and you, it costs us zero. You have so much bigger problems than just copying a string one too many times. I just had to say it out loud, sorry. It's true. But the problem is if you're a library author, you have users who care. Occasionally. Often times, even the ones who do care, care too much. It's like, no, I don't see it in your profile, go away. But fair point. Tony? - [Tony] Have you brought up the scope-lock constructor? So I've discussed this with people in SG1, and they said to do a DR, and I said I'm not actually very-- - [Tony] Library evolution, library, talk to those guys. Oh okay, yeah. Yes, I've had several conversations with these people, but I'm not actually sure if it's broken. Or, I don't know if broken is the right word, but I'm not sure if it suffers from this behavior. I have to spend some more time. I originally thought it did, and then I was going to go ahead and do it, but then on second glance it interacts with some other weird stuff. Because it's a variadic template, so it actually, and then there's argument deduction now, so it's complicated. So I'm not actually sure if it suffers from this behavior or not. So if someone wants to-- - [Audience Member] Well, there's people here I think could probably answer that for you. All right, I'll ask them. Who should I ask? - [Audience Member] Well you got, you got a few extra tools ... (audience member talks indistinctly) I know, poor Richard, he didn't know what talk he was coming to. Okay, any other questions? We all set? All right, thank you, everyone. (audience applauding)
Info
Channel: CppCon
Views: 104,728
Rating: 4.9059033 out of 5
Keywords: Louis Brandy, CppCon 2017, Computer Science (Field), + C (Programming Language), Bash Films, conference video recording services, conference recording services, nationwide conference recording services, conference videography services, conference video recording, conference filming services, conference services, conference recording, conference live streaming, event videographers, capture presentation slides, record presentation slides, event video recording, video services
Id: lkgszkPnV8g
Length: 52min 1sec (3121 seconds)
Published: Thu Nov 02 2017