- My name's Geoff Romer, I'm a software engineer at Google where I work on improving the
C++ development experience, especially as it relates to concurrency. I represent Google on the
concurrency study group of the C++ standards committee. I wrote our C++ concurrent
programming guide and I'm one of the engineers responsible for our C++ style guide. Today I'm gonna be talking
about how to talk about thread-safety in C++. And please feel free to jump
in with questions at any time. So when we're dealing
with multi-threaded code, we often toss around the term thread-safe. It sounds reassuring, right? We have threads and this is C++ so we need all the safety we can get. So thread-safety sounds like a good thing. But what does it actually mean? If I see a function or
a class or an object that says it's thread-safe, what does that allow me to do? So this is how Posix defines thread-safe. It says a thread-safe function can be safely invoked concurrently
with other calls to the same function or
other thread-safe functions. And that sounds reasonable,
but what about code like this? For all the examples that I'll be showing, we're assuming that
thread one and thread two are functions that might run concurrently. And Posix says that memcpy
is a thread-safe function. So is this code safe? Hands up anybody who
thinks this code is safe. Good, yes it's not. It's definitely not safe. We have two concurrent writes
to the same data, namely out. And those writes could
clobber each other unless there's more synchronization. But how 'bout this? Who thinks this code is safe? Who thinks it's not safe? Okay. This code is safe, at
least as far as I know. And all we've done here is
changed that output parameter into a local rather than a global. So thread-safety isn't
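A minimal sketch of the safe variant described here, with each thread copying into its own local buffer (the buffer size and the strings are assumed for illustration):

```cpp
#include <cstring>
#include <string>
#include <thread>

// memcpy itself is thread-safe in the Posix sense, but safety also
// depends on the inputs: here each thread copies into its own local
// buffer, so the two concurrent calls never touch the same memory.
std::string copy_locally(const char* src, std::size_t n) {
  char buf[64];                 // local to this thread: nothing shared
  std::memcpy(buf, src, n);
  return std::string(buf, n);
}

std::string run_two_threads() {
  std::string a, b;
  std::thread t1([&] { a = copy_locally("hello", 5); });
  std::thread t2([&] { b = copy_locally("world", 5); });
  t1.join();
  t2.join();
  return a + b;
}
```

With a single shared global out buffer as the destination, the same two calls would be concurrent writes to the same data, and therefore unsafe.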
just about functions, it's also about their inputs. And what does thread-safety
mean when we talk about thread-safe types? So here's one partial
example, partial answer, from a talk that Herb Sutter gave in 2012. And incidentally, that talk
is well-worth watching. There will be a link in the slides. So bitwise const or
internally synchronized actually gets at a lot of the key ideas. But this is obviously nonsense. I'm being kind of unfair here. The point of Herb's talk was
that both const and mutable imply important
thread-safety requirements. And that slide was a
way of summing that up with a memorable joke
and obviously it worked because here I am six years
later talking about it. But I think part of what
was going on is that Herb was using the term thread-safe with a different meaning when
he was talking about const than when he was talking about mutable because more precise terminology
wasn't available to him. So in this talk I'm gonna
present the terminology that we use at Google to
document and reason about thread-safety in C++. So what does thread-safety mean? We'll start with what safety means. When we talk about thread-safety, what are we trying to be safe from? Anybody, what are we
trying to be safe from? Any guesses? I hear data races. Anybody else? Bugs, corruption. Okay. I am simultaneously alarmed and relieved not to have heard the
incorrect answer that I have my next slide about,
which is race conditions. The trouble with race conditions is that the term just means that
there's some valid thread timing or sequencing where the program
doesn't do what you want it to. And that covers basically
every kind of concurrency bug other than a deterministic deadlock. So it would be nice to be
safe from race conditions, but it's not clear how we
would actually do that. Whereas when we talk about terms like memory safety, type
safety or signal safety, we're not just talking about being safe from certain kinds of bugs, we're talking about
safety from some specific locally avoidable bug pattern. Race conditions aren't
locally avoidable in general, so I don't think race
conditions is the answer. But I did hear data races. We're getting warmer. So just to review, what is a data race? This is a simple example. We have one thread that's
trying to modify an int while another that's trying to print it. And that's what a data race is, two operations that happen concurrently, while one of them modifies an object while the other is accessing it. Which is pretty straightforward. But here's another example. This is pretty much the same code, but now we're working with a
std string instead of an int. But again we have two
concurrent operations accessing the same object and
one of them is a modification. So is this a data race? Who thinks this is a data race, hands up? Okay, I am going to disagree with you, and the standard actually
disagrees with you too. The standard gives a precise definition for the term data race, which mostly looks like what we'd expect. A data race is when two
potentially concurrent non-atomic actions conflict, and they conflict if one of
them modifies data accessed by the other. But it's not talking about objects here, it's talking about memory locations. In standardese, a memory
location is basically an object of a built-in type: integers, floats and
doubles, pointers, enums, things like that. Objects of class types are
never memory locations and in particular, a std string
is not a memory location. So according to the
standard, this second example is not a data race, or at
least we can't say for sure that it's a data race
without breaking open the abstraction and peeking
into the implementation details of std string. It's still a bug, and it's
still undefined behavior, but we just can't call it a data race. And that seems silly because
from the programmer's point of view, it's exactly
the same mistake either way. So I talked to a bunch of
concurrency experts about this but they were pretty
much unanimous that no, even informally, we can't
really call this a data race. At most we can say that
this code will probably result in a data race for most plausible implementations of std string. But you never know if your
library vendor is a lunatic, std string might be
implemented with atomics and this might not
result in any data race. But it would still be incorrect. Since we can't call this
a data race, unfortunately, we need to introduce a new term. And the best term that I've
found for this kind of bug is an API race. An API race occurs when
the program performs two concurrent operations on an object when that object's API
contract doesn't permit them to happen concurrently. An API race is always a bug, and in fact, it's always undefined behavior,
just like a data race. So I claim that when we're
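As a sketch of the idea (the shared string and the helper function here are hypothetical): concurrent unsynchronized mutations of a shared std::string would be an API race whatever the implementation does, and external synchronization is what removes it:

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>

std::string shared_text;  // shared between threads
std::mutex shared_mu;     // guards every access to shared_text

// std::string's contract doesn't permit a mutation concurrent with any
// other access, so unsynchronized concurrent appends would be an API
// race. Serializing all access through one mutex removes it.
void append_safely(const std::string& s) {
  std::lock_guard<std::mutex> lock(shared_mu);
  shared_text += s;
}

std::size_t appended_length() {
  std::thread t1([] { append_safely("aaaa"); });
  std::thread t2([] { append_safely("bb"); });
  t1.join();
  t2.join();
  return shared_text.size();
}
```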
talking about thread-safety, we're really talking
about avoiding API races. And API races are a huge category of bugs. They include all data races
and also all of those bugs that look like data races
but involve object types and other more complex types. And yet, this definition is
still concrete and local, unlike race condition
because it's about misuse of a specific object. And consequently it seems possible to systematically avoid
API races and maybe even to detect them when they occur. One other thing to notice
about this definition is that in C++ we answer
the questions about how an object is accessed and
what its API contract permits by looking at its type. And that means that
rather than talking about the thread-safety of
functions the way Posix does, we should be focusing primarily on the thread-safety of types. But if an API race is a
concurrent access in violation of the contract how do you
know what kind of operations the object's contract permits? For example, does this
code have an API race on shared widget? We're concurrently invoking foo and bar, but to figure out if this is an API race, I have to go check the
documentation for foo and bar to see if they can be
invoked concurrently. And if this were a more realistic example with a bunch of operations, I'd have a sort of n-squared problem of checking each pair of operations to see if all of them are
allowed to be concurrent. But it gets worse. With this example, foo
and bar aren't methods of the same class, they're
methods of separate classes. But they're taking the
same object as a parameter. I could still go check the
documentation for foo and bar, but now they might not have
any idea that each other even exists, much less be able to tell me whether I can call them concurrently. I could go look at the implementations and see what they do
with that shared widget, but in order for that to work,
I would not only have to look at the bodies of foo and bar, I would have to look
at all of the functions that they pass the widget
onto and so on transitively. It would also mean that
I would have to look at all possible future implementations
of all those functions. Because if somebody makes a
change to the implementation details in there somewhere,
they're not gonna come make sure that my code is still
safe after that change. And obviously that really
doesn't scale well. So in order to cope with
this, we need some help from the widget type. The most obvious way that
the widget could help would be for it to guarantee
that there are never any API races on widgets. If that's the case, we
don't need to dig through the implementations of foo and
bar because we know a priori that there's no way for
them to create an API race on shared widget. And obviously this is a
really useful property. And at Google unsurprisingly,
we say that such types are thread-safe. And having that shorthand
makes it really easy to document those types,
and more importantly, makes it really easy
to recognize them when you're reading the code. There's one caveat to mention here, which is that even for thread-safe types, destruction can't be
concurrent with any other operation on the object. You need some logic outside the object to determine when all
threads are done with it and so it's safe to destroy. shared_ptr is pretty good for this. And there are some better libraries for it making their way toward the standard. So if thread-safe types
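A minimal sketch of what a thread-safe type looks like in this sense: an internally synchronized counter (the class is hypothetical, not from the slides):

```cpp
#include <mutex>
#include <thread>

// Thread-safe in the sense used here: the type guarantees there are
// never API races on its instances, because every operation takes the
// internal mutex. Any threads may call any operations concurrently
// (destruction excepted, as noted above).
class SafeCounter {
 public:
  void Increment() {
    std::lock_guard<std::mutex> lock(mu_);
    ++count_;
  }
  int Get() const {
    std::lock_guard<std::mutex> lock(mu_);
    return count_;
  }

 private:
  mutable std::mutex mu_;  // mutable so the const Get() can lock it
  int count_ = 0;
};

int count_from_two_threads(int per_thread) {
  SafeCounter c;
  auto work = [&] { for (int i = 0; i < per_thread; ++i) c.Increment(); };
  std::thread t1(work), t2(work);
  t1.join();
  t2.join();
  return c.Get();
}
```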
never have API races, should we just make all types thread-safe? Problem solved, right? No. The thing about thread-safe
types is that they're not free. You typically need a mutex or some other synchronization primitive
inside every instance of a thread-safe type. And that can create deadlock
risks, and even if it doesn't, it increases the memory
footprint of the object and it imposes some performance
overhead on every operation. Now it's true that
mutexes are pretty small, and acquiring an uncontended
mutex is pretty cheap but the problem scales up as
you compose objects together. So this example, we have a jobrunner class that's maintaining a set
of the jobs it's running and a set of the jobs that it finished. And it's using a thread-safe type jobset for both of those sets but
it still needs its own mutex to ensure that every job is
always in the running set or the done set. That means that job runner
is paying the storage cost for three mutexes, and the
runtime cost of locking and unlocking them, but two of those three mutexes
are literally doing nothing. The jobrunner mutex on
its own ensures that only one thread at a time
will access either set. So we would've been
better off if we just used std set directly. And incidentally, for those
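A hypothetical sketch of that better design (the class and method names are assumed): plain std::set members, which are thread-compatible and carry no internal locks, guarded by JobRunner's single mutex, which also preserves the every-job-is-in-exactly-one-set invariant:

```cpp
#include <mutex>
#include <set>
#include <string>

class JobRunner {
 public:
  void Start(const std::string& job) {
    std::lock_guard<std::mutex> lock(mu_);
    running_.insert(job);
  }
  void Finish(const std::string& job) {
    std::lock_guard<std::mutex> lock(mu_);
    // One lock covers both sets, so a job moves atomically from
    // running_ to done_ and is always in exactly one of them.
    running_.erase(job);
    done_.insert(job);
  }
  bool IsDone(const std::string& job) const {
    std::lock_guard<std::mutex> lock(mu_);
    return done_.count(job) > 0;
  }

 private:
  mutable std::mutex mu_;          // the only mutex JobRunner needs
  std::set<std::string> running_;  // guarded by mu_
  std::set<std::string> done_;     // guarded by mu_
};
```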
of you who were at Herb's plenary talk this morning,
I tend to disagree with his first example
because it has this problem. His example where he had a
class with data and a mutex, but the client was
responsible for locking it. I think that's problematic
for this reason. And I can go into that a bit more later. As a consequence of this
issue, we generally don't make types thread-safe unless
their primary purpose is to be directly shared between threads and those threads can mutate the object. So for example, std
mutex and std atomic are thread-safe types but
ordinary library types like std string are not. Fortunately, it turns out
that there's a way for a type to provide a lot of the
benefits of being thread-safe while paying very few of the costs. This is essentially the same
as the previous example, but now we're sharing an
int instead of a widget. In this case, it's a lot easier to tell if there's an API race, because as we saw earlier,
the language will say that API races on built in types or in other words data races, only occur when someone
is modifying the object. And in C++ we have const,
so the type system can keep track of which inputs
a function can modify. And that means to figure
out if this code is safe, we don't have to dig
through the implementations of foo and bar because
it's enough just to know their signatures. With these two signatures,
the type system guarantees that shared int won't be mutated, and so there's no API race in this code. So the question is, can we
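A sketch of that reasoning (foo and bar here are hypothetical stand-ins with the const signatures described): both calls take the shared int by const reference, so the type system guarantees neither mutates it, and the concurrent calls are safe:

```cpp
#include <thread>

int foo(const int& x) { return x + 1; }  // const input: read-only
int bar(const int& x) { return x * 2; }  // const input: read-only

// Two threads reading the same int concurrently: nobody modifies it,
// so there is no data race, and no API race.
int read_concurrently(const int& shared) {
  int a = 0, b = 0;
  std::thread t1([&] { a = foo(shared); });
  std::thread t2([&] { b = bar(shared); });
  t1.join();
  t2.join();
  return a + b;
}
```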
apply the same reasoning to a widget, is this code
guaranteed not to have an API race? Not in general, but it
is guaranteed to be safe if widget provides the
same safety guarantee as a built in type. In other words, if it
guarantees that an API race can only occur if one of the
operations is a mutation. And at Google we say that
a type is thread-compatible if it makes that guarantee. This isn't quite as simple a
guarantee as being thread-safe but it's pretty close and
it's far easier to achieve because thread-compatibility composes. If all your members are thread-compatible, chances are your type is thread-compatible with no extra effort on your part. And chances are, your
members are thread-compatible because nearly all -- or sorry, all built in types are thread-compatible, nearly all standard library
types are thread-compatible, and all thread-safe types
are thread-compatible by definition. You only really have to go out of your way to be thread-compatible
if you have const methods or friend functions that
modify some part of your physical state, or in other
words, they're logically const but not physically const. And the type system normally stops you from doing that accidentally
so the main things to watch out for are members
that are marked mutable since that indicates
code that is explicitly opting out of that const type checking. So for example, here
we have a pretty stupid string view like type that
computes its size lazily. And so that means that the
size accessor actually modifies the size member even though
the accessor is const because it doesn't modify
the observable state of the object. And that means that we need to
add explicit synchronization in the form of a mutex in
order to make this class thread-compatible. As an exception to that
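A sketch of such a lazily-computed-size type (hypothetical, loosely following the slide): the cached size is mutable, so the const accessor synchronizes with a mutable mutex to stay thread-compatible:

```cpp
#include <cstring>
#include <mutex>

class LazyStringView {
 public:
  explicit LazyStringView(const char* data) : data_(data) {}

  // Logically const but not physically const: the first call computes
  // and caches the size. The mutex makes concurrent const calls safe.
  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mu_);
    if (size_ == kUnknown) size_ = std::strlen(data_);
    return size_;
  }

 private:
  static constexpr std::size_t kUnknown = static_cast<std::size_t>(-1);
  const char* data_;
  mutable std::mutex mu_;                // mutable so size() can lock it
  mutable std::size_t size_ = kUnknown;  // hidden physical state
};
```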
rule, it's okay to have a mutex as a mutable member and in fact, mutex members usually
should be marked mutable so that you could lock
them in const methods like in this example. And that's good news because
if making the mutex mutable meant that you need another
mutex to protect it, then we'd be in trouble. More generally, you might
be able to avoid adding synchronization if your mutable members are thread-safe but only if you make sure that your const methods never break your type's invariants, even temporarily. And that can be pretty easy to mess up. So generally speaking,
it's better to avoid that whole can of worms and
just make your const operations be physically const wherever possible. Keep in mind though that if
some of your object state is behind a pointer, then that
state will behave as though it were a mutable member
even though it's not marked with the mutable keyword. So that's a situation that you
need to keep an eye out for. As a final caveat, I should
mention that we stole the term thread-compatible from the Java folks, but we gave it a stricter meaning. In both Java and C++,
thread-compatible is the term for the baseline level of thread-safety that nearly all types should provide, but in Java it doesn't
mean that concurrent reads are safe because Java
has no language level concept of a read-only operation. Java doesn't have const. So just be aware that if you
hear this term in the context of other languages it may
mean something different. So we've seen that if
widget is thread-safe, then this example is safe,
and if it's thread-compatible, then we just need to make
sure that neither foo nor bar are taking non-const references. But what if widget isn't
even thread-compatible? In that case, unless widget
is giving you some kind of custom guarantee, it's
gonna be nearly impossible to be sure that code like
this is safe as written. Instead, you have to use
a mutex or some other synchronization outside
the object to ensure that only one operation at
a time has access to it. But if you can make sure that
only one thread at a time accesses the object,
that object is guaranteed not to be the site of any
API races, no matter what. As I mentioned before,
the most common reason that types fail to be
thread-compatible is because they have mutable members
or in some other way, they have unusual or
broken const semantics. And here is an example that
is near and dear to my heart. We have here a counter struct
that has a call operator that increments its c member. And the call operator is not const, so counter is a perfectly well behaved thread-compatible type. But when we wrap it in std
function everything goes south. We're calling f in two different threads, and f is declared const, so we know that we're
invoking a const operation, and yet this code contains
an API race because the counter is getting incremented
in two different threads with no synchronization. Even though std function's
call operator is const, calling it concurrently
can be an API race, which makes std function
one of the few types in the standard that's
not thread-compatible. And the reason is that std
function stores the underlying function object as the moral
equivalent of a mutable member. The good news is that std
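The effect can be seen even single-threaded (Counter sketched from the slide): calling a const std::function still mutates the stored callable, which is exactly why concurrent const calls can race:

```cpp
#include <functional>

struct Counter {
  int c = 0;
  int operator()() { return ++c; }  // non-const: mutates c
};

// std::function's call operator is const, yet it forwards to the
// wrapped Counter's non-const call operator, mutating state that the
// std::function stores as the moral equivalent of a mutable member.
int call_const_function_twice() {
  const std::function<int()> f = Counter{};
  f();          // const call, but increments the stored Counter
  return f();   // second call observes the mutation
}
```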
function is a rare exception. Most types are thread-compatible or have thread-compatible alternatives, so you're rarely stuck with
a non thread-compatible type. As I said earlier, in C++
thread-safety is primarily about types not about functions, but there are some rare cases
where you do have to start thinking about functions. So going back to our widget example, if the widget is
thread-compatible and foo and bar take the widget by const reference, that's still not quite
enough to guarantee that there's no API race between
these two lines of code. Specifically, foo and bar
might say that you're not permitted to invoke them
concurrently at all, even if their inputs are different. For example, their implementation
might look like this, where behind the scenes
they're both mutating the same static int
with no synchronization, and that would be an API race, if you called them concurrently. So when a function call
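A hypothetical sketch of that thread-hostile shape (the names are assumed): both functions quietly mutate the same static, which behaves as a hidden input to every call:

```cpp
namespace hostile {

// Hidden shared state: not an argument to either function, but an
// input to both. Concurrent calls to foo and bar would race on it.
static int hidden_state = 0;

int foo() { return ++hidden_state; }
int bar() { return ++hidden_state; }

}  // namespace hostile
```

Called sequentially this is merely surprising; called concurrently it's an API race (and a data race) on hidden_state, which is what makes both functions thread-hostile.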
can create an API race on an object that's not one of its inputs, we say that the function
is thread-hostile. And notice this is a
property of a function, not a property of a type. And it's virtually always
because the function is accessing some data
other than its inputs. When you're calling a
thread-hostile function, all bets are off. You have to check the documentation to figure out how to call it safely, or better yet, don't call it at all, and find a better function with
no behind the scenes inputs. Functions virtually never actually need to be written this way. It's almost always an accident
or a mistake of some kind. By the way a lot of
sources refer to functions that are not thread-hostile
as thread-safe functions. And that's particularly common
in sources that are focused on c like the Posix definition
that I showed earlier. I recommend avoiding that
terminology because people tend to assume that it
means more than it does. Especially when you
describe a member function as thread-safe. Furthermore, in modern
code, you can and should just assume that every
function is not thread-hostile unless it specifically says otherwise, so we don't really need a special name for functions that are not thread-hostile. With those definitions in hand, we now have a pretty simple procedure for reasoning about the
thread-safety of a line of code. A given line of code is
guaranteed to have no API races if it doesn't call any
thread-hostile functions, if there are no lifetime
issues, and if each input is either not being accessed by other threads or it's thread-safe, or
it's thread-compatible and not being mutated. And that's pretty much all you need in order to avoid API
races in the first place, or at least track them
down after the fact. There are a couple subtleties though, mostly around the issue of
what counts as an input. So consider this example. Like almost all standard library types, vector is thread-compatible,
not thread-safe. And here we have two different
threads mutating the vector. So does this code contain an API race? Who thinks it does? Some of you. Yeah, I would argue no it doesn't. This code is safe because
the threads are mutating different elements of the vector and those elements count
as separate inputs. You have a question? - [Man] Yeah if you go back
to the last slide a second, I don't know if I just
missed this but could you just go like what you mean
by an input being live again? - By an input being live? Just in the C++ sense that
it hasn't been destroyed. As I mentioned earlier,
you have to handle lifetime separately outside the type in some way. - [Man] Cool, thank you. - You're welcome. So as general principle,
when a thread-compatible type exposes some of its sub-objects, like how vector exposes its elements, for thread-safety purposes you can treat each sub-object as a separate object. And you can also treat the
remainder of the parent object as a separate object. And that applies not only to
the elements of containers, but also the members of pairs and tuples, the values of types
like optional and variant, and public data members
of classes and structs. However, that only applies
if those sub-objects are real objects that the API gives you direct public access to. And here's the exception
that proves the rule. This is almost the same
example as the previous one, only now we're working
with bools instead of ints. And this example does contain an API race and that's because the
elements of a vector bool aren't real objects, they're just notional boolean values that are represented using some unspecified implementation details inside the vector bool. The rule of thumb is that
for thread-safety purposes, a sub-object is
independent only if you can take its address or
form a reference to it. And you can't do that with
the elements of a vector bool, so this code is not safe. And that principle means
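The contrast can be sketched like this (assuming the vector<int> case from the earlier slide): each int element is a real object whose address you can take, so the two threads have disjoint inputs:

```cpp
#include <thread>
#include <vector>

// Two threads mutate different elements of the same vector<int>. Each
// element is a distinct memory location, so there is no API race. The
// analogous code with vector<bool> would not be safe: its "elements"
// are packed bits with no addressable object of their own.
std::vector<int> mutate_separate_elements() {
  std::vector<int> v = {0, 0};
  std::thread t1([&] { v[0] = 1; });
  std::thread t2([&] { v[1] = 2; });
  t1.join();
  t2.join();
  return v;
}
```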
that if you're writing a thread-compatible type you
need to be thoughtful about when you're exposing pointers, references, or even const references
to internal objects. If you do that, you
have to clearly document which of your operations can
read, mutate, or invalidate those sub-objects. Of course, you have to
do a lot of that anyway, or your API will be confusing
even for single threaded users but thread-safety raises the stakes. And by the same token, when
you're using one of these types in a multi-threaded
context you need to have a clear mental model of what operations on the parent object can
access the sub-objects. So going back to the vector int example, there's a related issue
that it highlights. Sometimes a non-const
method isn't a mutation. In this case, the square
bracket operator overload is non-const but it doesn't actually mutate the vector itself, it just gives non-const access to one of the vector's elements,
to the calling code. So sometimes you can
treat a non-const method as being non-mutating for
thread-safety purposes but only if their API guarantees that they really are non-mutating. And the standard containers
make that guarantee for the square bracket
operator and for most of the other methods that you'd expect. So here's another way that
the notion of an input can be tricky. In this example, we have
two concurrent calls to f and they have no arguments in common. Nothing that's an argument
to one is an argument to the other. And yet, this is an API
race because they're both incrementing the second
element of v concurrently because they're operating
on overlapping ranges. You could argue that means
that f is thread-hostile, but we would hope not because
f is a perfectly reasonable and well-behaved looking function. This could have been a standard algorithm if I'd wanted to do it that way. So I claim that no, f
is not thread-hostile, and the reason for that
is that even though the second element of v isn't an argument to either of these calls, it's an input to both of
them because it's clear at the point of use, at the
point of the function call, that these functions are
going to access that object. As other examples of
the same kind of thing, an object might be an input
without being an argument if it's a sub-object of an
argument, or if it's pointed to by an argument, whether by a pointer or a
reference or a smart pointer or what have you. The thing to be extremely careful of here is if you have a private class
member that points to data that might be shared. Like in this example here. In this case, the widget has
a hidden pointer to a counter but the counter isn't
necessarily private to the widget because it gets passed in
through the constructor. And that means that we
can wind up in a situation like we see on the right where
these two calls to Twiddle have an API race even though they have completely separate inputs. And that makes Twiddle a
thread-hostile function. So it's much better to avoid having private handles like that, but if you can't, one
option for dealing with that is to make sure that the
shared data can't behave like an input for thread-safety purposes. And at a minimum that
means that you need to make sure that the shared
data has a thread-safe type. It's not enough to add a mutex to widget because the int is potentially
shared by multiple widgets and maybe even by code
that's completely unrelated to widget. So you have to switch to
sharing data whose type is inherently thread-safe. So for example we could change
this code so the counter is a pointer to an atomic int rather than and ordinary int. You also have to be very
attentive to the risk of race conditions in a case like that if you have any invariants
that relate the shared data to other parts of your program. So for example, that fix
of turning the counter into a pointer to an atomic int, might be sufficient if
the counter is just used for monitoring or something like that, but if the counter actually affects the logic of the program, I'd be very worried about this code, even if the counter were atomic. The other option for
dealing with this situation is to make your type
very very explicit about the fact that it points to external data and so that data is potentially
input to any function that takes your type as an input. And that's what the iterator was doing in the previous example. An iterator type is a type
that is very very explicit about the fact that it
confers access to some underlying range and
so if a user sees code that passes an iterator as an argument, they're not surprised
when the underlying data gets accessed. As a final note, it's
important to keep in mind that the more layers of
indirection there are between the formal arguments
and the actual inputs to your code, the harder it can be to
determine whether the inputs are mutable because the
const part of the type system essentially has a harder
time taking hold. So for all those reasons,
it's generally better to keep the relationship
between the arguments and the actual inputs as simple
and as direct as possible. So much for the theory, how
do we actually apply this in practice? When you're defining types,
you should make sure that your types are at least
thread-compatible if possible and you should avoid patterns
that make thread-compatibility hard like mutable members. You should make your
types thread-safe if you expect public mutable instances
of the type to be accessed by multiple threads, but
otherwise you don't need to worry about making a type thread-safe. You should always document
if your type is thread-safe or if it's not thread-compatible
because those are the unusual cases. Of course it's better to
document all three cases and to document every
type as being thread-safe, thread-compatible, or
not thread-compatible. But if you omit thread-compatible, that's what readers will tend to assume. Assuming you make your type
at least thread-compatible, you should be thoughtful
about directly exposing any of your sub-objects
because that requires you to make it clear to the
user how the sub-objects relate to the parent object
for thread-safety purposes. Question? - [Man] You seem to --
your classification is thread-safe, thread-compatible,
and not thread-compatible. - Yes. - [Man] You're not saying a -- - I'm not saying thread-unsafe, yeah. Internally we've, uh, we've talked about
thread-safe, thread-compatible, and thread-unsafe. The thing I don't like about that is that it's not a hierarchy.
also a thread-compatible type, but every thread-compatible
type is not a thread-unsafe type so it's a little -- it seems
a little simpler conceptually to me to just talk about
thread-safe and thread-compatible as two strengthenings of the
baseline fact that it's a type. Does that make sense? - [Man] Yeah, I buy it, cool. - So when defining functions,
including member functions, you should avoid making them
thread-hostile at all costs, just never do that. And that means that there
should be no hidden, mutable shared state and you
should be very very careful about having private pointers
to data that might be shared across threads. If you have a thread-hostile
function that you can't fix, you should document very
clearly that it's thread-hostile and explain how to use it
safely or point readers to safe alternatives. When you're writing
concurrent application code, avoid calling thread-hostile functions, and make sure that all inputs
to a given piece of code are either thread-safe, not
being accessed by other threads, or thread-compatible
and not being mutated. And usually the best way to do that is to make sure that all shared objects are either thread-safe or
thread-compatible and immutable. And if you need to share
state that doesn't meet those requirements, define a wrapper type
that does using a mutex. And in the very rare cases where you can't follow those guidelines, read the documentation
and be very very careful. And with that, are there any questions? (applause) Thank you. (applause) - [Man 2] Question, why
did you not talk about re-entrancy concepts? - Sorry why didn't I talk about what? - [Man 2] Re-entrancy. - Re-entrancy? - [Man 2] Re-entrant. - Um, mostly because -- - [Man 2] It's very close
to thread-compatible right? - Re-entrancy has, particularly it has some close connections
to thread-hostility. Honestly the reason I
didn't talk about it is just that it doesn't
come up that much for us. Normally if a function
has re-entrancy problems, it's probably gonna be thread-hostile too. So that's the main reason
I didn't talk about it. I will say there's been
some discussion recently on the committee mailing
lists about what exactly do we mean by re-entrancy. Some people were thinking
that it's just about functions and other people were thinking
that it's about types. That like, whether a member is re-entrant is a question of whether
you can re-enter a function on the same instance
versus separate instances. And that's all pretty fuzzy,
but again, in practice, if you're writing code
in a concurrent world, at least we haven't
found that re-entrancy is an issue that comes up very much. - [Man2] Okay thank you. - [Man 3] My question is
about your definition of what is a data race. Let's consider a case when two threads write to the same stream or something, of course -- - Sorry I'm having trouble hearing you. - [Man 3] So let's consider the situation: two threads write to the same stream simultaneously, and it is a data race. Then I add a mutex to (unintelligible) that as a function; well, from the definition, it's not a data race anymore. The question is, if I add a mutex whose only purpose is to throw an exception when the mutex is already locked, in this situation is it a data race, or is it kind of not a data race, because, well, simultaneous access to the object generally triggers the exception. - I'm sorry I'm having
trouble hearing the question. - [Man 3] If I cover a possible API race with a mutex, but it just throws an exception in the other threads that are not able to acquire it, is it a data race? - If you have a mutex but it's just for -- - [Man 3] So exception, if it's locked. - Throwing an exception if it's locked? - [Man 3] Yes. - That, I think it would
depend on the specifics of the situation, but it sounds
like, so long as you, uh -- at the end of the day, an
API race is just a situation where you have concurrent access
that the object's contract doesn't permit. If you have a mutex
guarding an object then whether you throw if the mutex is held, or block if the mutex is held, shouldn't matter so long as it enforces mutually exclusive access. - [Man 4] I kinda wanna take slight issue with one of your earlier points about not having the possibility of an API race if you don't have concurrent access. And it's sort of a special
case but I've definitely run into it before. I would say you would be
correct if that was qualified to say that the functionality
does not explicitly or implicitly rely on
anything that is tantamount to thread local storage. However, I have run into cases
where you can get API races depending on which pseudo-thread context accesses the object first and
this is particularly the case with certain types of COM objects in the apartment model
that might get your object implicitly loaded into it the
first time that you access it. So my slight quibble with that
is you have to qualify that by saying there's no implicit
reliance on something like thread-local storage
otherwise there still exists the possibility of an API race. And I guess the follow up to that is, is there any context in
your internal nomenclature to mark something that would
be potentially dependent on something akin to thread-local storage for that type of situation? - Um so I think I would be
inclined to describe a situation like that by saying that the
operations that potentially access that thread-local
storage are thread-hostile. Because the -- at least if
I'm understanding correctly, you're saying that you
could wind up in a uh, in a race even if the
inputs are different? - [Man 4] It's not a classic
data race, it's more of your API race where the
behavior of the underlying APIs might be different
depending on which apartment context you had implicitly loaded into, which is based on the
timing of which thread accesses the object first. - Mhmm. Yeah I guess uh -- - [Man 4] I mean you could
construct a simple example where you had a thread-local
storage variable and depending on which thread,
it did something different. - Yeah, I guess I would
say that I would classify that as a form of thread-hostility
because it's about hidden implicit inputs, namely
the thread-local storage, which isn't explicitly
referenced by the function call. - [Man 4] Okay, I just wanted to note that you can have those even if
you're doing effectively immutable operations against
the particular object. - Sure, yeah. Once you're in the realm of
thread-hostility, the whole -- basically thread-hostility
is where the notion of types as an abstraction for
dealing with thread-safety breaks down and you
have to start looking at individual functions and
things like immutability stop mattering as much. - [Man 4] So then would
it be fair to say that you would characterize
anything that's implicitly dependent on thread-local
storage context as implicitly thread-hostile? - Um, if it's implicitly
dependent in the sense that the contract forbids you from calling it on different threads then uh -- - [Man 4] In this case you
can, you just have to be aware of -- - Well if the contract
permits it, then yeah, we're no longer in the realm of API races, we're talking about race
conditions where you have some, you potentially have some
higher level logic bug that depends on threads
or timing in some way. But you're not violating
anybody's contract. - [Man 5] Hi, how does your classification differ from how the standard talks about thread-safety of its types
and is the standard gonna move in the direction of
what you're describing or -- - So the standard for the
core language just talks about data races and built-in types. So the core language
doesn't really have to worry about these issues. The library standard does. And the library wording
in this area is very mushy and like, even the people
who wrote it are not very happy with it, but it
actually attempts to capture exactly these concepts. It was co-written by a Googler
who's aware of this concept. The intent roughly speaking is to say that standard library types are
thread-compatible unless otherwise specified. - [Man 5] And the way that
they talk, you mentioned the case of the vector bracket operation being a non-const operation
but it's not an API race. How do they express that and is that -- I'm just curious because
your terms seem pretty clear. I'm just curious if the
standard is going to move in the direction of
describing types in this way and describing exceptions to
them using this framework. - I'm not aware of any movement toward -- also, well, there's two
parts to the answer. Regarding the square
bracket operator case, there's wording, I believe
it's in the general description of containers
that says methods with these names are -- it
doesn't say methods but, um, functions with these names uh, don't access or, don't mutate the object for purposes of determining a data race. That's one thing that I think is not ideal about the library wording,
is that the library wording is still talking about data
races rather than API races. The other half of your answer
is there's one small way that the standard is starting
to move in this direction. There's a proposal that
is pretty well-advanced, although it's probably gonna
go into the concurrency TS rather than the standard right away, where there will be an is-race-free trait that defaults to false, but defaults to true for const types, and is then overridden to be true for things like std::atomic. And that intentionally
reflects the notion that we can just assume that
types are thread-compatible unless they say otherwise. - [Woman] Your talk
was mostly theoretical. What about more practical
advice, like what problems with (unintelligible) you have mostly in production, like, I don't know, people using mutable in production or -- - Sorry I'm having -- can
you move closer to the mic? - [Woman] I'll try to repeat. So your talk was mostly about theory; what about more practical advice, like, don't use mutable
because we had too many production incidents
or something like this. - Yeah, um. I wouldn't say that uh -- I can't draw a straight line
between much of this guidance and production issues. This is mostly about how
to organize your thinking about concurrent code
and make it tractable. I don't really have any
specific practical advice to add beyond what I've already said, particularly the stuff
from the last slide. - [Man 6] If you wanna
take it offline afterwards, I have a fair chunk of
practical advice from Google on this front. - Any other questions. Oh yeah. - [Man 7] Hi Geoffrey, um more
of a question slash comment. Are you familiar with exception safety, the distinction between strong and basic exception guarantees? - Somewhat, yeah. - [Man 7] I was just wondering if the terms strong versus basic are
better adjectives for what you're describing? - I don't immediately see any
problem with calling this, like saying strong thread-safe
and basic thread-safe, it's a little more wordy. I can't claim that these are, you know, the best possible terms. These are just terms
that we've used at Google for quite a while and they've worked.
just a suggestion I guess, but like you said, when people
say just the term thread-safe they mean different things. - Yeah, so Herb Sutter
actually mentioned to me a couple of days ago
that he prefers the terms internally synchronized
in place of thread-safe and externally synchronized
in place of thread-compatible. I'm not sure I like that
terminology quite as much because types that are
internally synchronized might not actually -- like, std::atomic
doesn't contain a mutex, and an externally
synchronized type might not actually be externally
synchronized at all. It might just not be
accessible to other threads, things like that. But that's another set of
terminology that exists for this. - [Man 8] A couple quick things. First you mentioned std::function, you mentioned the problem that was there. I know your answer to this,
but I think it would be great if you said it here. You consider std::function
to be const, correct? - Yes, I think this is
a bug in std::function, plain and simple. There have been attempts to fix it; time's starting to run out for C++20, but maybe we can make it. - Yeah and I guess, you
mentioned that there were other places where we are
not thread-compatible, did you have any other ones
in mind, other than like, you know, vector<bool> or
well, not vector<bool>, other than std::function I should say. - std::function is the main one, is the main type that I
know of in the library that's not thread-compatible. There's, I believe, there have been -- there's been at least one
type in the standard, I think, it was some kind of reverse
iterator something like that, something that was
specified in terms of having a mutable member and thereby
became thread-incompatible, but I think that was fixed
and off the top of my head I don't remember which type it was and I haven't tracked it down. - [Man 9] Do you have
any thought on how to document something like std::map's operator[], which is only thread-compatible
when insertions do not occur? - Um. I don't think there's any
good shorthand for that, I think documenting a situation like that is just gonna be a matter of
writing complete sentences. I think the square
bracket operator on maps is a fiasco in a bunch of different ways. I'd really like to see a
proposal for overloading the square bracket equals, in other words, a mutating square bracket separately from an accessing square bracket. But that hasn't happened yet. But yeah in the mean time, operations that don't look
like they mutate but do are very problematic when
it comes to thread-safety for precisely this reason. Go ahead. - [Man 10] Obviously
you talked a lot about how you guys talk about thread-safety with respect to objects. Is there any movement or
impetus to attribute types
appropriately and then do some compile-time checking on that? And then just as a total aside note, I'm gonna take a slight disagreement with you on std::function. I think std::function
internally has the equivalent of a private pointer and it
has the exact same problems as any other class that
has a private pointer, where the pointer itself can be const, but it can have mutable
operations inside it, and if there were an
effort to attribute code that would have to somehow be encapsulated where the type is const but
the things that it could be pointing to may be non-const. - So, taking that sort
of in reverse order, um, I see your point but I
disagree with the notion that std::function is
essentially a pointer-like type. And the reason I disagree with
that is that std::function's copy constructor performs a
deep copy of the underlying function object. Which is actually turning
out to be kind of a problem because it means that you
can't use std::function to wrap a move-only function object. So that to me makes it
much more of a value type than a pointer type. And makes the behavior that
I showed const-incorrect. As for attributing and this kind of thing, we haven't done any
work in that direction, we do have some --
there's some thread-safety annotations that I think
are now public in Clang, but that's more for marking things like: this data member needs to be protected, you need to be holding this
mutex when you're accessing this data member, that kinda thing. And it's a pretty limited
best effort kinda thing. I have a sort of pipe dream,
but one of the reasons why the experts won't let me
call an API race a data race is that they wanna
reserve the term data race for things that TSan can diagnose. I have a sort of pipe
dream that maybe we could attribute types as being thread-compatible or thread-safe and then
have TSan diagnose those rather than waiting until
they turn into actual data races but there are
some formidable obstacles to actually making that work. Any other questions? Okay well thank you for coming. (applause)
I think this is one area where C++ should look at how Rust addresses the issue. Data race safety is enforced in the safe subset of the Rust language. A similar type of enforcement is available in a (data race) safe subset of C++. (A subset that excludes raw pointers and references.) (Shameless plug alert.)
First of all, you want to note the distinction between a type being safe to pass (by value) to another thread and being safe to "share" with another thread. (Rust calls these "Send" and "Sync" traits.)
And rather than just "documenting" the safe "passability" or "shareability" of a type, it can be "annotated" in the type itself. This allows thread objects to ensure/enforce at compile-time that none of their arguments are prone to data races. And any type that needs it can be annotated by wrapping it in a transparent "annotation" template wrapper.
For example, it's not really safe to share std::vector<>s among threads, because any thread could obtain an iterator to the vector and inadvertently dereference it outside the period when it's safe to do so. But you could imagine a vector type that (is swappable with std::vector and) does not support (implicit) iterators, and so would be more appropriate for sharing among threads. (And recognized as such by the thread objects.) You could even imagine a data type that safely allows multiple threads to simultaneously modify disjoint sections of a vector or array.
I think we really want to get beyond having to explicitly deal with mutexes. Just like how std::shared_ptr<> (and std::unique_ptr<>) is premised on the notion that it's not a good idea for the lifetime of a dynamic object and the accessibility of that object to be manually coordinated by the programmer, it's similarly not generally a good idea for the synchronization of the object to be manually coordinated (with its lifetime and accessibility) by the programmer. I think the obvious progression is to have reference types that automatically (safely) coordinate lifetime, accessibility and synchronization of dynamic objects.

Is vector::operator[] guaranteed to be "thread safe"? I don't see anything on it that says it's safe to call concurrently. He uses this in an example (on a non-const vector) talking about how it's safe, but I don't see how that's guaranteed.
relevant part of the talk: https://www.youtube.com/watch?v=s5PCh_FaMfM&t=23m14s
edit: I'm asking VERY SPECIFICALLY about whether the code inside std::vector::operator[] is guaranteed to be thread safe. The const-qualified version certainly is, but what about the non-const-qualified version?