Concurrency in C++: A Programmer’s Overview (part 1 of 2) - Fedor Pikus - CppNow 2022

Captions
So it's a two-part talk on concurrency in the C++ language, a very general overview of the concurrency features we have in the language and how they evolved with the standard. I don't want to be, basically, the audiobook for cppreference.com; you can go read that yourself. I'll show you not everything but a lot of things: what they are, where to find them, what they are for, and if you want to know more about each individual construct, object, class, or function, you can go read about it. More importantly, I'll tell you the reasons for them and the kinds of lessons you can learn from the design of these features in the standard for the design of your own software.

At the same time, I'm not going to be speaking in standardese. The programmer's perspective basically means "what's in it for me as a programmer", and that depends on who you are with regard to concurrent programming. In a lot of ways it's a choose-your-own-adventure: I have never given this talk as a two-part talk, so I added a lot of material, but I also don't want to bore you with something everybody knows. So let's see where you're all at. Who has never written a concurrent program in C++? OK, so everybody has written some concurrency. Who has the luxury of starting a new project and is not bound by an existing code base? Quite a few people. Who is working on an established system that has concurrency? And who is writing libraries? More than half of you.

So: if you are starting your first concurrent program, you get a basic set of tools with a few things missing, which is almost enough. If you are an experienced programmer starting a new project and you want to use nothing but the standard, you get enough to get you hooked, not enough to actually complete the job. If you're working on an established software system, you probably already have everything you need, so why would you need this at all? I'll answer that question. For the library writer, it's one of the ways to get all the low-level tools you need; not the only way, but one of them. More importantly, and that's for everyone: you get guarantees provided by the language, you get a dictionary, a vocabulary you can use to speak to other programmers that is common and enshrined in the standard, so you have certain assurances that you will understand each other, and a common way to think about and reason about concurrency problems, so when you discuss your concurrent designs with other people you have a common starting point. A lot of the presentation will actually be about those last three items.

With that, let's start the first part, which is the introduction to the language. In the introduction I'll show you the basic language features, tell you when they appeared, how to use them, what they do, and show a few of them in real life. Let's start with before C++11. In C++03 and earlier, this is all the standard has to say on the subject of threads; it fits on one slide. There is no word "thread"; there is no word "concurrency". Why does it matter? Because in practice, who was writing concurrent programs before C++11 rolled
in, in C++ I mean? Quite a few people. So why does it matter? We'll talk about that specifically. Then C++11 came in and introduced the thread, both as a type, the std::thread object, and as a concept; the memory model, which we'll defer for a little bit; a bunch of concurrency primitives; promises and futures; atomic operations; memory barriers, which relate to the memory model; thread-local data; and thread-safety guarantees. We'll cover all of those. C++14 added a couple of synchronization primitives to the standard. C++17 added a few more, and the big thing was the parallel STL, which we'll cover in the second half. C++20 added a few enhancements here and there to existing classes, and the big one was coroutines. I'm talking specifically about concurrency additions; of course there are concepts and other things, but those are orthogonal to concurrency.

OK, let's start with C++11, and go ahead and ask questions as we go. In order to understand the importance of what C++11 did, because it's more important than "they gave us a couple of headers" (a couple of headers is really not that big a deal), we have to understand what we didn't have before. What was really missing?

This is how you wrote code for concurrency in C++03. It looks something like this; you might have had a class that encapsulated it, but stripping off that class, this is what it boils down to. If we have some computation that needs to be locked, then on a POSIX system we called the pthread functions, and on Windows you called the Win32 API. Now let's look at it the way a pure C++ compiler looks at it: we know only the C++ standard from this point forward, nothing else. We clearly notice that pthread_mutex_lock operates only on the mutex. Let's assume that the compiler either knows there is no aliasing or can see the implementation of pthread_mutex_lock and knows that x is not involved anywhere in it; same for unlock. The middle line clearly does not depend on the mutex. One of the most common optimizations compilers do is hoisting common subexpressions out of a loop, and according to the C++03 standard that is a legitimate optimization here. Whether it has to repeat the work ten times or not depends on what it knows about those functions, but either way, you really don't want the compiler to do this. Everybody understands why you don't want the compiler to do this, yes? On the other hand, nothing in C++03 prevents the compiler from doing it. In practice, everybody wrote code like this, and it worked. Why did it work? Before we get to the reason it worked, here is a sketch of the kind of code we're talking about.
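The exact code from the slides isn't in the captions, so this is a minimal sketch of the pre-C++11 pattern being discussed, assuming a POSIX system; the variable names and the loop count are illustrative, not the speaker's.

```cpp
#include <pthread.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
double sum = 0;   // shared data guarded by the mutex

void accumulate(const double* v) {
    for (int i = 0; i < 10; ++i) {
        pthread_mutex_lock(&mutex);
        sum += v[i];                  // the only line that touches shared data
        pthread_mutex_unlock(&mutex);
    }
}

// To a compiler that knows only C++03 and can prove there is no aliasing,
// the middle line does not depend on the mutex at all, so in principle it
// could be hoisted out from between the lock and unlock calls, or the whole
// critical section could be rearranged around the loop. Nothing in C++03
// forbids that transformation; only the POSIX rules about these functions
// did, in practice.
```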
And here is another thing you do: you have an array of ints, and on one thread you do some operations and assign a value into data[0], while on another thread you do some operations and assign another value into data[1]. Is that thread safe? Again, the standard says nothing about thread safety, because it doesn't even have that word, but whatever magical thing in practice guaranteed that the previous example with locks worked, does that same magical thing guarantee that this works? Out of the people who programmed concurrency before C++11, who thinks this was thread safe in practice? Who thinks it wasn't? What if instead of int we have some type T that is not int? It depends on the type T.

So now let's actually find out what the magic is. The magic was that every compiler that allowed you to write concurrent programs in practice took it upon itself to obey another standard, and in the case of pthread_mutex_lock that would be the POSIX standard. GCC, for example, was constrained by two standards, not one: the C++ standard and the POSIX standard. And POSIX, among other things, names pthread_mutex_lock and friends as things you shouldn't move code across; they have global side effects, and you should not move code across those invocations.

There were still a few problems with that. One: none of these standards was as portable as C++ itself. On Windows you had Win32, which also has its own rules for the concurrency calls, but not necessarily the same as POSIX's. Two: POSIX does not actually tell you what this array example does for all types T. In practice, on x86 it was thread safe for int if it was aligned correctly; if you changed it to bool it wasn't, and if you changed it to short it may or may not have been, depending on what code the compiler actually generated. With bool it was very easy to observe that this races and gives unpredictable results, because POSIX did not specify that accessing adjacent elements of a bool or char array is restricted in any way. So what was missing? That magic thing is called the memory model, and that's why an entire part of this talk is dedicated to the memory model, because that's what we got in C++11.
We had a POSIX memory model, which, as it turns out, does not provide sufficient guarantees for writing concurrent programs. I don't know whether Win32 provides sufficient guarantees or not, but in any case they would be different from the POSIX guarantees. And has anybody tried to cancel a thread before C++11, say in POSIX? If you did, you found out that different versions of the POSIX standard stated slightly different things about what should happen.

I'm going to defer the discussion of the memory model, because I don't want to go into a semi-philosophical discussion right away; I want to show you a few concrete things first. So C++11 came in and gave us the memory model, which I'll show you, but it also gave us some practical tools. Let's see the practical tools.

First of all, we got the thread. The way they implemented it, it's basically a type-erased thing: you give it a callable, you give it the arguments. The constructor is a template, but the object itself is not, so there is type erasure going on inside. When the object is created, the callable is invoked with those arguments on a separate thread. In C++11 you have to join it yourself; if you destroy it without joining, it's undefined behavior. Any callable returning void is allowed, and you can use lambdas. As I said, it's type-erased, because the object itself is not a template; inside, it's probably done with runtime polymorphism, like normal type erasure. A lot of libraries before C++11 made threads visible classes you would derive from, so the class itself was derived from a common base; it doesn't really matter, the way the standard chose is equivalent. There is no way to reuse this object after the computation has run to its end; you have to delete it.

Let's see how it works in practice. C++20 made one minor addition: it stuck a "j" in there, std::jthread, and all it does is call join in the destructor. It's the exact same thing, except the destructor automatically calls join, which simplifies a lot of coding; with a plain thread, if you destroy it without calling join, it's UB, and on GCC it will actually terminate.

Let's see what the code looks like. Ignore the lock_guard stuff for a moment. This is my thread: it starts a new thread and keeps printing asterisks in a loop, and this is the main program, which creates the thread and then keeps printing dots in a loop. So here they're both running at the same time, and their print statements are interleaved. They're on different sleep schedules; the two beats aren't commensurate, one at three and one at five, which is just how long I chose to sleep for, just to show you that they're running at the same time. A minimal sketch of that demo is below.
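This is a minimal sketch of what the demo might look like, assuming C++20 for std::jthread; the sleep intervals and iteration counts are made up, not the ones from the talk.

```cpp
#include <chrono>
#include <iostream>
#include <thread>

int main() {
    using namespace std::chrono_literals;
    // Worker thread: prints asterisks on its own sleep schedule.
    std::jthread worker([] {
        for (int i = 0; i < 10; ++i) {
            std::cout << '*' << std::flush;
            std::this_thread::sleep_for(300ms);
        }
    });
    // Main thread: prints dots at a different rate, so the output interleaves.
    for (int i = 0; i < 10; ++i) {
        std::cout << '.' << std::flush;
        std::this_thread::sleep_for(500ms);
    }
    // ~jthread() joins automatically; with a plain std::thread we would have
    // to call worker.join() ourselves or get undefined behavior (terminate).
}
```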
So it creates this thread and calls a function on it. Now, what do you use this for? Lots of different uses. Who uses threads for pure performance, just to get more computation done? OK, basically everybody. What about asynchronous operations that aren't particularly compute-intensive, where you just don't want to wait for them? If you're in an HPC context, you want to use more hardware resources: you have multiple cores, and you want to get more out of the hardware. If you're doing, for example, a UI, you just don't want the UI to freeze while your program does half a second of computation. Or if you're waiting on a socket, you don't want your main program to block while waiting: the task thread waiting on the socket doesn't actually use any CPU, so you aren't utilizing any more CPU, and your main program is still really the one thread running on that CPU; you just don't want to block waiting on the socket, so the second thread waits on it asynchronously and burns no CPU. Things like that.

So why isn't this enough? C++11 gave us threads; it seems like we're done. You have a lot of parallel computing to do: you create a thread, put the computation onto the thread, and join the thread when you want the result. Join will block if the function is still running; it will not block if the function is already done. Whenever the join completes, you're guaranteed the function completed, so whatever result it produced is finished. These are threads, so they run in shared memory: you have access to the same addresses, so if the thread writes into some variable, that's how you get the results out, and then you can look at the data. So we're done? Well, not quite. I have to explain to you now what threads are and what the nature of threads is.

First of all, there are two kinds of threads, or at least two that we're talking about. There are kernel threads, which are independent instruction sequences from the point of view of the processor, and of the OS, which manages them: the operating system schedules kernel threads to run on the CPUs. You can have more kernel threads than CPUs, but at any given time one CPU is executing the instruction sequence of one kernel thread. If you have SMT, then one CPU is effectively two virtual CPUs that can execute instruction streams from two threads, but for the purposes of this talk I'll just treat SMT cores as two separate cores. SMT stands for simultaneous multithreading; that's what Intel calls hyper-threading. It's really one core with a little extra hardware that allows it to have another program counter. SMT is the general term; it was around before hyper-threading, and even on some Intel machines the BIOS option is "enable SMT". It's not going to be important for the purposes of this talk. The CPU info says that I have 16 CPUs on this machine with SMT enabled, so I'm going to pretend I have 16 CPUs for the duration of this talk.

Now, at any time, as I said, one kernel thread runs on each processing core, and the kernel is in charge of deciding which one. When you create a new thread in user space, what you get is a user thread, and the operating system runs all the user threads on however many kernel threads it wants to use for running your user threads. (A quick way to ask how many hardware threads you have from code is sketched below.)
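A small aside, not from the slides: the standard library lets you query the number of hardware threads directly, which is the number the talk refers to as "16 CPUs".

```cpp
#include <iostream>
#include <thread>

// Ask the implementation how many hardware threads (cores, or SMT "virtual"
// cores) are available; 0 means the value is not computable on this platform.
int main() {
    unsigned n = std::thread::hardware_concurrency();
    std::cout << "hardware threads: " << n << '\n';   // e.g. 16 on the talk's laptop
}
```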
Now, if you've worked on Linux or on Windows, you're probably used to not having that distinction, and that's because both of those operating systems support what's called a one-to-one threading model: one user thread is mapped to one kernel thread. Has anybody worked on a system that had a different one, with more user threads than kernel threads? Solaris on SPARC, anyone? AIX? OK. There are operating systems that have what's called an N-to-M model, where N user threads are mapped to M kernel threads. From the user's point of view there are N independent instruction sequences; from the processor's point of view there are M. The OS maps your user threads to kernel threads and decides how; you can basically think of it as user threads being tasks that run on a thread pool formed by the kernel threads. That's the easy mental model for this.

Now, std::thread creates a user thread. No OS that I know of can manage a huge number of kernel threads, and by huge I mean millions. What would happen if you said "I'm going to take all my independent computations, create a thread for each one, just let them go, and wait on them all at the end"? In a program with a lot of intrinsic parallelism you would be able to fire off millions, maybe more, of independent tasks as std::threads, and on a one-to-one operating system it would go ahead and create millions of kernel threads. I'm not going to demonstrate this, because I would probably have to remove the battery from this laptop to force it to reboot. So in reality you almost always get the one-to-one model; your second option is a very badly broken N-to-M model. There is a lore of secret knowledge about how to write a functioning N-to-M model; in this room I think only one person has actually seen it, and that would be you: AIX was the only one I know of that had a beautifully working N-to-M threading model. You could go ahead and create 10 million threads and it managed them for you. You could run out of stack space; presumably, if you're calling that many functions at the same time, each one doesn't get that much stack, so you would have to reduce the stack size, which, just like on Linux, is configurable. But the important part is that it wouldn't try to actually run all of them at the same time: it basically ran a thread pool under the covers, treated your threads as tasks, and executed them.

The second reason you don't want to use threads directly is that threads are actually very expensive to start and join. At this point you should demand that I prove it, because you should never take any statement about performance on faith; performance should always be measured. So I am going to prove it. I'm going to do it repeatedly to average the results, and this time I'm actually doing it with a plain C++11 thread. I capture the clock before starting the thread; the first and only thing the thread does is capture the clock again; and when I join the thread I capture the clock once more. That tells me how quickly the thread can actually begin doing useful work after I ask it to start, and also how long it takes me to join it. A rough sketch of that measurement is below.
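This is a rough sketch of such a micro-benchmark under my own assumptions (thread count, repeat count, and output format are mine, not the speaker's); the numbers it prints will vary, which is rather the point.

```cpp
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    using clock = std::chrono::steady_clock;
    constexpr int kThreads = 8;     // assumed thread count
    constexpr int kRepeat  = 100;   // repeat to average out noise
    std::chrono::nanoseconds start_total{0}, join_total{0};

    for (int r = 0; r < kRepeat; ++r) {
        std::vector<std::thread> threads;
        clock::time_point started[kThreads];
        auto t0 = clock::now();
        for (int i = 0; i < kThreads; ++i)
            // The first (and only) thing each thread does is read the clock.
            threads.emplace_back([&started, i] { started[i] = clock::now(); });
        auto t1 = clock::now();
        for (auto& t : threads) t.join();
        auto t2 = clock::now();
        for (int i = 0; i < kThreads; ++i) start_total += started[i] - t0;
        join_total += t2 - t1;
    }
    std::printf("avg time to start a thread: %lld ns, avg time to join all: %lld ns\n",
                (long long)(start_total.count() / (kRepeat * kThreads)),
                (long long)(join_total.count() / kRepeat));
}
```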
With some number of threads: I can do eight on this machine without overloading it; I could do 16. As you can see, the average time to start a thread is about 0.1 milliseconds; to join the thread, twice as much; the worst case here is about the same. With 16 threads this is actually a pretty long time for computation: individual operations I measure in nanoseconds, and in a microsecond you can typically execute hundreds of thousands of instructions. So this is a lot, which means you can't put short computations onto these threads, because you would spend longer firing up the thread than doing the work. If your computation takes less time than this, you're better off doing it on the main thread than firing up a thread for it. There are several things you could measure: the throughput, meaning how many threads you can launch per second, or the latency, the delay. But the bottom line is that all of these delays are significant compared to the computation times. Tricky to measure, as I said, but we get the idea.

So how do we use threads correctly? Since threads are expensive to start and join, we have to start a small number of them: one per CPU, maybe two per CPU, but not a lot more than that. Then we have to manage our own work on those threads. You keep those threads alive for a long time, so you don't pay the overhead of starting and joining them, and you get them to execute the pieces of work, usually called tasks, that you have. When a thread is done executing a piece of work, you don't join it; you tell it to go find another piece of work to execute, and so on. This is true at least for computational threads, for HPC. Background threads that are just waiting on sockets are a lot easier, because the OS actually manages a large number of idle threads reasonably well; probably still not millions, but ten times more threads than CPUs is no problem if the threads are just sleeping. For example, in our distributed software system we talk through sockets to every remote machine, and we can run on twenty thousand of those, with one thread per socket, so we fire up twenty thousand threads just to manage the sockets. No problem, because they mostly sleep and wake up only when there is something to do with the socket. Background threads are easier; computing threads you want to manage yourself.

What tools do you have for that? I'll mention them briefly, and the reason it's brief is that I'm mostly mentioning them to tell you what not to use. I'm not going to demonstrate it either, because the end result would either not scale at all or kill the machine. There is std::async, there is std::future, there is std::promise. Future and promise are just types: a future is a placeholder for a result that will be computed asynchronously and materialize some time later, and a promise is where the thread that computes that result puts it temporarily until it becomes available through the future. By themselves they don't do you any good. std::async is the thing that actually executes the action asynchronously, or whatever computation you want. In theory, that's all you need; a sketch of what it looks like is below.
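A minimal sketch of the async/future pairing, assuming a made-up expensive function (count_primes is mine, purely for illustration):

```cpp
#include <future>
#include <iostream>

// A deliberately simple "expensive" task, only for illustration.
static long long count_primes(long long up_to) {
    long long count = 0;
    for (long long n = 2; n < up_to; ++n) {
        bool prime = true;
        for (long long d = 2; d * d <= n; ++d)
            if (n % d == 0) { prime = false; break; }
        if (prime) ++count;
    }
    return count;
}

int main() {
    // std::launch::async typically starts a new thread right away;
    // std::launch::deferred runs the task lazily on the calling thread
    // when get()/wait() is called, so it gives no extra parallelism.
    std::future<long long> result =
        std::async(std::launch::async, count_primes, 100'000);

    // ... the main thread is free to do other work here ...

    std::cout << "primes below 100000: " << result.get() << '\n';  // blocks until done
}
```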
In practice, if you want a background thread, std::async is not so bad; on the other hand, you might just launch a background thread, it's not that hard. If you want to do computations for performance, this async stuff is entirely useless. The standard doesn't actually say "thou shalt implement this as badly as you can", but in practice that's what everybody does. The standard gives you two options when you launch: deferred and asynchronous. If you ask for deferred execution, you get no scaling at all; the tasks get executed one at a time, as you ask for them. If you ask for asynchronous, it immediately fires up threads. It's not required to do that, but every implementation I have seen immediately fires up a bunch of threads, and now we're back to running a million threads all at once and choking the machine. What we really need is a thread scheduler; in C++ the proposed name for it is "executor", and we haven't got that yet. std::future and std::promise are just types with some convenient member functions; if you have your own thread scheduler and you like them, feel free to use them. Almost every library that actually implements a thread scheduler has its own version of futures and promises as part of the library; sometimes they look a lot like the standard C++ ones and sometimes they don't. Either way, no big deal.

Well, if you have threads, you must have locks; everybody who programs with threads has locks. Any questions on threads, by the way? No? OK, locks. C++11 gave us the basic mutex. std::mutex has lock, unlock, and try_lock; it's implemented by simply forwarding all of its calls to the operating system mutex, the POSIX mutex on Linux. The way you use a mutex is like this: before doing some unsafe computation (and I'm dropping a lot of words I haven't defined yet, like "unsafe computation"; I wanted to show you the toys first, and then we'll talk about the memory model, which is drier stuff than this), you lock your mutex. An unsafe computation is one where you are competing for the same output from multiple threads: I'm adding to this sum on multiple threads, and that is unsafe. Well, actually, never write code like this; do not use the mutex like this, with manual lock and unlock. The reason is the same as with any resource: if a return is thrown out of here, the mutex is left locked permanently. There is nobody to unlock it; nobody else outside knows that they might be responsible for unlocking it, which means it's locked permanently, and the next time you try to lock it, you deadlock. So no, do not use a mutex like this. Right away, in the same C++11, they gave us std::lock_guard, which is just a resource-acquisition (RAII) wrapper around the normal mutex: the constructor locks, the destructor unlocks. What I'm showing is C++17 syntax, deducing the template argument in the constructor; in C++11 you would have to spell out the mutex type. It gives you the automatic unlock and makes the code exception safe. A minimal sketch is below.
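A minimal sketch of the lock_guard pattern just described, assuming a shared accumulator (the names m, sum, and add are mine):

```cpp
#include <mutex>

std::mutex m;
long sum = 0;   // shared output that multiple threads compete for

// lock_guard (here with C++17 class template argument deduction) locks in its
// constructor and unlocks in its destructor, so an early return or an
// exception cannot leave the mutex locked.
void add(long x) {
    std::lock_guard guard(m);   // in C++11: std::lock_guard<std::mutex> guard(m);
    sum += x;                   // the unsafe computation, now done exclusively
}                               // guard's destructor unlocks here
```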
There is a slightly more complex thing called unique_lock, which can do that but can also take ownership of a locked mutex and transfer it. It's kind of like unique_ptr, but for locks: if you just destroy the unique_ptr, you destroy the object, and if you move out of it, another unique_ptr now owns that object. unique_lock is the same way: if you move out of one unique_lock into another, that one now owns the resource.

Now, there are a few more mutexes. Doing this on one thread, locking the same mutex twice as written here, is a guaranteed deadlock. Why? This thread is holding the lock, and the same thread is trying to get the lock. The second call will block until somebody unlocks it, and the only thread that can unlock it is the one holding it; unlocking a mutex from another thread is actually undefined behavior. But this thread is not making any progress: wherever the call to unlock is, we're not getting there, because we're blocked here. That's a guaranteed deadlock, and it's best not to do that. If you really have to, and pretty much the only place where you may end up having to is classes with internal locking: say you want to create a "safe vector" and you put an internal mutex into the vector; one method locks and does its stuff, but then it calls another method, and that other method is also public, so it also locks, and now you've done it. Well, it's better not to do that; it's better to create alternative private non-locking versions of your methods and call those. But if for some reason you have to, there is a recursive_mutex, which reference-counts.

Another interesting thing they gave us, in recognition of the problems with mutexes: here is one thread with some mutexes, and it locks m1, then locks m2. Here is another thread that locks m2 first, then m1. The problem is that the first thread has already locked m1, and the second thread has locked m2 successfully. Now thread two tries to lock m1: no can do. Thread one tries to lock m2: no can do. Both of them are responsible for unlocking the mutex they already hold, but neither can get there, because both are waiting on the other's mutex. Again, a guaranteed deadlock. What they gave us is the function std::lock, which, whatever algorithm it uses internally, guarantees that no deadlock will happen. Notice that std::lock is a function, not an object, which means you call it to take the locks and then somewhere down there in your code you have to call unlock on all of them. The unlock order doesn't matter; you can unlock in any order, it's the locking that kills you. But now we're back to throwing exceptions before unlocking the locks. So you use unique_lock in conjunction with std::lock: you construct unique_locks that take ownership without locking, std::lock locks them all without deadlock (or you can call lock on them yourself), and then the unique_locks unlock the mutexes for you. In C++17 you got scoped_lock, whose constructor takes multiple mutexes; internally it does the same thing std::lock does, but all in one shot, and the destructor unlocks all of the mutexes. A sketch of locking two mutexes without deadlock is below.
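A minimal sketch of the deadlock-free multi-mutex locking just described, assuming C++17 for scoped_lock (the function names are mine):

```cpp
#include <mutex>

std::mutex m1, m2;

// If one path locked m1 then m2 by hand and the other locked m2 then m1,
// two threads could deadlock. std::scoped_lock (C++17) locks any number of
// mutexes with a deadlock-avoiding algorithm and unlocks them all in its
// destructor, so the order you list them in does not matter.
void transfer_a_to_b() {
    std::scoped_lock both(m1, m2);
    // ... touch data guarded by m1 and m2 ...
}

void transfer_b_to_a() {
    std::scoped_lock both(m2, m1);   // different order, still safe
    // ... touch data guarded by m1 and m2 ...
}

// Pre-C++17 equivalent: std::lock(m1, m2); followed by two
// std::lock_guard<std::mutex> objects constructed with std::adopt_lock,
// or unique_locks constructed with std::defer_lock before the std::lock call.
```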
In C++17 we also got shared_mutex, which everybody else calls a read-write lock. In C++14 we actually had shared_timed_mutex; I have no idea why we got the timed one before the non-timed one, maybe somebody here knows. The plain lock gives you exclusive, or write, access: if you hold exclusive access to a mutex, nobody else can get any access, and everybody else will block. If you only hold shared, or read, access, another thread can also get shared access, but an attempt to get exclusive access will block until everybody releases their read access. So a read-write lock: a very cool idea in theory; in practice the performance is usually pretty bad. There is an object wrapper for it as well, shared_lock, in C++14.

We got condition variables, which are modeled really closely on the pthread condition; in fact, on a POSIX system they call the pthread condition functions. You have a pair, a condition variable and a mutex, and the protocol is between two threads; this is how you transfer data from one thread to the other. The producer locks the mutex, constructs the data that is guarded by the mutex, and notifies the condition. In pthread language that's called "signaling the condition", which has nothing to do with signals in the Unix sense. On the consumer thread, a unique_lock acquires ownership of the mutex and locks it, and then you wait on the condition. That wait does two things: you have to give it the lock, and internally it unlocks the mutex and then waits on the condition. So this blocks, but not on the mutex, and the reason is that you want to give up ownership of the mutex so the producer can actually produce the data for you. The producer will then come in and lock the mutex, because this mutex guards our data; if we say that the mutex guards the data, it means that everybody else who would like to do something to the data must acquire the mutex first. The producer does its work on the data, safe in the knowledge that it has exclusive access, then notifies and unlocks the mutex (the closing curly brace unlocks it). Nothing happened on the consumer side yet, because the consumer wasn't blocked on the mutex; it was blocked on the condition. We notify, and now the consumer thread wakes up and re-locks the mutex automatically; this is all done inside the wait call. If somebody else managed to lock the mutex in the interim, that attempt to re-lock blocks until it can succeed. Once you finally re-lock the mutex, you come out of the wait and you can consume the data. That's the condition-based handoff protocol; POSIX people have been doing this since, I don't know, whenever POSIX was introduced. In this case the two are exactly equivalent, and that's basically what I wanted to show: for simple use like this, they are exactly equivalent.

On the consumer side, though, you must use something that you can pass into the wait; you can't pass a lock_guard there. unique_lock has lock and unlock; it behaves syntactically like a mutex, it exports that interface, so you can pass it into anything into which you can pass a mutex (you could actually put a lock_guard on a unique_lock, not that you would want to, but syntactically you can). The wait requires something like that; you could in theory use a naked mutex in such a place, you would just be responsible for unlocking it yourself later. On the producer side you can use a unique_lock if you want, or a lock_guard. A minimal sketch of this handoff is below.
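A minimal sketch of the condition-variable handoff protocol described above (the data and ready names are mine; the predicate form of wait is used to absorb spurious wakeups):

```cpp
#include <condition_variable>
#include <mutex>
#include <string>

std::mutex m;                 // guards `data` and `ready`
std::condition_variable cv;
std::string data;
bool ready = false;

void producer() {
    {
        std::lock_guard guard(m);   // lock, then produce the guarded data
        data = "payload";
        ready = true;
    }                               // closing brace unlocks the mutex
    cv.notify_one();                // "signal the condition"
}

void consumer() {
    std::unique_lock lock(m);            // wait() needs a unique_lock, not a lock_guard
    cv.wait(lock, [] { return ready; }); // atomically unlocks, blocks, re-locks;
                                         // the predicate handles spurious wakeups
    // The mutex is locked again here and the data is ready to consume.
    // ... use data ...
}
```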
I believe the distinction is literally just a constructor. Yes, the question was whether lock_guard is just a constructor and destructor; that's absolutely true. unique_lock does a lot more than lock_guard: in particular, it exports the lock and unlock, the mutex interface, and it has a move constructor and a move assignment operator, so you can transfer ownership. But if you don't do anything else to it, if you just construct it and let it go out of scope, it does exactly the same as lock_guard. One other thing with unique_lock: you can construct it without actually locking the mutex; there is another argument to the constructor that tells it to defer locking. It takes ownership of the mutex but doesn't lock it; then you have to call lock on the unique_lock itself, and the destructor will still unlock. So unique_lock can do more things. Also, this is all C++11 so far, but in C++14, because you've got shared_mutex and shared_lock, you can now use both unique_lock and shared_lock with a shared_mutex: the unique_lock will take exclusive (write) access, and the shared_lock will take shared (read) access on the shared mutex.

The names match: for basically everything I have shown you, there is a timed version, and what the timed version does is this: you give it an interval, and it tries to lock for that much time or less (it could give up spuriously, but ignoring that, that's how long it will wait). If it couldn't get the lock, or the condition wasn't signaled so it couldn't get notified, it gives up, returns, and lets you know that it actually didn't get the lock. So if you have something that uses timeouts, where if you couldn't get the lock quickly enough you want to do something else for a while and then try again, you have these timed versions.

A few more interesting things we got in C++11 which aren't part of any header. One is local static variables. The standard has always said that these are initialized at the first use. What's the first use? In C++03 the first use is simply the first time you call this function; there's nothing else, because nothing happens at the same time as anything else, there's no such concept. In C++11 there is: what if I call this function at the same time on two threads, which one is the first use, and what happens to the other one? We'll talk about this in more detail when we talk about the memory model, but C++11 says this is guaranteed to be thread safe, which means exactly one initialization happens, exactly once, and all the threads that didn't get to do the initialization will see the initialized version of it, barring bugs in some old compilers. When it was first introduced, several compilers got it wrong, but as far as I know none of the modern compilers have any bugs on this. std::call_once is basically the same thing, only it's a function that gets called once; you could always write your own call_once by writing a function whose result you bind into a static. The only interesting thing is that if an exception is thrown inside the call_once callable, or inside that constructor, then the "once" didn't happen: the first successful pass through that code is the one that actually gets to initialize. A sketch of both is below.
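A minimal sketch of both once-only initialization forms, assuming made-up names (Config, config, init_logging are mine):

```cpp
#include <mutex>

// Two ways to run initialization exactly once even if many threads race to be
// "first". Since C++11 a local static is guaranteed to be initialized exactly
// once; std::call_once gives the same guarantee for an arbitrary callable.
// If the initializer throws, that attempt doesn't count and the next caller tries again.
struct Config { /* ... */ };

Config& config() {
    static Config instance;     // thread-safe local static since C++11
    return instance;
}

std::once_flag init_flag;
void init_logging() { /* one-time setup, e.g. open the log file */ }

void ensure_logging() {
    std::call_once(init_flag, init_logging);
}
```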
Here is one that is also not a new function or a new object but a new storage type. We have static, we have dynamic, and now we have thread_local. This is how it's written: thread_local basically goes anywhere static goes, on variables, that is, not on functions; static on functions means something else. It's a lifetime thing. It's like static, except that with static you have one per process, and with thread_local you have one per thread: whenever you start a new thread, a new instance of the variable is created.

This is actually kind of interesting to see. I have a class that prints when it's constructed and destructed, including which thread is constructing and destructing it. I have a thread_local instance of it at file scope, and inside a function I also have a thread_local one, which is like a local static except one per thread. So you can do it both at file scope and at function scope. I told you it's just like static: both of these could be static, and then there would be one per process; as thread_local it's the exact same thing except there is one per thread. They're initialized when each thread goes through this code the first time, and on different threads they're different. Now I start a thread, capturing by reference so I can actually get access to the result, and in it I call the function f, and f returns the address of its thread_local variable. This is not returning the address of a temporary, because within one thread it's like a static, so returning its address is fine. Both the file-scope object and the one in f are thread_local. On the main thread, the first print is the main thread, the next is the thread id, some large integer number, and the last one is the address of the variable in f on the main thread. On the worker thread there is a different thread id, and the last one is the address of that variable on that thread. So you see, the address is different on different threads: they got different instances of that object. That's what thread_local is: static, but per thread. Within a thread there is only one; every time you take the address of that thread_local variable on a thread, you get the same address, and on another thread you get another address.

In the interest of time I'm not going to show you the performance; I'll just tell you, and you shouldn't believe me, but come by at the break and I'll prove it: accessing them is, performance-wise, exactly the same as accessing static variables in a single thread. So what's the big deal, why would you use this? The reason is that you can access them without any locks at all, because you have a guarantee that two different threads using the same name are actually not referring to the same object, so you never need locks to access these. Of course, it also means that every thread writing into it actually writes into some other place, so at the end of the day you have to ask each thread to take its result out of the thread_local and put it into some shared state under a lock, but that's at the end of the day, hopefully only once, after all the computations. A minimal sketch of the per-thread behavior is below.
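A minimal sketch of the per-thread behavior described above; it isn't the class from the slides, just an int counter (the names counter and bump are mine):

```cpp
#include <iostream>
#include <thread>

thread_local int counter = 0;   // one separate instance per thread

int* bump() {
    ++counter;                  // no lock needed: each thread has its own copy
    return &counter;            // fine: within this thread it lives like a static
}

int main() {
    std::cout << "main:   " << bump() << '\n';
    std::thread t([] {
        std::cout << "worker: " << bump() << '\n';   // prints a different address
    });
    t.join();
}
```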
So if you have a need for static variables in a multi-threaded program, you can make those static variables thread_local. Now you don't need to synchronize them; you access them the same way you accessed them in a single-threaded program. In a single-threaded program they were static; in a multi-threaded program you change them to thread_local. The only thing you have to remember is that you have to ask each thread to put its result into some shared place. There is actually another way to implement thread_local that the committee didn't use, where you can iterate over all instances of a thread_local variable. In our internal library, where I wrote the thread_local mechanism, it's done that way: the main thread can loop over all instances of the thread_local. Of course that's not thread safe, so you'd better make sure all the other threads are halted at that point, but the main thread can go and collect the results without asking anybody to do it. That's not the way they chose to do it in the standard.

OK, so we've just talked about that. Nothing particularly interesting in C++14 here, but a few cool things were added in C++20: synchronization primitives. We have mutexes and condition variables, and they added three new synchronization primitives. The first one is a latch, and a latch is used to synchronize thread arrival. You declare a latch with a count, and every thread arriving at the latch, and by arriving I mean calling the member function arrive_and_wait, will block until that many threads, whatever you put in the constructor, have arrived, at which point they all unblock. So if you want all threads to pause, wait, and then run into some shared block of code at the same time, that's how you do it. After it happens, you have to throw the latch away; you can't reuse it, and there's no way to reinitialize this object and arrive at it again.

There is a different type, std::barrier, which is basically a reusable latch, or, if you prefer, a latch is a single-use barrier. A barrier works exactly the same way, and I'll show you the code: you tell it how many threads you want to suspend on it, and as soon as that many threads arrive (here is my arrive_and_wait; the first thread blocks, and when the second arrives, because I said two, both proceed and the first is unblocked), they go on; in my case they just print stuff. So you can reuse it. By the way, I said you can reset the barrier; it actually resets itself. When the synchronization moment happens, say you said barrier of two and two threads arrive, at that moment a latch just becomes unusable, but the barrier automatically resets itself back to two, and if you start calling arrive_and_wait again, it's again two. There is another member function that lets you change the count, but if you don't use it, then as soon as the threads pass the barrier it resets itself back to its original, freshly constructed state. A sketch of both is below.
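A minimal sketch of a latch and a barrier side by side, assuming C++20 and two worker threads (the names start_gate, sync_point, and the phase count are mine):

```cpp
#include <barrier>
#include <latch>
#include <thread>

std::latch   start_gate(2);   // single use: both threads must arrive before either proceeds
std::barrier sync_point(2);   // reusable: resets itself to 2 after every phase

void worker() {
    start_gate.arrive_and_wait();      // block until 2 arrivals, then go
    for (int phase = 0; phase < 3; ++phase) {
        // ... do this phase's work ...
        sync_point.arrive_and_wait();  // wait for the other thread; the barrier
                                       // then automatically resets for the next phase
    }
}

int main() {
    std::thread a(worker), b(worker);
    a.join();
    b.join();
}
```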
Semaphores are a little different. It's basically the POSIX semaphore, which we hadn't had in the standard for a while; it's there to hold multiple threads for a certain number of events. Here's how it works: you construct the semaphore, and for this particular semaphore you have to give it a max count as the template argument; there is a default for that, which is implementation specific. It has more than two member functions, but the two interesting ones are acquire and release. When you acquire the semaphore, you may be blocked; when somebody else releases the semaphore, one of the threads blocked in acquire, arbitrarily chosen, unblocks and is allowed to proceed, while the remaining ones stay blocked. This is usually used to manage resource transfer between threads. The classic example is a shared queue: I have a queue, some producer threads, and some consumer threads. The consumer threads come to the semaphore, call acquire, and block. A producer thread puts a resource on the queue and calls release once, signaling that one resource became available. One of the consumers wakes up, successfully acquires, exits the acquire call, and proceeds forward with the guarantee that there is one resource that can be taken. Yes, you still have to lock the queue, obviously, to prevent shared access with a producer that may be modifying the queue at the same time, but you have a guarantee that one resource has been placed and nobody has taken it before you. If you call release twice, two threads will be woken up, and each has a guarantee that there are enough resources for it: even if the other thread comes in first, there will still be at least one left. This is what semaphores are for.

The constructor takes a count; normally it's zero, but you can pre-seed it. This count is different from the count of the latch or barrier: there, the count is basically how many hits it takes before it breaks open; here, it's how many initial acquires will succeed for free. If you say semaphore of eight, it means there are already eight resources there, free for the taking, so eight acquires will succeed before it starts blocking. Normally you set it to zero, meaning you didn't seed it with any resources, because usually the one who constructs it doesn't have any resources to give. There are cases where you seed it with some number, but in the case of the queue you have producers that will produce, and they haven't run yet, while the main thread that constructed the semaphore doesn't have any resources to put in the queue; the producers will read them or compute them when they're ready. So you create it with zero, which means there are no resources to take: the very first acquire will block, and then when somebody calls release, it bumps the count. If you call release and nobody is waiting, it just bumps the count. So it literally counts how many times you can take something out of it, and the constructor allows you to pre-fill that count, up to the max count. A minimal sketch of the shared-queue use is below.
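A minimal sketch of the shared-queue example, assuming C++20 counting_semaphore (the names items, q, produce, and consume are mine): the semaphore starts at zero because nothing has been produced yet, and the mutex still guards the queue structure itself.

```cpp
#include <mutex>
#include <queue>
#include <semaphore>

std::counting_semaphore<> items{0};   // counts how many items are available
std::mutex m;                         // guards the queue itself
std::queue<int> q;

void produce(int value) {
    {
        std::lock_guard guard(m);
        q.push(value);
    }
    items.release();      // one more item is available; wakes up one consumer
}

int consume() {
    items.acquire();      // block until at least one item is guaranteed to be there
    std::lock_guard guard(m);
    int value = q.front();
    q.pop();
    return value;
}
```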
An audience member offers another example that is useful to explain: a job server where you start with, say, 16 slots, each job acquires one, and releases it when it finishes, which allows another one to run. Right, so here is an example where you do have pre-existing resources: you start with a certain number of slots to put jobs, resources, whatever, in. If somebody comes in and says "I'm producing another resource, give me a slot to put it in" and you don't have one, they wait until a slot frees up; but you start with 16 pre-made slots, so you put 16 in the constructor. A valid example.

OK, so we have seen all these tools. We haven't seen coroutines, and we haven't seen any equivalent of a thread pool. This looks really pretty underwhelming for anybody who has done their own work in any existing concurrent system, because, as I said in the beginning, you really already have your own. So what's the big deal? We'll get to TBB; hold your comment on TBB, please, because it comes in the second half, after the break, when we talk about the parallel STL. Until that point it's just a cool third-party library which has nothing to do with C++; it will become something to do with C++ in about 45 minutes.

Before we move on to the more philosophical part, the memory model, any questions on the shiny things? Question: do all of those have try versions, like try_lock? Yes, the semaphore in particular has a try version; I don't remember whether it has a timed try, you'd have to go look. As I said, I'm not going to read you the entire man page for these. Thank you, the comment was that latch and barrier do not have a timed wait, and the semaphore does. My goal here was to tell you that these things exist; if you want to use them, read the full documentation, because they actually have a lot more member functions. For example, you can arrive at a latch and not wait: you just call arrive, bump the count, and proceed without blocking; you're incrementing the count for somebody else. And there is another function, a wait that doesn't count: if you set a latch with a count of two and you have two special threads that must arrive, and twenty less special threads that must block until those two arrive, the twenty less special ones just wait on it and are blocked; they don't count, until the two special ones arrive, bump the count twice, and now everybody can go through. So yes, they have a bigger interface than I have shown you. Any other questions?

Question: is there any way to protect against accidentally not naming the lock_guard? If you leave out the name, you get a temporary, and it goes away immediately rather than at the end of the scope, right? Yes, it will. I don't know whether they've done anything about that; during the break we'll try to compile it and see what happens. I want to get through the memory model. So basically this serializes your questions, and there are no more questions to serialize; let's go do the memory model, and during the break we're going to find out.

All right. For the rest of the talk, the remaining 20 minutes, I'll be explaining what the really big deal about all of this is. Because threads and mutexes, sure, but in reality everybody who programmed with threads before C++11 had that stuff. It's a little more portable now, but still, if you had to program for both Windows and Linux, you had wrappers for this. So what's the big deal? The big deal really begins now: that wasn't a big deal, this is a big deal. What's the memory model, and why is it a big deal? Let's start with the definition. There are several differently worded definitions floating around;
they all boil down to this: the memory model describes the interactions of threads through memory, and a thread, for the purposes of this definition, is a sequence of instructions that can be executed independently of other threads. Why "interaction of threads"? That part is kind of clear. Why "through memory"? Because they basically don't have any other sort of explicit interaction: they all pretend they have their own CPU, and really they kind of do, since each runs exclusively on its CPU. They can share I/O, sure, but that still stands apart from memory, at least through the address space. So the memory model describes the interaction of threads through memory. Another way to say it: the memory model describes the state of the memory, and the guarantees and requirements on the state of the memory, that can be observed by multiple threads at the same time. In particular, the interesting part is what guarantees we have when multiple threads are accessing memory.

Now, who is responsible for providing us with the memory model? Actually quite a few things. First of all, your hardware has a memory model; the hardware always has one, it may just not be documented. The operating system may impose an additional memory model on top of what the hardware does. If you're executing under a VM, that machine provides you some kind of memory model. The compiler either relaxes some guarantees or adds some guarantees, which effectively translates into the memory model for the C++ virtual machine in our case. All of this together forms the actual memory model for your current program, running on the current hardware, compiled with the current compiler. The compiler portion is what came in C++11: before C++11 we didn't have a memory model in the language, which means the compiler wasn't on that list, which means whatever the OS and hardware gave you was what you got. Now we have additional guarantees from the compiler that hold no matter what the other two do. When the pieces combine, the memory model can never become more relaxed, looser. Intuitively: if you have restrictions and requirements, you can tighten them or loosen them (I'll show you what they are), and when you have multiple pieces influencing the memory model, each piece can make it stricter and impose more restrictions, but never make it looser. So if the compiler makes certain guarantees, it has to do so in such a way that no OS and no hardware can relax them. The compiler may be unable to loosen guarantees that are already provided by the OS and hardware, but it has to force the OS and hardware not to relax the guarantees the compiler itself provides. That's how they combine.

Important parts of the C++ memory model. Let's start with the one POSIX actually didn't tell us anything about: every distinct variable or object has a distinct address, and accessing two distinct objects from different threads is always safe, no matter what those threads are doing, as long as each is operating on its own object and nothing else, and no matter what the type is. In particular, writing into an array of bools or an array of chars on two different threads is thread safe as long as you're not writing to the same element. And yes, on x86 this restricts the code generation of the compiler, and yes, there may be a performance cost.
Question: so even if they overlap in adjacent bytes or something like that, they're still going to be safe? OK, basically the question boils down to: why would you expect anything else in the first place; who gave you the idea that two different addresses might not be safe? The reason is that hardware operates in words. On x86 the four-byte word is your native word size, and one of the ways to write a single byte on x86 is to do a masked write: you read the entire four-byte word, change the byte you wanted to change, and write the entire four-byte word back. That's an option in hardware. Obviously, you are restoring the value of the remaining three bytes to whatever it was before you started, so if somebody else came in and changed one of the other bytes in that four-byte word, you're going to blast their write away and return it to what it was: you're going to undo somebody else's write, not in your own byte. So say these are bytes zero and one, and I'm writing zero into my byte: what I'm really doing is reading the four bytes, doing a bitwise AND to set my byte to zero, and writing four bytes back. C++11 says no, you cannot do that: during code generation, emit some other instructions that don't do that. On some hardware, including x86, the masked write was the way everybody had done it, because it was actually the fastest way: the processor is faster operating on the native word size, and accessing a single byte not aligned on a four-byte boundary was a little slower. C++11 says no, you must use the hardware instructions that actually write a single byte at the given address.

This guarantee only goes down to bytes. For accessing different bits in the same byte, all bets are off: even if you have instructions that can access bits in a byte, the language guarantees nothing. Now, hardware may give you additional guarantees. On x86 there is a BTS instruction, test-and-set for a single bit: you give it an offset in bits and it sets a single bit, and you can put a LOCK prefix on it, which makes it atomic. The hardware in that case guarantees that writing into two adjacent bits with the LOCK prefix is thread safe. But that comes from the hardware; the language doesn't expose it, and short of writing assembly there is nothing you can do to actually get at that guarantee. In assembly, of course, you're out of the language and on your own.

Well, from the sun comes the sunburn, and from threads come the data races; everything has its price. The standard does define what is safe and unsafe memory access, and it says that concurrent accesses to different memory locations, by which I mean different variables, different addresses, are always safe, regardless of the type and of what you do. Concurrent accesses to the same memory location are safe only if all accesses are reads, that is, nobody changes the value. If at least one thread changes the value, then all concurrent accesses, both the reads and the writes, are undefined behavior. So if there is one writer and many readers, everybody is undefined, not just the readers or the writer. To avoid this problem you must use synchronization. What kind of synchronization? Well, we've seen mutexes; there are atomics; there are semaphores. You must do something to ensure that only one thread at a time is actually writing. A sketch of the per-location rule is below.
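A minimal sketch of the per-location guarantee just stated, using a char array (the name flags is mine): since C++11 the compiler may not implement the one-byte store as a read-modify-write of the surrounding word, so the two writes below are safe even though the bytes are adjacent.

```cpp
#include <thread>

char flags[2] = {0, 0};

int main() {
    std::thread t1([] { flags[0] = 1; });   // distinct memory locations:
    std::thread t2([] { flags[1] = 1; });   // always safe, even adjacent bytes
    t1.join();
    t2.join();
    // Writing the *same* element from both threads, or different *bits* of the
    // same byte, would be a data race: the guarantee stops at the byte level.
}
```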
From data races came the locks, and we've already seen the locks. This ++ is a modifying operation; if you try to do it without the lock, what's going to happen? Undefined behavior, anything can happen. If you just try it on the hardware straight up, and assuming the compiler doesn't take advantage of the undefined behavior like I talked about on Monday, what's probably going to happen is that both threads read the value zero, both increment it in a register to one, and both write it back as one, so it ends up being one. That's probably what's going to happen, but again, it's undefined behavior; you could end up with four.

There is a slight question here: what about the mutex itself? Both threads are accessing the mutex variable at the same time, and you have to assume that the lock() member function isn't just a read. So how is that safe? Well, magic. Mutexes are special, and I'll talk about the magic in the second half.

So back to our earlier example: how did it work with pthread lock and unlock? POSIX required that you couldn't do this reordering; that was a POSIX guarantee. So what do we do now? That same thing has been ported into the C++ memory model. You could say the compiler recognizes the lock and the unlock, except of course in reality, if this is separate compilation, it doesn't know what's inside the function, so it has to assume there might be a lock and an unlock in there, and therefore it cannot reorder anything across a function it can't see into. The compiler is restricted to the worst possible case: whatever it cannot prove, it has to assume happened.

Question: what if those functions are in the same translation unit and the compiler realizes they don't lock any mutex? If they were inlined, or you had a compiler capable of whole-program optimization, the compiler could figure out that they don't lock and hoist code the normal way. If the compiler can see into std::mutex, or into whatever std::mutex actually unravels to, the thing inside the mutex that notifies the compiler to pay attention (I'll show you what that thing is in the second half), it would notice there is a special mark there saying "this is the place through which you cannot move instructions." So if the compiler can see that, it knows exactly where the barrier is; if it cannot see it, it has to assume it's there, which is the usual case.

So the locks actually do work, pretty much the way you expect. One thread locks the lock and enters what's called the critical section, the part that requires exclusive access. The other thread tries to lock the lock and has to wait until the first thread unlocks; then it enters the critical section. Each of them can modify the shared data while it is inside the critical section; then it unlocks and can't touch the shared data anymore, because accesses to the shared data, other than reads, are undefined behavior unless you're doing them exclusively. That's important.
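Here is a minimal sketch of that critical-section pattern with std::mutex; the counter, the iteration count, and the function names are made up for illustration:

    #include <mutex>
    #include <thread>

    long shared_value = 0;   // data shared between threads
    std::mutex m;            // protects shared_value

    void worker() {
        for (int i = 0; i != 100000; ++i) {
            std::lock_guard<std::mutex> guard(m);   // lock: enter the critical section
            ++shared_value;                         // exclusive access while the lock is held
        }                                           // unlock on scope exit
    }

    int main() {
        std::thread t1(worker), t2(worker);
        t1.join();
        t2.join();
        // shared_value is 200000 here; without the mutex the increments would race
        // and the program would have undefined behavior.
    }

The unlock happens automatically when the guard goes out of scope, which is the usual way to delimit a critical section in C++.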
These guarantees are important, and there are both correctness and efficiency guarantees: what do you have guaranteed, and therefore what can be optimized without violating the guarantees you have? That matters too. So what does the memory model consist of? Guarantees on the order of memory operations, reads and writes. Synchronization events, the points at which memory accesses are synchronized. And what's called the consistency model: I have two threads running on two CPUs, both scanning memory, reading the data out of it and reporting to me what they saw. Do I have a guarantee that they see the same state of memory, and if not, what could the differences be? If one of them is modifying memory, what does the other thread see, what guarantees do I have about what it sees, and when does the change become visible? All of this is included in the memory model.

As I said, hardware has a memory model, the language provides a memory model, and the compiler provides a memory model. The memory model is important not just for the hardware: the compiler itself has to avoid making transformations that would violate the guarantees of the memory model. With lock and unlock there is something special going on in hardware that forbids shared access to the critical section, but the compiler has to obey that restriction too; it cannot move code outside of the critical section. So both the hardware and the compiler participate in providing these guarantees. The memory model is a property of the entire system, and any one component can give additional guarantees.

All right, let's see what kinds of guarantees we're talking about at the hardware level. We can talk about it from the point of view of what can be reordered. Two reads: can they be reordered? Maybe, depending on the hardware. A read and a write: if I first read a value and then write into it, am I guaranteed that the read sees what was there before, and not what was written? Possibly not. A write and a read: am I guaranteed to see the value that was written? Two writes: if somebody else is observing, can they see the second value before the first one?

Here's a good one, the dependent read. If you have an int* p and you want to read from *p, you first need to read the value of the pointer itself. Can those be reordered, so that you effectively read *p before p? There was a piece of hardware that did that. Anybody know what it was? The DEC Alpha. If the Alpha believed the value of the pointer hadn't changed, it could keep the old value, drop the redundant read of the pointer, and effectively reorder the dependent reads. And here is another good one: if you write into memory and you're reading instructions from memory, can that be reordered? If it can, you can't write self-modifying code. Not that you should.

x86, our favorite platform, actually has a very strict memory model, almost total ordering. Write then read is the only pair that can be reordered: a write followed by a read can become a read followed by a write, and everything else has to be executed in program order. By program order I don't necessarily mean what you wrote; what the compiler emitted is the program order, and the compiler can still reorder stuff all around. The hardware will execute that in order, except that the hardware itself can swap a write followed by a read, but not a read followed by a write.
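That write-read reordering is exactly what the classic "store buffering" litmus test exposes. Here is a hedged sketch of it, written with std::atomic and relaxed ordering (atomics are second-half material) only so the example stays within defined behavior; the variable names are mine:

    #include <atomic>
    #include <thread>

    // Each thread writes one flag and then reads the other. With relaxed
    // ordering, the outcome r1 == 0 && r2 == 0 is allowed: each store can sit in
    // a store buffer while the subsequent load reads from cache, which is the
    // write-read reordering described above. With memory_order_seq_cst on all
    // four operations, that outcome is forbidden.
    std::atomic<int> x{0}, y{0};
    int r1 = 0, r2 = 0;

    int main() {
        std::thread t1([] {
            x.store(1, std::memory_order_relaxed);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([] {
            y.store(1, std::memory_order_relaxed);
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();
        // Possible results: (1,1), (0,1), (1,0), and, on real hardware, (0,0).
    }

Even plain x86 stores and loads can produce the (0, 0) outcome, because this is the one reordering the x86 model permits.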
So a write followed by a read can become a read followed by a write, but not the other way around; the compiler, of course, can still do whatever reordering the language model allows. Question: does this happen within a single thread, or do the write and the read happen in different threads? Well, if the hardware swapped them within a single thread, how would you know? You need an observer. You'll see this a little later when I talk about it from the point of view of memory visibility, but all of this only makes sense if there is a way for you to observe it: either the same thread observes it, in which case it's not just these two operations, there is another read stuck in there somewhere, or some other thread observes it. Within one thread on x86, the read will see whatever you wrote, so in that sense the pair cannot be reordered; you're really talking about swapped ordering when somebody can see it happen. ARM, by the way, reorders pretty much everything by default; the only thing it doesn't reorder is dependent reads, so data dependencies are respected on ARM.

And this touches on the question that was just asked. A better way to speak about this ordering is not so much which operations are reordered, but what guarantees you have on the visibility of memory operations. That's the more accurate way to talk about it. It's convenient to talk in terms of reordering reads and writes because it's a simpler model, but the more precise one is visibility: instead of saying which memory operations can be reordered, say how the effects of these operations can be observed in a different order. So instead of saying "a write and a read can be reordered," we now say "the result of the write may become visible after the read." They're equivalent languages, but the visibility language is more precise. And in the end it boils down to consistency guarantees: multiple threads observe the same memory and can see it in an inconsistent state, where one thread believes these are the bits in memory and the other thread believes, at the same time, that something else is. What can and cannot differ when viewed this way, and what inconsistency is permitted, is the visibility-centric view of the memory model. For most practical purposes all of this is equivalent, and it's just a matter of which language and mental model is easier for you to hold.

Okay, so before we break, I'm going to put up one statement and then we'll go into the next part. The statement is: memory model, whatever; the locks work somehow, who cares? With that idea I'll leave you here for the break. Take time for questions, and in the second half, when we come back, I'll explain what else there is other than locks and why you care about the memory model as a programmer, beyond not wanting to believe in magic. So one reason is that you don't believe in magic and you want to know how it actually works.
But the other reason is that, as a programmer, you can take explicit control of all of these things I told you about that constitute the memory model. You can go and directly ask the memory model: please give me these guarantees, or please do not impose these guarantees on me. You do it specifically through the memory-order argument, which you can only put on atomics; well, you can also put it on a fence, but a fence is really an atomic fence. (There is a small preview sketch at the end of this answer.) So yes, the second half will cover atomics, parallel STL, and coroutines.

Question: when we talk about write-read ordering and say the hardware can pretend the write has finished, is that about the pipeline of the CPU itself and how the write makes its way through the layers? In terms of the memory model itself, it's not that specific: the model doesn't tell you how it's arranged, it only tells you what's guaranteed. If you want to know how it's arranged, on x86 it mostly has to do with the write buffers. Writes don't go straight into memory; they go into write buffers. There are probably pure hardware speed reasons for that, but one big reason is speculative execution: if you have to unwind speculative execution, you cannot have changed the state of memory, because that cannot be undone. So if you have writes in a speculatively executed branch, where do the results go? They're held in write buffers until the speculative execution becomes real execution, or until you know you mispredicted and have to flush the pipeline. Those write buffers are basically an asynchronous parallel channel holding the results of the writes while reads execute directly from cache. So you're reading from cache, but the results of your writes sit in the write buffers until the speculative context clears. And yes, if the same thread reads from a location that matches its own write buffer, it gets its own data back; any other thread running on a different core does not. It's not global state, that's correct.

Another question: suppose I have a function and the compiler can't prove it isn't used in parallel. The compiler has to assume that everything is used in a parallel context; it has to, because remember, it's the binary you're shipping. Even if everything is inlined and visible from main, in theory the compiler could prove more, but the code it generates still has to obey the model. In practice, compilers simply changed the code generation to use single-byte accesses, movzbl and similar instructions; that's just what the codegen does now. I've measured on recent CPUs and couldn't see a difference, but when C++11 was introduced, yes, I could see the difference on the hardware of the time. I measured it again about two years ago and couldn't see any difference.
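As promised, a small preview of that memory-order argument: a hedged release/acquire message-passing sketch, with the names and the value made up; what these orderings mean is second-half material:

    #include <atomic>
    #include <cassert>
    #include <thread>

    int payload = 0;                  // ordinary shared data, no lock around it
    std::atomic<bool> ready{false};   // the flag carrying the ordering guarantee

    void producer() {
        payload = 42;                                  // plain write to shared data
        ready.store(true, std::memory_order_release);  // "everything I wrote before this is published"
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire)) {}  // wait until the flag is seen
        assert(payload == 42);   // guaranteed: the acquire load synchronizes with the release store
    }

    int main() {
        std::thread t1(producer), t2(consumer);
        t1.join();
        t2.join();
    }

A std::atomic_thread_fence pair can express the same request without attaching the ordering to a particular variable.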
Somebody asked to go back to slide 10. Okay, slide 10: that's a lock. The lock is definitely in memory; furthermore, the lock is a shared variable, and I just told you not to access shared data without a lock, and here we are accessing a shared variable, the lock itself, without another lock. We'll figure out how that happens in the second half.

Proximity-wise, should the lock live next to the data it protects? Tricky question; hold that thought, it will come up in the second half. There are arguments for both, and you need very accurate discernment to figure out whether, in your case, it's better to put them together or not. If only one thread accesses them, if the other thread weren't there, then of course yes, cache locality. If the other thread does access the lock and is banging on it, trying to get the lock while this thread is in the critical section doing its computation, that's where it becomes tricky: maybe yes, maybe no, and I'll show you why.

Also, both shared pointers and weak pointers have an interesting relationship with concurrent data structures that outlive the storage of your shared data. The pointer itself you probably don't care about; the control block is where the synchronization is. And again, it depends on how many you have. If you have only one, it doesn't matter, and then you actually probably want them together. If you use them all on the same thread, or you have two but use both on the same thread, you probably do want them together. If you have a lot of them and they're on different threads, then, well, I don't want to jump too far ahead, but you can see what I'm getting at, and everybody else will see it in half an hour: the accesses to the lock and the accesses to the data have different performance requirements when multiple threads are involved, and it may be better to keep them far apart.

Okay, I'm told to wrap up now. I'll be available during the break, so you can ask questions privately, and in half an hour we resume with atomics, then parallel STL, and then coroutines. [Applause]
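On that proximity question, here is a hedged sketch of the "keep them apart" option; the type, the field names, and the 64-byte figure are assumptions of mine, and whether this layout helps or hurts is exactly the trade-off deferred to the second half:

    #include <mutex>

    // Hypothetical layout: the mutex and the counter it protects are forced onto
    // separate cache lines (64 bytes is a common line size), so threads spinning
    // on the lock do not also contend for the line that holds the data. Putting
    // them on the same line is the other option; it tends to win under low
    // contention, because the locker pulls in the data together with the lock.
    struct Counter {
        alignas(64) std::mutex lock;
        alignas(64) long value = 0;
    };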
Info
Channel: CppNow
Views: 8,502
Keywords: Concurrency in C++, concurrency, concurrency c++, concurrency explained, concurrent programming, C++ facilities, first concurrency-aware C++ standard, C++ concurrency for maximum performance, memory model in C++, thread-safety, Fedor Pikus, standard libraries, code, primitives of concurrency in C++, locks, barriers, conditions, atomics, lock-free programming, concurrent programs, C++20 coroutines, concurrency features in C++23, CppNow 2022, cpp, boost, cppcon, C++now, software engineering
Id: ywJ4cq67-uc
Length: 94min 46sec (5686 seconds)
Published: Fri Jul 22 2022