CppCon 2015: Fedor Pikus PART 1 “Live Lock-Free or Deadlock (Practical Lock-free Programming)"

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments

Is Part 2 still underway?

👍︎︎ 2 👤︎︎ u/AntiProtonBoy 📅︎︎ Oct 10 2015 🗫︎ replies

Found this talk rather entertaining and found it pretty easy to follow even though I have little experience in Concurrency.

EDIT 3: My goodness, I was being stupid. I was trying to replicate this in global scope outside main(). I move it into main and everything compiles correctly. Thank you for all of your responses. Apologies for stupid question.

I did struggle to completely understand the deconstruction of the new operator though. Given the code:

0.   class Gizmo {};
1.   Gizmo* the_gizmo;
2.   void* temp = malloc(sizeof(Gizmo));
3.   new (temp) Gizmo;
4.   the_gizmo = static_cast<Gizmo*>(temp);

I understand the following:

  1. Instantiate a pointer to a Gizmo. Currently == nullptr most likely garbage (Thank you for the correction c0r3ntin)

  2. Allocate enough memory for a Gizmo on the heap and store its address in temp:

  3. Construct a Gizmo in the address held by temp;

  4. Assign the temp address to the_gizmo pointer.

Line 3 fails to compile with an unqualified-id error on my machine. The syntax seems correct from googling so I'm not sure what's going on. Any ideas?

EDIT: I forgot to mention it but I have Gizmo defined as as class at the top of the file. Added step 0.

EDIT 2: I inlined the placement-new into the static cast and it worked:

the_gizmo = static_cast<Gizmo*>(new (temp) Gizmo);

My compiler (clang) just doesn't seem to like it on its own.

👍︎︎ 1 👤︎︎ u/dillinger__88 📅︎︎ Oct 09 2015 🗫︎ replies
Captions
welcome to the first of the talks on Atomics and lock-free and all the concurrent stuff that there will be several more and I'll try to introduce some of the materials that will be covered later this is in two parts we'll find a good place to break so the roadmap well we'll get through all the stages yes I will do the double check locking everybody does it I have my own twist on it so I hope I can show you something interesting even though the subject has been beaten quite to death no this was on Wednesday I give a talk about handling exceptions this was supposed to happen then not now okay this one is not about handling exceptions so wait for exceptions for Wednesday this is about writing good concurrent program specifically with C++ emphasis is on practical which means in practical applications not a library development okay how many people here write like applications you know the specific concrete very good who writes libraries well you may find something useful in this but you will be limited now I do like questions I didn't expect to be in in this room and here it's hard to pass mic so if you can project and something is unclear to you try to interrupt me and ask questions I don't like to leave anything unclear as we go through because as you know we have two hours of material if something is unclear by the time we get to the end of it in the first hour you will be missing a big chunk also there are while I try to be accurate I am explicitly not going to be precise I will be jumping over things I will be simplifying things and there are several talks following on if you want to know the real truth if you can handle the truth you go there and you learn all about some of the details the reason I put this on is not you know the regular slide about the authors how many papers how many of these how many of that it gives you the better understanding of your my perspective sort of everybody has a keyhole everybody has a lens they look at software this is mine I work on software that deals with very large scale problems we have solvents of course and hundreds of maybe hundreds of thousands of course running for sometimes for hours sometimes for days billions of objects at the same time and they're getting turned around you know if you're if you have one gigabyte of memory we probably have churned around a terabyte per hour or more we write software for sale we have to write it for whatever our customers running on how many people write software for like for sale to external you know basically your customers you live by the money they pay okay so you understand we have customers who are very conservative and the reason they're very conservative is you know not because they're stupid or anal but because if we screw up the production line on on a fab it could really be hundreds of millions per day so you know half of our customers are around on RedHat five right now the newest compiler you can get on a standard hat hat five is GCC is three four something or other which brings us to the subject writing log free programs writing log free programs is hard what's harder than writing log free programs incorrect love free programs in practice so you go to theory about lock-free and they tell you everything you know about all the problems with locks and all the reasons to go lock free and you go out and you forget it and you start writing log free code because you think it's faster that's 90 percent flood okay if you are after the best performance what's rule number one about performance who knows the rule number one of performance yes or the other ways never guess about performance you will be surprised pretty much every time log three algorithm surprisingly enough do not always provide better performance even when they are correct incorrect ones often provide better performance so I have these take-home points that I'm going to kind of accumulate throughout the talk if you leave this room and you remember at least one of them I have done my job well now speaking of doing my job well who knows what a Q is Q okay very good I'll ask the same question at the end of the talk at the end of second hour if at least one of you hesitates to raise your hand I have done my job really well since I brought up the question of C++ 11 C++ 11 has atomic support and has memory barrier support as I said for some OVA for some people like myself it's not yet available but not because we you know we can't port our code to say plus plus eleven compilers because a large portion of our customer base works and systems where these compilers would be a custom software would require a custom library load and nothing custom gets loaded on those machines it's in some way more complex than the simplified level I'm going to be talking about now don't be alarmed if you're using C++ 11 I will use a kind of pseudo code that map you could one two one two C++ 11 you could implement your own C++ library to that exact interface that I will be using you can map it one-to-one to boost Atomics and pretty much any library that provides memory barriers and lock free atomic primitives so if that's the only thing that you know stops you from using it this is the least of your worries okay time for acknowledging the great ones so last year herb setter was giving the talk on come on one lock free and this is a slide from his talk he was giving this analogy this is your single-threaded program at the top do we have the note that you need this right now well I'm not so much interested in that I'm interested in oh all right it was like I can see it on my hand okay so anyway single-threaded program very easy the traffic lights are like locks and the highway overpass is like your lottery program everything just flows smoothly at my speed nobody waits for anybody this is why it's much better since I'm talking about practical I will do everything as example driven so I'll do by example and we'll try to generalize the examples this is one of the simplest examples of local programming double check locking everybody knows about it everybody have seen it at least once it does have some twists that I like to expose that maybe you haven't seen before now before C++ 11 it was actually really important for you to know because there was a huge caveat that you'll see in a moment in C++ 11 they made the thread safe static initializer which made double check locking irrelevant children know it made one application of double check locking irrelevant you'll see more double check locking is actually very useful for practical luxury programming toward the end you'll see what well let's do the classic application of double check locking the classic application of double check locking is singleton initialization we have some magic yzma thingy which has a transmogrify method and that member of that method is thread safe in order to operate the magic thingy we need to create one and there is only one of them there is only one for the entire program and then we need to transmogrify it there's no purify thread safe even if you call it by multiple threads at the same time on the same object so the question of thread safety is that line with a static initializer is it thread safe and what do I mean but thread safe by design there should be only one of these objects in the entire program well that's what static normally says now unless multiple threads enter operate function at the same time then it's a little less clear C++ before C++ eleven C++ standard had exactly this to say on the subject what threads C++ 11 introduced the concept of threads and the memory model so they could now actually talk about it and they said yes it is thread safe and the compiler said okay kinda most like the modern compilers actually mostly get a try GCC got it right got it right after four or four I think 4.4 not four zero four they had a bug where it wasn't safe anyway it's still instructive to go through so what's the problem first step you understand why wouldn't it be thread safe what the static initialization really do you know what's behind that first line that first statement well we have a pointer which starts its life as now on its its linker initialized so it's initially now Eve that point we go through that context of code if the pointer is currently now we need to create new object store its address in the pointer otherwise we don't need to do anything well here is the problem the problem is we have a data race two threads come in both start executing this code they both see now they both go in construct the copy of an object now one of them leaks because you only have one pointer the other assignment got stomped on unprotected static immunization is not thread safe and what's the root of the problem the root of the problem is that the checking for now and handling the null as creating the object assigning the pointer to the address to the pointer has to be done atomically in a single transaction it the whole thing has to be done as one uninterruptible atomic operation this concept of atomic operations and i don't mean in the sense of hardware Atomics i mean in the sense of this whole operation either happen or didn't happen but it cannot have happen nothing else can go on while this uninterruptible atomic transaction is being processed so the concept of such atomic operations is central to all concurrent programming whether you use locks or not you have to start thinking in these terms okay so the usual way of dealing with that is of course if you need if you need an atomic operation you put lock around the whole section of code that you want to be atomic and non-interactive also we put a lock before check for pointer and we unlock after this works there is however a big waste going on here after the few threads go through this may be the few threads come in at the same time and they struggle for the locks and so on but eventually it'll get initialized after that it will always be non now and it'll always be initialized and nobody needs to create a new one however we're always taking the lock locking and unlocking every time so that's the big waste you know our program has run for an hour we're still looking and unlocking okay we can since we know that eventually the pointer will be known now we'll just check for the pointer if it's now we know that this could be a data race here so we're now we're going to take a look check again and this is atomic transaction inside the lock this is why it's called double check locking there's only one problem this doesn't work it may work on your compiler I am told that I don't personally know that some versions of Visual Studio detect this in the source code as a pattern and emit correct code if you reorder a couple statements they don't detect it as a pattern so what does the initialization really do well it does something like this first it has to allocate memory obviously nothing can go on before we allocate memory then it has to construct the object in that memory so that's a placement new and finally it has to initialize that static pointer with the address of the temporary memory that we just allocated that's of the things that the three things that that must happen well this is also valid implementation allocate memory store the address of that memory in the pointer it right now doesn't point to an object here that points to uninitialized memory then construct the object in that memory and now now you're back to the same exact same state we changed the order we first initialized the static pointer then we constructed object both are valid from the point of view of the C++ compiler which means that the compiler can implement this expression any way it wants both both options are valid this one doesn't work for us because the pointer will be known now other threads will go through C the pointer is known now and try to use the memory when the object hasn't been constructed yet this is a bad order we want to force the other one okay so we're going to force this order we're going to write it explicitly like this allocate first construct second initialize third optimizers do this that's what the compiler writer is getting paid for to detect those temporary variables and eliminate them so compiler probably reduces it to something like that just eliminate the temporary store the address of malloc right in the pointer and then call the call the constructor on that memory compiler wins this round but never underestimate the ingenuity of the programmers we have a big hammer in C++ and by God we're going to use that we're going to the problem was that memory acts can power your direct memory accesses we have this volatile thing which tells the compiler do not eliminate any reads or writes and do not reorder them this is exactly what we need well we're going through this volatile on every invocation so there may be some performance penalty but never mind that now compiler cannnot tree order really every read and write has to happen exactly in the order that I have written so allocation has to happen then it's the address of the allocation stores in the temporary variable temporary variable now cannot be eliminated cannot cannot get rid of all at all reads and writes and the last thing is an assignment of the static pointer and since the last thing is a read from volatile has to happen there has to be last okay we win the victory will be very short-lived unfortunately the volatile keyword only applies to the C++ compiler there if you look at the object code there is no volatile there so now we have to think about what will the hardware do and we know that the for you basically if you read and write at the same time on multiple CPUs this is in general not thread safe there are all sorts of race condition and hardware but this is a talk about luxury programming Atomics have to be in here somewhere so that's probably what's missing right we're missing Atomics let's put them in okay as I said I told you I'm not using C++ 11 but you can map that atomic but in this case by dropping the lower case doesn't so the static pointer will be atomic now if you noticed volatile disappeared there is an assumption that Atomics have the volatile semantics and if you look at how they're implemented like for example if you look at GCC assembler they do so we have Atomics now since the our main static pointer is atomic we have to load it atomically check for now then we take the lock with in the lock we cannot do it atomic matter and we have to do a filmic store okay volatile again sorry actually yep okay sorry okay so atomic has been introduced there are volatile semantics compared to the previous slide with volatile who thinks we fixed at least some of the race conditions here and who thinks we haven't fixed any single one yeah we didn't really fix anything and this is probably the most teachable moment in this whole double-check looking pattern so let's explore this this is what we're counting on here no this wasn't what we're counting on okay can you hear me if it's lower like this okay this look is better CPU issues some memory rights and four at first are the memory writes for the constructor whatever we're using to initialize the object and they happen first and they get committed to memory first second is the memory right that commits the value of the address into the static pointer and that gets committed to memory second this is what we're counting on this is not how the world works and the reason this is not how the world works is because we have those pesky things called caches and they're very annoying they're very inconvenient they make life very complicated in the second half of the talk you'll see why we still have them so CPU indeed issues the rights in the order that I just described the there is no cheating here first grow the rights first we write into the object second we write into the pointer that's true what's not true is not necessarily true is what happens that those rights get committed to memory in the same order what's entirely feasible because at this point CPU has no control over what's going on CPU finished its writing to cache has gone on to do some other things cache gradually flashes it into memory or maybe not gradual if you ask for it again but in any order it was in particular it's entirely possible that in the main memory the static pointer gets updated first it's pointing to the location of the object and what was what's in main memory at that location of the object whatever was there before the random garbage that's entirely possible now this same CPU if it starts reading will read from its cache so it'll see the correct picture another CPU that doesn't share the cache was this one will read from the main memory and will see incorrect picture okay we're one of the battle was the compiler we lost it to the hardware not good well at this point you should actually question okay if everything is so messed up how the locks actually work yeah hold on to that question there is more than one atomic actually if you want to really know how many atomic though there are come to Michael once talk but I'll give you a simplified picture there is a reason why I said atomic store I'll say a little bit later about what kind of atomic store okay why does this never happen with looks I have to introduce this concept of memory barriers Michael will talk about memory barriers in great details and specifically for C++ 1114 and so on and if you want to know like really know about memory barriers also follow on with polemic a nice talk but I'll introduce a simplified view of memory barriers without memory barriers there is no concurrent programming locks or no logs synchronous because memory barriers are what provides synchronization of memory visibility across different CPUs memory barriers are implemented in Hardware they are in Hardware specific and the last line used to be true it's not true in C++ 11 there are portable memory barriers now okay as I said I'll give you a very simplified picture of memory barriers there are two talks that deal was what's really there you really want to know how the world works so I'll reduce the entire universe of memory barriers to two barriers acquire and release on x86 it's actually pretty much good enough for you to know not entirely but close enough acquire barrier guarantees that all memory operations that were scheduled after the barrier become visible after the barrier so whatever was in the program order after the barrier becomes visible after the barrier usually there is no acquire barrier per se a Quarrier is a modifier on some other structions like a choir read well Sam CPUs have a just just a barrier instruction so if we have if this was the program order then the memory visability order could be something like this you can go down through the choir barrier just can't go up well if there is a choir there has to be a release release has the opposite guarantee anything that was done before has to be visible before the barrier you can go up you just can't go down okay this is what locks do locks combined a choir barrier at the beginning and release barrier at the end to form what's called critical section anything that is done within the critical section cannot escape up or down any reads or writes that are done outside of critical section can enter a critical section and be done there but anything within critical section has to stay with the in critical section so that's actually why the name is you acquire the lock when you enter a critical section and you release the lock when you leave the critical section that's why the barriers are called this way now very important when you use locks it doesn't matter when you start doing lock three people forget all the time this gets you every time those memory barriers only work if both CPUs cooperate it's not like a push across the entire system both CPUs have to take a barrier instruction between you and then you get a synchronization between those if one of the CPUs does the barrier and the other one doesn't the second CPU can see whatever it wants in any order no guarantees its cooperation affair okay well Michael will actually cover this material in a lot more detail so in the interest of time I'll skip that I just saw his slide so I'm kind of adjusting some of the things you know he'll cover it in a lot a lot more accurately okay let's before before the break which is coming up let us finish this so now this really works this is a double check pattern that really works there was a question about store whether the store guarantees the store well it has to be a release store now on some platforms every store released store and some it isn't but if for whatever reason by luck or by intent your store is a released store and of course the load has to be acquired load remember I said there has to be two memory barriers on both both CPUs have to play then yes this is guaranteed to be executed in the proper order and become visible in the proper order so what does it mean release store older all the rights that were done before released or have to become visible before before so before you can read that that value from whatever was released stored all the previous rights have to be visible to you and what are the previous right well that's the construction initialization of the object so all of those rights which were done before we stored the address of the object into the static pointer will become visible before you read from the static pointer if you read with acquire a load so this double check pattern actually works atomic instructions that's why sama constructions are the easy part 90% of all log free programming is about memory barriers and in general the visibility order a lock creates critical section now okay look at the look at the second line in the code that lion is outside of the log which means it happens before let's say that instead of a choir load imagine that a set of a choir load I said simply load which is no barrier load now that lion happens before the log which means no memory barriers have occurred on the CPU that is running this code so now there are no guarantees on memory visibility I can get an on now pointer from that second line of the code and the rights that were initializing the object may not be committed for another hour exactly so I have two CPUs the first one was really the first so the first CPU went through and said okay is the value now yes it is now because it's it's further first view let's do the lock okay fine we did the lock we're the only one okay let's check if it's if it's still now yeah it's still now where the first CPU going through construct the object really a release store unlock the second CPU is going through again imagine that instead of a choir load I say just plain load no barrier load reads the now yet soon on now I mean reason oh sorry that reads the static pointer gets not an owl okay do I have any guarantees on the memory of the gizmo object itself no I don't those rights are sitting still in the cache haven't propagated through main memory haven't propagated back up won't be propagated through main memory for the next hour in reality doesn't leave that long but standard you know that there is no current if you're not using barriers there is no guarantee I mean on some hardware it could be an hour so those rights haven't propagated through the system to main memory so if you didn't take the barrier you we'll never go into the lock which means you will never get any barriers at all which means there is no reason for your main memory to be updated ever remember if you don't take barriers memory updates are basically at the mercy of the hardware hardware may update them now or next day whatever so if you don't have that first acquire barrier you may run through this entire code and never hit a single memory barrier is that clear no second CPU sees that it's not now second CPU can see anything if it sees that it's now then it will try to initialize it again let's let's assume that it sees that it's not now you know any any subset of the rights can be made visible and the other subset non visible let's consider the example where the static pointer was made visible and nothing else was made visible now the second CPU see is not now and reads uninitialized memory from from the object itself because no other rights have been made visible okay let's well I think we actually have a what 30 minute break a break between the two talks if I'm not mistaken right sorry okay so okay so let me just go through all the way to the end and I'll take questions between and for the for the second half okay so as you saw and we had just had a confirmation lock-free program is mostly about memory order Atomics instructions were the easy part yeah I taught me to read atomic right every shared variable that you read and write was out the lock has to be atomic that's easy they have you have to take the correct memory barriers and order your code correctly between those memory barriers to make sure that what you think is done and should be visible is really visible now this law this whole thing went away with C++ 11 if you have C++ 11 and you want static singleton you do this and this according to standard is thread safe and is guaranteed to work what the compiler do is very much what I just shown you you can look at the assembly that's something very similar now the reason I'm still talking about it one it's relatively easy to explain it's not very easy but it's probably the easiest vehicle I have for explaining the memory dependency issues but more importantly while in like pretty much every talk on double check locking pattern uses that singleton Association that's not the general pattern that it's solving think for a moment what's the real problem that we are solving we have normal case which always happens and exceptional case which rarely happens when exception case actually happens it's difficult to handle there is a lot of data to be messed with so in general you want to have a lock protecting it it's not thread safe it's hard to make thread safe but you don't want to take that lock every time the problem is in order to test you have to test for the expecially and handle special case in an atomic transaction as a single non interruptible operation which means the only way to do it was a lock is to put a lock around both the test and the handler of this special case 99.9 percent of the time the handler doesn't need to run in case of singleton like almost never the handle doesn't need to run but the test is still guarded by the lock that's what we're trying to solve that's what we're trying to avoid now when can we use double check locking pattern let's assume that we have a very fast non-locking test for whether we are in trouble whether we need the exceptional handling and I don't mean C++ exceptions I mean just something that is not usual if there is no false success if the test says we're good we're good if the test says no we may be in an exceptional situation we may or may not be in an exceptional situation this is the case was that singled and if the pointer is not now assuming you have written it correctly if the pointer is not now go ahead and use the object if the pointer is now maybe the object needs to be created maybe it isn't maybe somebody's creating it right now in which case which you shouldn't be so this is the double check locking a pattern this is the case where it's used in general when you want to test for a special case very quickly a lot of times without a lock and if the for if the test for special case shows you that you may be in an exceptional situation then you take the lock and really figure out where you are in case of singleton once the test succeeded it always stays succeeded succeeded in general case that doesn't have to be true double check locking pattern actually doesn't depend on it you could go into it like basically if you handled your problem you may have to handle it again later just not very soon okay you will see why this is actually very useful in practice so this is this is kind of the pseudocode for the general double-check loading code pattern if the fast test fails we have to log we have to do the exact test we have to handle the problem and we have to unlock now there is another assumption here the assumption is that basically if the problem happens after the fact if one thread goes through does the fast test and the fastest succeeded you can actually keep using whatever you were you know whatever like your object or whatever it is you are trying to use even if the problem later happens so classic example is memory allocation if you you're testing for whether I have all my memory allocated if you do that memory is not going away even you already got it if the memory allocation fails you run out of memory whatever you are located is still there you may not be able to allocate new one you have to handle it so but the idea is basically if the fastest fast then you're good you know nothing that happens afterward can can break your well-defined behavior this will come very useful later during the second half okay good time to take a break if you have any questions come see me otherwise we'll resume for the second half where we change gears and look at some real log free programs start with some hues
Info
Channel: CppCon
Views: 14,109
Rating: undefined out of 5
Keywords: Fedor Pikus, CppCon 2015, Computer Science (Field), Bash Films, Conference Video Recording, Event Video Recording, Video Conferencing, Video Services, Non-blocking Algorithm, Deadlock
Id: lVBvHbJsg5Y
Channel Id: undefined
Length: 38min 46sec (2326 seconds)
Published: Fri Oct 09 2015
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.