""go test -race" Under the Hood" by Kavya Joshi

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Captions

good Maureen has everybody doing today cool my name is Kavya and today I'm here to talk to you about data race detection and more specifically my favorite rese detector tool the go race detector now like most of you in this room I think I write code for a living and like many of you in this room I think that involves reasoning about and thinking and building highly concurrent systems actually just just real that warm fuzzy we're all in this together feeling how many people here are intimately familiar with the pains and pleasures of multi-threaded programming about everybody well that's not surprising in this day and age multicores power everything from the i5s on your laptop to the snapdragons or AAS on your cell phones everything's multi-core and we want to write multi-threaded programs to take advantage to leverage having these multiple cores now obviously is the number of multi-threaded programs being written has gone up so too has the number and the frequency of the related bugs concurrency bugs like data races and deadlocks now this talk is about the former it's about data races and just to make sure we're all on the same page when I say data race I mean the situation whereby two threads or two which maybe userspace threads like Ingo or system threads when they concurrently access a shared memory location and one of those accesses is a right so here's an example of the of a simple probably what is the most obvious data Rees now why is this a data race this is a data race because there is at least one order of execution at least one ordering of the events which results in the wrong answer in this case for example if G ones read or G ones write happens before g 2s read we're good right we get the right answer we get the expected answer which is one at the the value of count at the end of the program however if they race to read and write the count variable so they both read count at the same time and then one right happens before the other right we get the wrong answer we get two and in this case the excesses are concurrent and this is a data race now this simple example is going to be a running example for the rest of the talk now what's not a data race a race condition a lot of people it's a common misconception that data races and race conditions are the same thing but they are not the same thing here is an example of not a data race right because of the lock the accesses to the count variable are synchronized they will never be concurrent however this is a race condition because if the if the if your program depends on the value of count we don't know deterministically ahead of time whether the final value of count is 1 or 2 so this is a race condition so why do we care about data races well in addition to being relevant data races are interesting because they're elusive they are non-deterministic their timing dependent and this means they are herds to write tests for before they happen and even harder to track down after the fact and that makes them somewhat fascinating in addition they have undefined consequences right if your program did something wild with the value of count and the fact that count had a value of 2 instead of 1 could result and some really crazy things it's important to point out that data races can also result in unexpected consequences like seg faults or crashes because the compiler assumes race free programs and based on that assumption and makes a number of optimizations so if your program breaks that rule if you're a program is not race free then all bets are off the less feature about data races that make them interesting is specific to language Lego and some other languages like Python which is that it's almost easy to unintentionally and unknowingly introduced data races the goal map implementation for example in the standard Lib is not safe for concurrent access and a lot of new group go programmers just don't know this so the obvious question at this point is given we want to write multi-threaded programs how can we protect our systems from the unknown consequences of these difficult to track down data race bugs in a manner that's reliable and scalable so the tried and tested method is tearing really hard at the code is not an option it's neither reliable or scalable at the other extreme protecting every single memory access with a lock or some sort of sync primitive is not scalable and this is where data race detectors come in swoop in and save the day these cool cool tools analyze the given program either statically that is based on the source code or dynamically as the program is running and produce a race report that looks like this this is the GU race detector output run on our example racy count program it gives us a hella lot of information it tells us the the write and read accesses what lines they occurred at all sorts of useful things now as the developer I can just go in and like synchronize does that one access that is racing um so are these reliable are these tools scalable these are good questions the answer is absolutely yes which is why I'm here but we'll come to the details at the end of the talk the more interesting question to me at least is the how is how do these tools detect both memory accesses and synchronization events and detect data races that is the question I will aim to answer in this talk using the go race detector as a case study how many golang developers do we have in the house all right sweet I can skip over the basics section then so the go race detector was in version 1.1 so it's fairly recent I y'all must be familiar with it it's available it's it's available in the itch tips whip go so it's available as part of the go tool train you can just do go build install slash whatever and pass in the race flag and you get an example output now the go race detector is a dynamic race detector and it's based on a C C++ library called thread sanitizer that is used widely in Google if you're familiar with it and the reason I chose this is my case study is because it's extremely it's battle-proven it's found a number of bugs it's used in industrial used on-the-go standard library and in google and it's open source which is nice alright then let's get started so the rest of this talk is structured as follows I will first go over the fundamentals required the prerequisite knowledge to understand the detector we will then dive into the deep deep belly of the giri's detector will then take a step back with our understanding and try and evaluate it and then I'll wrap up and leave time for Q&A so first things first and very quickly since most of you here are familiar with go concurrency and go is available through goroutines which are uses space threads and you use them as you use threads in other languages now the go memory model is important to understand because it it gives us the language guarantees about our program execution so it says that all the events executed by a go routine all the memory access events are guaranteed to happen in order but if you have two or more go routines and they can currently access some shared piece of data then their access must be synchronized otherwise all bets are off so how do you synchronize go routines you may use channels you may use mutexes and go also has support for Atomics which is neat alright so now that we understand what we need we have the background knowledge that we need about go let's talk about the next core concept concurrency now if we go back to our database example we see that one of the requisites is this concurrent memory access but how do you determine concurrent memory accesses with our simple example alone this is like this is not an easy answer right if our program gives us the wrong answer if it gives us two then we know that that must have been a data race but there are other executions possible and now you think about a larger program and this is a difficult question so in order to detect data races we need to be able to answer the question how can we determine concurrent memory accesses let's first think about it intuitively so this is a similar program to our race account example - the goroutines is the read and write of count can that ever be concurrent no because they are executing in the same goroutine right we don't have any goroutines this is single goroutines so they they cannot be concurrent accesses ever what about this example this is again similar to our racy count example and that we spawn to goroutines or two threads if you're that that's the way you think about it except the sinker the access to the if count block is synchronized with a lock what about in this case can a read of count and a write of count from the two different go routines ever be concurrent no again we know the answer in this case is no because we have a lock what the lock essentially does is create a dependency edge of sorts right only one goroutine can acquire a lock at a time it must release the lock before the next go routine can acquire it and since the right and the read urn opposite sides of this edge we can we can conclude that they will never be concurrent that was easy now in the literature our intuition about concurrency is formalized into the happens before relation don't let the name scare you it's essentially the same idea happens before simply gives us a means to order all the events in our system which may be memory access events like reads and writes or synchronization events like blocks and unlocks across threads or across go routines so it's simply it allows us to look at all the events across threads like across the entire program and order them now order both happens before has a few rules in order to say that one event happens before the other if you observed their results as such one of three things must be true either one they happen in the same goroutine or two there's synchronization pair and by that I mean they are like a lock unlock on the same mutex object or a send receive on the same channel or three by transitivity if we can establish that one event happens before the other and that event happened before third then we can conclude that the first and the third event and the first event happened before the third event if we can establish this we're good but if we can't establish that one happens before X happens before Y and what happens before X then Retro their concurrent let's apply this and make sure it actually works cuz you know why would you believe me this is our example with the mutex with the lock we know that happens before B because same girl routine by the lock unlock on the same object principle we know that B happens before C and so we can conclude that a happens before D and this is the same conclusion we came to intuitively right we said that the right happens before the read of the next goroutine what about if we apply this to our receipt count example well let's see there are no edges in this diagram we know that a happens before B and C happens before D but what about a and C what about seeing a there's no edge between them and so we conclude that they are concurrent another useful way to visualize happens before is as a graph if you're if you think in math we can establish what happens before path it happens before relationship if there's a path between them if not they are concurrent or we get so far cool so now that we know that for data races we need to determine concurrency and we're going to determine concurrency but happens before the next obvious question is well how do you implement how do you implement happens before and again for this we turn to the literature and the literature says use vector clocks our folks here familiar with vector clocks okay so so okay so don't let the name scare you the concepts pretty simple the idea behind vector clocks is each actor in your system each go routine or each thread has its own vector clock and the vector clock is simply an array of n clocks where n is the number of threads or goroutines so we'll work through an example quickly and each clock corresponds to one go routine right and these aren't clocks in the sense of time stamps you they're logical clocks or scalar clocks you can think of them as versions there's certain update rules and to understand this let's just walk through an example so these are the vector clocks of our to go Jeanne's this uses the count with a lock example now at the beginning of the world both of them add a vector clocks are initialized to 0 now at each event a goroutine is going to update its index in its vector clock it's going to incremented by 1 so at the lock this becomes 1 0 at the read this becomes 2 0 at the right this becomes 3 0 and then there's an unlock and this becomes 4 0 now the unlock is a special event because it's an event that it's a synchronization event it's going to establish a happens before an edge so at this point we call this like we call this a send event because this is an unlock once it unlocks it it can send though the mutex over and at this point it sends conceptually it sends its vector clock over as well now goroutine 2 can receive the it can receive the mutex so it sees one event and increments its vector clock by 1 and again since this is a special synchronization event a received synchronization event it performs the special update it takes the max of every element in the vector clock and updates its value its entire vector clock to that to that max so from 0 1 this becomes 4 1 does that make sense cool so vector clocks are supposed to help us determine concurrency let's see how that happens so applied to our program 0 0 first event second event third event fourth event is the unlock so it's going to send it over now this point goroutine 2 has an event and so it's vector clock is 4 1 and its second event now our question is does a happen before D and the answer is if is vector clock is less than DS vector clock they happen before each other it happens before D is that true here well 3 is less than 4 0 is less than 2 so yes they happen before each other now let's apply it to a different example a more complicated example here we've introduced a third go routine and we're going to establish a happens before edge between goroutine 1 and 3 so it's going to send something over with that send it's going to send its vector clock over to and our question is is D does D happen before F what do you think no right so we're going to compare their vector clocks and we see that every element in D is not less than every element in F so they don't happen before each other what about F happens before D for the same reason F does not happen before D and since they don't happen before each other this is a concurrent access so at this point you're probably thinking well cool Kavya now I know all about happens before in vector clocks but why didn't we just go back to grad school and the answer is because this is what the girl race detector does it determines if two memory accesses are concurrent by a by establishing this happens before relationship and to do that it uses vector clocks this method of race detection is called a pure happens before detection there are other types of detection but this is the one we'll talk about today all right everybody with me so far cool it is now time to dive into the deep deep belly of the go race detector and look at how it's implemented so in order to implement happens before using bec two clocks what would the go race detector need to do well let's see it will have to create vector clocks for goroutines at goroutine creation it will then have to update these vector clocks at each memory access and synchronization events and then at each memory access event it has to compare these vector clocks to decide if there's a data race now if you stare really hard at these bullet points and maybe the coloring will give you a hint you will see that we can essentially tease out two subproblems a problem of event generation and a problem if event consumption and we're going to make it the program's responsibility to handle the event generation in that every time the event every time the program performs a goroutine creation or a memory access it's going to report it and the race detector is simply going to consume these events and take care of updating its state it's going to create the vector clocks it's going to update the vector clocks it's going to take care of all of that at a high level this is how the go race detector is implemented its implemented conceptually as a state machine whereby the program as it's running generates events and the vector the race detector keeps program state vector clocks etc and it updates them based on the events it receives but problem does this mean that we have to modify our programs so at every memory access at every synchronization every goroutine creation do we have to call out to the race detector do we have to modify our programs well no we just saw that we can take our program and run the detector on it so what's going on to answer that question let's look at the source so this is our racy count program compiled without the race without race detection enabled this is what the assembly for the increment count function alone looks like and this is uninteresting this looks pretty standard the function is defined now if we compile the program with race detection enabled what is the output of that look like well for one it looks longer but there's something else there a number of extra calls inserted there is a race func enter there's a race read a race right what's happened is that the go compiler the GC compiler has essentially transformed our program to look like this whoops to look like this right it's inserted the calls out to the race detective for us now the way the compiler does this is pretty standard so I'm not going to go over the details the idea is that if race detection is enabled it adds an extra instrumentation pass over the ir so it essentially modifies the code tree generated well this is awesome it means that we don't have to modify our programs to track memory accesses well what about synchronization events and goroutine creation for that let's look at the source of the ghost standard library this is mutex go this gives us the mutex sync primitive and we see that there is an if race enabled block which essentially calls out to erase the Tector function Reese acquire similarly goroutines are created in proc go write the new proc one function is what's responsible for creating a new Gore from the ghost standard library and we see the same if race enabled block with a function call add to the grease detector cool so what did these race detective what do these functions do for that let's go to the fiercest and see what's going on well ignore the fact that it's an assembly file we can talk about it later but what we notice is that there's a call to T San read this is a call to the thread sanitizer library the C C++ race detection library that we talked about the goriest detector isn't just based on TCM it's literally built on T standard calls into T San so at this point we know two things we know that one the program can generate these events for us thanks to compiler instrumentation and the ghost standard lib and we know that the race detector the the magic sauce that takes care of all the state is T San so how does T San work well T San implements the happens before detection so what does it need to do if we go back to our checklist we see that it has to do something with vector clocks right and to do this it uses something called thread state it needs to compute these happen before edges and to do that it uses those two data structures and finally it needs to compare vector clocks to detect data races rather than talking about the details let's just look at an example so this is our program and we'll see what T San does at each step so the first call to spawn a goroutine go increment count results in a call out to t san as we saw and at this point t san creates this thread state now thread state simply contains the vector clock for that goroutine and it returns it to the program so it can be stored on the goroutine alright so check we have a vector clock for our goal routine now we're going to do we're going to perform a read we're going to check count equal to zero now at this point T sound has to do two things it has to check if there's a data race with the previous access and if there isn't it has to stir information about this access for future detections let's look at two first to do this TCN uses something called shadow State it essentially creates a shadow word to track each memory access and this stores information about the access like this like the thread ID the clock the position like where the access occurred and whether the access was a read or write now these the shadow State is directly mapped the details don't matter now teason performs two optimizations to the shadow State the first is rather than storing one per memory access because that would just consume a lot of memory it stores a a fixed number of Shetty's shadow cells in the current implementation it's for for a region of memory let's just see how that works so if there is a read to the first two bytes of an applications word this is the this is the shadow word created now if another goroutine comes along and writes to the last four bytes this is the next shadow word created does this make sense cool the second optimization is that teason does not store the entire vector clock it only stores the scalar clock which is the accessors index in its vector clock now these two optimizations may seem surprising you know doesn't it caused a loss in percent in procession it turns out to not matter as we will see so going back to our program so the dog the read of count translates to the creation of this shadow word right we have it's the thread ID we have its clock which is zero we have the the region red and we have a zero because it was a read now at the count there is another shadow word created that looks like this and here's where it gets interesting g2 comes along and g2 performs a read and this is what its shadow word looks like right that's G 2 0 0 8 M 0 at this point T sounds going to check if there's a Reese let's let's see how that's done ok the algorithm at this point is really is beautifully simple all you have to do is check the accessors vector clock that is the full vector clock and the new shadow word with the existing shadow word let's see how this works so what exactly are we checking for one do the do the access locations overlap right are they reading if shared oh sorry are they accessing a shared memory location yes they are are any of these accesses are right why yes they are are the T IDs different are these two different goroutines why yes they are and lastly can we establish a happens before edge between them well let's see we're going to use vector clocks right that's why we learned about them so we're going to take G to Ector clock and compare it to the existing shadow words clock in order to decide that the read happened before the right we the shadow words vector clock to be less than the g20 clock right is that true in this case well no it's not well for red checkmarks what could that possibly mean it's a race so this point we've seen how T San implements race detection this was the simplest example there are obviously other there there's obviously a lot more T sans this hell a huge C++ library but a few a few important notes first T San obviously must track synchronization events as well right now these events are important because they establish that happens before edges and in order to facilitate that transfer that exchange of vector clocks T San uses like a sinc var which is another data structure but the interesting thing about this data structure is its stores of vector clock as well so when a sender and receiver manipulate is the vector clock is effectively transferred over the other points note is that T sin tracks file descriptor memory allocators and all of that the goal of this being that it is able to recognize implicitly synchronized events to so not just explicitly by a walk of art channel but implicit synchronizations that may happen through file descriptors the t san recognizes that the goal of this is so it doesn't report false positives speaking of false positives let's quickly answer our remaining two questions we know how the how it works but we still want to know is it reliable is t s-- is the girl race detector via t san scalable well first question well season or the giri's detector does not report false positives so if it reports a race there is a data race there is some order of events in your program that would result in a database but just because it reports a data race it doesn't mean that it's dangerous right there is such a thing as benign data races and that's up to you as the programmer to decide though the consequences of the data race it can miss races though and this is important this is a property of dynamic race detectors that they depend on the particular execution the particular execution of the program that you run them on right so for example if we ran our example program an enough number of times there would be at least one execution where there wouldn't be a data race and in that case the race detector wouldn reporter 'yes so depends on the actual events that occur and is it reliable well yes it's battle-proven like we said Google LLVM there are a number of users out there a second question is it's scalable oh not really this is the program slowed down at the execution time and that is the memory usage overhead of using the go race detector but is it scalable with respects to finding data races well it's more scalable than staring at your giant codebase and it's certainly more scalable with respects to your programs performance than locking everything and shipping that to production the recommended use of the go race detector is in E is in your testing environment or on your staging environment or even like a production cannery box and it works well for those persons for those purposes alright that brings us to the end of the talk thank you all for coming as you leave today I hope you walk away with an appreciation for the fascinating nature of the race detection the data Ruiz detection problem and I hope you learned about happens before and happens before using vector clocks and I hope you've learned about the the go race detector and how that works by compiler instrumentation and this hella cool T San runtime library thank you we have about five minutes for questions yes yes right so I rushed over the optimization sections we can go back to that yeah that's a great question so the so this is the simple scheme right where you store one shadow word per memory acts and per memory location per access and we you can see how that would work but the optimization that teason performs is foreign applications for 8 bytes of the application so for a word in a 64-bit but for an entire word it's tracked only for accesses to that entire region are tracked ok and you might think if you throw if you only keep the last four accesses and evicts accesses this might affect precision and that's true right but the race detective doesn't claim to not report it doesn't claim to be like foolproof right yeah so that's okay now let's see how this actually works so say that there's an access a read access to those two bytes this is the shadow word created and we know that it's dos two bytes that were accessed because of the position and the offset which says 0 to know if there's another access to this region of memory like a write of the last four bytes that that index is 4 8 so we know that it was to a different region yeah if there was a if there was another region like if there was another block that would be tracked by another set of 4 yes that's right and these the shadow state is stored in whoops yeah no it's not forever it's for per eight bytes of the application Thanks thanks that's a good question let me pull up yes yeah so an alternative to dynamic race detection is static race detection right so instead of this instead of grease detection the race detector analyzed the program source and the the advantage of this is that only a single pass of the program is required and and and that single pass should report all races and so it ends up it ends up being more effective in that sense but the problem with it is that you have to change your source code of infrasonic race detection you'd have to add these annotations to call out to the race detector and that's obviously for a larger code bases that's not that's not something that they'd want to do with respect to the potential for it in a language like oh well I don't know if a good reason why it hasn't been implemented I think the reason that the go race detector is is what it is is because go happen like go was created at Google and T San was widely used at Google so at the time it came down to choosing a race detector I believe that was like one of the leading factors in the decision mm-hmm yeah any other questions yes so tea Sam so Tisa and requires like help from the compiler to run right so the GCC compiler and go so go has two compilers the GCC compiler and the GCC go compiler it's not supported by GCC go but the GCC compiler supports this in addition you can also use T San with the later versions of are not late anymore but with certain versions of LLVM LLVM is clang and with GCC and that you would use that for race detection in C C++ yes yes yeah yeah it's pretty cool the other neat thing about tea Sam is that it allows it recognizes your customs synchronization primitive via dynamic annotations so if you created your own class or like your own object and you wanted that's to behave like a synchronization primitive in that you wanted that to establish these happens before edges and not report data races on that you can arrange for tea sand to do that by dynamic annotations which is yeah you're probably kid yeah yeah I think so yeah though so this comes back to like happens before right in order to get a partial ordering over all the events across goroutines you you must be able to recognize those edges because that that's what establishes the ordering yes yeah exactly it's to Trent it's to facilitate that exchange of vector clocks because what needs to happen is when there's a synchronization event Tisa needs to update the thread states right to the vector clock stored in the thread states to reflect the fact that that was a synchronization event yes oh I have not because its proprietary you can't get access to it or I would have used it I see sorry I should be doing this the question was how does this compare to the Intel Intel has a race detector what what is it called it's called system studio or something like that but the question was how does the go race detector compared to the Intel race detector and I believe it does something by injecting I see okay nice cool alright thank you I believe lunch is outside

Info

Channel: Strange Loop Conference

Views: 7,254

Rating: 4.931818 out of 5

Keywords: Go language, data race, static data analyzer

Id: 5erqWdlhQLA

Channel Id: undefined

Length: 44min 13sec (2653 seconds)

Published: Sun Sep 18 2016