Introduction to Go, part 14: Goroutines

Captions
Hello! In today's video I want to have a conversation about the tools we have available for concurrent and parallel programming in the Go language. If you've come this far in the series, or done any research on Go at all, you know that concurrent programming is one of the hottest topics talked about, especially among people learning the language for the first time. So we're going to talk about the concept of a goroutine and how it enables us to create efficient, highly concurrent applications.

We'll start our conversation by learning how to create goroutines themselves and how to work with them a little bit. Then we'll move into a conversation about synchronization, covering two concepts, wait groups and mutexes, and how we can use them to get multiple goroutines working together. One of the challenges we're going to have with goroutines is also one of their greatest advantages: they allow our application to work on multiple things at the same time. However, you'll often run into situations where a certain bit of functionality needs to wait until one or more of those concurrent calculations is complete, so we'll talk about how to use synchronization primitives to do that. Then we'll move into a discussion of parallelism. Up to that point our conversation will be about concurrency, and concurrency is just the ability of the application to make progress on multiple things; it doesn't mean it works on them at the same instant, just that it has multiple things it can be doing. When we talk about parallelism, we'll see how to get our Go applications to work on those concurrent calculations in parallel. And finally, we'll wrap up the video with a short section on best practices.
We'll talk about some of the gotchas you can run into with concurrent and parallel programming, and some of the tools available to keep your application safe and away from those minefields.

OK, let's get started by talking about how to create goroutines. The first thing you'll notice is that we're in Visual Studio Code right now. The reason is that while we can certainly play with goroutines in the playground, once we get into parallelism the playground is limited, because it only lets us use one core at a time. Running locally we can use as many cores as we want, so we can truly run our applications in parallel, and some of the things I want to show you are easier to illustrate in this environment.

So the first thing I want to show you is how to create your very first goroutine. We'll need a function, so I'll create one called sayHello, and it's going to be very simple: all it does is, well, say hello. That's just enough to get us started. We can of course call the sayHello function from the main function, run the application with go run pointed at that file, and of course it says hello, so no big surprises there. Now, to turn this into a goroutine, all we have to do is type the keyword go in front of the function invocation. That tells Go to spin off what's called a green thread and run the sayHello function in that green thread.

I need to take a moment here to talk about threads. Most programming languages you've probably heard of and worked with use OS threads, that is, operating-system threads.
What that means is that each thread has an individual function call stack dedicated to the execution of whatever code is handed to it. Traditionally these tend to be very large, with around 1 MB of RAM each, and they take quite a bit of time for the application to set up, so you want to be very conservative about how you use your threads. That's where you get into concepts like thread pooling, because the creation and destruction of threads is very expensive, and we want to avoid it in most programming languages, such as Java or C#.

Go follows a little bit of a different model; as a matter of fact, the first place I saw this model used was the Erlang language. It uses what are called green threads. Instead of creating these massive, heavy-overhead threads, we create an abstraction of a thread that we call a goroutine. Inside the Go runtime there's a scheduler that maps goroutines onto operating-system threads for periods of time: the scheduler takes turns with every CPU thread that's available and assigns the different goroutines a certain amount of processing time on those threads, but we don't have to interact with those low-level threads directly; we work with the high-level goroutines. The advantage of that abstraction is that goroutines can start with very small stack spaces, which can be resized very quickly, so goroutines are very cheap to create and destroy. It's not uncommon in a Go application to see thousands, or tens of thousands, of goroutines running at the same time, and the application has no problem with that at all. If you compare that to languages relying on operating-system threads with a megabyte of overhead each, there's no way you're going to run 10,000 threads in an environment like that. So by using goroutines we get a nice, lightweight abstraction.
With this abstraction over a thread, we no longer have to be afraid of creating and destroying threads. Anyway, let's run this and see what happens, and it's going to be a little disappointing, because you'll notice our message doesn't print out. The reason is that our main function is actually executing in a goroutine itself. What we did on line 6 was tell the main function to spawn another goroutine, but the application exits as soon as the main function is done: it spawned that goroutine, had no more work to do, and finished, so the sayHello function never got any time to print its message. We can get around that with a horrible practice that's nonetheless good enough to get us started: we'll just put an arbitrary sleep call in to make the main function delay a little. Now when we run the application, we do see our hello message printed out, but as opposed to our first run, it's not the main function executing that code; it's the goroutine we spawned from the main function that's responsible for printing the message.

OK, this is a pretty typical use case for a goroutine, using it to invoke a named function, but we don't have to do that. As a matter of fact, let me drop in an example that's basically the same, except instead of a named function I'm using an anonymous function: notice that it's declared anonymously, invoked immediately, and launched with the go keyword. What's interesting about it is that I'm printing the message variable defined up on line 9 from down inside the goroutine, and if I run this, we see that it does in fact work. The reason it works is that Go has the concept of closures, which means this anonymous function really does have access to the variables in the enclosing scope.
It can take advantage of the msg variable we declared up on line 9 and use it inside the goroutine: even though the goroutine is running with a completely different execution stack, the Go runtime understands where to get that msg variable and takes care of it for us. Now, the problem with this is that we've actually created a dependency between the variable in the main function and the variable in the goroutine. To illustrate how that could be a problem, let me modify the example just a little: I declare the variable msg and set it equal to "Hello", I print it out in the goroutine, and then, right after I launch the goroutine, on line 13 I reassign the variable to "Goodbye". If I go ahead and run this, you see we in fact get "Goodbye" printed out in the goroutine, not "Hello" like you might expect based on how the program is written. The reason (and it's not always guaranteed to execute this way) is that most of the time the Go scheduler is not going to interrupt the main goroutine until it hits the sleep call on line 14. So even though main launches another goroutine on line 10, it doesn't actually give it any love yet: it's still executing the main function, it gets to line 13, and it reassigns the value of the message variable before the goroutine has a chance to print it out. This is actually creating what's called a race condition. We'll come back and talk about race conditions at the end of this video, but this is a bad thing, and generally something you want to avoid. So while you can access variables via the closure, it's generally not a good idea to do that.

If that's not a good idea, what are your other options? Well, notice that what we have here is just a function invocation; there's nothing special about it just because we put the go keyword in front of it. It's still a function, and functions can take arguments.
So what happens if we add a message argument to the function, and down in the parens where we're actually invoking it, pass the message variable in? Since we're passing this in by value, we actually copy the string "Hello" into the function, which decouples the message variable in the main function from the goroutine: the message that prints out is a copy made when we invoked the function for the goroutine. Now if we run this, we see "Hello" printed out. This is generally the way you're going to want to pass data into your goroutines: use arguments, unless you really do intend for the variables to be coupled together.

Now, this example is working so far, but it's really not best practice, and the reason is the sleep call. We're binding the application's behavior to the real-world clock, and that's very unreliable: in order to get your application to work dependably, you typically have to sleep for a very long time relative to the average execution time. We don't want sleep calls like this in production. So what are the alternatives? Well, one of them is what's called a wait group. Let's go ahead and add one in, and while we're doing that, we'll talk about what it is. I'll create another variable (it looks like my auto-formatting just helped me here), pulling the WaitGroup type from the sync package, and I just need to put curly braces here to initialize it. What a wait group does is synchronize multiple goroutines together. In this application there are two goroutines we care about: the one executing the main function, and the anonymous one we're spawning.
What we want to do is synchronize the main function to the anonymous goroutine we're spawning off here on line 13. We do that by telling the wait group that there's another goroutine we want to synchronize with: it starts off at zero, so we call Add with one, to tell it we're going to synchronize with this goroutine right here. Once that's done, we don't need the sleep line anymore; instead, we exit the application by waiting on the wait group, which we do with the Wait method. When the goroutine is done, it can tell the wait group that it has completed its execution, and we do that with the Done method. What that does is decrement the number of goroutines the wait group is waiting on: since we added one, it decrements by one, back down to zero, and then the Wait method says OK, it's time to go ahead and finish up our application run. If I save this off and run it, we see that the application performs as it did before, but now it takes just enough time to complete the execution; we're no longer relying on the real-world clock and jamming in arbitrary sleeps, hoping everything stays consistent.

Now, in this example we're just synchronizing two goroutines, and only one of them is really doing any work; the main function here is just storing data and spawning other goroutines. But we can also have multiple goroutines working on the same data, and we might need to synchronize those together, which can be a little tricky. Let me drop in this example and we'll talk about it. I'm creating the wait group again up on line 8, then initializing a counter variable, and inside my main function I'm actually spawning 20 goroutines: each time through the for loop, I add two to the wait group, to let it know there are two more goroutines running, and then I spawn a sayHello and an increment. I also have a Wait call on line 17, just to make sure the main function doesn't make it out too early. In sayHello, all I do is print "Hello" along with the counter's value, and in the increment function down here, I just increment the counter by one. After each of those is done, it calls the Done method on the wait group, and everything should be just fine. Notice that I've broken my own rule here: the wait group is being accessed globally in this application. That makes sense, because I actually do want to share this object, and the wait group is safe to use concurrently like this; it's designed to be used this way.

So let's run this application and see what happens. Our intuition says it should print "Hello" with zero, because the counter's value is zero right here, then increment, then say hello again, then increment, so: hello number zero, hello number one, hello number two, and so on. Let's run it and see. And we get a mess; we in fact don't have any kind of reliable behavior going on here. We printed 1 twice, then 2, 3, 4, 5, which seemed to work consistently, then jumped all the way to 9, printed 10 twice, then went back to 9 for some reason, and if we run this again we'll get a completely different result. What's happening here is that our goroutines are actually racing against each other. We have no synchronization between the goroutines; they're just going hell-bent for leather, as fast as they can, to accomplish the work we've asked them to do, regardless of what else is going on in the application.
In order to correct this, we need to find a way to synchronize these goroutines together. Now, we could probably find a way to use a wait group for this, but we've already talked about wait groups, so I want to talk about another way to do it: we're going to introduce the concept of a mutex. Let me paste this example in here, and then we'll talk about what it does. A mutex is basically a lock that the application is going to honor. In this case, on line 11, I'm creating what's called an RWMutex, which is a read/write mutex. A simple mutex is simply locked or unlocked: if the mutex is locked and something tries to manipulate the protected value, it has to wait until the mutex is unlocked and it can obtain the lock itself. With that, we can protect parts of our code so that only one entity can be executing that code at a time, and typically we use that to protect data, ensuring only one thing can access the data at a single time. An RWMutex changes things a little: as many things as want to can read the data, but only one can write it at a time, and if anything is reading, we can't write to it at all. So we can have any number of readers but only one writer; when something comes in and makes a write request, it waits until all the readers are done, and then the writer locks the mutex so nothing can read or write until the writer is done. (There's one other line in this modification that I don't want to talk about yet; we'll come back and revisit it.) So in this modification, what I've done is attempt to use a mutex to synchronize things together. The change is down in sayHello, where I'm just reading the value of the counter variable: that's what I'm trying to protect, the counter variable, from concurrent reading and writing, because that's what was getting us into trouble.

On line 22 I obtain a read lock on the mutex, then print my message, then release the lock using the RUnlock method. In increment, that's where I'm actually mutating the data, so I need a write lock: I call the Lock method on the mutex, increment the value, and then call Unlock. Now, if I run this application, I actually haven't gotten quite where I want to be. I don't get the weird random behavior I was seeing before, but you'll notice something still seems out of sync: I get hello 1, hello 2, and then it stays at 2, and if I keep running it I can get different behaviors, although notice that the values always come out in the proper order. So I've fixed part of my problem, but not all of it yet (that last run actually got pretty close), and there's obviously something else going on here. The reason we still have an issue is that we're still taking the locks within the goroutines: if the sayHello function gets executed twice by its goroutines and the increment function doesn't get called in between, we get the same message printing out twice, because nothing has a chance to lock the mutex for writing before we read it that second time. The way to address this is to lock the mutex outside of the context of the goroutine, so we have a reliable execution model. Let me go ahead and paste in a small modification: all I've done is move the locks out, so they're now acquired before each goroutine executes, and each goroutine unlocks when it's done. If I run this, we see that I now get the behavior I expect: 0 through 9 printed out, and if I run it again, and again, and again, everything works great.
The reason this is working is that I'm now locking the mutexes in a single context: the main function is actually executing the locks, and then, asynchronously, the goroutines unlock them once they're done with the asynchronous operation. Now, the problem with this application is that I've basically destroyed its concurrency and parallelism: all of these mutexes are forcing the data access to be synchronized and run in a single-threaded way, so any potential benefit I'd get from the goroutines is gone. As a matter of fact, this application probably performs worse than one without goroutines, because I'm constantly locking and unlocking the mutex. So this is an example where, if this were all the application needed to do, we'd be much better served by removing the goroutines, running a single execution path, and removing any concurrency at all. However, there are often situations where you can get a significant advantage by running things in parallel, and then you can use wait groups or mutexes to synchronize things together, make sure your data is protected, and keep everything playing well together.

Now, I've had this one line in here, and I apologize for that, I really shouldn't have had it in these earlier examples, but I do want to talk about it: it's the function from the runtime package called GOMAXPROCS. Let's just go ahead and execute this simple program; all it does is print the number of OS threads available. It prints that there are 4 threads available in the application (and let me just add a newline in here and run it again so things look a little better: 4 threads available). By default, Go gives you a number of operating-system threads equal to the number of cores available on the machine. In this virtual machine I've exposed 4 cores to the VM, so by default I have 4 OS threads to work with. Now, I can change that value to anything I want. For example, I can change it to 1, and now my application is running single-threaded: a truly concurrent application with no parallelism at all. That can be useful in situations where there's a lot of data mutation going on and you really need to be careful to avoid the kind of race conditions parallelism can incur, and maybe there's no better way to do it. I would say there's an architecture problem there, but it is possible to run an application single-threaded by setting GOMAXPROCS equal to one. If you're wondering what the negative one does: when you invoke the GOMAXPROCS function, it returns the number of threads that was previously set, and if you pass a negative number, it doesn't change the value. So GOMAXPROCS of negative one just lets us interrogate how many threads we have available. We can also set it to, for example, 100; there's nothing stopping us from creating a massive number of operating-system threads. What I've found in working with Go is that GOMAXPROCS is a tuning variable for you to work with. The general advice is that one operating-system thread per core is a minimum, but a lot of the time you'll find your application actually gets faster by increasing GOMAXPROCS beyond that value. If you get up too high, though, like 100, you can run into other problems: now you've got additional memory overhead, because you're maintaining 100 operating-system threads, and your scheduler has to work harder with all those threads to manage. So eventually the performance peaks and starts to fall back off, because your application is constantly rescheduling goroutines onto different threads, and you lose time every time that occurs.
As you get your application closer to production, I would definitely encourage you to develop with GOMAXPROCS greater than one, because you want to reveal those race conditions as early as possible. But just before you release to production, you might want to run your application through a performance test suite with varying values of GOMAXPROCS, to see where it's going to perform the best.

Now, the last thing I want to talk about is some best practices to keep in mind when you're working with goroutines in the Go language, so let's take a look at those next. Goroutines are very powerful, and it can be easy to let them get a little bit out of hand, so I want to go through and give you some advice on how to work with goroutines in your own applications. The first bit of advice: if you're working in a library, be very careful about creating goroutines yourself, because it's generally better to let the consumer control the concurrency of the library, not the library itself. If you force your library to work concurrently, that can actually cause your consumers more problems synchronizing data together. So in general, keep things simple, keep things single-threaded, and let the consumer of your library decide when to use a goroutine and when not to. Now, this advice can be softened a little if you have a function call that returns its result over a channel: then having the goroutine in there might not be such a bad thing, because your consumer never really has to worry about how that unit of work is getting done; they don't really care whether it's running concurrently, because they're just going to be listening for the result on that channel. But we haven't talked about channels yet, so we'll revisit that topic in the next video. For now, if you're creating a library, try to avoid goroutines, at least goroutines that are going to be surfaced to the consumer, forcing them to deal with the concurrency.

The next piece of advice: when you create a goroutine, know how it's going to end. We're going to see how to do this a little more when we talk about channels, but it's really easy to launch a goroutine as a kind of watcher: it just sits out there listening for messages to come in and processes them as they arrive. However, if you don't have a way to stop that goroutine, it's going to continue on forever, constantly draining the resources of your application, and eventually, as the goroutine ages, it could even cause other issues and make your application crash.

The other thing I want to give you some advice about is checking for race conditions when you build. I want to jump back over to the editor and show you how to do that, because it's very important, and very simple to do, in most environments Go runs in. To see it, we don't have to go any further than this example from the beginning of the video. I know it's got sleeps in there and some bad practices, but if you remember, when we run this application, it prints "Goodbye" instead of the "Hello" message we originally assigned. How could we have detected this without running the application? You might not think it's terribly important to be able to do that, because it's obvious we've got some kind of problem here and all we have to do is apply debugging skills, but there are other cases where this is very, very subtle and very hard to track down without a little bit of help. Fortunately, the Go toolchain has quite a bit of help available to you, and it's as simple to invoke as adding a -race flag to go run, go install, go build, whatever you're using to get your application up and running. So let's go ahead and try that and see what it says about our little application. You'll notice it does still run the application, because we invoked go run, so we see "Goodbye" printed here, but notice what we got above it.
We got this data race message. It's telling us it sees the same area of data being accessed by two different executing goroutines. It says the first access it found was in goroutine 6, which is an internal identifier; unless we're profiling, we've got no idea which one goroutine 6 is, but it does tell us the access was on line 11, so apparently in this run goroutine 6 was this goroutine right here, and it was accessing the msg variable. It also sees that we access the msg variable on line 13, which is in our main function. So by adding the -race flag, we get all of this additional information: the toolchain instruments our application and tries to detect race conditions as it runs. I would strongly encourage you, if you've got any kind of concurrency at all in your application, to run this check: it's very simple, it runs very quickly, and it's going to help prevent very subtle bugs from getting into your production code.

OK, so that's what I have to talk about with goroutines. Maybe you were expecting more, but goroutines are really quite simple. When we get into our next conversation, which will be about channels, things get a little more complicated, but goroutines are relatively straightforward. So let's go into a summary and review what we've talked about in this video.

In this video we learned about goroutines and how we can use them to create concurrent and parallel execution paths in our applications. We started by learning how to create goroutines themselves: by adding the go keyword in front of a function call, we've created a goroutine, and that's all it takes. There are no special semantics, nothing special that needs to be done; it's simply a function call with the keyword go in front of it. When we're using anonymous functions, we generally want to pass data in as arguments: add a parameter to that anonymous function and pass the data into the goroutine, instead of relying on closures, to prevent race conditions as you try to manipulate that data. Now, that's not always true: we saw with wait groups that we accessed one globally, because that was our intention; we truly did want it available in multiple scopes. But even then, we could pass a pointer in, to be very clear about what information that goroutine should have access to.

Then we talked about the different ways we can synchronize multiple goroutines together. One of the challenges with goroutines is that once we've got all sorts of things happening, there's no way to ensure, without synchronization, how they're going to interact with one another. For a lot of concurrent calculations that's not a problem at all, because the two might not be related to one another, but you often get into situations where you're relying on the results of multiple calculations, or something needs to know the result of the work that's been done, or you've got a shared-memory issue and need to make sure those goroutines aren't trying to manipulate the same data at the same time. We can use wait groups to wait for groups of goroutines to complete, and we saw three interesting methods there: the Add method informs the wait group that there are more goroutines for it to wait on; the Wait method blocks the goroutine it's called in until the wait group is completed; and the Done method lets the wait group know that one of the goroutines has completed its work. We also talked about the mutex and the RWMutex, and how those are generally used to protect data: if you have a piece of data that's going to be accessed in multiple locations in your application, you can protect that access with a mutex or an RWMutex, to ensure only one goroutine is manipulating that data at one time.

We then talked about parallelism, and how parallelism can introduce some really tricky challenges into your Go applications. We talked about how, by default, Go will use a number of OS threads equal to the number of available cores on the computer it's running on; how we can change that using the GOMAXPROCS function from the runtime package; and how more threads will generally increase performance, but too many can actually slow it down. In general, if you're developing an application, you want to start from day one with GOMAXPROCS greater than one, to find any concurrency issues and race conditions early in your application's development, but don't settle on a final number until you get close to production and have a performance test suite you can work with to find the best value of GOMAXPROCS for your application: while the default is a very good number to start with, a lot of applications actually perform better with a value higher or lower than it.

And finally, we wrapped up with a discussion of some best practices to keep in mind when working with goroutines. We learned that if you're a library developer, you should avoid creating goroutines that are going to be exposed to the consumer of your library; let the consumer control the concurrency, because they're the ones in the best position to know whether something needs to be executed single-threaded or can be executed concurrently. When creating a goroutine, know how it's going to end: it's very easy to get into situations where goroutines start leaking memory because they're never cleaned up, because they never quite get done with their work. Normally a goroutine is cleaned up as soon as it finishes its execution. We saw that with the main function: it runs in a goroutine, and that goroutine terminates as soon as the main function exits. We also saw it with our sayHello function: as soon as it printed its message and the function exited, that goroutine was killed and cleaned up, so it was very clear when those goroutines' life cycles would be over. However, if you've got goroutines listening for messages in a continuous loop, make sure you code in a way to shut those goroutines down once you're done using them, and clean up the memory they're using. Also, as you're going along with your application development, check for race conditions. It's not that hard to do: you just add -race onto the go command that's compiling your application, and the race detector will try to locate places where your application can access the same memory at the same time, in an unsynchronized way, causing very subtle and potentially very disastrous bugs when your application gets to production.

OK, so that wraps up what I have to talk about with goroutines. I hope it was valuable for you. If you have any questions or comments, you know where to put them, and I'll come at you soon with the next video in the Intro to Go series. Bye!
Info
Channel: Failing Forward
Views: 27,037
Keywords: golang
Id: icbFEmh7Ym0
Length: 31min 31sec (1891 seconds)
Published: Sat May 06 2017