Lecture 2: RPC and Threads

Today I'd like to talk about Go, which is especially interesting for us in this course because Go is the language of the labs: you're all going to do the labs in it. I want to focus today on some of the machinery that's most useful in the labs and most particular to distributed programming.

First of all, it's worth asking why we use Go in this class. We could have used any one of a number of other systems-style languages; plenty of languages like Java or C# or even Python provide the kind of facilities we need, and indeed we used to use C++ in this class and it worked out fine. Go, like many other languages, provides a bunch of features that are particularly convenient. It has good support for threads, and for locking and synchronization between threads, which we use a lot. It has a convenient remote procedure call package, which doesn't sound like much but actually turns out to be significant: in languages like C++, for example, it's a bit hard to find a convenient, easy-to-use remote procedure call package, and of course we use RPC all the time in this course to let programs on different machines talk to each other.

Unlike C++, Go is type safe and memory safe: it's pretty hard to write a program that, due to a bug, scribbles over some random piece of memory and then causes the program to do mysterious things, and that just eliminates a big class of bugs. Similarly, it's garbage collected, which means you're never in danger of freeing the same memory twice, or freeing memory that's still in use; the garbage collector just frees things when they stop being used. One thing that's maybe not obvious until you've played around with this kind of programming is that the combination of threads and garbage collection is particularly important. One of the things that goes wrong in a non-garbage-collected language like C++, if you use threads, is that it's always a bit of a puzzle, and requires a bunch of bookkeeping, to figure out when the last thread that's using a shared object has finished with it, because only then can you free the object. You end up writing quite a bit of code to manually do reference counting or something in order to figure out when the last thread stopped using an object, and that's just a pain. That problem completely goes away if you use garbage collection, as we have in Go.

And finally, the language is simple, much simpler than C++. One of the problems with using C++ is that often, if you make an error, maybe even just a typo, the error message you get back from the compiler is so complicated that it's usually not worth trying to figure out what it meant; I find it's always quicker to go look at the line number and try to guess what the error must have been, because the language is far too complicated. Go probably doesn't have a lot of people's favorite features, but it's a relatively straightforward language.

Okay, at this point you've all done the tutorial. If you're looking for what to read next to learn the language, a good place to look is the document titled Effective Go, which you can find by searching the web.

All right, the first thing I want to talk about is threads. The reason we care a lot about threads in this course is that threads are the main tool we're going to use to manage concurrency in programs.
Concurrency is of particular interest in distributed programming, because it's often the case that one program actually needs to talk to a bunch of other computers: a client may talk to many servers, or a server may be serving requests at the same time on behalf of many different clients. So we need a way to say: my program really has seven different things going on, because it's talking to seven different clients, and I want a simple way to let it do those seven different things without too much complex programming. Threads are the answer.

These are the things the Go documentation calls goroutines; I'll call them threads, and goroutines really are the same as what everybody else calls threads. The way to think of threads is that you have one program and one address space; I'll draw a box to denote an address space. Within that address space, a serial program without threads has just one thread of execution executing code: one program counter, one set of registers, one stack, describing the current state of the execution. In a threaded program like a Go program, you can have multiple threads, which I'll draw as multiple squiggly lines, and what each line represents is a separate program counter, a separate set of registers, and a separate stack for each of the threads, so that each has its own thread of control and can be executing in a different part of the program. Hidden here is that each thread's stack lives in the one address space of the program, so even though each thread has its own stack, they're all in the same address space, and different threads could refer to each other's stacks if they knew the right addresses, although you typically don't do that. And in Go, even the main program, when you first start up the program and it runs main, is just a goroutine, and can do all the things goroutines can do.

All right, so as I mentioned, one of the big reasons for threads is to allow each part of the program to be at its own point in a different activity. I usually refer to that as I/O concurrency, for historical reasons: where this first came up is that you might have one thread that's waiting to read from the disk, and while it's waiting, you'd like to have a second thread that can compute, or read somewhere else on the disk, or send a message on the network and wait for the reply. So I/O concurrency is one of the things that threads buy us. For us it will usually mean: I can have one program that has launched remote procedure call requests to different servers on the network and is waiting for many replies at the same time; that's how it'll come up for us. The way you do that with threads is to create one thread for each of the remote procedure calls you want to launch; each thread sends its RPC request message, waits at that point in the thread, and then finally, when the reply comes back, the thread continues executing. Using threads this way lets all the requests be outstanding in the network at the same time, and the threads don't have to move in lockstep: each can execute its part of the work whenever it likes. That's I/O concurrency: overlapping the progress of different activities, so that while one activity waits, other activities can proceed.
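Here's a minimal sketch of that pattern. The servers list and the sendRequest function are made up for illustration; sendRequest just stands in for a real RPC:

    package main

    import (
        "fmt"
        "time"
    )

    // sendRequest is a stand-in for a real RPC: it pretends to wait
    // for the network and then returns a reply.
    func sendRequest(srv string) string {
        time.Sleep(100 * time.Millisecond)
        return "reply from " + srv
    }

    func main() {
        servers := []string{"s1", "s2", "s3"}
        replies := make(chan string)
        // One goroutine per outstanding request: each sends its
        // request, blocks waiting for the reply, and hands the reply
        // back on a channel.
        for _, srv := range servers {
            go func(srv string) {
                replies <- sendRequest(srv)
            }(srv)
        }
        // All the requests are in flight at once, so the total time
        // is about one round trip rather than three.
        for range servers {
            fmt.Println(<-replies)
        }
    }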
Another big reason to use threads is multi-core parallelism, which I'll just call parallelism. Here the thing we're trying to achieve with threads is: if you have a multi-core machine, as I'm sure all of you do in your laptops, and you have a compute-heavy job that needs a lot of CPU cycles, wouldn't it be nice if one program could use CPU cycles on all of the cores of the machine? And indeed, if you launch multiple goroutines in Go and they do something compute-intensive, like sitting in a loop computing digits of pi, then up to the limit of the number of cores in the physical machine, your threads will run truly in parallel; if you launch two threads instead of one, you'll be able to use twice as many CPU cycles per second. This is very important to some people. It's not a big deal in this course; it's rare that we'll think specifically about this kind of parallelism. In the real world of building servers to form parts of distributed systems, though, it can sometimes be extremely important for the server to run threads and harness the CPU power of a lot of cores, just because the load from clients can often be pretty high. So parallelism is a second reason threads are interesting in distributed systems.

A third reason, which is maybe a little less important, is convenience: there are times when you really just want to do something in the background, or there's something you need to do periodically, and you don't want to insert checks in the main part of your program saying "should I be doing that thing that happens every second or so?" You'd just like to fire something up that does the periodic thing every second. An example that will come up for you is that a master server may want to check periodically whether its workers are still alive, because if one of them has died, you want to launch its work on another machine; MapReduce might do that. One way to arrange "send an are-you-alive message to the worker every second" is to fire off a goroutine that sits in a loop, sleeps for a second, does the periodic thing, and then sleeps for a second again, as in the sketch below. In the labs you'll end up firing off these kinds of threads quite a bit.

Yes? Is the overhead worth it? Yes, the overhead is really pretty small for this stuff. It depends on how many you create: a million threads that each sit in a loop waiting a millisecond and then sending a network message is probably a huge load on your machine, but ten threads that sleep for a second and do a little bit of work is probably not a big deal at all, and I guarantee you the programmer time you save, by not having to mush the different activities together into one line of control flow, is worth the small amount of CPU cost, almost always. Still, if you're unlucky, you'll discover in the labs that some loop of yours isn't sleeping long enough, or that you fired off a bunch of these and never made them exit, so they just accumulate. So you can push it too far.
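A sketch of that periodic-check pattern; the print statement stands in for whatever the real check would be, say an RPC to each worker:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        // Fire off a goroutine that does the periodic thing in the
        // background -- here standing in for a master pinging workers.
        go func() {
            for {
                fmt.Println("are you alive?") // imagine an RPC per worker
                time.Sleep(1 * time.Second)
            }
        }()

        // The main goroutine gets on with its own work; here it just
        // sleeps long enough to let a few checks happen.
        time.Sleep(5 * time.Second)
    }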
Okay, so these are the main reasons that people like threads a lot and that we'll use threads in this class. Any other questions about threads in general?

By asynchronous programming, you mean a single thread of control that keeps state about many different activities? Yes, so this is a good question: what would happen if we didn't have threads, or for some reason didn't want to use threads? How would we be able to write a server that talks to many different clients at the same time, or a client that talks to many servers? What tools could we use? It turns out there is another major style of structuring these programs, called asynchronous programming, or what I might call event-driven programming. The general structure of an event-driven program is usually a single thread and a single loop, and what that loop does is sit there and wait for any input, any event, that might trigger processing. An event might be the arrival of a request from a client, or a timer going off; or, if you're building a window system, many of the window systems on your laptops are written in an event-driven style, where what they're waiting for is key clicks or mouse movements. So in an event-driven program, the single thread of control sits in a loop waiting for input, and whenever it gets an input, say a packet, it figures out which client the packet came from, looks in a table holding the state of whatever activity it's managing for that client, and says: ah, I was in the middle of reading such-and-such a file, and now it's asked me to read the next block; I'll go read the next block and return it.

Threads are generally more convenient, because it's much easier to write sequential, straight-line code that computes, sends a message, waits for the response, and so on, than it is to chop up the activity into a bunch of little pieces that get activated one at a time by one of these event-driven loops. So one problem with the event-driven scheme is that it's a little bit of a pain to program. Another potential defect is that while you get I/O concurrency from this approach, you don't get CPU parallelism: if you're writing a busy server that would really like to keep 32 cores busy on a big server machine, a single loop is not a very natural way to harness more than one core. On the other hand, the overheads of event-driven programming are generally quite a bit less than threads. Threads are pretty cheap, but each one sits on a stack of a kilobyte or a few kilobytes; if you have 20 threads, who cares, but if you have a million of them, that starts to be a huge amount of memory, and the scheduling bookkeeping for deciding which thread to run next, with scheduling lists holding a thousand threads, can also start to get quite expensive. So if you're in a position where you need a single server that serves a million clients and keeps a little bit of state for each of them, threads could be expensive, and at some expense in programmer time it's easier to write a really stripped-down, efficient, low-overhead service with event-driven programming. It's just a lot more work.
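For flavor, here is a toy event-driven loop, written in Go only for convenience; the event type and the per-client state table are made up for illustration, and a real server would get its events from the network rather than from a helper goroutine:

    package main

    import (
        "fmt"
        "time"
    )

    // One kind of event: a request arriving from some client.
    type event struct {
        client string
        data   string
    }

    func main() {
        events := make(chan event)
        ticker := time.Tick(time.Second)

        // Stand-in for the network delivering client requests.
        go func() {
            events <- event{"c1", "read next block"}
            events <- event{"c2", "read next block"}
        }()

        state := make(map[string]int) // per-client state table
        for i := 0; i < 4; i++ {      // a real server would loop forever
            select {
            case ev := <-events:
                state[ev.client]++ // look up and advance this client's state
                fmt.Println(ev.client, ev.data, "-> block", state[ev.client])
            case <-ticker:
                fmt.Println("timer: do periodic work")
            }
        }
    }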
Are you asking about JavaScript? I don't know; the question is whether JavaScript executes on multiple cores. Does anybody know? It depends on the implementation, yeah, so I don't know. It's a natural thought, though: even in Go, if you knew your machine had eight cores, and you wanted to write the world's most efficient whatever-server, you could fire up eight threads and run a stripped-down event-driven loop on each of them, one event loop per core, and that would be a way to get both parallelism and I/O concurrency.

Yes? Okay, so the question is: what's the difference between threads and processes? Usually, on a Unix machine, a process is a single program that you're running, with a single address space, a single bunch of memory for the process, and inside a process you might have multiple threads. When you write a Go program and run it, running the Go program creates one Unix process and one memory area, and when your Go program creates goroutines, those sit inside that one process. I'm not sure that's really an answer, but historically, operating systems have provided this big box, the process, that's implemented by the operating system. The operating system doesn't much care what happens inside your process, or what language you use; that's none of the operating system's business. But inside that process you can run lots of threads. Now, if you run more than one process on your machine, more than one program, say an editor and a compiler, the operating system keeps them quite separate: your editor and your compiler each have memory, but not the same memory, and they're not allowed to look at each other's memory; there's not much interaction between different processes. So your editor may have threads, and your compiler may have threads, but they're in different worlds. Within any one program, the threads can share memory, and can synchronize with channels and use mutexes and such; but between processes, there's just no interaction. That's the traditional structure of this kind of software.

Yes? So the question is: when a context switch happens, does it happen for all threads? Okay, let's imagine you have a single-core machine that's really only doing one thing at a time, and you're running multiple processes on it. The operating system gives the CPU to the processes by time-slicing back and forth between them: when the hardware timer ticks and the operating system decides it's time to take the CPU away from the currently running process and give it to another process, that's done at the process level. Actually, it's complicated; let me restart. The threads we use are, in the end, based on threads provided by the operating system, and when the operating system does a context switch, it's switching between the threads that it knows about.
So in a situation like this, the operating system might know that there are two threads in this process and three threads in that process, and when the timer ticks, the operating system will, based on some scheduling algorithm, pick a different thread to run; it might be a different thread in this process, or one of the threads in the other process. In addition, Go cleverly multiplexes many goroutines on top of single operating-system threads to reduce overhead, so there are really two stages of scheduling: the operating system picks which of its threads to run, and then within that process, Go may have a choice of goroutines to run.

All right. So threads are convenient because a lot of the time they allow you to write the code for each thread just as if it were a pretty ordinary sequential program. However, there are in fact some challenges with writing threaded code. One is what to do about shared data. One of the really cool things about the threading model is that these threads share the same address space, they share memory: if one thread creates an object in memory, it can let other threads use it. You can have an array or something that all the different threads read and write, and that's sometimes critical. Say you're keeping some interesting state, maybe a cache of things in your server's memory: when a thread handles a client request, it first looks in that shared cache, and the threads may write the cache to update it when they have new information to stick in it. So it's really cool that you can share memory, but it turns out that it's very, very easy to get bugs if you're not careful when sharing memory between threads.

A totally classic example: suppose you have a global variable n that's shared among the different threads, and a thread just wants to increment it, n = n + 1. By itself, this is likely to be an invitation to bugs if you don't do anything special around this code. The reason is that whenever you write code in a thread that reads or writes data shared with other threads, there's always the possibility, and you have to keep it in mind, that some other thread may be looking at the data or modifying it at the same time. The obvious problem is that maybe thread 1 is executing this code and thread 2 is in the same function, in a different thread, executing the very same code; and remember, I'm imagining that n is a global variable, so they're talking about the same n. What this boils down to is that you're not actually running this source line, you're running the machine code the compiler produced, and what that machine code does is load n into a register, add one to the register, and store the register back into n's memory location. So you can count on this happening: both threads execute this line of code, both load n, which starts out at 0, into a register, both increment their register to get 1, and both store 1 back to memory. Now two threads have incremented n and the resulting value is 1, which, well, who knows what the programmer intended, maybe that's what the programmer wanted, but chances are not: chances are the programmer wanted 2, not 1.
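Here's a sketch of that buggy pattern. It uses sync.WaitGroup, which comes up later in this lecture, just to wait for the goroutines; run it a few times and the printed total will often be less than 1000, because increments get lost:

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        var n int // shared by all the goroutines
        var wg sync.WaitGroup
        for i := 0; i < 1000; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                // Really three machine operations: load n, add 1,
                // store n. Two threads can both load the old value
                // and both store the same new value, losing an
                // increment.
                n = n + 1
            }()
        }
        wg.Wait()
        fmt.Println(n) // frequently less than 1000
    }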
Some instructions are atomic, yes. The question, a very good one, is whether individual instructions are atomic, and the answer is that some are and some aren't. A 32-bit store is extremely likely to be atomic, in the sense that if two processors store 32-bit values to the same memory address at the same time, what you'll end up with is either the 32 bits from one processor or the 32 bits from the other, but not a mixture. For other sizes it's not so clear: a one-byte store depends on the CPU you're using, because a one-byte store may really be a 32-bit load, then a modification of 8 bits, then a 32-bit store, but it depends on the processor. And more complicated instructions, like increment, where your microprocessor may well have an instruction that can directly increment some memory location, are pretty unlikely to be atomic, although there are atomic versions of some of these instructions.

All right, so this is a classic danger, and it's usually called a race. It's going to come up a lot, since you're going to do a lot of threaded programming with shared state. "Race" I think refers to some ancient class of bugs involving electronic circuits, but for us, the reason it's called a race is that if one of the CPUs has started executing this code, and the other thread is getting close to the same code, it's a race as to whether the first processor can finish and get to its store before the second processor starts executing its load. If the first processor actually manages to do the store before the second processor gets to the load, then the second processor will see the stored value: it will load 1, add one to it, and store 2. That's how you can justify the terminology.

The way you solve this, certainly for something this simple, is to insert locks. You, as the programmer, have some strategy in mind for locking the data: you say, this piece of shared data may only be used while such-and-such a lock is held. Go calls its locks mutexes, and you may have used them in the tutorial. What you'll see is a mu.Lock() before a sequence of code that uses shared data, and an Unlock afterwards, and then of two threads that execute this, whichever one is lucky enough to get the lock first gets to do all this stuff and finish before the other one is allowed to proceed. So you can think of wrapping some code in a lock as making the multi-step code sequence, remember, even one line is really three distinct operations, atomic with respect to everybody else who acquires the same lock.
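The fixed version of the counter sketch above, with the load-add-store wrapped in a mutex:

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        var mu sync.Mutex
        var n int
        var wg sync.WaitGroup
        for i := 0; i < 1000; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                mu.Lock() // make the load-add-store sequence atomic
                n = n + 1 // ...with respect to other holders of mu
                mu.Unlock()
            }()
        }
        wg.Wait()
        fmt.Println(n) // now always 1000
    }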
Yes? Can you repeat the question? Oh, that's a great question. The question was: how does Go know which variables we're locking? Here, of course, there's only one variable, but maybe we're saying n = x + y, where there are really a few different variables. The answer is that Go has no idea. There's no association at all, anywhere, between this lock, this mu thing, which is a variable of type Mutex, and any variables; there's just no association in the language between the lock and any data. The association is in the programmer's head. As a programmer, you need to say: here's a bunch of shared data, say a complex data structure like a tree or an expandable hash table, and any time you're going to modify anything associated with this data structure, and of course a tree is composed of many, many objects, and the set of objects changes because you might allocate new tree nodes, you have to hold such-and-such a lock. It's really the programmer who works out a strategy for ensuring that the data structure is used by only one core at a time, and who creates the one lock, or maybe more; there are many, many locking strategies you could apply to a tree, you could imagine a tree with a lock per tree node. The programmer works out the strategy, allocates the locks, and keeps the relationship to the data in their head. To Go, a lock is just a very simple thing: there's a lock object, the first thread that calls Lock gets the lock, and other threads have to wait until someone unlocks it. That's all Go knows.

Does it not lock all the variables that are part of the object? No, Go doesn't know anything about the relationship between variables and locks. When you have code that calls Lock, exactly what it's doing is acquiring this one lock, and that's all it does. Somewhere else you would have declared something like var mu sync.Mutex, and this mu refers to some particular lock object, and there may be many, many locks. All this call does is acquire this lock, and anybody else who wants to acquire it has to wait until we unlock it. What we're protecting with that lock is totally up to us as programmers.

So the question is whether it's better to have the lock be the private business of the data structure, supposing it's something like a map. You would hope, although it's not true, that map internally would have a lock protecting it, and that is a reasonable strategy: if you define a data structure that needs to be locked, have the lock be interior, have each of the data structure's methods be responsible for acquiring that lock, and the user of the data structure may never know. That's pretty reasonable, and the only points at which it breaks down are a couple of things. One is that if the programmer knew the data was never shared, they might be bummed that they were paying the lock overhead for something they knew didn't need to be locked; that's one potential problem. The other is that if there are any inter-data-structure dependencies, if we have two data structures, each with locks, that use each other, then there's a risk of cycles, and deadlocks. The deadlocks can be solved, but the usual solutions require lifting the locks out of the implementations, up into the calling code. I'll talk about that at some point; hiding the locks is a good idea, but it's not always a good idea.

All right. So one problem you run into with threads is these races, and generally you solve them with locks. Or actually, there are two big strategies: one is to figure out a locking strategy for making access to the data happen one thread at a time; the other is to fix your code to not share data. If you can do that, it's probably better, because it's less complex.

Another issue that shows up with threads is called coordination. When we're doing locking, the different threads involved probably have no idea that the others exist; they just want to get at the data without anybody else interfering.
But there are also cases where you intentionally want different threads to interact: I want to wait for you. Maybe you're a different thread than me and you're producing some data, and I want to wait until you've generated the data before I read it; or you launch a bunch of threads to, say, crawl the web, and you want to wait for all those fetches to finish. So there are times when we intentionally want different threads to interact with each other, to wait for each other, and that's usually called coordination. As you probably know from having done the tutorial, there's a bunch of techniques in Go for doing this. There are channels, which are really about sending data from one thread to another, and waiting for the data to be sent. There's other, more special-purpose stuff, like the idea called condition variables, which is great if there's some thread out there that you want to kick periodically: you're not sure whether the other thread is even waiting for you, but if it is waiting, you'd like to give it a kick so it knows it should continue whatever it's doing. And then there's WaitGroup, which is particularly good for launching a known number of goroutines and then waiting for them all to finish.

And a final piece of damage that comes up with threads is deadlock. Deadlock refers to the general problem you sometimes run into where thread one is waiting for thread two to produce something, I'll draw an arrow to say thread one is waiting for thread two, for example waiting for thread two to release a lock, or to send something on a channel, or to decrement something in a WaitGroup; but unfortunately, thread two is waiting for thread one to do something. This is particularly common in the case of locks. Thread one acquires lock a, and thread two acquires lock b; then thread one also needs lock b, that is, it needs to hold two locks, which sometimes comes up, and it just so happens that thread two needs lock a. That's a deadlock: each grabbed its first lock, proceeded to where it needs its second lock, and now they're waiting for each other forever. Neither can proceed, so neither can release its lock, and usually just nothing happens. So if your program just kind of grinds to a halt and doesn't seem to be doing anything, but didn't crash, deadlock is one thing to check.
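A minimal sketch of that two-lock deadlock. With unlucky timing, each goroutine grabs its first lock and then blocks forever on the other's lock (and the Go runtime will report that all goroutines are asleep); with lucky timing one finishes first, which is exactly what makes these bugs hard to see:

    package main

    import "sync"

    func main() {
        var a, b sync.Mutex
        done := make(chan bool)

        go func() { // thread 1: takes a, then needs b
            a.Lock()
            b.Lock()
            b.Unlock()
            a.Unlock()
            done <- true
        }()

        go func() { // thread 2: takes b, then needs a
            b.Lock()
            a.Lock()
            a.Unlock()
            b.Unlock()
            done <- true
        }()

        <-done
        <-done // with unlucky timing, these never complete: deadlock
    }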
Okay. All right, let's look at the web crawler from the tutorial as an example of some of this threading stuff. I have two, really three, solutions in different styles, which will let us talk a bit about the details of some of this thread programming. First of all, you all probably know what a web crawler does: you give it the URL of a page to start at, and since many web pages have links to other pages, the crawler extracts all the URLs mentioned in that first page's links, fetches the pages they point to, looks at all those pages for the URLs they refer to, and keeps going until it has fetched, let's just say, all the pages in the web, and then it should stop.

In addition, the graph of pages and URLs is cyclic: if you don't remember that you've already fetched a given web page, you may end up following cycles forever, and your crawler will never finish. So one of the jobs of the crawler is to remember the set of pages that it has already crawled, or has even already started a fetch for, and to not start a second fetch for any page it has already started fetching. You can think of that as imposing a tree structure: finding a tree-shaped subset of the cyclic graph of actual web pages. So we want to avoid cycles; we want to not fetch a page twice.

It also turns out that it just takes a long time to fetch a web page, because servers are slow and because the network has a long speed-of-light latency, so you definitely don't want to fetch pages one at a time, unless you want the crawl to take many years. It pays enormously to fetch many pages at the same time, up to some limit: you want to keep increasing the number of pages you fetch in parallel until the throughput you're getting, in pages per second, stops increasing; that is, increase the concurrency until you run out of network capacity. So we want to be able to launch multiple fetches in parallel. And a final challenge, which is sometimes the hardest to solve, is to know when the crawl is finished: once we've crawled all the pages, we want to stop and say we're done, but we actually need to write the code that realizes, aha, we've crawled every single page. For some solutions I've tried, figuring out when you're done has turned out to be the hardest part.

All right, so my first crawler is the serial crawler; by the way, this code is available on the website as crawler.go, under this lecture on the schedule, if you want to look at it. The serial crawler effectively performs a depth-first search of the web graph, and there's one moderately interesting thing about it: it keeps a map called fetched, which it's basically using as a set, in order to remember which pages it has crawled, and that's about the only interesting part. You give it a URL; if it has already fetched that URL, it just returns. If it hasn't, it first records that the URL is now fetched, then actually fetches the page and extracts the URLs in it with the fetcher, and then iterates over those URLs and calls itself recursively for every one of them, passing along the fetched map. It really has just one table: there's only one fetched map, because when I call the recursive crawl and it fetches a bunch of pages, after it returns, the outer crawl instance needs to be aware that those pages are already fetched. So we depend very much on the fetched map being passed between the calls by reference instead of by copying: under the hood, what must really be going on is that Go passes a pointer to the map object to each call of crawl, so they all share a pointer to the same object in memory, rather than getting copies.
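A reconstructed sketch of the serial crawler being walked through; the real crawler.go may differ in details, and the Fetch signature here is an assumption (it returns the URLs found on the page):

    type Fetcher interface {
        Fetch(url string) (urls []string, err error)
    }

    func Serial(url string, fetcher Fetcher, fetched map[string]bool) {
        if fetched[url] {
            return // already fetched (or being fetched): avoid cycles
        }
        fetched[url] = true
        urls, err := fetcher.Fetch(url)
        if err != nil {
            return
        }
        for _, u := range urls {
            // The map argument is effectively a pointer, so every
            // recursive call shares the single fetched table.
            Serial(u, fetcher, fetched)
        }
    }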
Any questions? So this code definitely does not solve the problem that was posed, because it doesn't launch parallel fetches. Clearly we need to insert goroutines somewhere in this code to get parallel fetches. So let's suppose, just for chuckles, that we start with the laziest thing: I'm going to modify the code to run each of the subsidiary crawls in its own goroutine. Actually, before I do that, why don't I run the code just to show you what correct output looks like. In this other window, I'll run the crawler; it actually runs all three versions of the crawler, and they all find exactly the same set of web pages. This is the output we're hoping to see: five lines, five different web pages fetched, one line printed for each.

So let me now run the subsidiary crawls in their own goroutines, and run that code. What am I going to see? The hope is to fetch the web pages in parallel, for higher performance. Okay, so you're voting for only seeing one URL, and why is that? Yes, that's exactly right: it's not going to wait in the loop over the page's URLs; it zips right through that loop. It fetches the very first web page, then in the loop it fires off the goroutines, and immediately the crawl function returns; and since it was called from main, main will exit, almost certainly before any of the goroutines was able to do any work at all. So we'll probably just see the first web page, and when I run it, you see, under "serial", that only the one web page was found. In fact, since this program doesn't exit after the serial crawler, those goroutines are still running, and they actually print their output down here, interleaved with the next crawler example. Nevertheless, just adding a go here absolutely doesn't work, so let's get rid of that.
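The lazy (and broken) modification being described: the loop in Serial becomes

    for _, u := range urls {
        // Broken: these goroutines are fired off and nobody waits
        // for them, so the function (and main) returns before the
        // children do any work. The now-unsynchronized shared map
        // is a race as well.
        go Serial(u, fetcher, fetched)
    }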
Okay, so now I want to show you one style of concurrent crawler. I'm presenting two: one written with shared data, shared objects and locks, which is this first one, and another one written without shared data, passing information along channels in order to coordinate the different threads. So this is the shared-data one, and this is just one of many ways of building a web crawler using shared data. This code is significantly more complicated than the serial crawler. It creates a thread for each fetch it does, but the huge differences are that it does two things: it does the bookkeeping required to notice when all of the crawls have finished, and it handles the shared table of which URLs have been crawled correctly. This code still has a table of URLs, the f.fetched map, but this table is now shared by all of the crawler threads, which are all executing inside this ConcurrentMutex function. So we still have a sort of tree of ConcurrentMutex calls exploring different parts of the web graph, but each one of them was launched as its own goroutine instead of as a function call, and they're all sharing this one table of fetched URLs, because if one goroutine fetches a URL, we don't want another goroutine to accidentally fetch the same URL. And as you can see, I've surrounded the accesses to the table with the mutex Lock and Unlock calls that are required to prevent a race that would occur if I didn't add them.

The danger here is this. A thread checks whether a URL has already been fetched; suppose two threads happen to be following the same URL, so two calls to ConcurrentMutex end up looking at the same URL, maybe because that URL was mentioned in two different web pages. If we didn't have the lock, they'd both access the map to see whether the URL had already been fetched, and both would read false; they'd both set the URL's entry in the table to true; and since both saw that "already" was false, they'd both go on to fetch the web page. So we need the lock there, and the way to think about it, I think, is that we want the read of the table and the update of the table to be atomic: we don't want some other thread to get in and use the table between the check and the update. Each thread wants to read the current table contents and update them without any other thread interfering, and that's what the locks are doing for us. Okay, so, any questions about the locking strategy here?

Once we've checked the URL's entry in the table, the code just fetches that page in the usual way; the other interesting thing that's going on is the launching of the threads. Yes? So the question is, what's with the f.mu? Okay, so there's a structure defined, fetchState, that collects together all the different state we need to run this crawl. Here it's only two fields, but it could be a lot more, and they're only grouped together for convenience; there's no deep significance to the fact that mu and fetched are stored inside the same structure, and f.mu is just the syntax for getting at one of the elements of the structure. I just happened to put mu in the structure because it lets me group together all the stuff related to a crawl, but that absolutely does not mean that Go associates the mu with that structure, or with the fetched map, or anything; it's just a lock object with a Lock function you can call, and that's all that's going on.

So the question is: how come, in order to pass something by reference, I had to use a star here, whereas in the previous example, when we were passing a map, we didn't have to pass a pointer? The star notation basically says that we're passing a pointer to this fetchState object, and we want it to be a pointer because we want there to be one object in memory that all the different goroutines use; they all need a pointer to that same object. So if we define our own structure, that's the syntax you use for passing a pointer. The reason we didn't have to do it with the map is that, although it's not clear from the syntax, a map is a pointer. It's just that, because maps are built into the language, they don't make you put a star there: what a variable of map type is, is a pointer to some data in the heap. So it was a pointer anyway, and it's always passed by reference; you just don't have to write the star, Go does it for you. So maps are definitely special: you cannot define map within the language, it has to be built in, because there are some curious things about it.
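Putting the pieces described so far together, here's a reconstructed sketch of the shared state and the locked check-and-set (names follow the lecture's f.mu and f.fetched; the real crawler.go may differ). The fetch-and-launch half appears in the next excerpt:

    type fetchState struct {
        mu      sync.Mutex
        fetched map[string]bool
    }

    func ConcurrentMutex(url string, fetcher Fetcher, f *fetchState) {
        f.mu.Lock()
        already := f.fetched[url] // the read...
        f.fetched[url] = true     // ...and the write are atomic as a
        f.mu.Unlock()             // pair: no other thread can get between
        if already {
            return
        }
        // ... fetch the page and launch child crawls (see below) ...
    }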
Okay, so we fetch the page, and now we want to fire off a crawl goroutine for each URL mentioned in the page we just fetched. That's done in a loop over the URLs the fetch returned, and for each one we fire off a goroutine. That func syntax in the loop is a closure, a sort of immediate function: what the func keyword is doing is declaring a function right there, which we then call. Maybe the way to read it is that, just as typing 1 or 23 declares a sort of constant object, this is the way to write down a constant function. We do it here because we want to launch a goroutine that runs this function we declared right there, so to make the goroutine we add go in front: the syntax of the go keyword is that you follow it by a function name and the arguments you want to pass to that function, and then we call the function with some arguments. There's really one reason we're doing this: in some other circumstance we could have just said go ConcurrentMutex(...), since ConcurrentMutex is the function we actually want to call with this URL, but we want to do a few other things as well. So we define this little helper function that first calls ConcurrentMutex with the URL, and then, after ConcurrentMutex has finished, does something special to help us wait for all the crawls to be done before the outer function returns.

That brings us to the WaitGroup. The WaitGroup is just a data structure defined by Go to help with coordination, and the game with a WaitGroup is that internally it has a counter: you call wg.Add to increment the counter, wg.Done to decrement it, and the Wait method waits for the counter to get down to zero. So a WaitGroup is a way to wait for a specific number of things to finish, and it's useful in a bunch of different situations. Here we're using it to wait for the last goroutine to finish: we add one to the WaitGroup for every goroutine we create; at the end of the little function we declared, we decrement the counter; and then the outer function waits until all the decrements have finished. And so the reason we declared that little inner function was basically to be able to both call ConcurrentMutex and call Done; that's really why we needed it.

So the question is: what if one of the goroutines fails and doesn't reach the Done line? That's a darn good question. I forget the exact range of errors that will cause a goroutine to fail without causing the whole program to fail, maybe dividing by zero, or dereferencing a nil pointer, I'm not sure, but there are certainly ways for a function to fail, and have the goroutine die, without having the program die, and that would be a problem for us. So really the right way to write this, and I'm sure you had this in mind in asking the question, to be sure that the Done call is made no matter why the goroutine finishes, would be to put a defer here, which means: call Done when the surrounding function finishes, and always call it, no matter why the surrounding function finished.

Yes? Yeah, so the question is: how come two uses of Done in different threads aren't a race? The answer must be that internally a WaitGroup has a mutex, or something like it, that each of its methods acquires before doing anything else, so that simultaneous calls to a WaitGroup's methods are thread-safe.
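The rest of ConcurrentMutex, reconstructed along the lines just described: fetch the page, then launch one child goroutine per URL, counted by a WaitGroup so the function can wait for all its children before returning:

        urls, err := fetcher.Fetch(url)
        if err != nil {
            return
        }
        var done sync.WaitGroup
        for _, u := range urls {
            done.Add(1) // one count per child goroutine
            go func(u string) {
                defer done.Done() // runs however the child exits
                ConcurrentMutex(u, fetcher, f)
            }(u)
        }
        done.Wait() // blocks until every Add is matched by a Done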
Could we do this in a lower-level class? Yeah, certainly; for C++ and for C, you want to look at something called pthreads. In C, threads come in a library that's not really part of the language, called pthreads, and these are extremely traditional and ancient primitives that show up in all languages. Say it again? Not in this code, but you could imagine other uses of WaitGroups; a WaitGroup just counts stuff, and it doesn't really care what you're counting or why. This is just the most common way to see it used.

You're wondering why u is passed as a parameter to the inner function. Okay, so backing up a little bit: the rule for a function like the one we're defining here is that if the function body mentions a variable that's declared in the outer function, and not shadowed, then the inner function's use of that name is the same variable as in the outer function. That's what's happening with fetcher, for example: what does the fetcher variable refer to in the inner function? It just is the same variable as the fetcher in the outer function, and the same with f; when the inner function refers to f, it just is that variable. So you might think that we could get rid of the u argument we're passing, have the inner function take no arguments at all, and just use the u that was defined in the for loop; it would be nice if we could, because it would save us some typing. It turns out not to work, and the reason is that the semantics of the for loop are that the loop updates the variable u: in the first iteration of the loop, that variable u contains some URL, and when you enter the second iteration, that variable's contents are changed to be the second URL. That means that if the first goroutine we launched were just looking at the outer function's u variable, it would see a different value in u after the outer function updated it. Sometimes that's actually what you want: for f, and in particular f.fetched, the inner function absolutely wants to see changes to that map. But for u we don't want to see changes: the first goroutine we spawn should read the first URL, not the second, so we want that goroutine to have its own private copy of the URL. We could have done it in other ways, but the way this code happens to produce a copy private to the inner function is by passing the URL as an argument.

Yes? If we had passed the address of u? Passing u by value, as an argument, absolutely gives the goroutine its own private copy of the variable; passing the address would not. Are you saying we don't need to play this trick in the code? We definitely need to play this trick. So the question is: strings are immutable in Go, right, so how can the outer function change the string? There should be no problem. And the answer is that the problem is not that the string is changed; the problem is that the variable u is changed. When the inner function mentions a variable defined in the outer function, it's referring to that variable and the variable's current value, so if you have a string variable that has "a" in it and you then assign "b" to it, you're not overwriting the string, you're changing the variable to point to a different string. And because the for loop changes the variable u to point to a different string, that change to u would be visible inside the inner function, and therefore the inner function needs its own copy of the variable. That is what we're doing in this code, and that is why this code works. The broken version, without the argument, is just a horrible detail, but it's unfortunately one that you'll run into while doing the labs, so you should at least be aware that there's a problem; when you run into it, maybe you can figure out the details.
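A sketch of the broken and fixed versions side by side. (Note that in Go 1.22 and later the loop variable is per-iteration, so the "broken" version is only broken under the older loop semantics in use at the time of this lecture.)

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        urls := []string{"a", "b", "c"}
        var wg sync.WaitGroup

        // Broken under pre-1.22 semantics: every closure shares the
        // one loop variable u, which the loop keeps overwriting, so
        // the goroutines may all print the last URL.
        for _, u := range urls {
            wg.Add(1)
            go func() {
                defer wg.Done()
                fmt.Println("shared:", u)
            }()
        }
        wg.Wait()

        // Fixed: pass u as an argument, giving each goroutine its
        // own private copy.
        for _, u := range urls {
            wg.Add(1)
            go func(u string) {
                defer wg.Done()
                fmt.Println("copy:", u)
            }(u)
        }
        wg.Wait()
    }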
Okay, that's a great question. So the question, repeated, is: if you have an inner function that refers to a variable in the surrounding function, but the surrounding function returns, what is the inner function's variable referring to anymore, since the outer function has returned? The answer is that Go notices. Go analyzes your inner functions, these are called closures, and when the compiler analyzes them and sees, aha, this closure is using a variable in the outer function, it will allocate heap memory to hold the variable's current value, and both functions will refer to that little area of heap that holds the variable. So the variable won't be allocated on the stack, as you might expect; it's moved to the heap if the compiler sees that it's used by a closure. Then, when the outer function returns, the object is still there in the heap and the inner function can still get at it, and the garbage collector is responsible for noticing when the last function referring to that little piece of heap has exited, and freeing it only then.

Okay, so the WaitGroup is maybe the most important thing here: the technique this code uses to wait for all the crawls at this level, all its direct children, to finish is the WaitGroup. Of course, there are many of these WaitGroups, one per call to ConcurrentMutex; each call just waits for its own children to finish and then returns.

Okay, so back to the lock. Actually, there's one more thing I want to talk about with the lock, and that is to explore what would happen if we hadn't locked. I'm claiming: oh, if you don't lock, you're going to get these races, you're going to get incorrect execution, whatever. Let's give it a shot. I'm going to comment out the locks, and the question is: what happens if I run the code with no locks, what am I going to see? We may see a URL crawled twice, fetched twice; yes, that's the error you might expect. All right, so I'll run it without locks, and we're looking at the concurrent-mutex output, the one in the middle. This time it doesn't seem to have fetched anything twice; there are only five. Run it again; gosh, so far so good. So maybe we're wasting our time with those locks; it never seems to go wrong. I've actually never seen it go wrong. The code is nevertheless wrong, and someday it will fail. The problem is that the critical section is only a couple of instructions, so the chance that these two threads, which are each executing maybe hundreds of instructions, happen to stumble on the same couple of instructions at the same time is quite low.
And this is a real bummer about buggy code with races: it usually works just fine, and then it probably won't work when the customer runs it on their computer. So it's actually bad news for us. In complex programs it can be quite difficult to figure out whether you have a race: you may have code that looks completely reasonable but is in fact, unknown to you, using shared variables. The answer is that really the only practical way to find races is automated tools, and luckily Go gives us a pretty good race detector, built in, and you should use it. If you pass the -race flag when you build or run your Go program, it runs with the race detector. I'll run it, and we'll see that it emits an error message: it found a race, and it actually tells us exactly where the race happened. There's a lot of junk in this output, but the really critical thing is that the race detector realized that we read a variable, that's what this "read" is, that was previously written, with no intervening release and acquire of a lock; that's what this output means. Furthermore, it tells us the line numbers: it has told us where the read was and where the previous write was, and indeed, if we look at the code, the read is the check of the fetched table and the write is the update just after it. That means one thread did the write, and then, without any intervening lock, another thread came along and read that written data. That's basically what the race detector is looking for.

The way it works internally is that it allocates sort of shadow memory; it uses a huge amount of memory, because for every one of your memory locations, the race detector allocates a little bit of memory of its own, in which it keeps track of which threads recently read or wrote that location. It also keeps track of when threads acquire and release locks, and do other synchronization activities that it knows force threads to order themselves; and if the race detector sees that there was a memory location that was written and then read with no intervening synchronization, it raises an error.

Yes? I believe it is not perfect; I'd have to think about it. One way it's certainly not perfect is that if you don't execute some code, the race detector doesn't know anything about it. It's not doing static analysis: the race detector is not looking at your source and making decisions based on the source, it's watching what happened on this particular run of the program. So if this particular run didn't execute some code that happens to read or write shared data, then the race detector will never know, and there could be a race there. That's certainly something to watch out for: if you're serious about the race detector, you need to set up testing apparatus that tries to make sure all the code is executed. But it's very good, and you should just use it, for all of your 6.824 labs.

Note that in this case the race didn't actually occur: what the race detector saw was not the actual interleaved, simultaneous execution of the sensitive code. It didn't see two threads literally execute the check and the update at the same time, which, as we know from having run the thing by hand, apparently happens only with low probability. All it saw was that at one point there was a write, and maybe much later there was a read, with no intervening lock. So in that sense, it can detect races that didn't actually happen, or didn't actually cause bugs, on this run.
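Invoking the detector is just a flag on the usual commands; for example (file name assumed):

    $ go run -race crawler.go
    $ go test -race ./...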
Okay, so this was a race here, and of course the race didn't actually occur. What the race detector did not see was an actual interleaved, simultaneous execution of the sensitive code: it didn't see two threads literally execute lines 43 and 44 at the same time, and as we know from having run the thing by hand, that apparently happens only with low probability. All it saw was that at one point there was a write, and then maybe much later there was a read with no intervening lock. So in that sense it can detect races that didn't actually happen, or at least didn't actually cause bugs on this run.

Okay, one final question about this crawler: how many threads does it create, and how many concurrent threads could there be? Yes: a defect in this crawler is that there's no obvious bound on the number of simultaneous threads it might create. With the test case, which has only five URLs, big deal; but if you're crawling the real web, with, I don't know, billions of URLs out there, we certainly don't want to be in a position where the crawler might accidentally create billions of threads. Thousands of threads is just fine; billions of threads is not okay, because each one sits on some amount of memory. There are probably many defects in this crawler by real-life standards, but one at the level we're talking about is that it can create too many threads, and it really ought to have a way of saying you can create 20 threads, or 100, or 1000, but no more. One way to do that would be to pre-create a fixed-size pool of workers and have each worker iteratively look for another URL to crawl and crawl it, rather than creating a new thread per URL; a sketch of that idea follows.
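The lecture doesn't show pool code, so here is one possible sketch of that fixed-size pool, with my own names (poolCrawl, jobs, results) and a toy in-memory fetcher standing in for real page fetches:

```go
package main

import "fmt"

// poolCrawl bounds concurrency: nWorkers long-lived goroutines pull
// URLs from a jobs channel instead of the crawler spawning one
// goroutine per URL.
func poolCrawl(start string, fetch func(string) []string, nWorkers int) {
	jobs := make(chan string)
	results := make(chan []string)

	for i := 0; i < nWorkers; i++ {
		go func() {
			for url := range jobs {
				results <- fetch(url) // exactly one report per URL taken
			}
		}()
	}

	fetched := map[string]bool{start: true}
	queue := []string{start} // discovered but not yet handed to a worker
	pending := 0             // handed to a worker, not yet reported back

	absorb := func(found []string) {
		pending--
		for _, u := range found {
			if !fetched[u] {
				fetched[u] = true
				queue = append(queue, u)
			}
		}
	}

	for len(queue) > 0 || pending > 0 {
		if len(queue) == 0 {
			absorb(<-results) // nothing to hand out; wait for a report
			continue
		}
		// select avoids the classic deadlock where the master blocks
		// sending a job while every worker blocks sending a result.
		select {
		case jobs <- queue[0]:
			queue = queue[1:]
			pending++
		case found := <-results:
			absorb(found)
		}
	}
	close(jobs) // lets the workers' range loops exit
	fmt.Println("fetched", len(fetched), "pages")
}

func main() {
	graph := map[string][]string{
		"root": {"a", "b"}, "a": {"b", "c"}, "b": {"c"}, "c": {"root"},
	}
	poolCrawl("root", func(u string) []string { return graph[u] }, 3)
}
```

The master alone touches the fetched map and queue, so no lock is needed on them; only the two channels are shared with the workers.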
Okay, so next I want to talk about another crawler, implemented in a significantly different way: using channels instead of shared memory. Remember, in the mutex crawler I just showed, there's a table of crawled URLs that's shared between all the threads and has to be locked. This version has no such shared table; it does not share memory and does not need locks. Instead there's basically a master thread, the master function down at line 86, and it has a table, but the table is private to the master function. Instead of creating a tree of function calls that corresponds to the exploration of the graph, as the previous crawler did, this one fires off one goroutine per URL that it fetches, and it's only the master, the one master, creating these threads: we don't have a tree of functions creating threads, just the one master. It creates its own private map at line 88 to record what it's fetched, and it also creates a channel, a single channel that all of its worker threads are going to talk to. The idea is that it fires up worker threads, and each worker, when it finishes fetching its page, will send exactly one item back to the master on the channel, and that item is the list of URLs on the page that worker fetched. The master sits in a loop at line 89 reading entries from the channel (we have to imagine it started up some workers in advance), reading the URL lists those workers send back. Each time it gets a URL list at line 89, it loops over the URLs in that list, from a single page fetch, at line 90, and if a URL hasn't already been fetched, it fires off a new worker at line 94 to fetch that URL. If we look at the worker code starting at line 77, it basically calls the fetcher and then sends a message on the channel at line 80 or 82 saying: here are the URLs on the page I fetched. The maybe interesting thing here is that the worker threads don't share any objects: there is no shared object between the workers and the master, so we don't have to worry about locking and we don't have to worry about races. This is an example of communicating information instead of getting at it through shared memory; a condensed sketch of the whole structure appears below.

Yes? So the observation is that the workers appear to be modifying ch while the master is reading it. That's not the way the Go authors would like you to think about this. The way they want you to think about it is that ch is a channel; a channel has send and receive operations, and the workers are sending on the channel while the master receives on it, and that's perfectly legal: the channel is happy. What that really means is that the internal implementation of a channel has a mutex in it, and the channel operations are careful to take out that mutex when they're messing with the channel's internal data, to ensure the channel itself doesn't have any races in it. Channels are protected against concurrency, and you're allowed to use them concurrently from different threads.

Yes, about the channel receive: we don't need to close the channel. The break statement is about when the crawl has completely finished and we've fetched every single URL. What's going on is that the master keeps this n value, a value private to the master; every time it fires off a worker, it increments n. Every worker it starts sends exactly one item on the channel, so every time the master reads an item off the channel it knows one of its workers has finished, and when the number of outstanding workers goes to zero, we're done. Once that happens, the only reference to the channel is from the master, or really from the code that calls the master, so the garbage collector will very soon see that the channel has no references to it and will free the channel. Sometimes you do need to close channels, but I actually rarely have to close channels.

Say that again? So the question is about line 106: before calling master, ConcurrentChannel shoves one URL into the channel, and that's to get the whole thing started, because the code for the master goes right into reading from the channel at line 89, so there had better be something in the channel. If it weren't for that little bit of code at line 107, the for loop at 89 would block reading from the channel forever, and this code wouldn't work.

Yeah, so the observation is: gosh, wouldn't it be nice to write code that could notice if there's nothing waiting on the channel? And you can: look up the select statement. It's much more complicated than this, but select allows you to proceed, to not block, when there's nothing waiting on the channel because the workers aren't finished.
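Since the question came up, here is a tiny self-contained example of that: select with a default arm is a non-blocking receive (my own toy, not code from the lab):

```go
package main

import "fmt"

func main() {
	ch := make(chan []string, 1)

	// select with a default case is a non-blocking receive: if no
	// value is waiting on ch, the default arm runs instead.
	select {
	case list := <-ch:
		fmt.Println("got", list)
	default:
		fmt.Println("nothing ready yet")
	}

	ch <- []string{"x"}
	select {
	case list := <-ch:
		fmt.Println("got", list)
	default:
		fmt.Println("nothing ready yet")
	}
}
```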
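And putting the whole channel-based structure together, here is a condensed sketch of the shape described above: a master with a private map, a single channel, workers that each send exactly one URL list, and the n counter plus the seed. Names and details are simplified from the course's crawler.go, so treat it as an approximation rather than the course code:

```go
package main

import "fmt"

// worker fetches one page and sends exactly one URL list back on ch,
// even on error, because the master counts one reply per worker.
func worker(url string, ch chan []string, fetch func(string) ([]string, error)) {
	urls, err := fetch(url)
	if err != nil {
		ch <- []string{}
		return
	}
	ch <- urls
}

// master owns the fetched map privately: workers and master share
// nothing but the channel, so no locks are needed.
func master(ch chan []string, fetch func(string) ([]string, error)) {
	n := 1 // outstanding replies; the caller's seed counts as one
	fetched := make(map[string]bool)
	for list := range ch {
		for _, u := range list {
			if !fetched[u] {
				fetched[u] = true
				n++
				go worker(u, ch, fetch)
			}
		}
		n--
		if n == 0 {
			break // every started worker has reported: the crawl is done
		}
	}
}

func main() {
	graph := map[string][]string{"root": {"a", "b"}, "a": {"b"}, "b": {"root"}}
	fetch := func(u string) ([]string, error) { return graph[u], nil }
	ch := make(chan []string)
	// Seed from a separate goroutine: ch is unbuffered and master
	// isn't receiving yet (this is the role of lines 106-107).
	go func() { ch <- []string{"root"} }()
	master(ch, fetch)
	fmt.Println("done")
}
```

Note that the workers never touch the fetched map; all coordination flows through ch.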
Sorry, back to the first question: I think what you're really worried about is whether we're actually able to launch fetches in parallel, since the very first step won't be in parallel; there's only the one seeded URL, and the for loop waits at line 89. But no, that for loop at line 89 does not just loop over the current contents of the channel and then quit. The for loop at 89 may never exit on its own: it's just going to keep waiting until something shows up in the channel, so if you don't hit the break at line 99, the for loop won't exit. All right, I'm afraid we're out of time. We'll continue this; actually, we have a presentation scheduled by the TAs which will talk more about Go.
Info
Channel: MIT 6.824: Distributed Systems
Views: 182,039
Id: gA4YXUJX7t8
Length: 80min 22sec (4822 seconds)
Published: Fri Feb 07 2020