Crust of Rust: Channels

Reddit Comments

Thanks for these videos, /u/Jonhoo. They're really well done, very informative, and explain the concepts clearly. I've really enjoyed them and have learned a lot. Hopefully you're able to find time to continue with them after the move.

— u/rage_311, 29 points, Aug 06 2020

Rust lang streamers really amaze me :D Who are your favorite streamers, cuties?

Mine are:

  • Jonhoo
  • Brandon (Gamozo) (his rants are the best part of the stream)
  • James Munns (embedded stuff)
  • mighty soy pie (he streams making compilers, Minecraft, etc.)
  • Amos (he doesn't stream anymore, maybe he's too busy, but he's writing a lot of good articles)
  • And now Steve is also streaming.

Love the community :)

— u/[deleted], 14 points, Aug 06 2020

I’ve used channels before, but always knowing that I had absolutely no idea how they worked. Around 25 minutes in a massive lightbulb went off in my head. I’ve currently stopped at 37 minutes in, but I’ll be sure to finish it when I get more time. Thank you so much!

— u/sheepshapesoap, 6 points, Aug 06 2020
Captions
Welcome back, everyone, to yet another Crust of Rust episode. I'm trying to find a way to fit in as many of these as I can before I move to LA and all these changes in my life happen, since it'll be a little unpredictable when the next episode will be; I've been trying to cram in as many good episodes in the midst of my thesis writing as I can.

For those of you who aren't aware of what Crust of Rust is: this is a variant of the live streams I normally do where I try to tackle beginner-to-intermediate content, that's the best way to describe it. This is for people who've read the Rust book, have some familiarity with the language, have maybe built some things, but are looking to understand how some of the slightly more advanced topics work. If you look at some of the past videos I've done, there are episodes on lifetime annotations, declarative macros, iterators, smart pointers, interior mutability: a lot of the topics that, once you start getting deeper into Rust, you start seeing pop up and might wonder how they work.

In preparation for this stream, I tweeted to ask people what they'd like to see next, and there was a pretty overwhelming plurality for looking at the std::sync::mpsc module. mpsc is basically, well, not basically, it *is* a channel implementation that comes in the standard library. A channel, if you're not familiar with it, is just a way to send data from one place and receive it somewhere else. The mpsc part is "multi-producer, single-consumer": you can have many senders, but only one receiver, so it's a many-to-one channel. In this stream we're going to implement our own channel and see how it compares to the standard library channel, but also to some of the other channel implementations out there, some of the other ways to design these channels, and some of the considerations that come up when you design them and when you decide how to use them. Oh yeah, and if you're interested in these Crust of Rust streams, just follow me on Twitter, or subscribe on YouTube or Twitch, and you'll be notified whenever I do an upcoming stream.

As I mentioned, the standard library has a built-in mechanism for these mpsc channels, and usually, for any crate that provides something like this, you have a Receiver type and a Sender type. In the case of the standard library there's also a SyncSender type, and we'll talk a little bit about why that is and how it's different. As you can see, the examples are fairly straightforward: when you create a channel, you get a sending handle and a receiving handle, and you can move these independently, so you can give the receiver or the sender to one thread and the opposite end of the channel to some other thread, and now they can communicate over that channel. The channel is unidirectional, though: only the senders can send, and only the receiver can receive. You can clone the sender, but you cannot clone the receiver, hence the multi-producer, single-consumer part.

Before we dig into how to implement a channel, let's first make sure we're all on board about what a channel is and why it might be useful. If you have questions about that, let's get those out of the way first, before we start to get into the weeds of the code.
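As a concrete reference for the std API being described here, a minimal example (this is plain standard-library usage, not the implementation we build below):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // channel() returns a (Sender<T>, Receiver<T>) pair.
    let (tx, rx) = mpsc::channel();

    // Senders can be cloned (multi-producer); the receiver cannot.
    let tx2 = tx.clone();
    thread::spawn(move || tx.send(1).unwrap());
    thread::spawn(move || tx2.send(2).unwrap());

    // recv() blocks until a value arrives (single consumer).
    let mut got = vec![rx.recv().unwrap(), rx.recv().unwrap()];
    got.sort();
    assert_eq!(got, vec![1, 2]);
}
```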
Oh yeah, there's a Discord server as well; all the chat will also be in Discord if you're interested in following along, for example if you're watching the video on demand.

"What does crossbeam do differently from the standard library?" I forget; that's something we'll cover after we've done our implementation.

"What kind of data can I send through a channel?" The channels in Rust all take a generic parameter T. If you look at the Sender, for example, it has a type parameter T, and you can send any T through that channel. When you create the channel in the first place, if you call the channel function, which is the thing that gives you a sender and receiver, it's parameterized by that type, so the sender and receiver are both parameterized by the type of the thing you're going to send and receive.

Oh, I can zoom in? Yeah, sure. They are very much like Go channels. Channels are something that exists in most languages, and very often they function in a similar way: you have senders and receivers. In Rust parlance this is specifically an mpsc channel, whereas in some other languages you have many-to-many channels, for example; I forget what Go's channels do there.

"The data has to be Send, though, right?" Not quite, actually. Imagine that you have a channel but you never give away the sender or receiver to a different thread; then the T is being sent on the same thread, and so it doesn't actually need to be Send. I don't know whether the standard library makes this distinction, but hopefully it does... yeah, you see that the Sender is Send if the T is Send. What this means is that you can construct a channel and send stuff over it that's not Send, as long as you don't move the sender or receiver across a thread boundary; but if you do, then T must be Send.

"Are there any constraints on what kinds of types you can send through the channel, though? Does it have to be Send?" It does not have to be Send, and no, there are no other constraints; you can send any type through a channel. Remember that it's not serialization, it's not TCP, it's nothing like that: it is really just sending the data stored in the value. If you send a Vec, for example, it's going to send the length, the capacity, and the pointer to the data across the channel.

"Does T need to be Sized, or can it be dyn?" I think it has to be Sized. Remember from one of our previous streams: we talked about the fact that the Sized trait is an auto trait, but it is also an auto bound. Unless you say `?Sized`, as in "this thing doesn't need to be Sized", every T that you write has to be Sized; so here, because there's no `?Sized`, T has to be Sized.

"While you go, can you point out differences between other implementations of channels?" We'll look at that later in the stream; we'll specifically look at what other implementation strategies exist and which ones we pursue.

"How is the sender distinguished from the receiver?" They have different types: when you call channel, which constructs a new channel for you, you get back two halves, if you will: a sender half, which has the Sender type, and a receiver half, which has the Receiver type.

"Can the data be non-'static?" Sure, but you need to own it.

"What's the point of a channel if you're not going between threads?" There are some cases where you might not go between threads, but you might have multiple things that you want to execute in parallel but not concurrently, or concurrently but not in parallel.
You might have one thread that's like an event loop or something, and it might end up sending to itself, and so you might still want that sender/receiver abstraction; that would be one example. Tests are another.

"Who owns the data in the channel?" The channel object, yeah: the channel type itself owns the T, so if you send something on the channel, never receive it, and drop the sender and receiver, the channel will make sure that the data gets dropped.

"What's the performance impact of the channel?" We'll look at that a little when we write the implementation. "How does it do back pressure?" We'll look at that as well once we start to get into the implementation.

All right, it seems like the questions now are more about implementation details, so let's start with that. As always, we'll start with an empty project, cargo new --lib, and we're going to call it panama, because Panama was a channel that enabled communication between... there are also good reasons not to call it panama, but it was the first thing that came into my head.

Okay, so we're going to have a pub struct Sender, and it's going to hold a T (we don't know what's going to go in there yet), we're going to have a Receiver<T>, and then we're going to have a function channel, generic over T, that returns you a (Sender<T>, Receiver<T>). By convention the sender type comes first, so you return a tuple where the first part is the sender and the second part is the receiver, and who knows what this function is going to do yet. So this is the setup we have, and the question becomes: what is the actual implementation we want? Here we have a lot of possible choices. I'm going to go with one that is very straightforward and demonstrates some useful concurrency primitives in Rust, but it is not necessarily the most performant, in part because implementing a very performant channel requires a lot more subtlety and trickiness that would be hard to cover in a stream.

In particular, we're going to be using other parts of the sync module: Mutex, Arc, and Condvar. Mutex we talked a little about in the stream on smart pointers and interior mutability. A mutex is a lock; "mutex" stands for mutual exclusion. The idea is that you have (I really want the default to be to collapse these) a lock method, and the lock method returns a guard, and while you have that guard, you are guaranteed to be the only thing that can access the T protected by the mutex. The way this works in practice: if two threads both try to lock the same mutex, one gets to go, and the other one blocks; it has to wait until the first one releases the guard, and then it gets to go. This ensures there's only ever one thread modifying the T at any given time.

Arc, as we talked about in the stream on interior mutability and smart pointers, is a reference-counted type: it's an Rc, but *atomically* reference-counted, which means we can use it across thread boundaries, which we obviously want for a channel. It's not useless on a single thread, but we would certainly want it to work across thread boundaries. This is also the reason why we're using Mutex instead of something like RefCell.
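To make the Mutex and Arc mechanics concrete, here's a small self-contained example, separate from the channel itself: several threads share one value through an Arc<Mutex<_>>, and lock() blocks so only one of them mutates it at a time.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0));
    let mut handles = Vec::new();
    for _ in 0..4 {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            // lock() blocks until this thread is the exclusive holder
            // of the guard; the guard releases the lock when dropped.
            *counter.lock().unwrap() += 1;
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(*counter.lock().unwrap(), 4);
}
```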
And then Condvar. Condvar is interesting: if you haven't done a lot of concurrency work before, you might not know what a condvar is. It's a condition variable: a way to announce to a different thread that you've changed something it cares about. Think of it like this: if there's a receiver that's waiting because there's no data yet, and a sender sends something, the sender needs to wake up the sleeping thread, the thread that was waiting to receive something, and go "there's now stuff you can read". That's what a condvar lets you do. Together these are very useful concurrency primitives; they give you a very nice model for writing concurrent code in a safe way. We might not even need any unsafe code in this implementation of channels.

Okay, so what we're going to do, and this is a pretty common pattern in Rust when you have things that are shared, like multiple halves or multiple handles that point to the same thing, is declare an Inner type which holds the shared data; for us, that's going to be the things in the channel. This is effectively a queue, right? Because if a sender sends something and then the receiver receives something, the receiver should receive the thing that was sent longest ago. For now, let's start with this being a Vec; it's not actually going to end up being a Vec, but it's a useful starting point. Then we're going to say that the Receiver has an `inner`, which is an Arc<Mutex<Inner<T>>>, and the Sender has the same thing, so the sender and receiver actually contain the same stuff, at least at the moment. From std::sync we want Arc and Mutex, and we'll want Condvar.

And now we even have a way to create this channel in the first place: we create the inner, with a queue that's just an empty Vec to begin with; then we have the shared inner be an Arc of a Mutex of an Inner, and we return a Sender of that inner and a Receiver of that inner. All right, this is not a T but an Inner<T>; ah, the heat is making me type more poorly. Great, that now compiles. This is the rough structure we've laid out; there are obviously plenty of things missing, and some things that aren't quite right, but it's the general idea of how we're planning to set up this shared state. Make sense?

"RefCell does runtime borrow checking, right?" Yes. A mutex, in a sense, is also a runtime borrow check, but it doesn't so much borrow-check as borrow-enforce: if two threads try to access the same thing at the same time, it blocks one thread, whereas RefCell will tell you "you can't get this mutably at the moment".

"Why does the Condvar always need a MutexGuard?" We'll get back to that in a second, once we start adding the condvar. "Why not make it a linked list?" We'll touch on that under alternative implementations. "Can you zoom your vim text a little?" Absolutely; how's that?

"Is it possible to specialize the struct so that if T is not Send, the Arc<Mutex<...>> would just be an Rc?" No, not easily; you can't specialize the definition. You would have to have a sort of unsynced version. I forget if the standard library has something like this; I don't think so, but you could imagine a std-style unsync or un-Send mpsc, although I think the actual use case for those is less clear than for a cross-thread channel. In general, channels are used to send things between threads.
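To recap, here's roughly the skeleton laid out so far, as a compilable sketch (the queue is still a Vec at this point; it becomes a VecDeque shortly, and the Condvar gets wired in later):

```rust
use std::sync::{Arc, Mutex};

// The shared state that both halves point at.
struct Inner<T> {
    queue: Vec<T>,
}

pub struct Sender<T> {
    inner: Arc<Mutex<Inner<T>>>,
}

pub struct Receiver<T> {
    inner: Arc<Mutex<Inner<T>>>,
}

pub fn channel<T>() -> (Sender<T>, Receiver<T>) {
    let inner = Arc::new(Mutex::new(Inner { queue: Vec::default() }));
    (
        Sender { inner: inner.clone() },
        Receiver { inner },
    )
}
```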
"I've seen a lot of people using mutexes from parking_lot and channels from crossbeam; does Condvar have a similar better implementation, do you know of one?" parking_lot also provides parking and notification, which is what condvars give you, so yeah, you could totally use the stuff from parking_lot as well. I know there's been some talk of taking the parking_lot implementations of things like Mutex and Condvar and making them the standard library ones, and that might happen someday.

"Why would you not put the Mutex in Inner?" Yeah, I guess we could do that; that's certainly a change I would have made in about five minutes anyway. So make this be this, and you'll see why that's actually necessary in a bit. In fact, here we can even go `default`, so we use the default implementation of Vec.

"Why does the Receiver need to have an Arc protected by a Mutex if the channel may only have a single consumer thread?" Okay, so the question is why the receiver needs a mutex, and the answer is that a send and a receive might happen at the same time, and they need to be mutually exclusive to each other as well; that's why they all need to be synchronized with the mutex.

"Is there a difference between an Arc<Mutex<...>> and a boolean semaphore?" A mutex effectively is a boolean semaphore, so no; but I don't think there's a reason to use a boolean semaphore over the implementation in Mutex. In particular, what Mutex buys you is that it integrates with the parking mechanisms and user-mode futexes implemented by the operating system. A boolean semaphore is basically a boolean flag that you check and atomically update; the problem is, if the flag is currently set, so someone else is in the critical section, someone else holds the lock, then what do you do? With the boolean semaphore you have to spin, repeatedly checking it, whereas with the Mutex the operating system can put the thread to sleep and wake it back up when the mutex is available, which is generally more efficient, although it adds a little bit of latency. "At that point you can just use a queue, right?" I don't quite know what the question is there.

"Why is the Arc needed?" The Arc is needed because otherwise the sender and the receiver would just have two different instances of Inner, and if they did, how would they communicate? They need to share an Inner, because that's where the sender puts data and where the receiver takes data out.

All right, so let's just start implementing and see what happens. For Sender we obviously want a send function: it's going to take &mut self and the T we're going to send, and for now let's say it returns nothing; we'll see why that's a problem later on. Similarly (actually, let me move this up here), for Receiver we want a recv method, which does not take a T but returns one.

Okay, so what does send do? Well, send first takes the lock: self.inner.lock(). You'll notice, if we go back to the documentation for Mutex, that lock returns a LockResult. The reason for this: imagine that the last thread that took the lock panicked while holding it. It might have been in the process of updating something under the lock, but then the thread panicked, which might mean the data under the lock is now in some not-quite-consistent state.
The way the lock communicates this is that when the thread panics, it releases the lock, but it also sets a little flag to say "the last thing that accessed this panicked". So what LockResult is, you'll see, is either a guard or a PoisonError containing a guard, basically telling you, if you get an error back, "the other thread panicked, you should know about that". Of course, you can always choose to ignore the fact that it was poisoned, that the other thread panicked, but it could also be that you don't want to ignore that. In our case we're going to unwrap it for now. So we lock the queue, and then we do queue.push(t). And recv does the opposite: it locks the queue, and then it pops.

Now, there should immediately be some obvious problems with this. The first is that this is not actually a queue: if the sender pushes and the receiver pops, then with two sends followed by a receive, the receiver would get the last thing sent rather than the first. The problem, of course, is that we're using a Vec, and we're using it like a stack. In theory you could remove the first element of a Vec, but then you have to shift all the other elements down to fill the hole. In practice, the way to do this is to use a ring buffer. We might cover those in a later stream, but for now, know that std::collections has a type called VecDeque which implements one. It's sort of like a vector, but it keeps track of the start and end positions separately: if you push to the end, it goes on the end; if you pop from the front, it removes the element and then just moves a pointer to where the data starts. The data might end up wrapping around, but it can be used as a queue rather than a stack. This lets send do a push_back and recv do a pop_front. You don't want swap_remove, which someone suggested as an alternative: with swap_remove, the last thing sent would become the next thing received; it changes the order of the elements in the vector. "VecDeque" is apparently the correct pronunciation; thanks, Atlas.

All right, the other problem is that pop_front, as the compiler tells us, returns an Option, not a T, because there might be nothing in there, and then what do we do? One option is to provide a try_recv method that returns an Option<T>: it tries to receive something, and if there's nothing to receive, it returns None. That seems totally fine; I'm going to remove it for now because we're going to change some things later. But really we want to provide what's known as a blocking version of recv: a recv that, if there isn't anything yet, waits for there to be something in the channel. So we need to figure out what to do here, and this is where the Condvar comes into play.
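As a quick aside before the condvar: the FIFO behavior of VecDeque that send and recv now rely on, in isolation:

```rust
use std::collections::VecDeque;

fn main() {
    let mut q = VecDeque::new();
    q.push_back(1); // a send
    q.push_back(2); // another send
    // FIFO: the oldest element comes out first, unlike a Vec used as a stack.
    assert_eq!(q.pop_front(), Some(1));
    assert_eq!(q.pop_front(), Some(2));
    assert_eq!(q.pop_front(), None); // empty: this is why recv sees an Option
}
```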
So we're going to split this into... okay, a tx_available condvar and an rx... actually, we're just going to go with `available` for now; the name isn't quite accurate, but it's close enough. And the Condvar needs to be outside the Mutex. The idea is: imagine you're currently holding the mutex, and you realize you need to wake other people up. The thread you wake up has to take the mutex, that's sort of the assumption, but you're currently holding it. So if you tell them to wake up while holding the mutex, they wake up, they try to take the lock, they can't, and they go back to sleep; then you continue running and eventually release the mutex, and now no thread is awake. You end up with what's known as a deadlock: no thread can make progress, even though it is possible to make progress. This is why the condvar has to be outside the mutex: you let go of the mutex at the same time as you wait. And, to get to the question raised earlier about why Condvar requires you to hand in a MutexGuard: you have to prove that you currently hold the lock, and then it makes sure that releasing the lock and sleeping happen as one atomic step.

So here's what we're going to do: we match on queue.pop_front(); if it's Some(t), we just return t, but if it's None, we block: we do self.inner.available.wait(...), and we wait on the queue. Now of course, the problem, as the compiler also points out, is: okay, we wait, but then what? This actually needs to be a loop, so we're going to do this in a loop. Not only that, but if you look at the signature of wait, you'll see that it actually gives you a MutexGuard back. The idea is that if you get woken up, you automatically have the mutex: someone else chose to wake you up, and wait basically hands the mutex to you, and then you do something appropriate with it. So instead of having to lock it on each iteration of the loop, we can do this. And this, too, can be poisoned if the previous holder panicked.

So we just keep looping, but this isn't a spin loop: if we end up in the None case, we wait for a signal on this `available` condvar, and the operating system makes sure the thread goes to sleep and only wakes up if there's some reason to. This also means the sender now needs to notify any waiting receiver once it sends, because otherwise, imagine some thread enters this loop and is just sleeping, and then a send happens: we need to make sure that thread wakes up, and if it doesn't, we have a problem. So we use the condvar for this as well: it has a notify_one and a notify_all call. We drop the queue here, well, drop the lock, so that whoever we notify can actually wake up, and then we notify one thread; and because we are the sender, we know that what we wake up will be a receiver. Does that make sense so far?

"A VecDeque (double-ended queue) is a vector?" Yeah, basically: a VecDeque is just a vector with a head and a tail index.

"Isn't that kind of loop the raison d'être (I don't know how to pronounce that in French; I should probably not try) of async?" Not quite: this loop is fine, it's not a spin loop. Where you need async/await is more when you're I/O bound rather than CPU bound; it's for slightly different reasons, basically so that you don't need to have a million threads running.
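Putting the pieces together, the blocking recv at this point of the stream looks roughly like this. A sketch, assuming the shape described so far: Inner { queue: Mutex<VecDeque<T>>, available: Condvar }, with Receiver holding inner: Arc<Inner<T>> (the field names get reshuffled later in the stream).

```rust
impl<T> Receiver<T> {
    pub fn recv(&mut self) -> T {
        let mut queue = self.inner.queue.lock().unwrap();
        loop {
            match queue.pop_front() {
                Some(t) => return t,
                None => {
                    // wait() atomically releases the guard and sleeps; it
                    // hands the (re-acquired) guard back when we're woken,
                    // so we loop and re-check. The loop is also what makes
                    // spurious wakeups harmless.
                    queue = self.inner.available.wait(queue).unwrap();
                }
            }
        }
    }
}
```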
Actually, I lied about the need for wait to take a guard, come to think of it. You'll notice up here that notify_one does not require me to drop the mutex; it doesn't require me to hand in the mutex, though you sort of need to do that regardless. But wait does require you to give up the guard: the idea is that you can't wait while still holding the mutex, because otherwise whoever would have woken you up could never get the mutex. That's why it takes the guard.

"How is it protected from condvar spurious wakeups?" Yeah, one thing that can happen with condvars is that when you call wait, the operating system doesn't guarantee that you aren't woken up without there being anything for you to do, and that's what the loop handles. Imagine you're woken up for some other reason, not because a send happened, but because of a signal to the process or some other random cause; basically, the operating system doesn't guarantee that you wake up for a reason. Then you loop around, check the queue, realize it's still empty, and go back to sleep. So that's fine.

"How can someone send while we have the mutex locked in recv? Doesn't it block insertions?" wait gives up the lock just before the thread goes to sleep, and that allows the sender to proceed.

I'm not using ALE anymore; I'm using coc.nvim, which gives me the inline type annotations and errors.

"Wouldn't the lock be dropped after the notify anyway?" Yeah, but I specifically drop it before the notify so that when the other thread wakes up, it can immediately take the lock.

"How does the operating system know which thread to wake up?" It doesn't. notify_one means "notify one of the threads that's waiting on this specific condvar", and because we know there's only one sender and many receivers, we know that must be a receiver, because this is the sender.

"Wouldn't it be nicer to use braces around the let-queue-and-send instead of drop?" I mean, that's true, we could do that instead; I don't know that it's any nicer. I'd prefer to be explicit about the release.

"Don't we need to take the lock in the loop?" The way wait works on a condvar is that it consumes the mutex guard, because it needs to give it up right before going to sleep, but when you're woken up, it takes the mutex for you. That's why we reassign to the guard: we get it back when wait returns.

"There's a wait with a timeout, right?" Yeah, there's wait_timeout as well, which... actually, I think it does give you the guard as well.

"Wouldn't one thread empty the entire queue and only allow other threads to go once it's empty?" No, because notice that we return as soon as we manage to pop something from the front, which releases the mutex.

"Is there a notification variant which takes the guard and drops it for you?" Not as far as I'm aware, no.

"If n threads are waiting, is one of them randomly chosen to be woken up?" Yeah, notify_one does not guarantee which thread is woken up. There's also notify_all, which notifies all the waiting threads.
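And the matching send, in the same sketch as the recv above: push under the lock, release, then wake the receiver.

```rust
impl<T> Sender<T> {
    pub fn send(&mut self, t: T) {
        let mut queue = self.inner.queue.lock().unwrap();
        queue.push_back(t);
        // Release the lock *before* notifying, so the woken receiver can
        // take it immediately instead of waking up just to block again.
        drop(queue);
        self.inner.available.notify_one();
    }
}
```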
"Is it possible for the queue to be locked between the drop and when the receiver locks, by another sender?" Yeah, it can be that there's another sender that also manages to push to the queue, but that isn't really a problem: the receiver will still eventually get to go. And in the current setup, remember, senders basically never block: when a sender gets the lock, it always succeeds in sending, so there are never any waiting senders in the current design. We'll talk more about that in a second.

All right, so this setup will actually work pretty well; in fact, we can try it out. `available` is going to be a Condvar::new(). The other thing we want to do is make sure the sender is cloneable. Your first instinct might be to derive Clone, and now you can clone the sender. Unfortunately, derive(Clone), at least at the moment, desugars into impl<T: Clone> Clone for Sender<T>, with some auto-generated stuff by the compiler; this is what it turns into if you write derive, and one thing you'll notice is that it added the Clone bound to T as well. Very often that's what you want: if the structure you're deriving Clone on contains a T, then T does need to be Clone in order to clone the whole type. In our case, though, Arc implements Clone regardless of whether the inner type does; that's sort of what reference counting means: you can clone the Arc and there's still only one of the thing inside. So for our implementation of Clone we don't actually need T to be Clone; we want the unbounded implementation, and that's why we need to implement Clone ourselves, manually. Luckily, it's pretty simple: it's just `inner: self.inner.clone()`, and the actual way to write that is Arc::clone(&self.inner). The method-call version is technically legal, but it's usually not what you want to write. The reason: imagine that Inner also implemented Clone. Rust wouldn't know whether `self.inner.clone()` is supposed to clone the Arc or the thing inside the Arc, because Arc dereferences to the inner type and the dot operator recurses into the inner derefs. So usually what you want here is Arc::clone, to say "I specifically want to clone the Arc, not the thing inside it".

"Couldn't lock block?" Yes, lock blocks; that's the whole point of lock, and that's what we want: send and recv should be blocking methods. Well, send will only ever block for short amounts of time, but if you try to receive and there's nothing in the channel, we want the thread to block.

"I don't understand why you're talking about waking up multiple receivers while implementing mpsc." Sorry, you're right, I misspoke: I meant there's only one receiver, and therefore notify_one will notify the right thing. There are never sleeping senders in our current setup.

"Will there be a max size on the queue, which would require senders to wait?" Not currently, although I'm about to get to it.

"It's also easier to read that it's a trivial clone that way, by removing the Clone bound." Yes.

"Is there a way to disable auto-deref?" You just don't use the dot operator; you use the fully qualified method-call syntax.

"Why does Rust bubble EINTR, or whatever the OS equivalent is, up into the wait function?" It often doesn't have a choice: the wait implementation basically just ends up being the OS implementation; they want to add as little as possible on top of it, except things like poisoning. And yeah, I think wait specifically says that it does not guarantee you won't get those spurious wakeups.
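The manual Clone impl just described, written out against the skeleton above:

```rust
// Manual Clone rather than #[derive(Clone)]: the derive would add an
// unnecessary `T: Clone` bound, but cloning the Arc never clones the T.
impl<T> Clone for Sender<T> {
    fn clone(&self) -> Self {
        Sender {
            // Arc::clone, not self.inner.clone(): the dot operator could
            // auto-deref through the Arc if Inner were itself Clone.
            inner: Arc::clone(&self.inner),
        }
    }
}
```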
Okay, so let's just check that this works. We're going to add a mod tests and a test, and we'll do a ping-pong test: we create a tx and an rx, I guess here we do super::channel() (I can't spell), that creates the channel; then we do tx.send(42), and then we assert_eq!(rx.recv(), 42). And what did I call this thing? panama, that's right. tx and rx both need to be mut; actually, rx doesn't technically need to be mut, but we're going to make it mut, and I'll show you why in a bit. send also technically doesn't need to be mut, but we might as well make it so, which is why they need to be mut up here. Now if we run it: great, the ping-pong test succeeds. So we now have an implementation that works. There are still some things wrong with it, but we have one that works.

"The only problem with Arc::clone is that it cannot coerce to trait objects; you have to do it manually, for example Arc::clone(x) as Arc<dyn Trait>." Yeah, that's true.

"Do you think tx and rx are good names for channels? I know the std docs use them, but I've always hated them." I like tx and rx, but it's true, it comes down to personal preference.

All right, so the first and most obvious problem: imagine that there are no senders left. Here we're going to add a test, we'll call it closed, where we just immediately drop the sender. Now it's not even clear what the receiver should do: when I call recv, it's just going to block forever, even though there are no senders left (I guess we need to give this a type). If there are no senders left and the receiver tries to receive, there can never be any future senders, because in order to get a sender you have to clone a sender, and all the senders are gone. So if we run this test, you'll see it just hangs forever, which is obviously not great. Realistically, what we want is some way to indicate to the receiver that there are no senders left, that the channel has basically been closed.

The easiest way to do this: we're going to change the naming here a little and call this `shared` (shared, shared, shared, shared, renaming it everywhere). I'll show you why I changed it to shared: it's because we really want some additional data guarded by the mutex. So the mutex is going to protect an Inner<T>, and the Inner<T> holds the queue (this field is now `inner`), and it also holds a `senders` count, which is going to be a usize. Now, of course, this becomes shared.inner.lock(): the lock now guards both the queue and this additional usize we added. (That should say Inner; that should say Inner. Great.)

And now, every time you clone a sender, we increase the number of senders in that value: clone takes the lock, `senders` is incremented by one, we drop the inner, and then we clone the shared. Similarly, we need to deal with the case where a sender goes away: when a sender goes away, we also grab the lock, and one thing we want to track is whether we were the last one: is the number of senders now zero? What might happen is that the receiver was blocking, and then the last sender went away; we need to make sure to wake it up, otherwise it might never wake up.
So we also call self.shared.available.notify_one() if we were the last. And then what the receiver has to do: recv now basically needs to return an Option<T> rather than a plain T, because it could be that the channel truly is empty forever, in which case we want to return None. So in the None case, if inner.senders == 0, we return None; and I guess we can do this a little more nicely: only if the sender count is more than zero do we actually block and wait. So that's the addition of keeping track of the number of senders. Makes sense?

I guess I'll show you the test. Right, this test is now not quite right: the inner is going to be an Inner::default()... actually, we could derive Default, but let's not, because that would require T to be Default. So we have an Inner with a queue, which is an empty VecDeque, and the number of senders is initially one, and this goes in a Mutex::new over that inner state (I guess I need a semicolon; this is now `inner`). And now the test needs to assume that here we got Some. This closed test should in theory now succeed: we can assert that if we try to receive after the sender goes away, we get None. Oh, fun, it does not work. Let's figure out why closed hangs forever.

"Can't the receiver check if shared is unique?" Potentially, actually; you might be right that we could use the reference count in the Arc instead. It gets a little complicated because of weak references, but since we're only using strong references here, we might get away with it. Arc has a strong_count, to which you give an Arc, and it tells you how many references to that Arc there are, how many instances of it exist; if there's only one, that must be the receiver's, and therefore there are no senders. You're right, it's a good optimization; then we could get rid of the senders field. Actually, no, here's the complicated case: if you drop a sender, you don't know whether to notify, because if the count is two, you might be the last sender, or you might be the second-to-last sender with the receiver having already been dropped. So I think we're going to keep it the way it was; it's also easier to read. There are plenty of optimizations you can make over this implementation; I'm more trying to build a representative example of how it might work.

"Could you use an AtomicUsize in shared rather than creating Inner?" You could, although the moment you take a mutex anyway, there isn't much value in it. It would mean you don't have to take the lock in drop and clone, but those should be relatively rare, and the critical sections are short enough that the lock should be fast anyway.

"Wouldn't you want notify_all for drop?" No: when the last sender goes away, only the receiver is left, so there will be at most one thread waiting, the receiver, if any.

Someone pointed out that recv should probably return Result instead of Option; I'll get to that. "Can you overflow the sender count?" In theory; probably not in practice. "Is there any immediate benefit to putting it in the mutex rather than an atomic usize?" I mentioned that a little bit just now.
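Collecting the sender-count bookkeeping just described into one sketch, with the renamed shape Shared { inner: Mutex<Inner<T>>, available: Condvar }:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};

struct Inner<T> {
    queue: VecDeque<T>,
    senders: usize, // how many Sender handles are still alive (starts at 1)
}

struct Shared<T> {
    inner: Mutex<Inner<T>>,
    available: Condvar,
}

pub struct Sender<T> {
    shared: Arc<Shared<T>>,
}

impl<T> Clone for Sender<T> {
    fn clone(&self) -> Self {
        let mut inner = self.shared.inner.lock().unwrap();
        inner.senders += 1;
        drop(inner);
        Sender { shared: Arc::clone(&self.shared) }
    }
}

impl<T> Drop for Sender<T> {
    fn drop(&mut self) {
        let mut inner = self.shared.inner.lock().unwrap();
        inner.senders -= 1;
        let was_last = inner.senders == 0;
        drop(inner);
        if was_last {
            // Wake the (single) receiver so it can observe senders == 0
            // and return None instead of sleeping forever.
            self.shared.available.notify_one();
        }
    }
}
```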
Patrick, I'll get to your question later. "What's the difference between VecDeque::new and VecDeque::default?" None.

"I think the error was initializing senders to one in the constructor and then calling clone on the sender we return." In channel, what we're cloning is the shared, not the sender, so that won't increment the sender count.

"Can you get false sharing between the VecDeque and the sender count?" You could, but they're under a mutex anyway, so that shouldn't matter.

"Can't you just notify every time a sender is dropped?" You could, but that would cause a lot of extra wakeups. You want to avoid spurious wakeups because they're costly: you're waking up a thread that didn't need to be woken. There's no correctness issue with waking more threads, but it is a performance issue.

All right, great, so now we need to figure out why this didn't work, why the closed test hangs forever. When we run it, it hangs, presumably down here on the recv. So the question becomes: why does it hang on the recv? We take the lock, we try to pop from the front; if it's Some, we return it; if it's None and there are zero senders, we return None. So here's what we'll do: we debug-print this value and see what comes out. Okay, so senders is 1. Then the question becomes: why isn't the sender count decremented here? (Now, this probably won't let me do that... drop... the sender count was... oh, it's not dropped? Huh. Or maybe it never gets the lock? No, then it would hang sooner.) So for some reason the sender is not being dropped; why is that? I guess if I do this, does that make a difference? Interesting. Okay, so I was under the impression that assigning to underscore would drop immediately, but I guess that's not true; I think there was actually an open discussion about this for a while. So the implementation actually is correct; it's just that `let _ = tx` apparently does not drop tx, which I found weird; I thought it did, but apparently not. Whereas an explicit call to drop will do what we want. I was a little surprised that this was broken, because the implementation is pretty straightforward. Okay, does that make sense so far?

"Why not use an AtomicUsize instead of a Mutex?" Well, we need the mutex for the queue, and because we have the mutex anyway, an AtomicUsize doesn't save us anything; since we have to take the mutex, we might as well update the count under it too.

"Do you ever use gdb to debug Rust programs?" I do, but I find that print debugging is easier, especially for small examples like this. I guess we can get rid of this debug print now.
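The underscore subtlety he ran into, isolated: `let _ = place;` doesn't actually move out of the place, because the wildcard pattern never binds anything, so nothing is dropped at that point.

```rust
fn main() {
    let s = String::from("x");
    // `let _ = <place>;` does NOT move out of the place: the wildcard
    // pattern never binds, so `s` stays alive until the end of scope.
    // This is why `let _ = tx` left the channel's sender alive.
    let _ = s;
    println!("{s}"); // still valid

    let t = String::from("y");
    drop(t); // an explicit drop() moves the value and destroys it right now
    // println!("{t}"); // would not compile: `t` has been moved
}
```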
Great. Okay, so it turns out there's still a problem with this implementation, and that is that it can go the other way around. This was a closed tx, but what if there's a closed rx? What if we drop the rx here, and then try to do a tx.send(42)? Actually, this isn't really a problem (let's remove the type annotation here; so if I do this, I guess mut, and rx): if I do this, the test runs just fine. But the question is: *should* it run fine? If I try to send something on a channel where the receiver has gone away, maybe the right thing is for me to be told that the channel has been closed, rather than the send just blindly succeeding. It's not entirely clear what the right answer is; this is a design decision about whether send should always succeed or should be able to fail in some way. In this case we're just going to keep it the way it is, it's fine; but in a real implementation you might imagine that you actually want send to get back a signal, like send returning a Result or something, if the channel is closed. Some implementations do, some don't. Keep in mind, though, that if you want send to be able to fail, you have to give back the value the user tried to send: if the send fails, the user should get the value back so they can try to send it somewhere else, or log it, or something like that. Basically, the way you implement this is to add a sort of closed flag to the Inner, just a boolean that gets set when the receiver drops: the receiver gets a Drop impl, just like we have one for the sender, and its drop sets the flag (it doesn't need to notify_all, since there are no blocking senders in this particular implementation); and in send, if the flag is set, you return an error rather than pushing to the queue.

"Can we resurrect a dropped channel?" No; if the senders go away, you have no way to send anymore. In our particular design, because the sender and the receiver have the same fields, in theory we could add a method that constructs a sender from the receiver; most implementations are not as symmetric as this one, and you can't easily create a sender from a receiver. In our implementation you could also get a receiver from a sender, but we wouldn't want to provide that, because then people could create multiple receivers, which would not work right: it would be wrong with our notify_one. Actually, this particular channel would kind of work as multi-producer multi-consumer, but we make some assumptions that make that annoying.

"There's an audio/video sync problem with the stream." Try Twitch instead; sometimes it's better, and YouTube gets confused and slow.

All right, so now let's look at some design decisions here that might not always make sense. The first is that every operation takes the lock, and that's fine for a channel that isn't very high-performance. But if you wanted something really fast, say you have a lot of sends that compete with each other, you might not want the sends to contend with one another. Imagine ten threads all trying to send at the same time: realistically, you could write an implementation that allows them to do that, because the only thing that really needs to be synchronized is the senders with the receiver, as opposed to the senders with one another, whereas we're actually locking all of them. I'll talk a little about what that implementation might look like later.
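Going back to the closed-flag idea above: a hypothetical sketch of a fallible send, assuming a `closed: bool` field added to the Inner from the earlier sketch (the Err(t) convention here mirrors how std's send returns the value inside a SendError, but the names are illustrative, not the stream's actual code):

```rust
impl<T> Sender<T> {
    pub fn send(&mut self, t: T) -> Result<(), T> {
        let mut inner = self.shared.inner.lock().unwrap();
        if inner.closed {
            // The receiver's Drop set the flag; hand the value back so
            // the caller can log it or send it somewhere else.
            return Err(t);
        }
        inner.queue.push_back(t);
        drop(inner);
        self.shared.available.notify_one();
        Ok(())
    }
}
```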
The other thing you might have noticed when we looked at the standard library is that the standard library channel has one Receiver but two different sender types: Sender and SyncSender, and whichever one you construct, you get a Receiver; the receiver type is the same, but the sender types differ. The difference between them is that one is synchronous and the other is asynchronous. Now, this is not the same as async as you might know it; it's not that kind of asynchronous. What a synchronous channel means here is whether it forces the senders and receiver to synchronize. That is: imagine you have a sender that's much faster than the receiver. In our current design, the sender would just produce content much faster than the receiver could consume it, and the queue would keep growing. With a synchronous channel, the sender and receiver go in lockstep, basically: the channel has a limited capacity, so at some point, if the sender sends so much that the receiver isn't consuming it fast enough, the channel fills up and the sender blocks.

So the primary difference between a synchronous and an asynchronous channel is whether sends can block. In our implementation, sends can't block: all send does is take the lock, push to the queue, then drop the lock and notify the consumer. That works fine, but it means there's no back pressure: if the sender is too fast, nothing in the system is told that the receiver isn't keeping up. The advantage of a synchronous channel is that there is back pressure; the sender will eventually start blocking too. Obviously this creates some additional challenges: now you might have blocking senders, and the receiver might have to notify the senders, like, "hey, I know you were blocking, but I just received something, you can now go ahead and send". The way this works out in practice, in a design based on mutexes and condvars, is that you basically need two condvars, one for notifying the senders and one for notifying the receiver the way we currently do, but you can guard them with the same mutex, I think that's true. In practice, once you implement these designs, there are some other implementation approaches that are better suited; we'll talk about those in a second.

In particular, our channel function doesn't take any kind of capacity; it's just an infinite queue. Whereas if you look at the standard library, you'll see there's channel and there's sync_channel, and the sync_channel function takes a bound, which is basically the channel capacity, and it returns a SyncSender and a Receiver that function very much like our sender and receiver types, except that sends are synchronous. Questions about that, before I move on to alternative implementations a little further down the line?
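For reference, std's bounded flavor in action; with capacity 1 the second send blocks until the receiver frees a slot, which is exactly where the back pressure comes from:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // A bounded ("synchronous") channel of capacity 1.
    let (tx, rx) = mpsc::sync_channel(1);
    let sender = thread::spawn(move || {
        tx.send(1).unwrap(); // fits in the buffer, returns immediately
        tx.send(2).unwrap(); // blocks until the recv() below makes room
    });
    assert_eq!(rx.recv().unwrap(), 1);
    assert_eq!(rx.recv().unwrap(), 2);
    sender.join().unwrap();
}
```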
"Why not have senders use Weak? If Sender::send tries to upgrade the Weak and gets None, it fails; and if the weak count is zero, you know that recv will fail." So, if the senders used Weak: Weak is a version of Arc that doesn't increment the reference count, but gives you a way to try to increment it, as long as the count hasn't already gone to zero. The sender would try to upgrade, and if it succeeds, it knows the receiver is still there, and it sends. Yeah, you could have the senders use Weak; in general, I wouldn't optimize this particular implementation too much, because there are better implementations, as we'll see later, that are more complicated. One downside of Weak is that every time you try to send, you have to atomically increment the reference count and decrement it afterwards, so it actually adds a decent amount of overhead.

"Is there a way to have a Condvar without a Mutex?" Not really, no; as you saw, Condvar::wait requires you to have a MutexGuard.

"Wouldn't send technically block if a send caused a Vec resize?" This is an important point; I've spent a bunch of time on resizing in my research recently. This innocuous call to push_back is not necessarily free: it might be that the VecDeque we're using has capacity 16 and you're pushing the 17th element, in which case it allocates a new VecDeque of capacity 32, copies over the 16 elements, deallocates the old one, and then pushes the element, and that takes some time. This still isn't blocking in the usual sense: if you resize, the send just takes longer. But it is true that in the meantime you can't do other sends, and you can't do receives. In practice, most implementations of these things don't use a VecDeque and don't have this problem.

"Given the implementation you currently have, how hard would it be to write an Iterator implementation which consumes values from the channel until all senders are gone, and then ends with None?" Really easy, actually: we do impl Iterator for Receiver, the type Item is T, next takes &mut self and returns Option<Self::Item>, and it just calls self.recv(). And now Receiver is an iterator.

"Are you planning to make a video about your resizing insights?" Maybe; it could be interesting; it might be like a thesis-talk video.

"Is there a good way to send multiple items at once?" In theory that would be pretty easy to add: you could have, say, a send_many that just appends, so it shouldn't be too hard, and you would only need to take the mutex once.
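The Iterator impl mentioned above, written out (it relies on the Option-returning recv from the sender-count change):

```rust
// Iteration yields values until recv() returns None, i.e. until the
// queue is empty *and* every Sender has been dropped.
impl<T> Iterator for Receiver<T> {
    type Item = T;
    fn next(&mut self) -> Option<Self::Item> {
        self.recv()
    }
}
```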
Okay, so there's actually one more optimization that many implementations do that I think is worth mentioning here, because we know there's only one receiver, and this is the first place we're going to encode that assumption into our code to make it more efficient. Because we know there's only one receiver, we don't really need to take the lock for every receive. Instead, here's a trick we can do; bear with me, I'm just going to write the code first and then talk about it: pop front... if the inner queue is not empty, then swap... all right.

This optimization is a little cool, and it's something you'll see in a lot of other implementations, even ones that don't use mutexes. The idea is that because there's only one receiver, any time we take the lock, we might as well steal *all* the items that have been queued up rather than just one, because no one else is going to take them; and if recv is called again, we might as well keep a local buffer of the things we stole last time. So when someone calls recv, we first check whether, the last time we took the lock, there were some leftover items, and if so, we just return from there; we don't even have to take the lock. Only if that buffer is empty do we take the actual lock, and when we do, we try to take the front item. If the queue is empty, we do the same thing as before: we wait. But if the queue is not empty and we get an item, we check whether there are more items in the queue, and if there are, we steal all of them: we swap that VecDeque with the one we have buffered inside ourselves, and we leave the empty self.buffer (it must be empty, because pop_front on it returned None) in its place. We just swap the two, and in subsequent calls to recv we pop from the buffer until it's empty again. So now, instead of taking the lock on every receive, we only take the lock once per batch, whenever there were no additional sends in between, if that makes sense. It's a neat little optimization. It's not really double buffering, but it has a little bit of that flavor. It is true that this ends up keeping roughly twice the amount of memory, because you have two VecDeques that both grow as you add more items; you do end up holding two VecDeques' worth of capacity.

"Won't the receiver-buffer optimization trigger a lot of extra memory allocator activity?" Only twice the amount, which is also amortized: the resizes happen every power-of-two pushes, at power-of-two sizes, so in theory this triggers twice as many resizes, at predictable intervals. This is also a good way to reduce contention: the lock is taken fewer times, which means it will be faster to acquire.

"How did you come up with that optimization: looking at implementations of channels, or did you get it from somewhere else?" This is a pretty common implementation; if you look at some of the more optimized implementations, they generally pull this trick.

"It's important that swap is used rather than just discarding the local buffer each time." Yeah. You could imagine that instead of swapping, we did self.buffer = ..., using std::mem::take for example, which just allocates a new VecDeque and leaves that in place of the one that's there; but that would be much less efficient, because the one you leave there is a new allocation, and you'd deallocate the old self.buffer. Swapping lets us reuse it.

"Do you think it might be faster without the branch, always doing the swap?" Could be; nothing's really stopping you from doing that. In terms of which is faster: without the branch is probably faster, but probably not by a significant amount, because the branch predictor should be pretty good at this; generally, your channel is either usually empty or usually not empty. If the channel is usually empty, the branch will usually be false, and the branch predictor will do well; if it's usually non-empty, the branch predictor will predict that, and it will also do well. Oh, "the branch predictor"? The CPU has a built-in component that observes all your ifs, all your conditional jumps, and tries to remember whether it took the branch or not last time. This is where speculative execution comes into play: if the CPU runs that code again, the branch predictor says "it's probably going to take the branch" or "it's probably not", and starts running the code under that assumption; and if that turns out to be wrong, it goes back, unwinds what it did, and executes the other path instead.
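A sketch of the buffered recv just described, assuming a `buffer: VecDeque<T>` field has been added to Receiver (which is also why rx has to be mut):

```rust
impl<T> Receiver<T> {
    pub fn recv(&mut self) -> Option<T> {
        // First drain anything stolen on a previous lock acquisition.
        // There is only one receiver, so no one else could have taken it.
        if let Some(t) = self.buffer.pop_front() {
            return Some(t);
        }
        let mut inner = self.shared.inner.lock().unwrap();
        loop {
            match inner.queue.pop_front() {
                Some(t) => {
                    // Steal the whole backlog while we hold the lock:
                    // self.buffer is empty, so swapping hands everything
                    // to us and leaves an empty queue for the senders.
                    if !inner.queue.is_empty() {
                        std::mem::swap(&mut self.buffer, &mut inner.queue);
                    }
                    return Some(t);
                }
                None if inner.senders == 0 => return None,
                None => {
                    inner = self.shared.available.wait(inner).unwrap();
                }
            }
        }
    }
}
```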
"Maybe receive could just return a list of values?" It could, but it's usually nicer for receive to return a single Option so you can use it as an iterator, and this way it's fast regardless; if we returned a list, we would have to allocate that list every time. "What about extending the buffer with a drain of the inner queue? That would save memory." It would probably be a lot slower, and it wouldn't actually save memory: you would still end up with both queues holding their capacity.

All right. Now that we have an implementation that's pretty reasonable, it's time to talk about some alternative implementations. There are multiple kinds of implementation differences you'll see, usually referred to as flavors, and they generally take one of two approaches. One is to have different types for different channel implementations; we saw an example of this in the standard library, where there are two different sender types for the two different flavors, one synchronous and one asynchronous (not async as in async/await, but asynchronous in the sense we discussed for channels). The other approach is to have a single sender type and, under the hood, essentially an enum (although that's usually not how it's implemented) that figures out what type of channel this is, so that you can use the same sender type no matter where you are. In practice the implementations vary in what they do, but they all usually have this notion of flavors: you have multiple backing implementations of your channel, and you choose which one to use depending on how the channel is used.

So, flavors (I can't spell). There are some common ones. One is synchronous channels: channels where send can block, usually with limited capacity. Another is asynchronous channels: channels where send cannot block, usually unbounded, so any number of sends can pile up as much stuff as possible in memory. Then there are rendezvous channels, which we haven't talked about yet (a rendezvous channel is a synchronous channel with capacity zero), and finally one-shot channels, channels you only ever send on once. Sometimes these are represented as explicitly different channel types, but very often they live under the hood, where you won't see whether they're there; the flavor is chosen dynamically.
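The standard library exposes the first two flavors directly, and you get the rendezvous behavior by asking for capacity zero; a quick demo using only std APIs:

```rust
use std::sync::mpsc;

fn main() {
    // Asynchronous (unbounded): send never blocks.
    let (tx, _rx) = mpsc::channel::<i32>();
    tx.send(1).unwrap();

    // Synchronous (bounded): send blocks once 8 items are in flight.
    let (stx, _srx) = mpsc::sync_channel::<i32>(8);
    stx.send(1).unwrap();

    // Capacity zero gives the rendezvous flavor discussed next: every
    // send blocks until a receiver is actively waiting. (Don't send on
    // this one from the same thread, or it will block forever.)
    let (_rtx, _rrx) = mpsc::sync_channel::<()>(0);
}
```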
The idea behind a rendezvous channel is that it doesn't really let you send things in the buffered sense; it's usually a channel that you use only to synchronize two sides. Very often you see rendezvous channels where T is just unit, the empty tuple, and the idea is that it's used not for, I don't want to say time synchronization, but for thread synchronization: if you have one thread that you want to kick so that it does something, you don't actually want to send anything to it, you just want it to go do stuff. That's where a rendezvous channel comes in. You create a channel that has capacity zero, and what capacity zero means is that you can only send if there's currently a blocking receiver, because you can't store anything in the channel itself. The only way to actually send is to hand the value over to a thread that's currently waiting, and that basically means one thread must be at the receive point of its execution and the other at its send point; now they've rendezvoused, they're both at a known location, and they can move forward from there. This is often achieved with something called a barrier, which you'll also find in std::sync, but you can do it with a channel; the channel version still ends up being a two-way synchronization, because the receiver also can't proceed until the sender arrives. (I'll get to questions once I've done the last flavor.)

One-shot channels are channels that you only send on once. Usually these can have any capacity, but in practice there's only one call to send. These are often things like: imagine an application with a channel that you use to tell all the threads they should exit early, say the user pressed Ctrl-C or clicked close, and you want them all to shut down. You might have a channel that you only ever send on once, and you don't send anything useful (although you could); when you send that signal, whatever thread is running somewhere drops what it's doing and shuts down. That channel is only ever used for one send, so it's a one-shot. These flavors are different enough that you can write different implementations that take advantage of their patterns, and we'll look at some of those in a second.

All right, questions about flavors. "I've heard the terms bounded and unbounded for synchronous and asynchronous, in case others are wondering." Yeah: synchronous channels are often called bounded channels, because that's what they are, and asynchronous channels are often called unbounded channels. If you look at the ones in tokio::sync, and I think also the ones in crossbeam, and in fact many other implementations, they're referred to as bounded and unbounded, in particular because of the potential confusion with async. (And yes, it's spelled rendezvous; sorry.) "So basically what you should use if you need a Condvar but don't have a Mutex with locked data?" A rendezvous channel is not a mutex, because it doesn't guarantee mutual exclusion. It is sort of like a Condvar, in that you can wake up another thread, but it doesn't give you any way to also guard data. "More like Unix pipes?" I mean, all channels are sort of like Unix pipes. "Can rendezvous channels actually send anything useful?" Yeah, you can totally send data over a rendezvous channel; it still has the T type. The idea is that the sender can only send if the receiver is present, and the receiver can only receive if the sender is present; if they're both present, then you can just hand the data over. It's just that the sender can't put data somewhere and keep going, because the capacity is zero; but if there's a handover, it can hand the data over.
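To make the one-shot shutdown pattern concrete, here's a small sketch that uses a regular std channel for a single shutdown signal; the worker loop and sleep interval are made up for illustration:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn main() {
    // One-shot use of a regular channel: a single shutdown signal.
    let (shutdown_tx, shutdown_rx) = mpsc::channel::<()>();

    let worker = thread::spawn(move || loop {
        match shutdown_rx.try_recv() {
            // Either the signal arrived or all senders went away.
            Ok(()) | Err(mpsc::TryRecvError::Disconnected) => break,
            Err(mpsc::TryRecvError::Empty) => {
                // ... do a unit of work ...
                thread::sleep(Duration::from_millis(10));
            }
        }
    });

    // The only send this channel will ever see.
    shutdown_tx.send(()).unwrap();
    worker.join().unwrap();
}
```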
"It seems Rust has a lot of specialized channel designs, while Go just has a simple channel implementation?" As far as I'm aware, Go has all of these implementations too. In Go there's only one channel type, but just like in Rust, at least for things like crossbeam, these flavors aren't different channels in the type system; they're different implementations that are chosen between at runtime. And from memory, Go does the same thing: initially you assume the channel is a one-shot channel, and the moment an additional send happens, you upgrade it to be a different type of channel, which means the first send will in a sense be more efficient than the later ones. Similarly, you know a channel is a rendezvous channel because its capacity is set to zero, so you can just choose that flavor, and you choose between synchronous and asynchronous based on what the capacity is set to. "Could a synchronous channel where T is unit be a rendezvous channel?" Yes; a rendezvous channel is any channel whose capacity is zero, specifically a synchronous channel, for any T, with the capacity set to zero. "Kind of like a baton pass?" Yep.

Okay, in the last few minutes, what I want to talk about is the different implementations, and I also want to try to touch on async, as in actual async/await and futures; we'll see whether we get to that. For a synchronous channel: what we implemented with a mutex was actually not a synchronous channel, we implemented an asynchronous channel, but you can build a synchronous channel with a Mutex plus Condvar as well, and behind the scenes you usually do very much the same thing: use a VecDeque, and just have the sender block if the VecDeque happens to be full. So the implementation is fairly similar.

If you want to avoid the mutex, there are a couple of different approaches. The simplest is to use what is basically an atomic VecDeque, an atomic queue. The way this usually works is that you have head and tail pointers, just like in the implementation of a VecDeque, but you update them atomically. This means you no longer need to take a mutex in order to send, which happens to help a lot; and as long as you update the head and tail atomically, there's essentially a known algorithm for implementing this data structure such that no thread ever tries to touch data that another thread is currently touching. For wakeups, you use the thread park and notify primitives in the standard library, or the ones from parking_lot if you prefer, to ensure that things are woken up appropriately. Basically, you need some signaling mechanism: if the sender is asleep because it's blocking, the receiver needs to wake it up when it receives something, because now there's capacity available; and similarly, if the receiver is blocking because the channel is empty, a sender that comes along and sends something needs to make sure to wake the receiver up. So you need some kind of notification mechanism; it's often park and notify, though it doesn't have to be.
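Going back to the Mutex plus Condvar version of a synchronous channel, a minimal sketch of the blocking send and recv might look like the following. The Shared type and its fields are assumptions for illustration; it ignores disconnection and requires a capacity of at least one:

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};

struct Shared<T> {
    queue: Mutex<VecDeque<T>>,
    not_full: Condvar,  // senders wait here while the queue is at capacity
    not_empty: Condvar, // the receiver waits here while the queue is empty
    capacity: usize,
}

impl<T> Shared<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "this sketch doesn't handle the rendezvous case");
        Shared {
            queue: Mutex::new(VecDeque::new()),
            not_full: Condvar::new(),
            not_empty: Condvar::new(),
            capacity,
        }
    }

    fn send(&self, t: T) {
        let mut queue = self.queue.lock().unwrap();
        // Block the sender while the channel is full.
        while queue.len() >= self.capacity {
            queue = self.not_full.wait(queue).unwrap();
        }
        queue.push_back(t);
        drop(queue);
        self.not_empty.notify_one();
    }

    fn recv(&self) -> T {
        let mut queue = self.queue.lock().unwrap();
        loop {
            if let Some(t) = queue.pop_front() {
                drop(queue);
                // A slot just freed up; wake one blocked sender.
                self.not_full.notify_one();
                return t;
            }
            queue = self.not_empty.wait(queue).unwrap();
        }
    }
}
```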
And that kind of atomic queue is very often the implementation you see: basically a fixed-size array where you atomically keep track of the head and tail. It's also the only implementation of this that I really know of. I think flume, one of the implementations I'll point out later, actually uses a mutex, but it does so in a slightly smarter way than we have; we have a sort of dumb use of mutexes, and there are smarter tricks you can play, some of which have been mentioned in chat already, like taking advantage of the fact that there's an Arc there, which I believe flume does. And I think crossbeam uses the atomic approach; they're not actually using a VecDeque, but they are using a head-and-tail-pointer implementation.

For asynchronous channels, what we did was a Mutex, a Condvar, and a VecDeque. In practice, VecDeques have some sad properties, like resizing, so very often what you want to do here, and this is one of the few places you actually want to do this, is use a linked list. That means you never resize: when a sender comes along, it just pushes to the front of the linked list, and what the receiver does is steal the whole linked list. It sets the head to null (or None), takes everything, and then walks the list backwards. That's an implementation that doesn't require resizing, doesn't have the memory problems the VecDeque does, and plays the same trick we did earlier, where a receiver that takes the mutex steals all the items rather than just one. Usually you want this to be a doubly linked list so you can efficiently get to the end, or you just keep track of the tail.

For non-mutex implementations, what you usually see here is an atomic linked list, often just referred to as an atomic queue. It's not a ring buffer like a VecDeque; it's an actual queue, usually implemented as a linked list, though it doesn't have to be. What crossbeam does is actually kind of interesting: it's an atomic, I don't know what to call it, block linked list. The idea is that rather than have every push be an atomic operation that appends a new item to the list, you mix the atomic head-and-tail approach with an atomic linked list: instead of a linked list of T, it's a linked list of blocks, each of which is like an atomic array of T. So only occasionally do you need to actually append to the linked list, which is the problematic operation: imagine two senders that both want to send at the same time; with a plain linked list, one of them will succeed in updating the next pointer of the tail and the other will fail and have to retry. If you have a list of these blocks of Ts, then only occasionally does a thread actually need to update a next pointer; usually it just needs to increment the tail index, which it turns out you can do concurrently using fetch_add. So this block atomic linked list turns out to be a lot more efficient in practice. And here too, you need some signaling mechanism for waking threads up.
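Here's a minimal sketch of the simple (non-block) variant of that idea: senders push onto the front of an atomic singly linked list with a compare-and-swap loop, and the receiver steals the whole list in one swap and then reverses it. This is an illustration, not crossbeam's actual code; a real version would also need a Drop impl to free unconsumed nodes, proper Send/Sync bounds, and the signaling mechanism for blocking:

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Node<T> {
    value: T,
    next: *mut Node<T>,
}

struct Queue<T> {
    head: AtomicPtr<Node<T>>,
}

impl<T> Queue<T> {
    fn new() -> Self {
        Queue { head: AtomicPtr::new(ptr::null_mut()) }
    }

    // Sender side: push a node onto the front with a CAS loop. Under
    // contention, the losing sender just retries with the new head.
    fn push(&self, value: T) {
        let node = Box::into_raw(Box::new(Node { value, next: ptr::null_mut() }));
        loop {
            let head = self.head.load(Ordering::Acquire);
            unsafe { (*node).next = head };
            if self
                .head
                .compare_exchange(head, node, Ordering::Release, Ordering::Relaxed)
                .is_ok()
            {
                break;
            }
        }
    }

    // Receiver side: steal the entire list with a single swap, then
    // reverse it so items come out in the order they were sent.
    fn steal_all(&self) -> Vec<T> {
        let mut head = self.head.swap(ptr::null_mut(), Ordering::Acquire);
        let mut items = Vec::new();
        while !head.is_null() {
            let boxed = unsafe { Box::from_raw(head) };
            let node = *boxed;
            head = node.next;
            items.push(node.value);
        }
        items.reverse(); // pushed LIFO, delivered FIFO
        items
    }
}
```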
For rendezvous channels, it turns out you don't need the linked list at all. All you really need is the wakeup primitive, plus a single place in memory to store the one item for the handoff. I haven't looked too carefully at the flavor implementations of this; I know crossbeam has one, the standard library has one, and I don't think flume has this optimization yet. But the trick to play here is that you can get rid of the whole linked-list part and have a much simpler implementation that just synchronizes the two threads. Almost all you need is a mutex and a Condvar, though in practice these implementations are a little bit smarter than that.

And a one-shot channel: if you know you have a one-shot channel, it's the same kind of thing. You don't need a linked list of any kind; you only ever need to store the one T. Because you know there's only one item, you can basically have a single atomic place in memory that is either None or Some, and you can just atomically swap the element in there. Think of it as a single slot: the sender fills the slot, and the receiver consumes the slot, marking it as empty (or completed) at the same time. So you can write more specialized implementations that are faster for these use cases.
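A minimal sketch of that single slot, assuming a three-state machine (empty, full, taken); the names are made up, send must only ever be called once, and a real version would need a Drop impl so a sent-but-never-received value isn't leaked, plus the park/notify signaling for a blocking recv:

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicU8, Ordering};

const EMPTY: u8 = 0;
const FULL: u8 = 1;
const TAKEN: u8 = 2;

struct OneShot<T> {
    state: AtomicU8,
    slot: UnsafeCell<MaybeUninit<T>>,
}

// The UnsafeCell is only written before the FULL store and only read
// after winning the FULL -> TAKEN transition, so sharing is sound.
unsafe impl<T: Send> Sync for OneShot<T> {}

impl<T> OneShot<T> {
    fn new() -> Self {
        OneShot {
            state: AtomicU8::new(EMPTY),
            slot: UnsafeCell::new(MaybeUninit::uninit()),
        }
    }

    // Sender fills the slot, then publishes it with a Release store.
    // Contract: called at most once (it's a one-shot).
    fn send(&self, value: T) {
        unsafe { (*self.slot.get()).write(value) };
        self.state.store(FULL, Ordering::Release);
    }

    // Receiver tries to claim FULL -> TAKEN; only one claimant can win,
    // and the Acquire pairs with the Release store in send.
    fn try_recv(&self) -> Option<T> {
        if self
            .state
            .compare_exchange(FULL, TAKEN, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
        {
            Some(unsafe { (*self.slot.get()).as_ptr().read() })
        } else {
            None
        }
    }
}
```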
Okay, let's discuss questions briefly; we're getting towards the end here. "How do async/await channels differ?" Different implementation; we'll look at that in a second. "The YouTube stream is gone." Yeah, YouTube is not always great at streams. "Isn't the linked list guaranteed to always end up doing an allocation and deallocation on each push and pop?" Yes: with a linked list, you will be allocating and deallocating on every push and pop. The push allocates a node to stick on the end of the linked list, and a pop has to deallocate that node; this is another advantage of the block linked list variant. Of course, the memory allocation system is often not your bottleneck, especially if you're using something like jemalloc, where you basically have thread-local allocation, so this usually turns out to be fine; but it is true that it's a downside, and you're really trading memory overhead against memory allocator performance. "Wouldn't a smart list just keep spare nodes around?" One option is to keep a pool of these nodes and reuse them. The problem is that now you need to atomically manage the pool, which itself needs a bunch of synchronization primitives to do correctly. In theory you can, but it's not clear that you can write a better reusing pool than the memory allocator can allocate and deallocate memory. Maybe; you might want to use something like an arena allocator, and that might work well.

All right, the last thing I want to touch on, because we're sort of out of time, but I'll do it anyway: there were some questions about async/await. What do you do with async/await? It's pretty hard to write a channel implementation that works both for the async/await futures world and for the blocking threads world, because the primitives are a little bit different. If you do a send and the channel is full, then in the async/await world you don't want to block: you want to yield, to the parent future, to the executor, ultimately to the task, and at some point in the future you'll be woken up to be polled again. That sounds a little bit like waiting on a Condvar, but in practice it's not quite the same, because you actually need to return rather than sit in the current function; and the same goes for receive, of course. The notification primitives are also a little different, although they have the same flavor: you notify a Waker, and that causes the other task to be polled again. So they're similar, but not quite the same. Where it gets hard is writing an implementation that internally knows whether it's being used in an async (futures) context or a blocking context, without exposing that to the user; very often you end up with some additional type parameter for the waker or the signaling mechanism, which gets really ugly. There are ways to do this: I believe both flume and crossbeam have blocking and asynchronous versions, and if you look at the code you might be able to see how they do it. Basically it requires a bit more bookkeeping, you have to be a little more finicky with the types, and what you usually end up with is a channel that looks much the same but not quite the same; at runtime it diverges into different ways of managing the underlying data structure, depending on whether you're in the blocking world or not. Often in the blocking world you can do some additional optimizations you can't always do in the async world, and vice versa; they're just a little different. But in practice, the data structure that's used, whether a VecDeque or an atomic linked list or anything like that, is fairly similar, and the flavors that exist are fairly similar.
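A minimal sketch of the async receive half, assuming a shared queue behind a Mutex; the Shared and RecvFuture names are made up. The key difference from the blocking version is that instead of waiting on a Condvar, poll stashes the task's Waker and returns Pending, and send wakes it:

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Waker};

struct Shared<T> {
    queue: VecDeque<T>,
    recv_waker: Option<Waker>, // the receiver's parked task, if any
}

struct RecvFuture<T> {
    shared: Arc<Mutex<Shared<T>>>,
}

impl<T> Future for RecvFuture<T> {
    type Output = T;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut shared = self.shared.lock().unwrap();
        if let Some(t) = shared.queue.pop_front() {
            Poll::Ready(t)
        } else {
            // Instead of blocking on a Condvar, stash this task's Waker and
            // return; storing it under the same lock as the queue avoids
            // lost wakeups.
            shared.recv_waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

fn send<T>(shared: &Arc<Mutex<Shared<T>>>, t: T) {
    let mut s = shared.lock().unwrap();
    s.queue.push_back(t);
    if let Some(w) = s.recv_waker.take() {
        w.wake(); // the async analogue of Condvar::notify_one
    }
}
```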
"You could probably beat the allocator, because we always need allocations of the same size?" Allocators are really good at taking advantage of repeated patterns, so it's not clear to me, actually; it's really hard to write good, performant, basically garbage collection, and the memory allocator has had a lot of practice at it. You could use a bump allocator or something like that. "Did you use any channels in your thesis?" Yes, many. I would never roll my own channel; it's a bad idea unless you specifically want to work on concurrency stuff, which is fun, but I use the Tokio channels because I needed one with async support, and I know the Tokio one pretty well. I don't have a particularly strong feeling about it; Noria is not really channel-bound, so the decision wasn't that important. I couldn't use the standard library one because it didn't support async/await, and I think crossbeam, at the time I started adding them, didn't either, though I might be wrong about that.

Great. Okay, so to give you some pointers of where to look next if you're curious about this: if you want to see how a real implementation works, I recommend you have a look at the standard library implementation. Go look at the mpsc module; you probably don't want to read it through the docs.rs interface, but it has really good documentation of what's going on under the hood, what the implementation is, and some of the optimizations they do, like internal atomic counters. Similarly, if you go to crossbeam (this page will be bright, for those of you reading along at night), there's a crossbeam-channel subdirectory that holds the crossbeam channel implementation, and if you look at it you'll see a flavors directory that holds all the different flavor implementations: array is the one for synchronous channels that does this head/tail business, list is the atomic block linked list, and some of the others are for things like rendezvous channels and one-shot channels. There's also select, which we didn't really get into: selection is the ability to, for example, receive from one channel or another, whichever sends first, and supporting it requires some additional mechanisms in the implementation. And then there's flume, a different implementation of channels that popped up fairly recently. It has a very different implementation from what crossbeam does; there's no unsafe code, and part of the idea is that it uses mutexes under the hood, but in a slightly more clever way. The experience I've had is that crossbeam is better for very high contention cases, because it doesn't use mutexes, whereas flume is often faster in cases where contention is lower, because then the mutexes end up not adding quite as much overhead.

All right, I think that's all I wanted to cover. Let's see whether there are questions before we end for today, sort of at the tail end of the channel here. "Any thoughts on benchmarking channel implementations?" Benchmarking channel implementations is hard. I know there's been some work on this; I forget where, but I think BurntSushi did a bunch of benchmarking of channels, looking at Go channels, the standard library channels, flume, crossbeam-channel, and his chan crate, which I think is deprecated now; that's worth looking into. In general, when you benchmark channels, you want to benchmark all of the different flavors, because they do represent real use cases. You want to benchmark cases where the things you send are large and where they're small, where there are many senders and where there are few. For example, if you have a single-producer single-consumer case, you might be able to optimize for that, basically write a flavor for it, and that might perform much better than the general multi-producer single-consumer version. You want your benchmark to test cases where the channel is usually full (for a bounded channel) and cases where it's usually empty, which basically means adjusting the relative rates of send and receive calls. The number of senders is important too: how do you scale with the number of senders? So you basically want a grid of all the possible configurations, and then you try to benchmark each one separately.
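As a crude, std-only illustration of one axis of that grid (scaling the number of senders, with small messages), you could do something like the following; the counts are arbitrary, and a real benchmark would use a proper harness and also vary message size, channel capacity, and how full the channel runs:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Instant;

fn bench(senders: usize, msgs_per_sender: usize) {
    let (tx, rx) = mpsc::channel::<u64>();
    let start = Instant::now();
    let handles: Vec<_> = (0..senders)
        .map(|_| {
            let tx = tx.clone();
            thread::spawn(move || {
                for i in 0..msgs_per_sender {
                    tx.send(i as u64).unwrap();
                }
            })
        })
        .collect();
    drop(tx); // so recv() errors out once every sender is done

    let mut n = 0u64;
    while rx.recv().is_ok() {
        n += 1;
    }
    for h in handles {
        h.join().unwrap();
    }

    let elapsed = start.elapsed();
    println!(
        "{:>2} senders: {} msgs in {:?} ({:.0} msgs/s)",
        senders,
        n,
        elapsed,
        n as f64 / elapsed.as_secs_f64()
    );
}

fn main() {
    for &s in &[1, 2, 4, 8] {
        bench(s, 100_000);
    }
}
```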
"Rendezvous channels are like the default Go channels, with zero capacity?" Yeah, that sounds about right. "A bump allocator would be really good, since you would likely allocate memory atomically." Quite possibly, and also because you don't need to drop anything in that case: the memory has already been handed off, so the drop implementation is a no-op. You might be able to use something like bumpalo, which is pretty cool. "How do they support async without tying it to a specific executor, like Tokio?" The primary reason for the current lack of harmony in the async/await ecosystem is the I/O traits, like AsyncRead and AsyncWrite, plus the spawn functionality, being able to run a future in the background. For implementing a channel, you don't need either of those. All you need is the primitive provided by the standard library, which is the Waker type: the ability to yield or go to sleep, and the ability to wake something up or notify something. Those come from the standard library's task module, so you can use them independently of what the executor is, and that's why a channel is sort of trivially cross-executor. "If you have a sleeping sender thread and you'd like to wake it when the receiver is dropped, so it can free up resources, is there a standard way to do that? Just have it wake up every few seconds to check?" No, you do it the same way we did in our implementation here: you implement Drop for the receiver, and in it you do a notify_all to wake up all sleeping senders, which can then do whatever freeing up of resources they need to do.

All right, I think we got it. As always, I will put the recording up on YouTube; it might even include the intro I demoed for the people who showed up to the stream early. It should hopefully be up in a couple of hours, and I'll tweet about it as I always do. Apart from that, thanks for watching. Hopefully you learned something, and hopefully there will be more of these; as always, just follow me and you'll learn about upcoming ones. I also announce them on Discord now; there's a channel with automatic updates whenever I go live, upload a video, or have upcoming streams, and it also includes other Rust streamers, a bunch of whom are on there now, including Steve Klabnik, which is pretty funny. So join the Discord (I'll put the link somewhere; maybe someone already put it in chat) and I'll see you there, on Twitter, or in the next video. Thanks for joining!
Info
Channel: Jon Gjengset
Views: 29,744
Rating: 5 out of 5
Keywords: rust, live-coding, channels, mpsc, synchronization, concurrency
Id: b4mS5UPHh20
Length: 103min 11sec (6191 seconds)
Published: Wed Aug 05 2020