GopherCon 2021: Arash Bina - Using CRDTs to Build a Highly Available Decentralized Service with Even

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Captions

hello and welcome to my talk my name is arash bina i am a principal engineer at bread i've been spending a lot of my time over the past few years building a decentralized or distributed applications and distributed platforms um and most of that has been in go um recently at brett we used a sort of a new approach or approach that i wasn't familiar with a lot using crdts to build a service and i thought it was very interesting and i wanted to tell you a little bit about it um uh at brett to give you a little bit of context we have a multi-talent platform and we provide the capability to our clients to create highly configurable payment products and offer them to their own customers to their buyers and then during the process of the checkout we allow them to do credit decisioning fraud checks and various different types of sort of workflows and when the checkout happens we allow the clients to uh to sort of service those payment products uh using the platform the talk is mainly divided into four sections uh i'll talk a little bit about consistency what that means in distributed systems but i won't dive too deep into it i just want to mention where crdt sort of fit in in that spectrum uh then i'm going to introduce uh crdts what they are and how we uh develop them uh in the third section what different categories of them are um and how you know what are the advantages and disadvantages of each of the categories and at the end i'll talk a little bit about um you know our experience overall how how was it building the system uh this way so as i mentioned i won't um get too deep into uh consistency here but you know the the type of consistency that we're all very much used to is a sort of a strict type of consistency where we either provide that with some sort of a data struct database for example different replicas of of a service they all talk to the same database and that's how they coordinate sort of on and agree on a sort of a strict a sequential way of events and it's important because at that point all the replicas can sort of agree on the exact time that events happened and so if you ask any of the replicas what the state of the system is they can all sort of agree and they return the exact same response hopefully as a in contrast to that we have a causal or eventual consistency where the you know we allow the state of the different replicas to be mutated and so when the when you make a call to a replica of mutated state that replica of the state will be different than the state of other replicas in in that sense we allow the system sort of to get out of sync and then we have instead of coordinating that in a sort of a you know either with a database or in in some sort of a leader sort of follower type of architecture instead of that we sort of allow these sync events to happen between the different replicas and what that does is that it it allows the replicas to sort of converge just in their states to converge you know in a weaker sort of uh eventual consistency consistency system and uh there's only a probability that that convergence will happen and it's not a it's not necessarily a guarantee if there are two events and one of the replicas you know interpret that event as the first event happened before the second event uh and another replica interprets that as the second one happens before the first one it's possible that uh the state of those replicas sort of diverge and so there are different techniques that you can use for example to you know fully sync between the replicas um you know compare them and then choose one of them and tell all the replicas that this is now the state of the system and so there are different techniques that you can use and in contrast to the sort of a weaker weaker eventual consistency there's the eventual strong consistency where through a specific type of you know merging function which is a lot more deterministic so if you have two events you know the result of the merge of those two events is always the same thing and through something like that you can sort of guarantee a stronger eventual consistency where you know that all the replicas will eventually get to the same state and this is really versatility's fit and that's the type of guarantee that they provide so what are the crdt's stand for conflict-free replicated data types um they're essentially it can be anything but for most common examples are a structure that provides um methods for sort of querying and mutating um you know the state of of a replica and the second part of it is a what we call a monotonically increasing function uh it's a merge function and what that does is that it has that deterministic sort of behavior where um you know you have a crdt you have a replica with a state and another uh replica um census state to this uh replica um the the sort of the result of those marriages will always be the same thing and a very simple example of this is for example a max function imagine if you have multiple replicas each of them are holding a counter if when we are sinking between them if you always pick the maximum of the values um you know after a certain point in time all the replicas will end up with the that maximum value among all of them and so that's how they sort of they converge and that's the that monotonically increasing function um the other thing that's important about crts is that they every crd would have a full perspective a full view of all the whole system um now their perspective different replicas perspectives could be different at any point in time but sort of the behavior of the system overall is that all those estates converge the same thing eventually over a period period of time um so strategies are essentially a way to build a system and replicate uh you know data uh without centrally uh you know coordinate um between between these replicas and so instead of that we do periodic sort of synchronization so there are two major categories of crdts the state-based crts and operation-based crdts in state-based crts which are relatively simpler than operation based what we have is that each replica essentially holds the full value or full state of the system of all replicas so for example if you have two replicas the first replica knows about the state of the system at replica 2 and vice versa or at least has an idea about what the state of the system is at other replicas now what it thinks it is between different replicas could be different at any point in time but the hope is that eventually they all come to the same conclusion to the same state the other thing that's important about the state-based crt is that is that during synchronization the full state of all the replicas that a replica knows about is transferred or communicated with other replicas so if i am you know replica one and there are three replicas in the system for example i will communicate my not only my state but what i know about the other two replicas state to everyone else and then the merge function takes care of that sort of the merging between them as opposed to that there's the operation-based crdts uh which are essentially uh instead of communicating the full state of the system and storing the full state of the system i would each replica only holds their its own state and instead of communicating the full state it only communicates what that mutation was so for example if i have a counter and we increment the counter by a value let's say by one if that happens in in a replica instead of sending the the actual value of the counter it sends the operation that happens so for example i incremented the counter by one and the other replicas are supposed to you know receive that and do the same thing down replicate that operation one of the important aspects of operation-based securities is that that message needs to be sent be received essentially the message that is sent by one of the replicas needs to be received by all replicas they don't have to be received at this exact same time but uh we want that message to be received by everyone so that everyone can do the same operation so in order for everything to be a little bit become a little bit more familiar i was thinking of starting from a sort of a very simple crdt concept and then we're going to build on top of it and make it a little bit more complicated so the simplest thing that i could think of as a crdt is a boolean flag and let's say we want a flag and the value of the flag we want it to be true if the flag has ever been uh set to true so if if a flag has been set at true once then we want that uh true value to be carried over and not be flipped back to false by a false event so the way we can define the merge function for this crdt is that you know as the value if the value of my flag is um true then i'll gonna keep that value and if it's false then i'm going to you know accept the incoming value uh and if it's true then i'm gonna flip the value of the flight to true so let's look at it from the perspective of you know different replicas let's assume that we have a system that has three replicas each of those replicas holds a flag value in this case the flag starts from zero point is false and then we we can see how that that value is mutated over the system now in this case i want to mention that although these three different replicas are have a flag in them and have they have a value these are effectively the same flag uh it's just been replicated three times these are not three separate flags and so when i asked one of the replicas what the value of the flag is my hope is that all of them can give me the same answer because they're effectively the same flag now that may or may not be true at different points in time but we will see that eventually we want all of them to become the same so let's say a client makes a call to one of the replicas randomly here at replica 2 and sets the value of the fly to true so in this case if at this moment in time if a client asks replica 2 what the value of the flag is it returns true but if it asks the same question from one of the other replicas it will return false so obviously there is an inconsistency in the system at this point now let's allow the system to synchronize so you know each replica takes one or two of the other replicas and sends a message and it broadcasts its sort of state it communicates it's a state and in this case replica 2 is sending a message to 1 and 3 is sending it to 2n1 and so you can see because replica 2 communicated it states replica 1 now replica 1 also has the value of the flight to true but no one really communicated to replica 3 here and so the value of the flag stays false if they allow another sync event to happen maybe one of the replicas of one or two they communicate back to replica 3 and so now the whole system has the value of true so at this point in time if you ask any of the replicas they all return true now i want to emphasize that this happens exactly because of that monotonically increasing merge function right because we are keeping the true values um and we are not flipping the flag back to false if we receive a false event and because of that eventually the system becomes consistent if you learn to not have that rule and flip the value back to false um you know the the system does not converge uh and so it it's possible that um they may all end up with let's say the value of false but that's not the actual um that won't be the true value of the flag which in this case uh should be true let's look at a sort of simple code example here and like i said we're going to build on top of this progressively uh but i have a flag here and it holds a boolean value and so you know it's a zero value essentially when you create a flag is going to be false now i have a merge function so if if another flag sends its you know state to the receiving flag the receiving flag looks at its value and if it's false then it accepts the incoming value and if it's true then you know it it sure sort of returns without making any changes and so if you look at the main function i create a sort of a zero value or a flag i update it to true with a true with an incoming true sort of state and then i will attempt to update it with an incoming false state if we run it as as we expect you know this is a very simple example it was initially false then it became true and then it stayed true okay so let's look at as something that's a little bit more useful in this case let's assume that we want to implement a counter and the value of that counter is um you know the the we want the counter so let's say count the total uh number of calls that are made to a service um now in this case because the calls will be made to different replicas um so the the total value the true value of the counter would be the sum of the values of all the replicas however because we want those calls to be made to different replicas and because we want to be able to ask any of the replicas for the value and be able to get an answer then all the replicas as i mentioned before will hold the values of the other flags so rather than having just one value we will have a slice that keeps the value of other replicas and when you're measuring them the way uh to do it is to use a max function as i mentioned um which is an increasing function uh to sort of update our own state and always keep the sort of the max value for all the replicas and so if someone asks for the the actual value of the flag we can add all the values together and return the answer so in this case let's say we have the three the same three replicas each replica has a slice and the green sort of shows the the element that a replica increments if it if a call is made to it and asks to increment its counter so in this case let's say replica 2 receives two calls to increment the value of the counter by one and replica 3 receives three calls so replica 2's value is now 2 and replica 3's value is 3. so at this point in the system obviously if you ask different replicas for the total value of the flag a replica of one would return zero replica two would trend two and replica three would return three but the true value that we know because we made five calls should be five so let's say a sync state happens and replica 3 communicates to replica 2 first and sends its state so now replica 2 knows about the three calls that were made to replica 3 and then replica 3 replica 2 makes a call to replica 1 and updates the state so now replica one has the correct state of five replica two has the correct state but no one really communicated to replica three in this case and so the value for from the perspective of replica 3 the total number of calls is still three and now let's allow another thing to happen in this case replica one for example communicates to replica three but it could be duplicate two and so at this after the second sync all three replicas agreed that the value of the total value is five this sync in in this case a specific example happened in two steps but depending on how the replicas communicate obviously it could happen with just one step or it could happen in more steps so we don't really know you know there's randomness here depending on our communication layer and so although the system will eventually become consistent it may there's no guarantee that it would become become consistent in in one step or multiple steps a simpler example of this let's assume that we are implementing this in just one replica but three instances sort of with three elements in in our counter so if the counter has a slice i can initialize the slice or create a new counter and initialize the slice and then i can increment the counter but when i'm incrementing i send which replicas value i want to increment and then i increment that and when i'm calculating the value i add all the elements together and create that and so if i'm on the main function you know i create a counter i increment r1 and then r2 and r3 and in this case because i'm using one slice uh you know i would have the full value uh the full value here and so if if i increment r1 r2 r3 and then at r1 the total counter value would be four and so if i run this you know you you see that it it gives me four now we can implement what we actually discuss about three replicas and create actually three separate replicas and each of them holding a slice and in this case i need the counter to know which replica it belongs to so i add an id to to the counter and when i'm creating the counter i i would send it which replica it is for and when i'm incrementing because the counter knows the replica that it belongs to it can use the id so i don't need to send the id in the increment function anymore the value function stays the same and i implemented the sync function so essentially now because we have three separate instances three separate replicas we need to be able to sync between them otherwise the values will always stay you know different so i implemented a sync function and so a counter can send its state to a receiving counter and the way these two gets mer get merged is through the max function so if the receiving replicas value is lower than the incoming value you know it it picks the higher value and in this case i mutate the receiving counter but obviously the sun counter stays the same so now if you look at the state of the system so i increment three counters uh i increment one and so from perspective of counter one um the total value is one but at this point the uh the sort of the total value of the counter from counter two and three obviously is still zero um so i increment counter two now counter two has a one for its element and then i send the sync message from counter 2 to counter 1 and in this case counter 1 gets updated and now it has the correct value of 2. i increment counter three and three sends a message to one so counter one now has the correct value for the whole system now i increment counter one it becomes four and counter one then communicates with counter two and then counter two communicates with calorie three and then i print the state of the system the code um so we can see that you know uh as uh counter one got updated and then a sync happened between counter one and two now different from different replicas perspectives the counters values you know are different and then it goes through um you know incrementing the counter at uh replica three and a sync happens and then increment in the counter the camera at replica one and the sync happens and so eventually at like at this point in time at the step three uh replica one and two think that the total value is four but replicas three things that it's one and then after another sink all of them agree on the on the same counter value so um so that was a g camera which is a growing encounter um but what if we wanted to implement um you know account a counter that we could decrement this value um yeah if we just have one slice if we implemented the way we implement the g counter um and decrement but decrement the values you know because we are always taking the maximum um the the decrement events will be lost because we are always picking the highest number so the way to implement implement this a pm counter which is a positive negative counter and we can decrement it is to have two separate slices so we use one slice for the maximum increment values and we use another slice for the maximum decrement values let's look at the code for a pn counter so we have three replicas you know same ids but now here i have added a negative counter in addition to the positive counter or as negative slice in addition to the positive slice sort of when i create one i i initialize both slices and when i'm doing the increment value i am basically incrementing the positive counter positive elements in the positive slice and when i'm decrementing the counter i'm still incrementing but the values in the negative slice and so when i want to calculate the total value of the counter i add all the positive values together and subtract from that the total value of the negatives and when i want to sort of sync or the merge function looks at the positive elements separately finds the maximum for those and then it it looks at the negative values or the negative counter and it it still picks the maximums and that way it merges the two states together so to make it a little bit more automated and interesting i implemented a sort of a gossip function here and the way it the way it behaves is that each replica picks one of the other replicas as you know the replica that it wants to send a message to and in this case sends a message to the other replica and sends his state and then this states gets merged so there's a sort of a simple pick function that a replica can that the gossip uses to pick another replica randomly and i implemented a utility initialize function so i can initialize everything in there and then the main function basically seeds the randomness for you know for the functionality to pick a random replica initializes all the nodes and then really it's the it's the part where you're uh incrementing the counter so in here uh replica one i'm incrementing the value of the replica one and so you can see the positive um counter for replica one becomes incremented to one and then i do the same for replica two in this case two gets uh incremented and replica three three gets incremented and the negative values are obviously zero because i haven't decremented yet and now i decide to increase decrement the value for replica one and in this case instead of decrementing the positive counter for the replica i increment the negative counter right so in this case you can see that from the perspective of replica one the value is zero but from the perspective of other replicas the values are one and actually none of the replicas are correct at this instance because i've incremented three times and decremented one so the actual value should be two to make the gossip function um more automated i basically i create a ticker and that ticker takes every uh one second and then at every one second it does a the gossip functionality and then it prints the state and after five seconds i want the application to be done after the initial increments and the decrement this is the state of the replicas and then after after the gossips happen every one second the the values converge and so all of them agree on the same value now it's possible if you run this multiple times because they pick random values now in this case you can see for example after the first gossip that these two values are different so after the first gossip for example the counter two also knows the correct value but counter one is not updated yet so if you run it multiple times you you might get different results so i i modified that same implementation for a pn counter but to make it a bit more interesting and see how the system functions in more of a live sort of system i what i did was i created two other tickers so there's an increment ticker that takes every 600 milliseconds and what it does is that it picks one of the one of the nodes randomly and it increments its value and then there's a decrement uh ticker which picks one of the nodes randomly again and decrements this value but that one runs every one uh one second and then the gossip ticker takes every 200 milliseconds so it's possible that it may tick before an increment or after or before a decrement or after or multiple times between those because it's a shorter time uh so you can see that the sort of the go function here is a little bit more involved but fundamentally when one of the tickers ticks it prints that in increment of that value and then when the gossip happens the communication happens so let's look at what happens when it runs so you can see in this case that initially the system you know starts at the zero zero zero and then an increment event happens in this case at replica two and the value of the replica two is increased to one but as you can see replica three also shows one and that's probably because after the increment happens immediately after before the state was printed the communication between replica 2 probably sent a message to replica 3 and updated this state and in the next step the replica 1 also has the correct value and then we decrement replica 3 which becomes 0 and an increment um replica 1 and so you can you can see that it's difficult to predict exactly unless we print all the exact timings of the gossips and increments and decrements it's difficult to predict what result we should expect the important thing is that you know the the values don't diverge the values keep converging to the same values sort of for in all the replicas and at the end before i uh kill the application you know replica 2 and 3 i agree but uh 2 doesn't agree probably because the application was killed before the last sync or you know before the full convergence happened okay so that was the last state-based i want to talk about and so i just want to conclude the part about the state-based crdts and take a look overall at the advantages and disadvantages of them on the advantage side um i think you know the state-based crdts are a lot simpler to implement as you will see um in comparison to the operation-based crts but on the disadvantaged side obviously every replica needs to hold the values at full state of all the other replicas so on the uh for the space complexity um you know you need more space than if you're only holding uh the value of you know your your own replica essentially on the communication side as well uh each replica communicates the full state uh to all the other replicas and so because of that um the bandwidth for the communication is a higher bandwidth you need you need more data is transferred between the replicas okay let's look at an operation based crdt so for operation base i wanted an example of you know let's implement a set and the the way to set functions is that you know you add an object to the set it stays in the set until you remove it from the set and so similar to the pn counter we can't simply remove something from the ad the when we add it to the set uh because that function will not be increasing so similar to the pn counter i will have two separate uh sort of sets one is an ad set and one is a remove set and when i want to add something i add it to the ad set and when i want to remove something i add it to the remove set and the the net value of this set is sorted out it looks at the items that are added objects that are added and if they don't exist in the remove set then it concludes that those objects exist i wanted to make it slightly more interesting uh what if we wanted to be able to remove objects and re-add objects that we have already removed so if you add an object and then remove it that object will exist both in the add and the remove sets and so there's no way really to re-add something that i've already removed so in this case what i did was i i created a sort of an arbitrary rule so when something is added i will remove it from the remove set but i'll also time stamp and so if there is a case where uh two concurrent uh sort of events have happened one is an add and one is a remove i allowed at addition to win over the removal and now that's a completely arbitrary rule because i wanted certainty to function specifically this way but you could for example build another one that the remove wins over the the addition and so that clock the increasing sort of clock that is what provides that sort of monotonically increasing merge function so if you look at the implementation for that i reduced the number of replicas to 2 just because this is a longer example and wanted to be a little simpler but so we have r1 and r2 for the ids and now here i've implemented a clock and the timestamp is essentially the version value for all the replicas which is a slice of integers and the clock holds an id that id is the id of the replica that the clock belongs to and it has a timestamp now each clock can increment sort of give the next value for for the clock next timestamp and then i have comparison functions less than and more than and so when a clock is sent a timestamp it can look at the timestamp and make a comparison and say if the timestamp is you know ahead or you know before the timestamp of the clock it is possible that it's neither of those so and if it's not before and not after then i conclude that those events from the perspective of the clocks happen concurrently and that's where i will allow the the addition to win the timestamp returns a sort of a timestamp for for a clock and i have a new function to to create a clock and then i have an update function where when when a clock is send a time stamp not only can determine if it's before and after but it also needs to sync its own clock with that some time now because we want always to have keep the maximum version of of the you know individual clock values so in this case it syncs its own internal time stamp with that so the set is actually this the crt itself so it has an add map and it removes that remove map and i have a mutex here so that i can lock it when i'm sort of mutating the values of the maps the new set just initializes the maps and then the value as i described it looks at the ad set and it looks at the remove set and if something is added and not removed then it concludes that it exists in this set otherwise it doesn't so i introduced this uh i introduced a node struct here so that it makes the functions a little bit easier and a node essentially is a is a struct that includes a clock and a set so the node can add something to its set and to do that i just added to the ad set and delete it from the remove set as i mentioned earlier and update the clock of of the node and the remove just adds it to the remove set so initialize nodes just uh creates clocks and sets and adds them to the nodes and here's the operation part of the operation basically where i define an action and the action is just a you know an integer it's an either an add or a remove and the operation struct has a node id that's the id of the node that's coming from it has an action that happened and it has a timestamp and the object itself what was added or removed i have a send function send method that's implemented on the node and so one node can send an operation to another node here all the function does is that it initializes sort of it creates an operation and then it sends it to it to the node to be processed and so the process is really the the core of the logic and so initially i want and no matter what the results are of the merges i want the clock to be synced to the incoming time stamp then i look to see if a clock you know at the time stamp has happened after or the clock is before um the timestamp of the event that was sent and if that's the case i just update the internal sort of state based on the incoming operation if it's happened after it means that or the clock is clocked timestamp is after the incoming timestamp then that means that i don't need to do anything and all happens is that in the in the sort of differ function i just update the timestamp or i sync the clock and if the code sort of reaches here it means that it was neither less or more so they must be concurrent and so in that case if it's an ad i added and remove it from the remove set and if it's a remove i just basically add it to the remove set so if you look at the main function here i try to manually control sort of the additions and the sync so you can see what is happening in the system so initially i add a just a simple object to the first set and so for replica one i have added a and its remove set is empty i get the timestamp and i send it to replica 2 and so now replica 2 also has that object here i remove it from replica 1 and then i sync it to replica 2 and so it gets added to replica 2 as well then i re-add it back to replica 1 and what that does is that it removes it from the remove set and then at the same time before i allow the sinks to happen i remove it from also replica 2. so it was already removed from replica 2 but that what that does is that it creates two events one is at replica one that's an addition and one is that the replica two that's a removal and they happen at the same time because they uh they sort of the clocks haven't been synced yet so then i sync both of them and because of that specific rule that i want the additions to win over the removals um both of them should end up with the same object and if you run it that's exactly what we expect so the add happens the sync happens the remove happens the sync happens and then the add happens and the remove happens but it hasn't synced yet and then i let it sync and now both of the replicas end up with the same value okay then the last series that i want to sort of mention here briefly and i don't have an implementation for this but we actually use this for some of the things that we implemented in our service is a contextual series in a sense that with this you can really implement any arbitrary object and all you have to do is define an increasingly sort of that monotonically increasing function and the way we define it on internally was that we look at the state of these objects and we determine we define as sort of an increasing sequence for those so imagine you have a ticket and you have a ticketing system and you have a ticket and the ticket can be only like opened and modified and closed um now if you have two simultaneous or concurrent states one is a modify and one is a close you can say that close should always win over the modify right and then that's how deterministically you can merge those together now that gets a bit more complicated if for example you have you allow multiple modifications to happen at the same time and so in those cases you have to determine um you know which modification should be applied last or which modifications should be applied if they're happened if they're modified the tickets have been modified one of them for example represents a name change and one of them represents a description change um then uh you have to sort of uh have fine-tuned those and decide um how we want to um how we want the modifications to win um one one simple way to do it for example is that you can always allow the last sort of modification to win but in that case you may not see some of the earlier events so it really depends on how you want to design the system so on the operation based crdt sort of conclusion obviously the advantage is that um you know you have a smaller storage footprint uh because each of the replicas only hold their own data and on the communication side they only communicate um you know the the operations which are probably very small uh compared to the whole state of the replica uh and also you can see that you can probably build more sort of general purpose use cases with with operation crdts on the disadvantaged side um the system is more complex there are more moving parts um and so you have to write more code and as we know when we write more code we also introduce more bugs in the system so they are trickier to sort of implement and test and make sure that they are they are correct and if there is a bug in the system it will be more difficult to sort of find and address and the last thing is that they require more reliable infrastructure because we want each operation to be you know received by all the all the replicas and we have we need to have a sort of a reliable infrastructure that can do that um delivery the way we implemented the crt using the service we started with concrete implementations so because initially we didn't know what we wanted behavior or what is the common behavior for all the crdts so we implement them in a concrete way but as we were we implemented a few we noticed that uh you know there are common sort of a pattern common behaviors between all of those and so we created this interface where each strategy can give its id kind of its type has a merge function that can merge itself with an incoming crt you know some of them need a time stamp and so we can set the timestamp this way and also you can see rdd can prepare itself for broadcast and that means that when i want to communicate my state or the operation or whatever it might be the crdt knows how to package itself into a message that gets put on the sort of a transport layer to the communication layer and sent out and this really helps to simplify the other layers of the system the the communication layer or perhaps the storage layer depending on how you implement it so there are a few uh sort of miscellaneous smaller things that i want to mention here on the story side obviously each replica keeps its own storage they don't use a common storage and so you can implement that different ways we ended up implementing a simple sort of a memory based which uh each replica holds the whole sort of state in memory and also file based and depending on your needs depending on how critical a system is you can you can use different types of storage the other thing is that for tests to be quicker and not have specific dependencies for example you can use memory based and you can use file based for for other use cases on the deployment side this can be deployed really as any service as long as the replicas can communicate to each other in our case we deployed these as side cars because we use a kubernetes deployment and so the services that require access to these crdt values they have a sidecar that is essentially our service that provides the crdt functionality another very uh fascinating thing that was the topic of the clocks when we were doing research on this there's a very interesting implementation i've referenced to that on the last slide where there's a specific implementation where the the clocks have three components a physical clock a logical clock and sort of a vector clock and what that does is that it attempts to keep the vectors keep the clocks close to the physical clocks as you know in the distributed system it's impossible to keep all the physical clocks exactly in sync and so typically they might be a bit out of sync by a few milliseconds pro perhaps there are some certain implementations that for example use a specific hardware to keep the everything in sync but in this case it's a it's an algorithm that you know they have used and that's how we um sort of that's that was also our implementation for the client and that's what we used you need a gossip layer and um essentially the communication layer can function different ways but we use the gossip layer in which each node doesn't broadcast the whole um state to all the nodes it just picks some of the nodes randomly and you can obviously fine-tune that on how many nodes or how often he wants to communicate but then through this gossip all the nodes receive the state one of the interesting implementations that we learned from was hashicorps sort of the member list and it does a lot of things a lot of things happen in that library for example you have cues that you know between the gossips messages can be queued in there and then it communicates the whole cue and it takes care of for example the maximum length of the messages and a lot of things like that and some of the messages piggyback off of each other so that the whole system is more efficient the last thing i wanted to mention is that i mentioned that in the state-based crdts the whole estate is always communicated to other ones but there are some interesting articles as well where they implement the merge functions in a way that you can also communicate delta updates on the testing side uh so the main the main thing that you want to test on the crdt is i on the unit testing side is really the merge function and the value functions because that's where most of the logic lies and on the on the value functions are typically a lot simpler than the merge functions but most of the functionality is in the merge function so you really and it's something that's relatively easy to to unit test so you really want to make sure that the merge functions are you know correct as a central to everything on the integration testing side as you can imagine there are a lot of moving parts the integration testing is a little bit more difficult but we wrote several several sort of utility functions that helped us to uh to build the integration test a little bit easier and faster so one of those is for example they're functions that can create multiple replicas and establish communication between them we have functions that allow you to set the values for the replicas directly if you want to test something that for example before or after it causes a sink and the value of a replica is is something sort of that you know what it is you need to be able to manually adjust the clocks to be able to sort of be deterministic about the result of a test function you want to be able to perhaps create timestamps that you know happen just before or just after another time stamp you sometimes you want to be able to manually control sort of the communication between the notes for example you want one note to specifically communicate to another node and so you can you want to be able to manually control those in some of the tests and then there are utility tests where you can insert essentially the consistency between all the nodes so as the communication happens and the states change at the end you want to be able to sort of assert that the the state of all the replicas are the same and then you want to clean up functions that can clean up the replicas and such so in conclusion you know one of one of my colleagues asked me if if it was if it was worth to implement this service this way and i think that's a fair question it's a lot more complicated than sort of the regular ways we're used to writing services and although it's a fair question it's also an impossible question to ask out of context because you know you want to choose the right tool for the right job and so if you have certain performance for example requirements if the eventual consistency is an option for you um and all those conditions are right um then yes it might be the right tool and it might be worth implementing um if it's not if the function of the service is a lot simpler if it can't be easily implemented um you know using the sort of a regular ways that we implement services then perhaps no it might not be worth implementing that way but overall crts can provide great performance because you know every you can ask any of the replicas and it can very quickly uh respond back and also that coordination on the right and mutation side sort of uh does not have to exist uh and it's sort of it happens asynchronously um on the disadvantaged side there are many moving parts to cruts and they can be complex to implement and so um the testing is is more difficult and those are some of the things that we have to consider when we want to build a system this way and um these are some of the references uh there are many more um but these were the ones that i think we used we looked at most often and so i wanted to have them here and thank you that was my talk i hope this sort of give you a different perspective or perhaps if you knew about it it added a little bit more to um what it takes to implement crts and i hope it was useful to you you

Info

Channel: Gopher Academy

Views: 128

Rating: undefined out of 5

Keywords:

Id: snWY9bm0Q2o

Channel Id: undefined

Length: 49min 47sec (2987 seconds)

Published: Fri Dec 17 2021