ElixirConf 2021 - Tyler Young - Architecting GenServers for Testability

Video Statistics and Information

Captions

[Music] [Applause]

So folks, welcome to the Architecting GenServers for Testability talk. My name is Tyler. Very briefly about me: I'm a team lead at a company called Enbala. We make a software-as-a-service platform for energy utilities to help them balance energy production against demand. I like to joke that I'm a recovering C++ developer. I spent 10 years at a flight simulator company called X-Plane, and toward the end of my time there I got to make a massively multiplayer game server and used Elixir for it. So when I was ready to move on from the world of flight sim, I was specifically looking for somewhere I could do Elixir full time.

As a caveat, I really like this quote: "I'm not an expert, I'm just a dude." In testing this presentation out on some people, I found some of it is going to be controversial. You probably won't agree with everything, and that's cool. You might be right, or we might just be coming from different places and different experiences. So I figure in the worst case this is maybe a good place to start a conversation with your team or other people you respect.

I've structured this talk as what I'm calling an exercise in comparative architecture. We're going to start with a version of a code base that is pretty bad. It is probably not the worst code you've ever seen, although maybe you're lucky and it is. Then we're going to apply a series of refactorings to it to improve the architecture, improve the testing story, and tackle some things that bite us in the real world.

We're going to be focusing on three areas in our refactorings. The first is separation of concerns. We're going to try to keep things that are conceptually distinct distinct in our code base, and the opposite side of that coin is that we're going to try to keep things that are conceptually closely
related close together in the code. If you've seen Saša Jurić's recent talk, "Clarity", I really liked it, and one of the things he said was that separation of concerns is the foundation for everything else that is good about code. Everything about clarity and communication all starts with that. The other way to phrase this, the object-oriented way, is to call it reducing coupling and increasing cohesion. That makes it sound very professional.

The second thing we're going to be looking for in these refactorings is making things more explicit. To the extent that there is implicit global state that our tests might need to manipulate, or that the architecture as a whole might be manipulating, we're going to try to reduce that. This is going to make it easier on yourself when you come back to the code base in six months, and it's going to make it easier on your team as they're trying to figure out what's going on. So to the extent that we can reduce that global state, that's a good thing.

And finally, we're going to look at some cases where GenServer.cast bites us and the problems that causes. So those are the three things to keep an eye out for. If you'd like, there is a link at the bottom of the slide. You can go there and follow along in your own IDE, or you can just follow along with me.

The code base we're working on is a micro version of the code base that I work on every day. It's all about this virtual power plant. A virtual power plant is an abstraction over some number of grid-connected things, in this case batteries, that a power company can dispatch to meet a need. Batteries are a really good fit for what we're trying to do because, if you think about a Tesla Powerwall that can power a whole home, or maybe a commercial or industrial battery that could power a whole factory,
these things are capable of dumping a bunch of power onto the grid when maybe your central power station can't keep up with demand. Or if there is too much production, maybe from rooftop solar, they're capable of absorbing that power.

In the initial version of the app we're looking at, there is one virtual power plant. It's a singleton, and if our energy company wants more than one VPP, they can spin up a new instance, because instances are cheap. The other side of this, of course, is this battery registry with a bunch of battery GenServers. The VPP will address batteries via a unique ID, so anytime we call into the battery module, it's going to be looking up the correct process to dispatch to via the registry, using that ID.

So let's go ahead and look at what that looks like. The application is bog standard: a battery registry, start up a VPP. The battery GenServer is pretty straightforward. You can create a battery; it takes an ID, a maximum power, and a current power, which defaults to zero. There are two things you can do to a battery: you can update its current power and you can ask for its current power. The only bit of business logic here is this limit_power, so if you ask for more power than the maximum, we will limit what we actually put on the battery. The battery registry is bog standard, nothing to see there.

The virtual power plant is our second GenServer. You can see that when the system starts up, it's going to start a GenServer with the name __MODULE__. When you add a battery, it's going to cast; that's just a battery ID. The state of our GenServer here is just going to be a list of the battery IDs that this VPP manages. So, things you can do to a VPP: you can add a battery, you can check the collection of battery IDs that it's holding, you can ask for the current power, which is the sum of all of the batteries that it has, and then you can ask it to either
dump power onto the grid or absorb power off the grid. These mutators are set up as GenServer casts. We'll revisit that later. But yeah, reasonably straightforward.

If we look at the first tests we've written for our virtual power plant, we are going to create some batteries, add them to the VPP, and then make some assertions on them: check that the current power is the sum of 8 and 3. If we actually run this, it's going to fail, and you can probably guess why it's failed: we're adding batteries twice. If we run it again, it might fail on a different test. It's failing because we've added this battery 1 ID twice and the battery 2 ID twice. So this is the problem with singletons. This is why people besmirch the name of singleton.

If you know a little bit of Elixir, if you've been around the block, you might say, "I know how to fix this. We'll just do a :sys.replace_state, look up the singleton VPP, and replace its state with an empty list before each test." This will technically work. You can run the tests and it'll work, but this is obviously very tightly coupled to the implementation of the VPP. If we change its state, this is going to start failing. And God help your poor junior devs who look at this and are like, "What's going on?"

So what can we do to improve this situation? If we open up version 1.5, instead of addressing the server by the name of the module, we can make this call into server_name. server_name will read some environment state and pull out a name, and it will default to just the module. So in our production system, if we haven't configured this environment variable, we just get the module name. And then of course in the implementations here we can just call into server_name. This will work, and in our setup now we can just put this environment state. But again, we're trading one bit
of global state for another. To the extent that your tests are manipulating environment stuff, it's kind of a code smell, because your co-workers or your future self are going to have to search through the whole code base for this atom and figure out where it's being used and what it's actually doing.

A much better option is to just explicitly name our VPP. We can accept these options, pull out a name if one's provided, and if not just default to __MODULE__. Then in each of these functions we can ask which server you want to talk to, and we can even provide a nice default. This is something that's kind of unique about Elixir, that you can have the first argument defaulted, so if you call add_battery with one argument you get the module. Feelings are mixed on this. I could go either way. Some of my co-workers feel really strongly that it's not that many characters just to type VirtualPowerPlant when you want to make that call. But what this lets us do is, in our tests, we can say VirtualPowerPlant.start_link and pass it an explicit name, and so now each of these is setting up a new instance of our GenServer, and all is right in the world. Where we previously had one GenServer for the whole app, we can now optionally have multiple.

I've added a few new tests to this version too. If we run it, we're going to see it fail, and it's going to fail in this test of checking that when we try to add power to or absorb power from the grid, we're limiting to the battery's max power. The reason it's failing is that this absorb call, if we jump to the definition of it, is a cast, and in the implementation of absorb it's calling into each of those batteries to update the current power. Updating the power is itself another cast, and so we have these two layers of asynchrony. If we were to run it again, there are four different values that you can get, and if you
get lucky this test will actually pass, or unlucky, depending on how you look at it. But that asynchrony is just going to make the system hard to test. And again, a little knowledge is dangerous, because you could say to yourself, "Ah, I will just use :sys.get_state to synchronize on the state of those GenServers before checking the total power." We could spend a long time discussing why this exact series of four syncs is necessary in this order, but obviously the much cleaner, much simpler solution is: if you don't really need that cast, just do a GenServer.call. So in version three here, if we just change that to GenServer.call, and the same on the battery side, where update current power becomes a GenServer.call, then that test works the simple way. It works in the way that we intuitively wrote this test the first time: to sum the batteries you just iterate over them, and you don't worry about syncing state. That is going to be much more maintainable for yourself and your co-workers.

So let's go back to that architecture diagram. As you look at this, you have to ask yourself whether looking up a battery by its ID is a good architectural decision. This is the pattern that people tend to fall into: "I have some bit of state, so I obviously need a GenServer for it." But to the extent that we are not really worried about isolating failures, and there aren't multiple clients for a battery that might be calling into it, it's all this one VPP, we might ask ourselves whether we need those GenServers at all. We could massively simplify the architecture by making that just plain data.

So let's open up version 3.5 here. What if we remove the battery module entirely and make adding a battery just take the data that set up the battery
previously: ID, max power, current power. This make_battery function can return a struct with those fields, and then we can just put all the business logic there inline. You could argue this is better for cohesion, pulling that stuff out, but there's always that balance between keeping related logic together and separating things that are conceptually distinct. So this is a non-example.

What can we do instead? My suggestion is that if you don't have a good reason for having a GenServer, just use a struct: ID, max power, current power. Then this logic, that power-limiting stuff, we have isolated into this battery struct. For accessing things like the current power, you no longer even need a function. It's just on the struct, and you can know in advance that it's going to be on the struct. It's not a formless map that may or may not have the keys you're looking for. The VPP can now take a battery struct instead of a battery ID. To my mind this is honestly pretty close to the platonic ideal of this system. We have this stateful thing that changes over time, the virtual power plant and its collection of batteries, but the batteries themselves are just plain data. This gets us back the separating out of those battery concerns, but also clarifies our thinking around the VPP.

The next bit that I'm going to suggest is perhaps the most controversial bit of this. [Music] When I'm looking at this, it seems like the virtual power plant is smooshing together two distinct concerns. The first is the business logic: how do we handle calculating the total power for the VPP, how do we handle updating the power. We're combining that with the overall state management over time. A GenServer is a tool for managing state changes over time, but having those combined, I feel like
it could be cleaner. So let's look at what that might look like if we instead went to a virtual power plant that was just a struct. Right now it only has one field, batteries, but as you know, structs have a tendency to grow fields over time. So we're operating on battery structs, but then because our state is just a struct, the implementation here just goes straight inline. Contrast that with looking at version four, where if you want to know the implementation of add_battery, you have to go search on this atom, and again, God help you if that happens to be a very common atom used for other things. In that case the implementation of add_battery is separate from the function head. In this case it's just right there. I like this a lot.

What you can do is, as long as you get the wire-up of the GenServer that wraps that struct correct, you can do all of your testing on the plain, stateless struct. Then testing the stateful part of it, the state management over time, the GenServer, becomes just a matter of ensuring that you've wired up to those struct module functions correctly. So the test for our virtual power plant looks like creating a struct, adding batteries to it, and we end up with a struct that we can just make assertions against. This is really nice for testing. And then our server test looks more like a smoke test than [Music] testing every edge case and that sort of thing, because once you've got the edge cases handled in your struct module, again assuming you get the wire-up of your GenServer correct, you can't possibly screw up.

The one thing I don't like about this is that we still have so much boilerplate in our GenServer module. We've still got all these calls. You still have to go search around: okay, so add_battery is implemented here, that's just a call to that function, and so on. So this is what we were working
with: a server contains a struct, which aggregates batteries. There's a refactor we can do here. I have used this; it's kind of the world's dumbest method for wrapping an implementation struct. This apply_call function takes a function from your struct module, it takes the struct itself, and then takes whatever additional arguments you might pass. So in the case of our virtual power plant, if we were wrapping add_battery, we'd pass VirtualPowerPlant.add_battery as the function, the state would get passed, and then we would have one additional argument of this battery. So in the server wire-up, we say this is what we pass to GenServer.call, and then we have one handle_call implementation for all of these. This gets rid of some of that boilerplate, but I'd really like to reduce it even further. It would be really nice if there were a macro that we could use to say this GenServer is going to export a stateful wrapper around all of these struct functions from the module that it's wrapping, and do that in the obvious way. I don't have either the macro wizardry or the time to have done that myself, but if somebody wants to collaborate on that, I'd certainly be interested.

As you're looking at this, you might wonder, why not just use an Agent? With an Agent you can have this business logic inline and you can get rid of this boilerplate. The problem that I run into is that it seems like every GenServer I write has some limitation that prevents me from using Agent. Maybe it needs to handle timer messages. What that might look like is: suppose in our system we have these batteries that correspond to physical batteries in the world. When we ask to set the power of that battery, it might not actually get that power. Maybe the battery is offline. Maybe the owner of the battery has overridden our request,
and so we need to make some kind of call to the internet-attached device at some regular interval to check that what we've asked for is actually what we've got, and if not, maybe adjust things. This happens all over our code base: Process.send_after, we've got some timeout, and then we want to send ourselves this message, fetch_remote_state. Because this is so common, it prevents us from using Agents, in our code base at least.

The question becomes, what do you do with the state-management part of this state? When you set up your VPP server, we want to take these remote update parameters: this timeout duration and a batch size, meaning how many batteries we update at once. If you've got ten thousand batteries, you don't want to send ten thousand requests every ten seconds or whatever. So we've got these parameters that tell us how to manage the state. The pattern that I've used is, in our init, to make our state a tuple: one element is the state itself and the other is information about how we manage that state. This lets us unwrap those separately and pull out the state-management part when we need it. The alternative, of course, is to stash all that information in your VPP struct itself, but again, with the idea of separation of concerns, the batch size at which you call into the VPP's update_batteries is not really part of the VPP struct itself. It's not core to it; it's configuring how you use it. Therefore I do like having it pulled out separately. [Music]

One last architecture diagram: our VPP server now has the VPP struct itself and then this separate stateful config. [Music] Again, I'd be interested to hear how other people have done this. This is what has worked for us, but I'm sure there are other
ideas that people have. So that's what I've got. By all means, hit me up on Twitter with questions, comments, criticism, and if you want to come work on a team that values thinking about this stuff, and thinking about testing and architecture, we are always hiring. I think we have a couple of minutes for questions.

Thank you so much, Tyler. It was a great talk. Thank you for being here at ElixirConf. We appreciate the effort that all of you speakers put in. We are waiting in case we have questions. I think we are catching up with the lag as we speak. So, does anyone have a question for Tyler? I think we just caught up. Cool, so everyone is super happy with you. Thank you again for this talk, and let him know if you have questions. Okay, we have a question. Cool. I will stop sharing your screen so you can see the question that we have here. Perfect, it's displayed. It's from Brooklyn.

Yeah, so, replace_state on async tests. Because those messages get serialized in the actual process, you'd end up stomping on the other tests: if you had two tests running concurrently and you were doing the replace_state, you'd stomp on the other tests. Those first tests were intentionally not async, because that wouldn't work. I don't know about you, but in our code base we certainly have a number of tests that are not capable of running async because they are so stateful like that.

Cool, thank you so much. We also have another question, actually from Brooklyn again. I think it's a follow-up. Cool.

A new server process instance for each test. Yeah, to some extent you could make the argument that we should test everything exactly as we're using it in production. In my opinion, if you are depending on that, it's kind of a code smell. I think there's some value in having smoke tests where you're not just testing a single piece of functionality, adding a battery or checking the
sum of the power and stuff, but maybe you're doing a bunch of manipulations over time and checking that. We've certainly caught bugs that way, bugs that we didn't see in the isolated unit tests. When you do a dozen things to a VPP, suddenly you see issues. So there's some value to that, but I think it's like the testing pyramid idea: most of your tests should be unit tests that test some very small bit of functionality, and then you have a few bigger tests against, whether you call it a singleton or a long-lived instance, that sort of thing. That's how we've handled it. So maybe we have 10 unit tests to one bigger smoke test or integration test.

Thank you so much. We have another one.

How do I like IntelliJ for Elixir? I have been a JetBrains user since college, a long time, and so I'm very much wedded to their IDEs. The Elixir support is really, really good for being a community plug-in, but it would not trick you into thinking that it is first party, unfortunately. I use it and I still like it better than VS Code, but I do wish IntelliJ would invest in a first-party Elixir IDE.

Oh, the code editor war. Yeah, exactly, always a good one. Okay, we have another one, from Chris.

Testing handle_info. Yeah, in the repo that you can check out, in version seven, we do have tests of our timer. We specifically create an instance and give it a timeout of 10 milliseconds for making those calls. So we create it with the 10-millisecond timeout, sleep 10 milliseconds, and then check that it's done stuff. Testing that the logic is wired up correctly is, I think, valuable. But again, when the GenServer component is a thin wrapper around the struct, where we would
spend the bulk of our testing effort is on testing that the function that gets called by handle_info is correct. So we would have maybe a half dozen tests to cover all the edge cases of that update function, and then maybe one test on the GenServer to make sure that it is calling into that struct module correctly.

Cool, thank you. We have another from Mike.

Yeah, so there are a lot of different ways you could handle it. If you look at the implementation in the repo, the batching happens by pulling off the front, call it 10, batteries. If you've got a thousand batteries, we pull off the front 10, update those, and then stick those on the back of the list, so that we're gradually moving through the list. In our production system it looks quite a bit different from this. You could have a strategy where, if there was an error, you put it in a separate queue for retries and maybe do some exponential backoff there. Again, that bit about there always being new fields that you're going to add to your struct. I would say we don't have a great answer for this. We're still experimenting with how to do that.

Thank you so much. And we have one last one that I think everyone is thinking about: this webcam. [Laughter]

So, I bought the ridiculously overpriced LG 5K display that only works with Macs, and it has a nice built-in webcam.

Perfect, thank you so much. They were asking for the repo, but it's been answered in the chat, so cool. Tyler, thank you so much for this talk. It was super interesting and everyone loved it, so thank you for being here at ElixirConf.

Cool, thank you all for your time. See you.
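
The "batteries as plain data" design described in the talk might look roughly like the sketch below. It is not taken from the speaker's repository: the field names (id, max_power, current_power) follow the talk, but the function names new, export and absorb, the sign convention, and the "same request to every battery" dispatch rule are all assumptions made for illustration.

```elixir
defmodule Battery do
  # Plain-data battery. Field names follow the talk; everything else is assumed.
  @enforce_keys [:id, :max_power]
  defstruct [:id, :max_power, current_power: 0]

  # Build a battery, clamping the starting power to the allowed range.
  def new(id, max_power, current_power \\ 0) do
    update_current_power(%__MODULE__{id: id, max_power: max_power}, current_power)
  end

  # Assumed sign convention: positive exports to the grid, negative absorbs.
  def update_current_power(%__MODULE__{} = battery, requested) do
    %{battery | current_power: limit_power(requested, battery.max_power)}
  end

  # The power-limiting rule: never exceed max_power in either direction.
  defp limit_power(requested, max_power) do
    requested |> min(max_power) |> max(-max_power)
  end
end

defmodule VirtualPowerPlant do
  # Stateless VPP: a struct holding battery structs, no process involved.
  defstruct batteries: []

  def new, do: %__MODULE__{}

  def add_battery(%__MODULE__{} = vpp, %Battery{} = battery) do
    %{vpp | batteries: vpp.batteries ++ [battery]}
  end

  # Total power is the sum over all batteries.
  def current_power(%__MODULE__{batteries: batteries}) do
    batteries |> Enum.map(& &1.current_power) |> Enum.sum()
  end

  # Hypothetical dispatch rule: ask every battery for the same power.
  def export(%__MODULE__{} = vpp, power) do
    updated = Enum.map(vpp.batteries, &Battery.update_current_power(&1, power))
    %{vpp | batteries: updated}
  end

  def absorb(%__MODULE__{} = vpp, power), do: export(vpp, -power)
end
```

Because nothing here is a process, a test can build a VPP, pipe it through a few calls, and assert on the returned struct, with no setup, naming, or synchronization.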
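
The server-side ideas from the talk (an explicit name option, a defaulted first argument, GenServer.call instead of cast, and a single apply_call wrapper) could be combined along these lines. This is a sketch under assumptions, not the speaker's code: the VirtualPowerPlant.Server module name, the {:apply, fun, args} message shape, and the use of plain maps for batteries are hypothetical, and the small VirtualPowerPlant module is only a stand-in so that the example runs on its own.

```elixir
defmodule VirtualPowerPlant do
  # Minimal stand-in for the pure struct module; batteries are plain maps here.
  defstruct batteries: []

  def add_battery(%__MODULE__{} = vpp, battery) do
    %{vpp | batteries: [battery | vpp.batteries]}
  end

  def current_power(%__MODULE__{batteries: batteries}) do
    batteries |> Enum.map(& &1.current_power) |> Enum.sum()
  end
end

defmodule VirtualPowerPlant.Server do
  use GenServer

  # Accept an optional :name so each test can start its own instance.
  def start_link(opts \\ []) do
    name = Keyword.get(opts, :name, __MODULE__)
    GenServer.start_link(__MODULE__, opts, name: name)
  end

  # The first argument defaults to the module name, so production callers
  # can omit it while tests pass an explicit server.
  def add_battery(server \\ __MODULE__, battery) do
    apply_call(server, &VirtualPowerPlant.add_battery/2, [battery])
  end

  def current_power(server \\ __MODULE__) do
    GenServer.call(server, :current_power)
  end

  # Generic wrapper: run a struct-module function against the server state.
  # A synchronous call means the update is done before the caller continues.
  defp apply_call(server, fun, args) do
    GenServer.call(server, {:apply, fun, args})
  end

  @impl true
  def init(_opts), do: {:ok, %VirtualPowerPlant{}}

  @impl true
  def handle_call({:apply, fun, args}, _from, vpp) do
    # One clause serves every state-updating function of the struct module.
    {:reply, :ok, apply(fun, [vpp | args])}
  end

  def handle_call(:current_power, _from, vpp) do
    {:reply, VirtualPowerPlant.current_power(vpp), vpp}
  end
end
```

With this shape, two tests can each call start_link with a different name and never see each other's batteries, and a read issued right after add_battery observes the update because both go through call.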
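
The timer pattern from the talk, where the server state is a tuple of the domain state and a separate config, and each tick refreshes one batch from the front of the list and moves it to the back, might be sketched as follows. The PollingServer module, the option names (interval_ms, batch_size, fetch), and the use of a bare list in place of the VPP struct are all hypothetical; the fetch function stands in for whatever call reaches the real device.

```elixir
defmodule PollingServer do
  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, Keyword.take(opts, [:name]))
  end

  def batteries(server), do: GenServer.call(server, :batteries)

  @impl true
  def init(opts) do
    # Configuration for how the state is managed, kept apart from the state.
    config = %{
      interval_ms: Keyword.get(opts, :interval_ms, 10_000),
      batch_size: Keyword.get(opts, :batch_size, 10),
      fetch: Keyword.get(opts, :fetch, fn battery -> battery end)
    }

    schedule(config)
    # State is a tuple: {domain state, state-management config}.
    {:ok, {Keyword.get(opts, :batteries, []), config}}
  end

  @impl true
  def handle_call(:batteries, _from, {batteries, _config} = state) do
    {:reply, batteries, state}
  end

  @impl true
  def handle_info(:fetch_remote_state, {batteries, config}) do
    # Refresh the front batch, then move it to the back of the list so that
    # successive ticks gradually work through every battery.
    {batch, rest} = Enum.split(batteries, config.batch_size)
    refreshed = Enum.map(batch, config.fetch)
    schedule(config)
    {:noreply, {rest ++ refreshed, config}}
  end

  defp schedule(config) do
    Process.send_after(self(), :fetch_remote_state, config.interval_ms)
  end
end
```

One way to test the wiring without sleeping is to start the server with a long interval and send the :fetch_remote_state message directly; a following call is processed after it, since a process handles its mailbox in order.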

Info

Channel: ElixirConf

Views: 964

Keywords: elixir

Id: EZFLPG7V7RM

Length: 32min 31sec (1951 seconds)

Published: Sun Oct 24 2021