Keynote: Clarity - Saša Jurić | ElixirConf EU 2021

[Applause] [Music] Okay, so thanks, Jim, for the introduction, and hello everyone. It's refreshing to see people in real life; it's been a while. Anyway, I want to start with the talk immediately, because there's a lot of stuff I want to go through. The subject of this talk is source code, and this is something that we developers spend a lot of time working with. A lot of our working hours take place inside some code base, and whenever we have to work with this code, whenever we have to change it, we have to understand something about it, because it's just way too big to fit into our brains completely, all the time. We're talking about thousands, tens of thousands, hundreds of thousands of lines of code, so we constantly have to learn something about this code. Even if we're writing a completely new feature, a completely new piece of code, we still have to somehow integrate it with the existing stuff, and so again we're going to have to learn something about the code. How do we do that? We read the code and gather some information, and something interesting happens in that process: intentionally or not, the previous authors of the code we read transmitted some information to us. Transmitting information is a textbook definition of communication, and so it follows that source code is, among other things, a form of communication between people, not just between the human and the machine. Given how much time we spend working with code as developers, I would say that it's quite a frequent form of communication between developers on a team, and so it only makes sense to think about how we can make this communication better, more efficient if you will. If I could think of one property that I would desire from any communication, that property would be clarity, which is incidentally the title and the topic of this talk. So we're going to explore the
clarity of source code, and I want to immediately make it a little bit more concrete by showing you some code. We're going to take a look at some code and try to assess how well it communicates to us, the humans. I'm going to show you one program that solves a simplified version of one challenge from a previous edition of Advent of Code, and I'm not going to tell you what the challenge is. This is quite a realistic situation for me: on multiple occasions in my life I would find myself in some part of the code where even the purpose was not clear to me, the original author had long departed the team, and I had to figure it out just by reading the code. So that's what we're going to try to do by looking at this code. Don't be afraid, it's a short program that fits onto one slide, about 20 lines of code, no macros, OTP stuff, or anything like that, so surely we can handle this, right?

Let's see. We have this function run, which we have to figure out. It takes a single argument called x, and it starts by duplicating the number one x times, and then it converts this into an integer. Then it's going to produce some elements; this is what Stream.iterate does. The first element is this number we just computed, and then every other element is somehow computed based on the previous one. That's Stream.iterate in a nutshell. So how do we compute the next element? We take the previous element, increment it by one, and get the integer digits out of that. Then we prepend the number zero at the very top of that list, and we chunk every two elements in that list: the first and the second, the second and the third, and so on. Then we take these pairs as long as the first element of each pair is less than or equal to the second one, and we take the second element of each pair; this constitutes our prefix. And now
we're going to build our suffix by taking the last element of the prefix and duplicating it n times, where n is the number of digits in the integer which is the previous element incremented by one, minus the length of the prefix. This is the number of duplications of the last element, and it gives us our suffix. Then we simply concatenate the prefix and the suffix and convert them into an integer, and this is our next element. Finally, we take these integers as long as they're less than 10 raised to the power of x, and we count them.

So, without saying it out loud, show of hands: who thinks that they understand the purpose of this program and the algorithm, the solution that the author chose? Roughly speaking, this is approximately zero people, so no surprises there. I didn't really give you a fair chance. It would take some time to figure this out. If I were to see this code for the first time, I would stare at it for a while, then I would sprinkle in some debug IO.inspect statements or run it with a couple of different inputs, and probably something could be figured out here. But this is the problem: it now becomes a puzzle, a challenge, and we have to spend a considerable amount of time and energy to figure it out. I have to confess that, as the original author, I did a pretty shitty job here of communicating to people, although in my defense I wasn't even trying. I had this mindset where I was obsessed with solving the problem. I read the problem, I thought of some algorithm to solve it, and then I just communicated exclusively with the machine. With this kind of mindset, it took about 30 minutes of my time and about 20 lines of code to produce a hard-to-work-with legacy. So I'm going to show you another program.
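For readers following along without the slide, the walkthrough above can be approximated in Elixir as follows. This is a reconstruction from the spoken description only, not the code shown in the talk: the module name Puzzle and the exact choice of standard-library calls (List.duplicate/2, Enum.chunk_every/4, Integer.pow/2 and so on) are assumptions. The terse naming is kept on purpose, to mirror how little the first version communicates.

```elixir
defmodule Puzzle do
  # Hypothetical reconstruction of the first slide, based on the spoken walkthrough.
  def run(x) do
    # Start from the integer made of x ones, e.g. 111 for x = 3.
    1
    |> List.duplicate(x)
    |> Integer.undigits()
    |> Stream.iterate(fn prev ->
      digits = Integer.digits(prev + 1)

      # Pair each digit with its predecessor (0 for the first digit) and keep
      # the leading digits for as long as they do not decrease.
      prefix =
        [0 | digits]
        |> Enum.chunk_every(2, 1, :discard)
        |> Enum.take_while(fn [a, b] -> a <= b end)
        |> Enum.map(fn [_a, b] -> b end)

      # Fill the remaining positions with the last digit of the prefix.
      suffix = List.duplicate(List.last(prefix), length(digits) - length(prefix))

      Integer.undigits(prefix ++ suffix)
    end)
    # Stop once the elements no longer have x digits, then count them.
    |> Stream.take_while(&(&1 < Integer.pow(10, x)))
    |> Enum.count()
  end
end
```

Under these assumptions, Puzzle.run(2) returns 45: the two-digit integers from 11 to 99 whose digits never decrease. Even with comments added, nothing in the code says what those integers are for, which is the point the speaker makes next.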
Same algorithm, different program, and here the author actually made some effort to communicate to people, not just to the machine. We start again with the run function. It takes a single argument called length, and it counts the number of valid passwords of the given length. So one line in, we already know the purpose of this program, and this is something you would never be able to figure out from the previous version. You can stare at it until the end of time and you're just not going to get it, because it's simply not there. The idea of passwords and validity is not communicated at all. Here, one line in, we immediately know it.

So what are valid passwords of the given length? We take the smallest integer of the given length and the largest one, so for example, if the length is 3, that would be 100 and 999, and then we get the monotonic integers in the given range. What are monotonic integers, you may wonder? We could google it, or we could try to figure it out from the code, but the author was considerate enough to summarize that a monotonic integer is one where each digit is greater than or equal to the previous one: 122 and 123, but not 121, for example. How do we find these integers? We start just below the lower bound and find the next monotonic integer, and the next, and the next, as long as we don't exceed the upper bound. How do we find the next monotonic integer? I'm not going to show you that code. You've seen the bulk of it in the first implementation; this is where most of the complexity lies. But again, the author was considerate enough to summarize it for us in short "too long; didn't read" steps in plain English, and most notably to note that this is a constant-time algorithm, so it's somehow able to magically jump in a single step to the next monotonic integer, no matter how far away it is from the given input. So no brute-force scan. And this would roughly
be a small demonstration of what I consider to be clear communication to people. I have to say that more recently I prefer to use the term clarity over these other terms. These are definitely properties that we want from our code, but the problem is that they're very vague and very hard to define even remotely precisely, so they're open to subjective and creative interpretations. I have witnessed a bunch of discussions where one developer claims that this code is readable or maintainable and another claims it is not, and now we reach an impasse. Where do we go from here? Typically it leads to very unproductive, long, futile, energy-draining discussions. While clarity is not something we can measure scientifically, I hope at least that it is a little bit more concrete and that it makes it easier for us to find some common ground. What we're looking for here is this: as a reader who is reasonably fluent in the language of communication, I want to be able to understand, reasonably effortlessly, the purpose of the program, what problem this thing is even solving, and then the solution that the author chose. Out of this I get knowledge, and with this knowledge I'm empowered to work further with the code: to make some changes, to take it further, to fix problems, to optimize it, even to rewrite it.

For example, let's say that I found myself in this piece of code because there were some performance issues. Circumstances have changed: previously we had to support only fairly short passwords, a couple of digits, but now we have to work with much longer passwords, 100 digits or more, and now the program is unacceptably slow. Now that I actually understand how the author solved the problem, I can figure out that the complexity of this algorithm is actually not good
enough for our scenarios anymore, and so I have to think of a faster algorithm. This was the fastest one I was able to come up with. I'm not going to have the time to explain it, but essentially it's a linear-complexity algorithm, meaning that, for example, to count the number of 100-digit monotonic integers it requires only 100 steps, where the previous version would require about 350 billion steps. So it's blazing fast and pretty short. I don't think that it is clear, and I don't think that I can clarify it through code alone. I think I would need a large block of prose, of comments, with some step-by-step examples, illustrations, and whatnot. But anyway, this is a full rewrite of the code, and the reason I was able to make this decision and do this rewrite is that I was able to understand the original program together with its shortcomings. So this is what clarity means to me.

More generally, the benefits would be that it makes us more efficient and effective. We can gather information faster; we simply spend less time trying to understand something, because it's better communicated. And we get better knowledge: the information is transmitted with higher fidelity, if you will, which means that we are less likely to miss some very important but implicit, non-obvious relationships and dependencies, and therefore we are less likely to introduce unexpected bugs or performance regressions. So essentially we're faster and better; surely that's a good thing. On top of that, clarity empowers the team. We move away from the mindset where only the original author is qualified to work with the code they wrote, to the point where everyone on the team is confident enough to work with any part of the
code, and so this helps with team dynamics. This is something that happens quite frequently in our industry: people come and go. People may move to a different project, they may switch to a different role, say a manager role, or they may move to a different company. If the code is clear enough, then the negative consequences of people departing the team are significantly reduced, and the onboarding overhead of new people joining the team is also reduced, again because everything is communicated much more clearly. To me, these are benefits that I would want from any sort of project which is developed as a team effort over some longer period of time, which I think describes any real-world project, be it commercial or open source. Therefore, to me, clarity is an essential property, something that I want to have inside the code base, and I would give a very high priority to it.

This raises the question: how do we get this thing? We're not going to stumble upon it by chance. We have to make some effort to make it happen, and we have to invest that effort constantly. So forget about the idea that you're going to ship features for a while and then, every x months, spend a couple of weeks clarifying your code. That's not going to happen. You're paying lip service to clarity, nothing more than that, and you're going to end up with the same old obfuscated, convoluted legacy mess of code that no one ever really wants to work with. If you really want clarity, you have to invest some part of your effort continuously, and the key practice here, in my opinion, is code review. First and foremost, the primary focus of the review process should be on clarity, because everything else comes out of clarity. How am I even supposed to spot some problems,
issues, deficiencies, missing parts of the implementation, missing tests, and whatnot if I cannot even understand the code? That's not going to happen, and the practice basically loses its purpose. So clarity is the primary focus.

Out of that it follows, in my opinion, that pairing is not really a proper substitute for reviewing, because pairing, as the name says, is programming, and therefore both members of the pair are programmers. What this means is that they both participate in the programming session, in the journey. A programming session is not just the act of mindless typing; it is the act of exploring. You immerse yourself in the problem and you try out different approaches, different avenues, and by the end of that session you have developed a very deep, intimate knowledge of the solution and of the problem as well. So everything is naturally clear to the author, and this is a luxury that no other reader possesses, including the future version of the author. Therefore, to review, you need a fresh pair of eyes. Self-review is not going to be sufficient, and pairing to me is a form of self-review: the pair programs and reviews together. So we need someone who did not participate in this journey to assess the final outcome of that process for clarity.

The process of review is going to introduce some tension into the overall development process. The thing is that we want to merge reasonably fast, because merging means progress, and progress is what we're after, because without it we're out of business. But at the same time we want to clarify as much as required, and we're going to postpone our merge to do this, because clarity also means progress: it means subsequent progress. It's not like we're building the current set of features and
we're done with it forever. We actually have quite a long road ahead of us, and we don't want to compromise our journey or our ability to progress down that road. This is why we want clarity. So how do we resolve this apparent tension? In my opinion, the author and the reviewer should work a little more closely with each other and help each other to make this process effortless.

Let's start with the author. Beyond just implementing the feature, an important task of the author is to make the job of the reviewer as easy as possible, because this leads to fast reviews, fast feedback, and good-quality feedback; the reviewer is more likely to spot problems. First and foremost, try to make your pull requests reasonably small. This is going to be the theme of this entire talk: we humans are pretty terrible at dealing with a large amount of information thrown at us all at once. Instead, we need this information to be presented to us more gradually, more incrementally, in reasonably sized chunks which we can understand or study in isolation. Talking about pull requests, a frequent misconception that I've noticed is that people conflate a pull request with a feature. The whole feature doesn't have to be implemented as a single pull request. If the implementation requires a large number of changes, then split it somehow into smaller, reasonably sized pull requests. The same thing holds for commits. If you give me one commit which packs 10 different types of changes together, with a couple of refactorings and optimizations thrown in for good measure, I'm not really going to be able to figure it out, or even if I'm able to understand something, by the time that happens I'm mentally wasted and I have nothing useful to give you. So present it in
smaller steps, and most notably, try to make your history linear. Again, programming is a journey, and it's not necessarily a straightforward journey from point A to point B. It can be quite long-winded as you try out different avenues and approaches. Try to hide the details of the journey from the reviewer as much as possible, because they're mostly useless to them and mentally very taxing for them. So clean up your history as you move along, and clean up your history before you submit the pull request. Occasionally, on very few occasions, but it has happened to me, the history was beyond the point of repair, and then what I would do is recreate the change in a more linear fashion. At this point I actually have the working version that I want to submit, and I'm just going to linearize the history by recreating it. I'm doing this not to make myself look perfect, but to make the job of the reviewer easier, because again it leads to faster and better feedback. By all means do self-review. As I mentioned, it's not going to be enough, but you can self-review and clarify as much as possible, and this may help the reviewer tremendously.

Now, this attitude cuts both ways. As a reviewer, be mindful of the fact that everything is super clear to the author, and so your first task is to point out the places where you had problems understanding. Don't be shy to say, "I had problems understanding this part of the code" or "I wasn't able to understand this function at all." You're not the problem there. Even the author is not the problem. The code is the problem; this is what we want to fix. But going beyond that, you actually have a better position now, because, again, everything is clear to the author, and so you are, I would say, more likely to make some better
suggestions about how to improve it. So don't be shy to make some suggestions as code snippets. What I occasionally do as a reviewer, when I'm not completely sure what the best way to go is, is a refactoring session on my own to explore it. So I check out the branch and do a refactoring session, and if I'm happy with the outcome, I submit a pull request on top of the pull request, and so the roles reverse for that brief period of time. And then one thing that I feel many of us are not doing frequently enough, definitely not me: synchronize. Not everything has to be done asynchronously through pull request comments. Sometimes it's more efficient to reach out directly to the author and discuss problems and various ideas through chat or voice. Pair, by all means; pairing is most certainly a wonderful technique that can complement the review process but, in my opinion, not replace it. This is what I like to collectively call a collaborative review process, and I feel that many teams are perhaps not practicing that style, so give it a go and see how it goes. I think you might be pleasantly surprised.

Either way, let's turn our attention to the code. How can we make our code clear? What is the secret sauce? I would say that we could easily spend this entire conference discussing various patterns, techniques, approaches, architectures, and whatnot, but I think there is this very small, simple idea which, when properly followed, naturally leads to many good properties, such as good modularization with high cohesion and low coupling, and good intention-revealing names. By following this idea we can naturally arrive at design patterns and architectures which are a good fit for our particular scenarios. This idea, which I'm going to discuss next, is called separation of concerns.
Separation of concerns is a phrase attributed to Edsger Dijkstra; it originally appeared in his 1974 essay called "On the Role of Scientific Thought." This essay is not about source code at all. It deals with some higher-level programming topics, but definitely not source code; it's essentially an essay about a style of thinking. The way I would summarize it, the way I see it, is that Dijkstra is trying to say that when he is studying some subject, some material, he wants to be able to look at it from different points of view, from different aspects, each in isolation from all the other aspects. Nothing particularly groundbreaking here; this stuff has probably been known to us since ancient times. It's a basic breakdown of a problem, or divide and conquer of some sort, if you will. Even Dijkstra admits as much at the very beginning of the essay. But the thing is, this whole essay is written from the standpoint of a student, not a college student, but a person studying some material. Assuming we agree with this approach, which I most certainly do, then as writers, in this case as writers of code, what we want to do is split the code into parts which can be understood in isolation. Again, nothing particularly groundbreaking, not such a big idea really, but I feel that we very frequently lose sight of this idea. Most notably, not every partition of the code actually fulfills that goal. If I have a big program and split it into two parts which I always have to understand together, fully, then I haven't reached that goal, and in fact I have made things worse, because every split also adds some mental overhead; or, if you will, abstraction is not a zero-cost abstraction. So we have to actually reach that goal. Let's see some concrete examples.
Suppose we have to implement the following system, a very small system. We only have to support two use cases: a user can register with an email and a password, and a registered user can log in with the correct email and password combo. This, let's ambitiously call it a requirements document, already practices separation of concerns itself, because it does not deal with a bunch of other things, such as the user interface, the fonts, the colors, accessibility, mobile versus desktop; it does not deal with networking protocols, delivery guarantees, and so on. These are all important things that we need to solve in order to build the full product, but we just cannot handle all of this at once. We would crumble under that mental load and do nothing useful. So here we are focused first and foremost on the core behavior of the system, the very reason why we're building the system in the first place.

To mirror that in the code, I would separate the code into two regions; this is what I would start with. I would have the core layer, which implements the core behavior, and on top of this I would stack the interface layer, which exposes the core to external users and clients, such as a browser, a REST client, a GraphQL client, Phoenix channels, LiveView, and so on, whatever delivery mechanism you want to use. I'm not doing this because I read it in some book or because some thought leader said so; I'm doing it because these are two very large parts that can be separated from each other and understood in isolation from each other. You can most certainly work with and understand the interface without caring about the ins and outs of the core implementation, and vice versa. So this is the general idea.

Okay, let's see some code. So far I have been using the Phoenix namespace convention, so for me the core would typically reside in MySystem and
the interface in MySystemWeb. More recently I've been toying with the idea, only in my mind, of going for MySystem.Core and MySystem.Web.Interface, because it maps precisely to the way I regard this code, but I haven't tried it out. This was just a small digression; it doesn't really matter here, because this is the last defmodule that we're going to see today.

Anyway, the top-level core module has the register and login functions, corresponding to our two use cases. Signature-wise, don't be bureaucratic about it; whatever works. You can accept individual parameters if you're dealing with two or three of them. Obviously, if you're dealing with a larger number, you want to bundle them somehow rather than have a function that takes 20 parameters. Let's focus on this case, where we accept params in the core as a map. Let's see the type specification. The main thing to notice here is that the data coming into the core is well shaped, well structured. This means that the keys are atoms and we are dealing with a well-defined set of keys, so it's not a free-form map. The values are properly typed: strings are strings, integers are integers, booleans are booleans, and so on. And then we have optionality represented as well: email and password are required, date of birth is optional. This to me is a small demonstration of what I personally consider to be the most important benefit of types, and this is clarity. Types clarify: they describe more precisely the nature of the data coming into the function and out of it. I find in practice that with a well-chosen name and a good type specification I can frequently build a very good intuition about the function without having to fully read the documentation or, God forbid, the implementation. That would be the main benefit of types for me. So anyway, here, from the type specification, we can actually see, as potential clients, how to use
this code and what we should pass into this function. So let's see one client. This would be a Phoenix controller, the standard action handler. We take conn and params, and params here is a free-form map, which means that keys are strings, some required keys might be missing, some keys we don't support might be present, and values could be weakly typed. If you're dealing with the query params of a URL, then all the values will be strings. We have to deal with all this uncertainty someplace, and it's pushed outside of the core, which keeps the core more focused. To be clear, it does not reduce the number of lines of code in the core; it just keeps the core clearer, because there's less uncertainty there. We're dealing with this in the interface, and so we have to give structure to this data. This is what I like to call normalization, to distinguish it from validation. Validation to me is about enforcing business rules and constraints; normalizing means just giving structure to unstructured data. How do I do this? I define a schema where I describe the fields I expect, their types, and their optionality, and then I invoke a private helper function called normalize, which unfortunately I don't have the time to show you. It normalizes the params according to the given schema, and basically it's based on schemaless Ecto changesets. It's a generic piece of code, not particularly long. I'd advise you to try writing it; it's a nice little exercise, even if you're not going to follow this style of programming. Just give it a try. So anyway, we normalize and we get the structured params, which we can now pass to the core, and then we render success or an error.

To me this is a more refined idea that follows Phoenix contexts. It goes in the same direction, only somewhat further, and I want to contrast the two approaches, because I think there are
some interesting points to be made here. As a refresher, with Phoenix contexts this is how the controller would look: you take these params and pass them immediately to the context function as they are. So there are far fewer lines of code here, and the context function is not going to be any larger because of that. In general, with fewer lines of code we can build our features faster and we are more productive, and these are very nice benefits. But these benefits have been obtained at the expense of clarity. This is something that I have observed through experience many times. It's not a rule, but frequently, various abstractions and ideas that promise productivity and fewer lines of code deliver them at the expense of clarity. So if you care about clarity, do assess them from that point of view.

Let me try to prove this. This is the type specification of the context function, and immediately from this specification we see that the uncertainty has now been pushed deeper. As a potential client, if I want to use this context function, what should I pass in? No idea. I have to read the full implementation of that function, including every single other function it invokes, such as the changeset function from the schema module. So in order to understand anything I need to understand everything, and the goal of modularization, of separation, has not been met. This to me is probably way less clear, way more obfuscated. But that does not mean it is bad. I actually completely agree with the Phoenix idea. We've got to get out of this absolutist mindset where things are universally good or universally bad. We have to keep in mind that the Phoenix onboarding documentation and generators are all about the onboarding experience: bringing a person who doesn't know anything about Phoenix to the point where they can do something with Phoenix. So it only makes sense to
make these more lightweight choices if you will you know because imagine if the onboarding material taught phoenix together with ddd event sourcing cqrs hexagonal architecture microservices like who would be able to understand anything out of that and in a sense this is a good example of separation of concerns you know the onboarding documentation the onboarding material concerns itself with the onboarding right as it should you know but then it's up to us the recipients of information to understand it like okay one thing is to you know build something with phoenix it is another thing to produce code which can be further maintained by the team not just by the original author right and so if we just blindly copy paste stuff from the documentation without you know thinking critically about it then we're effectively cargo culting you know so try to always be a little bit more critical about any ideas you're presented with so anyway let's go a little bit deeper so the implementation of the core function one possible implementation again follows the blessed way so to speak as proposed by documentation slash generators so we instantiate the user struct which would be an ecto schema then invoke a changeset function and then we insert this into the database so this is one of the very few places where i strongly disagree with decisions made in phoenix so let me try to argue this basically if i were to translate this into more plain english it would read something like prepare user for insertion for the purpose of registration then insert right and so this is a very long convoluted way of saying insert user right this prepare blah blah blah means nothing to me you know it's just noise it's not something i can relate to it actually leaves more questions than it answers it answers zero questions to me you know and so again you know what i find in practice i always have to read this code 100% of the time together with the changeset
function and vice versa i can never really fully understand the function in isolation so again the goal of separation has not been met you know it's a split of some sort but not a separation of concerns in my mind and so what i would do and what i would suggest starting with is you know i would just stash this stuff together directly in the core code you know there's not a lot of code here it's really best understood if it's put together and we have no other architectural constraints you know forcing us to split it so this is what i would start with and then i would you know split it when i have some reasons when the code grows so for example let's say that we have to send an activation email on successful insertion then i might end up with something like this and now these lower level concerns are naturally emerging driven by the actual requirements and the big point i want to make here is that concerns are best chosen by looking at the actual material we're dealing with not by waving our hands through the air randomly and citing various random principles you know that's going to be very counterproductive in fact and so another example of this line of thinking so going back to the architecture level we have two layers currently interface and core and i'm calling it core because it's not pure business slash domain right so we're dealing with some business slash domain things such as you know use cases flows business or domain level rules and constraints and then we deal with infrastructure as well which means that we have to interact with services that we're using ourselves right to provide our own service so for example database most frequently mailer service notification service payment gateway infini dash you know whatever you decide to use and so surely now this looks like these are definitely two concerns which we should split you know but it really again depends because the thing is that splitting business
from the infrastructure is going to lead to more ceremony you know a similar i would say actually a larger amount than the amount we already introduced in the initial split and so this is your overhead right you're going to go for either a three-layered architecture or dependency injection-based architectures like ports and adapters aka hexagonal or clean architecture or onion you know whatever and this is going to add some amount of complication to the code and so it's got to be worth it right so like in our current system you know obviously the code is way too small but the really important thing is that the business logic is really thin no particular complexity there right and so if we split that kind of code you know essentially the most complicated part to understand about our code is going to be the architecture itself which kind of defeats its purpose you know so more generally i would say that for let's call them agency style projects you know where the bulk of your core is dominated by infra and you have relatively thin you know business behavior i think that this is a perfectly fine approach and this is what i would go for and this is what i would start with right but obviously you know if you're working on more enterprise domains i don't know you're dealing with insurance policies loans mortgages and whatnot you know you have more complexity there then of course the split might be worth it so again you know consider the actual situation you're dealing with you know don't just do something because some thought leader or authority said so you know even if that person is me you know so just think contextually you know think inside of your situation so anyway the last thing i want to talk about are tests right and so this is one place where i think that the battle for clarity is frequently and very rapidly lost and so i want to start by talking about the purpose of tests you know why are we even writing them in the
first place and it feels really strange that i have to say this but you know here goes in my strong opinion tests should be used to test right and if you're considering or if you're using them for something else do consider giving them a different name and i mean this in the nicest the most constructive way possible because a different name clarifies a different purpose and a different purpose means different choices right and so here i'm going to focus on tests for the purpose of testing and what does that even mean you know we're testing the system and the system is all about the behavior right and so what you want to do is test behavior not implementation details this is one of my favorite programming quotes from one of my favorite technical talks by ian cooper tdd where did it all go wrong highly recommended and then in the same fashion as a more in-depth exploration i also recommend this book by vladimir khorikov unit testing principles practices and patterns very good read right and so what does that even mean to test the behavior so roughly speaking you want to focus less on lower level abstractions and more on things which actually make some sense which are concrete and tangible for external clients of your system or external users of the program right and so use cases would be a prime example here for example testing the behavior of registration testing the behavior of login and so on and so forth you know because collectively use cases fully describe the behavior of the system now making exceptions when you have good reasons is perfectly fine you know so like if i had some complex abstraction deep inside some layer i may decide to test it directly for example for performance reasons right because the test is going to run faster and also usually or frequently it may be you know simpler to write the test which tests the abstraction directly rather than going through a bunch of layers you know and it may be
possible for me to trigger some execution path which is otherwise impossible to trigger so if you have good reasons definitely do so but as a guideline not as a rule as a style of thinking my first consideration is always to test as much as possible through use cases right because this is how i'm focusing on behavior do not feel compelled to test each module or class you know directly right so vladimir described this nicely in the book you know the unit we're testing the unit that we're looking for is the unit of behavior not the unit of code right and so it doesn't really matter if that unit of behavior is powered by a single module or by multiple modules you know that's the implementation detail right so for example phoenix view to me is always an implementation detail of the controller and i rarely test it if ever directly thinking about it you know i don't even test the controller directly so what i do instead is i build the conn right i'm using the phoenix test support to build a conn and go through the full pipeline so that single test exercises the endpoint various custom plugs the router the controller the view and to me it is still not an integration test it's a test of a behavior of the public api right so this is the way i see it highly related do not feel compelled to obsessively mock or stub away dependencies of the thing you're testing right so ian and vladimir describe it that in the classical school of tdd when we talk about isolation the thing that has to be isolated or the thing that we want to be isolated most of the time is the test itself not the thing being tested a test is isolated if its execution does not affect the outcome of other tests in the suite right so this is what you're striving for you do not have to isolate the thing you're testing right that being said of course if you do have good reasons again use test doubles so for example mocking away non-deterministic stuff like remote network
dependencies or time perfectly fine you know mocking away global shared state also a good example this is by the way how you isolate the test because multiple tests depending on the global shared state may affect each other's outcome right mocking because you will get some significant performance gain also perfectly fine you know but don't feel compelled to mock everything away right this is a very i would say overly simplistic view and perhaps even the wrong view of what has to be isolated try to avoid directly depending on internals right on internal structures such as socket assigns inside channel tests or live view tests or the database structure inside most tests this is a very frequent example you know these are things which are very prone to change so like i change the database structure adapt the implementation my code is working my program is working correctly and i have 100 tests failing which i have to fix frequently one by one right this is very very depressing and demoralizing and more generally i would say that my feeling is you know the more the tests are coupled to the implementation level details the more prone they are to both false positives and false negatives meaning they are more likely to fail even though the program is correct but what's even worse is they are more likely to pass even though the program is wrong right and this reduces my confidence in tests and confidence is the most important property i want from my tests it's like the very reason why i'm writing them in the first place right and focusing on behavior increases our confidence that the system is working as expected yes tests cannot prove the absence of bugs so i'm not going to get 100% confidence but like most other properties in life confidence is non-binary right including clarity you know clarity is also non-binary so you can have more of it and you can have less of it and in this case more is better focusing on behavior increases that confidence
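As a rough illustration of the point about coupling, here is a minimal sketch of behavior-focused tests. It is not code from the talk: the `Accounts` module, its `register/1` and `login/2` functions, and the error atoms are all hypothetical, and the storage is an in-memory Agent standing in for a database so the file runs with plain `elixir`. The tests only speak in terms of registering and logging in, so they would not need to change if the storage were swapped out.

```elixir
defmodule Accounts do
  # Hypothetical core module, for illustration only. A real core would
  # talk to a database and would never keep plain-text passwords.
  use Agent

  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  # Registers a user, rejecting an email that is already taken.
  def register(%{email: email, password: password}) do
    Agent.get_and_update(__MODULE__, fn users ->
      if Map.has_key?(users, email) do
        {{:error, :email_taken}, users}
      else
        {:ok, Map.put(users, email, password)}
      end
    end)
  end

  # Succeeds only for a previously registered email/password pair.
  def login(email, password) do
    Agent.get(__MODULE__, fn users ->
      case Map.fetch(users, email) do
        {:ok, ^password} -> :ok
        _ -> {:error, :invalid_credentials}
      end
    end)
  end
end

ExUnit.start()

defmodule AccountsTest do
  use ExUnit.Case

  # Each test gets a fresh store, so one test cannot affect the outcome
  # of another. The thing under test itself is not mocked or isolated.
  setup do
    start_supervised!(Accounts)
    :ok
  end

  test "a registered user can log in" do
    assert :ok = Accounts.register(%{email: "a@example.com", password: "secret"})
    assert :ok = Accounts.login("a@example.com", "secret")
  end

  test "an email can be registered only once" do
    assert :ok = Accounts.register(%{email: "a@example.com", password: "secret"})

    assert {:error, :email_taken} =
             Accounts.register(%{email: "a@example.com", password: "other"})
  end
end
```

Neither test inspects the Agent state directly. If the map were replaced by a different structure, or by a real database, the assertions would stay as they are, which is the property described above.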
this style of testing also improves clarity of the code first and foremost because we're going to use significantly fewer test doubles and these things complicate the code the production code and the test code but more importantly tests which focus on behavior go hand in hand with the practice of refactoring because refactoring by definition means changing the internals without changing the behavior and so if you've done everything right in the refactoring session your tests are passing if you've done something wrong some tests will fail which is precisely what you want and of course refactoring is a very important practice which allows us to keep our code in sync or clear enough with respect to the ever-changing world you know as driven by the constant stream of requirements coming our way test code is code of course it is code you know test code is code which is maintained and managed by the team right and so it should be held to the same level of standards with respect to clarity as the production code and i would say that we even maybe want to go one or two steps further because it's annoying enough when some tests are failing it becomes downright depressing when we have to spend some considerable amount of time trying to even figure out the purpose of the test and like is this even a bug in the program or should i fix the test you know so try to avoid that clarify your tests as much as possible so a couple of short examples so here i'm testing the behavior of the public api registration operation happy path so i build the valid input and then i use phoenix test support to build a conn make a request get the response decode it and assert something about it four lines of code you know pretty short and sweet i would say if you're familiar with phoenix and testing you know it should probably be good enough for you you know but there is already some amount of mechanical noise here and this is going to
add up the more tests i write and the more i have to deal with a larger number of parameters and so what i want to do is i want to aggressively push this stuff deeper away from the test and i'm going to end up with something like this so i'm going to introduce this register helper it starts its life as a private helper function in the test case module and then eventually it might make its way to be a part of some public api of a test support module and so this register function deals with all the mechanics right so it builds a conn issues a request decodes the response and as an added bonus it's also going to atomize the keys you know something i can safely do here because i'm in the test code and this reduces the noise from the test and so with that helper in place i end up with this piece of code which communicates exactly that which is relevant for the purpose of this test nothing more nothing less and this to me means clarity another example so testing that a user cannot register with an email of an already registered user so we need a registered user in the database first and i'm going to create this user by going through the api public api is fine core api is fine try to avoid as much as possible inserting directly into the database because again this will couple you to implementation details and what's even worse is you're going to end up inserting data which is actually invalid according to various business rules and once it starts failing which it will i wish you a lot of luck because you're gonna need it right so try to avoid it as much as possible so here i'm going through the public api simply because it's more convenient i just built a helper function for that although here i had to build another small wrapper around it so register with the bang which raises if the operation doesn't succeed so this is my precondition here i'm not testing this first registration right i'm assuming that it's going
to work most notably do note that i'm only passing the email parameter right and so i typically build these helpers to provide default valid values using uniqueness where required you know and this allows me to remove a huge amount of noise from this test you know like here it's not so dramatic because we're dealing with just two parameters but more realistically we're gonna deal with like five ten twenty parameters and if i had to specify all of those here in this test you know it's a lot of noise distracting noise it's very hard to figure out what is actually relevant for this test and email is the only thing that we care about in this particular test and so with all that in place we end up with a nice bdd like given when then test so given a user registered with some email when another user tries to register with that same email then the operation will fail with the following error and in the same fashion testing the login operation so given a user registered with some email password combo when the user tries to log in with the same email password combo then the operation will succeed and this is an example of how i'm decoupling myself from implementation details you know so this test exercises the storage mechanism the database right so register is going to store something into the database and the login is going to read something from the database you know but i don't care about it in the test you know so this test exercises the system using the core vocabulary the abstract core vocabulary a registered user can log in not like if i insert a user record in this particular shape and invoke the login function with those parameters then this will be successful like what does that even mean you know it's way more confusing to me and it actually sparks less confidence if you will right so in summary source code is a frequent form of communication between the team members and if you want to make the
communication more efficient you know clarity is the property you're looking for dedicate yourself to clarity focus on clarity this is what you're after not satisfying some random set of principles or winning the architectural award of the year you know all interesting goals probably but that's not what we're after we're after clarity because it means progress right practice code reviews focused on clarity and collaborate in those code reviews you know help each other work with each other to make this process efficient and you know as effortless as possible organize your code into pieces which can be understood in isolation from the rest this is separation of concerns done right not every split of the code is separation of concerns no matter how people are you know trying to frame it or spin it right test behavior not implementation level details and then as a sort of underlying theme of this entire talk you know clarity is not something you can reach by just following some set of rules mechanically you know it's a more creative process not in the artistic sense but in the sense that you have to consider the actual situation you're dealing with and so try to avoid any absolutist dogmatic reasoning you know think more contextually think more critically and i think that you should be fine thank you very much [Applause] [Music] [Applause] all right thank you very much we have some q&a okay mic's not working um and so let's see here it may be easiest if folks who want to ask questions here come up and queue up here to ask questions and while you're doing that i have some online questions so barbara help me out here we've got the first question and that's sergey is online i think if you unmute you can ask your question and we can all hear it nope okay so it's not going to work we tried it and it worked but i'll read sergey's question on what level should clarity be implemented variable names function names or function doc blocks what was
the last one the function it says doc blocks function i think even use function blocks ah function blocks well i mean ideally it should be implemented on all levels right but obviously we're gonna gain more at the higher levels of abstraction so this is what i would focus on first you know so like separating interface and the core for example already brings us a lot of useful stuff and when we're talking about low level implementation you know you definitely want to have good names a frequent misconception which i also had myself is that you know we necessarily have to split things into functions you know this can actually also add some problems because we have to jump back and forth so adding a good meaningful variable name instead of extracting something to a function can actually also help a lot and so i don't know if that actually answers the question but no i don't know i appreciate that answer though you wanna try again or do we wanna okay so let's do the next question and that's by d hutch so if you're online you should be able to unmute and ask your question all right i'll ask it if unit tests are to be considered as units of behavior does this not imply that only happy path explicit use case functionality should be tested um does this imply that only happy path use case functionality should be tested well i would say no you know so it depends on what a happy path is you know so for example to me let's talk about registration like if i provide an email of an already registered user so this is obviously rejected to me this is actually a happy path in the sense that the program successfully rejected the business error right and so this is definitely something that is part of the behavior and probably should be written inside the use case as well and so this is what i would test if there is an exception because you know we have a bug or maybe some input is provided which
we don't even support then this is not a part of the behavior and this is not something that you should want to test explicitly you may want to if it gives you more peace of mind but i most certainly wouldn't or as a guideline i wouldn't really go for testing stuff which i don't even support which is not part of the use case but business kind of errors or any other sort of errors that you know we actually plan for and somehow write up in our use cases are certainly part of the things we should test yeah all right thank you very much okay next question this one's got 14 thumbs up let's say you dive into a code base not following most of the tips you shared here what should be some of the first changes you'd suggest that you feel bring the best bang for the buck for unclear code bases no one's ever dealt with unclear code bases right this one i can see why it has so many thumbs up right probably many people can relate to this well i guess that i would first start with i mean defining some practices around you know something i actually wanted to talk about but obviously i already crammed too much stuff into this talk so you want to set up practices of code review and practices for example automated checks for all the low-level stuff like formatting or style guide enforcing through a linter in this case it would be credo right and so those kind of things would be the first thing you know this is not gonna fix the problem at all you know but it's at least gonna remove some mechanical noise and it's like a low hanging fruit which we can do other than that you know we would have to establish some sort of like the way we want to go forward what are we striving for and then gradually try to move this thing towards that you know what i definitely do not recommend as the first consideration is a big rewrite right so people usually you know think that this is the easiest way
but ultimately on many occasions i have heard you know stories that they end up with just how is it called the second system effect right where basically you know you repeat the mistakes of the first or make some different but same kind of mistakes as the first version right so try to do it incrementally through the practice of refactoring when you're adding new code try to follow these new ideas that you want to enforce and then gradually you should be able to at least to some extent you know clarify it all right thank you another question here what strategies do you suggest to minimize the complexity of split testing your behavior how would you go about testing this behavior of split testing what strategies would you suggest to minimize the complexity of split testing your behavior of split testing the behavior um i don't know what that means does anyone have a clue here like ah like a b testing oh wow that's something that i would have to consider before answering or think about it for a while yeah so this one is really interesting i mean at some point when you do have like two things right two different versions and you support both both are part of the behavior so definitely i would say that you want to test one and the other right as well so how is this done operationally i must confess that i never did it so it can be a little bit tricky but you can most certainly do it right and this is what you want to strive for because it is after all part of the behavior for some people it works like this and for others like that right okay thank you um are there any questions here from the live audience yeah why don't you come up here and we'll ask the questions so you can stand here and i'll hold the mic so you don't have to touch it so saša thank you for your talk a question about you were talking about git and linear history what other histories do we have i'm sorry what other histories do we have in git oh the opposite of linear
history is like if i'm solving some problem let's say that it's a more complex problem so like you know adding a new field to the form or something more involved so you know i move along for a little bit and then i figure out oh this is actually not right and so i'm going to try to move somewhere like more horizontally and then you know i'm going to try there and then i figure out oh actually the first idea was good so i'm not gonna return there but i'm gonna just you know continue on the journey along the way and it like goes you know in a long-winded way you know sort of yeah well i try to do as much as possible as i move along when i'm making some change i try to think like can i actually amend the history immediately so that it's always as linear as possible you know but you know some problems are more difficult and you ultimately cannot focus on all of that at the same time and then you know i'm just gonna do a bunch of you know commits which make no sense you know not even to me let alone to anyone else and then what i'm gonna do is either try to you know somehow squash some of those commits or when this is not possible then i'm just gonna recreate this change again you know but i'm not gonna like type it again i have the working version of the final solution and then i'm just gonna you know try to reapply it more incrementally in a branch you know more linearly if you will does that make sense and adding to this point i am a strong proponent then of explicit merge commits because it really keeps you know the documentation like these fine grained commits actually explain better the way the thing was implemented while the single pull request explains why we had this feature like the bigger context you know next question next question i'll hold the mic okay cool hi saša thanks for the talk it's clarified a lot pun intended so i wanted to say something
and ask something first thing is about the split between the interface and the core that you presented i think that's a very valid idea we saw that work pretty well when we actually started to add a second interface so we've actually started to add like a grpc interface to some of our services and some of the logic underneath had to be the same or similar for http and for grpc and grpc is highly typed so it was good to kind of set the types above the core so that worked well for us what i wanted to ask is you mentioned these cases where somebody sends you a huge pull request and i'm sure everybody can relate like at some point in their life they got like 2000 lines of code to review and i'm probably guilty of creating some of those myself so my question is do you have i don't know a thought experiment or a mental model that somebody can follow to actually split these pull requests into smaller pieces because sometimes i got the feedback from the person that created the pull request that they you know they just kept adding and they didn't know where the boundaries are so do you have anything like a thought experiment um well other than you know some gut feeling i don't really you know when i program myself you know sometimes you can decide up front like okay this is clearly going to be a huge change and so let's try to organize it into some smaller pieces but it actually more frequently happens to me that you know i'm just coding and at some point like okay this is getting out of hand and this is the point where i'm just gonna try to stop and see what could i wrap up which actually makes sense at this point and submit for review you know and you know in that pull request i'm going to say okay this is the first of n pull requests implementing this feature where possible link to the task and then
i try to explain the further strategy at a higher level you know but let's start with this but other than that i don't have any particular mental model or something organized it's more like you know as i said creative you know you have to take a look at the stuff speaking about your comment it's a very good comment and this is something that i would actually advise to people like when you want to consider how to separate interface from the core imagine that you have a second interface you know we also had these situations at the previous company i used to work for we had to support the regular rest together with a postgresql interface so we pretended to be a postgresql database you know and so that was quite interesting and this revealed quite a lot you know so just imagine that you're dealing with a second interface and this kind of helps clarify what is actually the role of the interface versus what is the role of the core you know so for example if you deal with rest and graphql graphql does normalization itself you know so you define the schema and then you will have absinthe doing this normalization stuff right so that's a very good comment thank you hi thanks for the talk um can you go back to the slide where you test registration with an already used email i think a second i cannot currently or okay okay so let me in the meantime talk and ask the question so we try to follow that kind of similar approach to testing to make the tests more clear now imagine in the application that oh that one yeah perfect so imagine that in this system the user can then like create a post and then they can even comment on their own post and then they can like some other person's posts and whatnot meaning like the model is kind of pretty complex right so and they're also gonna test like a behavior that says the user can unlike
um some other person's post right so to make the test more clear the approach that we're taking is like we would say at the top of the test like hey give me a comment um right and what would be your suggestion for a situation where the model is complex would you have like an explicit setup where you would say like register the user create a post comment and whatnot i would do exactly what you would do right because now these low level operations start to become your assembly for the stuff you want you know you should always clarify what it is that you want but definitely not like do this then do that then do that and then like it goes on and on and on and you have like 20 30 lines of code before you actually have set up your like what is it called the arrange phase in the aaa pattern right so i would do precisely what you said okay cool and did you consider like a more declarative approach versus this imperative like i would say like i want to have a comment instead of like create a comment do you think there's like any difference there yeah the thing is like how would you describe it in plain english to another person you know would you go through all these steps or would you just say you know i need this kind of comment and then i want to check something with that prerequisite in mind and if that's the answer then this is how i would do it right if that answers the question yeah thanks okay so the last question thanks for the talk saša i also have a question about testing i think the idea of testing the use case is really interesting i think in the example of the user test in a lot of the tests we actually try to query the database and see if the user is actually created and do you have confidence in the test when you don't actually test that would you say that maybe there's still a success response but the user is not actually created okay
thank you for asking it, because I was thinking about adding the discussion about that to the actual presentation. So, just to clarify the question: actually, maybe I can still move here. We have this test, right, and I'm not actually checking that something is stored in the database. It's kind of implicitly checked. So this test could easily be defeated. For example, if, I don't know, the process dictionary is used, or an ETS table, the test will be passing, but we're not actually storing anything in the database. But I'm not in the business of defeating tests, so there's that. Otherwise, other practices such as code reviews should be able to establish that. And finally, this is not the only testing you should do. You have end-to-end tests and various other quality assurance tests, which should verify those kinds of things. But I would not bother with actually checking that something is stored in the database. If there's an integration test, it should definitely find the user in the database, because there's a higher-level integration test that would actually do the check. Some sort of acceptance test, which is done on the real thing, on the deployed thing, should actually find it. But again, I would not directly check it, like have a database query inside any sort of test, because this is an implementation detail to me.

Sure, thank you.

Hi, thanks, oh, sorry. Thanks for the talk. I wonder if you have some strategies or patterns on how to introduce the clarity concepts that you have shown us to your development team.

I'm sorry, on how to...?

On how to introduce and encourage the development team to use the concepts of clarity.

You may want to start with some high-level presentation, such as this, probably delivered over double the amount of time, and just
explain this idea. Everyone on the team should agree, right? This is one important thing: if we cannot reach consensus on roughly what we want from our code, then we're going to be a pretty bad team. So we have to find some common ground about what kind of things we want. You can do some high-level discussion about it, maybe with some examples, so everyone is on board. As for where to go from there, I would start with code reviews. This is, to me, the key practice; everything comes out of that. I see so many rants about code reviews. I don't know what the reasons are, actually, but what I can say from my experience is that I've been doing them for the past seven years at work. Everything was reviewed, and it's one of my favorite practices. I've had predominantly great experiences. So if you do these collaborative-style reviews, where everyone tries to work with each other, this is the practice which enables everything else. From there, you're going to see organic problems rather than discussing something abstract. You're actually going to see the problems you're dealing with, and then you're going to try to see: okay, how could we clarify this? The author and the reader may want to work together, of course, and pair, especially initially, because this process is going to be a little bit bumpy at first, until it clicks. I don't have any hard measures or anything, but from my experience I think that you're not even slower because of that. When you have well-organized pull requests, a typical review will take 10 to 15 minutes. There will of course be some exceptions, when something is more involved, but most of the time you're just reading it almost like a book: next, next, next, oh, I have this small remark, but it mostly
looks good to me. So it doesn't really take up much time, and then it starts paying off, because people revisit this code and they are way more confident about working with it. They spend way less time fighting with the code. So I would start with code reviews, maybe some automated checks as I mentioned earlier, the formatter, the linter, Credo, a style guide, and then go from there to see where it takes me.

Okay, thanks very much.

All right, so we've got time for just, I see three, and we're, okay, we're pushing the time limit. All right.

Hello. First of all, it's very exciting to see you in person. I usually watch you on YouTube, so it's cool. I have a question about the second-system effect. You mentioned that it's very common, so how would you suggest avoiding most of the mistakes, or the main mistakes, in that sense? Let's say I know a company that already went this way, the second system, and I would just like to know how to avoid the mistakes.

Well, I guess the most obvious way to avoid the second-system effect is if you don't write it, right?

It's too late already.

Well, I don't know. I understand the impulse to rewrite everything. I've been developing professionally, as a full-time professional, for 20 years, and I worked on some pretty disturbed legacy code bases, and unfortunately produced some of those myself. I can understand the sentiment of "now we're going to write it, and now we're going to do better". And there probably are situations where it's really to the point where the original stuff doesn't make sense at all. I could maybe even say that one from my past actually was like that. But most of the time, I feel, why I mentioned
that thing is that people are getting too eager to rewrite. Like, "oh, this is getting a little bit confusing, let's rewrite everything". So that comment really went in the direction of: don't go for rewrites, go for gradual refactors. But it seems I didn't answer the question.

Yeah, I mean, we already went the way of the second system, so I would like to know if there are common pitfalls in there, or something well known which we might not know.

Yeah, I can't really make any comment that would have any foundation, because I didn't rewrite anything so far.

So, smart enough.

Yeah.

Okay, thank you.

Okay. Hi. I'd like to say that it's also super cool to see you in person, and I actually got interested in Elixir because of you, among others, because of your presentation "The Soul of Erlang and Elixir".

Okay, thank you. Happy to hear that.

Yeah. And my question was: you mentioned that when you are wondering what level of tests to select, higher or lower, you want to make it the same way you talk to someone, so you want to have the same level of description. So have you got any experience with actually creating domain-specific languages, so they fit the problem better? Have you ever used that, and have you got any opinion on the tool of extending the language that you are writing in?

You mean, like, writing... oh, you mean writing DSLs for those tests?

Yeah.

I don't want to say something publicly that will come off the wrong way. I personally don't use that as much. For me, when we talk about these tests, they are made for programmers, and why would I use any other language than the language I'm using for everything else? It's the most expressive language, and with some lightweight abstractions such as this, some functional wrappers, you can get close to the domain-specific
language. Actually, if I remember correctly, it was a long time ago, Martin Fowler had an article about DSLs, and I think that he called those things internal DSLs, in the sense that you're just defining some abstractions inside the language rather than defining an entire language. A DSL would make sense if you want to involve non-programmers in that. This is something I mostly didn't do, and I don't have any experience with it, so I don't want to make any comments. This is not a tool to use lightly. What would you use a DSL for, other than for non-programmers? Why would you want to invent a DSL?

So, for a problem that is... well, okay, I didn't have any particular problem.

Okay. I mean, we can discuss it later, but a DSL adds a lot of overhead itself. It's going to be a weakly supported language compared to, say, Elixir, with less support and tooling and abstractions and whatnot, so you've got to have good reasons to do this.

Yeah, makes sense, thanks.

Okay, you can find me later; after the talk you can ask.

So we're out of time. Thank you, everyone, for your questions. Let's give Saša one more big round of applause.

Thank you.

[Applause]
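As a rough illustration of the test-setup advice from the Q&A (say what you need rather than listing every step, and assert through the public interface rather than by querying storage), here is a minimal sketch. It is written in Python, whereas the talk's own examples are in Elixir, and nothing in it is taken from the talk: the `Blog` class, its methods, the `comment_on_someone_elses_post` helper, and the "only the commenter may delete a comment" rule are all hypothetical.

```python
import unittest


class Blog:
    """Hypothetical in-memory stand-in for the system under test."""

    def __init__(self):
        self._users = set()
        self._posts = {}     # post_id -> author
        self._comments = {}  # comment_id -> (post_id, author, text)
        self._next_id = 1

    def _new_id(self):
        new_id = self._next_id
        self._next_id += 1
        return new_id

    def register_user(self, name):
        self._users.add(name)
        return name

    def create_post(self, author, title):
        post_id = self._new_id()
        self._posts[post_id] = author
        return post_id

    def add_comment(self, post_id, author, text):
        comment_id = self._new_id()
        self._comments[comment_id] = (post_id, author, text)
        return comment_id

    def delete_comment(self, comment_id, requested_by):
        # Assumed rule, for illustration only: just the commenter may delete.
        _, author, _ = self._comments[comment_id]
        if requested_by != author:
            return False
        del self._comments[comment_id]
        return True

    def comments_for(self, post_id):
        return [text for (pid, _, text) in self._comments.values() if pid == post_id]


def comment_on_someone_elses_post(blog):
    """Arrange helper: 'give me a comment on another person's post'."""
    post_author = blog.register_user("alice")
    commenter = blog.register_user("bob")
    post_id = blog.create_post(post_author, "Hello")
    comment_id = blog.add_comment(post_id, commenter, "Nice post")
    return {
        "post_author": post_author,
        "commenter": commenter,
        "post_id": post_id,
        "comment_id": comment_id,
    }


class CommentTest(unittest.TestCase):
    def test_post_author_cannot_delete_someone_elses_comment(self):
        blog = Blog()
        ctx = comment_on_someone_elses_post(blog)  # arrange, stated as intent

        deleted = blog.delete_comment(
            ctx["comment_id"], requested_by=ctx["post_author"]
        )  # act

        # Assert through the public interface instead of inspecting
        # blog._comments, which is an implementation detail.
        self.assertFalse(deleted)
        self.assertEqual(blog.comments_for(ctx["post_id"]), ["Nice post"])
```

Saved as a module, this can be run with `python -m unittest`. The low-level steps still exist, but they live in the helper, so the test body reads closer to how one would describe the scenario to another person.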
Info
Channel: Code Sync
Views: 1,120
Rating: 5 out of 5
Id: 6sNmJtoKDCo
Length: 68min 35sec (4115 seconds)
Published: Mon Sep 20 2021