Fred Hebert - Operable Erlang and Elixir | Code BEAM SF 19

Captions
So I'm going to talk to you about operable Erlang and Elixir. There's going to be some content in there that overlaps a bit with what Maxim has just presented, but I'm going to be acting at a bit of a higher level.

One of the things we are always told to do when we want something that's easy to debug is to keep it simple, right? The idea is that if we do something complex, something that requires us to be clever, it's going to be super hard to debug, so we have to keep it simple. The problem with that is that some complexity cannot be avoided. This might be a fine system if you're running a web store that receives 50 requests a week or something like that. But if I'm telling you that I have a web store that receives thousands and thousands of orders per hour, that a team of 40 people base their livelihood on the system, and that this is my architecture, you're either not going to believe me or you're going to consider me entirely irresponsible. So there is a requirement that as a system gets used by more people, it requires more complexity. There's no way that WhatsApp is just one big machine with a label saying "web server" on top of it. So you do have more complexity, and if you get into redundant regions, then you have to care about stuff like DNS being done differently, load balancers, and this is still a simplified diagram, but it gets to be a lot more complex. When data doesn't fit in one big database, you need to start sharding it, splitting it, having replication in all kinds of places, and it starts being different and harder. The behavior of the big system is not the same as the behavior of the small system, and we cannot keep the small system forever; complexity is mandatory for some kinds of loads.

So I'm going to use quotes from The Systems Bible by John Gall here and there. It's a fantastic book, full of good but sarcastically presented concepts. The first one: a simple system may or may not
work. There is never a guarantee that a system works at all, but a simple one might work. A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be made to work; you have to start over, beginning with a simple working system. Essentially this is also connected to the second-system syndrome: if you have a simple system and you decide to make a more complex one on the side, you have to do it part by part. You can't just ship the big thing and hope it works the first time around; we're just not good enough to do that. And a large system produced by expanding the dimensions of a smaller system does not behave like the small system. As I've mentioned, doing a reporting query on my web store here might just be a big SELECT statement in SQL and I get all the data I need; that might no longer work in another data infrastructure. But the behavior that changes is not just which features are possible or not, it's also the way the systems fail. The failures of the small system are not going to be the same as the failures of the large system, and so we need to have trained operators who are able to deal with the increasing complexity.

So we might want to get back to our simple system; that's going to be fine for what we have. Let's say there is a bug in the system. How do I fix it? Well, the first thing I need is to be aware of whether the system is healthy or not. There might be a bug, but if nobody's looking at the system and nobody is using it, I don't know there is a bug. So we have the concept of monitoring, and monitoring is essentially asking the question: how are you doing? You're just asking, you know, how are you doing, are you feeling good today, and it tells you yes or no. You have a little input, which is going to be an HTTP request or something like that; it's just asking for the status, and it tells you yes or no. Essentially you're taking the temperature of the system, like a pet's. If the node tells
you it's sick, that it's not feeling well, you don't know why. That's why you have observability, which essentially is asking the question: what are you doing? You've got a high temperature; why is that? Did you just exercise, are you sick, do you have the flu, something like that? Observability is about making the diagnosis, and the term comes from control theory, from a bunch of engineers, like, real engineers working with real hardware, and yeah, software is not real engineering [laughter]; that's true, it's not real engineering. You have, for example, speed control or a flight stabilizer, and observability tells you that by looking only at the outputs of the system, you are able to infer the internal state the system had. We visibly cannot do this in software, because sometimes all you get is a 502 Bad Gateway and nobody knows why that is. So what we do is cheat: we pretend we are real engineers, we just add more output, and then we pretend that we can reverse it. That's software observability: we just throw out more output, even though the users don't see it.

So let's say I have that. Then my system is no longer the system I'm operating; I need more components to be able to deal with it, and so I grow the definition of my system. I have a bunch of green components that are related to getting all these metrics: the monitoring, the logging, that kind of stuff. I have to have a way to put code on there, otherwise there's nothing to fix, right? It's not burnt into persistent memory, never to be changed. And if these components have access to the servers, I also have to care about the code, and then the code has these yellow components that have to do with development. But that is still an incomplete view of my system. My system has people writing the code, has people understanding all the metrics and all that stuff, and so this is a more complete vision of what the system is. There's a bunch of people in there, and they're doing the important stuff, and not
everyone has the same perspective and the same point of view of what is going on. The little dude on the top left is not necessarily going to have the same vision as the three developers, and everyone still has an impact on the system. The little person on the top right is not touching any of the components, but they might still have a disastrous impact on the system, or a marvelous one. They might be a salesperson who sells a feature without telling anyone; they might be a CEO who just changes the budgets; and they have a huge impact on the system. And there's a big amorphous blob of communication channels, which turns out to be super important. So if I want to make a system operable, I have to think of the system not just as the components, but together with the interactions it has with the people in there.

How we debug a system has to do with the point of view we have on the system. The person on the top left is not going to debug it the same way the three people developing the code are going to do it, and with any kind of luck, the person on the top right is not going to debug it at all. The way we work is that we have a kind of map, a conception of what is in the system. This is not the City of London; this is a map of the City of London. You don't see the people, you don't know which restaurants are good or not, but it is a very detailed map. It has streets, it has train stations, it has piers, it has the rivers, it has colors for buildings, all kinds of information. This I would compare to the source code. The source code is not what is running in production, where you have specific artifacts and specific environments, but it is extremely detailed, and pretty much nobody knows it by heart. And by the way, source code also includes the operating system. We are taking on the responsibility for everything we ship: if there is a bug in Linux or something like that, it's certainly not the marketing folks who are going to debug it; it's going to be the
developers. Anyway, this is the kind of understanding that we have, or that we wish we had. In fact, the kind of mental map and understanding we have of the system is much closer to this, and if you look at the diagrams we had, when we describe systems, this is the kind of stuff we do. It's not like the detailed map; it's more like this, right? This is a tourist map of London. It is entirely inaccurate: all the scales are wrong, and I'm pretty sure the two people standing in the bottom right are not at the right scale; they wouldn't fit Big Ben, and Big Ben is not that big. So that understanding is kind of all wrong; this map is not correct. But if you're a tourist in London for the first time, it is far more useful as a map than the other one. If I'm looking for Saint Paul's Cathedral, and I'm standing near the other tourist attractions, because I'm a tourist, why would I be anywhere else, then it's easier to find my way with this than to hand somebody the detailed map and tell them, figure it out, jerk, right? So the usefulness of a map, much like the usefulness of a mental model of how the system works, is directly proportional to what we can predict from that model. The model doesn't have to be good, it doesn't have to be accurate, it doesn't have to be recent, as long as the decisions you make with it are good. So if you are in an emergency situation because there's an incident, and you have this crappy map, you might make better decisions than someone who has the complete view but is unable to navigate it properly.

And mental maps don't have to be right, as I said, but they do self-correct, though only once they are no longer right. There's a kind of philosophical idea that says something does not exist until it breaks, and then you notice it exists. Mental models work the same way: as long as your wrong mental model predicts everything fine, you're going to be good, right? If I have this water-cooled machine that works at home
all the time, I'm going to be happy with it. I turn it off, I go to sleep. I'm in Quebec; it goes below the freezing point, and the machine bursts while it's not even running. I get up the next morning and suddenly my world changes: temperature was not something I considered, but all of a sudden the operating temperature of the world can break the machine even when I'm not using it. The mental models we build are corrected the same way; they remain partial and wrong until we can no longer go on that way.

And to drive the point home a bit more, this is another map of the City of London. It has no buildings, it has no streets, it has barely any information whatsoever; it kind of relies on global context that you should already have. But this map lets you go further than any of the other ones: it's a Tube map of London, and it turns out to be excellent for navigation. So when we get back to our systems and the mental models, this is how we work. The little person on the top left, if they are only operating from the logs and the metrics, probably has a mental model based on data flows: which servers or which services communicate with which other components. The people working on the code, on the right, probably have an entirely different vision, and when there's an incident they will approach it in an entirely different way. They might go and say: oh, that's a component that Fred is maintaining; I don't trust Fred, Fred is a jerk, that's probably where the bug is. Which is not something the dude on the top left would ever think about; they're only looking at the numbers, they are using the subway map, and the people on the right are using entirely different circles of influence. But both might lead to solving some bugs, differently, in other ways, and this variety of perspectives is what we need to have in an organization to be able to debug things more efficiently.

So, knowing that this is a very soft topic, what can we do? Well, the thing that we
do as developers is usually just fix the bug of yesterday and then treat visibility as a solved problem. So if this is my application, there are only walls, no windows; it's a black box. Every time there is a bug, the thing we do is say: oh, that was a tricky one, I'm going to put a log line there so next time there's no problem. Or it's error-handling code with a TODO in it, so I'm going to put the logging there, telling myself I'll fix this later. And then when we run it, it goes a bit like this: you just put more and more windows in your application, and you hope to find the tricky bugs. But you will never find tricky bugs that way, because tricky bugs, by definition, are bugs that you did not see coming, and so they will be behind the red Xs. All we have done is add visibility haphazardly to our application, and we have created the software engineer's house: it has great observability, you can see everything that's inside, and it's just a nightmare.

So that's what happens, and what you do after that is say: well, this is garbage, I don't like it, I'm going to make something where I can see everything all the time. And so this is a glass house, and yeah, I like that one. The big problem with the glass house is that it's not helpful either, right? If, from my point of view inside my application, I just expose all the operations that happen all the time, it's not a big help. I've worked with a team in the past that wanted to log everything that went on in the network, and so they added a logging call around every call to the HTTP or TCP libraries. Of course that was too much volume, so they put it behind an optional switch that would log everything to disk. So what happened is that when they wanted to debug what was going on, they had to go onto the production machine and turn on the debug logging for all the network probes, and they would get the full trace there. It's a stupid idea, and the reason for that is
that with this kind of glass house approach, you need to already know what you're looking for to make sense of it; there's too much noise. It's a completionist view: if you don't have a perfect, or at least a very good, mental model of everything, you don't know what to look for or where to find it. And if you do have a very good mental model, and you know what to look for and how to find it, you're not going to use it either, right? If you need network information, ask anyone who knows about networks: they're going to use tcpdump or Wireshark; they're not going to use the stupid switch that logs the network stuff. So really, that's what happens with a glass house model, where everything in the application tries to provide visibility into everything else: it's just not going to work well at all. If you don't have a good model, it's useless, you cannot form one in your mind, there's too much information; and if you know what you're looking for, you're not using it either. So all you have is a bunch of information nobody cares about.

Instead, an interesting approach is to model ourselves a bit on what operating systems do. Operating systems, especially Linux or the BSDs, tend to give you a lot of probes at a low level. You look at the operating system and it gives you stuff that has to do with network drivers, I/O of all kinds, disk, memory, all that kind of stuff. And you will notice, if you use them, that none of them necessarily require you to know how the code inside the operating system is written. What they provide is a good clue about how you interact with the operating system. That's kind of the key here: what you want to provide, when you provide debugging information, is information about the interactions. You want to provide a kind of story, where the users of your components have their own idea of how it works, and you don't know what it is,
but they're able to figure things out and get information without understanding what's underneath. So if we put everything in the application, and that's roughly the glass house or the software engineer's house, we are taking on the responsibility, at the application layer, of providing visibility into the framework, the libraries, the standard library, the language, the operating system, everything. And that's a huge task that nobody can do really, really well. What we should do instead is use a layered approach, and it was nice to see that in Maxim's presentation, because they had that layered approach of probes at every level they were monitoring. So: if I'm using a programming language and I want to debug language-level stuff, I use a programming language feature; I don't debug that from my own probes. If I have an application, I should not be providing logs to debug the application from within the application, because if there's a problem with the application, I shouldn't be able to trust its logs; it's buggy. If I'm using the application's logs, it's because I'm trying to understand how I am using the application. If you're using a web server like nginx, nothing asks you to understand the internals of nginx. It tells you about the configuration you have set, it tells you about the requests it has received, it tells you about the data you care about, but none of the logs you use have anything to do with the internals of the tool. So the logs we emit from within our application should be entirely limited to the interactions of the user or the operator with the application itself. If I want to debug the application itself, it follows that I should not be using the logs of the application; I should be putting them a layer below. So if you're using a web framework, for example, a lot of them have a concept of middlewares, or plugs if you're using Phoenix or something like
that. What's interesting with these is that if the framework itself has logs or probes or metrics for these middlewares, there is going to be a single probe point, maintained once, rather than one you put in the application. It's not hard to maintain, and if a user uses 50 different middlewares, they get the logging for them for free; whereas if I try to do it in the application, I have to put in 50 probe points myself, and that's garbage, and it gets to be ugly and hard to maintain and very difficult. So that layering is really critical, and if you care about your operators, you want to use a stack that is friendly to that kind of layered approach, where different operators with different levels of expertise get different levels at which to figure out what's happening. We're playing with a concept here that is similar to operator experience. We have user experience, where we try to make an intuitive application: without documentation you're able to figure it out, with different levels for expert users, beginner users, that kind of thing. Nobody really gives a damn about the operators, but we should be doing the same thing for them, and that layering, I think, is the key to doing it.

To get back to The Systems Bible: the crucial variables are discovered by accident. You don't know what you don't know until it blows up in your face; that's what updates a model, that's what makes you fix things. When everything correlates with everything, things will never settle down, and they do that through side effects: if you're trying to tweak one knob but then everything moves forever, you don't know whether you're having any effect. The same is kind of true with the glass house approach to logging and metrics: if all you get is a white wall of noise made of all the metrics of everything, you have no way to build a model out of it; it's too complex. A system is no better than its sensory organs, meaning that if you cannot see something happening, it's not happening, until it
breaks something else, and then you can infer the state; but if it's not visible, it's not there. And the meaning of a communication is the behavior that results. I love that quote. Essentially, if you're sending information somewhere and nobody uses it, nobody does anything about it, it is meaningless. It has meaning if it is actionable, if you can do something about it. Alerting works that way; logging should work the same way; metrics should work the same way. There is nothing worse for an operator than trying to figure out why the system is behaving weirdly when all you have is a dashboard with 50 different metrics whose meaning you don't know, but which all correlate with each other randomly.

So one reflex we might have is to say: well, what the hell, the best way to please my operators is to not have operators; I'm just going to automate everything. So here I am the agent, and I might be looking at my system. Here I have a database with a single master. If the database goes down, this is what happens: I look at it and say, oh, the master is down; I switch the configuration so that follower one is the new master, follower two now replicates from follower one, I update the configuration, and everything works. So it might be interesting to say: I'm just going to automate that, and then my operators don't need to think about it anymore. The problem is that there's a thing called the law of requisite variety, and what it says, essentially, is that any action in a system can only be undone or corrected by something that has the same flexibility, to counteract that action; if you don't have that, you cannot fix anything. The other thing it tells you is that whatever you are able to model of your system is the limit of the control you can have over it. This is very interesting, because it's basically an upper bound on what you can automate and what you cannot. If I get back to the databases, and I replace myself as
an agent, putting in my place something that's a piece of software, I get different scenarios that are a bit fun, right? It's possible that when I get the call, I look at the website, see that everything is down, and then I can know for sure, because I'm outside of the data center, that yes, the master appears to be down. If I do it with an agent that is sitting in the same data center as my stuff, and I take a shortcut, because it's hard to do a full round trip through the application, I just use the health check for the agent, then all of a sudden the agent doesn't necessarily know about netsplits, and it can take wrong actions. Maybe it does the same switch of configuration, promoting the first follower, re-pointing the second follower, changing the configuration in the clients, but the old master is still alive, and now I've just corrupted data. And what's super interesting now is that as an operator I'm stuck trying to debug that system, and I don't just have to understand the system itself: I have to understand the system itself, plus the agent's understanding of the system, plus the actions of the agent on the system, to be able to undo all of that and change it. That's the big problem with the law of requisite variety, right? The agent's understanding of the system is incomplete. It's not just up or down, it's not just master or follower; it's up, down, and "I don't know, there's a netsplit", something like that. So I have added tremendous complexity to the system on behalf of the operator, because I wanted to save them work. The thing we have to care about at that point is figuring out whether a piece of automation saves a lot of work on mundane tasks but renders the overall system so complex that it becomes very, very hard to operate.

What I think an automated agent should do is related to what teams do, and what teams do is essentially what a good teammate would do, right? If you are having trouble, you tell your teammates; if you are starting a risky operation,
you are telling everyone else; if you feel you are reaching your limits, you tell somebody else; if you feel your coworker is not doing a good job or is having trouble, you go and help them. Who here has written automation that does this? No hands raised: either you're all sleeping or nobody has written a smart agent. I'm going to say it's because writing a smart agent is exceedingly hard, and so we should be asking the question: is it worth doing, or do I only automate things that don't require that kind of coordination?

From The Systems Bible we get these quotes. A system with an extra brain in the tail: the tail wags on its own schedule. If you put a brain into something, it's going to be independent; you no longer control it. The system itself does not do what it says it is doing. Control is exercised by the element with the greatest variety of behavioral responses: that's a human being. There is pretty much no system where, when everything goes down, it's still the computer fixing it; there's a human who has to act as a backstop for everything that's happening. This is interesting to keep in mind when what we're doing is automation, because essentially it means that whatever automation we do, we have to remember that there is a human who is going to debug something somewhere, and we have to be careful about what we do. And the little rule of thumb that I have for that is the last quote, and I freaking love that quote: if it's worth doing at all, it's worth doing poorly. Here's my interpretation of that. If I'm trying to figure out whether I want to automate something, I ask myself: if I do a half-assed job, is it still going to be useful? Sometimes the answer is going to be yes; it's like, this is tedious as hell, I don't want to do it all the time, just do it that way, I don't care if it's not perfect, it's better than me doing it by hand. This is worth automating. If the answer is no, it has to be perfect, because otherwise
it's going to be a huge nightmare and everything is going to be broken, then by God, don't automate that, because you're not going to do a perfect job, and I'm not going to do a perfect job either. So I will only automate things that I'm allowed to be shitty at. Even if I'm going to try to do a good job, at least I know that if I make a mistake, it's not going to ruin someone's life. That's one of the ways you figure out what to automate.

So we can get back to what we can do as Erlang and Elixir developers. Logging: use structured logging. If you just log a sentence that says something like "this is a bug that happened for this reason", get out of here, because if you have logs like that and you're trying to search them, you need a full-text search engine just to understand your own logs; that's garbage. You want structured logs, based on keys and values and structures, that you can write tools for to help read them. And the tools don't have to be perfect; you can do a crappy job on them, because it's still better than looking at all the logs by hand in Notepad or whatever the tool is on Windows, I don't recall; it doesn't handle line breaks anyway. And everything you log, log it at the level you want it at in production. There are a lot of worse things than this, but: you go onto a system, you turn on the logs, and then you crash the system because there's too much logging for what you need. That's not fun; it's like, okay, I just ruined the system for no reason, and it was useless. So: structured logging, super useful, at the right level. And mention only facts, things that are happening. You don't want to provide interpretation from within the code, because you don't have the context to provide an interpretation; the interpretation should be left to the human. That's still automation, right?
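As a minimal sketch of the difference in Elixir: the first call logs prose you can only grep, while the second logs a report map that downstream tools can filter on (loggers since OTP 21 accept report maps directly). The event name and fields here are invented for illustration.

```elixir
require Logger

# Prose logging: a sentence only a human (or a full-text search engine) can use.
Logger.error("payment failed for order 1234 because the gateway timed out")

# Structured logging: facts only, as keys and values. Interpretation is
# left to whoever reads it, and even crappy little tools can filter on the keys.
:logger.error(%{
  event: :payment_failed,
  order_id: 1234,
  reason: :timeout
})
```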
The best example I've seen of that was a piece of software that would check SSL certificates before downloading something, and it just assumed that everything would be kept up to date forever. It was an old piece of software that was not very good, and at some point, when it became too old to do proper SSL, it started telling people: error, you are being hacked. What happens when you do that, when you produce a crappy message like that, which means nothing, because the person writing the code expected everyone to maintain their infrastructure properly, is that suddenly you have a user, not necessarily technical, panicking that they're being hacked. They have no idea how to solve it, no more than if it were a technical log with just an event in it; they have nothing actionable, no way to do anything about it. And what's interesting is that not only do they lose time trying to understand it, they lose trust in the information your system is giving them. So what happens in the long term is that you're trying to provide logs and be useful, and all that happens is that people disregard them, basically, because this is garbage software and you don't trust what it's saying. Somewhere there's an operator with as many tattoos as the guy in Memento, all saying: don't trust their words.

So what can we do? Yeah, that's what we provide in our applications: you have metrics, you have all of that, and there are a lot of resources for it. What's interesting is how we get to the lower layers, and Maxim has mentioned a few of them. I like sys:trace for OTP processes. It's not super great for high-performance stuff, but it's given to us by the framework, and this is interesting because if there is sys:trace, I don't need to do print-statement debugging. It tells me which messages are received by an OTP process, which responses are being sent. I turn it on, something like that, and this little ping-pong client over UDP is going to tell me all the messages, and I just
turn that on and off with a simple switch, on as many servers as I want; I don't need to compile anything, it's just there. There's a slightly more advanced version called sys:log, and the only thing it does differently is that instead of outputting the events to standard output, it keeps them in a little circular buffer in your process, and you can fetch them by calling get. And starting with OTP 22, which has a new release candidate as of this week, when the process dies it also outputs the logs that were accumulated in there, which is pretty freaking sweet. So you get that for free, and you can turn it on in the start_link arguments to a gen_server process, so even when you're writing tests, not just in production, you can debug your tests with it far more easily than by putting a crapload of print statements in there. A more advanced version still is sys:install: instead of just getting the events output or accumulated, you can put a callback in there and do what you want with the events: forward them, send them to another process, play with them. I've used this in tests before: instead of doing mocking or intercepting arguments or something, I just make my call and have the events sent to the test process, and then I'm able to trace what is going on without any additional machinery, just by putting the call in the test itself. It's a feature of the language; it's fine. And then you have sys:get_status, sys:get_state, sys:replace_state. That's kind of risky; it's been mentioned before, it's kind of neat when you need it, but you hope you don't need it.

If you drop down from the framework, you get to the virtual machine level. I've written an entire small book about that, Erlang in Anger, which tells you about all those kinds of things, so I will defer to the book instead of reading it in front of you. Microstate accounting: Maxim has mentioned it; this is how you call it by hand, as he said, and it is pretty nice for more advanced stuff. I didn't cover it in the book.
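The sys calls above can be sketched like this from Elixir (the same functions live in Erlang's `sys` module); `pid` is assumed to be any OTP-compliant process you already hold, and `MyServer` stands in for your own gen_server module:

```elixir
# Print every message in and out of the process to standard output:
:sys.trace(pid, true)
:sys.trace(pid, false)   # and turn it off again, no recompile needed

# Keep the last 50 events in a circular buffer instead, then fetch them:
:sys.log(pid, {true, 50})
{:ok, events} = :sys.log(pid, :get)

# Or enable debugging from the start, e.g. in a test:
GenServer.start_link(MyServer, :ok, debug: [:trace, {:log, 50}])

# Microstate accounting, called by hand: sample for a second, then print.
:msacc.start(1000)
:msacc.print()
```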
So it is interesting: if you want to do tracing at the Erlang or Elixir level, you have cool applications like redbug, where you can just put the call you are interested in, with its arguments, in a string, and it is going to tell you all the matching calls that happen. I am personally a fan of recon_trace, because I wrote it, so that is why I like it. It works kind of the same; it just has a different way to trigger on stuff. It has all these kinds of things: you can get the return values, you can trace on private functions without needing to recompile them in a special shell you built for that; you can just get the values out of it, and you are able to get the values that you want.

But that is still at the virtual machine, at the language level, with those trace functions. What can we get if we go deeper? Well, we get into the operating system stuff, and it is fine. You have, for example, perf on Linux, and perf top. This is how we debugged problems with the SSL application back when I was at Heroku. It tells you, in a top-like interface, which C functions the VM is running and where the CPU is going, and what we could find here were these functions called do_minor, which is garbage collection, and db_next_hash, db_get_hash, and db_select_delete_hash, which I didn't necessarily know the meaning of. I just went in there and figured out they were related to the ets select_delete operation. Then I ran the tracer that I had on the production node to see who was calling select_delete all the time; it turned out to be the SSL manager, and we found a bottleneck in there such that, when we removed it, it made the code five times faster. And we could do that with about half an hour of investigation and poking around: no need to recompile anything to check, we just used the tools that are available.

Another thing you can use is dtrace and SystemTap: dtrace on the BSDs and related systems, SystemTap on Linux. You can create scripts, one-liners, or entire libraries of debugging tools based on that.
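Both recon_trace and redbug are third-party libraries sitting on the same built-in trace BIFs, so a self-contained sketch of the underlying mechanism is possible with nothing but the runtime. The trace_demo module and its work function below are hypothetical stand-ins for whatever production call you are chasing:

```erlang
%% Sketch of the trace BIFs that libraries like recon_trace build on.
%% With recon in the path, the equivalent one-liner would be roughly:
%%   recon_trace:calls({trace_demo, work, '_'}, 10).
%% (the 10 being a safety cap so a busy node cannot flood the shell).
-module(trace_demo).
-export([run/0, work/1]).

work(X) -> X * 2.

run() ->
    Self = self(),
    %% The target process waits for a signal so no call is missed.
    Pid = spawn(fun() ->
                    receive go -> ok end,
                    trace_demo:work(21),
                    Self ! done
                end),
    %% Trace calls made by Pid, matching trace_demo:work/1.
    erlang:trace(Pid, true, [call]),
    erlang:trace_pattern({trace_demo, work, 1}, true, [local]),
    Pid ! go,
    receive done -> ok end,
    %% Trace messages arrive in our mailbox as regular messages.
    receive
        {trace, Pid, call, {trace_demo, work, [21]}} = Msg -> Msg
    after 1000 -> timeout
    end.
```

The point of the safety cap in the real libraries is exactly what this sketch lacks: raw `erlang:trace/3` will happily deliver millions of messages on a hot code path, which is why recon_trace and redbug both rate-limit by default.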
You can look up all the system calls and a bunch of the functions that are happening, and Erlang is pretty neat because it is all instrumented to run with that; you just need to pass the right switch when you build the system. One of the really, really cool things that not a lot of people talk about is the dyntrace module. What the dyntrace module essentially lets you do is use SystemTap or dtrace across your entire stack, from the operating system up to the Erlang functions. It is just a bunch of functions, like dyntrace:p with a bunch of arguments that are meaningless by themselves, but you can put whatever you want in them, and they become visible within SystemTap and dtrace scripts, so you are able to check everything from the operating system to the top of your applications if you need to debug that way.

Here, this one is a SystemTap script, and what it does, for example, is say: when someone calls gen_udp:send, and then inet_udp:send, which is the underlying function, trace how much time is being spent preparing a packet to be sent over UDP, before it even leaves the virtual machine. I run a little thing like that and it takes something like eleven to fourteen microseconds. Just last week I debugged a case where it would take more than a second, and it turned out that someone had passed an IP address as a string, and that goes through a full DNS resolution: if it is a tuple it is used as an IP directly, but if it is a string it goes the other way. It is just a little bit of binary search like that with trace probes, and you find out that all you need to do is look up the IP once, and then the system was seven times faster. That is great.

dtrace looks a bit like that; I personally prefer dtrace as a format and as a little language. This one I wrote when I wanted to debug the low performance of rebar3 on FreeBSD on a Raspberry Pi, which is, of course, a very common platform.
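The fix Fred describes for the slow UDP send, resolving the name once and then passing the address tuple on every send, looks roughly like this; the module name and function are illustrative, not from the talk:

```erlang
%% Passing a hostname string to gen_udp:send/4 can trigger a name
%% lookup on every single send; resolving once up front and reusing
%% the {A,B,C,D} tuple avoids that entirely.
-module(udp_once).
-export([send_fast/3]).

send_fast(Host, Port, Packets) ->
    {ok, Addr} = inet:getaddr(Host, inet),   % resolve once, get a tuple
    {ok, Sock} = gen_udp:open(0, [binary]),
    [ok = gen_udp:send(Sock, Addr, Port, P) || P <- Packets],
    gen_udp:close(Sock).
```

Since UDP is fire-and-forget, nothing needs to be listening on the destination port for the sends to succeed locally.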
It could take a lot of time to figure out why the hell this thing was slow, so I wrote a script that just said: show me all the freaking system calls that have to do with this, and give me the times. I was just asking for the version, and it would take three seconds to get the version of the tool. Then I found out that all these calls had to do with polling, select, and whatnot, and it turned out to be file access. So I wrote a second script on the side, just: give me the file names of everything I am opening, and it turned out it was just loading Erlang modules to start the escript. The problem was simply that I had a shitty SD card. Trying to debug that with print statements would have taken days, but with just a little dtrace script like that, you can look at whatever you need and dig through any part of the system. All you need is the right kind of understanding, and that is where mental models come into play.

One thing that, of course, I am going to mention from my latest book is something I found interesting: in trying to build a model for testing your software, you are also finding out whether you can come up with a model at all, and it turns out that for some pieces of code we write, coming up with a model is super hard. That kind of begs the question: if I, the developer writing the code, cannot figure out how to model this, how the hell is an operator going to do it? It is kind of impossible. So it is an interesting proxy. It is like code coverage, right? 100% code coverage does not tell you that you have good tests, but 0% code coverage is a pretty good sign that you have shitty tests. In the same way, being able to write a model as a property is not going to tell you that you have a great way to model your software, but being unable to come up with one is probably a good sign that nobody else is going to manage it either. Another thing that is kind of interesting with that is that, frankly, any kind of modeling like that is probably
worthwhile. Try to write documentation; trying to use TLA+ is kind of harder, but even just trying to write documentation forces you to create that kind of model, to give an explanation, and if you have a hard time writing the documentation, it is going to be hard for people to operate your software.

The other big tip I would give is that practice makes perfect, and the best way for your operators to learn how to use debugging tools in production is to use them in development. Don't use a debugger that you would not use in production; only use production tools in development. You can still play with the little breakpoints that you have both in the rebar3 shell, which I want to demo as a lightning talk tonight, and in mix as well. The thing you can do is isolate the test that you know is failing and try to use the tools that the virtual machine, the language, the frameworks, and the operating system give you to debug the bug in your software, because this is what an operator would have to do. So, with this failing test, figure out: can you just use the sys stuff in OTP? Can you do it with dtrace? Can you do it with tracing? What can you do to figure out where the problem is? Test your hypotheses directly in a test case using these, and get some practice in development so you can use them in production. What you might find out is that, essentially, your software is not debuggable with production tools and nobody is going to have a chance; then, what do you need to change in your software to make it better, and otherwise, what kinds of tools do you need? It is a great learning opportunity. So that would be, yeah, my last point: practice makes perfect. Do we have any questions? [Applause]
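The "production tools in your tests" advice, combined with the earlier sys:install idea of forwarding a server's debug events to the test process instead of mocking, can be sketched like this. The module below is a hypothetical stand-in for a server under test:

```erlang
%% Sketch: sys:install/2 in a test, forwarding a gen_server's debug
%% events to the test process instead of mocking or printing.
-module(sys_install_demo).
-behaviour(gen_server).
-export([run/0]).
-export([init/1, handle_call/3, handle_cast/2]).

init([]) -> {ok, 0}.
handle_call(ping, _From, N) -> {reply, pong, N + 1}.
handle_cast(_Msg, N) -> {noreply, N}.

run() ->
    {ok, Pid} = gen_server:start_link(?MODULE, [], []),
    Test = self(),
    %% The installed debug fun sees every sys event ({in, Msg},
    %% {out, Reply, To}, ...); here we just mail them home.
    sys:install(Pid, {fun(FuncState, Event, _ProcState) ->
                              Test ! {sys_event, Event},
                              FuncState
                      end, ok}),
    pong = gen_server:call(Pid, ping),
    receive {sys_event, E} -> E
    after 1000 -> timeout
    end.
```

The same call works against a process that was started without any debug options, which is what makes it usable on a live system as well as in a test case.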
Info
Channel: Code Sync
Views: 11,678
Rating: 4.9163179 out of 5
Keywords: Erlang, Elixir, Debugging, Resilience, Fred Hebert
Id: OR2Gc6_Le2U
Channel Id: undefined
Length: 35min 30sec (2130 seconds)
Published: Fri Mar 22 2019