Robust Connected Applications with Polly, the .NET Resilience Framework

Video Statistics and Information

Captions
I am happy to welcome up next Bryan Hogan, and Bryan is going to be talking about building resilient applications with Polly. Take it away, Bryan.

Thank you very much. As John said, my name is Bryan Hogan. I have been using Polly for quite a few years, writing about it and occasionally presenting on it, and today I'm going to give you a quick overview of the whole of the resilience framework.

Failures happen. They can happen at any time, to any application, and they will happen to your application. So what do we do when something goes wrong? Many of us work on applications that rely on connectivity, but connectivity has never been a guarantee. We have to deal with software failures, local network outages, and of course an unreliable internet. Any problem along the way could lead to your requests failing, and for applications such as mobile apps, IoT, or embedded systems this is especially true. So what will your application do if there's a transient fault or a longer outage when making a request to a remote system? Will the whole thing grind to a halt, or will you lose just that one request? For a lot of applications losing a single request might be fine, but for critical applications, think medical or financial, losing a single request can be quite serious.

What could you do about it? The obvious answer is to retry the request. Think of your local system making requests to remote systems: make a request, get back a failure, retry, failure, retry, failure, retry, success. Keep retrying until you get a response, within reason of course, not forever.

How might you program something like that? If you're using HttpClient you'll have a GetAsync hitting a remote endpoint. You might put a for loop around it that specifies the number of times to retry, you'd have a catch looking for exceptions, and then you might have a break to get out of the loop when you have a success. I've had people at conferences tell me that this is a fine approach, but from my perspective it's fragile: it would be very hard to use in multiple locations, hard to change, hard to configure, hard to customize. And of course, we're here to talk about Polly.
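
That hand-rolled loop might look roughly like the sketch below. This is only a minimal illustration of the fragile approach being described, not Polly code; the class name and endpoint URL are made up.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class ManualRetryExample
{
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<HttpResponseMessage> GetWithRetriesAsync(int maxAttempts = 3)
    {
        HttpResponseMessage response = null;

        // Hand-rolled retry: hard to reuse, hard to configure, hard to customize.
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                // Hypothetical endpoint.
                response = await Client.GetAsync("https://example.com/api/items");
                if (response.IsSuccessStatusCode)
                {
                    break; // success, stop retrying
                }
            }
            catch (HttpRequestException)
            {
                // swallow the exception and go around the loop again
            }
        }

        return response;
    }
}
```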
Polly is a resilience framework for .NET, and if you're working in the Microsoft ecosystem, as I imagine most of us are, it's really the only game in town. If you have worked in Java you've probably come across Hystrix. Polly helps protect your application from failure, and it has a knock-on effect on upstream and downstream applications, helping them to remain stable as well. With Polly you can handle all sorts of failures, and it helps you do this through what are called policies, such as retry, wait-and-retry, circuit breaker, and a few more.

Polly was started in 2013 by Michael Wolfenden, but more recently it was taken over by the App vNext group, with Dylan Reisenberger as the lead architect on the project. Polly is active on GitHub and Slack, there is a blog, and it is getting about 150,000 downloads per day. Dylan has been on my podcast and on .NET Rocks. Over the last few years the popularity of Polly has soared: when I started working with Polly it was something like 10 to 14,000 downloads per day, those are the numbers I remember, and now it's at about 150,000.

Polly works on all .NET platforms, so again, think of how this might help you with mobile applications, embedded devices, or IoT, where connections are more tricky. For those of us who may be working on technology that's pre-.NET Core 2.1, a request with Polly would look something like this: you have your HTTP GET, and it's wrapped inside a policy. The policy is defined by saying it handles an HttpResponseMessage, looking for a success code; if you don't get one, or if you get an exception, it should perform a retry. That's pretty good and relatively easy. But since .NET Core 2.1, at the point of execution there's no sign of Polly and no sign of the policies. It is your familiar, traditional HTTP GET, and you get back an HttpResponseMessage. You don't have to unwrap it, you don't have to do anything with it, and you don't need to pass the policies around. This is thanks to the HttpClientFactory, and I'm going to talk about that a little bit more in a few minutes.
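
Roughly what those two styles look like, as a minimal sketch assuming Polly v7 and the Microsoft.Extensions.Http.Polly package; the client name, endpoint, and policy are illustrative.

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Polly;

public static class TwoStylesOfUsingPolly
{
    // Handles clause: any non-success status code, or an HttpRequestException.
    // Behavior clause: retry three times.
    public static readonly IAsyncPolicy<HttpResponseMessage> RetryPolicy =
        Policy.HandleResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
              .Or<HttpRequestException>()
              .RetryAsync(3);

    // Pre-.NET Core 2.1 style: the GET is executed explicitly inside the policy.
    public static Task<HttpResponseMessage> GetTheOlderWay(HttpClient client) =>
        RetryPolicy.ExecuteAsync(() => client.GetAsync("https://example.com/api/items"));

    // Since .NET Core 2.1: HttpClientFactory attaches the policy to the named client,
    // so the calling code is just a plain client.GetAsync(...) with no Polly in sight.
    public static void ConfigureServices(IServiceCollection services)
    {
        services.AddHttpClient("CatalogClient")   // illustrative client name
                .AddPolicyHandler(RetryPolicy);
    }
}
```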
When I started working with Polly, and when I started working on the Pluralsight course for it, I looked to see if there was any definition out there of a resilience framework, and there wasn't, so I came up with my own. From my perspective, a resilience framework actively protects resources, restricting the amount of memory, threads, and sockets that an application can use. When possible it will let you recover from a failure, through something like a retry, and if recovery is not possible it will facilitate graceful degradation. What graceful degradation means is very much up to you: it might mean that you fail quickly rather than slowly, or that you reduce load on remote systems, or potentially shed load on your own application. And again, all of these have the knock-on effect of helping the services you depend on by not overloading them when they're stressed, giving them a break, giving them a chance to recover.

At the heart of Polly are policies. These are things such as retry, timeout, and circuit breaker, and I will get to each one of those in turn. A policy is made up of two parts, two clauses, let's say: the handles clause and the behavior clause. The handles clause specifies the failure the policy should respond to, like a 400, a 401, or some type of exception, and the behavior clause specifies what should be done, in this case retry three times, written in a fluent style. Policies are reusable: a single policy can be used in multiple places and will not interfere with another use of that same policy. They are thread safe: it is no problem at all to use the same policy at the same time in multiple places. Each policy can execute a delegate to perform an action before the behavior clause kicks in; think of the example of reauthorization. There's not much point in retrying a request that failed due to an authorization problem unless you do something to fix that authorization issue, and I'll show you that as well in a second. Polly supports synchronous and asynchronous requests, and finally you can wrap policies, so the idea would be that you'd have an HTTP request with a retry policy around it, another one like a circuit breaker around that, and then potentially a fallback around everything.

Polly policies are broken into two broad categories: reactive strategies and proactive strategies. The reactive strategies are retry, wait-and-retry, circuit breaker, and fallback; these respond to problems that have happened. The proactive strategies monitor ongoing events and attempt to prevent problems from occurring, stabilizing the system or potentially allowing you to fail gracefully. The proactive policies are timeout, caching, and bulkhead isolation, and I will go through each one of those in turn.

To give you an idea of where Polly sits and how it works when it's interacting with a request, I'm going to use the example of a retry policy. You make a request, and Polly checks if it was a success. If it wasn't, Polly checks whether or not it should perform a retry. If it should, it makes the request again. Polly then checks again whether it was a success; if it was, it returns you the response, the HttpResponseMessage you would normally expect. If it was not a success, it checks again whether or not it should retry, and if it should not retry at this point, let's say you've used up all the retries you were going to make, it returns a response, again an HttpResponseMessage, indicating failure. Polly doesn't interfere with the flow of the request: if the request succeeds, Polly doesn't do anything, and if the request fails and should not be retried, Polly doesn't do anything either. With a retry policy you can make hundreds of requests using a single policy and they will not interfere with each other. Making a request inside the policy has very little impact on the speed of that request, and the Polly team have been working on that over the last few years to make it even less of a performance issue.

With regard to retry, there is one problem with it: when you send your request and it fails, you retry immediately, and when that comes back with a failure, you retry immediately again. So if you're calling a system that's failing because it's already overloaded, you're not helping by hammering it with more requests. You're also making life much worse for yourself, because you're potentially opening up more sockets and using more threads and more memory. What would be better would be to give that remote system a chance to recover, and that's where the wait-and-retry policy comes in. When you make a request and it fails, you pause and then retry; if that one fails, you pause a little longer, and so on, and you can choose any algorithm you want to calculate the period between retries. This gives the remote system a chance to recover; for anyone familiar with networking protocols, Ethernet uses something like this when there are packet collisions. If you are making multiple simultaneous requests using the same wait-and-retry policy, they could all potentially back off and retry at the exact same time, and that's not good, because you're going to have a burst of requests hitting the remote system at two seconds, at six seconds, at fourteen seconds. So you can add a little jitter to that delay calculation function to help randomize when the retries happen.

Here's what a retry looks like. As I showed you a little earlier, it says: handle HttpResponseMessages, and if it's anything other than a success code, retry three times. A wait-and-retry looks much the same: it's the same handles clause, but now it says retry up to three times, and here's a way of calculating the delay between the retries, and again you can use any calculation you want. I mentioned reauthorization because it's a more interesting case: there's not much use in retrying the exact same request if you're getting an unauthorized error. Instead, you use the onRetry delegate as part of the policy. At a high level the policy says that if it's not a success code you should retry, and then, specifically within the retry, prior to executing a retry this block is hit, and if the status code is Unauthorized you perform some custom reauthorization logic. That might be going out and getting a new JWT, or a new cookie, or whatever it happens to be.
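
Here is roughly what those retry and wait-and-retry policies look like; a minimal sketch assuming Polly v7, with exponential backoff plus random jitter as just one possible way of calculating the delay.

```csharp
using System;
using System.Net.Http;
using Polly;

public static class RetryPolicies
{
    private static readonly Random Jitterer = new Random();

    // Plain retry: if it's anything other than a success code, retry up to three times.
    public static readonly IAsyncPolicy<HttpResponseMessage> Retry =
        Policy.HandleResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
              .RetryAsync(3);

    // Wait and retry: same handles clause, but with a delay between attempts.
    // Exponential backoff (2, 4, 8 seconds) plus a little random jitter so that
    // simultaneous callers don't all retry at exactly the same moment.
    public static readonly IAsyncPolicy<HttpResponseMessage> WaitAndRetry =
        Policy.HandleResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
              .WaitAndRetryAsync(3, attempt =>
                  TimeSpan.FromSeconds(Math.Pow(2, attempt)) +
                  TimeSpan.FromMilliseconds(Jitterer.Next(0, 500)));
}
```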
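
And a sketch of the reauthorization idea, again assuming Polly v7; the token helper and header handling are hypothetical, just to show where the onRetry delegate fits.

```csharp
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using Polly;

public static class ReauthorizationExample
{
    // At a high level the policy says: if it's not a success code, retry.
    // The onRetry delegate runs before each retry, so if the failure was a 401
    // we can fix the authorization problem before the next attempt.
    public static IAsyncPolicy<HttpResponseMessage> BuildPolicy(HttpClient client) =>
        Policy.HandleResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
              .RetryAsync(3, (outcome, retryNumber) =>
              {
                  if (outcome.Result?.StatusCode == HttpStatusCode.Unauthorized)
                  {
                      // Hypothetical helper: get a new JWT (or cookie) and attach it.
                      string token = GetFreshToken();
                      client.DefaultRequestHeaders.Authorization =
                          new AuthenticationHeaderValue("Bearer", token);
                  }
              });

    private static string GetFreshToken() => "new-jwt-goes-here"; // placeholder
}
```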
One drawback with this one is that if you try to do it with the HttpClientFactory it is more complicated, but I have some examples for that on my blog. One thing I've been asked at conferences is whether Polly can be used for non-HTTP requests, and it can. You can use Polly when calling pretty much any method, whether or not it returns something. Here are a few examples. In one, I'm checking the result coming back, which is an int: if it's not zero, or if there was an exception, retry three times, and then I execute the call. In these examples you have to execute your code explicitly inside the policy, because you're not using the HttpClientFactory. In another, relatively straightforward example, I have an enum of type JobStatus, and if the response is not valid I perform a wait-and-retry. And in the last one I'm simply saying: if an exception is thrown when I call this code, retry up to three times. So those are the retry and wait-and-retry policies.

Another policy is the fallback policy. It's usually the last one you would use when executing a request, so you might have a retry, and then outside it would be a fallback. You can return some sort of default value when everything else has failed. Fallback also supports a delegate that allows you to perform any action you want, so maybe you page someone, or restart a service, or scale a system, something like that. The code looks like this: in this case I'm checking whether the status code I got back was an internal server error, and I'm generating a whole new HttpResponseMessage to return instead of what I was expecting to get from the endpoint I was calling.
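
A minimal sketch of using Polly away from HTTP, assuming Polly v7; the SaveRecord method and its zero-means-success convention are made up for illustration.

```csharp
using System;
using Polly;

public static class NonHttpExamples
{
    // Hypothetical method being protected: returns 0 on success, non-zero on failure.
    private static int SaveRecord() => 0;

    public static int SaveWithRetries()
    {
        // Handles clause: a non-zero result or any exception. Behavior clause: retry three times.
        var retryPolicy = Policy.HandleResult<int>(result => result != 0)
                                .Or<Exception>()
                                .Retry(3);

        // Without HttpClientFactory the code has to be executed explicitly inside the policy.
        return retryPolicy.Execute(() => SaveRecord());
    }

    public static void RunJob()
    {
        // Simplest form: if an exception is thrown, retry up to three times.
        var exceptionPolicy = Policy.Handle<Exception>().Retry(3);
        exceptionPolicy.Execute(() => Console.WriteLine("doing some work"));
    }
}
```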
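
And a sketch of the fallback just described, assuming Polly v7; the substitute response body is a placeholder.

```csharp
using System.Net;
using System.Net.Http;
using Polly;

public static class FallbackExample
{
    // If the call ultimately failed with a 500, hand back a substitute response
    // instead of the one we expected from the endpoint.
    public static readonly IAsyncPolicy<HttpResponseMessage> Fallback =
        Policy.HandleResult<HttpResponseMessage>(r => r.StatusCode == HttpStatusCode.InternalServerError)
              .FallbackAsync(new HttpResponseMessage(HttpStatusCode.OK)
              {
                  Content = new StringContent("{\"source\":\"fallback\",\"items\":[]}") // placeholder payload
              });
}
```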
The next policy is the circuit breaker. This is quite like the circuit breaker in your home, or what some countries call a trip switch: it cuts the connection when a problem is detected. A circuit breaker has three states. The closed state is the normal operational state: closed like an electrical circuit, current flows, or in our case, requests flow. In the open state, no requests are flowing, and if a caller makes a request via something protected by the circuit breaker, they get a BrokenCircuitException immediately, with no delay. And finally there's the half-open state. This is a sort of test state in which a real request is made; if the response comes back from the remote system as a success, the circuit breaker switches to closed, and if not, it switches back to open. I'll show you all of this in a state diagram in a moment.

There are two kinds of circuit breaker: the original circuit breaker and the advanced circuit breaker. The original one will open if there are some number of consecutive failures, so two failures in a row, three failures in a row. The advanced circuit breaker opens if a percentage of failures occurs over a period of time and there has been a minimum throughput; I'll show you examples of both in a second. To make the transitions clear: the circuit breaker starts in the closed state. If there are problems, it transitions to open. After a period of time being open, it transitions to half-open. If a request fails during half-open, it transitions back to open, stays open for a period, moves back to half-open, and then, if the following request works, it moves back to closed. And just to hammer the point home: when the circuit breaker is open, it immediately returns exceptions, not using up any resources on your application, not holding threads, not holding memory, not holding sockets.

If my application is connected to multiple remote applications, and each has multiple endpoints, a circuit breaker could be used to protect a single endpoint, a group of endpoints, or, though it's unlikely you'd want this, all outbound requests. I mentioned that there are two kinds of circuit breaker, and both are sketched a little further down. The original one says: if there are two consecutive errors, break the circuit for 60 seconds. onBreak, onReset, and onHalfOpen are delegates that you can use when the circuit breaks, when it closes again, and when it's in the half-open state; you could use them for logging or other purposes. The advanced circuit breaker says: if 50% of requests in a 30-second window have failed, with a minimum throughput of seven requests, that is, you've had at least seven requests in that window, then break the circuit for 60 seconds. This allows you to add a little more nuance to breaking your circuit.

The circuit breaker I showed is fine if you have a single instance of an application, but if you have multiple instances, then for the circuit breaker to function correctly all the instances of your application need to share state, for example how many requests have failed. That's where something like Azure Durable Functions, or Azure Durable Entity Functions, is the answer: Polly has a newish library that allows you to use Azure Durable Entity Functions to store that state and distribute your circuit breaker across multiple instances of an application.

A quick recap: the reactive strategies are retry, wait-and-retry, circuit breaker, and fallback. They respond to something that has already happened and try to help you recover from a problem. I mentioned that you can wrap requests, and these are really wrap policies: if you've got an HTTP request protected by a retry policy, around that you can have a circuit breaker policy, and around that you can have a fallback. There's nothing to stop you from nesting policies within each other to any degree you want. I also mentioned that there's a thing called a policy registry. This is effectively a dictionary where you can store policies or wrapped policies, and it can be passed around your application with dependency injection or used with the HttpClientFactory. This is where it gets very interesting: the HttpClientFactory uses the policy registry and a method to pick an appropriate policy and apply it to the request. You might choose a different policy depending on the HTTP verb or the endpoint the request is hitting, so for a GET you might use a retry, for a PUT you might use a wait-and-retry, and for a POST you might want a no-op policy, but it's entirely up to you to pick the policy most suitable.

Sometimes I'm asked about testing, and there are a few scenarios in which you want to test with Polly. Some people want to test their code as though no policy exists, to see the application running as though the policies were not there, and you can swap in a no-op policy for a retry or a circuit breaker or anything along the way. Another scenario is that you might want to test how your application would behave when an exception from Polly is thrown, a BrokenCircuitException or a TimeoutRejectedException: what would your application do in response to that? For that you can use mocking. More recently there's a secondary repository, a package that's been developed based on Polly, called Simmy. It's a chaos engineering and fault injection tool: it lets you generate exceptions, alter results, delay requests, or perform pretty much any task, up to dropping a database, prior to making a request, and this will let you test your policies very well.
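
Here's a sketch of the two circuit breakers described above, assuming Polly v7 syntax; the console logging in the delegates is just for illustration.

```csharp
using System;
using System.Net.Http;
using Polly;

public static class CircuitBreakerExamples
{
    // Original circuit breaker: two consecutive failures break the circuit for 60 seconds.
    // onBreak / onReset / onHalfOpen are optional delegates, handy for logging.
    public static readonly IAsyncPolicy<HttpResponseMessage> Breaker =
        Policy.HandleResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
              .CircuitBreakerAsync(
                  handledEventsAllowedBeforeBreaking: 2,
                  durationOfBreak: TimeSpan.FromSeconds(60),
                  onBreak: (outcome, breakDelay) => Console.WriteLine($"Circuit open for {breakDelay}"),
                  onReset: () => Console.WriteLine("Circuit closed"),
                  onHalfOpen: () => Console.WriteLine("Circuit half-open"));

    // Advanced circuit breaker: if 50% of requests fail within a 30-second window,
    // and there have been at least 7 requests in that window, break for 60 seconds.
    public static readonly IAsyncPolicy<HttpResponseMessage> AdvancedBreaker =
        Policy.HandleResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
              .AdvancedCircuitBreakerAsync(
                  failureThreshold: 0.5,
                  samplingDuration: TimeSpan.FromSeconds(30),
                  minimumThroughput: 7,
                  durationOfBreak: TimeSpan.FromSeconds(60));
}
```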
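
And a sketch of wrapping policies, storing them in a registry, and letting the HttpClientFactory pick one per request, assuming Polly v7 and the Microsoft.Extensions.Http.Polly package; the client name and policy keys are made up.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;
using Polly;
using Polly.Registry;

public static class WrapAndRegistryExample
{
    public static void ConfigureServices(IServiceCollection services)
    {
        var retry = Policy.HandleResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
                          .RetryAsync(3);
        var waitAndRetry = Policy.HandleResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
                                 .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));
        var breaker = Policy.HandleResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
                            .CircuitBreakerAsync(2, TimeSpan.FromSeconds(60));
        var fallback = Policy.HandleResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
                             .FallbackAsync(new HttpResponseMessage(System.Net.HttpStatusCode.OK));

        // Nesting policies: fallback around circuit breaker around retry (outermost first).
        var wrapped = Policy.WrapAsync<HttpResponseMessage>(fallback, breaker, retry);

        // The registry is effectively a dictionary of policies, shared through DI.
        var registry = new PolicyRegistry();
        registry.Add("GetPolicy", retry);
        registry.Add("PutPolicy", waitAndRetry);
        registry.Add("Wrapped", wrapped);
        registry.Add("NoOp", Policy.NoOpAsync<HttpResponseMessage>());
        services.AddPolicyRegistry(registry);

        // HttpClientFactory can pick a policy from the registry per request,
        // for example by HTTP verb.
        services.AddHttpClient("CatalogClient")   // illustrative client name
                .AddPolicyHandlerFromRegistry((reg, request) =>
                    request.Method == HttpMethod.Get
                        ? reg.Get<IAsyncPolicy<HttpResponseMessage>>("GetPolicy")
                        : request.Method == HttpMethod.Put
                            ? reg.Get<IAsyncPolicy<HttpResponseMessage>>("PutPolicy")
                            : reg.Get<IAsyncPolicy<HttpResponseMessage>>("NoOp"));
    }
}
```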
On the proactive side, these are policies that monitor ongoing events and attempt to prevent problems from occurring, or they try to stabilize your system or potentially let it fail gracefully. Polly has three of these. A timeout lets you decide when to time out, rather than accepting some default imposed by the piece of code you're using. Caching allows you to store responses for a period. And finally, bulkhead isolation protects your resources, so that one part of your application cannot bring down the whole application.

With timeout, if a request goes unanswered, it will time out, and the choice of when is yours. An HTTP timeout in the past could take up to 100 seconds, so your process could be sitting there waiting, and anything that depends on it would also be waiting, potentially holding threads, memory, and sockets. Instead you can use a Polly timeout policy, in which you specify how long a request will wait, and if no response is received within that period, Polly throws a TimeoutRejectedException. You can of course wrap this policy inside a retry or any other policy. Timing out quickly also means that the application calling you doesn't waste resources waiting for a response, and in most scenarios with an HTTP request, if you haven't gotten an answer within a second or two, you probably won't get one. It's a very simple policy: the one I show says the policy will wait one second, and if you don't get a response, it will time out and throw an exception. There's a sketch of it further down.

The cache policy is a tricky one to demo or to explain in a short period. It's a policy that often isn't used by itself but wrapped around other policies. You can use anything that gets stored in the cache in multiple places within your application, not just the part where you loaded the information into the cache. You could potentially store a full HTTP response if you wanted to, but that's a slightly unusual case; usually you want to store just business data. It supports all the traditional on-cache-get, miss, put, and error delegates, and whatever other methods you'd expect from a typical cache. Here are two cache policies: one is caching the whole HTTP response for five minutes, and the second is caching, let's say, the business data, a catalog item, for five minutes. By default the cache policy uses a relative time to live, but it also supports absolute, sliding window, or result TTL. In the event that you're making an HTTP request and the response header specifies how long something should remain in the cache, the result TTL can be used to set the time to the exact amount you got in the response. As I said, it's quite involved to show all the code for it, so please take a look at my blog, where I have four or five examples of using it.

Here's what happens if your application can't handle all the requests it's receiving or trying to make: the whole thing would sink. Memory gets used up, and threads and sockets will be held. This could potentially have a downstream effect on other applications, because you no longer respond to their requests, so they're also going to hold on to threads, sockets, and memory; you could have a cascading effect on other applications. The term bulkhead comes from the nautical world, where bulkheads in ships were used to prevent flooding from spreading from one compartment to another, the idea being that your whole ship should not sink if one compartment is compromised. Same thing with your application. Here is a quick example of how you might end up in a scenario where a bulkhead would help you: your local system is making multiple fast requests to a remote system, but it's only getting slow responses. After a while you will have so many outstanding requests holding on to so much memory and so many sockets that your application will fail. Here's what a bulkhead looks like in Polly: it's made up of execution slots and queue slots. When requests come in, they move to the execution slots; when those are full, subsequent requests move to the queue slots, and any more requests coming in are rejected with an exception immediately. When an execution slot frees up, something is moved from a queue slot into the execution slot, and you can accept another request. This limits the amount of memory, CPU, threads, everything, that any one part of your application can use, and for something so complicated, the policy is simple to write: it says three execution slots, six queue slots, and that's it.
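
A minimal sketch of that timeout policy, assuming Polly v7.

```csharp
using System;
using System.Net.Http;
using Polly;

public static class TimeoutExample
{
    // Wait one second for a response; if nothing arrives in that time Polly throws
    // a TimeoutRejectedException, instead of sitting on HttpClient's default
    // timeout of up to 100 seconds. This policy can be wrapped inside a retry.
    public static readonly IAsyncPolicy<HttpResponseMessage> Timeout =
        Policy.TimeoutAsync<HttpResponseMessage>(TimeSpan.FromSeconds(1));
}
```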
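
A sketch of a cache policy holding business data for five minutes, assuming Polly v7 and the Polly.Caching.Memory package; the catalog-item fetch and cache key are hypothetical.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;
using Polly;
using Polly.Caching;
using Polly.Caching.Memory;

public static class CacheExample
{
    // In-memory cache provider from Polly.Caching.Memory; in an ASP.NET Core app
    // this would usually be registered with, and resolved from, dependency injection.
    private static readonly IAsyncCacheProvider CacheProvider =
        new MemoryCacheProvider(new MemoryCache(new MemoryCacheOptions()));

    // Cache whatever the wrapped delegate returns for five minutes (relative TTL).
    public static readonly IAsyncPolicy CachePolicy =
        Policy.CacheAsync(CacheProvider, TimeSpan.FromMinutes(5));

    public static Task<string> GetCatalogItemAsync()
    {
        // The Context's operation key acts as the cache key, so the same key used
        // anywhere else in the application gets the cached value back.
        return CachePolicy.ExecuteAsync(
            ctx => Task.FromResult("catalog item 42"),   // hypothetical data fetch
            new Context("CatalogItem-42"));
    }
}
```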
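
And a sketch of the bulkhead isolation policy with those numbers, assuming Polly v7.

```csharp
using System.Net.Http;
using Polly;

public static class BulkheadExample
{
    // Three execution slots and six queue slots: at most three requests execute at
    // once, six more can wait in the queue, and anything beyond that is rejected
    // immediately with a BulkheadRejectedException rather than holding on to
    // threads, sockets, and memory.
    public static readonly IAsyncPolicy<HttpResponseMessage> Bulkhead =
        Policy.BulkheadAsync<HttpResponseMessage>(maxParallelization: 3, maxQueuingActions: 6);
}
```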
If we imagine that our application has 100% of its resources available, using bulkhead isolation will allow you to protect it from having all of those resources used by any particular part of the application. You start with 100%, and then, as the application runs, the bulkhead isolation policy prevents all the resources from being consumed. This keeps your whole application stable.

That is a quick whirlwind tour of Polly. If you want more information, please check out my Pluralsight course and my blog; I have quite a lot of examples on this, and some on Simmy. There's the active Polly Slack channel, GitHub is very active as well, and finally there is a blog from the Polly team. Thank you very much for your time.

John, I cannot hear you. It's not just me, other people are saying it. Oh my goodness, we've done this twice now; I'm just new to this whole thing. So, we had a few questions on whether Polly is usable with Blazor, specifically WebAssembly. I have not done so, but it's usable with anything that .NET runs on; I just have not tried it in that case. Great, okay. I also had a question about gRPC, and you said it's usable with anything, not just HTTP? Yes, it should work fine with gRPC too. Any method you can call, a Polly policy can be used to execute it: a simple method that returns a string, a complicated method that calls out to a database, HttpClient, pretty much anything. I've tended to focus in my work on the HTTP side, but it works on anything. Great, okay. One last question: does it work with non-200 responses, so for instance a response that says it wants you to throttle, or something like that? I'm not quite sure I understand. So, not just a network exception, but you get a response from the server, and the server's API response says, you know... Yes, in your handles clause you can check the status that comes back for anything you want, so in the event of a three-something, a four-something, or a five-something, you can have the behavior clause kick in. If that doesn't answer the question, the person can contact me directly on Twitter; my DMs are open and I'm very happy to discuss. Awesome, okay, that was amazing, thanks a bunch. And yeah, if you've got questions that Bryan wasn't able to get to, then head on over to Twitter, that would be great. Thank you very much.
Info
Channel: dotNET
Views: 5,336
Keywords: .NET
Id: L9_fGJOqzbM
Length: 29min 34sec (1774 seconds)
Published: Fri Nov 13 2020