The CORRECT way to implement Retries in .NET

Captions
Hello everybody, I'm Nick, and in this video I'm going to show you how you can properly and correctly implement production-grade retry policies for your C# code. Having worked with microservices for a long time, I don't think I've ever worked on a service that didn't implement retry policies in some capacity, whether that is consuming a message or an event, doing some in-memory work, or calling some API to get data, which is the example I'm going to use in this video for demonstration purposes. Now, I've seen this done wrong many, many times, and it always has devastating effects, usually for the service you are calling and retrying against, and I'm going to explain all of that in this video and remove any ambiguity from the issue. If you like this type of content and you want to see more, make sure you subscribe, ring the notification bell, and for more training check out nickchapsas.com.

Before I show you the code, I want to tell you about the sponsor of this video, Octopus Deploy. Octopus Deploy is an automated deployment and release management tool used by leading continuous delivery teams worldwide. It helps DevOps teams at over 25,000 companies accelerate reliable, repeatable and traceable deployments across different cloud providers and on-premises infrastructure. With more than 500 automation step templates and integrations with hundreds of technologies like Azure, AWS, GCP, Azure DevOps and many more, connecting your processes together into one pipeline has never been easier. It's what I've personally been using for the past five years, across my last two jobs working with microservices in both major cloud providers, to manage my deployments and build simple but also complex CD pipelines, and I've been extremely happy with it. It was actually the only DevOps tool over the past five years that was never swapped out for a different one, because nobody had a problem with it: it does what it needs to do, and it does it well and reliably. If you want to get started with Octopus Deploy, check the link in the description, and thanks again to Octopus Deploy for sponsoring the video.

All right, so what do I have here? I have a simple weather forecast API with a controller that has a single endpoint: a weather retrieval endpoint that accepts a city and returns the weather. If I run it and call the endpoint to get the weather for London, I get the current weather in London, and if I try something like Paris, I get the Paris weather. You're probably wondering how this works. It's not an in-memory thing; I'm actually calling the OpenWeather API, and I have a client defined for it. I'm using an API key with a quota, and this is the call that gets sent to that API with the key. By the time you're watching this video the key will be long gone, so don't bother trying to use it; you can grab your own, it's basically free up to a certain number of requests per hour, I think.

Now, in a realistic scenario this code wouldn't go anywhere near production; I wouldn't be allowed to push it, because it is not resilient. What if the weather API doesn't respond, or a request times out? Do we really want to return an exception, a 500, or some other error code to the user? No, we probably want to retry if it is a transient error, and a transient HTTP error is usually any status code of 500 (Internal Server Error) and above, plus 408 Request Timeout. Those are the ones you would probably want to retry.
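For context, here is a minimal sketch of what such a non-resilient client call might look like. The class name, method name and URL handling are illustrative assumptions, not the exact code from the video:

```csharp
using System.Net.Http;
using System.Threading.Tasks;

// A hypothetical OpenWeather client with no resilience at all.
public class OpenWeatherClient
{
    private readonly HttpClient _httpClient;
    private readonly string _apiKey;

    public OpenWeatherClient(HttpClient httpClient, string apiKey)
    {
        _httpClient = httpClient;
        _apiKey = apiKey;
    }

    public async Task<string> GetCurrentWeatherAsync(string city)
    {
        // A single attempt: any 5xx, 408 or HttpRequestException bubbles
        // straight up and becomes an error response for our own caller.
        var response = await _httpClient.GetAsync(
            $"https://api.openweathermap.org/data/2.5/weather?q={city}&appid={_apiKey}");

        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```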
Now, how would you do that manually? It's pretty straightforward: you'd probably have a loop with a retry count and say "retry up to this many times". I'm going to use a label and a goto just to visualise it, but don't use labels and goto; use loops. What would happen is you'd have a start point, then a try and a catch; the try would contain those two statements, and the catch would catch the HttpRequestException, which is one of the things you'd probably retry on (maybe depending on the exception, but generally these can be retried). The other thing you'd retry on is the response status code being greater than or equal to Internal Server Error, or being Request Timeout. You'd also have a retry count initialised to zero, increment it on every retry before going back to the start, and you'd probably want a bit of a delay as well, because you don't want to spam retries back to back; maybe wait a second before retrying. You'd probably build something like that, and assuming you add the top-level check that says "if the retry count is greater than or equal to five, give up and return something sensible", it is absolutely functional and it will do the job. But it is not very elegant, and what happens when you have more requests or more clients? You'd have to duplicate this very nasty-looking code everywhere.

We can make this way, way better for ourselves, and the recommended approach is a NuGet package called Polly; it's the one with the parrot logo. It has a lot of resilience policies already baked in, and it makes it very easy to implement them through a fluent interface. Once it's added, I can go to the top of my controller and declare a private readonly IAsyncPolicy<HttpResponseMessage>, because HttpResponseMessage is what this call returns, and call it the retry policy. The great thing about this is that I can define it in one place and reuse it everywhere else, so I don't have to repeat it at every call site. To build it I use the Policy class, saying this is a policy for HttpResponseMessage, and then I tell it what to handle: in this case an exception, the HttpRequestException, and I can chain Or for other exceptions, or OrResult, which gives me access to the HttpResponseMessage so I can say the status code is greater than or equal to Internal Server Error, or the status code equals Request Timeout. I can even use the newer C# pattern matching features to merge that into a single pattern so it looks a bit cleaner. Then I define what kind of policy I want: it could be a fallback policy, a retry policy, a circuit breaker (I'm planning to cover all of these), but at the very surface level I just want RetryAsync, specifying how many times to retry, so let's say five. So now I have my policy defined.
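Here is a sketch of both approaches side by side: the loop-based version of the manual retry described above, and the Polly policy that replaces it. The class name, the URL parameter and the give-up handling are illustrative assumptions based on the description, not the video's exact code:

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

public class WeatherForecastService // hypothetical host for the two approaches
{
    private readonly HttpClient _httpClient = new();

    // The Polly version of the same rules: retry on HttpRequestException,
    // on any 5xx, or on 408 Request Timeout, up to five times,
    // defined once and reusable everywhere.
    private readonly IAsyncPolicy<HttpResponseMessage> _retryPolicy =
        Policy<HttpResponseMessage>
            .Handle<HttpRequestException>()
            .OrResult(r => r.StatusCode is >= HttpStatusCode.InternalServerError
                           or HttpStatusCode.RequestTimeout)
            .RetryAsync(5);

    // The manual approach, written with a loop rather than the goto used
    // on screen purely for visualisation.
    public async Task<HttpResponseMessage> GetWithManualRetriesAsync(string url)
    {
        const int maxRetries = 5;

        for (var retryCount = 0; ; retryCount++)
        {
            try
            {
                var response = await _httpClient.GetAsync(url);

                // Anything below 500 that isn't a 408 is not transient: return it.
                if (response.StatusCode < HttpStatusCode.InternalServerError &&
                    response.StatusCode != HttpStatusCode.RequestTimeout)
                {
                    return response;
                }

                if (retryCount >= maxRetries)
                {
                    return response; // give up; let the caller decide what to do
                }
            }
            catch (HttpRequestException) when (retryCount < maxRetries)
            {
                // Transient network failure: fall through to the delay and retry.
            }

            await Task.Delay(TimeSpan.FromSeconds(1)); // crude fixed pause between attempts
        }
    }
}
```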
I can now take that policy and wrap the HTTP call in it with ExecuteAsync, passing a delegate, so the call runs through the policy. If any of the conditions I defined is met, so an exception or a status code in that range, the call will be retried up to five times. However, the retries will be back to back to back; there won't be any wait between them. If I wanted a wait, I could use WaitAndRetryAsync with a specific delay, but let's park that for now and leave it as RetryAsync, and I'll explain why: just retrying isn't really enough; retrying goes deeper if you want to do it correctly. To explain why that's the case, let me bring up the whiteboard.

It's the exact same scenario we have in the code: I have some API that is calling the weather API. Now imagine a bunch of people calling my API at the same time, bam, bam, bam, bam, and I have to handle all four of those requests and send them to the weather API, because I need the weather. What happens if the weather API goes down for, say, one minute? Every request from those users gets funnelled through me, and I will keep trying to send them; but because the weather API can't handle them, my API will retry each of those requests five times, and because users are very prone to just spamming the refresh button, I am effectively channelling what amounts to an attack on that API, with no backoff period at all.

To prevent that, one way to deal with it is to add a backoff period, say a one-second pause between retries, so there are stretches where I'm not doing anything and then points where I send the request. That is a good first step, but it is still problematic, because one second is not a big enough window to recover: if something didn't respond now, it isn't very likely it will respond within the next second. So what we tend to implement is something called exponential backoff, where you wait one second the first time, then two seconds, then maybe four, then eight, then sixteen, and so on. You need to be careful with the exponential approach, though, because who wants to wait 16 seconds for a response? That's a really bad user experience, so you have to pick your battles and decide what is acceptable for you: one plus two plus four is seven seconds, maybe on the high end, but acceptable.

Exponential backoff is better, but it still suffers from the fact that everyone calling your API has their retries synchronised through your service, so the retry distribution towards the weather API is still not great. To prevent that, we introduce a randomisation element, which we call jitter. The first retry might wait 1.1 seconds, or 0.9, or 0.89, or something else; the second 2.2, or 2.1, or 2.01; the next 4.3, or 4.0, or 4.1, and so on. That way you distribute your retries better, you give the downstream API more breathing room, and you also eliminate the dead zones where your application is doing nothing at all.
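For reference, here is roughly what the wrapped call looks like, along with the "parked" WaitAndRetryAsync variant that adds the flat one-second pause mentioned above. These are sketched as additional members of the hypothetical WeatherForecastService from the earlier snippet:

```csharp
// Run the HTTP call through the retry policy defined earlier.
public async Task<HttpResponseMessage> GetWithPolicyAsync(string url)
{
    return await _retryPolicy.ExecuteAsync(() => _httpClient.GetAsync(url));
}

// Same handling rules, but with a fixed one-second pause between attempts
// instead of back-to-back retries.
private static readonly IAsyncPolicy<HttpResponseMessage> WaitAndRetryPolicy =
    Policy<HttpResponseMessage>
        .Handle<HttpRequestException>()
        .OrResult(r => r.StatusCode is >= HttpStatusCode.InternalServerError
                       or HttpStatusCode.RequestTimeout)
        .WaitAndRetryAsync(5, _ => TimeSpan.FromSeconds(1));
```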
Now, if you wanted to implement exponential backoff in this scenario, you would say something like WaitAndRetryAsync for five retries, use the overload that gives you the retry attempt as an integer, and return TimeSpan.FromSeconds of Math.Pow, raising 2 to the power of the retry attempt. That gives you exponential backoff. The problem is that it doesn't have any randomisation element; it doesn't have the jitter. I could sit here and complicate things by implementing it myself, but there is actually a pre-baked package that contains effectively the best way to implement a jitter so you don't have to worry about it, and that is the Polly.Contrib.WaitAndRetry package. I'll comment the Math.Pow version out and keep WaitAndRetryAsync, because the package provides a Backoff class with a bunch of predefined backoff strategies: constant backoff, exponential backoff, linear backoff, the AWS decorrelated jitter backoff, and DecorrelatedJitterBackoffV2, which is effectively the recommended approach, so we're going to choose that. We need to define a couple of things: the base of the backoff, by which I mean the median first retry delay, which will be one second, and the retry count, which is five, and that's it. If you go and look at how it is implemented, it is not super complicated; you can wrap your head around it. There's a lot of math, and there's the randomisation factor I mentioned before: if you search for "random" in there you'll find the ConcurrentRandom class, which, by the way, accepts a seed, so if you want deterministic backoffs in your jitter you can provide a seed to make them deterministic. We don't want that here, so I'll leave it out.

And that's it; this is effectively production-grade retrying for your application. However, we wouldn't really do it this way in this specific scenario. For any other scenario it's absolutely fine, but .NET actually has a Polly integration package that can be used with the HttpClientFactory. If you don't know what the client factory or named clients are, I made a video explaining all of that; I'll put it in the top right corner of your screen, and you can watch it after this one. So I'll revert this call to what it was, removing the ExecuteAsync wrapping, and leave the policy defined but unused for now. Then I'll add the Microsoft.Extensions.Http.Polly package, which is the integration point between the HttpClientFactory and Polly, go into Program.cs, and call AddPolicyHandler on the client registration. I can use the exact same policy code, just copy it, paste it there, import what's needed, and it works. What this effectively does is apply the policy to everything going through this HttpClient, and of course you can extract the policy into a variable and reuse it across all of your HTTP clients if you want to; that's how flexible it is. However, it actually goes even deeper than that, so let me comment this out.
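Here is a sketch of that registration: the jittered backoff from Polly.Contrib.WaitAndRetry plugged into a policy and applied to a typed client with AddPolicyHandler. WeatherApiClient is a hypothetical typed-client name, not necessarily the type used in the video:

```csharp
// Program.cs of the ASP.NET Core project (implicit usings assumed).
using System.Net;
using Polly;
using Polly.Contrib.WaitAndRetry;

var builder = WebApplication.CreateBuilder(args);

// Exponential backoff with decorrelated jitter: median first retry delay of
// one second, five retries in total.
var delays = Backoff.DecorrelatedJitterBackoffV2(
    medianFirstRetryDelay: TimeSpan.FromSeconds(1),
    retryCount: 5);

var retryPolicy = Policy<HttpResponseMessage>
    .Handle<HttpRequestException>()
    .OrResult(r => r.StatusCode is >= HttpStatusCode.InternalServerError
                   or HttpStatusCode.RequestTimeout)
    .WaitAndRetryAsync(delays);

// Microsoft.Extensions.Http.Polly: every request made through this typed
// client now goes through the retry policy, no per-call wrapping needed.
builder.Services.AddHttpClient<WeatherApiClient>()
    .AddPolicyHandler(retryPolicy);

// ... rest of the pipeline configuration.
```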
Because these transient errors are such a standardised thing, there's actually an extension method called AddTransientHttpErrorPolicy where I can simply configure my policy: I say WaitAndRetryAsync on the builder it gives me, paste in my backoff definition, and that's it, everything is done for me. If you look inside AddTransientHttpErrorPolicy, you can see the policy it builds uses HandleTransientHttpError, which is public, so you can reuse it even without the extension method, and it is the same thing we wrote by hand: HttpRequestException, or a transient HTTP status code, meaning anything 500 and above or a 408 Request Timeout. So all of the code we had before is now just these two lines, and it automatically applies to everything on that client without having to wrap every single call explicitly, which is very elegant (see the sketch at the end of this transcript). As long as you're using backoff with a jitter, and you know how far back it can go from a UX perspective, you have a properly implemented retry policy for your ASP.NET Core application. Again, it doesn't have to be an HTTP request: you can use the raw Polly approach to handle anything; just specify the result type, say what you want to handle, and pick how you want to retry. Proper backoff, especially with third-party services, can go a long way for the stability of your system, and for the service you're consuming as well.

Now, I'm planning to cover way more things about resilience, so make sure you subscribe so you don't miss any of the new episodes. Well, that's all I have for you in this video. Thank you very much for watching, and special thanks to my Patreons for making these videos possible; if you want to support me, you can find the link in the description down below. Leave a like if you liked this video, subscribe for more content like it, ring the bell as well, and I'll see you in the next video. Keep coding.
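As referenced above, the AddTransientHttpErrorPolicy registration would look roughly like this; WeatherApiClient is again a hypothetical typed-client name, and the backoff settings mirror the earlier sketch (same usings as before):

```csharp
// AddTransientHttpErrorPolicy already handles HttpRequestException,
// 5xx responses and 408 Request Timeout, so only the retry/backoff
// behaviour needs to be supplied.
builder.Services.AddHttpClient<WeatherApiClient>()
    .AddTransientHttpErrorPolicy(policyBuilder => policyBuilder.WaitAndRetryAsync(
        Backoff.DecorrelatedJitterBackoffV2(
            medianFirstRetryDelay: TimeSpan.FromSeconds(1),
            retryCount: 5)));
```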
Info
Channel: Nick Chapsas
Views: 59,124
Keywords: Elfocrash, elfo, coding, .netcore, dot net, core, C#, how to code, tutorial, development, software engineering, microsoft, microsoft mvp, .net core, nick chapsas, chapsas, dotnet, .net, polly, polly retries, retry in code, code retries, polly nick, retry policy, The CORRECT way to implement Retries in .NET, retries in c#, retries in .net, .net retries, c# polly, .net polly, c# polly retry
Id: nJH0PC2Pubs
Length: 17min 0sec (1020 seconds)
Published: Mon Sep 19 2022