Understanding API Rate Limits: Purpose, Types, and Essential Insights

Captions
Hello friends, thank you for watching this video. I am Mohamad, and in today's video we're going to be discussing API rate limiting: why we need it, and the different ways we can implement rate limiting in our APIs. So let's get started.

What we need here is to represent a client-server relationship. Let me draw a laptop, for example, and this laptop is going to represent my client. For simplicity's sake, let's say I have four different clients: one, two, three, and four. These four clients are going to be communicating with my server, and inside my server there's an API. Now that I have my clients and my API ready, whenever a client wants to make any type of call, it needs to call my server. So one client calls the server with a request, then another client calls the server with a request, and similarly for the rest. If every single one of these clients is making one call per second, that should be fine; my server can handle it.

A server, as we can see here, has certain limitations when it comes to memory (RAM) and CPU (for processing). These are the main constraints on my server. One client making one call is fine; a second client also making one call is fine, and the same for the third and the fourth. So each of my clients is making one call, and my server and my API can handle this. Let's say the limit of my server is 100 calls per second, and anything beyond that makes the server crash.

Now let's say my clients all of a sudden decide they want to utilize more of my server's functionality. The first client increases its utilization from 1 to 25 calls per second. That's still fine: it's within the limits of my API, and I'm still able to send back responses. Then client number two decides it also has a good use case and increases its utilization from 1 to 30. Now we're past half of my resource capacity, and the server has to work a bit harder to return its responses. Then client number three says, perfect, I also have an opportunity here, and increases its utilization from 1 to 30 as well. Now we're at more than 75% utilization, and the server is really struggling to send back responses. Finally, the last client says, I'm also going to need about 20% of these resources, and sends 20 more requests per second. At that point the server has exceeded the number of requests it can handle: we now have 105 rather than 100. The server is completely overwhelmed and is going to crash, and the application is going to return 5xx server errors, because the server simply isn't able to process those requests anymore. This is what happens when we don't put any kind of limit on the number of incoming requests.
The server is constrained to a certain amount of memory and CPU; we don't have carte blanche, or open-ended capacity, for how many requests we can serve per second. If we're on the cloud we can enable automatic scaling for this server (again, this is all theoretical), but when I scale my server up, I have to pay more for it. For example, to serve this extra load I'd need another server, and if one server costs me, say, $50 per month, a whole second server to handle the overflow is another $50. That's $100 per month just for an extra five requests per second, which isn't really feasible. These are the problems we face with an open API, where anyone can call the API without any limitation on the number of requests a client can make.

So how can we solve this? Lucky for us, we have something called rate limiting. Let's change the color to blue for the defense. There are different ways of rate limiting, which we'll discuss in more detail in a moment, but in essence, rate limiting limits the number of requests that any client can make to the server. A rate limit would say, for example, that every client is allowed a certain number of calls per second, something along those lines. So if client number one wants to make 25 calls, we don't allow all 25 at the same time; we break those 25 down into chunks of five, so that each second we only receive a handful of requests from a single client. Instead of 25 calls arriving at once, we process five, five, five, five, and five, which is equivalent to the 25 calls. Similarly, the client sending 30 might be handled in chunks of six, and the others in chunks of four.

That way I'm protecting my server by limiting the number of incoming requests. First, I'm saving money, because my server isn't scaling, so I'm not paying for extra resources. Second, I'm protecting my database, because the database also gets utilized and comes under a lot of pressure to return all of this information; we could see high utilization on the database, or, for example, a read lock on it, so there are multiple scenarios where the database could face problems, and rate limiting helps protect it too. And lastly, I'm protecting my API itself.
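As a quick, illustrative sketch (this is not code from the video): .NET 7 and later ship a built-in rate limiting middleware for ASP.NET Core, and the general wiring looks roughly like this, with the individual policies filled in per algorithm in the sections below. The `/data` endpoint name is a made-up example:

```csharp
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    // Clients that exceed a policy get 429 Too Many Requests back
    // (the middleware's default rejection status is 503).
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    // The individual policies (fixed window, sliding window,
    // concurrency, token bucket) are registered on `options` here;
    // see the sketches in the sections that follow.
});

var app = builder.Build();

// Enforce the registered policies on incoming requests.
app.UseRateLimiter();

// A hypothetical endpoint opting in to the "fixed" policy
// defined in the fixed-window sketch below.
app.MapGet("/data", () => "ok").RequireRateLimiting("fixed");

app.Run();
```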
There's a lot of malicious utilization of APIs. For example, suppose someone wants to take my API down, prevent anyone else from utilizing it, and make it unavailable, and I don't have any rate limiting. Let's say I have this malicious client; I'll draw it in red because it's malicious. This bad actor has decided, for whatever reason, that they want to break my server and stop anyone from using it. So what do they do? They start flooding my server with requests: instead of a normal number of requests like 20 or 30, they start sending a thousand requests per second. And 1,000 requests per second against a server that can handle 100 requests per second will break it and bring it to a complete stop. This is why rate limiting protects my API from bad actors, or from anyone trying to break down the server I'm trying to build. It's really important to protect our APIs: we don't want to damage any of our data, we want our API to be available 24/7 for all of our clients, and we want to prevent bad actors from breaking our applications.

So how can we implement rate limiting? There are many different ways, and the first one we'll start with is the fixed window. A fixed window means we have a timeline, and on this timeline we define a window of, say, 5 seconds. Within each 5-second window I'm only allowed to process a certain number of requests, let's say six. Once the first window has been processed completely, I get another time window of 5 seconds in which to process requests.

So what happens if a request comes in while I'm still inside a window? Say I'm 2.5 seconds into the window and a request arrives. That request can't go directly into my timeline, because the window is closed: it only accepts a fixed number of requests. So the request goes into a queue, and any subsequent requests go into the queue as well, piling up for as long as my current time slot is still being processed. This queue also has a limit; let's say in this case it accepts six requests. Once we hit six, the queue is full, while the current time window is busy processing the requests it already admitted. Once the time window has elapsed, a new 5-second window begins, and I take requests from the queue and start processing them. That's what fixed window means: within a certain time frame, 5 seconds in my scenario, I'm only allowed to process a certain number of requests, and all subsequent requests go into a queue.

You might think to yourself: okay, great, but if the queue can only hold six requests, what happens when a seventh or an eighth request comes in? How will my application handle a new request when the queue is already full? That request is completely rejected, and so is every request after it, and the error those clients get is 429 Too Many Requests. By rate limiting these requests we prevent our API from getting flooded: all of the excess incoming requests are simply rejected.
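As a sketch against the wiring shown earlier (the numbers simply mirror the example: six permits per 5-second window and a queue of six), the built-in fixed window limiter is configured like this; it also needs `using System.Threading.RateLimiting;` for the queue options:

```csharp
// Inside the AddRateLimiter(options => { ... }) call from the earlier sketch.
// Fixed window: at most 6 requests per 5-second window; up to 6 more wait
// in a FIFO queue and are drained when the next window opens.
options.AddFixedWindowLimiter("fixed", opt =>
{
    opt.PermitLimit = 6;                  // requests allowed per window
    opt.Window = TimeSpan.FromSeconds(5); // the fixed window itself
    opt.QueueLimit = 6;                   // the queue from the example
    opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
});
```

A seventh queued request is rejected, and with `RejectionStatusCode` set as in the wiring sketch, the client sees 429 Too Many Requests.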
Perfect. So we can see that this is a good implementation and a good way to protect our service, but it won't protect it 100%. Why? Because so far we've been assuming the requests arrive spread out over time. What happens if all of the requests suddenly arrive in a single burst? Say five or six requests come in as one burst, and my application processes them all within the first second. That means that once my application has finished, I have roughly 4 seconds of the window in which the API is doing nothing: wasted time and wasted resources, because the window won't admit more work even though the API finished everything in a single second. In other words, I'm not optimizing my API to the maximum. A fixed window can still be very beneficial in certain scenarios, but as you can see it has limitations we need to think about; depending on your needs and on how you want your API to respond, you might choose this or one of the other implementations.

Now that we've understood what fixed window means, let's move on to the next type of rate limiting we can implement, which is called a sliding window. A sliding window is pretty straightforward: it takes the same idea as the fixed window, but the window is divided into smaller segments, and we can say, for example, that in each one-second segment we can only process two requests. If a new request arrives while the current segment is full, it gets added to a queue, just like before. Once the requests in the current segment have completed, the window slides forward: the next segment becomes active and its requests get processed, then the one after that, and so on. So the sliding window in many ways complements the fixed window we discussed before. It has the same queuing mechanism, where incoming requests are queued, but it gives us a bit more flexibility in handling them, because capacity frees up continuously instead of all at once at the window boundary. I would say the main downside of the sliding window is that it adds a bit of overhead to the overall response time, but that's the case for every type of rate limiting implementation, so that's fine.
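Again as an illustrative sketch, not the video's code: the built-in sliding window limiter expresses this by splitting the window into segments, so capacity is recycled one segment at a time. Here 10 requests per 5-second window over five 1-second segments works out to roughly the two requests per second from the example:

```csharp
// Inside the same AddRateLimiter(options => { ... }) call.
// Sliding window: 10 requests per 5-second window, tracked in five
// 1-second segments. As the oldest segment expires, its permits are
// freed, so the window slides forward instead of resetting all at once.
options.AddSlidingWindowLimiter("sliding", opt =>
{
    opt.PermitLimit = 10;
    opt.Window = TimeSpan.FromSeconds(5);
    opt.SegmentsPerWindow = 5;
    opt.QueueLimit = 6;
    opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
});
```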
The third type of rate limiting is the concurrency limiter, and to be honest this is my preferred choice of implementation. Here we can see that we have six different users, and all of these users are calling my API simultaneously. With a concurrency limit I can say that every single user is only allowed to have, say, five requests in flight simultaneously. I'm not limiting the number of users who can come in; I'm limiting the number of concurrent requests per user. In all of the previous examples I wasn't limiting requests based on the user: I was user-agnostic, just taking whatever requests came in. With the concurrency limiter, on the other hand, the limitation is based on the user making the request, because every user has a limited stock of requests they can have in flight. So every user gets five concurrent requests, and every request they make beyond that is rejected and won't be processed.

This is why, with a concurrency limit, we're not blocking new users from joining. Let's say user seven and user eight join, and they're only making one or two calls each: they can join and make those requests even though my API is already busy with all the other users, because they're within their own allocation of request slots. Only if they exceed their allocation do their requests fail. Here we can see that user four has sent another five requests, and all five are rejected because they exceed the five allowed. Once some of that user's in-flight requests complete successfully (let's mark them green as completed), their slots free up and new requests can be processed. Every time they finish requests from their allocation, they effectively get fresh slots, and based on that they can process more and more requests.

Another nice feature: instead of directly rejecting a request with "too many requests", I can put incoming requests into a queue, the same as with the fixed window rate limiting. I can specify whether this queue is FIFO or LIFO (first in, first out, or last in, first out), and in this queue I simply store the requests rather than rejecting them. Only when the queue itself is full do I respond with a 429 rate limit error saying you've made too many requests and we can't process any more. Otherwise, I keep track of all of these new incoming requests, and because I keep track of them I can return results to the specific client each one belongs to; I can know, for example, that these two requests belong to client one and this one belongs to client four, and so on. In this way I can manage the incoming requests, protect my API, and still give all of my users the flexibility to use my API endpoints without one user's heavy usage locking everyone else out: I'm sharing the available resources among my users by limiting the number of requests per user. And this is why this is one of my favorite ways to implement rate limiting.
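To make the limit per-user rather than global, the built-in middleware lets a policy be partitioned by a key. This is a sketch only: using the authenticated user name (falling back to a shared "anonymous" bucket) as the partition key is my assumption, not something specified in the video:

```csharp
// Inside the same AddRateLimiter(options => { ... }) call.
// Concurrency: each user may have at most 5 requests in flight at once;
// up to 6 more per user wait in a FIFO queue instead of being rejected.
options.AddPolicy("per-user", httpContext =>
    RateLimitPartition.GetConcurrencyLimiter(
        // Partition key assumption: the authenticated user name,
        // with unauthenticated callers sharing one "anonymous" bucket.
        partitionKey: httpContext.User.Identity?.Name ?? "anonymous",
        factory: _ => new ConcurrencyLimiterOptions
        {
            PermitLimit = 5,
            QueueLimit = 6,
            QueueProcessingOrder = QueueProcessingOrder.OldestFirst
        }));
```

Unlike the window limiters, a concurrency permit is released when the request finishes, not when a clock ticks, which matches the "slots free up as requests complete" behavior described above.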
The last rate limiting approach we're going to discuss is the token-based rate limit, or token bucket. Let's look at an example. When we set up our API, we say that for every time frame, for example every 5 seconds, we're going to have 30 tokens available. Whenever a request comes in within those 5 seconds, we check: is there a token available? If there is, the request is forwarded for processing and a token is removed from the bucket. Another request comes in: is there a token available? Yes, so that token is consumed and the request is processed. Now imagine we've used up all 30 tokens and we're still within the 5-second window, and a new request comes in. Is there an available token? No, so the request is rejected. That's the token-based policy: it doesn't limit incoming requests based on the user, and it doesn't limit them based on the client; it limits them based on the number of tokens, in other words on the total number of requests we allow to be processed within a certain amount of time. The bucket storing the tokens gets refilled over time: we can add another 30 tokens every 5 seconds, or start topping it up every 2 seconds, so there are different ways to handle the refill in a token-based policy.

A good scenario for using the token-based policy is when I don't know how many clients are going to be using my API, so I don't want to allocate resources per client. If, for example, I have two or three different clients and one of them genuinely needs to make a lot of calls, this approach lets them. The main downside is greedy clients: if one client wants to make 100 calls, they can, and they'll deplete my tokens before the bucket is refreshed and before any of my other clients can make a request. We can see this could be one of the limitations.
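A sketch of the token bucket variant with the numbers from the example (a bucket of 30 tokens, topped up every 5 seconds); the bucket is shared across all clients, which is exactly why one greedy client can drain it:

```csharp
// Inside the same AddRateLimiter(options => { ... }) call.
// Token bucket: a shared bucket holds up to 30 tokens; each request
// consumes one, and the bucket is topped up with 30 tokens every
// 5 seconds on a background timer.
options.AddTokenBucketLimiter("token", opt =>
{
    opt.TokenLimit = 30;                               // bucket capacity
    opt.TokensPerPeriod = 30;                          // tokens added per refill
    opt.ReplenishmentPeriod = TimeSpan.FromSeconds(5); // refill interval
    opt.AutoReplenishment = true;                      // refill automatically
    opt.QueueLimit = 0;                                // reject immediately when
                                                       // the bucket is empty,
                                                       // as in the example
});
```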
But it's always going to be a delicate balance between what I want to achieve and what my use cases are, and based on that I can select which type of rate limiting algorithm to use within my APIs to protect them from malicious utilization.

So I hope this video was helpful. It was mainly about understanding these different techniques for protecting our APIs from malicious users and having more control over incoming requests. If you have any questions, please put them in the comments down below. If you want to see how to implement this, I'll link the video where I go step by step through the implementation of all these different rate limiters; it'll be somewhere in the description below so you can go watch it. That's a full implementation with code, whereas this one is more about the idea of rate limiting and why we need it. Thank you very much for watching, and if you have any questions please make sure you put them in the comments down below. Have a great day.
Info
Channel: Mohamad Lawand
Views: 787
Keywords: .net, api, c#, dependency injection, asp.net core tutorial, .net core, asp.net core, web api, dotnet core tutorial, dotnet core web api, dotnet core, dotnet performance, api caching best practices, dotnet core filters, asp net core filters, rate limiter, dotnet rate limiting, web api rate limiting c#, web api rate limiting, web api request, c# rate, c# rate limit api calls, API Rate Limits, APIs Explained, Programming Concepts, API Management, Web APIs, API Implementation
Id: LVl2Lftj8A8
Length: 19min 42sec (1182 seconds)
Published: Mon Jan 08 2024