How to Speed Up API Requests With Async Python

Captions
Hey everyone. In this video I'm going to show you how you can speed up processes that require a lot of API calls in Python. To speed things up, I'll be using aiohttp along with the standard async features of Python.

But first I want to show you what it looks like when you don't use any of the async features. If you look down here, I have the results of my script, which takes a YouTube channel ID, finds the playlist ID for that channel's videos, and then for each video on that channel gets the number of views, puts them all into a list, and calculates the average. You can see that for my channel I have 422 videos, the average view count is 19,737, and this entire process took 95 seconds.

Let's walk through the process. First, in addition to importing requests to perform the requests and time to measure how long it takes, I'm getting the API key for the YouTube API. I'm importing that from a separate file so you don't use my API key. Then I'm starting the timer and putting in the channel ID for my YouTube channel. You can take any channel ID for any YouTube channel you want, but in my case I'm using mine. Then I'm performing a request to YouTube's channels endpoint to get the playlist ID for my channel's videos.

Once I have that playlist ID, I can go ahead and get the list of video IDs. What I'm doing here is creating an empty list and looping continuously, because for each endpoint there's a possible next page token in the result that allows me to get the next page. YouTube returns about 50 results at a time, I believe, at maximum, and if there are more than 50 results it gives you a page token. So I'm taking all the video IDs and appending them, and then I'm checking to see if there's a next page token. If there is, I repeat the process. If there isn't, I break out of the loop.

Once I have all the videos, I do one final loop for each video. I have 422 videos, so I'm performing this request 422 times. I'm getting the number of views per video and appending it to that list, and then here I'm finally doing the calculation. This simple code takes 95 seconds, which is really slow, and if I had a bigger channel with more videos it would take even longer. By changing this to async, we'll see how much time can be saved.

Just one thing to note: async is something you use when you can do a lot of things in parallel, meaning they don't depend on each other. If you've noticed, I have three different types of requests here. The first one gets the channel information, so the playlist ID from the channel. The second one takes that playlist ID and gets the list of videos. The final one gets the views per video. If you think about it, the first request for the channel has to happen before the second request for the playlist. I can't start the playlist request until I know what the playlist ID is from the channel, so the second request depends on the first. I can't do anything with async here, because I can't run them at the same time. But once I have all the video IDs, each request becomes independent, because no one video ID request depends on another. That's where I can use async, and that's where I'll modify this code to make it run significantly faster. So this third loop is the one I'm going to modify. I'm not going to touch the code above, because I can't do anything to make it faster through async.

The first thing I need to do to get this to work is create a function that I'm going to run in the event loop. The event loop is what's going to handle running everything in parallel. So I'll comment this out and start a new function. I'm going to run it above this print statement, but I'll define it down here, underneath the print statement. So: async def, which is how you define a function that will be run asynchronously, and I'll call it main.

Inside of here is where I want to do something. I'm going to use the library aiohttp, so let me import that, and I also need to import asyncio. aiohttp is not in the Python standard library, so you have to install it first. asyncio is, so it's there by default.

Now I want to prepare to start sending all these requests. The first thing I need to do is create an aiohttp session to handle the requests I'm going to send. I can use async with, which is what you use when you're using an async function in a with statement. So: async with aiohttp.ClientSession(), capital S, as session.

Then in here I'm going to create some tasks, and the tasks are going to represent each individual request that I want to send. I have 422 videos, so I'm going to create 422 tasks, and the easiest way to do this is just to loop. I'll create an empty list called tasks. There are multiple ways you can handle this part, but I think this is the clearest for a video like this. I'm going to loop over the list of video IDs, which I'm filling in with this loop up here, so I can say for video_id in video_ids. Then I want to add a task using that video ID and sending a request. I need to create a second function for that, which I'll do in a moment, but I can start the code here. I'm going to call asyncio.ensure_future, and I'm going to call another async function that will perform the requests. It's going to take in two things: the video ID and the session. I'll call it get_video_data, pass in the session and the video ID, and define it in just a moment. I'll assign this to task and then do tasks.append(task).

What's happening here is that every time I call ensure_future, I'm basically telling it to get this function started (remember, I haven't created this function yet) and then move on immediately. So it starts the function, appends the task, and moves on, and I'll have a bunch of these being called in parallel. It's not going to wait on the result like it does in the old example, where I make a request and then have to wait for the results to get back. Here I just send the request without waiting for the results, and I append it to this tasks list. I'll do something with that list later.

Before I come back here, let me create this function. It's going to be another async function called get_video_data, and it takes in a session and a video ID. What I want to do here is take the URL and build it, so I'll take the URL from up here and paste it there. So I'm taking the video ID and creating a URL. Then I can use async with again, and I can take the session, which comes from main. The reason I'm passing the session to this function is that I only need one session to perform the requests. I don't need to create one for every single video, so that's why I pass it to get_video_data. So: async with session.get(url), and then as response. It's going to send this request and await it, meaning it's not going to sit there and wait for it to return. Instead it's going to allow other things to execute at the same time, and it only comes back to this code when the request is done.

Once it comes back, I can get the response. I can do response.json(), and I also await this, although converting the response to JSON will be pretty fast, and I'll call the result something like result_data. Once I have this result data, I can get the actual data from the API. This will be similar to what I have up here, where I go into items and then get the view count, so let me copy all this and paste it here. It's slightly different, so I'll modify it: instead of r.json(), I already have the response converted to JSON, so I'll put result_data here, and then I can get the view count. I don't need to append it to anything. I'm just going to return the view count, because this stands alone now. This will only be executed for one video ID at a time, so I'm taking the video ID and returning the view count for that particular video. And before I forget, I should make this an integer so it runs okay.

Now that I have this get_video_data function done, I can come back here and call asyncio again: asyncio.gather. What gather does is take the tasks, so I'm passing in tasks here, and I'm using the star so I can pass them like task 1, task 2, task 3, and so on. That's all the star is doing: taking the list and converting it to that form. I can await this and assign it to view_counts. The old view_counts list shouldn't be in here, because I'm not using it anymore, but view_counts down here should be assigned to the result of asyncio.gather, because each task is going to return a view count. What gather does is take the result from every single task that I created with ensure_future and put them in this list. So after all this is done, I end up with the same list that I had in the previous example. It's just a different way of getting there. The difference is that I don't have to wait for each individual request to finish. Everything is going to run, and as the responses come back they all get put into this list. The data will be equivalent, but the process to get there is different.

The last thing I want to do is display how long it takes, so I'll copy this with the print statements and put it here. Actually, I can put it outside of the with block.

Now I want to run this. To run something that's async you have to have an event loop, and the easiest way to create one is to use the asyncio.run function. So I'll use asyncio.run, and main is the name of my function, and we'll see how long this takes. Before, it took 95 seconds. Let's see how long this one takes.

Well, it failed already. It says main is not defined. It isn't picking up the definition at the end because I'm running this as a script, so I'll put the call at the end. Now let's run it. It takes three seconds. I still get the same number of videos and the same average number of views, but it only took three seconds instead of 95.

This is exactly why you use async. Anytime you have a bunch of requests that can be independent, if you follow this pattern you'll be able to save a ton of time over the regular synchronous approach. Just remember that everything has to be independent, and as long as you follow a process like this, or something similar, you can get the results back way faster. I'm on a slow connection right now, so it would be even faster if I wasn't.

That's all I wanted to show you in this video. I'm demonstrating this because I ran into it in something I was making myself, so I figured I'd make a video to show you all how to do it. If you have any questions, feel free to leave a comment down below. If you liked this video, please give me a thumbs up, and if you haven't subscribed to my channel already, please subscribe. Thank you for watching, and I will talk to you next time.
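The code itself is not part of the captions, so here is a minimal sketch of the pattern described in the video: one shared aiohttp session, one task per video ID created with asyncio.ensure_future, and asyncio.gather to collect the view counts. Several details are assumptions rather than the author's actual code: the API_KEY placeholder, the exact URL template, the JSON path to the view count, and the hypothetical helper fetch_view_counts, which is split out only so the task and gather logic can be exercised without a network connection.

```python
import asyncio
import time

# Assumptions (not shown in the transcript): the API key placeholder, the
# exact URL template, and the JSON path to the view count are illustrative.
API_KEY = "YOUR_API_KEY"
VIDEOS_URL = (
    "https://www.googleapis.com/youtube/v3/videos"
    "?part=statistics&id={video_id}&key={api_key}"
)


async def get_video_data(session, video_id):
    """Request one video's statistics and return its view count as an int."""
    url = VIDEOS_URL.format(video_id=video_id, api_key=API_KEY)
    # Awaiting here hands control back to the event loop, so other
    # requests can make progress while this one is in flight.
    async with session.get(url) as response:
        result_data = await response.json()
    # The API returns the count as a string, so convert it.
    return int(result_data["items"][0]["statistics"]["viewCount"])


async def fetch_view_counts(session, video_ids):
    """Hypothetical helper: start one task per video, then collect results."""
    tasks = []
    for video_id in video_ids:
        # ensure_future schedules the coroutine and returns immediately.
        task = asyncio.ensure_future(get_video_data(session, video_id))
        tasks.append(task)
    # gather waits for every task and returns the results in task order.
    return await asyncio.gather(*tasks)


async def main(video_ids):
    # aiohttp is third-party (pip install aiohttp). It is imported here only
    # so the helpers above can be tried out without it installed.
    import aiohttp

    start_time = time.time()
    # One session is shared by every request.
    async with aiohttp.ClientSession() as session:
        view_counts = await fetch_view_counts(session, video_ids)

    if view_counts:
        print("Number of videos:", len(view_counts))
        print("Average number of views:", sum(view_counts) / len(view_counts))
    print("--- %.1f seconds ---" % (time.time() - start_time))
```

To run it, build video_ids with the synchronous playlist loop described earlier and then call asyncio.run(main(video_ids)) at the end of the script, after main has been defined.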
Info
Channel: Pretty Printed
Views: 18,392
Rating: 4.9649739 out of 5
Keywords: python async, aiohttp, python requests, async requests, python, tutorial
Id: ln99aRAcRt0
Length: 13min 53sec (833 seconds)
Published: Thu Dec 31 2020