Using paginated APIs with Python (four ways!)

Video Statistics and Information

Captions
All right, so what we're going to do in this video is figure out how to work with paginated APIs. A paginated API is one that doesn't give you all of the information you want at once, so you have to make several requests and combine them to get the full set of results you're looking for.

The API we're going to be using is this Pokemon API, and we're just going to grab a list of every single Pokemon that exists. This is the URL we're using. It gives you a ton of Pokemon: bulbasaur, ivysaur, venusaur, etc. We're going to figure out how to get all of the Pokemon in at least four different ways, because when we start dealing with this API there's an issue. Let's see it happen.

This is the URL I just showed you. We're grabbing a hundred Pokemon at a time, just using basic requests. If we look at data, the very first thing we're going to zero in on is these results, because we just want to see lists of Pokemon. We can jump right into those results and look at them: they're a list. How many Pokemon do we have here? A hundred. But aren't there more than a hundred Pokemon? So what's happening here?

Well, if we look at how this is set up, the very first part of the result from the API says 1118 for the count. If we read the documentation, which hopefully we did but maybe we didn't, we might realize that there are actually 1,118 different Pokemon, so we will have to get them 100 at a time, again and again, until we have caught all of them. We see a hint of how to do this here: the next URL is offset equals 100, limit equals 100.
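As a rough sketch of what that first response tells us, the snippet below works out how many requests of 100 it takes to cover a count of 1118. The dictionary is a hand-written, trimmed-down stand-in for the real response: the keys `count`, `next` and `results` are the ones mentioned in the video, while the exact URL and the helper name `pages_needed` are assumptions made for illustration.

```python
import math

# Hand-written stand-in for the first page of the response (assumed shape,
# trimmed to three results; the real page would hold 100).
sample_page = {
    "count": 1118,
    "next": "https://pokeapi.co/api/v2/pokemon?offset=100&limit=100",
    "results": [{"name": "bulbasaur"}, {"name": "ivysaur"}, {"name": "venusaur"}],
}


def pages_needed(count, limit):
    """Number of requests needed to cover `count` items, `limit` at a time."""
    return math.ceil(count / limit)


total_pages = pages_needed(sample_page["count"], 100)
print(total_pages)  # 12: eleven full pages plus one page with the last 18
```

Rounding up matters here: 1118 / 100 is 11.18, and the leftover 18 Pokemon still need a request of their own.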
Let's take this URL and look at what happens when we request it. I'm just going to throw it in here and look at data. It looks pretty much the same, except if we look at the results we see that this one starts with electrode, Pokemon 101, and if we go back to our last request we see that it started with Pokemon 1, bulbasaur, then Pokemon 2, etc. So what's happening with this URL is that it's giving us 100 at a time, but along with saying "I only want 100 at a time" you can also say "skip the first 100" or, as you're seeing right here, the next page skips the first 200. Skipping the first 200, we're now down to Pokemon 201. If we keep going all the way up to something around this count, we'll see if we can get to our final Pokemon. If we get to 1100, we are here, and we see results is not nearly as long a list. If we went up to 1200, we would see that we have no more results.

So what we need to do is start from the page with limit equals 100, which we could also write as offset 0, limit 100, and then go all the way up to this, collecting Pokemon as we go. Is this the best way to do it? No. What we're going to do right now is just make a loop where we step through the offset like this. There are a few different ways to do it. We're going to start off with a limit of 100 and an offset of zero, and then I'm just going to loop. How many pages of content are going to be in here? For i in, what is it going to be, 12? We'll see. We'll print "limit is" and "offset is", and every single time we go through we're going to increase offset. One way could be to add 100 to offset every single time, so offset just increases by 100: 0, 100, 200, 300, and then
it goes up to 1100. Another way to do this: this is kind of our page number right here, so I could say page number times one hundred, because page number is 0, 1, 2, 3, 4. What range does is basically count up 0, 1, 2, 3, 4, all the way to 11 in this case. Oh, because it starts off at zero, I need to move the print down here. So I guess it depends on whether we want to add at the beginning or the end.

The way I'm going to do it is to work with this page number, because when you have an API, sometimes you get to specify the offset, and sometimes it's page number one, page number two, page number three, and the other side automatically calculates the offset for you. So we're going to stick with this. Our limit is 100, and every single time we go through we calculate the offset. We could also change this to page number times limit, probably the fanciest way to do it at the moment, and then if we change the limit it will also change the way it steps through the offset.

So let's build the URL. I'm going to steal this URL here and say url equals, voila, fill in the blanks: offset is this, limit is that, and then I'm going to print out the URL. Those have to be strings, sorry. So there we go: it takes us up to 1100.
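The URL-building step described above could look roughly like this. It is a sketch, not the code from the video: the base URL is assumed to be the public PokeAPI list endpoint, and `page_urls` is a hypothetical helper name. An f-string takes care of turning the numbers into text, which avoids the "these have to be strings" problem.

```python
# Assumed endpoint; the video only shows the URL on screen.
BASE_URL = "https://pokeapi.co/api/v2/pokemon"


def page_urls(total_pages, limit=100):
    """Build one URL per page, computing each offset as page_number * limit."""
    urls = []
    for page_number in range(total_pages):  # 0, 1, 2, ... total_pages - 1
        offset = page_number * limit
        urls.append(f"{BASE_URL}?offset={offset}&limit={limit}")
    return urls


for url in page_urls(12):
    print(url)
```

Because the offset is derived from `limit`, changing the limit to 50, say, automatically changes the step between offsets as well.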
Now what we need to do is, as we go along, collect all of these Pokemon into a single list. The way we're going to do that is something called extend. If I have a list of things, say one, two, three, four, five, and then I have more things, which is some other numbers, the way I combine these two lists is with extend. Extend is a way of adding multiple elements to one list. So now, magically, more things has been added onto list of things: all of these things here got thrown over there. Nothing too crazy.

Every single time we go through, we're just going to add onto our list of Pokemon, and as we saw before, the Pokemon live in data results. So at the very beginning I say I have no Pokemon, and then every time I step through, when I build this URL, I say: go get that URL, process the JSON, and add the Pokemon onto my list of Pokemon. Before I make the request I'm going to print out that I am making the request. It makes things a little easier, because if something goes wrong with the processing, at least we know where it went wrong.

So I run this: requesting, requesting, requesting. Now if we look at the number of Pokemon I have, it's a lot. We can look at the Pokemon themselves: it goes from bulbasaur all the way down to even more than that. I can look at the last one, which is toxtricity-low-key-gmax. That's much more complicated sounding than bulbasaur.

Okay, so that was the first way to do it, and it's probably the worst way. The pros: maybe it's easy to think about. The cons of this method, which we'll call the discrete
page number method, or the "we know how many pages we have" method: the cons are that you need to know how many pages there are, so it's not very flexible.

Okay, the next method is listening to the API. When we made a request to the API, grabbing this from the very top, one of the things it gives us is this next URL, and it turns out we can just keep grabbing that next URL until there is no more next URL. Let's say this is the very first request we make, and it says "this is the next URL." I grab that URL, put it in there, and it says "now this is the next URL." And then I think, wait, instead of copying and pasting this in, I could just use data next. So this gets the next URL, processes that URL, and here's our new next URL. Then we take this new next URL, grab that, and get the next next URL, and it just keeps going. We can keep running this code until eventually there's no more next URL.

So we start by saying "here's the URL we should request." That's our manual work this time: we say which URL we're interested in, and then we keep running this code until there's no more next URL in the response. Now, if you want to run some piece of code until something is true or until something is false, you're going to use a while loop. So we're going to store this URL, maybe call it next URL, and say "while there is a next URL, do this stuff." So here's the URL I'm going to hit, and if a next one exists, change it to the next URL. I clearly should have just kept this as
url, for cut and paste's sake if nothing else. So what we're going to do is say: this is the URL we're starting at, grab it, get the information, and if there is a next URL, save it to the variable url. Then it goes back up to the beginning of the loop, and as long as it finds a next URL it keeps running this code. As we always do, let's print "requesting" and the URL. So it keeps grabbing, keeps grabbing, keeps grabbing, and then, magically, without us having to know how many pages there are, it finally runs out, because there is no next URL here. We can look at that again: for the last page we see next is actually missing, whereas for the very first page there is a next URL. So it just keeps going until no next page is being pointed at.

If we want to make this actually grab the Pokemon, we can do the same thing we did before, where we start with an empty list of Pokemon and use extend every time to add to it. So I say pokemon is an empty list, and while I'm looping I say "pokemon, please be extended by the data results here." There's no real reason to do it in this order; it just seems to make sense to grab the new Pokemon and then, when we're done, update the URL. So we try this again, run, run, run, and then I can look at the number of Pokemon, and I again have that nice big number.

One thing to notice: if we listen to what the API says, the very last request says limit 18, whereas last time, because we were always saying limit 100, the last request was limit 100, even though it's the same offset. Is there a difference between those? No. The API just knows there are only 18 left when you hit 1100, so it's smart enough to say "you only want a limit of 18 here."

All right, next up there are two more: going until there are no more results, and going
until there's an error. But first, the pros and cons of this one. This is the method of listening to the API. The pros are that the API knows what's going on inside, and it's good if you don't know how many pages, or results, there are. The con is that the API has to actually tell you the next page, and it's not always the case that you get that information. That's what the next method deals with: going until there are no more results.

Let's start from this URL, because it's near the end. If we grab this URL we get a lot of Pokemon. If we increase the offset once again, there are again a bunch of Pokemon, fewer than before, but still not zero. And if we increase it to 1200, we now see there are no more Pokemon: you don't get any results from the API.

What we're going to do is very similar to what we did before. Last time we said "as long as there is another URL, make another request." Now we're going to say "as long as there is actual data here, keep running," which means our loop is going to be a little different. So we start from here and start looping through. First I'm going to show you how to build the URL. If we run this code right now it is not going to work: it's going to run forever, because url is never going to disappear. But we actually kind of want that. We're going to be doing a crazy thing called while True, which just means "run this loop forever." If you run a loop forever, it's never going to stop until your computer turns off or you shut down your server, so we have to manually escape from it. When we escaped from the while loop before, it was actually checking something: it
was asking "is there a URL? is there a URL?" This time we're going to use a statement called break, and what break says is "take me out of this loop." So if I run this code it's only going to go through once, because as soon as it hits break it's gone.

Now let's say we're counting like we did before. We start with offset equals zero, and every time through we say offset equals offset plus 100. And maybe I'll say: if offset is greater than 500, break. So this loop would run forever, increasing offset again and again, until offset becomes more than 500, and then it breaks. Before we run it I should fix up our URL: it's going to be offset equals offset, and limit equals a limit variable, because maybe we want to change it at some point. Okay, so we can see that offset slowly increases, and once it gets above 500 it kicks out.

So how do we turn this into something that works for our Pokemon? Instead of saying "if offset is greater than 500," or even checking offset at all, we're going to ask "are there still results?" For this page, yes, there are still results, so get another page. For this one there are no results, so we don't want another page. So in here I request this URL, get the data from the URL, and then check: if len of data results is zero, so if there are no results, just exit; otherwise take our pokemon list and extend it with data results. Did we find any Pokemon? If not, exit the loop. If we did, add them to our list and move on to the next offset. And so it'll just keep looping,
looping, looping: request, request, request. And we see this one is different from the last two, which stopped at offset 1100 because there are only 1118 Pokemon. This one had to actually go all the way to 1200, because that is the first page with no Pokemon, and that's where it says "we don't have any more Pokemon, we're done."

So this method is called going until there are no more results. The pros: it's very flexible. The cons: it makes an extra request, it doesn't really listen to the API, because we're grabbing URLs that it doesn't necessarily want us to grab, and maybe it's a few more lines of code.

The final way is waiting until things break. It's similar to what we were doing before, so let's go through this code and just adjust it a little. This technique is especially good for scraping, but we'll get to that. Instead of asking "did we find any Pokemon?", I want to print out the first Pokemon on each page, maybe to debug, because a few examples ago we requested that first page, offset zero, twice. So it might make sense to check which Pokemon we're getting on every page. I run this, and it says bulbasaur is the first one, electrode the second. So we are successfully grabbing a new page each time; we're not accidentally getting the first URL twice or anything like that. Everything's working great, but then when it gets to offset 1200, where there are no more Pokemon, and we try to grab results zero, it says no, you can't do that: there is no first Pokemon for you
to grab. The way we fixed this before was to say "are there no more results? if so, break." We're going to do something slightly different here: we just look for an error, and if there is an error we escape from our loop. In Python I can say try to do this, and if it doesn't work, we found an error. try will run this code, and you can put a bunch of lines of code in there, and if there is an error it runs this block down here. So: try to print the first one out, and if there is no first one to print out, break, exiting the loop, which means we don't actually need this check here. You can organize this however you want, but the idea is that instead of manually checking whether we have Pokemon to add, we just wait until something goes wrong, until it can't find any Pokemon, and then exit the loop. It finds some Pokemon, finds Pokemon, finds Pokemon, requests this page, encounters an error, and exits the loop.

So this method is waiting until you hit an error in your processing. I will note that we can't rely on pokemon extend by itself to trigger the error, because if we do pokemon.extend of data results and results is an empty list, it still works: you can add an empty list onto this list, it just won't do anything. So we have to make sure we're getting the first element, the second element, or printing out something from the results that will no longer be there when we have run out of results.

The pros: it's super easy and flexible, and it's really good for scraping, if you're trying to grab things like headlines from multiple pages. The con is: what if there's some other error that isn't "you've run out of data"? Let's say we were printing out the name of the Pokemon or
something like that, say the name in English, even though that doesn't exist here. Maybe the Pokemon exists and has a name, but it doesn't have an English name, only a Japanese one. That would raise an error, and it would drop down here and exit the loop, when in reality the error isn't "there is no first Pokemon," the error is "that first Pokemon has no English name." Another con is that it looks more complicated. Honestly, the one where we just looked at the next URL is probably the easiest.

I'm going to clear all of my code so I can scroll through this easily. This is the first one we did, where we knew the number of pages we had to go through, 12 different pages, and we just updated our offset, built a new URL, and grabbed the information from there. In the second one we listened to the API: we asked "what's your next URL?", kept grabbing that next URL, and as long as it existed we kept running through our loop. In the third one we said that if we request something from offset 1200 it gives us no more results, results is an empty list, so we asked "is results an empty list?" and if so exited the loop. Otherwise this loop would go on forever. That's another con: if you write this code incorrectly it will just loop forever and you'll have to press Ctrl-C to get out. And the final one was similar to that one, where we check whether we found any Pokemon, but instead we try to print out the first Pokemon, and if there is no first Pokemon we get an error, swallow that error, and exit the loop.

So the things to pay attention to are that we updated the URL and we used while loops in general. There are just 101 different ways to do it, and depending on the API, one or
another will end up being more convenient. The end.
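To put the four approaches side by side, here is a minimal sketch of each loop. It is not the code from the video. To keep it runnable offline, the loops take a `fetch` function, and `fake_fetch` is a made-up stand-in that imitates the response shape described above (`count`, `next`, `results`) for 1118 items. With the real API one would presumably pass something like `lambda url: requests.get(url).json()` instead. The base URL and all function names are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlparse

BASE_URL = "https://pokeapi.co/api/v2/pokemon"  # assumed endpoint
TOTAL = 1118  # count reported in the video


def fake_fetch(url):
    """Offline stand-in for requests.get(url).json() (hypothetical helper)."""
    query = parse_qs(urlparse(url).query)
    offset, limit = int(query["offset"][0]), int(query["limit"][0])
    end = min(offset + limit, TOTAL)
    results = [{"name": f"pokemon-{i + 1}"} for i in range(offset, end)]
    next_url = None
    if end < TOTAL:  # like the real API, shrink the limit on the last page
        next_url = f"{BASE_URL}?offset={end}&limit={min(limit, TOTAL - end)}"
    return {"count": TOTAL, "next": next_url, "results": results}


def known_page_count(fetch, pages=12, limit=100):
    """Method 1: we already know how many pages there are."""
    pokemon = []
    for page_number in range(pages):
        offset = page_number * limit
        pokemon.extend(fetch(f"{BASE_URL}?offset={offset}&limit={limit}")["results"])
    return pokemon


def follow_next(fetch, limit=100):
    """Method 2: listen to the API and follow 'next' until it is empty."""
    pokemon = []
    url = f"{BASE_URL}?offset=0&limit={limit}"
    while url:
        data = fetch(url)
        pokemon.extend(data["results"])
        url = data["next"]
    return pokemon


def until_empty(fetch, limit=100):
    """Method 3: keep increasing the offset until a page has no results."""
    pokemon, offset = [], 0
    while True:
        data = fetch(f"{BASE_URL}?offset={offset}&limit={limit}")
        if len(data["results"]) == 0:
            break
        pokemon.extend(data["results"])
        offset += limit
    return pokemon


def until_error(fetch, limit=100):
    """Method 4: keep going until touching the first result raises an error."""
    pokemon, offset = [], 0
    while True:
        data = fetch(f"{BASE_URL}?offset={offset}&limit={limit}")
        try:
            data["results"][0]  # raises IndexError on an empty page
        except IndexError:
            break
        pokemon.extend(data["results"])
        offset += limit
    return pokemon


for method in (known_page_count, follow_next, until_empty, until_error):
    print(method.__name__, len(method(fake_fetch)))
```

One deliberate difference from the "swallow the error" description: this sketch catches only `IndexError`, so an unrelated failure (such as a missing English name) would surface instead of silently ending the loop. Note also that methods 3 and 4 make one extra request, for the empty page at offset 1200.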
Info
Channel: Jonathan Soma
Views: 451
Rating: 4.4285712 out of 5
Id: 4Fdyft-ky0w
Length: 29min 57sec (1797 seconds)
Published: Sun Jun 27 2021