Want Faster HTTP Requests? Use A Session with Python!

Video Statistics and Information

Captions
If, like me, you've been using the requests library in Python a lot, you'll surely love it for its simplicity: how straightforward it is and how easy it is to use. It's billed as "HTTP for Humans", and we can make a simple GET request to a server and store the response in just a single line of code. Now this is really cool and really useful for us, especially when we're doing things like web scraping or getting data from an API. But its simplicity and its basic usage do come at some cost. The simplest case, requests.get, misses out on a lot of features that we might want to use to make our code better, or at least more tailored to the situation we're working in.

One of those features is a session. So what is a session? It's essentially an object that we can create that allows us to keep certain parameters from our initial request across subsequent requests. What this means is that by making these parameters persistent, we can clean up and speed up our code. These could be things like authentication (maybe usernames and passwords), request headers, which I just did a video on last week, so if you want to go check that out I'll put a link somewhere for you, or maybe things like cookies as well. But the main one we're going to be looking at in this video is the connection pooling that it allows us to use: we're essentially making our HTTP connection to that server reusable.

So what are the benefits of this? Well, instead of creating a new connection to the server for each request, we can effectively keep a single TCP connection alive for the whole time and use it to send multiple HTTP requests. This can provide a significant performance increase when we're scraping data from a website, and a small change in the code is all that's needed. So I'm going to jump into the code right now, run through some demos, and show you what I mean.

You'll be pleased to know that I finally got a replacement cable, so I'm back with my lovely headphones that I like so much, as opposed to those other ones, which sound really good but are nowhere near as comfortable as these.

So this is the site that I'm doing the testing on. It's scrapethissite.com, and it's basically for practicing your web scraping skills. Pretty cool, pretty useful. This is the page here: there are multiple pages of data, we start on page one, and what we could quite conceivably want to do with a website like this is pull out all of the data, including the names, et cetera. So this is the site that I'm using, and credit to those guys for making it.

This is the code that I've got here. Let's make this a bit bigger so we can all see. I've got requests, I'm importing Beautiful Soup to do the parsing of the data, and I'm using datetime. I'm just doing a rudimentary "finish minus start" to give us a rough idea of how long it has taken to run this code. It's not an exact science, but it will be good enough and accurate enough to show us the difference in speed between the two.

We can see here I have this function called get_title. I couldn't think of anything more imaginative to do, so we're just printing the title. Here's our URL, where you can see I'm using an f-string with the x to get each new page. Then we are doing r = requests.get, which a lot of you will be familiar with, printing the title, and then returning from the function. I have the if __name__ == "__main__" in here because I was doing some testing earlier. You absolutely don't need this bit if you're doing your own testing; you just need to run these functions. I'm going to leave it in there for now. We can see I'm creating a start time object and then a finish one, from which I'm subtracting the start, and we are doing for x in range(1, 21), which is going to give us 20 pages. Then it's going to print the time taken.

So if we run this, we can see the title popping up for each new page. We could have done a bit more and got the name of the first team or something like that, but we didn't bother. We can see that took us roughly 9.84 seconds. I'm going to put "no session" here and paste that in, just so we have a reminder of how long it took us.

To actually utilize the session, we just need to make a couple of small changes. I'm going to copy this function and paste it underneath, and collapse the first one. I'm going to call it get_title_session, and I'm going to change r = requests.get to r = s.get, because s is what our session is going to be called. The rest is going to be the same. Then down here we're going to do s = requests.Session(). Now, we're putting this outside of our function because otherwise we'd create a new session every time the function ran, defeating the point of the session itself. So that needs to be outside of the function. We're putting it here, and then we're going to change our get_title call to our new function.

Let's save that. Our no-session time was 9.84 seconds, and with a session our time will be... we'll run that now. Hopefully it should be quicker. It already looks quicker coming down the screen, if that's such a thing, and we can see it was 6.3 seconds. So that is quite a marked difference already: over three seconds, simply by using a session, so that we can reuse that TCP connection, keep it connected, and get our HTTP requests flowing much quicker.

I'll run that again just to make sure it was consistent. We can see it looks much quicker again, and it's going to be somewhere around six seconds. There we go: 5.45, so even quicker that time. If I change this back, so let's comment out our s line again and put just get_title here, and run that one more time, it's probably going to be somewhere around nine and a half seconds again. You can already see it looks much slower. There we go: 9.56.

So even over the space of 20 requests to one server like this, we have found that just by using the session object we're able to cut the run time by almost 50%; well, I guess that's more like 40%, just by doing this. So it's definitely worth looking at and putting into your code.

As you can see, just a small change in our code to use that session can provide us with some significant benefits, so I'd urge you to go ahead and try this in your own projects and your own code, and let me know how you get on. If you liked this video and found some value in it, please consider subscribing. I've got lots more videos like it on my channel, and more to come. So thank you very much for watching, guys, and I will see you in the next one. Goodbye.
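The comparison described in the captions can be sketched roughly as follows. This is not the code from the video, only a minimal reconstruction under a few assumptions: the URL pattern for the practice site is a guess (the captions never spell it out), the names time_pages and compare are hypothetical, the title is pulled out with a regular expression rather than Beautiful Soup to keep the sketch to a single third-party dependency, and time.perf_counter stands in for the datetime arithmetic mentioned in the video.

```python
import re
import time

import requests

# Assumed URL pattern for the paginated practice page; the captions do not
# give the exact address, so treat this as a placeholder.
URL = "https://www.scrapethissite.com/pages/forms/?page_num={page}"

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def get_title(client, page):
    """Fetch one page with `client` and return the text of its <title>.

    `client` is anything with a requests-style .get(): the `requests`
    module itself (a new connection per call) or a requests.Session
    (a pooled connection that is reused between calls).
    """
    r = client.get(URL.format(page=page), timeout=10)
    match = TITLE_RE.search(r.text)
    return match.group(1).strip() if match else ""


def time_pages(client, pages):
    """Return the seconds taken to fetch every page in `pages`."""
    start = time.perf_counter()
    for page in pages:
        print(get_title(client, page))
    return time.perf_counter() - start


def compare(pages=range(1, 21)):
    """Time the same 20 pages without and with a session (uses the network)."""
    no_session = time_pages(requests, pages)
    # Create the session once, outside the per-page function, so that
    # every request can reuse the same pooled connection.
    with requests.Session() as s:
        with_session = time_pages(s, pages)
    print(f"no session:   {no_session:.2f}s")
    print(f"with session: {with_session:.2f}s")
```

Calling compare() runs both variants back to back. The absolute timings will depend on the network and the server, so the figures quoted in the video should be read as one set of measurements rather than a guaranteed speed-up.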
Info
Channel: John Watson Rooney
Views: 9,890
Keywords: web scraping with python, python requests, requests session, python requests session, session objects, python session tutorial, faster requests, speed up web scraping, python requests advanced, http requests python, python requests library, python requests package, python web requests, python coding tutorial, learn python, learn web scraping
Id: IDhuUpeF1n0
Length: 7min 15sec (435 seconds)
Published: Wed Feb 10 2021