Python Web Scraper Tutorial: Sessions, Requests, Cookies & JSON!

Captions
Hey there. In this video I'm going to show you how to use Python and the Python requests package to scrape a list of online users from a website that requires you to be logged in. It's a very simple application, but I see a lot of videos that don't really mention using cookies or how to scrape data from websites that require you to be logged in, so I'm going to show you simply how to do that.

This is the website, on the right-hand side, that I'm going to be scraping data from, and as you can see there is a list of online users. This is the data we're going to be scraping.

To follow along with this tutorial you should have Python installed, and you're going to need the Python requests package. I've left a link in the description, and if you scroll down to the installation page there are instructions on how to install it. The package recommends you use Pipenv. I don't, but if you do, just copy and paste that command into your terminal; otherwise just use regular pip: pip install requests. I have it installed already, so next up we can begin to scrape this page.

Actually, before we do, I want to give you a brief overview of HTTP and how requests are made to this web application in order to get the list of online users. This application sends a GET request to its API, which returns a JSON response containing all these users. They update the list periodically, every minute or so; there you go, it just updated while we were watching.

To see how that's done, right-click, choose Inspect, and open the Network tab. The Network tab monitors all the requests the website makes, and as you can see, requests to this endpoint are being made periodically. I believe that one is for inbox messages, because it returns 4 and the message counter up here shows 4, so every so often the page checks how many messages you've got. There's another request we're interested in, called online users. In a moment the page is going to send that request, and we'll grab its details and move on from there. I think it fires every minute or so, so it's due about now. There we go: online users. As you can see, it's a GET request to this URL, and the response is JSON with a list of all the users who are online.

Okay, let's head over to Python, send that same request, and see if we get the same data. I'm creating a file called scraper.py. We import requests, call requests.get with the URL we copied from the Network tab, and print r.text, which returns the body of the response. Then run python scraper.py. Oops. Here we go: it hasn't returned the data we want, and the reason is that we haven't passed along any cookies or any other data to prove to the API that we're logged in. This is a chat application, so this request can only be made by users who are logged in, and because our request is an ordinary one that sends no cookies, the application treats us as not logged in. We can fix that quite easily.

Back in the Network tab, scroll down to the request headers and you'll see a Cookie header; copy it over into the editor (View > Toggle Word Wrap makes it readable). We're interested in one cookie specifically: when the web application makes this request, it sends along a cookie called chatiw_session, and that cookie authenticates you on the API side. That's why the request works from the browser and gets the intended response.

So we can change our request a little to make that happen. The first thing we create is a session object. A session is like a browser session: it stores cookies and other information that allows you
to make requests in your script, like so: session.get. Any request made using the session object will use the same cookies. That sounds confusing, so let me just show you what I mean. Here we've got a new session, and we change requests.get to session.get. Let's test that it still works. Oh, what's happened? Sorry, too many typos; live coding can be awkward. There we go, it still runs, but we're still not getting the data we want, because we haven't passed in the cookies yet.

Next we need to create a cookie jar: requests.cookies.RequestsCookieJar(). A cookie jar is basically like a list or a dictionary that stores cookies. So let's add a cookie to the cookie jar (that sounds so weird, like a school nursery). The first argument is the name of the cookie, chatiw_session, and the value is everything after the equals sign in the header we copied. Next we add the cookie jar to our session object with session.cookies = jar. Finally we pass these cookies along with the request... actually, no, we don't need to do that, sorry. This should now return the same data as in the browser, because when we make the request with session.get, the session sends these cookies along. Let's give that a try. There you go, like magic, it works.

If we import the json package, we can parse this data quite easily. Let's just pretty-print the first user in the list. There we go: first is the user ID, then the nickname, country code, country name, state, age, and VIP status.

And that's how you scrape data from endpoints that require you to be logged in: you simply pass along the cookie. I find most applications work the same way; it's just a bunch of cookies that they set.

So that's about it, guys. If you enjoyed this video, please give me a thumbs up. I'm just starting to do more videos now, so if you want to see more from me, please subscribe, and if you have suggestions for future videos, leave a comment below. If you have any problems, again, leave a comment and I'll help you out. Have a great weekend, guys, have a great week, and have a happy 2019. Goodbye.
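The first attempt in the video failed because a plain requests.get sends no Cookie header. The sketch below illustrates that difference offline: it prepares two requests without sending them, one bare and one that attaches the cookie to that single request through the cookies= argument (a one-off alternative to a session). The URL is a hypothetical placeholder, since the video never shows the full endpoint address, and the cookie value is a stand-in for the one you copy from your own browser.

```python
import requests

API_URL = "https://example.com/api/online-users"  # hypothetical endpoint
# Placeholder value; paste the real chatiw_session value from your browser.
SESSION_COOKIE = {"chatiw_session": "paste-your-session-cookie-here"}


def cookie_header(request: requests.Request):
    """Prepare (but do not send) a request and return its Cookie header, if any."""
    return request.prepare().headers.get("Cookie")


# A bare GET, like the first attempt in the video: the server sees no session cookie.
anonymous = requests.Request("GET", API_URL)
# The same GET with the cookie attached to this one request only.
authenticated = requests.Request("GET", API_URL, cookies=SESSION_COOKIE)

print(cookie_header(anonymous))      # None
print(cookie_header(authenticated))  # chatiw_session=paste-your-session-cookie-here

# Sending the authenticated request for real would look like:
# requests.get(API_URL, cookies=SESSION_COOKIE, timeout=10)
```

Passing cookies= on every call works, but it means repeating the cookie each time; a Session stores it once and reuses it.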
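The session-and-cookie-jar steps described in the transcript can be sketched with the requests package as follows. This is a minimal sketch under assumptions: the endpoint URL is a hypothetical placeholder, the cookie name chatiw_session is as heard in the video, and the cookie value is a stand-in for whatever you copy from the Cookie request header in your browser. To stay runnable offline, the script only builds the request and prints the Cookie header it would send; calling fetch_online_users performs the real request.

```python
import requests

# Hypothetical placeholders: substitute the endpoint URL and the cookie value
# copied from the Cookie request header in your browser's Network tab.
API_URL = "https://example.com/api/online-users"  # hypothetical endpoint
COOKIE_NAME = "chatiw_session"  # cookie name as heard in the video
COOKIE_VALUE = "paste-your-session-cookie-here"  # placeholder value


def make_logged_in_session(name: str, value: str) -> requests.Session:
    """Return a Session whose cookie jar holds the browser's session cookie."""
    jar = requests.cookies.RequestsCookieJar()
    # No domain is given, so requests attaches this cookie to any host.
    jar.set(name, value)
    session = requests.Session()
    session.cookies = jar  # every request made through the session reuses the jar
    return session


def fetch_online_users(session: requests.Session, url: str):
    """Send the GET request with the session's cookies and decode the JSON body."""
    response = session.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()


session = make_logged_in_session(COOKIE_NAME, COOKIE_VALUE)
# Build the request without sending it, to check which Cookie header goes out.
prepared = session.prepare_request(requests.Request("GET", API_URL))
print(prepared.headers["Cookie"])  # chatiw_session=paste-your-session-cookie-here
```

With a real URL and cookie, users = fetch_online_users(session, API_URL) sends the request. Because the cookie lives on the session, no cookies= argument is needed on each call, which matches what the video found.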
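Parsing the JSON and pretty-printing the first user, as done at the end of the video, needs only the standard json module. The payload below is made up for illustration: the video names the fields (user ID, nickname, country code, country name, state, age, VIP) but not their exact key names, so the keys here are assumptions. With a live requests response, response.json() performs this decoding for you.

```python
import json

# Made-up example body; the real endpoint's key names and values may differ.
SAMPLE_BODY = """[
  {"user_id": 101, "nickname": "alice", "country_code": "GB",
   "country_name": "United Kingdom", "state": "", "age": 25, "vip": false},
  {"user_id": 102, "nickname": "bob", "country_code": "US",
   "country_name": "United States", "state": "TX", "age": 31, "vip": true}
]"""


def first_user(body: str) -> dict:
    """Decode a JSON array of user objects and return the first one."""
    users = json.loads(body)
    if not users:
        raise ValueError("no users in the response")
    return users[0]


user = first_user(SAMPLE_BODY)
# indent=2 pretty-prints the object with one field per line.
print(json.dumps(user, indent=2))
```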
Info
Channel: Jay Jay
Views: 26,697
Rating: 4.9137645 out of 5
Keywords: python, python3, web scraper, web scraping, tutorial, requests, http, cookie, session
Id: PpaCpudEh2o
Length: 8min 45sec (525 seconds)
Published: Tue Jan 22 2019