Cookie Handling For Selenium Web Scraping in Python

Captions
What's going on, guys, welcome back. In this video today we're going to learn how to handle cookies when working with Selenium and doing web scraping. So let us get right into it.

All right, so the idea is the following. Let's say you want to do some web automation task using Selenium, and in order to do this automation you need to log into a service, or you need to do some initial setup process first. For example, you need to change some settings or some parameters, and only then can you get started with the automation procedure. Chances are that you don't want to run through that process every single time you call the functionality or run your script. If possible, you want to stay logged in and keep the settings the way they are, and the way our ordinary browsers do that is with cookies. Cookies keep you logged in, they store information about a session, and sometimes they store information about certain settings. This is super important if you want to optimize your web scraping or web automation procedures or applications, and it is what we're going to learn to do with Selenium in this video: how to handle cookies, how to export them and how to import them, so that we don't have to constantly log in and out of services.

We're going to do that using a sample website called setcookie.net. This website does nothing other than set cookies, but imagine it to be some web service that you want to write an automation script for: you log in and then you get a login cookie. To see which cookies you have, you right-click, choose Inspect and go to Storage, then Cookies. This is in Firefox; in Chrome I think it's quite similar, you just have some other tabs. Basically, you look at what happens when you do it with your browser. On this page, for example, I can type a cookie name, let's call it "some name", and a cookie value, for example "some value", and then click "Submit Query" to send it to the server. If I now reload the page, you can see it received the cookie some name=some value, and if I go to Inspect and then Storage, you can see I have "some name" with "some value". That is what happens on this page. On an actual login page you would log in with your information, and you would probably see some other cookie that keeps you logged in. It doesn't really matter what exactly the cookie is; what matters is that you know its name and what happens once you log into an application. This is going to be different for every single application, whether it's a social media platform or some paid service: no matter where you log in, you're usually going to get a different set or combination of cookies. The important thing is that you figure out what is happening so that you can mimic the behavior and export the correct cookies in your automation script.

So the first thing we're going to do is install Selenium, with pip3 install selenium or pip install selenium, and also webdriver-manager, which is optional. If you don't want to
use a webdriver manager (I just prefer to use one), you have to point to the path of your ChromeDriver yourself if you want to use Chrome.

Now we import time and pickle. We use time mainly for sleeping, so for waiting, and we use pickle to serialize the cookies, that is, to export them into a file, and later to load them from that file again. Then from selenium we import webdriver, and from selenium.webdriver.chrome.options we import Options. This part is optional; I'm just showing it in case you want to, for example, deploy your automation script on a server, where you want some options that make it run without a graphical user interface. Then from selenium.webdriver.chrome.service we import Service, and from webdriver_manager.chrome we import ChromeDriverManager.

Now I write driver = webdriver.Chrome(...) and pass service=Service(ChromeDriverManager().install()), so on demand it downloads a ChromeDriver for us. Here we can also add the options, which, as I mentioned, are optional, no pun intended: chrome_options = Options(), and you can add a bunch of different options. The ones that are relevant for server scripts are --no-sandbox and --headless, so I add them with chrome_options.add_argument("--no-sandbox") and chrome_options.add_argument("--headless"), the latter to run without a graphical user interface. You can also use --disable-dev-shm-usage, and maybe --verbose if you want to see more information. These are small things you might want if you're deploying this into a Docker container and then onto a server, for example. In my case I'm running this on my desktop PC, so I leave the options empty and comment the arguments out so you can still see them in the code, and then you pass the options with options=chrome_options. That's the driver, which is what we use to navigate to URLs.

Then we set url = "https://setcookie.net". What we want to do is cause a cookie to be set. We don't have to set the cookie ourselves; the website is going to do that. The only thing we want to do is export it once we're done and import it for the next session. So right away I implement the logic for loading cookies in case we already have a cookie file. Right now this is not going to work because we don't have a cookie file yet, so we're going to get a FileNotFoundError, which we catch. What we try to do is load, using pickle, a cookies.pkl file, which I'm going to create at the end of this code, and if it exists we load the cookies from there. So I write with open("cookies.pkl", "rb") as f, that is, read-binary mode, and then cookies = pickle.load(f) (not f.load, but pickle.load(f)). This loads the cookies from the file. Then we iterate over them: for cookie in cookies, we call driver.add_cookie(cookie). With the driver's add_cookie function we can easily set cookies. By the way, you can set whatever you want here, but we're going to set the cookies from the cookie jar that gets exported at the end of this script. Then we navigate to the URL again with driver.get(url).

What's important is that we also navigate to the URL before doing this, because we need to be on this URL to be able to add its cookies. So we call driver.get(url) up here, then we load the cookies and set them, and then we get the URL again just to refresh it, so the cookies are already there. In case the file is not found, which is going to be the case the first time we run the application because we don't have cookies yet, we just print "No cookies found". I also had another driver.get(url) in there; I don't think that one is necessary, so we can probably delete it. Then we wait some time for the page to load, because this happens very quickly: it tries, it fails, and I want to wait for the content to be there. Of course, the professional way is to wait for certain elements to appear, but I don't want to make it too complicated here; the focus is on cookie handling.

Now we get the elements that we want to interact with, and again, this is going to be different for every single website. In this case you can right-click the text box, for example, and choose Inspect to see how it is defined.
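A minimal sketch of the cookie-loading step described above, written as a helper function. It assumes a driver object offering Selenium's get() and add_cookie() methods and a cookies.pkl file produced earlier with pickle.dump(driver.get_cookies(), f); the name load_cookies and its boolean return value are hypothetical, not from the video.

```python
import pickle


def load_cookies(driver, url, path="cookies.pkl"):
    """Restore previously pickled cookies into a Selenium session.

    Returns True if cookies were loaded, False if no cookie file exists yet.
    load_cookies is a hypothetical helper name used for illustration.
    """
    # Selenium only accepts cookies for the domain of the page that is
    # currently open, so visit the site before adding anything.
    driver.get(url)
    try:
        with open(path, "rb") as f:  # read-binary mode, as pickle requires
            # Only unpickle files you created yourself: unpickling can run code.
            cookies = pickle.load(f)
    except FileNotFoundError:
        # First run: nothing has been exported yet.
        print("No cookies found")
        return False
    for cookie in cookies:
        driver.add_cookie(cookie)  # each cookie is a dict, as get_cookies() returns
    driver.get(url)  # reload so the site receives the restored cookies
    return True
```

With a real browser you would call load_cookies(driver, "https://setcookie.net") right after creating the driver. Returning a boolean lets the caller decide whether the login form still needs to be filled in.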
This is an input field with the name "name" (its ID is also "name"); you can go either by ID or by name, and I'm going with name. So I want the input field with the name "name", then the input field with the name "value", and, once I've written into them, the submit button, which in this case is an input of type submit. If there's only one, find_element will find that one; if there are multiple, you have to specify which one you want, for example the first one. In this case there's only one.

So name_input is driver.find_element by name with the value "name", and value_input is driver.find_element by name with the value "value" (in Selenium 4 that's driver.find_element(By.NAME, "name"), with By imported from selenium.webdriver.common.by). Then all I have to do is put in the values, so I send the keys.

By the way, I said before that you have to look at what the browser is doing, but you don't even have to do that, because you just perform the behavior that causes the cookies to be set. You don't have to watch which cookies are being set; you just go through the behavior that sets them automatically. This is even simpler, because you basically do what you do with your ordinary browser and save the results.

So name_input.send_keys("some cookie name") and value_input.send_keys("some cookie value"). This is now our login. In your application this would be the username input and the password input, and you would type in the username and password, loading them from environment variables to be safe. Then you just click the submit button. For the submit button I use driver.find_element with an XPath saying that I want an input with the type submit: //input[@type='submit']. Since there's only one, it finds that one.
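The form interaction, together with the advice to load credentials from the environment, can be sketched as below. To keep the example importable without Selenium, the locator strategies are written as the plain strings "name" and "xpath", which are the values of Selenium's By.NAME and By.XPATH constants. The environment variable names SCRAPER_USERNAME and SCRAPER_PASSWORD and both function names are hypothetical.

```python
import os

# Same string values as selenium.webdriver.common.by.By.NAME and By.XPATH,
# so driver.find_element() accepts them without importing Selenium here.
BY_NAME = "name"
BY_XPATH = "xpath"
SUBMIT_XPATH = "//input[@type='submit']"


def read_credentials(user_var="SCRAPER_USERNAME", pass_var="SCRAPER_PASSWORD"):
    """Read login data from (hypothetical) environment variables."""
    try:
        return os.environ[user_var], os.environ[pass_var]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None


def submit_form(driver, first_field, first_value, second_field, second_value):
    """Type into two inputs located by their name attribute, then submit."""
    driver.find_element(BY_NAME, first_field).send_keys(first_value)
    driver.find_element(BY_NAME, second_field).send_keys(second_value)
    # The page has a single <input type="submit">, so find_element returns it.
    driver.find_element(BY_XPATH, SUBMIT_XPATH).click()
```

On setcookie.net this would be submit_form(driver, "name", "some cookie name", "value", "some cookie value"); on a real login page you would pass the field names you found with Inspect together with the values returned by read_credentials().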
I can just call submit_button.click(). Then we wait for 2 seconds, and after this we're basically done, so we can export the cookies into a pickle file so that next time we actually find it. We write with open("cookies.pkl", "wb") as f, that is, in write-binary mode, and then pickle.dump(driver.get_cookies(), f). We get all the cookies from the driver and dump them into f, our file stream for cookies.pkl. Then, just to see some results, we call driver.get(url) and wait for 5 seconds so we can look at the page.

And that's basically how you do it. If I run this now, you will see the behavior: it starts, opens the website, finds the fields, submits the form, reloads the page, the cookie is set, and it closes after 5 seconds. Now we have the cookies.pkl file, and I don't have to run through the process anymore: on the next run the script realizes we already have cookies, loads them and jumps immediately to the section where we load the page, and you can see the cookies are already set without me having to do anything. In your application, if you use this for something that needs a login, you would already be logged in.

So again, what I said at the beginning was not entirely correct: you don't even have to look at what is happening. You don't need to right-click, inspect, or check what kind of cookies are being set, because all you have to do is mimic the behavior you would perform manually in your browser, export the cookies, and load them the next time. That's exactly what your browser does anyway, so it's not even a different kind of behavior. So yeah, this is how you handle cookies in Selenium.
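The export step can also be wrapped in a small helper. This is a minimal sketch, assuming a driver with Selenium's get_cookies() method, which returns the cookies visible to the current page as a list of dictionaries; the name save_cookies and the returned count are illustrative additions, not from the video.

```python
import pickle


def save_cookies(driver, path="cookies.pkl"):
    """Pickle the browser's current cookies to `path` and return how many were saved.

    save_cookies is a hypothetical helper name used for illustration.
    """
    # get_cookies() returns a list of dicts (name, value, domain, path, ...)
    # for the page that is currently open.
    cookies = driver.get_cookies()
    with open(path, "wb") as f:  # write-binary mode for pickle
        pickle.dump(cookies, f)
    return len(cookies)
```

Calling it after the form has been submitted and the page has settled, as in the video, means the next run finds a cookies.pkl file to load. Pickle is what the video uses; since the cookie dictionaries only contain strings, numbers and booleans, json.dump would also work and gives a human-readable file.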
That's it for today's video. I hope you enjoyed it and I hope you learned something. If so, let me know by hitting the like button and leaving a comment in the comment section down below, and of course don't forget to subscribe to this channel and hit the notification bell to not miss a single future video for free. Other than that, thank you very much for watching, see you in the next video, and bye.
Info
Channel: NeuralNine
Views: 3,564
Keywords: selenium, cookies, python, web scraping, web automation, selenium cookies, python selenium, python selenium cookies, cookie handling, export cookies, import cookies
Id: APpm80uxv1g
Length: 14min 7sec (847 seconds)
Published: Fri Mar 29 2024