Working with COOKIES and HEADERS in Python SCRAPY framework or REQUESTS package

Captions
What's going on, guys! In this video we're going to learn how to use cookies with Scrapy spiders, and what the difference is between providing cookies to a Scrapy spider and doing literally the same with the bare requests library. First of all, I'd like to introduce requestbin.com, a site that lets us create public URL endpoints to inspect incoming requests. Let me just delete the endpoints from the previous session. The first thing I want to do is open my developer tools and grab the headers from the Network tab; I've grabbed the entire header set from over here and copied it. Before actually creating the Scrapy spider, I want to show you the difference I've been talking about between the Scrapy framework and the requests library, so let me create one more file, cookies_requests.py. To avoid wasting your time, I'll use the headers I grabbed from the browser. We import requests and define the headers as a Python dictionary: I paste the headers from the browser and bring them into proper dictionary syntax, with keys and values. Please note that we've got this Cookie line here, which is very important: it's exactly the difference I'm talking about. Let me run the script with python3 from the terminal, in the current working directory, to check that there are no syntax errors. It seems there are none.
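In the video the headers copied from the Network tab are turned into a dictionary by hand. As a hedged alternative, a small hypothetical helper, `headers_from_devtools`, could do the conversion, assuming the copied text has one `name: value` pair per line. The header values below are placeholders, not the ones from the video.

```python
def headers_from_devtools(raw):
    """Turn 'name: value' lines copied from the browser into a headers dict.

    Hypothetical helper; the video does this conversion by hand.
    """
    headers = {}
    for line in raw.strip().splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority"
        if not line or line.startswith(":"):
            continue
        name, sep, value = line.partition(":")
        if sep:  # ignore any line that has no colon at all
            headers[name.strip()] = value.strip()
    return headers


# Placeholder text in the shape of headers copied from the Network tab
RAW_HEADERS = """
accept: text/html,application/xhtml+xml
accept-language: en-GB,en;q=0.9
cookie: session_id=abc123; theme=dark; consent
user-agent: Mozilla/5.0 (X11; Linux x86_64)
"""

headers = headers_from_devtools(RAW_HEADERS)
print(headers["cookie"])  # session_id=abc123; theme=dark; consent
```

Splitting on the first colon only keeps values such as `Mozilla/5.0 (X11; Linux x86_64)` intact.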
Now we just need to make an HTTP request. The method doesn't matter much; this is a GET request to requestbin.com. I create the response with requests.get() and specify the URL. (I first ran this inside the Scrapy script by mistake, and then I used the site URL instead of the endpoint URL; sorry for the inconvenience, let's try again.) We print the response text and get success: true. If we look at the request inspector, there's the incoming GET request, and it has only five headers. These are just the very basic headers, and the User-Agent says it's the python-requests library, which might be detected by the target sites you're crawling. Now, if we specify headers=headers and run it again, we get another response, and suddenly we have all the headers: Host, Accept, Accept-Encoding and so on. Now it looks like a browser. The very important thing here is that the Cookie header is specified as well: in the Python requests library, cookies are simply part of the headers, and the Cookie is treated like any other regular header. It's not the same when it comes to the Scrapy framework, so let's now do literally the same using Scrapy.
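The contrast described above, between requests' default headers and browser-like headers with the Cookie travelling as an ordinary header, can be checked without sending anything over the network. This sketch assumes the requests package is installed; `https://example.com/` and the header values are placeholders for the RequestBin endpoint and the copied browser headers.

```python
import requests

# Placeholder values standing in for the headers copied from the browser
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept": "text/html,application/xhtml+xml",
    "Cookie": "session_id=abc123; theme=dark",
}

# The headers requests uses when you pass none of your own
defaults = requests.utils.default_headers()
print(defaults["User-Agent"])  # python-requests/<version>

# Build, but do not send, a GET request with these headers;
# the Cookie entry is just another header to requests
prepared = requests.Request("GET", "https://example.com/", headers=headers).prepare()
print(prepared.headers["Cookie"])  # session_id=abc123; theme=dark
```

In a real run you would call `requests.get(url, headers=headers)` and inspect the result in RequestBin instead.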
Okay, so here we need to import some packages: import scrapy, and from scrapy.crawler import CrawlerProcess, to be able to run the spider from a plain Python script. Then I define the spider class; let's call it HeadersCookiesSpider. It inherits from scrapy.Spider, and we need to give it a name, say headers_cookies. Let's also provide the URL. Now I need to define the entry point: there is a method called start_requests, which is the entry point of the base spider we're inheriting from, and it takes the self instance. Inside it I yield scrapy.Request() with a bunch of parameters: first the URL. I'll leave out the headers for the moment to show the difference, and set callback=self.parse, since we're only going to be parsing. The parse method takes self and the response and prints the response text, so we should get exactly the same success: true response. Now let me provide the main driver: first we create the crawler process (process = CrawlerProcess()), then we call process.crawl() with the spider class to be used, and finally process.start() to actually start crawling. So we've made a request and got our response. If we look at the latest incoming request, it has slightly more headers than the requests library with default settings, but the User-Agent is still Scrapy, and we don't have enough headers to pass as a browser. So I'm going to make use of the headers.
I'll simply grab the headers and paste them inside the HeadersCookiesSpider class as headers = {...}, then pass them to the request. Suddenly we've got 12 headers, all specified correctly, exactly the ones we defined, but there are no cookies here. That's the difference I've been talking about: the Cookie line inside the headers doesn't make it into the request. Here is the trick: let me remove the cookie string from the headers and supply it separately. To send cookies along with the request, we pass cookies=self.cookies. The cookies aren't initialized yet, but they need to be a Python dictionary. Let me show you an example: if I put a custom key into the dictionary and run this again, the cookie is already there among the headers, with our custom key and value. Unfortunately, we can't pass the raw string directly, so before doing anything else we need to parse it. I add a method to parse the raw string, start with an empty Python dictionary, and print the result after initializing it. It's easier to show than to explain: I split the cookie string by a semicolon followed by a space. You see, this is the key and this is the value, then the separator, again key, value and separator, and sometimes a key with no value, which is not a big deal, and so on.
Splitting by a semicolon followed by a space gives us these elements, so first I want to print them to show you what they look like. But there's an error: 'tuple' object has no attribute 'split'. That's strange, because the cookie string should be a string, so why is it a tuple? Oh, it's because of this trailing comma, sorry guys: the comma makes Python treat the value as a tuple. With the comma removed, it's a string. Perfect. Since we've got a string, we can split it by a semicolon followed by a space, and now we have key-value pairs separated by an equals sign. Sometimes there is only a key and no value, so we need to handle that situation. Now we loop over this list and parse each element. I initialize the cookies dictionary, and for each cookie in the list I split it by the equals sign: the key is the first element and the value is the second. To make it even simpler, I first set key = cookie.split('=')[0], and then value to literally the same expression but at index 1, which is the second element.
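The trailing-comma bug is easy to reproduce in plain Python; this sketch also shows what splitting on a semicolon plus a space returns. The cookie values are placeholders.

```python
# A stray trailing comma turns the right-hand side into a one-element tuple
broken = "session_id=abc123; theme=dark; consent",
fixed = "session_id=abc123; theme=dark; consent"

print(type(broken).__name__)  # tuple
print(type(fixed).__name__)   # str

try:
    broken.split("; ")
except AttributeError as exc:
    print(exc)  # 'tuple' object has no attribute 'split'

# Splitting on "; " yields one "key=value" chunk per cookie
print(fixed.split("; "))  # ['session_id=abc123', 'theme=dark', 'consent']
```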
This should throw an exception, because not every cookie has a value, yet at first it didn't, which is a bit strange. I then thought the value must be at index 2, since index 1 would be the equals sign, and that gave me the IndexError: list index out of range I was expecting. But when you split by the equals sign, the sign itself is not part of the resulting list, so index 1 is the right one; I'm not sure why I thought it would be counted, and I still have no idea why index 1 didn't raise the error the first time. Either way, it's better to enclose this in a try/except statement (I could catch IndexError specifically, but there's no real need). Now it's time to fill in our cookies: cookies[key] = value. At the very end I import the json module and print our cookies with json.dumps() and an indent. So here we've got our cookies parsed. You see, this element doesn't have any corresponding value, and that's basically it. We don't need the debugging prints anymore.
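The parsing logic described above, sketched as a standalone function: `parse_cookies` follows the video's split-and-IndexError approach, storing an empty string for cookies without a value. `parse_cookies_partition` is a swapped-in variant using `str.partition`, shown only for comparison. The cookie string, including the `token` entry, is a placeholder.

```python
import json


def parse_cookies(cookie_string):
    """Parse "k1=v1; k2=v2; flag" into a dict, following the video's approach."""
    cookies = {}
    for cookie in cookie_string.split("; "):
        key = cookie.split("=")[0]
        try:
            value = cookie.split("=")[1]
        except IndexError:
            # A cookie such as "consent" has no "=", so index 1 does not exist
            value = ""
        cookies[key] = value
    return cookies


def parse_cookies_partition(cookie_string):
    """Variant using str.partition: no try/except needed, and a value that
    itself contains "=" (such as base64 padding) is kept whole."""
    cookies = {}
    for cookie in cookie_string.split("; "):
        key, _, value = cookie.partition("=")
        cookies[key] = value
    return cookies


raw = "session_id=abc123; theme=dark; consent; token=YWJj=="  # placeholder
print(json.dumps(parse_cookies(raw), indent=2))
print(parse_cookies_partition(raw)["token"])  # YWJj==
```

With the split version, `token` comes out as `YWJj` because everything after the second `=` is dropped; the partition variant keeps `YWJj==`.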
Now we can actually use our cookies within the request itself by passing the parsed cookies to cookies=. Running this one more time, we should get exactly the same cookies we were expecting. That's the last request here: again 13 headers, which is good news, and the cookies are literally the same ones we provided. Like I said, the one without a value simply stays without a value. So all the cookies are initialized, and this is it. There's one more situation you might encounter while creating web scrapers or working on web scraping projects, especially with sites that don't really like being scraped: sometimes cookies are set by the response. Let me check that out. If we make this request with requests.get(), there might be a Set-Cookie header. Let's inspect this page in the Network tab: I refresh the page once, and here are the response headers, but unfortunately no cookies. That's strange, not what I expected: I can't see Set-Cookie here, although some sites do send it, and I'm sure I've seen it on this page before. Maybe if we go to requestbin.com itself, I can show you the Set-Cookie header in the response.
It's not available here either. Okay, let's try a site I scraped not long ago, which is already covered on this channel. Let's search for homes... okay, now it works. Here it is: we've got Set-Cookie headers. Sometimes it's important to change cookies dynamically: when the response gives you some cookies in return, it might be useful to parse them and use them, that is, first parse them, then update the existing cookies, and then make another request with the updated cookies. It's not needed that often, but sometimes you might encounter this sort of situation. Before proceeding, I checked whether our request to requestbin.com gets any Set-Cookie headers back by printing the response headers (not the response text): no, there are no cookies. So I want to change the URL. That site isn't the best example either: I've already created numerous scrapers for it, but it's not the best case for educational purposes. Instead I'm going to Rightmove to get some Set-Cookie headers from the response. Let's pick any location, it doesn't really matter, and run the search.
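The "parse, update the existing cookies, then re-request" step boils down to a dictionary merge where values from the response win. `update_cookies` is a hypothetical helper name and the values are placeholders.

```python
def update_cookies(existing, new):
    """Return a new dict: existing cookies, overridden or extended by new ones.

    Hypothetical helper for the parse -> update -> re-request step.
    """
    return {**existing, **new}


existing = {"session_id": "abc123", "theme": "dark"}
from_response = {"session_id": "xyz789", "visitor": "42"}  # placeholder values

updated = update_cookies(existing, from_response)
print(updated)  # {'session_id': 'xyz789', 'theme': 'dark', 'visitor': '42'}
```

Returning a new dict leaves `existing` untouched, which makes it easy to compare the old and new cookie sets afterwards.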
This is the perfect example: we have three Set-Cookie entries here. I grab this URL and will leave it commented out in the source code, so you can use it if you'd like to play around with the code. I create another URL variable for it (the header is called Set-Cookie, right? Yes), point the request at this URL, and now we've got our cookies set. Please note that by default they are raw bytes, not strings, so we also need to decode them before parsing. To do that I take the Set-Cookie list from the response headers and use a list comprehension to decode every single element, and we get the string version. Perfect. Also, since our cookie-parsing logic works on a single string, we can join all these elements with a semicolon followed by a space, and now raw_cookies is a string. Now we can parse it. It's probably a good idea to separate the parsing into a method, parse_cookies, which takes the self instance as the first argument followed by raw_cookies; instead of splitting self.cookie_string, we now split this argument. This line doesn't exist anymore (one more little bug), so I comment that stuff out. The URL is right. First, I just want to test the parse_cookies function.
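The decode-and-join step described above, sketched without Scrapy. The byte strings are placeholders shaped like what Scrapy's `response.headers.getlist("Set-Cookie")` returns (a list of bytes); decoding as UTF-8 is an assumption that holds for plain ASCII cookie values.

```python
# Placeholder values shaped like Scrapy's response.headers.getlist("Set-Cookie"),
# which returns a list of bytes
set_cookie_values = [
    b"visitor_id=12345; path=/; domain=.example.co.uk",
    b"session=abc; path=/; HttpOnly",
]

# Decode every element (assuming UTF-8/ASCII values), then join them into the
# single "k=v; k=v" string that the string-based parser expects
raw_cookies = "; ".join([value.decode("utf-8") for value in set_cookie_values])
print(raw_cookies)
# visitor_id=12345; path=/; domain=.example.co.uk; session=abc; path=/; HttpOnly
```

Note that cookie attributes such as `path`, `domain` and `HttpOnly` end up in the joined string too, so a parser that splits on `"; "` will treat them as cookies.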
So in start_requests I call self.parse_cookies(self.cookie_string) on the raw string. Let's run this again: no response cookies are set yet, but the cookies should be initialized exactly the same as before, and they are. (Sorry, I didn't mean to open that window.) Now, to check that we actually update the cookies: in parse we now have the raw Set-Cookie string, so we can duplicate the request and pass cookies again, but this time not the hard-coded raw cookies. Instead we pass the cookies we've just parsed from the Set-Cookie headers of the current response. But why did it call parse recursively? Let's print response.url to distinguish the URLs we're crawling. First it should be Rightmove, and it is; then the log says it filtered a duplicate request. So first we make the request to Rightmove to obtain the Set-Cookie headers, and this is its response URL; then we crawl the RequestBin endpoint, and now the response URL is that one. A new request has just arrived in the inspector. I was wondering why it includes the Host header, since I didn't specify it, but that was added automatically. And it seems the cookies have been updated: they aren't the raw cookies from our string but the ones taken from the response.
Let's go back over it. First we crawled Rightmove: here is the response URL, and here are the Set-Cookie headers, not yet parsed, joined into a single string. Then it crawls the RequestBin URL, and this is the response from that particular endpoint. It doesn't have any Set-Cookie, but these particular cookies should appear in the last request, so let's check that out. Yes: there's an ID with a number starting with 2005, then an entry with domain=.rightmove.co.uk, and so on. To be honest, I have no idea what all of these are for; I'm not sure where some of them come from or what ds0 stands for, but they are there in the last request, and we did obtain this value too. It might be different every time, which is not a problem. (You're welcome, and thanks for the comment, I appreciate it!) The only thing I'm wondering about is the order: path and domain appear in a different position, and this one is actually the very last. Maybe they're presented in a different order, or maybe Scrapy shuffles them for some reason; I don't really know. The main thing is that we got literally all the cookies that were updated. It's also possible that the initial cookie set was changed, so to check that as well, let me go to Rightmove directly and grab the cookies from there.
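The path and domain entries noticed above come from cookie attributes: in a Set-Cookie header only the leading `name=value` pair is the cookie, and everything after the first `;` describes it. A hedged alternative to joining and splitting is to keep just that leading pair from each header. `cookies_from_set_cookie` is a hypothetical helper and the values are placeholders.

```python
def cookies_from_set_cookie(values):
    """Keep only the leading name=value pair of each Set-Cookie header.

    Everything after the first ";" (path, domain, expires, HttpOnly, ...)
    describes the cookie rather than being a cookie itself.
    """
    cookies = {}
    for value in values:
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        name, _, cookie_value = value.split(";", 1)[0].partition("=")
        cookies[name.strip()] = cookie_value.strip()
    return cookies


set_cookie_values = [  # placeholder values
    b"visitor_id=12345; path=/; domain=.example.co.uk",
    b"session=abc; path=/; HttpOnly",
]
print(cookies_from_set_cookie(set_cookie_values))
# {'visitor_id': '12345', 'session': 'abc'}
```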
I copy them, replace our cookie string, and save. Now I want to check whether these values have changed. To do this within parse_cookies... hold on, this method just parses the cookies and returns them. What I wanted was to add logic for updating the existing cookies, but it turns out the Set-Cookie headers contain literally all the needed cookies, so there's no need to update the existing ones, and I'll bring the code back to its previous state. The Set-Cookie headers update everything themselves. Let me check the request one last time; this time the number might be different, or maybe not... yes, it's different. You see, the cookies have been updated: this identifier changed, and the expiry date is different too. So this is it, guys. I hope you've learned something interesting from this tutorial; setting cookies like this can be really helpful when you deal with sites that don't like being scraped. Obviously the Python Scrapy framework has extensive, fairly complicated best practices for handling cookies and sessions, but just like item exporters and the other fancy features of the framework, I don't like over-complications and prefer to keep things straightforward and more Pythonic, in the sense that explicit is better than implicit. Doing everything explicitly, so it's clear what exactly is going on in the code, is just way better in my own opinion. I wish you all the best, guys. This is it for this video. Until next time, take care.
Info
Channel: Code Monkey King
Views: 3,601
Rating: 4.9545455 out of 5
Id: NvkWfjMWn7Y
Length: 53min 28sec (3208 seconds)
Published: Mon May 25 2020