Proper way of passing FORM DATA along with POST request | Python SCRAPY tutorial

Captions
Hey, what's going on guys, Code Monkey King here. This video is sponsored by one of my subscribers, who also became my patron on Patreon recently. He asked for help with a POST request and the form data he's trying to pass along with it on this video.net site. So my task now is not to scrape data, not to extract anything, not to store it in some format, but just to make a proper POST request and pass the form data properly, so that it eventually gives a 200 response in return. That's exactly what we're supposed to be doing in this video.

This is already the second video in a row regarding POST requests. If you watched the previous one, you probably remember that there was a pretty similar issue. Instead of using Scrapy's FormRequest, which is an easy way to submit forms using the Python Scrapy framework, I was using the bare scrapy.Request, specifying the request method as POST explicitly and passing the form data not as a dictionary, as in the first version, but as a raw string within the request body. We're supposed to be doing something very similar in this video.

So the very first thing: let's open the developer tools here. I need to find the POST request my subscriber pointed out, so let me go to the next page here. This one. And here we've got this POST request. If you look at what happens on this site, it gives some characters in return, and this is the exact data he actually wants to scrape. So my task is to get this response within Scrapy. Let's go to the Headers tab, grab the URL, then grab the headers, and try to make this request. I've already created this file called pita.py, and, well, I'm not going to be making my extended header as I usually do, so let me just say
"POST requests to Fido API, by Code Monkey King". This should be enough here.

First we need to import some packages. I need to import scrapy, and from scrapy.crawler I need to import what is known as the CrawlerProcess. This is needed in order to be able to run the scraper from within a Python script, which is the practice I always follow in my videos and in my jobs as well.

So, the Fido scraper definition: class Fido, which inherits from scrapy.Spider. Let's define the scraper name, which would be equal to... sorry, let's call it like this. And let's also provide the main driver, so I can say if __name__ equals "__main__". In this case I want to run my scraper, so I create the process variable that would be an instance of CrawlerProcess, then say process.crawl with my Fido scraper as an argument, sorry guys, as an argument here, and finally say process.start(). Okay, so let me open the terminal in the current working directory, and typing python3 pita.py should actually run it. Okay, "scrapy"... so scrapy, like this. Okay, yeah, now it works.

Now let's specify the base URL we're supposed to be making our requests to. The base URL would be this one, so copy, and base_url equals this stuff here. Then we need to provide the custom headers, and I will go for a little trick here. To make it a bit faster, rather than copy-pasting this stuff from the browser line by line (not the response headers, the request headers), I grab the entire block and copy it, then open the Python interpreter and create the raw_headers variable, a multi-line one, to store them. Then I can simply say raw_headers split by the new line. Okay, so everything but the first element, so we take from the first element to the very end, and now just use a list comprehension to do the following.
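As a rough sketch of the header-parsing trick being described here, the snippet below turns a copied block of request headers into a dictionary. The sample headers and the `parse_raw_headers` helper name are made up for illustration. It also splits each line on the first colon only, which is a small variation on the first-element/last-element split used in the video, so that values containing colons (such as URLs) stay intact.

```python
import json

# Hypothetical sample of request headers copied from the browser's dev tools.
RAW_HEADERS = """POST /api/list HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Referer: https://example.com/list?page=1
Cookie: session=abc123; lang=en"""


def parse_raw_headers(raw: str) -> dict:
    """Turn a copied block of 'Name: value' lines into a dictionary."""
    headers = {}
    # Skip the first line (the request line), as done in the video.
    for row in raw.split("\n")[1:]:
        # Split on the first colon only, so values with colons survive.
        name, sep, value = row.partition(":")
        # Ignore lines without a colon or with an empty header name.
        if sep and name.strip():
            headers[name.strip()] = value.strip()
    return headers


headers = parse_raw_headers(RAW_HEADERS)
# Pretty-print with four-space indentation, ready to paste into the spider.
print(json.dumps(headers, indent=4))
```

Because the split happens on the first colon only, the cookie string and the referer URL come through whole in this sketch; whether that is enough for a given site's cookies is something to check against the real request.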
So, just to parse this string into a dictionary: for row in headers. Now we'll need to import json and create a variable called headers that would be a Python dictionary. Then we need to say headers.update, and here I'm parsing every single line: I'm splitting it by the semicolon, sorry, by the colon, and taking the first element, and then splitting by the colon again and taking the very last element. This one should also be stripped to get rid of unnecessary trailing spaces. And finally I want to print this to the screen, so json.dumps of headers, with indentation equal to four spaces to match the Python indentation. Okay, not "step" but "strip", sorry, s-t-r-i-p. Okay, so now we've got our headers.

Let me grab them, copy, get back to my source code and paste. The only issue here is the cookies. I already know this: they can't be taken as a whole, because they have some semicolons within the cookie string, so we need to copy them manually. I'll take these cookies here, copy and paste. So now this should work.

Oh, and one more important thing: obviously we need to provide our form data, the POST request form data. I create this form_data, and it would be of type string. This is the most essential part of the entire video, by the way. Usually we just grab this stuff, which is already parsed, make a dictionary out of this data and then use it within the FormRequest, but it fails pretty often. So instead we need to use this "view source": not "view parsed", but "view source". Here we have the bare request body. You see, the bare request body, this is the exact string we need to make use of, and since it's already a string, we just place it in, and that's
it. Very simple.

Now let's define our crawler's entry point. start_requests is a standard method of the scrapy.Spider class we're inheriting from. This is the first method invoked by the crawler process when it calls the crawl method with this scraper as a parameter. It takes only the self instance and that's it. So here we need to make an HTTP POST request to the Fido API, and I simply say yield scrapy.Request, and now the bunch of parameters. The first one would be the URL, and the URL would be just self.base_url, that should be enough. Then method would be equal to POST, and this is a very essential part to take care of. headers would be equal to self.headers, so we're using these headers being passed along with our request. Now body, which is where we pass the form data: it should be a string, and I'm using this form_data, which is already a Python string. And finally the callback function: let's use self.parse. That's it. [Music] So this is the parse callback method. Let's define parse: it takes the self instance followed by the response object, and then it should be able to print response.text after making this request.

Let's check this out. I don't need the Python interpreter any more, so let's try to run our Fido scraper again. Okay, some error here: body equals form_data, name 'form_data' is not defined. Yeah, obviously we need to say self.form_data, because it's an attribute of the Fido class. Let's try again. Okay, still something wrong here: I'm just getting redirected for some reason. Okay, let's see. Well, this should be
happening due to either wrong headers or... [Music] okay, maybe the cookie session is already outdated, I'm not sure. Did I parse the headers? The point is that I have just written this sort of source code and already sent it to my subscriber, and it seems like I did literally the same thing here, so I'm wondering what can go wrong. The cookie, everything should be just fine. It just gets redirected. Oh, "session expired". So probably something is wrong with the cookie. Let me just copy this. This is also a pretty important part: probably this happens due to the expiration of cookies, which happens from time to time.

Okay, let me click next again. Everything else... hold on a sec, is this Content-Length? Oh, so this one is a bit different. I could have changed the headers, but I guess the cookies will expire by the time I'm done doing this. Okay, if I go back... 11:40 here, 10:19 now... no, it's not the one, this is an absolutely different one. Well, I'm not sure if I can just disable this Content-Length here, so I'll now try to update the cookies.

Oh guys, I understand why this is happening. Sorry, I don't need all this stuff. I'm just not sending the cookies, really, that's the trick here. Obviously I forgot to specify the custom spider settings. So custom_settings equals, and here, in order to pass cookies not as a separate argument but along with the headers, we need to set the property called COOKIES_ENABLED to False. The idea is very simple. Well, first let me make sure that it's supposed to be working. Are you kidding me, man? Hold on a sec, maybe I just misspelled this "cookies" here. So yeah, I will
probably already drop that file that I've sent. Okay, let me quickly check this, maybe I just misspelled it. Let's go to the Scrapy documentation, it should be here: COOKIES_ENABLED. So it seems it's fine. Still, I feel a bit confused, because now it really should work. Okay, yeah, I don't know, maybe there was some sort of a typo or something, because you see, now it finally gets the response. Here we've got this table with these characters; I have no idea what they mean, but nevertheless it's the exact response that needs to be scraped for my subscriber. So this is it, guys.

Now let me explain this little trick regarding COOKIES_ENABLED equals False. If this setting is not set, so let me comment this out and run it again, you see it gets redirected due to the "session expired" error. It seems like the session has expired because no cookies have been provided. The idea is that even though these cookies are specified within the headers, they are not actually passed along with the request, because by default in the Scrapy framework we need to specify cookies here, within the request. I'm not sure whether it's plural or singular, probably like this. With this definition we would need to pass a dictionary, so the cookies would have to be parsed, which is a real pain, but it might be useful when we need to use dynamic cookies instead of static ones. Here it's enough to use the static cookies, and in order to do that we need to explicitly say COOKIES_ENABLED equals False. So we can do this dynamically within the request, or we can do it statically, when the cookies are specified within the headers. So this is it, and now we just
make this request and it gives the proper response. Yeah, probably there was just some typo there, I'm not sure really. So this is it, basically. From now on we can try to extract the data from this table and store it in whatever format we like, but since this tutorial is dedicated to passing the raw form data along with the POST request, I'm not going to cover the data extraction here, especially bearing in mind that I have no idea what these characters mean, so it would be incredibly difficult to order them appropriately.

So this is it, guys. Again, let's summarize what we've done here. We used the raw string passed within the body along with the POST request, and we set COOKIES_ENABLED to False in order to be able to pass our cookies as part of the headers. Very simple. These tiny little tricks can be very beneficial from time to time. So this is it from my side. I hope you learned something interesting from this video. Until next time, take care.
Info
Channel: Code Monkey King
Views: 1,821
Rating: 5 out of 5
Id: YcJIY8kZlJQ
Length: 20min 39sec (1239 seconds)
Published: Fri Aug 14 2020