A Comparison of Proxies - Rotating IPs with Python Scrapy

Captions
Today's topic is proxies. I'm going to talk about three services: the first is Scraper API, the second is scrapy-proxy-pool (an open-source Scrapy middleware), and the third is Crawlera. Crawlera I'm not going to demonstrate in this video; maybe some other time.

scrapy-proxy-pool is on GitHub, so it's free, and it's really just a middleware: you install it and make some changes to the settings, essentially enabling its downloader middlewares, and that's it. It visits some of the free-proxy sites, finds proxies that are available for use, and plugs into a Scrapy project nicely. So that's one option. Then there's the paid option, Scraper API: looking at the plans, the basic plan is $29 for 250,000 API calls, so it's on the cheaper side. And then there's Crawlera, which is very professional; it's from the same people who make Scrapinghub, and their plans start at $99 for 200,000 requests. Those are the three main proxy scenarios.

Let's start with scrapy-proxy-pool. The first command is pip install scrapy-proxy-pool. I've already done it, but if I run it again you can see that all the requirements are satisfied. The second step is to create a Scrapy project: scrapy startproject proxy. Then I'll generate one blank spider called free, with a throwaway start URL we'll change later, and a second spider called api.

Let's open this folder in Visual Studio Code. I'll fast-forward a bit because I don't want to bore you with the basic structure of the spider. What I've done is create a very simple spider, but instead of start_urls I've used the start_requests method. The reason is that start_requests gives a lot more flexibility when you want to build a dynamic set of requests, and I want to show you what happens when we send multiple requests, so I'll run a loop here. I've copied exactly the same code into free.

Let's run this code once without making any change. In the terminal, scrapy list shows the two spiders; they're basically the same, so let's run scrapy crawl api. Since I'm simply printing the response, I don't need the full log, so I set the log level to show warnings only. My first attempt failed because of a typo: I had passed curl instead of url. With that fixed, the proxy address was printed out, and that's all this URL does; it simply returns your IP address, which is why it's good for this kind of testing.

Now, if I want to run it, say, five times, all we have to do is wrap the request in for i in range(5). A word of caution if we do it like this: you can see there was only one request, because Scrapy determined the five requests were duplicates and filtered them automatically. We don't want that auto-filtering, so we set dont_filter=True; run it again and the same request is sent all five times. In fact, there's another trick: most websites ignore query-string parameters they don't understand, so we could also append some arbitrary query-string parameter set to the loop counter, and then we don't need dont_filter. Testing that, it again printed five times, so either of the two approaches works.
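Putting those pieces together, here is a minimal sketch of the test spider so far. The IP-echo endpoint is never named in the video, so httpbin.org/ip stands in for it; everything else follows the steps above.

```python
import scrapy


class ApiSpider(scrapy.Spider):
    name = "api"
    # Print statements are enough for this test, so silence the usual log noise.
    custom_settings = {"LOG_LEVEL": "WARNING"}

    def start_requests(self):
        # start_requests() instead of start_urls gives us the flexibility
        # to build a dynamic batch of requests.
        for i in range(5):
            # dont_filter=True stops Scrapy's duplicate filter from
            # collapsing the five identical requests into one; appending
            # an ignored query-string parameter (e.g. f"...?n={i}")
            # would achieve the same thing.
            yield scrapy.Request("https://httpbin.org/ip", dont_filter=True)

    def parse(self, response):
        # The endpoint simply echoes the IP address the request came from.
        print(response.text)
```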
I'm going to copy this exact code into the other spider. Now let's go to the documentation for scrapy-proxy-pool. We've already installed it; now we need to make some changes in settings, and the key one is that PROXY_POOL_ENABLED has to be True. Let's find the settings file that was generated for the project; to make things easier, I'll delete everything that's commented out. Then the required changes: I set ROBOTSTXT_OBEY to False, which saves me one request per run, set PROXY_POOL_ENABLED to True, and copy the downloader middleware entries from the documentation into the file. That's all we needed to change.

Now let's run the spider. It goes out to free-proxy websites and finds a list of available IP addresses; in the log we can see Proxy Daily, SSL Proxies, and Free Proxy List, followed by messages that a proxy has been chosen. It took some time, but all five requests completed, and here you can see the results. It was as simple as that: enable those two things and all your spiders will use proxies.
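For reference, here is a sketch of the settings.py changes just described. PROXY_POOL_ENABLED is the setting named in the video; the middleware entries and their order values below follow the scrapy-proxy-pool README, so check them against the current repository before relying on them.

```python
# settings.py -- additions for scrapy-proxy-pool

# Skip fetching robots.txt, saving one request per run.
ROBOTSTXT_OBEY = False

# Turn the proxy pool on.
PROXY_POOL_ENABLED = True

# Downloader middleware entries as listed in the project's README.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_proxy_pool.middlewares.ProxyPoolMiddleware": 610,
    "scrapy_proxy_pool.middlewares.BanDetectionMiddleware": 620,
}
```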
For now I'm going to delete those changes, because I want to move on to the next service: Scraper API. As I showed you, the $29 plan has 250,000 API calls, but you don't have to go straight for it, because there's a sign-up-for-free button. I've signed up for a free account, and on the dashboard you can see the free plan gives me 1,000 requests. That's good enough for testing, but for any production scenario you'll need at least the $29 plan. Once you sign up, you get an API key, and that key is what you're going to need. What I've done is create a config file and paste the key into it as a string, so all I have to do in the spider is import that string from the config module.

Now let's follow the documentation on the site. They have sample code for several languages, and here's the one for Python. The snippet at the top replaces the requests library with the Scraper API client, and as you can see it needs your API key. They also mention clearly that if you're using Scrapy, instead of passing the URL directly you pass it through an instance of the client and call its scrapyGet method. So I'll copy those two lines from the documentation and paste them in; and of course, as mentioned, you need to pip install scraperapi-sdk first, which I've already done. If you were using start_urls you could simply update that one line, but since we're using start_requests we need the scrapyGet line instead, inside our loop, because we're running more than one request. A callback isn't required either, because when you don't provide one the default is parse, which is what we have. One more thing: the API key copied from the documentation is a placeholder, so it has to be replaced with the key we got when we created the account.

Let's run it: scrapy crawl api, and this time I'll let the log print everything. Very quickly we're getting the results, and each of the five responses shows a different IP. It finished in no time compared to the free solution we were using before.
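Here is a sketch of the spider adapted for Scraper API as described. The client import and the scrapyGet call follow Scraper API's Python SDK documentation; the config module, the API_KEY variable name, and the httpbin.org/ip endpoint are my stand-ins for details not spelled out in the video.

```python
import scrapy
from scraper_api import ScraperAPIClient  # pip install scraperapi-sdk

from config import API_KEY  # config.py holds the key from the dashboard

# The client wraps every URL so the request is routed through
# Scraper API's rotating proxy pool.
client = ScraperAPIClient(API_KEY)


class ApiSpider(scrapy.Spider):
    name = "api"

    def start_requests(self):
        for i in range(5):
            # scrapyGet() returns a proxied URL for the target page.
            url = client.scrapyGet(url="http://httpbin.org/ip")
            yield scrapy.Request(url, dont_filter=True)

    def parse(self, response):
        # Each response should now report a different IP address.
        print(response.text)
```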
Scraper API is a good paid service when you want to get started, but when you want to get really professional, you should move to something like Crawlera. Looking at its documentation, all you have to do is set CRAWLERA_ENABLED to True and provide your API key in CRAWLERA_APIKEY; those are the two things you need, and the rest of the parameters can be adjusted as you go. It works especially nicely if you have a Scrapinghub cloud account.

I hope you found this tutorial useful. That's all for today; see you in the next one.

Info
Channel: codeRECODE with Upendra
Views: 4,973
Rating: 4.9365077 out of 5
Keywords: python web scraping tutorial, python scrapy tutorial, scrapy for beginners, Python Web Scraping, web scraping python, how to scrape data, website scraping, python scraping, scrapy tutorial, Python Scrapy, Scrapy Spider, web scrapping, web scraping, web crawler, webscraping, scraping, scrapy, proxy, proxies, proxy python, scrapy proxy, how to use proxy with scrapy, ip rotate, how to rotate ip scrapy, python web scraping, web scraping with python, web scraping using python
Id: qHahcxoGfpc
Length: 12min 12sec (732 seconds)
Published: Thu Aug 20 2020