Deploy Scrapy spiders locally - Scrapyd

Video Statistics and Information

Captions
All right, so earlier in this course you've seen how to deploy spiders to the Scrapinghub Cloud. However, what if you have your own server and you want to host your spiders on it, or maybe you want to take your entire project and deploy it to Heroku for free? Before all of that, let me explain why I decided to add this section, or rather why I sometimes prefer not to use the Scrapinghub Cloud.

First things first, the Scrapinghub Cloud has a feature called periodic jobs, which lets us run spiders at a particular month, day, and hour that we choose. This feature used to be free, but now you can't get it unless you pay for it. That's reason number one. Reason number two: what if you are using Scrapy-Splash in your project? In that case you have to pay as well, and as you can see they have plans that go up to $100 per month for you to be able to use Splash with your project. So I thought, why not show my students how to get the same thing without paying a penny? And don't get me wrong, I'm not saying the Scrapinghub Cloud is not good or not efficient, but like I said, I just want to show you how to get the same thing for free.

All right, so first we have to install a package called Scrapyd, which stands for Scrapy daemon. It is basically a service that runs in the background listening for incoming requests. So without wasting time, let's install it: pip install scrapyd, press Enter. Now, to launch the Scrapyd instance, we type scrapyd and press Enter. Okay, it's running, and we have a server listening on the address 127.0.0.1 and port 6800. So let's open it in Chrome: 127.0.0.1, port 6800. I know it has the ugliest interface in the world, but it gets the job done. We have this Jobs link, where we can see all the currently running jobs, and by jobs I mean currently running spiders, and we have this Logs link, which stores the logs of each running spider.

The next step is to deploy our Airbnb project locally, and for simplicity we can use another package called scrapyd-client, which takes care of all the steps needed to deploy our project. So I'm going to open a new tab. We could install it by running the command pip install scrapyd-client, however this can cause some issues, so rather than sticking with that, let me show you how to install it correctly. Let's go to github.com and search for scrapyd-client, press Enter, open this one, click on "Clone or download", and copy the link. Now we type pip install git+ and we paste the link, and of course make sure that you have Git installed, otherwise it won't work.
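As a quick reference, the installation steps described above amount to roughly the following sketch; the scrapyd-client repository URL shown here is assumed to be the standard GitHub one, so copy the actual link from the repository page:

    # install and start the Scrapyd daemon; by default it listens on http://127.0.0.1:6800
    pip install scrapyd
    scrapyd

    # install scrapyd-client from Git rather than PyPI, as suggested in the video
    # (repository URL is an assumption; paste the link copied from GitHub)
    pip install git+https://github.com/scrapy/scrapyd-client.git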
Now, in order to deploy our project, we type scrapyd-deploy and set the target to default. The reason I'm typing default is that if we open the scrapy.cfg file, we have this deploy word wrapped between two square brackets, and in files with the .cfg extension, everything written inside square brackets is called a section. So we have the settings section and the deploy section. This deploy section could in fact be written as [deploy:default], which is why in the scrapyd-deploy command I set the target to default. If we had, for example, [deploy:local], we would type scrapyd-deploy local instead. Next, we would normally have to specify the target project name, demo_airbnb, but since by default there is a variable inside the scrapy.cfg file called project, which points to our project name, we can simply omit that argument. Before we hit Enter, let's uncomment the url line and make sure it points to the Scrapyd URL.

Now let's press Enter, and while our project is being deployed, let me explain what happens under the hood. scrapyd-deploy will first create the project egg; after that, an HTTP POST request is sent to an endpoint called addversion.json. Remember when I told you that Scrapyd is a service that runs in the background and listens for incoming requests? Well, this is one of the requests we send so we can deploy our project. Here we have the response sent back by the Scrapyd server as a JSON object, and all that matters is the status key: we got "ok", so our project is deployed successfully.

Now, more importantly, let's launch the Airbnb spider, and for this task we will use a tool called curl to send requests from the command line. If you don't have it, just go to curl.haxx.se, click Download, and choose the one that works with your operating system. I already have it, so I'm going to open a new tab, cd to the curl folder on my desktop, and then to the bin folder. To launch the spider, we type curl http://localhost:6800/schedule.json, then -d project=demo_airbnb, -d spider=airbnb, and -d city=Miami, for example. Let's hit Enter; the status is set to "ok". So back in Chrome, click on Jobs, and as you can see the Airbnb spider is currently running.

One last thing I want to show you is how to stop the currently running spider. For that we have to copy the job id, so copy it, and then from the command line we type curl http://localhost:6800/cancel.json -d project=demo_airbnb -d job= and then we paste the job id and press Enter. This sends a POST request to the cancel.json endpoint to stop the spider. It will take some time to stop because of the AutoThrottle extension. And that's pretty much everything I wanted to show you in this video. However, if you want to know more about all the available endpoints, just Google Scrapyd, click on the documentation link and then API, and there you'll find all the available endpoints you can play with. Now, more importantly, I know that I haven't shown you yet how to store the scraped items, but bear with me, I'm going to cover that part when we deploy the spider to Heroku.
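As a rough sketch of the deployment and scheduling steps above: the project name demo_airbnb, the spider name airbnb, and the job id are placeholders taken from this example, so substitute your own values:

    # scrapy.cfg -- the [deploy] section is shorthand for [deploy:default];
    # the url line must be uncommented so it points at the local Scrapyd instance:
    #   [deploy]
    #   url = http://localhost:6800/
    #   project = demo_airbnb

    # deploy the project to the default target (the project name is read from scrapy.cfg)
    scrapyd-deploy default

    # schedule the airbnb spider; city is passed through as a spider argument
    curl http://localhost:6800/schedule.json -d project=demo_airbnb -d spider=airbnb -d city=Miami

    # cancel a running job, using the job id shown on the Jobs page
    curl http://localhost:6800/cancel.json -d project=demo_airbnb -d job=<job_id>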
Info
Channel: Human Code
Views: 9,775
Rating: 4.9365077 out of 5
Keywords: Scrapy, web scraping, Scrapyd, web scraping course, deploy scrapy, deploy spiders
Id: PZKH5S0C8EI
Length: 8min 24sec (504 seconds)
Published: Tue Oct 30 2018