Hi, welcome to this
Data Science Dojo tutorial on part two of automating the task of
web scraping for analyzing text. In part one of this video
tutorial you followed along while we wrote an R script to scrape hourly Bitcoin news, summarize the text, and send
the summary in an email alert to ourselves. In part two we'll set up our web scraping
script to run as a background task on our computer so you don't need to
manually run the script in R yourself. The script will be scheduled to run hourly, since we're grabbing text on Bitcoin events from the last hour or so. To access the full script,
just see the link below the video. You can set this up as a task in Task Scheduler in Windows, or you can do it through RStudio itself. I'll show you how to use the
taskscheduleR package to easily schedule your web scraping script in RStudio. Otherwise, you can check out
our KDnuggets tutorial, linked below, on how to set this up using the Task Scheduler interface. Now, if you're using OS X, Automator
would be the equivalent tool for this, and for Linux it would be GNOME Schedule. And to do this in RStudio, the equivalent package to
taskscheduleR for Mac and Linux users is called cronR. The
installation of cronR is fairly simple, and a link is provided below on how to do this. The functions in cronR are
similar to those in taskscheduleR, too. Let's go ahead and install and load taskscheduleR into R. Just uncomment and run this line to install it, and don't forget to also run this line to load it into R. You can also use the add-in interface.
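A minimal sketch of the install-and-load step. The add-in packages are not named in this transcript; miniUI and shiny are an assumption taken from the taskscheduleR documentation, so check the package README before relying on that list.

```r
# Packages for this step. miniUI and shiny are ASSUMED to be the add-in
# dependencies (per the taskscheduleR documentation, not the video).
pkgs <- c("taskscheduleR", "miniUI", "shiny")

# Check which of them are already installed.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

# Install the missing ones, but only when run by hand in an interactive session.
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Load taskscheduleR if it is available (it is a Windows-only package).
if (requireNamespace("taskscheduleR", quietly = TRUE)) {
  library(taskscheduleR)
}
```

On a Mac or Linux machine the same pattern should work with cronR in place of taskscheduleR.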
If you prefer to use the add-in interface, just install these packages here. Once they're installed, go to the Addins drop-down menu at the top here and select "Schedule R scripts on Windows". You can upload your script here and schedule it hourly, daily, weekly, or
on whatever time frame you're interested in, then just hit "Create task". Now, you might want to have your output data and logs go into
another folder on your computer; otherwise, by default, they go into the taskscheduleR
extdata folder inside your R folder. These interfaces
are very similar in cronR, too. Okay, cool. So let's just use some taskscheduleR
functions to schedule your web scraping script and have it run every hour. We'll use the taskscheduler_create function, and now we're going to give it all of its inputs. We'll start by giving our task a name. I think I'll just call it R web scraping Bitcoin. And sorry for the typo. Okay, cool. Next we'll give it the full
path to where our R script sits. In my case it sits here, in my web auto scripts folder. And if you're in Windows,
don't forget to use double backslashes. Okay, cool. And we want to
schedule our script to run every hour. And we'll input the start time. Now, you
can specify a start time, but I'm happy to go with the default, which is my current system
time, and have it kick off within 62 seconds. I'll just follow that hour:minute format. You can also specify a date, but once again, I'm happy just
to go with the default, which is my current date. You just need to make sure that
this matches your computer system's date format. In my case it's month followed by day. We're also going to give it the
R executable file to run our R script. This usually sits within the bin folder in R. Okay, cool, let's run this. Okay, the output states this
was successful, so we have now set up our web scraping script to run every hour. It's also a good idea, and I'll show
you what I mean, to check out your logs here. The reason you want to
check your logs is to see if there were any errors while the script was running
that caused it to halt, or anything like that. Any data saved as
output and the logs are stored in the same directory as your script, which is the path we gave to the taskscheduler_create function up here. I've put little print statements in my
web scraping script to help with debugging and such, and my actual output is the email alert. So I'll either receive an updated email
summarizing Bitcoin events within the last hour, or not. There are just a couple of other things I wanted to show you. You might decide to stop running
your script later, or delete it altogether. In that case you simply want
to use the taskscheduler_stop function and feed it the task name. So let's go ahead and do that. I'll just copy and paste the name I used up here. Okay, great, we've successfully stopped our task. To delete it, we'll just use
the taskscheduler_delete function. Okay, cool. And that's it. After, you
know, successfully creating a task, you can close out of RStudio and it
will run in the background on your computer. Something to take note of: you do need
to keep your computer on in order to have the script run in the background, so you can't let it go to sleep. Power consumption is
something you might want to think about. You can change your power and sleep
settings in Windows, and if you're using a Mac, it would be
your Energy Saver settings. Thanks for watching. If you
found the video tutorial useful, give us a like. You can also check out our other video tutorials at tutorials.datasciencedojo.com
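The create, stop, and delete steps walked through in the video can be sketched roughly as follows. This is not the exact script from the video: the task name and script path are hypothetical placeholders, and the date format assumes a month/day/year system locale like the presenter's.

```r
# HYPOTHETICAL task name and script path; substitute your own.
task_name   <- "r_web_scraping_bitcoin"
script_path <- "C:\\Users\\me\\web_auto_scripts\\scrape_bitcoin.R"

# Arguments for taskscheduleR::taskscheduler_create().
task_args <- list(
  taskname  = task_name,
  rscript   = script_path,                          # full path, double backslashes on Windows
  schedule  = "HOURLY",
  starttime = format(Sys.time() + 62, "%H:%M"),     # about a minute from now, HH:MM
  startdate = format(Sys.Date(), "%m/%d/%Y"),       # must match the system date format
  Rexe      = file.path(R.home("bin"), "Rscript.exe")
)

# Only schedule on Windows, with the package installed and the script present.
can_schedule <- .Platform$OS.type == "windows" &&
  requireNamespace("taskscheduleR", quietly = TRUE) &&
  file.exists(script_path)

if (can_schedule) {
  do.call(taskscheduleR::taskscheduler_create, task_args)
}

# Later, to stop the task or remove it altogether:
# taskscheduleR::taskscheduler_stop(task_name)
# taskscheduleR::taskscheduler_delete(task_name)
```

The stop and delete calls are left commented out so that running the sketch does not immediately remove the task it just created.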
Sorry if this was already discussed but what is the advantage of using this over just task scheduler (if you're in windows)?