Puppeteer + Node.js = App That Tracks Prices on Amazon

Video Statistics and Information

Reddit Comments

Note that if you do this from different IPs, you get different results

👍︎︎ 12 👤︎︎ u/StoneCypher 📅︎︎ Mar 10 2020 🗫︎ replies

... also a good way to get yourself IP banned from Amazon, but good luck with that, I guess.

Also, whenever an API is available, use it. Scraping information should be your absolute last resort for getting it.

👍︎︎ 18 👤︎︎ u/FormerGameDev 📅︎︎ Mar 10 2020 🗫︎ replies

This looks like a great starting point to learn web scraping as a concept, as long as you don't do it on the likes of Amazon or Google. Like others have pointed out, doing so will get you IP banned quickly.

For Amazon, I have used and still use the Product Advertising API heavily for getting product prices as well as other product data.

It's pretty easy to get access to, and the rate limits are allocated fairly based on how much in sales you drive for them. Search for Amazon Associates and you will find everything you need on this.

If you are interested, I shared a case study of one of my blogs doing about $2.7k a month from Amazon Associates here:

https://www.bloggingcage.com/amazon-associates-site/

Even that site uses the Product Advertising API to display prices inside articles.

👍︎︎ 1 👤︎︎ u/alertify 📅︎︎ Mar 10 2020 🗫︎ replies

You can do something similar with product reviews. Here's my project:

https://github.com/ajbogh/amazingreviews

👍︎︎ 1 👤︎︎ u/Buckwheat469 📅︎︎ Mar 10 2020 🗫︎ replies

Very cool. Honey (joinhoney.com) can do this, but I am unsure of the delay between the price trigger and the alert.

👍︎︎ 1 👤︎︎ u/_mausmaus 📅︎︎ Mar 11 2020 🗫︎ replies

It's mostly because I haven't had a use case for scraping with Puppeteer (yet), but I must admit I hadn't thought of using Puppeteer just to get the page HTML, then parsing it with Cheerio like you would with classic scraping. Thinking about it, there are some advantages to doing it that way for certain cases. Still, for a simple case like this I was expecting him to just use page.$() or page.waitForSelector() or similar.

👍︎︎ 1 👤︎︎ u/NoInkling 📅︎︎ Mar 11 2020 🗫︎ replies
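
For readers weighing the two approaches contrasted in the comment above, here is a minimal sketch in Node.js, assuming a placeholder URL and selector; Amazon's markup and anti-bot measures change often, so treat it as a concept demo rather than a working scraper.

const puppeteer = require('puppeteer');
const cheerio = require('cheerio');

// Compare the two extraction styles (url and selector are placeholder assumptions).
async function comparePriceExtraction(url, selector) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' });

  // Approach A: query the live DOM directly with Puppeteer.
  await page.waitForSelector(selector);
  const priceA = await page.$eval(selector, (el) => el.textContent.trim());

  // Approach B (what the video does): dump the rendered HTML and parse it with Cheerio.
  const html = await page.evaluate(() => document.body.innerHTML);
  const $ = cheerio.load(html);
  const priceB = $(selector).text().trim();

  await browser.close();
  return { priceA, priceB };
}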

Awesome video! I would not have thought it would be this easy!

👍︎︎ 1 👤︎︎ u/112358132l3455 📅︎︎ Mar 10 2020 🗫︎ replies
Captions
Hey, it's Tom, and today we're going to talk about web scraping once again. If you have seen my last video, we used the request library to send a request from a Node.js app to get the content of a website. Unfortunately, that only works for static websites like Wikipedia or Hacker News, the ones which are not so interactive. More and more websites these days are very interactive and the content on them is generated dynamically, so sending a simple request won't get us the whole content of the website, because the JavaScript in the background is probably fetching some data and then filling the page with content. So today I'll show you how to use the Puppeteer and Cheerio libraries to get the content of a dynamic website like Amazon.

Let's jump to the code. First, it's important to pick a website that we'll be trying to scrape, and I thought Amazon would be a good choice. Today we're going to build a Node.js app that will track prices of products on Amazon. We'll track the price of the Sony WH-1000XM3 noise-cancelling headphones; they're very good but pretty expensive, so it's worth tracking the price.

We definitely need a URL so we can find the data. There are some different versions of the product, so maybe these ones are fine. As you can see, we have a pretty long URL; there are some additional parameters like keywords, and ref is probably some reference, so we can cut them off and check if it still works. Yeah, it's still working, so we can use this URL and create a const with it.

Now we have to initialize Puppeteer. So what is Puppeteer? Puppeteer is a library which is essentially a Chrome browser you can use on the backend. It actually uses Chromium, the open-source part of Chrome without the codecs for MP3, video, etc., the things that Google has to pay for. In general we can think of Puppeteer as using Chrome from the command line: we are able to open the browser, go to a particular URL, take a screenshot, create a PDF, fill data into forms, submit those forms, or just get the content of the website once it's fully loaded. That matters for Amazon because the content on the Amazon website is created dynamically, especially for you; if you were looking at different things in the past, you will probably get different recommendations. Let me show you how popular Puppeteer is: as you see, nearly sixty thousand stars, and it allows us to do a lot of different things. Our usage will be very basic.

Okay, let's create a browser instance. We'll do this a bit differently than in the last episode: we'll use the async/await syntax instead of promises and chaining, which will hopefully be much easier to understand. So puppeteer.launch(), and then we have to create an instance of the page with browser.newPage(). We created a new page, which is empty right now, and the next step is just to go to the URL, so page.goto(url). Let's wrap it in a function, which has to be async; we can call it configureBrowser, launch it once, and return the page. Calling this function will return us a page which is already pointed at the URL we provided.
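
A minimal sketch of the setup described above, assuming a placeholder product URL (the video uses its own trimmed Amazon URL); the launch options are left at their defaults.

const puppeteer = require('puppeteer');

// Placeholder product URL; substitute the trimmed Amazon URL from the video.
const url = 'https://www.amazon.com/your-product-page';

// Launch Chromium once and return a page already pointed at the product URL.
async function configureBrowser() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);
  return page;
}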
Now it's important to look at the particular data, so let's go back to the website and check what exactly is there. The price is a span element, and it has a pretty nice ID: priceblock_dealprice. I strongly believe that it's fairly constant, not an ID that's being regenerated every single time; there's also a class and the value itself. So we can copy this ID and try to get the data of this span with this particular ID from the website, and to do that we will use Cheerio. Cheerio is a library that is an implementation of good old jQuery on the backend, so we can query the page using jQuery syntax, which is easy and which a lot of people already know; that makes things much simpler.

Let's create a function that will check the price. It also has to be an async function, and we can call it checkPrice. Let's put the ID here so we won't forget it. We start by reloading the page, because this checkPrice function will be called, for example, every 15 minutes or maybe even once a day; we can change that, it's not a problem. We call page.reload(), because if you create a new instance of the browser and the page every single time, at some point you will run out of memory, or your terminal will tell you that you can't run another instance, or something will just crash. So we'll just reload the page; it won't take more memory. However, as we all know, Chrome is able to consume any amount of memory you give it.

With the page reloaded, we now need the HTML content from the website: await page.evaluate(). Evaluate is a function (let's pass it an arrow function) that gives us content from the page, but we have to define what content we would like to get. In our case we just want document.body.innerHTML, so the whole HTML content of the page. We can log the content to see if it's even working. Let's also add a monitor function, which should be async as well: it calls configureBrowser, which gives us the page, and then we await checkPrice with that page. That should work, hopefully.

Let's run it in the console: "newPage is not a function". Oh, one second; newPage, right, because it's a browser function, not a puppeteer function. That's pretty obvious, sorry for my mistake. We can run it again, and hopefully we'll get some content, just to check whether the HTML is returned from Amazon. And yeah, that definitely looks like proper HTML from Amazon, a ton of HTML tags. Let's clean the console; it looks like it's working.

We can comment that log out for a moment. Now we need to get the data of the span with the ID priceblock_dealprice, so we can use the familiar syntax: the hash is the selector for an ID. We pass it the HTML; the dollar sign is a reference to the Cheerio library. That's just a convention, we could call it cheerio, but all the examples in their repository use the dollar sign, so it's a pretty common way to do this. So we provide the HTML, and for each element returned we call a function; the price element will be this, and we have to call .text(), because we got the whole tag but we only need its content. That should give us "$298".
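
A sketch of the extraction step just described, assuming the ID mentioned in the video (#priceblock_dealprice); Amazon's markup changes over time, so the selector is a stand-in rather than something guaranteed to exist today.

const cheerio = require('cheerio');

// Reload the existing page and read the raw price text out of the rendered HTML.
async function checkPrice(page) {
  await page.reload();

  // The whole body HTML of the rendered page.
  const html = await page.evaluate(() => document.body.innerHTML);

  // jQuery-style lookup with Cheerio; the ID is the one shown in the video.
  const $ = cheerio.load(html);
  const priceText = $('#priceblock_dealprice').text().trim(); // e.g. "$298.00"
  console.log(priceText);
  return priceText;
}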
Now we can console.log the price. Let's make sure this is working. Where's my console? I'm always losing it. Okay, node index... oh, sorry, I need to comment out the content log. Let's check if it's working; I hope it is, at least it should be. It's loading for some time... yeah, the returned data is correct, so it looks like the ID is constant and we are able to scrape the Amazon website using it.

Next we'd like to get a properly formatted price, so I will just copy the regex I prepared previously. That's the current price, and that's the formatted "$298": we remove the dollar sign and convert the rest to a number value. Now we can set up some limits: if the current price is lower than 300, we can log "buy, it's worth the money". This is a very basic version of how it should look and work. "currentPrice is not defined" — okay, we just have to move this line here; let's use the new const/let syntax instead of the old vars. That should work now.

The next step is adding the cron function, so that the checking function is called, for example, every hour; in our case we'll run it every 15 seconds so we can check that it's working. We can also log the price. Let's check it one last time — okay, it's working — time to add the cron job. I already prepared the function, so we can just uncomment it; cron is not part of this tutorial, so there's no point in explaining all the details. We configure a startTracking function and call it. As in the monitor function, we create an instance of the page, and then we create a cron job that will run every 15 seconds and check the price of the headphones; if the price is less than 300 dollars, the console will shout "buy".

We could run that, but I think it makes more sense to add a sendNotification function that will send an email. As in the last video, we add an async function, sendNotification, which takes one parameter, price. It uses Nodemailer, a very popular tool for sending emails from Node.js. We create a transporter, the object that will send the email, and then we create the content: "the price dropped to" the price parameter, plus an HTML link to the product page we are tracking, and the subject is "price dropped to" the price. There's also a console.log with the ID of the message that was sent; if there is an error, the message ID will be empty, so we will know whether it worked. Then we go back to the checkPrice function, and if the current price is less than 300, we call sendNotification with the current price. That should hopefully work.

Before we run it, I'll grab the iPhone I'm using to record this. Okay, there is no notification yet. We can change the interval to something like 30 seconds, because it's probably pointless to call it every 15 seconds, and you should see that we get the notification, I hope. Just wait for it.
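
Roughly what the parsing and notification steps could look like; the regex, SMTP settings, addresses, and the $300 threshold shown here are assumptions on my part (the video uses its own prepared regex and credentials).

const nodemailer = require('nodemailer');

// Placeholder product URL (the same page the tracker watches).
const productUrl = 'https://www.amazon.com/your-product-page';

// Strip the currency symbol and any separators, then convert to a number.
function parsePrice(priceText) {
  return Number(priceText.replace(/[^0-9.]/g, ''));
}

// Hypothetical SMTP transporter; substitute a real host and credentials.
const transporter = nodemailer.createTransport({
  host: 'smtp.example.com',
  port: 587,
  auth: { user: 'user@example.com', pass: 'app-password' },
});

// Email the new price, then log the message id as a quick sanity check.
async function sendNotification(price) {
  const info = await transporter.sendMail({
    from: 'tracker@example.com',
    to: 'me@example.com',
    subject: `Price dropped to $${price}`,
    html: `The price dropped to $${price}. <a href="${productUrl}">Open the product page</a>.`,
  });
  console.log('Notification sent, message id:', info.messageId);
}

// Inside checkPrice, after extracting priceText:
//   const currentPrice = parsePrice(priceText);
//   if (currentPrice < 300) await sendNotification(currentPrice);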
Is it working? One second... yeah, I just called it twice, so I should get the notification soon. There it is — let me use Face ID — "price dropped". We can open it, there is a link, we open it in Safari, and yeah, indeed it's $298, so you can buy them right now. Anyway, it looks like it's working, and it turned out that 30 seconds was enough.

I think that's all. Thank you for watching and for the time you spent with me. If you have any questions, feel free to ask. As usual, the source code for this tutorial will be linked below in the video description, and if you have any questions, thoughts, or concerns, let me know in the comments section; I will be more than happy to explain or help. Thank you for your time once again, and see you next time. Bye!
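
Putting it together, a sketch of the tracking loop: the video uses a prepared cron helper whose details aren't shown, so the node-cron package and the 30-second expression here are assumptions. It reuses configureBrowser() and checkPrice() from the sketches above.

const cron = require('node-cron');

// Set up the browser once, then re-run the price check every 30 seconds.
async function startTracking() {
  const page = await configureBrowser();
  cron.schedule('*/30 * * * * *', () => checkPrice(page));
}

startTracking();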
Info
Channel: Tom Baranowicz
Views: 26,967
Rating: 4.8959107 out of 5
Keywords: webscraping, nodejs, Puppeteer
Id: 1d1YSYzuRzU
Length: 21min 14sec (1274 seconds)
Published: Mon Feb 10 2020