How to scrape Google Maps data with Nodejs and Puppeteer | Tutorial

Video Statistics and Information

Captions
And this is actually it. All you need to do is run node index.js: we open the page, we scroll down, we get the places, and that's how our parser is going to work.

Hi guys, it's been a while since I last published a video. I was looking around on Upwork to see what people are doing, and as you can see, a lot of people are using scraping services. I just opened the first one, and there are some requests to parse Google Maps; this person had at least three such requests this year. So maybe some of you will be interested in that, and I've decided to implement it on my own.

Let's say you want to get information about all of the sushi restaurants in Houston. We can copy this query and paste it into Postman, run the new query, and here is what we actually get back from Google Maps. Unfortunately, if you look at this content, it is a real mess; based on my experience, it cannot be easily parsed. Google is not really welcoming here: they do not expect you to take this information for your own needs. Maybe we would have more success looking at the Network tab, but whenever you scroll down you can see a lot of HTTP requests, and none of them use JSON or any data that can be easily consumed; most of the time it is some binary format that I have no idea how to parse. It looked like a challenging task, so I've decided to implement this parser with the Puppeteer package. In this video I'll show you how to open the Google Maps page, scroll down, parse the places, go to the next page, parse the next places, and so on, until you get all the places available for this query. So let's get started.

Okay, to complete this project we need Node.js; you can download it from nodejs.org. Next, I created an empty folder.
Here I'd like to initialize our project, so we can type npm init -y, and here we go: now we have an empty project, and we need to add a dependency. The package I'm going to use for scraping data from Google Maps is Puppeteer; you can find it on github.com. All we need to do is run the install command in the terminal, so if we go back to VS Code, we can type it here and wait until it is installed. During installation it is going to download the Chromium browser (and maybe Firefox), but that should be fine.

Now that the package is installed, we can create the index.js file and check that everything works: we type a console.log, run node index.js, and it works. The next thing to make sure of is that the Puppeteer package itself works, so we can copy the example code and paste it into our file. Instead of the default options, I'd like to specify headless: false, just to show you how it works under the hood. Let's type node index.js. Okay, that was too fast, so let's comment out this line and run it once again. Here we go. What is actually happening: we create the browser, we open a new page, we go to example.com, and we take a screenshot. In other words, what we now have in the folder is an image of the page we've just opened, and I would say this is all we need to start working on our parser.

Let's say we'd like to parse information about all of the sushi restaurants in Houston. If we type the query here, we can see a lot of results; then, when we go to the next page, we see 20 more. Pay attention that at first there are only seven or eight places; then we scroll down, Google loads some additional places, and only after that can you go to the next page. So while working on this project, we will need to open the page, get the places, scroll down, get the next batch of places, and so on; only after all the places from this page are parsed do we go to the next page, and we repeat this until all the places are parsed.
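The Puppeteer smoke test described above can be sketched roughly like this. It follows the standard example from the Puppeteer README, with headless: false as the one change mentioned in the video; the function is defined but not invoked here, so add the call at the bottom of your index.js when you run it:

```javascript
// Minimal Puppeteer smoke test: open a page, take a screenshot, close the
// browser. Assumes `puppeteer` has been installed with `npm install puppeteer`.
async function smokeTest() {
  const puppeteer = require('puppeteer');
  // headless: false shows the browser window so you can watch what happens
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // Saves an image of the page into the project folder
  await page.screenshot({ path: 'example.png' });
  await browser.close();
}

// To run it: smokeTest().catch(console.error);
```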
In order to get the query string for this search, I think we can remove this part of the URL, type the query, and here we go: this is all we need to start parsing the data. We can copy this query and put it into our code, and just to make sure it works, we run node index.js once again; the page opens, and in example.png we can see this information. Next I'd like to change the size of the page, so I paste these lines and set the viewport. If we run it again, we can see that the window is now bigger and so is the page, so more information fits on a single screen. We don't actually need the screenshot anymore, so I can also remove this line and save.

I think we can start parsing data now. For example, I'd like to get the name of a place. As you can see on the right side, we have some HTML markup, and what I'm going to do is find the exact element in this markup. Here we go; I would say Google is not welcoming to us here either, because the class names are a complete mess. However, if you take a look at this one, you can see something like gm2-subtitle-alt-1, and this is exactly what we need. Even though the name is not directly inside this div, we can start from this div and then get the name of the place. So what is going to happen is: we find all of the elements with this class on the page and get the name from each of them. I can show you in the console: if you type document.querySelectorAll and paste this class name, you get a list of these elements, and as I told you, there are only seven elements on the page right now. If we scroll down and run it again, there are 14 elements; once more, and now there are 20.
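Opening the results page with a larger viewport, as described above, might look like the following sketch. The /maps/search/ URL shape and the exact viewport dimensions are assumptions, since the precise values are not shown in the transcript:

```javascript
// Open a Google Maps search in a new tab with a larger viewport.
// `browser` is the object returned by puppeteer.launch().
async function openResults(browser, query) {
  const page = await browser.newPage();
  // A larger viewport fits more results on one screen
  await page.setViewport({ width: 1440, height: 1080 });
  // URL shape assumed from the query string seen in the video
  await page.goto('https://www.google.com/maps/search/' + encodeURIComponent(query));
  return page;
}
```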
Next, let's create the function parsePlaces and pass the page object as a parameter. I don't want to bore you guys with typing lines of code, so let me just paste it here and explain it line by line. We have an empty array of places. Next, we get the elements from the page with this method; as you can see, we use the same query selector, and what we get back is the list of spans extracted from the page. If we go back to the page: this is our selector, and this is our list of spans. If we open one, you can see that the name of the restaurant is just the innerHTML, or the innerText property. After that, we iterate through the elements and get the name from each one using the evaluate function. That's it; let's call this function from our main function. We pass the page: const places = await parsePlaces(page), and then console.log the places. If we run it once again, we see an error, and the reason is that our function is not async but we are using await inside it, so we need to add the async keyword. Save it and run once again: the browser starts, we get the data, and now we have our first seven places parsed.

Now, in order to get the next places, we need to scroll down the list of places on Google Maps. Instead of writing my own function, I found that you can use an existing autoScroll function, so we can copy it and paste it into our project. But instead of scrolling to document.body.scrollHeight, we need to find the element that we're going to use for scrolling. If we get back to our page and open the markup, we need to look at this element; unfortunately, it is not the only such element on the page, and the one we need is its parent, so instead of the first one we will use the second one. For doing that, we are going to use document.querySelectorAll.
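A rough sketch of the parsePlaces function described above. The gm2-subtitle-alt-1 class is the obfuscated name visible at recording time; Google rotates these class names, so verify it against the live markup before relying on it:

```javascript
// Collect the place names currently rendered in the results list.
// The selector is the obfuscated class seen in the video (an assumption;
// check the live page, since Google changes these names regularly).
async function parsePlaces(page) {
  const places = [];
  // All spans that carry a place name
  const elements = await page.$$('.gm2-subtitle-alt-1');
  for (const element of elements) {
    // evaluate runs in the browser context; innerText holds the name
    const name = await element.evaluate((node) => node.innerText);
    places.push(name);
  }
  return places;
}
```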
We paste the selector and see that there are two elements in this NodeList, so this is how we can access the second element. Let's get back to our function: these two lines we are going to replace with these ones. What is happening here: on this line we say that we are going to run this function in the context of the page. Next, we return a promise that will be resolved once we scroll to the bottom (not the bottom of the page, but of the element), which is detected by comparing the current total scrolled height with the height of the element. And here is just a setInterval: we run this function every 100 milliseconds, we get the element, we scroll down, and we increase the total height. Where is the actual scrolling happening? Ah, my bad, I forgot to paste this line as well: on this line we scroll the element by the given distance. I think we can increase the distance to 300 instead of 100.

Okay, let's actually call this function and see how it works. Once we have parsed our places, we call it; let's start again and watch: we open the page, we get the places, and now we scroll to the bottom. Let's say we got 20 places from this query; the next thing we're going to do is go to the next page. Let's open the HTML markup and see how we can find this button. We can find it by the aria-label attribute, whose value is "Next page". Let's open the console, type document.querySelectorAll, and say that we are looking for a button whose aria-label attribute has the value "Next page", something like that. Unfortunately, we haven't found anything; I think the problem is... yes, an extra space in the attribute value, and here we go. Now that we have this button, we can actually click on it.
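The adapted autoScroll function described above might look like the sketch below. It follows the widely shared Puppeteer autoScroll snippet, changed to scroll the results panel instead of document.body; the '.section-layout' selector and the [1] index (the second match being the scrollable parent) are assumptions standing in for the markup seen in the video:

```javascript
// Scroll the results panel to its bottom so Google Maps loads every place
// on the current page. Selector and index are assumptions; verify them.
async function autoScroll(page) {
  await page.evaluate(async () => {
    await new Promise((resolve) => {
      // The second element with this class is the scrollable parent panel
      const element = document.querySelectorAll('.section-layout')[1];
      let totalHeight = 0;
      const distance = 300; // pixels per tick (bumped up from 100)
      const timer = setInterval(() => {
        const scrollHeight = element.scrollHeight;
        element.scrollBy(0, distance); // this line does the actual scrolling
        totalHeight += distance;
        if (totalHeight >= scrollHeight) {
          clearInterval(timer);
          resolve();
        }
      }, 100); // run every 100 ms
    });
  });
}
```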
Let's try it from here: we get the button and click. This is what we're going to do in our code: find the button, click on it, and get the next list of places. I think we can copy this selector, go back to the code, and add the next function, goToNextPage. Again we pass the page parameter, and this is the selector that we are going to use. I'm just going to paste these two lines of code; the selector is the same, so we don't need to repeat it. What is happening: we use the click function of the page to click on this element. The last thing is that we don't want to continue until all the HTTP requests are completed, so we call a function that waits for that. Since this function uses await, we need to mark it as async. Let's call it and see how it works: I copy it, put it here with await, and start node index.js. We open the page, we scroll to the bottom, and then we click. All right, I think we have everything we need to start implementing our loop; all the elements are in place, so let's get back to the code.

Okay, let's see how it will work. Let's do a very dumb implementation first: while (true). I know it's stupid, but we're going to improve it a bit later. We declare let places as an empty array, put our code inside the loop, and console.log the places. So we have an infinite loop: we open the page, scroll to the bottom, get the places, and go to the next page; this is all we need. Let's see how it works: node index.js. Here we go, it is actually iterating over the pages, and yes, it keeps printing the information. It looks cool; I'm not sure we even need much more than this, and I would say this is the basic implementation of a Google Maps parser. Once it's done, though, you will see that the infinite loop just keeps running.
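The goToNextPage function described above might be sketched like this. The aria-label value carries surrounding spaces (the "extra space" caught in the video), and waitForNavigation is one way to wait for the requests to settle; both details are assumptions to check against the live page:

```javascript
// Note the spaces inside the aria-label value; without them the
// selector matches nothing (as seen in the video).
const NEXT_PAGE_SELECTOR = 'button[aria-label=" Next page "]';

// Click the "Next page" button, then wait for the page to (mostly)
// stop making HTTP requests before parsing the next batch.
async function goToNextPage(page) {
  await page.click(NEXT_PAGE_SELECTOR);
  await page.waitForNavigation({ waitUntil: 'networkidle2' });
}
```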
So now, instead of running our loop infinitely, I'd like to implement a stop condition. I think the best way to do it is to check whether the "Next page" button is disabled. Let's open the markup; as I showed you before, we can get this button by the aria-label attribute with the value "Next page". Let's open the console, type document.querySelector, and say that we're looking for the button, and that's it. When there are no more pages to parse, you can see that the button has the attribute disabled with the value true, and this is what we're going to look for. Let's get back to our application; I'm going to paste this function, and let me explain how it works. We look for this element on the page by this selector; if we don't find it, we throw an error. Next, we parse the disabled value: again, we call this function in the context of the page, get the element's disabled attribute, and return its value. The name of this function is hasNextPage, and again we need to pass the page into it.

Instead of while (true), let me write while (await hasNextPage(page)): as long as the button is enabled, we keep running the loop, and when it isn't, we stop. Let's also print all the places afterwards instead of printing them during the run, and instead of overwriting our places on each iteration, let's call places.concat and pass the result of the parsing function; this way we just add the new places to the existing array. Finally, we print them, together with some additional information like places.length, just to make sure it keeps working. Okay, let's run node index.js and see how it works. It runs on the left, and we can see that it's actually working. I don't know why, but we got 22 places instead of 20; usually, whenever I run this, I get 20 places each time, but okay, the more the better.
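The stop condition and the loop described above can be sketched as follows. hasNextPage reads the disabled attribute and inverts it; the loop is rearranged slightly from the spoken version so the disabled button is never clicked on the last page. parsePlaces, autoScroll, and goToNextPage are the helpers described earlier in the transcript:

```javascript
// The button gains a disabled attribute on the last results page.
async function hasNextPage(page) {
  const selector = 'button[aria-label=" Next page "]';
  const element = await page.$(selector);
  if (!element) {
    throw new Error('Next page button not found');
  }
  // Read the attribute in the browser context
  const disabled = await page.evaluate(
    (sel) => document.querySelector(sel).getAttribute('disabled'),
    selector
  );
  return !disabled; // null while the button is still clickable
}

// Main loop: accumulate places until the button is disabled.
async function scrapeAllPages(page) {
  let places = [];
  do {
    await autoScroll(page);
    // concat adds the new batch instead of overwriting the array
    places = places.concat(await parsePlaces(page));
    if (!(await hasNextPage(page))) break; // button disabled: we are done
    await goToNextPage(page);
  } while (true);
  console.log(places.length, places);
  return places;
}
```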
And this is it. Again, I don't know why we ended up in this city instead of Houston, but all in all, I think these places are what we were looking for in Houston. From now on, you can do whatever you want with these places: you can export them to a CSV file, or, if you'd like to get more information about a place, you might want to take its link, because by default the results list does not give you the website or the telephone number. If you'd like that information as well, you can take the link and open it; if you open it in a new tab, you will get the page for that place and will be able to extract the phone number, the website, maybe the address, whatever you want. And this is it for today. Thank you so much for watching this video; please let me know if there is something else I can cover in my next video about parsing Google Maps, maybe I forgot something. Have a good day!
Info
Channel: Seasoned Developer
Views: 403
Keywords: web scraping, puppeteer, javascript, how to scrape google maps data, how to parse google maps places, parse google maps with nodejs, how to use puppeteer to scrape data, google maps parser tutorial, javascript parser tutorial, data scraping with nodejs and puppeteer, how to extract data from google maps, export google maps data with javascript, how to get google maps places, nodejs puppeteer tutorial, javascript tutorial for beginners, seasoned dev
Id: 5xOD4-M2jSw
Length: 21min 22sec (1282 seconds)
Published: Fri Oct 29 2021