In today’s video, you are going to learn how to
scrape leads from Google Maps using Octoparse, which is a no code web scraping tool.
I know, there are a lot of similar videos on this very topic.
However, Google Maps evolves pretty quickly, and some no-code tutorials created 6 months
ago may not work as well as they used to. It’s time for a quick recap.
The link to download Octoparse is in the description.
Please note that this tutorial will allow you to extract phone
numbers from companies, but not emails. If you want to get emails, I
suggest you use scrap.io instead. It’s fast and easy to use.
I have also left the link to that one in the description.
As an example, I am looking for barbershops in London.
The idea is to scroll down to the bottom of the page as many times as
possible in order to load new data. And then, to click on each single
element in order to get the detail page. And finally, we will extract data like the title,
the number of pictures, the number of reviews, the rating, the category, the phone number and so on.
The first step is to copy our URL and to paste it on Octoparse.
And then, I click on “Start”. We’ve got a popup which prevents
us from accessing the website. But that’s okay.
We need to turn on the browse mode in order to remove it.
I click on “reject all” and I have access to the website.
I turn off the browse mode. And, in order to make sure the popup won’t
appear anymore, I will save the cookies. I go to the options, “Use cookie” and
“Use cookie from the current page”. And to apply what I have just
done, I click on “Apply”. In this tutorial, we are going to
apply a lot of different formulas. All of them will be written in the description.
So, you just need to copy and paste them. These formulas are XPaths.
If you want to know more about what they mean, I will probably make a video about this topic.
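While we wait for that video, here is a quick taste of what an XPath does, outside of Octoparse. This is a toy sketch in Python using only the standard library: the markup is invented, not real Google Maps HTML, and note that Python’s built-in ElementTree only supports a small subset of XPath, not functions like contains().

```python
# Toy illustration of selecting elements with an XPath-style expression.
# The markup below is an invented stand-in for a results list.
import xml.etree.ElementTree as ET

page = ET.fromstring("""
<div>
  <div class="result"><a href="/place/1">Barbershop One</a></div>
  <div class="result"><a href="/place/2">Barbershop Two</a></div>
  <div class="ad"><a href="/sponsored">Sponsored</a></div>
</div>
""")

# Select every element whose class is "result" -- the same idea as the
# "Variable List" formula we are about to paste into Octoparse.
results = page.findall(".//div[@class='result']")
print(len(results))               # 2
print(results[0].find("a").text)  # Barbershop One
```

The point is simply that one expression matches every result card at once, while skipping anything (like the ad) that doesn’t fit the pattern.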
The first thing we are going to do is to create our loop item, meaning we will
select each element within our page. So, I add a step and I create a loop.
I need to open the loop mode and click on “Variable List”.
Then, I insert my first formula, which is this one.
I paste it and click on “apply”. And, as you can notice, we’ve got 3 elements. But 3 elements are not enough, because if we
scroll down to the bottom of the page and if we redo the same process, we can notice
that we’ve got now 6 elements instead of 3. So, we need to scroll down to the bottom of
the page before selecting all of the elements. I am going to add another
loop above the first one. But it will be a “scroll page” element this time.
This is a “scroll”. And I have to choose whether it’s a default
scroll area or a partial scroll area. Here is an example of a default scroll page.
And here is an example of a partial scroll page. As you can notice, a partial scroll
area means that the scroll bar belongs to one section of the page rather than the whole window. So here, we’ve got a partial scroll area.
And, as an XPath, I insert this one. I click on “apply”.
This XPath localizes the exact area in which the scroll bar is included.
There are a couple more options. I suggest you select “for one screen”.
And we are going to scroll as many times as possible.
So, let’s say 10 000 times. It doesn’t matter as long as you check this box:
“End loop when there’s no more content to load”. As a waiting time, I will
wait for about 2s each time. I click on “apply”.
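If you’re curious what Octoparse is doing for us here, the scroll step can be sketched as a small loop: scroll up to a maximum number of times, wait between scrolls, and stop early once no new content loads. fetch_more() below is an invented stand-in for the browser scroll, not a real Octoparse or browser API.

```python
# Sketch of the scroll loop: repeat up to max_scrolls, pause between
# scrolls, and break as soon as the item count stops growing
# ("End loop when there's no more content to load").
import time

def scroll_until_done(fetch_more, max_scrolls=10_000, wait_s=2.0):
    seen = 0
    for _ in range(max_scrolls):
        loaded = fetch_more()  # scroll one screen, return total item count
        time.sleep(wait_s)     # let the new results load before scrolling again
        if loaded == seen:     # nothing new appeared: we reached the bottom
            break
        seen = loaded
    return seen

# Fake page that serves 3 new items per scroll, 9 in total.
state = {"count": 0}
def fake_fetch():
    state["count"] = min(state["count"] + 3, 9)
    return state["count"]

print(scroll_until_done(fake_fetch, wait_s=0.0))  # 9
```

This is why the 10,000 figure doesn’t matter: the stop condition, not the maximum, is what actually ends the loop.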
All that remains to do is to drag my loop item inside my scroll item.
Now, I can click on each of these elements. I add a click item element.
I select “relative XPath”. Basically, the difference between a
relative and an absolute XPath is that a relative XPath is evaluated
inside each loop element, so it clicks the URL specific to each element,
whereas an absolute XPath always points to the very same URL, no matter which element you choose. The formula for that one is just /a.
And, in the options, you click on “load with AJAX”, with a timeout of 10s.
I click on “apply”. To see if what we have just done works,
I click on “loop item” and “click item”. And, as you can notice, we’ve got the detail page.
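The relative-versus-absolute distinction above can be sketched in a few lines of Python. The markup is invented, and ElementTree only covers a subset of XPath, but the behaviour it shows is the same: a relative expression is evaluated inside each loop element, while one evaluated from the document root keeps landing on the same first match.

```python
# Toy illustration of relative vs. absolute selection on invented markup.
import xml.etree.ElementTree as ET

page = ET.fromstring("""
<div>
  <div class="result"><a href="/place/1">One</a></div>
  <div class="result"><a href="/place/2">Two</a></div>
</div>
""")

items = page.findall(".//div[@class='result']")

# Relative: evaluated inside each loop element, so each item yields its own link.
relative_hrefs = [item.find("a").get("href") for item in items]
print(relative_hrefs)  # ['/place/1', '/place/2']

# Absolute-style: evaluated from the root, so every iteration would land on
# the same first match -- usually not what you want in a loop.
absolute = page.find(".//div[@class='result']/a").get("href")
print([absolute for _ in items])  # ['/place/1', '/place/1']
```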
All that remains to do is to extract our data. So, here is how we will proceed.
I am going to show one example. But the process is the very
same thing for all of the data. Actually, you’ve got 2 options.
If the formulas don’t work anymore, you can do what we call a “point & click”.
If I want to extract the title, I can point on the title.
I can click on the title. And I can click on “Extract
text of the selected element”. And I repeat the same process for all
of the elements I want to extract. However, this particular
method is not very accurate, and this is why we’re going to use XPaths instead.
The other way to do it is to click on “add step”, “extract data”.
First of all, we are going to extract the URL. So, I “add a custom field”,
“page-level data” and “page url”. Something which is paramount is to
uncheck “Extract data in the loop”. I click on “apply”.
And as you can see, I have got my data now. To extract the title: “Add custom
field”, “Capture data on the page”. I want to get the title.
And you are going to click on “Absolute XPath” each time.
I insert my XPath. And I click on “confirm”.
And if the process works well, I should have my title.
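Under the hood, each “Add custom field” step boils down to running one expression per field against the detail page and collecting the results into a row. Here is a minimal sketch with an invented detail page and invented field selectors; the real formulas are the ones in the description.

```python
# Toy detail page plus the field-extraction step. Markup and selectors
# are invented stand-ins for the real Google Maps page and XPaths.
import xml.etree.ElementTree as ET

detail = ET.fromstring("""
<div>
  <h1 class="title">Barbershop One</h1>
  <span class="rating">4.8</span>
  <span class="phone">020 1234 5678</span>
</div>
""")

# One selector per field, gathered into a single record (one spreadsheet row).
record = {
    "title":  detail.find(".//h1[@class='title']").text,
    "rating": detail.find(".//span[@class='rating']").text,
    "phone":  detail.find(".//span[@class='phone']").text,
}
print(record)  # {'title': 'Barbershop One', 'rating': '4.8', 'phone': '020 1234 5678'}
```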
See you back in a minute. Finally, in order to keep our IP
address safe, we will add some timeouts. I click on the “Extract Data” step
and I will wait for 10s each time. I click on “apply”.
I am also going to add a timeout on the loop item.
But let’s say around 1s this time. And I’m going to add another
second on the scroll element. Once we have done this, we can run our task.
I click on “run” and I click on “standard mode”. Actually, there is a change you need to make.
And if you’ve got any problem regarding this, please ask in the comments.
It’s better if you drag the loop item outside the scroll item.
In other words, we will scroll down to the bottom of the page first.
And secondly, we will extract all the data at once.
As you can notice, if we click on “show browser”, we are at the end of the list.
All that remains to do is to export our data, to remove duplicates if there are any.
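The deduplication step is simple enough to sketch: keep the first occurrence of each lead, keyed on a field that should be unique. The rows and the choice of the phone number as the key are both illustrative assumptions here.

```python
# Sketch of the final dedup pass: drop rows whose key was already seen.
rows = [
    {"title": "Barbershop One", "phone": "020 1111"},
    {"title": "Barbershop Two", "phone": "020 2222"},
    {"title": "Barbershop One", "phone": "020 1111"},  # duplicate listing
]

seen, unique = set(), []
for row in rows:
    if row["phone"] not in seen:  # keyed on the (assumed unique) phone field
        seen.add(row["phone"])
        unique.append(row)

print(len(unique))  # 2
```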
And we are about to get an Excel spreadsheet. Here is what it should look like. This is the end of the video.
I hope you have enjoyed it. If you need any kind of web scraping services,
you can ask for a quote by sending me an email. And if you need to scrape Google Maps
at a bigger scale, you’ve got scrap.io. The link is still in the description.
See you next time.