In today’s video, you are going to learn how to
scrape leads from Google Maps using Octoparse, which is a no code web scraping tool.
I know, there are a lot of similar videos on this very topic.
However, Google Maps evolves pretty quickly, and some no-code tutorials created 6 months
ago may not work as well as they used to. It’s time for a quick recap.
The link to download Octoparse is in the description.
Please note that this tutorial will allow you to extract phone
numbers from companies, but not emails. If you want to get emails, I
suggest you use scrap.io instead. It’s fast and easy to use.
I have also left the link to that one in the description.
As an example, I am looking for barbershops in London.
The idea is to scroll down to the bottom of the page as many times as
possible in order to load new data. And then, to click on each single
element in order to get the detail page. And finally, we will extract data like the title,
the number of pictures, the number of reviews, the rating, the category, the phone number and so on.
The first step is to copy our URL and to paste it on Octoparse.
And then, I click on “Start”. We’ve got a popup which prevents
us from accessing the website. But that’s okay.
We need to turn on the browse mode in order to remove it.
I click on “reject all” and I have access to the website.
I turn off the browse mode. And, in order to make sure the popup won’t
appear anymore, I will save the cookies. I go to the options, “Use cookie” and
“Use cookie from the current page”. And to apply what I have just
done, I click on “Apply”. In this tutorial, we are going to
apply a lot of different formulas. All of them will be written in the description.
So, you just need to copy and paste them. These formulas are XPaths.
If you want to know more about what they mean, I will probably make a video about this topic.
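While we wait for that video, here is a quick taste of what an XPath does, outside of Octoparse. This is a toy sketch in Python using only the standard library: the markup is invented, not real Google Maps HTML, and note that Python’s built-in ElementTree only supports a small subset of XPath, not functions like contains().

```python
# Toy illustration of selecting elements with an XPath-style expression.
# The markup below is an invented stand-in for a results list.
import xml.etree.ElementTree as ET

page = ET.fromstring("""
<div>
  <div class="result"><a href="/place/1">Barbershop One</a></div>
  <div class="result"><a href="/place/2">Barbershop Two</a></div>
  <div class="ad"><a href="/sponsored">Sponsored</a></div>
</div>
""")

# Select every element whose class is "result" -- the same idea as the
# "Variable List" formula we are about to paste into Octoparse.
results = page.findall(".//div[@class='result']")
print(len(results))               # 2
print(results[0].find("a").text)  # Barbershop One
```

The point is simply that one expression matches every result card at once, while skipping anything (like the ad) that doesn’t fit the pattern.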
The first thing we are going to do is to create our loop item, meaning we will
select each element within our page. So, I add a step and I create a loop.
I need to open the loop mode and click on “Variable List”.
Then, I insert my first formula, which is this one.
I paste it and click on “apply”. And, as you can notice, we’ve got 3 elements. But 3 elements are not enough, because if we
scroll down to the bottom of the page and if we redo the same process, we can notice
that we’ve got now 6 elements instead of 3. So, we need to scroll down to the bottom of
the page before selecting all of the elements. I am going to add another
loop above the first one. But it will be a “scroll page” element this time.
This is a “scroll”. And I have to choose whether it’s a default
scroll area or a partial scroll area. Here is an example of a default scroll page.
And here is an example of a partial scroll page. As you can notice, a partial scroll
area means that the scroll bar belongs to one section of the page rather than the whole window. So here, we’ve got a partial scroll area.
And, as an XPath, I insert this one. I click on “apply”.
This XPath localizes the exact area in which the scroll bar is included.
There are a couple more options. I suggest you select “for one screen”.
And we are going to scroll as many times as possible.
So, let’s say 10 000 times. It doesn’t matter as long as you check this box:
“End loop when there’s no more content to load”. As a waiting time, I will
wait for about 2s each time. I click on “apply”.
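If you’re curious what Octoparse is doing for us here, the scroll step can be sketched as a small loop: scroll up to a maximum number of times, wait between scrolls, and stop early once no new content loads. fetch_more() below is an invented stand-in for the browser scroll, not a real Octoparse or browser API.

```python
# Sketch of the scroll loop: repeat up to max_scrolls, pause between
# scrolls, and break as soon as the item count stops growing
# ("End loop when there's no more content to load").
import time

def scroll_until_done(fetch_more, max_scrolls=10_000, wait_s=2.0):
    seen = 0
    for _ in range(max_scrolls):
        loaded = fetch_more()  # scroll one screen, return total item count
        time.sleep(wait_s)     # let the new results load before scrolling again
        if loaded == seen:     # nothing new appeared: we reached the bottom
            break
        seen = loaded
    return seen

# Fake page that serves 3 new items per scroll, 9 in total.
state = {"count": 0}
def fake_fetch():
    state["count"] = min(state["count"] + 3, 9)
    return state["count"]

print(scroll_until_done(fake_fetch, wait_s=0.0))  # 9
```

This is why the 10,000 figure doesn’t matter: the stop condition, not the maximum, is what actually ends the loop.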
All that remains to do is to drag my loop item inside my scroll item.
Now, I can click on each of these elements. I add a click item element.
I select “relative XPath”. Basically, the difference between a
relative and an absolute XPath is that a relative XPath is evaluated
inside each loop element, so it clicks the URL specific to each element,
whereas an absolute XPath always points to the very same URL, no matter which element you choose. The formula for that one is just /a.
And, in the options, you click on “load with AJAX”, with a timeout of 10s.
I click on “apply”. To see if what we have just done works,
I click on “loop item” and “click item”. And, as you can notice, we’ve got the detail page.
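The relative-versus-absolute distinction above can be sketched in a few lines of Python. The markup is invented, and ElementTree only covers a subset of XPath, but the behaviour it shows is the same: a relative expression is evaluated inside each loop element, while one evaluated from the document root keeps landing on the same first match.

```python
# Toy illustration of relative vs. absolute selection on invented markup.
import xml.etree.ElementTree as ET

page = ET.fromstring("""
<div>
  <div class="result"><a href="/place/1">One</a></div>
  <div class="result"><a href="/place/2">Two</a></div>
</div>
""")

items = page.findall(".//div[@class='result']")

# Relative: evaluated inside each loop element, so each item yields its own link.
relative_hrefs = [item.find("a").get("href") for item in items]
print(relative_hrefs)  # ['/place/1', '/place/2']

# Absolute-style: evaluated from the root, so every iteration would land on
# the same first match -- usually not what you want in a loop.
absolute = page.find(".//div[@class='result']/a").get("href")
print([absolute for _ in items])  # ['/place/1', '/place/1']
```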
All that remains to do is to extract our data. So, here is how we will proceed.
I am going to show one example. But the process is the very
same thing for all of the data. Actually, you’ve got 2 options.
If the formulas don’t work anymore, you can do what we call a “point & click”.
If I want to extract the title, I can point on the title.
I can click on the title. And I can click on “Extract
text of the selected element”. And I repeat the same process for all
of the elements I want to extract. However, this particular
method is not very accurate, and this is why we’re going to use XPaths instead.
The other way to do it is to click on “add step”, “extract data”.
First of all, we are going to extract the URL. So, I “add a custom field”,
“page-level data” and “page url”. Something which is paramount is to
uncheck “Extract data in the loop”. I click on “apply”.
And as you can see, I have got my data now. To extract the title: “Add custom
field”, “Capture data on the page”. I want to get the title.
And you are going to click on “Absolute XPath” each time.
I insert my XPath. And I click on “confirm”.
And if the process works well, I should have my title.
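Under the hood, each “Add custom field” step boils down to running one expression per field against the detail page and collecting the results into a row. Here is a minimal sketch with an invented detail page and invented field selectors; the real formulas are the ones in the description.

```python
# Toy detail page plus the field-extraction step. Markup and selectors
# are invented stand-ins for the real Google Maps page and XPaths.
import xml.etree.ElementTree as ET

detail = ET.fromstring("""
<div>
  <h1 class="title">Barbershop One</h1>
  <span class="rating">4.8</span>
  <span class="phone">020 1234 5678</span>
</div>
""")

# One selector per field, gathered into a single record (one spreadsheet row).
record = {
    "title":  detail.find(".//h1[@class='title']").text,
    "rating": detail.find(".//span[@class='rating']").text,
    "phone":  detail.find(".//span[@class='phone']").text,
}
print(record)  # {'title': 'Barbershop One', 'rating': '4.8', 'phone': '020 1234 5678'}
```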
See you back in a minute. Finally, in order to keep our IP
address safe, we will add some timeouts. I click on the “Extract Data” step
and I will wait for 10s each time. I click on “apply”.
I am also going to add a timeout on the loop item.
But let’s say around 1s this time. And I’m going to add another
second on the scroll element. Once we have done this, we can run our task.
I click on “run” and I click on “standard mode”. Actually, there is a change you need to make.
And if you’ve got any problem regarding this, please ask in the comments.
It’s better if you drag the loop item outside the scroll item.
In other words, we will scroll down to the bottom of the page first.
And secondly, we will extract all the data at once.
As you can notice, if we click on “show browser”, we are at the end of the list.
All that remains to do is to export our data, to remove duplicates if there are any.
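The deduplication step is simple enough to sketch: keep the first occurrence of each lead, keyed on a field that should be unique. The rows and the choice of the phone number as the key are both illustrative assumptions here.

```python
# Sketch of the final dedup pass: drop rows whose key was already seen.
rows = [
    {"title": "Barbershop One", "phone": "020 1111"},
    {"title": "Barbershop Two", "phone": "020 2222"},
    {"title": "Barbershop One", "phone": "020 1111"},  # duplicate listing
]

seen, unique = set(), []
for row in rows:
    if row["phone"] not in seen:  # keyed on the (assumed unique) phone field
        seen.add(row["phone"])
        unique.append(row)

print(len(unique))  # 2
```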
And we are about to get an Excel spreadsheet. Here is what it should look like. This is the end of the video.
I hope you have enjoyed it. If you need any kind of web scraping services,
you can ask for a quote by sending me an email. And if you need to scrape Google Maps
at a bigger scale, you’ve got scrap.io. The link is still in the description.
See you next time.