Scrape Data from Google Maps (in 2023)

Video Statistics and Information

Captions
In today’s video, you are going to learn how to scrape leads from Google Maps using Octoparse, which is a no-code web scraping tool. I know, there are a lot of similar videos about this very topic. However, Google Maps evolves pretty quickly, and some no-code tutorials created six months ago may not work as well as they used to. It’s time for a quick recap. The link to download Octoparse is in the description. Please note that this tutorial will let you extract phone numbers from companies, but not emails. If you want to get emails, I suggest you use scrap.io instead: it’s fast and easy to use, and I’ll also leave the link to that one in the description.

As an example, I am looking for barbershops in London. The idea is to scroll down to the bottom of the page as many times as possible in order to load new data, then to click on each single element in order to reach its detail page, and finally to extract data like the title, the number of pictures, the number of reviews, the rating, the category, the phone number, and so on.

The first step is to copy our URL and paste it into Octoparse. Then I click on “Start”. We get a popup which prevents us from accessing the website, but that’s okay: we need to turn on browse mode in order to remove it. I click on “Reject all” and I have access to the website, then I turn off browse mode. To make sure the popup won’t appear anymore, I will save the cookies: I go to the options, “Use cookie”, then “Use cookie from the current page”. To apply what I have just done, I click on “Apply”.

In this tutorial, we are going to apply a lot of different formulas. All of them are written in the description, so you just need to copy and paste them. These formulas are XPaths; if you want to know more about what they mean, I will probably make a video about that topic.
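As a rough illustration of what those XPath formulas do, here is a minimal sketch using Python's standard library. The HTML below is hypothetical (Google Maps' real markup is different and changes often); the point is only that one XPath expression selects every matching element on the page, which is what Octoparse's variable list relies on.

```python
import xml.etree.ElementTree as ET

# Hypothetical result-list markup, NOT Google Maps' actual HTML.
html = """
<div>
  <div class="result"><a href="/place/1">Barber One</a></div>
  <div class="result"><a href="/place/2">Barber Two</a></div>
</div>
"""

root = ET.fromstring(html)

# One XPath selects every result card at once (the "variable list").
cards = root.findall(".//div[@class='result']")
print(len(cards))  # 2

# Each card then yields its own data.
titles = [card.find("a").text for card in cards]
print(titles)  # ['Barber One', 'Barber Two']
```

Note that `xml.etree.ElementTree` only supports a subset of XPath; a real scraper would use a full XPath 1.0 engine, as Octoparse does internally.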
The first thing we are going to do is create our loop item, meaning we will select each element within our page. So I add a step and create a loop. I open the loop mode and click on “Variable list”. Then I insert my first formula, which is this one. I paste it and click on “Apply”. As you can notice, we’ve got 3 elements.

But 3 elements is not enough, because if we scroll down to the bottom of the page and redo the same process, we can see that we now have 6 elements instead of 3. So we need to scroll down to the bottom of the page before selecting all of the elements.

I am going to add another loop above the first one, but this time it will be a “Scroll page” element. This is a scroll, and I have to choose whether it is a default scroll area or a partial scroll area. Here is an example of a default scroll page, and here is an example of a partial scroll page. As you can notice, a partial scroll area means that the scroll bar is embedded within one part of the website. So we’ve got a partial scroll area, and as an XPath I insert this one. I click on “Apply”. This XPath locates the exact area that contains the scroll bar.

There are a couple more options. I suggest you select “For one screen”, and we are going to scroll as many times as possible, so let’s say 10,000 times. The exact number doesn’t matter as long as you check this box: “End loop when there’s no more content to load”. As a waiting time, I will wait for about 2 seconds each time. I click on “Apply”. All that remains is to drag my loop item inside my scroll item.

Now I can click on each of these elements. I add a “Click item” element and select “Relative XPath”. Basically, the difference between the relative and the absolute XPath is that when you choose “Relative”, it will click on the URL specific to each element.
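The scroll-and-stop behavior described above (scroll up to 10,000 times, but end the loop when no more content loads) can be sketched in plain Python. The `load_more` callable here is a stand-in for the browser scroll that Octoparse performs; the 3-results-per-scroll numbers are invented for the demo.

```python
def scroll_until_exhausted(load_more, max_scrolls=10_000):
    """Scroll repeatedly; stop early once no new items appear,
    like Octoparse's 'End loop when there's no more content to load'."""
    items, seen = [], -1
    for _ in range(max_scrolls):
        items = load_more()           # one "scroll" of the results pane
        if len(items) == seen:        # nothing new loaded -> stop early
            break
        seen = len(items)
    return items

# Stand-in for the results pane: each "scroll" reveals 3 more cards, up to 9.
state = {"items": []}
def fake_load_more():
    if len(state["items"]) < 9:
        state["items"].extend(["card"] * 3)
    return state["items"]

print(len(scroll_until_exhausted(fake_load_more)))  # 9
```

This is why the huge scroll count is harmless: the early-exit check makes the loop cost proportional to the actual list length, not to 10,000.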
Whereas the absolute XPath means it is the very same URL, no matter which element you choose. The formula for that one is just /a. In the options, you click on “Load with AJAX”, with a timeout of 10 seconds, and I click on “Apply”. To see if what we have just done works, I click on “Loop item” and “Click item”, and as you can notice, we’ve got the detail page.

All that remains is to extract our data. Here is how we will proceed: I am going to show one example, but the process is exactly the same for all of the data. Actually, you’ve got 2 options. If the formulas don’t work anymore, you can do what we call a “point & click”: if I want to extract the title, I point at the title, click on it, and click on “Extract text of the selected element”, then repeat the same process for all of the elements I want to extract. However, this particular method is not very accurate, and this is why we’re going to use the XPaths.

The other way to do it is to click on “Add step”, then “Extract data”. First of all, we are going to extract the URL: I click on “Add a custom field”, “Page-level data”, and “Page URL”. Something which is paramount is to uncheck “Extract data in the loop”. I click on “Apply”, and as you can see, I now have my data. To extract the title: “Add custom field”, “Capture data on the page”. I want to get the title, and you are going to click on “Absolute XPath” each time. I insert my XPath and click on “Confirm”. If the process works well, I should have my title. See you back in a minute.

Finally, in order to keep our IP address safe, we will add some timeouts. I click on the “Extract data” step and set a wait of 10 seconds each time, then click on “Apply”. I am also going to add a timeout on the loop item, but around 1 second this time, and another second on the scroll element. Once we have done this, we can run our task: I click on “Run” and then on “Standard mode”.
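The relative-versus-absolute distinction above can be made concrete with a small sketch (again with hypothetical markup): a relative path like ./a is resolved against each loop element in turn, so every card yields its own link, while a path anchored at the document root always resolves to the same node no matter which card you started from.

```python
import xml.etree.ElementTree as ET

# Hypothetical result cards, NOT Google Maps' actual HTML.
page = ET.fromstring("""
<div>
  <div class="card"><a href="/place/1">Barber One</a></div>
  <div class="card"><a href="/place/2">Barber Two</a></div>
</div>
""")

# Relative: "./a" is evaluated against each card, so each card
# contributes its own detail-page URL -- this is what the click loop needs.
links = [card.find("./a").get("href")
         for card in page.findall(".//div[@class='card']")]
print(links)  # ['/place/1', '/place/2']

# Anchored at the root: the expression always resolves to the same
# first match, regardless of which card the loop is currently on.
first = page.find(".//div[@class='card']/a").get("href")
print(first)  # /place/1
```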
Actually, there is a change you need to make, and if you’ve got any problem regarding this, please ask in the comments. It’s better if you drag the loop item outside the scroll item. In other words, we will scroll down to the bottom of the page first, and only then extract all the data at once. As you can notice, if we click on “Show browser”, we are at the end of the list.

All that remains is to export our data and to remove duplicates if there are any. We are about to get an Excel spreadsheet; here is what it should look like.

This is the end of the video. I hope you have enjoyed it. If you need any kind of web scraping services, you can ask for a quote by sending me an email. And if you need to scrape Google Maps at a bigger scale, you’ve got scrap.io. The link is still in the description. See you next time.
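For the final deduplication step, here is a minimal standard-library sketch of removing duplicate rows before opening the export in Excel. The column names are assumptions for the demo, not Octoparse's actual export schema.

```python
import csv, io

# Assumed export rows; real exports would have more columns
# (reviews, rating, category, etc.).
rows = [
    {"title": "Barber One", "phone": "020 1111"},
    {"title": "Barber Two", "phone": "020 2222"},
    {"title": "Barber One", "phone": "020 1111"},  # duplicate
]

# Keep the first occurrence of each row, preserving order.
seen, unique = set(), []
for row in rows:
    key = tuple(sorted(row.items()))
    if key not in seen:
        seen.add(key)
        unique.append(row)

# Write the cleaned rows as CSV, which Excel opens directly.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["title", "phone"])
writer.writeheader()
writer.writerows(unique)
print(len(unique))  # 2
```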
Info
Channel: François from Octoparse
Views: 71,680
Id: 7_hbCI3HBBE
Length: 9min 46sec (586 seconds)
Published: Sat Jan 07 2023