How to Scrape Google Maps at the Country Level

Captions
As you are probably aware, Google Maps is a perfect database for getting leads from small and medium companies: you can easily segment your target market, you have access to reliable data, and that data covers the whole world. However, having access to the data doesn't necessarily mean you can easily retrieve it. To keep it short and simple, scraping hundreds of leads is pretty different from scraping tens of thousands of them. It raises the question: can we do things bigger? Are we able to scrape Google Maps at a larger scale and still get a comprehensive set of data, and how can we do that?

In this video we are going to compare two different approaches. The first one is a do-it-yourself method: we are going to code it on our own. The second one is much simpler; actually, it's for everyone, since no coding skills are required. This approach is named scrap.io; I will talk about it later on, and the link is in the description.

Let's begin with the first method, meaning we have to deal with the challenge ourselves. The challenge is the following: we are going to scrape a category at a large scale. Not at the scale of a city, not at the scale of a state, but at the scale of an entire country. In other words, we will scrape restaurants in the United States. The obvious issue is that we cannot achieve this result directly. The first thing we have to do is retrieve the list of cities and states, and then we will create a kind of loop: for example, we will search for restaurants near New York City; once it's done, it will be near another city, then city three, city four, and so on. It will be a pretty long loop. Fortunately, there is a website from which we can retrieve the list of states and cities, and both are really important: we cannot scrape only the cities, because there are cities with the same name in different states. I'm particularly thinking about Springfield, which is pretty well known for this.

Here is our starting point, our website. I'm going to use Octoparse: I copy this URL and paste it to create a new task, then I click on Start. In order to make things as precise and accurate as possible, we are going to insert some XPath formulas. Please note that they might change over time. Let's do the first one together. I turn on the browse mode and refuse the cookies. First of all, we are going to create a loop allowing us to select all of our cities: I add a new step, I create a loop, I click on my loop, I switch the loop mode from "list of URLs" to "variable list", and I copy and paste my XPath. We are going to find out how we end up with this XPath in a minute. I copy it, paste it here, and click on Apply. As you can notice, I've got all my cities.

In order to write my XPath, I need to come back to my browser and take a look at the HTML code. As you can notice, the h2 element is actually the name of the state. To figure that out, I use an XPath helper, which helps me write my XPath: it starts with the h2, then we select the following sibling, which gives us 50 elements, one for each of the 50 states, and then I select the li elements, which represent all the cities of each list. And here is the result.

All right, let's move on. As we said earlier, we need to capture two things. The first one is the name of the city, of course, so I click on "Add custom field" and "Capture data on the page". I don't paste anything; I just name my field "City" and click on Confirm, and I should be able to get the list of my cities. The second one is the state, which relies on its own XPath.
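For readers who want to sanity-check the page structure outside Octoparse, here is a minimal Python sketch of the same extraction idea, assuming a page where each state name is an h2 element followed by a sibling ul whose li items are the cities. The URL is a placeholder, and the exact XPath depends on the page, which, as noted above, may change over time.

```python
# Minimal sketch of the city-list extraction, under the assumptions above.
import requests
from lxml import html

URL = "https://example.com/list-of-us-cities"  # hypothetical source page

tree = html.fromstring(requests.get(URL, timeout=30).content)

rows = []
for h2 in tree.xpath("//h2"):
    state = h2.text_content().strip()
    # following-sibling::ul[1] mirrors the "following sibling" step from
    # the video; its <li> children are the cities of that state.
    for li in h2.xpath("following-sibling::ul[1]/li"):
        rows.append({"city": li.text_content().strip(), "state": state})

print(len(rows), rows[:3])
```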
We just need to add some timeouts: I click on my loop item, set it to wait for five seconds before the action, and click on Apply. I think it's done, so I can run my task: I click on Run and start the task. See you once I've got all my results.

The remaining thing to do is to export our data in Excel or CSV format, and here is what it should look like. The next step is to combine both columns in order to create a third column, which will be our keywords; then we will insert all of these keywords to scrape our data. You might choose Excel for this, but I'm not a huge fan of Excel for manipulating my data, so I'm going to use the Python pandas library instead. Make sure that you have created a Python environment and that you have installed the pandas library, as well as the openpyxl library; the latter will be helpful for manipulating the Excel data. I'm going to use JupyterLab as a text editor, so I type this line of code in my terminal. Once you have created a new notebook and placed your Excel file in the same directory as your notebook, we begin by importing pandas as pd and creating a new data frame named us. If we take a look, we've got the correct number of rows and the correct number of columns. We create a new column named "keyword", which is a combination of the city and the state, plus the category, plus the country, and we've got our new column. I'm just going to remove this space character; okay, it's slightly better. Finally, I save my changes into a new Excel file, which will be called "cities keyword united states", and here is the result. (A sketch of this step appears in the first code block below.)

Our first method is about to end. To scrape our data, I'm going to use an Octoparse template: I click on Templates, then on Maps, and I'm going to use "Store details by keyword_Google Maps". I click on "Try it" and insert my keyword. I'm going to try it out with the first one, but if you need to insert multiple keywords, you insert them just like this. What about the page size? It sounds like 100 is the maximum. You give a name to your task, you click on Start, and you run the task. A few moments later, my task is completed once again and we've got our data rows. But much more importantly, I've launched the same task with all of my keywords. I have stopped my task, but I succeeded in getting around 71,000 data rows. I'm going to export my data, and we're going to take a look at how reliable it is.

It's not over yet. As you can notice, I've got four files instead of one; that's because there is a maximum of 20,000 data rows per Excel or CSV file. In order to make things clearer, I need to merge these four files into a single one. I'm going to use the Python pandas library one more time, and this time we are going to concatenate our data frames: I've got my four data frames, I use the pd.concat function, and I save the result into a new Excel file. (See the second code block below.) I've got my entire file now, and I've got 42 columns.
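Here is a minimal sketch of the keyword-building step, assuming the Octoparse export is named cities_united_states.xlsx with City and State columns; adjust the file name, column names, and keyword order to match your own export.

```python
import pandas as pd  # reading/writing .xlsx also requires openpyxl

# Load the city/state list exported from Octoparse (file name is an assumption).
us = pd.read_excel("cities_united_states.xlsx")
print(us.shape)  # check the number of rows and columns

# Combine city + state + category + country into the search keyword.
us["keyword"] = us["City"] + " " + us["State"] + " restaurant united states"

# Save the result to a new Excel file.
us.to_excel("cities_keyword_united_states.xlsx", index=False)
```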
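And a minimal sketch of the merge step, assuming the four export files share a common name pattern:

```python
import glob
import pandas as pd

# Collect the four ~20,000-row export files (the name pattern is an assumption).
paths = sorted(glob.glob("export_part_*.xlsx"))
frames = [pd.read_excel(path) for path in paths]

# Concatenate them into one data frame and save a single Excel file.
merged = pd.concat(frames, ignore_index=True)
merged.to_excel("restaurants_united_states.xlsx", index=False)
```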
But let's take a closer look, shall we? We've got our keyword, the name of the restaurant, the number of reviews, the rating, the address, the country (I assume), the city, the state, the website, the phone number, and the opening hours from Monday to Sunday. The silly thing here is that they combined the additional-info column with the opening hours: if you take a look at this data row, it says "Identifies as women-owned" and then you've got the opening hours. I've got the URL of the Google Maps detail page, the coordinates (the latitude and the longitude), the category, up to four image URLs, the description if there is any, the price range, and the current status; I assume they wanted to show us whether the restaurant is closed or open, but it doesn't seem that this data has been scraped. I've got the delivery field, the open time on Monday, Tuesday, Wednesday, up to Sunday, and we've got popular times zero, one, two, three, four, five, six, so I think this data is related to this graph. However, it raises one question: I wonder whether popular times zero is related to Monday, popular times one to Tuesday, two to Wednesday, and so on, or whether it's relative to today's date. In that case it would be different, because if we scraped the data on a Friday, we would get the popular times related to Friday. I honestly do not know; I do not have an answer to this question.

The first method is over; now we're going to talk about the scrap.io approach. If you remember what we said at the beginning of the video, scrap.io is a solution for everyone because it's very easy to use: you don't need to download any software, you don't need to write a single line of code, and you do not need to build your own scraper or crawler. Using the first approach, we succeeded in getting 44 columns, whereas with scrap.io we are getting around 70 columns. But to be more accurate: how reliable is the data, and how much data can you expect? I won't lie to you: the example I've just shown you is just a sample. Actually, it seems like there are around 450,000 restaurants in the United States, and one might wonder whether you could get more of them using the first method; we cannot know for sure, because I stopped my task. To give you a more accurate answer, we ran the very same test in France, and in that case we got 52,000 restaurants using the first method and 139,000 restaurants using scrap.io. It's pretty safe to say that scrap.io will allow you to get more data.

Using scrap.io is pretty simple: you go to scrap.io, and you first need to create your own account, so you click on Login and sign up. Once it is done, you have access to your dashboard. If you are not logged in yet, you can still get an overview of what you can achieve and what kind of data you can get: you just need to insert an activity, meaning a category (we've been talking about restaurants, haven't we?), and a city. Here the interface is in French, because I'm currently located in France, so let's type Paris for the sake of the example, click on Search, and you've got the number of leads you can expect. But you're going to say: you promised we could target leads from an entire country. Yes, you can. I click on my dashboard, and here is the place where you can extract your leads. You can filter your leads by typing a city, a level-2 division (which is the county, I believe), a level-1 division (which is the state), and the country. So if I'm looking for restaurants in the United States and I click on Search, scrap.io is going to tell me that I've got about 10,000+ results; actually, we've got around 450,000 results.

But maybe you need something more precise. That is the reason why you can also filter your data. Are you looking for closed restaurants? Maybe not. Are you looking for restaurants with a website? Restaurants with phone numbers, with emails, with social networks, and which ones: Facebook, Instagram, YouTube, Twitter, LinkedIn? Or maybe it doesn't matter to you. Do you want to know whether the restaurant has claimed its listing on Google Maps? What about the price range? Because a Burger King restaurant is not the same thing as a three-star restaurant, I guess.
What about the rating? It's pretty basic, but it gives you a perfect insight into whether the restaurant is good or bad. What about the number of reviews as well? You can get a rating of two out of five, but if you only got one review, maybe it's not relevant. What about the number of pictures? This one can give you an idea of the brand image of the restaurant. What about a contact form on the website? This one is even better than the email, because when you send a message through a contact form, you are pretty sure your message will be sent and received, which is not necessarily the case when you send an email. What about a pixel on the website? You make your choices and you click on Filter.

To export your data, you click on, well, Export. You can give your export a name, and if you click on Advanced options, you have an overview of all the columns you can get; then you click on Export. If I click on "My exports", I can see all of the exports I have done so far; I just wanted to show you that you can download your files in CSV or Excel format.

Now let's take a look at our columns. There are some common fields between the first and the second approach: you've got the name of the restaurant, whether it's closed or not, the main type (the category), and some other categories; for instance, a hotel can also be a restaurant, and a restaurant can also be a Mexican restaurant or a French restaurant. You've got the website, the phone number of course, the time zone, and the full address, which is divided into different subfields: street 1, street 2, the city, the postal code, the state, the level-1 division (which is the state as well in our case), the level-2 division (which is the county), the country, the coordinates, and the Google Maps link. You've got the owner's name and the email; I should have said emails, because when there is more than one, you also have access to them. You've got the Facebook link, YouTube link, Twitter link, Instagram link, and LinkedIn link, the price range as we mentioned earlier, the reviews count, the reviews rating, and the reviews per score; in this case, we can see that more than 200 people had an excellent experience at the restaurant. You've got the number of pictures, the URLs of some pictures, the occupancy data, and the characteristics; in this case you've got all of the characteristics, which relate to this part.

And then you've got another category. As you can see, there are some yellow columns and then a kind of orange color, which means we now have access to SEO fields. Basically, once we have access to the website, we created a crawler that helps you get access to more details. You have access to the website title, the website keywords, the website meta description, and the website meta generator, which is the kind of software people have used for their website: WordPress, wix.com. You've got the other emails, email 2, email 3, email 4, up to email 5. You've got the contact page; I should have said contact pages, up to five this time, and the same thing for the social networks. You've got the website technologies: this website uses Google Analytics, this one uses Yoast SEO; what else, we've got WooCommerce, which is also related to WordPress. And you've got the website ad pixel.

This is the end of the video; I hope you have enjoyed it. If you want to get more leads from Google Maps, you can go to scrap.io, and if you have any questions, you can ask them in the comments or directly on scrap.io through our customer support. See you next time!
Info
Channel: Scrap-io
Views: 11,673
Id: 8vnqiGhKqsY
Length: 18min 4sec (1084 seconds)
Published: Thu Apr 13 2023