How To Generate Google Maps Leads with Selenium Python

Captions
Hello everyone, my name is Michael. In today's video we are going to see how we can scrape Google Maps leads using Selenium in Python. With that said, let's get straight into the video.

First of all, before we start coding, we need to see what our bot has to do, step by step, in order to scrape the leads. If we visit Google Maps and search for "lawyer", say, because we want lawyer leads, then what our bot has to do is visit that search URL and append the keyword it should look for. The part of the URL after the keyword is not necessary, so if I want the bot to find doctors instead, we take the base search URL, add "doctor", and it lands on those results.

The bot first needs to load all the leads, so I want it to keep scrolling down until it has every result. This isn't endless: after some time it finishes scrolling, though how long that takes depends on your location. Eventually we get "You've reached the end of the list", which is perfect: now all the results are loaded. Next the bot needs to go through each of the results, and for each one we want the title, the reviews, the stars, maybe the occupation, the address, and of course the telephone number. As you'll see, many of them don't have a website, so if you have a web design agency, for instance, maybe you want to scrape exactly those: you read the website value, and if it is null or undefined, or simply doesn't exist, you save that business to your list of leads. What you do with the list after that is up to you.

Now, as you start scaling this up, you will run into two main issues. If you just want those few hundred leads you'll be fine, but when you start scraping tens of thousands of listings, say you don't only search for doctors but also lawyers and whatever else, and you pull more and more data, Google will realize you are doing this with a bot and start blocking you. The other issue is that if our bot searches for "doctor", Google will show results near your own location, in your country or your city. That's a limitation we don't want: you might want doctors in your own area, but also in New York, or Denmark, or wherever, and searching "doctor New York" won't give you results as accurate as if you were running the bot from New York.

There is a way to bypass both of those issues, and that is proxies. A proxy masks your IP and lets you change your IP address, which is what Google uses to identify you. With some proxy providers you can also specify the location you want your proxy to be in. We can be very specific: choose USA, specify the region, for example New York, and then the city, New York City, and get very targeted results. If you just put a country, for example USA, it will give you random IPs within it, and that's not what we want. And because we are using rotating proxies, every time we run a search it happens from a different IP address.
So we are bypassing both of those issues. Now, as you start using proxies, you will realize that not all of them are the same: some are clean and some are bad. If you use bad proxies, Google will be able to detect that you are behind a proxy rather than your own IP address, and it will block you. Free proxies are the worst case: most of them don't even work, so the website won't even load, and even the ones that do work are extremely slow, so your script will barely be able to run. With free proxies you are almost guaranteed to be detected and banned by Google. And even if you start paying, not all proxy providers give you good proxies.

Fortunately, there is a way to check proxy quality and make sure we are using clean proxies, and what I recommend is the proxy checker by Pixelscan. This Firefox plugin, which I'll link down in the description, lets you paste in your proxies; it scans them and tells you how likely they are to be detected, in other words how clean they are. One more thing about free proxies: since they are free and usually public, everyone is using them, and that's exactly why they are detected so easily: there is a lot of traffic going through them and Google picks up on it.

So let's see how we can use the plugin. Click the link in the description, click "Add to Firefox", and confirm. There it is, so let's open it. The protocol is HTTP, which is what we will be using for our proxies, and you can paste a proxy list of up to 50 proxies, which is fine, since we just want to spot-check them. Let's grab some free proxies first, so you can see the difference between free ones and paid ones. With the free list pasted in, click "check proxies", and we get very bad results. Really bad. They aren't always quite this bad, but here the status shows an error for every one of them: if we had used these proxies, none of them would work; they wouldn't even open the website, let alone scrape it. With free proxies the success rate is 0%; with good paid proxies the success rate should always be 100%, and if it's below 100% you are using a very bad proxy provider. I've tested many proxy providers in the past, and most of them only provide clean proxies about 25% of the time, which is very bad, considering that if you use bad proxies repeatedly, Google will block you and your script will most likely malfunction as well.

Then I found NodeMaven. They provide clean proxies about 95% of the time, and they also provide proxies with super sticky sessions that last up to 24 hours. In our case that's not as useful, but if you use proxies for, say, running ads on Facebook, where you want one proxy per account but need to keep the same IP for many hours, that's very useful. By partnering with NodeMaven I was able to get a link that gives you access to the trial that is going on right now, and by using the code "Michael" at checkout you'll get an extra 2 GB of bandwidth for free.
So, after signing up using my link down in the description, go to Proxy Setup on NodeMaven. There are some settings we can apply to our proxies, so let's go through each one and see how it can be useful in our case. The first setting, which matters a lot for us, is location: you can specify exactly what country, region, and city, and even which ISP, you want your proxy to use. For example, say we want the country to be United States, the region New York, and the city New York City. The ISP doesn't really matter in our case, but if you need to bypass some sort of bot detection you might want a popular ISP like AT&T; for now I'll leave it on random. Then we want the session type to be rotating. This lets us use just one proxy endpoint in our script, and each time we load a new page it automatically changes the IP address, so we don't have to rotate the proxies ourselves. Not many providers offer that, or this level of location control.

So let's specify the country, United States for now, leave everything else at the defaults, and copy our proxies, making sure the proxy count is set to 50. Then go back to our Firefox plugin and test them. We get very good results: the average quality level is "high", which is the best you can get, and only 8% of the IPs are low quality, which means 92% of the time we get good, very low-risk proxies; most of them sit at 1% risk, the best you can get.

Okay, let's start coding. First, create a new folder, let's name it something like "selenium-google-maps-leads-scraping", and open it in a terminal. If you're using Visual Studio Code, run `code .` to open the folder there; if you're using any other IDE, just make sure to open that folder in it. Then create a new file and call it main.py.

Instead of plain Selenium we will create our web driver with Selenium Wire, since by using Selenium Wire we are able to use proxies. If you scroll down in its documentation and search for "proxy", it gives you an example of how to use it, which is perfect, so let's copy the driver-creation part of the example into our code. Then go to Google Maps, copy the search part of the URL, and paste it in. Let's make it dynamic: put an `f` in front of the string, replace the search term with a `{keyword}` placeholder, and create that variable: `keyword = "lawyer"`. Of course, in a real application you'd usually load the keyword from a .txt file, or keep a list of keywords you want to scrape, and go through each one of them.
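As a minimal sketch of that idea (the filename keywords.txt is my own choice, not from the video):

```python
# Hypothetical keywords.txt: one search term per line, e.g. "lawyer", "doctor".
with open("keywords.txt", encoding="utf-8") as f:
    keywords = [line.strip() for line in f if line.strip()]

for keyword in keywords:
    url = f"https://www.google.com/maps/search/{keyword}/"
    # ...run the scraping steps described below for each keyword...
```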
Next, the imports. We want to import some things from Selenium itself: from `selenium.webdriver.chrome.service` we import `Service`; we also want `By`, since we are going to use it in our selectors; then, from the support module, we need `WebDriverWait`, so we can wait until a selector is visible, and `expected_conditions`, imported as `EC`. Another thing we will be using is `ChromeDriverManager` from webdriver-manager. What this gives us: instead of trying to find a compatible chromedriver for our Python application and updating it every time, ChromeDriverManager installs it automatically the first time you run the app. Let's also `import json`, since we will save our results in a JSON file, and `import re` for our regex; I'll explain why we need that in a bit, but basically we will be using a regex to scrape the phone number from the results.

That's it for the imports. Of course you need to install those libraries if you haven't already, so run pip for your Python version (3.11 in my case) and install selenium-wire and webdriver-manager. Since I have already installed everything this is pretty fast, but in your case it might take some time.

Next we want to give our driver some options: `chrome_options = webdriver.ChromeOptions()`, Chrome options specifically, since we are using Chrome. Then we create a service, which is where we use our ChromeDriverManager: the service gets `ChromeDriverManager().install()`. Then we create our driver, passing `service=service` and `options=chrome_options`. Leave the Selenium Wire options alone for now; in a bit I will show you how you can use proxies as well, but for now let's keep it simple. Let's test this part and make sure it opens the browser... and it says "cannot import name By". Right: `By` has to come from `selenium.webdriver.common.by`, and `WebDriverWait` lives in `selenium.webdriver.support.ui`; with those paths fixed we're fine.
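Putting the setup together, here is a minimal sketch of what we have so far, following selenium-wire's and webdriver-manager's documented usage:

```python
# pip install selenium-wire webdriver-manager
from seleniumwire import webdriver  # drop-in replacement for selenium's webdriver, adds proxy support
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
import json
import re
import time

keyword = "lawyer"  # in production, loop over keywords from a .txt file
url = f"https://www.google.com/maps/search/{keyword}/"

chrome_options = webdriver.ChromeOptions()
# webdriver-manager downloads a chromedriver matching the installed Chrome.
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=chrome_options)
driver.get(url)
```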
So let's run it again, and there we go, the browser opens. As you saw, it gave us a cookie consent window, which we will need to accept automatically, so let's do that first. Put a try/except around it, since we don't always get the cookie window, and just pass in the except. Inside the try we use a `WebDriverWait` on our driver, pass the timeout (5 seconds maximum), and wait until `EC.element_to_be_clickable` for the element we give it, with `By.CSS_SELECTOR` and our selector, and at the end we call `.click()`.

Now let's go find that CSS selector. Load the page, open Inspect Element on the consent window, and look at the accept button. We need something unique, and as you'll see, those class names are autogenerated, so they are not very reliable: the next time Google does an update they will most likely change. We need something more permanent. The `aria-label` looks promising, but it changes with the language: if you are in the US you get English, but once you change countries it will be different, so we cannot use that either. What we can do is use the form: there are two forms here, and if we take the second one, the first clickable element inside it is exactly this button. Since the form effectively is the button, let's use that.

Close the inspector, go back to our code, and pass the selector in. We also don't want our browser to close immediately, so add `time.sleep(60)` at the end and `import time` at the top. Run it again, and there we go: it clicks the button. (I had also left a stray `$` in there out of JavaScript habit; remove it, since this is Python, not JavaScript.)
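A sketch of that consent handling; the `form:nth-child(2)` selector is my reading of the "second form" the video lands on, and like any Google selector it may change:

```python
# Accept the cookie consent dialog if it appears; it isn't shown in every region.
try:
    WebDriverWait(driver, 5).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "form:nth-child(2)"))
    ).click()
except Exception:
    pass  # no consent window, carry on
```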
So let's close this for now and go to the second step our script has to make: as we said, scrolling to the bottom until it cannot anymore. First, one thing: if we just tell the bot to keep scrolling, it will start scrolling on the map, or basically not function at all. We need to scroll the results list specifically, so we need the selector for the container that surrounds the list, and inspecting it, the unique thing about that element is `role="feed"`. So we write `scrollable_div = driver.find_element(By.CSS_SELECTOR, ...)` with that selector. Now that we have found the scrollable div, we call `driver.execute_script`, and since the script spans several lines, we use a triple-quoted string.

Inside it goes the custom vanilla JavaScript for our bot to run. First, `var scrollableDiv = arguments[0];`, since we pass the scrollable div in through `execute_script`'s arguments and we want the first argument; the element itself is passed right after the script string. Next we create a function, let's name it `scrollWithinElement`, which takes the scrollable div, and of course we return its result from the script: `return scrollWithinElement(scrollableDiv);`. We want that function to return a Promise, since we need to do a certain task and only finish once the task is completed, so it gets the usual `resolve` and `reject` callbacks.

Inside the Promise we set `var totalHeight = 0` by default; then the `distance`, which is how far it should scroll each time, set to 1000; then the scroll delay between checks, set to 3000 milliseconds (the actual scrolling is much faster, and you'll see why in a bit). Then we create a timer with `setInterval`, and we want that interval to run every 200 milliseconds. On each tick we take the scroll height before, from the scrollable div's `scrollHeight`, so we can compare it afterwards; then, on the scrollable div, we scroll by the distance we specified, and update the total: `totalHeight += distance`, adding the distance we scrolled.

Then, if the total height is greater than or equal to the scroll height we recorded, we reset `totalHeight` to zero and set a timeout for the scroll delay, 3000 milliseconds. Inside the timeout we take the scroll height after, using the same method, and if the height after is bigger than the height before, more results have loaded, so we return and keep going; otherwise we have reached the end and we want to stop scrolling: clear the interval on that timer and resolve.

I had to reformat the script a bit, and after running it, it starts scrolling to the bottom. Let's watch... and there we go, it works. Again, make sure to copy the script from the description.
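Here is that scrolling routine as one sketch. The structure follows the description above; the numbers (1000 px per step, 200 ms ticks, 3000 ms settle delay) are the video's choices, and this relies on W3C-compliant drivers waiting for a Promise returned from `execute_script` to settle:

```python
scrollable_div = driver.find_element(By.CSS_SELECTOR, 'div[role="feed"]')
driver.execute_script("""
    var scrollableDiv = arguments[0];
    function scrollWithinElement(scrollableDiv) {
        return new Promise((resolve, reject) => {
            var totalHeight = 0;
            var distance = 1000;     // pixels per scroll step
            var scrollDelay = 3000;  // ms to wait for new results to load
            var timer = setInterval(() => {
                var scrollHeightBefore = scrollableDiv.scrollHeight;
                scrollableDiv.scrollBy(0, distance);
                totalHeight += distance;
                if (totalHeight >= scrollHeightBefore) {
                    totalHeight = 0;
                    // Pause, then check whether more results were appended.
                    setTimeout(() => {
                        var scrollHeightAfter = scrollableDiv.scrollHeight;
                        if (scrollHeightAfter > scrollHeightBefore) {
                            return;  // the list grew, keep scrolling
                        } else {
                            clearInterval(timer);  // end of the list
                            resolve();
                        }
                    }, scrollDelay);
                }
            }, 200);
        });
    }
    return scrollWithinElement(scrollableDiv);
""", scrollable_div)
```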
Now that we have loaded all the results, we need to fetch all the items, which means another selector, and we have to make sure we only grab the results, not the header above them. Again we cannot use a class name, since those are generated, so what makes sense to me is to start from `role="feed"` again, go to its `div` children, and take the ones that have a `jsaction` attribute. Testing that out, the first match is actually the options element at the top, but from the second match on we get exactly one element per result. The way around that stray first match is actually pretty simple: we scrape every match and then filter out the ones that don't have a title, which is only that one.

So let's copy the selector, which works perfectly for us, and write `items = driver.find_elements(...)`, find_elements plural, because we want all the elements that match this selector, and of course with `By.CSS_SELECTOR`. Then `results = []`, the list we will append the data to. Then, `for item in items:` we create `data = {}`, a dict where we will put the title and all the other information we get, because sometimes we won't get all of the fields: some results don't have reviews, or don't have a phone number, website, etc., so we want this to be dynamic. For the same reason, each value we fetch goes in its own try/except, since, as I said, some of them won't exist; the except just does nothing.

Inside the try we say `data['title'] = item.find_element(...)` with `By.CSS_SELECTOR` and the title's selector, and from that element we take `.text`. To find the selector, inspect a result and keep going inside it until you reach the title, then look for something unique: the class `fontHeadlineSmall` should do it, and checking across the results, it exists only on the titles; every result's title has it. So let's use that.

Then, after filling in the fields, we say: if `data.get('title')`, append the dict to our results. And finally, `with open('results.json', 'w') as f:` we dump our results to that file, `json.dump(results, f)`, with some formatting: `indent=2`. One more thing: let's put all of this inside a try, just in case it fails, with a `finally` at the end, so that whatever happens we quit our driver, even if the script fails. Let's keep the `time.sleep` in there before we quit, just so the browser doesn't close immediately and we can see what is happening; of course you will remove that when you actually use this in production.
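A sketch of that collection loop. The items selector is my reconstruction of the one built in the video (feed children carrying a `jsaction` attribute), so verify it against the live DOM:

```python
# Children of the role="feed" container that carry a jsaction attribute;
# the first match is the options element, which the title filter drops below.
items = driver.find_elements(By.CSS_SELECTOR, 'div[role="feed"] > div > div[jsaction]')

results = []
try:
    for item in items:
        data = {}
        try:
            data['title'] = item.find_element(By.CSS_SELECTOR, '.fontHeadlineSmall').text
        except Exception:
            pass  # this match has no title, e.g. the options element
        if data.get('title'):
            results.append(data)

    with open('results.json', 'w', encoding='utf-8') as f:
        json.dump(results, f, indent=2, ensure_ascii=False)  # utf-8 keeps non-Latin names readable

    time.sleep(60)  # keep the browser open while testing; drop this in production
finally:
    driver.quit()  # always release the browser, even if the scrape fails
```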
Let's run our script; it should create the results.json file and write out the title for each result. Let's wait... there we go, we got it. Open results.json and it's almost perfect, but we get some weird characters: my results came back in Greek, and that's where the mangled text comes from. That's why, when I save my results, I make sure to open the file with `encoding='utf-8'`.

Now let's add some more data. Let's copy the title block, because we also want the link, meaning the Google Maps link of the result, and the website. (For the ratings and the phone number we will use a completely different method, so let's do these two first.) For the link and the website we're after the `href`, so instead of `.text` we do `.get_attribute('href')` for both. The CSS selector for the link is pretty simple: we can just use the `a` tag. Then the selector for the website. This is also an `a` tag, but it's the second one, not the first, so we need something more specific. Inspecting it shows `data-value="Website"`, so let's try an `a` tag with that attribute, in single quotes... and for some reason we didn't get the website. Looking at it again, the `data-value` text actually changes per language, so we cannot use that either. What if we prefix it with `div`? That seemed to work at first, but on the next run we still got the Google Maps link instead, so our only option is the full selector path copied from the inspector for the website anchor; with that in place it works.

So now the next thing: two more values, the first one being the reviews. Let's find it... going inside, it's pretty simple: we can use the `fontBodyMedium` container and, inside it, the `span` that has the attribute `role="img"`. With this selector we match only the ratings, perfect. And from that element we don't take the text, we take the `aria-label` attribute; let's store it as `rating_text`.

Have in mind that the label's language changes based on your browser and on the proxy we get, so we want to make the parsing dynamic and not based on the text: we cannot just split on the word "stars" and take the first and second values, since the wording changes with the language. What we can do is pull out the numbers. So, `rating_numbers`: a list comprehension where, for each piece in `rating_text.split(' ')`, splitting on spaces (I initially forgot the space, which makes no sense, and had to fix it), we do `piece.replace(',', '.')`, since in some languages you get a comma instead of the dot we need, keep the piece only if, with the comma normalized and the dot stripped, it is all digits, and convert it with `float()`. Now we know for sure that the first number is the stars and the second is the reviews. So: if `rating_numbers` (because, as I said, some results have no rating at all), then the stars are `rating_numbers[0]`, and the reviews are `rating_numbers[1] if len(rating_numbers) > 1 else 0`. On my first attempt I compared the value instead of the length, which is why I only got the stars and no reviews; with `len()` it's fixed.

Finally, let's get the phone number. This one is a bit complicated: I haven't found a reliable selector for it, so we have to use a regex, and in this case I'll just copy-paste the code and explain what is happening. First we get all the text from our result item, then we use a phone pattern to extract the phone number: the regex does the extraction, we find all the matches, take the first match, and build a value with the unique phone numbers. Then, if unique phone numbers exist, we return the first one, else we return None.
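Those per-field snippets, gathered into a hypothetical helper. The website selector is a placeholder (the video ends up pasting a full inspector-copied path that isn't dictated aloud), and the phone regex is my own stand-in for the pattern pasted in the video:

```python
def extract_fields(item, data):
    """Fill data with the optional fields; each one gets its own
    try/except because not every result card has every field."""
    try:
        # The first anchor in the card is the Google Maps link for the place.
        data['link'] = item.find_element(By.CSS_SELECTOR, 'a').get_attribute('href')
    except Exception:
        pass
    try:
        # Placeholder: substitute the full inspector-copied path for the
        # website anchor; data-value="Website" is language-dependent.
        data['website'] = item.find_element(
            By.CSS_SELECTOR, 'a:nth-of-type(2)').get_attribute('href')
    except Exception:
        pass
    try:
        rating_text = item.find_element(
            By.CSS_SELECTOR, '.fontBodyMedium span[role="img"]'
        ).get_attribute('aria-label')
        # Language-independent parse: keep only the numeric pieces of the
        # label, normalizing comma decimal separators to dots first.
        rating_numbers = [
            float(piece.replace(',', '.'))
            for piece in rating_text.split(' ')
            if piece.replace(',', '.').replace('.', '').isdigit()
        ]
        if rating_numbers:
            data['stars'] = rating_numbers[0]
            data['reviews'] = rating_numbers[1] if len(rating_numbers) > 1 else 0
    except Exception:
        pass
    try:
        # Stand-in pattern: optional +country code, then 8+ digits with
        # spaces, dashes, or parentheses in between.
        matches = re.findall(r'\+?\d[\d\s\-\(\)]{7,}\d', item.text)
        unique_numbers = list(dict.fromkeys(m.strip() for m in matches))
        data['phone'] = unique_numbers[0] if unique_numbers else None
    except Exception:
        pass
    return data
```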
So let's run our script again... there we go, perfect. The phone number is extracted, and the reviews now show correctly, as a count instead of something like "reviews: 1.0". We have extracted all the data we wanted.

Finally, as I said in the beginning, let's see how we can use the proxies, now, to change the location. Let's go back to the Selenium Wire docs and find the proxy example... there we go, under "Proxies". As you see, this is the way to do it: copy the `seleniumwire_options` dictionary with its `proxy` entry, which is the only part we need, and pass it to our driver. Then we need to replace two values in it, the two hard-coded proxy URLs, and those will be replaced by our proxy, so let's create a variable for that and leave it as is for now. By the way, I did say you don't have to write the proxy-rotation code yourself, but in case you want to automate the whole process and have a proxy for each location you want to scrape, then of course you can keep a .txt file (or whatever you want) with all the proxies and use a for loop to go through each one, the same way you would with keywords. For now we will do it with one rotating proxy, set up manually.
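The wiring, following selenium-wire's documented `seleniumwire_options` format; the credentials and host below are placeholders for whatever rotating endpoint your provider generates, and this construction replaces the plain driver creation from earlier:

```python
# Placeholder: substitute the user:pass@host:port string from your provider's
# rotating endpoint (NodeMaven's "protocol" output format in the video).
proxy = "username:password@proxy-host:8080"

seleniumwire_options = {
    'proxy': {
        'http': f'http://{proxy}',
        'https': f'https://{proxy}',
        'no_proxy': 'localhost,127.0.0.1',  # traffic that should skip the proxy
    }
}

driver = webdriver.Chrome(
    service=service,
    options=chrome_options,
    seleniumwire_options=seleniumwire_options,
)
```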
Back on NodeMaven, let's switch the output format to "protocol", so we get the proxy in exactly that format, and set it to rotating, country United States, region New York, city New York City, leaving the ISP on random unless you actually want to use a specific ISP. And that's it: we only need one of the proxies, we don't have to take several, since it is rotating. Paste it in, and let's run the script again.

As the script runs... there we go, perfect: we are getting results for New York City, and eventually "you've reached the end of the list", so we got everything. Let's check: we got about 121 results, which is perfect, with the phone number extracted successfully, the reviews, the stars, the website, the link, everything.

The code will be down in the description, and of course make sure you use my link to get access to the NodeMaven proxies and the trial; with my code "Michael" you will also get an extra 2 GB of bandwidth. If you enjoyed this video and it was helpful, I'd appreciate it if you leave a like and subscribe to the channel so you don't miss any of my next video projects. And with that said, see you in the next video!
Info
Channel: Michael Kitas
Views: 979
Keywords: Selenium tutorial, Python scraping, Google Maps leads, lead generation, web scraping, Python tutorial, business data extraction, Selenium Google Maps, Python automation, web scraping tutorial, Google Maps scraping, JavaScript scraping, Selenium Python, local business leads, automated lead generation, SEO leads, data scraping, Selenium scripts, Python projects, Google Maps API, Python web scraping, Selenium browser automation, digital marketing, Python coding
Id: bOZmPSUlLmg
Length: 46min 46sec (2806 seconds)
Published: Sun Mar 10 2024