How to Automate Web Scraping Using Excel Macros

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello friends welcome back today we would be learning something very interesting that is how to automate web scraping using Excel and macros yes just using a simple Excel file we would be trying to scrape some values from a web page so for an example this is the website this is a simple calculator page if I enter some values over here for example 4 and 2 and click on the calculate button I get these values and now if I want to fetch these values in my excel file automatically how would I do that so for that you would be writing an Excel macro so before writing the macro let me give you a demonstration of how this would actually work so this is my excel file and in columns a and B I have two numbers in column C and E I would be automatically fetching the results of addition and multiplication from this website so let's go ahead with the demo this is my excel file and now I'll just click on this button I have already clicked you could see in few seconds and Internet Explorer the browser would come up it has come up and now it would enter the values which are there in the excel sheet you could see it is entering the values and in the background it is fetching the values as well now it has quit because the browser has closed the scraping is completed and you could see in the columns C and D the addition and multiplication results are successfully copied hope you enjoyed the demo now let's come to the interesting part where we would be actually writing the code to achieve this scraping task for that we will take a fresh sheet so in sheet 2 what I'll do I'll just copy these headers and these numbers as well now we will move on to the macro window maybe you would be actually writing the code so what you have to do is from your keyboard press the Alt key and the function key f11 alt f11 once you do that you'd be landing on this window this is the macro which I had written but don't worry we'll be rewriting it will you be doing it from scratch so what you have to do is you will find something called modules over here do a right click insert select the option module so this kind of a blank screen would come up and this is the place where you would be actually writing the code for the web scraping one very important point to notice before writing your code you have to go to tools select references and within references make sure that these values Microsoft Internet controls Microsoft HTML object library by default these two things would not be checked you'll have to find it like you will have to scroll down and find these things and you will have to select these options these options should be ticked and then you'll have to click OK I will tell you why these things are important without having this things marked your execution on the browser would not work so this is mandatory you have to do this you have to just scroll and search for these libraries Microsoft Internet controls - or HTML object library and just select these options and click on ok once you have done that now you are in a position to start writing your code so we will start writing it what you have to do is the syntax is sub any name of your choice this is like a function name which we generally give in any program language so I am giving the name as pull data from where any name of your choice would do and when you click on enter the end sub could come by default now between sub and n sub will write the logic which would be connecting to the website and extracting the data from there so as a first step what we will do is we write these three lines of code I will explain you here we are defining two objects this is the I object which we use to initiate the Internet Explorer browser and this is the dock object which I explained you in few minutes and this line is very important because you are telling this object to create an instance of the Internet Explorer application now very important point over here this macro would only support Microsoft Internet Explorer it won't work on other browsers because both are Microsoft products and those are compatible with each other so this thing would only work on the Microsoft Internet Explorer browser so here we are we have already created an instance of the Internet Explorer browser next thing is what I will tell is ie dot V is I be le visible is equal to true so once this object is created you would be able to see the browser getting launched it will be visible to you so I have marked this property as true in the next step we will be instructing this ie object to navigate now we get where to the page from where we want to extract the data in our case this is the website I'll show you so this was our website like from here we would be pulling the data so this thing has been mentioned over there ie dot navigate and the URL of the web page the next few lines are very important I'll show you now this is a do-while loop which we have created over here and within the loop we have just instructed it to wait for one second but wait until when until the browser gets ready now what will happen is like when we'll be instructing the browser to navigate to this web page it will take few seconds for this page to load correctly so until the page gets loaded correctly it would be within this loop and it would be waiting for the page to get correctly loaded so this statement is very important else it may happen that the execution would move faster and it won't wait for the page to load and you won't be able to pick up the values from the page now we are at a point where we would be actually interacting with the web elements so we want to enter some values in these two text boxes the number one and the number two and then click on this calculate button for that I should know some of the HTML properties so to view the HTML properties what I am doing is I am using a Chrome browser since I am comfortable using the Chrome browser but you could use any of the any browser of your choice you could use edge browser or a Firefox browser to see the HTML properties but one thing to remember is although we are using the Chrome browser for extracting the properties but the actual execution as I told would be performed on the Internet Explorer browser so to view the HTML properties of this number one text box what I'll do is I'll do a right-click click on inspect within a few seconds this window would come up this window may come up on the right hand side as well but you could move it as per your choice and now if you'd see this number one text box is highlighted in the right blue color over here and here I could see it's ID is number one so in my code I would be using this ID number one so once again I go back to the macro window and here I had be writing a code that would be actually inserting some value in that num1 text box the syntax goes like this we'll use this I object remember we had created this I object in the first line so I am writing ie dot document dot get element by ID and the actual ID which was num1 [Music] dot value is equal to for example full what this line of code will do is it will search for an element whose ID is num1 in our case num1 belongs to this text field and what it will do is as per an instruction it would set its value to be 4 now we want some value to be set on the number to textbox same thing what you could do is like you have this arrow kind of with marker over here left click on it and drop your mouse on the number 2 text box and now if you see the blue light blue highlighted line is pointing to ID is equal to number 2 so I would use num2 in my code in the similar way I have used num1 in this highlighted line but if you remember a few minutes ago I told you I will explain you why is this doc object created we will use this doc object to populate the value in that text box number two the syntax goes like this just follow me first we will have to use this dot object like this set doc is equal to we'll use the ie object ie dot document so your dog object is now ready now I'll explain you why we have created this dog object and its advantages so now I'll use this dog object to populate some value in the number two text box so I use doc dot and now we are getting the help from the intelligence get element by ID if you remember previously I had to type this entire thing I was not getting any suggestions so there are chances that I may make a mistake while I'm actually typing it but with the intelligence the spelling mistakes won't happen and you'd be sure you're syntactically correct so now we will put the ID which is num2 for the second text box I'll close the inverted quote dot value well you may not come in this intelligence list but you have to just type it is equal to any value of your choice I am writing in three over here so you could follow either this approach where you are directly using the I object but you will not get the intellisense or you could use the talk object and you'll get the help of the intellisense both will work so I'll just execute these lines of code to show you like what will happen when you have created this code so I'm just placing a breakpoint over here so that the execution will stop over here and you are able to see what actually happens to executed I am clicking on this green arrow on the top I clicked on it within few seconds the browser will come up it has come up now it will start entering the values and now you could see it has entered value three in the first text box sorry entered value four in the first text box and three in the second text box which was expected you could see num1 should be populated with four num2 should be populated with value three on this internet explorer browser we have achieved this now our next step would be to actually click on this calculate button so once again I'll go to this macro window stop the execution I'll close this browser and now I start writing the code to click on the battery so forth clicking on the button we need the properties of this calculate button once again I take my mouse over to this arrow left click and drop it on the calculate button and now you could see the idea of this button is BTN calculate which we'll use it in our code it is copy it now it's very simple doc dot get element by ID open inverted codes close inverted codes within the inverted quotes the ID the actual ID which is bTW and calculate in our case this time you are not setting the value we have to perform a click so we are writing click so what will happen is like this will click on the calculate button we'll get some values over here and now you want to fetch these values so to fetch the result of addition once again I take my mouse pointer over here drop it on the result of addition which is showing me as 6 and if you see it's ID is the LBL added I'll just copy this I'll use it in the code like this dot dot get element by ID the actual ID within double inverted codes inner text so when this line of code will get executed we have pulled in the value which should be present over here what we'll do is we'll store this value in a variable I am using a stay at a DD is equal to you could use any variable of a choice of to store this value similarly we need the value of multiplication so I go over your do a right-click inspect and now I see it's ID is lvl mult and there is some class now what we will do is will not use ID because you may come across situation where the HTML does not have IDs so if you do not have IDs and still you want to extract the value how you would do that so in this case we'll use this class the class is CLS lb l now one thing if you would see the class of these rest of the fields like for example the addition result the class is same CLS lb l similarly there is subtraction result if I inspected again the classes lb so we will be using the class lvl but we'll have to make sure along with the class labels something else we'll have to provide because all these results have the same class so the syntax goes like this we go into the code and the code would be dot dot get elements by class name if you remember we are using this class attributes because of which in the code we are used get elements by class name one very important point here if you see it is get element and here it is get elements which means plural and here within double inverted coach whatever is the class in our case the name of the class is CLS lb l so we'll mention it over here it will be l but as i told you there are other elements which have the same class so along with the class name we would provide its index the index over here is this is the first second and the third but internally it starts from zero so this would be 0 1 & 2 so the result of multiplication would be stored in index 2 of class CLS lb l so in the code we have already mentioned class as CLS LD L now for the index what you have to do is open a round bracket put the index number which is 2 in our case close it dot in a text now once again store this in a variable I have created this variable strname steer mult so this would store the value of multiplication so at this stage we are able to capture the values now our intention is that you have to capture these values and insert it in our excel file this is our excel file and we to insert these values in the columns C and D respectively so that is very easy the code for that goes like this this workbook dot sheets jerk within brackets within double inverted coats we have to provide the sheet name in our case the sheet name is sheet 2 dot range we have to insert the value in column C row number 2 so we have mentioned C 2 over here dot value is equal to STR ad so whatever is present in STR EDD would be inserted in situ of sheet 2 similarly for the result of multiplication the syntax would remain as is I am the scoping this line pasting it over here C 2 would get replaced with D 2 and here we want to use the STR mult because STR multi storing the result of multiplication so let's execute these lines of code and see what happens right now you could see the columns C and D for O number 2 is empty and now I am executing it and clicking on this green button on the top the internet browser has been launched and now it has entered the values 4 and 3 and now if you see in a spreadsheet you could see it has correctly populated the value 7 and 12 now you must be wondering here the values are 5 & 4 but the addition is showing 7 and 12 why is this happening because here we have hard-coded the values & 3 & 4 you could see the highlighted lines the first text box is filled with value for the second text box is filled at value 3 because of which it is showing me the result as 7 rather than 9 and 20 so what I will do is in my code I will not hard code this but I will pick up these values from the respective column so here we will pick I will pick up the value from column a and B the code is very simple it is exactly like this so once again I am just copying this and now I'm replacing four with this value but remember the range would be a two because it is coming from column a and this would be B to fine and now if I will execute I am closing this browser and now if I'll execute let's see what happens once again I am clicking on this green button on the top the browser has been launched now if you see it has entered number one as five number two as four and the respective values have been calculated let's move back to a spreadsheet and now you could see it is showing the addition is five plus four is equal to nine and multiplication is 20 so up till now we have successfully fetched the values from the webpage but now we have only fetched the value for the first row we need the values for all the rows for that we have to do some changes in the code it is pretty simple we will just use a for loop to achieve it the syntax goes like this so at this point we are entering the first value so we will place our for loop before this line like this for any variable name of your choice I am giving the name as intro is equal to 2 because we are picking up the values from row number 2 row number 1 we have just the headers so I am starting from row number 2 next is we have to write tío to the end row number we are ending at row number 5 so I am just mentioning Phi over here and these all these lines of code would be written the for loop after year because here we are entering the values into a spreadsheet and after the last line I would put a next this is the syntax of a for loop so whatever lines are present over your would get repeated four times from row number two to four so one small change which we need to do over here is if you see all these values range A to B to C to and D to a hard-coded the column would remain the same but each time the row index should increase so the for loop would take care of increasing the index and the variable which which will get increased each iteration is intro so we will use this intro variable like this it is pretty simple I am removing this two from here which was hard-coded and replacing it with an ampersand which is the concatenation operator and intro so the value of nth row would keep on incrementing after each iteration it with two three four and five so each time a different value will get picked up similar changes we will need to do for all the statements where we have hard-coded the road now row number the year similarly yeah finally the last lane so a loop is ready now what I'll do is I'll put some delay here just before we are moving on to the next iteration so that when it is executing it would be easier for us to view it so I've introduced some delay over here and now we will execute it so before executing I'll show you here on the spreadsheet and removing the this nine and twenty and all the cells of C and D are blank right now and closing this Internet Explorer browser and now I am clicking on this green button to get the execution done let's see what happens you could see the browser has launched it has entered five four which was there in the first row now the second row the third row and finally the last row and now if you see a spreadsheet let's see what happen you could see all the values have been copied correctly over here if you want to test it you could just change some values and put in four and four over here three and eight over here and once again I will execute and closing this browser going back to my code and now I click on this green button once again the browser would get launched and the values would be populated you could see it is happening it is completed if you'd come over here and now see the values for row number three and four are correctly updated to showing the correct addition and multiplication in this tutorial we extracted the values of addition and multiplication for your practice you could drive for subtraction and division and if you have any issues you could let me know just few steps before closing this tutorial what we have to do is we have to place a button which is pretty simple to do what you have to do is on the spreadsheet within the Developer tab you could see something called insert click on it select within the form controls this button left click anywhere on the spreadsheet and this kind of a button would come up you have to link this button with the sub bill you have written the name of our sub was pull data from web so I am selecting this option clicking on OK if you could see the name of our sub was pull data from web so we have linked this sub with the button over here you could just resize the button and you could change the label of the button as well so I am just mentioning click and what else you can do is like once your execution is done you could quit the browser so that the browser would get automatically closed and you could give a message so that we come to know that the execution is completed I am just writing as done and yet once again I will change few values and now when I click this button see what happens see the same operations are getting performed and you would see on the excel the values have changed for the first row and now it has given me the message as done and you could see the new values have come so this is how we automate web scraping using simple Excel macros it is very simple you just need to do a bit of practice I hope you found this tutorial helpful if you like my videos I request you to please subscribe to my channel so that you would be updated with the latest videos I would be uploading guys thanks for joining I'll join you soon goodbye
Info
Channel: Automation Made Easy
Views: 29,213
Rating: 4.9505153 out of 5
Keywords: How to Automate Web Scraping Web Scraping Using Excel Macros, How to Automate Data Scraping using Excel Macros, Import Data from the Web into Excel, How to Import Data from the Web into Excel, Introduction to web scraping using excel vba, How to fetch data from websites in excel using vba, Excel VBA Pull Data From A Website, How to automate data entry
Id: docnLSdj8rI
Channel Id: undefined
Length: 26min 57sec (1617 seconds)
Published: Mon Jul 01 2019
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.