Web Scraping – Automate collecting tabular data from a web page using Power Automate desktop

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello everyone and a very warm welcome to today's tutorial in today's tutorial we'll see how we can automate collecting of tabular data from a web page using power automate desktop my name is clavin and i work as a technical consultant you can find my blog at clevenfernandesh.wordpress.com and my twitter handle is at the rate clavin fernandez this particular video tutorial is based on my blog web scraping in this blog you can go ahead and find the prerequisites just remember the first prerequisite would be to install power automate desktop secondly you need to configure the browser extensions so remember you can use any browser of your choice edge chrome or firefox just make sure that you have the extensions for the browser configured other than that you should have some knowledge about power automate desktop and also flow you need to go ahead and configure the on-premise gateway you need to run the flow in an attended or unattended fashion from cloud then you'll need this if you're just going to use it with power automate desktop so it would not be a prerequisite but we will run the flow from the cloud okay we'll go ahead and invoke our power automate desktop flow from the cloud so that's very important finally what's the scenario for this let's consider that your boss wants to keep a track of the gold prices for every day now you go ahead and do the work manually now you go to particular website you go ahead and copy this tabular data you copy it and you put it in an excel and upload the excel and maybe convert it into a pdf now you don't want to do that right you want to go ahead and automate this process you want to save up some time now in this scenario is just one site that you're going now let's consider that you also want to get gold prices silver prices etc now to do that it's going to be very very difficult now let's consider you want the silver prices maybe this as well maybe in uhd maybe in other currencies how much time consuming it will be in this blog we will address this problem we will scrape the data and that is in the tabular format using power automate desktop okay so that's the scenario so without wasting any further time let's go ahead and build a power automate desktop or power automate desktop flow so i'm here in my power main desktop i'll click on new flow and i'll give this a name power automate desktop web strapping and i'll click on create now this will create a new power automate desktop flow [Music] so first thing we would have to launch the browser correct as i told you you need there's a prerequisite that is to go ahead and install the extension once you have the extension installed you need to launch the browser so i'll use the firefox because i have the firefox extension configured next step would be to tell the browser what url you need to hit so initial url will be this particular url and windows state maybe you want it normal maybe you want it maximized or you want it minimized we'll say keep it maximized advanced options you can clear the cache you can clear the cookies remember this is the default thing you should always keep it at wait to load you want to load the page and only then you will be able to extract data from it if there is a pop-up how do you handle it using this drop down for example if you have a pop-up that comes up on the page you can say close it or press a button or do nothing for this scenario i have not seen a pop-up coming up on that page so we'll leave it at do nothing you can set the timeout to 60 seconds or more depending on the complexity of the page now the browser is launched the next step would be to extract data from the page right so let me launch the page first i'll just open it in mozilla so this is the page that we are going to extract we want to extract this particular tabular data so i go back to power automate desktop and type in extract so we'll use web automation okay we'll use web automation and we'll say extract data from a web page so this is the action we need so i'll drag and drop the action if you see this is what it tells from a single value list rows or tables here we have to pass in the browser instance so we have already launched the browser and the output variable if you look at it is browser we are launching it here now we can store the data in a variable or we can store the data in an excel spreadsheet it totally depends on your requirement for our particular scenario we will store it in a variable the reason being we have to pass it to power automate or a flow so the what variable it will produce it will produce the variable of the name data from web page i'll click on save now you see as soon as i click on save it gives me an error it says it tells me that extraction parameter cannot be empty so again launch it just go back to your web page and click on the web page you see as soon as you click on the web page you have a recorder that comes up this recorder will help us get the data so i can right click here on the table and i can say extract entire table and here you see all the data all the data from this particular table is being populated out here and how easy it was just one right click and that's it i'll click on finish now when i click on save that error should go away so i think we have launched the browser we have extracted the data finally what we want to do we want to close the browser so i can say close web browser i'll just drag it here and which instance i want to close the browser instance i'll click on save now this is all good our scenario is done but this needs to be automated from flow okay from the cloud flow so let's consider the scenario that i want to pass make this variable dynamic okay so maybe the user can type in four so he gets the fourth month results maybe he types in one then he gets the results for january maybe two so to do that he needs to pass the data from the cloud flow so just remember if you need to pass the data from the cloud flow to power automate desktop you need to define input variables so i'll click on input so if you see the variable type is input and we need to give this variable a name so let's give this variable a name such as month number you can go ahead and specify the data type in this case we'll leave it as text you can also specify sensitive data types such as passwords etc in that case you'll use sensitive data type uh we will not give it a default value and here i'll say month number will leave the description blank so this is the input variable now we'll in this particular action we'll change this phi with the input variable that we have defined so the input variable is month number right so i'll just click on save so good we nee we have passed the data in but what we need to do is we also need to pass the data out or pass the data to the cloud flow so to pass the data to the cloud flow we need to define an output variable so the output variable name i'll just specify it like let's keep it simple and easy pad table pad table and then click on create now i have specified pad table out here right so input variable is defined output variable is defined finally i need to assign my output variable path table to data from web page variable so the data from web page variable holds the tabular data that we have extracted from the web and that is what we need to pass it to our cloud flow yeah set variable we can use the set variable action we'll add it above the close web browser action and set the pad table variable to data from web and click on save so this is our power automate desktop flow so let's save our work i'll just click on save it's saving the flow in the meantime i go to the cloud in the browser and i click on create now for this scenario let's use an instant flow i'll just i click on manual trigger so before we add the action we need to pass the month number right so i'll just type in text i'll type in month and we'll leave the input just like that next thing we need to invoke our cloud flow so as i told you the prerequisite would be to go ahead and have the power automate gateway configured okay if you don't have the gateway this would not work so here you see you can have this action known as power automate run flow built with power automate desktop if you have it configured you will always see this particular i'll click on this you see this giving me an error but the error will go away because i have it configured correctly and here you can see the run mode attended or non-attended if i click on attended i would need an attendant license just remember that if i need run it unattended i would read an unattended license okay the difference between attended and unattended is user does not need to sign in if it's going to be unattended and for attended the user needs to sign in and if you see is requesting me the month number variable the month number variable was the input variable if you remember out here you see this month number is the input variable that we define so here i am going to pass in the month number month number will come from the manual trigger i'll just type in month number perfect so next what i want to do i want to go ahead and convert this file into a pdf let's consider my boss does not want anybody to edit it so the best format would be to have it as a pdf right so i'll use the mohembi action and i'll say convert document here i can give it a name so i'll just hard code it for this particular example something like gold dot txt we can give it as a txt format and you see pad variable this is the output that has been passed in by power automate desktop so pad variable is defined now next muhammadi actions can also be used to watermark so what if we add a watermark as well let's add a watermark so the source file content will be the processed file content watermarking text so watermarking text would be let's add the name of the site saying that prizes from this particular site we'll just set the font family as arial maybe font size as 24 middle center and we'll specify the width as maybe 550 by 550 source file is optional we don't need it including rotation we can give it an angle that's also optional i'll get minus 45 opacity we can set it at 50 percent opacity means how dark or light the water mark is so we leave it at this so text watermark is configured finally we will create the file so create the file back into sharepoint [Music] and i'll create the file in demo data sps demo data i'll give it a folder path i'll say pdf file i'll give this a name let's create the dynamic name so i'll use convert action and i'll use processed file content output from the convert action and then i'll append it with maybe the month number and i'll say dot pdf and file content would be the file content of the add text watermark action i'll give this a name i'll say patch power automate test and i'll click on save so create a template okay i'll just see what the error is invalid connection it tells me for some reason here so let me try to specify this again so now it tells me authorized i click on flowchecker i don't have an error i'll click on save so i had to re-enter my password for some reason but yeah so let's go ahead and see it in action so let's consider i want a report for february i'll just type in two so here you see it automatically launched the browser desktop out here it's trying to extract the data i think it already extracted the data so let's see so here it's still yeah it extracted the data automatically i'm using an attended flow now it should convert a file to pdf and then it should add a text watermark and it should create a file so the create file failed because workflow input is not valid let's quickly check what did i do wrong okay passing the processed file content which is absolutely wrong i need to pass in the base file name okay so base file name save it again run and done so this time i'm not touching anything i think it already launched the browser here we invoked the flow from power automate cloud it launched the pad that is power automate desktop it went ahead and did its stuff and now it's gone back to power automate convert action working fine adding text watermark and this time it created a file so if i go back here and if i refresh yes go to and here you see you have the prices you have the watermark and you have it as a pdf so isn't this awesome you have just extracted tabular data from powerautomate desktop i think that is pretty much it and thank you for listening and i hope this session was informative bye-bye
Info
Channel: Clavin Fernandes
Views: 14,108
Rating: undefined out of 5
Keywords:
Id: DDJQ3676oCw
Channel Id: undefined
Length: 15min 26sec (926 seconds)
Published: Wed Jun 30 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.