Web Extraction in Microsoft Power Automate Desktop - Full Tutorial

Captions
Web extraction with Microsoft Power Automate Desktop is straightforward. To get the maximum out of this video, please open up your own Power Automate Desktop and do all the steps with me. In case things go too fast, just pause or rewind; the video is designed for that. My name is Anders Jensen, I'm a Microsoft Most Valuable Professional. Let's learn some Power Automate Desktop.

Right-click on your desktop, choose New folder, and we will call this "Project". This is our project folder. Here we will put the course materials that the robot will read, and it will also be the folder the robot outputs to. Let's go download the input material. I open up a browser and navigate to the course page; the link is in the description below. In the course materials you'll see "input to amazon.xlsx". Right-click, choose "Save link as", and save it in the project folder.

Let's inspect the data. In the project folder we now have "input to amazon.xlsx"; double-click to open it. It's an Excel workbook with two sheets, and "Products to scrape" is the sheet we want the robot to read. It holds four products, and for each of them we want to do a book search on Amazon and scrape the corresponding results back. For example, a "regex" search will look like this: the robot navigates to Amazon, clicks this dropdown, goes into Books, searches for "regex" and presses Enter. All these results we want stored in an Excel sheet: the title, the corresponding URL for each book, the authors, the rating, and the amount of ratings.

Let's go create that in Power Automate Desktop. Click New flow, call it "Web scraping" and press Enter. We will use regions to better structure our robot, so in Actions find a Region and drag it in. A region is ignored at runtime, but it gives us developers a better overview of the robot, especially once we have a lot of actions. This one we will call "Initialization"; click Save.

Let's grab the path to the project folder. Go to the desktop, minimize everything, hold Shift on your keyboard, right-click the project folder and choose "Copy as path". Back in Power Automate Desktop, go up to Actions, search for "Set variable" and drag it in. The variable name will be ProjectPath, and in the value press Ctrl+V. Remember to remove the quotation marks; Power Automate Desktop doesn't need them when reading a path. Click Save. The benefit of storing the project path in a variable is that if we later move the project folder, we only change it here, not in three places further down the flow. Always use variables for things that can change, and for things you repeat a lot in your flows; that is best practice.

Drag in another Set variable; this one is our Amazon URL, and the value is https://amazon.com. Click Save. Let's open a browser so we can do the robot work. Find "Launch new Microsoft Edge" and drag it in; you can also use Firefox or Chrome, it makes no difference for this guide. We launch a new browser, and for the initial URL click the {x} and double-click the AmazonURL variable. This creates a browser instance called Browser. Click Save. Run the robot by clicking Run up here; it opens Amazon. So far so good.

Back in Power Automate Desktop we will do the web scraping. In Actions find another Region and drag it in; this one we will call "Web scraping". The first thing we want to do, since right now we're just on the front page, is choose Books from the dropdown. In Actions find "Set drop down list value on web page"; make sure you pick the one under Web form filling. Drag it in. We operate in the browser instance, the variable created by Launch new Microsoft Edge. Choose a UI element: click the dropdown, Add UI element, and pick the dropdown on the page. The red border is a little bigger than expected and covers part of the search field, but it's the select element you need. Press Ctrl on your keyboard while it's highlighted and click with your mouse; the UI element is created. We will not clear the list; instead choose "Select options by name", and in the option names type "Books", the dropdown value we want. Click Save, run the robot again, and we are now in the Books section.

Then we can do our search by typing the search term. We will not do all four topics at first; we do just one, verify that it works, and then expand the solution so the robot reads each topic from the Excel sheet. Find "Populate text field on web page" and drag it in. Again we operate in the browser instance. For the UI element, click the dropdown; you'll see the previously created element for the dropdown, but we add a new one by clicking down here. Pick the search input field: again press Ctrl and click. In the text field, the text the robot types, I'll just type "regex" manually. Click Save, run, and inspect. Yes, we run the robot a lot; that is to verify that each step works. Imagine we built 20 steps and then realized something was wrong in the beginning; that could give us a lot of extra work. It succeeded.

Now we just need to press Enter. Find "Send keys" and drag it in. Under Insert special key, find Misc and choose Enter; you could also write the key code by hand, but it's easier to pick it from the menu. Click Save. Here there's a little trap I want to show you: click Run and, don't worry, we are not going to succeed. As you can see we typed "regex", but the Enter keystroke never lands. What's wrong? This is a common one: the Populate text field action unfocuses the text field, so when we send Enter we are no longer focused on the web page. The solution: double-click Populate text field on web page, go to Advanced, and deactivate "Unfocus text box after filling it". Click Save, run the robot again, and now it works: we type "regex", the Enter is sent, there you go.

The actual web scraping is the easiest part. In Actions, under Web data extraction, find "Extract data from web page" and drag it in. Now we need to open up the search results. Yes, we have opened a lot of browser windows; the robot doesn't close them yet. That's fine, we will add that a little later. Open the search results, and the live web helper opens; it helps us extract the data. Let me resize so I can see both the live web helper and the results. Again the red border moves around. I want to choose the first title, and you can see I can choose the span, but I can also go a little wider and choose the heading 2.
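Why span versus heading 2 makes no practical difference here can be illustrated outside Power Automate Desktop. Below is a minimal sketch in Python's standard html.parser, using a made-up fragment shaped roughly like an Amazon result heading (the markup and title are assumptions, not the real page): whether you collect text at the span level or the whole h2 level, the same title comes out, because the span is the only text holder inside the heading.

```python
from html.parser import HTMLParser

# A simplified stand-in for one Amazon result heading; the real markup
# is far more deeply nested, but the principle is the same.
FRAGMENT = '<h2><a href="/dp/B000EXAMPLE"><span>Mastering Regular Expressions</span></a></h2>'

class TitleParser(HTMLParser):
    """Collects text seen inside <h2> and inside <span> separately."""
    def __init__(self):
        super().__init__()
        self.depth_h2 = 0
        self.depth_span = 0
        self.h2_text = []
        self.span_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.depth_h2 += 1
        elif tag == "span":
            self.depth_span += 1

    def handle_endtag(self, tag):
        if tag == "h2":
            self.depth_h2 -= 1
        elif tag == "span":
            self.depth_span -= 1

    def handle_data(self, data):
        if self.depth_h2 and data.strip():
            self.h2_text.append(data.strip())
        if self.depth_span and data.strip():
            self.span_text.append(data.strip())

p = TitleParser()
p.feed(FRAGMENT)
print(p.h2_text)    # ['Mastering Regular Expressions']
print(p.span_text)  # ['Mastering Regular Expressions']
```

The choice only starts to matter if the heading ever contains text outside the span; then the wider selector would pick that up too.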
There is no practical difference in this use case; we'll just go with the span. This time you don't press Ctrl on your keyboard but right-click with your mouse, choose "Extract element value", and you can see "Text" with the first part of the title; click it. You'll see green dots around the title, and it also appears over here. Then we take the second title. As I said, span versus heading 2 makes no difference, but if you chose span for the first title you need to choose span for the second as well. Right-click, Extract element value, pick the text once more, and boom: Power Automate Desktop recognizes the pattern. We chose the first title and the second, so it takes every title. Quite clever, right?

Then we want the corresponding author. We could grab just the author link, but if a book has more authors we need to grab the whole line, that is, the div. That means we get unwanted data with it: the "by" prefix and the date at the end. That's fine, we can easily do data gymnastics later, so just choose the entire div, right-click, Extract element value and pick the whole line. We only need to do it once, since the pattern is already defined: Power Automate Desktop takes the entire line for each of the results. We also want the rating, this 4.3 (you may not see the same book, just pick the first rating): right-click, Extract element value, pick the rating. We also want the amount of ratings, the 5 here; other books have 6 ratings, 70 ratings and so forth. Right-click, Extract element value, and pick the amount of ratings. Again we get a little unwanted data, a parenthesis, which we will fix later. We now have the data we want for now, so click Finish.

Scrolling a little down, we save the extraction into a variable; the variable produced is called DataFromWebPage. If you click the dropdown you also have the option to choose "Excel spreadsheet" and save directly into Excel, but since we want to work with the data it's preferable to store it in a variable, a data table; that is also much faster. Let's rename DataFromWebPage; this is another best practice. Always rename your variables so they tell a story about the data they store. That makes the robot easier for you and your colleagues to come back to and maintain later. I'll call it AmazonResults. Save, then click Run once more: the browser opens, the dropdown is set, "regex" is typed, the search runs, and in a little while the data is extracted. Back in Power Automate Desktop, look at the flow variables on the right (if you don't see the pane, click the little icon in the corner): AmazonResults, 16 rows and 4 columns. Double-click it, and if I move my mouse a little to the right we have all our data. Some of the books don't have reviews; that's fine. We have the data for the first results page. And notice this is a data table, which means the first row has index 0. A data table is zero-indexed, so where you would usually expect index 1, it is index 0.
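The zero-indexing point matters as soon as you address rows programmatically. A tiny sketch in Python (a list of rows standing in for the AmazonResults data table; the titles and numbers are placeholders I invented, not scraped data) shows that "the first row" is row 0, not row 1:

```python
# A miniature stand-in for the AmazonResults data table:
# each row is [title, author, rating, number_of_ratings].
amazon_results = [
    ["Mastering Regular Expressions", "Jeffrey Friedl", 4.7, 214],
    ["Regular Expressions Cookbook", "Goyvaerts, Levithan", 4.5, 70],
]

first_row = amazon_results[0]   # index 0 is the FIRST row
print(first_row[0])             # Mastering Regular Expressions
print(len(amazon_results))      # 2
```

Power Automate Desktop's own data table indexing works the same way, e.g. %AmazonResults[0]% for the first row.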
that will matter later on when we work with data tables go close this we also want to close the browse because see here we have opened up quite a lot of Amazon's right now so go up to actions then you'll find a region and drag it in here we will call this end in this region the robot will close down the systems that it worked in here it will close down our browser go up to actions once more find a close web browser here and drag it into the newly created region this will just close down the browser variable that is the instance that got created up here in the launch new Microsoft Edge click save now go run it so the only thing that we'll see here is that we know that all these steps works but we'll see that the robot close down this browser it did congrats you've already built a lot let's write the results back to Excel and yes we could have done it directly into the extract data from web page remember if I double click here I can choose Excel spreadsheet but we didn't want to that is because we want to do some data table operations a little bit later that we want to go Advanced on so we will do it in a variable here and then we will just have the right to excel in the next region first we will launch an Excel instance go up to actions and then you'll find a launch Excel and drag it in here since we are going to write out results to it it will be a blank document we will not have it visible we don't want to see what's going on here the variables produced that is called Excel instance since we are opening up another Excel instance later that will be when we want to read our different topics then we will call this one Excel instance results like this and we'll click save so we open up the Excel sheet up here in initialization then after the web scripting we want to write the data into this Excel instance so go up here then you'll find another region drag in the region in between the web scraping region and the end region this one we will call right to Excel and click 
enter we will add a new worksheet so up in actions again add new worksheet drag it in here we will add it into our Excel instance that is the only one created so far the new worksheet that will be regex that will correspond to our search we can edit as either the first worksheet or the last one I prefer last but feel free to go with first if you prefer that then we'll pick save to write to excel you will find a right to Excel worksheet drag it in beneath the add new worksheet the value to write that will be the Amazon results that we got over here so pick this little X here say Amazon results it's important that you don't write these things out but whenever you can use these variable Pickers that will reduce errors in your robot building where do we want to write to since we can't not write headers out directly from a data table we could do that if we looped it we will just write the data out into Excel and here we will start in column A and pick Row 2 because it's in the first row we'll have our headers then you'll click save now scroll a little bit down to the region end here we will delete the default Excel worksheet called sheet1 you know whenever you create an Excel book you will have this sheet1 in we don't want that so we will delete it so delete Excel worksheet drag it in here in the region end we will delete the sheet named sheet1 like this if you use a local version of excel that could be in Danish for example in my language then this will be Arc one just try to run the robot if this doesn't work and pick your local name instead but mine is sheet1 so I'll go with that I'll click save we also want to close and save Excel this just right to the Excel instance a close Excel will save the data so find a close Excel and drag it in here here we will choose to save document else where do I want to save it I want to save it into this project path that we created in the beginning I can choose this little X here double click the project path and then I want to say 
what name do I want to give this sheet so I will have it backwards less then I'll say result Dot xlsx now we click save let's go run it so I'll say run and again this will get repeat we know this works we haven't seen the Excel things so it will be exciting to see that we can actually write the data out it looks like our robot succeeded it did so if I go back here in your project folder you will now have a result dot Exodus X double click to open it there you go we now have our data beautifully written out and yes we have a little bit of a problem here because we will have the title that is fine but the author we will have it by and then you will have this date and we will also we can also see that we have a minus here minus minus that was the amount of reviews so we need to at least remove this minus no problem and we can see that we still need headers so let's go add this I'll close down this Excel and I'll go back to Power automate desktop we will write into the right to excel again so go up here find another right to Excel worksheet drag it in beneath the first one so in the value to write we need to add a row that will be uh the type data row it will look like this so two percentage sign move inside them then make a curly bracket start and a curly bracket end like this move inside the curly brackets then we'll have a hard bracket start and a hard bracket end so and let me make some spaces these will just be ignored with at runtime but it will make it easier for you to actually see what's going on so two percentage signs in the start and in the end then the curly bracket then a hard bracket and then we can have the data Row in here so a single quotation mark since this is a string text value comma separated this is the first one so in single quotation marks say title and then a comma another single quotation mark will have the author like this then we will have another single quotation mark and a comma we will have the rating like this have another comma we'll 
have the amount of reviews like this add another single quotation marks and since we are going to extract the URL a little bit later let's just create that as well when we are here so this will be an empty column for now fine we will fill it with data later on where do we want to write it well it will also be in the column A now it will just be Row one then we'll click save try to run the robot again and what we're going to see here is that we can add headers in DXL so we build components over and over to this flow and once we have added another component we will go run the robot and inspect that this will work double click the result and up here you can see that we now have nice column headers so far so good close it again each time the robot runs we will overwrite this result let's do Dynamic file naming so we will not overwrite the data back to Power automate desktop scroll up up here in initialization we will get the current date and time so I'll say get current date time drag it in after the two first set variables the reason that we want this current date and time is that we can use it for dynamic file naming that is we will always have different date and times when the robot run we can specify down to milliseconds if we want it the reason that we drag it in here in position four the line 4 after the two set variables that is we want the variables that can change in the beginning so when we come back to the flows we can easily see well we have two set variables here these are the data that can vary for example we can choose to have our project folder another place or we can choose to use amazon.de right now we will get the current date time into a date time variable we'll click save to use it in a format for files we'll need to convert it to text so here you'll find a convert date time to text drag it in just after the date time to convert that was the current date time that just got created up here click this x double click the current date time the format 
to use that will be custom and here I'll specify my format I'll say full wise two big M's two D's two big H's two M's and 2 seconds the reasoning behind this is that I want a year first and I want a month the days the hours in the 24 hour format that will make it easier to sort minutes and seconds in case you just run this robot once a week you can of course skip these hours minutes and seconds but since we are going to run this robot continuously we will use all of it we will store this formal date time in a variable call formatted date time now click save go down to the end into the close Excel double click to open it navigate in front of this r ing the variable bigger you will choose the formatted date time go up here and click between the percentage signs and the result I like to have an underscore this is just Cosmetics that is just to make it look a little bit nicer now say save try to run your robot once more and again it will do the same so I'll just fast forward to the end that's it let's go inspect our data that's nice we now have a sortable format that will not override our files and we can see this data is the same but we will have a new name each time the robot runs so far so good let's do some data gymnastics and then we open up the results again what I want now is two things I'll want to remove this -5 in case you still have the parentheses the approach will be the same I also want to isolate the authors if we have one or more that's fine here we have plenty and here we just have one so let's create some very nice regular expression patterns to solve for this I have this minus but I'll also solve it if you have the parentheses so let me minimize this open up a browser and yes we have plenty I'll just go to the first one open up a new thing I'll call this one reg [Applause] x101.com and click enter the first thing you'll do is to pick your flavor we are using the.net C shop and let me introduce regex regex is a series of characters that defines a 
search pattern so when we have our data it looks like this in my case so that could be this and if we have a thousand separator it could also look like this in case your data looks like this we will create the pattern so it will also work with your parentheses so 2 1 3 4 like this what we will do is to create an expression that will extract these digits and it will remove these minus or the parentheses to extract a single digit that will look like this a backwards last and a d that means that we're looking for a digit so right now we're just saying look for a digit we get a lot of matches but what I want to do is that I want to say I want to look for one or more digits then I can say A Plus here and now we are getting out the digits we can see that it ignores this comma but over here in the match information the first mat is five that was fine then it's 214 that's also fine the next match is 2 and 1 3 4 that is because it separates these two since we want to keep this separator I move to the start of the expression I make a hard bracket I move to the end of this D I make another hard bracket now I can inside these hard brackets I can say I want to look after digits and commas there you go we now have our matches just as we want it's that easy so let's try to test it so what I will do here is that I will copy this regular expression and let me just close down this Excel sheet here so we will not confuse ourself we'll go back to the web scraping what we will do is to open up the extract data from web page open up any one of these search results we will get this live helper up again you'll need to go into advanced settings and let me just move here what we want here is to isolate the amount of reviews over here or amount of ratings and that is one two three four here we have the CSS selectors it doesn't say it but this is for the first one second one third one and fourth over here in drag X in number four simply just paste in your regex pattern click OK boom we have 
now removed the parentheses or the minus let's also try to click finish click save and we will run the robot to test that this works reg X is kind of straightforward it's can be a little bit complicated at First Sight but please just do these exercises with me you will learn so much I don't expect you to learn regex entirely here but if you can understand what we do here you come a long way let me minimize this and let me minimize this if I go over here and pick the last one here there you go we have now skipped this minus or parentheses that's cool we also want to do the same thing for the author here so let me just grab the first occurrence here and I will open up regex1 again so in the test string we'll have our author we will delete this expression we want to create a new expression now we will say look after buy and space we will create a positive look behind that is a parenthesis a question mark a smaller than than an equal sign and here we will have to say what regex should look behind that will be buy and space close the parentheses and now you can see this purple one here the purple pipe that says that regex will start here currently we are not extracting anything so what we'll do is that we'll say a DOT that is any character that is for example the C we'll say plus and yes that will extract the rest of the line I also want to say I want to stop at this pipe and I don't want to include this base then we will create a positive look ahead so i parenthesis question mark an equal sign and yes this one will become easy for you once you have completed a couple of exercise in it so please do it me then I want a space I want a mysterious that says that I want to look after more than one space because some of the occurrences we have more spaces I'll show you a little later then we also say we want to here we are going to say where we're going to stop we're not going to stop at this space this space this space this space but we're going to stop when the space is 
followed by a pipe and a pipe we also need to escape like this with a backwards last and then we'll have the pipe I'll close down the parentheses so right now this is working but we can see here that we have a space so what we will do is up here add the dot I'll have a parenthesis around this Dot and this plus sign like this then I'll go in after the plus I will have a question mark this is a non-greedy quantifier it tells the regex engine to find the shortest possible match as opposed to the longest possible match this is the default behavior that is the longest possible match so we're changing it here in other words it tries to capture as few characters as possible while still satisfying the overall pattern so this one works as well because we have this space here so now we have our pattern it should look like this and let's just try with some of the other ones to see that this works for example if I take the next one here so I copy this one to our browser once more I'll move to a new line you can see here we also extract this name and here let's just try to this part of pocket reference O'Reilly let me copy this and show you that this also work and here you can see that we actually have two spaces after the the author but it also gets this order so far so good let us copy this and do the exact same thing as before so we can close down our Excel sheet now we go to Power automate desktop double-click the extract data from web page now open up an Amazon search Once More go to advanced settings and the author that was valued to so paste in your regex pattern right here click OK there you go we have now isolated the order for each one of the extractions click save run your robot again and clap yourself on your shoulder you came a long way with this again we're doing all the steps what we want to see now is that we can isolate the actual author and not get this data and buy with so let's move again and we will clean up in this browser mess a little bit later this is 
just what we develop so if I move this here I expand it we now have nice authors and we if we have more than one author we also save it that's it let me close this once more I also want to grab the URL because you can see here that if I hover my mouse of the title I can click it and open up the book I want to grab this weird looking URL up here I might want to use it later on so let me go back again we cannot do it directly so we need to use a little bit of Wizardry and I can show you why we can do it if I open up the extract data from web page once more I open up the results here and here if I try to right click extract element value we cannot extract the actual URL yeah so what we will do instead let me click cancel let me click cancel here we need to find the selector for the URL our self so what we know now is that we know that and let me just refresh so we get rid of these green dots what we know now is that the URL must be related to this title in some way and what we will do here is that we'll create a notepad so I'll go down here to start click start typing in notepad and click enter to open it now we will fill it with data we do this to make it a little bit easier so we store our data in here again go over to the extract data from web page open it find the Amazon salts here so we have this one open now click advanced settings here we have a base selector this is an address for our search results we can see that we're extracting a table that is this table with all these data in but this is the base CSS selector that is the actual address to find the first element then these selectors will find each of the other elements that we're trying to find go up here in the base CSS selector and copy it out move to your notepad say table base CSS selector click enter click Ctrl V to paste in the table base CSS select them we also want to grab the CSS selected for the title because we know that it's somewhat related to this CSS selector so move a little bit down and 
here we will say selector folder title go in here Ctrl a control C paste in the selector here this is the selector for the title and let me show you why this is the address for the table base selector and this is the selector for the title so I'll click cancel here cancel here cancel here we will move to our search results and let me just do this refresh it again so we don't have these Green Dots up I'll open up my notepad what we will do now is to inspect the code for this webpage and that is because the code that is rooted here the selectors are rooted in the code this selector that was the address of our table base and this was for the title when we wanted to extract it and this is found over here if you press F12 on your keyboard the developer tools will open in case you don't see it here in the button in case it looked like this let me click these three dots over here to the right just click these three parts and tuck it to the button that will make it much easier so what we're seeing here is that we are starting in an HTML for the base then we have a body and let me scroll up here this HTML that was this HTML here then it says look for body look that's down here then we want to and maybe we can just so we can have both of it here so we are in the body then it says find the first div and again I don't expect you to understand all these selectors but I expect expect you to give it a try with me then we find the first div that is this one here and let's click this little arrow to expand it then we want to find the second div again it's zero indexed so we scroll a little bit down to find the first and second div here we have the first one and we didn't want it that we want the second div so we scroll a little bit further here we have it this div ID search I'll open this one again now I need to find the first div that one is here I'll open it again I need to find the first div so that will be this one here again I'll find the diff here and now I'll need to find a 
span and yes I know that this is a lot but please try to do it that will make your selector work a lot better so this is the first span then we need to move inside this div here yeah and here you can see that and now we have the base so here we have the results we have the first result and if I go a little bit down you can see that here we have the first one if I go down here we have the second so we have identified the base in the code that is why this is the table base selector that is here so now what this title says that says for one of the results this is the first result look in a div and then we will have five tips and eventually we will find it in an h2a span I will just go through it like we did before so here we have a div we have another diff we have another div we have another div we have another div let me go here we have another div and here we need to find the second div that will be this one here we'll go into the div we'll go into another diff we'll find the first div that is here then we will find the H2 here and now we will have the a that one was here so if I open this here we have the span and if I open the span that is why the title had this selector and look what is Smart here if I move one back so if I move this span from this selector we will end here in a and look what this H ref attribute says here we have our URL so what we did here is that we backwards engineered the selector for the URL if I go to my notepad once more make sure you move to the end have and enter click here we will have a selective for the URL and click enter we will take this selector up here that we just wrote in but only to the a so copy everything to the a it looks like this so it looks quite similar to the title we just need to do something and here we're not looking for text but the age ref attribute so make sure you have this in your clipboard you can close down the browser minimize it at least go into the extract data from web page again we'll need to open up a 
search result. To open this, go inside the advanced settings. Now we will add the URL: click this plus, and the CSS selector, well, we just created that one; here in the attribute you type in href and click OK. Boom, there you go, we have now got all the URLs. Wasn't that easy, at least once you understand what you're doing? And trust me, this selector work will do wonders for your Power Automate Desktop game. Click finish, click save, and try to run the robot again. Now I want to see that I'm able to save the URLs to the Excel book; we saw that we could extract them, but we also want to see that it happens when the robot runs. So I'll move into my results and double-click it. There you go, we now have the URLs; yes, they are long, but that's how they are on Amazon. So we have now extracted the data, we have done data gymnastics, and we have manually created a selector for the URL. Right now we are manually doing a regex search, but I also want to make sure that I can do all four topics in our incoming Excel sheet, that was the input to Amazon. Let me just open it to recap: I want to move to products to scrape, and then for each one of these products I want to do an extraction and write the results out to my result Excel sheet. Let's go do that. So we close here, move inside here, and scroll up. We will need to read the Excel data, so we will launch another Excel instance; move it in front of the first one, not after it. This is important because the two can otherwise interfere with each other, so have it before. This time we're not opening up a blank document; we're opening the following document: we know that we are in our project path, like this, and then we will say backslash input to Amazon.xlsx. We will not have our instance visible, and since we don't want to write data back to this workbook, we can improve performance a bit by saying open as read-only. We'll click save. One other common pitfall, and let me just open up this input to Amazon again, is that we have two sheets: we have the
andersjensen.org sheet, which is just an info sheet with some nice links for you; for example, if you want to join my community, which is completely free, you can click this Discord link. But we want to make sure that the robot reads from here, and right now, per default, the robot just reads from the last sheet that was open. So our robot will work if this is the last open sheet, but if this other one was the last open, our robot will fail; we cannot get the topics out of here. So let's solve for that as well: find Set active Excel worksheet and drag it in here. Here is the Excel instance, and when I created the Excel instance it was called Excel instance; let's just stick with that name for two seconds. Here we'll say activate the worksheet with the name, that was products to scrape, and I can click save. Let's also name this Excel instance, maybe topics, so we stay in the syntax of this one here. We could of course rename both of them manually, but let me show you the nice way to do it: move over here to variables, right-click, rename, and then I can say Excel instance topics, and look what happened when I click enter: it changes in both places. That is exceptionally useful if we have more than one use of the Excel instance in our flow; we had two, but we could have five. So now we are working in the Excel instance topics, and we are setting the active Excel worksheet. Now we can read from it, so I'll find Read from Excel worksheet and drag it in. The Excel instance that I want to read from will be the Excel instance topics; I double-click here. What do I want to read? I want to read all available values from the worksheet. Go into advanced: we have column headers, so tick first line of range contains column names. The variable produced is called Excel data; let's rename it to Excel data topics, and we'll click save. We also need to close this Excel instance again, so I'll add a Close Excel; we're just reading this data into a data table so that we can work with it. Excel instance topics, and we will not save the document
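As a rough sketch of what Read from Excel worksheet produces when "first line of range contains column names" is ticked: a data table whose rows you can address by column header. The Python below is only an illustration of that behaviour; the sheet values are invented stand-ins, not read from a real workbook.

```python
# Illustrative stand-in for the "products to scrape" sheet: the first
# row carries the column names, just like the data table does when
# "first line of range contains column names" is ticked.
raw_range = [
    ["Product"],      # header row
    ["regex"],
    ["visual basic"],
    ["javascript"],
    ["python"],       # topic names here are hypothetical examples
]

header, *rows = raw_range
excel_data_topics = [dict(zip(header, row)) for row in rows]

# Each row is now addressable by its header name, which is how the
# flow later refers to a row's value by its column name.
for current_topic in excel_data_topics:
    print(current_topic["Product"])
```

The point of the conversion is that downstream steps never need to know column positions, only column names.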
We can't, by the way; we opened it as read-only. Then we'll click save. So what we're doing now is just reading it into a data table. Let's try to run the beginning of the robot to see what we do here. What you will do is, to the left of line 11, press your mouse; that will create a breakpoint. Press with your mouse again and it will be red. Now when you run, the robot will stop at the Launch Excel, so we are not continuing through all of these 20 steps. Go back to your robot and click stop. Here we have what we wanted: the Excel data topics. If I double-click over here, you can see that we have now read our topics into a data table. A data table is, again, just behaving like an Excel sheet; we use it because it's much faster to work in than directly in the UI of an Excel sheet. So let's use each one of these values to do this regex search and to create a new sheet here in the add new worksheet, with the proper name. Let's also remove this breakpoint again, so click it once more. Before this region web scraping, we will find a For each; a For each loops over a collection one time, from start to finish. Drag this one in here. The value to iterate, that will be the Excel data topics; we want to grab the topics one by one, so Excel data topics here. Here we will store it into a current item, that will be each one of these topics; I like to call this one current topic. Again we rename the variables so it's easy to see what they're actually doing; click save. Now click this End here, so click it with your mouse, it turns a little bit darker blue, and press Ctrl X; yes, you will get an error. Go all the way down to the region end and click that region end, it will be a little bit darker yellow; press Ctrl V and you will paste in the end here. So now, for each topic, we want to do the web scraping and the write to Excel. One best practice, in whichever system we work, is to navigate to the front page after each iteration; that is to make sure
that the robot starts at the same place, so it can click these buttons; we don't know where we ended up in the system. Right now, at Amazon, it doesn't make a difference, but let's just implement it so you'll remember it when you are on a web page where it matters. So here you will find a Go to web page; for each one of these topics I will go to the Amazon URL: click this little x, go up and find Amazon URL, double-click, and click save. Open up the populate text field on web page; here I will not do a hardcoded regex search, so I'll delete that. What I will do is move inside the current topic: I'll click this x, say current topic, and then I need to give it the header of the column. We only have one column in our Excel data topics, but it will look like this: a hard bracket, a single quotation mark, then product, a single quotation mark, and a hard bracket. Let me show you where this product comes from. If I say save here: this current topic is each one of the data rows in the Excel data topics, and if I open up this Excel data topics, the header was called Product; that's why we are saying go look in the product column. If we are lazy, let me open this one again, mark all this and Ctrl C to copy it, because we are using it down here when we want to add a new worksheet; so paste it in here as well. Now we have made a dynamic solution that will loop through each topic, that is four in this case, but it would also work if we had 100 topics. Let's go run the robot; I will fast-forward to the end so we can see if it works. One thing we can do to speed it up a little is this run delay: it is 100 milliseconds, that is 0.1 seconds, of delay between each action. It is there for us to see what happens while we develop, so that's why it's useful, but let's just set it to 1. What I want to do now is go back and inspect the results: we now have one Excel sheet for each one of these topics. Go look, that was quite nice.
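The per-topic structure we just built, reset to the front page, search for the current topic, extract, write one sheet per topic, can be sketched as plain Python. This is only an outline of the control flow under assumed names; `navigate`, `search`, `extract`, and `write_sheet` are hypothetical stand-ins for the Power Automate Desktop actions, not a real API.

```python
# Hedged sketch of the for-each region: every iteration starts from the
# same front page so the robot always clicks from a known state.
# The four callables are hypothetical stand-ins for the PAD actions.
def run_flow(topics, navigate, search, extract, write_sheet):
    for current_topic in topics:
        navigate("https://www.amazon.com/")  # back to the front page each time
        search(current_topic)                # populate text field + press enter
        rows = extract()                     # extract data from web page
        write_sheet(current_topic, rows)     # new worksheet named after the topic
```

Passing the actions in as callables keeps the sketch testable; in the real flow the same sequencing lives inside the For each region.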
So what I want to do now is, let me close this one down again, for each one of the results; let me just search for regex, and I just need to be in books, sorry, now I'll do a regex book search. I want to make sure that I grab every page; right now you can see that we have around 180 books. Let's go create that. Go to Power Automate Desktop, navigate out to your main console, and we'll create a new flow; this is where we can test and play around a little. I'll call this playground web scraping; that is, we start fresh just to see that we can navigate through each page, and when this works, we will implement it into our main flow. Again, let's clean up a little among all these browsers so we can easily see what's going on. I'll keep one Amazon tab open, and let me just make sure that I don't have another one over here, so make sure you only have one open. Now do a regex search, on books, if you didn't already; that's important. This is the page we want to work on: I want to be able to click next, next, next, next until there is no next button; you can see that we are then navigating through each one of these pages. Move back to page one again. What we will do is go into our playground flow, and let me maximize it, find a Launch new Microsoft Edge and drag it in. In the launch mode here I'll just attach it to our running instance, so I'll pick this option; if I click this dropdown you can see that I can now attach it to the Amazon regex tab. That's a little bit easier than opening it up over and over. Then we can click save. Now I want to click the next element, so I'll go up here, click Click link on web page and drag it in. Click the dropdown here, that will let us create a new UI element; scroll down, oh, that was too much, and find the next button, that will be an anchor here. Press Ctrl on your keyboard and click the next button; we have now captured it, and here we click save. Also set the run delay here to 1, that will make it a little bit faster. Now let us just try to run it to see that we can actually click one next button. That worked; again, let's move back to page one. Let's test if it works on multiple next buttons. I'll go back here, and before the Click link on web page we will have a Loop, so we drag the Loop in here. I will start from 1 and end at 40; this is arbitrary at the moment, we're just testing and developing, and of course I only want to click the next button whenever it's available. We will have increments of 1, so this loop will run over and over until it reaches 40; it will run 40 times. I'll click save. Now drag the Click link on web page into the loop. When you run it, again make sure you are on page one, because we defined that up here. Try to click run, and now it should jump through all 40 pages if it works. But apparently we have a little bit of a problem here: we're jumping between page two and three over and over. We cannot use this UI element as it is; we need to fix it a bit. Again, move to page one; in case you want to follow along, I just moved to page one. We need to create a next button selector manually, since we could not use this one. What I will do is right-click and say inspect; I could also press F12 to get the developer tools. Let us jump directly to the next element: this is the address of the next element. I want to use the a element, that was this anchor, and I want to use this class, because I think that is fairly static; it says s-pagination-item s-pagination-next, so you can see it says something about the next button. Go in here, make sure you click a couple of times with the mouse so it marks everything, and Ctrl C to copy it out. Then we go to our Notepad, jump two lines down, and here you will write next button selector and click enter. When we press Ctrl V here we will
paste in this class attribute with this value. I also want to add the a element, so I'll say a here, then a hard bracket, and a hard bracket here at the end; that says go into the a element and look for this class. Go copy this selector, move down to your playground web scraping, and go to UI elements; here we can see this anchor, double-click to open it. We will not use the recorded selector, it didn't really work; I'll click the text editor, and now we can manipulate our selector manually, so mark it and press Ctrl V. We'll click save. Let's also rename this, so I right-click and say button next, like this; this is just best practice so we can easily see what's going on. Again, let's try to run our robot: close the developer tools and make sure you are on page one; we are. Let's try to run the robot and see if this works. So now we have page two, we're on page three, then page four, then page five: we created a selector that works. Of course it will eventually fail, because we only have 179 results here and we defined it to go through 40 pages, so suddenly it will cause an error, because now it cannot find a next button. Let me show you why: this one is grayed out, we cannot click this next element. That is fine, it was expected that we failed, but we saw that we can loop through each one of these pages. So now, since we have a working selector, we can continue to click next until we get an error, and then we should stop. Go over to actions and find a Loop condition; oh, I put it in the wrong place, I can just click cancel and put it in here at the end. In the loop condition, in the first operand we will just say true, with equals as the operator, and true in the second; this means that this loop will run over and over until we stop it. Click save here. Move the Click link on web page down here into the loop condition; click this one here, press Ctrl and click this old loop, and click delete. So what we will do here is click this next button over and over, and this will of course cause an error at some point. When it causes an error, we want to move out here, so let's create a label: I'll drag a Label in here at the end, this label will say no more next buttons, and we'll click save. Now I just need to tell it that if this causes an error, which means that we're out of pages, the robot should move down here to the label. So double-click the Click link on web page and go into on error. Here I want to say, let me just make it a little bit bigger, that if an error occurs I want to continue the flow, and what do I want to do? I want to go to the label that I just created. It looks like this: now we are going to this no more next buttons label, and our robot will not fail when it has reached the last page. When we implement this in our final robot, we will just have our web scraping in here, so we are web scraping, clicking next, web scraping, clicking next, and so forth. But let's make sure that we are on page one; we are not, so we go to page one again. Try to run the robot; now we should see that we are clicking all the way up to, I can't remember how many pages we had, around 10 I believe. Now I can see six, seven, eight, nine; we had a little bit more, apparently 11, maybe, actually we have 12. And now we're looking for this next button; our robot will try to look, but we made error handling, so in a couple of seconds it realizes it couldn't find it, and then it finishes successfully. Bingo. Click this loop condition, hold shift on your keyboard, click the label, and press Ctrl C to copy it. Then we'll move back to our web scraping flow; in here you will click the end region and press Ctrl V, that will paste in our loop condition. Now we will move the extract data from web page inside this loop. What will happen is that we will run this loop over and over; that means, for each one of these topics, when we have done the search we will extract the data, we will click the next button, then it will continue: it will extract the data, click the next button, and when we
have no more next buttons, it will move down here and eventually write to Excel. So now we have multiple pages. If we started our flow now, the Amazon results would be overridden over and over inside this loop, so our data from each one of the pages would get overwritten. We need to create a base data table that we can add data to over and over as we extract it. So let's initialize a base data table: I go up here, find another Set variable, and drag it in after the two first set variables. I'll call this one base Amazon results, and down here we will put a data row; since we already created one and we're lazy, let's click save here. This will give us an error because we are missing some data in it; that's fine. What you want to do is scroll down inside this Write to Excel worksheet and copy this value to write; this data row looks a lot like what we're going to need. Move inside this base Amazon results and paste it in. Since this will be the header of our data table, I move in here, between the curly brackets and the first hard bracket, and I add a hat, the caret symbol, like this; it indicates that this is a header row. That's it, now we have initialized a data table that we can add data to over and over; I can click save, and we'll have no error. Then, in the extract data from web page, we want a loop that, for each one of these rows, adds it to this base Amazon results; there is no direct way to do it, that's why we use a loop. We cannot merge two data tables in Power Automate Desktop, well, of course we can, but we need to use this workaround. No worries, this is a common scenario, so please do this part with me as well. Drag in the Loop, inside this loop condition but after the extract data from web page. So what do I want to do? Well, this is a loop over all the rows that are in the Amazon results, so I'll start at zero, because it's zero-indexed, and then I'll count the rows: I'll scroll up, pick Amazon results, and let me scroll up again; we will have the rows count, like this. Since the rows count is not zero-indexed, I'll just subtract one; that is because we started at zero. I'll increment by one, and we can see here that the variable produced is called loop index; in the first loop the loop index is zero, then one, two, three and so forth. I'll say save. So what do I want to do here? Well, for each one of these rows in the Amazon results, I'll add it to the base Amazon results. Find a Set variable and drag it in here. For the base Amazon results, click this x and double-click the base Amazon results, so I am saying: take whatever is in the base Amazon results, double-click here, and then I'll add rows to it from the Amazon results. It looks like this: I'll say plus, then Amazon results, delete these two percentage signs, have a hard bracket, loop index, and a hard bracket at the end. The loop index will be zero at first, and that's perfect, because that's the first row, so it will add the first row; then it'll be one, which will add the second row to this data table, over and over. I can click save. One thing that we don't want is that this loop condition, which clicks the next button, will take a lot of time if you want to extract all pages on Amazon, so let's restrict it; let's just take the first three pages. To restrict it, we will add a counter variable we can use. Again, up here in the actions, find a Set variable and drag it in just before the loop condition. I will call this one loop count, and it will start at 1; we just create this counter to keep track of where we are in the loop condition, so it will start at 1, and then we will add 1 each time this loop condition has run. So I'll add in another Set variable here; again, you don't need to do this if you prefer to just extract all of the pages, I just want to restrict the pages to three to make it a
little bit quicker, and that is my only reason. So here I will take loop count and add 1 to it: I'll click this x, say loop count plus one, and now we add one to it. Now I just need to say that this loop condition should only run while the loop count is less than or equal to 3, so double-click it, and up here in the first operand you will put loop count, in the operator you will say less than or equal to, and then 3. I will click save. Now we can close down any Amazon instances we have running, and I can also close this regex tab, just to clean up a little. Hey, do you enjoy this video? Then you can help me and my channel a lot by giving it a thumbs up. Thank you. The only thing that we need to consider here is that when we run the robot now, we have done everything right except for one thing, and that is this base Amazon results we created up here. That means that when our robot runs, for each one of these topics we will append the results to this base Amazon results, and we will write it out to Excel for each one of these topics. It will work for the first iteration, but for the second one we will have double the data, then three times the data, four times the data, because we append the data over and over. So we need to reset this base Amazon results; we can do that by moving it down here, into the for each. Go up here to the set variable, press Ctrl X, move in here, then mark the go to web page and press Ctrl V; we will paste it in here. This is a nice exercise in thinking about where to put it, and I chose to do it very directly, so you will actually think about why we have it here and not up there. You can try to run the robot with it in one place and then afterwards in the other, and see what the error is, or what it would be. Now we can just click run. The key takeaway from this exercise is to learn to work with CSS selectors; in case you missed some of it in this video, simply rewind it and make sure you understood everything. The second one is data tables: we have manipulated quite a lot of data today, so learn to refer to data tables by their indexes and row counts and everything we did here. The third thing is regular expressions; I don't expect you to understand the code, but please just understand what it can do. You can easily Google, or ask ChatGPT about, all the patterns: you can write a prompt and ChatGPT will help you. So now we're scraping the three first pages of each result, and we want to see that we can write it to Excel. Let's just see what we have: we have the regex search, that will be the last one. It's very exciting to see this project work. We are done with one search, and let's see, now it just needs to finish; here we are writing; that's it. Let's go to our results and open it: there you go. If I go to Visual Basic, we have exactly three pages, that is 48 results; we also have it here in JavaScript, and we have it in regex. Great job. To complete your Power Automate Desktop journey, you should watch this video next.
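The append-and-reset pattern from the data table takeaway can be condensed into a short sketch: a fresh base table per topic, rows appended page by page via a zero-based loop index, and a counter that stops after the first three pages. This Python is an analogy under assumed names, not Power Automate Desktop's own syntax, and the page data in the test is invented for illustration.

```python
# Hedged sketch of the inner loop around "extract data from web page":
# base_amazon_results is reset for every topic in the for-each, and
# loop_count restricts the run to the first three pages.
def scrape_topic(pages):
    base_amazon_results = []          # reset per topic, or data would double up
    loop_count = 1
    for amazon_results in pages:      # one iteration per "click next"
        if loop_count > 3:            # only keep the first three pages
            break
        # the rows count is not zero-indexed, so the last index is count - 1
        for loop_index in range(0, len(amazon_results)):
            base_amazon_results.append(amazon_results[loop_index])
        loop_count += 1
    return base_amazon_results
```

Appending row by row mirrors the workaround in the flow: since two data tables cannot be merged in one step, a loop over the row indexes of the freshly extracted table adds each row to the base table.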
Info
Channel: Anders Jensen
Views: 33,942
Keywords: web extraction, web scraping, Microsoft Power Automate Desktop, automation, data extraction, windows 11 web scraping, anders jensen, power automate desktop, web scraping in power automate desktop, web extraction in power automate desktop, power automate desktop full tutorial, how to do web scraping, how to do web extraction
Id: POfK2A8iWAs
Length: 73min 5sec (4385 seconds)
Published: Sat Mar 18 2023