Web Scraping in Power Automate Desktop | Multiple Pages | Tutorial

Captions
Do you want to learn how to scrape data from the internet with Power Automate Desktop? Then you've come to the right place. In this tutorial you will learn how to scrape data from websites with multiple pages. If you're new to the channel, my name is Thomas and you're watching Tom's Tech Academy. Let's start right away.

The website that we're going to use today is the website that you see on the screen, and there is a link to this website in the description of this video. We're going to scrape multiple categories. You see that we have phones here, and the category page of this specific category has multiple pages, which we're all going to scrape. If you navigate to computers and, for example, tablets, you see that this page also has multiple pages. In this video you will learn how to navigate to the product data and how to navigate through the multiple pages as well. Let's navigate back to the home page and let's get started.

So let's start with the creation of a new Excel file. I'm going to create an Excel file here, and let's call this one "to scrape". Here I'm going to define the categories that I would like to scrape. So I'm going to open it, and I'm going to rename the sheet to Sheet1.
And let's go to this website. Okay, let's see what we have. So we have phones, but I'm only going to focus on the phones with a touch screen here, because this category has multiple pages; it has two. So let's copy this one. Let's go to the Excel file, and here I'm going to create two columns: one with the name and one with the URL. Those are the headers, so let's make them bold. The first name I'm just going to call "phones", and this is the URL. Then let's do two more categories. Let's navigate to computers and then go for tablets. I'm going to copy this one, it's called "tablets", and let's paste this one here. And then let's also do laptops, so let's copy this one. Sorry, "laptops". Okay.

So we're going to scrape those three categories, and for each of them we're then going to scrape every page, so we're going to go through this entire navigation. I see this website has quite a lot of laptops, so I think that's really exciting, because here we can also test whether the bot has the right performance. Let's go back to the Excel file. We have all of this: we have name and we have URL; that's important.

Then let's open Power Automate Desktop. The first thing I'm going to do is launch Excel. I'm not going to launch a blank document; I'm going to open the following document, and that's the document that you've just seen on the desktop. The Excel file doesn't need to be visible; as we've discussed earlier, this decreases the performance of the bot. Let's save this one. And if you open Excel, we're going to close it as well, so go for Close Excel, and we are going to close ExcelInstance. This is important because in this exercise we're also going to work with multiple Excel sheets, so multiple Excel instances. This one I don't need to save, because I'm not making any changes in this Excel file with the bot. Okay, and then I'm going to read the Excel range: Read from Excel worksheet, and put this
one below the Launch Excel. I want to extract all available values from this worksheet, and of course my Excel file has headers, as you've already seen, so we have to enable this option. Click on Save. I see that this entire data sheet will be stored in ExcelData, and I'm going to loop through it with a For each, so put that one here. The value to iterate is ExcelData, and each row will be put in CurrentItem. Okay, so the only thing we are doing now is making sure that the bot loops through this Excel sheet, so it's basically going to phones, tablets, and laptops. Inside this one we're going to need another loop, the second loop, and that's the loop that's going to control the pagination. If you can't follow any of this, don't worry; I will explain it all in the next few minutes.

Okay, so in this For each, let's also open the browser. You know I like Chrome, but feel free to use any browser. I'll take Chrome, and I'm just going to launch this website, so just copy the URL up to "static", after the second slash, this one, and let's put it here. I want to launch a new instance, and I also like this window to be maximized, because that way it's always the same size: the maximum screen size. Click on Save. And if we launch Chrome, we're also going to close the browser: Close web browser. Don't be confused, because there are multiple activities to open a web browser (Chrome, Firefox, etc.), but there is only one activity to close them all, and I'm going to put that here: Close web browser.

Okay, and in this For each we're then going to do the actual scraping, so let's see which activities we have for that. Navigate to Browser automation, Web data extraction, Extract data from web page; this is the one you need, and I'm going to put that in the For each. The browser instance is still correct, the timeout is 60 seconds, the store data mode is a variable, that's correct, and you see that the variable that's produced is called Data from web
page, DataFromWebPage. So click Save, and you will see that you get an error, and that's because we haven't defined which data to scrape. If I open this one again and you now navigate to the website, you will see that the data extraction wizard pops up. This extraction wizard is a bit different from what we have seen earlier. You can hover over all the items you want to scrape, and I definitely want to scrape the title. Make sure to watch carefully, because this is different: don't click the left mouse button, but click the right one, the one that you normally click to see more options. As you can see here: Extract element value. Now, you can extract the text, but you can also, for example, extract the URL, and that's quite important, especially if you want to scrape that web page as well. For now I'm going to go for the text.

You will see that this now gives you only one green border, and that's not what I want, so we have to show Power Automate Desktop that there is a pattern here. So I'm also going to click on this one; just make sure that you click on the same element in multiple items. Right-click, Extract element value, and then take the text as well. Now you will see that Power Automate Desktop notices that there is a pattern, and it will show a dotted green line around all of the titles. From now on it will do that automatically if you add another element. So let's add the specs as well: right mouse click, Extract element value, Text, and now you will see that you get this green dotted rectangle automatically around all of them. I also want the price, so click here, and we want the text. And there's one more thing that I want, and that's the URL. The URL is hidden behind this link, so I'm going to click here, right mouse click, Extract element value, and now you will see that you need Href, so click here. You will now see that we have captured those four values from every item, so I'm going to click on Finish, and because all the pages
are the same, we only have to do this once. I'm going to click on Save, because this page is basically the same as tablets: the structure is the same; it's just the content that differs.

Let's see what we're going to need next. I already told you in the beginning that we're going to use multiple Excel instances, so I'm going to open another Excel: Launch Excel, but I'm going to do that in the For each, like this. I want to start with a blank document, because there is no data yet. The instance doesn't have to be visible. And if you open it, I'm also going to close it at the end of the For each. Here I do need to save it, because the file wasn't there yet. For the Excel instance, you see that you can now choose between the two of them, and I want to go for ExcelInstance2, because that's the second one that has been initiated here. Click on Save.

I want to save the document under a new name. You see we have the For each here, where we're just looping through the Excel data of the first Excel file, and I'm going to save it under the category name, so phones, tablets, or laptops, followed by .xlsx. So let's go for a variable here. Make sure that you select Save document as; the document format is Default. Then select a variable with this button here, the {x}, and I'm going to go for CurrentItem, and here I'm again going to use the square brackets with the name. Then after the second percentage sign, so behind this one, I'm going to type .xlsx. So it's going to save it as phones.xlsx, laptops.xlsx, etc. Click on Save here.

I want to save the scraped data in this Excel file, and I don't want to overwrite any data, so I want to write on the first row that is still empty. In Excel there is an activity for that. First I want to get the first free row for a specific column, so take this one and let's put it here. As for the Excel instance, you know that is ExcelInstance2.
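As a rough illustration of the two ideas in this step, here is a small Python sketch. It is not Power Automate Desktop code. It builds the output file name from the current row's name and finds the first free row in a column so that new data never overwrites old data. The row dictionary, the `name` key, and all function names are assumptions made for this example, and a plain list of rows stands in for the worksheet.

```python
def output_file_name(current_item: dict) -> str:
    """Build the file name from the category name, e.g. 'phones.xlsx'."""
    return f"{current_item['name']}.xlsx"


def first_free_row(sheet: list, column: int = 1) -> int:
    """Return the 1-based number of the first row whose cell in `column` is empty."""
    for row_number, row in enumerate(sheet, start=1):
        cell = row[column - 1] if len(row) >= column else ""
        if cell == "":
            return row_number
    # Every existing row is populated, so the next row is the first free one.
    return len(sheet) + 1


def append_table(sheet: list, table: list) -> int:
    """Write `table` starting at the first free row; return that row number."""
    start = first_free_row(sheet)
    for offset, record in enumerate(table):
        target = start - 1 + offset  # convert to a 0-based list index
        if target < len(sheet):
            sheet[target] = list(record)
        else:
            sheet.append(list(record))
    return start
```

Calling `append_table` once per scraped page keeps stacking the pages underneath each other, which is the reason the first free row is looked up again before every write.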
The column: we're just going to go for column 1, because if this column is populated, the other columns of the same row are also populated. I see the variable produced is FirstFreeRowOnColumn, so I'm going to click on Save. Then I want to write the data to Excel: Write to Excel worksheet, like this. The Excel instance is ExcelInstance2, and the value that I would like to write is what I've just taken from the web page, so DataFromWebPage; click Select. The write mode is On specified cell. The column is just going to be column 1, because I'm going to start writing this table at the first column, and the row is going to be the first empty row, so that's FirstFreeRowOnColumn. Make sure to scroll here and you will see it. I click on Select and then on Save.

Okay, so that was one exercise, but for now we will only take one page from every category. We still have to navigate to the correct URL, so let's go to Browser automation and take Go to web page, and make sure to put that in the For each. Choose To URL, and the URL that we want to go to is CurrentItem, and then URL between square brackets, and that's actually why we have this first Excel file.

Okay, and now before we run this robot, there's one thing that we still need to change. Navigate to Launch Excel and make sure that you make this instance visible. For some reason, if you don't do this, you cannot see the data in this Excel file; you can try it yourself. So make it visible, and then let's run the robot.

Okay, you probably already saw the robot writing this data, so I'm just going to open laptops, because that one had the most data. You will see that the robot has written one, two, three, four, five, six rows. If I navigate to the laptops page, Computers, Laptops, you will see that this first page also has six rows. But of course you want to scrape all the pages, right? You want to get all the data, even the data on page 20, and how to do that I'm going to show you right now. So let's close
this one. Let's also remove those files, so that we make sure that we are not confusing ourselves.

Okay, so let's take a look at the pagination. Basically there are two routes that you can take. You can click on the Next button every time, this one, or you can click on the button of the current page number plus one: if you're on page one you click on two, if you're on two you click on three, etc. I'm going to go for the second route, because I think you learn a bit more from that one, but of course you can also go for the other route.

So let's start with making a loop. Navigate to Loops, and then I'm going to take this one, Loop, and make sure to put it below the Launch Excel, like here. We're going to start from 1, we're going to increment by 1 every time, and we're going to end with, let's say, 1 million; make sure this number is large enough. The variable produced is LoopIndex, and that's basically the counter that keeps track of the page that we are currently going through. So we'll start with page 1 and then in theory we'll end with page 1 million, but you will see that we are probably going to exit before that. Click on Save.

Then extract data from web page, get first free row, and write to the Excel worksheet: those are all activities that we have to do on every page. Of course the Close Excel we only have to do at the end of the entire range of pages for laptops, phones, etc. So take those three activities: click on the first one, then Ctrl+click on the second one, and click on the third one while still holding Ctrl, and then drag them here into the loop, like this.

Okay, and then I want to know whether the button for the page plus one exists. So I'm going to go back here, go to Browser automation, and use If web page contains. Take this one and put it here, below the Write to Excel worksheet. If the web page contains the following element: I'm going to create a UI
element here, and then here I'm going to click on 2. Make sure that you hold Ctrl and then click with your left mouse button, and you will see here the selector, or the UI element, that is going to be created. Click on Save. But I'm going to change this UI element, so I'm going to click here; this is where you can find all the UI elements, all the selectors. I'm going to click here on Anchor '2', and here you will see all the items that this selector is using to navigate. You see that the selector is using the text "2", which is part of an a element, and the a element in HTML stands for a link. So it's looking for a link that has the text "2". I'm going to change that: I click on this one, click on the 2 here, and with the {x} I change it to a variable, and that variable is going to be LoopIndex. But before I close: I don't want to click on page 1 if I'm on page 1, or on page 2 if I'm on page 2; I want to click on the next one, so I'm going to say plus 1, like this. I will show it one more time: if you have this on the screen, then it should work for you as well. Click on Save.

So now we're only checking whether this element exists, so whether there is also a link with that number. If I'm on page 20, there is no button anymore that says page 21 with a link, right? In that case, If web page contains will return that it doesn't exist, and then we also know that there are no more pages. Then I'm also going to add an Else activity, so just search for Else, take this one, and put it in the If, before the End, like this.

Okay, so if this page exists, then of course I'm going to click on it. I'm going to search here in Browser automation, and I'm going to use Click link on web page, this one. The UI element is the same UI element: I've seen that it exists, so now I want to click on the same element. So you don't always
have to create a new UI element. You can do that, but that way you will just create redundant items. So I'm going to select this one, click on Select, and click on Save. And if the next page does not exist, then I'm just going to exit this loop. So I'm going to go to Loops again, and here I'm going to go for the Exit loop activity. With that I'm only exiting this loop, not this one.

Okay, let's navigate here and close this web page. Make sure that you have removed all the files. I also promised you that I was going to show you how to save files in another directory. For that I'm going to open the first Launch Excel, this one, and I'm going to copy this one. Then I'm going to go to the save settings of the second Close Excel, so this one; make sure that you have the Close Excel for ExcelInstance2. Click on it, and then for Document path I'm just going to press Ctrl+V to paste, and I'm going to remove the last part. So now you should have C:, Users, of course your own name, Desktop, then CurrentItem with the name, and .xlsx, and that way your Excel files will be saved on the desktop, next to this file.

Okay, I think we're ready, so let's see if this is going to work, and I hope your bot works just as well as mine. If this video was useful for you, don't forget to give it a thumbs up. If you have any questions, please leave them in the comments and I will try to help you as well as possible. I will see you in my next video.
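The complete flow described in this video comes down to two nested loops: an outer loop over the categories from the first Excel file and an inner loop that keeps moving to the page numbered loop index plus one until no such link exists. The Python sketch below only simulates that control flow; it is not Power Automate Desktop code. The `FAKE_SITE` dictionary, the URLs, the product names, and all function names are made up for illustration, and the membership check on the link texts stands in for the If web page contains step with the variable-based link text.

```python
# Hypothetical stand-in for the website: category URL -> list of pages,
# where each page is a list of product rows.
FAKE_SITE = {
    "/phones/touch": [["phone A", "phone B"], ["phone C"]],
    "/computers/tablets": [["tablet A"], ["tablet B"], ["tablet C"]],
}

MAX_PAGES = 1_000_000  # deliberately too large; the loop exits much earlier


def page_links(url: str) -> set:
    """Texts of the numbered pagination links for a category (simulated)."""
    return {str(number) for number in range(1, len(FAKE_SITE[url]) + 1)}


def scrape_category(url: str) -> list:
    """Inner loop: scrape page after page until the next page link is missing."""
    rows = []
    current_page = 1
    for loop_index in range(1, MAX_PAGES + 1):
        # Extract the data of the current page and add it below earlier rows.
        rows.extend(FAKE_SITE[url][current_page - 1])
        next_link_text = str(loop_index + 1)
        if next_link_text in page_links(url):
            current_page = loop_index + 1  # "click" the link to the next page
        else:
            break  # no link for page loop_index + 1: exit only the inner loop
    return rows


def scrape_all(categories: list) -> dict:
    """Outer loop: one result table per category row (name, URL)."""
    return {item["name"]: scrape_category(item["URL"]) for item in categories}
```

The large upper bound mirrors the choice made in the video: the loop counter only has to be bigger than any realistic page count, because the missing link, not the counter, is what ends the loop.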
Info
Channel: Tom's Tech Academy
Views: 1,834
Keywords: power automate tutorial, power automate desktop, power automate examples, power automate tutorial for beginners, power automate desktop web automation, power automate desktop web scraping, robotic process automation, rpa, robotics, power automate desktop training, tutorial, power automate tutorial advanced, power automate course, learn power automate, microsoft power automate desktop, microsoft power automate, microsoft power automate tutorial
Id: k5639-5Hi5A
Length: 19min 16sec (1156 seconds)
Published: Tue Aug 01 2023