Web Scraping with RegEx in Power Automate Desktop - Advanced Use Case

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
regular expression or regex in power automate for desktop is very powerful let me show you what reg X is and afterward I will show you how we can use it in power automate for desktop imagine that we got some data that could be from a web page or another system they look like this these are four transactions in four lines so the first one here that is the category and this is the subcategory and the date I really need the category so that one is this one here and then I need the subcategory in another variable but you can see the sizing differs so I can really use character positions I need to come up with another Rule and that is exactly what rig X is reg X is a search pattern that can easily extract data out of structured data let me show you how powerful it is so open orbit browser this is regex 101 a regex tester my test string I just Ctrl V and paste in these four lines of data here and then if I want to get out the first let's say the category I can go up here I want to start at a line and that is a head now you can see these four here turns purple I just told regex that I want to start at a line I haven't extracted anything yet so I have four matches but they are all null because I haven't defined what I want to extract say that I just want for example three characters then I can do this a DOT that is one character and then in curly brackets I can say three and do like this that will be three characters but that was not what we wanted we wanted to extract the category and when you create these regex pattern we need to think about how we can create a rule and the first rule could be that we want to extract everything until we reach a space and a forward slash because a category can also have multiple words for example now I didn't I'm just saying noodle ABC it can look like this as well so we need to say we want to get everything except when it comes to a space and a forward slash that will look like this so first I want to say I really need every character except a forward slash so in hard brackets that will look like this then I can have another hat and now a hat doesn't mean that um I'm in the start of a line but that means that I want to extract everything except now the forward slash comes in and again you need to practice a bit because and what you want here is to learn all the basic about regular expression and if you sign up to my newsletter down here I'm planning to make a complete regex guide for power automate for desktop developers in later this year so um but now back to the case and then I want to say a star and that one gives me everything until the end but I also want to say I don't really want the space and that can be quite hard to configure because if I did it here I say accept a forward slash and a space because then it will miss this space as well but what I can do is that I can say look behind a space and a forward slash that looks this with a that's actually a look ahead so it looks ahead of something so now this will look ahead and here I can say what do I want to look ahead of well I want to look ahead of a space like this then an escaped forward slash like this and maybe another space so now if I go down here I can actually extract this category with this symbol pattern this will work whatever happens we just need the rules saying that the category is separated by a forward slash so I can take this regular expression I just copy it so we have it here in our symbol notepad I'll say category looks like this to get this subcategory we will do something similar except that we are not in the start of a line so I will remove this here I want to say I know that the server category always comes after a forward slash so I can say look behind like this and I just need a area like this so now I can say what do I want to look behind well I want to look behind the space in the forward class and it is escape because a forward slash has a different meaning in regex so we need to escape it with this so now I'm saying I want to look after a forward slash here you can see that it actually gives us multiple positions because we also have a forward slash here but we came a little bit along the way what we can do is say that I want to look for everything like this and now I just say I want to stop at the next forward slash that will be here here here and here so then I do this I do this a um an escape I just need to do this and escaped forward slash and maybe a space like this that's how we get the subcategory out now we created a pattern for that as well my name is Anna shenzen which learned some Microsoft power to make for desktop let's see how we can implement it in power automate for desktop and this whole case we will see how we can extract data and use regex with it is actually from one of you it's from po thank you Bo for the nice question this will be an excellent video I will in this new format answer questions to you live there will be less edited so you will have a little bit of error that will be fine no zooms but I will answer comments from these YouTube comments down here so place a comment if I should solve your problem in a video the next time now we proceed so we got this subcategory and I'll move it over here so just to say this is my subcategory pattern so let's look at the case so the case is that we want to move to start.dk forward slash so s-o-e-g this is search page in the Danish governance side so what we want to do now we got 2740 results you can see down here all uh items here but we only want news from this year so I want to click start date and then I want to say January the 1st here and if you watch this video in one of the coming years you can just say the first of January in the year that you're in for example 2023 and then today's date I can just click here and click this little blue so now I got every and result from this year I can click Start search and here you can see oh no I got 176 results that's fine but I got 18 pages to scrape that means that my robot should click next next and click all these pages so instead we might want to say I want 50 result per page now you can see we've only got four pages could we do this smarter yes we can we can go up here in Max results and then we can tweak the URL tweaking URLs is always a nice trick as an RPA developer we can do this say a thousand now we got the 176 results in one H I can see these are the ones that we want to scrape we want to scrape the title that is this one here new Andalusia Visa that was Danish by the way I'm Danish and then we want the category that is the new that is the one that we just practiced in revic so this will be easy for us then we want the subcategory and finally we want the URL because if I click one of these you can see that this is a total new news so I want this URL up here let me go let's create a solution for that so now we handle that we can have every element by trigging the URL we also need to address the problem that today is the 28th of September 2022 but tomorrow it will be a different date when the robot runs or say that robot should run next week then we don't want it to stop at the 28th of September 2022 we want all the news from about till today what we can do of course we can have our robot click buttons but look at this up here here you can see the date from and then the date too so if I can just tweak this little element up here to have today's date then I will have a club list the search and I can just start scraping instead of us clicking button with the button with a button so I copy this URL we will tweak it in a little while so I will go back to Power automate for desktop here I have a blank flow and what I want to do is that I want to launch a new browser so I'll say launch new room like this and here I'll drag it in my URL I can just paste in this one here I know that we said we want to change this we will do that in a little while just remember the structure because the structure is here I have a year a month and then a date separated by these I there's one of you told me what this was this little separator please let me know in the comments again and I promise not to forget it so we want to do this Dynamic and we can do this we can click save here we can just get the current date and time we will only use the date but so we'll say get q and date and time and drag that one in here like this this will just give us the current date and time so what I want to do here is that I want to say I want to say save but this is in a daytime format I really want to convert it to a text so I can use it in this URL so now I find a convert date time to text here and drag that one in here so I'll need a date time to convert and that was this q and date time from up here do this the format to use I need to I can either use a standard button I need to look if there's something that matches my here I don't think that is so what I will do is I will say custom so again remember what we wanted we wanted a year first that will be four wise this is actually.net custom date time formatting you can easily Google this if you want to know more or write me I'll gladly send you the link and I want a month and then I want a date so now I have the same format that is being used as up here I can use this little it's it's been getting saved in formatted date time and just click save and we can use it down here so open the new launch new Chrome and then here manual date too sorry um I just click delete this and click this little X here choose formatted daytime now we use that in our URL so now we know that we've we grab the Qing date then we use it in the UL making sure that we always will get the right result let me just show you here I can run it and this one will open up the newest one I know that this one was for day for today but that's actually worked because we use the variable here isn't that beautiful if you like these sessions where I help you guys with your questions here in the comments please let me know give this video a thumbs up or a comment that will help the channel a lot and I can make a lot more videos so what we will do here is that we can now start scraping and let's just inspect what we actually want because we want this title then we want the category the subcategory and the UIL most of it we can get by using the extract data from web page but there's a build and trap and actually something nice for you to see and learn so here I find a extract data from web page on the web data extraction drag that one in here so I need to open up my search page which is here now you can see that I can get all these results here so first I want a title I can't just right click here extract element value and pick the text that is the first news item then I want to say I want to grab the next one as well then I say right click extract element value and we get each and every one there's only 220 here in the preview but it will work for all 176. so now we get that then we need to get the category in the subcategory and now you can see here we can only get the entire line it's not possible for us to get inside here and that's why we introduced regex in the beginning of this video so for now we'll just grab this whole line I'll just right click extract element value and get it so now you can see that I got the entire line we also want the URL and usually a URL is tied to this title that's where we click to get to the next page but if I click right click extract element value we can see that we can't find any URL tied to this title and that is there's a reason for this so let me just go over here if you press let me just see F12 that will open up the developer tools um it's it's the easiest thing to open up here in the button if you have it over here you can just click these three dots here and talk to the button it might look like this I prefer to have it in a button but it will be the same that's just because the text goes white it's more easy to look at let's and that can inspect any element on a web page so that is the code that the web developers made that we can now inspect and these this code is tied to the address that powered made for desktop users so we can use this code to generate our own element for the URL if I click this little layer and then I can just click this Arrow I'm under elements click the arrow so it turns blue and click that here that jump jumps us to you can see here to the H2 new and illusory Visa free episode of handicap come my job that was dangerous again but this is the title we need to say where is the where's the URL tied to this and you can see it's actually up here so eight uh this element H and then the attribute href that is actually usually the attribute we use for ul's you can see it here and then we have the UL here that is the URL that we want to grab so what is important here is that we can look at this saying we want the UIL and the URL is in the a element just before the H2 so for now I will just move over here to this live web helper like advanced settings and here you can see what we how we got the title we actually got the title by having a base selector but then we have the H2 so power automate generated this for us and it it says eight a and then H2 so we can just use this a without the H2 let me let me show you how that looks so specify additional CSS selector then I'll just say sorry a and over an attribute I want the H ref attribute so I'll just say h ref like this and that's it and I can click OK there you go we now have all the URLs tied to the elements isn't that beautiful I can just click finish now and then just click save so we can now we're now done with the data scraping but let's just see that everything works and let me just close here and I click run so we're just going to see that we can actually get all the data so if I go back to Power domain for desktop you can see data from web page 176 rows I just do this there you go we now have all our data so our next problem problem is that we want to Output the data so we want to have it in Excel and then we want to use regex on this one here so one step at a time let's start by outputting the data to a data table a data table is nice because it's easy to work with so we'll create a data table in the beginning of the flow we can do it in several ways either with the native activities or just with a set variable so drag a set variable in right after the get current date and time I like to have this get currently in time in the somewhat beginning but we can easily argue that it could be everywhere it will just take a date right when when that occurs so it doesn't really matter now here in variable I'll say I'll call this result this will be a data table and the way we do data tables is that we have these standard signs in here and then we'll say we want to have these these early brackets that is a data role and we want to specify this is a header row so that will be a hat and then we will have the data Row in hard brackets so right now we will have nothing in it but we can start adding it I want a title like this then I want the category like this I want to sub category like this and then I want the URL so that will look like this and if you have made a minor mistake power to make for desktop you just warn you you forgot it um a simple a comma or something so now if I click save we don't have any errors but let me show you if I just forgot this comma here it will say syntax error so it doesn't say quite what it is but it tells you that there's something wrong and we can just make the comma here and click save so now we created a data table now we can say for each one of the rows here in the data from web page that is the data table that we created with all the data then we want to add a data row to this one yeah so here I will say for each that will iterate through a collection here a data table I'll say for each the value to iterate well that is data from web page oh I will do like this and it will get stored in a current item reference variable so this current item you can see it as for each one of the rows we can refer to them as they occur by saying a Q and item I can do this now we're entering through it then we want to say we want to add a row to this data table up here so I'll just use another set variable I like these set variables compared to the native data table activities because they're easy to configure if we want to add something later on we just add a thing here to our expression or down here but now we drag it in here and what do I want to have in here well I want the title the category subcategory and the UL we don't have the subcategory quite ready yet but we will have it in a minutes so um this one here that will say that is the result so I click the X and depict the result I also want to be careful here because when I want to add data to the result I don't want to override whatever is in there so I want to say I want whatever is in the result currently that is I know it's empty in the first one but the second one it will not be empty and then over here then in here I'll say plus and these spaces doesn't matter power automate for desktop will ignore it so then I want the data row and again I'll sing hard brackets here I can refer to the current item so current item like this and then I want a a hard bracket and then I want the index number of the data table first I wanted the title and that was the first thing we scraped so that will be zero you can also find it over here in this data from web page these three columns that is the columns that we use down here so I'm saying q and item 0 because it's zero index so the First Column is referred to as zero and I want the Q and item one like this that is the whole line as the category so we don't really have it separated yet and that's why we don't we will for the subcategory we will just have two quotation marks and finally we want a Q and item and then we will say two that one is the URL so we have fixed the title or we'll see the title the URL we haven't fixed yet these two here but we can just click save and fix it later now after this we want to print it to excel so here I want to say launch Excel now I can launch an Excel sheet and print the results to it so I go down here launch Excel I'll just use a blank document because I'll pick I'll write by this result data table to it so I have nothing in it make instance visible no thank you and then I can just click save now I can write a to Excel worksheet and the tricky part here is that we have to write the header separately from the result that's just the way power automate for desktop works currently so I'll say right to Excel worksheet what value do I want to write let's write out the headers first and we also get a little bit of practice and here I'll just add a normal data row to it so this will look like this it will look like the exact same thing as before with these curly brackets hard brackets and percentage sign and then we can have the item in here so again I'll say title I'll say category and again if you want to see more I gladly do more of these videos instead of my normal job just you can help me a lot by giving it a thumbs up that will bring more people to the channel making me earn more to create these videos and five someday I will be able to to make these videos full time that could be really awesome so um but just thanks for being here that means a lot and again if you have any questions about power domain for desktop or in your other IPA tool post them down here in the comments and I might pick you a question and answer them with a video I'll do it each week from now so I hope you really enjoy this so now we have this where do I want to write it well I want to write it in a and then the Row 1 that is my header so I do like this now I can write out my result so another write to Excel worksheet so um I want to write to what value do I want to write I want to write the result where that one is here again I'll just write it into the a column so I want to start at the A and now I will do it at 2 because I have my head is in one I can click save let's say that I just want to save it on my desktop and you might have a different desktop but then then I do like the address mine is called unless Jensen something then we can just use a get special folder drag that one let's pick it here and drag it in here this will just return a special folder path with this value and that is the the address of my desktop so that is good I just got this special folder now we can close Excel again so I'll say close and that is important otherwise power to make for desktop locks it but it's also where we can save what we have written to it so here I will just say what do I want to save well I want to save document s and now I'll use the special folder path down here like this X special folder path then I'll have a backwards less results dot slsx like this and then I can click save so shouldn't we test it but what I really want to do here is that I don't really want to scrape all the salt because that will be quite uh slow so let's just pick 10 see that works and when the robot runs in the end we will of course pick every element remember to save your flows because power is made for desktop will not auto save now let's run it and to see if we have any errors and of course we will fix it if we do so now we've run these 10 we open up Excel and in a little while we're done and now we ran our flow let me minimize all this here we have our results and it's over here there you go we now have let me just do this here we have it we have we have sorted a lot of the job we have the the title here with the the header we have the category well we haven't sorted that one yet that is the regex bar and we have the subcategory that will also be the regex part and here we have the UL so we came a long way so what we will do here is that let me close this one down again and um let me just move in here and say I really want to um to get the regex up here for each one of these rows I will use my regex so using regex in power automate we will do that with a pass text we'll need two of them one for the category and one for the and one for the subcategory the text to part well that stands in the current item and let me just click save here it will give us an arrow so um in the Q an item let me just open that that is actually just a row from the data from web page but it will look like this we want to look in the second column that is index one so what we can do here we can just go in here text to parse true and item then I want to say one like this oh sorry curly bracket that was not my my meaning so I will just say look in the Q item the first that is actually the second column and what text do I want to find in the beginning we created the regex for that so uh the category that was this record remember we created it in the beginning of this video rewind it if you haven't seen it the video is designed for that so that looks like this is regular expression yes first the currency only doesn't really matter we only have one equivalence but that is fine down here we get two variables produced we get a position that is the position where this text occurs that will be position 0 for the category and then we will have uh the category printed out in a variable called match let's rename it I'll go down to match and I'll say category like this and then I click save similarly we will do a regex pass for the subcategory so drag that one in we will still look in the Q and item one oh I could have copied that one or just make it again to practice your Q an item run and then the regex pattern well we just saw it here so I'll copy the subcategory in and paste it in here yes it is regular expression and again we will just change this match here so I will call this sub category it's just so it's easier to refer to than match or match two now what we want to do here is that we want to be enable to to write them in down here so I can just go here and instead of this q and item one I can just refer to the category variable that we just created and similarly subcategories sorry sup category like this and then I click save now we just need to fix one thing before we test it and that is this Max results up here I'll set it to a thousand again we only had it while we tested and I can click save here and now we're ready to run I will just save here and then we will click run if you want me to solve your problem next post your problem here in the comment under the video and I might pick you up for just for next next video I'll publish in next week I enjoy making these videos because you'll see me develop I can see it will take a little bit extra time with 176 rows but I will enjoy these videos because you can see me develop somewhat on the Fly of course and prepare a bit just so how it worked but you'll see a lot of advanced concepts Concepts that we don't necessarily charge in the regular videos now let me just fast forward take a sip of water and see you when the robot has ran there you go now we're done let's inspect our results just to see that we've got everything right so open it up here there you go we now have all our data we have the category the subcategory but what is uh what is here you can see that we're missing some subcategories did we make a mistake or um was it something in the data so that is just copy this one here so we can search the web page so you can see here that there's no subcategory on this and this and that might mean that we have to refine our big expand so here we have our search result click Ctrl F paste this one here here you see we have no subcategories on this one we only have a category so our wreckage pattern actually solved for this and you can see it down here click the video up here to be taken to the next Advanced power automate for desktop use case see you next week
Info
Channel: Anders Jensen
Views: 5,114
Rating: undefined out of 5
Keywords: power automate desktop, anders jensen, regex, power automate desktop regex, microsoft power automate desktop, power automate web scrape, webscraping, web scraping, power automate desktop use case, power automate desktop advanced, power automate desktop examples, power automate desktop web automation, power automate desktop tutorial
Id: TmeGaxksrUA
Channel Id: undefined
Length: 30min 10sec (1810 seconds)
Published: Wed Sep 28 2022
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.