Check whether a PDF Document requires OCR with Power Automate

Captions

Hi, this is Joe from Encodian. In this video we're going to explore how we can evaluate whether a PDF document contains a text layer, so that we can decide whether we want to perform OCR or not.

If we look at this existing flow, we've got a SharePoint trigger with a trigger condition that says: if a PDF file is created in this SharePoint library, then we want to go and get the file, OCR it, and then update the original file. You can see here that we're passing in the identifier from the trigger and we're passing the file content. This is fine, and it will work absolutely okay. The problem is that this is going to OCR every single PDF document that gets added to this library. If you've got a scenario where you've potentially got a mix of some documents that have been OCR'd previously and some that haven't, that's a problem, because you don't want to be OCRing documents that have already been OCR'd.

So what we can do is use another Encodian action to check whether the document contains a text layer. We're going to drop in another action here, go into Encodian, and search for "get pdf". We've got it here: "Get PDF Document Information" is what we're after. It's fairly straightforward stuff: we give it the file name and extension, we pass in the file content, and we don't need to do anything with the advanced options. What this action returns is information about the PDF document. You can check out the support portal for the full documentation of what gets returned, but it's fairly self-explanatory.

Next I'm going to add a condition, and in the condition I'm going to check one of the properties returned by Get PDF Document Information. The one I'm going to check is Has Text Layer, if I can find it; down at the bottom, here we go. This property is a true or false (Boolean) value that says whether the PDF document that's been evaluated has a text layer or not. So we're going to say: if Has Text Layer is equal to false, then the document doesn't contain a text layer and therefore we need to OCR it. So we can simply take these actions now and pop them into the "yes" branch. What I can also do is pop a Terminate action into the other branch and say that the run has succeeded.

So, really quickly: yes, we've used an extra Encodian action to do an evaluation, but computationally we're going to save a lot of resource, because we're only going to OCR those documents that need to be OCR'd.

I'll save this, and I'm going to drop two files into the SharePoint library and we'll see what impact that has. I'm going to drop in one that doesn't have a text layer and one that does. Those two documents should individually trigger this flow. I'll jump back to the flow and go to the run history, and we'll be able to see those runs as they come through. We'll just wait for those to update and fire.

Okay, we can see those two flows have fired now. We've got one that's run at 13 seconds, one 53, so let's open these up. We've got the first one running here, and we'll have a quick look once that's loaded. Let's check the second one's loading as well; Power Automate's been a little bit slow this morning.

Here we go. The first one has run Get PDF Document Information, and if we want to check the outputs, we can. We can see here that the text layer, I should imagine, comes back as false, which it has... sorry, true. So this file contains a text layer, and therefore the flow was simply terminated because we didn't need to do anything; OCR hasn't been performed. Whereas for the other document, the text layer would have been false, so OCR has been performed on that document. Let's just double-check that: you can see here that Has Text Layer equals false, therefore the document's been OCR'd.

So, super simple. Again, before you perform OCR, if you want to check whether you need to do it, use the Get PDF Document Information action. You can evaluate the property Has Text Layer, and if it's equal to false, then you can perform OCR. If you have any further questions or queries, you can visit support.encodian.com or simply email support@encodian.com.
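
For readers who want the branching written out, the condition described in the captions could be sketched in Python roughly as follows. This is only an illustration of the logic, not Encodian's or Power Automate's implementation: the function name `process_new_pdf`, the `hasTextLayer` key, and the `ocr` callable are all hypothetical stand-ins for the flow's condition, the property returned by Get PDF Document Information, and the OCR action.

```python
from typing import Callable, Tuple


def process_new_pdf(
    pdf_info: dict,
    file_content: bytes,
    ocr: Callable[[bytes], bytes],
) -> Tuple[str, bytes]:
    """Mirror the flow's condition: OCR only when no text layer is reported."""
    # "hasTextLayer" is an assumed key name for the Boolean the action returns.
    # Like the flow's "is equal to false" test, only an explicit False triggers OCR.
    if pdf_info.get("hasTextLayer") is False:
        return "ocr_performed", ocr(file_content)
    # A text layer is already present, so end the run as succeeded without OCR.
    return "terminated_succeeded", file_content


def fake_ocr(content: bytes) -> bytes:
    # Placeholder for the real OCR step; it just tags the bytes.
    return content + b" [text layer added]"


print(process_new_pdf({"hasTextLayer": True}, b"scan", fake_ocr)[0])   # terminated_succeeded
print(process_new_pdf({"hasTextLayer": False}, b"scan", fake_ocr)[0])  # ocr_performed
```

The point of the design is the same as in the video: one cheap metadata check up front avoids running OCR on documents that already contain text.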

Info

Channel: Encodian

Views: 2,045

Keywords: Encodian, Power Automate

Id: jDmiAWtfP18

Length: 4min 45sec (285 seconds)

Published: Fri Mar 11 2022