Redact Sensitive Data from PDFs | Microsoft Power Automate's AI Builder and Encodian Flowr

Captions
Hi, I'm Sophie from Encodian, and today I'll be showing you how you can start to redact sensitive information from your documents using both Microsoft's AI Builder and Encodian's Flowr. Today we're going to be looking at how you can start to redact payment information from invoices. We're going to build a custom AI Builder document processing model, which is going to identify the sensitive card payment information, and we're then going to put this into a Power Automate flow, using Flowr's redaction action to actually go ahead and redact these identified pieces of information. So let's have a look at the solution.

Within the Power Platform, AI Hub is the new place to go for anything related to AI Builder, and within AI Hub you can see all of the available AI Builder models. We can see here we've got a lot of models that are already pre-built, but we can also see that we can create custom models as well. We're dealing with invoices today, so yes, there is already this invoice processing pre-built model here, but when you're using pre-built models with AI Builder you're not able to go through and add any custom fields to tag and identify information. This is the invoice model, so we can see all of the extracted information that it's going to bring back here, and if I just scroll through we can see that there's nothing to do with card payment information. So if you are just trying to extract the information that is already there, definitely use a pre-built model, but if, like in our scenario, the information is not there, you're going to have to create a custom model.

I've already got a draft of a custom model, which I'm going to walk you through step by step so you can see how you can start to build your own. The interface is really nice: it's just step by step and you just click through, so there's no coding, no formulas or anything needed. When you create your custom document processing models, this is the first page that you will see. Now I'm taking you
through a draft today of a published model that I've already created for this solution, but I'm just going to take you through it step by step so we can see what you need to do at each stage in the process, and then I'll take you to my already published version so we can see what it looks like at the end.

The first thing it's going to ask you to do is to select what type of document your model is going to be using. We're using invoices, which is why I've got invoices selected. You can also choose unstructured documents or structured documents. When you click next, it takes you to the information to extract. Because I've selected an invoice, all of the fields that we saw previously with the invoice processing pre-built model come up here as default, because it's the type of document I'm using, but I can also go ahead and add any custom fields as well. To add a field you just need to go to add, and then you can add text, number, date, checkbox or table fields to extract from your invoices. I've already added these in, so I'll just show you what it looks like once you've added them in.

This is what it looks like once you've added in some of your custom fields. You can tell which ones are the custom ones because they don't have that big default button, and you can also easily see what type of field it is that you've added in as well. Once you've added in all your fields, we can just click next, and this takes us to the collections of documents page.

When you're dealing with document processing models, one AI Builder model can process multiple different types of documents, meaning that you're not going to have to have a new model for each type of document. It just keeps things a bit neater when you're dealing with these AI Builder models. For this demo I've got two different types of invoices, and these groups of documents are called collections when
you're building new models. My first collection is for invoice type one and my second is for invoice type two. They both have the same information, but the layout on the page is slightly different.

Here is where you upload your documents. Because we are training an AI model, the more example documents that you can provide for each collection, the better the results are going to be, and the minimum number of documents that you can provide is five. So just bear that in mind: the more documents you have, yes, the more tagging you're going to have to do, but at the end it will create a better result. To add new documents you just need to go to whatever collection you want to upload them for and click add documents at the top. Once you've finished adding in all of the documents, you can go ahead and click next, and then we'll move on to the tagging process.

I personally think the tagging process is the best bit of creating these models. We can see here, because this is a draft, I've already got items tagged. However, we can see that I don't have the phone number tagged. Now, I don't have a field for phone number, but say I did and I needed to tag this: I can just drag this box around that piece of data and then I can select what tag to use here. Because all of my tags are already being used, nothing shows up here, but for example, if I wanted to, I could tag this as card holder name, though of course it's not. So we can just see here that we've got this tagged as card number, we've got this done as card holder name, I've got the CVV and I've got the expiry date.

Just to note as well, the card payment details on these examples are not real values. They were created from an online generator, so please do not try and do your shopping using them because it will not work.

This is how you know everything has been tagged. To move along you just go up here and move along the documents, and when they have a tick it means that all
of the fields that need to be tagged, that is, all of the custom fields that you've added, have been tagged in the document. Once you're finished you can click save and close, move on to your other collection type, and when you're ready you can click next.

Now that the model is ready, we can see what information we're going to be extracting from the different document types, and we can just go ahead and click train down here at the bottom. Clicking train will start the process for the AI model to be trained on the data and the examples that you've provided, and this will take a few minutes, so I'm just going to pop to the model that I've already got published to show you what it's going to look like.

This is my published model. We can see here that we are given an accuracy score, which gives you an indication of how well your model is working. We've got 96%, which is a pretty good score. Over here we can also see the information to extract, so this is going to show you all of those custom fields that we added in and how well it's working for each one as well. You can test your model, which of course is really highly recommended before using it, to make sure it's working as expected and it's pulling out the right bits of information, because if it's not you may need to upload some more examples and then retrain.

Once you've tested and you're happy, you need to publish your model. Because my model is already published, instead of a publish button I have this edit model button, but when you first create a model and you get to this page you need to make sure you hit publish, because if you don't and you go to use it in a Power Automate flow or a Power App, it's not going to show in the available list of models to use. So just an important note here: if you have tried to use a model that you've created and it's not there, just double check that you have published it. As soon as you've published it, it'll be there ready to use. So now
that we've got the model ready, let's have a look at the flow that we're going to be using this in to start the redaction process.

Before I show you the Power Automate flow, I first just want to show you the OneDrive back end. Because I'm dealing with documents, I've just set up a OneDrive folder called invoices where I'm going to drop any invoices that need to be redacted, and I've also added this folder here called redacted documents, which is where the redacted invoices are going to be put at the end of the flow. Of course you could automate the process of your invoices being added to this folder, but for demo purposes, and as we'll see later, I'm just going to drop them in manually.

This is my Power Automate flow. My trigger is when a file is created, and it's going to be pointing to that invoices folder. The next step is where we use our custom AI Builder model. The action needs to be extract information from documents, because the model that we're using, even though it's a custom model, is a document processing model, so this is where it will sit. Under AI model here you can select the model that you need to use. Form type is the document type that we are inputting into the model, and you can only input PDFs, JPEGs or PNGs. If your invoices aren't in one of these formats and you need to convert them, say to PDF, you can always add an extra step in front of this to convert them to PDF using another one of Flowr's actions, but for today my documents are already in PDF format so I don't need any conversion. Then the form is where you supply the file content of your document.

The next step we have is redact PDF, and this is an Encodian Flowr action. We're just going to provide the file name of the OneDrive file and the file content of the OneDrive file, but the search inputs that we're using for this action are going to be the outputs from our AI Builder model step. When our AI Builder model runs, it's going to bring back the exact pieces of text that exist for
those custom fields, so it's going to bring back the card holder name, the CVV, the card number and the expiry date as they are on the document. AI Builder actually brings back more information than this: it brings back these text pieces, but it also brings back value pieces. The reason that we're going to be using the text over the value is that, especially when we're dealing with our expiration date, because we set it as a date field it's going to come back in a date format. If we wanted to use this, we'd have to use formatDateTime to get it into just the month and year format, because it's going to come back in a different way. Whereas if we just use the text as it appears on the document, we don't have to do any extra steps or use any expressions, and we know that that's exactly how that piece of data already sits in the document.

When we're inputting data into the redact PDF action, you can input the exact pieces of text that you want to be redacted, like we are doing, but you can also use regular expressions if you know what they are for the information that you're trying to redact. In our scenario today we're going to be using the text values, as this is already an output from our AI Builder model. There are also two bits of UI with this action: you can either input information like this, or if you prefer you can switch to enter it in this JSON array format here, and actually this makes it a little bit easier to see exactly what information is feeding in and where we have our dynamic content from the step before.

Once the text has been redacted, we can create a new file back in OneDrive. This time we're just going to add redacted_ in front of the file name, and the file content will be that from the previous Encodian step, which is going to contain our redacted invoice.

So let's test this and see how this works in real time. Just coming back to my OneDrive, I'm going to upload some test
documents, and now we'll wait for the flow to run. We can see that the flows have both now run and they've succeeded, so let's head over to OneDrive to have a look at the redacted invoices. We can see I've got redacted test 3 and redacted test 4, and we can see here that all of the card payment information has been redacted correctly. Let's check test 4, and we can see that this information has also been redacted correctly even though it's a different format. This is just how the model has worked with the two different types of invoice formats that I provided it.

So today I showed you how you can start to extract payment information from invoices. However, you can start to extract lots of different types of information from lots of different types of documents. You may just need to create a custom AI Builder model to do so, like we went through today, as the pre-built ones don't always contain the fields that you need to identify.

As it's using AI, it's always recommended to be responsible when using this technology, so having a human step somewhere, for example before emailing out these redacted documents, would be really highly recommended, because AI doesn't always work as expected. Especially when dealing with sensitive types of information, it's always better to have a human check step somewhere along the way in your process, especially before sending the document on either internally or externally.

As always, if you have any questions about anything you saw in the video today, please leave me a comment down below or get in touch with us here at Encodian, and as always, happy automating.
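The reasoning in the walkthrough for feeding the redaction step the text output rather than the value output can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary shape, the field names, the sample invoice text and the helper names (redaction_terms, redacted_name, CARD_NUMBER_RE) are all hypothetical, and none of it is the actual AI Builder output schema or the Encodian Flowr API. The card details are made-up placeholders.

```python
import re
from datetime import date

# Hypothetical stand-in for the model's output: each custom field carries the
# text exactly as printed on the invoice plus a parsed value.
# This is NOT the real AI Builder schema; it only illustrates the idea.
extracted = {
    "Card number": {"text": "0000 1111 2222 3333", "value": "0000111122223333"},
    "Card holder name": {"text": "A. N. Other", "value": "A. N. Other"},
    "CVV": {"text": "123", "value": 123},
    "Expiry date": {"text": "08/27", "value": date(2027, 8, 1)},
}

# Made-up text layer of an invoice page.
page_text = "Paid by card 0000 1111 2222 3333, A. N. Other, exp 08/27, CVV 123"


def redaction_terms(fields):
    """Collect the as-printed text of every field, skipping empty ones."""
    return [f["text"] for f in fields.values() if f.get("text")]


def redacted_name(file_name):
    """Prefix the output file name, mirroring the naming used in the flow."""
    return "redacted_" + file_name


# Alternative mentioned in the walkthrough: search with a regular expression.
# This assumed pattern matches 16 digits in groups of four, optionally
# separated by a space or a hyphen.
CARD_NUMBER_RE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

terms = redaction_terms(extracted)
# Every as-printed text term can be found on the page as-is.
print(all(term in page_text for term in terms))
# The parsed date value, rendered in ISO form, does not appear on the page,
# so it would need reformatting before it could be used as a search term.
print(extracted["Expiry date"]["value"].isoformat() in page_text)
print(CARD_NUMBER_RE.search(page_text).group(0))
print(redacted_name("test3.pdf"))
```

The list returned by redaction_terms corresponds loosely to the array of search texts entered in the redaction step; in the real flow those entries come from dynamic content rather than Python.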
Info
Channel: Encodian
Views: 277
Id: QYej75aEC0k
Length: 14min 46sec (886 seconds)
Published: Wed Mar 06 2024