VBA PDF Automation - Read, Write, Extract, Convert, Control Pages, Forms and more

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hey guys welcome back to new series from VV a a Cosette if you're new to this channel my name is lumpam I and today I'm gonna show you all you need to know about automating PDF files PDF documents are portable and accessible from any devices it is easy and free for viewing but as you know to edit an extract data from PDF is tedious tasks that is why in this short series we'll look at some of the real case scenarios on how you can use PDF api's within your VBA program we'll learn how to publish or convert PDF documents read contents extract PDF tables read and fill out PDF forms and also learn how to manipulate objects within the document these are simplified tutorials so anyone with basic VBA or Excel micro knowledge can follow along also please note some of the videos use Acrobat so if you do not have one please find the link to download a free version from the video description before we begin just make sure you subscribe to our channel and hit the bell icon to stay updated with our latest videos let's get started in this section we're going to quickly learn how to publish the documents to PDF here are my sample files the excel file the PowerPoint and the word document first I'm gonna go to my Visual Basic for Excel insert a new module let's call this publish to PDF game output file that string this output file is gonna hold the pot and the name of the file so this workbook dot name this will return us the name of the workbook but from there we just want to take the name of the file so I'm gonna tail left this work dot name and from there I'm gonna take the land and subtract minus 5 this will just give us the name of the file without the extension and here I'll just add dot PDF okay that looks good along with in the amount of file we want to include the name of the workbook part as well okay that looks good this will be the name of our final file I'm gonna copy that output file name is equal to this workbook part and this workbook that name and changing that xlsm to PDF okay now we'll write the source State one more statement to publish this kid one is what I want published so say export is fixed format the format will be PDF the file name here is gonna be the output file and quality can be Excel quality standard you can you know include all these properties some of the application will have a wide variety so it should be okay even if you do not include them here some of them are ignore print areas you can specify if you want to export from certain page from page 7 to 8 for an example or if you want to open them after it's being published so I'll just leave it as it is for now I'm gonna run this code once it completes we'll expect the file to be exported into this folder publishing it now seems to be working okay and here's my PDF file quickly opening it this probably is 38 pages very smoothly okay now we're going to quickly replicate this for Word documents and PowerPoint first I'm gonna go to my PowerPoint again insert a module the name is gonna be same instead of this workbook let's do active presentation and here instead of one we want to export if I devote this now for example so it's giving us vbe underscore PDF dot PDF so it seems to be working ok now we'll publish this PowerPoint to video active presentation dot export as fixed format it will take in first as a file pot let's copy the output file pod paste it here this will be again PDF and then you can intend for print and it is white again it's white variety is a properties I'll just leave it as it is for now to default so once I run this code it's gonna publish again to to the destination folder here publishing it and here is a file ok so finally we'll gonna Google do this for our word document so bezel basic from this document I'm gonna insert a module call it instead of this workbook we can say this will be active document I'll replace this workbook with active document and here instead of sheet one we'll have to use active document again it's pretty much the same thing but to give you a better idea this will be the file pod and then we can say exported format as PDF and do you want to open after it is published I'll just say yes for this one straight out so once this is published it will open up a PDF file itself oh yeah seems to be all working fine let's also learn how to quickly create a PDF form I'm starting off Adobe Foam central so once you open you'll see a wide variety of forms that you can utilize to replicate the phone it's very simple to create them for now let's create a custom phone here are the controls and objects that you can add to your phone like a text box and so on we're gonna add a text box to text box and let's call this the first name and the last name you can also easily change the type later on also now I'm going to add the email ID it has the controls and validation in place already let's add a quick like a survey option or call this are you satisfied with our services or something like that you can change the preset and color and everything whenever you have time you can play more with that let's add the date field as well we're gonna use all this field later to read and writes or add different type of fields here in the phone so this can be like a report date yeah a survey did well at the header let's call this custom phone maybe maybe it is a demo I think that's pretty much it so it's very simple to create a PDF form you can also change the name of the file from the top there called demo and as I've said you can quickly change the validation the type of the text box the number types and so on it's very simple before we start writing the code let's quickly copy the file path and now I'm gonna create a new workbook let's insert a new module and create a public constant variable called PDF underscore file and we'll store our file path here we'll create a sub procedure called read from PDF and before we start writing the code we'll reference it to Adobe Acrobat library just make sure you reference it to the latest version I have 10.0 there now let's start declaring variables the app will be our acrobat application every document this will be our acrobat viewer now let's replicate this for PDF dog this will be PD dog and select X variable again all these are object variables in the page we're gonna select the text later this is for the page corner and the text corner we're going to look through later on we're gonna use this and then this will be the page numbered content page content and the content itself within the page now let's set the variables creating object we'll create an object for a VAR every duck as well now we'll open the PDF document adding a quick handler in case if it fails to open it'll exits up now we'll try to reference and get the active document in this slow I'm gonna add a reference to get the active document from the application yep okay now I'm gonna reference this PDF RP a PDF underscore dog to the AV dog then we'll be exposed to the methods from the from the AV dog for PDF doc now I'm gonna look true from for all the pages the counter start from zero we'll have to subtract one from the pages so within this loop we're gonna reference I will create this page page number content and and so on okay creating an object again for our page content this is this particular object is basically gonna highlight highlight tags or contents within the page and then we'll discounters is basically we're selecting all the text from 0 to 9,000 you might want to add more if your page have more contents adding a quick error handler in case it returns an error when we try to select the text within the page select Isis using the page highlight page content from the page content okay now we'll start looping through all the all the text within from the selection or j.j counter is going to take care of that again looping from 0 till all the total and let's print this value to our immediate window that's pretty much it our we're gonna review it later after we write everything let's quickly clean the memory so make sure that once it finished running it clears out the memory so the next time it runs it's okay we're gonna close the PDF document there without saving it false stands for that we're gonna exit the application and then we're just gonna set all these variables to nothing all the object variables so I keep on compiling there just to make sure that it compiles okay though these are hard party API so it might even if it compiles okay maybe it will not be 100% working but then we'll have to run and test it out quickly we'll have to change this from read from PDF the typos and looks like we have to add it should be yeah it should be pitched content dot add ok so let's quickly check out our code and make sure it's working I'm gonna quickly turn on the immediate window and try to run this code so I'm gonna hit f8 and quickly go through the code so it's starting to print out the data from PDF okay I'm gonna run the rest of the code let's extract all the data but since it's too much of data really doesn't make sense we print it out here in a mediate window let's try to transfer the same data to the to the worksheet here just say range let's print this out to column a okay I think that should do I'll call this PDF and I'm gonna execute this it's super quick it extracted all the data now I'm gonna quickly open the PDF file and do a quick recon so as you can see here the top flows report and the date is here in the PDF and then the Adobe Acrobat is this the paragraph starting from here and then so on so as you can see the data that is being pulled from PDF is not coming index format that we need so in the following video we're gonna put in some additional logics to pull in data from the table from this particular PDF in our last video we'll learn how to read all the contents from PDF now I'll show you how to pull records from PDF table into your worksheet first of all we're gonna identify some of the unique strings that begins and ends in the table for example our unique string could be below are the top five and flows for the week or it could be the UID and it can be high-level observations here that we can use to create a logic so based on these two strings will end a number of the columns which is here we can see four columns so based on this will create our logic in to pull in the data coming to our ID here I'm gonna quickly copy this all this code insert a new module this time I'm gonna call this read table here we're going to introduce some variables let's call this data print let's call this when we want to print or not this will be true or false and then the counter for our column stories in a long value and then the current row okay this will be our three new variables learn how to use that now so coming down here we'll say content our content is gonna be this data that is pulling in from I'll just comment this out for now here I will say if the content value is like something like UID which it is then I want to start printing the value or writing the values to the excel then I'll say that else if the content is like high that's when I want to stop printing it so I'll say data print go to false and then I'll just stop printing altogether exit from the loop itself it's also exit for now we'll start writing a logic to print the data if this is true this condition is met so we'll say if printer is called it true then just increase the corner : and we'll see cells or whatever the role that is defined will start printing from this row row number one and then the column and they will say print the contender this loop is going to take care of for each of the column now we have to reset the column once it reaches for contra is equal to 4 because we have four columns now every time he reaches for I want to reset for the next line column is for 10 on its call to zero so every time we reach here it will start with 1 again and then we'll increase the Rock honor as well once it reaches 4 okay I'll code should be working fine now quickly go true I'll run through this I'll just put a breakpoint there and run this at 5 so now I'm gonna press f8 seems a breathing print out the headers and then now the data since we print it out maybe not in the format I'm gonna run trolled us now first f5 okay so there are some cleaning that is required okay coming back we'll clean this content the same axle trimmed and then we'll clean as well because there is like a new line there this is a clean function within the clean mode put the trim as well I'm gonna rerun this quote very quickly now the data seems to be good so the date the date portfolio is in general and then this is in number format so if you look at our table here and the table here so it seems to be working great the beginning will quickly copy part of the form and we'll create a constant variable let's call this PDF form file and then we'll store this as a string okay now let's reference this to Acrobat phone if on aut 1.0 or whatever it is for you create a sub call rate PDF from Phil's will declare that Acrobat app let's do this for Acrobat viewer the same as a document we'll have to use both the form as well as the document together now we'll start clearing the the form and the fields this will be the field will replicate the same variable for fields later on we'll gonna look through all the fields I'm going to create the object for our application we'll do the same for the viewer for the document okay now we can we can open the document we'll put the same validation like we did in the first video we'll make sure that it's activated or brought to the front this can be hide our show for noms is gonna hide the application this is gonna run at the back end I'm gonna start creating an object for our phone okay now we'll start referencing our fields within the phone setup the fields okay now we can start looping through all the fields with using this variable seems to compile fine I will try to print out all the valued the name of the field the value within the field and the type of the field so it's better to use words because we're gonna reuse name of the field type of the feel like this is a tax or is it a radio button and so on my email button let's put a bar between each of these fields so the one is printed out it makes sense okay so our code is almost ready though it's gonna just print it out let's close the form without saving it we'll exit the app and then we'll just clear the memory okay since the compiled fine so let's give it a shot wheels gonna just press f8 and go to its line okay seems so open I'll start to print the name of the fields the type and the value at the moment the phone does not have a value so we'll quickly fill out something and we'll try to rerun this let me open the form and just put in put in my name and some dummy ID these two radio buttons will have its two fields and then the survey date I'm gonna save it for now and replace the existing one so you can see it's been filtered now we're gonna reorganize the window a little bit and see how it prints out the value as well I'm gonna go through it again quickly I'm just gonna make yeah just comment that out start looking through the fields so you can see the first one is strongly angry there the value of the selection radio button against agree the next one is the email ID first name and name right there and the date to have you know like seems to be working perfectly as we wanted it to be now we'll quickly write this to our sheet instead of the immediate window and then maybe we can also in our next video loop through each of the file and then pull in the data from there for each of the from that corner is going to reset every time the data is being filled and then the value just change the counter of the column column one two entry no I'm gonna run it so so I've pulled in all the data from the phone in our last video we'll learn how to read PDF form and import the field information like the name the name of the field the fill type and the value of them of the form now we're going to quickly write another code to fill the PDF form here from Excel itself and then save the phone with a new name and close this for now and just below this we'll start writing a new code well use the same part again let's call this right do you clear the app again let's call this [Applause] [Applause] then we'll just reuse this form there's a forum app and then the phone feels and we'll start declaring the name of the fields let's quickly open up the phone so we need the name there's two last name first name email ID two options button and the date so we'll just quickly declare them [Applause] you so this will be our two options button so we're declaring the the fields now and now we can start to open up the our document we'll open up this form the same phone you see them on don't see it it's show just carry all this so these are foam and then these are feels don't need this feel actually call this first name so we have to replicate this for all the variables first name last name de Mille idea the date or option button 1 & 2 so this were we need to put in the name of the fields we have that set up we can say we can directly write to the value now for example this could be Sharon spilling out the dummy information so for the forms we have optional neutral agree strongly agree disagree and so on so we can just copy and put in the neutral agree okay so now we have to save the document we can say we have to use these two we have to hope this document viewer and document itself so we'll just say dot get pity job I think that yeah now I will try to save this using this feature [Music] and this is where our full part will come in so I say it is a full and then here we put a path we can say in the same path we want to save it with a different name we'll just call this output one dot PDF if it is able to save the information of the spring out how to save something like that okay so now I think that's pretty close now then we can save our document so you want to close this and then we'll also close this as well and now we can clear off the memory say just replace all this just clear off all the memory oh it's because it the app as well do the cleaning after some time now we have to put in the name of the field here now let me just get rid of this again just to make sure that we have the updated field names just in case I added something so I'm gonna run this code 3 PDF form fields ok so I'm just gonna start filling up our field names disclose this for a bit the first one is already a button it is the emo idea this is the first name this is for the last name and I think we're good now once we execute the code we expect that this form to be filled with the information that is here in our code down here and then it will okay I think we have missed a date survey date so we can feel the deed as like this we'll call this 12 12 19 see it the file now I'm gonna quickly try to run this putting a mark so so the file is open now the background I'm gonna quickly say the feeling I will start to fill up the phone Sharon can't believe a curry neutral tool to fill out a date as well think we forgot to fill the email ID killing me just run through this now and then we'll add the email ID as well see my ID is go to a star Jim gmail.com okay it's gonna run this one more time we'll call this output - okay I'm gonna go to the folder output - you can see the form has been filled out it seems to be very straightforward um I think I don't need to walk you through all this again but the first portion is just declaring the document the viewer declaring the fields this part is the form these are all the fields that we want to utilize the app and the document we're sitting out the object and then opening up the document activating it bringing it to the front and then we're setting up we're creating the objects basically here we're saying this first name variable which is the phone field will be PDF form and the field name is this and then we're filling up the information here we're assigning a value to the field and then we're getting we're hooking the the document and the viewer and here we're saving as the new file and then we're closing and cleaning out the memory in this episode we're going to quickly learn how to combine multiple PDF files into one we'll start again by writing a new sub procedure over colt is combine PDF files will call this team will have to and from document and then we'll just set up our application copying from the fire codes this part is fine this will be or be the Portable Document instead of the viewer now we'll just say show we'll just sorta document for now I'm gonna show you a sample of the PDF that we're gonna combine I have a sample procured here start opening out our document there is a sample this this is a sample PDF that I've created it has five pages we'll try to learn how to insert between each of these pages that's why I just have the numbers in the pages quickly take the part this particular PDF again is a sample it has two pages we have opened up the document now that this index to ensure the pages is so will say to document that here you'll have to mention the number of the page you want to insert after so here we'll say we want to insert it after the the second page the count start from zero say one okay and then this will be the source or from document now the start page will be zero and we can say we want to insert also want to say get all the pages and if you want to include all the flags on the bookmark within the PDF with what's that district true otherwise it should be okay we'll see if it is false then we want to say oops fail to insert the page okay now we want to save this document and then we'll do a PD save all file again and this will be the name of the part where we want to save part we'll just call this first output dot PDF able to say my voice will say so I think that's pretty much it will close two documents here no we exit an app so we can give it a shot once discord runs we're expecting that it'll be inserted the second document which is add sample to pages within this PDF document will be inserted to this pages file after the second page starting from 0 1 1 means it's two pages so it will be inserted in the third page I'm gonna give it a run once we run this we'll see the merge merge output dot PDF see it's document saved so it's like it worked here's the output they make this smaller as you can see here it is inserted after the second page now if we mentioned okay insert it after the first page run again this time we'll call this as one well has been see it so after the first page you see it does save all the pages okay so this is very simple this is how you can use this particular insert beads method to add pages to to your document what you can try is you can replicate this same thing if you have Tarte document that you want the insert within here you can just replicate this index and this the file here this will be your tart file reference that you need to update and if you have to look through and do this for multiple of them what you can do you can combine you can create these two function and you can have you know like arguments with file1 file2 and so on and then it will keep on inserting if you need more help around this feel free to comment on the video below in the next video we'll learn how to delete the PDF pages as a continuation we'll quickly also learn how to delete pages from PDF here we have our seven pages let's write a code to delete these two pages that was inserted will be second and the third page one of those days take the file pod just have it here for our moment we'll just copy and replicate the combined code just did it all the unnecessary code was called this delayed let me just insert this in a new module [Music] so everything is same we just need to up insert a statement to delete these pages now we'll call this tool it edited output the method or the syntax to do it it's just dot till it pages here you specify the index or the page number again starting the count start from zero we want to delete from second we'll say 1 and ending will be due we'll say develop train deleted I was failed to delete there should be it's all good click give it a run after so it'll open up this output file here delete the page from 2 to 3 since the index again is starting from 0 that in mind and once this deletion is complete it will try to save the file as edited output dot PDF if it is able to save it it will see document dot say the same as the one that we did for combine the PDF let's give it a run document saved at the output as you can see that two Peters has been deleted in this episode I'm going to quickly show you how to convert the PDF document to other applications format for example we'll convert it to XLS eggs to a Word document and to HTML this will become very handy because once you convert it to Microsoft you know applications it's easier to read the data and then you can manipulate them in a way you want first I'm going to copy the file path as usual quickly show you this this is the same sample that we've been using all true I'll close it for now coming here we'll just call this I'll just paste this but for all Park there for now we'll call this convert PDF document I'll just get down all the syntax from there so we don't have to rewrite in okay so first is our application or document this will just reuse this entire thing and then don't want this [Applause] now if we are able to open the document that's where we'll try to convert we'll call the years file as string that will be let's just call this s file source file destination file and we'll have an extension input as well for now we'll just give this a document server extension you specify here it will convert it to that document so this file source file is gonna be the one we copied earlier and the destination file gonna be it will just replace dot PDF with whatever you specify as an extension so we'll just call this replace we want to replace as file dot PDF with dot extension whatever extension has been specified you know I want to start a replacement from first character okay now of destination and the source file is ready we'll open this file does have to update that because copy different other phone now once the file is open we want to use how to setup the PDF document so every document is open will do the PDF document here we'll get them active to human will also set up JavaScript object will call this zero I'm saying then what's a third let's go to our document okay that seems to be okay now and from here we want to save as this to other document the file that we want to save as and this the type so the type you can find this from Adobe site as well this will be the format so the format is calm dot Adobe Acrobat extension so whatever you specify here will be the format that's going to replace I've tried this for XLS eggs I've tried this for our HTML so you should be working fine but we'll just so I think that's pretty much it I can try to quickly run this real ones let me just run this so basically when we run this we want to see flows report out do see document seems to be rendering we bring the file for explored okay let's check the output so seems to work perfectly okay it still have the table format in tach now let's try for excel so this X I would say is pretty quick so here again it's much better than the one that we read initially and we'll try it for HTML basically the webpage construction is really fast okay seems to walk fine we'll touch base a little bit on how to use poor query to import tables from your PDF in this example we'll use the file we just converted to HTML to import a data table if you're new to Park Cory it is basically a data link technology these features is available in your excel and power bi desktop it will basically help you connect with various sources and you know I refine your their horses to meet your need okay just give it a quick demonstration I'm gonna go back to our excel add a new sheet from your data tab in your ribbon go to get data launch our power query editor from new source we'll just select the web okay now we'll just get the HTML file we converted earlier from PDF we'll copy the link and paste it here click on OK it's very simple to use so here is basically our metadata document and then here's the table that we want to import just like that and click on ok let's give it a moment once you're here if you want certain data format or data type to be changed we can do so from advanced editor so once you click here you'll basically see like just like a sequel statement that is auto-generated here is the fields and the data type UID is integer and then the date is in date type let's say we wanted the format to be in text and not date you can just replace that ok once you're done you're happy is click on done as you can see the format have changed a little bit and simply click close and load it'll now refers the data from from that table if you want a code basically you can just record this as well if I have to just redo this in a new workbook I'll just delete this for now extinct connection I'll just delete this I think it's easier to you know record this import from HTML click on okay we'll just cook this name of micro will store this in this workbook we'll just do the exactly the same thing that we just did right now wants to the editor and our source clip Boris then queered so does 3 copy the file pot coming back to our editor click on ok this can be the actual web page and not you know I'm not the local convert at once only but for now we're just using this as a demonstration purpose like the table again click OK maybe we'll just go back here and edit this as well again they do text well I have a look at the code that was recorded in the active work boots basically adding a query this is the same statement that we sell from the advanced editor these are the properties these are the list object source type using the OLAP TV and then this must be the provider with the X and then destination where is being added the command is basically select from this table and so on you can easily record this instead of you know having to type this complex syntax and if you have to dynamically construct this feel free to play around and if you have any specific questions around this feel free to comment on the video below thank you guys so much for watching please be sure to subscribe if you've enjoyed this series and also to follow our upcoming videos it would mean a world to me and I'm super grateful ok I'll see you guys in our next video bye bye and have a good one
Info
Channel: VBA A2Z
Views: 81,620
Rating: undefined out of 5
Keywords: read pdf file using vba, write to pdf file using vba, extract table from pdf using vba, insert pages in pdf using vba, pdf automation using vba, vba pdf programming, vba pdf, pdf to excel, excel to pdf, delete pdf pages using vba, insert pdf page using vba, convert pdf to word using vba, convert pdf to excel using vba, convert pdf to html using vba, automate pdf using vba, publish pdf using vba, fill pdf form using vba, combine pdf pages using vba, vbaa2z, pdf vba, macro pdf
Id: uc6palG76Y8
Channel Id: undefined
Length: 65min 59sec (3959 seconds)
Published: Wed Jan 01 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.