VBA to Read or Extract PDF Tables without Reader or Acrobat API - VBA PDF Automation-11

Captions
Hey guys, welcome back to my channel. I'm long Parma, and in this video I'll be showing you how to read PDF documents or extract PDF tables without PDF Reader or the Acrobat API. I'll also demonstrate how to loop through rows and columns within the table, and in the following video I'll also show you how to read PDF form content without Reader or the Acrobat API. So before we begin, please do not forget to subscribe and hit the bell icon for upcoming videos.

These are the PDF files that we'll be using for demonstration purposes. As you can see, the default program to open these files is Chrome at the moment, because I do not have Reader or Acrobat installed. Let me load up these files first. This, and yes, okay. So this first file, let's give it a moment. This file is a basic PDF document with a table in between. This is the same file we used for demonstration when we did a series on PDF automation using the Acrobat API, so we'll touch base on this file as well. We'll try to extract this table and read through all the content of this PDF file. And this one is a sample annual report. This particular PDF has two tables: one is here, this is sample data, and then here's another one. So I'll demonstrate how to extract these two tables into Excel, and in the following video we'll be looking at how to extract data from a PDF form. We have to use another method to do that, so I'll cover it separately in the following video.

First let's start off with this file. Let me get the file name and we'll move on. Okay, let's insert a module. Let's call this fileName As String, and then let's call this Sub ReadFromPDF. Now for this demonstration I'm going to use the Word library, so go to your references and look for the Microsoft Word Object Library. Here I have the 16.0 Object Library, so whatever your version is, just check this box and click OK.

Now let's start. Let's call the Word app wApp As Word.Application, and let's call this the doc As Document. Let's just clear 
the memory now itself, like this. And then now we can open up: set this object equal to Word, or we can say this .Documents.Open. So here goes our file name, and the parameters: you can see the file name, and then there are so many optional ones that you can input. One thing to note here is ConfirmConversions: if you specify True, then you'll get notified, like you'll get an option to choose how you want to convert it. Now, we don't want that prompt, so I'm going to turn that to False. And here, maybe we don't want to see this, so we'll just turn this to False as well. And we'll close without saving once we're done. And here goes our reading part.

Now before that, let's declare the paragraph and line. So Dim, let's call this pg As Word.Paragraph, and then wordLine As String, something like this. Now let's loop through all the paragraphs in our Word document, the one that we've opened: For Each pg. And here let's assign the Range.Text to this one, and let's print this out, and set the Word application. Now let me first save the file, and now let's give it a try.

So quickly: here we set up the application, a new Word application, the Word document, the paragraph, and a String variable, which is basically to hold the lines later on. And then coming here, we don't want the app to be visible, so we set it to False. Here it opens up this file without the conversion prompt, since we turned it to False, and then here it loops through all the paragraphs and prints out the text.

Okay, now let's go ahead and try to run this. As soon as it hits this line it's going to stop, just for now, for demonstration purposes. Okay, and so it starts to print the first line. So I'm going to go back to the file, this is the file we're working on now, making a copy of it and a 
lot of this file so we can see it side by side. Okay, so here you'll get an idea: this part is here, and you can see this data is being printed out here. Now I'm going to let this run through for all the lines without stopping, and then once it reaches here it's going to close the document without saving it, and then clear the memory and the application.

Okay, now let me disable this part and I'll show you how to loop through all the rows and columns within the table. So we're talking about this table now; we'll try to extract this table and then we'll cover another example as well. First let's declare variables again: Dim, let's call this row tRow As Long, and then the column as well. Now, if you want to add a handler, you can also say that if this doc's Tables.Count is greater than zero, then we have this code run here. And then here we'll put a reference to the table. Now, we know that there is one table there, so I'm going to put it like this, and then from here we're going to loop through all the rows and columns of this first table. If you have more tables, then you can specify table two here, or you can set a counter up to Tables.Count and then loop through all the tables and have the result printed out. After we complete this demonstration it should be much clearer.

Now, with this table, we want to loop through. So we'll say For this row is equal to 1 To, to be specific, 1 To this document's table's Rows.Count. So we're going to loop from row number 1 to the maximum number of rows. And then within this row we're going to add another counter to loop through the columns, so it's a nested loop: equal to 1 To, and then the Columns.Count again. This .Columns is, you know, the columns of this table; it's the same as writing it in this fashion. And then this is going to be tCol, and here let's try to print the 
value first. Now let's trim this, or we can just say Cell, with the row number and the column number, .Range.Text. Okay, let's give it a try with F5. So it's running, just give it a moment. As you can see, it starts to print out this one, and then this. Let me just do this. Okay, so there you can see side by side how the values are getting printed out: portfolio 5, and then the value here, and then now it's going to print out this. As you can see, the format is not exactly the way we want. Row three, column one does not exist, so maybe we can just have it run through irrespective. For that you'll need to add On Error Resume Next, and I'll let this line go through and then rerun this part. Okay, so it printed the last value here, 780.

So now I'll try to write this to cells instead of printing it out here. Again, we might have to add a handler based on the document that you have; you need to play around more. I'll just add one row down, starting from here, and then for the column let it be tCol, that is fine. Let's run this. Okay, so it's not in a perfect manner, but it's there, so you can organize it better. Let's clean this data a bit. First let's Clean this value; this will remove, you know, new lines or illegal characters, and then we'll add one more, we'll Trim as well, and then within this we'll add whatever we constructed earlier. Let's rerun this. Okay, so now it's much better.

Now let's quickly cover this document as well, the annual report. Here we have two tables, so let's try and print these out as well. Here's one table with four columns, and here's another one. I'm going to keep this part; this is not going to work as it stands. So if I try to run this now, it should print the first table only. Now let me make a copy of this file as well so that we can open it side by side. So this is the table that we just printed out. Okay, we can copy and paste like that. So you can see the last value is 
652, and then the first one is 132, for Product 1 to Product 5.

Okay, now let's loop through both the tables, and then we're going to print out these tables in a different location. Let's add one more variable, table count, and then we'll declare one more variable; let's call this tableIndex. Now we can have this handler run this, and within this handler, instead of table one, we'll say this tableIndex, Next table. So it loops through all the tables which are there within the document. Now that we've already changed the index here, before we print out, let's make this dynamic. Let's add some indicator so that we'll know which table the data is from. Maybe we can identify the last row and print it out, or we can also do that in a simpler manner as well. Dim rowIndex. Now let's set this rowIndex, or I think we should construct this dynamically. We'll say Dim lastRow As Long, and here let's set up the last row: on Sheet1, based on our A column, we'll identify the last row with End(xlUp).Row. So this should just return us the last row with data. At the moment it's saying seven is the last row with data, so if I get rid of this it should just give one. We can use this to get to the new row all the time, so we have to have this run all the time. Now let's try and run this first and then we'll come back.

Okay, so here you can see the first table, and here's the second table. Now let me add a small bifurcation for these two tables. Let me move this within the table loop, and let's call this one; it'll just show a table bifurcation. Let's add some more rows there. I'm going to get rid of this. Okay, this was not defined here yet, that's the reason why it's giving an error. Okay, so here's our first table and the second one.

Okay guys, that's going to be all for this video. I hope you found it informative. If so, do not forget to leave a like, and please do subscribe for upcoming videos. Thank you so much for watching.
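The first half of the walkthrough (opening the PDF through Word and printing every paragraph) can be sketched roughly as follows. This is a minimal reconstruction from the narration, not the code shown on screen: the variable names follow what is spoken in the video where they can be made out, the file path is a hypothetical placeholder, and it assumes a reference to the Microsoft Word Object Library is set and that the installed Word version can open PDF files.

```vba
Option Explicit

' Sketch: open a PDF via Word and print each paragraph to the Immediate window.
' Assumes Tools > References > "Microsoft Word xx.0 Object Library" is checked.
Sub ReadFromPDF()
    Dim fileName As String
    Dim wApp As Word.Application
    Dim wDoc As Word.Document
    Dim pg As Word.Paragraph
    Dim wordLine As String

    ' Hypothetical path, replace with your own PDF file.
    fileName = "C:\Temp\sample.pdf"

    ' Start a hidden Word instance.
    Set wApp = New Word.Application
    wApp.Visible = False

    ' ConfirmConversions:=False skips the file conversion dialog.
    Set wDoc = wApp.Documents.Open(FileName:=fileName, _
                                   ConfirmConversions:=False, _
                                   ReadOnly:=True)

    ' Loop through every paragraph of the converted document.
    For Each pg In wDoc.Paragraphs
        wordLine = pg.Range.Text
        Debug.Print wordLine
    Next pg

    ' Close without saving, quit Word, and release the objects.
    wDoc.Close SaveChanges:=wdDoNotSaveChanges
    wApp.Quit
    Set wDoc = Nothing
    Set wApp = Nothing
End Sub
```

Run it with F5 from the VBA editor and watch the Immediate window (Ctrl+G) for the output. Word converts the PDF to an editable document on open, so the paragraph breaks may not match the PDF layout exactly.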
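The table part of the walkthrough (nested row and column loops, On Error Resume Next for cells that do not exist, Clean and Trim on the cell text, and a last-row lookup so each table lands below the previous one) might look like the sketch below. It is assembled from the narration under assumptions: `Sheet1` is assumed to be the code name of the target worksheet, the file path and the "Table n" separator label are hypothetical, and the names `tRow`, `tCol`, `tableIndex`, and `lastRow` are best guesses at what is spoken.

```vba
Option Explicit

' Sketch: copy every table of a PDF (opened through Word) into Sheet1.
' Assumes a reference to the Microsoft Word Object Library is set.
Sub ExtractPdfTables()
    Dim fileName As String
    Dim wApp As Word.Application
    Dim wDoc As Word.Document
    Dim tableIndex As Long, tRow As Long, tCol As Long
    Dim lastRow As Long
    Dim cellText As String

    ' Hypothetical path, replace with your own PDF file.
    fileName = "C:\Temp\sample.pdf"

    Set wApp = New Word.Application
    wApp.Visible = False
    Set wDoc = wApp.Documents.Open(FileName:=fileName, _
                                   ConfirmConversions:=False, _
                                   ReadOnly:=True)

    ' Handler: only run the loops when the document has tables.
    If wDoc.Tables.Count > 0 Then
        For tableIndex = 1 To wDoc.Tables.Count
            ' Last used row in column A, so each table starts below the previous one.
            ' Note: this relies on column A being filled in the last row of each table.
            lastRow = Sheet1.Cells(Sheet1.Rows.Count, "A").End(xlUp).Row

            ' Separator line ("bifurcation") marking where each table begins.
            Sheet1.Cells(lastRow + 2, "A").Value = "Table " & tableIndex

            With wDoc.Tables(tableIndex)
                For tRow = 1 To .Rows.Count
                    For tCol = 1 To .Columns.Count
                        ' Merged cells mean some row/column pairs do not exist,
                        ' so skip over the resulting error and leave the cell blank.
                        cellText = vbNullString
                        On Error Resume Next
                        cellText = .Cell(tRow, tCol).Range.Text
                        On Error GoTo 0

                        ' Clean strips the end-of-cell marker and line breaks,
                        ' Trim strips leading and trailing spaces.
                        Sheet1.Cells(lastRow + 2 + tRow, tCol).Value = _
                            Trim$(WorksheetFunction.Clean(cellText))
                    Next tCol
                Next tRow
            End With
        Next tableIndex
    End If

    ' Close without saving, quit Word, and release the objects.
    wDoc.Close SaveChanges:=wdDoNotSaveChanges
    wApp.Quit
    Set wDoc = Nothing
    Set wApp = Nothing
End Sub
```

Recomputing `lastRow` at the top of each table iteration is what keeps the second table from overwriting the first. As the video notes, the output is not always perfectly aligned with the PDF, so expect to adjust the handling for your own documents.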
Info
Channel: VBA A2Z
Views: 12,022
Rating: 4.953846 out of 5
Keywords: extract pdf, read prd, without, reader, acrobat api, using vba, pdf to exce, convert pdf to excel, extract pdf data, pdf data extract, extract pdf form data, extracting pdf form data, extract pdf data to excel, excel data from pdf, excel data from pdf form, pdf form data extracted, pdf to excel, pdf extracted to excel, pdf to excel table, pdf form to excel table
Id: KoTuy92uboY
Length: 20min 42sec (1242 seconds)
Published: Sat Jun 06 2020