How To Merge and Split PDF Files Using Python (Python Automation Tutorial For Beginners)

Captions
Hey, how's it going, guys? In this tutorial we'll learn how to merge and split PDF files using Python. If you are working with confidential PDF files and don't want to use those free tools on the internet, and you don't want to pay for a PDF subscription service to access these features, then this is a pretty handy script to save you some headaches. To merge and split PDF files I'll be using the Python library called PyPDF2. PyPDF2 is a free and open-source, pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. To use the PyPDF2 package, we'll first install the library. So open your terminal, activate your environment first, and install the PyPDF2 package with the command pip install pypdf2, and make sure that everything is lowercase. Press Enter, and once the package is installed we can go ahead and start with the examples.

For the first example, I'm going to show you how to split a PDF file into pages. Here is my project folder, and in this example1 folder I have this PDF file. Let me show you the content of this PDF file: it has eight pages, and I want to split them into individual files. Let me close this file. Here is my Python script. From the PyPDF2 module I'm going to import the PdfWriter class and the PdfReader class; these two classes are all we need to split and merge PDF files. Now, to split a PDF file, we first need to open the file, so here I'm going to type with open, and insert the file path, which is located under this example1 folder. Let me grab the file name first. I need to open the file as binary, so for the mode I'm going to set that to "rb". Once we open the PDF file, we need to create a PdfReader object, so we pass the file object to the PdfReader class, and I'll name the output pdf_reader. There are a couple of things we can do with this PdfReader object, but first I want to figure out the total number of pages, so I'll
create a variable called total_pages. From the pdf_reader object we can access all the pages by referencing the pages attribute, and use the len function to count how many pages this PDF file has. Now, to split the PDF file into individual pages, I'm going to insert a loop: for index, page in pdf_reader.pages, and because I want to extract the page number, I'm going to wrap pdf_reader.pages with the enumerate function so it returns the index number as well. Inside the loop I'm going to create a PdfWriter object: pdf_writer = PdfWriter(). Next we need to add the PDF page to this PdfWriter object; to add the page we use the add_page method and pass in the page object. This PdfWriter object represents the PDF file itself, and to save the PDF page we insert another with statement with the file path of the target file. For the folder, I'm going to type the directory name first, followed by the file name, and for the file name I'm just going to name it page_ followed by the page number, then .pdf. Here we need to set the mode to write binary, "wb", and name this object output. To create the output file, we call the pdf_writer object's write method and write the content to the output object, and that's it. Now if I go ahead and run the Python script, it finishes, and only a single PDF file is created. You see, I forgot to insert the variable: this needs to be a formatted string, and I need to insert the page number, which is going to be index + 1. Let me try again. All right, if I look at my files, these are the eight pages from the PDF file.

Now let's go to example two, which is to merge PDF files (the heading has a typo; it should say "merge PDF files"). Here in my example2 folder I have these PDF files that I want to merge into a single file. Again, I'm going to grab the import statement and copy and paste it, and here I'm also going to import the os module, and that's because
I can use the os.listdir function to list all the files in a folder. Here I insert the folder path, which will be example2. If I run this code block, os.listdir returns all the files and folders in the given directory path. Because I'm only interested in PDF files, I want to create a list that contains just the file names of the PDF files, so I'm going to write a list comprehension, using an underscore as the placeholder for each item in this output, and if the file name ends with .pdf, I store the file name. I'll name the output pdf_files. Next I'm going to create a PdfWriter object. To merge the PDF files, I need to insert a loop that iterates over each PDF file in the pdf_files list. Inside the loop I need to open each file individually; the file is located in the example2 folder, and the pdf_file variable holds the file name, so here I'm going to insert the pdf_file variable. We need to set the mode to read binary, and I'll name the object file. Now, to load the PDF file, I'm going to create a PdfReader object and pass in the file object. The next step: this PdfReader object represents the PDF file itself, and we need to iterate over each page individually, so here I'm going to say for page in pdf_reader.pages, and add each page to the PdfWriter object using the add_page method, passing in the page object. So with these two loops, the first loop iterates over each file and the second loop iterates over each page in each file. Once the loops are finished, we can go ahead and save the file. In this case I'm going to save the output to this output folder, and I'll name the file merge.pdf. For the mode we need to set this to write binary, and I'll name the object output. Now, from the PdfWriter object we can create the file using the write method, passing in the
output object to receive the PDF file content. This is the entire script to merge PDF files, and it's only 16 lines of code. Now let me terminate this session and press F5 to run the script. I have a typo, so let me fix that. All right, let me try again. This time I'm not running into an error, so if I go into my output folder, it has the merged PDF file. If we look at the file itself, after the merge my PDF file now has 15 pages. So that's everything I wanted to cover in this video. Hopefully you guys find this video useful; feel free to post your feedback and your questions in the comment section below, and don't forget to subscribe. I'll see you guys next time. Bye.
Info
Channel: Jie Jenn
Views: 2,762
Keywords: python automation, pdf merge, pdf split, python project
Id: C58_VRClsFs
Length: 9min 23sec (563 seconds)
Published: Wed Feb 08 2023