043 - Adding a Text Layer to a PDF with OCR Software (Demonstrating OCRMYPDF)

Captions
Hello everybody. In this video we're going to cover how to use an OCR (optical character recognition) utility, and demonstrate how the same utility can be used to compress a file to reduce its size.

I have this document, which is part of a book called King Solomon's Mines by H. Rider Haggard. It's a great book and I really recommend it. But this PDF has no textual overlay: it has no text data. As humans, we can look at it and see that there is text on the screen, but the computer does not recognize any of it as textual data. For a computer this is only image data, a series of images put together and stored in the PDF format.

To extract the textual data, I'm going to use a utility known as ocrmypdf. If we take a look at its help page, we see it's a very powerful utility with a lot of different options. You can create highly customized approaches to performing the OCR process as well as compressing the data. With that in mind, we're going to do two things: OCR this document and compress it. You see this optimize option here, so we're going to use the -O flag with the number 2, and this will reduce the file size.

Let me clear the screen and use the ls command with -lh to see what files we have. We have this document, named in all caps, King Solomon's Mines, with a .pdf extension, and the file size is 4.3 megabytes. We want to both run that file through the OCR process and optimize the data, so let's get started: ocrmypdf, using the -O option with a setting of 2, which will optimize it more. We then hand it the all-caps King Solomon's Mines file, and the file we'll be writing to is ksm.pdf.

Here we go. You see it's taking a little bit of time, not too bad though. It does take more time with larger files. All right, it performed the OCR process, and now it is optimizing the data and should be done in just a little bit. We need elevator music right here. I don't know that melody.

All right, let's clear the screen and take a look with ls -lh. We see we have ksm.pdf, so look at the difference in size between the all-caps King Solomon's Mines file and the KSM file. Those are both the same document and they will look the same, but notice that KSM is half the file size. That shows you the power of optimization.

Let me take a look at those files over here. If you remember, King Solomon's Mines was not searchable. If I open up ksm.pdf and fit it to the width, there we go, now I can select the text, copy it with Ctrl+C, then run over to a LibreOffice document and paste that information. Look at that. That information has now been turned into textual data that can be copied, pasted, and searched. For instance, I go back over to this document and, using Ctrl+F to search it, I can search for Haggard and it gives me the locations of Haggard. There we go.

So that is a demonstration of how to use an OCR utility as well as compressing the data. Thank you very much for watching, and have a wonderful day. Bye.
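The steps walked through in the captions (run ocrmypdf with -O 2, then compare the two file sizes) can be sketched in Python as follows. This is only an illustrative wrapper, not something shown in the video: the helper names build_ocrmypdf_command and size_ratio are hypothetical, and "input.pdf" is a placeholder because the exact input filename is not spelled out in the captions. The sketch assumes ocrmypdf is installed and on the PATH; higher optimization levels may also rely on extra tools being installed.

```python
import os
import shlex
import shutil
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, optimize_level=2):
    """Return the argument list for: ocrmypdf -O <level> <input> <output>."""
    # ocrmypdf's -O / --optimize option takes a level from 0 to 3.
    if optimize_level not in (0, 1, 2, 3):
        raise ValueError("optimize_level must be 0, 1, 2, or 3")
    return ["ocrmypdf", "-O", str(optimize_level), input_pdf, output_pdf]


def size_ratio(original_pdf, optimized_pdf):
    """Return optimized size divided by original size (0.5 means half)."""
    return os.path.getsize(optimized_pdf) / os.path.getsize(original_pdf)


if __name__ == "__main__":
    # "input.pdf" is a placeholder name; "ksm.pdf" is the output name
    # used in the video.
    cmd = build_ocrmypdf_command("input.pdf", "ksm.pdf")
    print(shlex.join(cmd))

    # Only run the tool when it and the placeholder input actually exist.
    if shutil.which("ocrmypdf") and os.path.exists("input.pdf"):
        subprocess.run(cmd, check=True)
        print(f"size ratio: {size_ratio('input.pdf', 'ksm.pdf'):.2f}")
```

A ratio near 0.5 would match what the video reports, where the output file is about half the size of the 4.3 megabyte original. Actual savings depend on the scanned images in the input.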
Info
Channel: Nathan Tonning
Views: 1,322
Keywords: Linux, ocrmypdf, PDF editing
Id: AWZWbVnGYos
Length: 6min 8sec (368 seconds)
Published: Sun Mar 12 2023