Extract Text from Video - images | Tesseract

Video Statistics and Information

Captions

Hello Dr Pi. Today we need you to study for an exam, but we can't provide you with any books or anything, so you'll just have to work it out by yourself. So, well, nothing? Nothing at all? No, we can't, sorry. So let me know when you've done it. Thanks.

Hello everybody. We're going to be looking at how to extract text from a video. In my case this is going to be a YouTube video that I need to extract text from so that I can study some notes. There's a video with copious amounts of exam questions which I need to study and prepare for, as I've not been given any official materials.

What we're going to use is pytesseract, which is an optical character recognition (OCR) tool for Python; that is, it will recognize and read the text embedded in images. Basically a video is a bunch of images, as we all know, typically 25 per second. We don't want to extract 25 images for every second of video, not in this case anyway. What we want to do is extract an image every one second or every four seconds, then scan that image file with OCR, make some sense of it and extract some text.

How are we going to do it? We're going to use Python, PIL and pytesseract. Let's have a look at what the start of the code is going to look like. We're going to import Tesseract OCR and libtesseract, which is basically a developer-tools extra for Tesseract, and PIL. We're going to import something called ones.image, and NLTK, which we can use to do some machine learning and natural language processing in due course. We'll also import numpy just in case, os so that we can create some folders to store our images, and cv2.

The first thing we want to do is set up a folder to contain our images, because then we will have somewhere to store them ready to process. First we'll write a function, which we'll just call files, and what we'll do is
remove the image frames folder if it exists, and then, if it doesn't exist, we'll make a directory called image frames, which is basically what we've done there. If we ever want to change the name of it, we can change it here, and we just need to change it in one place.

We'll also specify the source video path. If I just bring up the sidebar, you'll see I'm just going to put the video in the same directory as all of the code, so nothing fancy there. We're going to use cv2.VideoCapture, which is what's actually going to handle the video side of things, and all we're going to do is return the source video once we've made all the folders and files.

Once we've done the setup side of things, so we've set up the video and made the correct folders, we're actually going to process it. What we need to do is write a function, which I'm just calling process, and it's going to take the object returned from the previous function. We're going to say: while the video is open, set ret and frame to what gets read from the source video, and when we get to the end it will break.

Then we name each frame and save it as a PNG. We're going to store each PNG in succession, skipping the frames in between. For my purposes I'm going to store one frame every four seconds, but obviously change that to suit your needs. So it's modulus 100: if the frame number is exactly divisible by 100, the modulus returns zero, and it will extract the frame and call cv2.imwrite. That's the cv2 function which saves the image to the specified file. The image format is chosen based on the file name extension. In general, only 8-bit single-channel or 3-channel (with BGR channel order) images can be saved using this function. We don't need to
worry about that; it works. Next, index equals index plus one, so obviously we're just going to keep going until we've got no more images to process. Then there's the cv2.waitKey check: basically it waits for a key press, so if we want to quit, it's going to look for q for quit. Then we release the objects and we destroy all windows, which is a method of cv2.

So what we've done so far is set up some folders, read the video and extracted one image every four seconds; we've taken a snapshot every four seconds of the video. So far so good. Next we actually need to get the text. What we need to do is go through each of the images in the images folder that we've just put images into. We say for i in os.listdir, which is just going to say "for all of the images in that directory". I'm just printing that for my own benefit; you can remove that. Then my example equals Image.open, so I'm going to open the image, and then we're going to pass that to pytesseract's image_to_string.

I've been experimenting with different languages, and you need to go to GitHub and download the additional data file to work with languages other than English. So I'm just going to set that back. In fact, I think if you don't specify it, it just defaults to English anyway. Okay.

Then we just have a main driver which is going to run the three functions. The first one sets up the files and the folder structure, the second one creates PNGs from the video every four seconds, and get_text will go through each of those images and try to extract some text for us.

So if you're ready to watch it run: as you can see, I've tested it already. It would have been a rubbish video if I'd kept testing it and it failed. I've got my test video there, test2.mp4. I will just delete that, just for clarity, so you can see. Don't worry about obs, that's just my video
recording directory. So if you're ready, let's run it. So, python... I haven't actually shown you the video. If we just go to open the containing folder, I'm going to play you the video that I'm going to extract the data from. It's here, and this is the one with some example exam questions in it. It's about a 40-minute-long video, and it's just a video of exam questions that flash up about every 10 to 20 seconds. I really don't want to sit through 40 minutes just to see a screen change every 20 or 30 seconds. You'll see that the page should change shortly. This is why I only really want to do it every four seconds, because more often than that you just end up with the same page over and over again. Don't you hate it when your phone rings when you're in the middle of recording something?

Right, okay, let's go. So: extracting frames 100, 200, 300 and so on, and we've got an error. Let's just get rid of lang equals; I'm just going to leave that blank and run it again. It's probably en or eb or something. Again, if you go to the GitHub page you can see all the different languages available. "Couldn't load any languages." Okay, let's try en. If this fails then we'll have to look it up on GitHub. No, that didn't work. Okay, let's just look that up quickly.

Let me just zoom in on this a bit, and you can see all the currently supported languages. I can see in the notes down here somewhere that you can use eng, but it's lowercase eng. There's Chinese Simplified, and you can specify two languages. So let's search for eng. There we go: defaults to eng if not specified. Right, let's just run it one more time. This has been working, so I think when I was testing it with French, attempting to translate it into French, it broke it. So yeah, eng, lowercase eng, and there we go, we've got some
text. I'm going to need to experiment with modifying the modulus number, because I think every four seconds is potentially too many, so I could go up to maybe 150. As you can see, I've got "answer E". Okay, that's because in the video the question sometimes spans two pages, so it's not actually the fault of pytesseract or the code; it's because of the layout of the text in the video, I believe. Yeah, there we go: "Which option provides the best performance for running OLTP workloads in Oracle Cloud Infrastructure and Exadata DB Systems?" Answer A. So there's another one. I think that's actually missed something out there; I'm not sure, I'll go back and investigate that. Obviously it's never going to be 100% perfect, there are always going to be bits that it can't read, but I think in general that's pretty good.

So from a video of text which was about 40 minutes long, rather than sit through a 40-minute video of exam questions, I could set this to maybe... how many files did I get there? That's too many; that was at 12. So let's do every 200th. With every 200 I should only get half as many files. Clear that down. Right, okay, and again. So we should only end up with eight files, eight lots of processed text. There we go. As I say, there may be issues where it has missed something, I'm not sure, but in the actual video the questions can span two pages sometimes. Okay, so there we have "network bandwidth is variable", answer A. So we had three, and what was this? This was question eight. So it looks like we may have skipped from question three to question eight. All it really requires is a little bit of fine-tuning with the index. Let's bring that back to 100.
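
The link between the modulus value and the snapshot interval discussed here is plain arithmetic: at 25 frames per second, keeping every 100th frame gives one snapshot every four seconds, and doubling the modulus halves the number of saved frames. The following is a minimal sketch of that arithmetic, assuming a constant frame rate. The helper names `frame_step` and `kept_frames` are hypothetical and do not appear in the video.

```python
def frame_step(fps, seconds_between_snapshots):
    """Return the modulus to use so one frame is kept per interval.

    Assumes a constant frame rate. With OpenCV the rate could be read
    from an opened capture via cap.get(cv2.CAP_PROP_FPS) instead of
    being hard-coded.
    """
    return max(1, round(fps * seconds_between_snapshots))


def kept_frames(total_frames, step):
    """Return the frame indices that pass the `index % step == 0` test."""
    return list(range(0, total_frames, step))


# 25 fps and one snapshot every 4 seconds gives the modulus used in the video.
step = frame_step(25, 4)
print(step)
print(kept_frames(450, step))
```

Raising the interval reduces duplicate pages but risks skipping a question that is on screen for less time than the interval, which matches the jump from question three to question eight seen in the run above.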
But obviously, once we've got this text, we can merge all of the text responses from each converted page back into one big text file and then clean it up a bit. I think you'll agree that this is pretty impressive: it's extracted a nice lot of clean text without any newline characters or strange ASCII characters in it. So I think this is pretty good, and it's going to be a starting point. You could also use it for machine learning, and that's why I've imported NLTK and tokenize. If you're familiar with bag of words or POS (part-of-speech) tagging, then we can go off and get all of the words from a video and begin doing some natural language processing analysis on it.

You could also run all this within pandas, so you may want to do that; it may actually be beneficial to use pandas. But I just thought, for the sake of this video, it would be good to run it within VS Code and show the processed output in the command line as it happens. Obviously it happens in pandas as well, but anyway, I've done it in VS Code, and there we go: we have successfully extracted text from a YouTube video.

If you're not familiar with how to download a YouTube video, there's plenty of info out on Google on how to do that. You can download via an online converter or an online downloader, or you can do it using your own Python code. So I'll leave that with you.

Thank you for watching, and if you've got any questions or any ideas or suggestions for future videos, then please get in touch. What would you like to see? I think Scrapy's probably on the sidelines for the time being; if you'd like to see some more Scrapy videos ASAP then let me know. Would you prefer some Scrapy, or would you like to see some more clever stuff with OCR and pytesseract and extracting text from images? Because really, although this is a video,
it really is just a sequence of images. As you can see, we have still images from a video, and if you didn't even want to extract the text, you could just run the first part of this; you wouldn't need to run get_text. If you just comment out get_text, you end up with a load of frames. So there we go. Thank you for watching, and don't forget to subscribe. Thank you.
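
The three steps described in the captions (folder setup, frame extraction, OCR) can be sketched roughly as follows. This is a reconstruction under assumptions, not the code shown on screen: the folder name `image_frames`, the file naming pattern, the helper `reset_frame_dir` and the `main` signature are all illustrative. It needs the third-party packages opencv-python, pytesseract and Pillow, plus a Tesseract install with the `eng` language data. The `cv2.waitKey` and `cv2.destroyAllWindows` calls mentioned in the video are left out because this sketch never opens a window.

```python
import os
import shutil

FRAME_DIR = "image_frames"  # assumed spelling of the folder named in the video
FRAME_STEP = 100            # keep every 100th frame (about 4 s at 25 fps)


def reset_frame_dir(frame_dir=FRAME_DIR):
    """Delete the frame folder if it exists, then recreate it empty."""
    if os.path.isdir(frame_dir):
        shutil.rmtree(frame_dir)
    os.makedirs(frame_dir)
    return frame_dir


def process(src_vid, frame_dir=FRAME_DIR, step=FRAME_STEP):
    """Save every `step`-th frame of an opened cv2.VideoCapture as a PNG."""
    import cv2  # imported lazily so the folder helper works without OpenCV

    index = 0
    while src_vid.isOpened():
        ret, frame = src_vid.read()
        if not ret:  # end of the video (or a read error)
            break
        if index % step == 0:
            # Zero-padded names keep alphabetical order equal to frame order.
            name = os.path.join(frame_dir, f"frame{index:06d}.png")
            print("Extracting frame", index)
            cv2.imwrite(name, frame)
        index += 1
    src_vid.release()


def get_text(frame_dir=FRAME_DIR, lang="eng"):
    """Run OCR on each saved frame and return one string per frame."""
    import pytesseract
    from PIL import Image

    pages = []
    for name in sorted(os.listdir(frame_dir)):
        with Image.open(os.path.join(frame_dir, name)) as img:
            pages.append(pytesseract.image_to_string(img, lang=lang))
    return pages


def main(video_path):
    """Driver: set up the folder, extract frames, then OCR them."""
    import cv2

    reset_frame_dir()
    src_vid = cv2.VideoCapture(video_path)
    process(src_vid)
    return get_text()
```

Calling `main("test2.mp4")` would run all three steps on the test file named in the video. Dropping the `get_text()` call from `main` leaves only the extracted frames, as described at the end of the captions.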

Info

Channel: Python 360

Views: 18,176

Keywords: pytesseract, text recognition, ocr tesseract, ocr opencv, python ocr, python text recognition, python 360, PIL, Pillow, Python YouTube Extract text, ocr video images, text from youtube video, image to text, convert video to text, Extract Text from Video - images

Id: EkSaIJTruTA

Length: 18min 20sec (1100 seconds)

Published: Mon Jul 12 2021