Extract Text from Video - images | Tesseract

Captions
Hello, Dr Pi. Today we need you to study for an exam, but we can't provide you with any books or anything, so you'll just have to work it out by yourself. So, well... nothing? Nothing at all? You know what, no, no, no, we can't, sorry. No. So yeah, let me know when you've done it. Thanks.

Hello everybody. We're going to be looking at how to extract text from a video. In my case this is going to be a YouTube video that I need to extract text from so that I can study some notes: there's a video with copious amounts of exam questions that I need to study and prepare for, as I've not been given any official materials. What we're going to use is pytesseract, an optical character recognition (OCR) tool for Python; that is, it will recognize and read the text embedded in images. Basically, a video is a bunch of images, typically 25 per second, as we all know. We don't want to extract 25 images for every second of video, or not in this case anyway, so what we want to do is extract an image every second or every four seconds, then read that image file, or scan it with OCR, make some sense of it and extract some text.

How are we going to do it? We're going to use Python, PIL and pytesseract. So let's have a look at what the start of the code is going to look like. We're going to import pytesseract; you'll also need Tesseract OCR itself and libtesseract, which is basically a developer-tools extra bit for Tesseract. From PIL (Pillow) we're going to import something called Image. We'll import nltk, which we can use to do some machine learning and natural language processing in due course, and we'll import numpy just in case. We're going to import os so that we can create some folders to store our images, and we'll import cv2.

The first thing we want to do is set up a folder to contain our images, because then we will have somewhere to store them, ready to process. So the first thing we'll do is write a
function, and we'll just call that files. It removes the image frames folder if it exists and then creates it again: we make a directory called image frames. The folder name is kept in one place, so if we ever want to change it, we only need to change it there. We also specify the source video path; if I bring up the sidebar, you'll see I'm just putting the video in the same directory as all of the code, so nothing fancy there. We use cv2.VideoCapture, which is what actually handles the video side of things, and all the function does is return the source video once we've made all the folders and files.

Once we've done the setup, that is, opened the video and made the correct folders, we're actually going to process it. I'm calling this function process, and it takes the object returned by the previous function. While the video is open, we set ret and frame to what gets read from the source video, and when we get to the end, it breaks out of the loop. Then we name each frame and save it as a PNG, storing each PNG in succession, but skipping most frames: for my purposes I'm going to store one frame every four seconds, but obviously change that to suit your needs. That's done with the modulo (modulus) operator and 100: if the frame number is exactly divisible by 100, the remainder is zero, and it extracts the frame with cv2.imwrite. According to the OpenCV documentation, imwrite saves the image to the specified file, and the image format is chosen based on the filename extension. In
general, only 8-bit single-channel or 3-channel (with BGR channel order) images can be saved using this function. We don't need to worry about that; it works. Next, index equals index plus one, so we just keep going until there are no more frames to process. Then there's cv2.waitKey, which basically waits for a key press, so if we want to quit, it looks for q for quit. Finally, we release the capture object and call cv2.destroyAllWindows. So what we've done so far is set up some folders, read the video, and extracted one image every four seconds: we've taken a snapshot of the video every four seconds.

Next we actually need to get the text. So far so good. What we need to say is: for each of the images in the folder we've just created and put images into, so for i in os.listdir, which covers all of the images in that directory. I'm just printing the file name for my own benefit; you can remove that. Then my example equals Image.open, so I open the image, and then we pass that to pytesseract.image_to_string. I've been experimenting with different languages, and to work with languages other than English you need to go to GitHub and download the additional language data file. So I'm going to set that back; in fact, I think if you don't specify it, it just defaults to English anyway.

Then we just have a main driver, which runs the three functions: the first one sets up the files and the folder structure, the second one creates PNGs from the video every four seconds, and get_text goes through each of those images and tries to extract some text.

So, ready to watch it run? As you can see, I've tested it already; it would have been a rubbish video if I'd
kept testing it and it failed. I've got my test video there, test2.mp4; I'll just delete that, for clarity, so you can see. Don't worry about OBS; that's just my video recording directory. So if you're ready, let's run it: python... actually, I haven't shown you the video yet, so let's go to Open Containing Folder and play the video I'm going to extract the data from. This is the one with some example exam questions in it. It's about 40 minutes long, and it's just exam questions that flash up every 10 to 20 seconds or so. I really don't want to sit through 40 minutes just to see the screen change every 20 or 30 seconds, and you'll see that the page should change shortly. This is why I only really want a frame every four seconds: any more often than that and you just end up with the same page over and over again.

You hate it when your phone rings in the middle of recording something.

Right, okay, let's go. Extracting frames: 100, 200, 300 and so on... and we've got an error. I think it's because of the language I set, so let's just get rid of lang equals, leave that blank and run it again. It's probably "en" or something. Again, if you go to the GitHub page, you can see all the different languages available. It couldn't load any languages. Okay, let's try "en"; if this fails, we'll have to look it up on the GitHub page. No, that didn't work, so let's just look that up quickly. Let me zoom in on this a bit: here are all the currently supported languages, and I can see in the notes down here that you use eng, in lowercase. For Chinese Simplified it's chi_sim, and you can specify two languages at once. Let's search for eng... there we go: it defaults to eng if not specified.
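A minimal sketch of the three steps described above (files, process and get_text), assuming opencv-python, Pillow and pytesseract are installed along with the Tesseract engine and its eng language data. The folder name, the frame file naming and the helpers should_save and frame_path are hypothetical, not taken from the video's code. The OpenCV, Pillow and pytesseract imports sit inside the functions that need them so the pure helpers can be used (and tested) on their own. cv2.waitKey and cv2.destroyAllWindows are left out because they only matter when frames are shown in a window with cv2.imshow. The lang argument takes Tesseract's three-letter codes such as eng, fra or chi_sim, and several can be combined with + (for example eng+fra) as long as the matching traineddata files are installed.

```python
import os
import shutil

# Hypothetical settings: folder name and step are illustrative.
FRAMES_DIR = "image_frames"
STEP = 100  # save every 100th frame: about one frame every 4 s at 25 fps


def should_save(index: int, step: int = STEP) -> bool:
    """True when the frame index is an exact multiple of step."""
    return index % step == 0


def frame_path(index: int, folder: str = FRAMES_DIR) -> str:
    """Zero-padded names so that sorted() returns frames in video order."""
    return os.path.join(folder, f"frame{index:06d}.png")


def files(video_path: str, folder: str = FRAMES_DIR):
    """Recreate an empty output folder and open the video with OpenCV."""
    import cv2  # imported here so the helpers above work without OpenCV

    if os.path.exists(folder):
        shutil.rmtree(folder)
    os.makedirs(folder)
    return cv2.VideoCapture(video_path)


def process(src_vid, step: int = STEP, folder: str = FRAMES_DIR) -> int:
    """Save every step-th frame as a PNG and return how many were saved."""
    import cv2

    index = saved = 0
    while src_vid.isOpened():
        ret, frame = src_vid.read()
        if not ret:  # no frame could be read: end of video or read error
            break
        if should_save(index, step):
            cv2.imwrite(frame_path(index, folder), frame)
            saved += 1
        index += 1
    src_vid.release()
    return saved


def get_text(folder: str = FRAMES_DIR, lang: str = "eng") -> dict:
    """OCR each saved PNG in frame order; returns {filename: text}."""
    import pytesseract
    from PIL import Image

    results = {}
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".png"):
            continue
        with Image.open(os.path.join(folder, name)) as img:
            results[name] = pytesseract.image_to_string(img, lang=lang)
    return results


def main(video_path: str = "test2.mp4") -> None:
    """Driver: set up folders, extract frames, then OCR them."""
    cap = files(video_path)
    print(process(cap), "frames saved")
    for name, text in get_text().items():
        print(name, text.strip()[:80])
```

Calling main("test2.mp4") would recreate the frames folder, save every 100th frame and print the start of each page's OCR text; skipping the get_text call leaves just the extracted frames.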
Right, let's just run it one more time. This has been working, so I think it was testing it with the French language data that broke it. So yeah, lowercase eng, and there we go, we've got some text. I'm going to need to experiment with the modulus number, because I think one frame every four seconds is potentially too many, so I could maybe go up to 150. As you can see, I've got "answer E" on its own. That's because in the video the question sometimes spans two pages, so it's not actually the fault of pytesseract or the code; it's the layout of the text in the video, I believe. Yeah, there we go: "which option provides the best performance for running OLTP workloads in Oracle Cloud Infrastructure and Exadata DB systems", answer A. So there's another one. I think that one has actually missed something out, not sure; I'll go back and investigate. Obviously it's never going to be 100% perfect, and there will always be bits it can't read, but in general I think that's pretty good. So from a video of text about 40 minutes long, rather than sit through 40 minutes of exam questions, I could set this to maybe... let's see, how many files did I get there? That's too many: it was 12.
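The modulus is a frame count, so the time between saved images depends on the frame rate. A minimal sketch of that arithmetic, assuming a constant frame rate; the function names are hypothetical. At the 25 fps mentioned in the video, a step of 100 is 4 seconds, 150 is 6 seconds and 200 is 8 seconds, and a 40-minute video has 40 * 60 * 25 = 60,000 frames. Rather than assuming 25 fps, the real values can be read from an open capture with cap.get(cv2.CAP_PROP_FPS) and cap.get(cv2.CAP_PROP_FRAME_COUNT).

```python
import math


def step_for_interval(seconds: float, fps: float) -> int:
    """Frame step that saves roughly one image every `seconds`."""
    return max(1, round(seconds * fps))


def interval_for_step(step: int, fps: float) -> float:
    """Seconds of video between two saved frames."""
    return step / fps


def expected_images(total_frames: int, step: int) -> int:
    """Number of indices in 0..total_frames-1 that are multiples of step."""
    return math.ceil(total_frames / step) if total_frames > 0 else 0
```

For example, expected_images(60000, 100) is 600 and expected_images(60000, 200) is 300. A page that stays on screen for at least step consecutive frames (step / fps seconds) is guaranteed to be captured at least once, so the step should stay below the shortest time a question is displayed multiplied by the frame rate.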
So let's do every 200th frame; with every 200th I should only get half as many files. Clear that. Right, okay, and again. So we should only end up with eight files, eight lots of processed text. There we go. As I say, there may be issues where things get missed, not sure, but when we look at the actual video again, the questions can sometimes span two pages. So there we have "network bandwidth is variable", answer A. Okay, so we had three... what was this? This was question eight, so it looks like we may have skipped from question three to question eight. All it really requires is a little bit of fine-tuning of the index step, so let's bring that back to 100.

Once we've got this text, we can merge all of the text responses from each converted page back into one big text file, and then we can clean it up a bit. I think you'll agree this is pretty impressive: it's extracted a nice lot of clean text without any newline characters or strange ASCII characters in it. So it's a good starting point, but you could also use it for machine learning, and that's why I've imported nltk and its tokenizer. If you're familiar with bag of words or POS tagging (part-of-speech tagging), we can get all of the words from a video and begin doing some natural language processing analysis on them. You could also run all this within pandas, and it may actually be beneficial to do so, but for the sake of this video I thought it would be good to run it within VS Code and show the processed output in the command line as it happens. Anyway, I've done it in VS Code, and there we go.
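Merging the per-page text and counting words could look roughly like this. It is a sketch using only the standard library: merge_pages and bag_of_words are hypothetical helpers, the regex tokenizer is a simple stand-in for nltk.word_tokenize, and dropping consecutive identical pages is an added assumption aimed at the repeated-page problem mentioned earlier (OCR noise can make two captures of the same page differ slightly, in which case both are kept).

```python
import re
from collections import Counter


def clean(text: str) -> str:
    """Collapse runs of whitespace, including newlines, into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def merge_pages(page_texts: list[str]) -> str:
    """Join OCR output page by page, skipping empty pages and
    consecutive repeats (the same page captured more than once)."""
    merged = []
    for text in page_texts:
        page = clean(text)
        if page and (not merged or page != merged[-1]):
            merged.append(page)
    return "\n\n".join(merged)


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a stdlib stand-in for an NLTK tokenizer."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))
```

Given the {filename: text} results of a get_text-style function, merge_pages(list(results.values())) returns one string that can be written to a file such as all_questions.txt (a hypothetical name). For actual part-of-speech tagging, nltk.pos_tag works on the tokens returned by nltk.word_tokenize once the required NLTK data packages have been downloaded with nltk.download.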
We have successfully extracted text from a YouTube video. If you're not familiar with how to download a YouTube video, there's plenty of information on Google about how to do that: you can download it via an online converter or downloader, or you can do it with your own Python code, so I'll leave that with you.

Thank you for watching, and if you've got any questions, ideas or suggestions for future videos, please get in touch. I think Scrapy's probably on the sidelines for the time being; if you'd like to see more Scrapy videos ASAP, let me know. Would you prefer some Scrapy, or would you like to see some more clever stuff with OCR, pytesseract and extracting text from images? Although this is a video, it really is just a sequence of images, and as you can see, we have still images from a video. If you didn't even want to extract the text, you could just run the first part of this and skip get_text: if you just comment out get_text, you end up with a load of frames. So there we go. Thank you for watching, and don't forget to subscribe. Thank you.
Info
Channel: Python 360
Views: 18,176
Keywords: pytesseract, text recognition, ocr tesseract, ocr opencv, python ocr, python text recognition, python 360, PIL, Pillow, Python YouTube Extract text, ocr video images, text from youtube video, image to text, convert video to text, Extract Text from Video - images
Id: EkSaIJTruTA
Length: 18min 20sec (1100 seconds)
Published: Mon Jul 12 2021