Find, search, get or copy text from scanned PDF using OneNote - free - 2021 - OCR

Captions
of feature deep dive. In today's episode, we are going to see one feature of OneNote which is very useful but commonly unknown. Very often we get scanned documents, or documents where you can't copy and paste the text, and we want that text copied somehow. That's where typically an OCR kind of software is required. If you have a scanner attached to your machine, it comes with some kind of OCR software, but not everyone has a scanner attached. So what do we do? Fortunately, OneNote has had this feature for a while, so let's see an example.

I'm going to show you a PDF file which is basically a contract. It is a typical stamp-paper-based contract: scanned, signed, sealed, done. Now let's say I need to sign this contract with someone else, but the original text of the contract didn't come from me; it came from the other party's lawyer, and I need that text. What do I do? That's scenario number one: I have to retype the whole thing.

An even more common scenario: I have a long scanned document. This one is three pages, but you could have contracts that run into hundreds of pages, all scanned, and now I want to search for a particular clause. How do I do that? Ctrl+F is not going to work, because what is a scanned document? Every page is essentially a picture. So what does that leave me with? Scroll, scroll, scroll, and read. Hands or brain? Hands used, brain not used, means inefficient. That's where OneNote comes into the picture.

So what is OneNote? Some of you may know, some of you may not, so I'll quickly explain. OneNote is note-taking software. First check whether it is already installed on your machine. If not, OneNote is free for life; you don't need any kind of subscription to use it. Go to onenote.com, it'll redirect to this page; just sign in with a Microsoft ID and download it. It's available on all devices: desktop, Android, iOS. In fact, on desktop it is available
in two different versions. If I search for OneNote, I get two types of OneNote: one is the desktop app, and the other is a Windows 10 app. Why are there two kinds of apps? Let's not bother about that. Some features are in the desktop app and some are in the Windows 10 UWP app, so I would say install both.

Now, what is OneNote? OneNote is just a collection of notebooks. You start with different kinds of notebooks: you go to File > New and create a notebook. Always create it on OneDrive. Why? Because the OneNote app will sync with it. Once you do that, you have a OneNote notebook. Within the notebook I have topics (OneNote calls them sections); you can create as many topics as you want, and within each topic you can have as many pages as you want. So practically you have an unlimited supply of notepads, notebooks, or diaries, whatever you want to call them. It's like an organizer diary: physically you're going to carry only one diary at a time, but here you can carry as many as you need to manage your work. So the first thing, when you start using OneNote, is to create many notebooks, organized logically depending on what kind of work you do.

Having done that, let's come back to the scanned document. We have the scanned document here, and I don't want to search like this. Now, in a PDF or any other application, if something is a picture that contains text and I want to search for it, how do I do that? There can be hundreds of different applications, so what is the link between any application that has images containing text and OneNote? OneNote can't integrate with an unlimited number of applications, so there is one point of integration, and that is printing. You go to the File menu and choose Print. When you choose Print, you get multiple options. I'm not talking about a physically attached printer; I'm talking about OneNote. Both versions will work either way, so I'm going to do it for OneNote desktop now.
The moment you say Print, what happens? The OneNote icon may blink, and you may not notice what it is asking you. It's asking: you have multiple notebooks, which one do you want to put this in? So you need to choose a notebook. But just choosing the notebook is not enough; each notebook has sections, so choose the correct section. When I click the section and click OK, it actually prints. There were three pages there, and we got all three pages here.

What's the big deal? What have we achieved so far? Nothing really, because originally in the PDF it was an image, and even now it's an image. So what is the point? The point is this: when you go here and search. Ctrl+E searches across notebooks, and Ctrl+F searches within the current page. So I press Ctrl+F; notice there are only three images on this page. I'm going to type just one character, the character "s" for example. Notice it says it found the text "s" in 100 pages, or rather 100 instances, and asks whether I want to see them. Where has it found them? Notice what it is finding: "shall", "second", "successor". It has found every word that starts with "s", like an "s*" wildcard. Obviously you're not going to search for one character, so let me change this to something more sensible. An agreement happens between parties, so I type "party". No problem: it says the word "party" comes up in 10 instances.

So suddenly, what has happened in our life? Instead of scrolling through hundreds of pages and manually searching for the clause, I click this button, and it takes me to the next instance, whether it is two pages away or 100 pages away. Less effort, more impact: that means efficiency. That's how search works.

Now, this is good, but I want the text. You'll notice that this particular document
is not very nicely scanned. There is a slant, and if I zoom in you can see some blotches; it's not a good-quality scan. Will it work? Yes, it does.

So how do I get the text? Those who have seen my videos and my methodology know that if you need something, just assume it must be available; it's just a question of finding where it is. How do you find it? The simplest way is to right-click. Now, this could be one page or 100 pages; we don't know. So even if the right-click menu had an option called Copy Text, do you really expect me to right-click one page, copy the text, paste it, right-click the second page, copy, paste? No, that's obviously inefficient. What we should do instead is expect that this is a real need: even for a multi-page document, I don't want to choose Copy Text on a per-page basis. You have a need, and it is satisfied, a hundred percent.

So what do we have here? If you want, you can copy text from one page, but generally what you'll want to do is... let me see, I'm overlapping myself, hold on. Right-click, and what do you see? Various options: copy text from this page, if you want it that way, or copy text from all pages; it doesn't matter whether it's two or two hundred. Where did the text go? Obviously, to the clipboard. Now that it's on the clipboard, you can paste it anywhere. Let's go to Word, create a new blank document, and paste it. We are not really interested in the formatting, but as you'll see, this gives you fairly good results. Some of it will be wrong, depending on the quality of the scan, but a lot of time and energy is saved in the process. That's how you get text from a scanned document.

Now it looks like the topic is finished. No. That is a feature, you have understood the feature, and you have seen one use case for it. Many people think, "Now I know the topic." No, that is just
the beginning. This is one scenario, a very good and useful one, no doubt about that. But is it the only scenario? Not at all. So let's look at some additional scenarios where this can be useful.

By the way, technically this is called OCR, and behind the scenes this OCR uses AI. It is not available on mobile, so if you have to do something on mobile, you will have to take the picture, let it sync with the desktop, and do it on the desktop.

Very often we have receipts, and many times we have to enter receipts like this in expense software or something like that. What do you do? Same thing. This receipt is a picture, by the way; it's not a PDF file sent as a printout, it is an isolated, independent picture. So when you right-click a picture, what do you expect? Only one option, because it's a single isolated picture: Copy Text from Picture. Having done that, let's paste the text right here, and as you can see, it has done a fairly good job of picking up the text. That's another scenario.

One more scenario, relevant back when we were physically visiting each other: very often when we visit a customer, they give you a Wi-Fi card, and you want to quickly type that long string. You can extract the text directly from a picture of it.

But that's not all. Think a little: which other scenario in the education space could this be very useful for? If you apply your mind, you will get it. Very often we have open-book exams. If you really want to cheat, which is not something I would recommend, but I know people use this feature for that purpose (the practical, good uses people don't bother with, while the misuse is popular), you can scan a book and search it.

But there is another thing you may not know: handwriting is also understood. If you have a phone and you take notes by scribbling with just your finger, that can also be
converted to text. That's not a scanned document, so you can't right-click it and say Copy Text from Picture, because what you write with your finger or a stylus is not an image; technically it's called ink. For example, these are notes I have taken with a stylus. It's not great handwriting; I just scribbled it randomly. What do you do with it? If it is ink, go to the Draw tab and you will see Ink to Shape or Ink to Text, depending on what it is. That is another way in which this works.

There is another way of doing it. I told you about scribbling on the phone; if you scribble on the phone, this is how it looks. This isn't even with a stylus; I did it with my finger. Just to keep this page, I'm going to copy and paste it so that I have a copy. Now what do I do? I select it. Where is Lasso Select? Draw > Lasso Select. Why is that important? If you just try to select it like this, it will not get selected properly, so Lasso Select is better. Now the Ink to Text option gets enabled, and notice it has done a fairly good job. As you use it more and more, it learns and improves its recognition capability. So that's that.

If any of you are developers, you may want to know how this is done behind the scenes. It is called the Azure Cognitive Services Computer Vision API. Where is that? Go to Azure Cognitive Services, language support for Computer Vision. The OCR built into OneNote is not just available for English; it's available for multiple languages. If you right-click an image and choose Make Text in Image Searchable, by default it gives you three languages, but I think 20-plus languages are supported; you have to install the language pack for that purpose. As of now, from what I know, no Indian language is supported. So that's how scanned images can be converted to text with OCR.

Now let's try this with something more verbose. I have a long
document, a seven-page document, which is a PDF. Very often people protect the PDF, and you don't have the password. You can't bypass that protection, and Microsoft is not going to help you break an Adobe password, but you can still print it. Print it to what? Print it to OneNote, as I showed you earlier. OneNote desktop, same process: it will ask you the same question. Let's see if this works. Again, I'm going to put it in Demo. The pages got printed. Right-click, Copy Text from Picture, and let's paste it into another Word document.

What happened? It has obviously picked up the text, but there is a problem. You'll notice that after "employees are bringing" there is suddenly a break. Why? Because the line broke there in the scan, so this was forced to put a line break, something like that. How do you repair this? This is not really part of this topic, but just to complete it, I'll show you. What did I just do? I clicked this button. What does it do? It shows or hides hidden characters. So what is it showing? This Enter-key symbol is called a soft enter, or new line, and this ¶-like thing is the actual new paragraph. So what is happening here is: line break and paragraph mark, line break and paragraph mark, and so on.

If you want to get rid of these, there is a way to do it automatically. That's not really a OneNote topic, but it is a common side effect of converting scanned text to Word. How do we do it? Find and Replace, Ctrl+H. What are we finding? How do I type this character and this character? They come together, and I just want to remove them for the time being. When you don't know something, click More. Then we have some special characters here: go to Special, and there is Paragraph Mark, which has a shortcut code. And there is another character called line break. What is a line
break? Where is Manual Line Break? There are many special characters we can actually find. What is this caret sign? ^p means new paragraph, and new line, which is called a manual line break, is ^l. Actually the order is reversed, so I'll cut this and put it here: line break followed by paragraph break (^l^p). And what do we want to replace it with? For argument's sake, let's say nothing. This should work; I don't know why it's not working at first. I'll try it once more, and now it has removed all of them. That's how you repair badly formatted documents. Notice it has not removed all the paragraphs: the genuine paragraphs, which were not associated with a line break, are still there, so it didn't put all seven pages into one long paragraph. If you now hide the hidden characters, this is exactly how the document looks.

So that's how you use OneNote in a very effective manner to manage scanned text, both for searching and for extracting text with the built-in OCR. That's it for this episode. See you next time, thank you.
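The Ctrl+F behavior shown in the video behaves like a word-prefix search over the recognized text. Typing a single "s" matched "shall", "second" and "successor". The following is a toy Python sketch of that idea, not OneNote's implementation. It assumes the OCR text of each printed page is already available as a plain string; the sample pages and the function name `find_word_prefix` are hypothetical. It returns every word that starts with the search term, together with its page number.

```python
import re

# Hypothetical OCR text, one string per printed page (not from the video).
SAMPLE_PAGES = [
    "This agreement is made between the parties.",
    "The second party shall notify any successor.",
    "Signed and sealed.",
]


def find_word_prefix(pages: list[str], prefix: str) -> list[tuple[int, str]]:
    """Return (page_number, word) for every word starting with `prefix`.

    Matching is case-insensitive and page numbers start at 1.
    Hypothetical helper: a toy model, not how OneNote searches.
    """
    # \b anchors the match at the start of a word; \w* takes the rest of it.
    pattern = re.compile(r"\b" + re.escape(prefix) + r"\w*", re.IGNORECASE)
    hits = []
    for page_number, text in enumerate(pages, start=1):
        for match in pattern.finditer(text):
            hits.append((page_number, match.group(0)))
    return hits


if __name__ == "__main__":
    for page, word in find_word_prefix(SAMPLE_PAGES, "s"):
        print(f"page {page}: {word}")
```

On the sample pages, searching for "part" returns both "parties" and "party". The video does not describe OneNote's actual matching rules, so treat this only as an illustration of why a one-letter query produces so many hits.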
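The speaker attributes the OCR to the Azure Cognitive Services Computer Vision API. For developers who want to try that service directly, here is a minimal Python sketch that turns a Read API result into per-page text. The result is roughly what "copy text from all pages" produces.

It assumes the response shape of the Computer Vision Read API v3.2:
- The image is submitted with a POST to `/vision/v3.2/read/analyze`.
- The result is fetched from the URL in the `Operation-Location` response header.
- The returned text lives under `analyzeResult.readResults[].lines[].text`.

The sample response is hypothetical and trimmed. Newer API versions may return a different shape, so check the current Azure documentation before relying on this. The function names are illustrative, not part of any SDK.

```python
import json

# Hypothetical, trimmed sample shaped like a Read API v3.2 "get read result"
# response. Real responses also carry bounding boxes, word confidences, etc.
SAMPLE_RESPONSE = """
{
  "status": "succeeded",
  "analyzeResult": {
    "readResults": [
      {"page": 1, "lines": [{"text": "AGREEMENT"},
                            {"text": "This agreement is made between the parties."}]},
      {"page": 2, "lines": [{"text": "Signed and sealed."}]}
    ]
  }
}
"""


def read_result_to_pages(result: dict) -> list[str]:
    """Join the recognized lines of each page into one string per page."""
    status = result.get("status")
    if status != "succeeded":
        # The Read operation is asynchronous: other statuses mean the result
        # is not ready (or failed), so the caller should poll again or stop.
        raise ValueError(f"OCR not finished (status={status!r})")
    pages = []
    for page in result["analyzeResult"]["readResults"]:
        pages.append("\n".join(line["text"] for line in page["lines"]))
    return pages


def read_result_to_text(result: dict) -> str:
    """All pages as one string, with a blank line between pages."""
    return "\n\n".join(read_result_to_pages(result))


if __name__ == "__main__":
    print(read_result_to_text(json.loads(SAMPLE_RESPONSE)))
```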
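The Word repair shown in the video can also be scripted. In Word, the fix is Find and Replace with a manual line break followed by a paragraph mark, `^l^p`, as the search term. Below is a minimal Python sketch. It assumes the document text is available in the form Word uses internally, where a manual line break is character 11 and a paragraph mark is character 13. The constant and function names are hypothetical.

```python
import re

# Assumption: Word's internal characters. A manual line break (Shift+Enter)
# is chr(11); a paragraph mark (shown as ¶ with hidden characters on) is chr(13).
MANUAL_LINE_BREAK = "\x0b"
PARAGRAPH_MARK = "\r"

# A line break immediately followed by a paragraph mark, plus any spaces
# around the pair, is treated as one spurious break left over from OCR.
SPURIOUS_BREAK = re.compile(" *" + MANUAL_LINE_BREAK + PARAGRAPH_MARK + " *")


def join_broken_lines(text: str) -> str:
    """Replace each line-break + paragraph-mark pair with a single space.

    Paragraph marks that are not preceded by a manual line break are
    genuine paragraphs and are left untouched.
    """
    return SPURIOUS_BREAK.sub(" ", text)


if __name__ == "__main__":
    ocr_text = (
        "Employees are bringing" + MANUAL_LINE_BREAK + PARAGRAPH_MARK
        + "their own devices." + PARAGRAPH_MARK
        + "A genuine new paragraph."
    )
    # Show paragraph marks as newlines for readability.
    print(join_broken_lines(ocr_text).replace(PARAGRAPH_MARK, "\n"))
```

This sketch replaces the pair with a space rather than with nothing, the choice made "for argument's sake" in the video. A space keeps the last word of one line from fusing with the first word of the next when the OCR line did not end in a space.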
Info
Channel: Efficiency 365 by Dr Nitin
Views: 7,355
Keywords: OneNote OCR, Microsoft OneNote search scanned document, Scanned document search, find text in scanned document, get text from scanned pdf, scanned pdf get text, scanned pdf search, scanned pdf find, scanned pdf find text, onenote get text from scanned pdf, onenote text from picture, onenote search text in scanned pdf, onenote copy text from scanned pdf, onenote handwriting to text, ocr, optical character recognition, dr nitin paranjape, microsoft onenote, onenote
Id: R-_pelnEsnk
Length: 18min 51sec (1131 seconds)
Published: Thu Aug 05 2021