OpenCV Intro to Character Recognition and Machine Learning with KNN

Captions

Hello again, everybody. In this video we're going to perform character recognition via the k-nearest neighbors (KNN) algorithm in OpenCV. We'll get to the implementation details shortly, and we have a nice example project to go through, but first I'm going to give you a quick overview of the KNN algorithm for those who may not be familiar with it. I'm not going to get into too much detail about the algorithm here; this is just a broad, very high-level overview of the general steps involved and what the algorithm is all about.

Let's say we're trying to detect the digits 0 through 9. For any machine learning process, of course, you have to have a training data set, so let's say we had a training data set of five of each of the characters 0 through 9. For example, we would have five 0s, five 1s, five 2s, etc., through five 9s. With 10 digits times five training images of each digit, we would have 50 total training images. Also, just to have some numbers to work with (we're going to use different numbers when we implement this momentarily, but just for the purposes of this slide), suppose that we resize each of the training images to be 10 pixels by 10 pixels. In other words, each image would have a total area of 100 pixels, and we'll do the same for the images we attempt to identify in our test set later on.

As a result of the training process, we're going to have two parallel data structures. One will be the set of images, the 50 images, and the other will be a set of numbers indicating which group or classification each corresponding image is in. Either of these terms is pretty common in machine learning, and either is acceptable. So, for example, we'll have a classifications data structure. Suppose the first five images were the five 0s, so the
classifications data structure would be 0, 0, 0, 0, 0, then the five 1s, five 2s, etc., through five 9s, and then we have our 50 corresponding images.

Once the training process is done, we can begin testing, that is to say, identifying characters under test. Suppose that we're trying to identify an unknown digit X. What the KNN algorithm would start with is identifying the nearest neighbors in order. The terminology "nearest neighbor" means the best match to X out of the training set. For example, let's say that when we compared X to our training set, for the best possible match we found that 99 of the 100 pixels in X matched one of the training images, which happened to be a 0. That would be the nearest neighbor. Let's suppose that for the second best match 97 of the pixels matched, and that happened to be an 8; for the third best match 96 of the pixels matched, and that happened to be an 8; and so on for the fourth and fifth best matches, once again being 0s. Then all the way down to the second worst and worst matches, only four and two of the pixels matched, and let's suppose those were for completely different numbers.

The next thing we have to consider is our K value, K referring to the number of nearest neighbors. In our example data set here, if we set K to be 1, then we would determine that X is a 0, because we're only looking at the one nearest neighbor, and that happened to be a 0. If we set K equal to 3, we would have one 0 but then two 8s, so two out of these three would be 8s, and we would say that X is an 8. And if we chose K to be 5, then we would have three 0s out of five, so we would say that X is a 0.

So that's an overview of how the KNN algorithm works. One specific thing I should point out as far as choosing the K value: for relatively obvious reasons
you don't want to choose an even number. For example, if you chose K to be 2, would it be a 0 or an 8? That's not necessarily clear. So to avoid ties like that, you want to choose K to be odd. The other thing, of course, is that you cannot have a K value that is larger than the smallest number of samples for any of your classes. For example, suppose we had only three images of the character 0 but 50 images each of 1, 2, 3, 4, etc., through 9. We still couldn't pick a K larger than, at the absolute largest, 3, because we only have three 0s. That is a limitation to keep in mind in your data set.

Now, in the example program that we're going to get to, we're going to work with very simplified data. If you were going to do this in a production environment, you would test with different fonts and different sizes; if you were going to read people's handwriting, you would test thousands of different handwriting samples. But we're going to work with a data set where we have the digits 0 through 9 and simply one instance of each, to keep the program and the training time down.

One additional thing I should mention is that some of you may have heard of SVM, short for support vector machine, which is also a machine learning algorithm and is not all that different from KNN. Probably the most fundamental difference between SVM and KNN is this. First, consider KNN: what we did here is compare all of image X to all of images 1 through 50 in our training set. If you had a really large training set or huge images, that would take a long time. SVM does multiple other things differently, but to me the biggest difference is that with SVM you would find the key points of X, then find the key points of the training images, and then simply compare key points. So if you had images that were, say, a million
pixels, or very large images and a huge set of images, it would still be workable time-wise. I may do a future video on SVM, maybe as my next video, I'm not really sure. If people have interest in that, post in the comments. In any case, for now let's get to the implementation of KNN.

If we go to my GitHub page (I'll just type it in, in case you don't have it bookmarked: "GitHub Microcontrollers And More", that's m-i-c-r-o-c-o-n-t-r-o-l-l-e-r-s), there we go, it pops right up, and you probably want to go ahead and bookmark this if you have not already. Let's see, repositories. One additional thing I should mention is the OpenCV 2.4.1 Windows installation guide. I'm going to suppose going forward in this video that you've already followed the steps in installation cheat sheet 1 if you'd like to use C++, or installation cheat sheet 2 if you'd like to use Python. On my YouTube channel I have corresponding videos for those installations as well, so please follow those through if you have not had a chance to do that already.

The repository we're going to use for this project is OpenCV KNN Character Recognition Machine Learning. First, let's take a quick look at everything that's here. "KNN overview" is simply the slide I was just showing earlier. Then we have two programs, with a C++ version and a Python version of each. The first program we're working with is generate_data.cpp or generate_data.py, whichever you prefer, and the second program is train_and_test.cpp or train_and_test.py. As for the other contents, the readme essentially just says to refer to the video, which you're already watching, so you don't really need to go through that. There are two more files here, test_numbers.png and training_numbers.png. training_numbers.png is simply 0 through 9 written out. Again, for a production-grade program you would need a much bigger training set than this, but for our simple example today this will work. test_numbers.png is what we're going to detect the characters in. I chose pi as a good string of numbers to use; you can use any string of numbers that you like. Also, just to show you that there's nothing special about these two particular images, rather than downloading them from here (which you can do to save time if you like), for the demonstration today we'll recreate them via screenshots. We'll get to that in a moment.

Let's start with generate_data.cpp and go to Raw so we can copy and paste. Go ahead and fire up Visual Studio and make a new project: File, New Project. Here's the installation cheat sheet that I mentioned earlier, installation cheat sheet 1; we're going to use it to configure our project. Choose Visual C++, Win32 Console Application, and let's call this "generate data 1", usual location, uncheck those, Next, uncheck those, Empty project, Console application, and Finish. When it comes up, Add, New Item, C++ File, call it the same name as the project, Add. Back to the GitHub page: Ctrl+A, Ctrl+C, let's make that just a little smaller, and Ctrl+V. We get a whole bunch of red lines because we have not yet told Visual Studio where to find OpenCV. At this point I'm just going to repeat the same steps as in the installation guide: Configuration, Visual C++ Directories, Include Directories, Edit, and there's our include directory. Then Library Directories, Edit. I'm simply copying and pasting out of installation cheat sheet 1; if you're wondering, here's what I have on the other screen, just copying
and pasting out of there. So there's the library directory, and then we go to Linker, Input and copy and paste our OpenCV libraries into there, then Apply and OK. There we go, all our red lines disappeared.

If we go to where we just made that project (Visual Studio 2013, Projects, CPP, generate data 1), we're going to need an image in here to recognize characters in, so let's make the image now. We'll just type 0 1 2 3 4 5 6 7 8 9, and I'd suggest a pretty good size font; let's go with maybe 28, anything in the 20s should be fine. Make it nice and big. Then I press Ctrl+Print Screen, fire up Microsoft Paint, and paste in that screen print. There's my other screen momentarily, and here's the screen I'm working on currently in the video. Then we draw a box around the digits, choose Crop, and Save. We have to save it to that Visual C++ project directory (2013, Projects, CPP, generate data 1). Save it, and we'll rename it in a second. We can minimize that for the moment. If we look down the program here, training_numbers.png is the name we're looking for, so we need to change this from Untitled.png to training_numbers.png so our C++ program can find it. I just paused for a second to change back to the large cursor; hopefully that makes this a little easier for anybody viewing this video in anything less than 1080p resolution.

If we didn't forget anything, we should be able to start that up, and there we go. Here are the windows that we have up. This one is the image simply reproduced, and we've additionally drawn a red box around the current character that we're trying to recognize. This is the threshold image; we'll get to that in code a little later, but it's shown just to prove out that the thresholding process worked. Then there are these two images here: the larger of the two 7s is this 7 cropped out, and the smaller of the two 7s is the top one resized to a fixed size. Again, we'll get to that later in code. At this point I'm simply going to call out the numbers as I key them in: 7, 5, 4, 9, 8, 6, 3, 2, 1, and 0. "Training complete, press any key to continue."

Now if we go back to that same directory, we see two files that were not there before, classifications.xml and images.xml. These are in XML file format. Classifications is the data we just entered, the numbers 0 through 9. Of course they're higher numbers, simply because of the conversion to the ASCII number matching each character. You'll notice there's a decimal point after each: the KNN object requires things to be passed in in float format. We'll get to that in the second program, but that's the reason for the decimal, because they're floats. Other than that, and the ASCII conversion, that's the data we entered. Images, of course, is the 10 images. We'll get to this more in a moment, but there's a fixed width of 20 and a fixed height of 30, so it's 10 images that are 20 x 30. Images is therefore much more voluminous in terms of data. These numbers here are simply our images, and once again there's a dot after each number because they have to be stored as floating point. Getting classifications and images was the idea behind that program, so that was successful.

Now I'm going to open up a second Visual Studio; it's easier to have two open at once, I think. We go back to the GitHub page and go to train_and_test.cpp, Raw, and I'll be copying and pasting out of this. So: File, New Project, Win32 Console Application, "train and test 1", usual location, uncheck those, Next, uncheck those, Empty project, Console application. Now we make our C++ file: Add, New Item, "train and test 1", C++ file. Make that a little smaller, then Ctrl+A, Ctrl+C, and Ctrl+V. Now if we check the train and test 1 directory, we need to add some things to it for the program to work. The first thing is to copy classifications and images over to train and test, so that data is available to train and test. The second thing is we're
going to need to add an image here to detect characters in. You could pick any number sequence you like; a common number sequence everybody's heard of is pi, so let's use that. We bring up Notepad and paste over the digits, then Ctrl+Print Screen, start Paint, paste that in, draw a box around it, choose Crop, and then Save. We'll save that to Visual Studio 2013, Projects, CPP, train and test 1. For now we'll save it as Untitled; we're going to have to rename it, of course. We can close that and minimize Notepad for the moment. Whoops, we didn't put in our references yet; we have to do that as well. First we rename our image to test_numbers so our program will be able to find it, and there's the image we just made.

Now let's do our references from the cheat sheet to get rid of all these red underlines: Project, Properties, Configuration Properties, Visual C++ Directories, Include Directories, Edit, New, paste. Library Directories is next: Edit, New, paste. Now Linker, Input, Additional Dependencies, Edit, paste in our libraries, Apply, OK. If I didn't miss a step there, good, all our red underlines are gone, and we have the three additions to this directory that the program needs: classifications.xml, images.xml, and test_numbers. We should be able to go ahead and run it now, if I didn't forget anything. Good.

Hopefully most people are able to view this in 1080p; if not, it might be hard to see the numbers. I'll move the windows so you can see them both at the same time. The numbers in this command line window are stored as a string inside the program, and the window on the left is the image, with the bounding rectangle of each character that was found drawn in green. As you can see, it did work. There's no decimal point, of course; the dot is eliminated because we weren't trying to recognize anything except 0 through 9, and the test we used in this program was whether the contour's area is at least 100. The area of a dot is not that big, so it was taken out. But as you can see, it did work: 3 1 4 1 5 9 2 6 5 3 5 9. So that was successful.

Some of you might be looking at this and thinking, "That's not very impressive; you just had a plain-Jane image with numbers in it and converted it to a string." That's one way to look at it, but on the other hand, this would work in a production environment if adapted correctly. You would need a much bigger data set, with different font sizes and all the different fonts you could possibly think of. If you're going to read something handwritten, you would need thousands of handwritten samples. There are many handwritten samples of digits available in the OpenCV samples directory, I might add, so you wouldn't have to make them from scratch; you could read them out of there. And of course, if you were reading numbers in a scene, you would have to do a tremendous amount of preprocessing to get down to an image this clean, and it might not be possible to get down to an image quite as perfect as this, but
with enough preprocessing, and depending on how familiar you were with the scene or its constraints, you hopefully could get pretty close and then get good character recognition. We haven't gotten into all those extra steps today, just to keep the example as simple as possible, but as you can see, it is possible to read the characters in an image without too much difficulty using OpenCV and KNN.

Now that we've got that working, let's take a quick look at the Python version before we get into the code too much. I think we can just save it straight to the desktop. Let's move that out of the way and go back one here, to generate_data.py and Raw. We grab the name there, make a new text document, and, whoops, that's not quite going to work, just a moment. Format, Font, and go back down to 10. There we go; it picked up some extra characters in copying and pasting. Rename, that's much better; yes, we'd like to change it. Then we also take a look at the program train_and_test.py and Raw, grab the name, minimize that, and paste it there, taking out any extra characters: new text document, paste, and yes, we're sure we want to change the file extension.

Let's use both images that we just made in the C++ example. From generate data we copy out training_numbers to here, and from train and test we copy out test_numbers as well. Again, those are just the same as before: there's training_numbers and there's test_numbers. Now we need to get the programs themselves, so let's open both of these up in PyCharm; of course they'll be blank. While we're waiting for PyCharm to start, we go to generate_data here and then Raw (yes, we know, new version of PyCharm available), minimize that, and open train_and_test in PyCharm as well. This is generate
data: Ctrl+A, Ctrl+C, and into generate_data, Ctrl+V. Then the same for train_and_test: we go back here, back one more, train_and_test.py, Raw, Ctrl+A, Ctrl+C, and yes, we want to unlock all files. I'm just making a final check that I didn't miss anything. All we need for the first program is training_numbers; yep, that's there. As soon as we run generate_data, we will see those same two text files appear up here with the training data in them. So let's run generate_data. I'm going to call the numbers out as I type them in. Maybe I'll move these around so it's a little more clear; these are the same four windows as were pulled up in the C++ program. So: 7, 5, 4, 9, 8, 6, 3, 2, 1, and 0. There we go, there are the two files that we needed. Let's move these over here so they're all grouped together.

Now for train_and_test; I always like to move this window over to the right. If we run train_and_test, and hopefully I didn't forget anything, there we go. I'll make this a little bigger so you can see it, since I know not everybody is watching this in 1080p: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 clicks bigger, that should be sufficient. As you can see, this is the string that's inside our program, just printed out to the prompt, and here's our image. Once again we have a successful read, 3 1 4 1 5 9 2 6 5 3 5 9, less the decimal point, of course, which we weren't trying to read. So that worked out pretty well. I'm just going to resize this back down, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, that's good enough.

The code in the Python version versus the C++ version is essentially just a translation from one to the other, so let's go through the C++ version. Before we do that, I guess the only thing that's substantially different between them is that, because in Python
with OpenCV 2 images are stored as NumPy arrays, the part at the end of the first program that writes the two parallel data structures (let's pull this up here), our classifications data structure and our image data structure, out to a file is a little different than the C++ code. Here's where we're writing the NumPy arrays to file, and here's where we're reading them from file. Other than that, the two programs are just a straight translation from C++ to Python, so to save some time we'll just go through the C++ code.

Let's check generate_data first. Sometimes the spacing gets a little goobered up with the copying and pasting; that should be over there a little bit. I try to write this code to be as readable as possible, I try to use very intuitive variable names, and I comment things pretty thoroughly, so I don't think we need to go through it line by line in too much detail. Let's just take a quick overview and skim through it.

Here we have our includes and three global constants. First is the minimum contour area. Again, the only check we're doing in this program to see if something is a valid contour, before we pick a digit to assign to it, is making sure it has an area of at least 100. Obviously in a production-grade program you would have to do far more checks than that, but for our simplified example today that works. Then we have to satisfy the KNN object's requirements: when we pass in our set of training images, they all have to be the same size, and our image under test has to be that same size as well. So we define these constants for the resized image width and height. Then we declare a number of images, then contours and hierarchy, and classification ints, which is essentially this data structure here, the numeric classifications, and
then training images is of course the corresponding data structure here. Valid characters is our list of characters that we're trying to recognize, 0 through 9. You can see that if you wanted to add, say, upper- and lowercase letters, it would be very simple to add all the letters of the alphabet here, and then you could recognize letters as well. So this program definitely could scale up, if that's something you're interested in.

Then we read in the training image and make sure it's not empty, and then we convert to grayscale, blur, and call adaptive threshold to get a black and white image. That was the threshold image we showed on the screen earlier; this line here shows it. We make a copy of the threshold image to pass into findContours. You always want to make a copy of the image when passing it into findContours, because findContours does modify the image that you pass in. Once we've found our contours, we loop through each contour and get the bounding rectangle. That was the rectangle you saw around each of the numbers, red in the first program and green in the second: red as we were keying in the numbers one by one, and green in the second program because we could show all at once which contours were found. This line here draws a rectangle around the found contour. Then, out of the thresh image, we focus on the ROI of the current contour by passing in the bounding rect, and we resize the ROI to ROI resized. Then we show our three images here, just for user reference.

Then we get the key press here. We would exit the program if Escape was pressed; supposing that it wasn't, with this line here we check whether the key that was pressed was one of these valid characters, and if it was, then we add to our two data structures via these lines here,
push_back and push_back. I should mention here Mat classification ints and Mat training images. You might at first look at this and think: if classification ints is a list of integers, wouldn't it have made more sense to use, say, a vector of integers or an array of integers? And for the training images, wouldn't it have made more sense to use a vector of images rather than a single image treated as though it's a vector with this push_back call? You may suppose so, but what we're going to find (we'll get to the second program in a minute, where we'll see this in more detail) is that in the call to the KNN object's train function, you have to pass in a single Mat for your classification numbers and a single Mat for your training images. So you essentially have to append multiple images via push_back, or multiple numbers via push_back, onto an instance of just one Mat. That's kind of odd, but the KNN object requires it in that format.

From that point on, we write out our two data structures to file, and OpenCV provides this very helpful FileStorage class for such purposes. The syntax is pretty straightforward; please see the comments if anything is unclear, but I think I've commented it pretty well. At this point we've got our two data structures written out to file, so classifications and images were successfully created, and now we can jump into train_and_test.

train_and_test is very similar to generate_data. The only major difference is that we introduce a class here, and the reason is that we have to sort the contours that we find from left to right in order to display the final number from left to right. You may have noticed when we were keying in the digits while running generate_data that findContours doesn't work from the left of the image to the right of the image,
necessarily, or at all, in fact. I think it started in the middle, then jumped to the right side, and then back to the left. But if pi is 3.14159, we have to show the numbers in that order, not in a mixed-up order; otherwise it'll look like we didn't detect the digits correctly. To sort the contours, we would have to sort the contours and also the bounding rectangles that match them, since we sort the contours based on the x position of the bounding rectangles. We would then have two separate data structures, and if we sorted one we would have to sort the other to match it. At that point it's more straightforward to put everything in a class and sort the instances of that class after you declare a vector of them.

So that's what we have here: the contour and the matching bounding rect. And as long as we're declaring a class, we might as well put some other functionality in it, so here's the area of the contour, and here's our function to check if the contour is valid. All we're doing is checking if the area of the contour is at least 100. Again, in a production-grade program you would need substantially more checks than that, but for our simple example today it'll work. Then this function here allows us to sort a vector of ContourWithData objects by the bounding rectangle's x position; we'll get to that in a moment.

Now we jump into main. We declare a vector of all contours with data and then one of valid contours with data. Here we read in the training classifications and the training images; those are essentially the reverse of, or the corresponding functions to, the functionality at the bottom of generate_data. Now we're finally ready to train, so we instantiate the KNN object here and then we call train. Once again, the comment alignment sometimes doesn't quite hold when you're copying and pasting in, but I think you get the idea. Then, after only
these two lines, instantiate and train, we're ready to test. This will look very similar to the previous program, with more or less the same steps. We read in the image, test_numbers, which was this image right here, and then we declare our images to work with, convert to grayscale, blur, call adaptive threshold, and make a clone of that to pass into findContours, which finds us our contours. Then, in this for loop, we loop through the contours we just found with findContours; after we declare a contour with data, we populate each of its member variables and append it to all contours with data. In the next for loop we go through all contours with data, and if the contour is valid, we append it to a vector of valid contours with data. So when we get down to here, valid contours with data will contain the valid contours and only the valid contours.

This call sorts the contours from left to right by the bounding rectangle's x position. You'll notice we pass in the function we defined earlier, which is what allows us to sort by the bounding rectangle's top-left x position. Let's see, where's that sort line again? OK, there's where we left off. Then we declare the final string, which is what we print out to the command prompt at the end, as you saw. We loop through each of the valid contours, draw a green rectangle around the current contour, and then get the region of interest, that being the current contour, in other words the character we're looking at at the moment. We resize that to the 20 x 30 size, and then we do some conversions to appease the KNN object's requirements as far as data type. We convert ROI resized to a Mat using a float data type. The commenting got a little misaligned here again; that should be like that. And
we also have to take this Mat ROI of floats and reshape it to one-dimensional with this call to reshape. Then we can pass that into the KNN object's find_nearest function. The second parameter is simply the K value, and since we have a sample size of only one for each character, we have to use a K value no higher than 1. In a production-grade program you might have, say, 30 of each character, and then you could use a K value of maybe 5, 7, 9, or 11, something in that range. The KNN object's find_nearest function returns the current character in the form of a float, so we simply convert that to an integer, then convert it to a character, and append it to our final string. Once we've gone through all our valid contours in this for loop, we show the final string at the command prompt, and we also show the image with the green boxes drawn around the characters. That concludes the program.

Hopefully that was helpful for anybody out there trying to get started in character recognition or machine learning, or both, with OpenCV. I don't have any concrete ideas in mind yet for the next video. If anybody has anything they'd like to see, please post in the comments, and I'll see everybody in the next one.
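
Editor's sketch: the K = 1, 3, 5 walk-through from the overview slide can be reproduced in a few lines of plain Python. This is not the code from the video. It is a minimal illustration that ranks training images by the number of matching pixels and takes a majority vote among the top K. The function names and the helper `image_with_mismatches` are hypothetical, and the match counts for the fourth and fifth neighbors (95 and 94) are assumed, since the slide only gives 99, 97, and 96. OpenCV's KNN object compares float vectors by distance rather than counting matching pixels, so treat this only as a picture of the voting step.

```python
from collections import Counter


def matching_pixels(a, b):
    """Count positions where two equal-length flattened images agree."""
    return sum(1 for p, q in zip(a, b) if p == q)


def knn_classify(sample, training_images, classifications, k):
    """Return the majority label among the k best-matching training images."""
    # Rank training indices from best match to worst match.
    ranked = sorted(
        range(len(training_images)),
        key=lambda i: matching_pixels(sample, training_images[i]),
        reverse=True,
    )
    votes = Counter(classifications[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def image_with_mismatches(n, size=100):
    """Hypothetical helper: an all-zero image with the first n pixels set to 1."""
    return [1] * n + [0] * (size - n)


# Unknown digit X as a flattened 10 x 10 image (all pixels 0 for simplicity).
x = [0] * 100

# Five nearest neighbors matching 99, 97, 96, 95, and 94 of the 100 pixels.
# The last two counts are assumptions made for this sketch.
training_images = [image_with_mismatches(n) for n in (1, 3, 4, 5, 6)]
classifications = [0, 8, 8, 0, 0]  # parallel list of labels

for k in (1, 3, 5):
    print("K =", k, "-> X is a", knn_classify(x, training_images, classifications, k))
```

With these inputs the vote goes to 0 for K = 1, to 8 for K = 3, and back to 0 for K = 5, matching the slide. The two parallel lists mirror the images and classifications data structures described in the overview.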

Info

Channel: Chris Dahms

Views: 69,045

Keywords: OpenCV (Software), Character Recognition, KNN, Machine Learning, K-nearest Neighbors Algorithm, Optical Character Recognition (Software Genre)

Id: 6FzlF9qf960

Length: 34min 33sec (2073 seconds)

Published: Fri Jun 19 2015