Sign language detection with Python and Scikit Learn | Landmark detection | Computer vision tutorial

So this is exactly the project we will be working on today. You can see that this is a sign language detector: we are currently detecting the different signs I am making with my hands. These are three signs I have selected from the American Sign Language alphabet. From this alphabet I have over here, I have selected the letter A, the letter B and the letter L, but obviously we could apply exactly the same process to any other sign from this alphabet, or from any other sign language alphabet from any other country. This will be 100% in Python. We are going to be using OpenCV and MediaPipe, and we are going to be detecting the landmarks of our hands, so this is going to be an amazing tutorial. Let's get started.

Now let me show you the PyCharm project I created for today's tutorial. We are going to work with these three requirements; please remember to install them before starting. We are going to work with OpenCV, with MediaPipe, which is a very important library we are going to use today, and also with scikit-learn. Now let me tell you about the process we are going to follow. This will be a three-step process, and it is actually a very classical machine learning process: we are going to start with the data preparation, the data creation, then we are going to train the model, and then we are going to test how this model performs. Now, in order to get started, I have already created a script which is going to be super helpful in order to create our dataset. I created it in advance so we can save some time while we are working on this tutorial, so we
can focus our energy on everything else, because everything else will be much more important. You can see that this is my webcam, and I have a message on my webcam saying "ready", and if I'm ready I press the letter Q. The way we are going to use this script is that we are going to collect many different samples of the three symbols I showed you. For example, for the symbol which represents the letter A, I'm going to repeat this process many times: I'm going to move my hand towards the camera and away from the camera, and this way I'm going to create many different examples of what the letter A looks like. Remember, we are building a classifier, and classifiers need data; the more data we have, and the more diverse that data is, the better. So I am going to do this process for the letter A, then I'm going to repeat the same process for the letter B, and then I'm going to do exactly the same for the letter L. This way we are going to collect our data; we are going to generate the dataset we need in order to work on today's tutorial. The idea is that once I'm ready, I press the letter Q and I start collecting frames for all these different symbols. So let's start. I'm going to press the letter Q, and I'm going to move my hand towards the camera and away from the camera, and that's it. Now I'm going to move to the next symbol, which is something like this, and I'm going to move my hand towards the camera and away from the camera for a few seconds, and that's pretty much all. I'm going to do the same with the letter L, and I'm going to move my hand towards the camera and away from the camera. I'm going to show you in a few seconds how this data
looks like. So this is pretty much all. Now let's go to this project directory. I'm going to open data, and you can see I have three directories: one called 0, another one called 1 and another one called 2. These are the three classes we are going to classify today; each one of these classes is encoded as one of these numbers. If I open one of these directories, you can see that this is me doing the exact process I just did, moving my hand towards the camera and away from the camera while I was talking to you in this video. So these are the images I have just generated, and in each one of these directories I have 100 images, so I have collected 100 samples for each class. I'm going to show you the last class, which is exactly the same idea but for the letter L. So this is exactly the data we are going to use in order to train our classifier.

One of the things I want you to take from this video is that when you are working on a computer vision project, or actually any type of project, when you want to build a solution, there will be many different ways to build it. That will always be the case: given a problem, there will be many possible solutions. The idea is that when you are just starting on a project, you consider all the different ways, or at least some of the possible ways, to solve the problem, and then you decide on the most promising approach. For example, in this case, one of the ways we could solve this problem is by building an image classifier. If I open one of the frames for each category, you are going to see that. Let me do something like this, one next to the other,
so I can show you exactly what I mean. One of the ways we could solve this problem is by building an image classifier that takes the entire frame as input and classifies each one of these frames into different categories. So, for example, this entire image could be the letter A, this other image would be the letter B, this other image would be the letter L, and so on. We could label the entire frame with the symbol we want to classify. That would be one approach. Another approach would be to crop the image and to consider only the symbol we want to detect. So in this case we would make this crop, in this other case we would make this other crop, or something like this, and in this other case we would make this other crop. We would still be using an image classifier, but we would be classifying only a crop from the entire frame. So this would be another way to solve the problem, and I think it makes sense. It would be a very good way to solve the problem, and I'm sure we would achieve a very high accuracy with it. But another approach is, instead of using an image, to extract the information we care about from this image, which is only the position of the hand. Because if you think about it, for each one of these gestures, for each one of these symbols, all the information is in the position, in the posture, of all the different fingers and my hand. If you think about all the different symbols I showed you when we started this tutorial, you will notice that the only difference between them, or actually the information in each one of these symbols, is how the hand is
located and how the fingers are located. And if you think about a person's fingers, the movements we can make with them are very constrained, because we only have some possible movements. So basically all the information in our hands is in the landmarks. If we use a landmark detector, we will be extracting all the information we care about from these images, and we will be transforming each one of these images into a collection of points, something like 20 points. I don't remember exactly how many landmarks are defined in the hand for the model we are going to use today, but it's something around 20 points. So we will be taking an entire image of 640 by 480 pixels, which is actually a BGR image, so it's this size multiplied by three, and we will be transforming it into an array of something like 20 or 30 points. That's a very good transformation, because we are reducing the input space, the data we are going to classify. We are effectively applying a dimensionality reduction, because we are reducing the dimensionality of our data while keeping the information we need. That's very important. So that will be another way to solve this problem, and that's actually the way we are going to solve it; that's basically the classifier we are going to be building today. So one of the things I want you to take from this video is that given a problem, given a project, there are many different ways to solve it, and it's a very good exercise to consider all the different ways, or some of the different ways, and to choose the most promising one. That's one of the things I want
you to take from today's tutorial. In this case we could classify the entire image, we could classify only a crop of the entire image, or we could take the landmarks of the hand in each one of these images and classify just the landmarks. From my perspective that's a much better approach, because the classifier is going to be much smaller, and it will be much more robust as well, because the input of this classifier will be just the information it needs in order to make a classification: only the position of all the different fingers and the hand. We will be removing all the unnecessary information, like all the pixels of the hand and the background and me and everything else, which is completely unnecessary in order to make the classification. So that's the approach we are going to be taking today, and this is the data we are going to be working with in this tutorial.

Now, in order to extract the landmarks and move forward with the approach I described, we are going to take all the images and do some processing, so we create exactly the data we need in order to train this classifier. The first thing I'm going to do is to import mediapipe as mp; this is one of the libraries we are going to use. Then I'm also going to import cv2, and I am going to import os. Now we are going to iterate over all the images I showed you a few minutes ago, we are going to extract the landmarks from each one of these images, and we are also going to save all this data into a file we are later going to use in order to train our classifier. So let's start with the data directory. I'm going to define a variable, which is DATA_DIR, and this is the data directory. Then I am going to iterate over all the directories in DATA_DIR, so this will be for dir_ in os.listdir(DATA_DIR), and I will iterate over all the frames within this directory.
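The directory walk described here can be sketched roughly as follows. This is a minimal sketch, not the code from the video: the helper names `list_labeled_images` and `load_rgb` are hypothetical, and the layout (one sub-directory per class, named 0, 1 and 2) is assumed from the description of the data folder. The OpenCV import is done lazily so the walk itself runs with the standard library only.

```python
import os
import tempfile


def list_labeled_images(data_dir):
    """Return (label, image_path) pairs, one per file in each class directory.

    Assumes the layout data_dir/<class_name>/<image files>; the class
    directory name is used as the label.
    """
    pairs = []
    for dir_ in sorted(os.listdir(data_dir)):
        class_dir = os.path.join(data_dir, dir_)
        if not os.path.isdir(class_dir):
            continue  # skip stray files sitting next to the class directories
        for img_name in sorted(os.listdir(class_dir)):
            pairs.append((dir_, os.path.join(class_dir, img_name)))
    return pairs


def load_rgb(img_path):
    """Read an image with OpenCV (which returns BGR) and convert it to RGB."""
    import cv2  # imported lazily so the directory walk works without OpenCV

    img = cv2.imread(img_path)  # returns None if the file cannot be read
    return cv2.cvtColor(img, cv2.COLOR_BGR2RGB)


if __name__ == "__main__":
    # Build a tiny throwaway dataset so the walk can be tried without a webcam.
    with tempfile.TemporaryDirectory() as tmp:
        for class_id in ("0", "1", "2"):
            os.makedirs(os.path.join(tmp, class_id))
            for i in range(2):
                open(os.path.join(tmp, class_id, f"{i}.jpg"), "w").close()
        for label, path in list_labeled_images(tmp):
            print(label, os.path.basename(path))
```

Keeping the label next to each path means the later steps never have to re-derive the class from the file location.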
So this is something like for img_path in os.listdir(os.path.join(DATA_DIR, dir_)). I'm going to build the image path with os.path.join(DATA_DIR, dir_, img_path), and I'm going to create a variable, which is img, and img will be cv2.imread of this image path. Now I'm going to convert the image into RGB. It's currently in BGR, and we need to convert it into RGB in order to input the image into MediaPipe; when we are working with MediaPipe, all the landmark detection is done on RGB images. So, as we are reading the image in BGR, we definitely need to convert it into RGB, and this will be cv2.cvtColor with cv2.COLOR_BGR2RGB. So this is pretty much all; these are the images we are reading from our data directory. Now I'm going to import matplotlib.pyplot, because let's plot what these images look like. I am going to extract the landmarks in a few minutes, but for now let's just plot these images in order to make sure everything is working properly. I'm just going to plot a few; I'm going to plot one for each directory, so I am going to do something like this, so I only take the first one. And now this is plt.imshow of img_rgb. Remember that matplotlib also requires the image in RGB in order to plot it. This will be plt.figure, and then plt.show, and let's see what happens. Okay, perfect. We are plotting a frame from each class, from the three directories, and everything seems to be working properly so far.

Now let me show you three objects we are going to use, which are these three statements I have over here. I'm just going to copy and paste them; everything is already in this notepad, so we don't lose time defining these objects. These are the three objects which are going to be super useful in order to detect all the landmarks and in order to draw these landmarks on top of the images. We don't
really need to draw the landmarks on top of the images in order to do our classification, but I'm going to do it only to show you what these landmarks look like. So these are the three objects we are going to use, and now let's define the hand detector, which is something like this: hands will be, if I'm not mistaken, mp_hands.Hands, and then the arguments are static_image_mode set to True, and then min_detection_confidence, for which we're going to use 0.3. So this is the model we are going to use. Now, going back here, remember this is the image we have converted into RGB, and now we are going to say hands.process and pass this image. What we are doing here is detecting all the landmarks in this image. And now the only thing we need to do is to iterate over all the landmarks we have detected in this image. For that, in order to move one step at a time, maybe the best way is to show you what these landmarks look like. So let me get back here, and I am just going to copy and paste this code, so we don't lose time coding it from scratch. You can see that this is iterating over all the results we have from the hand detection, because remember we could be detecting only one hand, or two hands, or no hands at all, so it makes perfect sense to iterate over all the different results. Then for each one of our results we are going to draw the landmarks, and this is basically the way to do it, by calling this function with these arguments. Remember everything will be available in the repository for today's tutorial, so you can just take all the code from there.
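The detection and drawing steps just described could look roughly like the sketch below. It is not the code pasted in the video: the function names `make_hand_detector`, `detected_hands` and `draw_hands` are hypothetical, and it assumes the legacy `mp.solutions` API of MediaPipe. The small `detected_hands` helper covers the "no hands at all" case, since MediaPipe sets `multi_hand_landmarks` to `None` when nothing is found.

```python
from types import SimpleNamespace


def make_hand_detector():
    """Create a MediaPipe Hands detector configured for still images."""
    import mediapipe as mp  # lazy import: only needed when actually detecting

    return mp.solutions.hands.Hands(static_image_mode=True,
                                    min_detection_confidence=0.3)


def detected_hands(results):
    """Return the list of detected hands, or [] when no hand was found.

    `multi_hand_landmarks` is None when nothing is detected, so iterating
    over it directly would raise a TypeError.
    """
    return list(results.multi_hand_landmarks or [])


def draw_hands(img_rgb, results):
    """Draw landmarks and connections for every detected hand, in place."""
    import mediapipe as mp

    mp_hands = mp.solutions.hands
    mp_drawing = mp.solutions.drawing_utils
    mp_styles = mp.solutions.drawing_styles
    for hand_landmarks in detected_hands(results):
        mp_drawing.draw_landmarks(
            img_rgb,
            hand_landmarks,
            mp_hands.HAND_CONNECTIONS,
            mp_styles.get_default_hand_landmarks_style(),
            mp_styles.get_default_hand_connections_style(),
        )
    return img_rgb


if __name__ == "__main__":
    # Stand-in for a frame with no hand in it; MediaPipe is not needed here.
    empty = SimpleNamespace(multi_hand_landmarks=None)
    print(len(detected_hands(empty)))
```

With an RGB image in hand, usage would be along the lines of `results = make_hand_detector().process(img_rgb)` followed by `draw_hands(img_rgb, results)`.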
For now, just follow along so you can follow the entire process, and then you can go back to the code by looking at the repository. So this is the landmark drawing, and this will be pretty much all, if I'm not mistaken. Let me just run this code to see what happens. I have to do something else first, which is to check whether we have detected at least one hand, because we could be detecting no hand at all, and that could be a problem for what we are going to do later on. So let's run this script and see what happens. Perfect. You can see that we are detecting exactly the position of the hand and all the different landmarks. You can see that we have many different colors: the style we are using in order to draw these landmarks gives us all these different colors, so we can tell the different fingers apart. These are exactly the landmarks we are going to use in this tutorial; we are going to take all this information and we are going to build our classifier with it. This is only to show you what these landmarks look like, and this is what I mean when I say that these landmarks contain all the information we need in order to build this classifier. Don't take a look at the image itself, at my hand; take a look only at the landmarks, and you can see that all the information we need is there. For example, for classifying the L, we will need to take a look at what happens in this section, with these four landmarks. If we have a situation like this, where this finger is up and all the other fingers are down, then that's pretty much all we need to know in order to make this classification. Then, if we go here and we notice these four fingers are up, this means we are in this situation, and if we
are in this sort of situation, with all the different fingers like this, then we are here. All the information we need is in the landmarks. It will be exactly the same situation with any other symbol we choose: the information will be in the landmarks. Now let's continue. We don't really need the drawing; the only reason I did it was to show you what this looks like. So I'm going to remove all the drawing and keep everything else. What I'm going to do is take all the landmarks and create an array from them. From each image I want to have an array with the information of all the landmarks we have detected. In order to go one step at a time, I'm going to iterate for i in range of the length of hand_landmarks.landmark, and I'm going to print this value, which is going to give us the value of all the landmarks. I'm going to show you what it looks like first, and then I'm going to do something with them. We don't really need to plot the images anymore, but we have the images too. You can see that for each one of these landmarks we have three values, x, y and z, and these are the values which define exactly the position of each one of our landmarks. We are going to use only the x and the y coordinates, which are the horizontal and the vertical coordinates, and from this information we are going to create a very long array. We are going to train the classifier, and we are going to run inference with this classifier, by considering this array of landmarks only. So we have an image, we detect the landmarks, and then we put these landmarks into a very, very
long array, and that's the array we are going to consider in order to train our classifier. That's the process, and we are taking everything one step at a time. Now we are going to access the x and the y coordinates: this will be the x coordinate, and then the y coordinate will be something like this. We are going to save everything into an array, so I am going to create two variables, one called data and another one called labels. We don't want to iterate over only one image anymore, so I'm going to delete that, and I'm not going to plot anymore. So we have defined the two variables which are going to contain all the information: data, which is the data we are going to produce in order to make this classification, and labels, which are the categories for each one of these images, for each one of these data points. This is how we're going to do it: we are iterating over all the images, and for each image I'm going to create an array, which is data_aux, and this is an empty list for now. This is where we are going to save the x and the y coordinates, so something like this with x, and then something like this with y. Then, at the end of the entire iteration, we are going to do something like data.append of data_aux, and then labels.append of dir_, which is the category. Remember that we have three categories, three directories, one called 0, another called 1 and another called 2, and each one of these directories contains one of our symbols, so the symbol is encoded in the name of the directory. So this is exactly what we need to do: for each one of our images we are extracting all the landmarks, we are creating a very long array with all these landmarks, and then this very long array is going to represent our image. We are going to create an entire list with all these different arrays, and
then the labels will be the name of the directory of each one of these images. Doing so, we are creating the dataset we need in order to train our classifier. So that's pretty much all. Now let's see if this works properly, because we have done many different things and there are many places in which we could have made a mistake. Okay, everything was successfully executed, so everything seems to be okay. What I'm going to do now is save all this data. I'm going to define an object, which is f, and this is open of data.pickle. If I'm not mistaken, I haven't imported pickle, so I'm going to import it now. Remember, pickle is a Python library which is very commonly used for this type of situation, to save datasets, models and so on. So we are opening this file, and we need to open it as 'wb', because we are writing, and we are doing it as bytes. Then we need to call pickle.dump with the object we are going to save, and we are going to create a dictionary containing these two keys, data and labels, with the data and the labels we have just generated. We also need to pass f here, and then we need to close the file, and that's pretty much all. So I'm going to run this again, and if everything runs successfully we should have a file with this name. Okay, everything ran successfully. Now if I go to my directory I should see the file I have just generated. You can see this is my current time, so this is the file I have just generated. I have other files from older executions, from when I was preparing this tutorial, but this is the file I have just created. So, going back to PyCharm, everything seems to be ready for now. We have created the dataset; this is the
data we are going to use in order to train our classifier, and now we can continue to the next step, which is training this classifier. Now we are going to load this data and train a classifier with it. We are going to do something similar to what we did here, but we are going to load the data instead. So I am going to import pickle in this train classifier script, and I am going to call pickle.load, and this will be open of the file name, which is data.pickle, and I need to read this as 'rb'. This will be our data dictionary, data_dict. If I'm not mistaken this is the way to read the data. Now let's print two things: the keys of this dictionary, and let's also print the object itself, the data dictionary, and let's see what happens. Okay, everything seems to be okay: these are all our labels, this is the data, and these are the keys, data and labels. Everything seems to be working properly, so let's continue by importing all the different objects and libraries we are going to use in order to train this classifier. We are going to train it using the library scikit-learn, and we are going to use a very specific type of classifier, which is a random forest. So let's start with: from sklearn.ensemble import RandomForestClassifier. Then we are also going to import, from sklearn.model_selection, train_test_split, and from the metrics module, accuracy_score. So these are the three imports we are going to use in this section: we are going to train a random forest classifier, and these are two other functions we are going to need too. Let's start by grabbing the data and the labels, and this is how we're going to do it: data will be data_dict['data'], and labels will be data_dict['labels']. And we need to convert
this into a NumPy array. Sorry, I need to import NumPy too; I'm going to do it just now, because this is the way this classifier works, and the way all these libraries work. So I'm going to import numpy as np. Remember that currently our data and labels are lists; that's why we need to convert them into NumPy arrays. The first thing we need to do in order to train this classifier is to prepare the data. We have all our data in these two objects, data and labels, and now we are going to split this data into a training set and a test set. This is a very common practice when we are training any type of classifier: we usually need two sets, one set we are going to use in order to train the algorithm, and another set we are going to use in order to test the performance of the algorithm. So this is what we are going to do, and this is how we're going to name the training set and the test set. Let me write it first, and I'm going to explain it in a few minutes. This is where we're going to use this function: we're going to input data and labels, then shuffle set to True, and then stratify according to labels. So just bear with me. What we are doing here is calling train_test_split, and we are splitting all of our data, this array and this other array, into two sets. From the data array we are creating these two sets, and you can see that one of them is called train and the other one is called test. We are doing exactly the same for the labels: we are creating two arrays, one of them is train and the other one is test. So we are taking all the information within data and labels, and we are splitting it into two different sets. Let me find a picture so I can explain this a little more, so this will
be a little more clear. Train test split: I'm sure we're going to find many different pictures explaining how this is done. You can consider this image. We have the entire array, the entire data, and we are splitting this data into two sets, the training set and the test set. The training set is what I have called x_train, and the test set is what I have called x_test. That's basically what we are doing, and we are doing exactly the same process for data and labels. Then test_size is the size of the test set. You can see that we have two sets, and we can define the size of this test set with different values: we could say this is 10% of the data, 20%, 50%, 80% and so on. I have specified this value as 0.2, which means we are keeping 20% of our data as our test set, and 20% is a value which is very commonly used for a test set. Then shuffle equals True means we are shuffling the data. This is a very good practice and a very common practice, and I would advise that you shuffle your data when you are training a classifier like this. This is something I have covered in other tutorials, where I also show you how to train image classifiers or different types of classifiers: sometimes there are different biases in our data that we are not aware of. So, long story short, remember to shuffle the data, even when you think it's not needed; it's going to be much better. Then stratify equals labels means that we are going to split the dataset, but we are going to keep the same proportion of the different labels, of the different categories, in the train set as in the test set. So this object and this other object are going to have the same proportion of all of our different labels. If you remember the data we are using, we
have 100 elements in this category, 100 elements in this other category and 100 elements here. This means that one third of the data is labeled as A, which is this symbol, one third of the data is B and one third is L. If we look at these two objects, y_train and y_test, we are going to see exactly the same proportion: one third of all the elements in this array are going to be A, another third will be B and another third will be L, and exactly the same situation here. That's basically what it means when we are stratifying according to the labels, and that's also a very common and very good practice when we are splitting a dataset. So remember to include stratify equals labels when you're using train_test_split for a classifier. So this is all for splitting the data, and now it's time to create our model. The model we're going to use, remember, is a random forest classifier. The first thing we need to do is to train this classifier, so we are going to call fit, and we're going to input x_train and y_train. This is the way we are fitting this model. Then we just need to call predict, and we need to input x_test, and this will be y_predict. And that's it: in only a couple of lines we have trained the classifier and we have also made our predictions. This is how simple this is. So let's execute this script to make sure everything works properly. Everything works properly, and you may notice how fast it was: this took only a few seconds. If you are familiar with training machine learning algorithms, you will know that sometimes it takes a lot of time to train a classifier, but in this case the training is very fast. That's one of the reasons I chose a random forest: it's a very simple algorithm, I know the training is very fast, and I know that it's robust enough for our problem, for our project.
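To make the split, fit and predict steps concrete, here is a minimal sketch that runs on synthetic data instead of the real landmark file. The synthetic arrays are an assumption made purely for illustration: 3 classes with 100 samples each, and 42 features per sample (assuming 21 hand landmarks with an x and a y value each). The `random_state` arguments are also an addition, used only to make the run repeatable.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the landmark dataset (assumption, not the real data):
# 3 classes x 100 samples, 42 features each, with well-separated class means.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(loc=c, scale=0.1, size=(100, 42)) for c in range(3)])
labels = np.repeat(["0", "1", "2"], 100)

# Hold out 20% of the samples, shuffled, keeping the class proportions equal
# in the train and test sets.
x_train, x_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.2, shuffle=True, stratify=labels, random_state=0
)

model = RandomForestClassifier(random_state=0)
model.fit(x_train, y_train)
y_predict = model.predict(x_test)

# Fraction of held-out samples whose predicted label matches the true label.
score = accuracy_score(y_test, y_predict)
print(f"{score * 100:.1f}% of samples were classified correctly")

# Stratification keeps the class balance: 20 test samples per class here.
classes, counts = np.unique(y_test, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))
```

Printing the per-class counts of `y_test` is a quick way to confirm that `stratify=labels` did what the explanation says.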
we have already trained the classifier, we have our predictions, and the only thing we need to do now is to see how this classifier performs. So I am going to define a new variable which is score, and this will be accuracy_score, and I'm going to input our predictions and then I'm going to input y_test, which are our labels. And now I am going to print, and this will be something like, I'm going to express it as a percentage, '{}% of samples were classified correctly', format, and this is something like score, something like this. Let's see what percentage of our samples were classified correctly. So I'm going to run this code and you can see, oh, I forgot to do something, which is multiplying this value times 100. You already saw the answer, but anyway let's do it again: 100% of our samples were classified correctly. So this is amazing, this means we have a perfect performance, we have trained a classifier with a perfect performance. So everything makes sense now: the way we processed the data in order to extract all the landmarks from our images, from our hands, because that's where we had all the information, and now the selection we did for this classifier, because we had the intuition this is not really a super, super complex problem and a random forest will do just fine. Everything makes sense now when we are achieving 100% accuracy, right? So this is very good, and the only thing we're going to do now is to save this model the same way we saved the data here. We are going to save the model because we want to use this model later on to test its performance, but not testing the performance as we did here, but testing the performance of this model with real data. I am going to make some gestures, some symbols, and we are going to classify these symbols live on my camera, on this video, so we definitely need the model. I'm going to save this model as model.p and I'm going to do something similar, I'm going to say 'model', this will be model, and this, okay. So I am saving a dictionary with only this
object, and I press play, and that should be it. We have to wait, and then it's completed. If I go to my directory you can see the model we have trained is here, and this is my current time, so everything is okay. So everything is almost ready, we have only one step left, which is testing this classifier, so that's what we are going to do now. Okay, now it's time to test our classifier, and the way we are going to test this classifier is by importing cv2, because we are going to access our webcam and we're going to do all these different symbols in our webcam and see what happens. So cap will be something like cv2.VideoCapture, and I'm going to access this webcam, number two, and then while True, cap.read(), and this is ret and frame. And then I am going to call cv2.imshow, I'm going to call this window 'frame' and this will be my frame, and then cv2.waitKey, and here we should put 25, which means we are going to wait 25 milliseconds between each frame, right? So everything seems to be working fine and this is my webcam. So what we are going to do now is we are going to detect all the landmarks in this webcam, we are going to detect all the landmarks of my hand, and then we are going to use the classifier we have trained in order to know what symbol I am displaying with my hand. So let's do that. I am going to release memory, I'm not sure if it's needed here because we are exiting the program anyway, but I'm going to do it just because it's a good practice, cv2.destroyAllWindows, and that's pretty much all. So I don't have much space here, so let's go here, and I'm going to copy a few objects. I'm going to copy this, which is the MediaPipe library, or the MediaPipe functions we used, and then we also need the color conversion, so I'm going to do this, okay, something like this. This will be frame_rgb, frame, frame_rgb, and this is where we are going to go through all the landmarks and we're going to do something
similar to what we did before, so I'm going to copy this code I used before, and the first thing we're going to do is to draw the landmarks on top of the webcam frame and then we are going to classify it. So as always, it's a good idea to go one step at a time, and let's just run this code to see if everything works properly. Everything seems to be working properly, you can see we are getting the landmarks from my hand, and now it's time to classify this object. So we go back here and we say something like this: we are going to iterate over all the landmarks the same way we did before, and now we are going to create an auxiliary array, which is this, the same way we did before. And the only thing we need to do now is to use the model, but we need to load the model first, so let's do something as we did before. I'm just going to define it here, let's say model_dict is equal to pickle.load of model.p, and then let's say model equal to model_dict['model'], and obviously I need to import pickle. Okay, and this should be enough. Now we are here, yes, we are here, and we need to use model.predict to predict the category, so let's see what happens. This will be np.asarray of data_aux, okay, we need to import NumPy too, okay, and let's see what happens, let's see if we don't get any error. Okay, we get an error, of course, and it's related to the shape if I'm not mistaken. What we need to do now is to input this as a list. Let's see what happens now. Yeah, everything is okay, so now we are using our classifier in order to predict the class for these landmarks. We have a poorer performance now, now it's better. It seems the classification is making this not real time. Maybe if I do something like waiting only one millisecond, it's not like it's the most important thing, but just so we can have a more real-time detection. Now it's a little better. Remember I'm also recording my screen, so that takes resources, that's pretty much
the reason, I think. So let's continue, everything seems to be working properly, and what we will do now, now that we have our prediction, this is our prediction, we are going to draw the prediction on top of our frame. Or actually, what we can do, we can define a dictionary which is something like labels_dict, and this is 0 mapped to 'A', remember the number zero represents the A character, then one represents B, and then two, sorry, I'm doing things wrong, and two represents L, okay. And what I can do now [Music] um, something like predicted_character is equal to labels_dict[int(prediction[0])], because we are getting a list, prediction is a list of only one element, so we need to do it this way. Okay, that's pretty much all, let's print predicted_character. I'm super excited, let's see if this works. So I press play, and now let's see if we can accurately detect and recognize all the symbols we have just trained. So the first one is the, ah, I have to press run again. And if I do the A character you can see I'm getting A, once and again I'm getting A, so everything seems to be working properly. Now I'm doing the B and you can see I'm getting B, once and again I'm getting B, and everything goes perfect. And if I do L, which is this one, you can see I'm getting the L, and also the accuracy is perfect. So that's pretty much all, because it seems everything is working super, super properly. The only thing we're going to do now is to make a prettier drawing, right? Let's just print the prediction on top of the frame and let's also draw a bounding box around the prediction, around the hand we are detecting, so that's what we are going to do now. But you can see that we have solved the problem, we have completed all the different steps, we have trained the classifier and now we are testing the classifier with live data and everything seems perfect. So the only thing we're going to do now is to do
a call to rectangle, and we are going to input frame, and then we're going to input two values which are x1 and x2, and then we're also going to input, sorry, this is y1, and this is x2, y2, a color, which I'm going to define in black because we have too many colors already in this picture, in this frame, with all the landmarks and so on, so I'm just going to set it in black, and then the thickness value, which will be, I don't know, four, something like four. And I'm just going to do this now and I'm going to [Music] um, I'm going to define x1, y1, x2 and y2 in a minute, but let's just continue for now. And I want to call putText, and I have a putText here, so I'm just going to copy paste and I'm going to [Music] um, edit it with these values. This is going to be the class we are predicting, so it's exactly this, then this will be here. Remember we have solved this problem, so we should be super, super happy, this is only a detail to make a very pretty drawing, but everything is solved, everything is perfect. So we are drawing a rectangle and we are also putting the text on top of this rectangle with the class we are predicting, and now we have to define these values. In order to find these values, the way I'm going to do it, I'm going to define two additional arrays, which are here, x_ and y_, I think this will be the easiest way to solve this problem. So I have defined these two auxiliary arrays and I'm going to say x_.append(x) and then y_.append(y), and I'm going to say something like x1 equal to the minimum value of x_, yes, then y1 is exactly the same for y_, and then x2 and y2 are the maximums. Remember we are trying to get the corners of the rectangle containing the hand, so we know exactly where to display this rectangle and this text. So this should do, but we need to do something else, because if you remember the values we were getting for the landmarks, everything is a float, so we need to convert it into an integer, and the way
we're going to do it is by calling, uh, we are going to get the frame shape, everything's okay, and then we are just going to multiply this value times the width of our image, and we are multiplying the other value times the height. We need to cast everything as an integer and this should be enough. Let's see how it performs. This is not necessary, and okay, let's see if it works or if we need to adjust something else. Okay, x1 is not defined. Okay, so the problem is that if we are not detecting any hands, right, if there's not any hand in the frame, then we never execute this loop, we never define these variables, and we cannot draw the rectangle and the text in the location we specify. So doing something like this should be enough. Now let's execute it again, and I'm just going to do an A, and you can see that I'm classifying the A super, super properly. Now I'm going to do a B, and now I am going to do an L, and everything is okay. I'm going to adjust a very, very small detail, which is I am going to do this drawing in a slightly different position. I have already been testing different values and if I do it here in -10 everything is going to look much, much nicer, and I'm going to do this in y1 - 10 if I'm not mistaken. Sorry, sorry, my mistake, it's the other one, the text one, y1 minus 10. If I execute it again everything is going to look a little nicer. So there, you can see now we are seeing the A on top of the bounding box, this is B and this is L, so everything is working super, super properly, and that's going to be all for this tutorial. So this is going to be all for today. My name is Felipe, I'm a computer vision engineer. If you enjoyed this video I invite you to click the like button and I also invite you to subscribe to my channel. In this channel I usually make tutorials, coding tutorials, which are exactly like this one, and I also share my experience as a computer vision engineer, so if these are the type of videos you're into I invite you and you're super, super
welcome to subscribe to my channel. This is going to be all for today and see you on the next video. [Music]
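As a rough sketch of the bounding-box step described in the transcript: the hand landmarks come back as normalized floats, so they are multiplied by the frame width and height and cast to int before drawing. The helper name landmarks_to_box and the sample coordinates below are assumptions made for illustration, not the code shown in the video. Returning None when no landmarks were collected is one possible way to avoid the "x1 is not defined" error when there is no hand in the frame.

```python
from typing import Optional, Sequence, Tuple


def landmarks_to_box(
    xs: Sequence[float],
    ys: Sequence[float],
    frame_width: int,
    frame_height: int,
) -> Optional[Tuple[int, int, int, int]]:
    """Turn normalized landmark coordinates into a pixel bounding box.

    Hypothetical helper: xs and ys play the role of the two auxiliary
    arrays from the transcript. Returns None when they are empty, which
    is the case when no hand was detected in the frame.
    """
    if not xs or not ys:
        return None
    # Top-left corner: the minimum x and y, scaled to pixels.
    x1 = int(min(xs) * frame_width)
    y1 = int(min(ys) * frame_height)
    # Bottom-right corner: the maximum x and y, scaled to pixels.
    x2 = int(max(xs) * frame_width)
    y2 = int(max(ys) * frame_height)
    return x1, y1, x2, y2


# Illustrative values only: three made-up landmarks on a 640x480 frame.
box = landmarks_to_box([0.25, 0.5, 0.75], [0.5, 0.25, 0.75], 640, 480)
if box is not None:
    x1, y1, x2, y2 = box
    # The label is placed 10 pixels above the top edge of the box.
    text_origin = (x1, y1 - 10)
    print(box, text_origin)
```

With a real frame, the four values could be passed as the two corner points of cv2.rectangle, and text_origin as the position argument of cv2.putText, only when box is not None.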
Info
Channel: Computer vision engineer
Views: 87,367
Id: MJCSjXepaAM
Length: 55min 37sec (3337 seconds)
Published: Thu Jan 26 2023