Hand Tracking 30 FPS using CPU | OpenCV Python (2021) | Computer Vision

Video Statistics and Information

Captions
Hey everyone, welcome to my channel. In this video we will learn hand tracking in real time. We will first write the bare minimum code to run it, and then learn how to convert it into a module so we don't have to write it again and again for different projects. The best part is we do not have to configure 100 parameters along with 20 different installs to make it run; within 10 to 15 minutes you will have your model working. I have created an ecosystem of modules that can be used in different projects; to learn more about it, do check out my premium course with Jetson Nano. So let's get started. The framework we will be using today is called MediaPipe, which is developed by Google. They created these amazing models that allow us to quickly get started with some of the very fundamental AI problems such as face detection, facial landmarks, hand tracking, object detection and quite a bit more. We will be covering the rest of these as well, so make sure to subscribe to keep updated. Now, the model we are working with today is hand tracking. It uses two main modules at the back end: one of them is palm detection and the other one is hand landmarks. The palm detection works on the complete image and provides a cropped image of the hand. From there, the hand landmark module finds 21 different landmarks on this cropped image of the hand. To train this hand landmark model they manually annotated 30,000 images of different hands, so that is a lot of work, and this is one of the reasons it works so well. The best part is that it is cross-platform and we do not have to dive deep into the sea of configurations and installations, so within just two clicks we will be up and running. So let's have a look at the implementation. Right now I am in PyCharm and we are going to first create a new project. You can see that I have created this hand tracking project. We will go to File, Settings, then to our project, then the interpreter, and we will add. So here we are
going to add our packages. We will write here opencv-python and install that, and then we will write mediapipe and install that. These are the only two packages that we will be needing, so within two clicks we are ready to start coding. That is amazing. Okay, so now we will create a new file; we will call it, let's say, hand tracking minimum, so the bare minimum code that is required to run this. The first thing we will do is write import cv2, then import mediapipe as mp, and then import time; this is to check the frame rate. First we are going to create our video object, so we will write cv2.VideoCapture, and I'm going to use my webcam number one; you can use your webcam number zero. Then we will write while True, and then we have success and our image equals cap.read(), so that will give us our frame. We will write cv2.waitKey(1), and we will write cv2.imshow with the window name Image and then img. This is basically what we always do to run a webcam. What we can do as well is write here that we have to close with the q button, or we can skip it; it's fine, we don't need to write that. So here we can right-click and press the run button, and let's see. There you go, this is my webcam; you can see my hand, and we are going to detect this hand. The first thing we have to do is create an object from our class Hands. Now, this is related to the hand detection model. Later on we will create our own module so that we can learn how to use it easily in different projects. Getting the values of these different points, or the landmarks, is a little bit tricky, but we will create a module so that we can just say, I want point number five of the hand, so tell me the location. That will become quite easy to use in different projects. So
first of all we are going to write here mpHands equals mp.solutions.hands. This is, you can say, a formality that you have to do before you can start using this model. Then we are going to create an object called hands: we will write mpHands.Hands, and inside that we have to write our parameters. Now what are these parameters? Let's go and check them out. We will press the Control button and click on this, and it takes us to that function, so here we can check what exactly we have to input. The first thing is the static image mode. They have this configuration where they will track and detect: if you put this as False, then sometimes it will detect and sometimes it will track, based on the confidence level. But if you put it in static mode, then the whole time it will do the detection part, which will make it quite slow. So we will keep it False, so that it detects, and if it has a good tracking confidence it will keep tracking. This way it will be much faster. Whenever the tracking confidence goes lower than a certain threshold, it will do the detection again. Then you have the maximum number of hands, so here we have two, and then we have the minimum detection confidence, which is 50%, and the minimum tracking confidence, which is 50%; it means if it goes below 50% it will do the detection again. Okay, so now that we know our parameters we can go back and write False here. Actually, we are not going to write anything, because these are the default parameters and they have already given the default values, so we do not have to change or write anything here. If we want to we can, otherwise we can skip it as well. So for this instance we are going to skip it, and later on we are going to write whatever we need. Then we need to go back. Okay, so then here in the loop
we are going to send in our RGB image to this object. We have to first convert it, so we will write here imgRGB equals cv2.cvtColor, and then we will write our image. Why is there a double bracket? Okay, we will write our image, then cv2.COLOR_BGR2RGB. The idea is that we want to convert it into RGB because this class, or this object, only uses RGB images, so we need to convert that first. Then we will write here that our results equals hands.process: we are calling this object, and there is a method inside this object called process that will process the frame for us and give us the results. That's how simple this is. Now all we need to know is how to extract this information and use it. At this point we are not really displaying or doing anything, but I still want to run it to see if everything is working so far; it will be processing the frame but it will not display anything for us. So let's try it out. There you go, and now you can see even though it's processing, the frame rate has not decreased; it seems real time. We will later on check the exact frame rate as well, so don't worry about that. Okay, so then we are going to open up the object that we have received and extract the information within. As we have seen in the parameters, we can have multiple hands, so we will have to put in a for loop to check if we have multiple hands or not, and we have to extract them one by one. Now before we do that, we have to make sure that there is something in the results. We can print out the results and run it, and it just tells us that it is a mediapipe.python.solution_base.SolutionOutputs object, and if I bring in my hand nothing really changes. So we need to know when something is detected or not. To check if something is detected or not we can
write here .multi_hand_landmarks. Let's run this and see what happens. Here it says None, and if I put my hand in, there you go, straight away we are getting some values. We can remove the print, or let's keep the print; we can copy this part, go down, and write here that if this is true then we are going to go in, and for each hand, so we can say for handLms in results.multi_hand_landmarks. You saw that we were getting some results, so is it of one hand or two hands? We don't know. Well, we actually know, because I just put one hand, but it could be of multiple hands. So here we will have each hand, and then we will extract the information of each hand. Once we do that, we have a method provided by MediaPipe that actually helps us draw all these points, because there are a lot of them, 21 points, and if you want to draw a line between each pair of points, quite a lot of maths would be involved. So they provided us with a function for that. We are going to write that down: we will call it mpDraw equals mp.solutions.drawing_utils. Now we will use mpDraw to actually draw it, so we will write here mpDraw.draw_landmarks, and inside that we will give in the image that we want to draw on. We don't want to draw on the RGB image, because we are not displaying the RGB image; we are displaying the original BGR image. So we will write img, and then handLms, which is a single hand. There could be multiple hands: this is, let's say, hand number zero, then there could be hand number one, so this is that single hand. So if I run this now, that should draw the hand for us. Let's
try it out, and there you go. Now you can see it is drawing the hand for us, and it looks pretty good. But these are points, and I told you that we could draw the connections as well. How can we do that? We can do that by writing here mpHands.HAND_CONNECTIONS. That is it: we are using mpHands.HAND_CONNECTIONS and this will draw the connections for us. Let's try that out, and there you go. Now you can see how easily we got our hand position, and we got all the 21 landmarks. If you like the video so far, give it a thumbs up and don't forget to subscribe. So this is good, but the problem is we still don't know how to use these values. Where are these values, and how can I extract and use them? For example, if I want to track one of these positions to perform a certain task, what exactly can I do? That is still remaining and we will learn how to do that, but before we go there I want to do the frame rate. We are going to write the FPS. To do that we are going to write here that our previous time equals zero and our current time equals zero. Once we have done that, we will go down here, and before we display we are going to write current time equals time.time(), and this will give us the current time. Then our FPS will be one divided by the current time minus the previous time, and then our previous time will become the current time. So yeah, that seems good. I think we should display it on the screen so that we can see it, rather than putting it on the console. So we can write here cv2.putText, and we want to put it on our image. We want to convert the FPS into a string, and we also have to round it, or should we? If we round it, it will give us decimal values; we don't want decimal values for FPS, so we can just convert it to an integer, so
that will give us that. Then we can give it the position, let's say 10 and 70, and then we can give in our font, cv2.FONT, whatever comes first, then we write the scale, and then we write the color, so let's put purple, or let's put blue, whatever, let's put purple. I think I missed a comma here. Okay, and then we need to put the thickness; let's put it as two, or let's put three. Okay, so that seems good. Let's run it. So here we have it: now we can see that the frame rate is around 30 FPS. It goes to 20 sometimes, but most of the time it's 30. You can see it's quite fast, very responsive. Thumbs up. Oh, thumbs up makes it go away. Thumbs up, yeah, this time it worked. Great. Let me try my other hand as well. That seems fine and it is working quite well, so we can move on. Now we are going to get the information within this hand. For each of these hands we will get the id number and we will also get the landmark information. The landmark information will give us the x and y coordinates, and we also have their id numbers; they are already listed in the correct order, so all we have to do is check their index number and that's it. So what we can do is write here for id, lm in enumerate(handLms.landmark). So this is basically the landmark that we are getting from here, and this is the id number, or the index number, which will relate to the exact index number of our finger landmarks. If it is zero it will be the bottom middle one, then if it's four it will be a tip, and things like that. So what we can do is print id and lm here, so we can see at least what is happening. Let's run that, and there you go. So let's see
what we got. If we go up here you can see that this is id number 20, 19, 18, and if we keep going back we will start from zero. Each id has a corresponding landmark, and the landmark has x, y and z. We are going to use the x and y coordinates to find the location of the landmark on the hand. But the thing is, if you see here, these values are decimals, and the location should be in pixels: it should be, for example, 500 pixels in the width and 200 pixels in the height, something like that. What they are giving is a ratio of the image, so we will multiply it by the width and the height and then we will get the pixel value. So here what we are going to do is first check the height, the width and the channels of our image, which will be img.shape. This will give us the width and height, and then we can find the position. We can write here cx and cy, which is our position of the center, and it has to be an integer; because the values are decimals we have to convert them into integers. So we will write int of the landmark's x value multiplied by the width, and for the second one it will be int of the landmark's y value multiplied by the height. This will give us the cx and the cy position. Now we can print this out, but the thing is that it is not for a specific one, it is for all of them. So if we remove this and print cx and cy and run this now, it will give us all 21 values. How do we know which one is for landmark one and which one is for landmark two? We need to write the id as well, so we can write it like this. There you go, now we have this information. If we look here, this is the id number and this is the cx
and the cy position. So what we can do is use any of these to our benefit, to actually pick out any of these landmarks. I can write here if id equals zero, which means we are talking about the first landmark, then we are going to, let's say, draw a circle. So we will write here cv2.circle; we are drawing it on the image, and we will color it a different way and make it a little bit bigger, so it is easier for us to know that this is the one that we are picking out. It shouldn't be an issue. We can write here that our radius is, let's say, 5, then our color will be purple, and then we have cv2.FILLED. Once we have that, it will only draw for id number zero. So if I run this now, there you go, you can see here at the bottom. Okay, let me make it bigger, it's very small, so let's make it 25. There you go, now you can see clearly that we are detecting that landmark, which is zero. If I remember correctly, four is the tip of one of the fingers. Let's make it 15.
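The steps narrated up to this point can be collected into one short script. This is a sketch reconstructed from the captions, not the author's actual file: the names `to_pixel`, `fps_from_times` and `run_webcam` are made up for illustration, the snake_case naming differs from what is typed in the video, and the camera index and window name are assumptions. OpenCV and MediaPipe are imported inside `run_webcam` so that the two small helpers can be tried without either package installed.

```python
import time


def to_pixel(x_norm, y_norm, width, height):
    """Scale normalized (0..1) landmark coordinates to integer pixel positions."""
    return int(x_norm * width), int(y_norm * height)


def fps_from_times(previous_time, current_time):
    """Frames per second from two consecutive frame timestamps, in seconds."""
    elapsed = current_time - previous_time
    return 1.0 / elapsed if elapsed > 0 else 0.0


def run_webcam(camera_index=0, highlight_id=4):
    """Show the webcam feed with hand landmarks, one highlighted point and the FPS."""
    # Imported here so the helpers above work without OpenCV or MediaPipe.
    import cv2
    import mediapipe as mp

    mp_hands = mp.solutions.hands
    mp_draw = mp.solutions.drawing_utils
    hands = mp_hands.Hands()  # default parameters, as in the video
    cap = cv2.VideoCapture(camera_index)
    previous_time = time.time()

    while True:
        success, img = cap.read()
        if not success:
            break
        # MediaPipe expects RGB; OpenCV delivers BGR frames.
        results = hands.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

        if results.multi_hand_landmarks:
            h, w, _ = img.shape
            for hand_lms in results.multi_hand_landmarks:
                mp_draw.draw_landmarks(img, hand_lms, mp_hands.HAND_CONNECTIONS)
                for idx, lm in enumerate(hand_lms.landmark):
                    if idx == highlight_id:  # 4 is the thumb tip
                        cx, cy = to_pixel(lm.x, lm.y, w, h)
                        cv2.circle(img, (cx, cy), 15, (255, 0, 255), cv2.FILLED)

        current_time = time.time()
        fps = fps_from_times(previous_time, current_time)
        previous_time = current_time
        cv2.putText(img, str(int(fps)), (10, 70),
                    cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)

        cv2.imshow("Image", img)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
            break

    cap.release()
    cv2.destroyAllWindows()
```

Calling `run_webcam()` starts the loop. The `q` check and the `success` check are additions for a clean exit; the captions mention them only as optional.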
25 is too big. There you go, so it is the tip of the thumb. You can see now we are getting this information, and what we can do is put all of this in a list and use it to find the location, move around, and do all sorts of different things with this. What we can also do is remove this condition, and then it will draw on all of them, but that's not useful because we are already drawing on all of them; here you can see it looks quite weird. Anyway, that is the basic idea: this is how you get the cx and cy information for each one of these, and we can put them in a list so that we can later on return this list and use it to our benefit. If we want to track the tip of the index finger, or the bottom part of it, whatever we want to track, we can do that. So now that we are done with this, we are going to create a module out of this, so that next time we are using it in a project we don't have to write all of this again. We can simply ask for the list of these 21 values of each hand, for example give us the first hand, give us the second hand, whatever, and then we can simply say, okay, I need point number ten, and it will give us the value of point number ten, which is, let's say, at this point 444 and 210. That will make it very easy for us to create new projects, so let's see how we can do that. Now we will create a module file; we will call it hand tracking module. We will copy pretty much all of this code and paste it here. First of all we will write here if __name__ equals __main__; this means that if we are running this script, then do this. Whatever we write in the main part will be like dummy code that will be used to showcase what this module can do. So we will write here def main, and we are going to put our while loop inside of this, so while True, and in fact all of this as well. Not that; let's copy this part first, so
we'll put this here, and then this part here also. For the FPS, we will put it down here. What else do we need? Yeah, the video capture, we can put it here. Wait, why did it show up here? Okay, I think I copied it, so we need to remove this. I think that is fine for now. Now we have to create our class. I thought of doing it with plain functions, but I think it will be better if we create a class. So we are going to create a class here; we will call it handDetector, and inside that we will write def __init__ with self, and inside that we have to give in our parameters. These parameters are the basic parameters that are required for Hands. If you remember, we went to Hands and we saw all these parameters, so these are the ones that we will take as input, so that we have the flexibility of changing them. Here we have the mode, so we will write mode equals False; then the max number of hands, so we can write maxHands equals 2; then the detection confidence equals 0.5 and the track confidence equals 0.5. Then we can remove all of this. Now the first thing we have to do is write self.mode equals mode. This means that we are going to create an object and the object will have its own variable, so this is that variable. Whenever we are using a variable of the object we will call it self dot something, and initially we are assigning it a value provided by the user: we are calling it mode and we are giving it the value of mode. We have to do the same thing with the other parameters, so we will write here self.maxHands equals maxHands, then self dot detection confidence equals detection confidence, and self dot track confidence equals track confidence. And then all of these have to be inside this
initialization as well. If you remember, they are part of the initial code where we are initializing everything, and then there is the while loop. So we need to initialize these as well over here, and again we will write self dot; we will keep putting self dot everywhere. Why is this giving an error at mpHands? Because we need to add self here as well, so we will write self.mpHands.Hands. That should be good, and inside that we have to give in our parameters: self.mode, then the max hands, the detection confidence and the tracking confidence. There you go, so this should be fine. I think the initialization is done, so now we can move on to the detection part. We will call it findHands, and inside that we have to just copy this part. Do we need to convert? We do need to convert, and we need to put this in as well, so we will put all of this inside. Then we will go back up here and start from here. First of all we will need an image to find the hands on, so that will be this img. Then hands is not being recognized because it has to be self.hands; we are talking about this object within this object. Then we have mpDraw, so this should be self.mpDraw, and then self.mpHands.HAND_CONNECTIONS. That should be good. Should we draw it inside? I don't think we need to draw it here; in fact, we do not even need to get the landmarks from here. What we can do is keep this outside and comment it out. So this here is basically what we need to draw the hands. We can put a flag here: we can write draw, set it to True by default, and check if we want to draw or not. So we can write here if draw, then do this; it will only draw if we ask it to draw. I think this is good enough to actually run the class without getting the list, so for testing this should be fine. So what we can do here, so we will
create a new method within this class that will find the position for us; it will give us that list. But for now we will just test to see if everything is working so far. Here we will first create our object: we will call it detector equals handDetector, and we will not give in any parameters because we know that we have these default parameters already there. Once that is done we will get our image, and once we get the image we are going to send it in: we are going to write detector dot, and now this is the method within our class, findHands, and we have to give in our image, so that is the most important component. We might need to draw on it, so we need to return the image if we have drawn on it. So we will return the image, and then we can go back and take the image over here, so img equals this. If we run this now as a module, it should work, so let's see if we made any mistake. Yeah, it's working, that's good. So now our module is running. The main reason for creating this module is to get those position values of the landmarks very easily, so we need to create that find position method. We will write here findPosition, and we are going to give in the parameter of our image. We don't really need the image itself, but we need it for the width and the height; if you remember, here we need the shape. We can do it in other ways, but this is simple, so we will do it this way now and later we can improve on it. Then we need the hand number: if you want the information of hand number one, number two or number three, whichever hand you want, you can ask for the information of that. Then we will have the parameter for draw, and again we will put it as True by default. Now we can uncomment this and bring it back. Okay, so that should work. Now here the issue is that we were using a for loop to actually run this, but now we need to
first check. Again we will go back here and create a list, lmList; this is the landmark list that we are going to return. This list will have all the landmark positions, and we will return it whether it is filled or not. Then we are going to check again whether any landmarks, or any hands, were detected or not. To do this we use basically this part here: if self.results.multi_hand_landmarks is available, then we are going to check the next things. So we will write this here and put all of this inside that. But here we are getting the results, so it should be self.results, and now I can use the results here as well. We need to replace this here as well. If it's not self, then I cannot use it in this method; to use it in all the methods you have to make sure this variable belongs to this object. Okay, so now we have to write down which hand we are talking about, because earlier we were getting it for all the hands. If you want, you can get it for all hands, it's up to you, but I'm creating this method to get it for one particular hand; if you want, you can change that too. Earlier we had for handLms in this; now we will take this and point to a particular number, and that number will be the hand number. So let's say we'll call it myHand equals this, and we will put this over here. It will get the first element, the first hand, and then within that hand it will get all the landmarks and put them in a list. Here we are just printing them out, so instead we can write lmList.append, and we want to append the values of id, cx and cy, and we can remove the print because we're getting it anyway. So I think that is good, and here we have the option of draw as well, so we can write here if
draw then do this, otherwise don't. By default it is True, so it is going to draw. Let's see how that works out. We can return this list, so I can go down here and call findPosition; we can remove the self, and we have the image. Do we need anything else? I don't think so. Okay, copying this was a bad idea anyway. So findPosition of img, and it will give us the list, so we can copy this and paste it here; by the way, we have to write detector.findPosition. Now what I can do is print the value of my list at any index. If I want, let's say, the zero index, then I will write zero here; if I want landmark number four, I will write four. If you remember, four was the tip of the thumb, so this will give us that position. Let's run this and see what happens. Index out of range. Okay, that is understandable. Why? Because here we have to check whether nothing was found, which means the length of lmList is zero. So we will write here that if the length of this list is not equal to zero, then we will print. Let's run it again, and there you go. If I put in my hand, now you can see it's drawing all of them, but it is showing me only landmark number four. If you look at my thumb, okay, the values are not moving. Okay, so now if I go to the very end, you can see it is 600-something, because the max value is 640, and now if I go around to the starting point it goes to around 150-something, 200. I'm talking about the x position, by the way, so the x position is changing like this. Then we have the y position: here it is going towards zero, and if we go down here it is going towards 400-something. So this is a 640 by 480 image. It works well with 1280 by 720 as well; it's still around 20-something frames per second, so that is quite good. Okay, so now this is working as a module, and what we
can do is use this in a different project. Now you might say, how can we do that? Well, here is the dummy code, and this dummy code we can use to run it in a different project. So I can copy this and create, let's say, my new game, hand tracking, blah blah blah. Okay, that's a very long name. I can paste the complete code here, remove the indentation, and then import these. Then I need to check what is missing. Here I need to import the module: I will import the hand tracking module as htm, and now I will write htm.handDetector, and the rest will remain the same. That is pretty much it. If I run this now it should run exactly the same. There you go, now it's giving me the values of, not the index, the tip of the thumb, which is landmark number four, and it is showing me all the landmarks as we have drawn them. If you like the video so far, give it a thumbs up and don't forget to subscribe. If I go back to my module, I can change the color or the size of these as well. Again, all these parameters you can change, and you can add them to your methods if you wish, and it could become easier for you; it depends on your project. If you have a lot of different things that you want to accomplish within one project, then you can add more methods to this to accommodate that. So here, let's say, we will make it a little bit smaller: 15 is too big, we will make it seven, let's say. And let's change the color; it's BGR, so let's make it blue. There you go, now we have changed it. Actually, I preferred the previous one, even the big size. Here, if you don't want to show it, we can write False and it will not display. Oh, it's displaying. Wait, why is it displaying? Let's go back here: if draw, then only we do this. Okay, maybe it's drawing here as well. Am I running this? Yeah, I'm running the
correct file, but when I write False here, oh, there are more arguments, that's the problem. There are more arguments, so we cannot just write False here; we have to write draw equals False, my bad. Draw equals False, and there you go, the custom drawing that we did is gone. Now if you want to remove this drawing as well, you can write here draw equals False, and there you go. Now you will see that you are getting the value of the thumb but nothing has been drawn. This way you can customize it to your needs. So this basically brings us to the end of our tutorial. I hope you have enjoyed it and learned something new from it. We are going to look at the other models of MediaPipe, and I am very excited about those because they are very good and they run in real time. I will be sharing all of these other models as well, so make sure to subscribe, and hit that like button if you enjoyed this video. Don't forget to share it with your friends; try to spread the knowledge, let everybody know what is out there so they can create amazing projects, and I will see you in the next one.
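The module built in the second half of the captions could look roughly like the sketch below. It is reconstructed from the narration and is not the author's file: the snake_case names (`HandDetector`, `find_hands`, `find_position`, `landmarks_to_list`) are illustrative, the bounds check on `hand_no` is an addition, and the circle size and color simply follow the values mentioned near the end of the video. `Hands` is called with keyword arguments so the call does not depend on parameter order, which may differ between MediaPipe releases.

```python
def landmarks_to_list(landmarks, width, height):
    """Return [[id, cx, cy], ...] in pixels from normalized landmarks."""
    return [[idx, int(lm.x * width), int(lm.y * height)]
            for idx, lm in enumerate(landmarks)]


class HandDetector:
    """Thin wrapper around MediaPipe Hands (hypothetical sketch)."""

    def __init__(self, mode=False, max_hands=2, detection_con=0.5, track_con=0.5):
        import mediapipe as mp  # imported lazily; requires the mediapipe package

        self.mp_hands = mp.solutions.hands
        self.mp_draw = mp.solutions.drawing_utils
        self.hands = self.mp_hands.Hands(
            static_image_mode=mode,
            max_num_hands=max_hands,
            min_detection_confidence=detection_con,
            min_tracking_confidence=track_con,
        )
        self.results = None  # filled in by find_hands

    def find_hands(self, img, draw=True):
        """Run detection on a BGR image; optionally draw landmarks on it."""
        import cv2

        self.results = self.hands.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        if draw and self.results.multi_hand_landmarks:
            for hand_lms in self.results.multi_hand_landmarks:
                self.mp_draw.draw_landmarks(
                    img, hand_lms, self.mp_hands.HAND_CONNECTIONS)
        return img

    def find_position(self, img, hand_no=0, draw=True):
        """Return [[id, cx, cy], ...] for one hand, or [] if it was not found."""
        import cv2

        lm_list = []
        if self.results is not None and self.results.multi_hand_landmarks:
            found = self.results.multi_hand_landmarks
            if hand_no < len(found):  # guard added in this sketch
                h, w, _ = img.shape
                lm_list = landmarks_to_list(found[hand_no].landmark, w, h)
                if draw:
                    for _, cx, cy in lm_list:
                        cv2.circle(img, (cx, cy), 7, (255, 0, 0), cv2.FILLED)
        return lm_list
```

In a project script, each frame would go through `img = detector.find_hands(img)` and then `lm_list = detector.find_position(img, draw=False)`, with a check that `lm_list` is not empty before reading `lm_list[4]` for the thumb tip. Passing `draw` by keyword avoids the mistake shown in the video, where a positional `False` landed on the hand number instead.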
Info
Channel: Murtaza's Workshop - Robotics and AI
Views: 932,406
Keywords: hand tracking, hand tracking opencv, opencv hand tracking, opencv hand, hand gesture opencv, opencv realtime hand gesture, mediapipe, opencv media pipe, mediapipe python, opencv tutorial mediapipe, mediapipe hand tracking, opencv hand tracking mediapipe
Id: NZde8Xt78Iw
Length: 48min 59sec (2939 seconds)
Published: Thu Mar 25 2021