AI Virtual Mouse | OpenCV Python | Computer Vision

Video Statistics and Information

Captions
Hey everyone, welcome to my channel. In this video we are going to create an AI-based mouse controller. We will first detect the hand landmarks and then track and click based on these points. We will also apply some smoothing techniques to make it more usable. If you like to create real-world computer vision apps, do check out my premium course, in which we learn how to create apps such as object detection, augmented reality, a document scanner and a lot more. The link is in the description below. So without further ado, let's get started. Here we are in our PyCharm project, and we have created it by the name AI Virtual Mouse. What we have here is the hand tracking module. If you have not been following, we have written this module from scratch, and from the very first project we have added a lot of different methods to this particular class. In our previous project we added the fingersUp method and the findDistance method, and these will allow us to create this new project very easily. We will have a look at how we can do that. This file will of course be available online on my website. A lot of you ask how to access the code: on the website you have to log in and enroll to get access, and of course it is free. Just enroll and you will get access. Now, if you have not been following, you have to go to File, Settings, go to the interpreter, add opencv-python and install it. Then we also have to install mediapipe, through which we get all this hand tracking functionality. So mediapipe, and we are going to hit install. Okay, now both of these are installed and we can hit OK. The first thing we will do is create a new Python file, and we will call it AI Virtual Mouse Project. We will first import cv2, then we are going to import numpy as np, and then we will import our module, which is the hand tracking module,
as htm, and then we are going to import time. Now, apart from all of this that we have been doing earlier as well, we will also add a new library which will allow us to move our mouse around. So with the Python script we will be able to move our mouse and click with it. There are a lot of these that you can use; the one we are going to use is called autopy, so we are going to hit install on that. So an error occurred; let me check again if we can install it. Okay, it was giving an error earlier, but then I clicked on it again and it installed fine. So we can close this, go back, and now we can also import autopy. Okay, so the first thing we will do is run our webcam to see that everything is working. We will write here cap equals cv2.VideoCapture, and we are going to write that our video ID is 1. You will use 0 if you have one camera; I have multiple, so I'm using 1. The second thing is that we have to have a fixed width and height; we cannot leave it to the default of the camera, so we need to change our width and height. We will write here cap.set, and the prop ID for width is 3; then the prop ID for height is 4, so we will make it 480.
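As a rough sketch of the camera setup described above, assuming OpenCV is installed as cv2: the prop IDs 3 and 4 correspond to the named constants cv2.CAP_PROP_FRAME_WIDTH and cv2.CAP_PROP_FRAME_HEIGHT, which are easier to read. The helper name open_camera and the constants W_CAM and H_CAM are hypothetical, not taken from the video.

```python
W_CAM, H_CAM = 640, 480  # capture size mentioned in the video


def open_camera(index=0, width=W_CAM, height=H_CAM):
    """Open a webcam and request a fixed frame size (hypothetical helper)."""
    # Imported inside the function so this file still loads on a machine
    # without OpenCV; the import only runs when a camera is actually opened.
    import cv2

    cap = cv2.VideoCapture(index)  # 0 for a single camera, 1 for a second one
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)    # same property as prop ID 3
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)  # same property as prop ID 4
    return cap
```

Calling open_camera(1) would match the second-camera setup used in the video. The camera driver may not honour the requested size, so it can be worth checking the shape of the first frame.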
So that's how you can define the width and height. Actually, we can put them in variables, because we need to use them later on as well. So let's declare our variables over here: we are going to write that the width of the cam and the height of the cam are equal to 640 and 480, and we can just input these values here, so this is the height of the cam. Then let's remove this, and we can simply go and write while True: we are going to say success and img equals cap.read, and we are going to get our frame value. Once we have this frame value, we are going to say cv2.imshow, and we will say "Image" and then img, and then cv2.waitKey and 1. This is pretty much what we have been doing in all our projects. So let's run this and see if it works. There you go, now you can see my webcam, and there is my hand, so that is all good. Next we can add our detector for the hand tracking, but actually let's discuss the steps that we are going to take today to create this project. Let's put some numbering as well, so it is easier to remember. The first step will be to find the hand landmarks. Then the second step will be to get the tip of the index and the middle finger. The idea is that if we have just the index finger up, then the mouse will move; if we have the middle finger up as well, then it will be in clicking mode. If it is in clicking mode and the distance between the two fingers is less than a certain value, then we will detect it as a click. So you can bring your fingers together and it will click. In that mode we are not going to move the mouse; the index mode is the only mode in which we will move. So our second step will be to get the tip of the index and middle fingers. Once we have the tip of the index and
middle finger, what we will do is check which of these fingers are up. So we will write here number three, check which fingers are up. Then in the fourth step, based on this information, we will check if it is in moving mode. So we will write here only index finger, which means it is in moving mode, so we will move our mouse. And if it is in moving mode, then we are going to convert our coordinates. Now why do we need to convert? Because our webcam will give us a value of, let's say, 640 by 480, and my screen is full HD, which means 1920 by 1080. So we need to convert these coordinates so that we get the correct positioning. Okay, it's bringing it back; we can change that later. Then we will add another step to actually smoothen the values. Why do we need to do that? So that it is not very jittery, so it doesn't flicker a lot. So we will write here smoothen values, and once that is done we can simply move our mouse, so move mouse. This might seem a lot, but these steps are very easy; some of them are single lines, some of them just two lines, so don't worry about these. Then number eight will be to check if we are in clicking mode: when both index and middle fingers are up, it is clicking mode. Once it is in clicking mode, we will find the distance between these fingers, so we will write find distance between fingers, and if the distance is short then we are going to click, so we will write here click mouse if distance short. So these are the 10 steps we have to follow, and the 11th and 12th steps are fairly easy. The 11th step is the frame rate, to see if we are getting a decent frame rate, and the 12th step is to display. We have already done this display part, so we do not need to do anything more there. Now we can go on to the frame rate. The frame rate is very simple as well; we have done this quite a lot of times by now. So we will simply
write time.time(), and then we are going to write fps equals 1 divided by the difference between cTime, which is the current time, and the previous time, and then we will write that the previous time equals the current time. Then we will write cv2.putText: img, then our string, for which we will first convert our fps into an integer, and then we are going to write the position, so we will write 10 and let's say 50, or let's say 20 and 50. Then we will write cv2.FONT_HERSHEY_PLAIN, and then, for the thickness, or rather this is the scale, let's put it as 3. Then we have, okay, we need to go back, then we have the color, 255, let's keep it blue, yeah, and then we will write 3; this is the thickness. So if we run this, we should have here pTime, pTime equals 0. So if we run this we should have our frame rate. There we go, so that is quite good. Okay, so we have the frame rate, we have the display; now we are going to do the actual part of all of these steps. First of all we have to get the landmarks. To get the landmarks we have to declare the detector here; we have to create the object. So detector equals htm.handDetector, and inside that do we need to add something? We can add, for example, the maximum hands, because we are only expecting one hand, so we can write here 1, and the rest we can keep the same. Then we are going to go down and write that our img equals detector.findHands, not fingersUp, findHands, and then we are going to find the positions of these hands. So we are going to write here lmList and the bounding box; this is something that we added in our previous project. And we will write detector.findPosition, and then we will write img, so we are sending in our image, and that should detect it and it should also draw. So let's run this and see if it detects. We have an issue in findHands. Oh, we have to
give the argument of img, my bad. Okay, there you go, so now we are detecting our bounding box, and we are detecting the fingers and the landmarks as well, so that is pretty good. We are done with step number one. Now we will check that if the length of our lmList is not equal to zero, then we are going to get the tip info; we can actually put this up here. Here we will write x1 and y1; these are the points of the index finger. We are going to write lmList, and it is point number 8, and we want elements number 1 and 2, so we will write it like this. We will do the same thing for our second finger, which is the middle finger: we will write x2 and y2, and here instead of 8 we will write 12. This will give us the coordinates of our index and middle fingers. We do not need to draw these at this point, so we can just print them out if you want to see: we can print x1, y1 and then x2 and y2. So we can print those, and we are getting an error at x1, y1: not enough values to unpack (expected 2, got 1). Why is that? lmList, let me check here what the issue is. This is findPosition, yeah, and this is lmList and bounding box, yes, so that should be fine. Okay, let's print out the lmList first, print lmList, let's check that. Yeah, we are getting some values and they seem fine. Oh okay, my bad, it should be 1 and a colon. Okay, there you go, now we are getting all these points for the index finger and the middle finger. If we move them around, you can see the values change. Okay, so this is good; we are done with our second part. Now we will go on to the third part, check which fingers are up. This is extremely simple because we have already created a method by the name fingersUp; all we have to do is call it. We will write here detector dot, not find, fingersUp, and we will store it in fingers. So we can push this in, and let's print it out, so print fingers, and we will
remove the print from here. So let's run that. There you go: all the fingers are up, all of them are closed, one, two, three, four and five, so we are getting these values, and that is pretty good. Now let's go to the next step; okay, let me push all of these in. So step number four is only index finger, moving mode. Now we need to check if only the index finger is up, so we will write here if fingers at 1, which is the index finger, equals 1 and fingers at 2 equals 0. This is when the index finger is up and the middle finger is down, so this will be moving mode. Now we need to check where our finger is moving, so we get those points and we send them to the mouse coordinates. Okay, first of all we need to convert, so here we are converting our coordinates. We will write here that x3 equals np.interp; we are going to convert one range to another range. Here we are converting the x1 value, and the initial range is from zero to the width of our webcam, and the second range is from zero to the width of the screen. But we didn't get the width of the screen. Now, I know that my screen is this size, but it could be different for yours. So in order to get the exact value, we will go up here and write that our width of the screen and height of the screen equal autopy.screen.size; this will give us the size of the screen. So if we remove the print from here, and this statement and all of this, then we can print this: print the width of the screen and the height of the screen. Okay, let's run this, and there you go, now you can see it's telling me it's 1920 by 1080. So this is the idea. Now that we have these values, let's comment this out, go back here and continue, so that it is from 0 to the width of the screen. The same thing we will do with
the height: we will copy this, and we will write here y3, then y1, and then height and height. So this is the idea: these are the points that we have now converted, and now we will send this value to the mouse. We will smoothen these values, but we will do that later on; first we need to see what the original result is, and then we can convert it. So here we are going to write autopy.mouse.move, and we are going to write that our x3 and our y3 are our coordinates. Let's try this and see what happens. You can see my mouse here; if I bring in my hand, this is my index finger, and now you can see it is moving. But the problem is that when I'm going to the right, it's going to the left. This is very annoying and it's not intuitive, so what we will do is flip it. In order to flip it we just need to flip the width, so we will say that it is the width of the screen minus this. So now if I go to the right it should go to the right. The image here will be flipped, but in reality I'm moving to the right and the mouse is also moving to the right. Now if I move to the left, the mouse is also moving to the left, so this should be easier to work with. So that is good. What we can do is draw a circle so that we know that we are moving the mouse. Here we can write cv2.circle, and we will write img, and then x1 and y1, so we want to draw on that, and let's say 15 is the radius and the color is purple, and then we will write cv2.FILLED. There we go, so let's run that. Yeah, so now whenever we are in moving mode it will show us this big circle, so that we know that we are in moving mode. Okay, so this is good. Now, one of the issues here is that when I'm moving, I can go up very easily; it's not that bad. It flickers at the top much more than in the center, but I can go there. But if I want to go down, it's
very hard, because the hand is not detected properly. Again, if I move down, you can see it's not detecting properly and I'm unable to go down. So what we can do is set a region where we want to detect the movement; instead of the whole image size, we can set a particular range. How can we do that? First of all, let's create that range. We will write here cv2.rectangle, and we will send in our img, and then we are going to call this, let's say, frame; this will be a certain value, for example 100 or 200, something like that. So we will call it frame reduction, and we will again call it frame reduction. We can go up and declare it here: frameR equals, let's say, 100, and we will write here that this is basically the frame reduction. Okay, so that should be good. Now, once we have the frame reduction, this is the initial point, and now we need to give the second value, the diagonal point. So we will write here the width of the cam minus frameR, and then the second coordinate will be the height of the camera minus frameR. Then we will give the color, 255, 0 and 255, and then we will give the thickness, so this will draw a rectangle. Let's try that. So whenever we are in, it's not drawing anything. Oh, it is, yeah, whenever we are in, okay. Maybe we need to put this outside; we can put it outside here, because we want to always see it; whenever we have the hand in, we want to see it. So now you can see we have our box. The idea here is that when I reach the top of this rectangle, the mouse should be at the top of my screen, and when I reach the bottom of it, it should be at the bottom of the screen, and the same thing for the corners: if I am moving to this corner, it should be at the corner. But now you can see it is not at the corner. Again, you can adjust these values up and down; we will keep it in the center for now, but later on, if you want to, you can
adjust. So how can you reflect this on our x3 and y3? How can you change these values? It's very simple: instead of zero you will write here frameR, you will write here frameR, and here you are going to write the width of the cam minus frameR and the height of the cam minus frameR. That's it. So now your values should reflect properly. Here, if I have my finger at the top right corner, you can see it reaches the top right corner; if I have it on the other side, you can see it's reaching the top, and now if I go back and go down, you can see it reaches down. We are having some issues as well, it's going out of bounds; we can fix those issues later. We can also push this up a little bit so that it is easier for our fingers to move around, but we can do that later. We can move on to the next step, which is to detect the click. Here we have to check if both the index and middle fingers are up, so we are going to copy this part and paste it here, and we will write 1 and 1. If both of them are up, then we need to find the length between our fingers. So we will write here detector.findDistance, between which points? Point number 8 and point number 12; these are the landmark IDs, so landmark 8 and landmark 12, and then we will write img. Then it will unpack the values of length and then the image, and then, what did I do here, it should be a comma, and then we also get a bunch of info that we are going to ignore. The main thing that we need is the length: we need to know the distance between these two fingers. So we can write here print length, and let's try it. When we are in our detection mode it is giving us the length, and there is a good indication, because it actually gives a center point as well and it draws a line in between. So that is pretty good. Okay.
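A minimal sketch of the coordinate conversion described above, combining the frame reduction, the range mapping and the horizontal flip. The video uses np.interp for the mapping; map_range here is a hypothetical plain-Python stand-in that clamps at the ends of the range in the same way, so the sketch runs without NumPy. The function names and the default sizes (a 640 by 480 camera, a 1920 by 1080 screen, a frame reduction of 100) are illustrative assumptions based on the values mentioned in the video.

```python
def map_range(value, src, dst):
    """Linearly map value from the src range to the dst range, clamping at the ends."""
    lo, hi = src
    value = min(max(value, lo), hi)  # values outside the box stick to its edge
    t = (value - lo) / (hi - lo)     # 0.0 at the near edge, 1.0 at the far edge
    return dst[0] + t * (dst[1] - dst[0])


def finger_to_screen(x1, y1, cam=(640, 480), screen=(1920, 1080), frame_r=100):
    """Convert an index fingertip position in the camera image to screen coordinates."""
    w_cam, h_cam = cam
    w_scr, h_scr = screen
    # Only the inner rectangle (frame_r pixels in from each border) is mapped
    # to the full screen, so the hand never has to reach the image edges.
    x3 = map_range(x1, (frame_r, w_cam - frame_r), (0, w_scr))
    y3 = map_range(y1, (frame_r, h_cam - frame_r), (0, h_scr))
    # Flip horizontally so that moving the hand right moves the pointer right.
    return w_scr - x3, y3


print(finger_to_screen(320, 240))  # centre of the camera image
```

With these defaults the centre of the camera image lands on the centre of the screen, and a fingertip on or beyond the rectangle's border is pinned to the matching screen edge.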
So what we can do next is check that if the length is below a certain value, then we will detect it as a click. But we need to define that threshold, so we are going to go back and try it out. Here it should be open, here it should be closed, and I can see it's around 30-something, so if it's less than 40, maybe, yeah. Okay, so we can say if it's less than around 40 then it is detected as a click. You can do a normalization here as well, but that will be quite detailed, so we are not going to go into that. So we will write here if length is less than 40, then cv2.circle: we are going to draw the same circle that we had drawn here, but this time we are going to draw it in green, so we have the indication that it has been clicked. Let's try that. So here, there you go. We could draw it on the center one as well; it will look better. Okay, how can we do that? Basically, this is the information we are getting for the line, so we can write here info line, or we can write lineInfo, and then based on this lineInfo, if we go to our findDistance, you can see cx and cy are the last elements, so this is the fourth and this is the fifth. So we will write here that this is the fourth and this here is the fifth. Push it down. Okay, so let's run this, and hopefully this time the center one will be green. There you go, so now it looks a little bit better. So that's good. Okay, what is next? Now we actually need to click, so rather than just changing it to green, we need to click. The clicking part is way easier than you think: it is autopy.mouse.click, and that's it. So now it should click. By the way, these two we are already doing, so we should write here that we are checking the distance here and then we are clicking the mouse if the distance is short over here. So that's the idea. Okay, let's try it. What I will do is try to click and minimize this part here. So here is my finger, and if I move around and I click, you can see it's shaking a lot. Yeah, it
clicked, there you go. It clicked again, but as you can see it shakes a lot. This is a very big problem, which is not allowing us to use this properly. So what can we do? As I mentioned before, if we go up here, we can smoothen the values. How can we smoothen the values? Instead of sending in exactly the same value, for example if it goes from 0 to 100, instead of saying go to 100 directly, we will dilute it a little bit. We will smoothen it; we will reduce its value so it goes step by step. So first of all we are going to create a value called smoothening, equal to, let's say, 5. This is a random value that I've chosen; later on we can see what the effect is. Okay, now we also need to create two variables. In fact, we need to create more than two variables, so we should separate the variable declarations here. Yeah, that should be fine. Okay, so we will say that our previous location, we will call it previous location of x and previous location of y, equals 0, 0, and the current location of x and the current location of y, so again these will be 0 and 0.
So what we will do is, what did I do here, we will use these values and update them each iteration to smoothen our mouse. So we are going to go here; now instead of x3 and y3 we are going to send in the smoothened values of the current location, and we will update our previous location. How can we do that? We will write here that our current location of x equals our previous location of x plus the difference between x3 and our previous location of x, divided by the smoothening value. So whatever the value is, we will divide by it, and the same thing we will do with our y value. You can multiply by smoothening instead as well, but then you will have to go into decimals, so 0.1, 0.2; or you can divide and keep it in whole numbers, it's up to you. So we write here y3, and then y, and then y, that's it. Okay, then we will just send in our current x value and y value instead of x3 and y3, and then we will update these values once we have used them. So we will write here previous location x and previous location y equal current location x, okay, let's put y as well, current location y. Okay, so that is the idea. Now let's put a very dramatic value, let's say 20, and let's run it. So now you will see, if I move it around, it is quite smooth but it is quite slow. You see, when I stop it takes a while to stop. So we need to find a good balance. Let's try 5. I like this: it moves nicely and it stops, it doesn't shake a lot. There you go, I can click as well, there you go, and let's click on the minimize, there you go. So yeah, that looks good. Let's try 10.
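The smoothing update described above can be sketched in a few lines. The helper name smooth_step and the loop are illustrative assumptions; the formula itself (previous location plus the remaining distance divided by the smoothening value) is the one from the video, and it behaves like an exponential moving average with a weight of one over smoothening.

```python
def smooth_step(prev, target, smoothening):
    """Move a fraction 1/smoothening of the way from prev towards target."""
    return prev + (target - prev) / smoothening


# Follow a target that jumps from 0 to 100, with smoothening = 5.
ploc_x = 0.0
trail = []
for _ in range(4):
    cloc_x = smooth_step(ploc_x, 100, 5)
    trail.append(round(cloc_x, 2))
    ploc_x = cloc_x  # the current location becomes the previous one

print(trail)  # [20.0, 36.0, 48.8, 59.04]
```

Each frame covers one fifth of the remaining distance, which is why a larger smoothening value looks steadier but takes longer to reach the target and to come to rest.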
Uh, 10 is good, but it's a little bit slower. Yeah, it's hard to stop at that point. Yeah, 10 is a little bit, uh, it is a little bit slower, so maybe 7. Yeah, this one is better. There you go, I can do that, I can go to this one, I can click on this one and this one again. There you go, so it is pretty good. So that is quite nice. That is pretty much it. As you can see, it works quite well, and we broke it down into different steps; when you go and try to solve each of these steps, it becomes very easy to get a solution. All of this is possible thanks to our hand tracking module that we created earlier; if we didn't have that, it would be quite difficult and it would take quite a lot of time to actually create such a project. But as you can see, what we achieved in this short amount of time was quite easy and quite simple. So this is it for today's video. I hope you have learned something new. If you like the video, give it a thumbs up and don't forget to subscribe, and I will see you in the next one.
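To recap the two modes from the walkthrough, here is a small sketch of the decision logic on its own, without the camera or the hand tracking module. The names choose_action, tip_distance and CLICK_DISTANCE, and the returned labels, are hypothetical; the rules (index finger only means move, index and middle fingers up with a tip distance under 40 means click) follow the video, and the finger list is assumed to hold five 0/1 flags with the thumb first.

```python
import math

CLICK_DISTANCE = 40  # pixel threshold picked by eye in the video


def tip_distance(p1, p2):
    """Straight-line distance in pixels between two (x, y) fingertip points."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])


def choose_action(fingers, length=None):
    """Pick an action from the finger flags and the index-to-middle tip distance."""
    index_up = fingers[1] == 1
    middle_up = fingers[2] == 1
    if index_up and not middle_up:
        return "move"   # moving mode: the pointer follows the index fingertip
    if index_up and middle_up:
        if length is not None and length < CLICK_DISTANCE:
            return "click"  # clicking mode with the two fingertips together
        return "armed"      # clicking mode, but the fingers are still apart
    return "idle"


print(choose_action([0, 1, 0, 0, 0]))
print(choose_action([0, 1, 1, 0, 0], tip_distance((300, 200), (320, 215))))
```

Keeping this logic in a plain function makes it possible to try different thresholds without a webcam attached.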
Info
Channel: Murtaza's Workshop - Robotics and AI
Views: 380,373
Rating: 4.9582882 out of 5
Id: 8gPONnGIPgw
Length: 39min 38sec (2378 seconds)
Published: Mon May 03 2021