AI Hand Pose Estimation with MediaPipe and Python

Video Statistics and Information

Captions
"What's up, Nick, what are you working on?" "Oh, you know, just some hand tracking that allows you to do this." "Whoa, that's awesome. Want to learn how to do it?" "Let's do it!"

What's happening, guys, my name is Nicholas Renotte, and in this video we're going to be going through hand pose estimation using MediaPipe. You'll be able to use your webcam to track and detect all the joints within your hand in real time. Let's take a deeper look at what we'll be going through.

In this video we're going to tackle three key problems. First, installing MediaPipe: MediaPipe is the library that allows us to perform hand pose tracking, among a bunch of other solutions. Then we're going to detect our hand, as well as our second hand, so we can do multiple hand tracking in real time using our webcam, and you'll see those results rendered to the screen. Last, we'll save some images with our detections, meaning you'll be able to take the outputs from your MediaPipe pose detection model and write them out to a JPEG file, or whatever format you want, using OpenCV.

Here's how it's all going to work. First up, we'll install MediaPipe for Python using a simple pip install. Then we'll detect hands, specifically our hands, in real time from our webcam feed: we'll use OpenCV to grab the feed, overlay our MediaPipe detection model so you can see all those joints rendered in real time, and then output those images with OpenCV. This is part one; there's going to be a part two to this series where we'll do some more sophisticated detections with our model, like determining which hand is left and which is right, and maybe some additional stuff. But this is part one, so we'll keep it pretty simple. Ready to do it? Let's get to it.

Alrighty guys, to build our hand pose model we need to do three key things, and as per usual we'll go through them step by step. Step one is to install and import our dependencies. Then we'll draw our hand detections, so as we put our hands up to the webcam we'll see all of the joints within our hand detected in real time; this is the initial phase, one of the first tutorials I'm doing on hand pose, and the next one will be a little more advanced and give you more capabilities. Last but not least, to add a little additional flavor, we'll output the images from our detections, so you'll see frames with all of our hand pose landmarks drawn on them written out as JPEG files that you can pick up, work with, and play around with if you want.

First things first, let's install our dependencies. The dependencies are pretty straightforward today: what we've written is an exclamation mark, then pip install mediapipe and opencv-python. MediaPipe and OpenCV are our two core dependencies.
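That install cell looks like this (the leading `!` tells Jupyter to run the line as a shell command; drop it if you're installing from a terminal):

```python
# Run from a Jupyter notebook cell
!pip install mediapipe opencv-python
```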
MediaPipe, if you've seen some of my previous tutorials around it, gives you a bunch of different computer vision solutions: hand pose tracking, holistic tracking, body pose tracking, facial landmark tracking, a whole bunch of really cool stuff. OpenCV is an open computer vision library that makes it easy to work with your webcam and with images, so it's pretty much the standard when it comes to computer vision libraries.

The next thing to do is import those dependencies into our Jupyter notebook. We've written five lines of code there. The first is import mediapipe as mp, which brings in all of the MediaPipe solutions. If you look at the MediaPipe GitHub repo, there are a whole bunch of different solutions available; specifically we're going to be working with the Python API and with the Hands model. If you select Hands, you can see you get really sophisticated hand tracking, and the cool thing is that it's quick enough to run on your own computer, which means it can be used in real time. The repo also has a diagram of the hand landmarks, which I've got inside the notebook and will show you in a second; that's essentially what we're going to reproduce, though we'll probably recolor it a little so it looks better.

So that's MediaPipe. Then we import cv2, which is OpenCV; we're mainly going to use it to connect to our webcam feed. Then we import numpy as np; NumPy just makes it easy to work with the different outputs from your MediaPipe model. The last two imports are all about output, and we won't use them until step three: uuid lets you generate a UUID, a universally unique identifier, which is basically a random string we'll use as each image's name so there's no overlap when we capture images; and os is Python's operating system library, which makes it easy to work with file paths across different operating systems.
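Here's that import cell in full:

```python
import mediapipe as mp  # MediaPipe solutions, including the Hands model
import cv2              # OpenCV: webcam feed, recoloring, image output
import numpy as np      # convenient handling of model outputs
import uuid             # unique image names (used in step three)
import os               # cross-platform file paths (used in step three)
```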
Cool. Now the next thing we want to do is start setting up MediaPipe: we're going to set up our drawing utilities and the Hands model, because remember, that's the model we'll be using. All the links to this code, including any references, are in the description below, as are links to my previous tutorials. If you've got any questions at all, hit me up in the comments below and join the Discord server; we're always pretty active on there. Let's go ahead and set this up.

Okay, so those are our two MediaPipe components done and dusted. First up are our drawing utilities, which make it easier to render all the different landmarks in your hand. When you get the output from the MediaPipe model, you get a series of landmarks, one for each individual joint in your hand. Let me uncomment the landmark diagram so you can see what I mean: each of the red dots represents a landmark, and a landmark in this particular case is really a joint: your wrist, your thumb CMC, your thumb MCP, and so on (CMC and MCP are the carpometacarpal and metacarpophalangeal joints). So you get really sophisticated joint tracking, and we'll use drawing_utils to draw all of those landmarks to the screen. Second, we've brought in the hands model itself: mp_drawing equals mp.solutions.drawing_utils, and mp_hands equals mp.solutions.hands. Everything we need to work with the Hands model is in that second line.
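The two setup lines, as written:

```python
mp_drawing = mp.solutions.drawing_utils  # helpers to render landmarks
mp_hands = mp.solutions.hands            # the MediaPipe Hands model
```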
That's all well and good; now we want to start working with our webcam, so let's set up our feed and then build on it to draw the detections from our MediaPipe model. I'll add a new cell and write this out.

Okay, so that is our basic webcam feed. I went through it pretty quickly, but if you've watched any of my real-time videos before, this block of code will look really familiar, so let's go through each step. First we get our webcam feed: cap equals cv2.VideoCapture, passing through video capture device number zero. You might need to play around with this number depending on what type of machine you're using and which device your webcam actually is; on my Windows machine it's device zero, but on my Mac it's device two. If you get an error saying the image is empty or that your image must have three dimensions, that number is the first thing to tweak. Then we read each frame from our webcam: while our capture is open, so while cap.isOpened(), we run ret, frame = cap.read(). The ret and frame variables unpack the results of cap.read: ret is the return value, which we don't need, and frame is what we're interested in, because it represents the image from our webcam. Then we render that image to the screen with cv2.imshow, which takes two arguments: the name you want for your window, which here is "Hand Tracking" (if you want to change it, by all means go for it), and the image you want to show, which for now is frame. Everything after that is the standard way of gracefully closing the window once you're done: if you hit q on your keyboard, or close the window, we break out of the loop and stop the feed; then cap.release() releases the webcam and cv2.destroyAllWindows() closes down the frame.
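Here's a sketch of that block. The transcript doesn't spell out the key-check line, so the cv2.waitKey idiom below is my assumption for the "press q to quit" behavior described:

```python
cap = cv2.VideoCapture(0)  # device 0 here; yours may differ

while cap.isOpened():
    ret, frame = cap.read()             # grab one frame from the webcam
    cv2.imshow('Hand Tracking', frame)  # render it to a named window

    if cv2.waitKey(10) & 0xFF == ord('q'):  # press q to stop the feed
        break

cap.release()            # release the webcam
cv2.destroyAllWindows()  # close the window
```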
Let's test this and make sure we can actually get a feed from our webcam. If we hit Shift-Enter... looks like we've got a bit of an error there; ah, this should be a double equals. Run it again, and ideally a little pop-up appears towards the bottom of your screen in your taskbar. And there we go: my webcam is activated, I can see my feed on the screen, and it's rendering quickly. Nothing crazy there yet.

So we've got our webcam feed; now we want to overlay onto it. We're going to take the feed from our webcam, pass it to MediaPipe, make detections, and render the results onto the image (which at the moment is called frame) before we pass it to cv2.imshow. Ideally what you get back is not just a webcam feed but a webcam feed with all of those real-time detections applied. We'll take it step by step rather than doing it all in one go: first we'll set up MediaPipe to work with this feed and just print out our results, and then we'll do the rendering.

Before we go any further, let's look at the two new lines I've written so far. First, we instantiate our MediaPipe Hands model using the with statement: with mp_hands.Hands, passing two keyword arguments, min_detection_confidence set to 0.8 and min_tracking_confidence set to 0.5, and we work with the result as the variable hands, so whenever we reference hands we're working with this model. Those two keyword arguments represent two specific things: when you first use the MediaPipe Hands model it detects your hand, and from that point on it tracks it. So we're saying we want 80% confidence for that initial detection and 50% confidence for the ongoing tracking; effectively this sets how strict our model is going to be. You can set it higher, but then you might not always get detections, or it might not detect anything at all; set it too low and you sacrifice accuracy. You'll want to play around with it, but I've found 0.8 for detection confidence and 0.5 for tracking confidence to be pretty good values.

The next line recolors the frame we get from our webcam, converting it from BGR to RGB. When you get a feed from OpenCV, by default the image color is in BGR format (blue, green, red), but MediaPipe needs the image in RGB format, so this line just reorders the color channels: image = cv2.cvtColor, a function that recolors an image, passing through our frame and the conversion we want, cv2.COLOR_BGR2RGB.

Then we set the image's writeable flag to False, which stops the image being copied and helps performance; we make our detections; we set the flag back to True; and we recolor back to BGR so we can render our results. Breaking those five lines down: image.flags.writeable = False sets the flag; then the most important line out of all of this, results = hands.process(image), actually makes our detections on the RGB, non-writeable image; image.flags.writeable = True lets us draw on the image again; image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR) converts it back to BGR; and finally print(results) prints out our detection results. As of right now you won't see anything rendered to the image; it's still the same old image until we go and apply our drawing utilities.
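As a sketch, here's how those pieces sit inside the loop (the capture setup and quit check carry over from before; note we're still displaying the raw frame at this stage):

```python
cap = cv2.VideoCapture(0)

# 0.8 detection confidence, 0.5 tracking confidence, as discussed above
with mp_hands.Hands(min_detection_confidence=0.8,
                    min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        ret, frame = cap.read()

        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # BGR -> RGB for MediaPipe
        image.flags.writeable = False                   # performance: avoid a copy
        results = hands.process(image)                  # the actual detection
        image.flags.writeable = True                    # allow drawing again
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)  # back to BGR for OpenCV

        print(results)  # just inspect the raw output for now

        cv2.imshow('Hand Tracking', frame)  # still the raw frame for the moment
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()
```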
Let's run this for now and then come back and take a look. Ideally you should see the results printed out: printed as in printing the string representation, not anything rendered to the image, but that's fine. It looks like our frame popped up but then closed; if you get that, just run it again and you should be good to go. Okay, so we've got our real-time feed, still pretty quick, and down below we're getting all of our solution outputs from MediaPipe, which is looking good.

Now, if we quit out of the frame and type results, you can see we've got a MediaPipe solution output object. If we try to access its landmarks... I believe it's multi-hand landmarks, so results.multi_hand_landmarks rather than hand... oh, doesn't look like we got any. This is actually interesting: we don't have any landmarks at the moment, and that's because I didn't put my hand in the frame. Let's try again with my hand in frame so we actually get some results. (Closed again; run it once more.) Perfect. With my hand in the frame, I'll quit out, and now when we access results we get the results from the last detected frame: all of our landmarks, each represented by an x, y, and z coordinate. x is the x-axis position, y is the y-axis position, and z is a depth estimate, measured relative to the wrist, with smaller values closer to the camera.

That's all well and good, but now we want to render our results to the screen, and those three new lines of code will let us draw our landmarks onto the image. There's one last change we'll need after that, but let's look at the three lines first. We start by checking whether we've actually got anything in our results variable: if results.multi_hand_landmarks. If there's nothing in there, we skip rendering. Then we loop through each result inside multi_hand_landmarks: for num, hand in enumerate(results.multi_hand_landmarks). The reason I've used enumerate is that in the follow-up video we're going to do something more sophisticated with the results, so this is future-proofing for the next tutorial. Then mp_drawing.draw_landmarks takes three arguments: the image; the hand, which is the set of landmarks we're looping over; and mp_hands.HAND_CONNECTIONS. The image is the webcam image we've been working with, the hand at that particular point in time is one set of landmarks from results.multi_hand_landmarks, and HAND_CONNECTIONS represents the relationships between the joints: the wrist is connected to this joint, this joint is connected to that joint, and so on, which is what allows the connections to be drawn.
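The rendering block, as described, slots in right after the recolor back to BGR:

```python
# Inside the while loop, after converting back to BGR:
if results.multi_hand_landmarks:
    for num, hand in enumerate(results.multi_hand_landmarks):
        mp_drawing.draw_landmarks(image, hand, mp_hands.HAND_CONNECTIONS)
```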
Now there's that one last change we need to apply: right now we're still rendering frame, the raw webcam feed we grabbed at the top of the loop, but what we actually want to show is image, which now has our joints rendered onto it. All we need to do is replace frame with image in the cv2.imshow call, and then we should get our results rendered to the screen. Let's run it. Okay, we got the pop-up-and-close again; I've got to debug where that's happening, but again, if you get the little grey pop-up and then it closes, just rerun the cell. And there we go: our webcam feed is looking good, and if I stand up you can see our hand tracking working. It's quite accurate and fast; I can move my hand around and it still tracks quickly, and we can do two hands as well, so you can apply this to a whole bunch of different use cases.

The cool thing is that you can actually do this for more than just two hands. Say you had multiple people in the frame: you can pass another parameter to the Hands model, which I believe is max_num_hands... yes, max_num_hands. At the moment it's detecting a maximum of two hands, which is the default, but if you wanted to handle more than two hands in the frame at a particular point in time, you could bump that up to render more results to the screen. For now, though, the default is pretty good.
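For example, a variant of the model setup that tracks up to four hands might look like this (four is just an illustrative value; everything else stays the same):

```python
# max_num_hands defaults to 2; raise it for multi-person scenes
with mp_hands.Hands(max_num_hands=4,
                    min_detection_confidence=0.8,
                    min_tracking_confidence=0.5) as hands:
    pass  # ... same detection loop as before ...
```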
So we're rendering our hands, but as per usual my OCD is kicking in and I don't like displaying things in red and green, so let's change the coloring. For this we're going to use something called a drawing spec; let's hit q on the frame, close it out, and update that mp_drawing line. If you look at the mp_drawing.draw_landmarks function, you can actually pass it five different arguments: your image; your landmark list, which at this point is our hand landmarks; your connection list, which is our HAND_CONNECTIONS; and then a landmark drawing spec and a connection drawing spec. The landmark drawing spec sets the colors for your joints, your landmarks, and the connection drawing spec sets the color for your lines; think of them as your bones. So we're going to pass through two drawing specs to change our colors.

Okay, those are our drawing specs applied: two additional lines passed straight into the draw_landmarks call we already had. Specifically, I've passed through mp_drawing.DrawingSpec with three keyword arguments: a color, a thickness, and a circle_radius. If you inspect mp_drawing.DrawingSpec with a double question mark, you can see those are exactly the three values it takes. The color here is in BGR format, because we've already converted the image back to BGR before rendering; thickness sets your line thickness; and circle_radius sets the size of the circle when a landmark is drawn. For the first spec, the landmarks, we've set color=(121, 22, 76), remembering the order is the blue value, the green value, then the red value, with thickness 2 and circle_radius 4; if you want a different color, just pass through a different combination. Then we've passed a second spec, this one for the lines, exactly the same except for a different color so we can tell our landmarks apart from our connections: blue 121, green 44, red 250, with thickness 2 and circle_radius 2.

That's all well and good; let's render it. (Maybe we'll get the pop-up-and-close again; just try again.) Cool, so our webcam feed is working and you can see we've now changed the color of our hands. Pretty cool, right? And this works with or without a green screen: I've obviously got one up, but it still tracks relatively well even with a complex background. If we wanted to, we could change the color again: say I change that green value from 44 to 250 and rerun it, our lines come out more purple than pink. So you've got full flexibility to work with this stuff.
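The updated rendering call with both specs, shown with the original colors:

```python
mp_drawing.draw_landmarks(
    image, hand, mp_hands.HAND_CONNECTIONS,
    # landmark (joint) styling: BGR color, line thickness, dot size
    mp_drawing.DrawingSpec(color=(121, 22, 76), thickness=2, circle_radius=4),
    # connection (bone) styling
    mp_drawing.DrawingSpec(color=(121, 44, 250), thickness=2, circle_radius=2),
)
```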
Alrighty, that's really step two done: we've rendered our image and drawn our results onto it. There's one more thing I want to do in this tutorial, though, which is flip the image on the horizontal, because when we do the more sophisticated stuff in the next video we want our frame flipped: ideally we want to be able to detect our left hand and our right hand. All we do is pass the image through a flipping function: image = cv2.flip(image, 1), where the 1 flips on the horizontal. Let's try it; you won't see much difference with the green screen in the background, but if I take it down... there you go, my microphone is now on the other side, everything is still rendering fine, and you can see the image is flipped.

Now, the last thing I want to show you is how to save these captures. Say you wanted to show your friends, or produce results for a paper: you can save the results from each one of these detections. Let's quit out of this and go wrap it up. First, to output our images, we make a new folder within our current directory using os.mkdir (let me just double-check that name), and we'll call it Output Images; this creates a new directory wherever you're currently working. If I go and take a look, there's now a folder called Output Images; in fact, let me delete it and run the cell again so you can see it being created. That gives us a folder to work with.

Next I'm going to copy all of our existing code, paste it down here, and there's really just one change we need to make to our baseline code to save our images. That last new line of code is what saves the image, and there are quite a few functions stacked together in it, so let's break it down. First we're naming our image, using the uuid library we brought in right at the top: uuid.uuid1() generates a unique identifier string, and that's going to be the name of our file. We use string formatting to append .jpg to it, so the file name ends up as a unique identifier plus .jpg; the reason we do this is so there are no conflicts when we output images. Then we create the path that determines where the file is going to be output: os.path.join, combining our Output Images folder with the file name we just generated. That specifies we're outputting our image to the Output Images folder (because I'm on a Windows machine the joined path shows a double backslash, but on a Mac or Linux it'll be a forward slash) under a name that's been randomly generated on the fly. Last, we use the cv2.imwrite method to actually write out the image: the first parameter is the file path, and the second parameter is the image itself.
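Sketched out, those additions look like this (folder name as in the video; the exact string-formatting style is my assumption):

```python
# One-off: create the capture folder in the current working directory
os.mkdir('Output Images')

# Inside the loop, after drawing the landmarks:
image = cv2.flip(image, 1)  # 1 = flip on the horizontal

# Save the processed frame under a collision-proof random name
cv2.imwrite(os.path.join('Output Images', '{}.jpg'.format(uuid.uuid1())), image)
```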
If we run that single write now and take a look in hand pose, Output Images... you can see it's grabbed the last frame from our last detection. It doesn't actually have any hands on the screen, but that's fine; we know it's working. (And that stacked line is exactly the same whether you build it up in pieces or bring it into one line.) Now, if we run this inside our loop, it's going to save every single image we process from our webcam: for every detection we make, the result gets output to the folder. So let's bring the folder up on one side and the notebook on the other and run it; ideally we get the pop-up, and we'll see if it works first time. All working, cool. You can see all of our detections being drawn and saved to the folder as we go, and it's pretty quick, so if you wanted to share this, put it in a paper, or save it for a research project, you could definitely do that. I'll keep messing around, hit q, and there we have all of our different images saved. You can work with these and output them however you like, and they're the same resolution as the actual frame; if you flick through them you can see me moving, almost like a stop motion. Pretty cool, right?

On that note, that wraps up this tutorial. We've done a bunch of stuff here: we installed and imported our dependencies, took a look at the landmark map for the hand model, made detections from our webcam, rendered them with some custom coloring, and saved our output, so if you wanted to take this and embed it in something else, you definitely could. Thanks so much for tuning in, guys; hopefully you enjoyed this video. If you did, be sure to give it a thumbs up, hit subscribe, and tick that bell so you get notified when I release future videos, and let me know what other videos you'd like to see and what you ended up using the MediaPipe hand pose model for. Thanks again for tuning in. Peace.
Info
Channel: Nicholas Renotte
Views: 60,079
Keywords: mediapipe tutorial, mediapipe hand tracking, mediapipe python tutorial, hand pose estimation, hand pose estimation python
Id: vQZ4IvB07ec
Length: 36min 2sec (2162 seconds)
Published: Thu Apr 22 2021