Real-Time Head Pose Estimation: A Python Tutorial with MediaPipe and OpenCV

Captions
Hey guys, and welcome to a new video in this computer vision tutorial series. In this video we're going to do head pose estimation: we'll use MediaPipe to get the face mesh, as we've done in previous videos, and then we're going to use OpenCV to do the pose estimation of the head that we're detecting with MediaPipe. But first of all, remember to join the Discord server; I'll link to it in the description. You can join and chat about computer vision, deep learning, AI, and so on. You can also become a member of the channel if you want to support it with a small monthly fee; everything goes into creating more and better quality content here on the channel. If you have problems in your own projects, I can also help you out and guide you in the right direction as a member. Before we jump into the actual code, I want to give a shout-out to all these awesome members of the channel that I'm helping out with their projects or who are just supporting the channel every month, and a special shout-out to Audrey Baska for the massive channel support.

We've done pose estimation in OpenCV before, and we've also done face mesh detection with MediaPipe, so we're kind of combining those two videos together. If you want to know more details about how to do pose estimation with OpenCV on different kinds of objects, where we do camera calibration first, or if you just want to know how face mesh detection works, make sure to check those videos out; I go way more into detail there. Here we're just going to combine them into a project where we can do head pose estimation, and as we'll see at the end of the video, we get some really nice and cool results.

First of all, we import the different modules that we need: OpenCV, MediaPipe, NumPy, and then time, because we're going to time how long it takes to run this pose estimation of the head together with the face mesh detection. Then we use MediaPipe to set up the face mesh detector: we create mp_face_mesh and set it equal to MediaPipe's solutions (these are the machine learning solutions provided by MediaPipe), and we take the face mesh from there. For the face mesh detection we set a minimum detection confidence of 0.5, so we need a confidence score higher than 0.5 to actually accept a detection as a face and run the face mesh detection on it. We also have a minimum tracking confidence, which is the threshold we need to stay above when tracking the face mesh in the image from frame to frame. Then we set up the drawing utilities (mp.solutions.drawing_utils) so we can display the whole face mesh that we're detecting in the images, and a drawing specification where we set the thickness and the circle radius for the face mesh that I'm going to show you later on, when I display the results.

Now we open up our video capture, which will just be our webcam, and store it in this cap object. Then we go down into our while loop, which runs as long as the webcam is open. We read an image from the webcam and store it in the image variable, and right after reading it we start a timer so we can time the algorithm and see how many frames per second we get when running this head pose estimation with MediaPipe and OpenCV.

We need to convert our images from BGR to RGB, because the MediaPipe models we pass our images through use RGB, whereas OpenCV reads images as BGR. First we flip the image so we get a selfie view later on, so the result isn't mirrored, and then we call cvtColor to convert from BGR to RGB. We can then set the image's writeable flag to False, so we can only read from the image; we only need to read from it while passing it through our neural network, which is what we do down here in line 29. There we call face_mesh, which is the face mesh detector object we created at the top of the program, call .process on it, and pass in the image that we want to feed forward through the machine learning model. It processes the image and stores everything in the results variable, and later on we can go in and get all the facial landmarks that were detected in the image.

The idea behind this machine learning solution is that it first runs a face detector, then crops the image to that face, and then detects the face mesh features on the cropped face by passing it through the neural network. After processing, we set the image to writeable again so we can draw things on it, and we convert it back from RGB to BGR so we can use OpenCV's methods and operations on it. Then we store the image height, the image width, and the number of channels from image.shape. We need this information because the results that we get out of the face mesh process method are normalized values, so we have to scale them by the image dimensions, the height and the width.

Then we create two lists, face_3d and face_2d. When we're doing pose estimation with OpenCV we're using an image-based approach: we need some 3D reference points that we map down to the image plane, and then we try to find the pose, the rotation and translation, from those 3D points down to the corresponding 2D points in the image plane, which are the 2D facial landmarks we're taking from MediaPipe. Here we're just using a monocular camera, so we simply scale our 3D values out into space by some value. You could use calibration or stereo vision to get the exact depth values of the face that you want to estimate the pose of, but this is just for simplicity: we scale those values out into the world, project them down, and do pose estimation on those points. We might create another video where we go into more detail with stereo vision, calibrating our cameras first and then getting the exact depth of the 3D object we want to do pose estimation on, so make sure to hit the subscribe button under the video and the bell notification so you get a notification when I upload new videos that you can use in your own projects, or if you're just learning OpenCV or programming in general.

After we've created the two lists where we store the 3D points and 2D points of the face, we go down into the results and check results.multi_face_landmarks: if there are any detections, this will be true. If so, we run through all the face landmark detections; if there's only one person in the image, there will only be one result with all the face landmarks, which we can access individually inside this for loop. Then we go into an inner for loop where we take the index idx and the landmark value lm, so we're running through all the landmarks in our face mesh detection. We use specific landmark indices for the nose, ears, mouth, and eyes to do the pose estimation. You could use other points, or more points if you want to, but here we're just going to use these six points in the PnP method: the nose, the mouth, the eyes, and the ears. We take those points, get the corresponding 3D points, and then try to do pose estimation with those corresponding point sets.

If the index is 1, we set nose_2d and nose_3d to the exact values we're detecting, and we scale the z value of the 3D nose by 3000, out into the world coordinate system. These nose coordinates are what we'll use later on to draw a line, so we can see where we're looking in the image frame when doing head pose estimation. Then we get the x and y values, the face coordinates, for all the landmarks we want to track: we take the landmark, take its x and y values, and scale them by the width and the height of the image, because the values we get out are normalized, so we need to convert them back into image space. That gives us the x and y values for the nose, the ears, the eyes, and the mouth, and we append those x, y coordinates to face_2d and also to face_3d; for face_3d we also need to store the z value of the facial landmark. Now we have the face_2d and face_3d lists with all the 2D and 3D coordinates, which we can use later on in the solvePnP method from OpenCV. We convert the face_2d list to a NumPy array and specify the dtype as float64, and we do the exact same thing for face_3d.

Then we set up the camera matrix. If you have a calibrated camera you would use its values; I'm going to create another video where we calibrate a camera and go through this whole process again, but here we keep it simple: we just set the focal length equal to 1 times the width of the image. Then we set up the camera matrix, the intrinsic parameters of the camera. Again, if you've calibrated your camera you get both the intrinsic and extrinsic parameters out, but we only need the intrinsic parameters in this video, plus the distortion parameters. Note that this "dist" is not a distance matrix; it's the distortion parameters, and you also get those out when calibrating your camera. In the camera matrix we have the focal length, the optical center on the x-axis, and the optical center on the y-axis. For the distortion parameters we just set them to zero, a vector of four zeros, so we assume there's no distortion in the image. Again, if you've calibrated your camera you'll get your camera matrix, your focal length, and your distortion parameters, and then you have everything you need to solve this PnP problem and get the pose of the points detected in the image frame.

We call solvePnP and pass in the 3D points of the detected face, the 2D points, the camera matrix, and the distortion parameters. The outputs are: a success flag for whether we actually estimated the pose of those points, the rotation vector (how much the points are rotated in the image), and the translation vector (how much the points are translated around). Together they form a transformation of the points onto the image plane, the points that were detected with MediaPipe's face mesh detector.

Now we have the rotation vector, and we want to process it further, because what we really want for head pose estimation is to say how much we're looking in what direction. First we convert it to a rotation matrix using OpenCV's Rodrigues function: we pass in the rotation vector, and the output is the rotation matrix plus the Jacobian matrix, which we're not going to use. Then we can get the actual angles for all the axes. We're interested in the rotation around the x-axis and the rotation around the y-axis, so we can say how much we're looking in what direction in the image plane; we can't really see the z-axis, because that's just the rotation around the camera axis. To get the angles we again use OpenCV: RQDecomp3x3 takes the rotation matrix and gives us the angles, a rotation matrix, and the Qx, Qy, Qz matrices (rotation matrices around the x-, y-, and z-axes), but from the angles we can directly read off the x, y, and z values. Those values are normalized, so we multiply them by 360 to get the number of degrees we're looking in each direction from the center of the image. So now we have the rotation for the x-axis, the y-axis, and the z-axis, which we'll display on the image when we run the program. At this point we have estimated the pose of the head.
We've detected all the points we need, we've done the pose estimation of our head, and now we have the angles, so we can set up some if statements checking in what direction we're looking. We can also draw a line for the direction we're looking, scaled by how much we're looking in that direction, as we'll see at the end. We say: if the y value is less than -10 we're looking left; if the y value is greater than 10 we're looking right; if the x value is less than -10 we're looking down; and the other way around, if it's greater than 10 we're looking up. If none of these is the case, if the values are between -10 and 10, we're just looking forward, facing the camera.

Now we can project the points back again, so we can see whether we've actually done a good enough pose estimation. We pass the 3D nose point, our rotation vector and translation vector (the values we get out of the solvePnP method), the camera matrix, and the distortion parameters into projectPoints. We've estimated our points out in the 3D world, and now we project those points back down onto the image plane; so we project the nose down to the image plane (and again we get the Jacobian out). We can also use this information to draw the line showing in what direction we're looking, to display how our head pose estimation works with respect to the camera. We set up the two points for the line: point one is just the nose_2d x value and the nose_2d y value, so the line starts from the tip of the nose, and for the second point we offset it by the angles in the x and y directions scaled by a factor of 10. That way the line is scaled based on how much we're actually looking in a given direction, as you're going to see. Then we draw the line with OpenCV: we pass in the image we want to draw the line on, the two points we want to draw the line between, the color, the thickness, and so on.

The last thing before we run the program is to put some text on the image: we display what direction we're looking in, and we display the x, y, and z rotation, so how much we're rotating around each axis; these are the values our if statements are based on. Then we end the timer, because now we've done all the work we want to time, and we calculate how many frames per second our algorithm gets when running this program; we display the frames per second on the image. We also draw the landmarks with the drawing utilities that we set up at the start of the program. Then we show the image with everything drawn on it, and if we hit Escape on the keyboard at any time, it terminates the program and releases our webcam.

Now I've been through the whole code and we're going to run it. I'll upload this to my GitHub and link to it in the description, so you can just go in there, take the code directly, play around with it yourself, try to optimize it, and use it in your own projects and problems. Here I'll just comment out this print of frames per second, because we're already displaying it in the image frame.

Now we can see that we get the image up and it looks really nice. We're detecting all the meshes, all the different features in the image: my eyebrows, my nose, my mouth moving, and also the face outline, so we get this really nice face mesh that we're detecting with MediaPipe. Up at the top we can see we're now facing forward, and the blue line shows in what direction we're looking, based on the values up there that we're estimating. So this is the pose estimation of the points that we're detecting with the face mesh detector from MediaPipe: this is solvePnP, this is MediaPipe, and this is our own logic that we set up.

Now, when I move my face around, we can see that when we're looking up, the blue line actually gets longer the more we look up, so we're scaling the line based on how much we're looking in a given direction. It follows the head around; if I try to do circles it just follows around my head, and the blue line scales based on how much we're looking in a direction. Now we're looking down, now we're facing forward, looking right, looking left, now we're looking up again. We're running at around 100 frames per second, so this is really fast, and I'm just running on a standard laptop CPU, so you can run it on your own computers and laptops as well; we're not using any hardware accelerators or GPUs. You can see all the coordinates up there: right now I'm basically in the middle; if I move up we get a larger x value, and if I go in the opposite direction we get a negative value. If I move to the sides we get a displacement, a change in the y value: now it's a positive y value, and now it's negative. We can't really see anything for the z-axis, because that's just the camera axis that we're facing.

So this is really nice and really cool, and it can be used for a lot of different things. For example, in cars you might want to detect whether people are looking down at their phones while driving: right now it says looking down, and if the person has been looking down for three to five seconds, you could give them a warning. We're going to create some projects with that; it's going to be really nice and really cool. We can do a lot of different things with head pose estimation, and we can also have multiple faces in the image and it will still handle all of this. So thank you guys for watching; hit the subscribe button and the bell notification on the video so you get a notification when I upload new videos or projects about computer vision, deep learning, and so on, where we combine some of these videos into some really cool projects. Also remember to like the video if you like the content and want more in the future; it really helps me and the channel out in a massive way. I'm currently doing this computer vision tutorial where we're talking about camera calibration, how we can combine some of these things, basic image operations, how we can use stereo vision to get depth in the image, create point clouds, and do operations on those. So if you're interested in the computer vision tutorial, I'll link to it up here, or else I'll just see you in the next video, guys. Bye for now.
Info
Channel: Nicolai Nielsen
Views: 31,149
Keywords: python, opencv, machine learning, mediapipe, mediapipe tutorial, google mediapipe, mediapipe machine learning solutions, machine learning solutions, opencv python, python opencv, computer vision, computer vision python, opencv tutorial, computer vision tutorial, head pose estimation, opencv pose estimation, pose estimation opencv, estimate head pose, pose estimation python, mediapipe facemesh detection, object pose opencv, opencv object pose, find object pose
Id: -toNMaS4SeQ
Length: 21min 10sec (1270 seconds)
Published: Tue Dec 21 2021