AI Body Language Decoder with MediaPipe and Python in 90 Minutes

Captions
What's happening, guys! My name is Nicholas Renotte, and in this video we're going to be building our very own body language decoder. Now you're probably thinking: Nick, what on earth is a body language decoder? Well, it's exactly what it sounds like: we'll leverage our webcam to decode what our body language might be saying at a point in time. Specifically, we'll use a pre-built library called MediaPipe plus a custom machine learning model to take the landmarks from our face and the poses from our body (where our arms, torso, and legs are) and decode what that body language might be saying.

Let's take a look at what we'll be going through. First up we'll set up MediaPipe, so we'll do a quick pip install to get our dependencies installed. We'll then do a recap of the previous MediaPipe Holistic tutorial so we can leverage the code we already wrote there; if you want to get up and running quickly, you'll be able to do just that. Then we'll collect our joint data: in order to train our custom machine learning model, we need data on where our joints are positioned for each pose, so we'll capture that and save it down as a CSV. We'll then use that data to train a body language classification model. I stopped short of calling it a pose classification model, because a pose classification model would only leverage your pose data, whereas we're also taking our facial landmarks (and, if we wanted to, our hand joint landmarks as well). We'll train the model to detect a few different body language poses, maybe happy and sad and possibly victorious, and maybe a special one towards the end. Then we'll leverage that model to detect our body language poses in real time using our webcam.

Enough on what we're going to go through; let's look at how this all fits together. As I was saying, first off we'll be leveraging MediaPipe: a pre-built machine learning library that lets you do pose estimation and landmark estimation really, really easily. The beauty of it is that it's really accurate and quick to get going with, and the code we wrote in the previous video gives us a bit of a kickstart. Then we'll capture all of that joint and facial data, both the facial data and the body poses, and train a custom machine learning model with scikit-learn to classify the different components of body language. Finally, we'll render everything with OpenCV, so we can use our webcam to detect different poses and body language components in real time. Ready to do it? Let's get to it.

All righty, guys. In order to build our body language decoder, there are five key things we're going to need to do.
First up, we need to install and import our dependencies. Then we'll make a couple of detections, leveraging the pre-written code from the MediaPipe Holistic tutorial (the link is in the description below and somewhere up above). Once we've made some detections, we'll use that block of code to capture landmarks and export them to a CSV. So really, we're using the MediaPipe model to detect all the landmarks and then exporting them out, tagged with a specific class; I'll talk about this more later, but we'll use that CSV to train a custom model. Specifically, our body language decoder will initially recognize three poses: happy, sad, and one that depends a bit more on the rest of your body, probably victorious, where you've got your hands up. Then we'll train our custom model using scikit-learn, which is probably one of my favorite packages out there. I've broken that into three subsections: we'll read in the collected data (the CSV from step two), do a little pre-processing with a train test split, then train a machine learning classification model to detect the three poses based on our joint coordinates. We'll evaluate the models and pick the best one, because we're actually going to build four different pipelines (I'll show you how), and save it down. Once the model is saved, we'll use it to make new detections on coordinate data coming from our webcam, building the custom model into our new real-time pipeline.

All righty, that's enough of me talking; let's kick this thing off by installing some of our dependencies. The line I've written is "!pip install mediapipe opencv-python pandas scikit-learn", shown as a notebook cell below.
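This is the install cell as dictated in the video (the leading "!" assumes you're running inside a Jupyter notebook; from a plain shell you'd drop it):

```python
# Install MediaPipe, OpenCV, pandas and scikit-learn from inside the notebook
!pip install mediapipe opencv-python pandas scikit-learn
```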
So what we've gone and done there is install four different libraries: MediaPipe, OpenCV, pandas, and scikit-learn. Let me quickly give you a bit of background on what we'll use each of these for, so you've got some context. MediaPipe is going to give us our holistic model: if you look at the MediaPipe documentation, the solution we're using is called Holistic, and it gives you something like the docs example of someone playing basketball, with all the joints estimated plus a lightweight facial landmark estimation model layered on top. OpenCV will render our results to the screen. Pandas, which we'll use down in step three, gives you your tabular data processing: think of it as reading in your CSV and working with it as a table inside pandas, also known as a DataFrame. And scikit-learn gives you all your traditional machine learning capabilities, so the train test split plus a bunch of algorithms we'll use to train a classifier.

Cool. Now the next thing we need to do is start bringing in some components. As I said, we're going to leverage a bunch of code we wrote in the MediaPipe Holistic tutorial rather than writing it from scratch; all the links, including the tutorial and the code for it, are in the description below. The code I'm referring to is on GitHub at github.com/nicknochnack/Full-Body-Estimation-using-Media-Pipe-Holistic. From that repository we'll start by copying over the cell that begins with "import mediapipe as mp" and pasting it into our notebook, along with the line below it.

Let me take you through those two blocks. The first actually imports MediaPipe, which we installed up above: import mediapipe as mp, so we can work with it as the mp variable, and import cv2 for OpenCV, which gives us all of our rendering and drawing capabilities. The next two lines are really MediaPipe lines. The first gives us some drawing utilities: mp_drawing = mp.solutions.drawing_utils. MediaPipe makes it a whole heap easier to draw the different components of your model using these utilities, so rather than drawing each individual landmark yourself, you just use the drawing utils and they do it all for you; you'll see that in action in the next block of code. Think of drawing_utils as helpers. The second line gives us the MediaPipe solutions themselves: mp_holistic = mp.solutions.holistic, which exposes all of the landmark detection models we're going to use. Those lines are collected below.
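Here are those import lines together, exactly as described (and as they appear in the linked repo):

```python
import mediapipe as mp  # MediaPipe: holistic landmark model plus drawing helpers
import cv2              # OpenCV: webcam capture and rendering

mp_drawing = mp.solutions.drawing_utils  # drawing utilities / helpers
mp_holistic = mp.solutions.holistic      # holistic landmark detection solution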
Cool, so those two lines of code are done. The next block we're going to bring in is again back inside the GitHub repository: if you scroll all the way down to "3. Apply Styling", there's a block of code there which is our main detection code from the previous tutorial. We'll copy that over and paste it into the notebook; the full loop is reproduced a little further down. I'll briefly go over this code, but we won't go into too great a detail, because we want to keep this tutorial as fast as possible.

What this block of code does is first get a feed from our webcam: cap = cv2.VideoCapture(0) actually connects to a webcam, so detection with this model is all in real time. If you get any errors when trying to display your feed or capture your image, you might need to play around with that device number. It's currently set to device zero because my webcam is associated with device zero, but yours might be at a different number; zero tends to work fine on my Windows machine, but on my Mac I need to set it to device two, so just something to keep in mind. Then we load up our holistic model, so we can leverage the holistic model as holistic, and we loop through each frame coming from the webcam: while cap.isOpened() checks whether or not the webcam is open, then we read the feed and get the image from the webcam in a variable called frame. Everything else is pretty much MediaPipe: recoloring our feed and making a detection (we'll come back to that), drawing each of our landmarks, cv2.imshow rendering the results to the screen, and the final lines handling how we gracefully quit out of the view; you'll see that in a second.

Now, I want to spend a little more time on the recoloring lines, because they're really important, and we're going to make one performance tuning improvement that was actually suggested to me by a subscriber. When we read our frame using OpenCV, by default the feed comes back in BGR format, which means the color arrays are ordered blue, green, red. MediaPipe needs frames in RGB, so first up we recolor the feed with cv2.cvtColor, passing our image through and converting from BGR to RGB, and we store the result in a variable called image. The suggested performance improvement is to set the image's writeable flag to false with image.flags.writeable = False, and we'll make one more change to set it back to true once we've made our detection. Then comes quite possibly the most important line in all of this: we use our MediaPipe holistic model's process method to actually make detections, passing through the recolored image. This means all of the detections from our MediaPipe model are going to be stored inside a variable called results, which we'll grab out later.
Then what we do down below is recolor back from RGB to BGR, because when we render with OpenCV we want the blue, green, red format; but before that we set the writeable flag back to true so we can write to the image again (same line as before, just set to True). And that's pretty much all well and good.

So let's test this out: we'll run this big block of code and you should get a little Python GUI pop-up with the results rendered to the screen. And there you go, you can see our estimation happening in real time. The webcam is a little low, but we can fix that; that's looking a bit better. You can see my body, and if I put my hands up, all of our detections get rendered. What you're actually seeing here is the results of a few different models, because MediaPipe Holistic has four models built in (or three, if you count the two hand models as one). The green lines are our face model; I believe the underlying solution is called FaceMesh, but basically it's our facial landmark model, with around 468 different landmarks detected, and if I move my eyebrows or my mouth it all tracks in real time. The model you can see on my arms and shoulders is the pose model, and then I've got my two hand models, which give really detailed hand tracking as well. All of this is the code we wrote in the previous tutorial, so if you want to take a closer look, by all means do. Right now it's sort of detecting my elbows and arms, but if I take the green screen down and step back, it actually does your whole body: I can stand up, move my arms up and down, squat, bring my leg up, and it's actually doing feet as well. Not that flexible, guys. So it's really, really flexible in terms of what you can do with this.

All righty, cool, that's a quick demo of our baseline code. What we're going to do now is use these landmarks, our face landmarks and our body pose landmarks, to go and classify custom body language poses. Right now we're just seeing the results but not actually doing anything with them, so we'll export these coordinates and train a custom model to recognize what a given combination of coordinates means. Remember, all of our detections are stored inside that results variable; the full baseline loop we've been running looks like the code below.
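This is a reconstruction of the baseline detection loop from the linked previous tutorial, assembled from the description above. The confidence thresholds, window name, and exact drawing calls are taken from that repo rather than spelled out here, so treat them as assumptions; note too that older MediaPipe releases expose the face connections as FACE_CONNECTIONS, while newer ones use FACEMESH_TESSELATION.

```python
cap = cv2.VideoCapture(0)  # device 0; try 1, 2, ... if your webcam is on another index

# Initiate the holistic model (thresholds assumed from the previous tutorial)
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ret, frame = cap.read()

        # Recolor feed BGR -> RGB and mark it non-writeable (the performance tweak)
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image.flags.writeable = False

        # Make detections: everything lands in `results`
        results = holistic.process(image)

        # Recolor back RGB -> BGR for OpenCV rendering
        image.flags.writeable = True
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

        # Draw face, hand and pose landmarks with the drawing helpers
        mp_drawing.draw_landmarks(image, results.face_landmarks,
                                  mp_holistic.FACEMESH_TESSELATION)  # FACE_CONNECTIONS on older versions
        mp_drawing.draw_landmarks(image, results.right_hand_landmarks,
                                  mp_holistic.HAND_CONNECTIONS)
        mp_drawing.draw_landmarks(image, results.left_hand_landmarks,
                                  mp_holistic.HAND_CONNECTIONS)
        mp_drawing.draw_landmarks(image, results.pose_landmarks,
                                  mp_holistic.POSE_CONNECTIONS)

        cv2.imshow('Raw Webcam Feed', image)

        # Gracefully quit when 'q' is pressed
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()
```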
So if I close out of this, let's look at how we actually get those coordinates. If I just type results, it's only going to say something like mediapipe.python.solution_base.SolutionOutputs, but we can access the results from each of the different models we've got in there: remember, we've got a face model, our hand models, and a body model. If I type results.pose_landmarks (our body model), you can see it gives us the landmarks for each of the different joints in our body. These actually map to specific body parts: landmark zero maps to the nose, landmark one to the left eye inner, landmark two to the left eye, landmark three to the left eye outer, and so on and so forth; there's a landmark map that shows what each one maps to, so you've got a whole bunch of landmarks you can leverage there. The same applies to our face model: type results.face_landmarks and again you've got a whole bunch of face landmarks.

Now, we can grab these landmarks as an array just by typing .landmark after it. And if we want to grab the coordinates as well, which we're going to do in a sec, we can grab, say, the first one and type .x to get our x coordinate, .y for y, and .z for z; we can do this with our pose landmarks as well. The pose landmark model actually has one additional value available: visibility, which tells you whether or not that particular landmark is actually visible in frame. Our face model doesn't actually have that value available, so when you go to grab the visibility component of a face landmark, it's just going to read out a 0.
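Put together, the landmark access patterns being demonstrated look like this (run after the loop above has populated results):

```python
results.pose_landmarks                 # all 33 body landmarks (landmark 0 = nose)
results.face_landmarks                 # all 468 face landmarks
results.face_landmarks.landmark        # the landmarks as a plain list
results.face_landmarks.landmark[0].x   # x coordinate of the first face landmark
results.face_landmarks.landmark[0].y   # y coordinate
results.face_landmarks.landmark[0].z   # z coordinate
results.pose_landmarks.landmark[0].visibility  # pose-only extra value
results.face_landmarks.landmark[0].visibility  # face model: always reads 0.0
```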
So it's already handling that particular quirk pretty gracefully, and we don't need to do any additional processing. Just know that when we go and export these coordinates, the visibility values are going to be zero for all of the face model's coordinates. That's fine: if you wanted to, you could drop off those columns, but in terms of keeping this relatively simple, you don't need to remove them; it handles it gracefully.

All right, so we've got all of our landmarks, we've tested out our code, and we've installed our core dependencies; now it's time to actually go ahead and capture some landmarks. I'm just going to comment out that exploration cell again, since it's taking up a bit too much screen real estate. First up, we need to bring in some additional dependencies: something to work with our CSVs, something to work with our file structure, and we'll probably also need numpy to handle our numpy arrays. So that's three lines of code: import csv, which is going to help us work with CSVs and actually write our data out; import os, which helps with our folder structure and exporting into a specific folder; and import numpy as np, because numpy is really, really good for array processing and array math functions. See the cell below.
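The three imports just described:

```python
import csv          # writing our landmark rows out to a CSV
import os           # working with the folder structure we export into
import numpy as np  # array processing and array maths
```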
Okay, so those are our dependencies imported. The next thing we want to do is double-check how many landmarks we've actually got and start setting up our column names. Ideally you want your column names defined so that when we get down to training our model, you can reference each coordinate by its column name. We'll have the first column representing our class, and the rest of the columns representing our coordinates: an x coordinate, a y coordinate, a z coordinate, and a v coordinate, so x, y, z, and visibility. Again, as I was saying, the face model doesn't actually have a visibility component, so for all of our face landmarks that value will just be zero. That first class column is going to represent our different poses, so happy, sad, and then victorious; but you can add in so many additional poses to this, so if you wanted to add a whole bunch of different poses, you could. The world's really your oyster with this block of code, and I'll actually show you how to retrain using it.

Now let's double-check how many landmarks we actually need to handle. results.face_landmarks gives us all of our face landmarks, and if we take a look at how long that is, you can see we've got 468; we can look at our pose landmarks as well, and we've got 33 of those. We're going to loop through all of them, the 33 plus the 468, and store the total inside a variable called num_coords; num_coords is really just the number of coordinate sets we have to loop through, 501 all up.

Now we go ahead and create our column names: class, then all of our different sets of coordinates, where each set is four values, x, y, z, and v (v being visibility). So we loop through and build the different column names, which we'll then use when we export to CSV. If we print out the landmarks list, you can see a whole bunch of values inside it; these just represent the column names of the CSV we're eventually going to export with all of our coordinates, the header row. The first column is class, representing the pose we want to detect (happy, sad, victorious), and then each coordinate set contributes an x, y, z, and v column, all the way out to 501. If we take a look at the last four values, you can see x501, y501, z501, and v501. So we've got our class column and then all of our coordinate columns; remember, we haven't actually exported any coordinates yet, this is really just going to be our header row inside the CSV. A sketch of that column setup is below.
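A sketch of the column setup just described, assuming results is still populated from the detection loop:

```python
# 33 pose landmarks + 468 face landmarks = 501 coordinate sets
num_coords = len(results.pose_landmarks.landmark) + len(results.face_landmarks.landmark)

# Header row: 'class' followed by x, y, z, v columns for each coordinate set
landmarks = ['class']
for val in range(1, num_coords + 1):
    landmarks += ['x{}'.format(val), 'y{}'.format(val),
                  'z{}'.format(val), 'v{}'.format(val)]

landmarks[-4:]  # ['x501', 'y501', 'z501', 'v501']
```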
Now what we want to do is export this header row to a CSV. We'll start out by exporting the CSV with just the column row, so that eventually we can append to it as we add new poses and build it up. Once that's run, it writes into the same folder you're currently working in, so you'll see a file called coords.csv with all of our columns now exported: the class column, then the four sets of coordinates all the way out to 501 (scroll all the way to the right and you can see them all).

Now let's look at how we did that. First we created a new file using a with open statement, creating coords.csv; if you want your CSV named something different, just change that file name, but remember we're going to need it to be consistent when we export our coordinates later. We passed through two keyword arguments: mode='w' for write, and newline set to a blank value, and we work with the file as the variable f. Then we used the csv library we brought in earlier: csv.writer, passing through our file plus a couple of keyword parameters. Think of this csv writer as our operator: we initially create (instantiate) it, then use its writerow method, csv_writer.writerow(landmarks), to write out our landmark columns. When we actually go to write out coordinates, we'll use a similar line, but instead of landmarks we'll pass the coordinates we've processed from a detection. As for those keyword parameters: we set the delimiter equal to a comma, the quote character equal to double quotes, and the quoting equal to csv.QUOTE_MINIMAL. Again, you can play around with these, but I've found this works pretty well. The full cell is below.
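The header export cell, as walked through above:

```python
# Write the header row (column names) to a fresh coords.csv
with open('coords.csv', mode='w', newline='') as f:
    csv_writer = csv.writer(f, delimiter=',', quotechar='"',
                            quoting=csv.QUOTE_MINIMAL)
    csv_writer.writerow(landmarks)
```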
So that's our header row now exported. Now we want to start capturing some coordinates; remember, we're going to capture coordinates for three different poses, happy, sad, and victorious. Again, you could customize these to whatever you want: say you wanted to do angry or irritated, or you wanted to detect drowsiness, whether or not your eyes are closed or you're dozing off, you could do that here as well. In this case we set up a variable called class_name and set it equal to the string 'happy' initially. We're going to do this a few times: collect coordinates for the happy body language pose, then for sad, then for victorious. In order to actually do this, we go back up, copy our main detection code, and bring it down here as our baseline, because as our model detects the landmarks we're going to write them out to our CSV. So we're using the model to extract the landmarks attached to a particular component of body language: we'll extract them while I'm happy (mouth sort of open, smiling), while I'm sad (head down, looking disheveled), and while I'm victorious, so arms up, happy, so on and so forth.

Now we need to write a little code to actually export these coordinates, using a block similar to the one for the column names, but exporting the coordinates this time. Initially I'm going to wrap it all inside a try/except block, which just means that if we get any errors we won't break out of the loop completely; we'll gracefully move on to the next frame.

Okay, so before we go any further, that's our pose row created; let me explain it a little. What we've got here is all of the landmarks for our pose, which is really just akin to doing results.pose_landmarks.landmark. The next line then formats those landmarks into a numpy array, so rather than working with the keyed structure we extract the raw values. If I store the result inside a variable called pose_row and run it, you can see we're extracting all of the coordinates into a numpy array: the x value, the y value, the z value, and the visibility value. We do the exact same thing for our face landmarks, calling them face and face_row, because first we extract our pose, then our face landmarks; the lines do the same thing, just for the face model. The reason I'm putting these inside numpy arrays is that we're going to combine them in a second.

Let's look at what we wrote for pose: pose = results.pose_landmarks.landmark gives us the pose landmarks, which you saw previously, and then we do a little bit of a list comprehension, looping through each landmark in our pose and pulling out landmark.x, landmark.y, landmark.z, and landmark.visibility into a separate array. We wrap all of that inside a numpy array and use the flatten method to pull the values out of the individual per-landmark arrays and convert them into one big array. To see what flatten is doing, let's take the flatten call off the face line and look at the result.
If we don't flatten, you get a bunch of individual arrays inside one big array; add .flatten() and you're left with just one big flat array, which is the format we want so we can export it out to our CSV. So those four lines of code are really just extracting our landmarks; now we join the two arrays together to be able to export them.

Okay, so that's our exporting code done. What I added is a handful of extra lines: the pose coordinates extracted and flattened into one big array stored in pose_row, the face landmarks likewise flattened into face_row, and then a line that adds them together into one big row. Let me add some commentary: this is extracting pose landmarks, this is extracting face landmarks, and this line is concatenating them. When I first tried it I actually hit an error, which is good, because it showed that these need to be lists; so I wrapped the flattened arrays inside list(). Now if we take a look at our row, we've got all of our coordinates in one big mega array. If we check the length of it, it's 2004 values; but what it should be is the class (one value) plus 501 times 4, so we're one short. Ah, we actually need to add our class name as well! With the class name inserted at the front we're at 2005, and we've now got all of our values: the first column holds our particular class, happy, and the rest of the values represent all of our coordinates, a whole bunch of data we'll be able to use for our custom model.

So, to recap: we've concatenated our rows, we insert our class name as the first column, and then we export out to CSV. That export line is almost identical to the one we wrote for the columns, with two key changes: we change the mode from write ('w') to append ('a'), because we want to append to our CSV and not create a new file, and we write out our actual row instead of the landmarks header. The full capture-and-export block, assembled, is below.
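The capture block as built up above. The try/except sits inside the detection loop, right after results = holistic.process(image) and the drawing calls:

```python
class_name = 'happy'  # change to 'sad', 'victorious', ... and re-run to collect each class

# ...inside the detection loop, after results = holistic.process(image):
try:
    # Extract pose landmarks into one flat list of x, y, z, visibility values
    pose = results.pose_landmarks.landmark
    pose_row = list(np.array(
        [[landmark.x, landmark.y, landmark.z, landmark.visibility]
         for landmark in pose]).flatten())

    # Extract face landmarks the same way (visibility just reads 0.0 here)
    face = results.face_landmarks.landmark
    face_row = list(np.array(
        [[landmark.x, landmark.y, landmark.z, landmark.visibility]
         for landmark in face]).flatten())

    # Concatenate rows and put the class label in the first column
    row = pose_row + face_row
    row.insert(0, class_name)

    # Append to the CSV we created earlier (note mode 'a', not 'w')
    with open('coords.csv', mode='a', newline='') as f:
        csv_writer = csv.writer(f, delimiter=',', quotechar='"',
                                quoting=csv.QUOTE_MINIMAL)
        csv_writer.writerow(row)
except:
    pass  # skip frames where no landmarks were detected
```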
That's all well and good; now we should actually be able to capture some landmarks. Let's clean up the scratch cells and go ahead and run this block of code. Just to quickly recap: we've added this export block to the initial code we copied over from GitHub, and it should append our coordinates to the coords.csv file. What I'm going to do is get into the happy pose, which is just me smiling and sitting here; then we'll come back up, change our class name to sad and do a sad pose, and then do victorious, where we put our hands up and cheer. So let's run this and wait for the pop-up.

Okay, that should be enough coordinates. You saw the pop-up come up on the screen; I didn't want to talk while it was capturing, because it was actually capturing our coordinates. If we open up our coords file, you can see the class name over here and then all of the different coordinates that represent my face in a happy state. Now we want to append to it: we change the class name to 'sad' and run it again, this time looking sort of downtrodden. (If you get a little pop-up that immediately closes down when you run this code, just try running the cell again and you should be good to go.) All right, that's our sad coordinates done; and again, I'm sitting down, but I could just as easily be standing up, and you could extract your entire body language. Checking the file, we've got happy and we've got sad appended as well, cool. You also saw that while collecting those coordinates I was moving my head around a little; this is important, because ideally you want your body captured at a whole bunch of different angles.

Then we do our victorious pose, where we use the other parts of our body a little bit more; I'm going to get my arms up into the screen. So change the class name to 'victorious' (v-i-c-t-o-r-i-o-u-s, yeah, that looks good) and run it again; arms up is really going to be the defining feature of this particular class. And we're good: those are all of our classes now collected, so we've got our happy class, our sad class, and our victorious class. Again, this is sort of our first iteration: if we run through this and see that a particular class isn't performing that well, we can go ahead and performance tune, so capture more values
and more frames for those particular coordinates, and tune that way. But we're going to run with this for now and see how we actually go.

Cool, so that's step two now done. So far we've installed and imported our dependencies, made some detections to test out our code, and captured our landmarks. These two steps of the pipeline are really, really important, and we've actually done some significant work here. Now we get to my favorite bit: some modeling. First up we need to read in our collected data and start processing it, and by processing I mean converting it into a train test split. This is just best practice when building a machine learning model: you train on one partition of your data and test on a separate partition to see how well it's likely to perform in the real world. Ideally, if we get a good accuracy metric when we test the model, that should be indicative that it will perform well when we make our detections later on.

First we import a couple of dependencies for step 3.1, so two lines of code. We import pandas, our tabular or DataFrame library, which lets us work really easily with tabular data (and our CSV is an example of exactly that): import pandas as pd, so we can work with it as pd. And we import one function from the scikit-learn library: from sklearn.model_selection import train_test_split, which is going to give us that partitioning into a train partition and a test partition.

Then we bring our data frame into our Jupyter notebook, which is a relatively straightforward line: df = pd.read_csv('coords.csv'). Keep in mind that when we exported our coordinates, we exported to a file called coords.csv, so when we read it in here it's called the exact same thing. If we take a look at the first five rows, you can see our class and all of our different columns, x1 all the way out to v501. We probably also want to look at the bottom of the DataFrame, where you can see all of our victorious classes; remember, we collected happy first, then sad, then victorious. And we can double-check that we've got our sad class with a quick filter: df[df['class'] == 'sad'], which is just how you do a filter inside pandas, and you can see we definitely do have sad data as well. So it looks like we've got our happy data, our victorious data, and our sad data, and we should be good to go; those read-in checks are sketched below.
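Step 3.1 as described:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('coords.csv')  # same file name we exported to

df.head()                  # first five rows: class plus x1..v501
df.tail()                  # bottom of the frame: the victorious captures
df[df['class'] == 'sad']   # quick filter to confirm the sad class is there
```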
Now the next thing we want to do is set up our features and our target variable. In this case, our features are going to be all of our coordinate values, the x, y, z, and v values; those are the features we pass to our classification model, and our class is going to be our target variable. What we're effectively trying to do is train a multi-class classification model to understand the relationship between this class and these coordinates, so that we can then pass through new coordinates and have the model predict what class they are. It's actually going to decode our body language and determine whether we're happy, sad, or victorious; and again, you can add a whole bunch of additional classes to this to train up and classify different poses, which we'll actually do towards the end with our special class.

Now we set up variables for our target value and our features; in data science we normally just call these X and y. Think of X as your features: if you take a look at it, it's every single column except for the class column, because you want to remove your class column from your features. And y is just our class column, so you can sort of see how that flows together.

Then we create the train and test partitions. The code here is really, really common; I use it so much whenever I'm building classification or regression models with scikit-learn. We use our train_test_split function and pass through a couple of key values. Let me show you the output before we delve into the line itself: X_train has 383 rows, while our complete DataFrame df has 548 rows, so we've randomly extracted a specific partition out of our data frame into a variable called X_train. That DataFrame is what we'll use to train our model, alongside our y_train variable, which again has 383 values. X_test is a similar style of thing but a smaller partition, 165 rows, and we've got y_test there as well; we'll use X_train and y_train to fit (train) our model, and X_test and y_test to evaluate it. As for the line itself: X_train, X_test, y_train, y_test just unpacks the values returned from train_test_split; we pass through our X and y values, which we set up above, and then we set two keyword arguments, test_size=0.3, meaning we use a testing partition of 30%, and random_state=1234, which just ensures you get similar results whenever you run this. It's all sketched below.
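The feature/target setup and the split:

```python
X = df.drop('class', axis=1)  # features: every coordinate column
y = df['class']               # target: the body language class

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1234)
```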
So that's really step 3.1 done: we've read in our data from the CSV, done a little bit of processing, and created our training and testing partitions. Now we start training our machine learning classification model. As I was saying, we're training a model to understand the relationship between the features (x1, y1, z1, all the way up to 501) and the class, so that when we go and make new detections we can determine which class we've actually got. That's effectively akin to decoding our body language, and again, you could add in so many additional poses, arm movements, facial expressions, and so on.

Let's bring in our dependencies to train the classification model; there are quite a few lines here. We've written four: from sklearn.pipeline import make_pipeline, from sklearn.preprocessing import StandardScaler, from sklearn.linear_model import LogisticRegression, RidgeClassifier, and from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier. Let's break these up. The first two are really all to do with making a machine learning pipeline: the make_pipeline method allows you to build an ML pipeline, you can append a whole bunch of different functions to your specific pipeline, and that pipeline gets used for both our training as well as our scoring (if you'd like a more detailed video on how these are built up, please let me know in the comments below). To our pipeline we're going to pass a pre-processing function called StandardScaler: leveraging StandardScaler actually normalizes your data, subtracting the mean and dividing by the standard deviation, so it tries to get everything onto a level basis and no one feature overshadows the others. Then the remaining imports bring in a whole bunch of algorithms; I don't want to go into too much detail, just think of each as a separate classification algorithm: logistic regression, ridge classifier, random forest classifier, and gradient boosting classifier. Rather than relying on one particular type of algorithm to ideally get us good results, we're going to train four machine learning models, test them all out, and pick the best one. The imports are below; after that, we need to set up the pipelines we're going to loop through.
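The four import lines for step 3.2:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
```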
Okay, so those are our different pipelines created; think of each one as (eventually) a separate individual machine learning model pipeline. Now, what I normally do as well is a hyperparameter search, but in this case I'm trying to keep the code relatively short (it's probably already going to be super long), and this is the initial part of my production classification pipeline. What we've got here is a dictionary: pipelines.keys() shows four different pipelines, lr mapped to logistic regression, rc mapped to the ridge classifier algorithm, rf mapped to the random forest classifier, and gb mapped to our gradient boosting classifier. Under each key (take a look at .values()) there's a Pipeline whose first step is our StandardScaler, which we brought in above, and whose second step is the actual algorithm; so our data will first pass through the scaler to bring everything onto a level basis, and then the model gets trained. To build it we wrote pipelines equals curly brackets, set up the first key, 'lr', then make_pipeline with two arguments, StandardScaler() and LogisticRegression(), and then did the exact same thing for each of the other algorithms, only swapping the algorithm in the second part.

Now we can actually go ahead and train these models. We'll store them inside a variable called fit_models, starting from a blank dictionary, and loop through each of the pipelines to train them: for algo, pipeline in pipelines.items() (note it's .items(); I initially typed .item and had to fix it). Inside the loop we grab the pipeline and use the fit method, which you can think of as the training method; it's really, really common across pretty much all machine learning libraries, so .fit normally means train. To that method we pass our training data, X_train and y_train,
and we store the trained model inside a variable called model; then we take that model and store it inside our dictionary, with the key set to the name of the algorithm (lr, rc, rf, or gb) and the value set to the trained model. The whole thing is shown below this paragraph. Now if we go ahead and run it, we should ideally get four trained models out of it; it'll take a little while, training will kick off, eventually stop, and you'll have all of your models inside the fit_models dictionary.

Okay, so that's our training completed. If we take a look at our fit_models dictionary, you can see we've got all of our different fitted models, and we can now go and evaluate them. That brings us to step 3.3: first we need to bring in some dependencies to evaluate and serialize our model. Think of serializing as effectively saving your model down; we're going to use a library called pickle for that. But let's actually make a prediction first. If we grab the random forest pipeline, fit_models['rf'], and call .predict on it, passing through our test data, you can see it's predicting, based on the coordinates, happy, sad, happy, victorious, happy, sad, and so on. We can test each of the different models the same way, lr and rc doing the same thing, and you can start to see a bit of a pattern: we take our coordinates, we train our model, and we pass through test data to evaluate it. When we put this into production and try to make real-time detections, we'll effectively be passing through the row of data that we've collected, so this gives you an idea of how it's all going to fit together.

Now the evaluation dependencies, two lines of code. from sklearn.metrics import accuracy_score gives us our accuracy metric, the function that allows us to calculate accuracy; think of accuracy as just one performance metric for your machine learning models, there are a whole bunch out there you can use to evaluate your model, I'm just keeping this lightweight. And import pickle: pickle is a really popular library that people in the data science field use to save their models down to disk, and we're going to do exactly that. So remember, we went and made predictions up above and got our different values; what we want to do now is see how well the model is actually doing. The imports and a quick prediction check look like this.
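The metric/serialization imports plus the spot-check prediction described above:

```python
from sklearn.metrics import accuracy_score  # our evaluation metric
import pickle                               # for saving the model to disk

fit_models['rf'].predict(X_test)  # e.g. array(['happy', 'sad', 'victorious', ...])
```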
Alrighty, that's our evaluation code done. What we're doing is first looping through each of the models inside our fit_models dictionary, so for algo, model in fit_models.items(). We're then making a prediction; this line is no different to what we did up above, except we're doing it for each individual algorithm. So model.predict, passing through our X_test data, and we store the result inside a variable called yhat. y-hat is pretty standard notation in the data science field for where you store your predictions; normally you'll see a y with a little hat symbol, I've just written yhat. Then we print out our accuracy: print(algo, ...) using the accuracy_score function we imported up here, passing it two key values, first our y_test data and then our yhat data. Since we know all of our y_test values, and we know how those map to the coordinates, we're pinning our yhat predictions against those y_test values to see how well the predictions fared against the actual data. Think of it as a bit of a gauntlet. In this case it performed very well: the accuracy for three of our models was about 100%, which is so high it actually gives me a little bit of anxiety, because nothing is ever perfect. We're probably going to use the random forest model, but if you get results like this, in a normal data science project you'd want to try out a whole bunch of different metrics to evaluate whether each individual model performs better on certain tasks and worse on others. In this case we're pretty happy with a lightweight evaluation: our logistic regression model, our ridge classifier, and our random forest classifier all got 100% accuracy, and our gradient boosting model was a little bit down at about 0.98 or 0.99, still pretty good considering we only gave it 500-ish rows of data.
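The evaluation loop as described, sketched out:

```python
from sklearn.metrics import accuracy_score

for algo, model in fit_models.items():
    yhat = model.predict(X_test)               # predictions on held-out data
    print(algo, accuracy_score(y_test, yhat))  # e.g. "rf 1.0"
```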
Now what we actually want to do is save this model, and we're going to use pickle to do exactly that. You saw above how to make a prediction: grab the model with fit_models['rf'], call .predict, and pass through X_test. And these look pretty accurate: if we take a look at our y_test data, the first value is happy and we predicted happy, then sad and sad, happy and happy, victorious and victorious. Pretty good, right? This is partly why I wanted to build this tutorial: it's relatively straightforward to build up, and even if you don't get the best performance you can always add more data to your CSV, train again, and evaluate. The beauty of it is that it also has a whole heap of different use cases. If you wanted to do drowsiness detection, you could do it here; pose classification, you could do it here; sentiment analysis from your face, you could do it with this pipeline; even a sign language model based on a single frame or single set of coordinates (action detection is a little bit different, since it works over time). On that note, it's time to save our best model. Remember we said we were going to export our random forest model; they were all at 100%, but random forest is an ensemble method, so I prefer that one (and again, if you want me to do a bigger data science series, do let me know in the comments below). What we did is export it using pickle, and the file I named it was body_language.pickle, which you can see here; we have in fact exported it. Let me take you through the code. We've used a with statement: with open('body_language.pickle', ...), and if you wanted to change the name of the model all you need to do is change that file name; it could be a pose detection model, a sentiment analysis model, an action detection model, whatever you want it named. Then we pass through the write-binary flag, 'wb', because we're writing it out as a binary file, and we work with it as a variable called f. Then we've used our pickle library: pickle.dump, specifically dumping out our best model, which is our random forest model based on our accuracy metric above. So we access our fit_models dictionary, pass through our 'rf' key, and pass f as the second argument. Basically we're saying: take our best model, the random forest, save it as a file called body_language.pickle, and dump it down to disk.
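As a reference, a sketch of that serialization step (the file name follows the narration; rename it as you like):

```python
import pickle

# Serialize the chosen pipeline (random forest here) down to disk.
with open('body_language.pickle', 'wb') as f:  # 'wb' = write binary
    pickle.dump(fit_models['rf'], f)
```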
Now what we can do is actually put this into action. So far we've done a lot of pre-processing and a lot of machine learning stuff: we went and imported our dependencies, did our detections, collected and captured our CSVs, and then trained a custom machine learning pipeline, which is pretty intense. Now it's time to have some fun and actually make some detections. First up we're going to reload our model into our notebook, which is the opposite of the dump we just did, and then we're going to tweak the big block of capture code to make detections with that particular model, apply our custom classifier, and do a little bit of fancy formatting on our OpenCV frame so it looks better. One other key note: if you wanted to, you could export this to a Python script and run it in your favorite code editor. I do a lot of coding inside a Jupyter notebook because it's my fave, but if you wanted to do it via an IDE, say PyCharm, you could do that as well. Okay, so that is our model now loaded. The reason I reload it is that if you wanted to save a bunch of different models and reload them, you've now got this workflow set up; or if you didn't want to go through the entire training pipeline again, you could load the model next time simply by running this cell. To do that we've written with open('body_language.pickle', 'rb'): up above we had 'wb', write binary, because we were writing the file out; here we pass 'rb', read binary, because we now want to read a binary pickle file, and we work with it as a variable called f. Then we just use the pickle.load function, passing through our file handle, and store the result inside a variable called model. If we type model, you can see your StandardScaler step and then your RandomForestClassifier step.
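The matching deserialization step, sketched from the narration:

```python
import pickle

# Read the serialized pipeline back in ('rb' = read binary).
with open('body_language.pickle', 'rb') as f:
    model = pickle.load(f)

model  # shows the StandardScaler + RandomForestClassifier steps
```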
That is all well and good; now comes the big bit. We're going to copy the big block of code we had under step 2. Remember, that was the code we used to actually capture our coordinates, and the nice thing is that those coordinates are already formatted in a way we can pass straight to our model. So I'm going to copy it, go all the way down to step 4, and make a couple of key changes so it makes detections instead, leveraging the model we just loaded. Think of it like this: first you capture your coordinates using your facial landmark model and your body pose model; then, once you've trained your custom classifier on those coordinates, you use that trained model to decode your body language. Again, we'll take this step by step rather than as one big block. First up, we comment out the appending of the class name and the export to CSV, because we don't actually need those anymore. Then we start working with our model, and before we go any further I want to walk through the four new lines of code and show you how this works; everything from here onwards is really just to do with rendering, i.e. how we show our class on the frame. The first line is X = pd.DataFrame([row]); remember, row is what we built right up at the start by concatenating our pose row and our face row into one big array of all our coordinates, where each landmark contributes x, y, z, and v (visibility). Then we're making two detections. First we determine the class, using model.predict(X) and extracting the first value out of that array; this is the same .predict method we used during training, and in production it's what we use to make detections on new data. The new function here is model.predict_proba (I fumbled writing it at first), which predicts the probability of each class. So we get the class as well as the probability of it being that class. You could drop the probability if you wanted, but I find it useful: if you wanted to apply a threshold, you could just add an if statement here. Then we print both out. I store the class prediction inside a variable called body_language_class and the probability prediction inside body_language_prob; those are the two key components you'll need to render out your prediction. If we run this, we should get our class and our probability printed out, and nothing else. Looking good, so let's test it. We're running; we've got our frame, and you can see all of our probabilities, whether we're sad or not, whether we're happy. You can see we're predicting happy, and what about victorious? If I put my arm up, you can see we're predicting victorious. Pretty cool, right? Note that right now we're printing out the whole probability array, so you'd actually need to grab the maximum value, but we'll do that a little bit later. We're predicting all of our different classes, sad, victorious, happy, and so forth, and the prediction metrics look pretty good: probabilities coming out at around 0.98, about 0.45 for sad (so maybe a little additional training needed there, I'll explain in a second), and happy at about 0.53. Not too bad. But right now we're not actually rendering anything to the screen; ideally we want to show our results on the frame rather than just printing them out, and that's exactly what we'll do next. There's quite a fair bit of code to write here, so again we'll build it up step by step.
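A sketch of those four lines inside the webcam loop; row is the concatenated pose-plus-face coordinate list built in the earlier capture code:

```python
import pandas as pd

# Make detections on the current frame's landmark coordinates.
X = pd.DataFrame([row])                         # one row of x, y, z, v values
body_language_class = model.predict(X)[0]       # predicted label, e.g. 'Happy'
body_language_prob = model.predict_proba(X)[0]  # probability per class
print(body_language_class, body_language_prob)
```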
Okay, before we go any further I want to explain what I'm doing here. First up, I'm grabbing the coordinates of my left ear, because that's where we're going to display the class; I want the class displayed right next to my ear, and there's a fair bit of code to do that, so let's break it down across a few lines so it reads better. First, this line grabs our results value, which we get from our baseline detection, and accesses results.pose_landmarks.landmark; that gives us all the pose landmarks, and from those we grab the point for our left ear. Then we grab its x value, do the exact same thing for the y value, and stack both inside a numpy array. But these coordinates are normalized, not scaled for the size of the feed we're actually getting from our webcam, so we pass the array to np.multiply and multiply it by the dimensions of our webcam frame, 640 by 480. Now it's recalculated for the dimensions of the frame, so we should be able to render it successfully. Two more details: OpenCV needs integers when we go and position a particular component, so we call .astype(int) to convert, and OpenCV expects the coordinates as a tuple, so we wrap the whole thing in tuple(). That effectively gives us our ear coordinates, and that's going to let us render our class right next to the ear. You could change where it's displayed: next to the elbow, on your nose, wherever you like.
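Sketched out, that coordinate computation looks roughly like this (results comes from the detection loop, and mp_holistic is the mediapipe holistic module handle from the earlier setup):

```python
import numpy as np
import mediapipe as mp

mp_holistic = mp.solutions.holistic

# Left-ear landmark (normalized 0-1), scaled to the 640x480 frame,
# cast to ints and wrapped in a tuple, as OpenCV drawing calls expect.
coords = tuple(
    np.multiply(
        np.array((
            results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_EAR].x,
            results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_EAR].y,
        )),
        [640, 480],
    ).astype(int)
)
```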
Okay, that is the first part of our rendering now done. We've written quite a fair bit, so it might look a little complicated, but let me break it down. First we're drawing a rectangle, so we'll have a background overlay behind the class text, and most of this code is really just positioning that you can play around with. I've taken our ear coordinates from above: the x coordinate as-is, the y coordinate plus 5 to make the box a little thicker, and then, rather than a fixed-size background, I've made the width dynamic by taking the length of the body language class name and multiplying it by 20. So if you find the class names are really long, you might want to change this multiplier so the background renders appropriately; I've also tweaked the other y coordinate so it encapsulates the entire text block. We then pass through the color (in BGR) and a line thickness of -1, which fills the rectangle with color. We do a similar thing for the text itself with cv2.putText: we pass through our image, our body_language_class (which, remember, is going to be sad, victorious, or happy), the coordinates of where we want to place it, and then some other parameters: the font we want to use, cv2.FONT_HERSHEY_SIMPLEX, the size of the font, the color of the font (white in this case, but you could tweak it), the line thickness (2 in my case), and the line type, cv2.LINE_AA. Let's test this out and see if we've got any errors or if we're all good. It looks like the window closed, so let's try again. Cool, you can now see we're getting our detection: right now it says I'm sad because my head's down, then happy, sad, victorious. How good is that? It's detecting our body language pretty accurately, and you could do this for so many poses; right now it's just happy and sad and victorious, but you can do this for literally anything. I know I'm getting super excited about this, but it's so cool. The detections aren't perfect everywhere, but you get the idea. Now I want to do a little bit more: rather than just having the class next to my ear, I want to display the probability as well as the class at the top of the frame. This part is purely optional, but I wanted to do it because it looks way cooler. So if I quit out of this, what we're going to do is add a status box in the top left-hand corner, displaying the probability and the class that we've detected.
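The ear-label rendering, sketched out; image is the webcam frame from the capture loop, and the exact offsets and BGR color here are assumptions you can tune:

```python
import cv2

# Filled background sized to the class name, then the label at the ear.
cv2.rectangle(
    image,
    (coords[0], coords[1] + 5),
    (coords[0] + len(body_language_class) * 20, coords[1] - 30),
    (245, 117, 16), -1,  # BGR color; -1 thickness fills the rectangle
)
cv2.putText(image, body_language_class, coords,
            cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)
```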
Okay, we've now done the first part of our status box. I've written three lines of code, just separated out over multiple lines. First we draw a rectangle positioned in the top left-hand corner, using the cv2.rectangle method: we pass through our image, the top coordinate, the bottom coordinate (you can play around with these), the color of the box we want, and -1 as the line thickness, which fills the box with color. Then we load up some text with two cv2.putText methods: one is a little title that says CLASS, and the other shows the actual body language class. For the class I'm splitting the string and only grabbing the first token, in case we've got really long class names later on. Everything else is really similar to what we did above: the start coordinates where we want the text, the font, the font size, the color, the width of the line, and the line type. Again, play around with these options; I've just found this positioning tends to work. Then I'm going to copy the same lines and do it for our probability as well. All I changed is the title text, PROB in this case (though you could change that), and the value displayed in the second putText, which is now the probability. Let's run this, and all things being equal we should have our final model. It looks like it closed there; if the window pops up and then closes, just rerun that cell. Third time's the charm, and I realize we've also got to grab the maximum probability. It's looking good: happy, sad, victorious, pretty sweet. But the probability isn't working, and I realized just after running it why: we need to extract the maximum probability. So let's quit out and do this with np.argmax. What I've done is use the numpy argmax function on the probability array: it gives us the index of the class with the highest likelihood, so indexing the array with it gives us the highest probability value, the one that maps to the predicted class. (The class prediction already gives us the highest-probability label automatically; for the probabilities we have to extract the value ourselves.) Then I round it to two decimal places, because you'll otherwise get a long value, and convert it to a string so OpenCV can display it. If I run this now, we should get our probability; let's try again, and third time's the charm. Cool, you can see we've got our probability in there now. If I go sad, you get the sad probability; happy, the happy probability; and hands up, the victorious probability. So again, this is pretty sweet: you can do a whole bunch of stuff with this, we're getting all of our detections, and we've effectively decoded body language. You can mess around with it: move your arms, still looking good; happy works; it looks like we might need a little bit of additional training on sad, since sad isn't always getting the highest probability, so we could potentially tune that to get it a little better, but we're still getting sad there. Happy, sad, happy, sad; that's probably enough messing around. There's one last thing I want to show you, but that about wraps up the core build, so if you wanted to stop there you definitely could.
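The status box with the argmax probability extraction, sketched out (positions and colors are again assumptions you can tune):

```python
import cv2
import numpy as np

# Filled status box in the top-left corner.
cv2.rectangle(image, (0, 0), (250, 60), (245, 117, 16), -1)

# Class title and value (first token only, in case of long class names).
cv2.putText(image, 'CLASS', (95, 12),
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA)
cv2.putText(image, body_language_class.split(' ')[0], (90, 40),
            cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)

# Probability title and the highest class probability, rounded for display.
cv2.putText(image, 'PROB', (15, 12),
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA)
cv2.putText(image,
            str(round(body_language_prob[np.argmax(body_language_prob)], 2)),
            (10, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)
```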
So we've now gone and done a ton of stuff: we installed and imported our dependencies, made some detections, captured our landmarks and exported them to CSV for three different classes (happy, sad, and victorious), trained our custom model, and made detections with that model. Now I want to do one last thing and show you how to add an additional class, in case you want to extend this later on; it's actually reasonably straightforward with this pipeline. Say we did happy, sad, and victorious, but what happens if we want to add another class? In this case I'm going to detect the Wakanda Forever sign, so we're going to create a new class called wakanda forever. It's completely optional, and you could add a completely different class if you wanted to; as you might have guessed, I'm a big fan of Black Panther. Under step 2, Capture Landmarks and Export to CSV, we rename our class_name variable to wakanda forever (though you could call it whatever you want) and run that capture cell again, which appends additional landmark rows to our CSV. Let's make sure it's running; the window closed, so let's do it again. Cool, let's do the Wakanda Forever pose and make sure it sees all of our landmarks; it should ideally be detecting that our arms are crossed, although the camera gets in the way a little. Looking good, not too bad. Now we go back through the same process: read in our dataframe again (it looks like we've still got happy at the top, with wakanda forever appended to the end), create our X and y values, import our dependencies, create our pipelines, and train our models. Again, we'll wait a little while for this to train. Alrighty, it looks like our model has finished training, so we take a look at our fit_models and make some predictions, and it looks like we've got a bit of an error: could not convert string to float, 'wakanda forever'. Oh, hold on, this shouldn't actually be like this: it should be X_test. Perfect, that's working, and it looks like we've got wakanda forever in the predictions as well. Now let's take a look at our accuracy. It looks like our best-performing model is now our rc model, so that's the one we're going to export; we can export a different model just by typing 'rc' there. Then we load our model up again, and this time we're bringing in our ridge classifier model. Now, making detections with our detection pipeline should pick up our new class as well. The window closed again, so let's rerun, and it looks like we've got an error: nothing is actually displaying here. Let's close out of this and try again; let's delete (or comment out) our try/except blocks to see what the errors are. First an unexpected indent, so let's fix that, and there it is: RidgeClassifier object has no attribute predict_proba. So rather than using our ridge classifier model, let's export a different one (though see the sketch below for one possible way to keep the ridge classifier).
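As an aside, and this is my own suggestion rather than something from the video: scikit-learn ships a wrapper, CalibratedClassifierCV, that can give a probability-less classifier like RidgeClassifier a predict_proba method by fitting a calibrator on top of it. A hedged sketch of how that might look with this pipeline:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Wrap the ridge classifier so it gains calibrated probability estimates.
rc_proba = make_pipeline(
    StandardScaler(),
    CalibratedClassifierCV(RidgeClassifier()),
)
rc_proba.fit(X_train, y_train)
rc_proba.predict_proba(X_test)  # now available, unlike bare RidgeClassifier
```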
The reason this is throwing up an error is that the ridge classifier model doesn't actually provide probabilities. If we commented out the probability code we'd be good to go, but rather than doing that, let's just export our regular random forest model instead; all I'm doing is changing the export back down to that model. Actually, our gb model looks like it's performing slightly better than our random forest model, but we know the random forest has predict_proba, so let's just use that one: leave it as 'rf' and don't go changing it. Then we load our model (in this case we've got our random forest classifier), uncomment the probability code, and try running it again. Sorry about the police sirens, I don't know if you can hear them, but they're out there. Okay, cool: we've got sad, happy, victorious, and wakanda forever. How cool is that? It looks like it's detecting all of our body language poses: happy, sad, happy, sad; victorious, there we go; and Wakanda Forever. Awesome. We've now fully built our body language decoder, and it's still predicting me as happy. We've done a ton of stuff in this tutorial, so let's quickly go through a recap and wrap it up. What we did is we installed and imported our dependencies; we made some new detections with our pre-existing code from our GitHub repo; we captured our landmarks and exported those to CSV, for happy, sad, and victorious; and we did some additional training to add a new class, which in this case was wakanda forever. Then we trained our custom model, which is pretty beefy. Keep in mind that the ridge classifier model doesn't appear to have a predict_proba function, so if you get hung up on that, hit me up in the comments below and I can help you out; just use the random forest model for now. (Or if anyone's got another solution, hit me up in the comments as well; I believe there's a way to get the probability out of it, and the calibration sketch above is one option, though I'd need to dig into it.) Then we went and made some detections with our model: we loaded it up, did all of the rendering, commented out the CSV export block, and added all of our prediction and detection code. And on that note, that about wraps it up. Thanks so much for tuning in guys, hopefully you enjoyed this video. If you did, be sure to give it a thumbs up, hit subscribe, and tick that bell so you get notified when I release future videos. And if you've got some additional ideas as to what I should be creating videos on next, by all means leave a comment down below; I'd love to hear your thoughts. On that note, thanks again for tuning in. Peace.
Info
Channel: Nicholas Renotte
Views: 82,321
Keywords: ai body language analysis, mediapipe tutorial, mediapipe python tutorial, mediapipe face mesh, mediapipe pose estimation, mediapipe pose detection, mediapipe pose python, pose classification
Id: We1uB79Ci-w
Length: 92min 40sec (5560 seconds)
Published: Tue Apr 20 2021