In today's tutorial we are going to build an emotion classifier: we're going to classify a person's face into one of three emotions, happy, sad, and surprised. Let me show you how it works. This is my webcam, and now: this is happy... sad... and surprised. So now let's get started, and let me show you how to build this
emotion classifier. Let me show you the data we will be using in this tutorial. We're going to use the same dataset we created in one of my previous tutorials, where I showed you how to create a synthetic emotion recognition dataset. We are going to use this data over here; you can see we have four different categories, which are angry, happy, sad, and surprised. Let me show you a few examples of each one of these categories. For example, this is the happy category, and you can see we have many pictures of many people who look very happy: all of them are smiling, all of them are very happy. Now let me show you a few examples of the sad category; you can see we have many people who look very sad. And here are a few examples of the surprised category, where all of them look very surprised.

So this is the data I am going to use in this tutorial, but you can obviously use any other data that you want. If you want to use a different emotion recognition dataset, you can definitely do so; as long as you have many pictures of people showing different facial expressions, you can use any other dataset you want. But if you want to follow along with this tutorial and use exactly the same data I am going to use, then you have two options. You can generate this data yourself: if you follow along with the tutorial I am going to be posting over there, you can just follow all the steps and generate exactly the same dataset yourself. The other option is to download this dataset, which is going to be available on my Patreon for all my Patreon supporters. So you have these two options in case you want to use exactly the same dataset I am going to use today.

Now let's continue with this tutorial, and let me show you all the steps
we are going to take to complete this process. The first step will be to clean the data; this is an amazing dataset we created with Stable Diffusion, but we are going to clean it to make it even better. Then we are going to prepare the data to train this classifier; the next step will be to train the model we are going to use as an emotion recognition model; and finally, the last step will be to test the model, and I'm going to show you how to test it using your webcam. These are the four steps we will take in this process. Now let's get started with the data cleaning.
Let me show you how to clean this data. The first thing we're going to do is remove one of the categories: I have been inspecting all the images in these categories, and I noticed that the angry category looks very similar to the sad category. The images in these two categories look so alike that it would be very hard for the classifier we're going to build today to tell them apart, so we are just going to remove one of the categories, like this. Now we have three categories, which are happy, sad, and surprised.

The next thing we will do to clean this dataset is remove the many images which are not what we are looking for to train this model. Remember, when we created this synthetic dataset we were trying to generate frontal faces of people looking at the camera, and although most of the pictures are like that, there are many others which are not. For example, this one over here: he's not looking at the camera, so this picture is not really the type of picture we are looking for. The same happens with this other picture over here, and there are many other examples of pictures which, although the people look surprised, are not really frontal and are not really looking at the camera.
So these are not the pictures we are looking for. We are also going to remove many pictures which don't show the expression we were trying to generate. For example, we are looking at the surprised category, and this person over here doesn't really look surprised; he looks more like a neutral expression, or maybe angry, based on this part over here. So we are just going to remove all the pictures of faces which don't really show the emotion we are looking for. There are many other examples: this one over here doesn't look surprised either, and neither does this one. In short, we are going to remove all the pictures which are not frontal, which are not looking at the camera, and which do not portray the emotion we were looking for.

So this is what I'm going to do now. I'm just going to select all these pictures: this one doesn't look surprised, this one doesn't look surprised either, this one either, this one is not looking at the camera, and so on. I'm just going to go through all these pictures and remove many of them. Okay, that's going to be enough. You can see that now all the pictures look very surprised, and all of them are very frontal pictures of people looking at the camera, so this is going to be perfect.
Now let's continue and do exactly the same for the sad category. Okay, now all the pictures look sad and they all look frontal, so everything is okay. Let's continue with the happy category... okay, and that's going to be pretty much all for the happy category as well; you can see all the pictures look frontal and all of them look happy.

Now we are going to do something else. You can see we have 194 items in the happy category, 160 images in the sad category, and 96 in the surprised category. We are going to balance this dataset, to make sure all of our categories have the same number of images, so we are going to remove images from the happy category and from the sad category until both of them have 96 items, like the surprised category.
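Trimming by hand in the file manager works, but the same balancing can also be scripted. Here is a minimal sketch, assuming a data/ directory with one sub-folder per emotion; the function name and the random-trimming approach are my own, not from the video:

```python
import os
import random

def balance_dataset(data_dir, seed=42):
    """Trim every category folder down to the size of the smallest one."""
    categories = {
        emotion: os.listdir(os.path.join(data_dir, emotion))
        for emotion in sorted(os.listdir(data_dir))
    }
    target = min(len(files) for files in categories.values())
    random.seed(seed)
    for emotion, files in categories.items():
        # Keep a random subset of `target` images, delete the rest.
        for filename in random.sample(files, len(files) - target):
            os.remove(os.path.join(data_dir, emotion, filename))
    return target
```

Running this on the folders above would leave happy, sad, and surprised with 96 images each.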
This is what we're going to do now. We go to the happy category, and the only thing I'm going to do is select 96 pictures... 54... 78... 97... 96... okay, and then I'm going to invert my selection over here and delete all the remaining pictures. Okay, now we have 96 items in the happy category. I'm going to do the same with the sad category, something like... this... 94... 96... then Edit, Invert Selection, and delete all these pictures. Okay, now we have created a balanced dataset, and this is exactly what we need to train this classifier. So now it's time to go to PyCharm, because now it's time to get started with all the coding of this tutorial. Now we are
going to move on to the second step in this process, which is preparing the data to train this model, and this is how we're going to do it. Remember, we are going to train this model using scikit-learn; we are going to detect the face landmarks, and we are going to train the model on those landmarks. So let me show you a function we are going to use which is going to be very important in this tutorial; it's this one over here, called get_face_landmarks. We input an image, and the output is all of that image's face landmarks. This is going to make things much easier for this process. This is meant to be a very quick tutorial, so to keep things simple we are just going to use this function to extract all the landmarks from our faces. So let me show you what we are going to do now.
I'm going to create a new file, which is... prepare data... and this is the one we're going to use to prepare the data. Remember to install this project's requirements, which are scikit-learn, mediapipe, OpenCV, and numpy; please remember to do so before you get started, otherwise nothing is going to work, obviously. Now, the first thing we're going to do is import the function we have over here: from utils import... get_face_landmarks. Okay, then we are going to import os, and we're also going to import OpenCV.
And then we are going to iterate over all the images in these directories and extract the landmarks of all the faces we have over here; by doing so, we are going to prepare this data to train the model. The data directory is this one over here, data. Remember, this is prepare_data.py, the script in which we are currently working, and this is the directory with all the data. Now let's get back to PyCharm, and the only thing I'm going to do is: for emotion in os.listdir(data_directory), and then for image_path in os.listdir(os.path.join(data_directory, emotion)). We create a new variable, image_path, which is os.path.join(data_directory, emotion, image_path). Okay, and then the only thing we're going to do is call this function over here, get_face_landmarks; this is going to be something like face_landmarks = get_face_landmarks(image). We're going to input the image, and obviously we need to create image first, so I'm going to say something like image = cv2.imread(image_path). Okay, then we input image like this. So we're just going to use this
function to extract all the landmarks. But let me show you how it works. You can see that we input an image over here; the first thing we do is convert this image from BGR to RGB, and we create this new object over here. Then we create a new variable, face_mesh; we are using mediapipe, and we basically just use this object to process the image, and then we extract all the landmarks from the result. So you can see it's a very straightforward process, but to save us some time we are just using this function over here. Something very important: you can see that for every single landmark we extract three values, the X, Y, and Z coordinates. And something else which is very important: we get the face landmarks, but for every single coordinate we subtract the minimum value of that coordinate, because this is going to help us normalize the data.
This is a way of normalizing the data: we make sure we extract the landmarks without caring where the face is located in the image. You can see that we have all of these faces over here, and all of them are in different positions: this face over here is way up, in the upper part of the image; this other face is much more central; this other one is on the right. We have faces in all different positions, so to normalize all this data, we subtract the minimum value for each one of the coordinates, and this is going to help us normalize the data and train a much more robust model.
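To see why subtracting the per-coordinate minimum cancels out position, here is a tiny plain-numpy illustration, with made-up coordinates standing in for real landmarks:

```python
import numpy as np

# Two copies of the same made-up "face": three landmarks, (x, y, z) each,
# the second one shifted 100 units right and 50 units down in the frame.
face_a = np.array([[10.0, 20.0, 1.0], [30.0, 25.0, 1.5], [20.0, 40.0, 0.5]])
face_b = face_a + np.array([100.0, 50.0, 0.0])

def normalize(landmarks):
    # Subtract the minimum of each coordinate axis, as described above.
    return landmarks - landmarks.min(axis=0)

print(np.allclose(normalize(face_a), normalize(face_b)))  # the shift cancels out
```

Two identical faces at different positions in the frame produce exactly the same feature values, which is what makes the model robust to where the face appears.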
Let's continue. Now that we have extracted all the landmarks, we can just continue with this process. Maybe something else I can show you about this function is this part over here, which takes care of all the drawing. We are not going to do any drawing in this part, the data preparation; we're going to do the drawing later on. We are going to use exactly the same function to draw the face landmarks in the last part of this process, when it's time to test the model. So let's continue. To make sure everything's okay, I'm just going to execute this script... and everything seems to be fine. Now I'm going to print the length of this array, and let's see what happens... and you can see it
says this value over here, which is 1404. That's how much data we have in this list, which is the one we are going to use to train this model. This is a lot of data, a lot of values to train this classifier with. We could do it with much fewer values, because if you think about this problem, most of the information is going to be in the landmarks of the lips and the eyes, and that's pretty much it; we don't really need all the landmarks of the face. If I'm not mistaken, mediapipe extracts something like 468 landmarks, and with three values each that's 468 × 3 = 1404. We don't really need so many landmarks for this problem, but to keep things simple, we're just going to use all this data.

Now let's do something to make sure we are always working with all this data, because there could be situations in which we do not detect any face whatsoever, and this would be an empty list; and there could potentially be situations in which we detect a smaller number of landmarks. We don't know what's going to happen, so to make sure we are working with all the data we need, we are going to say something like this:
if we have 1404 values, then we continue. Then we are going to create a new list, output, and we are going to... output.append(face_landmarks). Okay. Then obviously we also need the label, because otherwise we will not be able to train this model, so I'm going to say something like this: for emotion_idx, emotion in enumerate(...). We are going to do exactly the same, but now, besides the emotion name (happy, sad, or surprised), we also have an index, an integer value for each one of these emotions. So we are going to append another element to this list over here, which is going to be the emotion; the emotion index, actually, is going to be this value over here, and that's pretty much all. So the last value we have in this list is the emotion, and then we just append this list to another variable, output, which is the one we are going to use to train this model. Just to make sure this is an integer, I'm going to do something like this: I'm going to cast this value to int. And that's pretty much all; now the only thing I'm going to do is call np.savetxt. I'm going to name the file something like data.txt, and then input this variable over here, output, but converted into a numpy array first. And I obviously need to import numpy, otherwise this is not going to work: import numpy as np. Okay, so this is pretty much all, if I'm not mistaken.
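Putting those pieces together, the shape of what gets written to data.txt can be checked with random stand-in values in place of real landmarks; everything below is illustrative, with np.random.rand playing the role of get_face_landmarks on each image:

```python
import numpy as np

emotions = ['happy', 'sad', 'surprised']  # one integer label per folder
output = []
for emotion_idx, emotion in enumerate(emotions):
    for _ in range(4):  # stand-in for looping over the images in each folder
        face_landmarks = list(np.random.rand(1404))  # stand-in for get_face_landmarks(image)
        if len(face_landmarks) == 1404:  # skip images with no (or partial) detection
            face_landmarks.append(int(emotion_idx))  # the label goes last
            output.append(face_landmarks)

np.savetxt('data.txt', np.asarray(output))

# Round-trip check: 12 rows of 1404 features plus one label column.
data = np.loadtxt('data.txt')
print(data.shape)                       # (12, 1405)
print(sorted(set(data[:, -1])))         # [0.0, 1.0, 2.0]
```

The length check is what guards against frames where no face (or only a partial face) was detected, so every saved row has exactly 1404 features followed by the label.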
Now I am just going to run this script, and that's pretty much all it takes to create all this data. Something I'm going to do first is quit my browser, because I have noticed that every time I execute this script it takes a lot of memory, a lot; closing the browser makes the run go much better. So I'm just going to press play... okay, that's pretty much all, and now if I go to my directory over here, you can see I have a new file called data.txt, and this is exactly the file we need to train the model.

Now I'm going to create a new file called train_model, and this is the file we are going to use to train this model. Remember, we are going to use scikit-learn to train this classifier. We could do all the coding ourselves, but this is a very good problem on which to use ChatGPT, to just ask it to provide all the code we need, and I'm going to show you how to use ChatGPT for a problem like this. Remember, programming is changing, it's evolving, so it's very important to get familiar with the tools which are going to make our lives way easier, and this is a very good situation in which to use ChatGPT. So I'm just going to ask ChatGPT something like this: give me Python
code to train a model using a scikit-learn random forest classifier; the data needs to be loaded from a txt file called data.txt, each row is a different data point, and the label is the last value. Let's see what happens... okay, and this is what we got. Obviously I have already prepared this tutorial, so I know how this should look, and I think this is perfect, exactly what we need, so I'm just going to copy and paste it over here. You can see that we are importing numpy and scikit-learn; the file is called data.txt and we are loading it; the label is the last value, and all the features are all the other values, so everything is okay. Then we are splitting the data... let me do something first, which is turning off these hints.

I'm going to do something else, which is stratifying: stratify according to the labels, which are y. This is very important, because we want to make sure the training set and the test set have the same proportion of objects from each of our categories. And also, obviously, shuffle=True. That's pretty much all. Then let's continue: I'm going to create this random forest classifier without any parameters, just all the defaults, but other than that, everything is exactly what we need. So this is a very important tool; remember, programming is changing, it's evolving, so you need to be familiar with how to use ChatGPT to make your life way easier. I'm just going to execute this script as it is and see what happens. I'm going to close the browser again, so we make sure we can execute this file
without any issues. Actually, this training process is not really that memory-consuming; we are not consuming that much memory or that many resources, so we didn't really need to close the browser in this case, but it doesn't matter. Let's continue. You can see that we're getting an accuracy of 77%. Let's do something else: we are going to import confusion_matrix, and we are going to print the confusion matrix as well, so we have even more information about what's going on... confusion_matrix(y_test, y_pred). Okay, so these are the results, and you can see that we are training a very powerful classifier: for each one of our categories we have an almost perfect result. This one over here is where we have the best result, but nevertheless,
in the two other categories we also have very good results. Now that I think about it, I'm not really sure in which order this model goes through the categories. I think it's alphabetical order, but I'm not completely sure, so just to make sure, I am going to do exactly the same, but prepare the data again... sorting this list over here, so we make sure all the emotions are sorted alphabetically. Now I'm going to prepare the data again... and then train the model again, and let's see what happens now. Okay, now we have very similar values, so yes: this category over here is happy, this one is sad, and this one is surprised.
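The reason the sorting matters: os.listdir returns folder names in arbitrary order, so wrapping it in sorted() pins the label indices. With these three folder names, alphabetical order gives happy = 0, sad = 1, surprised = 2:

```python
# os.listdir gives an arbitrary order; sorted() makes the label indices deterministic.
folders = ['surprised', 'happy', 'sad']  # e.g. what os.listdir('data') might return
labels = {emotion: idx for idx, emotion in enumerate(sorted(folders))}
print(labels)  # {'happy': 0, 'sad': 1, 'surprised': 2}
```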
So you can see that for all of our categories we have very good results, especially the happy category, which is the one with the best classification accuracy; but nevertheless, the other two categories have very good accuracy as well, and overall we have 77% accuracy. This is a very good and very powerful model, and that's pretty much all it takes to train this classifier; you can see how easy it was once we used ChatGPT.

Maybe I can explain super quickly what exactly we are doing in all these instructions. Over here we are loading the data; here we are just splitting the data into the features and the labels; here we are calling train_test_split to produce the training set and the test set. Then we are creating a classifier; we're going to use a random forest classifier, and if you ask me why a random forest and not any other classifier: just because, there's no particular reason. This is a classifier I really like a lot because it performs very well, and that's the only reason we are using it. Then we are calling the fit method of this classifier, and the input is the training data, that is, the training features and the training labels. Then we are just calling the predict method on the test data, and we are comparing the predictions against our ground truth, which is y_test; and we are doing the same over here with the confusion matrix. That's a very quick explanation of how this script works.
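For reference, the whole train_model.py flow can be sketched end to end. The block below substitutes a small synthetic array for the real landmarks file (three separable classes of 1404 fake "landmark" values each) so it runs standalone; with the real data.txt you would keep only the np.loadtxt line onward:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data.txt: 300 rows of 1404 feature values + 1 label column.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 100)                       # happy=0, sad=1, surprised=2
features = rng.normal(size=(300, 1404)) + labels[:, None]
np.savetxt('data.txt', np.hstack([features, labels[:, None]]))

data = np.loadtxt('data.txt')
X, y = data[:, :-1], data[:, -1]

# Stratified, shuffled split so train and test keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=y, random_state=42)

clf = RandomForestClassifier()           # all default parameters
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))  # rows: true class, columns: predicted class
```

On the synthetic data the classes are easily separable, so the accuracy comes out near 1.0; the 77% figure above is what the real landmark data produced.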
Now let's continue with the next step in this process, which is testing the model; this is where we are going to use the webcam. I'm going to create a new file... test model... and we are going to do something like this. We could use ChatGPT as well, but I'm just going to do it myself; it's not really that much code. So I'm going to import cv2, and I'm going to create a new object: cap = cv2.VideoCapture(2). I'm going to use my webcam with index number 2; in your case, this depends on how many webcams you have connected to your computer. If you have only one, you have to say the number 0, but if you have more than one, you just have to select the one you want to use in this project. Then I'm going to say something like ret, frame = cap.read(), and then while ret: I'm going to copy this instruction. We're just going to read frames from the webcam, one after the other, and visualize them: cv2.imshow('frame', frame). Then we are going to wait something like 25 milliseconds after we show each one of our frames, which is going to make the visualization look like real time. And at the end, we just call cap.release() and then cv2.destroyAllWindows().
Then we are going to use this classifier... something we didn't do over here is save the classifier, so I'm going to import pickle, and then pickle.dump... we need to say something like: with open('model', 'wb') as f, and the object we dump is this one over here, the classifier. It's 'w' because we're going to write a file, and 'wb' because we're going to write this file as bytes. That's pretty much all, so I'm going to train the model again... okay, now we have saved the model over here; this is the model we have trained. Now the only thing we need to do is load this model, so in the test script I'm going to import pickle... and then with open, inputting the model path, which is this one, with 'rb', as f... and then model = pickle.load(f). Okay, and that's pretty much all.
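The save-and-load pair can be verified in isolation; here a tiny stand-in classifier on toy data plays the role of the random forest (the filename 'model' matches the video, but is otherwise arbitrary):

```python
import pickle
from sklearn.ensemble import RandomForestClassifier

# Train a tiny stand-in model on toy data.
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Save it, as done at the end of train_model.py...
with open('model', 'wb') as f:
    pickle.dump(clf, f)

# ...and load it back, as done at the top of test_model.py.
with open('model', 'rb') as f:
    model = pickle.load(f)

print((model.predict(X) == clf.predict(X)).all())  # the reloaded model predicts identically
```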
We need to extract the landmarks next, so I'm going to import... from utils import get_face_landmarks, and then we are going to say something like get_face_landmarks(frame). We also need to adjust another argument, static_image_mode; we're going to set it to False, because now this is going to be a video. We are not going to draw anything for now, so draw is going to be False as well; later on we are going to set it to True, but for now let's just leave it like this. Okay, these are going to be the face landmarks. And now we're just going to use this model: model.predict([face_landmarks]), and this is the output. Let's print output just to make sure everything is okay; we are also going to plot this value onto our webcam frames, but let's take it one step at a time. So let's see what happens; I'm going to press play... okay, we are getting some predictions. Now we are predicting sad, which is index number 1; if I smile... now we are predicting 0, which is happy. So everything seems to be okay.
Now we are going to print the emotion, the category we are predicting; we are going to print this value onto the frame, and to do so we are going to use cv2.putText. This inputs the frame and the text we are going to print, which is going to be something like this: emotions = ['happy', 'sad', 'surprised']. Then the position: the X position is going to be something like 10, and the Y position will be something like frame.shape[0] - 1, because we want to print this text at the bottom of the frame. Then we need to input the other values: one of them is the font we are going to use, so I'm just going to copy and paste this value over here; then the text size, which I'm going to set to 3; the color, which is going to be green; and then the thickness, which I'm going to set to 5... something like this... oh, I made a mistake, this actually goes over here. Okay, now everything should be okay, and we need to input the emotion, which is something like emotions[output[0]]... and I am going to convert this value into an integer. Okay, now let's see if it works: sad... happy... and surprised. So everything is working just fine.
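One detail worth noting: this emotions list must be in the same alphabetical order as the labels created in prepare_data.py, so that index 0 maps back to happy, 1 to sad, and 2 to surprised. With a stand-in value in place of a real prediction:

```python
emotions = ['happy', 'sad', 'surprised']  # same alphabetical order as the training labels

output = [1.0]  # stand-in for model.predict(...); the labels come back as floats here
print(emotions[int(output[0])])  # sad
```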
Now let's do something else, which is plotting all the landmarks, and this is going to be very simple: the only thing we need to do is set draw to True. Okay, let's see... and you can see that now we're also visualizing all the landmarks on my face. So let's try again: this is happy... sad... and surprised. So this is going to be all for this tutorial. My name is Felipe, I'm a computer vision engineer, and this is exactly how you can build an emotion classifier using Python, scikit-learn, and mediapipe. This is going to be all for today, and see you in my next video.