In today's tutorial we are going to build an emotion classifier: we're going to classify a person's face into one of three emotions, happy, sad, and surprised. Let me show you how it works. This is my webcam, and now: this is happy... sad... and surprised. So now let's get started, and let me show you how to build this
emotion classifier. Let me show you the data we will be using in this tutorial. We're going to use the same dataset we created in one of my previous tutorials, where I showed you how to create a synthetic emotion recognition dataset. We are going to use this data over here; you can see we have four different categories, which are angry, happy, sad, and surprised. Let me show you a few examples of each one of these categories. For example, this is the happy category, and you can see we have many pictures of many people who look very happy: all of them are smiling, all of them are very happy. Now let me show you a few examples of the sad category; you can see we have many people who look very sad. And here are a few examples of the surprised category, where all of them look very surprised.

So this is the data I am going to use in this tutorial, but you can obviously use any other data that you want. If you want to use a different emotion recognition dataset, you can definitely do so; as long as you have many pictures of people showing different facial expressions, you can use any other dataset you want. But if you want to follow along with this tutorial and use exactly the same data I am going to use, then you have two options. You can generate this data yourself: if you follow along with the tutorial I am going to be posting over there, you can just follow all the steps and generate exactly the same dataset yourself. The other option is to download this dataset, which is going to be available on my Patreon for all my Patreon supporters. So you have these two options in case you want to use exactly the same dataset I am going to use today.

Now let's continue with this tutorial, and let me show you all the steps
we are going to take to complete this process. The first step will be to clean the data; this is an amazing dataset we created with Stable Diffusion, but we are going to clean it to make it even better. Then we are going to prepare the data to train this classifier; the next step will be to train the model we are going to use as an emotion recognition model; and finally, the last step will be to test the model, and I'm going to show you how to test it using your webcam. These are the four steps we will take in this process. Now let's get started with the data cleaning.
Let me show you how to clean this data. The first thing we're going to do is remove one of the categories: I have been inspecting all the images in these categories, and I noticed that the angry category looks very similar to the sad category. The images in these two categories look so alike that it would be very hard for the classifier we're going to build today to tell them apart, so we are just going to remove one of the categories, like this. Now we have three categories, which are happy, sad, and surprised.

The next thing we will do to clean this dataset is remove the many images which are not what we are looking for to train this model. Remember, when we created this synthetic dataset we were trying to generate frontal faces of people looking at the camera, and although most of the pictures are like that, there are many others which are not. For example, this one over here: he's not looking at the camera, so this picture is not really the type of picture we are looking for. The same happens with this other picture over here, and there are many other examples of pictures which, although the people look surprised, are not really frontal and are not really looking at the camera.
So these are not the pictures we are looking for. We are also going to remove many pictures which don't show the expression we were trying to generate. For example, we are looking at the surprised category, and this person over here doesn't really look surprised; he looks more like a neutral expression, or maybe angry, based on this part over here. So we are just going to remove all the pictures of faces which don't really show the emotion we are looking for. There are many other examples: this one over here doesn't look surprised either, and neither does this one. In short, we are going to remove all the pictures which are not frontal, which are not looking at the camera, and which do not portray the emotion we were looking for.

So this is what I'm going to do now. I'm just going to select all these pictures: this one doesn't look surprised, this one doesn't look surprised either, this one either, this one is not looking at the camera, and so on. I'm just going to go through all these pictures and remove many of them. Okay, that's going to be enough. You can see that now all the pictures look very surprised, and all of them are very frontal pictures of people looking at the camera, so this is going to be perfect.
Now let's continue and do exactly the same for the sad category. Okay, now all the pictures look sad and they all look frontal, so everything is okay. Let's continue with the happy category... okay, and that's going to be pretty much all for the happy category as well; you can see all the pictures look frontal and all of them look happy.

Now we are going to do something else. You can see we have 194 items in the happy category, 160 images in the sad category, and 96 in the surprised category. We are going to balance this dataset, to make sure all of our categories have the same number of images, so we are going to remove images from the happy category and from the sad category until both of them have 96 items, like the surprised category.
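Trimming by hand in the file manager works, but the same balancing can also be scripted. Here is a minimal sketch, assuming a data/ directory with one sub-folder per emotion; the function name and the random-trimming approach are my own, not from the video:

```python
import os
import random

def balance_dataset(data_dir, seed=42):
    """Trim every category folder down to the size of the smallest one."""
    categories = {
        emotion: os.listdir(os.path.join(data_dir, emotion))
        for emotion in sorted(os.listdir(data_dir))
    }
    target = min(len(files) for files in categories.values())
    random.seed(seed)
    for emotion, files in categories.items():
        # Keep a random subset of `target` images, delete the rest.
        for filename in random.sample(files, len(files) - target):
            os.remove(os.path.join(data_dir, emotion, filename))
    return target
```

Running this on the folders above would leave happy, sad, and surprised with 96 images each.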
This is what we're going to do now. We go to the happy category, and the only thing I'm going to do is select 96 pictures... 54... 78... 97... 96... okay, and then I'm going to invert my selection over here and delete all the remaining pictures. Okay, now we have 96 items in the happy category. I'm going to do the same with the sad category, something like... this... 94... 96... then Edit, Invert Selection, and delete all these pictures. Okay, now we have created a balanced dataset, and this is exactly what we need to train this classifier. So now it's time to go to PyCharm, because now it's time to get started with all the coding of this tutorial. Now we are
going to move on to the second step in this process, which is preparing the data to train this model, and this is how we're going to do it. Remember, we are going to train this model using scikit-learn; we are going to detect the face landmarks, and we are going to train the model on those landmarks. So let me show you a function we are going to use which is going to be very important in this tutorial; it's this one over here, called get_face_landmarks. We input an image, and the output is all of that image's face landmarks. This is going to make things much easier for this process. This is meant to be a very quick tutorial, so to keep things simple we are just going to use this function to extract all the landmarks from our faces. So let me show you what we are going to do now.
I'm going to create a new file, which is... prepare data... and this is the one we're going to use to prepare the data. Remember to install this project's requirements, which are scikit-learn, mediapipe, OpenCV, and numpy; please remember to do so before you get started, otherwise nothing is going to work, obviously. Now, the first thing we're going to do is import the function we have over here: from utils import... get_face_landmarks. Okay, then we are going to import os, and we're also going to import OpenCV.
And then we are going to iterate over all the images in these directories and extract the landmarks of all the faces we have over here; by doing so, we are going to prepare this data to train the model. The data directory is this one over here, data. Remember, this is prepare_data.py, the script in which we are currently working, and this is the directory with all the data. Now let's get back to PyCharm, and the only thing I'm going to do is: for emotion in os.listdir(data_directory), and then for image_path in os.listdir(os.path.join(data_directory, emotion)). We create a new variable, image_path, which is os.path.join(data_directory, emotion, image_path). Okay, and then the only thing we're going to do is call this function over here, get_face_landmarks; this is going to be something like face_landmarks = get_face_landmarks(image). We're going to input the image, and obviously we need to create image first, so I'm going to say something like image = cv2.imread(image_path). Okay, then we input image like this. So we're just going to use this
function to extract all the landmarks. But let me show you how it works. You can see that we input an image over here; the first thing we do is convert this image from BGR to RGB, and we create this new object over here. Then we create a new variable, face_mesh; we are using mediapipe, and we basically just use this object to process the image, and then we extract all the landmarks from the result. So you can see it's a very straightforward process, but to save us some time we are just using this function over here. Something very important: you can see that for every single landmark we extract three values, the X, Y, and Z coordinates. And something else which is very important: we get the face landmarks, but for every single coordinate we subtract the minimum value of that coordinate, because this is going to help us normalize the data.
This is a way of normalizing the data: we make sure we extract the landmarks without caring where the face is located in the image. You can see that we have all of these faces over here, and all of them are in different positions: this face over here is way up, in the upper part of the image; this other face is much more central; this other one is on the right. We have faces in all different positions, so to normalize all this data, we subtract the minimum value for each one of the coordinates, and this is going to help us normalize the data and train a much more robust model.
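To see why subtracting the per-coordinate minimum cancels out position, here is a tiny plain-numpy illustration, with made-up coordinates standing in for real landmarks:

```python
import numpy as np

# Two copies of the same made-up "face": three landmarks, (x, y, z) each,
# the second one shifted 100 units right and 50 units down in the frame.
face_a = np.array([[10.0, 20.0, 1.0], [30.0, 25.0, 1.5], [20.0, 40.0, 0.5]])
face_b = face_a + np.array([100.0, 50.0, 0.0])

def normalize(landmarks):
    # Subtract the minimum of each coordinate axis, as described above.
    return landmarks - landmarks.min(axis=0)

print(np.allclose(normalize(face_a), normalize(face_b)))  # the shift cancels out
```

Two identical faces at different positions in the frame produce exactly the same feature values, which is what makes the model robust to where the face appears.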
Let's continue. Now that we have extracted all the landmarks, we can just continue with this process. Maybe something else I can show you about this function is this part over here, which takes care of all the drawing. We are not going to do any drawing in this part, the data preparation; we're going to do the drawing later on. We are going to use exactly the same function to draw the face landmarks in the last part of this process, when it's time to test the model. So let's continue. To make sure everything's okay, I'm just going to execute this script... and everything seems to be fine. Now I'm going to print the length of this array, and let's see what happens... and you can see it
says this value over here, which is 1404. That's how much data we have in this list, which is the one we are going to use to train this model. This is a lot of data, a lot of values to train this classifier with. We could do it with much fewer values, because if you think about this problem, most of the information is going to be in the landmarks of the lips and the eyes, and that's pretty much it; we don't really need all the landmarks of the face. If I'm not mistaken, mediapipe extracts something like 468 landmarks, and with three values each that's 468 × 3 = 1404. We don't really need so many landmarks for this problem, but to keep things simple, we're just going to use all this data.

Now let's do something to make sure we are always working with all this data, because there could be situations in which we do not detect any face whatsoever, and this would be an empty list; and there could potentially be situations in which we detect a smaller number of landmarks. We don't know what's going to happen, so to make sure we are working with all the data we need, we are going to say something like this:
if we have 1404 values, then we continue. Then we are going to create a new list, output, and we are going to... output.append(face_landmarks). Okay. Then obviously we also need the label, because otherwise we will not be able to train this model, so I'm going to say something like this: for emotion_idx, emotion in enumerate(...). We are going to do exactly the same, but now, besides the emotion name (happy, sad, or surprised), we also have an index, an integer value for each one of these emotions. So we are going to append another element to this list over here, which is going to be the emotion; the emotion index, actually, is going to be this value over here, and that's pretty much all. So the last value we have in this list is the emotion, and then we just append this list to another variable, output, which is the one we are going to use to train this model. Just to make sure this is an integer, I'm going to do something like this: I'm going to cast this value to int. And that's pretty much all; now the only thing I'm going to do is call np.savetxt. I'm going to name the file something like data.txt, and then input this variable over here, output, but converted into a numpy array first. And I obviously need to import numpy, otherwise this is not going to work: import numpy as np. Okay, so this is pretty much all, if I'm not mistaken.
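Putting those pieces together, the shape of what gets written to data.txt can be checked with random stand-in values in place of real landmarks; everything below is illustrative, with np.random.rand playing the role of get_face_landmarks on each image:

```python
import numpy as np

emotions = ['happy', 'sad', 'surprised']  # one integer label per folder
output = []
for emotion_idx, emotion in enumerate(emotions):
    for _ in range(4):  # stand-in for looping over the images in each folder
        face_landmarks = list(np.random.rand(1404))  # stand-in for get_face_landmarks(image)
        if len(face_landmarks) == 1404:  # skip images with no (or partial) detection
            face_landmarks.append(int(emotion_idx))  # the label goes last
            output.append(face_landmarks)

np.savetxt('data.txt', np.asarray(output))

# Round-trip check: 12 rows of 1404 features plus one label column.
data = np.loadtxt('data.txt')
print(data.shape)                       # (12, 1405)
print(sorted(set(data[:, -1])))         # [0.0, 1.0, 2.0]
```

The length check is what guards against frames where no face (or only a partial face) was detected, so every saved row has exactly 1404 features followed by the label.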
Now I am just going to run this script, and that's pretty much all it takes to create all this data. Something I'm going to do first is quit my browser, because I have noticed that every time I execute this script it takes a lot of memory, a lot; closing the browser makes the run go much better. So I'm just going to press play... okay, that's pretty much all, and now if I go to my directory over here, you can see I have a new file called data.txt, and this is exactly the file we need to train the model.

Now I'm going to create a new file called train_model, and this is the file we are going to use to train this model. Remember, we are going to use scikit-learn to train this classifier. We could do all the coding ourselves, but this is a very good problem on which to use ChatGPT, to just ask it to provide all the code we need, and I'm going to show you how to use ChatGPT for a problem like this. Remember, programming is changing, it's evolving, so it's very important to get familiar with the tools which are going to make our lives way easier, and this is a very good situation in which to use ChatGPT. So I'm just going to ask ChatGPT something like this: give me Python
code to train a model using a scikit-learn random forest classifier; the data needs to be loaded from a txt file called data.txt, each row is a different data point, and the label is the last value. Let's see what happens... okay, and this is what we got. Obviously I have already prepared this tutorial, so I know how this should look, and I think this is perfect, exactly what we need, so I'm just going to copy and paste it over here. You can see that we are importing numpy and scikit-learn; the file is called data.txt and we are loading it; the label is the last value, and all the features are all the other values, so everything is okay. Then we are splitting the data... let me do something first, which is turning off these hints.

I'm going to do something else, which is stratifying: stratify according to the labels, which are y. This is very important, because we want to make sure the training set and the test set have the same proportion of objects from each of our categories. And also, obviously, shuffle=True. That's pretty much all. Then let's continue: I'm going to create this random forest classifier without any parameters, just all the defaults, but other than that, everything is exactly what we need. So this is a very important tool; remember, programming is changing, it's evolving, so you need to be familiar with how to use ChatGPT to make your life way easier. I'm just going to execute this script as it is and see what happens. I'm going to close the browser again, so we make sure we can execute this file
without any issues. Actually, this training process is not really that memory-consuming; we are not consuming that much memory or that many resources, so we didn't really need to close the browser in this case, but it doesn't matter. Let's continue. You can see that we're getting an accuracy of 77%. Let's do something else: we are going to import confusion_matrix, and we are going to print the confusion matrix as well, so we have even more information about what's going on... confusion_matrix(y_test, y_pred). Okay, so these are the results, and you can see that we are training a very powerful classifier: for each one of our categories we have an almost perfect result. This one over here is where we have the best result, but nevertheless,
in the two other categories we also have very good results. Now that I think about it, I'm not really sure in which order this model goes through the categories. I think it's alphabetical order, but I'm not completely sure, so just to make sure, I am going to do exactly the same, but prepare the data again... sorting this list over here, so we make sure all the emotions are sorted alphabetically. Now I'm going to prepare the data again... and then train the model again, and let's see what happens now. Okay, now we have very similar values, so yes: this category over here is happy, this one is sad, and this one is surprised.
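The reason the sorting matters: os.listdir returns folder names in arbitrary order, so wrapping it in sorted() pins the label indices. With these three folder names, alphabetical order gives happy = 0, sad = 1, surprised = 2:

```python
# os.listdir gives an arbitrary order; sorted() makes the label indices deterministic.
folders = ['surprised', 'happy', 'sad']  # e.g. what os.listdir('data') might return
labels = {emotion: idx for idx, emotion in enumerate(sorted(folders))}
print(labels)  # {'happy': 0, 'sad': 1, 'surprised': 2}
```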
So you can see that for all of our categories we have very good results, especially the happy category, which is the one with the best classification accuracy; but nevertheless, the other two categories have very good accuracy as well, and overall we have 77% accuracy. This is a very good and very powerful model, and that's pretty much all it takes to train this classifier; you can see how easy it was once we used ChatGPT.

Maybe I can explain super quickly what exactly we are doing in all these instructions. Over here we are loading the data; here we are just splitting the data into the features and the labels; here we are calling train_test_split to produce the training set and the test set. Then we are creating a classifier; we're going to use a random forest classifier, and if you ask me why a random forest and not any other classifier: just because, there's no particular reason. This is a classifier I really like a lot because it performs very well, and that's the only reason we are using it. Then we are calling the fit method of this classifier, and the input is the training data, that is, the training features and the training labels. Then we are just calling the predict method on the test data, and we are comparing the predictions against our ground truth, which is y_test; and we are doing the same over here with the confusion matrix. That's a very quick explanation of how this script works.
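For reference, the whole train_model.py flow can be sketched end to end. The block below substitutes a small synthetic array for the real landmarks file (three separable classes of 1404 fake "landmark" values each) so it runs standalone; with the real data.txt you would keep only the np.loadtxt line onward:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data.txt: 300 rows of 1404 feature values + 1 label column.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 100)                       # happy=0, sad=1, surprised=2
features = rng.normal(size=(300, 1404)) + labels[:, None]
np.savetxt('data.txt', np.hstack([features, labels[:, None]]))

data = np.loadtxt('data.txt')
X, y = data[:, :-1], data[:, -1]

# Stratified, shuffled split so train and test keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=y, random_state=42)

clf = RandomForestClassifier()           # all default parameters
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))  # rows: true class, columns: predicted class
```

On the synthetic data the classes are easily separable, so the accuracy comes out near 1.0; the 77% figure above is what the real landmark data produced.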
Now let's continue with the next step in this process, which is testing the model; this is where we are going to use the webcam. I'm going to create a new file... test model... and we are going to do something like this. We could use ChatGPT as well, but I'm just going to do it myself; it's not really that much code. So I'm going to import cv2, and I'm going to create a new object: cap = cv2.VideoCapture(2). I'm going to use my webcam with index number 2; in your case, this depends on how many webcams you have connected to your computer. If you have only one, you have to say the number 0, but if you have more than one, you just have to select the one you want to use in this project. Then I'm going to say something like ret, frame = cap.read(), and then while ret: I'm going to copy this instruction. We're just going to read frames from the webcam, one after the other, and visualize them: cv2.imshow('frame', frame). Then we are going to wait something like 25 milliseconds after we show each one of our frames, which is going to make the visualization look like real time. And at the end, we just call cap.release() and then cv2.destroyAllWindows().
Then we are going to use this classifier... something we didn't do over here is save the classifier, so I'm going to import pickle, and then pickle.dump... we need to say something like: with open('model', 'wb') as f, and the object we dump is this one over here, the classifier. It's 'w' because we're going to write a file, and 'wb' because we're going to write this file as bytes. That's pretty much all, so I'm going to train the model again... okay, now we have saved the model over here; this is the model we have trained. Now the only thing we need to do is load this model, so in the test script I'm going to import pickle... and then with open, inputting the model path, which is this one, with 'rb', as f... and then model = pickle.load(f). Okay, and that's pretty much all.
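The save-and-load pair can be verified in isolation; here a tiny stand-in classifier on toy data plays the role of the random forest (the filename 'model' matches the video, but is otherwise arbitrary):

```python
import pickle
from sklearn.ensemble import RandomForestClassifier

# Train a tiny stand-in model on toy data.
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Save it, as done at the end of train_model.py...
with open('model', 'wb') as f:
    pickle.dump(clf, f)

# ...and load it back, as done at the top of test_model.py.
with open('model', 'rb') as f:
    model = pickle.load(f)

print((model.predict(X) == clf.predict(X)).all())  # the reloaded model predicts identically
```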
We need to extract the landmarks next, so I'm going to import... from utils import get_face_landmarks, and then we are going to say something like get_face_landmarks(frame). We also need to adjust another argument, static_image_mode; we're going to set it to False, because now this is going to be a video. We are not going to draw anything for now, so draw is going to be False as well; later on we are going to set it to True, but for now let's just leave it like this. Okay, these are going to be the face landmarks. And now we're just going to use this model: model.predict([face_landmarks]), and this is the output. Let's print output just to make sure everything is okay; we are also going to plot this value onto our webcam frames, but let's take it one step at a time. So let's see what happens; I'm going to press play... okay, we are getting some predictions. Now we are predicting sad, which is index number 1; if I smile... now we are predicting 0, which is happy. So everything seems to be okay.
Now we are going to print the emotion, the category we are predicting; we are going to print this value onto the frame, and to do so we are going to use cv2.putText. This inputs the frame and the text we are going to print, which is going to be something like this: emotions = ['happy', 'sad', 'surprised']. Then the position: the X position is going to be something like 10, and the Y position will be something like frame.shape[0] - 1, because we want to print this text at the bottom of the frame. Then we need to input the other values: one of them is the font we are going to use, so I'm just going to copy and paste this value over here; then the text size, which I'm going to set to 3; the color, which is going to be green; and then the thickness, which I'm going to set to 5... something like this... oh, I made a mistake, this actually goes over here. Okay, now everything should be okay, and we need to input the emotion, which is something like emotions[output[0]]... and I am going to convert this value into an integer. Okay, now let's see if it works: sad... happy... and surprised. So everything is working just fine.
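One detail worth noting: this emotions list must be in the same alphabetical order as the labels created in prepare_data.py, so that index 0 maps back to happy, 1 to sad, and 2 to surprised. With a stand-in value in place of a real prediction:

```python
emotions = ['happy', 'sad', 'surprised']  # same alphabetical order as the training labels

output = [1.0]  # stand-in for model.predict(...); the labels come back as floats here
print(emotions[int(output[0])])  # sad
```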
Now let's do something else, which is plotting all the landmarks, and this is going to be very simple: the only thing we need to do is set draw to True. Okay, let's see... and you can see that now we're also visualizing all the landmarks on my face. So let's try again: this is happy... sad... and surprised. So this is going to be all for this tutorial. My name is Felipe, I'm a computer vision engineer, and this is exactly how you can build an emotion classifier using Python, scikit-learn, and mediapipe. This is going to be all for today, and see you in my next video.