In today's tutorial we are going to use an amazing Python library called diffusers to create a synthetic dataset for emotion recognition. We are going to use Stable Diffusion to generate hundreds or thousands of images, and in a future tutorial I'm going to show you how to use this dataset to train an emotion recognition model, so this project is going to be amazing. We're going to get more familiar with diffusers, the library from Hugging Face; with image generation and Stable Diffusion; and, most importantly, with synthetic datasets and how to create one. Today's tutorial is not for beginners, oh no no: creating a synthetic dataset is a very advanced skill, and it's definitely going to set you apart from many other developers and machine learning engineers. So let me show you how we are going to create this synthetic dataset.
This is a notebook I created in Google Colab, and this is where we will be working today. I'm working in Colab only because we get a free GPU and because it's going to make it easier for you to reproduce all these results, but you can also work on your local computer if you prefer, with something like PyCharm or Visual Studio Code; everything will be pretty much the same, so just work wherever you want. In my case I'm going to show you how to do it in Colab. We are going to execute these two cells first. The first one installs all the requirements we need for this project: diffusers, which is this library over here, and transformers, another library we need for this tutorial. The second cell creates the object we are going to use to do all the image generation; this is how we are going to use Stable Diffusion through diffusers. So I'm just going to execute these two cells, and then I'll show you how to do the image generation... OK, now it's ready.
If I scroll up, let me show you something: you can see that we downloaded many models and many files when we executed these two instructions. These models are needed in order to use Stable Diffusion and diffusers, so please keep in mind that once you execute these instructions you are going to download many files, and some of them are huge; this one over here is 3.4 GB, and this one is 1.2 GB. If you are working on your local computer, be aware of that. Before we continue, also remember that we are going to work in Colab with a GPU. To make sure you are on an instance with a GPU, click here, then 'Change runtime type', and check that 'T4 GPU' is selected; otherwise you will be executing everything on a CPU, which will make everything take much longer, and that doesn't make any sense whatsoever. Please make sure you are working on a GPU. Now let's continue, and let me show you
how to do the image generation. You can see that these are only a few instructions, and this is everything we need. I'm importing matplotlib, the library we are going to use to visualize the images we generate, and I have defined two variables: one called prompt and the other called negative prompt. Then the only thing I'm doing is calling the object we defined over here, pipeline, passing the prompt and the negative prompt as arguments. That's pretty much all it takes to do image generation with diffusers; this is amazing, very simple, very straightforward. Now let me show you exactly what these two variables mean. The prompt specifies what object, what image, we are going to generate. The negative prompt is very important: this is where we specify everything we don't want to generate, and Stable Diffusion is going to try to stay away from it. This is a very standard negative prompt; if you search for examples on Google, you will see this is how they usually look.
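Reconstructed from what is shown on screen, the two variables look roughly like this; the exact wording in the notebook may differ slightly, and the generation call is commented out because it needs the `pipeline` object and a GPU:

```python
prompt = "a picture of a squirrel"

# A standard negative prompt for realistic results: everything Stable
# Diffusion should stay away from.
negative_prompt = (
    "cartoon, anime, sketch, drawing, painting, "
    "low quality, low resolution, blurry, "
    "grayscale, black and white, monochrome, "
    "disfigured, deformed, bad anatomy"
)

# Generating and showing an image then looks like this:
#   image = pipeline(prompt, negative_prompt=negative_prompt).images[0]
#   import matplotlib.pyplot as plt
#   plt.imshow(image); plt.axis("off"); plt.show()
```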
In this case, remember, we are going to generate human faces, and we want these faces, these people, to look as realistic as possible; otherwise this synthetic dataset makes no sense whatsoever, because once we train a model to recognize emotions, we are going to apply it to real people. So we tell Stable Diffusion to stay away from everything that looks like a cartoon, an anime or a sketch, which helps us achieve much more realistic results. We are also going to try to generate images with the highest resolution and quality possible, so we tell it to stay away from everything that's low quality or low resolution. We want to generate color images, so we also steer it away from everything that's grayscale or black and white. And something that's usually a problem with person and face generation is that the results sometimes look a little disfigured or deformed, so we explicitly tell Stable Diffusion to stay far away from 'disfigured' and 'deformed'; this helps us generate pictures of people who don't look that way, which is very important. This is a very classical negative prompt for person generation, and it's the one we are going to use. Now, to move one step at a time, let me show you how it works.
I'm going to input a prompt that says something like... let's take it one step at a time, so I'm going to say something like 'a picture of a squirrel'. We are going to generate people, human faces, later on, but let's start with a squirrel, with this as my negative prompt, and let's see what happens... oh, I see I have a character over here which is not correct... OK, this is going to take some time while the image is generated, and then we'll see how it looks. This is very important: you can see it takes 9 seconds to generate this image. And this is the result; this is definitely a picture of a squirrel, it doesn't look like a cartoon, an anime, a sketch or anything like that, and it's a very high-resolution image. Now let's do something more similar to the images we are going to generate in this tutorial, which is a human face... yeah, let's say something like this. OK, this is the result: the eyes look a little disfigured, a little strange, but other than that I would say this is definitely a human face, so everything is OK.
Now let me show you the prompt we are going to use to generate our results; this prompt is going to help us achieve the best-quality results as well. You can see... I'm going to try to make it look better... something like this... OK, and maybe I can do even better by taking these two words over here... OK. Later on I'm going to explain why we have these characters over here, but for now I'm just going to replace them with 'a man smiling'. So it says 'medium shot portrait of a man smiling, front view, looking at the camera, color photography', and then 'photorealistic, hyperrealistic, realistic' and so on. We use this prompt to help Stable Diffusion make images that look realistic, hyperrealistic, with many, many details; we want these pictures to look as realistic as possible. We are telling it to create an image of a man smiling, with a front view, looking at the camera, and this is also going to push it toward color images; ideally we want color images and we want to avoid anything that looks grayscale. Let's see what happens if we use this prompt... OK, this is the result: we have generated a picture of a smiling man looking at the camera, with a very frontal view, in color, so everything is perfect. Now let's do something else: I'm going to repeat the same process 10 times... for j in range(10)... using the same prompt and the same negative prompt every time, and we are going to plot all the images, which will let us look at many different examples.
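The repeat-and-plot cell can be sketched like this; solid-color placeholder images stand in for the pipeline outputs so the plotting logic runs on its own (in the notebook, each `img` would come from `pipeline(prompt, negative_prompt=negative_prompt).images[0]`):

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; not needed inside Colab
import matplotlib.pyplot as plt
from PIL import Image

# Placeholders for the ten generated images.
images = [Image.new("RGB", (512, 512), (150 + 10 * j, 100, 100)) for j in range(10)]

# Show all ten results in a 2x5 grid to compare many examples at once.
fig, axes = plt.subplots(2, 5, figsize=(15, 6))
for ax, img in zip(axes.flat, images):
    ax.imshow(img)
    ax.axis("off")
plt.show()
```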
And these are the results. You can see we are generating images that definitely look realistic, very realistic, and most of these pictures, if not all of them, are looking at the camera in a frontal view. Well, in most cases: this one over here is not really frontal, and this one isn't either, it's a little to the side, but they are pretty good. Now, something you are going to notice when generating your images is that in some cases, although we are specifying exactly what we are looking for and telling Stable Diffusion exactly what to stay away from, some results don't really match what we specified. The prompt tells Stable Diffusion to try to generate something like this, but in some cases it's not going to be possible, and we will get some images that are not really frontal, or not really color, I mean images that are grayscale; we are going to have some situations that are not exactly what we specified. But in most cases we are going to get very good, very meaningful results.
Now, something I noticed looking at these images is that I think Stable Diffusion is taking this 'color photography' term we have over here as a parameter specifying the skin tone of the person being generated; if you look at the results, most of them are of people with a dark skin tone. So we are going to do something about it. Remember, we are generating an emotion recognition dataset comprised of many images of human faces, and we want to make this dataset as diverse as possible, so we are going to explicitly tell Stable Diffusion the ethnicity we want in each picture. We will generate pictures of people from many different ethnicities, with the same amount for each one, so we make sure this is a very balanced and very diverse dataset that we can use to train a model. Only to show you how it works, I'm going to say 'medium shot portrait of a white man smiling, front view' and everything else; now I am explicitly telling Stable Diffusion to give me pictures of a given ethnicity, and let's see what happens.
And these are the results. You can see that now we are generating images of people with the exact ethnicity we asked for. And this is what I meant before: although we are specifying that we want color images and telling Stable Diffusion to stay away from anything that looks grayscale, every once in a while we are going to get images like this one; I'm not sure if it's actually grayscale or just looks like it, but it's not the type of image we are looking for, and we will get images like this every once in a while. Other than that, all the other pictures look pretty much perfect. This one is not really a frontal view, he's looking to the side; we have another grayscale image over here; and this one is not looking at the camera either; but most of the pictures are OK. So this is exactly how we are going to generate all these pictures. To keep this dataset as diverse as possible, with people of many different ethnicities and many different features, we are going to explicitly tell Stable Diffusion which ethnicities we want to generate, and also the gender: we want many pictures of men and many pictures of women.
So this is how we are going to do it. I'm going to create a variable called ethnicities... and this is going to be something like this: let's start with latino... then white... black... middle eastern... indian and asian. These are the six ethnicities we are going to use in this example to show you how to create this synthetic dataset. Now let's continue: I'm going to create another list, genders, and this is going to be male... and female... OK, so male and female, and we also have the ethnicities, so everything is OK. Now I'm going to replace this value where it says 'white'... maybe I can do something like this... a latino, a white, a black, an indian, an asian... so I can just replace these two words, and then I'm going to call format... I'm also going to import random, because on every single iteration we are going to draw one of these ethnicities and one of these genders: ethnicity equals random.choice(ethnicities), and gender equals random.choice(genders). Then I put this value over here, and I need to do the same over here... OK, now the results are going to be more random: we're going to generate pictures of people of many different ethnicities and many different genders, or actually two different genders. Let me show you how the results look.
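Reconstructed, this step looks something like the following; the exact prompt wording is an approximation of what is shown on screen:

```python
import random

ethnicities = ["latino", "white", "black", "middle eastern", "indian", "asian"]
genders = ["male", "female"]

# Two placeholders are filled with a random ethnicity and gender on every
# iteration, so the dataset stays diverse and balanced in expectation.
prompt_template = (
    "Medium shot portrait of a {} {}, smiling, front view, looking at the "
    "camera, color photography, photorealistic, hyperrealistic, realistic, "
    "highly detailed"
)

ethnicity = random.choice(ethnicities)
gender = random.choice(genders)
prompt = prompt_template.format(ethnicity, gender)
```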
And these are the results. In this case we're generating the picture of a woman, so everything is OK; then another woman, everything's OK; and then we are generating the picture of something that is definitely not a person, so let's see what's going on there, and let's see if we get another result like this. This is another picture of a person, a person, a person, a person... so that is the only strange result we have over here, and I'm not sure why. My thought is that since we are specifying the gender as 'male' or 'female' and we are not really explicitly telling Stable Diffusion to generate people, in this case it has generated something that is not a person, some sort of animal. So let's make sure we are generating people; maybe we can say something like 'men' and 'women'. You can see this is not a set-and-forget process; this is something you have to stay on top of to make sure everything works as expected. OK, and now we have another situation over here: if I scroll down, we generated a couple of animals again. I'm not sure what could be going on, but I'm going to explicitly tell Stable Diffusion to generate pictures of people: 'medium shot portrait of a person'... I'm going to do it like this... and I'm going to remove the 'a' over here... or maybe like this... OK, now we are telling Stable Diffusion to generate pictures of people, and let's see if this fixes the issues we've been having. OK, now we are making some progress, because we are generating pictures of people, no animals whatsoever; all of these pictures are of people. But I do notice that this random sampling is not really that random, because we are generating pictures of women in all cases, and the ethnicities are not really that diverse either. So let's do something: I'm going to seed this random generator, random.seed(50), and I'm also going to print the ethnicity and the gender we are using... everything OK? Yes, let's see what happens.
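Seeding Python's random module, as done here while debugging, makes the sampled combinations reproducible between runs; a minimal, self-contained sketch:

```python
import random

ethnicities = ["latino", "white", "black", "middle eastern", "indian", "asian"]
genders = ["men", "women"]

random.seed(50)  # fixed seed: every run now draws the same sequence
samples = [(random.choice(ethnicities), random.choice(genders)) for _ in range(4)]
print(samples)
```

Re-running with the same seed always prints the same four combinations, which makes it easy to tell a real bug apart from unlucky sampling.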
OK, we obviously have a mistake, because you can see we are not generating the pictures we are looking for... let me see what could be going on... OK, I see the issue: this is actually only one string, so the format needs to be applied to this one over here... OK, over there... now this is going to work just fine. Now everything seems fine: here we are generating a Middle Eastern female and everything looks OK, then a Black male, an Asian female, a Black male, an Indian female, a White male, a White female, and so on. Now everything looks the way it should, much more diverse. I'm also going to do something else, because I notice that in these two cases we are generating the picture of an entire person, and we don't really care about the entire person, we only want a portrait; so I'm going to go back to the original version, which was this one, and then I'm going to remove the seed, because we don't really need it anymore, and that's pretty much it. Now let me show you how to generate all the different emotions.
Remember, this is an emotion recognition dataset, so it's very important that we generate different emotions, and the emotions we are going to generate are... happy, sad... surprised... and angry... let's keep it simple for now and only generate these four. Some of these emotions are going to be easier to generate than others. For example, to generate happy pictures, the only thing we need to specify is 'smiling': you can see that all of these pictures are smiling, and these are definitely very happy people, so that's all we need for the happy emotion. But for the other emotions we need to be a little more crafty, otherwise they are not going to be very realistic, so let me show you. For happy, saying 'smiling' is enough, and for the angry emotion I think it's going to be just fine if we just set it to 'angry'; let me show you, I'm going to generate three pictures, and this is what we got. I would say we are OK... yeah, these pictures are angry, maybe not super angry, but angry, so it's OK. But in order to generate the other emotions, surprised and sad, we are going to use these prompts over here: for surprised we say 'surprised, open mouth and raised eyebrows', so not only do we name the emotion, we also describe how it looks on a face; and we do the same for the sad emotion, generating people 'frowning, with a sad face expression and crying', which is definitely going to look sad. So what I'm going to do now is create a dictionary... emotions is actually going to be a dictionary, and we are going to use it to hold all the prompts; something like emotion_prompts. So happy will be 'smiling'... sad will be this prompt over here... let's do something like this... surprised will be this prompt... and angry will be 'angry'. OK, that's pretty much it.
Obviously we want to create a dataset, and remember that datasets always need to be balanced, or at least somehow balanced, so we are going to generate the same number of pictures for each one of these emotions. This is how we're going to do it: something like 'for emotion_prompt in emotion...'; sorry, this is going to be emotion_prompts.keys()... and then emotion_prompt... this is actually emotion... this is emotion_prompts[emotion]. OK, this is going to work just fine: now, on every iteration, we generate a picture for each one of our emotions. So now I'm going to print the emotion, ethnicity and gender, and I'm only going to run it for one iteration, or for two iterations, so let's see what happens.
OK, here we have happy, an Asian female, and yeah, everything is OK. Sad, a Latino male: everything is OK, he looks sad. Surprised: she looks surprised for sure. Here we have angry, and we also have a man who is pretty much naked; I don't know how that happened, but it doesn't matter, we only care about the emotions, so everything's OK. Then let's continue: this one is happy, this one is sad, this one is surprised, and this one is angry, and yeah, he looks angry for sure. So these are the pictures we are generating, and we are almost there, because now the only thing we need to do is iterate something like... 250 times; 250 iterations times 4 emotions is going to give us 1,000 images, and that's going to be pretty much enough to train the emotion recognition model we want to train in a future tutorial. So the only thing left is to iterate all those times, save the pictures in this Colab, and then get the pictures onto our computer.
So let's do something: I'm going to create some directories. I'm going to import os... and then os.makedirs, and this is going to be /content/faces/happy; let's pass exist... I don't remember if it's... yeah, exist_ok, and this is going to be True. We are going to create a directory for each one of our emotions, so sad... angry... and surprised as well. Then the only thing we need to do is save the pictures there. Something I haven't told you so far is that the object we get when we execute this instruction, this image, is a Pillow image; it's in Pillow format, a Pillow object, so in order to save this picture we need to say something like image.save, and I need to specify the location, which is going to be something like... like this... format(emotion), and that's pretty much it; obviously we are iterating over here... So we are going to save the pictures in this directory, naming each picture according to the iteration we are currently in, zero-padding the filename to four digits so all the filenames are named properly. This is pretty much all we need to do.
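The saving step can be sketched as below; a placeholder image stands in for the pipeline output, and a temp directory stands in for Colab's /content so the snippet runs anywhere:

```python
import os
import tempfile
from PIL import Image

# On Colab the base would be "/content/faces".
base_dir = os.path.join(tempfile.mkdtemp(), "faces")
emotions = ["happy", "sad", "surprised", "angry"]
for emotion in emotions:
    os.makedirs(os.path.join(base_dir, emotion), exist_ok=True)

# The pipeline returns a PIL image, so saving is just Image.save();
# a solid placeholder stands in for a generated face here.
j = 7  # iteration counter
emotion = "happy"
image = Image.new("RGB", (512, 512), (210, 180, 160))
filename = os.path.join(base_dir, emotion, "{}.jpg".format(str(j).zfill(4)))
image.save(filename)
```

`str(j).zfill(4)` is the four-zero padding mentioned above: iteration 7 becomes `0007.jpg`, so filenames sort in order.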
We don't really need to visualize the pictures anymore, and we don't need the print anymore either. And although we are going to run this process for 250 iterations to generate 1,000 pictures, let's run it for only two or three iterations for now, to make sure everything works properly: that the pictures are saved properly, and that we can get them onto our computer; remember we are working in Google Colab, so we need to do something to get the pictures onto our computer. So let's execute it and see if it works... OK, this is what we got; remember we are not visualizing the images anymore, so this is all the output now. Now please mind this message over here, which says potential NSFW (not safe for work) content was detected in one or more images, and a black image will be returned instead. You are going to get a similar message every once in a while, and it means that after you generate the entire dataset you will need to curate it and remove all the black images that were generated.
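The video leaves this curation as a manual step; one possible helper (my own sketch, not from the video) to flag the all-black images the safety checker returns:

```python
from PIL import Image

def is_all_black(image: Image.Image) -> bool:
    """True if every pixel is pure black, i.e. what the safety checker returns."""
    lo, hi = image.convert("L").getextrema()  # (min, max) grayscale values
    return lo == 0 and hi == 0

# Quick sanity check on two synthetic images.
black = Image.new("RGB", (64, 64), (0, 0, 0))
face = Image.new("RGB", (64, 64), (200, 170, 150))
```

Running this over every saved file would let you delete the censored images automatically instead of scanning the folders by eye.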
Now I'm going to show you how to get all the images we generated, and then we'll do the entire run: we'll iterate 250 times and generate 1,000 images. I have a few commands over here, and the only thing I'm going to do is copy and paste them. This command zips the entire faces directory we created, with all the pictures in it, into a file called faces.zip, so I'm going to execute it first. Then we are going to mount our Google Drive, because we are going to copy all this data into Google Drive and just download the file from there; that's going to make the process of downloading this data much easier. Then the only thing we need to do is execute this command over here... you can see it copies this file, faces.zip, into this location in my Google Drive, under synthetic data face generation stable diffusion, as faces1.zip; actually, I can just call this file faces.zip. If I show you this location in my Google Drive, this is a directory I created called synthetic dataset face generation stable diffusion; please remember to change this value to a directory in your own Google Drive, otherwise this is not going to work. So the only thing I'm going to do is press Enter, and this will take care of copying the file.
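The cells here do three things: zip the faces directory, mount Google Drive, and copy the archive over. In plain Python the zipping step is equivalent to `shutil.make_archive`; the Drive steps are Colab-specific and shown only as comments:

```python
import os
import shutil
import tempfile

# Tiny stand-in for /content/faces so this snippet runs anywhere.
content = tempfile.mkdtemp()
os.makedirs(os.path.join(content, "faces", "happy"), exist_ok=True)

# Equivalent of `!zip -r faces.zip faces`:
zip_path = shutil.make_archive(
    os.path.join(content, "faces"), "zip",
    root_dir=content, base_dir="faces",
)

# On Colab you would then mount Drive and copy the archive:
#   from google.colab import drive
#   drive.mount("/content/gdrive")
#   !cp /content/faces.zip "/content/gdrive/MyDrive/<your-directory>/faces.zip"
```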
Let's see what happens... I think I have a typo in this directory name... yeah, I have a typo: synthetic dataset face generation stable diffusion, faces.zip. Let's see now... now everything's OK, and if I get back to my Google Drive and refresh (sometimes you have to wait a few minutes)... OK, now I'm going to take this file and download it like this. If I open it, this is where we have the four directories: surprised, sad, happy and angry. Let me show you a couple of examples: this is one of the images under the surprised category; now another example over here, this is sad, so everything is OK... happy, we have a person that's smiling, so everything's OK... and then angry... this is the black image we generated, which we are going to remove later on, and this is a picture of a person that's angry, so everything looks OK. The only thing we need to do now is iterate for something like 250 iterations so we generate 1,000 pictures, and then do the same process: zip the directory and download it like this. So now I'm going to scroll up and generate something like 1,000 images, and that's pretty much it.
Something very important: if you want to download exactly the same dataset I am generating right now, it is going to be available on my Patreon, to all my Patreon supporters. So this is going to be all for today; this is how you can create a synthetic dataset for emotion recognition. This is all for this tutorial: my name is Felipe, I'm a computer vision engineer, and see you on my next video.