Creating an Emotion Recognition Synthetic Dataset with Python & Stable Diffusion | Image generation

Captions
So in today's tutorial we are going to use an amazing Python library called diffusers to create an emotion recognition synthetic dataset. We are going to use Stable Diffusion to generate hundreds or thousands of images, and in a future tutorial I'm going to show you how to use this dataset to train an emotion recognition model. This project is going to be amazing: we're going to get more familiar with diffusers, the library from Hugging Face, with image generation and Stable Diffusion, and most importantly with synthetic datasets and how to create one. Today's tutorial is not for beginners; creating a synthetic dataset is a very advanced skill, and it's definitely going to set you apart from many other developers and machine learning engineers. So let me show you how we are going to create this synthetic dataset. This is a notebook I created in Google Colab, and this is where we will be working today. I'm working in Google Colab only because we get a free GPU and because it makes it easier for you to reproduce all these results, but you can also work on your local computer if you prefer, in something like PyCharm or Visual Studio, and it's going to be pretty much the same; work wherever you want, but in my case I'm going to show you how to do it in Google Colab. We are going to execute these two cells first. This is where we install all the requirements we need for this project: diffusers, which is the library over here, and transformers, which is another library we need for this tutorial. Then I'm also going to execute this cell over here, which is where we create the object we are going to use to do all the image generation; this is how we use Stable Diffusion through this library. So the only thing I'm going to do is execute these two cells first, and then I'm going to show you how to do all the image generation...
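The captions don't reproduce the notebook cells themselves, so here is a minimal sketch of what that setup typically looks like with diffusers; the checkpoint id `runwayml/stable-diffusion-v1-5` and the variable name `pipeline` are assumptions, not something confirmed in the video.

```python
# Install the two libraries mentioned in the video (run once in the Colab notebook):
# !pip install diffusers transformers accelerate

import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint and move it to the Colab GPU.
# The exact checkpoint used in the video is not shown; v1-5 is a common default.
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipeline = pipeline.to("cuda")
```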
Okay, now it's ready, and if I scroll up let me show you something: you can see that we downloaded many models and many files once we executed these two instructions, and these models are needed in order to use Stable Diffusion and the diffusers library. So please mind that once you execute these instructions you are going to download many files, and some of them are huge; this one over here is 3.4 GB and this one over here is 1.2 GB. If you are working on your local computer, keep in mind that these instructions will download many large files. Before we continue, remember that we are going to work in Google Colab with a GPU. To make sure you are working on an instance with a GPU, click here and then 'Change runtime type', and you can see it says T4 GPU; please make sure this option is selected, otherwise you will be executing everything on a CPU and everything is going to take much longer, which doesn't make any sense whatsoever, so please make sure you are working on a GPU. Now let's continue, and let me show you how to do the image generation. You can see these are only a few instructions and this is everything we need to generate images. I'm importing matplotlib, which is the library we are going to use to visualize the images we generate, and I have defined two variables: one is called prompt and the other one is called negative prompt. Then the only thing I'm doing is calling the object we defined over here, which is called pipeline, and passing the prompt and the negative prompt as arguments into this function. That's all it takes to do image generation with diffusers; it's very simple and very straightforward. Now let me show you exactly what these two variables mean. The prompt is where we specify what object or image we want to generate, and the negative prompt is very important because it's where we specify everything we don't want to generate; Stable Diffusion is going to try to stay away from everything that's in there. This is a very standard negative prompt: if you search for examples on Google you will see this is how they usually look. In this case, remember, we are going to generate human faces, we are going to generate people.
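The cell itself isn't reproduced in the captions; a minimal sketch of the generation call described here, reusing the `pipeline` object from the setup sketch above, might look like this. The prompt shown is the squirrel test used a bit later in the video, and the short negative prompt is just a stand-in for the fuller one discussed next.

```python
import matplotlib.pyplot as plt

prompt = "a picture of a squirrel"
negative_prompt = "cartoon, anime, sketch"  # the full negative prompt is covered below

# The pipeline returns a result object whose .images attribute is a list of PIL images.
image = pipeline(prompt, negative_prompt=negative_prompt).images[0]

plt.imshow(image)
plt.axis("off")
plt.show()
```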
We want these faces, these people, to look as realistic as possible, obviously, otherwise this synthetic dataset doesn't make any sense whatsoever, because once we train a model to recognize different emotions we are going to apply it to real people. So the faces we generate now have to look as realistic as possible, otherwise nothing is going to work. We're going to tell Stable Diffusion to stay away from everything that looks like a cartoon, an anime or a sketch, and this is going to help us achieve much more realistic results. We are also going to try to generate images with the highest resolution and the highest quality possible, so we tell Stable Diffusion to stay away from everything that's low quality, low resolution and so on. We want to generate color images, so we also make Stable Diffusion stay away from everything that's grayscale or black and white. And something that is usually a problem with person generation, with face generation, is that sometimes the results look a little disfigured; the people look a bit disfigured or deformed or something like that. So we explicitly tell Stable Diffusion to generate results which are very far away from these words over here, disfigured and deformed, and this helps us generate pictures of people who do not look disfigured or deformed. This is very important. So this is a very classical negative prompt for person generation, and it's the one we are going to use. Now, to move one step at a time, let me show you how it works. I'm going to input a prompt which says something like... let's take it one step at a time, so I'm going to say something like 'a picture of a squirrel'. We are going to generate people, human faces, later on, but for now I'm going to generate a picture of a squirrel, and this is going to be my negative prompt, and let's see what happens... oh, I see I have this character over here which is not correct... okay, so this is going to take some time; the image is being generated, and then we are going to see how it looks. This is very important: you can see it takes 9 seconds to generate this image. This is the result, and it is definitely a picture of a squirrel; it doesn't look like a cartoon, an anime, a sketch or anything like that, and it's a very high resolution image. Now let's do something more similar to the images we are going to generate in this tutorial, which is a human face... yeah, let's say something like this. Okay, this is the result; you can see that the eyes look a little disfigured, a little strange, but other than that I would say this is definitely a human face, so everything is okay.
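The exact negative prompt used in the video is not reproduced in the captions; a sketch of this step, with the "classical" negative-prompt wording reconstructed from the description above (so treat the exact terms as an assumption), might look like this:

```python
# A typical negative prompt for realistic person generation, assembled from the
# elements described above: no cartoon/anime/sketch styles, no low-quality or
# low-resolution output, no grayscale images, no disfigured or deformed faces.
negative_prompt = (
    "cartoon, anime, sketch, drawing, painting, "
    "low quality, low resolution, blurry, "
    "grayscale, black and white, monochrome, "
    "disfigured, deformed, bad anatomy"
)

# Quick sanity check with a simple face prompt before refining the wording.
image = pipeline("a picture of a human face", negative_prompt=negative_prompt).images[0]
plt.imshow(image)
plt.axis("off")
plt.show()
```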
Now let me show you the prompt we are going to use to generate our results; this prompt is going to help us achieve the best quality results as well. You can see... I'm going to try to make it look better... something like this... okay, something like that, and maybe I can do it even better; I can take these two words over here, and this is going to be better. Later on I'm going to explain why we have these placeholder characters over here, but for now the only thing I'm going to do is replace them with 'a man smiling'. So it says 'medium shot portrait of a man smiling, front view, looking at the camera, color photography', and then 'photorealistic, hyperrealistic, realistic' and so on. We are going to use this prompt to help Stable Diffusion make images which look realistic, hyperrealistic, with many, many details; we want these pictures to look as realistic as possible, and we are telling Stable Diffusion to create an image of a man smiling, with a front view, looking at the camera, and so on. This is also going to help Stable Diffusion make color images; ideally we want color images and we want to avoid anything that looks grayscale. So let's see what happens if we use this prompt. Okay, this is the result: we have generated a picture of a man smiling, looking at the camera with a very frontal view, and it's a color image, so everything is perfect. Now let's do something else: I'm going to repeat the same process 10 times... for j in range(10)... using the same prompt and the same negative prompt every time, and we are just going to plot all the images so we can look at many different examples (see the sketch after this paragraph). And these are the results: you can see that we are generating images which definitely look very realistic. Most of these pictures, if not all of them, are looking at the camera in a frontal view; in most cases, anyway: this one is not really frontal, this one over here is not frontal either, this one is a little on the side, but they are pretty good. Now, something you are going to notice when generating your images is that in some cases, although we specify exactly what we are looking for in the prompt and tell Stable Diffusion exactly what to stay away from, we are going to get results which don't really match what we specified. The prompt tells Stable Diffusion to try to generate something like this, but in some cases it's not going to be possible, and we will get some images which are not really frontal, or some images which are not really color but grayscale; we are going to have some situations which are not exactly what we specified, but in most cases we are going to get very meaningful results...
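Here is a sketch of the refined prompt and the ten-image loop described above, reusing `pipeline`, `negative_prompt` and `plt` from the earlier sketches; the prompt string is paraphrased from what the speaker reads out, so treat the exact wording as an assumption.

```python
prompt = ("medium shot portrait of a man smiling, front view, looking at the camera, "
          "color photography, photorealistic, hyperrealistic, realistic, highly detailed")

# Generate and display ten samples with the same prompt and negative prompt,
# to get a feel for the variety of faces the model produces.
for j in range(10):
    image = pipeline(prompt, negative_prompt=negative_prompt).images[0]
    plt.imshow(image)
    plt.axis("off")
    plt.show()
```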
In most cases we are going to get very good results. Now, something I noticed looking at these images is that I think Stable Diffusion is taking this parameter we have over here, 'color photography', as a parameter to specify the skin tone of the person being generated, because if you look at the results, most of them are of a person with a dark skin tone. So we are going to do something. Remember, we are going to generate an emotion recognition dataset comprised of many images of people, of human faces, and we want to make this dataset as diverse as possible. So we are going to explicitly tell Stable Diffusion the ethnicity we want in each picture, we are going to generate pictures of people from many different ethnicities, and we are going to generate the same amount for each of these ethnicities, so we make sure this is a very balanced and very diverse dataset that we can use to train a model. Just to show you how it works, I'm going to say 'medium shot portrait of a white man smiling, front view' and everything else; now I am explicitly telling Stable Diffusion to give me pictures of a given ethnicity, and let's see what happens. These are the results: you can see that now we are generating images of people with the exact ethnicity we asked for. And this is what I meant before: although we are telling Stable Diffusion that we want color images and that it should stay away from everything that looks like a grayscale image, every once in a while we are going to get images like this one; I'm not sure if it's actually grayscale or just looks like it, but it's not the type of image we are looking for. Nevertheless, we will generate images like this every once in a while. Other than that, all the other pictures look pretty much perfect. This one is not really a frontal view, he's looking to the side, but other than that everything is okay; we have another grayscale image over here, and this one is not looking at the camera either, but most of the pictures are okay. So this is exactly what we will be doing, this is exactly how we are going to generate all these pictures. In order to keep this dataset as diverse as possible, with people of many different ethnicities, many different features, many different everything, we are going to explicitly tell Stable Diffusion which ethnicities we want to generate, and we also want to tell it the gender: we want many pictures of men and many pictures of women. So this is how we are going to do it: I'm going to create another variable, which is ethnicities... and this is going to be something like this, let's start with latino... then white... black... middle eastern...
indian and asian. So these are the six ethnicities we are going to use in this example to show you how to create this synthetic dataset. Now let's continue: I'm going to create another list, which is genders, and this is going to be male... and... female... okay, so male and female, and we also have these ethnicities, so everything is okay. Now I am going to replace this value where it says 'white'... maybe I can do something like this... a latino, a white, a black, an indian, an asian... so I can just replace these two words, and then I am going to call format... and I'm going to do it over here, I'm going to import random, because for every single iteration we are going to draw one of these ethnicities and one of these genders at random. So ethnicity equals random.choice(ethnicities), and gender equals random.choice(genders). Then I'm going to put this value over here, and I need to do the same over here (a sketch of this step appears after this paragraph). Okay, now the results are going to be more random; now we're going to generate pictures of people of many different ethnicities and many different genders, or actually two different genders. Let me show you how the results look in this case. And these are the results: in this case we're generating the picture of a woman, so everything is okay; then we generate the picture of another woman, everything's okay; and then we generate a picture of something which is definitely not a person, so let's see what happens over here, let's see if we get another result like this... this is another picture of a person, a person, a person, a person, a person, and also a person. So this is the only strange result we have, and I'm not sure why; my thought is that, since we are only specifying the gender as male or female, we are not really telling Stable Diffusion explicitly to generate people, so I guess in this case it has generated something which is not a person, some sort of animal or something like that. So let's make sure we are generating people; maybe we can say something like 'man' and 'woman'. You can see that this is not a set-and-forget process; this is something you have to stay on top of to make sure everything works as expected. Okay, now we have another situation over here, and if I scroll down we generated a couple of animals over here; I'm not sure what could be going on, but I am going to do something: I'm going to explicitly tell Stable Diffusion to generate pictures of people, so 'medium shot portrait of a person'... I'm going to do it like this... and I'm going to remove the 'a' over here... or maybe I can do it like this... Okay, now we are telling Stable Diffusion to generate pictures of people, and let's see if this fixes the issues we are having. Okay, now we are making some progress, because we are generating pictures of people, we are not generating any animals whatsoever, all of these pictures are of people. But I do notice that this randomness is not really that random, because we are generating pictures of women in all cases, and the ethnicity is not really that diverse either. So let's do something: I'm going to seed this random generator, I'm going to say random.seed(50), and I'm also going to print... the ethnicity... and the gender we are using for each image.
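Putting the pieces from this step together, a sketch of the randomized prompt might look like the following, again reusing `pipeline`, `negative_prompt` and `plt` from the earlier sketches; the exact template wording and the `{}` placeholders (the characters the speaker mentions replacing) are assumptions.

```python
import random

ethnicities = ["latino", "white", "black", "middle eastern", "indian", "asian"]
genders = ["male", "female"]

prompt_template = ("medium shot portrait of a {} {} person smiling, front view, "
                   "looking at the camera, color photography, photorealistic, "
                   "hyperrealistic, realistic, highly detailed")

for j in range(10):
    # Draw a random ethnicity and gender for every image so the dataset stays diverse.
    ethnicity = random.choice(ethnicities)
    gender = random.choice(genders)
    prompt = prompt_template.format(ethnicity, gender)

    image = pipeline(prompt, negative_prompt=negative_prompt).images[0]
    plt.imshow(image)
    plt.axis("off")
    plt.show()
```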
Everything's okay, right?... Yes, let's see what happens. Okay, we obviously have a mistake, because you can see that we are not generating the pictures we are looking for... we definitely have a mistake, so let me see what could be going on... Okay, I see the issue: this is actually only one string, so the format call needs to go on this one over here... okay, over there... okay, now this is going to work just fine. Now everything seems to be fine: here we are generating a middle eastern female and everything looks okay, black male, asian female, black male, indian female, white male, white female, and so on. Now everything looks the way it should, everything looks much more diverse, so everything is okay. I'm going to do something, because I notice that in these two cases we are generating the picture of an entire person, and we don't really care about the entire person, we only want a portrait. So I'm going to get back to the original version, which was this one, and I'm going to remove the seed, because we don't really need it anymore, and that's pretty much all. Now let me show you how to generate all the different emotions. Remember, this is an emotion recognition dataset, so it's very important that we generate different emotions, and the emotions we are going to generate are... happy, sad... surprised... and angry... Let's keep it simple for now and only generate these four emotions. Some of these emotions are going to be easier to generate than others; for example, if we want to generate happy pictures, the only thing we need to specify is 'smiling', and you can see that all of these pictures are smiling, these are definitely very happy people, so this is all we need to generate the happy emotion. But if we want to generate the other emotions we need to be a little more crafty, because otherwise the emotions are not going to be very realistic. Let me show you. To generate the happy emotion, saying 'smiling' is going to be enough, and for the angry emotion I think it's going to be just fine if we simply say 'angry'; let me show you, I'm going to generate three pictures, and this is what we got. I would say we are okay... I mean, yeah, these pictures are angry; maybe they are not super angry, but they are angry, so it's okay. So the happy emotion is going to be very easy, we just put 'smiling', and for the angry emotion we just say 'angry'. But in order to generate the other two emotions, surprised and sad, we are going to use these prompts we have over here. For example, for surprised we are going to say 'surprised, opened mouth, raised eyebrows'; not only do we name the emotion, we also describe how this emotion looks on a face. We do the same for the sad emotion: we generate people frowning, with a sad face expression, and crying, and that is definitely going to look sad. So what I'm going to do now is create a dictionary...
emotions is actually going to be a dictionary, and we are going to use this dictionary to hold all the prompts we are going to use; in this case it's going to be called something like emotion_prompts. So happy will be 'smiling'... sad will be this prompt over here... let's do something like this, okay... surprised will be this prompt... and then angry... will be 'angry'. Okay, so this is pretty much all, and obviously we want to create a dataset, and remember datasets always need to be balanced, or at least somewhat balanced, so we are going to generate the same amount of pictures for each one of these emotions. This is how we're going to do it: we write something like 'for emotion in emotion_prompts'... then something like this, and then format with the emotion prompt... sorry, this is going to be emotion_prompts.keys()... and then the emotion prompt is actually emotion_prompts[emotion], okay. This is going to work just fine: now, on every iteration, we generate a picture for each one of our emotions. I'm going to print the emotion, the ethnicity and the gender, and let's see what happens; I'm only going to do it for one iteration, or for two iterations. Okay, here we have happy, an asian female, and yeah, everything is okay; sad, a latino male, everything is okay, yeah, he looks sad; surprised, she looks surprised for sure; here we have angry, and we also have a man who is pretty much naked, I don't know how that happened, but it doesn't matter, we only care about the emotions, so everything's okay. Then let's continue: this one is happy, this one is sad, this one is surprised, and this one is angry, and yeah, he looks angry for sure. So these are the pictures we are generating, and we are almost there, because now the only thing we need to do is iterate something like 250 times; iterating 250 times is going to generate 1,000 images, and that's going to be pretty much enough to train the emotion recognition model we want to train in a future tutorial. So now the only thing we need to do is iterate all those times, save the pictures in this Colab, and then get the pictures onto our computer. So let's do something: I'm going to create some directories. I'm going to import os... and then call os.makedirs; this is going to be content/faces/happy, and let's pass exist_ok... something like this, I don't remember if it's... yeah, exist_ok, and this is going to be True, and we're going to create a directory for each one of our emotions, so sad... angry... and surprised. Okay, so now we're creating a directory for each one of our emotions, and then the only thing we need to do is save the pictures there. Something I haven't told you so far is that the object we got, this image we got when we executed this instruction, is a Pillow image; it's in Pillow format, it's a Pillow object, so in order to save this picture we need to say something like image.save, specify the location, and this is going to be something like... like this... formatted with the emotion, and that's pretty much all; and then, obviously, all of this happens inside the loop we are iterating over (a full sketch of this generation loop follows below).
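A sketch of the full generation loop described over the last few steps, building on the `ethnicities`, `genders`, `pipeline` and `negative_prompt` from the earlier sketches; the prompt-template wording, directory layout and filename scheme are assumptions reconstructed from the description.

```python
import os
import random

emotion_prompts = {
    "happy": "smiling",
    "sad": "frowning, sad face expression, crying",
    "surprised": "surprised, opened mouth, raised eyebrows",
    "angry": "angry",
}

# One output directory per emotion.
for emotion in emotion_prompts:
    os.makedirs(f"/content/faces/{emotion}", exist_ok=True)

prompt_template = ("medium shot portrait of a {} {} person, {}, front view, "
                   "looking at the camera, color photography, photorealistic, "
                   "hyperrealistic, realistic, highly detailed")

for j in range(250):                      # 250 iterations x 4 emotions = 1,000 images
    for emotion, emotion_prompt in emotion_prompts.items():
        ethnicity = random.choice(ethnicities)
        gender = random.choice(genders)
        prompt = prompt_template.format(ethnicity, gender, emotion_prompt)

        # The pipeline returns PIL images, so .save() writes them straight to disk;
        # filenames are the iteration index zero-padded to four digits.
        image = pipeline(prompt, negative_prompt=negative_prompt).images[0]
        image.save(f"/content/faces/{emotion}/{str(j).zfill(4)}.png")
```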
Okay, so we are going to save the pictures in this directory, and we are naming them according to the iteration we are currently in, padding the filename with zeros to four digits so all the filenames are named consistently. That's pretty much all we need to do; we don't really need to visualize the pictures anymore, and we don't really need this print anymore. Although we will eventually run this process for 250 iterations so we generate 1,000 pictures, let's run it for only two or three iterations for now, so we make sure everything works properly, and then we will run it for the full 250 iterations. Let's make sure the pictures are saved properly and also that we can get them onto our computer, because remember we are working in Google Colab, so we need to do something to get the pictures onto our computer. So let's execute it and see if it works... Okay, this is what we got; remember we are not visualizing the images anymore, so this is the only output we get now. Now please mind this message we got over here, which says that potential NSFW content was detected in one or more images and a black image will be returned instead. You are going to get a similar message every once in a while, and it means that after you generate the entire dataset you will need to curate it and remove all the black images that have been generated. Now I'm going to show you how to get all the images we generated, and then we will do the entire run: iterate 250 times and generate 1,000 images. Let me show you how to get these images first. I have a few commands over here (sketched below), and the only thing I'm going to do is copy and paste them. This command over here zips the entire 'faces' directory we created, with all the pictures within it, into a file called faces.zip, so I'm going to execute this command first. Then we mount our Google Drive, because we are going to copy all this data into our Google Drive and then just download the file from there; that's going to make downloading this data much easier. Okay, and then the only thing we need to do is execute this command over here... and you can see that it copies this file, faces.zip, into this location in my Google Drive, which is 'synthetic dataset face generation stable diffusion/faces1.zip'; actually I can call this file faces.zip. If I show you this location in my Google Drive, this is a directory I created in my Google Drive called 'synthetic dataset face generation stable diffusion', so please remember to change this value to a directory in your own Google Drive, otherwise this is not going to work. The only thing I'm going to do is press enter, and this is going to take care of copying the file. Let's see what happens...
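The exact commands aren't shown in the captions; a sketch of this zip-and-copy step in a Colab notebook might look like the following. The Drive folder name is the one the speaker mentions, written here with underscores as an assumption; adjust it to a directory that exists in your own Google Drive.

```python
# Zip the generated dataset (Colab shell command):
# !zip -r /content/faces.zip /content/faces

# Mount Google Drive so the archive can be copied out of the Colab VM.
from google.colab import drive
drive.mount('/content/drive')

# Copy the archive into your own Drive folder, then download it from drive.google.com:
# !cp /content/faces.zip "/content/drive/MyDrive/synthetic_dataset_face_generation_stable_diffusion/faces.zip"
```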
Hmm, I think I have a typo in this directory name... yeah, I have a typo: 'synthetic dataset face generation stable diffusion/faces.zip'. Let's see now... now everything's okay, and if I go back to my Google Drive and refresh, sometimes you have to wait a few minutes... okay, now I'm going to take this file and download it like this. Now I'm going to open this file, and this is where we have these four directories: surprised, sad, happy and angry. Let me show you a couple of examples: this is one of the images under the surprised category; now let me show you another example over here, this is sad, so everything is okay... happy, we have a person that's smiling, so everything's okay; and then angry... this is the black image we generated, so we are going to remove this image later on, and this is a picture of a person that's angry, so everything looks okay. So the only thing we need to do now is iterate for something like 250 iterations so we generate 1,000 pictures, and then do the same process: zip the directory and download it like this. Now I'm going to scroll up and generate something like 1,000 images, and that's pretty much all. Something that's very important: if you want to download exactly the same dataset I am generating right now, it is going to be available on my Patreon to all my Patreon supporters. So this is going to be all for today; this is how you can generate an emotion recognition synthetic dataset, and this is going to be all for this tutorial. My name is Felipe, I'm a computer vision engineer, and see you in my next video.
Info
Channel: Computer vision engineer
Views: 1,065
Id: vXASIOLQGrU
Length: 35min 24sec (2124 seconds)
Published: Fri Jan 26 2024