Creating an Emotion Recognition Synthetic Dataset with Python & Stable Diffusion | Image generation

Captions
So in today's tutorial we are going to use an amazing Python library called diffusers to create an emotion recognition synthetic dataset. We are going to use Stable Diffusion to generate hundreds or thousands of images, and in a future tutorial I'm going to show you how to use this dataset to train an emotion recognition model. This project is going to be amazing: we're going to get more familiar with diffusers, the library from Hugging Face, with image generation and Stable Diffusion, and most importantly with synthetic datasets and how to create one. Today's tutorial is not for beginners; creating a synthetic dataset is a very advanced skill, and it's definitely going to set you apart from many other developers and machine learning engineers. So let me show you how we are going to create this synthetic dataset. This is a notebook I created in Google Colab, and this is where we will be working today. I'm working in Google Colab only because we get a free GPU and because it makes it easier for you to reproduce all these results, but you can also work on your local computer if you prefer, in something like PyCharm or Visual Studio, and it's going to be pretty much the same; work wherever you want, but in my case I'm going to show you how to do it in Google Colab. We are going to execute these two cells first. This is where we install all the requirements we need for this project: diffusers, which is the library over here, and transformers, which is another library we need for this tutorial. Then I'm also going to execute this cell over here, which is where we create the object we are going to use to do all the image generation; this is how we use Stable Diffusion through this library. So the only thing I'm going to do is execute these two cells first, and then I'm going to show you how to do all the image generation...
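The captions don't reproduce the notebook cells themselves, so here is a minimal sketch of what that setup typically looks like with diffusers; the checkpoint id `runwayml/stable-diffusion-v1-5` and the variable name `pipeline` are assumptions, not something confirmed in the video.

```python
# Install the two libraries mentioned in the video (run once in the Colab notebook):
# !pip install diffusers transformers accelerate

import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint and move it to the Colab GPU.
# The exact checkpoint used in the video is not shown; v1-5 is a common default.
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipeline = pipeline.to("cuda")
```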
Okay, now it's ready, and if I scroll up let me show you something: you can see that we downloaded many models and many files once we executed these two instructions, and these models are needed in order to use Stable Diffusion and the diffusers library. So please mind that once you execute these instructions you are going to download many files, and some of them are huge; this one over here is 3.4 GB and this one over here is 1.2 GB. If you are working on your local computer, keep in mind that these instructions will download many large files. Before we continue, remember that we are going to work in Google Colab with a GPU. To make sure you are working on an instance with a GPU, click here and then 'Change runtime type', and you can see it says T4 GPU; please make sure this option is selected, otherwise you will be executing everything on a CPU and everything is going to take much longer, which doesn't make any sense whatsoever, so please make sure you are working on a GPU. Now let's continue, and let me show you how to do the image generation. You can see these are only a few instructions and this is everything we need to generate images. I'm importing matplotlib, which is the library we are going to use to visualize the images we generate, and I have defined two variables: one is called prompt and the other one is called negative prompt. Then the only thing I'm doing is calling the object we defined over here, which is called pipeline, and passing the prompt and the negative prompt as arguments into this function. That's all it takes to do image generation with diffusers; it's very simple and very straightforward. Now let me show you exactly what these two variables mean. The prompt is where we specify what object or image we want to generate, and the negative prompt is very important because it's where we specify everything we don't want to generate; Stable Diffusion is going to try to stay away from everything that's in there. This is a very standard negative prompt: if you search for examples on Google you will see this is how they usually look. In this case, remember, we are going to generate human faces, we are going to generate people.
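The cell itself isn't reproduced in the captions; a minimal sketch of the generation call described here, reusing the `pipeline` object from the setup sketch above, might look like this. The prompt shown is the squirrel test used a bit later in the video, and the short negative prompt is just a stand-in for the fuller one discussed next.

```python
import matplotlib.pyplot as plt

prompt = "a picture of a squirrel"
negative_prompt = "cartoon, anime, sketch"  # the full negative prompt is covered below

# The pipeline returns a result object whose .images attribute is a list of PIL images.
image = pipeline(prompt, negative_prompt=negative_prompt).images[0]

plt.imshow(image)
plt.axis("off")
plt.show()
```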
We want these faces, these people, to look as realistic as possible, obviously, otherwise this synthetic dataset doesn't make any sense whatsoever, because once we train a model to recognize different emotions we are going to apply it to real people. So the faces we generate now have to look as realistic as possible, otherwise nothing is going to work. We're going to tell Stable Diffusion to stay away from everything that looks like a cartoon, an anime or a sketch, and this is going to help us achieve much more realistic results. We are also going to try to generate images with the highest resolution and the highest quality possible, so we tell Stable Diffusion to stay away from everything that's low quality, low resolution and so on. We want to generate color images, so we also make Stable Diffusion stay away from everything that's grayscale or black and white. And something that is usually a problem with person generation, with face generation, is that sometimes the results look a little disfigured; the people look a bit disfigured or deformed or something like that. So we explicitly tell Stable Diffusion to generate results which are very far away from these words over here, disfigured and deformed, and this helps us generate pictures of people who do not look disfigured or deformed. This is very important. So this is a very classical negative prompt for person generation, and it's the one we are going to use. Now, to move one step at a time, let me show you how it works. I'm going to input a prompt which says something like... let's take it one step at a time, so I'm going to say something like 'a picture of a squirrel'. We are going to generate people, human faces, later on, but for now I'm going to generate a picture of a squirrel, and this is going to be my negative prompt, and let's see what happens... oh, I see I have this character over here which is not correct... okay, so this is going to take some time; the image is being generated, and then we are going to see how it looks. This is very important: you can see it takes 9 seconds to generate this image. This is the result, and it is definitely a picture of a squirrel; it doesn't look like a cartoon, an anime, a sketch or anything like that, and it's a very high resolution image. Now let's do something more similar to the images we are going to generate in this tutorial, which is a human face... yeah, let's say something like this. Okay, this is the result; you can see that the eyes look a little disfigured, a little strange, but other than that I would say this is definitely a human face, so everything is okay.
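The exact negative prompt used in the video is not reproduced in the captions; a sketch of this step, with the "classical" negative-prompt wording reconstructed from the description above (so treat the exact terms as an assumption), might look like this:

```python
# A typical negative prompt for realistic person generation, assembled from the
# elements described above: no cartoon/anime/sketch styles, no low-quality or
# low-resolution output, no grayscale images, no disfigured or deformed faces.
negative_prompt = (
    "cartoon, anime, sketch, drawing, painting, "
    "low quality, low resolution, blurry, "
    "grayscale, black and white, monochrome, "
    "disfigured, deformed, bad anatomy"
)

# Quick sanity check with a simple face prompt before refining the wording.
image = pipeline("a picture of a human face", negative_prompt=negative_prompt).images[0]
plt.imshow(image)
plt.axis("off")
plt.show()
```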
Now let me show you the prompt we are going to use to generate our results; this prompt is going to help us achieve the best quality results as well. You can see... I'm going to try to make it look better... something like this... okay, something like that, and maybe I can do it even better; I can take these two words over here, and this is going to be better. Later on I'm going to explain why we have these placeholder characters over here, but for now the only thing I'm going to do is replace them with 'a man smiling'. So it says 'medium shot portrait of a man smiling, front view, looking at the camera, color photography', and then 'photorealistic, hyperrealistic, realistic' and so on. We are going to use this prompt to help Stable Diffusion make images which look realistic, hyperrealistic, with many, many details; we want these pictures to look as realistic as possible, and we are telling Stable Diffusion to create an image of a man smiling, with a front view, looking at the camera, and so on. This is also going to help Stable Diffusion make color images; ideally we want color images and we want to avoid anything that looks grayscale. So let's see what happens if we use this prompt. Okay, this is the result: we have generated a picture of a man smiling, looking at the camera with a very frontal view, and it's a color image, so everything is perfect. Now let's do something else: I'm going to repeat the same process 10 times... for j in range(10)... using the same prompt and the same negative prompt every time, and we are just going to plot all the images so we can look at many different examples (see the sketch after this paragraph). And these are the results: you can see that we are generating images which definitely look very realistic. Most of these pictures, if not all of them, are looking at the camera in a frontal view; in most cases, anyway: this one is not really frontal, this one over here is not frontal either, this one is a little on the side, but they are pretty good. Now, something you are going to notice when generating your images is that in some cases, although we specify exactly what we are looking for in the prompt and tell Stable Diffusion exactly what to stay away from, we are going to get results which don't really match what we specified. The prompt tells Stable Diffusion to try to generate something like this, but in some cases it's not going to be possible, and we will get some images which are not really frontal, or some images which are not really color but grayscale; we are going to have some situations which are not exactly what we specified, but in most cases we are going to get very meaningful results...
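Here is a sketch of the refined prompt and the ten-image loop described above, reusing `pipeline`, `negative_prompt` and `plt` from the earlier sketches; the prompt string is paraphrased from what the speaker reads out, so treat the exact wording as an assumption.

```python
prompt = ("medium shot portrait of a man smiling, front view, looking at the camera, "
          "color photography, photorealistic, hyperrealistic, realistic, highly detailed")

# Generate and display ten samples with the same prompt and negative prompt,
# to get a feel for the variety of faces the model produces.
for j in range(10):
    image = pipeline(prompt, negative_prompt=negative_prompt).images[0]
    plt.imshow(image)
    plt.axis("off")
    plt.show()
```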
In most cases we are going to get very good results. Now, something I noticed looking at these images is that I think Stable Diffusion is taking this parameter we have over here, 'color photography', as a parameter to specify the skin tone of the person being generated, because if you look at the results, most of them are of a person with a dark skin tone. So we are going to do something. Remember, we are going to generate an emotion recognition dataset comprised of many images of people, of human faces, and we want to make this dataset as diverse as possible. So we are going to explicitly tell Stable Diffusion the ethnicity we want in each picture, we are going to generate pictures of people from many different ethnicities, and we are going to generate the same amount for each of these ethnicities, so we make sure this is a very balanced and very diverse dataset that we can use to train a model. Just to show you how it works, I'm going to say 'medium shot portrait of a white man smiling, front view' and everything else; now I am explicitly telling Stable Diffusion to give me pictures of a given ethnicity, and let's see what happens. These are the results: you can see that now we are generating images of people with the exact ethnicity we asked for. And this is what I meant before: although we are telling Stable Diffusion that we want color images and that it should stay away from everything that looks like a grayscale image, every once in a while we are going to get images like this one; I'm not sure if it's actually grayscale or just looks like it, but it's not the type of image we are looking for. Nevertheless, we will generate images like this every once in a while. Other than that, all the other pictures look pretty much perfect. This one is not really a frontal view, he's looking to the side, but other than that everything is okay; we have another grayscale image over here, and this one is not looking at the camera either, but most of the pictures are okay. So this is exactly what we will be doing, this is exactly how we are going to generate all these pictures. In order to keep this dataset as diverse as possible, with people of many different ethnicities, many different features, many different everything, we are going to explicitly tell Stable Diffusion which ethnicities we want to generate, and we also want to tell it the gender: we want many pictures of men and many pictures of women. So this is how we are going to do it: I'm going to create another variable, which is ethnicities... and this is going to be something like this, let's start with latino... then white... black... middle eastern...
indian and asian. So these are the six ethnicities we are going to use in this example to show you how to create this synthetic dataset. Now let's continue: I'm going to create another list, which is genders, and this is going to be male... and... female... okay, so male and female, and we also have these ethnicities, so everything is okay. Now I am going to replace this value where it says 'white'... maybe I can do something like this... a latino, a white, a black, an indian, an asian... so I can just replace these two words, and then I am going to call format... and I'm going to do it over here, I'm going to import random, because for every single iteration we are going to draw one of these ethnicities and one of these genders at random. So ethnicity equals random.choice(ethnicities), and gender equals random.choice(genders). Then I'm going to put this value over here, and I need to do the same over here (a sketch of this step appears after this paragraph). Okay, now the results are going to be more random; now we're going to generate pictures of people of many different ethnicities and many different genders, or actually two different genders. Let me show you how the results look in this case. And these are the results: in this case we're generating the picture of a woman, so everything is okay; then we generate the picture of another woman, everything's okay; and then we generate a picture of something which is definitely not a person, so let's see what happens over here, let's see if we get another result like this... this is another picture of a person, a person, a person, a person, a person, and also a person. So this is the only strange result we have, and I'm not sure why; my thought is that, since we are only specifying the gender as male or female, we are not really telling Stable Diffusion explicitly to generate people, so I guess in this case it has generated something which is not a person, some sort of animal or something like that. So let's make sure we are generating people; maybe we can say something like 'man' and 'woman'. You can see that this is not a set-and-forget process; this is something you have to stay on top of to make sure everything works as expected. Okay, now we have another situation over here, and if I scroll down we generated a couple of animals over here; I'm not sure what could be going on, but I am going to do something: I'm going to explicitly tell Stable Diffusion to generate pictures of people, so 'medium shot portrait of a person'... I'm going to do it like this... and I'm going to remove the 'a' over here... or maybe I can do it like this... Okay, now we are telling Stable Diffusion to generate pictures of people, and let's see if this fixes the issues we are having. Okay, now we are making some progress, because we are generating pictures of people, we are not generating any animals whatsoever, all of these pictures are of people. But I do notice that this randomness is not really that random, because we are generating pictures of women in all cases, and the ethnicity is not really that diverse either. So let's do something: I'm going to seed this random generator, I'm going to say random.seed(50), and I'm also going to print... the ethnicity... and the gender we are using for each image.
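Putting the pieces from this step together, a sketch of the randomized prompt might look like the following, again reusing `pipeline`, `negative_prompt` and `plt` from the earlier sketches; the exact template wording and the `{}` placeholders (the characters the speaker mentions replacing) are assumptions.

```python
import random

ethnicities = ["latino", "white", "black", "middle eastern", "indian", "asian"]
genders = ["male", "female"]

prompt_template = ("medium shot portrait of a {} {} person smiling, front view, "
                   "looking at the camera, color photography, photorealistic, "
                   "hyperrealistic, realistic, highly detailed")

for j in range(10):
    # Draw a random ethnicity and gender for every image so the dataset stays diverse.
    ethnicity = random.choice(ethnicities)
    gender = random.choice(genders)
    prompt = prompt_template.format(ethnicity, gender)

    image = pipeline(prompt, negative_prompt=negative_prompt).images[0]
    plt.imshow(image)
    plt.axis("off")
    plt.show()
```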
Everything's okay, right?... Yes, let's see what happens. Okay, we obviously have a mistake, because you can see that we are not generating the pictures we are looking for... we definitely have a mistake, so let me see what could be going on... Okay, I see the issue: this is actually only one string, so the format call needs to go on this one over here... okay, over there... okay, now this is going to work just fine. Now everything seems to be fine: here we are generating a middle eastern female and everything looks okay, black male, asian female, black male, indian female, white male, white female, and so on. Now everything looks the way it should, everything looks much more diverse, so everything is okay. I'm going to do something, because I notice that in these two cases we are generating the picture of an entire person, and we don't really care about the entire person, we only want a portrait. So I'm going to get back to the original version, which was this one, and I'm going to remove the seed, because we don't really need it anymore, and that's pretty much all. Now let me show you how to generate all the different emotions. Remember, this is an emotion recognition dataset, so it's very important that we generate different emotions, and the emotions we are going to generate are... happy, sad... surprised... and angry... Let's keep it simple for now and only generate these four emotions. Some of these emotions are going to be easier to generate than others; for example, if we want to generate happy pictures, the only thing we need to specify is 'smiling', and you can see that all of these pictures are smiling, these are definitely very happy people, so this is all we need to generate the happy emotion. But if we want to generate the other emotions we need to be a little more crafty, because otherwise the emotions are not going to be very realistic. Let me show you. To generate the happy emotion, saying 'smiling' is going to be enough, and for the angry emotion I think it's going to be just fine if we simply say 'angry'; let me show you, I'm going to generate three pictures, and this is what we got. I would say we are okay... I mean, yeah, these pictures are angry; maybe they are not super angry, but they are angry, so it's okay. So the happy emotion is going to be very easy, we just put 'smiling', and for the angry emotion we just say 'angry'. But in order to generate the other two emotions, surprised and sad, we are going to use these prompts we have over here. For example, for surprised we are going to say 'surprised, opened mouth, raised eyebrows'; not only do we name the emotion, we also describe how this emotion looks on a face. We do the same for the sad emotion: we generate people frowning, with a sad face expression, and crying, and that is definitely going to look sad. So what I'm going to do now is create a dictionary...
emotions is actually going to be a dictionary, and we are going to use this dictionary to hold all the prompts we are going to use; in this case it's going to be called something like emotion_prompts. So happy will be 'smiling'... sad will be this prompt over here... let's do something like this, okay... surprised will be this prompt... and then angry... will be 'angry'. Okay, so this is pretty much all, and obviously we want to create a dataset, and remember datasets always need to be balanced, or at least somewhat balanced, so we are going to generate the same amount of pictures for each one of these emotions. This is how we're going to do it: we write something like 'for emotion in emotion_prompts'... then something like this, and then format with the emotion prompt... sorry, this is going to be emotion_prompts.keys()... and then the emotion prompt is actually emotion_prompts[emotion], okay. This is going to work just fine: now, on every iteration, we generate a picture for each one of our emotions. I'm going to print the emotion, the ethnicity and the gender, and let's see what happens; I'm only going to do it for one iteration, or for two iterations. Okay, here we have happy, an asian female, and yeah, everything is okay; sad, a latino male, everything is okay, yeah, he looks sad; surprised, she looks surprised for sure; here we have angry, and we also have a man who is pretty much naked, I don't know how that happened, but it doesn't matter, we only care about the emotions, so everything's okay. Then let's continue: this one is happy, this one is sad, this one is surprised, and this one is angry, and yeah, he looks angry for sure. So these are the pictures we are generating, and we are almost there, because now the only thing we need to do is iterate something like 250 times; iterating 250 times is going to generate 1,000 images, and that's going to be pretty much enough to train the emotion recognition model we want to train in a future tutorial. So now the only thing we need to do is iterate all those times, save the pictures in this Colab, and then get the pictures onto our computer. So let's do something: I'm going to create some directories. I'm going to import os... and then call os.makedirs; this is going to be content/faces/happy, and let's pass exist_ok... something like this, I don't remember if it's... yeah, exist_ok, and this is going to be True, and we're going to create a directory for each one of our emotions, so sad... angry... and surprised. Okay, so now we're creating a directory for each one of our emotions, and then the only thing we need to do is save the pictures there. Something I haven't told you so far is that the object we got, this image we got when we executed this instruction, is a Pillow image; it's in Pillow format, it's a Pillow object, so in order to save this picture we need to say something like image.save, specify the location, and this is going to be something like... like this... formatted with the emotion, and that's pretty much all; and then, obviously, all of this happens inside the loop we are iterating over (a full sketch of this generation loop follows below).
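A sketch of the full generation loop described over the last few steps, building on the `ethnicities`, `genders`, `pipeline` and `negative_prompt` from the earlier sketches; the prompt-template wording, directory layout and filename scheme are assumptions reconstructed from the description.

```python
import os
import random

emotion_prompts = {
    "happy": "smiling",
    "sad": "frowning, sad face expression, crying",
    "surprised": "surprised, opened mouth, raised eyebrows",
    "angry": "angry",
}

# One output directory per emotion.
for emotion in emotion_prompts:
    os.makedirs(f"/content/faces/{emotion}", exist_ok=True)

prompt_template = ("medium shot portrait of a {} {} person, {}, front view, "
                   "looking at the camera, color photography, photorealistic, "
                   "hyperrealistic, realistic, highly detailed")

for j in range(250):                      # 250 iterations x 4 emotions = 1,000 images
    for emotion, emotion_prompt in emotion_prompts.items():
        ethnicity = random.choice(ethnicities)
        gender = random.choice(genders)
        prompt = prompt_template.format(ethnicity, gender, emotion_prompt)

        # The pipeline returns PIL images, so .save() writes them straight to disk;
        # filenames are the iteration index zero-padded to four digits.
        image = pipeline(prompt, negative_prompt=negative_prompt).images[0]
        image.save(f"/content/faces/{emotion}/{str(j).zfill(4)}.png")
```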
Okay, so we are going to save the pictures in this directory, and we are naming them according to the iteration we are currently in, padding the filename with zeros to four digits so all the filenames are named consistently. That's pretty much all we need to do; we don't really need to visualize the pictures anymore, and we don't really need this print anymore. Although we will eventually run this process for 250 iterations so we generate 1,000 pictures, let's run it for only two or three iterations for now, so we make sure everything works properly, and then we will run it for the full 250 iterations. Let's make sure the pictures are saved properly and also that we can get them onto our computer, because remember we are working in Google Colab, so we need to do something to get the pictures onto our computer. So let's execute it and see if it works... Okay, this is what we got; remember we are not visualizing the images anymore, so this is the only output we get now. Now please mind this message we got over here, which says that potential NSFW content was detected in one or more images and a black image will be returned instead. You are going to get a similar message every once in a while, and it means that after you generate the entire dataset you will need to curate it and remove all the black images that have been generated. Now I'm going to show you how to get all the images we generated, and then we will do the entire run: iterate 250 times and generate 1,000 images. Let me show you how to get these images first. I have a few commands over here (sketched below), and the only thing I'm going to do is copy and paste them. This command over here zips the entire 'faces' directory we created, with all the pictures within it, into a file called faces.zip, so I'm going to execute this command first. Then we mount our Google Drive, because we are going to copy all this data into our Google Drive and then just download the file from there; that's going to make downloading this data much easier. Okay, and then the only thing we need to do is execute this command over here... and you can see that it copies this file, faces.zip, into this location in my Google Drive, which is 'synthetic dataset face generation stable diffusion/faces1.zip'; actually I can call this file faces.zip. If I show you this location in my Google Drive, this is a directory I created in my Google Drive called 'synthetic dataset face generation stable diffusion', so please remember to change this value to a directory in your own Google Drive, otherwise this is not going to work. The only thing I'm going to do is press enter, and this is going to take care of copying the file. Let's see what happens...
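The exact commands aren't shown in the captions; a sketch of this zip-and-copy step in a Colab notebook might look like the following. The Drive folder name is the one the speaker mentions, written here with underscores as an assumption; adjust it to a directory that exists in your own Google Drive.

```python
# Zip the generated dataset (Colab shell command):
# !zip -r /content/faces.zip /content/faces

# Mount Google Drive so the archive can be copied out of the Colab VM.
from google.colab import drive
drive.mount('/content/drive')

# Copy the archive into your own Drive folder, then download it from drive.google.com:
# !cp /content/faces.zip "/content/drive/MyDrive/synthetic_dataset_face_generation_stable_diffusion/faces.zip"
```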
Hmm, I think I have a typo in this directory name... yeah, I have a typo: 'synthetic dataset face generation stable diffusion/faces.zip'. Let's see now... now everything's okay, and if I go back to my Google Drive and refresh, sometimes you have to wait a few minutes... okay, now I'm going to take this file and download it like this. Now I'm going to open this file, and this is where we have these four directories: surprised, sad, happy and angry. Let me show you a couple of examples: this is one of the images under the surprised category; now let me show you another example over here, this is sad, so everything is okay... happy, we have a person that's smiling, so everything's okay; and then angry... this is the black image we generated, so we are going to remove this image later on, and this is a picture of a person that's angry, so everything looks okay. So the only thing we need to do now is iterate for something like 250 iterations so we generate 1,000 pictures, and then do the same process: zip the directory and download it like this. Now I'm going to scroll up and generate something like 1,000 images, and that's pretty much all. Something that's very important: if you want to download exactly the same dataset I am generating right now, it is going to be available on my Patreon to all my Patreon supporters. So this is going to be all for today; this is how you can generate an emotion recognition synthetic dataset, and this is going to be all for this tutorial. My name is Felipe, I'm a computer vision engineer, and see you in my next video.
Info
Channel: Computer vision engineer
Views: 1,065
Id: vXASIOLQGrU
Length: 35min 24sec (2124 seconds)
Published: Fri Jan 26 2024