127 - Data augmentation using keras

Captions
Hey guys, you're watching Python tutorials on my YouTube channel, Python for Microscopists. In this tutorial I'm going to talk about data augmentation using Keras.

Data augmentation is very useful if you would like to increase the amount of training or validation data. If you have 2,000 images, let's say, and you would like to get 5,000 or 10,000, then this can be very useful. But if you only have five or ten images, don't expect to get 2,000 or 20,000 images out of data augmentation and still get decent results from your deep learning, because your model will be highly biased towards the five or ten images that you're augmenting. So please use this tool wisely.

What to do if you only have a handful of images, say fewer than 100 or 200? Then I recommend using traditional machine learning. Again, watch my videos on this topic; just search for traditional machine learning on my channel. What I mean by traditional machine learning is where you extract features and then use something like random forests or support vector machines. For deep learning, data augmentation can be useful, so let's dive in and have a quick look.

I have already written a few lines of code, so I'll copy a few lines at a time and explain them. But first let me show you the folder structure that I have here. I'm going to use a couple of images. First I'm going to show you a single image to demonstrate the data augmentation process. Then, now that you have done it for a single image, I'm going to show you how to do it for a couple of images. And if you really have two or three different classes of images, then there is an easier way. I believe I created a folder there: let's say a folder for all Einstein images and a folder for all Mona Lisa images, or think of a folder each for cats, dogs, automobiles and that kind of stuff. So I'm going to show you
all three of these ways. By the way, I only have one image of each here, which proves the point anyhow. In the next tutorial I'm going to apply this to malarial cells, parasitized and uninfected cells, where we use data augmentation. I've already done a video on this but without data augmentation, and this time let's do data augmentation. So let's jump in.

First of all, let me show you how this works on a single image. Let me jump back, copy a few lines of code and explain what they mean in a second. Here you go. First of all, the data generator is part of keras.preprocessing.image. And in case you wonder, I'm using TensorFlow. Let's go ahead and check: import tensorflow as tf, and now tf.__version__, and I'm using 1.4. For Keras I'll do the same, so if things don't work on yours you can always check what version you have. So keras.__version__: I believe it is 2.0.8. Yeah, so my TensorFlow is 1.4 and Keras is 2.0.8. I know TensorFlow 2.x is out and the latest Keras updates are out, but they do not work with the GPU I have, so I'm sticking with this. My GPU is an NVIDIA Quadro K5000, which is an old one but still works very well. Of course, whatever technique I'm showing you, you don't need a GPU; I'm just showing you that the reason I have these two versions is that I want to use my GPU.

Now that that's out of the way, let's look at the libraries that we need. First of all, what we need for data augmentation is called ImageDataGenerator. Well, the class is called ImageDataGenerator; it's part of the Keras library, inside preprocessing.image. I'll share the code, so you should be able to do this. Please focus on the content here and not the individual lines of code; you can always copy them from the file that I share. I'm also
importing io from skimage (scikit-image) so we can read a single image.

First we need to create an instance of this ImageDataGenerator. It's customary to call it datagen, but you can call it whatever you want. So I'm going to do datagen = ImageDataGenerator(...), and then keep adding your arguments. Here I want to rotate my image by a random angle of up to 45 degrees; 45 is just the limit. The width shift range, in this case, shifts the image in x by up to 20% of whatever the image size is, and the height shift range does the same thing, 20%. The shear range shears the image, here set to 0.2. Zoom range is 20%: zoom in or zoom out by up to 20%. Horizontal flip equal to True just does a mirror reflection. You can also add vertical flip; you can add a whole bunch of stuff.

Now this is one of the key things that not many people talk about. If you watch other videos, they don't talk much about the fill mode; at least I didn't find much when I was learning this, so maybe there are more videos now. Anyway, fill mode works like this: when you move the image by 20%, there is some space left over in your image. What do you want to fill that with? If the fill mode is constant, you don't have to give a value; if you don't give a value, it will be black pixels. I'll show you that in a minute. If you put a value of, let's say, 125, then it will be gray pixels: a pixel with a value of 125 is gray, 0 is black and 255 is white. So this is how we are going to shift and shear our images. I created a folder called augmented, so let's save all the augmented images there.

Now how do you apply this? First of all, let me show you on a single case. I would like to read a single image, and I'm assigning it to a variable called x. How am I reading it? io.imread. Again, this is just reading a single image; you can use OpenCV, you can use whatever method to read an image, it
doesn't matter. I'm reading just the Mona Lisa, a single image, as a color image because I did not say as_gray equal to True. I hope you have worked with scikit-image; if not, go ahead and watch my video on that topic.

So what do we do next? Let's copy all of these lines so I can explain them. I didn't save this, so let's run the code up to the point where we are reading the image. If you look up here, my x is 256 by 256 by 3: a single image of size 256 by 256, and the 3 means it's a color image with red, green and blue channels.

Now I'm reshaping it, and there is a reason why. When I reshape it, my x becomes 1, 256, 256, 3. That's because in most machine learning, when you're going into convolutional layers or something else, the first dimension of the input is typically the number of images. In this case it's only one image; if you have a thousand images, that would be 1,000 by your image x and y dimensions, 256 and 256, by the number of channels. If it's a grayscale image that would be one; if it's a color image, that's three. That's why I'm reshaping it.

Down here is the way we apply this data generator, the instance that we created. The way you apply it is datagen.flow; ignore this for-loop part for now. It's .flow because this is a single image; later on I'll show you flow_from_directory, for when we want to pull in a whole bunch of images from a directory structure. So .flow, and the x is our x, which is nothing but a NumPy array of size 1 by 256 by 256 by 3. Batch size 16 means it's creating 16 images at a time; it's generating, or augmenting, 16 images. Then there is save_to_dir. In this case I'm going to save everything, but typically when you apply augmentation you apply it while you're training an algorithm. So you're not going
to save all these thousands of images. You can if you want, but normally you augment the data on the fly; that's how it happens. For now, though, I want to show you what the output looks like, so I'm going to save it to a directory called augmented. That's why I created this augmented directory. Then I'm going to put aug as a prefix on each image and save all the images as PNG; you can put JPEG if you want. And this for-loop part just keeps going until I generate 20 images; that's pretty much it. Otherwise this datagen is an infinite loop: it keeps generating images until you put a stop to it. In a real-life scenario you have the number of epochs as a stopping point, you have the batch size, you have other ways of stopping it. Right now I don't have other ways of stopping it, so I'm just setting a limit of 20.

So let's go ahead and run this. I hope things are clear right now; we'll do this one more time in a second. When I open my augmented folder, I should now see a whole bunch of images. Let's view them as extra large icons. You can see this part is filled with gray, with a value of 125. Now suppose you do not put a value here and just say fill mode equal to constant. Let me delete all of these images to make the folder empty. When you generate again it will of course randomly generate a whole bunch of images, but you see the default value for the fill is zero. I am not a big fan of this. It works okay for object detection, for example, but if you want to do, say, generative adversarial networks, you want your image to be completely filled with something realistic; otherwise your generated images will also have these black boundaries. I do not like that.

So let's try a couple more. If you try nearest, which I believe is the default if you don't give any fill mode, it's okay. But again, let's
go back to large icons and delete everything so we can see what the output of nearest looks like. Let's open this. As you can see, nearest looks at the nearest pixels and stretches them. Let's keep going through these; it stretches them. This is acceptable for most applications, but again not my favorite: if you are trying to do pixel segmentation that's okay, but if you are trying to do generative adversarial networks, that's not good.

What actually works best is reflect. This is my personal favorite for generative adversarial networks; for other stuff, again, it depends on the application. As the name suggests, reflect just reflects this part. Look at these images. You see that weird shape? This is where it's reflecting, and this is where it's reflecting, so you can see how it's actually reflecting. It looks like a real image; the entire image looks pretty realistic. That's why I like this.

And then there is wrap. Sorry, wrap, not warp. Like nearest, it's a different way of filling: wrap actually wraps the image around. As you can see, the head, instead of being up there, comes back down here. That's just what happens. This works very well for nuclei images or other types of images, not just these types of paintings, so in real life wrap may also work very well. But like I said, my personal favorite for most applications tends to be reflect. So let's go back to reflect.

Let me show you something else. So far there was only one image. What if you want to augment a whole bunch of images that are in your folder? Let's say you have a folder with 1,000 different Mona Lisa images and you would like to augment those. I'll show you a couple of ways to do that. What I would recommend is this; let's go back so I can show you. Again, this
is basically reading an image, or images, out of a directory. Let me remove this scikit-image import; I think I already imported it. I'm importing numpy, and os so I can get the file names from a directory. Please watch my previous videos on this topic, or there are a whole bunch of other videos on YouTube, so go ahead and search for how to use os. Then I'm using PIL to work with the image; you can also use scikit-image if you want. I'm just showing you a different way.

Now I'm defining my image directory as test_folder. In my test folder, let's say we have a thousand images; in this case it's two images, but it doesn't matter how many. I define my target size as 128 because, no matter what size your input images are, I'd like to resize them all to the same size so they work smoothly with our next step, which would be convolutional neural networks or U-Net or whatever you would like to do with that data. I started an empty list called dataset, and I'm going to fill it with the information that I get from each image.

In the next step I'm using os.listdir to look inside my image directory, and then I enumerate it using a for loop. When I use a for loop, I am looking at each image name. For each image I split the name to check the extension; in this case it's jpg, which is why I'm doing that. If it's PNG, go ahead and type png here. Then I use scikit-image's io.imread on the image directory plus my file name, read the image as RGB, and resize it to 128 by 128. Finally, after this, what I get is a dataset.

Let's run this so we can see. This part we already ran, but that's okay; I'm going to run everything up to this point. Now if you look at my dataset, it's a list of two because I have two images, and each of these is a separate image.
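The directory-loading loop described here can be sketched roughly as follows. This is a minimal sketch under assumptions, not the code shared with the video: the helper names list_image_files and load_images are hypothetical, the 128 target size and the jpg extension follow the narration, and Pillow and NumPy are assumed to be installed (they are imported only when load_images is called). It reads files with Pillow alone rather than combining scikit-image and PIL as the narration does.

```python
import os


def list_image_files(image_directory, extension="jpg"):
    """Return the sorted file names in image_directory that end with the given extension."""
    suffix = "." + extension.lower()
    return sorted(
        name for name in os.listdir(image_directory)
        if name.lower().endswith(suffix)
    )


def load_images(image_directory, size=128, extension="jpg"):
    """Read every matching image as RGB, resize it to size x size, and stack the results.

    Assumes Pillow and NumPy are installed; they are imported lazily so that
    list_image_files can be used without them.
    """
    import numpy as np
    from PIL import Image

    dataset = []
    for name in list_image_files(image_directory, extension):
        with Image.open(os.path.join(image_directory, name)) as image:
            resized = image.convert("RGB").resize((size, size))
            dataset.append(np.array(resized))
    # For a non-empty folder the result has shape (number_of_images, size, size, 3),
    # the layout that datagen.flow takes as input.
    return np.array(dataset)
```

With two JPEG files in a folder named test_folder, load_images("test_folder") would give an array of shape (2, 128, 128, 3). Matching on the file-name ending rather than splitting at the first dot is a deliberate choice here, so names that contain extra dots are still picked up.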
So I do not want a list; I want a NumPy array. That's why I added this line, x = np.array(dataset), and now my x has a size of 2 by 128 by 128 by 3. With only one image, that would be 1 by 128 by 128 by 3, and if we have 1,000 images, that would be 1,000 by 128 by 128 by 3.

So we are all set to apply the augmentation. Let's create some room here and apply pretty much the same thing. I shouldn't have deleted that part, but let's do that. This is exactly the same as before, with a batch size of 16, and I'm going to stop when we get 20 images. Do we have any images in our augmented folder? If so, let's delete them. No, nothing. So let's run this, and now we should see augmented images of both Einstein and Mona Lisa. Let's get in, and there you go: I have my Einstein and Mona Lisa. Think of this as having thousands of images.

Now, if you have two different classes, like an Einstein class and a Mona Lisa class, this is not the best way to do it; if you have a single class, this is a great way to do it. So what do you do if you have a multi-class problem that you're trying to solve? Let's delete everything. I'm not going to delete this datagen, because we are using the same data generator to augment our images. So the next step, finally, is multi-class, which is probably what you would be using for the most part.

Let's delete all of these. I'm going to look into a folder called monalisa_einstein, and here I put each image in a subfolder according to its class, because that's how datagen.flow_from_directory is going to read the images. If you have a single class, that's okay; use a single subfolder. But if you have five or six different classes, cats, dogs, bicycles, airplanes, buses, then put all those images in the corresponding folders. And then, instead of datagen.flow, remember, previously we used datagen.flow, I'm not sure
if you remember that part, so let's go ahead and zoom in. Previously, for single images, or when we were inputting a predefined array, our x was 2 by 128 by 128 by 3. We first generated this x and then supplied it as input. If you're going to do that, use datagen.flow, because you're doing all the preprocessing manually. But if you have these folders, the way you do it is datagen.flow_from_directory, and then, as you can imagine, you have to give your directory name.

So let's delete everything. Let me delete all of my variables and clean the slate so we can look at just these few lines of code. First you define datagen, and then flow_from_directory; that's it. Which directory? monalisa_einstein. Batch size 16, which means it does 16 images for each subfolder. Target size is 256 by 256; you can do 128 by 128. So you are resizing these images on the fly; you don't have to do it beforehand like I showed you earlier. Then color mode: let's read them as RGB, so it creates three channels. And save to our augmented directory. The rest of it is pretty much the same. Now I said if i is greater than 31, so let's do 32 images, I guess, in this case. Let's go ahead and run it. It says found two images belonging to two classes, and now it's augmenting. Let's open our augmented folder, and there you go: all of these augmented images, right here.

So I hope you learned something new as part of this tutorial. In the next one I'm going to use this flow_from_directory with a convolutional neural network to do infected versus healthy malarial cell segmentation, so please stay tuned for that. Thank you very much, and don't forget to subscribe to this channel so I don't have to remind you on Twitter or other places every time I create a new video. Thank you.
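The flow_from_directory steps walked through above can be sketched as follows. This is a minimal sketch under assumptions, not the code shared with the video: the function names make_directory_flow and take_batches are hypothetical, the exact spelling of the folder names monalisa_einstein and augmented is assumed, and the argument values follow the narration. It also assumes a TensorFlow version that still provides tf.keras.preprocessing.image.ImageDataGenerator, which newer releases mark as deprecated.

```python
from itertools import islice


def take_batches(batch_iterator, max_batches):
    """Return an iterator over at most max_batches items of a never-ending batch iterator."""
    return islice(batch_iterator, max_batches)


def make_directory_flow(directory="monalisa_einstein", save_dir="augmented"):
    """Build an augmenting iterator that reads one subfolder per class.

    TensorFlow is imported lazily, so take_batches works without it.
    save_dir must already exist; augmented copies are written there.
    """
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(
        rotation_range=45,        # random rotation, limited to 45 degrees
        width_shift_range=0.2,    # horizontal shift, fraction of image width
        height_shift_range=0.2,   # vertical shift, fraction of image height
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode="reflect",      # the speaker's preferred fill mode
    )
    return datagen.flow_from_directory(
        directory,
        batch_size=16,
        target_size=(256, 256),   # images are resized on the fly
        color_mode="rgb",
        save_to_dir=save_dir,
        save_prefix="aug",
        save_format="png",
    )
```

Looping over take_batches(make_directory_flow(), 32) stops after 32 batches, which mirrors the "i greater than 31" check in the narration. With the default class mode, each item is a pair of an image batch and its one-hot class labels.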
Info
Channel: DigitalSreeni
Views: 21,456
Keywords: microscopy, python, image processing
Id: ccdssX4rIh8
Length: 20min 10sec (1210 seconds)
Published: Tue May 26 2020