Albumentations Tutorial for Data Augmentation (PyTorch focused)

Video Statistics and Information

Captions
Hope you're all doing freakishly awesome! In this video we'll be taking a look at the Albumentations library. For those not familiar with it, Albumentations is a Python library, and a pretty awesome one at that, specifically for data augmentation. It can be used with deep learning frameworks like PyTorch and TensorFlow, and in this video I want to look at how we can use it for different tasks: classification, segmentation, and detection. In the final part of the video I'll show you a full example of how to use it with PyTorch.

From a PyTorch perspective, why should we use it? There are three benefits, I would say. First, it is faster than torchvision on every benchmark. Second, which I guess is also the most important one, it has support for more tasks like segmentation and detection, which are a lot harder to do in plain torchvision. And I don't remember the third point, so let's just move on with the video.

To start off, I'm just going to show the images we're working with: an image of a cat, an image of Elon, a corresponding mask for that image, a second mask for the same image, and a cats-and-dogs dataset that we'll look at later in the video.

First, let's create a file, and the first thing we'll look at is how to do this for classification. Starting with the imports: we import cv2, we import albumentations as A (that's how you normally do it), and we import numpy. From utils we import plot_examples, which is just a helper file I created beforehand; it's not essential to the transformations themselves, just for plotting so we can see that they actually work. All the code is going to be on GitHub, so you can check there for the details. Finally, from PIL we import Image.

First, let's load the image of Elon. The official tutorials and examples normally use cv2 for this, but I'm more used to PIL, so that's what I'll use: image = Image.open("images/elon.jpeg"). The transforms are pretty similar to PyTorch: we do transform = A.Compose([...]), and inside the list we specify the transformations that will be applied sequentially. First we might want to resize the image, so we add A.Resize and specify a width, maybe 1920, and a height, 1080. Perhaps we then want a random crop of that resized image, so we add A.RandomCrop. By the way, there are so many different transformations you can apply; I'm just showing a couple of them and the structure of it, and you can go to their website for more augmentations. For the random crop, let's say we want a width of 1280 and a height of 720.
Perhaps we can also do a common one like rotation: A.Rotate, where we can specify a limit on how much it should be rotated, let's say 40 degrees. To these transforms we can also pass a probability p, for example p=0.9, which means the transformation is applied in 90 percent of the cases; in the other 10 percent it just skips that one and moves on to the next in line. Some other common ones are A.HorizontalFlip with some probability (as you can see, it's very similar to PyTorch, where this would be RandomHorizontalFlip; the structure is very similar), and A.VerticalFlip with maybe a lower probability, say 10 percent. We can also do A.RGBShift, where we specify a shift for each channel to change the coloring; the probability defaults to 0.5, I think, but let's set it to 0.9.

And let's say you want to choose one of several augmentations: you can use A.OneOf and specify the ones you want to consider. Say we want to consider a blur, A.Blur with a blur_limit (I'm not going through how each of these works; you can read the documentation for that), and an A.ColorJitter. How this works is that OneOf picks one of the transforms inside it, using each transform's p as a relative weight, so with 0.5 and 0.5 the blur and the color jitter are equally likely; whether anything is applied at all is governed by the OneOf probability, so with p=1.0 one of the two is applied in 100 percent of the cases. You can build pretty complicated augmentation trees this way, so it's quite general.

Let's just say we have those transforms as an example. What I'll do now is create an images_list and first put in the image we opened at the beginning. Then, for i in range(15), so that we show 15 examples: first we have to convert the image we opened with PIL to a numpy array, so image = np.array(image). Then, to apply the transformations, all we do is augmentations = transform(image=image). This returns a dictionary, and from that dictionary we have to get the image, so augmented_img = augmentations["image"]. Then we do images_list.append(augmented_img), and in the end we just plot those examples with plot_examples(images_list).

If we now run this, we can see some examples of how it looks. Something is changing the coloring (I don't know if it's the ColorJitter or the RGBShift), there's obviously a rotation involved, there's a horizontal flip, there's a vertical flip, so it seems to be working pretty nicely.
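Putting the classification walkthrough together, here is a minimal sketch of the pipeline as described above; the shift limits on RGBShift and the blur_limit value are illustrative assumptions, and plot_examples is the helper from the video's GitHub repo:

```python
import albumentations as A
import numpy as np
from PIL import Image

from utils import plot_examples  # helper from the video's repo, assumed available

transform = A.Compose(
    [
        A.Resize(width=1920, height=1080),
        A.RandomCrop(width=1280, height=720),
        A.Rotate(limit=40, p=0.9),
        A.HorizontalFlip(p=0.5),
        A.VerticalFlip(p=0.1),
        A.RGBShift(r_shift_limit=25, g_shift_limit=25, b_shift_limit=25, p=0.9),
        A.OneOf(
            [
                A.Blur(blur_limit=3, p=0.5),
                A.ColorJitter(p=0.5),
            ],
            p=1.0,  # one of the two above is applied on every call
        ),
    ]
)

image = np.array(Image.open("images/elon.jpeg"))  # PIL image -> numpy array
images_list = [image]
for i in range(15):
    augmentations = transform(image=image)      # returns a dict
    images_list.append(augmentations["image"])  # pull the image out by key
plot_examples(images_list)
```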
One thing I think could perhaps be problematic: when it's performing a rotation, you can see that outside the rotated image it fills in a reflection of the image, which could be an issue. One thing you can do there is specify border_mode=cv2.BORDER_CONSTANT for the rotation; there are a lot of different options here, but if we rerun it we can see that outside the rotation there's just a constant dark value instead. And that's how you do it for classification.

Let's move on and create one for segmentation. I'm sort of assuming you're here because you want to use this with PyTorch, and in PyTorch, when you create your dataset, you have your __getitem__, and all you would change is that when you call self.transform on your image, you have to take the example out of the dictionary that's returned. So there's really not much to change to generalize this, and we'll see a full example in the last part of the video. Let's copy-paste our code, because it's going to be quite similar for all of these examples.

For segmentation we obviously need to open our mask, so mask = Image.open("images/mask.jpeg"); I just called the file "mask". We also have to make it a numpy array, mask = np.array(mask). I've seen people use np.asarray(mask) and some other variants as well, and I'm not really sure if there's a performance difference, but I doubt it matters much. Then all we have to change is to also send in that mask, transform(image=image, mask=mask), pretty simple I would say. And to get the mask back, you guessed it: from the returned dictionary we now also take out the "mask" key, augmented_mask = augmentations["mask"], and add it to our image list as well.

If we run this code and look at a couple of examples: here a crop has been applied, and the corresponding crop has also been applied to the segmentation target; similarly, here there's a rotation, and the segmentation mask has been rotated with it. That's really nice, that's really cool, and all we had to change is really two lines, which if you ask me is pretty amazing.
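Here is a sketch of the segmentation variant, assuming the mask file lives next to the Elon image; the only real changes from the classification sketch are the mask=... argument and the extra "mask" key in the result:

```python
import cv2
import albumentations as A
import numpy as np
from PIL import Image

from utils import plot_examples  # helper from the video's repo, assumed available

transform = A.Compose(
    [
        A.Resize(width=1920, height=1080),
        A.RandomCrop(width=1280, height=720),
        A.Rotate(limit=40, p=0.9, border_mode=cv2.BORDER_CONSTANT),
        A.HorizontalFlip(p=0.5),
        A.VerticalFlip(p=0.1),
    ]
)

image = np.array(Image.open("images/elon.jpeg"))
mask = np.array(Image.open("images/mask.jpeg"))  # assumed filename

images_list = [image]
for i in range(4):
    augmentations = transform(image=image, mask=mask)
    images_list.append(augmentations["image"])
    images_list.append(augmentations["mask"])  # same spatial transforms as the image
plot_examples(images_list)
```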
Let's now create one for detection. Or actually, one more thing I want to show you first: you can also have multiple masks for a single image. Let's say mask2 = Image.open("images/second_mask.jpeg"), and of course we also have to convert that one to a numpy array. Then, instead of passing mask=mask, we change it to masks=[mask, mask2], and that's pretty much it. Now we should also obtain the second mask, append it to the list as well, and maybe change the number of examples to four, and rerun. Actually, what we have to do here is augmented_masks = augmentations["masks"], using "masks" as the key, and then when adding them to the list we take augmented_masks[0] for the first mask and augmented_masks[1] for the second. Hopefully this works now. And as you can see: at the top left is the original image, here's the first rotated example (there might be some cropping involved there too), and so on, and the same transformations have been applied to both masks. That is awesome; pretty darn cool library.

Now, for an example of detection, before we move on to a full example in PyTorch, let's paste in that code again. This time I'll show you how to load images with cv2, if you want to do it that way: image = cv2.imread("images/cat.jpg"), choosing the cat example this time. One thing you also have to do is image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB); from what I understand, for historical reasons OpenCV loads images as BGR instead of RGB, and RGB is the format we want, so we just add that line.

Now that we have the image, we also have a corresponding bounding box. In this case I'll just put it in manually, but in general you have one list per bounding box in the image, so you send in a list of lists, where each inner list is one bounding box. Here it's quite important to know the format your bounding boxes are in. For those familiar with object detection, there are a few different formats: Pascal VOC is [x_min, y_min, x_max, y_max], the top-left corner and the bottom-right corner; YOLO is different, with a normalized center point plus the width and the height; and COCO is different again, [x_min, y_min, width, height]. You just have to know which one you're using, and in this case it's Pascal VOC.

Why is that important? Here we use the transforms as before, but now we also pass bbox_params to A.Compose, which is just A.BboxParams, and there we specify the format. This is where knowing the format matters: we specify format="pascal_voc"; if your boxes are in YOLO format you'd write "yolo", and for COCO you'd write "coco" (there will be a link to the documentation for this). One more thing we have to do is specify label_fields, which we'll just set to an empty list; I'll talk a little more about that soon. We also remove the mask code, since we don't have masks here, and we don't have to convert the image to a numpy array this time, because we loaded it with OpenCV.
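A sketch of that detection setup; the cat's box coordinates are made up for illustration, since the video doesn't spell them out:

```python
import cv2
import albumentations as A

# OpenCV loads images as BGR for historical reasons; convert to RGB
image = cv2.imread("images/cat.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# One inner list per box; pascal_voc format is [x_min, y_min, x_max, y_max]
bboxes = [[13, 170, 224, 410]]  # hypothetical coordinates

transform = A.Compose(
    [
        A.Resize(width=1920, height=1080),
        A.RandomCrop(width=1280, height=720),
        A.Rotate(limit=40, p=0.9, border_mode=cv2.BORDER_CONSTANT),
        A.HorizontalFlip(p=0.5),
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=[]),
)
```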
One more thing we have to do is save the bounding boxes: saved_bboxes = [bboxes[0]], so we keep the original one (those four values) as well. Then let's see what we have to change: we remove the mask, since we don't have one anymore, and in the transform call we just specify bboxes=bboxes, pretty easy. We're still appending the augmented image, and we also do saved_bboxes.append(augmentations["bboxes"][0]). It returns the boxes as a list of lists, and I just want the single list in this case, so I index [0]; if you had multiple boxes you would keep them all, but we just have one, and this is pretty much just for the visualization we get when we call plot_examples. So I send saved_bboxes into that function in the utils file. Let's run this example, and let's show a couple more examples, maybe 15.

Now we can see a couple more examples, and the bounding boxes are obviously following where the cat is located. One thing you might notice is that when we perform rotations, the bounding boxes become inflated, but that's a common issue; it's not an error, it's just expected. Another thing is that with random cropping, sometimes the cat is not in the image anymore, because that part was cropped out. So one argument we can also send to BboxParams is min_area: if we specify min_area=2048, a bounding box must cover at least 2048 pixels of area in the output image, or it is dropped. There's another argument, min_visibility, which is the minimum fraction of the box's original area that must remain visible after the transform for the box to be kept; so min_visibility=0.3 means at least 30 percent of the box has to survive.

If we use those arguments and rerun, we get an error, and that's because some bounding boxes are now dropped. Specifically, we can check: if the length of augmentations["bboxes"] is zero, just continue, because that means the bounding box was dropped for being too small or mostly cropped out. With that check we see these examples, which look pretty good. If you were to code just the rotation part with the bounding box yourself, it would honestly be tricky and would probably take a lot of time, so having this kind of library is super useful.
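Extending that sketch with the filtering arguments and the empty-box check just described (box coordinates still hypothetical, and plot_examples is assumed to accept an optional list of boxes, as in the video's utils file):

```python
import cv2
import albumentations as A

from utils import plot_examples  # assumed signature: plot_examples(images, bboxes=None)

image = cv2.imread("images/cat.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
bboxes = [[13, 170, 224, 410]]  # hypothetical pascal_voc box

transform = A.Compose(
    [
        A.Resize(width=1920, height=1080),
        A.RandomCrop(width=1280, height=720),
        A.Rotate(limit=40, p=0.9, border_mode=cv2.BORDER_CONSTANT),
        A.HorizontalFlip(p=0.5),
    ],
    bbox_params=A.BboxParams(
        format="pascal_voc",
        min_area=2048,       # drop boxes smaller than 2048 px^2 after augmentation
        min_visibility=0.3,  # drop boxes with less than 30% of their area left
        label_fields=[],
    ),
)

images_list = [image]
saved_bboxes = [bboxes[0]]
for i in range(15):
    augmentations = transform(image=image, bboxes=bboxes)
    if len(augmentations["bboxes"]) == 0:
        continue  # the box was dropped by min_area / min_visibility
    images_list.append(augmentations["image"])
    saved_bboxes.append(augmentations["bboxes"][0])
plot_examples(images_list, saved_bboxes)
```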
Let's create another example; let's call it the full PyTorch example. These are the imports we're going to use: torch, numpy, opencv, PIL, and albumentations, from which we also import ToTensorV2 (from albumentations.pytorch), plus os. For this more complete example we have a cats-and-dogs dataset, so we have a couple of images of cats and a couple of images of dogs, and I just want to show you a complete example of how we create a dataset class and how we can create the DataLoader.

Let's create a class and call it ImageFolder. For those of you familiar with ImageFolder in PyTorch, we're basically going to replicate that one, or create our own version of it. I guess you could pass transform in as a function in some way and just use the ImageFolder built into torchvision, but let's create it ourselves, who cares. In __init__ we take self, a root directory, and transform=None as default, and we call super().__init__(). We set self.data to an empty list, and my idea here is that we go through the subdirectories for cats and dogs, and self.data holds pairs of the image file name, maybe "cat0.jpg", and the label that goes with it. That label would be 0 for the cats, because alphabetically it's the first folder, so index zero, and dog images like "dog0.jpg" would get the label 1. Of course we want to do this generally, but that's the idea for the data. So we set self.root_dir = root_dir and self.transform = transform, and then self.class_names = os.listdir(root_dir), which lists all of the subfolders in the root directory, cat_dogs.

Then, for index, name in enumerate(self.class_names), we first find all of the files: files = os.listdir(os.path.join(root_dir, name)), so that would be all files inside cat_dogs/cats, for example. Now we want to add them to the data list; there are probably much better ways of doing this, so if you have any suggestions for making it a little cleaner and better, just comment and we'll make it better. We do self.data += list(zip(files, [index] * len(files))), so for the first class, index 0, all of those files are associated with the label 0, all the cat images, and the dogs get the label 1. If you're not following completely here, that's okay; this could perhaps be done in much easier ways.

For the length we just return len(self.data). Then for __getitem__ we take an index, and first we get the image file and the label from self.data[index], pretty easy. Now we just have to find that file: root_and_dir = os.path.join(self.root_dir, self.class_names[label]), so we look up what the class name is for the example that was requested; if it was a cat, we get cat_dogs/cats. To load the image we do image = np.array(Image.open(os.path.join(root_and_dir, img_file))). Then we check if self.transform is not None, and if so we do augmentations = self.transform(image=image), exactly as in the classification example we've shown, and image = augmentations["image"].
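Here is a sketch of that dataset class as described; subclassing torch.utils.data.Dataset and sorting the class folders are small additions for idiomatic, deterministic behavior:

```python
import os

import numpy as np
from PIL import Image
from torch.utils.data import Dataset


class ImageFolder(Dataset):
    """Minimal torchvision-ImageFolder-style dataset: one subfolder per class."""

    def __init__(self, root_dir, transform=None):
        super().__init__()
        self.data = []  # list of (image_filename, label) pairs
        self.root_dir = root_dir
        self.transform = transform
        # sorted so "cats" -> 0, "dogs" -> 1, matching the alphabetical order in the video
        self.class_names = sorted(os.listdir(root_dir))

        for index, name in enumerate(self.class_names):
            files = os.listdir(os.path.join(root_dir, name))
            self.data += list(zip(files, [index] * len(files)))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        img_file, label = self.data[index]
        root_and_dir = os.path.join(self.root_dir, self.class_names[label])
        image = np.array(Image.open(os.path.join(root_and_dir, img_file)))

        if self.transform is not None:
            augmentations = self.transform(image=image)
            image = augmentations["image"]

        return image, label
```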
What would be different if you also had bounding boxes? You would pass bboxes=bboxes as well, get the boxes back from the dictionary, and return them at the end; and similarly if you had a segmentation task. So it's quite general in that way. In the end we just return image, label, and that's it.

Now I'll copy in the transform from the classification example and paste it here. There are just some small differences. First we add A.Normalize: we send in a mean, let's just set it to (0, 0, 0); a standard deviation, let's set it to (1, 1, 1); and a max_pixel_value of 255. I think 255 is also the default, so you don't have to send it in, but I think it's important to be clear here, because this is not exactly the same as in PyTorch. In PyTorch you would do ToTensor and then Normalize, with all of the mean values between 0 and 1, and similarly for the standard deviations. That's also what you do in this Normalize: you find out what the mean and standard deviation are, and all of those values need to be between 0 and 1. The reason is what happens inside that function; we can actually go look at it. Inside Normalize they multiply the mean by max_pixel_value, which is 255, so they assume the mean is between 0 and 1, and similarly for the standard deviation; they also take the reciprocal of the standard deviation, because they want to multiply by it instead of dividing. So the mean and standard deviation are specified exactly as you would in PyTorch, but you need to remember that the ToTensorV2 at the end does not divide by 255, which PyTorch's ToTensor does.

Now we just do dataset = ImageFolder(root_dir="cat_dogs", transform=transform), and of course you could wrap this in a DataLoader, send in the batch size and the number of workers, and it would work exactly the same. Just to make sure it works for this dataset, let's do for x, y in dataset: print(x.shape), as in the sketch below. As we can see, it's working. I'm not going to show the DataLoader, but it's exactly the same as how you would normally do it.

That's it for this video. Hopefully it was useful to see how the Albumentations library works and how you would use it for these different tasks. I just want to say thank you so much for watching, and I hope to see you in the next video!
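Finally, a sketch of the full-example pipeline and the quick sanity check, using the ImageFolder class sketched above; the zero mean and unit std are the placeholder values from the video, not real dataset statistics:

```python
import albumentations as A
import cv2
from albumentations.pytorch import ToTensorV2

transform = A.Compose(
    [
        A.Resize(width=1920, height=1080),
        A.RandomCrop(width=1280, height=720),
        A.Rotate(limit=40, p=0.9, border_mode=cv2.BORDER_CONSTANT),
        A.HorizontalFlip(p=0.5),
        A.VerticalFlip(p=0.1),
        A.Normalize(
            mean=(0, 0, 0),       # placeholder; use per-channel means in [0, 1]
            std=(1, 1, 1),        # placeholder; use per-channel stds in [0, 1]
            max_pixel_value=255,  # Normalize scales mean/std by this (255 is the default)
        ),
        ToTensorV2(),  # HWC numpy -> CHW torch tensor; does NOT divide by 255
    ]
)

dataset = ImageFolder(root_dir="cat_dogs", transform=transform)  # class from the sketch above
for x, y in dataset:
    print(x.shape)  # e.g. torch.Size([3, 720, 1280])
```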
Info
Channel: Aladdin Persson
Views: 9,332
Keywords: Albumentations tutorial, Albumentations pytorch, Albumentations augmentation, Albumentations segmentation, Albumentations detection
Id: rAdLwKJBvPM
Length: 31min 29sec (1889 seconds)
Published: Tue Jan 26 2021