286 - Object detection using Mask RCNN: end-to-end from annotation to prediction

Captions
Hi everyone, welcome back. In this video I'm going to show you how to put together an end-to-end project for object detection using Mask R-CNN. We are going to go all the way from the actual images to the prediction, so this is a true end-to-end project: you can Google-search for images and download them, and I'll show you how to annotate them and get the annotation files ready. For the annotations, let's focus on JSON files, because I already covered how to work with XML files in the last video; if you haven't watched that, you should, and don't forget to hit the subscribe button so you know when these videos come out.

Coming back to what we are trying to do: we are going to annotate the images and download the annotations as JSON files, both in VGG format and in COCO-style format, and I'm going to share the code to handle this data. I'll share both of the files I'm using today, one that handles the COCO-style annotations and one that handles the VGG-style annotations. Both are JSON files, but they're slightly different, and we'll go through the differences in a minute. Once you have those, we'll train a Mask R-CNN model, and after training we'll take the trained model and use it for inference.

In the interest of time I'm not going to spend a few hours annotating; I've already annotated a couple of images of marbles, so the dataset is not huge. I'll show you the process, but every step takes time, even the training: I believe I stopped after three training epochs for each of the two files I'm going to show you. Obviously you should let it run a lot longer, but I think we'll get decent results even with three epochs and a handful of training images.

With that background, let's jump in. First things first, a quick Google search for marbles; I'll show you exactly what I did yesterday to collect the data for this video. I searched for marbles and picked a few images, for example one where the marbles are clearly separated. This is a challenging problem: to segment images where the marbles touch and overlap you need a lot more training data, but an image like this one we should be able to use. I ended up downloading three images and started annotating them.

How did I annotate? The task is relatively small; I don't have a thousand images, so I don't need fancy labeling software that stores data on a daily basis. I definitely recommend that for bigger projects, but for this purpose I used a free tool called Make Sense, which I covered in one of my previous videos on annotation tools. Please follow along with your own images; maybe you'll pick something other than marbles.
Drag the three images into Make Sense and choose object detection. It says there are no labels yet, so add them; in my case I used blue marbles and non-blue marbles. Start the project, pick the polygon tool, and click around each marble. It's straightforward, but be careful to define the boundaries nicely, because that determines how good your result will be. Close the polygon, select a label, for example blue marble, then do the next one; you get the point. This is the most boring part, and it takes time: this one is a blue marble, this one is a non-blue marble, and so on for every marble in every image.

Once you're done, click Actions and then Export Annotations. You get a choice of two formats: a single file in VGG JSON format or a single file in COCO JSON format. It's up to you which one to download, because I'm going to share code that handles both. In fact I downloaded multiple versions: a single-class one and a two-class one.

Let's see what these look like. If I open the COCO-style file in Notepad++, it appears as one long line, but I installed a plugin to pretty-print it; do a Google search for how to format JSON in Notepad++ and you'll find it. I think you can still use the formatted file as your input, but just in case, I keep both copies.

First, the COCO-style format. At the top there is a project name, which you can change if you want. Then come the images. Note that this is a single JSON file: even if you have a thousand images, it's one file with all the information. The images section gives a name, a width and height, and an id to every image; in our case there are only three, and I didn't even rename them after downloading them from Google. These are 400 by 400, 450, 500, and so on.

Each image contains multiple annotations, and those make up the next section of the file. Every annotation has its own id plus the image id it belongs to, so the polygon I drew for a specific marble is tied to image number one. Then comes the polygon itself, the list of points I clicked, followed by the bounding box for that polygon and its area; it carries both the polygon-level and the bounding-box information. Every annotation also has a category id; the next one here has category id 2, which is non-blue marble, followed by its coordinates. If you scroll through, the whole document repeats this pattern: an id for each annotation, the image id it came from, and the category id. And what are the category ids? We just saw 1 and 2: id 1 is blue marble and id 2 is non-blue marble. That's it; that is your COCO-style format.
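As a minimal sketch of how you might read this structure in Python (the file name is an assumption; the keys are the standard COCO ones just described):

```python
import json

# Load the COCO-style export from Make Sense (file name is an assumption).
with open("labels_marbles_coco.json") as f:
    coco = json.load(f)

# Category ids to names, e.g. {1: 'blue marble', 2: 'non-blue marble'}.
categories = {c["id"]: c["name"] for c in coco["categories"]}

# Image ids to file names and sizes.
images = {im["id"]: (im["file_name"], im["width"], im["height"])
          for im in coco["images"]}

# Each annotation ties a polygon and a bounding box to an image and a category.
for ann in coco["annotations"]:
    file_name, w, h = images[ann["image_id"]]
    print(f"{file_name} ({w}x{h}): {categories[ann['category_id']]}, "
          f"bbox={ann['bbox']}, area={ann['area']}")
```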
I downloaded exactly the same annotations in VGG JSON format as well. Again, it's a JSON file, but it's structured differently: the data is organized per image. Instead of image ids 1, 2, 3, the keys are the image names, each with the size of the image and the file name (it just keeps the default name there), followed by the regions within that image. An image has multiple regions: region 0, region 1, and so on, and if you scroll down you'll see the region attributes, which tell you what each region corresponds to; this one is a blue marble, the next one a non-blue marble. So the layout differs between the VGG and COCO styles, but if you go all the way down, all the same data is captured one way or the other. It's very important to understand this: the actual coordinates are identical. In one file you see 80, 36, 85; in the other you see the same 80, 85, 91 with the 36 elsewhere, because one stores the points as x, y pairs and the other stores all the x values and all the y values separately. Other than that, it's a JSON file, not XML; XML is what we looked at in the previous video.

Enough about annotations. Now that we understand the formats, let's jump into the code. Let me clean up my environment, zoom in so you can see things better, and show you the dataset in my directory: three images with some of the marbles labeled, and the labels in both VGG and COCO format. So far so good, I hope.

In the code I'm going to add a header with links to all the tools you can use to generate these kinds of labels; if you don't like the one I showed, pick any of them. I have one file for the VGG style, which is the one on screen, and a similar one for the COCO style. Instead of boring you with every detail, since I'm sharing these files anyway: the key difference between them is how you load the data. One defines a COCO-like dataset and the other a custom VGG-like dataset (maybe I should have called it VGG-like). I copied the bulk of this code from Mask R-CNN repositories, so I don't take credit for it; all I did was take what's out there from multiple sources and customize it to my style. That is the best way to be efficient: if you want to prove to someone that you're amazing at coding, go ahead and write it from scratch, but the practical approach is to find something that's 80 or 90 percent there and finish it off by customizing the remaining 10 or 20 percent. That's what I typically do for complicated code like this.
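Since loading the data is the key difference, here is a quick sketch of reading the VGG-style export described above (the file name is an assumption, and the "label" key inside region_attributes is a guess based on my export; check the keys in your own file):

```python
import json

# Load the VGG-style export (file name is an assumption).
with open("labels_marbles_vgg.json") as f:
    vgg = json.load(f)

# Top-level entries are per image; each holds the regions (polygons) drawn on it.
for entry in vgg.values():
    print(entry["filename"])
    regions = entry["regions"]
    if isinstance(regions, dict):          # some exports key regions by "0", "1", ...
        regions = list(regions.values())
    for region in regions:
        shape = region["shape_attributes"]              # polygon geometry
        label = region["region_attributes"]["label"]    # e.g. 'blue marble'
        xs, ys = shape["all_points_x"], shape["all_points_y"]
        print(f"  {label}: {len(xs)} points, first point ({xs[0]}, {ys[0]})")
```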
With that information, let's jump in. If you remember, in the last video we imported an XML library in addition to the Mask R-CNN libraries so we could read XML; here we read JSON instead, and everything else is pretty much the same, intuitive I should say. These are the libraries we need, so let's import them. Again, this is the model you know, Mask R-CNN: we'll load the model, then load the pre-trained COCO weights, and use transfer learning from that point on. We also need to define the configuration: it pulls in the default configuration, which we then update with our own values, for example the steps per epoch. I don't think we need the dataset import, since I'm not downloading any of the datasets that come with mrcnn, but I'll leave it there. display_instances will help us visualize the instances on screen, and extract_bboxes helps us extract the bounding boxes, as the name says.

Now, loading the dataset. The custom dataset class has multiple methods: one loads the custom dataset, one loads the annotations, and one loads the masks. At the top is where you add your classes if you have more than two; I added classes 1 and 2 for blue marble and non-blue marble, so if you have more, add them there. In fact, I like how the COCO-style code handles this: you don't have to define any of that, because it automatically reads the categories, ids, and names from the COCO file, so I don't have to edit the code every time I have a new dataset.

What else should you know? I also like how training and validation are handled here. If you remember the XML case, for example the kangaroo dataset, the code says the first 150 images go to training and the rest to testing; taking a named subset of the directory is a much better way of handling the split. We define the dataset directory, and then, based on the VGG annotation structure we just looked at, we load the dataset and the annotations and get the values. The annotations are the regions we just saw: for each region, read the shape attributes and the region attributes. It helps, if you're doing this for the first time, to go back and forth between the code and the JSON to make sense of what "shape attributes" means: within each region you have the shape attributes, the polygon geometry, and the region attributes, which give you the label, for example blue marble. So the labels come from the region attributes. Then we have our two classes, and the same goes for loading the mask: it draws a polygon for each region, and it's pretty self-explanatory, so I won't waste your time going through it line by line.
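To make that concrete, here is a condensed sketch of what such a VGG-style dataset class can look like, following the pattern used in the Mask R-CNN sample repositories (the source name "marble", the file name, and the label strings are assumptions matching my example; my actual shared file may differ in detail):

```python
import json
import os
import numpy as np
import skimage.draw
import skimage.io
from mrcnn.utils import Dataset

class CustomDataset(Dataset):
    def load_custom(self, dataset_dir, subset):
        # Register the classes; class 0 is always the background.
        self.add_class("marble", 1, "blue marble")
        self.add_class("marble", 2, "non-blue marble")
        assert subset in ["train", "val"]
        dataset_dir = os.path.join(dataset_dir, subset)

        # One JSON file describes every image in this folder.
        with open(os.path.join(dataset_dir, "labels_marbles_vgg.json")) as f:
            annotations = json.load(f)
        for a in annotations.values():
            regions = a["regions"]
            if isinstance(regions, dict):
                regions = list(regions.values())
            polygons = [r["shape_attributes"] for r in regions]
            labels = [r["region_attributes"]["label"] for r in regions]
            num_ids = [1 if l == "blue marble" else 2 for l in labels]
            image_path = os.path.join(dataset_dir, a["filename"])
            height, width = skimage.io.imread(image_path).shape[:2]
            self.add_image("marble", image_id=a["filename"], path=image_path,
                           width=width, height=height,
                           polygons=polygons, num_ids=num_ids)

    def load_mask(self, image_id):
        # One boolean mask channel per polygon, plus the class id array.
        info = self.image_info[image_id]
        masks = np.zeros([info["height"], info["width"], len(info["polygons"])],
                         dtype=np.uint8)
        for i, p in enumerate(info["polygons"]):
            rr, cc = skimage.draw.polygon(p["all_points_y"], p["all_points_x"])
            masks[rr, cc, i] = 1
        return masks.astype(bool), np.array(info["num_ids"], dtype=np.int32)
```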
With that, let's run these lines. There isn't much to add when it comes to COCO; I'm switching between the VGG and COCO files, and the COCO one follows a very similar path. The whole point, guys, is loading the annotations and the labels: as long as you know how to do that for your annotations, you're set to train Mask R-CNN. The training itself starts with a single line; the work is in getting the data ready. That's why I'm showing you two ways, VGG style and COCO style. Understand how the structure looks, go through each line, and see for yourself: here it's reading the attribute, here all the x values, here all the y values. That's exactly what I want you to focus on; it wouldn't be wise of me to walk through every single line to explain the basics. So let's run all of these lines up to this point; I'll run only the VGG version for now. Everything so far just defines a class.

Now we need to define our dataset, like we did in the last tutorial. We instantiate the custom dataset class and load it for training from my marble_dataset/train folder. Unfortunately I don't have an annotated test dataset, so I'm going to use the training set as the test set; I hope you have both. Let's prepare it and print the count: we have three training images, that's it.

Let's visualize before we start training. I'm taking image id 0; if you have hundreds of images you can randomly pick one to display, but since I have only three, I'll just show this one with its bounding boxes. extract_bboxes and display_instances are the two methods we imported from Mask R-CNN, and there you go: these are the labels I drew yesterday using exactly the method I showed. All I'm doing here is displaying the labels, so the bounding boxes and, I'm not sure if you can tell on your side, a color overlay on the marbles showing my polygons. That's display_instances.

Now we define and update the config. There are three things I always change and I leave everything else at the default: the name of the configuration, which I'll call marble_cfg; the number of classes, our two classes plus one for the background, three in total; and the steps per epoch, which I'll leave at 100. You can also set the minimum detection confidence and a whole bunch of other things in the config; print it and you'll see pretty much everything you can define.
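As a small sketch, the config override just described might look like this (the class name and values are the ones I set; the base Config class comes from the mrcnn package):

```python
from mrcnn.config import Config

class MarbleConfig(Config):
    """Configuration for training on the marble dataset.
    Anything not set here keeps the Mask R-CNN default."""
    NAME = "marble_cfg"      # used to name log directories and checkpoints
    NUM_CLASSES = 1 + 2      # background + blue marble + non-blue marble
    STEPS_PER_EPOCH = 100    # gradient steps per reported epoch

config = MarbleConfig()
config.display()             # prints every setting, defaults included
```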
The next part defines where to store the log files during training, where to store the models as we train (it saves a model after every epoch), and where my COCO weights file is. This is transfer learning: we start from the Mask R-CNN COCO weights, weights that were trained on the COCO dataset, and that's what we load. Now we are ready to define the model, load the pre-trained weights, and start the training. Define the model as MaskRCNN in training mode, then populate the weights into it by loading them; this is very similar to the transfer learning we did in Keras. Then model.train starts the training using the training dataset. I'm using the same dataset for training and testing, but don't do that: define a separate test dataset and use it for testing. And the training should start any time now.

So that's how you do the training. I was a bit impatient: in my logs folder you can see I trained two different models, one using the COCO annotations and one using the VGG annotations, each for only three epochs. It's not fast, it's slow, and of course with more data it makes sense to train a lot longer. I already saw that the results are not spectacular, but they're not bad either; I got decent results even with these three epochs. Each epoch takes seven or eight minutes, so I'm going to kill this run; we don't have that kind of time during the video.

Now that we have a trained model, let's start the inference process. I should have marked this more clearly: from this point on is the inference part, so you could put it in a separate file if you want. Let's load the libraries (we already loaded them during training) and define the prediction configuration; it's the same as the marble configuration, so I could have just reused that, but that's okay. Run it, then a function to evaluate the model: it takes an image, gets it ready, performs the prediction, and reports the metrics; if you're curious, go ahead and read through it. Let's run it, instantiate our configuration, and define the model exactly as before, except now we load the weights from our specific training run, the one trained for three epochs.

Let's start by evaluating the model and then do the prediction. Evaluate the model and print the mAP value; it's not done yet, it takes a little while, and there it is: the mAP is 0.8. I'll leave it to you to research what mAP is; explaining it properly is a whole video in itself. I'm getting 0.8 and I'm happy with that. Now let's go down and see visually how good the results are; mAP is a metric you can use to quantify the results, but we're humans, we like to see things.

Let's load an image that was not used in training. I put it in my validation folder; my marble_dataset directory has train and val folders. The only catch is that for the validation images I don't have annotations; I was too lazy to do them.
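For reference, the train-and-evaluate flow from this section looks roughly like this. This is a sketch assuming the standard mrcnn API; the mAP helper follows the common compute_ap pattern from the Mask R-CNN repositories, and the weights path is an assumption:

```python
import numpy as np
from mrcnn.model import MaskRCNN, load_image_gt, mold_image
from mrcnn.utils import compute_ap

# Training mode: load COCO weights but skip the layers whose shapes
# depend on the number of classes, since ours differs from COCO's.
model = MaskRCNN(mode="training", model_dir="logs", config=config)
model.load_weights("mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])

# Train the head layers; ideally pass a real validation set, not train_set twice.
model.train(train_set, test_set, learning_rate=config.LEARNING_RATE,
            epochs=3, layers="heads")

def evaluate_model(dataset, model, cfg):
    """Mean average precision over a dataset.
    Call this with a model rebuilt in mode='inference'."""
    APs = []
    for image_id in dataset.image_ids:
        image, image_meta, gt_class_id, gt_bbox, gt_mask = \
            load_image_gt(dataset, cfg, image_id)
        sample = np.expand_dims(mold_image(image, cfg), 0)
        r = model.detect(sample, verbose=0)[0]
        AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask,
                                 r["rois"], r["class_ids"],
                                 r["scores"], r["masks"])
        APs.append(AP)
    return np.mean(APs)
```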
If you do have annotations, that's when it becomes a true test dataset; otherwise it's just a bunch of images sitting in a folder. I'll load one of these; this is a good one, it has a whole bunch of marbles. Let's load it using scikit-image and take a look. One thing I'm curious about is how the model handles it: in the training images the background is bright, while here it's fairly dark, and the model never saw that during training. We'll find out.

Let's run model.detect; as you know, for Mask R-CNN it's not model.predict but model.detect, and it returns a list containing a dictionary of results. Opening that dictionary, you have the class ids for all the marbles detected in the image, the masks and the ROIs for each of them, and the scores, 99.9 percent in these cases, though sometimes you'll see 78. Using that, let's plot the results on the image. I define my class names so they can be displayed on the final image and call display_instances; results is just detected[0]. Let's run these two lines, and there you go; that's actually not bad. Let's expand it: that is a pretty good job of detecting the objects. Now, is it getting blue versus non-blue right? We'd have to pay close attention, and with only three training epochs and limited training data I'd be super shocked if it did a great job; there are too many marbles to check by eye anyway. I'll let you work on your own dataset; the point of this video is to make sure you understand how to do the entire project end to end, and I hope you're seeing that now.

While searching for better ways to display the results, I found a couple of functions, and one I liked is this color splash, which renders everything in black and white except the detected objects. If you want to make your reports look sharp, you can use these functions, so I'll define them here and run color splash on test4.jpeg (test 4, not test 1). It does the prediction and then all the display work, and since it writes the output file and even references a video writer, maybe it works on video files too; I haven't tested that. Let's open the file it just saved: there you go. Supposedly every one of these marbles is detected as an object, and all detected objects are in color while everything else is black and white. Obviously it's also keeping parts of the hand in color, so the more training data and the longer the training, the better it gets.

Just to prove that the COCO-style version works as well, let's run it, and you can see it actually starts the run. I hope you can use both of these files as templates for customizing Mask R-CNN for your own purposes.
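Here is a short sketch of those inference and color-splash steps. The splash helper follows the pattern from the Mask R-CNN balloon sample; the weights path, image path, and the pred_config object (the prediction configuration defined earlier) are assumptions:

```python
import numpy as np
import skimage.color
import skimage.io
from mrcnn.model import MaskRCNN
from mrcnn.visualize import display_instances

# Rebuild the model in inference mode and load our own trained weights.
model = MaskRCNN(mode="inference", model_dir="logs", config=pred_config)
model.load_weights("logs/mask_rcnn_marble_cfg_0003.h5", by_name=True)

image = skimage.io.imread("marble_dataset/val/test4.jpeg")
r = model.detect([image], verbose=0)[0]   # dict with rois, masks, class_ids, scores

class_names = ["BG", "blue marble", "non-blue marble"]
display_instances(image, r["rois"], r["masks"], r["class_ids"],
                  class_names, r["scores"])

def color_splash(image, mask):
    """Keep detected objects in color and render everything else grayscale."""
    gray = skimage.color.gray2rgb(skimage.color.rgb2gray(image)) * 255
    if mask.shape[-1] > 0:
        mask = np.sum(mask, -1, keepdims=True) >= 1   # union of all instances
        return np.where(mask, image, gray).astype(np.uint8)
    return gray.astype(np.uint8)

skimage.io.imsave("splash.jpg", color_splash(image, r["masks"]))
```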
Again, this Mask R-CNN setup runs on TensorFlow 2.2 and Python 3.7; two videos ago we went through the installation process. And there you go, that's your entire end-to-end project. If you liked this video, hit the like button, and please don't forget to subscribe to the channel. Thank you!
Info
Channel: DigitalSreeni
Views: 31,756
Keywords: microscopy, python, image processing
Id: QntADriNHuk
Length: 27min 45sec (1665 seconds)
Published: Wed Sep 07 2022