Train Mask R-CNN for Image Segmentation (online, free GPU)

Captions
We are now going to learn how to create a custom Mask R-CNN detector to detect any object. For example, I picked screwdrivers, and this is the result I got.

Hi, welcome to this new video. My name is Sergio, and I help companies, students and freelancers to easily and efficiently build visual recognition projects. Today we're going to see the easiest way possible to train a Mask R-CNN detector. If you're not familiar with this, let me give you an example: we can pick an object, say this screwdriver. If we train a detector to detect the screwdriver with Mask R-CNN, we will be able to find the location of the object, no matter where it is, but also to draw precisely the polygon surrounding the object. That's what Mask R-CNN does, and we're going to see the simplest way possible, entirely online, so you don't even need a GPU or a powerful computer; we will do everything using a free Google Colab notebook. If you're not familiar with this stuff, everything will become clear as we go on.

We're going to do this in three steps. First, we collect the images and prepare the dataset, and of course I will show you this. Second, we see how to train the custom detector with the Google Colab notebook I've prepared, which you can download from the link below in the description. And finally, what you are probably most interested in: the third step, where you take your model and use it to detect, in my case, the screwdrivers, but you can choose any object you want. So let's start right now.

The first step is to prepare a dataset with the images. In this case, for example, I have a dataset with a lot of screwdrivers. You can do this with your phone: pictures taken with your phone will be fine, so you don't need anything advanced or specific, as long as the object you're interested in is in the picture. So how do you need to take the images?
Let's say, for example, that in my case I want to detect different screwdrivers, so I put a few screwdrivers on the table and I change their position all the time; the more variety, the better. And even if there is some other object in the frame, like my hand or anything else you are not interested in, that's fine, because the model will treat it as background. It's fine as long as the objects you want to detect are there, a few of them.

How many pictures should you take? The concept is: the more, the better. You could even take thousands of pictures, and that would of course be better than having a few, but it will take you a lot of time later to make the annotations. So I recommend starting with 40-50 images, just to see how things are going, and then you can improve it later with more images. In my case, for example, I have 43 images.

After the pictures, there is the annotation to make. Annotation means indicating where each screwdriver is located in the image, so that during training we can teach the model the exact location; we cannot just give the picture without anything else. How do we make the annotations? There are many different software tools for this, but the easiest and fastest I found is an open-source project called makesense.ai; I will put the link in my tutorial. You enter makesense.ai, click on "Get Started", and you see "Drop images or click here to select them". So I'm going to select all these images and drop them here, and we now have 43 images loaded. What do we want to do with these images? We want to make object detection, and this is where we need to click. Now we need to create a label: before you start, you can create a list of labels you plan to assign to each object.
By the way, this is a demo version of the project that works with only one label; if you have multiple labels, I will tell you at the end what you have to do. I also have a pro version of this project that you can get on my website if you want more precision, more labels and many other things, but for the moment let's start with only one object. If you're not sure what to use, you can just use my images; you can download them for free, and all the files will be available on my website at the link below.

So let's start the project. Let's insert the label, which I'm going to call "screwdriver", and now I can start the project. Here we see all our images so that we can work with them. We have to make the annotations, but keep in mind there are different types of annotation. One is the simple object detection annotation, where you just need to put a box on each object, like this: you're annotating the screwdriver, maybe better than what I'm doing, more precisely. That is simple object detection, and it's not what we want to do, so I'm going to remove these labels. Segmentation is something much more advanced: it's not only a rectangle, but a polygon. On the right side of makesense.ai we can choose the type of annotation: rectangle, point, line and polygon. We select polygon, and now we can start making the annotations.

With the mouse I click here, and we need to outline our object correctly: we draw a polygon surrounding the object, like this one. This is the first one; let's press OK. Once you have the polygon, we go on to the second one, like this. Of course, the more precise you make the polygons, the better your model will work, so be careful to be really accurate with these annotations. It will take some time, there is a lot of work, and that's why I said to start with a few images if you're not sure or if you've never done this before.
Start with a few images, and later you can make improvements; at the beginning there is already some work in this. I'm going to pause the video because I need to draw all the polygons; once I have them all, I will be back and we will go on to the next step. See you in a bit.

After the annotation is completed, this is what you should have: for every image, each object needs a polygon around it. Now we have the images and we also have the annotations. Remember what you need to do next: you need to download the annotation file. After you finish, you go on "Actions" at the top left, and there is "Export Annotations"; let's click there. You can choose between two file formats to export the annotations, and it's necessary that you select "Single file in COCO JSON format". It's important that you use the COCO format, because all the following steps support only this format; it will not work with the other one. Click export, and then select where you want to save it; I'm going to save it in the screwdrivers folder. I have it already because I exported it before. You can name it "annotations"; I suggest using the same name I'm using, because later it will be easier for you to follow; if you change names, you need to change them again in the next step. So just call it annotations.json and click save; I'm not going to do that because I have it already.

Once you have one folder with all the images, plus the annotations JSON file, you're ready to move on to the next step, the training. Right now we are at the second step, where we're going to make the training. For the training, I prepared a Google Colab notebook that you can download; there will be a link you can click to get access to it.
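To sanity-check the exported file before moving on, it helps to know what a COCO-style annotations.json looks like. Below is a minimal sketch that builds and reads one; the field values are illustrative, but the top-level keys (images, annotations, categories) and the polygon-as-flat-list segmentation format match the COCO layout.

```python
import json
import os
import tempfile

# Minimal COCO-style annotation structure (illustrative values only; a
# makesense.ai export has the same top-level keys).
coco = {
    "images": [{"id": 1, "file_name": "IMG_0001.jpg", "width": 640, "height": 480}],
    "annotations": [{
        "id": 1, "image_id": 1, "category_id": 1,
        # segmentation is a list of polygons, each a flat [x1, y1, x2, y2, ...] list
        "segmentation": [[10, 10, 100, 10, 100, 50, 10, 50]],
        "bbox": [10, 10, 90, 40], "iscrowd": 0,
    }],
    "categories": [{"id": 1, "name": "screwdriver"}],
}

path = os.path.join(tempfile.mkdtemp(), "annotations.json")
with open(path, "w") as f:
    json.dump(coco, f)

# Sanity-check the export before uploading it to Colab.
with open(path) as f:
    data = json.load(f)
n_images = len(data["images"])
n_polygons = sum(len(a["segmentation"]) for a in data["annotations"])
print(n_images, n_polygons)  # 1 1
```

If the counts don't match what you annotated, or the file isn't valid JSON, the export went wrong and the training notebook would fail later anyway.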
If you're not familiar with Google Colab, it's a notebook environment offered by Google that you can use online; you just need a Gmail or Google account and you can log in for free. Let me show you how to go further with this.

First of all, once you are inside this notebook, you need to enable the GPU, the graphics card, and this is important. We go on Edit, Notebook settings, and there is "Hardware accelerator" with the options None, GPU and TPU. By default it's set to None; you need to enable the GPU, because we need the graphics card to do the training; there is no training without a graphics card.

Now we can start. I divided this into four major steps, which I will show you: the first one is the installation, the second one is loading the image dataset, then the training, and then the detection to test our model. Let's go with the installation. There is not much you have to do, honestly: you just press run on this cell. You'll see a warning that this notebook was not authored by Google; you need to press "Run anyway", because Google is only saying that the notebook comes from an external source (I created this notebook). It will take a few seconds to complete the installation; this is what you see right here: it's downloading some files and making the installations needed to run Mask R-CNN. When you see "Done downloading pretrained model" and nothing else is running, you can go to the next step.

I've added a command, nvidia-smi, just to check that you have the graphics card enabled; you don't really need to run it. It shows the name of the graphics card, a Tesla K80 with around 11 gigabytes of RAM, but you don't really care about that; run it if you are curious to see the specs of the graphics card you're using, but you don't strictly need to.
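If you prefer checking from code rather than reading the nvidia-smi table, a small sketch like this (my own helper, not a cell from the notebook) reports whether the GPU runtime is active and degrades gracefully when it is not:

```python
import shutil
import subprocess

# Quick GPU sanity check, mirroring the `!nvidia-smi` cell in the notebook.
# If no NVIDIA driver is present (e.g. GPU not enabled in Colab), the binary
# is simply missing, so we report that instead of crashing.
smi = shutil.which("nvidia-smi")
if smi:
    print(subprocess.run([smi], capture_output=True, text=True).stdout)
    gpu_available = True
else:
    print("nvidia-smi not found: enable the GPU under Edit > Notebook settings")
    gpu_available = False
```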
Also, if you run it and you forgot to enable the graphics card, it will give you an error, so it's a useful check.

Step number two: the image dataset. Let's open this one. How does it work? You need to upload the images that you have on your computer, the ones you created before, and also the annotations. Where do we upload them? On the left bar there is a folder icon; you click it and it opens all the folders you now have in the session storage. We upload the files right there, next to the other files you see. So let's do that: I'm going to upload the annotations; you see "Drop files to upload them to the session storage", and then we press OK. It's just saying that after the session is finished, all these files will be deleted from Colab; this is a temporary session.

Then, the dataset. There is something to do here: as you see, the dataset is a folder with a lot of images, and we cannot just upload a folder straight there; we need to turn the entire folder into one file. How can we do that? We can zip the folder. On Windows it's very simple: you click with the right button of the mouse on the folder, go on "Send to", then "Compressed (zipped) folder", and you get dataset.zip with all the images inside. So I'm going to put dataset.zip right here with the other files, and now it will take some time to upload; you will see the progress circle here, and when it is filled, the upload is done. If you have a lot of images and they're heavy, it will take some time, especially if you have a slow connection. I have only a small dataset of seven megabytes, so it doesn't take that long.
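The same right-click zipping step can be done in Python, which is handy on macOS or Linux. This is a generic standard-library sketch with throwaway file names, not code from the notebook:

```python
import os
import shutil
import tempfile
import zipfile

# Build a throwaway "dataset" folder to demonstrate the zipping step
# (on your machine this would be the folder holding your photos).
root = tempfile.mkdtemp()
dataset_dir = os.path.join(root, "dataset")
os.makedirs(dataset_dir)
for i in range(3):
    open(os.path.join(dataset_dir, f"img_{i}.jpg"), "wb").close()

# Equivalent of right-click -> Send to -> Compressed (zipped) folder.
zip_path = shutil.make_archive(os.path.join(root, "dataset"), "zip", dataset_dir)

# Verify the archive really contains the images before uploading it.
with zipfile.ZipFile(zip_path) as zf:
    names = zf.namelist()
print(sorted(names))
```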
How do we go further? In the image dataset cell, number two, we have images_path and annotations_path. images_path is the name of the zip file; I've just uploaded dataset.zip, so I need to put that same file name here: dataset.zip. For annotations_path we have annotations.json, which is also the name in my case. Make sure the names are correct and press run on this cell. How can we make sure everything was loaded correctly? You will get a message that says "extracted 43 images" (with the number of your own images, of course; I have 43). If something is wrong, you will simply get an error, so you will know you need to change something here.

Then there is the next cell to run, which loads the images into memory; you will see the train/validation split and the classes, so one class. Then "display image samples": everything here still concerns the dataset. This function just loads a random image to show that you have the correct annotations; you can see I have an image with screwdrivers, and here is the mask created from the annotations. You should of course see your own images with your own masks. It's normal that one panel is empty: it's meant for the case where you have multiple classes, but this demo version works with only one class, so just use one object and no more.

Once we're done with the images, it's time for the training step. The training has two cells. The first one loads the configuration of the model, so we press run on it; it detects how many images there are and sets the right configuration for the model. There is a lot of code running behind this, and it takes just a few seconds. After the configuration is loaded, we have the training cell, and this is the step that will take most of the time, so we press run.
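The "extracted 43 images" check boils down to unzipping the archive and counting the image files. A self-contained sketch (it builds a fake 43-image zip first; the names and paths are made up):

```python
import os
import tempfile
import zipfile

# Build a stand-in dataset.zip with 43 empty "images", mimicking the
# archive uploaded to the Colab session.
root = tempfile.mkdtemp()
zip_path = os.path.join(root, "dataset.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    for i in range(43):
        zf.writestr(f"dataset/img_{i:02d}.jpg", b"")

# Mimic the "image dataset" cell: extract the archive, then count the
# image files so a wrong path or empty archive fails loudly.
extract_dir = os.path.join(root, "extracted")
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(extract_dir)

images = [f for f in os.listdir(os.path.join(extract_dir, "dataset"))
          if f.lower().endswith((".jpg", ".jpeg", ".png"))]
print(f"Extracted {len(images)} images")  # Extracted 43 images
```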
This is the step where the model is going to be trained to detect, in my case, the screwdriver, and in your case whatever class you chose. The training hasn't started yet; when it starts, we will see "Epoch 1/5" here, and all the progress step by step, so that we also get an idea of how long this is going to take.

Now around half an hour has passed, and we see that epoch 1 of 5 is completed, with 500 out of 500 iterations, and it says how long it took: 1421 seconds, so less than half an hour to complete one epoch. We can estimate that in total it should take around two hours to complete. In your case it might take less or longer, because it depends on how big your images are, but also on the graphics card you are given: Google Colab doesn't always give you the same graphics card, sometimes it's faster, sometimes slower, so you can expect different timings.

I'm not going to wait for everything to complete, because at least one epoch is enough; instead of pausing, I'm going to completely stop this one, and show you that we already have a model. If we open the folder Mask_RCNN that you see here, we have logs, then an object folder, and this is our model: mask_rcnn_object_001.h5, the first epoch. This is the model that you can later download; I will show that at the end of the project, so I will leave it there for the moment and we will see later how to download the model so that you can import and use it.

Here at the end of the project there is the detection: this is code that you can use just to test your model without even downloading it, so before downloading the model you can test it and see how it works. There are two cells. The first one is going to load the latest model created; I have only one, so it's going to load that one.
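The two-hour estimate mentioned above is just the per-epoch time multiplied out:

```python
# One epoch (500 steps) took 1421 seconds in the video; the notebook
# trains for 5 epochs, so the total is roughly:
seconds_per_epoch = 1421
epochs = 5
total_hours = seconds_per_epoch * epochs / 3600
print(f"~{total_hours:.1f} hours")  # ~2.0 hours
```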
But if you have epoch five of five completed, it's going to take the latest one, the fifth, because the newest one is likely the most precise. This is quite a fast operation, so we load it; it takes a few seconds, and you see "loading weights from /content/Mask_RCNN/logs/.../mask_rcnn_object_001.h5".

Now let's test this on a random image. You don't have to upload anything; the images are already there, and it's going to pick a random image from your dataset so you can see the result. So let's run and test this one. When you run the test, you will see two images, and I will explain how this works. This is the test, and it's working quite well already. It doesn't detect all of them, but the result is pretty impressive after only half an hour of training. The boundaries are maybe not perfect; it is not covering the entire screwdriver, and this screwdriver it is not detecting at all, but it's incredible that in only half an hour we have such precision. The second image is the annotation against which it is being compared: this is the annotation that was made with the image annotator, while the first is how the model is performing. You can expect that if you leave the training running longer, two to two-and-a-half hours or until the process completes, it will get almost 100% of the objects, at least in this simple setting.

Let's load another image; I'm going to run this again to see some other random image. Oh, it's the same one, because I have a small dataset; my validation set is only four images, so there is not a lot of choice. Now we got a new image, so this is another example. If you have more images, you can of course test with more of them. And this is how you do the training.
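The "load the latest model" cell relies on the fact that the checkpoints are numbered per epoch, so a plain string sort puts the newest last. A minimal sketch, with made-up filenames following the mask_rcnn_object_NNN.h5 pattern seen in the video:

```python
import os
import tempfile

# Fake a logs folder with three checkpoint files (epochs 1, 2 and 5).
logs = tempfile.mkdtemp()
for epoch in (1, 2, 5):
    open(os.path.join(logs, f"mask_rcnn_object_{epoch:03d}.h5"), "wb").close()

# Zero-padded epoch numbers sort correctly as strings, so the last
# entry after sorting is the newest (and likely most precise) model.
checkpoints = sorted(f for f in os.listdir(logs) if f.endswith(".h5"))
latest = checkpoints[-1]
print(latest)  # mask_rcnn_object_005.h5
```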
So after only these simple operations, our model is ready to run. Before concluding, we will use another notebook where we load our model. First of all, let's download the model: we go on Mask_RCNN, then logs, then the object folder with a long number (probably the date), and there is mask_rcnn_object_001.h5. If you completed all the training you will see five of them: 001, 002, 003 and so on. I suggest you just download the latest one you have, in that case 005, the highest number. You click on the dots next to the file and then "Download", and it will take some time; you can see the download progress here. Downloading from Colab is not fast in general, so even with a fast connection it will take a while. We need to wait for the download to complete, and I will be back with the next step, running our model.

We're going to run now our custom Mask R-CNN detector. For this I have a second notebook that you can also download; there are two links on my blog post, one for the training and one to run Mask R-CNN on images from Google Colab. It is quite a simple notebook, with only two cells. The first is for the installation, which is necessary to install the Mask R-CNN library, so first of all we run that one. Then, before running the second, we need to upload our model. What is our model? It is simply the .h5 file that we've just downloaded at the end of the training; you should have it on your computer, and we're going to upload it into Colab. How do we do that? It is quite simple: we access the folders on Google Colab from the panel here; we can see there is Mask_RCNN, which I have just installed, and now I'm going to upload my model right here. I have a few models because I created a few of them; you need to upload your .h5 file.
Mine is called mask_rcnn_object_006.h5, and I'm going to drop this file right here. It will say that once the runtime is finished, once we close it, we are going to lose the files, but you should already know that from Colab. This is going to be a bit slow: the connection between Colab and the computer is always slow, so this 240-megabyte file might take even half an hour to upload, even with a fast connection. This is why in the pro version I integrated Mask R-CNN training and detection with Google Drive: with Google Drive the connection is quite fast, and at the end of the training all the files are saved on Google Drive. But for the moment, just for testing, this is fine; you wait the half hour, or however long it takes, and then we load our model. I'm going to wait for the file to upload, and then we can move on.

When the upload is finished, you should see mask_rcnn_object_ followed by the number; I have 006, so mask_rcnn_object_006.h5. The .h5 file must be here. Once we have this, we can go to the next cell, "Run Mask R-CNN on images". There are two cells here: the first one loads the image and the model, and in the second one we display the information on the screen.

First of all, we're going to load an image, so of course we need an image. Let me put one right here: I have some images of the screwdrivers, so I'm going to upload one of them and drop it here. In the code there is the path of the image, so we need to put the path of our image there. How do we do that? Simply, we click on the three dots next to the name of the file, then "Copy path", and paste the path here in place of the one I have, in my case something like content/screwdrivers 3. Then we need to do the same for the model.
On the line with the inference config, we need to put, between the quotation marks, the path of the model. We do the same: we find the model among the files, click on the three dots, then "Copy path" (I made a small mistake; copy path again), and paste the path right here. This loads the model, and it is already going to detect the results. These operations should be quite fast, probably a few seconds to load the model and detect the results. Now it's loading the model; you see "loading weights from /content/mask_rcnn_object_006", it took 15 seconds, and the execution is done.

Then we have the cell that shows the results on the screen. I don't have much time to go through it in detail, but I will give you some idea: it gets how many objects were detected, then it loops through each single object, gets the mask and its coordinates, draws a polygon around each detected object, draws the mask on each object, and then displays everything on the screen. Let's run this one, and this is what I get: we have the screwdrivers, and you see the mask around each screwdriver, each with a different color to show that we are detecting separate objects.

That is all for this step; of course I couldn't go too much into detail because it would take a long time, and I want to keep this video short and as simple as possible so that everyone can follow it. If you have a more complex scenario with a varied background, the detection will probably not be so accurate: this demo version works mostly in a simple scene with a largely homogeneous background. For example, with the same screwdrivers, if instead of having them on the table you held them up with the room behind, it wouldn't be so accurate with such training.
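The display cell's loop over the results can be sketched as follows. The dictionary r is a hand-built stand-in: with the Matterport Mask R-CNN library, model.detect([image]) returns one such dict per image, with rois, class_ids, scores and an (H, W, N) boolean masks array; here the masks are omitted and the numbers are invented, so the loop runs without a trained model.

```python
# Stand-in for one entry of the list returned by model.detect([image])
# in Matterport's Mask R-CNN (values invented for illustration).
r = {
    "rois": [[10, 10, 50, 90], [60, 20, 110, 80]],  # y1, x1, y2, x2 per object
    "class_ids": [1, 1],
    "scores": [0.98, 0.87],
    # real results also carry a (H, W, N) boolean "masks" array, omitted here
}

colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]  # one color per object
detections = []
for i, (box, cls, score) in enumerate(zip(r["rois"], r["class_ids"], r["scores"])):
    y1, x1, y2, x2 = box
    color = colors[i % len(colors)]  # cycle colors to separate the objects
    detections.append((cls, score, (x1, y1, x2, y2), color))
    print(f"object {i}: class={cls} score={score:.2f} box=({x1},{y1})-({x2},{y2})")

print(len(detections), "objects detected")
```

In the real notebook, this is where each mask and polygon would be drawn on the image, each object in its own color, before showing the result on screen.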
Now, before concluding this video, I want to give you a taste of what is in the pro version; if you want to develop this project further, you might be interested in it. I'm going to show it quickly, and then we can end the video.

Let's have a look at the pro version that you can get on my website: Mask R-CNN Training Pro. What are the advantages? You can train for multiple classes instead of just one category; you can improve the accuracy, with longer training and bigger images; you can import a custom detector; there is also separate source code to use this with Python and OpenCV, so that you can have a simple integration and extract information: classes, masks, bounding boxes and the score of each detected object. You can run the detector in real time or on a video, from a file or from a webcam. There is also the option to continue the training if it's interrupted: with just Colab, if you close Colab you lose the files and need to start everything from scratch, so I integrated Google Drive with the Colab training process, so that the model is saved on Google Drive. If you close Colab and open it again, the model is still there and you can continue the training; you can work on it a little bit every day, and you don't have to start from scratch if you want to make a really big training for a complex project. You can also evaluate the accuracy of your model.

Let's have a quick look at the notebook: there are settings that you can personalize. And I want to be clear: it's a full mini course, so you do not get only the Python notebook; there is also a mini course where I explain every single step of what you are doing.
How to improve the training, why you need to do certain things, and so on. For each lesson you get a detailed blog post explaining everything inside, and on each lesson there is the source code to download, which you can use either on Colab (the one for the training) or on your computer. For example, in one lesson, "Run Mask R-CNN in real time", I show how to run Mask R-CNN in real time on your computer, with the source code, and I explain it also in the video lesson. The pro notebook is also more advanced: there is the option to continue from the last training, and fine-tune training, so you can improve the model's accuracy. If the model is not performing well, there is a section where I explain how to improve the accuracy, and you can evaluate the accuracy of the model.

This is all for this video. I hope you enjoyed it, and if you have any questions feel free to post them below; also let me know in the comments what projects you are working on. That's all for now, see you in the next video.
Info
Channel: Pysource
Views: 13,712
Id: WuvY0wJDl0k
Length: 34min 21sec (2061 seconds)
Published: Tue Aug 10 2021