Instance Segmentation MASK R-CNN | with Python and Opencv

Captions
Here we have the segmentation of three different categories: the dog, correctly segmented with this polygon; the horse, the blue one in this image; and this person right here. Hi there, my name is Sergio, and I help companies, students and freelancers build visual recognition projects. Today we're going to look at a really good algorithm: Mask R-CNN. Mask R-CNN is an instance segmentation algorithm, which means it can not only detect the objects in an image, it can also put a mask on each object, giving you segmentation at pixel level. To put it simply: if there is a person, you won't just get a box around the person, you'll get the exact coordinates outlining that specific person. Mask R-CNN has been around since 2017, so it's a somewhat old algorithm, because in computer vision a few years is a lot as things develop so quickly, but it's still a great, really effective algorithm; I've used it on commercial projects and it works great. In this video we're going to see the simplest implementation of Mask R-CNN possible, so that even if you're a beginner you can run it with only OpenCV and, most importantly, with a few lines of code that are easy to understand. If you're ready, let's start.

The only thing we need to run Mask R-CNN is the OpenCV library, which you can install from the command prompt (cmd) with pip install opencv-python. Once you have OpenCV, there are two files you need to download: one is frozen_inference_graph_coco.pb, and the other is the mask_rcnn_inception configuration file with the long name. I'll put the links in the blog post below so you know exactly which two files they are, and now you'll see how to work with them.

The first thing we'll do is import OpenCV, so we import cv2, then take an image and display it. I have a few images you can use; you can also download them from the blog post. Let's load the first one: img = cv2.imread("road.jpg"). To make sure we're loading the image correctly, we can show it right away with cv2.imshow("Image", img), passing first the name of the window and then what we want to display. We also need a waitKey call to keep the image on screen until we press a key: cv2.waitKey(0). Let's run this; if we see the image, it's a good beginning, and we do see it. The idea is that Mask R-CNN will detect a few objects, for example the dog, the people on the road, the cars, and we'll get first the detections and later also the segmentation.

Now, how do we load Mask R-CNN? Before loading the image, we load the network using the deep neural network (dnn) module of OpenCV: net = cv2.dnn.readNet(...). With OpenCV's dnn module we can load models from other frameworks; in this case we're reading a network trained in TensorFlow. Does this mean you need TensorFlow installed? Absolutely not; we're just loading a model that was built with TensorFlow, so you don't need TensorFlow at all. On this line we pass two files: one is the weights file and the second is the configuration file, the files I told you to download at the beginning. I put them in a folder called dnn; you can put them wherever you want, it's just a matter of keeping everything organized; since I put them in a folder, I write their path here. The first one is the model weights file: dnn/frozen_inference_graph_coco.pb; then, after a comma, the configuration: dnn/mask_rcnn_inception_v2_coco_2018_01_28.pbtxt.

How do we make sure we're loading this correctly? If we get an error when running, it means the files aren't loading; if it runs, as in my case, we're loading them. Let me show you what kind of error you can get: say there's a typo in a file name so the file can't be found; running this, we get a "can't open the file" error, pretty much. So first we load the files, and we did that; let me fix the typo. Now that we've loaded the Mask R-CNN model, we can move on and detect things on the image. So: first we load Mask R-CNN, then we load the image, and now we detect objects (I couldn't come up with a better name for this step; you'll see what we do now).

The first thing is to convert the image into a blob: blob = cv2.dnn.blobFromImage(...). Why do we do this? Before an image is analyzed by a deep neural network there are a few preprocessing operations to perform, and this function does them: it might scale the image down to make it smaller, swap the channels because different frameworks use different color formats, and so on. Which image do we pass to blobFromImage? The one we loaded before. Checking the other parameters: scalefactor we don't need for Mask R-CNN, size we don't need to touch, mean we don't need either; swapRB, though, is True. Why? Because TensorFlow works with images in RGB format: color images are made of three channels, red, green and blue. In OpenCV it's the same, except the channels come blue first, so BGR: blue first and red last. That's why we need to swap them.

Once we have the blob, we feed it into the network with net.setInput(blob), and we're almost done with the Mask R-CNN processing step. Finally, we call net.forward() and access the last layers, where the information we want lives: the boxes and the masks. The output layers are "detection_out_final" and "detection_masks", and from them we get boxes and masks. Let's run this; of course I'm getting an error. I think there's a typo in the names of the final layers, as I don't remember them exactly; let me check. Fixed: with "detection_out_final" and "detection_masks" it runs, I get the image and no error. It's working, which means Mask R-CNN was loaded, processed the image, and we already have the masks and the boxes. A big part of the work is done: we have all the information we need for the segmentation of this image. There is still some work to do, but it's mostly about extracting the information from what we have right here.

So let me show you: what is inside boxes? Let's print boxes. In boxes we have a lot of information: the position of each object, the confidence with which it was detected, and the class of each object.
All this information we need to extract somehow. Let's start with the box. By the way, before you ask: I'll put more details in the blog post below about which objects you can detect with this model. It can detect up to 80 different classes, and I'll put the list in the blog post: people, cars, trucks, different kinds of animals, the really common objects. They come from the COCO dataset, if you're wondering where they are from.

box = boxes[0, 0, 0]: let me run this and explain each step slowly. To access a box, that is, the coordinates to draw the rectangle around an object, we change this last index of the array. As we have many detected objects, index 0 gives the first object, index 1 the second, index 2 the third, and so on. Let me print box and run: inside box we have a few values, something like 0, 0, 0.99 and so on. These are, in a way, coordinates that we need to adapt to our image; we'll see how, but take my word for now: of the positions 0 through 6 in this array, the last four values are our coordinates. So x = box[3], y = box[4], then x2 = box[5] and y2 = box[6].

Now, why did I say they need to be adapted to our format? As you see (I hope you can see it, because the characters are really small), the values are 0.60, 0.30, 0.71 and so on, all zero-point-something. When we draw on an image, say one 1000 pixels wide by 500 pixels high, we need integer pixel positions to draw a rectangle around an object. Why do we have these zero-point-something coordinates? Because in the deep neural network the values go from 0 to 1, so we need to scale them by the size of the image, and it's fairly simple: where we have x = 0.60, we multiply 0.60 by the width of the image. We get the size with height, width, _ = img.shape (the channels we don't care about, hence the underscore; and I'm going to pause to drink some water). We multiply x by the width, x2 also by the width, and y and y2 by the height. Let's print some of them: we get coordinates like 768.73 and 264.29, so we now have numbers scaled to the image, but there's one last change to make: a coordinate is either 768 or 769, it can't be 768-point-something, so we wrap all of them in int() to make them integers.

Now comes the moment of truth: we draw the rectangle, and if it matches the object, it means we did things correctly. cv2.rectangle: we draw on the original image, and to draw a rectangle we need two points. If, for example, we want a rectangle around this dog, we need the top-left point, which is the coordinate (x, y), and the bottom-right point, which is (x2, y2). Then we have the color; let's make this blue, so (255, 0, 0): the maximum of blue, zero green and zero red. Let's run it.
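The scaling just described, from the network's 0-to-1 range to pixel coordinates, can be written as a small helper (the index layout is the one observed above; the example row values are hypothetical):

```python
def scale_box(box, img_width, img_height):
    # Indices 3..6 of a detection row hold normalized coordinates:
    # x, y (top-left) and x2, y2 (bottom-right), each between 0 and 1
    x = int(box[3] * img_width)
    y = int(box[4] * img_height)
    x2 = int(box[5] * img_width)
    y2 = int(box[6] * img_height)
    return x, y, x2, y2

# Hypothetical detection row, scaled to a 1000x500 image
row = [0.0, 1.0, 0.99, 0.60, 0.30, 0.71, 0.55]
print(scale_box(row, 1000, 500))  # (600, 150, 710, 275)
```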
Oh, and we have the girl surrounded by a blue rectangle, so we did something correctly. But let me make the rectangle a bit thicker so we can see it better: after all the parameters of cv2.rectangle I'll add one more value at the end, the thickness, making the rectangle three pixels thick, and run again. Now, this is for only one object of the image; as I was telling you, the model detects all the objects, so we can try changing the index: this took the first object of the picture, the next index takes the second object, and you see we get another person right here. We could take all the objects this way, but of course it wouldn't be smart to do it by hand, so first let's find out how many detections we have in total: detection_count = boxes.shape[2]. That's enough for the moment; let's print detection_count and run. We get 100, so 100 is probably the limit the network is set to: it detects up to 100 objects. This means we can loop: for i in range(detection_count), so we loop through all the objects. Let me move the box-extraction code inside the loop so that on each iteration the index changes and we go through every object. Running this, you see all the detections are drawn correctly. As a next step we might, for example, use a different color for each class, but let's leave that for the final steps; let's finish the work first, and at the end of the video we'll make the optimizations and make this look better.

The important thing is that you can see we're correctly detecting the objects. This first step is object detection, but as we're using Mask R-CNN we're not satisfied with object detection only, so we move to the second step: getting the mask for each object, the pixels surrounding exactly each object. Before moving to the masks, let's extract a couple more pieces of information. Let's print box again: I showed you before that indices 3, 4, 5 and 6 are the coordinates, (x, y) for the top-left point and (x2, y2) for the bottom-right point. But the box holds more. If I run this and look at what's in box, index 1 is the class. What does this mean? The class is associated with a name: class 0, for example, in this Mask R-CNN model is associated with a person, so when we see 0 we can say this is a person. I also see class 17 and class 7 in this example, which might be the dog, the cars and some other objects. Then index 2 of the box is the score, the confidence: how confident the algorithm is that this specific object really is that object. We need both of these values, so let me add some comments: get the box coordinates; box[1] is the class, let's call it class_id rather than obj_class; and box[2] is the score (let me double-check I'm doing this correctly; yes, box[2] is the score).

After this we could already use this information, but I'll go right away to getting the mask too. How do we get the mask? The masks are in the other array.
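Unpacking one detection row, as just described, looks like this (indices as observed above; the row values here are made up):

```python
def parse_detection(box):
    # Index 1 is the class id (0 is "person" in this model's COCO label
    # order), index 2 the confidence score, indices 3..6 the box coordinates
    class_id = int(box[1])
    score = float(box[2])
    coords = box[3:7]
    return class_id, score, coords

class_id, score, coords = parse_detection([0.0, 0.0, 0.99, 0.1, 0.2, 0.3, 0.4])
print(class_id, score)  # 0 0.99
```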
If we print masks and run, we see there is a mask for every single object. The way extraction works is that boxes and masks are paired: for each box there is an associated mask with the same index, so box 0 has mask 0, box 1 has mask 1, and so on, which makes things really simple to extract. So mask = masks[i, class_id]: we take from this array the same index i we used for the boxes, but we also need to pass the class id. Let me print the mask of the object; we'd be close to the conclusion of the project if it ran, but I've made a mistake somewhere, so I'll pause the video and find out what's wrong with line 29. What's wrong is that class_id must be an integer; these things can be a big pain if you don't get a clear error right away, so I was lucky with this one. With class_id cast to int, we run, print the mask, and that's what we see. What is this mask? The raw values aren't very clear; if we print mask.shape we get a better understanding, so let me go a bit deeper and run it: mask.shape gives an array of size 15 by 15.

What does this mean? It means we have a really small image, 15 pixels by 15 pixels, which is the mask of each single object. How can the mask of this object be 15 by 15, and the same 15 by 15 for that one? Of course they can't stay this way: this is where the information is, but we need to adapt it to the size of the object. So we're going to resize this mask to the size of the object with cv2.resize. What do we want to resize? The mask, of course, but we also need the target size, and to get it we first extract the object itself. How? Using the coordinates: roi = img[y: y2, x: x2] (roi stands for region of interest), taking from the image the rows from y to y2 and the columns from x to x2. To prove I'm doing this correctly, I'll show the roi with cv2.imshow; you don't need this step, it's just temporary to show you that it's right. So we extract the objects one by one: this is what I extracted right now, and as we move to the next object and so on, we get each one in turn. For each object we also have the size: roi_height, roi_width, _ = roi.shape. That's a big part of the work, because now we have both the size and the region of each single object, and we can do mask = cv2.resize(mask, (roi_width, roi_height)). Is this all? Unfortunately not, but we're getting close. Let's print mask now: we get a big mask, with the size of this region.
There is something more to this mask: the values it holds. In this specific case I have to check, because the mask values should only go from 0 to 1, so I need to see why we have this six-point-something; maybe I'm printing something else as well. Let's move to other objects... ah, this is what I was talking about: the mask has values from 0, so 0.01 and so on, up to 1. Why so many different values? Because this is the confidence of each pixel of the image: we have a value for every pixel, and we can apply a threshold: if the value is above a certain level we keep that pixel as mask, otherwise we throw it out. We can start with a value of 0.5, the middle of the confidence range. So now we create the actual mask, which is a black and white image: white is the object we want to keep and black is the background we want to remove. If the value in this array is greater than 0.5 we make it a white pixel, if it's less we make it a black pixel, and the threshold function does exactly this: _, mask = cv2.threshold(mask, 0.5, 255, cv2.THRESH_BINARY), where 0.5 is the threshold value I'm setting and 255, white, is what values above it are converted to. To make sure the mask is correct, we display it: cv2.imshow("Mask", mask). There is probably one last thing needed to make this work; I'll run it, but I'm pretty sure we'll get an error... well, I'm surprised, we didn't get any errors, so I'm even happier than expected. Here is our mask; its goal is to outline the person, and we got that: if we put this mask over the person, it matches quite well.
Let's go and check all the other objects. Don't mind the extra girl that appears there; it's just the window function messing things up a bit, and it's only on this one. So: the man and his mask, the girl and her mask, the dog and its mask; you can see the mask detection is working quite well. I now want to conclude this step, and to do that we'll use this information to draw the masks on another window, so they don't interfere with the main one. But first let me run this, because during the loop, when displaying some objects, we were getting an error on line 33. Why? Well, we have the score, the confidence of the detection, and it's there for a reason: among the many detections there are objects with confidence close to zero, detections of objects that don't even exist, which give us coordinates of zero and a size of zero by zero. So we use the score: if score < 0.7: continue. You can adjust this threshold as you wish, but generally you'd use something between 0.3 and 0.9, usually not less than 0.3. With continue, when the score is below 0.7, everything below that line is not executed, so if there isn't good confidence about a detection we don't display that object; if there is, we do.

Now, as I was saying, we'll create a new window and display all the masks there, so we don't mess with the rest. How do we create a new image with the same size? After loading the image, we create a black image with np.zeros.
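The score filter and the blank canvas can be sketched like this; the 0.7 threshold comes from the video, while the image size and the detections list are hypothetical:

```python
import numpy as np

height, width = 500, 1000  # example image size

# Black canvas with the same size as the image: all zeros, 3 channels
black_image = np.zeros((height, width, 3), np.uint8)

# Keep only confident detections; "detections" is a hypothetical list of
# (class_id, score, box) tuples as extracted earlier
detections = [(0, 0.99, None), (17, 0.12, None), (7, 0.85, None)]
kept = []
for class_id, score, box in detections:
    if score < 0.7:
        continue  # skip low-confidence detections
    kept.append(class_id)
print(kept)  # [0, 7]
```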
We create an array which will be an image: height and width, three channels, and data type np.uint8, so black_image = np.zeros((height, width, 3), np.uint8). For this we need numpy, so import numpy as np at the top. This is an image, a black image: an array, a multi-dimensional array full of zeros, and that's all a black image is. What do we do on this black image? We draw the masks of the detected objects. We have the mask; how do we draw it? We get its coordinates with the contour function of OpenCV: contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE). I won't go deep into this function; for more advanced material I recommend the professional courses on my website. For now you just need to know that it extracts the boundaries of the white area of the mask. Now that we have the contours of the mask, we loop through them: for cnt in contours, and if we print cnt, it's nothing more than the exact coordinates. Of course we now get an error, on line 45: probably it's the data type of the mask that the contour function doesn't like, so I'll convert it to another format with np.array(mask, np.uint8); that will probably be more appreciated than the previous format, and it looks like it is. What are all these numbers? They are coordinates, and we need them to draw the polygon surrounding the mask. So now we take these coordinates and use them with cv2.fillPoly.
Where do we draw? On the region of interest, the roi. It won't run yet because I have to comment out a line first; okay, cv2.fillPoly it is. fillPoly means we draw a polygon and fill it with a color. Where do we want to draw the polygon? On the region of interest, roi. What's next? The points of the polygon, which are cnt, the coordinates I was just printing, the ones I showed before. And then we define the color; let's make it blue for simplicity: (255, 0, 0), 255 blue, zero green, zero red. That's pretty much all, so let's run it. Okay, that's what I wanted: this is the roi, and we're putting the color of the mask over the object. But we'll do this on the second image, so we don't mess up the original one; otherwise, if we covered the objects, we wouldn't see anything. So instead of taking the roi from the original image, we take it from the black image: roi = black_image[y: y2, x: x2]. If we run this, you see the roi is now black, and it won't interfere with the original image. Let me show the final result: cv2.imshow("Black image", black_image); I need to comment out the temporary roi display, and now let's run. Okay, this is already something: it's working, we're detecting the objects, and we're displaying the mask for each single object detected.

I'd like to go a step further now to make things look better. Consider that we have different classes: not only people but also the dog and the cars. We want a different color for each class, so that it's easy to distinguish the different categories of objects. How do we do that? Well, one way is to generate random colors.
A color is made of three values: the blue, the green and the red. So we'll generate a lot of these triplets, somewhere near the beginning of the code: colors = np.random.randint(0, 255, (80, 3)). How does this work? First we say which value it starts from: a color channel can start from 0, the absence of the color, up to 255, the maximum of that color. Then how many we want to generate: 80 of them. Why 80? Because with this deep learning model we can detect 80 different categories. One last thing: how many values in each array; we want three numbers, because a color is made of three channels, and that's why we use 3 here. If I print colors, that's what we get: 80 different colors made of blue, green and red. Class 0 will always map to the first color, so each time we have class 0, a person, it will always be that specific color; each time we have class 1 it will be the second color, and so on.

So let's use this where we have fillPoly: we fill the polygon with a color, so color = colors[int(class_id)], class_id being the index: if class_id is 0 it takes the first color, if it's 1 the second, and so on. I put color in fillPoly, run it, and we're close to the end of the project; there's only one problem, an error on line 52. What is the error? It's about how we're extracting with class_id: the index must be an integer, so that's important, and I suspect the color components also need to be integers. To avoid mistakes, I'll take all three values of the color and convert them, int(color[0]) and so on, so this function won't give me any problems: we have three values, and we extract all of them as integers. We should be fine with this, and I'm probably using more parentheses than I should.
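The per-class color table can be generated and indexed like this; the seed and the example class id are my additions, just to make the sketch reproducible:

```python
import numpy as np

np.random.seed(0)  # only so the colors are reproducible in this sketch

# One random BGR triplet per class: 80 classes in this COCO model
colors = np.random.randint(0, 255, (80, 3))

# Look up the color for a detection's class and cast to plain ints,
# since the index and OpenCV drawing colors must be integers
class_id = 17  # hypothetical: one of the 80 COCO classes
color = tuple(int(c) for c in colors[int(class_id)])
print(colors.shape)  # (80, 3)
```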
Somehow I'm not happy with how I'm closing these parentheses; probably this is the way. Okay, better than before. Let's put some happier colors in the background instead of the black. We can fill the black image with a color; it won't really be a black image anymore, even if I keep calling it that: we fill it with 100 of blue, 100 of green and zero of red. Running this with the random colors, this is our instance segmentation: we're detecting the different categories, so people, trucks, cars, and also the dog, which is here alone, and we're segmenting them at pixel level; you can see it's working really well.

We can also improve this, or get more objects, by changing some values. For example, we can lower the confidence threshold where we filter the detections by score: using 0.5 we should get more objects, probably a few more people in the background right here. And of course you can try this with different images; I have another one right here, so let's use horse.jpg at the beginning where we load the image, and let's see what we get. Here we have the segmentation of three different categories: the dog, correctly segmented right here with this polygon; the horse, the blue one in this image; and this person right here.

I hope you enjoyed this tutorial. That was everything; there was a lot here, so if you're a beginner this might be a bit overwhelming, but if you follow it carefully there's a lot of information you can apply to your own projects. If you want to develop more advanced things, for example training different algorithms to detect different kinds of objects, or if you want to learn how deep learning and object detection work in general, I suggest you visit my website, pysource.com, because you'll find a lot of resources there. That's all for the moment; see you in the next tutorial.
Info
Channel: Pysource
Views: 8,708
Rating: 4.9727893 out of 5
Keywords:
Id: 8m8m4oWsp8M
Length: 49min 26sec (2966 seconds)
Published: Tue May 18 2021