What's the simplest form of Image Segmentation? | Python Tutorial

Captions

Hello everyone, I hope you're all enjoying your quarantine. Recently I've been thinking a lot about image segmentation. It has a lot of really cool applications, and what I wanted to do today was implement a really simple program in Python so we can understand what the most basic form of image segmentation is, because from that point on we can start expanding our knowledge. So we're going to start with a simple binary image segmentation: we're going to look at images that have two objects and try to separate them as much as possible.

Before we do that, let's have a quick Google of image segmentation so we know what we're doing. If we look at Google Images and type in "image segmentation", we get a bunch of these pictures. Image segmentation can be split further into two categories: semantic segmentation and instance segmentation. This picture here is a perfect example of semantic segmentation. We've got a legend on the left-hand side, and it says every person has this dark blue color; every car has this dark red color, which we can see in the background; roads are purple, which we can see up here; and vegetation, trees and so on, is green. The point being, in semantic segmentation every object of the same category has the same representation. This is really useful because you can build a rule: anything which is blue is a person, anything which is green is a tree, and so on.

In instance segmentation, however, we treat every object as its own instance. So even though this is a person, and this is a person, and so on, you can see they've all got their own unique color representations, and the same goes for these cars. So this is an example of instance segmentation.

Now, they're both really important concepts, especially in self-driving vehicles. You can check out my blog post about how Tesla uses image segmentation; it should be the first link in the description. Anyway, that's what image segmentation is. This is all a bit more advanced, with all these colors, and we're not going to be doing that today. We're just going to be dealing with binary image segmentation using OpenCV. What we want to understand is that in image segmentation, all we're trying to do is determine which pixel belongs to which object in the image.

Before writing some code, I'm going to quickly go into my resources folder, which is where we're keeping some images, and look at them. I'm going to start with this coins.jpg. This is exactly what you'd expect: a white background and two coppery coins in the foreground. What's interesting to note, and it's a problem we won't really be able to solve today with our binary classification, is that if you look closely there are parts of the coin which have deteriorated, or you can think of it as glare, like sunlight glinting where it's shiny. These parts look closer to the white background than to the coppery color, especially when we do binary segmentation. So that's one of the images we're going to be working with.

The other image we've got is this blend, where we blend from pink into a turquoise color, and it's going to be really interesting to try binary image segmentation on it. The last one, which is a bit trivial and which I might exclude, I haven't decided yet, is this colors JPEG. So those are the three images we're going to be working with.
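The difference between semantic and instance segmentation described above can be sketched with two tiny label maps. This is only an illustration: the 4x6 "scene", the class ids and the variable names are all made up for the example and do not come from the video.

```python
import numpy as np

BACKGROUND, PERSON = 0, 1  # hypothetical class ids, chosen for this sketch

# Semantic map: one class id per pixel. Both people share the id PERSON.
semantic = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
])

# Instance map: one object id per pixel. Each person gets their own id.
instance = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 0, 0, 0, 0, 0],
])

# Both maps agree on which pixels are "person";
# only the instance map can tell the two people apart.
person_pixels = semantic == PERSON
num_people = len(np.unique(instance[person_pixels]))
print(num_people)
```

Binary segmentation, the subject of the rest of the video, is the smallest version of this idea: a single map with only two labels, foreground and background.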
So now we can actually start to code. This is the moment we've all been waiting for: we're going to write some code. I created this empty Python file called seg.py. I've been using Sublime recently, since I've been doing a bit of web development, and Sublime is much better for web development than Emacs. Emacs is just a text editor for scripts and so on; that's what I use it for, at least. You probably don't care, and that's all right.

We're going to import two things for now. We're going to import OpenCV, which is just cv2, and from matplotlib we're going to import pyplot, which we'll use to display our images, and we'll import that as plt. I don't think we need NumPy right now, but if we do we can import it later.

The first thing we need to do is grab our image. Luckily cv2 gives us a method for this: we can just say imread and give it our file path, so we're going to say resources and get our coins.jpg. I've said already that we're not going to be doing any crazy image segmentation, just binary image segmentation, but we're also going to play around with a couple of different methods for it, so definitely stick around.

What we want is grayscale values. Currently we have this color array where every pixel has three values: a red value, a green value and a blue value, our RGB values. We want to convert that to grayscale so we have just one value, a brightness value where 255 is white and 0 is black. So let's define gray, and we can do that with another cv2 method, cvtColor. We pass our image in, and then we give it one of the cv2 COLOR_ constants. Usually you would say we convert from RGB, but in cv2 they use BGR, so we convert BGR to gray.

Now we're going to define something else quickly, because we want to show our normal image in matplotlib too, so bear with me for a second. We're going to say img_color, and we grab all of this, but rather than converting to grayscale we convert to RGB. The reason is that matplotlib uses RGB, so right now if we tried to display the image we would be giving it BGR values, and all our colors would be mixed up even though it would still work. So now we've got our original image and our grayscale image.

I think what we should do now is plot it. I'm going to create a figure, and this figure is going to hold our color image, so we'll title it "original", and we call plt.imshow on our img_color. Then plt.figure again, and this one is going to be our grayscale, and it's going to be plt.imshow as well. The last thing we always need to do is plt.show. So we go back to our terminal and execute our script, which was seg.py, and we should now get two images.

Now, we've done something funky here, and I already know what it is, so let's fix that. I should close these matplotlib windows first. Back in our script, when we plot a grayscale image we need to tell it that our color map is grayscale. I think we do it like this; I don't think it needs to be all caps, that looks a bit aggressive. Let's try again, and now we've got a grayscale image and our original image, which is great. So that's all working.

Back in our script, we're going to move all of this down to the bottom. We want to start doing our binary image segmentation on the grayscale values. We're going to be looking at what Otsu's algorithm and the triangle algorithm are, so stick around, but essentially all binary segmentation is going to be is this: we have some threshold value, and we say if our pixel is greater than this threshold, then we set that pixel to some max value, so we push it all the way to one end of the spectrum. Otherwise, we say that pixel value is going to be equal to zero. That's some terrible pseudocode right there, but you get what we're trying to do.

Luckily OpenCV allows us to do that. We're going to have two values here, ret and thresh. The return value is the threshold that was actually used; with the automatic methods it's the value OpenCV chooses as the best candidate to split our image into a foreground and a background. And thresh is going to hold our thresholded image. So we say cv2.thresh... no, that's wrong, it's cv2.threshold, and inside here we give it the grayscale image; we don't want to give it our color image. We're going to define some arbitrary numbers: this is our threshold value, so 0, and then we say 255. So right now every value that's greater than zero is going to be pushed to 255. If it's not black, it's white; that's what this statement is basically saying, because black is 0 and 255 is white. So that's our threshold and that's our max value. Then cv2.THRESH_, and there's a bunch of these; you could also invert this, so do the opposite, and we could look at that, but for now we're just going to use cv2.THRESH_BINARY.

I don't think we really care about our original image any longer; we all know what it looks like now. So we're going to replace this with thresh, I'm going to call this "binary", and it holds our thresh. Let's execute our Python script. Again, what we haven't done is define the color map; I did this on the wrong one, but we still need to say cmap equals gray. Let's execute the script again. So this is our binary image, and it's pretty terrible, because the only things that are really black are these outlines, these little ridges around the coin, and we can just about see those. The algorithm is doing exactly what we've told it to do: if it's not black, then it's white.

Now, we could change these parameters, so let's play around with that. What happens if we say only things greater than, say, 128 are going to be pushed to white and everything else is going to be black?
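The thresholding rule described in the captions (greater than the threshold becomes the max value, everything else becomes zero) can be sketched in a few lines. This is a minimal NumPy sketch rather than the code from the video: `binary_threshold` is a hypothetical helper, and the small `gray` patch is invented so the effect of the two thresholds mentioned (0 and 128) is easy to read. In the video the same rule is applied by `cv2.threshold` with `cv2.THRESH_BINARY`.

```python
import numpy as np

def binary_threshold(gray: np.ndarray, thresh: int, max_value: int = 255) -> np.ndarray:
    """Pixels strictly greater than `thresh` become `max_value`; the rest become 0."""
    return np.where(gray > thresh, max_value, 0).astype(np.uint8)

# Hypothetical 3x4 grayscale patch (0 = black, 255 = white), not taken from the video.
gray = np.array([
    [0,  12, 200, 255],
    [5, 130, 128, 240],
    [0,  90, 129, 250],
], dtype=np.uint8)

loose = binary_threshold(gray, 0)     # "if it's not black, it's white"
strict = binary_threshold(gray, 128)  # only pixels brighter than 128 turn white

print(loose)
print(strict)
```

With a threshold of 0 only the exactly-black pixels stay black, which is why the first result on the coins shows little more than the darkest outlines. Raising the threshold to 128 moves the mid-gray pixels over to the black side.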
So what happens when we do that? We get a completely different result now. We can still see that within the coin, and this is what we were talking about earlier, there are a lot of white parts, but where the coin has a lot of gray parts it is now distinguished from the background. We're not really done yet. This is really just a binary rule, literally saying: is this grayscale value bigger than this? If it is, make it 255; otherwise make it 0. We can make those numbers anything, but we want to do something more interesting, so we're going to look at two different methods. But now you know what's going on, at least in the most fundamental way we could segment an image.

And if we chose a different image, for example this colors one, I wonder what would happen; this might be more interesting. We can very distinctly see in the binary image that, whatever threshold we chose, it's made everything zero, so everything was below that threshold. That wasn't very useful, so let's go back to our coins, because we're not done with that yet.

What we can do now is check out this Otsu calculation, and we do this by just adding it: we say THRESH_OTSU. This is going to ignore the threshold value we pass in and try to figure it out itself. So what Otsu's threshold does for us is try to determine, on its own, the best threshold value to split the image into foreground and background. Let's check that out: we'll call that thresh2, we'll now call that "Otsu", and we'll show thresh2. Let's execute the Python script again. I made a mistake, and the reason is that we haven't said that this is part of the cv2 library, so it's cv2.THRESH_OTSU. So that's our initial grayscale image, and now we can see what Otsu has done.

And I can show you that, because if we change these values, if you remember what we got at the beginning, this should still be the same. See, it's still the same, whereas before, when we did it with 0 and 255, all we had was this faint outline. So this algorithm has now done something: it's saying that this is the optimal value. We can print this value, which will be interesting. Let's see what this value is that it thinks is so great, and the value is 170. This is the grayscale value, so if we went back to our own call up here and set our threshold at 170, then, although this is going to be really difficult to detect by eye, we should get something completely identical to what Otsu's algorithm determined for us.

So we've done Otsu's algorithm. By the way, what Otsu's algorithm does is systematically go through your pixels; the way it's described is that it tries to maximize the inter-class variance. So you have a bunch of pixels that are close together, and it groups them and then pushes them all towards 0 or towards 255, and it looks for the optimal way to push the most pixels in opposite directions, if you will.

While we're looking at thresh, I can show you what the INV variant does. As you'd expect, it does what we did before except it inverts our values, so we've got a black background and white coins instead, and the bits that were white are now black. You can use that if you like; it makes no real difference, and I don't really want it.

The last one we can do is this triangle method, and from what I've played around with, this seemed to work the best. I don't really know what it's doing, so I can't describe it to you, but we're going to call that thresh3, and it'll be interesting to print the return value of that too, so let's print ret, and this is going to be thresh3. If we now execute the script, our value is 252, which is really high. This is probably the best one we've done, because remember, we don't actually care about the detail of the coin; we're just trying to say what's part of the coin.

So right now we could write a program that, with this triangle result here, says any pixel which is 255 is part of the coin and any pixel which is not 255 is not part of the coin. If we look really closely, if we maximize this, we can only see a few small patches. So the next thing we could do is average the pixels with their neighbors. What this would do is that each of the pixels here would look at its surrounding area and say: that's black, that's black, that's black, maybe I should be black. In this particular case it would work; it doesn't work in every case, of course, but it's pretty good. So this triangle method has given us the best solution to our problem so far.

What's now really interesting is, I think the value was 252. All right, it's 252. So we just show thresh again, and we change our title once more to just "binary", because these are all still binary thresholds, remember. If we now execute our Python script one last time, hopefully we get this really good result. What we've basically said is: only if it's actually white do we want it to be white; otherwise it's going to be black. So we've really narrowed down the threshold, and what's interesting is to see what that background is. I wonder at what point this fails, so let's make this 254. I just like playing around to develop some intuition. This is even better, arguably. So in our particular example the best thing to do was basically to say anything that's not 255 already, just make that zero, and that's giving us a really good result.

Again, I'm rambling at this point, but I hope you've learned something, understood what's going on, and been able to take something away from it. What's great, of course, is that the code is pretty simple; you just need to install OpenCV if you haven't. Going forward I'm going to be doing more thorough image segmentation, but I think this is a really nice example to build your intuition that all we're doing is manipulating numbers. One of the simplest ways we can do that is to choose some pixel value and say: if a pixel is greater than that value we want it to be one color, otherwise we want it to be another color.

Of course, this only works when there are really two things; that's why we talked about a background and a foreground. The second you have five or six different things, it doesn't make sense, because you can't have a lamppost being black and a road being black and a tree being black and then just the sky being white. You're massively over-generalizing. So we're probably going to have to use deep learning to get better methods, probably conv nets, convolutional neural networks, and if you're interested in that then I definitely recommend subscribing. I'm doing a lot of research; I've been watching the Stanford University lectures on computer vision, which are a really good resource. So if you're interested in computer vision in general, what I'm trying to do with this channel is basically make computer vision [inaudible]. I think it has so many net-positive applications. There are a few applications of computer vision that I'm not fond of, but let's not make this political. Subscribe if you're interested in this topic, leave a like and a comment, and thank you for watching.

[Music]
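The captions describe Otsu's method as maximizing the inter-class variance. The sketch below shows one way that idea can be written out with a histogram in NumPy. It is an illustration under stated assumptions, not OpenCV's implementation and not the code from the video: `otsu_threshold` is a hypothetical helper, the "coin on a bright background" image is synthetic, and the value it returns may differ slightly from what `cv2.threshold` reports with `cv2.THRESH_OTSU`.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the gray level t that maximizes the between-class variance.

    Class 0 holds pixels <= t, class 1 holds pixels > t.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)

    w0 = np.cumsum(hist)             # pixel count in class 0 for each candidate t
    w1 = hist.sum() - w0             # pixel count in class 1
    sum0 = np.cumsum(hist * levels)  # sum of gray values in class 0
    sum1 = sum0[-1] - sum0           # sum of gray values in class 1

    # Only candidates that leave both classes non-empty are meaningful.
    valid = (w0 > 0) & (w1 > 0)
    mean0 = sum0[valid] / w0[valid]
    mean1 = sum1[valid] / w1[valid]

    # Proportional to the between-class variance: w0 * w1 * (mean0 - mean1)^2
    variance = np.zeros(256)
    variance[valid] = w0[valid] * w1[valid] * (mean0 - mean1) ** 2
    return int(np.argmax(variance))

# Synthetic stand-in for the coin photo (an assumption): bright background, darker disc.
rng = np.random.default_rng(0)
gray = rng.integers(220, 241, size=(100, 100), dtype=np.uint8)
yy, xx = np.ogrid[:100, :100]
coin_mask = (yy - 50) ** 2 + (xx - 50) ** 2 <= 30 ** 2
gray[coin_mask] = rng.integers(40, 81, size=int(coin_mask.sum()), dtype=np.uint8)

t = otsu_threshold(gray)
binary = np.where(gray > t, 255, 0).astype(np.uint8)
print(t)
```

On an image like this, with two well-separated brightness groups, the chosen threshold falls between the darkest background pixel and the brightest coin pixel, so the binary result separates the disc from the background cleanly. Real photos with glare, like the coins in the video, have overlapping groups, which is why no single threshold recovers the whole coin there.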

Info

Channel: MonkHaus

Views: 11,302

Keywords: computer vision, image segmentation, image segmentation using opencv, python opencv, triangle binary segmentation, otsu algorithm opencv, otsu image segmentation, binary segmentation, binary image segmentation, otsu algorithm python, otsu tutorial

Id: W-oVad7x-HQ

Length: 22min 30sec (1350 seconds)

Published: Wed Apr 29 2020