SAM - Segment Anything Model by Meta AI: Complete Guide | Python Setup & Applications

Video Statistics and Information

Captions
This intro was made with SAM, the Segment Anything Model released by Meta AI last week. The model is very versatile and lets you use different modes to obtain different segmentation results. As the name suggests, it can be used to segment anything visible in the image, but you can also pick a single point on the frame and extract the full segmentation mask associated with that point. Not only that, but SAM can be used in conjunction with any object detector to create a two-stage segmentation solution: the detector produces bounding boxes, and SAM converts those bounding boxes into segmentations. So if you want to learn all of that, just sit back, relax, and let me show you how to do all of those cool things. Before we start, let me just tell you that the model is so awesome that we are actually working on integrating it into our annotation tool, so stay tuned for that.

Okay, so let's dive in. As usual, we start our journey at Roboflow Notebooks, which is the perfect place if you want to learn more not only about SAM but also about other computer vision models. We pick the first Jupyter notebook from the top and open it in Google Colab. In the top section you can find links to the original SAM repository and the paper released by Meta AI. When we scroll a bit lower, we see the first cell that we are going to execute: the nvidia-smi command, just to confirm that we have access to a GPU. In case your notebook runs in CPU mode, please follow the instructions in the "Before you start" section. The model will most likely run regardless, but it will be much faster on a GPU.

Next, we install SAM into our Python environment. It is actually really fast this time; SAM doesn't need a lot of external dependencies, so we should be done in no time. In the next cell we install a few external dependencies that we will use in our notebook. Funnily enough, installing those few Python dependencies takes longer than installing the actual model. That's not an official stat, but SAM is most likely the fastest-installing project we've ever had in our tutorials.

Similar to other models, we need to download the weights from an external link before we load them into memory. When the download is completed, we save the path leading to those weights in a variable, and just to sleep well at night, we confirm that the file exists on our operating system. The last thing we do before loading the model into memory is to download a few images, so that we have some examples to play and experiment with. And that's it, everything is ready. We pick the right device, in our case cuda:0, which means the first GPU on our machine, and we load the model into memory.

Like I said in the intro, SAM has multiple modes that you can use for inference, and what we are going to do first is learn the API and use all the different ways to generate masks. First step: automated mask generation. This is the mode where you essentially create a segmentation mask for every object visible in the scene. To use it, we need to import one additional utility from the segment_anything package: SamAutomaticMaskGenerator. We just pass our already loaded model into that utility, hit Shift+Enter, and we are pretty much ready to go. Now, to generate masks automatically, we read one of our example images using OpenCV, convert it from BGR to RGB, and pass it as the argument to the generate method of our mask generator. As a result, we obtain a list of Python dicts, where every dict describes a single segmentation mask.
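For reference, here is a minimal sketch of the flow described above: load the checkpoint, move the model to the GPU, and run automatic mask generation. The checkpoint file name and the example image are assumptions for illustration, not necessarily the exact values used in the notebook.

import cv2
import torch
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Assumed local path to the downloaded ViT-H checkpoint
CHECKPOINT_PATH = "sam_vit_h_4b8939.pth"
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load the SAM model and move it to the selected device
sam = sam_model_registry["vit_h"](checkpoint=CHECKPOINT_PATH)
sam.to(device=DEVICE)

# Automatic mask generation: segment everything visible in the image
mask_generator = SamAutomaticMaskGenerator(sam)

image_bgr = cv2.imread("dog.jpeg")  # hypothetical example image
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

# Returns a list of dicts, one per generated mask
results = mask_generator.generate(image_rgb)
print(len(results), sorted(results[0].keys()))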
The most interesting keys from our perspective are segmentation, which stores a boolean NumPy array describing the actual mask; area, which holds the number of pixels belonging to the mask; and bbox, the location of the detection on the frame. To make working with the Segment Anything Model a bit easier, we added native support for SAM to the supervision package. Starting from version 0.5.0 you can process those masks efficiently using supervision, so let's take a look at how easy it is to use this package to annotate our segmentations on the image. I just create an instance of MaskAnnotator, convert our SAM result into Detections, the object recognized by the rest of the supervision library, run the annotate method using our original image and detections, and at the very end print the source image and the segmented image side by side.

Yeah, I know it looks a little creepy, but that's because we have several dozen masks and we don't really know which class they represent, so we use as many colors as possible to keep them distinguishable from each other. Let's explore exactly how many masks we have, and try to understand which part of the image is represented by each of them. This is also the perfect opportunity to learn SAM's API. Like I said before, the mask generator returns a list of dicts, and each of those dicts contains a segmentation key. We can use a list comprehension to extract the segmentation key from each result and display all of them on a single image. At the same time, we'll sort those segmentations by area, so we start from the largest and go to the smallest.

What is interesting is that we see sort-of duplicates in our mask set. That's because SAM expects ambiguity and lets you pick the right mask. In a real-life scenario that most likely means you need to add some mask post-processing and select the right strategy for your use case: either you pick the smallest or the largest mask, or you merge masks with the largest IoU. It's up to you, but without any handling you risk having multiple detections describing the same object.

Now let's talk about using points or bounding boxes to pick the area of an image that interests us most and extracting the masks related to that area. To do it, we import SamPredictor from the segment_anything package and once again pass the SAM model as an argument. We will use that utility in just a second, but in the meantime we'll define our bounding box. Instead of hard-coding it manually as a Python list, I decided to use something more interactive: we run a Jupyter notebook widget and use the mouse to draw a bounding box around the area of the image that interests us. When I access the bboxes property of the widget, we can see our bounding box. Unfortunately, that information is not stored in the right data structure: we get a dict with x, y, width, and height properties, while SAM expects a 4-element NumPy array with x_min, y_min, x_max, y_max. So we need a few lines of Python to convert it to the right structure. When that is done, we can just pass our bounding box to the mask predictor's predict method, hit Shift+Enter, and we get our mask.
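Putting the pieces above together, a rough sketch might look like the snippet below. It reuses sam, results, image_bgr, and image_rgb from the previous sketch; the widget_box values are made up, since in the notebook they come from the interactive widget.

import numpy as np
import supervision as sv
from segment_anything import SamPredictor

# Annotate the automatically generated masks (supervision >= 0.5.0)
mask_annotator = sv.MaskAnnotator()
detections = sv.Detections.from_sam(sam_result=results)
annotated_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)

# Box-prompted segmentation
predictor = SamPredictor(sam)
predictor.set_image(image_rgb)

# The widget returns x/y/width/height; SAM wants [x_min, y_min, x_max, y_max]
widget_box = {"x": 68, "y": 247, "width": 555, "height": 678}  # hypothetical values
box = np.array([
    widget_box["x"],
    widget_box["y"],
    widget_box["x"] + widget_box["width"],
    widget_box["y"] + widget_box["height"],
])

masks, scores, logits = predictor.predict(box=box, multimask_output=True)
# masks has shape (3, H, W): three candidates resolving the ambiguity
best_mask = masks[np.argmax(scores)]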
We need to watch out, however, because the mask predictor has a different output format than the automated mask generator we used before. Previously we got a list of dicts, where every dict described a single mask; now we get a tuple of three elements: masks, scores, and logits. Of those three, the first two are the most important for us. To handle that, we need slightly different post-processing, but when we hit Enter we see, just as before, two images side by side: the source image on the left and the segmented image on the right. Interestingly, this is not the result of plotting a single mask on the image. The model was once again not sure which mask we are most interested in, so it returned three of them and let us select the right one.

Now, just to make sure it was not a fluke, let's quickly go back to the top of that section, pick a different area of the image, and go through each of those cells, examining the result. So this time, instead of picking the dog, let's pick, for example, a building in the background. Scroll a bit lower, confirm that we have obtained our bounding box, convert it to the right format, pass it through the model, and display the result. We see that we got a perfect segmentation for our selected building, and just like before, the model gave us three different segmentations, very similar yet distinct, to choose from.

What I love most about this model, apart from the fact that it works like magic and is absolutely awesome, is that it is super fast; actually fast enough to run in real time. Using bounding boxes like that is a great way to annotate your data quickly, and like I said in the intro, we are working on including SAM in the Roboflow annotation tool. In the meantime, let me show you how you can convert the bounding boxes in your existing object detection dataset into segmentation masks. I was looking for the perfect example and decided to go with brain tumor MRI images. We can see that those images are annotated with bounding boxes, and now we will use SAM to convert them into segmentation masks.

Let's do it. First things first, we need to fetch the dataset into our Python environment. We use roboflow.login to generate the token, copy it into our Jupyter notebook, paste it into the input field, and press Enter, and the dataset gets downloaded. Keep in mind to download it in COCO format if you want it to be compatible with this notebook. Now we load the annotations into memory and print the list of classes; in this case we only have one class. In the cell below we see the code that converts bounding boxes into masks. It picks a random image, loads the annotations associated with that image as well as the image itself from the hard drive, creates a copy of the image and annotates it with the labels, then creates another copy and runs it through SamPredictor exactly like before, but this time using our bounding box annotation to pick the region of interest. When we run that code, we see the bounding box annotation and the freshly generated mask prediction side by side. Every time we rerun that code we get a different image from the dataset, so you can play a bit, execute it a few times, look at different examples, and examine how SAM handles different cases. And obviously, feel free to use a different dataset, either yours or one from Roboflow Universe, and let me know how it goes.
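The conversion loop might look roughly like this. It is a simplified sketch rather than the notebook's exact code: coco_annotations is a hypothetical dict mapping image paths to lists of XYXY boxes, and predictor is the SamPredictor loaded earlier.

import random
import cv2
import numpy as np

# Hypothetical structure: {"images/scan_001.jpg": [[x_min, y_min, x_max, y_max], ...], ...}
image_path = random.choice(list(coco_annotations.keys()))
image_bgr = cv2.imread(image_path)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

predictor.set_image(image_rgb)

masks = []
for box in coco_annotations[image_path]:
    candidates, scores, _ = predictor.predict(
        box=np.array(box),
        multimask_output=True,
    )
    # Keep the highest-scoring candidate for each bounding box
    masks.append(candidates[np.argmax(scores)])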
Okay people, that's all for today. Playing with SAM is a lot of fun, and I have a deep feeling that it's not the last video on this channel covering that particular model. I already have a few super cool ideas for small projects that we can do in upcoming videos, but if you have some, make sure to let me know in the comments; I'm looking for inspiration. The blog post about SAM is also coming soon; maybe by the time you watch this video it is already out. Our integration of SAM into the Roboflow annotation tool is also coming quite soon; our engineering team is working very hard on that. In the meantime, stay tuned for more computer vision content coming to this channel. Like and subscribe. My name is Peter, and I'll see you next time. Bye!
Info
Channel: Roboflow
Views: 49,197
Keywords: Segment Anything, Segmentation, Labeling, One-Shot, Detection, SAM, Segment Anything Model, Colab, Tutorial, Meta AI, image segmentation, Python, foundation models, computer vision, zero-shot, promptable, SA-1B, bounding box, object detection
Id: D-D6ZmadzPE
Length: 12min 55sec (775 seconds)
Published: Tue Apr 11 2023