331 - Fine-tune Segment Anything Model (SAM) using custom data

Captions
Hi everyone, welcome back. In this video we are going to look at how you can take your own custom dataset — images and their corresponding masks — and use it to train your own custom Segment Anything Model. I'll talk about what the model is first, and then we'll jump into the code. I definitely recommend watching my last two tutorials on Detectron2 for instance segmentation — I am a big fan of that approach — so please go watch those if you haven't already; search for videos number 329 and 330. And if you want to be notified about future videos, hit the Subscribe button now, and while you're there, if you're feeling extra generous, look for the little Thanks button.

Now, back to the topic. Let's start by understanding what SAM is all about — I'm doing this video because it has been a buzzword recently. The Segment Anything Model (SAM) is an image segmentation model developed by Meta AI (Facebook AI). It was trained on over a billion segmentation masks — the SA-1B dataset contains roughly 1.1 billion masks drawn from about 11 million images — and that's why it's the "segment anything" model. It is designed to take, in addition to the image, a human prompt. That prompt can be a set of points ("here are the points of interest, go find the objects"), a bounding box, or even a text input, which gets converted into an embedding and provided as input. That, at a high level, is what the Segment Anything Model does.

Why do people care about it? What are its key features? First, its ability to zero-shot generalize: it can segment objects it has never seen before. You may then ask why we need to fine-tune it on our own datasets — we'll see the reasons in a minute, in a practical way: we'll upload an image that we plan to use as a training image later and see how SAM does on it out of the box, and then we'll go ahead and train the model. I'm assuming this is going to be a long video, so I appreciate your patience. Second, flexible prompting: as I already mentioned, the prompt can be a bunch of points, a bounding box, or a text description. Third, it is very fast, which helps in real-time applications such as autonomous driving, where you want to detect things in real time. And fourth, ambiguity awareness: when objects overlap, it understands that there is another object underneath — although Detectron2 also does a pretty good job with overlapping objects in instance segmentation.

Now, if you ask me which one I would use between Detectron2 and Segment Anything, I always go back to Detectron2, because in my view it is more appropriate for
scientific image analysis applications — although I should admit my practical experience with the Segment Anything Model is limited, so that preference carries some bias; Detectron2 has simply been more reliable on the couple of datasets I have worked with. That said, don't stop watching: maybe you'll find that this approach is more appropriate for your dataset once you test it on your own data.

Let's continue with how it works. This is the paper — I encourage you to read it, and I'll leave the link in the description. If you look at the left-hand side of the figure, the input to the model is one of these segmentation prompts: individual points, a box, a rough outline around the object, or just text ("cat with ears"), together with the actual image. Both inputs go into a lightweight mask decoder, and the model gives you a valid mask as output — in this case, the cat with ears.

Going one level deeper (not too deep): take an input image, say a pair of scissors. The image goes through an image encoder, which produces an image embedding — in other words, the image is first encoded into a high-dimensional vector representation. In addition, we have to supply a prompt — points, a bounding box, or text; it doesn't matter which — and that gets encoded into another vector by the prompt encoder. Both vectors are combined and passed into the mask decoder, which outputs the mask along with a score: "this is a pair of scissors, with this confidence," and so on.

So what is the image encoder? It is a Vision Transformer (ViT) — in fact, when we train our own model we will download a ViT-based checkpoint — a large transformer model that has been pre-trained on a huge number of images. You might ask: wait, transformers are the architecture behind large language models, and we're talking about images? Yes — the same architecture can be used for image analysis tasks; the image is first encoded into a sequence of tokens, a vector representation, and that's exactly what the encoder does here. The prompt encoder is a simple encoder that converts your input prompt — points, boxes, or text — into a vector representation. And the mask decoder is a transformer model that predicts the object mask based on these input embeddings.

Now, how do we fine-tune it? When we say fine-tune, we are not changing the weights of the image encoder or the prompt encoder; we are only adjusting the weights of the mask decoder. And what do we need in order to fine-tune it? Obviously, you need to be ready with your own custom images and corresponding masks — the masks can be in whatever format.
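Since only the mask decoder gets updated, it helps to see how the three components surface in code. Below is a minimal sketch, assuming the Hugging Face transformers implementation of SAM and the facebook/sam-vit-base checkpoint (the same setup we use later); the freezing pattern is the fine-tuning recipe we'll apply in the training section:

```python
# Minimal sketch: SAM's three components in Hugging Face transformers,
# assuming the facebook/sam-vit-base checkpoint.
from transformers import SamModel

model = SamModel.from_pretrained("facebook/sam-vit-base")

# The three blocks described above:
#   model.vision_encoder -> ViT image encoder (stays frozen)
#   model.prompt_encoder -> encodes points/boxes into embeddings (stays frozen)
#   model.mask_decoder   -> lightweight decoder, the only part we fine-tune
for name, param in model.named_parameters():
    if name.startswith("vision_encoder") or name.startswith("prompt_encoder"):
        param.requires_grad_(False)  # freeze everything except the mask decoder
```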
What I'm going to walk you through today uses actual images and actual masks of the kind you would typically have — not some theoretical dataset, and not one you import with a single line of library code. Via the code we are going to physically download a dataset from a source that comes with images and masks, and we will use it to train our own model, so this is a complete walkthrough that you should be able to reproduce. We also need bounding boxes for each object, if they're not already available, because we need some prompt — points, a box, or text. So we will generate bounding boxes, which is pretty straightforward; we'll look at it in the code. This is what I'm talking about: you have an image and a corresponding mask that shows where, in this example, the mitochondria are; we binarize the mask and then put a box around each object, since we don't already have boxes.

Why is the prompt so important? Obviously because the prompt is part of the input, but look at the output with and without prompts: on top you can see the probability map when I haven't supplied any prompt, and below it the result when I supplied a regular grid of points — say a 5×5 or 10×10 grid. The difference is clear.

I think that's enough background. I'm sure most of you are eager to jump into the code, but before that, have another look at the Segment Anything Model web page. When you go to segment-anything.com, you'll see a summary of exactly what it is designed to do: prompt with interactive points, generate a grid of points, and so on. Let's try it firsthand: go to the demo, agree to the terms and conditions, and either use one of their pre-existing images or upload one yourself.

By the way, I should first show you the type of dataset we are going to work with. It can be downloaded from EPFL — I'll leave the link in the description, or just search for "mitochondria dataset EPFL". I absolutely love this dataset. It comes as TIFF stacks — I hope you know what TIFF images are — each stack containing 165 slices of a fixed size. I downloaded the training stack (you can also download the testing one if you want): 165 images and their corresponding masks, and the masks are binary. Opening them, you can see the images form a TIFF stack — just a stack of images, in fact a 3D volume stored as a stack — with the corresponding masks alongside. The masks are binary: when I hover over the background, the footer in ImageJ shows a pixel value of 0, and when I move to a bright region it shows 255. So it's a binary image, 0 or 255. Okay, that's enough about the data; let's go back to
the Segment Anything demo. I took the first of those 165 images and uploaded it, and it extracted the embeddings. It's not bad: I put my mouse here and it actually highlights the object; I put it there and it's inaccurate — this is exactly why we need to fine-tune it. Some hovers are perfect. In fact, I can also add a box, so let's draw one around this object and see if it picks it up correctly. Not really: this is the mitochondrion, but it's actually picking up the surrounding region. I believe you can draw a second box to remove a certain area — essentially positive and negative prompts — and this is exactly what we supply during training, for every object. With a smaller bounding box it probably does a better job.

So that's the bounding box as an input. What if I click "Everything"? That puts a grid of points over the image, as you can see, and applies it as a prompt. Again, it looks like it's doing an amazing job — but not for scientific images; this is not accurate for scientific images. You don't want background pixels around these two mitochondria wrongly included and called mitochondria; that shape up there isn't a mitochondrion at all. And a whole bunch of objects are missed — these two, that one, that one, and this one is not picked up either. On first look, because the output is colorful, we think it's doing an amazing job, but there is a lot of customization we can do to make it more accurate. That is exactly our goal: I don't care about any other objects, I only care about mitochondria in this case, so why not fine-tune SAM so it is good at detecting mitochondria and nothing else?

Okay, now we are finally ready to jump into the code, and here it is in Google Colab. I used to work in the Spyder IDE on my desktop, but to make this accessible to everyone — Colab is free — I've started doing more and more of my videos in Colab, so it's easy for you to reproduce and copy my notebook. First things first: check the runtime and make sure you are using a GPU. You have to pay for the advanced GPUs, but I'm okay with that. Let's connect the runtime — I was trying this a few hours ago and it said I had reached my daily limit, so if you want reliable access, consider paying. I have already connected this notebook to my Google Drive; otherwise, go ahead and mount the drive so you can store and access your data. That's what I've done: I downloaded the mitochondria dataset — images and masks — and placed it in my Drive so it's accessible. Remember these are TIFF stacks, so you cannot just open them with OpenCV, which reads one image at a time; we'll get to that in a second.
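For completeness, mounting Drive in Colab is two lines (the mount point below is the standard Colab path; where you keep the tiff stacks inside it is up to you):

```python
# Mount Google Drive in Colab so the downloaded tiff stacks are accessible.
from google.colab import drive
drive.mount('/content/drive')
```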
One thing I should mention: in the notebook I've included the link to the dataset and the link to the original code that I shamelessly copied from — though I heavily adapted it for this specific application. The original code imports a prepared dataset with a single line of code; that's not difficult, but it may not be relatable to you, because often what you have is raw data — images and masks — and the question is how to deal with that. That's exactly what I'm walking through. But the original code link is there if you want to look at it; I did borrow heavily from it.

With that, let's start installing. We install segment-anything directly from Meta's git, and likewise the transformers library. The implementation I'm going to use is, I believe, from Hugging Face — we'll confirm when we get to the imports. Also install the datasets library; with it you could access the original dataset that the source tutorial used, but we will instead prepare our own data in the datasets format. One thing I like about how the original author wrote the code is the use of custom loss functions, so go ahead and play with a few of those — MONAI is a good library for that, which is why we install it. Finally, I'm installing the patchify library: we are going to work with large images that we need to cut into smaller ones, and rather than writing custom code, patchify lets you divide a large image into a regular grid of patches with a single line of code.

While that installs, the next step is importing the appropriate libraries: numpy and pyplot you know; tifffile is the library I use to import the TIFF stacks (you can try scikit-image — it uses tifffile in the background to open TIFF files — but I like using tifffile directly for scientific images, so why not); os; patchify (from patchify we import the patchify function, to break large images into smaller ones); and random, because later we will randomly pick images and masks to display as a sanity check. Okay, installation is done — let's import our libraries. There we go, all imported.

Now let's load the data into numpy arrays. I'm using tifffile.imread for both images and masks. Remember we have 165 relatively large images — not huge, but large enough that training on a bunch of them at full size won't fit into memory on the free resources. So now we have the large images and the large masks loaded.
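Here is a sketch of this setup and loading step. The pip lines use Colab cell syntax, and the file paths are placeholders for wherever you saved the EPFL stacks in your Drive:

```python
# Install the libraries used in this tutorial (Colab cell syntax).
!pip install git+https://github.com/facebookresearch/segment-anything.git
!pip install transformers datasets monai patchify

import os
import random
import numpy as np
import matplotlib.pyplot as plt
import tifffile
from patchify import patchify

# Load the EPFL mitochondria tiff stacks into numpy arrays.
# Paths are assumptions; point them at your own copies.
large_images = tifffile.imread("/content/drive/MyDrive/training.tif")
large_masks = tifffile.imread("/content/drive/MyDrive/training_groundtruth.tif")
```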
In fact, let's add a line of code to see what the shape of these arrays is. Running it: we have 165 images, each 768 × 1024 in size, and the same for the masks.

Now let's divide these large images into smaller patches for training — 256 × 256. I'm defining the patch size as 256 and the step size as 256: the step size just means take a 256-pixel patch, then move 256 pixels and take the next one. If you use a step smaller than 256, the patches overlap — not a bad idea if you know how to handle the predicted probabilities in the overlapping regions; check the tutorial I did in the past on smooth blending of predictions. Running this, all I'm doing is taking each of the 165 images, dividing it into 256 × 256 patches with a step of 256, and repeating — and exactly the same for the masks. At the end we have a whole bunch of patches, and it's pretty fast. Looking at the shape: 1980 × 256 × 256. So our 165 images of 768 × 1024 are now 1980 images of 256 × 256, and the same for the masks.

You could proceed directly from here, but in this specific dataset a patch may contain no objects at all — all pixel values zero — and you may then run into an error where the loss function complains about an empty tensor. Why use those patches for training anyway? So the next lines of code check whether the entire mask is empty, and if so, drop that patch — the image and the mask together, kept in sync. We had 1980 patches; after running this we end up with 1642, so about 300 patches are dropped because they are completely dark. This is an optional step specific to this dataset — and honestly, I'd recommend first starting the training without it and seeing what the error looks like, because there is a lot to learn from these errors.

Now we are ready to create a dataset, which serves as the input structure for our images and masks. What do we mean by dataset? It's basically a dictionary: we add the images and labels into this dataset object. Once you run it you can see (I've left the outputs from my last run so you can see exactly what's happening) that the dataset has features "image" and "label" and 1642 rows, because that's how many image/mask pairs we have.

Now let's make sure our images and masks load correctly: I retrieve a random image and label from the dataset — this is exactly why we imported random — and plot them. There's my image, there's my mask. Perfect. Let's run this sanity check a couple of times. I spend a lot of time right here, because this is what goes into your training, and if your images and masks are not lined up — perhaps because of a simple error earlier — this is where you catch it.
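A sketch of the patchify-filter-dataset pipeline just described (variable names continue from the previous snippet; storing the patches as PIL images inside the Hugging Face Dataset is an assumption that keeps them compatible with the SAM processor later):

```python
from datasets import Dataset
from PIL import Image

patch_size = 256
step = 256  # no overlap; use a smaller step if you want overlapping patches

# Cut every 768x1024 slice into non-overlapping 256x256 patches.
all_img_patches, all_mask_patches = [], []
for image, mask in zip(large_images, large_masks):
    img_patches = patchify(image, (patch_size, patch_size), step=step)
    msk_patches = patchify(mask, (patch_size, patch_size), step=step)
    for i in range(img_patches.shape[0]):
        for j in range(img_patches.shape[1]):
            all_img_patches.append(img_patches[i, j])
            all_mask_patches.append(msk_patches[i, j])

images = np.array(all_img_patches)                            # (1980, 256, 256)
masks = (np.array(all_mask_patches) // 255).astype(np.uint8)  # 0/255 -> 0/1

# Drop patches whose mask is completely empty (they can break the loss).
valid = [i for i, m in enumerate(masks) if m.max() > 0]
images, masks = images[valid], masks[valid]                   # ~1642 remain

# Wrap everything in a Hugging Face Dataset (a dictionary-like object).
dataset = Dataset.from_dict({
    "image": [Image.fromarray(img) for img in images],
    "label": [Image.fromarray(msk) for msk in masks],
})
print(dataset)  # features: image, label; num_rows: 1642
```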
Now that we're convinced everything is working, let's extract bounding boxes. So far we have an image and a mask, but the input SAM takes is a bounding box, a bunch of points, or a text prompt (though I don't think the text API has been made public yet), so we'll work with bounding boxes for now — there's no point in hand-picking individual points here, and a box is easy to extract from a binary mask. How? Just look at where the ground truth mask is not equal to zero and define the bounding box from those extents. This part of the code comes directly from the original reference I mentioned, and that's the function.

Next, let's define the class that creates our dataset — the one that serves as the input for our images and masks. When we fetch an item from it, it gives us the processed image and the ground truth mask, plus the prompt, which is the bounding box: it takes the ground truth mask, invokes the get_bounding_box function, and returns everything together as the inputs — your image, your ground truth mask, and the box.

Now initialize the processor: from the transformers library we have SamProcessor, and we initialize it from the pretrained checkpoint — the ViT-Base SAM model is what we're using, and it's exactly what we will fine-tune in a minute when we load the weights and continue training. Then create an instance of our SAMDataset class — there it is, defined right above — instantiated with two inputs: our dataset and our processor (the SamProcessor we just created). That is our training dataset. If you're curious what this SAMDataset actually contains, look at the zeroth item: the pixel values are 1024 × 1024 (that's the size the processor resizes to), the ground truth mask is 256 × 256, and input_boxes holds the four coordinates of the box. So this is our train dataset.
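Here is a sketch of those two pieces — the box-from-mask function (the random jitter follows the original reference) and a minimal SAMDataset. The list-of-lists around the prompt matters, and the explicit RGB conversion is a precaution for single-channel patches:

```python
import numpy as np
from torch.utils.data import Dataset as TorchDataset
from transformers import SamProcessor

def get_bounding_box(ground_truth_map):
    """Derive a box prompt [x_min, y_min, x_max, y_max] from a binary mask,
    with a little random jitter so the box isn't always pixel-perfect."""
    y_indices, x_indices = np.where(ground_truth_map > 0)
    x_min, x_max = np.min(x_indices), np.max(x_indices)
    y_min, y_max = np.min(y_indices), np.max(y_indices)
    H, W = ground_truth_map.shape
    x_min = max(0, x_min - np.random.randint(0, 20))
    x_max = min(W, x_max + np.random.randint(0, 20))
    y_min = max(0, y_min - np.random.randint(0, 20))
    y_max = min(H, y_max + np.random.randint(0, 20))
    return [x_min, y_min, x_max, y_max]

class SAMDataset(TorchDataset):
    """Pairs each image with its mask and a box prompt, preprocessed for SAM."""
    def __init__(self, dataset, processor):
        self.dataset = dataset
        self.processor = processor

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        item = self.dataset[idx]
        ground_truth_mask = np.array(item["label"])
        prompt = get_bounding_box(ground_truth_mask)
        # Note the list of lists: one box for this one image.
        inputs = self.processor(item["image"].convert("RGB"),
                                input_boxes=[[prompt]], return_tensors="pt")
        inputs = {k: v.squeeze(0) for k, v in inputs.items()}  # drop batch dim
        inputs["ground_truth_mask"] = ground_truth_mask
        return inputs

processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
train_dataset = SAMDataset(dataset=dataset, processor=processor)
```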
Now let's define the data loader. The DataLoader takes the dataset and a batch size — how many images it hands you per request — plus whether to shuffle, and I added drop_last because I kept encountering that error I mentioned earlier: something cryptic about a tensor (I should have noted exactly what it said) when a completely dark mask gets loaded. I wondered whether it was the last, incomplete batch causing the issue, so I told it to drop the last batch — that's why it's there, in case you were wondering. Let's verify we actually get proper batches: printing the pixel values shows each batch holds two images and two masks, so we're good; the mask shapes print out the same way.

Now we load the model: SamModel.from_pretrained, with sam-vit-base as the checkpoint — it downloads about 375 MB. For this model we set up the loss function and the optimizer: the optimizer is Adam, imported directly from torch, and for the loss MONAI gives us several options — dice loss is what we're using, but also try focal loss, or dice-focal loss, especially if you're working with multiple classes.

Now we're all set to train. I set the number of epochs to one because I've already trained for 10 epochs, but I want you to see the training run. The device is CUDA if available, and don't forget — I'm sure you know this if you're familiar with torch — you have to push both the model and the data onto that device. Here's what happens otherwise: your model is on the GPU but your data isn't, and you get another cryptic error saying something isn't on the GPU; then you go back and make sure everything is. There it is — model.to(device) is where we push the model to the GPU, and the data gets moved inside the loop. Running it, it tells us the approximate time: each epoch is about 12 minutes, so roughly two hours for 10 epochs. Let's not do that right now — I already did the 10-epoch run before recording this tutorial, so let's kill this. I saved the weights, and now you have a model trained on your own data.
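Putting the training step together — a sketch of the loop, following the fine-tuning recipe from the original reference: Adam on the mask decoder only, both encoders frozen, and a MONAI dice-family loss with the sigmoid applied inside (I use DiceCELoss here; plain DiceLoss works the same way). The checkpoint filename is my own choice, and tqdm gives the running time estimate mentioned above:

```python
import torch
from torch.optim import Adam
from torch.utils.data import DataLoader
from transformers import SamModel
from monai.losses import DiceCELoss
from tqdm import tqdm

train_dataloader = DataLoader(train_dataset, batch_size=2,
                              shuffle=True, drop_last=True)

model = SamModel.from_pretrained("facebook/sam-vit-base")
# Only the mask decoder is fine-tuned; freeze both encoders.
for name, param in model.named_parameters():
    if name.startswith("vision_encoder") or name.startswith("prompt_encoder"):
        param.requires_grad_(False)

optimizer = Adam(model.mask_decoder.parameters(), lr=1e-5, weight_decay=0)
seg_loss = DiceCELoss(sigmoid=True, squared_pred=True, reduction="mean")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.train()

num_epochs = 1  # the run shown in the video used 10
for epoch in range(num_epochs):
    epoch_losses = []
    for batch in tqdm(train_dataloader):
        outputs = model(pixel_values=batch["pixel_values"].to(device),
                        input_boxes=batch["input_boxes"].to(device),
                        multimask_output=False)
        predicted_masks = outputs.pred_masks.squeeze(1)  # (B, 1, 256, 256)
        ground_truth = batch["ground_truth_mask"].float().to(device)
        loss = seg_loss(predicted_masks, ground_truth.unsqueeze(1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_losses.append(loss.item())
    print(f"Epoch {epoch}: mean loss {sum(epoch_losses)/len(epoch_losses):.4f}")

torch.save(model.state_dict(), "mito_model_checkpoint.pth")
```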
Now let's open the saved model and see how it does on our data. First of all, you can separate this part into its own inference code. We import SamModel, SamConfig, and SamProcessor from transformers, plus torch. We load the model configuration, define the processor from the same checkpoint, and then define my model — I'm calling it my_mito_model — as a SamModel built from that config. Remember, the config is just the architecture, not the weights; the weights we want are loaded from the saved checkpoint with torch.load. After running this, my_mito_model is ready for segmentation. The next lines define the device as the CUDA device again and push the model onto it — you can read through those and see exactly what they do.

Now that the model is ready, let's segment some images. First things first, I'm picking a random image from the training dataset — later I'll show you how to load a completely new image and work with it. We get a random test image and its ground truth mask (this is training data, so we have one). The ground truth mask is not the input, though: the input is a prompt, in our case a bounding box, so we use the get_bounding_box function we defined earlier to extract the box from the mask. Our inputs are the raw test image and the bounding box — the box being our prompt — and remember, you have to give the box as a list of lists; you'd find that out anyway when you run it and decipher the error, but as long as you remember it's a list of lists you're good. Then my_mito_model.eval() — if you're a torch person you know exactly how this goes — and we get our outputs by applying my_mito_model to the inputs. We apply a sigmoid to convert the output into a probability map; probabilities are continuous between zero and one, so we then set a threshold to binarize the image — here 0.5. Play with this; it's another important parameter. The bottom part just plots the probability map, the mask, and the image.

Let's do this for a few images — remember, the inputs are the image and a bounding-box prompt. And there you go: you see a mitochondrion here and another right there, a pretty good probability map, and the binarized objects. Let's run it a couple more times and make sure everything looks fine.
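A sketch of this inference path (the checkpoint name matches the training snippet above; get_bounding_box and dataset come from earlier snippets):

```python
import torch
import random
import numpy as np
from transformers import SamModel, SamConfig, SamProcessor

# The config defines the architecture; the weights come from our checkpoint.
model_config = SamConfig.from_pretrained("facebook/sam-vit-base")
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
my_mito_model = SamModel(config=model_config)
my_mito_model.load_state_dict(torch.load("mito_model_checkpoint.pth"))

device = "cuda" if torch.cuda.is_available() else "cpu"
my_mito_model.to(device)
my_mito_model.eval()

# Random training patch, prompted with its ground-truth bounding box.
idx = random.randint(0, len(dataset) - 1)
test_image = dataset[idx]["image"].convert("RGB")
ground_truth_mask = np.array(dataset[idx]["label"])
prompt = get_bounding_box(ground_truth_mask)

inputs = processor(test_image, input_boxes=[[prompt]],
                   return_tensors="pt").to(device)
with torch.no_grad():
    outputs = my_mito_model(**inputs, multimask_output=False)

# Sigmoid -> probability map in [0, 1]; threshold 0.5 -> binary mask.
prob_map = torch.sigmoid(outputs.pred_masks.squeeze(1)).cpu().numpy().squeeze()
binary_mask = (prob_map > 0.5).astype(np.uint8)
```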
Around the edges, though, you can see the model thinks there is something there. This is why smooth blending is so useful: instead of 256-pixel patches with a 256-pixel step, overlap them by, say, 32 or 64 pixels. Then you have continuity: a dark spot right at a patch edge might look like an object on its own, but with the rest of the image around it, the model has context to interpret that probability, and you minimize a whole bunch of these spurious detections around the edges. Okay, let's run it one more time and move on.

So there you go — I am pretty happy with this model, and I only trained it for 10 epochs; imagine training it for 100. (This is also why I like Detectron2: it trains very fast, even on small datasets.)

Now, how do we apply the trained model to a large image? We break the large image into patches and apply the model patch by patch. You'll see the problem if you stitch all the patches back together — unpatchify is something you can use to reassemble the image, and I'll leave that exercise to you — the segmentation doesn't look continuous around the patch boundaries. Here I have a different set of images, only 12 of them, and I'm going to load those. And — what's going on here — "no such file or directory: 12 training images". Apparently I just wrote this code and hadn't tested this part, so I'm going to pause right now, upload the 12 training mito images, and continue the video. Sorry about that, guys. Okay, this part is going to be interesting because I wrote the code but haven't tested it yet, so let's test it live. I uploaded my dataset of only 12 images — 12 large images, meaning 768 × 1024 — and I'm using patchify to break them down into 256 × 256.
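That loading-and-patchify step, sketched (the file name is a placeholder for whatever you uploaded; the shape comment assumes the 768 × 1024 slices described above):

```python
# Load the 12 new, unlabeled 768x1024 images and patchify one slice.
large_test_images = tifffile.imread("12_training_mito_images.tif")
patches = patchify(large_test_images[0], (256, 256), step=256)
print(patches.shape)  # (3, 4, 256, 256): a 3x4 grid of 256x256 patches
```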
Now, I've written a lot of explanatory text here, so go ahead and read it. This is an unknown image — I don't have a ground truth for it. So how do you enter prompts? How do you put bounding boxes around objects whose locations you don't know? You don't — instead, just apply a grid of points, 10 × 10 or whatever size you want, onto the image, exactly like the "Everything" mode we saw earlier on the demo site, which puts a grid of points over the image. That's what we need to generate and apply to our input image. You can input points, but you have to provide them in a specific format — we'll look at the code and understand it better.

So: the array size is 256 (our patches are 256 × 256) and my grid size is 10 — I want a 10 × 10 grid; it's up to you how many points to use. I generate the grid of x and y coordinates, convert them to lists, and combine them into a list of lists — remember, we need to provide this as a list of lists — and then turn the result into a torch tensor, ready to be provided as input. Just copy the code and you should be all set. By the way, this part is new — it's not from the earlier source I mentioned; it came out of my own experimentation on new images.

Let's generate the 10 × 10 grid and print out the shape. We should have a hundred points, and the shape is 1 × 1 × 100 × 2. Why? The first dimension is the batch size, the number of images processed at once — here just one. The second is the point-set batch: you can give multiple point sets ("here's point set one, here's point set two"), but we generated one set, so that's 1. Then 100 points, and the final 2 is what the documentation describes: the x and y coordinates, horizontal and vertical — it's not a 3D grid of points, hence the 2. So our points are in the right shape. Our patches are shaped 3 × 4 × 256 × 256, because that's how each of my 12 images gets divided.

Now let's select a patch for segmentation. You can pick a random i and j, but I'm fixing it to (1, 2) because I want to study the result while changing things — I don't want to work with a random patch. I extract that patch, define it as my single patch, and convert it into a PIL Image object. And first, let's try without providing any prompt — no bounding box, no points.
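A sketch of the grid construction and both inference variants (processor, my_mito_model, device, and patches continue from the snippets above; whether a grayscale patch needs the explicit RGB conversion may depend on your transformers version):

```python
import torch
import numpy as np
from PIL import Image

# Build a 10x10 grid of point prompts spanning a 256x256 patch.
array_size, grid_size = 256, 10
x = np.linspace(0, array_size - 1, grid_size)
y = np.linspace(0, array_size - 1, grid_size)
xv, yv = np.meshgrid(x, y)
grid_points = [[int(px), int(py)] for px, py in zip(xv.ravel(), yv.ravel())]
input_points = [[grid_points]]  # list of lists: (batch, point-set, 100, 2)
print(np.array(input_points).shape)  # (1, 1, 100, 2)

# Fix the patch to row 1, column 2, and wrap it as a PIL image.
single_patch = Image.fromarray(patches[1, 2]).convert("RGB")

# Variant 1 -- no prompt at all:
inputs = processor(single_patch, return_tensors="pt").to(device)
# Variant 2 -- the point grid as prompt (swap in for the line above):
# inputs = processor(single_patch, input_points=input_points,
#                    return_tensors="pt").to(device)

my_mito_model.eval()
with torch.no_grad():
    outputs = my_mito_model(**inputs, multimask_output=False)

prob_map = torch.sigmoid(outputs.pred_masks.squeeze(1)).cpu().numpy().squeeze()
binary_mask = (prob_map > 0.5).astype(np.uint8)
```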
The input here is the processor output — and remember, the processor takes multiple inputs: the image and the prompt. In this first run I'm not giving any prompt; in the next run we'll provide our input points. The rest of the code is very similar to what we saw earlier: apply a sigmoid to get a probability map, apply a threshold to convert it to binary, and plot. Enough talking — let's look at some results. This is exactly what I showed you earlier: there's an image with a mitochondrion here, here, and here, and the probability should have been higher in those regions than in the background. I do see some slightly higher probability in places, but all of these values are, I believe, below 0.1 or so — very low probability; print them out if you want.

Now let's go back, comment out the no-prompt part, and activate the part that provides our input points — the 10 × 10 grid we just generated, shaped 1 × 1 × 100 × 2. I'm not changing anything else, just providing the prompt. And look at the result: this is much better. You can see higher probabilities at the top, and higher probability for the mitochondria down here. You can obviously play with the threshold to extract these objects, and clean them up using morphological filters or any of the other usual tricks, but I hope you appreciate how capable this algorithm is.

At the end of this, I'll again bring up Detectron2, because that remains my favorite way of segmenting images compared to any other approach. But if you figure out how to use the Segment Anything Model to power up your annotation — this is where I see SAM being most powerful, because it's fast and can be used in real time: while annotating you could say "I've annotated six objects; show me the remaining 500 objects in this image" and use those as annotations — for those applications SAM can be pretty incredible. I really hope you found this tutorial useful, and as usual I encourage you to hit the Subscribe button. Let's meet in the next video. Bye!
Info
Channel: DigitalSreeni
Views: 30,631
Keywords: microscopy, python, image processing
Id: 83tnWs_YBRQ
Length: 44min 7sec (2647 seconds)
Published: Wed Sep 06 2023