Using Machine Learning with Detectron2

Captions
Hi everyone, my name is Cami Williams and I'm a developer advocate for Facebook Open Source. Have you ever asked yourself, "I want to get into machine learning. How do I get started? What do I need to know first?" Then, after a quick internet search, you start to ask yourself: what are neural networks? Reading further: what is gradient descent? What is training bias? What is backpropagation? After asking yourself all these questions, you might think, "Should I just give up and go get a PhD in machine learning?" With ML it is pretty easy to fall down that rabbit hole. There are a lot of great community resources that can help you learn what neural networks are, and about artificial intelligence and all of the algorithms behind it, but that's not what this video is. This video is for the programmer who looks at tech docs or READMEs, skips all the words, just copies the code blocks, and hits run hoping it will work. This video is for the person who likes to experiment, learn something new, and then move on to the next cool thing. We're in luck, because fortunately there are a lot of community resources out there that put datasets, pre-trained models, and more at your disposal, so that integrating ML into your application is actually a breeze. I'm going to showcase how to find those tools and libraries, and use one of them to make a fun little hack. This is intended to be an experiment. I haven't actually done any pre-coding for this, so fingers crossed that it actually works, but I want to show you that it can be easy to integrate these highly technical concepts, which you might not be familiar with, into your application.

So what am I going to make today? I have something in my head. Essentially, I want to make an "I spy" app: I'm going to give my computer an image and say, "I am thinking of something that is green," and the computer will come back with different options, based on what is in the image, that are green. I'll say yes or no, and hopefully it will eventually find what I am thinking of. If that sounds interesting to you, let's get started programming.

I'm going to be using PyTorch for this hack. You'll notice I tried to match my hat to the website; that was as close as I could get. On pytorch.org we have a cool section of the website called the Ecosystem. The Ecosystem is essentially a showcase of those community resources. You can find more projects made by community members in the PyTorch forums, and sometimes there are blogs about them, but for now I'm just going to show you the Ecosystem. There are different models you can discover, publish, and reuse; "models" here is just another word for neural networks. All of this is to say: if you don't want to write your own neural network, or you don't want to curate your own dataset and write a model for it, you can use this to discover existing ones. They have many, and the list is growing, so if you're interested in starting from here, take a look at this page. The other part of the Ecosystem is the Tools and Libraries section. These tools and libraries are here to support, accelerate, and explore AI development. Some are useful if you are building your neural network; some are useful if you are pushing your neural network to production. It's a plethora of different projects that help you at various points in your ML development. I highly recommend taking a look at all of them and reading through, if only to discover something new.

The one I'm going to use today, though, is Detectron2. Detectron2 is Facebook AI Research's next-generation software system that implements state-of-the-art object detection algorithms. It is a ground-up rewrite of the previous version, Detectron, and it originates from this project here. All of that is to say, if you're a visual learner like me, Detectron2 does this: basically, you can give it an image and it detects the objects in the image.
What's nice about it is that it gives you not only the objects and the accuracy of each prediction, but also the bounding box and the mask of each object, so I can get the exact outline from the object detection, which is cool. Without reading all about Detectron2, I'm just going to jump right into this quick-start Colab notebook. Colab, if you're unfamiliar, is a service by Google. It's basically a runnable notebook where you can run Python; it's very similar to Jupyter notebooks. Using runnable notebooks is pretty standard in ML development, especially on the research side. So I'm going to click on the quick start. It is the beginner tutorial: it shows you how to get started with Detectron2, gives us a sample image, and then the detection of objects in that sample image. There are other parts of the tutorial that might be interesting to you; you can see that they use custom datasets with balloons. What I want to do is make a copy of this notebook so that I have the original for reference, and do my actual hacking in the copy. Okay, now I have my copy; I'm going to rename it "I Spy hack." We'll call it that for now.

One of the great things about Colab is that you can change the runtime. I always make sure to look at the runtime before I run any of my ML code. Right now we're on hardware-accelerated GPU. If you're on a laptop, chances are you have a CPU. If you don't know what CPU, GPU, or TPU means, a lot of it just comes down to the speed of running code: CPU stands for central processing unit, GPU stands for graphics processing unit, and TPU stands for tensor processing unit. I almost always run on GPU; sometimes I run on TPU, but GPU is fine for what we're going to do. We're not doing any real heavy lifting here.

I'm going to delete the stuff that I know I don't need, which is a lot of the text. What's nice about using a runnable notebook is that you can run all of these code blocks independently and move them around. It really helps with experimenting, because you can look back at things you've previously done, easily replace or find things, and see the output as the code runs, so it's great for debugging, for obvious reasons. For now I'll just show you how this first section runs. We'll hit the installs first; this installs torch, torchvision, and Detectron2. For me it takes a minute, so I'm going to pause the video and come back when it's finished. Drumroll... okay, mine just finished. You can see we have all the outputs here; just to clean up my window, I'm going to delete those. These cells have been run, and you can tell because of the numbers next to them. Then we have our imports, so we run those. Next, we get an image from the COCO dataset; we're utilizing the COCO dataset in the sample. If you read up on it, it's essentially a large dataset that helps with object detection.

Next we run some Detectron2 code, and ultimately you can see we end up with a DefaultPredictor. Again, it's up to you to read up on this, but I do think it's important to reference this part of the Detectron2 tech docs: under the quick start you can see they have a link to the documentation. DefaultPredictor "creates a simple end-to-end predictor with the given config that runs on a single device for a single input image." So it essentially gives us the prediction for the input image: once we call it, it gives us the output of the model for that single image. You can see we're doing that here: we create a predictor based on our config, and then we get the outputs from that predictor. Now, what do the outputs look like? We have a link here (it's also in the tech docs under "Use Models"), and the model gives us all of this information. The most relevant fields are probably in "instances": this is where we can get the bounding boxes; the classes, which are like the labels (what the object is actually labeled as, such as "person" or "umbrella"); the masks, which are the specific outlines of the objects; and then keypoints.
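To make the index-to-label lookup concrete, here is a minimal sketch. In the notebook this mapping comes from Detectron2's MetadataCatalog (`thing_classes`); since that library isn't needed to show the idea, I've inlined the first few COCO category names by hand.

```python
# Sketch: mapping predicted class indices to COCO label names.
# In the notebook, this list comes from MetadataCatalog; here the first
# 18 COCO "thing" categories are inlined so the example is self-contained.
COCO_THING_CLASSES = [
    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
    "truck", "boat", "traffic light", "fire hydrant", "stop sign",
    "parking meter", "bench", "bird", "cat", "dog", "horse",
]

def labels_from_class_ids(class_ids):
    """Convert integer class ids (the .item() values) to label strings."""
    return [COCO_THING_CLASSES[i] for i in class_ids]

print(labels_from_class_ids([17, 0, 0]))  # ['horse', 'person', 'person']
```

This is why the tensor value 17 printed in the video corresponds to "horse."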
What does that actually mean? I'm quickly going to print what outputs["instances"].pred_classes looks like. You can see it just gives us an array of numbers; these numbers are indexes into those labels. How do we get the labels? You can see in the Visualizer (it's linked here; I did a little pre-reading on the docs to figure out how) that it is in this MetadataCatalog. To access what each of these indexes means, note that we're working with a tensor, so I'm just going to do "for data in" this pred_classes output. I want access to these tensors really quickly, so I'll show you how to do that. This is me just knowing a little bit about tensors, but the operations performed on tensors are very self-explanatory: doing tensor[0], you can see we get 17, still wrapped in a tensor, and to get the actual integer you just call .item(), and now we have the plain 17. In the for loop over outputs["instances"].pred_classes, I set num equal to data.item(), and then I print the metadata catalog entry at num. Running this... oops, "metadata does not support indexing." Wait... okay, got it. First let me quickly print the metadata out so you can see what it looks like. I think I have to do .thing_classes. Yes: thing_classes gives us all of the categories, so I have to index thing_classes at num, and now that should work. Okay, great. You can see in this image we have a horse, a person, a person, a person, a person, an umbrella, and so on. So now we have all of these objects, which is great. We can see them visualized here just by hitting run (this will take a minute): we have the bounding boxes, we have the masks, we're doing a great job, and we can use all this information to our benefit.

This is pretty much all of the ML side of things that I want to do. Like I said, I'm going to write my hack in the Colab. Again, I want to be able to input a color, say "I spy something brown," and have the computer look at all these objects, figure out the overwhelming color inside each object's bounding box, and return a guess. Okay, in this image the horse is brown, and, you know, the ground is brown too, but in this case maybe it gives me back "horse," and I say whether that's right or wrong. If it's right, great, we won; if it's wrong, it gives me another guess. Hopefully that all makes sense. I'm not going to make a web or mobile app front-end for this; I'm just going to do it in the Colab for the sake of time, but if you're interested in seeing me connect a front-end to this code, leave a comment down below and maybe I'll make a part two to this video.

I'm going to quickly delete these outputs for my own visibility. I'm going to create a function that inputs, let's say, an image and a color as a string, and the computer outputs a guess, which is the string name of the object that matches that color. A little bit of the logic behind this: I'll have the color equate to an RGB value, and I'll need some kind of generalization of colors, so that if something is, say, light green or dark green, it all matches to green. So: input the image; detect objects and masks; with each mask, generalize the color; match the color to the generalization of colors up here; then see if there are any matches from the input color to the colors we found. If yes, output the object name and the image with the mask around that single object. If no, output all object instances in the image and say "I don't know, give me another color, how about it?" So let's just start from here; maybe later we'll move forward and create that back-and-forth dialogue of yes-it's-this, no-it's-not-this.
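The "light green or dark green should match to green" idea can be sketched as a tiny shade-stripping function. This mapping is my own illustration (the notebook doesn't show its exact generalization code), using common CSS-style color-name prefixes.

```python
# Sketch of shade generalization: collapse names like "darkgreen" or
# "lightgreen" into a base color. The prefix list is illustrative, not
# the notebook's actual logic.
SHADE_PREFIXES = ("dark", "light", "medium", "dim", "pale", "deep", "hot")

def generalize_color(name):
    """Strip a known shade prefix from a color name, if present."""
    for prefix in SHADE_PREFIXES:
        if name.startswith(prefix):
            return name[len(prefix):]
    return name

print(generalize_color("darkgreen"))   # green
print(generalize_color("lightgreen"))  # green
print(generalize_color("red"))         # red
```

A real version would also handle separators ("dark slate gray") and multi-word names, but this shows the shape of the idea.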
But for now, let's just start with this function and see how long it takes to do this. I'm going to create a function; we'll call it, oops, spy_the_color. It will take the color, which is our string, and then the image. Okay, now this is where some stellar, grade-A programming comes in: I'm going to do a Google search to see if this logic already exists, because hey, if I don't have to reinvent the wheel, that's awesome. One of the many, many benefits of open source. I know that in Python you can use PIL to help you find that color generalization; I just want to see if I can take the RGB I get and have it match to a string like "green" or "red" or "magenta," maybe web-safe colors. So let's just do a quick search on that.

Okay, I'm back. I managed to get a solution. The answers I got were from a GitHub user who created a solution about eight years ago to get the color of an image, and this Stack Overflow answer, which is also eight years old, that uses a package called webcolors. So I installed webcolors and I'm using it. webcolors has a corner case where, if your RGB doesn't match a web-safe color exactly, it throws an error. So ultimately I had to utilize a function to match the RGB to the closest web-safe color from the list of colors and give me that output. Right now I can do this: I just return the list of colors along with which one is the most dominant. In my current input image, the most dominant color is dim gray and the least dominant is silver, which, you know, are kind of similar, but it's fine, because I'm going to keep going. This is, again, over the entire image; ultimately I want to get the dominant color within each mask.

What we were able to do first is the color matching and generalization of colors. Now we want to use Detectron2 to detect the objects, then pull the masks out of the objects, perform this logic on each one, map each masked object to a color, and then output the label of that masked object. Sounds like a lot, but I think we're getting to the easier part, hopefully, fingers crossed, because we're going to be reusing the Detectron2 logic: I'll copy it in right now. At this point I'm going to organize this code a little more explicitly. I'll note that this is my color logic and move it down there for now, and above it I'll put my Detectron2 logic. So we have the predictor, and then my outputs for my Detectron2 image. Now I want to get those masks. I'm converting them here to numpy arrays, just so they're a little bit easier to iterate over, and I'm also going to get the bounding boxes. Because we have multiple masks and multiple boxes, I'm going to split this out into some helper functions, so I don't have to worry about writing loops and keeping track of my context here.

The first helper function I'm going to write gets the image inside the mask. It will take in the mask, the box, and the Detectron2 input image, which we'll call original_image. First I need to get the height and width of the mask. To do that, I'm just going to quickly peek at what our outputs look like, to provide a little clarity: I'll print the first index of the masks and the first index of the boxes. Okay, so the mask is just going to give us False and True values marking which pixels in the larger image belong to the mask, and the box gives us some coordinates. It looks like the coordinates are probably the upper-left and the lower-right corners; that's my assumption, and I quickly checked the tech docs to make sure. Cool, yeah, that was the case. So essentially I am going to get the mask height and the mask width from those bounding box coordinates.
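The "closest web-safe color" workaround described here is easy to sketch without the webcolors package: just take the named color with the smallest RGB distance. The palette below is a tiny hand-picked subset (the real code uses webcolors' full CSS3 list), so treat it as illustrative.

```python
# Sketch of the closest-named-color lookup from the Stack Overflow answer.
# A tiny hypothetical palette keeps this self-contained; the notebook uses
# the webcolors package's full CSS3 name list instead.
PALETTE = {
    "black": (0, 0, 0), "white": (255, 255, 255), "red": (255, 0, 0),
    "green": (0, 128, 0), "blue": (0, 0, 255), "dimgray": (105, 105, 105),
    "silver": (192, 192, 192), "tomato": (255, 99, 71),
}

def closest_color(rgb):
    """Return the palette name with the smallest squared RGB distance."""
    r, g, b = rgb
    return min(
        PALETTE,
        key=lambda name: (PALETTE[name][0] - r) ** 2
                       + (PALETTE[name][1] - g) ** 2
                       + (PALETTE[name][2] - b) ** 2,
    )

print(closest_color((250, 90, 60)))  # tomato
```

Exact matches still work (distance zero), so this one function replaces both the exact lookup and the fallback that webcolors needed.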
Okay, so now I have the true dimensions of the box, not relative to the main image but relative to itself. From here it's, again, something like 459 minus 126 to give me the width, and 480 minus 244 to give me the height. I'm getting this so that ultimately I can create a temp image of what is inside the mask and get the main color from that temp image. I'm going to create a numpy array filled with zeros of this height and width, so I can record which pixels of the box are mask versus not mask. At this point we should have a box with the shape of the mask filled in, in black and white. Running the code in the helper function real quick (I've already written it), you can see we now have this outline. I'll quickly print what the mask looks like, so you can see it's just a bunch of zeros and ones, and it gives us this outline of an object. I can also change which one: let's say I want the tenth one. I think there are ten objects here; there might not be. Yeah, it looks like it might be an umbrella. Also, a quick side note: I'm using matplotlib to give me this image visualization; I imported it up here. And I'll point out that I renamed these variables to height and width so they're a little more intuitive.

Now I want all of the pixels that are white in this mask to take on the RGB colors from the original image, but I also want to set black as transparent, just so I don't get black as the dominant color. We'll create a multi-dimensional numpy array where the value at each (height, width) coordinate is an array of size four, meaning RGBA; the A (alpha) will tell us if it is transparent or not. Now I'm going to iterate over the height and width of this temp mask filled with the ints (the zeros and ones): for each height index and width index, is it black or white? If the value at (height, width) is zero, the output pixel is (0, 0, 0, 0), just transparent black. Otherwise it was a one, so I take my original image, which I loaded up here, and get the pixel value at this spot; I need to do a little bit of index manipulation there, using the original height and original width offsets. Then I get the original RGB, append an alpha of 1 at the end, and with all of that, this temp mask fill is set equal to the original RGBA. Okay, great. Now let's see what this looks like if we just do imshow on this temp mask. Well, I don't think it'll show up; just give me the 0 through 3 indices there. So with this: fine, height is fine, but then here give me 0 to 3, show that, and great: now we have this outline of the man. And then for this helper function, I'm just going to return this temp mask fill.

So now our helper function works; we have this image that we created. In my main application, if you want to call it that, I just have to call that function, and ultimately we'll create an array of masks. Now we have the input image and we got the image in the mask; next we need to generalize the color in that image. Again, for ease of reading, I quickly cut and moved my helper functions up here, and I'm going to convert this dominant-color identification into helper functions as well. Now we have helper functions for our color logic and our mask logic, and we can call it all in spy_the_color. I can either save the image output from this mask logic locally, or try to manipulate it directly to identify the colors. I think the latter will be easier. That kind of renders some of this logic useless, because we already have all of those RGBA colors, so really all I need to use is this closest-color function. The problem is going to be, you know, how do I generalize the colors? I think I will attempt to do that now.
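The mask-cropping helper described here can be sketched in plain Python. I'm using nested [row][col] lists instead of numpy arrays to keep it dependency-free, and the box format (x0, y0, x1, y1) mirrors the upper-left/lower-right corners noted above; names are illustrative, not the notebook's exact code.

```python
# Sketch of the get-image-inside-the-mask helper: given a boolean mask over
# the full image and the original image pixels, build an RGBA crop of the
# bounding box where pixels outside the mask are fully transparent black.
def crop_mask_rgba(mask, image, box):
    """mask/image: nested [row][col] lists; box: (x0, y0, x1, y1) corners."""
    x0, y0, x1, y1 = box
    crop = []
    for y in range(y0, y1):
        row = []
        for x in range(x0, x1):
            if mask[y][x]:
                r, g, b = image[y][x]
                row.append((r, g, b, 255))   # inside mask: keep pixel, opaque
            else:
                row.append((0, 0, 0, 0))     # outside mask: transparent black
        crop.append(row)
    return crop
```

The transparent pixels are what let the later color-counting step skip everything that isn't the object.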
And we'll see what we come up with. Okay, it's been a while, and I managed to write a function that detects every single color in the image, and it's not going well. As you can see, it outputs each color and its count, and there are basically thousands, if not potentially millions, of unique colors in this image. So this isn't the right approach, but I want to show you how I did it regardless. np.unique with axis=0 essentially means: over each row and column, find all the unique colors. Then I have this track-unique-colors helper function where we go through the unique colors: if it is our transparent color, ignore it; otherwise, see if it already exists in the colors we're tracking. If it already exists, just update the count of that color, which happens here; otherwise, append it as a new color with its count; then return. Like I said, it's pretty much all unique colors, so I have to find a way to generalize the colors first, before I start tracking them, just so I don't run out of space and so it goes faster.

It was a relatively easy find: Image.quantize. This function "converts the image to P mode with the specified number of colors." After messing with quantize, I figured out that I actually do need to save this mask image we were able to get, so I can use the Pillow quantize on it. I saved it just as "geeks" (that was just the variable name from the tutorial), and you can see the original image here, and when I open "geeks" you can see the quantized image, which essentially means the colors are more combined. Unfortunately, I can't really reuse the transparent-color manipulation from this image overlay, because it only works with cv2. So what I'll do is go back to my helper function up here: I'll write out the temp image before quantizing, then the temp after quantizing, and we save it and return the quantized version. For good measure, I'll just have this return right now and print the output. Great, let's see what that looks like. Yay, great. Okay, awesome.

So now we have this quantized image; let's see if my unique-color-tracking logic works on it. This should actually be a much faster run, hopefully; let's see. Yeah, wow, that was a lot faster. Currently, as it stands, it prints every iteration of all the colors, but if I just move that print out to here, there should be only a total of ten colors. Great, and you can see that the count for black is the largest. Let's move this helper function up, and then we should really quickly create another helper function here, get_all_colors. Wow, I'm so pleased that this is so much faster; you don't even know how long I was waiting for that other helper function to run. Here we just return all_colors. Just to make sure this works, I print get_all_colors, whose input here is the RGBA list, then print the colors. Let's set this equal to colors_in_image really quick, iterate over each color and count, and print the list of colors in the image. Let's see if this works... oh, okay, I need to convert this numpy array to just be the RGB values in an RGB format. There: perfect. Lavender, black, dark gray... I see red in the image, but whatever; let's just keep going and we can debug later. So I'll redo this as a function, colors_in_image: it takes in the RGBA list and returns the list of colors in the image.

What have we done? We generalized the color and matched the color to a generalization of colors. Now we have to see if there are any matches from the input color to all of the colors available that we found, and we need to get the most popular color, but this list isn't sorted. Okay, I managed to do this based on a sorting approach I found here: it sorts my list in ascending order according to the second index of the tuple (the count). Black is almost always the largest color, again because of that mask background, so I just pop the last index if it is black, and then return.
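The tally-then-drop-black step can be sketched with collections.Counter. The name-lookup is passed in as an argument (faking the closest-color generalization) so the counting and sorting logic stands on its own; this is my paraphrase of the approach, not the notebook's exact code.

```python
from collections import Counter

# Sketch of the color-tallying step: count named colors over RGBA pixels,
# skip fully transparent pixels (the area outside the mask), sort by count,
# and drop "black" off the top since the mask background tends to dominate.
def dominant_color(rgba_pixels, name_for):
    """rgba_pixels: iterable of (r, g, b, a); name_for: rgb -> color name."""
    counts = Counter()
    for r, g, b, a in rgba_pixels:
        if a == 0:                      # outside the mask: skip entirely
            continue
        counts[name_for((r, g, b))] += 1
    ranked = counts.most_common()       # (name, count) pairs, descending
    if ranked and ranked[0][0] == "black":
        ranked.pop(0)                   # discard the mask's black fill
    return ranked[0][0] if ranked else None
```

Counter.most_common() does the descending sort, which replaces the manual ascending-sort-then-pop-last dance from the video.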
I ran it down here and it works. This means that the most dominant color in this image is medium slate blue, which I actually think should probably be red; I just have an error somewhere, but it's fine. Now, for each mask that appears, I want to get a list of all of the major colors. Okay, so now we should have a list of all of our dominant colors; let's just run that and test. This will give us a list of all of the dominant colors of the masks... "too many values," okay, so we need to enumerate. Simple, easy errors here. Medium slate blue is pretty popular; now is the time to do some debugging. Red is (255, 0, 0), and it's kind of hard to tell what here matches that color. I'm worried the channels might be backwards... yeah, I think they're backwards, because this color read backwards matches here, like a burgundy. So I'm just going to swap indices 2 and 0, and let's see: black, brown, slate gray, dim gray, rosy brown, antique white, dark slate gray, indian red, tomato. That sounds right. Okay, great. Now let's run this through the color logic, and we have all of our top colors. It's a lot of gray. Yeah, it's a lot of gray, but we do have a tomato for one of them.

So now I want to go through this list of dominant colors, and if the dominant color matches the requested one, return the object we think it could possibly be. We'll call it selected_item, with a selected_item_index. For each color and count: if the color is equal to the, let's call it requested_color, then set selected_item, and we need to get the index of this tuple, which I've done before; let me just find it really quick. I believe that was up here. Yeah: if the tuple is the requested color, we want these indices, so selected_item_index is equal to indices[0]. Then: if selected_item is not nothing, the label is going to be the metadata catalog's thing_classes at this selected_item_index, which should give us the object name; else, return "color not found." Okay, let's see what this looks like. I'm going to spy the color tomato: "I spy something of the color tomato." "Is it the tomato person?" Yep, it is! Okay, so now we just need to show the image with that person outlined.

It's been a minute; I've been reading the Visualizer docs. I couldn't figure out how to draw the mask for just one object, but I did figure it out for boxes and labels. So with spy_the_color, if you're searching for tomato in this image and you hit run, it puts a square around the thing it thinks it is and says "is it the tomato person?" The way I did this was with the overlay_instances function: I selected the box of the output at the index we found, and then I added the label to it as well.

So that's essentially it. We were able to create our own I-spy game based on a color, using Detectron2. The colors are a little wonky, but that would just take a little bit of cleanup work. To recap: we got colors to match to RGB and a generalization of colors. On the Detectron2 side, we took the input image, detected objects and masks, got the image inside each mask, generalized the color of the mask, and matched that color to the generalization of colors we made before. Then we checked whether there were any matches: if yes, we output the name and the image with the mask around the single object (we ultimately drew the bounding box instead); if no, we output all object instances and say "I don't know, give me another color" (we ultimately said the color was not found). And that's basically it; we did all of this logic in our Google Colab with Detectron2.

Okay, so that was a lot of fun. It was a lot of coding, and it's much later than when I started, but hopefully you managed to follow along. If you did, thank you; I appreciate you all having patience with this and letting me experiment with something interesting. Again, we were able to integrate machine learning into a relatively simple application without touching any of the neural networks or datasets directly.
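The finished matching step from the recap can be sketched end to end. Detections are faked as (label, dominant_color) pairs standing in for the Detectron2 outputs; the names are illustrative, not the notebook's exact code.

```python
# Minimal sketch of the final guessing step: walk the detected objects,
# return the index and a question for the first one whose dominant color
# matches the request, or a "not found" message otherwise.
def pick_guess(detections, requested_color):
    """detections: list of (label, dominant_color) pairs per mask."""
    for index, (label, dominant) in enumerate(detections):
        if dominant == requested_color:
            # In the notebook, this index selects the box handed to
            # Visualizer.overlay_instances() to draw the guess on screen.
            return index, "Is it the {} {}?".format(requested_color, label)
    return None, "Color not found. Give me another color?"
```

The returned index is the piece that lets the drawing step highlight only the guessed object rather than every detection.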
You can take a look at the Ecosystem that PyTorch offers, or read more about what community members have made on the PyTorch forums and blogs as well. Thanks for sticking around; see you next time. My name is Cami Williams. Bye!
Info
Channel: Meta Open Source
Views: 17,048
Id: eUSgtfK4ivk
Length: 36min 46sec (2206 seconds)
Published: Fri Apr 17 2020