ML.NET Object Detection

Captions
Hi everyone, and welcome to another video. Today we're going to cover how to detect objects in images using ML.NET, a machine learning framework that lets you use your .NET skills to implement machine learning in your apps. In this case we will be using a pre-trained model in the ONNX format. ONNX is a standard format for exchanging models between machine learning frameworks such as TensorFlow, Core ML, Keras, PyTorch, and many more.

Okay, let's begin by understanding how a machine learning model works. Machine learning models normally have inputs and outputs. The inputs are the values we need to provide, and the outputs are the predictions based on all the data in the model. The input has a specific format that the model requires, and the output has a specific format, defined by the model, that provides the data we need and that we may have to parse to get a result, a prediction. In this case I have the Tiny YOLOv2 model, which is a model pre-trained in TensorFlow and converted to ONNX so we can use it with ML.NET.

Here I have the input, the first thing here, which is an image. As you can see, the format of this input is a three-dimensional array of 3 x 416 x 416.
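The video doesn't show the input and output types on screen, but a minimal sketch of what they typically look like for this model is below. The class and property names are my assumption; the tensor names "image" and "grid" come from the narration, and the code assumes the Microsoft.ML, Microsoft.ML.ImageAnalytics, and Microsoft.ML.OnnxTransformer NuGet packages.

```csharp
using Microsoft.ML.Data;

// Input row: one image path per row (names are hypothetical).
public class ImageInput
{
    public string ImagePath { get; set; }
}

// Output row: Tiny YOLOv2 writes its result to a tensor named "grid",
// 125 channels x 13 rows x 13 columns, flattened to 21,125 floats.
public class TinyYoloPrediction
{
    [ColumnName("grid")]
    public float[] Grid { get; set; }
}
```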
It represents an image 416 pixels wide and 416 pixels high, with three channels. The three channels are RGB, so it's an RGB image at that resolution. We'll get into that when we check the code, but first let's see the outputs. The output is a grid of 13 columns by 13 rows, with 125 values per cell.

Let's see a graphic example of this. Here I have the image already converted into the output grid: a 13-column by 13-row grid, with each cell containing 125 values. Those 125 values are not a single thing; they are five boxes of 25 values each. The first four values describe the position of the box inside the cell: the left position, the top position, and the width and height of the box that contains an item. The fifth of the 25 values in the box is the confidence that the box contains an object at all. Then come 20 more values that represent the confidence for each label covered by the model.

That's hard to understand, so let's see how it works. Imagine that the first label registered in the model is "car"; we can see the label order in the documentation of the model. That first value gives me the confidence that there is a car in that box in that cell: close to 0 if there's no car, close to 0.99 if there is one. Based on that we can detect what's in the picture. There are 20 possibilities for what it can be, and every position corresponds to a label, maybe flower, car, whatever, so we can evaluate those values and know what's in the picture. Let's see an example by checking the code.

Okay, I created a class called ImageObjectDetector, and it has a method called ImageHasObject that returns a boolean. I pass it the path of my image and the object I'm looking for, so I can send the path of the image and, say, "airplane", and check whether the image provided contains an airplane.
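The 13 x 13 x 125 layout can be captured as plain index arithmetic. This is a sketch under the assumption, standard for Tiny YOLOv2, that the tensor is laid out channel-first, so the flat array is indexed by channel, then row, then column; the type and member names are mine.

```csharp
using System;

public static class GridLayout
{
    public const int Rows = 13;
    public const int Cols = 13;
    public const int BoxesPerCell = 5;
    public const int FeaturesPerBox = 5;  // x, y, width, height, objectness
    public const int ClassCount = 20;     // 20 labels covered by the model
    public const int ValuesPerBox = FeaturesPerBox + ClassCount; // 25
    public const int Channels = BoxesPerCell * ValuesPerBox;     // 125

    // Position of channel c for the cell at (col, row) in the flat array.
    public static int Offset(int col, int row, int c) =>
        c * Rows * Cols + row * Cols + col;

    // Channel holding box b's confidence (the 5th of its 25 values).
    public static int ConfidenceChannel(int b) => b * ValuesPerBox + 4;

    // Channel holding box b's raw score for class k.
    public static int ClassChannel(int b, int k) => b * ValuesPerBox + 5 + k;
}
```

With this layout, the flat array has 125 x 13 x 13 = 21,125 entries, which matches the "around 21,000 values" mentioned later in the video.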
Let's see how the code works. First I instantiate my MLContext, which contains all the operations to convert my inputs and get the outputs from a model. I need to process my data first, so I load it into the context: I call mlContext.Data.LoadFromEnumerable, take the path of my image as the input, and leave an output property that the context will use to reference this image.

Next we need to convert the image to what our model expects. If we go back to Netron and scroll to the top, we can see that the model requires an input called "image". That is literally the name of a variable we have to fill, not a reference to the type, and it needs three channels (the three colors) and a size of 416 x 416: an image 416 wide and 416 high. If our image doesn't match that format, the pipeline will blow up, so we have to convert it to exactly what the model expects.

To do that, first we load the images: we take the value from the input column, load the image, and leave it in the output column. We also have to provide an image folder for the case where all the images live in one folder, so we can use paths relative to that folder; in this case I pass the full path, so I don't need that.

Now that that's done, we resize the image to the expected 416 x 416. My input is the previous output, the loaded image, and the result goes into the same output property: I take the original image, resize it to the expected size, and leave it as the output. After this step I have a resized image.

Now that the image is resized, I extract the pixels, that is, the three values for each pixel. This returns the RGB of every pixel and converts it into the array the model expects: a three-channel array with three values for every one of the 416 x 416 pixels.
So I will have every single pixel with its three values: exactly what the model expects, a three-dimensional array with the pixels and their RGB colors. I receive the input and output it as "image"; remember that the name of the model's input is "image", so my image is now exposed to the model under that variable name.

Now I have to apply my model. After converting my image into exactly the format the model is expecting, I can apply the model and tell it: take "image" and convert it into the output that you provide and that I specify. I pass the model file, which is the ONNX file in my project at Assets/MachineLearning/TinyYolo2.onnx, the same one I have open in Netron, and I tell it: your input is the "image" column I created in the last step, and the output will be "grid", the thing we already saw, the grid with 13 columns and 13 rows, five boxes in every cell, and 25 values in each box: the five first features and then the 20 probabilities for the labels that square may contain.

Now we need to make our type fit this pipeline. When you call Fit, you attach a schema to that type: if we have a type called Person and we want it to flow through our pipeline, first we call Fit, and then whenever we send a Person to the Transform method to process it, it knows it has to run this pipeline. So first we tell it: whenever you see this type, run this pipeline. And because we can do it right there, we don't only Fit, we Transform at the same time: okay, you know the process, now run it on the data I provided and transform it. That runs the whole pipeline step by step, including my model, and it will give me my output.
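The pipeline described in the last few steps (load the image, resize it to 416 x 416, extract the pixels, apply the ONNX model) might be sketched like this. It assumes the ImageInput class and model path used above, which are my naming; the transform calls are the standard ML.NET ones for ONNX object detection, and it requires the Tiny YOLOv2 ONNX file on disk, so treat it as a sketch rather than a ready-to-run sample.

```csharp
using System.Linq;
using Microsoft.ML;

var mlContext = new MLContext();

// One row containing the full path of the image to analyze.
var data = mlContext.Data.LoadFromEnumerable(new[]
{
    new ImageInput { ImagePath = @"C:\temp\picture.jpg" } // hypothetical path
});

var pipeline = mlContext.Transforms
    // Load the bitmap; imageFolder is empty because we pass full paths.
    .LoadImages(outputColumnName: "image", imageFolder: "",
                inputColumnName: nameof(ImageInput.ImagePath))
    // Resize to the 416 x 416 the model requires.
    .Append(mlContext.Transforms.ResizeImages(outputColumnName: "image",
                imageWidth: 416, imageHeight: 416, inputColumnName: "image"))
    // Turn the bitmap into the 3 x 416 x 416 float tensor.
    .Append(mlContext.Transforms.ExtractPixels(outputColumnName: "image"))
    // Run the ONNX model: input tensor "image", output tensor "grid".
    .Append(mlContext.Transforms.ApplyOnnxModel(
                modelFile: "Assets/MachineLearning/TinyYolo2.onnx",
                outputColumnNames: new[] { "grid" },
                inputColumnNames: new[] { "image" }));

// Fit attaches the schema; Transform actually runs the pipeline.
var transformed = pipeline.Fit(data).Transform(data);
float[] grid = transformed.GetColumn<float[]>("grid").First();
```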
Now I can get the "grid" column, the result of this conversion, which is exactly the column I said was the output of the model. If we scroll down in Netron we see that "grid" is the output, and it has a shape of 125 x 13 x 13: 125 values, 13 columns, and 13 rows, so every cell has 125 values. Why 125? Because there are five groups of 25 values, and in each group the first five are the features, x, y, w (width), h (height), and o (objectness confidence), followed by 20 values that represent, in label order, how probable it is that the box contains each item.

I have a prediction parser that takes all those floats, which come back as a flat float array with around 21,000 values, and converts it into something more readable. I'm not going to explain that one; you can check the documentation of the YOLO model to understand how it works inside and how to parse and use the values you need from the array. In this case I just took the values I need for the labels, but it's a mathematical process that would take a long time to explain. If you want to learn how to parse a YOLO model, go to the YOLO model documentation and check what those 21,000 values are and which ones you want to take.

Here I have an array that represents the label order of the 20 values that come after the five features. So after x, y, width, height, and confidence, the first value is the probability that the object is an airplane, the second a bicycle, the third a bird, and so on; each one, after the proper conversion, gives a value from 0 to 1. If I count zero, one, two, three, four and take the fifth item after the features, I get the probability of that box being a bottle. That's how I can find which labels are present: I walk through all 20 values.
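The per-box math the YOLO documentation describes boils down to a sigmoid on the objectness value and a softmax over the 20 class scores. Here is a sketch, with helper names of my own; the label array is the PASCAL VOC order that Tiny YOLOv2 is commonly documented with.

```csharp
using System;
using System.Linq;

public static class YoloScores
{
    // The 20 labels, in the order the model's class scores follow
    // (PASCAL VOC order, as used by Tiny YOLOv2).
    public static readonly string[] Labels =
    {
        "aeroplane", "bicycle", "bird", "boat", "bottle",
        "bus", "car", "cat", "chair", "cow",
        "diningtable", "dog", "horse", "motorbike", "person",
        "pottedplant", "sheep", "sofa", "train", "tvmonitor"
    };

    // Squash the raw objectness value into a 0..1 confidence.
    public static float Sigmoid(float x) => 1f / (1f + (float)Math.Exp(-x));

    // Turn the 20 raw class scores into probabilities summing to 1.
    public static float[] Softmax(float[] scores)
    {
        float max = scores.Max();
        float[] exp = scores.Select(s => (float)Math.Exp(s - max)).ToArray();
        float sum = exp.Sum();
        return exp.Select(e => e / sum).ToArray();
    }

    // Label of the most likely class for one box's 20 scores.
    public static string BestLabel(float[] scores) =>
        Labels[Array.IndexOf(scores, scores.Max())];
}
```

Note that "bottle" is indeed the fifth label in this order, matching the narration.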
For each of the 20 values after the features, I check: this first one represents an airplane; how likely is it that it's an airplane? Zero? Then it's not an airplane. That's how I work with the model.

Now that we understand the model and how to get values out of it, I already wrote a parser that uses the logic I just explained to see which labels are present: mostly math and a lot of loops to walk the 21,000 values and extract the labels from them. With that I get all the labels that are present and check whether any of them matches the label I passed as a parameter. If a matching label exists, the result will not be null, which means the method returns true, and that's how I validate whether the image has the object.

Now I can use it in my controller. How? I just save the image received by the API into a temporary file, and then I call ImageObjectDetector.ImageHasObject, passing the image and asking: does it have a car? If it doesn't have a car, I throw a validation exception saying the image doesn't contain a car, and the request fails, preventing my users from uploading anything that isn't a car. In this case I'm building a car API that should only let my users upload cars.

Let's see if it works. Let me run the API. Here I have my Swagger; let me restart it to start from zero, and I'm going to try it out by adding a car picture. This should work perfectly, because it is a car and my API only expects cars. You can see it worked. But what happens if I don't upload a car, maybe this fruit instead? I execute, and it says the image doesn't contain a car, which is great: it means my users cannot upload anything that is not a car to my car API, and it helps me, so I don't have to validate each uploaded picture myself.
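The controller flow just described (save the upload to a temporary file, run the detector, reject anything that isn't a car) might look roughly like this in an ASP.NET Core controller. The ImageObjectDetector signature, route, and parameter names are assumptions reconstructed from the narration; the video throws a validation exception where this sketch returns a 400.

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/cars")] // hypothetical route
public class CarsController : ControllerBase
{
    private readonly ImageObjectDetector _detector = new ImageObjectDetector();

    [HttpPost("upload")]
    public async Task<IActionResult> Upload(IFormFile picture)
    {
        // Save the uploaded image to a temporary file so the
        // detector can load it by path.
        string tempPath = Path.GetTempFileName();
        await using (FileStream stream = System.IO.File.Create(tempPath))
        {
            await picture.CopyToAsync(stream);
        }

        // Reject the upload if the model does not see a car in it.
        if (!_detector.ImageHasObject(tempPath, "car"))
            return BadRequest("The image doesn't contain a car.");

        // ...store the car picture here...
        return Ok();
    }
}
```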
Checking and deleting every wrong picture by hand is the manual process I would have to do if this ML.NET algorithm didn't exist. But it exists, so I can validate whatever is in my pictures.

This is just the beginning of what computer vision can do. Check out more models; there are models for everything in the world: skin color, eye color, age, gender. Almost every detectable feature of a picture is in a model somewhere, so look at what a model's inputs and outputs are and use it to your benefit. There are full libraries of models, and using ONNX you should be able to use them all from any framework: if you've seen a model in TensorFlow, you can use it, just convert it to ONNX, which is a really straightforward process.

That's it for this video. I really hope you liked it, and I will try to provide all the code you have seen here, so you can implement it or check it out and make modifications for your own benefit. Thank you for watching. If you liked this video, press the like button; it's right there and it's easy to click. You can also subscribe and check our other videos, which are great. See you later, and have a happy coding!
Info
Channel: Hahn Software
Views: 2,293
Keywords: C#, TypeScript, Angular, .NET, dot net, JavaScript, ml net, object detection, angular, aurelia, vs code, ml net object detection
Id: rkfosYH2JIA
Length: 17min 5sec (1025 seconds)
Published: Fri Aug 12 2022