The Easiest Way To Do Object Detection in C#

Captions

Object detection is a computationally and humanly difficult task, but in today's video I'm going to show you how you can do it in just a few lines of code in C#, standing on the shoulders of some machine learning giants. Let's dig in.

So for this video you're going to need a few files, and what these are are the files used in our neural net training. Luckily, a few people before us took a bunch of data, created these neural nets, and trained them, and we're going to take the data that they output and use those as our models to do this detection. The four files you're going to need are in the GitHub repo down below, in a folder called detection. If you just download those four files, you should have everything you need. From there, just put them in a folder in your project and we should be good to go. So with that being said, let's get into the code.

Okay, so the first thing we're going to need is what's called a CascadeClassifier object. We're going to name it faceCascade, and as you might have guessed from the name, we're going to be doing detection on faces. Now, in the, not the last video, the video before that, we did template matching, which actually allowed us to detect a face, because we would take a template of a face and match it to my face. That's cool and all, but it's very specific and doesn't really work all the time. This, however, is trained on tons of faces. It's like having a million face templates, so that's what we're going to use here, and you'll see it works much better. So we're going to create this CascadeClassifier object, and what we pass it is the Haar cascade frontal face default XML file, haarcascade_frontalface_default.xml.

Okay, so we have a CascadeClassifier object, and from there we need to create a few Mats. We're going to do a frame, and I don't remember if I said it or not, but we're going to do this through video, so we're going to use the same logic we did in the last video, where we
you know, looped through, caught each image as a frame, processed it, and spit it back out after doing something with it. So we need a frame object. We also need a frameGray object, because we're going to have to turn these frames to grayscale, and that's part of the classifier. So we have a few of those. We may come up here and add some more later.

So I'm just going to say while (true), and then we're going to start our loop that we did last time. Oh, actually, we do need a vc, a VideoCapture, and then we'll do the, I think it's Emgu.CV.VideoCapture.API.DShow. That's what it is. Oh, and, yeah, I think we need to, oh sorry, the new keyword. I was like, wait a minute, oh, that's not working. All right, so now we've got our video capture as well.

Okay, so from here we're going to do vc.Read into the frame, and then we're going to use a CvInvoke function called CvtColor to convert our frame into grayscale. As mentioned, you need that for the Haar classifier algorithm. So frame, frameGray, and then Emgu.CV.CvEnum.ColorConversion.Bgr2Gray. I'll just say right now, every time I type these namespaces out, you don't have to do that. You can put them as using statements up above, just to save you some typing, but you know, whatever.

Okay, so now we have our gray frame, and from there it's actually kind of simple. We're going to do var faces equals faceCascade dot, and then we're going to call the DetectMultiScale method. In that method you pass it frameGray, and you pass it a scale factor, which is 1.1 by default. I did 1.3 in my testing and that worked pretty well. Then we're going to do the minimum number of neighbors, which is basically a neighbors part of the algorithm, and I just did five. We'll see how that works. These are things you could technically tweak as you go.

So from here it's actually pretty simple. We're going to check that faces is not null, and then we will also do
faces.Length is greater than zero. So basically, did we find any faces in our frame? And if we did, we're going to do a CvInvoke.Rectangle. I'm going to draw it onto frame, then I'm going to pass faces[0], and then a new MCvScalar object with 0, 255, 0, and that's just green. There's also a thickness for the rectangle as well. All right, so basically, if we found a face, we're going to draw a rectangle around it, and it's going to be green.

Then let's do the Imshow, and we're also going to do our CvInvoke.WaitKey and check for 27, which, as mentioned in the last video, is the Escape key, and if we hit it, we're going to break. Okay, I believe that is all the code you need to do face detection with the Haar cascade classifier. I actually have to kind of stop and turn my camera off so that I can use it here, because I can't use it in both places at the same time. So I will do that and we'll be right back.

Okay, I look a little different right now, but as you can see, it's picking up a nice little square on my face. If you remember the template matching example last time, if I moved my face, if the lighting changed, if anything changed, it didn't work. But this one is doing pretty good. If I give it a side profile, it even kind of works. It's trained on, like I said, tons of faces instead of just the one template, and so it works pretty well. That's the Haar cascade classifier. Like I said, it's just a few lines of code and you can do face detection.

There are other models that come with OpenCV. In case you're wondering where I got that file, it actually came by default with OpenCV. If you download OpenCV from the website itself, you'll find it in there, so you can just search in the folder you downloaded. There are other ones as well. There are ones for eyes, there are ones for all kinds of different features. Yeah, you can
basically use those to identify other things than just a face. So, pretty cool.

All right, so with that one being said, let's dig into another, maybe a little more complicated subject: object detection with the YOLO algorithm. We're just going to start with a variable called net, and this is going to be our neural net. We're using Emgu.CV.Dnn.DnnInvoke, which we have not used before, and the method is going to be ReadNetFromDarknet. This takes two different arguments. The first thing it takes is a config file, and that config file is named yolov3.cfg, and then the model file it takes is going to be detection/yolov3.weights. I forgot the detection/ here.

Okay, so first of all, let me tell you where I got those from. They actually came from a website, one of the more interesting websites I've found in the machine learning space. The person that created Darknet and the YOLO algorithm and all that seems to be a pretty interesting person. I scrolled through there, and I think there's a link on the page to his GitHub, which had them, so I downloaded them. They're free to use or whatever, and so I put them in the GitHub, so you can just get them from my GitHub, it's fine. But I did want to show you that web page because it's pretty interesting. It goes into the explanation of how the algorithm works, it goes into the speed of the algorithm, which apparently is very fast, etc.

All right, so the other thing we're going to need is this little guy called classLabels, and all we're doing there is reading in a file. It's going to be detection/coco.names, and we're going to use that basically to label our objects once they're found. You'll see what I mean.

All right, from there we're going to set the preferable backend on our neural net, and there are a few options you have here. We're actually going to use OpenCV. It makes sense, it's an OpenCV course. And then we need to set the preferable target, and this is
where you would tell it, you know, if you want to use CPU or GPU or whatever, and so we're going to use Emgu's Target.Cpu, because for what we're doing, it's fine. It'll be a little framey, but ultimately CPU will be fine. It's not a huge deal. If you were going to be doing something where computational speed matters, then yeah, you'd use the GPU, but you'd also have to set up all the CUDA stuff, and if you've ever done all that, it's kind of a pain. Maybe I'll make a video on that at some point. But for right now we're going to use CPU just to make it simpler.

All right, so then we need another VideoCapture, same as we've done every time, with DShow. Okay, and we're going to make a frame Mat. So we need a few objects, and these are ones I don't think we've used before. We need a VectorOfMat, and we're going to call it output. The VectorOfMat object and the others we're going to use come from using Emgu.CV.Util. So we're going to do a VectorOfMat, which is exactly what you think it is, it is a vector of Mats. Then we're going to do a VectorOfRect called boxes, then a VectorOfFloat, we'll call that scores, and then a VectorOfInt, and we're going to call it indices. Right now, I think that's all the objects we need to create outside of the loop. So then we're going to do the same video capture loop that we did previously.

All right, so inside our loop, the first thing we're going to do is vc.Read(frame), as always, which reads our video capture into our frame. Then we are going to do a little resizing, and the reason for that is it actually makes the computation a little faster. This is something I kind of figured out after playing with it for a little while. Now, the resize you do, it's important to get it right, and what I mean by that is this resize may not work for you. What we're doing right here is we're reducing it by, what, 60% or whatever. We're reducing it down to 40% of its size. I believe, if
I'm not mistaken, that the image you're passing to the model has to have both of its dimensions be a multiple of 32. I could be wrong about that, but I'm pretty sure. As I was playing with it, certain sizes did not work and certain sizes did, and then I did a little research and I kept finding stuff suggesting that it needed to be a multiple of 32. So do with that what you will. One thing you might do is print the size of the frame you're passing to see what your resize is ending up at, and you can use that to figure out whether it's a multiple of 32 or not. You could do some math on it too if you wanted to set the size the way you wanted it.

Okay, so CvInvoke.Resize, and then, this might look a little weird, but we're going to re-instantiate our objects here. The reason for that is, well, you'll see in a moment, but it's important, as I found out.

All right, so then we're going to do var image equals frame.ToImage, and this is a Bgr, byte image. As per usual, the reason we would transform our Mat to an image is because we need an image: either you're going to do traversal on the image, you know, like a two-dimensional array, or whatever you're passing it to needs an image, and that is the case here.

So the next thing we're going to do, we're going to have this variable input, and we're going to do a DnnInvoke.BlobFromImage. DnnInvoke comes from the Dnn namespace, so we can just throw that up here as a using so we don't have to type the whole thing. BlobFromImage takes a few arguments. It takes our image, as you might imagine. It takes this scale factor. This scale factor, I believe, affects the size of the blob you're creating that then gets passed to the model. I played around with this for a little while, and the number I kind of stumbled on that I think does the best, at least for what I'm doing, is this one divided by 255 number, and that shrinks it down, you know, pretty
small. You still get good object detection on it, and it actually helps performance as well. So there's the resizing that helps performance, and there's this BlobFromImage scale factor that helps as well. The other thing we want to set, we don't care about the size, it defaults, but I actually want to set swapRB, so we're going to do it this way and set that to true. What that does is, if you remember, in OpenCV the red and blue channels are swapped. They're BGR, not RGB. So this swapRB basically says, hey, you need to swap this back, because OpenCV is weird, basically. So we're going to do that.

All right, so we have our blob. Now we need to set the input of our net, not new, sorry: net.SetInput. We're going to set our input as our blob. This is saying, hey, neural network, this is what we're going to be sending into you to train, or to test. Not train, test. Then we are going to do net.Forward to get our result, basically feeding the neural network forward to determine what the output was. We're going to store that output in our output VectorOfMat, and for the output names, we're going to use a property of net called UnconnectedOutLayersNames, and that will give us the names of the unconnected out layers.

Okay, so this is where the code gets a little complex. You know, I said the Haar classifier was just a few lines of code, and it was, but this one is a little more than that. We're going to do our best. All right, so the first thing we're going to do is a for loop, and we're just going to iterate through our output. Easy enough. We're going to do var mat equals output[i]. So we've got a vector of outputs, we're going to grab the ith one, and we're going to put it in mat. Then we're going to do var data equals mat.GetData(), and this function just returns an Array type, but for our purposes we need a float
two-dimensional array, so we're going to cast it, and you'll see why in a moment. I know it's not the prettiest code in the world, but, you know, it works.

All right, so we're also going to do another loop: j equals zero, j is less than data.GetLength(0), and then j++. In this inner for loop we are going to basically grab the row, the row score, the class ID, the confidence that the object is what we think it is, stuff like that. So first thing, we're going to get a float array called row, and the way we're going to grab it is with a little LINQ query: Enumerable.Range, zero to data.GetLength of zero, or sorry, of one, my mistake, it's one, and then we're going to do Select, x, lambda, data[j, x]. And then we're going to do a .ToArray() on that. Easy enough. This is basically going to grab a row of our data.

So we've got a row of data, and in terms of how the net placed it in these arrays, each row is basically a record of data, if you want to call it that. We're going to grab the score, the class ID, the confidence, things like that from this row, so I'll show you how to do that.

All right, so rowScore. The score from this detection instance is going to be row.Skip(5).ToArray(). The reason we're skipping five is because there's other stuff in there. We've got this array object and we just don't need the first five elements for the row score. So basically, everything after the first five is going to be our row score.

Then we're going to get the class ID, and that's going to be rowScore.ToList().IndexOf(rowScore.Max()). Nice little nifty trick there. Basically, we're taking the rowScore array, which again is the row minus the first five elements, and we're making that a list, and the reason we're doing that is so we can use IndexOf with rowScore.Max(). So that gives us our class ID, which we will then use to know
what the neural network thought the object was. Then we're going to grab a confidence from rowScore, and we'll use the class ID to grab the confidence.

We've got some important information in there. Now we need to basically decide, well, how confident are we, and if we're confident enough, then we need to draw the rectangle. So we are going to check confidence greater than, I did 0.8 in my testing and that worked pretty well, so that's what we'll go with. Now, that's obviously a number you could mess with.

The way we're going to do the box is, well, we've got this row information, and we can use that to determine the location where the object was detected. So we're going to do a couple of vars. We're going to do a centerX variable, and that's going to be the center point on the x-axis, and we're going to cast to an int row[0] times frame.Width. Okay, so that's going to give us the x value of our center point for the object detection. Then we're going to do the same thing but for y, and as you might imagine, it is the next position in the row array. As you can tell now, the first five positions, or four, yeah, four positions in the row array are these locations. Okay, and then we're going to do boxWidth, if I can type, and that's going to be row[2] times frame dot, and then we'll do, finally, boxHeight. So now we've got the X and Y points for the center of the detected object, and we've got a box width and height to be able to draw the rectangle.

Okay, so the next thing we're going to need is actually an X and Y value of the place to draw the box, because, you know, we have the center of the object, but we don't have a place to position the rectangle around the object. So we're going to do a little math to get that. We're going to do var x equals, and we're going to cast it to an int, centerX minus boxWidth divided by 2,
and we're going to do var y equals int, centerY minus boxHeight divided by two, and that gives us the actual x, y coordinate where we need to start the rectangle.

All right, so we have that. Now, if you recall, we have these three, VectorOfRect, VectorOfFloat, VectorOfInt, that we're not using yet. Well, we're about to use them, and what we're going to use those for is basically to keep up with the instances where we found stuff. So we're going to do this VectorOfRect.Push, and it's actually expecting an array. Even though at this point we only have the one, we're going to get around that by doing this System.Drawing.Rectangle array of one element, and it's going to be a new System.Drawing.Rectangle, and the coordinates again are going to be x, y, and then boxWidth and boxHeight. We're going to do the same thing with indices. We're going to push, and it needs another array as well. It's going to be an int array with classId, so we're going to push the class ID onto indices. Then we're going to do scores.Push. It's going to be a float array, and we're going to put the confidence in that float array.

Now we need to do a little bit more. We're almost there, calm down. So we're outside of our output loop now, and we need to do a few things. The first thing we need to do is what's called the non-max suppression algorithm, which is a DnnInvoke method, and what it does is it suppresses boxes to basically get the best one. We only want the best box. We've looped through and grabbed all of them, but now we need the best one, right? So we need a variable, and let's call it bestIndex. It's going to be DnnInvoke.NMSBoxes, and we're going to give it our boxes.ToArray(), we're going to give it our scores.ToArray(), and then it takes a score threshold and an NMS threshold, and for each of those we're going to do 0.8f. So the index of our best box will be here, and then, yeah,
now we need to start building our output frame. What I'm going to do here is frameOut equals frame.ToImage, nope, not ToString, and that's going to be a Bgr, byte image. I'm going to do another for loop. Lots of loops. Like I said, computationally hard. While i is less than bestIndex.Length, we're going to do i++. I should mention that DnnInvoke.NMSBoxes gives you an int array. All right, so then we have int index equals bestIndex, it should be bestIndices, but that's okay, of i, and we're going to do var box equals boxes of index. Then we just need to do our CvInvoke.Rectangle on frameOut, just like always when we draw a rectangle. We're going to do thickness two, and we're going to make it green just to make it easy.

Okay, and then the next thing: not only do we want a rectangle to be drawn around our object, but we also want it to tell us what label it is. So if it identified a bottle object, for instance, I want it to say bottle up above the head. In the thumbnail for this video you saw I had a box around my face and it said programmer, right? That's the same thing we're going to be doing here. We have to use the class labels that YOLO was trained on, which I think there's like 80 or so. It has a decent amount.

The way we're going to do that is with the CvInvoke.PutText function, which we have not used before, so it's pretty cool. We're going to draw onto frameOut, and we're going to do classLabels of indices of index, and that gives us basically the class label that we need. Then we're going to do a new System.Drawing.Point, and our point is going to be box.X, box.Y, and we're going to shift it a little bit on the Y, because if you put it right where that's at, it's going to be covered up by the box. So we're going to do minus 20, 20 pixels. All right, so we've got our point. Now there are a few more arguments we need. The next one is going to be
Emgu.CV.CvEnum.FontFace, so you're telling it what font to use. There are a few you can use. You know, I'm a plain guy, so we're going to do plain. Then it needs a font scale, we'll just do 1.0 for now. It needs a font color, so we're going to do a new MCvScalar. Let's make it blue. No, no, that's not going to show up. Let's do red, that'll be cool. And then a thickness for the font. All right, so that puts our text onto the image.

Now, we've got about 60 more lines of code to write. I'm just kidding, we're done. So now we just need to resize the image again, because remember, we shrunk it down, so let's shrink it back up. Crank it back up. That's called enlarging. We're going to do a System.Drawing.Size of 0, 0, and because of that we're going to use our scaling factors. Let's do this for now and we'll see how that runs. Then I'm going to Imshow the output, so I'm going to do the Imshow, and then we're going to do our typical if CvInvoke.WaitKey of 1 is equal to 27.
You can tell it 1 here, but it's not going to be that fast, I can promise you, unless you're maybe using the GPU stuff. Okay, I believe that is all the code we need to do this. So like I said, I'm going to turn off my camera and we will test it out and see how it goes.

Okay, there we go. It's picking me up a few times, but look at that. Oh, hold on, let's see if I can do this. I'm reflecting like three times. Okay, person. It knows I am a person. If I hold up something, I think it will, yeah, look at that, it picked it up for a second. Cell phone. I didn't know that. That's crazy. It's because of YOLO. It's a pretty nice little object detection algorithm. It's just having trouble. There it goes, cell phone, look at that, cell phone. There's other stuff too. I have a laptop here, let me try that. Laptop, let's go. Laptop, yeah. All right, so yeah, proof that it picks up quite a few things. And who knows, maybe right here I'll throw in a video of it picking up my dog.

Okay, and that is object detection. It's pretty easy, you know, for you. You only had to write the code, you didn't have to actually train the models, which is actually the hard part. So like I said, standing on the shoulders of some machine learning giants. No matter how weird their websites are, you know, it works pretty well. And I'll say, if you get that code and maybe put it on a mobile device or something, go around and see what all you can find that it will detect, because it'll detect all kinds of stuff: dogs, people, chairs, whatever. There are 80 things. You can go look at the file to see the classifications. It's pretty interesting. So yeah, object detection, not too hard.

So that's all for this video, and actually probably the end of the OpenCV series. We may dig more into it later on, but if you're looking for something else to do, I do have a suggestion. You might want to check out my .NET MAUI series, where we're learning the new Microsoft .NET MAUI framework. It's pretty cool. It's a way to build applications for
a bunch of different systems all in one project. It's very nice, and it's actually a lot of fun. You know, I've had some struggles, but some of that is because I'm a mainframe programmer, not an app programmer, but that's okay. But anyways, yeah, so we're building an app over there. I've got two episodes out now, and then one showing you how to set up .NET MAUI, so pretty good stuff. That playlist is right there. And then if you'd like to level up your dev skills, that playlist will be there. So now I'm kind of pointing at both. I guess that worked, right? Okay. All right, well, other than that, I'll talk to you next time. Bye.
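
The row-decoding step described in the captions (skip the first five values, take the best class score, then turn the normalized center and size into a top-left rectangle) can be sketched in plain C# without any Emgu CV types. This is a minimal sketch rather than the code from the video: the names YoloRowDecoder and Decode are hypothetical, and the row layout (center x, center y, width, height, one more value, then the per-class scores) is an assumption taken from how the captions describe the array.

```csharp
using System;
using System.Linq;

public static class YoloRowDecoder
{
    // Hypothetical helper for illustration; not part of Emgu CV.
    // Assumed row layout: [centerX, centerY, width, height, objectness, classScore0, classScore1, ...]
    // where the first four values are fractions of the frame size.
    public static (int ClassId, float Confidence, int X, int Y, int Width, int Height)? Decode(
        float[] row, int frameWidth, int frameHeight, float minConfidence)
    {
        // Everything after the first five elements is the per-class score list.
        float[] rowScore = row.Skip(5).ToArray();

        // The class ID is the position of the highest score.
        int classId = Array.IndexOf(rowScore, rowScore.Max());
        float confidence = rowScore[classId];

        // Only keep detections we are confident enough about.
        if (confidence <= minConfidence)
        {
            return null;
        }

        // Scale the normalized center and size up to pixel units.
        int centerX = (int)(row[0] * frameWidth);
        int centerY = (int)(row[1] * frameHeight);
        int boxWidth = (int)(row[2] * frameWidth);
        int boxHeight = (int)(row[3] * frameHeight);

        // A rectangle is drawn from its top-left corner, so shift back by half the size.
        int x = centerX - boxWidth / 2;
        int y = centerY - boxHeight / 2;

        return (classId, confidence, x, y, boxWidth, boxHeight);
    }

    public static void Main()
    {
        // Made-up row with three class scores; class 1 has the highest score.
        float[] row = { 0.5f, 0.5f, 0.25f, 0.5f, 0.9f, 0.05f, 0.95f, 0.1f };
        var detection = Decode(row, 640, 480, 0.8f);

        if (detection.HasValue)
        {
            var d = detection.Value;
            // prints "class 1 at (240, 120), size 160x240"
            Console.WriteLine($"class {d.ClassId} at ({d.X}, {d.Y}), size {d.Width}x{d.Height}");
        }
    }
}
```

In the flow from the video, each kept result would then be pushed onto the boxes, indices, and scores vectors before the non-max suppression call. Returning null for low-confidence rows is a design choice of this sketch so the caller can simply skip them.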

Info

Channel: Programming With Chris

Views: 19,303

Keywords: C#, Computer Vision Programming, Computer Vision Tutorial, Emgu, Image Processing, Image Processing in .Net, Image Processing in C#, Introduction to Computer Vision, Introduction to Computer Vision and Image Processing, OpenCV, OpenCV Tutorial, OpenCV for Beginners, OpenCV in C#, Programming, computer vision, csharp, intro to OpenCV, opencv, opencv tutorial, Video Processing, Video in OpenCV, c# tutorial, Object Detection, Yolo Object Detection, Haar Cascade Classifiers

Id: v7_g1Zoapkg

Length: 30min 7sec (1807 seconds)

Published: Thu Jul 28 2022