Face Detection Using C# and OpenCVSharp - Practical ML.NET User Group 01/19/2022

Captions
Cool, so thanks everybody for joining. Today we're going to be talking about how we can do face detection using OpenCvSharp. It's not really an ML.NET topic, but it's still a machine learning / computer vision task you can do in C#, so it fits here. Again, thank you to Apress for sponsoring us — we're giving away another digital copy of the ML.NET Revealed book, so we'll draw a name after the presentation is over and I'll contact the winner directly.

A little bit about me: I've been coding for well over 15 years now. A good chunk of that has been in the .NET realm, mostly in C#, but I've branched out and done other things like Python and Go, and now I'm coming back around to getting heavy into C# and .NET again, especially in the ML.NET space. With all the cool things we can do now in .NET and C# around machine learning, it's an awesome time to be in this space.

What we're going to look at today: a quick overview of what OpenCvSharp is, then we'll jump straight into demos. We'll do face detection on a static image, then on a webcam stream. Once we've covered those, we'll look at how to place text on top of the screen (we'll do that with the webcam example), and then how to place artifacts on the screen — in this case a transparent overlay on top of a static image, where we'll put a pair of sunglasses on a person after we've detected their face.

So what is OpenCvSharp? We have to look at OpenCV before we can look at OpenCvSharp. OpenCV is an open-source computer vision library, and you can do a lot with it: image manipulation (all types of resizes and rotations), color manipulation, OCR, barcode readers — face detection is in there, obviously, and you can also train face recognition models, so you could train it to learn who you are in an image, which is pretty cool. It was originally developed by Intel back in 2000, so it's over 20 years old now; Intel doesn't maintain it anymore, but they were the ones who came up with it.

OpenCvSharp is basically what you think it is: a .NET wrapper. It's been around since 2013 and has had about 72 releases in that time. It's maintained by one user by the name of shimat (I'm not sure I'm pronouncing that correctly), who has 58 contributors helping out on the project, so there's a lot going into it and a lot of people helping out. What's cool about OpenCvSharp is that it's modeled as closely as possible on the OpenCV C++ API, and that's really important for the next slide.

What we found — my colleague Nate has helped with a lot of this work — is that when we started diving into OpenCvSharp, there just aren't a ton of examples out there, even though it's been around for a long time and I know a lot of people are using it (you can look at the NuGet downloads and see it's getting used a lot). Not many people are blogging about it or showing examples, so it's been kind of difficult to find pure C# examples. What's really nice is that there's a ton of Python examples out there on just about anything you can think of, and since the OpenCvSharp folks took a lot of care and time to make the API look as identical as possible to the other implementations, we can take that Python code, deconstruct it, and convert it over to C# pretty easily.

Here's a list of examples showing how easy that is. If we want to load a Haar cascade XML file: in Python it's cv2.CascadeClassifier, and in C# it's a new CascadeClassifier — pretty straightforward. Same thing with loading an image, though this one is a little different: in Python it's cv2.imread to get a JPEG file in, and in C# you do a new Mat. A Mat, for all intents and purposes in OpenCvSharp, is an image — but it's not like a Bitmap, or an Image class from something like ImageSharp; the way you interact with it is a little different. If we want to open a window and show an image in it, it's cv2.imshow in Python and Cv2.ImShow in C#. So back and forth, it's fairly straightforward to take Python code and turn it into C# code when you're working with OpenCvSharp.

There are a bunch of different package flavors out there, and you want to make sure you're pulling in the right ones — this is straight off the readme in their GitHub repository. No matter what you're doing, you're going to bring in the first one, the OpenCvSharp4 package, and you're probably going to bring in OpenCvSharp4.Extensions — I have yet to find a case where I didn't need that package. Then you need the native bindings, depending on where you're running. This is very important: I'm on a Windows machine today, so I pull down OpenCvSharp4.runtime.win; I've done some of this on a Mac, where I would have pulled in OpenCvSharp4.runtime.osx.10.15. Just keep in mind that you almost always pair the managed libraries with a native binding library as well.

So let's jump straight into some demos. We're doing all of this today in Visual Studio Code, in .NET Interactive notebooks. If you're not familiar with .NET Interactive notebooks, they're basically Python-style Jupyter notebooks, but we can run C# code in them.
A couple of things to note about the structure of this repo (the link to the repo is provided at the end). In the data folder we have our Haar cascade XML files. We're not going to go into depth on what a Haar cascade is — there's a great link at the end of this presentation with a very detailed explanation — but at a high level it's an algorithm you can train to do face detection, and at the end of that training you get these XML files that basically tell the machine how to look at an image and determine whether it's a face or not. It's pretty cool technology, it's been around for a while, and it works really well. The official OpenCV GitHub organization has about a dozen of these files pre-built for you to use. In this example we have one that looks for eyes in a face, and then the frontal face alt one, which just looks for a face in an image.

Speaking of images, we have a bunch of images we're going to run through: a single-face image, an image with multiple faces, and an image with a lot of faces — we wanted to see how this would do against, I don't know, 30 or 40 faces. Then we have our transparent sunglasses that we're going to add later, and we have the notebooks that do each one of these functions.

So let's jump into number one. The first thing we need to do, as mentioned earlier, is load our NuGet packages. If you're new to .NET Interactive notebooks, this is how you do it: there's this #r magic keyword with "nuget:", and then whatever package you want to bring in. We just run it — that's fine — boom. Okay, so we can see that we've installed these packages. This just grabs the latest from NuGet; if we wanted to control which version comes in, all we have to do is — well, that didn't work — basically just add a comma at the end plus the version, and it will pull whatever specific version we're looking for. Now that we have our NuGet packages loaded, we just have to bring in some using statements. That's good.

All right, so now we can dive through this code. Let's just run it real quick to show you what it's going to do. Cool — it loaded up the single-face image and drew a box around the face that it detected. Pretty straightforward, pretty easy. So let's walk through the code and see exactly how that's happening.

The first thing we need to do is bring in whatever Haar cascade classifiers we want to use — in this case just the frontal face alt, so we just do a new CascadeClassifier. This color here is what we're going to use to draw the boxes on the image, so we just need some sort of color — this is green, (0, 255, 0), and we're saving it off for now.

Now we start loading in our images. This is where the Mat objects come into play, and it's very important to always dispose of these Mat objects — they are IDisposable, and you've got to make sure you're cleaning up after yourself. You're going to see some code later where we run through a ton of these Mat images to do the transparent overlays, and if we're not cleaning up these resources, you're going to introduce memory leaks very easily. So it's very important to wrap them in using statements if you can; if you can't, just make sure you're cleaning up when you're done with them.

The first thing we do is load in our source image, which is the single-face image, and then we create a new empty Mat called grayImage. The reason we're doing this is that face detection works best on black and white images. So once we load our source image, we call Cv2.CvtColor: we pass it what we want to convert, where we want the output to go, and what the conversion should be — in this case, convert blue-green-red to gray. The next thing we do is Cv2.EqualizeHist, which basically just normalizes the brightness and contrast. You'll see this a lot in other Python code: once you convert to grayscale, you normalize the brightness and contrast because it makes the cascade classifier work better.

It's pretty straightforward after this. We tell the classifier we just created to DetectMultiScale — detect as many faces in this image as it can, at multiple scales — passing in the image to detect on (the gray image) and the minimum size of a face. This one's very important, and we'll see why in a little bit: it's saying that if you find a face and it's less than 60 in height by 60 in width, don't classify it as a face. The reason we want this is that depending on what your backgrounds look like, you can get artifacts that come back as a face, but you know they're not a face once you see the image — we may be able to trick it into showing us that on the next one. This is really useful depending on what kind of images you have coming in: if your expected use case is a webcam, and you know the face is going to be pretty big and basically centered in the frame, you can increase the size a lot to reduce false positives on the detection. For right now, 60 by 60 is working pretty well for this image. We Console.WriteLine how many faces we found — again, this is a multi-detect, so it'll detect as many as it can — and then we loop over the faces that we found.
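Putting those steps together, a minimal sketch of the static-image pipeline might look like this (the file paths and cascade filename are assumptions based on the repo layout described in the talk):

```csharp
using System;
using OpenCvSharp;

// Cascade and image paths are assumptions based on the repo layout
var cascade = new CascadeClassifier("data/haarcascade_frontalface_alt.xml");
var color = new Scalar(0, 255, 0); // green, in BGR order

using var srcImage = new Mat("images/single-face.jpg");
using var grayImage = new Mat();

// Face detection works best on grayscale, brightness-normalized images
Cv2.CvtColor(srcImage, grayImage, ColorConversionCodes.BGR2GRAY);
Cv2.EqualizeHist(grayImage, grayImage);

Rect[] faces = cascade.DetectMultiScale(
    grayImage,
    minSize: new Size(60, 60)); // ignore detections smaller than 60x60

Console.WriteLine($"Found {faces.Length} face(s)");

foreach (var faceRect in faces)
    Cv2.Rectangle(srcImage, faceRect, color, thickness: 3);

Cv2.ImShow("Face Detection", srcImage);
Cv2.WaitKey(0); // required after ImShow; 0 = wait indefinitely
```

Note the `using var` declarations on the Mats — that's the disposal discipline described above.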
We just do a simple foreach over those faces. Inside it, we can create a new Mat — and you'll see here I'm not doing a using statement, so I'm introducing a memory leak; that's okay for this demo. We pass in the source image, which is our full-size image, and the rectangle that was found for the face... actually, you know what, I don't think I'm using this for anything — I've got some bad code here; I bet I can comment that out. Never mind, that's used for something later — you'll see it in the next example, my bad.

Now that we have the faces, all we have to do is draw a rectangle: we call Cv2.Rectangle with the source image, the rectangle it found, the color, and the line thickness. Then we call Cv2.ImShow to open a new window with a title of "Face Detection" and show the image. Once you do an ImShow, you have to include this WaitKey, and WaitKey is basically a way for the user to kill that window if you need them to. The number you pass is the number of milliseconds of delay to wait for a key press. It doesn't matter much right now: passing in zero basically says we don't expect a key to come in, and that's what you'd normally do when showing a static image like we are today — we don't need a wait key; you can just hit the X on the window to kill it. It'll come more into play when we look at the webcam demo in a second. So we'll run this one more time, and it's doing exactly what we hoped. I'm going to stop right there and ask if there are any questions.

[Attendee] I have a question. In the initial block of your Jupyter-style notebook, how do you actually know the default packages that are installed?

[Presenter] I'm not sure I follow the question — can you say it again?

[Attendee] How do you know the default NuGet packages in this workspace?

[Presenter] So, like right here? There are two ways: you can do it like this and just pull the latest and not worry about it, or we could go to NuGet and see what the latest package is, or pick a different one.

[Attendee] My question is: for this environment, there should be some default C# packages available — which default namespaces are available here?

[Presenter] Oh — like in our using statements? Let's see... I can't remember exactly, but I think .NET Interactive is set up to use the highest .NET version you have installed. In my case that's .NET 6, so from here I can do a using on anything that's in .NET 6, or I could pull in a package and add a using statement from there. Is that answering your question?

[Attendee] Yeah, thank you.

[Presenter] All right, cool. Let's look at the next one. With this one, we're going to not only detect the face, we're also going to detect eyes in the face. We do the same thing as before: install our packages, bring in our using statements. This code is going to look very similar, but a little different at the same time. We bring in two cascade files this time: the same frontal face one we saw before, and the eye cascade, which is just called "nested" here. Same as before, we set the color we want for drawing the rectangles, we bring in our face image, and we use a gray image — all of that is the same as before.

What we do now is, once we have our faces, we loop through them and try to do an additional detection of the eyes. For each face rectangle in faces, we create a new detectedFaceImage Mat using the source image and the face rectangle. What this is doing is creating what's called an ROI, or region of interest. All that means is this: the face rectangle is just a rectangle — it has an x, a y, a width, and a height — so when we pass a rectangle object into this new Mat, we're saying: take this source image, pluck out this rectangle from it, and stuff it into this Mat. What we now have is a single rectangular slice of the source image. We're doing that because we know we've detected a face at this point, and now we're going to look for eyes inside that face; it lets the second detection work on a smaller region of interest, which makes things a little easier.

We draw the rectangle here — we probably could have done that up above; it doesn't matter — the initial rectangle around the face. Then, similar to before, we take the face image (this region of interest) and convert it to gray. It's probably already gray and we probably didn't need to do that, but that's okay; we just convert it again. Then on that we do another DetectMultiScale, but this time with the nested cascade, which is looking for eyes. You can see we're using a much smaller size this time: if our face detection needs a 60 by 60 rectangle, we shouldn't expect the eyes to be bigger than 60 by 60, so we shoot for something much smaller — 30 by 30.
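The ROI trick described above can be sketched like this — the Mat(Mat, Rect) constructor gives you just that slice of the source, and the eye cascade then runs on the smaller region. Paths and cascade filenames are assumptions; the eye-center math mirrors what the talk describes:

```csharp
using OpenCvSharp;

// Cascade and image paths are assumptions based on the repo layout
var faceCascade = new CascadeClassifier("data/haarcascade_frontalface_alt.xml");
var eyeCascade = new CascadeClassifier("data/haarcascade_eye.xml");
var color = new Scalar(0, 255, 0);

using var srcImage = new Mat("images/single-face.jpg");
using var grayImage = new Mat();
Cv2.CvtColor(srcImage, grayImage, ColorConversionCodes.BGR2GRAY);
Cv2.EqualizeHist(grayImage, grayImage);

foreach (var faceRect in faceCascade.DetectMultiScale(grayImage, minSize: new Size(60, 60)))
{
    Cv2.Rectangle(srcImage, faceRect, color, 3);

    // Region of interest: pluck just the face rectangle out of the source
    using var detectedFaceImage = new Mat(srcImage, faceRect);
    using var detectedFaceGray = new Mat();
    Cv2.CvtColor(detectedFaceImage, detectedFaceGray, ColorConversionCodes.BGR2GRAY);

    // Eyes should be well under 60x60 if the face minimum is 60x60
    foreach (var eyeRect in eyeCascade.DetectMultiScale(detectedFaceGray, minSize: new Size(30, 30)))
    {
        // Center of the eye, offset back into full-image coordinates
        var center = new Point(
            faceRect.X + eyeRect.X + eyeRect.Width / 2,
            faceRect.Y + eyeRect.Y + eyeRect.Height / 2);
        int radius = (eyeRect.Width + eyeRect.Height) / 4;
        Cv2.Circle(srcImage, center, radius, color, 3);
    }
}

Cv2.ImShow("Face and Eye Detection", srcImage);
Cv2.WaitKey(0);
```

Because eye detections come back in ROI coordinates, the face rectangle's offset has to be added back before drawing on the full image.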
[Attendee] Sorry, thank you — could you zoom in?

[Presenter] Can I zoom in? Yeah, absolutely. Let's see — is that better? More? Okay, cool.

All right, so now that we've detected our eyes — if we have any — we just loop over those, and we create a new Point. All we're doing is some simple math to find the center of the eye we just detected. I'm not going to go too deep into this, but that's what this is doing: for each one, find the center x,y of the detected eye. Then we create a radius — again, just some quick math to take that center point and build a radius around it — and we draw a circle around the eye using that same color and thickness. So what you get is a box around the face and two circles around the eyes, if it works. Let's take a look — boom.

This is really cool, but it's kind of interesting: you can see it drew a bigger circle on the right eye — or left eye, whichever, depending on your perspective — and a smaller one over here. If you really wanted to get professional and try to drop sunglasses on this guy, you'd probably want to look at the center points of the eyes and figure out how big to make the image that way. What's nice is that even though one circle is bigger than the other, the center points should be right on the eyes, or close to it.

What I forgot is that we actually have other images in here, so let's take a look at the group photo and see how it does. Okay — so this is where we get into trying to control for the size of the box it's finding. In this case it did find the faces, but it didn't find any eyes, and that's probably because the faces in this image are much smaller. If we look back at our Size object where we're looking for the eyes, clearly we're going too big, so let's bump this down to 15 by 15 and see if it finds any eyes. Found a couple. You can see in this face he's squinting a little because he's smiling, and it may not have found the eye here because of some shadowing from the hair — so these aren't perfect. Let's drop it down to 10 to see if we do any better — same result. So, not perfect, but we are picking up some eyes.

Now let's try it on the big group of faces. Oh, okay — we found a bunch of faces in this one, and some eyes, but it's not perfect. You should notice a trend: all of these where it didn't find a face, you can see they're tilted a little bit. And it found this one "face" here that's clearly not a face — it made a little rectangle over on the side. All in all it picked up quite a few, and picked up eyes on most of them. But it's something to note: for a tilted head we'd probably need to bring in a third classifier. And some of these with glasses — sometimes it doesn't do well with glasses; there are cascade classifiers specifically built for people wearing glasses. So you need to experiment: which classifier should we use, how big should the min size on our boxes be, before it really starts to pick things up. You can see here, for the second woman in the fourth row, it picked up the corner of her mouth and thought that was an eye. So there's definitely some logic that needs to go in if you wanted to productionalize this and weed out artifacts — in our case we'd probably say we only care if there's one rectangle and two circles that are almost on the same x plane; that might be something we could do to remove some of these weird artifacts. Since we're doing that, let's go back over to our first one — although it's kind of the same code. Yeah, that still works. Okay, I'm going to stop there for a second on the face and eye detection to see if there are any questions.

All right, cool. So now we'll jump over to the webcam, and what you're going to find is that this code looks almost identical to everything you've seen so far. This is the webcam example: very similar to what we just looked at, we try to find the face and draw a box around it, and we try to find the eyes and draw circles around them. The main difference is that we're introducing this VideoCapture, and there's no magic here — we're just saying we want to use a webcam attached to this computer. In my case I only have one webcam, so I can just create a VideoCapture and use index zero — I know it's always going to be at index zero. If you have multiple cameras, you'll need to figure out which index yours is at. I think this just takes an index — yeah, there are six overloads, so you might be able to do something else there, but for me, with one camera, I'm going for index zero.

Something else that's different is that we create a Window up top, because we're going to be reusing the same window over and over again. The rest is pretty straightforward — this is what you've seen before: we create the source image, the gray image, the detected gray image; you can see I'm just moving my using statements up a little.

Once we fall into the loop, we basically say: while the webcam is open. All this is saying is that the camera is on and capturing. If you're in this loop and I were to go and unplug my webcam, this would come back false and we'd fall out of the while — it's just a handy "do this while the camera is on". The first thing we do inside is a capture.Read, which grabs a frame from the web camera and puts it into that source image Mat. Nothing else changes here — all of this code is exactly the same: we try to find the face; as before, once we have a face rectangle, we fall in and try to find the eyes and draw the circles around them. The only real difference is the window.ShowImage — since we created this Window object up top, we use it down here.

And then the WaitKey. I'm going to change this, because I found out that 30 was way too long: I'm saying, WaitKey, wait for just one millisecond to see if a key gets pressed. This is very critical — we found out last night that it greatly affects how many frames per second you can capture from your webcam. So I just wait one millisecond between frames to see if a key gets pressed, and if the key is 27, which is Escape, I break out of the loop and it shuts the stream down.

Let's run this. It does take a little bit to initialize — connecting to the webcam and getting it to open and start capturing generally takes about 20 seconds on my machine, I've found. Oh, there we go. Cool. This is probably a little delayed on your side, but you can see my webcam is now open, it's tracking me, it's showing a box around my face and tracking my eyes. See — it will pick up a corner of my mouth as an eye every now and then, but it's doing pretty well. And if I hit Escape... So I'll stop there and see if there are any questions around capturing webcam data.

[Attendee] Yeah, I have a question. You talked about the WaitKey — when does the WaitKey method get executed? Can you explain the WaitKey function?

[Presenter] Yeah, so the WaitKey is something that's unique to OpenCV. Basically, when you do a cv2.imshow, or a window.ShowImage here, it's expecting this Cv2.WaitKey to be the very next line, and if I take it out — let's just take it out — what's going to happen, if I remember correctly, is the window isn't going to show at all. Let's test it. So this is just something inherent to OpenCV... oh, the window is going to show up, but I don't think it's going to show the frame. Yeah — we should have had a frame come in by now, and you can see it's not responding. I don't fully understand the internal workings of why OpenCV does it this way, but basically when you show a window you have to have the Cv2.WaitKey there, otherwise it's never going to show the image you're looking for.

[Attendee] All right, thank you.

[Presenter] Is that kind of a non-answer? [Laughter] I wish I'd done a little more research so I could explain it better, but unfortunately that's the best I can do: you've got to have it. And you can see it's still running down here, so what I need to do now is a Cv2.DestroyAllWindows. See, now I've got my whole setup in a bad state, so hang on one second — what I have to do is kill Visual Studio Code and redo everything. That's a lesson you'll learn pretty quickly; it only takes forgetting this a couple of times to remember to include it every time. We'll talk a little more about the millisecond delay in a second, when we look at the next example of displaying text on the screen. I just want to make sure my code is back to where it works... there we go. All right, we're back.

Oh, and the cool thing here is you can see we're saying if the key equals 27, which is Escape, that's the break that gets us out of the loop. If I try to kill this window, it just comes back every time — kind of crazy and kind of cool at the same time.

All right, let's look at the next example. This is basically the exact same code we just looked at, with the only addition being that we're going to show the frames per second we're getting from the webcam. We bring in a little bit of extra code: we're using System.Diagnostics to bring in a Stopwatch, and there's nothing special here — this is basically just some code we pulled straight off the internet to determine frames per second, so I'm not going to dig too deep into what's happening. The big pieces are that we create a previous time and a new time, we have a Stopwatch, and we start the stopwatch. Then, if we scroll down to after everything has happened, we figure out our frame rate: we take the frame count — I should have shown this up here: once we're in the while loop, we start incrementing a count — and divide it by the stopwatch's elapsed milliseconds, times a thousand. That gives us our frame rate.

Then we call Cv2.PutText on the source image: we tell it what text we want on the screen — in this case our current frame rate — a new Point, which is just the x and y coordinate of where we want it on the screen, what font we want to use, the font size, what color we want, and how thick the lines of the text are.

Let's go back and look at the colors. All the way up top, just like before, we declared that green color for the boxes around the faces and eyes, and here I'm defining another color called fpsColor — this is red, (0, 0, 255). What I'm saying is that if our frame rate is less than 20 frames per second, I want the text to be red; otherwise, if it's greater than or equal to 20, I want it green. So when this thing launches, we won't be at 20 frames per second at first — it will sort of creep up and get there, and once it hits 20 frames per second it should switch over to green. Let's take a look. Okay, there it is — cool. You can see the text there: we're at 16 frames a second... 17...
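Putting the webcam pieces together, a sketch of the capture loop with the FPS overlay might look like this — the frame-rate math is the same back-of-the-napkin version described above, and the font, text position, and fps threshold are assumptions:

```csharp
using System;
using System.Diagnostics;
using OpenCvSharp;

using var capture = new VideoCapture(0); // index 0 = first attached webcam
using var window = new Window("Webcam");
using var srcImage = new Mat();

var green = new Scalar(0, 255, 0);
var red = new Scalar(0, 0, 255); // BGR order

var stopwatch = Stopwatch.StartNew();
long frameCount = 0;

while (capture.IsOpened())
{
    capture.Read(srcImage); // grab one frame into srcImage
    if (srcImage.Empty()) break;

    frameCount++;
    double fps = frameCount * 1000.0 / Math.Max(1, stopwatch.ElapsedMilliseconds);

    // Red below 20 fps, green at or above
    var fpsColor = fps < 20 ? red : green;
    Cv2.PutText(srcImage, $"FPS: {fps:F1}", new Point(10, 30),
        HersheyFonts.HersheySimplex, 1.0, fpsColor, 2);

    window.ShowImage(srcImage);

    // 1 ms delay: anything longer noticeably drags the frame rate down
    if (Cv2.WaitKey(1) == 27) break; // 27 = Escape
}
```

The WaitKey delay sits inside the loop, which is why bumping it from 1 ms to 30 ms caps the achievable frame rate, as shown in the demo.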
it's going up we're getting there you can see it's trying to detect a face somewhere over here but hopefully we're going to get over 20 frames a second here and we'll see this text go green cool all right all right so that's basically how you can put text on the screen um and maybe control the color oh that would dipped um so yeah and you might be asking yourself right now man this thing should be getting about 60 frames a second um and we were thinking the same thing when we started this we were thinking we should be getting 60 frames a second what we found was um there's a lot of talk online around why opencv will not get 60 frames a second and there's so many variables that go into it it's your webcam it's your computer it's the processes that are going on there's just so many things that can affect um what your fps should be and we've kind of mentally settled on if we can get 20 or more out of this thing we're pretty happy the video is a little choppy it's probably way choppier over skype um but for us um this is enough for what we're trying to do at the moment so if we can hit 20 or higher we're pretty happy with what we've got but i do want to show something around um the weight key so remember i talked about this is how many milliseconds it's going to wait to see if a key get pressed and originally we had this at 30 and i don't know why i can't remember why we had this at 30 i think it's just because we pulled some random code off the internet and we used it but what basically what we're saying is that we want to wait 30 milliseconds to see if the key gets pressed and i want to show you what that does to your frame rates um because now what's happening is we're capturing a frame we're displaying that frame then we're waiting 30 milliseconds to see if a key gets pressed and then we're repeating that over and over again um so hopefully this thing's not going to make me a liar because uh basically what we were finding before with that introduction of 30 
milliseconds: yeah, it's holding me right there at about 13 frames a second. So you definitely want to be conscious and cognizant of what type of delay you're putting into that wait key. In our case we wanted it as low as absolutely possible. It cannot be zero; zero will cause this webcam stream to just not display at all. So it's got to be something higher than zero, but in my opinion it shouldn't be anything higher than one.

All right, I'm going to stop there. It's 12:45, we've got one more demo to look at, and then we can wrap up the presentation. Are there any questions around displaying text? All right, let me change that back to one.

Okay, so now we're going to look at how we can do image overlays on top of images, and I'm going to go ahead and admit right now that this code is not optimized in any way whatsoever. Everything I talked about, how you need to dispose of your Mat images and wrap them in using statements, I'm not doing any of that here, mainly because I didn't have time to work this code out the way it should be. So we're just going to go through it quickly, and you can hold my feet to the fire later.

We have this MakeOverlay method, and we're going to look at that in just a second. Let's go back down here to the same code; we've moved back to code that was in the original demo just to keep things as simple as possible. We're just looking for the frontal face from this classifier, we're bringing in that single-face image, and then we're also bringing in this new Mat, which is called overlay, and we're bringing in the sunglasses. If you remember, over here these are a PNG with a transparent background, and when we bring these in we want to use ImreadModes.Unchanged. The reason for that is, remember how I talked about Mats being images, but kind of not images? A Mat basically holds a ton of information, and it also holds the channels of this image.
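Backing up to the wait-key discussion for a moment, the capture loop shape described there looks roughly like this (a minimal sketch with assumed names; the window title and quit key are mine):

```csharp
using OpenCvSharp;

class CaptureLoop
{
    static void Run()
    {
        using var capture = new VideoCapture(0); // default webcam
        using var frame = new Mat();

        while (true)
        {
            if (!capture.Read(frame) || frame.Empty())
                break;

            Cv2.ImShow("webcam", frame);

            // Wait 1 ms for a key press. 0 blocks indefinitely (the
            // stream appears frozen); 30 adds 30 ms to every frame,
            // which is what dragged the demo down to ~13 FPS.
            if (Cv2.WaitKey(1) == (int)'q')
                break;
        }
    }
}
```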
So it's holding the red, green, and blue channels. If we just do a new Mat, we get an image that has red, green, and blue channels, just three channels, but our PNG actually has four channels, because it has that transparency channel, so we know which pixels are transparent. If we don't say to bring it in unchanged, it will come in as a three-channel image and we'll lose the transparency. So we have to take special care when we're dealing with transparent images, or anything that doesn't have three channels.

This all looks pretty much the same: we're just loading the images and looking for the face. But this time around we're not actually going to draw the rectangle around the face, because that's not our goal. Our goal is to actually put some sunglasses on this dude's face, so we're not going to draw a rectangle. What we're going to do is call this MakeOverlay, pass in the overlay Mat we want to use and the source image, and it's going to return us an overlaid source image.

Now, given the time, we're going to run through this code very fast, but I've put comments on everything, so hopefully it will make sense if you look at it later. I'm cheating here, 100% cheating; this is all hard-coded because I just need this demo to work. The reason I say I'm cheating is that I'm providing the height and width of the overlay hard-coded, and I'm providing the x and y coordinates where I want that overlay to land hard-coded. In real life, this MakeOverlay would probably take the x and y coordinates of where you want to put it: you'd load your image, find the face rectangle, get some coordinates out of there, do some math, and pass that x and y in, so it would know where to put those sunglasses. I didn't have time to do that, unfortunately, so we're faking it. I know exactly
where I want the sunglasses to land, and it's at 114 and 120.

I probably don't need this resize here. It was originally there because we wanted to put these overlays on the webcam video, where the rectangle being drawn is constantly changing height and width, and as such the overlay we put on would need to constantly change height and width too. All this does is resize the overlay image for us based on how big the rectangle is. Again, I don't need it here because we're cheating, but it's something to keep in mind if you're going to do this in real life.

The first thing we need to do, since we're using a transparent image, is pull our transparent layer out. All these layers are indexed in the Mat; I think it goes red at index zero, green at index one, blue at index two, and our transparent channel at index three. That's the first thing we want to do, because we need to preserve this transparent channel.

The next thing we do is take the overlay and convert it over to RGB, so we basically get rid of the transparent layer. We create a new Mat called overlayConvertedColors, then convert the color, passing in the original overlay, where we want the result to land, and what we want to convert to. This is all very confusing at first, but as you go through it a lot, you start to understand what's happening. So now we have our overlay converted to three channels; we took the four channels and converted them to three, preserving the transparent channel separately. Basically, when you start combining Mats in OpenCV, the channels have to match. This is very important: the height and width have to match, and the number of channels has to match. Since we want to put this transparent layer back in, we have to create... I'm getting confused now, wait a minute, hold on. Oh no, I'm sorry.
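The loading and channel-splitting steps described so far can be sketched like this (file name and variable names are assumed; note that OpenCV actually stores channels in B, G, R, A order):

```csharp
using OpenCvSharp;

class OverlayPrep
{
    static (Mat colors, Mat alpha) Load()
    {
        // Unchanged keeps the 4th (alpha) channel; the default load
        // would hand back 3 channels and silently drop transparency.
        var overlay = Cv2.ImRead("sunglasses.png", ImreadModes.Unchanged);

        // Preserve the transparency layer (index 3) before converting.
        var alpha = new Mat();
        Cv2.ExtractChannel(overlay, alpha, 3);

        // Drop the alpha channel from the overlay itself: 4 -> 3 channels.
        var colors = new Mat();
        Cv2.CvtColor(overlay, colors, ColorConversionCodes.BGRA2BGR);

        return (colors, alpha);
    }
}
```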
We convert it, yeah, there we go, we convert it to three-channel RGB, that's right. Okay, sorry, I got confused because I'm kind of new to this as well. We take our original overlay and convert it to RGB; the alpha channel is still in there at this point. We create a new array, because what we're going to do now is loop through and pull out just the RGB channels from our original overlay. That's what this is doing: in a loop, we say if the current index is less than or equal to 2 (so we only get indexes 0, 1, and 2), we extract that channel from the overlay and add it to that array. What we're left with down here, the overlayMergedConvertedColors Mat, is basically saying: merge (Cv2.Merge) all the channels you've got in this Mat array into this one new Mat.

Okay, I'm up to speed now, sorry about that. So now we have our alpha channel, which is just the transparency of the overlay, and we've extracted the red, green, and blue and merged them down into this one Mat.

It's going to get even trickier. Since we now have our overlay merged into three channels (red, green, and blue), we want to put our transparency channel back onto it. But we can't merge a single-channel Mat with a three-channel Mat, so we have to take the overlay alpha channel and actually merge it three times into a new Mat called overlayAlpha3Channel. Then we can combine them back. We're not going to get into bitwise AND, bitwise OR, and bitwise NOT here; there's a great article I've linked in the presentation that explains what they do. At a super high level, they take two images and flip pixels on and off based on which operation you used (AND, OR, or NOT). It's complicated; the linked article has a ton of image examples showing what happens with the different operations, so I'll let you read that.

The next thing we do is get the region of interest for where we want to put the
sunglasses. Again, we're cheating: we know the height and width, we know the x and y, we defined them up here. Then we take the face that gets passed in and pull that rectangle out; this is our face region of interest. This goes back to wanting to take these sunglasses and put them on the image, but you can't combine two Mats of different sizes; they've got to be the same size. That's why we pull this rectangle out. Then we do some magic (I'm not going to go through it) where we do some bitwise NOTs, bitwise ANDs, and bitwise ORs to take that rectangle we pulled out of the face and put those sunglasses on top of it, keeping the transparency in place. Then we take the final region of interest, which is that rectangle we pulled out with the sunglasses on it, and we stick it right back on the face in the same spot we took it out of, and we return that face. So if this works, we've got some sunglasses on a face. I know that was a super quick, super fast walkthrough of something that is extraordinarily complicated. Are there any questions?

"Can you hear me?" Yeah, I can hear you. "Hi, thanks for the great talk. The samples are really good, and it's good to see the .NET ecosystem evolving and having great things in it. I just have one question: can we run OpenCVSharp on a microcontroller such as a Raspberry Pi?" Yes, you should be able to. Now, I will say that I have not gotten that far, but if we look back at the different flavors, there is a Linux ARM build here, and I'm pretty sure that's the one you'd bring in to run on a Pi. I do have plans to actually get some of this code deployed to a Pi; I just haven't made it that far yet, but everything I'm reading online says that should be absolutely doable. "Looks good."

All right, I'm just going to get back into this real quick because I
only have a couple of slides left. Pros and cons of OpenCVSharp. It's easy to use; if you're familiar with .NET and C#, you're going to be able to dive into this pretty quickly. As shown in the previous slide, it's fairly easy to take Python examples and convert them over to .NET. It took us a little while to get that overlay to work properly, but honestly it only took maybe three or four hours to really dissect the Python code and convert it over to C#. It's fast; because it's binding down to C++, it's very fast. And as we saw in this demo, we can use it with .NET Interactive, so if you're coming from Python and you're used to Jupyter notebooks, you can have that same sort of experience.

Some of the cons: it's not as heavily used as OpenCV is in, say, Python, so finding examples online can be tricky at times; there's not a ton of examples out there. And I have long-term support in here with a question mark, only because it's maintained by a single person. There are a lot of people contributing to it, and I would still feel comfortable putting this out in the wild, in production, but that's just something I think about.

Q&A: we did some Q&A already, but if you have more questions, just let me know. Here are all the references. I do have the link to the OpenCVSharp GitHub sponsors page; please sponsor this guy. If you're using OpenCVSharp, you can go give him a dollar, two bucks, five bucks, whatever you want to give. I sent him some money because I've been really enjoying what he's doing, and I know there's a lot of work that goes into this. This blog post talks about Haar cascades; it's a great deep dive into what they do and how they're built. This one is a blog post from PyImageSearch; it's a great article on what bitwise AND, OR, XOR, and NOT do. For the image overlays, this is the actual blog post that we used to
deconstruct that Python code into C# code. So if you're interested in seeing the Python implementation of that overlay, it's here. And then my GitHub repo. I'll send all this out via the Meetup so everybody can get it; I'll give you the link to the GitHub repo, which will include this presentation. If you have any questions, you can always hit me up in the Discord channel or just email me through the Meetup. Before we wrap up, are there any other questions?

"Yeah, I have something to say. Prior to joining this online meetup, I had issues accessing the link, so I think you should maybe update it." The Meetup link? "Yeah, the Meetup link; I had issues accessing this meeting." Okay, fair enough, I'll do that, absolutely.

Cool. All right, well, thanks everybody for joining, thanks for the questions, this has been awesome, and I really appreciate it. Please hit me up if you've got any other questions.
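As a recap of the overlay walkthrough above, here is a condensed sketch of the MakeOverlay idea (names, the hard-coded placement, and the assumption of a hard 0/255 alpha mask are mine; the overlay is assumed to already be resized to the region it covers). The alpha acts as a stencil: NOT(alpha) keeps the face pixels, alpha keeps the sunglasses pixels.

```csharp
using OpenCvSharp;

class MakeOverlaySketch
{
    static Mat Apply(Mat face, Mat overlayBgr, Mat alpha, Rect where)
    {
        // Channel counts must match, so replicate alpha into 3 channels.
        var alpha3 = new Mat();
        Cv2.Merge(new[] { alpha, alpha, alpha }, alpha3);

        // Sizes must match too: composite inside a same-sized ROI.
        var roi = new Mat(face, where);

        var inverse = new Mat();
        Cv2.BitwiseNot(alpha3, inverse);

        var background = new Mat();
        var foreground = new Mat();
        Cv2.BitwiseAnd(roi, inverse, background);       // face, glasses area blanked
        Cv2.BitwiseAnd(overlayBgr, alpha3, foreground); // glasses, surroundings blanked

        var combined = new Mat();
        Cv2.BitwiseOr(background, foreground, combined);
        combined.CopyTo(roi); // write back into the same spot on the face

        return face;
    }
}
```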
Info
Channel: Practical MLNET User Group
Views: 10,888
Keywords: c#, .net, opencv, opencvsharp, ai, machine learning, computer vision, face detection
Id: THwnlz2PEvE
Length: 56min 14sec (3374 seconds)
Published: Thu Jan 20 2022