Object Detection/Identification - ComfyUI Workflow Howto

Video Statistics and Information

Captions

[Music] DJ AFK here, back with another video for ComfyUI. Today we're going to look at image and video segmentation and detection. This used to be referred to more as computer vision. What we will do is put identifying boxes around objects, known objects that is. There are custom data sets, but today we will stick to the YOLO base models. Those are "You Only Look Once". But without further ado, let's get into it.

We're going to look at a few of the different custom nodes for ComfyUI. You should know how to install those with the Manager: here under custom nodes you can search, for this purpose, just for YOLO, anything that takes these models. You could also check Ultralytics, or seg, segmentation, things like that. We've got a few of these installed. One of them is pretty dirty, which is just fine, and one of them is claiming to be pretty full-featured, but some of the features don't quite work right. I'm sure they'll get fixed. But let's run a small demo.

This is a slightly cluttered room. We've got a couple of laptops here, a bottle, a TV, clocks. Let's see what we can find. In the base YOLO models there are 79 different objects that you can detect [80 classes, indexed 0 to 79, in the COCO-trained base models]. The custom models can do some different things. There are models for things specifically like your hands or fingernails, stuff like that, more for video editing, but we will focus on what's in the base model. So we can detect a person, bicycle, car, plane, stop sign, all sorts of stuff. Let's populate a few of these previews and see what we come up with.

So this model right here, that's the larger photo. This is a pretty down-and-dirty implementation of just loading the model and detecting everything on the same screen. Works pretty good. We found a chair with really good confidence. We found two parts of a clock; it did correctly identify that, but it is one object. It did not get this clock over here. One of the other detection models does get it, and I didn't see any of them get this digital clock. Everybody
correctly identified the TV. That book is not a book; it's actually a blue Linksys router, a WRT54G. That is 32% confidence, so it's just above threshold, which I think in this node is 25%. We did identify a laptop, another laptop, a keyboard, and another keyboard, which is accurate. Now, the bottom one here is actually picking up an audio device, DJ equipment, as a keyboard. It does have keys, and that could be categorized as a keyboard.

But if we go to a slightly more complicated model, we've got YOLOv8 model detect and model segmentation. The difference between these is the models that you can feed them. Both of them will take the segmentation models, but only the detect node will take the non-segmentation models, the ones that don't have "seg" at the end. If we go for the Nano, this is only a 6 and a quarter meg model, so that's pretty quick. If we run it, it will fail on the segmentation node because it's not a segmentation model. What you'll see as a visual difference is that when this spits out the image, it does not colorize the object itself; it just puts the bounding box around it, whereas the segmentation model does actually colorize the pieces within that bounding box that it believes are part of that object.

We go back to a segmentation model; that will now run in this node. We can select "all", but it doesn't actually catch all; it still chooses from the input list here. I think that's a problem with variables not being reset in the code. I haven't looked into it. But anything in the choose list here is a good keyword for input, so if there were things like a dog or cat in the photo, it should detect those.

We can also run this against video. I grabbed some stock video off of Pexels, I believe, and here's a cafe, just a camera panning around it. It pretty much gets a lot of the stuff. The painting on the wall, or picture, whatever it is, is now a TV, and one of them shows up over here as a TV
for a moment. That guy with his cell phone turns into a handbag for a moment, but that's okay. And I don't know what this object is, but sometimes, with low confidence, it thinks it's a chair. Other than that, it gets the people, the chairs, the tables pretty decently. Even on a busy street here, it'll identify the cars separately from buses and things like that.

Let's grab this simple and dirty node right here and run it through the city street. Now we're going to want to do more than 30 frames. Let's take all the frames, choose the Nano segmentation model, and let it run. We now have a segmented video. It seems to detect the cars and even pedestrians crossing the street here. This little trash can, newspaper stand, whatever that is, keeps getting identified as a car. But we do see traffic lights, and one that's not actually a traffic light, but that's only popping up with about 26 to 30%. But lots of people, cell phones, things like that. And for a 10-second video at 30 frames a second, that rendered pretty quickly.

Let's go ahead and make our own workflow. We don't need any of the default nodes; we're not going to run KSampler, nothing like that. Whichever nodes you've installed, go ahead and set a model loader. Let's load a model and apply that model. We'll use the detection node, we'll still use the version 8n, and let's grab a load picture node. We'll just go ahead and load the same photograph in here. This will export JSON data; there are two different ways you can do that. What it will export is all of the bounding boxes and a bitmap of where the objects are.

Just for giggles, let's let that run as detecting all. And it detects nothing. If we go to input, we don't have a person, cat, or dog in here. Let's put in some things to detect: chair, TV. And if we set this back to "all", it'll still detect those same things. I'm sure they'll fix this in an update. So we can add laptop and book. Let's see which models detect that Linksys router as a
book. All right, so the Nano did not. Let's try the X model; this is a 130 meg model versus 6. It also didn't detect it this time. Sometimes it does, sometimes it doesn't. I think that was the difference between the segmentation model and the detection model. Let's try that with the segmentation. All right, so yeah, that is now detected as a book. They must have slightly different data sets in there.

But as you can see, that only took a few seconds to put together. These are all open source, and they're really, really small. If you look at how they work, they're only a few lines of code to actually run the model. Running these YOLO models in Python by themselves is pretty simple. Even if you don't have good coding experience, you can just grab some files off of something else and modify them to what you need. You could run this against a list of photos or a video.

Some of the models take different arrays of images in and out. If we go back to our test example here, we'll go and save that. The model that's just labeled "seg" over here, let's see if we can determine whose that is. That is this one right here, zcfrank1st. This particular node will not accept arrays of images, or put them out, in the same fashion that something like Video Combine will take. We need it to go to an image batch; it comes out as an image list. So in order to fix that, when we load the video, we actually go to an image-batch-to-image-list node. This converts the batch-format output to a list. Then each frame gets processed independently, so it feeds them one at a time to the seg node. Then we convert those back to an image batch so that VHS Video Combine can put those back into a video.

However, with the model detect here, which is from the ComfyUI Ultralytics YOLO nodes by shadowcz007, we do not have to do that. So your mileage may vary with custom nodes. Some of them may come out as a batch, some of them may come out as a list, or your loaders and combiners may be a little bit different as well. So if you're using these
types of nodes, keep that in mind: you may have to convert the arrays. Other than that, good luck with the models. Stay safe.
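
The remark that running these YOLO models in plain Python takes only a few lines can be sketched roughly as follows. This is a minimal sketch and not the code of any custom node shown in the video. It assumes the `ultralytics` package is installed and that a weights file such as `yolov8n-seg.pt` is available. The function names `detect` and `filter_detections` and the image name `room.jpg` are hypothetical, chosen for illustration. The 0.25 default mirrors the 25% threshold guessed at in the video, and `wanted=None` stands in for what the "all" option is presumably meant to do.

```python
from typing import Iterable, List, Optional, Tuple

Detection = Tuple[str, float]  # (class label, confidence between 0 and 1)


def filter_detections(
    detections: Iterable[Detection],
    wanted: Optional[Iterable[str]] = None,
    min_conf: float = 0.25,
) -> List[Detection]:
    """Keep detections at or above min_conf whose label is in `wanted`.

    wanted=None means "all": no class filtering is applied.
    """
    wanted_set = None if wanted is None else {w.strip().lower() for w in wanted}
    kept = []
    for label, conf in detections:
        if conf < min_conf:
            continue  # below the confidence threshold
        if wanted_set is not None and label.lower() not in wanted_set:
            continue  # not one of the requested keywords
        kept.append((label, conf))
    return kept


def detect(
    image_path: str,
    weights: str = "yolov8n-seg.pt",  # assumed weights file name
    min_conf: float = 0.25,
) -> List[Detection]:
    """Run a YOLO model on one image and return (label, confidence) pairs."""
    # Imported here so the filter above can be used without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    results = model(image_path, conf=min_conf)
    found = []
    for result in results:
        for box in result.boxes:
            label = result.names[int(box.cls[0])]  # class id -> class name
            found.append((label, float(box.conf[0])))
    return found
```

Something like `filter_detections(detect("room.jpg"), wanted=["chair", "tv", "laptop", "book"])` would then keep only those four classes, while leaving `wanted` unset keeps everything above the threshold.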

Info

Channel: djAFK

Views: 782

Keywords: comfyui, stable diffusion, object detection, object identification, object segmentation, ai detect, detect and label, ai segmentation, ai segment, segment video, segment anything, learn comfyui, ai art, segment detection, segment objects, comfyui interface, segmentation tech, tech segmentation, detection labeling, interface detection, segmentation labeling

Id: 2iavu4rMQ1c

Length: 13min 18sec (798 seconds)

Published: Fri Mar 22 2024