YOLOv10: Train a Custom Model and Run Inference on Live Webcam

Captions
So we have a new member in the YOLO family: YOLOv10. It was just released yesterday, and in this video we're going to see how to train a custom YOLOv10 model, every single step from scratch. We'll generate a dataset, connect to a webcam, store individual images, upload them to Roboflow for labeling, export into a Google Colab notebook, fine-tune a YOLOv10 model, and then export it and run it live on a webcam, so we can see how to set up the whole computer vision pipeline.

Let's jump straight into the YOLOv10 GitHub repository and scroll through it quickly; it was just released yesterday. Here we can see the benchmarks where it's compared to all the other versions in the YOLO family, so YOLOv6, YOLOv7, v8, v9, and now also YOLOv10, but it's also compared to some of the other YOLO models and to the real-time detection transformer (RT-DETR). Just by looking at the results we can see that this model is significantly better on the COCO dataset. To make it clear, this is still a benchmark, the COCO dataset benchmark, so some of the other models could still be better for your specific applications and projects; definitely test out various models on your own dataset. Looking at the results, the mean average precision is better than pretty much all the other models, but the huge advantage of the YOLOv10 model compared to the other YOLO models is the inference speed and latency that we can see down here on the x-axis: it only takes around 2 milliseconds to process a single image with the nano model. We have the nano, small, medium, large, and extra-large models, the exact same variants as with YOLOv8, and YOLOv10 is built on top of the Ultralytics framework, so we can use all the Ultralytics commands directly for training, validation, inference, model export, and so on. Going through the repository, we can download all the weight files and see the number of parameters, the floating-point operations, and how to set up the environment. But let's jump straight into it and see how we can do it on our own.

When we're starting out with a computer vision pipeline we need to generate a dataset, and I'm going to show you the whole pipeline here. There are tons of videos out there just showing how to go inside a Google Colab notebook and fine-tune a model, which is relatively easy, but here we'll see how to generate images from a live webcam, which is most often the case (you could also have an image folder), upload them to Roboflow, and annotate them.

Right now we just have this generate-images Python script. It imports the different modules, so cv2 and also our operating system module. We connect to our webcam; if you have a video file you can also specify the .mp4 file in here, or it could be an RTSP stream. First we check if the webcam is open; if not, we just print that we're unable to read the camera feed. We create an output directory, which is where we're going to store all the images for our dataset. We then have our while loop loading in frames from the camera or video stream, and we show them with the imshow function. We check which key is hit on the keyboard: if we hit Escape, we terminate the program and close down the webcam, and then we can use the output directory as our dataset and upload it into our computer vision pipeline; if we hit 's', we create an image name and write that image out to the output directory, so we can generate our dataset. So that's pretty much it.
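Pieced together from that description, a minimal sketch of what the script might look like (the folder name, window title, and file-naming scheme are my assumptions, not values confirmed in the video):

```python
# generate_images.py -- sketch of the capture script described above
import os
import cv2

OUTPUT_DIR = "output_images"          # assumed folder name
os.makedirs(OUTPUT_DIR, exist_ok=True)

# 0 is the default webcam; a video file path or an RTSP URL works here too
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Unable to read the camera feed")
    raise SystemExit

count = 0
while True:
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow("frame", frame)

    key = cv2.waitKey(1) & 0xFF
    if key == 27:                     # Escape terminates the program
        break
    if key == ord("s"):               # 's' writes the current frame to disk
        path = os.path.join(OUTPUT_DIR, f"image_{count}.jpg")
        cv2.imwrite(path, frame)
        print(f"Saved {path}")
        count += 1

cap.release()
cv2.destroyAllWindows()
```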
Let me just open up a new terminal, and we can run this Python script, generate_images.py. It opens my webcam right here in front of me and we can capture some images. We can hit Escape to terminate the program, or we can hit 's' on the keyboard, so I'm just going to hit 's' and hold up different types of objects. This is my Apple wallet; we hit 's' and we can see in the terminal that it actually saved these images. Say we want to create a dataset to detect this wallet; I could also add my phone here as well. Then we just have to generate X number of images. If you want to fine-tune YOLOv8, YOLOv9, or YOLOv10 models, as we're doing now, you probably need 100-200 images, and we can always do data augmentation later on, but this is just to show you how to build a dataset; there are also datasets already on Roboflow that we can use directly to train our YOLOv10 model. Now I just hit Escape on the keyboard. Let's go over and see if we actually have our output directory; if I scroll a bit further up, there are all the individual images we saved, for my Apple wallet but also my phone down here at the end.

Then we can take this dataset, upload it directly to Roboflow, and annotate it: we can draw the bounding boxes or use the auto-annotation tools. We already have videos covering every single step of that in more detail; here I'm just going to show it at a high level. Inside Roboflow we create a new project; let me just call it "wallet", and here I have a class called wallet as well. We can choose between object detection, classification, instance segmentation, and keypoint detection; I'm going to create this as an object detection dataset. Now we can go inside our code files again, reveal the images in Finder, and upload a batch of them: I just drag them in, it uploads them, and we can go in and annotate them later on. Once they're uploaded we hit save and continue, and we can always export the dataset later on as well.

Now that our images are uploaded we can start the annotation. We only have like 16-20 images, so it might not be enough to train a model, but we can just test it out, and this is the whole pipeline that you need to set up. I just need to annotate each individual one of them; here we have the phone, and I'll annotate all of them, and then we can start training a model and also use data augmentation. Once the images are annotated, I add them to the dataset, and we can specify how many images to use for training and validation; I'll just keep a couple for validation to make sure we have enough for training. Again, this is not enough data at all, we probably need 100 or 200 images, but these are the exact same steps
that you'll need to do if you're creating your own project. Now we have our dataset and we can create a new version. Inside Generate we can specify the size, auto-orient, or add other preprocessing steps; I'd probably just go with the default settings, they're more than good enough. Also, when we're going to train the model, just go with the default settings unless you want to change something specifically or you're testing out multiple models. Let's now add some augmentation steps, which helps if you don't have enough data; this is the reason we only need around 50-200 images in our dataset, because then we can just use data augmentation. I'm going to add some flips, both vertical and horizontal, and a zoom effect as well with cropping. Then we specify how many images we want in the dataset; let's go with 7x, so we end up with 81 images. Let's see if that's enough; we probably need another 100 images to have a really good model, but we'll still be able to fine-tune it, get some results out to look at, and run live inference while I'm moving my phone and the wallet around.

Now we can export our dataset, and we can specify the dataset format as well. We can use any of the Ultralytics formats, since we just need that same structure, so let's take YOLOv8 for example. You can download the zip file to your computer, or, if you hit continue, you get a download code snippet that we can copy and paste directly into our Google Colab notebook in just a second.

Now let's jump into the Google Colab notebook that we're going to use. First we need to set the runtime: we can change the runtime and use the free GPU resources in here to speed up the training process. I'm going to choose the T4 GPU, make sure that you do that too, and then we connect to the runtime and it gets a T4 GPU assigned to our environment. Once it has connected, let's run the nvidia-smi command, which shows us information about our GPU; here we can see that we have a Tesla T4 with around 15 GB of GPU RAM, so that's pretty good. Then we import our operating system module so we can create this HOME variable and set it to /content in the Google Colab notebook; that is the file system we have over here to the left. Next we pip install the new YOLOv10 model and also supervision, which we'll use to show results and do inference later on. We pip install YOLOv10, and it's integrated into Ultralytics, so you can use it directly with Ultralytics and call all the commands, because it's built on top of that framework. After that, we create a directory for our weight files and pull the weight files down from the GitHub repository; this is just the URL for the download link, and we call wget to copy all the weight files into our new directory. That's pretty much everything we need. If you go inside the repository again, you can right-click on a weight file and copy the link, which is the download link I'm using here, or you can just press on them and they download to your computer.
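As notebook cells, the setup I just described might look roughly like this; the pip install source and the exact release URL for the weights are my assumptions, so copy the real links from the YOLOv10 README rather than trusting these:

```python
# Colab cells (sketch): check the GPU, install YOLOv10 + supervision,
# and pull the pretrained weights into a local directory.
!nvidia-smi

import os
HOME = os.getcwd()   # /content in a Colab notebook

!pip install -q git+https://github.com/THU-MIG/yolov10.git supervision

!mkdir -p {HOME}/weights
# Assumed release URL -- take the exact link from the repository's README
!wget -q -P {HOME}/weights https://github.com/THU-MIG/yolov10/releases/download/v1.1/yolov10n.pt
```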
Once those blocks have run, let's go down and take a look at how we can run inference. I'm going to take an arbitrary image, upload it into this Google Colab environment, and run inference to see some initial results with a pre-trained model before we fine-tune it on the specific dataset we just created from scratch. I upload this image, just a screenshot from a video, to the Colab notebook and we try to do inference on it. I go over to the left, copy the path, and specify it here as the file name so we can show it to start with; we're just using IPython to display the image. This is the image we're going to throw through the model, to detect these bags and suitcases.

To run inference, this is the exact same thing we're going to run later on the live webcam, but in Python code. We call the yolo command, we want to do object detection, and we want to do prediction, so that's the mode we need to specify. We can set the confidence score, we want to save the results, and we specify the path to our model weights; we can choose any of them, nano, small, medium, and so on, the exact same way you would with YOLOv8 and YOLOv9. Let's go with the small model. We need to specify our source, which right now is not a video but our image over here to the left, so just copy the path, replace the source, and we should be good to go. We cd into our home directory, call the yolo command, and do inference on a single image. It might take some time to load the model, but the inference itself is very fast with the new YOLOv10 model. Here I actually got an error: we haven't specified the correct path, we have "content" twice, so I need to fix that. I copied the whole path, but we don't need "content" in front because we already have our home directory. Now I rerun it and we're doing prediction; there we go, the model loads and we do our inference. It takes around 100 milliseconds because we're only processing a single image right now, so that's pretty cool.

We can also do it in raw Python code: from Ultralytics we import YOLOv10, we create an instance of our model, and then we just call the model instance with our source directly, and we get the results out, which we can extract. Let's take our image again, copy the path, and this is exactly how you'd do it in Python. We run this block of code, we get our results from the results variable, and we can extract the bounding boxes, the xyxy values, confidence scores, and classes. I'm going to print them out: here we have xyxy for all the different detections, which are most likely the suitcases; for the device we're using the GPU, and we have our tensors with the confidence scores and classes. These are the class labels for suitcases in the COCO dataset, since these are pre-trained models we're using right now, but we're going to fine-tune on our own dataset in just a second.

If you want to use supervision to visualize the results and predictions directly from the results variable, we just saw how to extract them, and now we need to visualize them. I copy the image path in again and we should be good to go: we load in our model weights, and now we use supervision to draw our detections. We create the detections from our results, we have our bounding box annotator and label annotator, and then we plot those on top of our image. Here we can see the bounding boxes drawn on top of the suitcases detected in this image. We do have some false predictions here and there, which could probably be filtered out with a higher confidence score.
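The CLI call I described might look like this as a notebook cell; the image name is a placeholder for whatever you uploaded:

```python
# Colab cell (sketch): CLI inference with the pretrained small model
# (bags.jpg is a placeholder for the uploaded test image)
%cd {HOME}
!yolo task=detect mode=predict conf=0.25 save=True \
    model={HOME}/weights/yolov10s.pt \
    source={HOME}/bags.jpg
```

And the raw Python version, including the result extraction and the supervision visualization from the paragraphs above; again, the file names are placeholders:

```python
# Sketch: Python inference plus supervision visualization
import cv2
import supervision as sv
from ultralytics import YOLOv10

model = YOLOv10(f"{HOME}/weights/yolov10s.pt")
image = cv2.imread(f"{HOME}/bags.jpg")   # placeholder image name

results = model(source=image, conf=0.25)[0]

# Raw tensors: box corners, confidence scores, and COCO class ids
print(results.boxes.xyxy)
print(results.boxes.conf)
print(results.boxes.cls)

# Draw the detections on top of the image with supervision
detections = sv.Detections.from_ultralytics(results)
annotated = sv.BoundingBoxAnnotator().annotate(scene=image.copy(), detections=detections)
annotated = sv.LabelAnnotator().annotate(scene=annotated, detections=detections)
sv.plot_image(annotated)
```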
Now let's go down and take a look at how we can download our dataset from Roboflow. I copy our code snippet, paste it in here, and run the block of code; it downloads the dataset automatically, and then we can just use the path to the dataset to train our YOLOv10 model. If we go over to the left, we now have our wallet-1 folder; this is the dataset we annotated in Roboflow, with test, train, and validation splits, and within each folder we have our images and labels. Another nice thing is that we get a data.yaml file, which is the structure that Ultralytics and all the YOLO models use. We just need to specify the paths, and I like to do that manually because sometimes it gets messed up: I copy the path and change it for test, train, and validation, so here we have valid and train, and then hit Ctrl+S to save. The file also contains the Roboflow info and the number of classes; we have two, the phone and the wallet, so we should be good to go.

Now we can close the directory and do custom training. We're going to go with 25 epochs; the default batch size is probably a bit too large here with the T4 GPU, so let's go with 8 for now. We want plots, and we specify the model; let's go with the nano model, because we don't have too much data in our dataset, and we specify the path to our data.yaml file. That's pretty much everything we need to do, and again this is the standard Ultralytics command we're running, so you can swap the model out with YOLOv8 or YOLOv9 if you want to train those instead; just because YOLOv10 has better benchmarks on the COCO dataset doesn't mean YOLOv8 or YOLOv9 wouldn't be able to outperform it on your specific dataset and task.
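For reference, after fixing the split paths by hand the data.yaml might end up looking like this (the wallet-1 folder name matches the download; the rest is a sketch):

```yaml
# data.yaml (sketch) after editing the split paths manually
train: /content/wallet-1/train/images
val: /content/wallet-1/valid/images
test: /content/wallet-1/test/images

nc: 2
names: ['phone', 'wallet']
```

The download snippet Roboflow generates and the training command might look roughly like this as notebook cells; the API key, workspace name, and version number are placeholders that come from the export dialog:

```python
# Colab cells (sketch): download the Roboflow export, then launch training
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")                        # placeholder
project = rf.workspace("your-workspace").project("wallet")   # placeholder
dataset = project.version(1).download("yolov8")

# 25 epochs, batch size 8, plots on, nano weights -- as set up above
!yolo task=detect mode=train epochs=25 batch=8 plots=True \
    model={HOME}/weights/yolov10n.pt \
    data={dataset.location}/data.yaml
```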
Once we launch training we get a summary of the model, and it sets up the whole dataset with our training and validation sets: we have 77 images in our training set and three in our validation set. With only three validation images, the epoch-by-epoch results might look a bit off, but let's take a look once we evaluate the model and do inference later on. The model is now done training for 25 epochs, so let's look at the results. Just after a single epoch we already get pretty good results, with a mean average precision around 0.80, but it should increase over time and the losses should come down; here we can see losses of 1.4, 3.4, and 1.6, and they're decreasing over time. The mean average precision fluctuates a bit at the start, because the weights are still being optimized, and then it starts to converge; the mAP at 0.50 IoU is pretty much spot on, it should be close to one, and precision and recall are also fairly good once we get to the later epochs. Let's scroll all the way down to the bottom and look at the final results. We have pretty good numbers, close to one in both precision and recall; again, we only have three images for validation, so ideally we'd want more. Here we have images, instances, precision, recall, and mean average precision; this is the validation run on the validation dataset with our best trained model.

If we go over to the left, we now have our runs directory; inside runs we have detect, inside detect we have train, and in there we have our weights. We can download the weights and use them in our own Python application and project; I'll show you in just a second how to open up the webcam stream, set it up, draw our visualizations, and use this model in your own applications and projects on a live webcam. Let's download the model so we can use it later on. We also have all the other training graphs: the F1 curve and the precision-recall curve, which should ideally be close to one, and here it's pretty much spot on; for most datasets you can probably expect it to curve off a bit towards the top right corner. Another good thing to look at is the confusion matrix: ideally we want all the values on the diagonal. It makes more sense when we have more classes, and we only have three images in our validation set, but it's good to check. We also have our training and validation batches. In the training batch we can see the bounding boxes on the images we're training the model on, and in the validation batch it throws in the three held-out images, which the model hasn't been trained on, and does predictions on top of them; we can see the model is actually able to detect the phone and the wallet in pretty much all these cases. Again, we only have three examples, so let's open up a live webcam and see how it performs.

Now I'm going to go inside our code editor again and grab the code from generate_images; I'll show you how to set up this whole pipeline. We have our best.pt, which is the model I just downloaded.
Let's make this window a bit smaller and the code a bit larger. We don't need the output directory anymore, so let's go ahead and delete that part; we can keep the image counter for now, and we don't really need the 's' key handling either, we just want Escape for closing the application while we show our results. In between here we need to run inference on our frames, but first we need to set up a model. If we go inside our Jupyter notebook, or our Google Colab notebook, we can grab the code I showed you earlier: we import supervision and also YOLOv10 from Ultralytics, and now we can set up our model. We create an instance of the model outside of our while loop and just point it at best.pt; that's how easy it is to set up with Ultralytics and the whole framework around it. The bounding box annotator and the label annotator can also be set up outside the while loop, so let's do that. The inference itself goes inside the while loop: first we do our predictions, and instead of the image we swap in our frame, so we grab the frame, throw it in, and extract the results. Then we use supervision to create the detections instance, annotate the frame with both our labels and our bounding boxes, and visualize the annotated frame. This is pretty much the whole pipeline. If you want to extract the individual results, you can go up and use the same accessors as before, the bounding boxes, the xyxy values, confidence scores, and classes, and you're good to go for whatever application you can come up with. You can also use a different dataset; I just made this one while recording the video, but you can use an arbitrary dataset, there are tons of datasets on the Roboflow platform, or use your own. This is a pretty cool way to learn how to set up the whole computer vision pipeline.
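Put together, a sketch of that webcam inference script might look like this (it assumes best.pt sits next to the script; the window title and confidence threshold are my choices, not values from the video):

```python
# webcam_inference.py -- sketch: the capture loop from before, now running
# the fine-tuned model on every frame and drawing the detections
import cv2
import supervision as sv
from ultralytics import YOLOv10

model = YOLOv10("best.pt")                  # model instance lives outside the loop
box_annotator = sv.BoundingBoxAnnotator()   # annotators are set up once as well
label_annotator = sv.LabelAnnotator()

cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Unable to read the camera feed")
    raise SystemExit

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Predict on the current frame and wrap the results for supervision
    results = model(frame, conf=0.25)[0]
    detections = sv.Detections.from_ultralytics(results)

    annotated = box_annotator.annotate(scene=frame.copy(), detections=detections)
    annotated = label_annotator.annotate(scene=annotated, detections=detections)
    cv2.imshow("annotated frame", annotated)

    if cv2.waitKey(1) & 0xFF == 27:         # Escape closes the application
        break

cap.release()
cv2.destroyAllWindows()
```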
Now let's run our Python application, webcam_inference.py, and take a look at the results. Our model is up and running; I'm running this on a CPU right now, on my MacBook, and we get around 70 milliseconds inference time, so that's pretty fast as well, around 12-13 frames per second just on a CPU, and you'll get a significant speedup by using a GPU, especially with this new YOLOv10 model, where I feel the advantage you get is exactly the inference speed, or the latency, how long it takes to process a single image. Let's take the phone, move it up, and see if we can actually get some predictions. Right now we need to have it fairly close to the camera; I think we got one detection. Again, this is not an ideal model; we'd definitely need to train it for longer, and it could also be because of some reflections, but we do get detections in some specific positions, so here we can see that it detects the phone, which is pretty cool. We might get better results with the wallet; let's move it up and rotate it around, and now we're detecting the wallet in the frame. Of course we miss a lot of detections here and there, and we could probably lower the confidence score, but now we have this custom-trained computer vision model up and running, and we can move objects around in the frame and still get detections.

Right now we only do detections on each individual frame, but you can also apply object tracking on top of it. I have courses covering all the theory behind object trackers and how to apply them on top of different object detection models, DeepSORT, SORT, and so on; it's all based on the Kalman filter, where you do a prediction step and an update step, and then it's an iterative cycle of prediction and update steps. That's pretty awesome, because then we can assign an ID to each individual track, each object we want to follow around the frame, and you can use it for object counting or measuring how long specific objects are in the frame. But this is pretty cool; this is how we can set up a whole training pipeline, and just remember that I only captured around 16 images across two classes, my phone and this Apple wallet. We probably need the objects in the orientation we had in the dataset; let's see if we can get both predictions at the same time. It might not work, since we didn't really have that situation in our dataset, but there we go, we have both predictions. This is how easy it is to set up the whole pipeline; all the code will be available on my GitHub, and you can have this up and running in 30 minutes.

Thank you for watching the video; I hope you've learned a ton. Definitely go in and test this out: this is the new member in the YOLO family, YOLOv10. I've tested it on a bunch of different video streams, and I'm not sure we get better accuracies and results on custom datasets, but it's definitely a faster model compared to the YOLOv8 and YOLOv9 models; it all depends on the task and the dataset you have. So go test it out; you can follow every single step of the whole pipeline from this video, it's easy to do, and you're good to go to use it in your own applications and projects. I hope to see you in one of the upcoming videos; until then, happy training. We also have an AI career program if you want to learn how to land AI jobs and get AI freelance work; I teach you everything in there, we have all my technical courses, weekly live calls, and personal help, and I'd love to have you in there and help you out in any way possible. You can check out the program and the community in the description, and I'll see you in there.
Info
Channel: Nicolai Nielsen
Views: 15,684
Keywords: YOLOv10, object detection, deep learning, artificial intelligence, yolo v10, Custom YOLOv10, Object detection YOLOv10, Train YOLOv10, Train Custom YOLOv10, object detection state of the art, Yolov10 vs YOLOv8, Best object detecdtion model, how to train YOLOv10, How to run inference with YOLOv10, YOLOv10 live webcam, train yolov10 on custom dataset, train custom yolov8, YOLOv9, yolov8, yolov8 custom dataset, yolov10 live webcam, yolov10 fps, real-time yolov10 camera
Id: 29tnSxhB3CY
Length: 24min 36sec (1476 seconds)
Published: Sun May 26 2024