Ultralytics Live Session 3: Ultralytics YOLOv8 - The State-of-the-Art YOLO Model

Video Statistics and Information

Captions
All right, so I just clicked the button and it says that we're live. Okay, thanks for joining us, everybody. Let me introduce myself: I'm Glenn, founder and CEO at Ultralytics, and most well known as the YOLOv5 author. I'm here with Ayush, who joined us last year as an ML engineer and has been a key part of the YOLOv8 construction. "Hey everyone."

I see a bunch of people talking in the chat. First of all, the structure of this webinar is going to be like this: Ayush and I are going to present a little bit about YOLOv5 and what we've done new for YOLOv8, which will take about 15-20 minutes, and then we'll open this up for questions, which I think is always the most interesting part. So feel free to leave all your questions right in the YouTube chat window, and Ayush will be browsing them to pick the most interesting ones.

So where does the story start? Ultralytics was founded in 2014 in the DC area, and we initially worked on particle physics for intelligence agency contracts. This is how we got into artificial intelligence: we were looking for better solutions for data analysis in particle physics, and we started working on AI for things like regression, for example the energy of a particle and where it came from. From there we migrated into the open source space, and it's been pretty exciting ever since we started working on YOLO models about three years ago. I actually started by myself back then, and I open sourced my work. I started with YOLOv3 by Joseph Redmon; I helped port that over to PyTorch, and then I just kept working on it, and eventually I decided to launch something new, and that was YOLOv5. (Hey guys, I see a lot of hi's in the chat.)

YOLOv5 has been out for about two and a half years now, and it's changed a lot in that time. The most important thing about it is that it's constantly evolving: it's getting better, it's improving, we're folding feedback in, we're trying to figure out how to make things easier for everybody. We see all the bug reports and all the feature requests, and when they start to get popular we say, okay, we should probably do this because everybody seems to want it. That has worked really well for YOLOv5, and it's turned into a pretty mature tool, but it's missing a few key features, and some things became difficult to change as time went on.

When I first started working on the YOLO repositories I was brand new to AI, brand new to Python and PyTorch, and this means the YOLOv5 repository as you see it now is built on not the most mature foundation. It's built on a lot of trial and error, one fix on top of another, with a lot of hard work and a lot of elbow grease. It's gotten to where it is right now and it works really well, but using everything we've learned in the last few years, the question we posed ourselves was: if we had carte blanche, a blank sheet of paper, what could we build using everything we've learned, all the mistakes we've made and corrected, and all the feedback we've gotten from students and from organizations? And it's been a lot of feedback: there are over 10,000 issues, I think, in the YOLOv5 repository. I try to answer all of them.
It's a tough job, but it's given me tremendous insight, and over the last couple of years I've gained a great sense for what's going wrong, what the pain points are, and what we could do better. We've tried to work on those, but like I said, some things just work better if you start from a blank slate.

So late last year, around September, we started a secret project internally. While we maintained YOLOv5 on the one hand, we started with a blank repository and decided to see what we could build there. The key point is that it was a new repository: it wasn't a fork of YOLOv5, it was really something new. We ported over what we could of the low-level functions, things like conversion functions like xyxy to xywh, but for the really high-level stuff we decided to create something new. What we built was informed by what we've seen in some other tools, things like PyTorch Lightning, and also by what we learned from trying to implement new tasks.

A task is what a model is applied to; it's the problem that is solved. Tasks could be something like detection, segmentation, or classification. The YOLOv5 repo supports all three of those now, which is great, but we learned that as we added new tasks we were duplicating code: we would copy things like detect.py from detection and make a classification version of it. We realized a smarter approach would be to create adaptable base classes, so you could have something like a base predictor class that serves as a base for a detection predictor class that inherits from it. We started creating a structure like this, and that is what we now have as YOLOv8. (A sketch of this pattern appears below.)

So YOLOv8 is two things. First of all, it's a lot of R&D on the architecture side, which we'll get into, but it's also the construction of the Ultralytics repo, which is a bit of a hybrid approach: you could think of it as PyTorch Lightning mixed with PaddlePaddle mixed with PyTorch, stirred together and focused really on YOLO. It's capable of creating models for any architecture, and we've been really careful when we created the Ultralytics repository to make it architecture agnostic. You'll notice in the docs and in the repository we have examples, but none of the code has YOLO hard-coded into it: it loads a YAML file for the architecture, or a PyTorch model for trained weights, but those PyTorch models could really be anything, and the YAML files could really construct anything. That's the exciting part: it's really flexible, and it's capable of constructing all sorts of models, not just YOLO models. Out of the box right now in the Ultralytics repo we have YOLOv3 YAMLs, YOLOv5 YAMLs, and YOLOv8 YAMLs, and we're working on adding other architectures, like YOLOv6 for example. So that's pretty exciting.

But I'm sure a lot of you want to hear about the architecture most of all; that's always a hot topic, so let's talk about that for a bit. The YOLOv5 models were built off of the YOLOv3 models, and they inherited the same head for detection, the same type of box construction in xywh format, the same channel order (which was permuted, which was a little odd), the same sorts of loss functions, and initially the same augmentation policies.
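As a rough illustration of the base-class idea described above (the class and method names here are simplified stand-ins, not the exact classes in the Ultralytics codebase), the pattern looks something like this:

    # Hypothetical sketch of the shared base-class pattern; names are illustrative.

    class BasePredictor:
        """Shared prediction loop reused by every task."""

        def __init__(self, model):
            self.model = model

        def preprocess(self, image):
            return image  # shared resize/normalize logic would live here

        def postprocess(self, outputs):
            raise NotImplementedError  # each task interprets raw outputs differently

        def __call__(self, image):
            return self.postprocess(self.model(self.preprocess(image)))


    class DetectionPredictor(BasePredictor):
        def postprocess(self, outputs):
            return outputs  # e.g. box decoding and NMS would go here


    class SegmentationPredictor(BasePredictor):
        def postprocess(self, outputs):
            return outputs  # e.g. mask assembly would go here

New tasks then reuse the shared loop instead of duplicating a detect.py-style script per task.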
Moving over to YOLOv5, I tweaked those subtly in a few areas, but I mostly maintained a close relationship to YOLOv3, so the changes were very minimal. The improvements that we saw in YOLOv5 were based on a combination of factors: heavier augmentation (mosaic augmentation in particular), improved loss function implementations, some tricks on the validation side, and just a lot of work on smoothing everything out and making sure it all worked correctly.

When I launched YOLOv5 originally, in the first year, 2020, I mostly worked on improving the math and the accuracy, and in 2021 I continued that. Last year, though, I focused more on adding features: things like exporting YOLOv5 to other formats, writing tutorials to make sure everybody knew how to use those, and making it inference-capable for all the different formats, so you could export a YOLOv5 model to TensorRT and then run it with the same repository. You can also validate it, to make sure your exported model didn't lose any accuracy in the export process. That was a lot of work, and it meant I moved away from focusing on the basic R&D, on what people are doing in the research space to increase accuracy. But other people did not. A few groups released interesting research last year, and some of it was pretty YOLO-specific: we have companies like Megvii that released YOLOX, more recently Meituan with YOLOv6, PaddlePaddle releasing their own YOLO models called PP-YOLOE, and of course Alexey with YOLOv7, also released last year. This is great, because it gives us an opportunity to read all those papers and look at the R&D. (We'll talk about papers too, because we're going to have one as well.) It meant we had a nice pile of R&D sitting around that we could implement as improvements.

So we undertook an ambitious step: take all the R&D that accumulated in 2022, create the new Ultralytics repository from scratch in a more extensible way than YOLOv5, create new architectures which we call YOLOv8, put it all together, debug it, and open source and ship it by earlier this month. That's what we've done. We put it out on January 10th, and we tried to mature and debug it, put in some CI and some tests, but of course, as a brand new architecture and repository it does have some bugs, and we've been furiously working on those day and night. By the way, on the ML team here at Ultralytics, besides myself and Ayush, we have another contributor who goes by the GitHub handle Laughing-Q; he's listed on the repository if you want to take a look, and he's also been key to the R&D effort.

"So let me share my screen here. Should we start with the new docs?" "Yeah, definitely, start with the new docs." Okay. As you all know, to get to the new repository you go to github.com/ultralytics/ultralytics, and one of the main changes, like Ayush said, is the docs. With YOLOv8 we got a lot of feedback that we should work on the docs more, and we've taken that into account: we've listened, and we've focused a lot of effort there. The docs are all in the same place, at docs.ultralytics.com, and we have different sections. YOLOv8 can be used in two main ways: with the CLI, the command line interface, by running yolo commands, and in Python, if you
want to construct your own tools in a Python environment. The docs explain everything: how to get started, installation in the quickstart, and explanations of the different tasks, like I said before, detection, segmentation, and classification, with an example of each. From there we have usage examples: how you predict, and how you update your configuration arguments.

Configuration is a main improvement over YOLOv5. In YOLOv5, configuration was split across a number of different places: we had an argparser for command line arguments, we had hyperparameters in hyperparameter files, and some variables were hard-coded in the code. These three different places made it confusing to figure out how to change your settings, and specifically how to reproduce a run. With YOLOv8, everything is in one single place: whether you use Python or the CLI, everything is in the configuration YAML, default.yaml. It's a single file, there's only one YAML for configuration in the repository, and it contains the defaults for everything: training, inference, validation, and export. You can either change these defaults or pass what are called overrides: if you want to run a prediction command, you can specify the model with model= and that will override the default. This is really cool: there's one single place where settings are stored, so there's no guessing where everything is.

Okay, let's run through a quick tutorial. The easiest way to get started, I would say, is with the Colab notebook. You'll see this button in the repository, Open in Colab, and if we click it, it takes us to the YOLOv8 Colab. We've tried to make this really simple, so even if you've never run Python in your life (like me when I first started YOLO), you can do this just by clicking the play button. Another main advancement with YOLOv8 is that it's pip-package native, which means you don't need to git clone the repository: you can simply pip install ultralytics. This helps a lot with environments. With YOLOv5 you could clone the repo but maybe not install the requirements, which led to problems where people didn't have the environment set up correctly; when you pip install a package, all required dependencies are installed automatically.

So here we've run the setup cell, and to get started running predictions you just click this predict cell, which runs the command yolo predict. We pass a model, yolov8n, our smallest model (it's really small, about three million parameters); we pass a confidence (which isn't necessary unless you want to modify it, since without it the default arguments are used); and then a source, this online image. You can see this downloads the image, runs inference in about 15 milliseconds, and we get the result. (Both ways of running this are sketched just below.) Some of this usage might seem familiar from YOLOv5, and we've done that on purpose to ease the transition. It's not identical, though, and that's also on purpose, to give us the opportunity to implement breaking changes, and there are quite a few: it's pip-package native, there's now this simple yolo command on the CLI, and a few others like the single config file and the different architectures.
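As a sketch of that first run (the bus.jpg URL is the example image used in the session's Colab; substitute your own source), the CLI and Python routes look something like this:

    # Install (CLI):
    #   pip install ultralytics
    #
    # Predict from the CLI, as in the Colab cell:
    #   yolo predict model=yolov8n.pt conf=0.25 source='https://ultralytics.com/images/bus.jpg'

    # The equivalent in Python:
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # downloads the pretrained nano model if not present
    results = model.predict(source="https://ultralytics.com/images/bus.jpg", conf=0.25)
    print(results[0].boxes)     # detected boxes for the first image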
We're also working on YOLOv5 native support in the Ultralytics repo, and pretty soon we're going to train some YOLOv5 models here. They won't be identical to the YOLOv5-repo models, which is a little confusing, but they're actually going to be better. Our recommendation for any architecture is to train it in the new Ultralytics repository; that way you benefit from the training, loss function, and augmentation improvements we've put in here. You'll get a YOLOv5 model, but it will perform better than if you trained it in the YOLOv5 repo. How much better, we're not sure, because we haven't trained them yet. The command is pretty simple: you just pass the YOLOv5 YAML instead of the YOLOv8 YAML.

"Can you talk a little about the directories? In YOLOv5 there was this requirement that you run the commands as Python scripts, so you had to be inside the repository, or at least refer to it, but here that isn't needed, right? It's a system-wide registered command."

Yeah, that's right. Any time you run a command, the result is deposited in the runs directory, and runs is created in the current working directory. Like you said, with YOLOv5 you always had the git-cloned repo, so you kind of knew where your stuff was. This is a little different, because now your directory can be created anywhere in your operating system: you pip install ultralytics, and then no matter where you are, it will create a runs directory, and if it's auto-downloading datasets it will create a datasets directory. These two are created in your current working directory, and this helps you keep track of everything. Here you can see we have different folders for the different tasks, and within a task we have what are called modes; a mode is something like predict, train, or val. This keeps everything really organized, just like YOLOv5, but with the added capability that you can do it anywhere in your operating system.

We also have something new: settings. Settings control things like where your datasets directory is created; maybe you want it fixed in a certain place, and you can change that in your settings. Settings are really easy to inspect and modify with the yolo settings command: run it and it shows you your settings and where they live. It's just a YAML file, and you can change it like any other YAML file. Right now runs are created in the current working directory, but if you wanted to fix that, you could put an absolute path there rather than the relative one, and then the runs directory would always be created in one absolute place on your computer.

Let's see, validation. Again, it's pretty simple; we see the same thing with task and mode. We're actually working on eliminating the need to define a task: since we're passing a model, and models carry identification inside about what type they are and what task they perform, we're working on a special case where, if you pass a PyTorch model, we inherit the task directly from it. For exported models, like if you export to, say, ONNX or OpenVINO, we can't identify the task automatically from the model, and in that case a task is required, like task=detect.
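A minimal validation run might look like the sketch below (exact CLI spelling has shifted across early 8.0.x patch releases, so treat this as illustrative); note that a trained YOLOv8 checkpoint carries its own training configuration, as explained next:

    # CLI (sketch):
    #   yolo val model=yolov8n.pt   # the checkpoint already knows its dataset and image size

    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")
    metrics = model.val()           # validates on the dataset embedded in the checkpoint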
A mode is always required; it tells us what you want the model to do, whether that's predict or validate. And if you validate, the model has its training configuration embedded within it, so it actually knows its dataset. This is something new with YOLOv8: the YOLOv8 models contain everything they were trained with, so they know the dataset they were trained on and the training settings, things like image size, and they automatically apply those at inference time. That's really smart, and it's something we've wanted to do for a long time in YOLOv5, but it was too complicated to retrofit; now that we've had the opportunity to build from scratch, we've done it.

So we've got an example of detection validation here, and segmentation models are just as easy. For example, to run a segmentation prediction: we've got detect here, but if we change the task to segment and change the model to the -seg segmentation model, this runs segmentation prediction on the same image, and it tells us the runs directory it dropped the result in. Right here you see runs/segment/predict, so I go there, double-click my image, and there's the result. (A short sketch of this follows the architecture discussion below.) And again, this is the nano model, just three million parameters, and it's running segmentation at a level that I think is SOTA: it's super fast, super accurate, and really capable.

"On to the architecture?" Okay, we should talk a little about the architecture. The YOLOv8 architecture embeds improvements we had been working on since YOLOv5. From YOLOX we adopted a split head, and we've got a pretty aggressive split-head approach: for segmentation we have three heads, one for masks, one for boxes, and one for classification. We've actually dropped objectness in YOLOv8. YOLOv5, YOLOv3, and most YOLO models have what's called objectness: we identify an object first, and then next to that we have classification, where we identify the class. With YOLOv8 we've merged the two: we've gotten rid of the objectness output, so we're directly classifying individual outputs. We've also gotten rid of anchors, so YOLOv8 is completely anchor free. I worked a lot on AutoAnchor for the YOLOv5 models, to evolve anchors, and all that's gone; we don't need it anymore, so I'm kind of happy in a way, kind of sad in a way. This also means the models should be more robust to differently shaped inputs: sometimes with YOLOv5, with really skinny inputs, maybe long rods, we had problems with high-aspect-ratio objects, and that should now be much improved. We're also using something called DFL, which actually has 16 box outputs that we merge into a single one; we've cited the research there, and it's responsible for some of the improvements we see in regression. And we've adopted a much heavier use of three-by-three convolutions, which we saw in YOLOv6 from Meituan; this was a way to improve accuracy while maintaining similar speed, mostly on CUDA-accelerated devices. So the YOLOv8 models are about the same size as the YOLOv5 models, they're more accurate, and in CUDA inference they're almost the same speed.
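Here is the segmentation variant of the earlier quickstart, as a sketch (same caveat as before about CLI spelling in early releases):

    # CLI (sketch):
    #   yolo predict task=segment model=yolov8n-seg.pt source='https://ultralytics.com/images/bus.jpg'

    from ultralytics import YOLO

    model = YOLO("yolov8n-seg.pt")  # pretrained nano segmentation model
    results = model.predict(source="https://ultralytics.com/images/bus.jpg")
    print(results[0].masks)         # segmentation masks alongside the boxes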
On CPU they may be a little slower, because there are more FLOPs, so it's a bit of a trade-off that we calculated.

"What about augmentations, how have they been updated?" We've done some improvements and some fixes, and we've actually applied slightly more aggressive augmentation. We do more copy-paste augmentation when segmentation labels are provided, and we've also implemented what's called close mosaic, which the YOLOv8 models have on by default: for the last 10 epochs of your training there are no mosaics in the data loader. This is something I think the YOLOX team came up with, and other groups like Megvii and Meituan have adopted. It allows the model, for the last few epochs, to see images more representative of what it will see in the validation set; obviously in validation and in the real world you're not going to see mosaics. So you can have aggressive mosaic augmentation during training but show more real-world-style images right before the end, and this results in a slight boost as well.

It's really a combination of all these factors: the DFL loss, the objectness removal (of course, when you eliminate one loss, the model can focus more cleanly on the remaining losses). We still have three losses: as I said, in YOLOv5 the losses were objectness, classification, and regression; we've maintained regression but added DFL to it, and we still have classification. Those are the three losses in YOLOv8.

And about the paper: there will be a paper. We've taken the feedback seriously. The reason we haven't published one for YOLOv5 is that, since it was moving quickly, it didn't make sense to publish a paper while we were constantly updating the architecture. Not only that, but our efforts were fully focused on the software side, and since our team is small, about 10 people split between the US and Europe, we just didn't have the manpower to dedicate to publication. But we're taking the feedback into account, and we've actually started on a paper: we've written the abstract, the intro, and some tables, and we should have it out fairly soon. I don't want to give a date, because I've said some things in the past that people have held me to, but if I were a betting man I would say probably February.

"So what do you think, Ayush, did we cover the bases?" "Yeah, I think so, and we also have a lot of questions." Oh yeah, we have a lot of questions here, so I should probably stop talking and get to them. I'll stop sharing my screen, or maybe keep sharing, because we might need to show some things. Okay, let's do it. "Some of them we can just ask people to open an issue for, because those are code related, so I can solve them on issues, but otherwise, let's see."

I've got one from Darshan Renuka: his training data consists of different camera angles, and the issue is that YOLOv5 fails to understand the camera angles at test time. Okay, this is really dataset related. In AI there are two main components: your dataset and your model. A lot of people think the model is going to solve everything no matter what
they do with the dataset, but that's not true. The dataset should be representative of what you want the model to achieve in the real world. This doesn't mean the dataset should be composed of great pictures with great lighting; it means that if you're going to deploy in a certain place, you have to take a look at that image space and make sure your training set encompasses it and provides examples throughout it. If you have certain angles or lighting in the deployment space, put them in your training data and you'll get much better results; your model will generalize better, as it's called.

"Why don't you scroll up? Let's start from the first question, otherwise we'll lose them." All right. "Starting with Sergio: he has a question, but that probably needs to go to an issue." Yes, Sergio, just open an issue. And that's another thing: if you have questions we don't answer here, or bugs that you see, definitely open issues. We really rely on that feedback to know what to work on. What I spend my day doing is going through the issues, figuring out what's wrong and what people are complaining about most, and focusing my effort there first, so my job is really to work for you guys. But Sergio, the way you've shown this example, that's correct: you should be able to say cfg= pointing to any YAML that's a copy of the default YAML with some fields changed, and it should work. If it doesn't, raise a bug report.

Next: how do I use YOLOv8 with DeepSORT? "So we have this repository by a friend; it's called YOLOv8 DeepSORT, you can just search for it. He has implemented most of the SOTA tracking algorithms, and they work with both YOLOv5 and YOLOv8 models. We're working closely with him to build it on top of the Ultralytics repo, so that as we update things, that repository gets the features too. We'll collaborate more in the coming days, and you can expect us to support all the different tasks, including DeepSORT, directly from the repository."

I see a really good question here: Pavlov's Lab says they'd like to see support for Apple's M1, and I think we all would. We've put effort into that, but ultimately some of it is out of our hands: the PyTorch team themselves are working on support for all the different functions. M1 support isn't one switch you turn on; it's really a little switch in every single function, like a convolution or a multiplication or a subtraction, and PyTorch has good support for M1 and is adding to the list of supported operations as we speak. Right now, today, it's not fully natively supported, but the list of supported functions is growing, and pretty soon we should be able to enable better support. What we've seen, though, is really good speeds. They have an environment override that falls back to CPU when you turn it on. Some functions do work: classification training should work; detection and segmentation functions I don't think are fully supported yet.
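As a sketch of what that looks like on the PyTorch side today (this is generic PyTorch, not an Ultralytics-specific API; the fallback variable has to be set before the process starts):

    # Opt into CPU fallback for ops not yet implemented on Apple's MPS backend:
    #   PYTORCH_ENABLE_MPS_FALLBACK=1 python my_script.py
    import torch

    if torch.backends.mps.is_available():
        device = torch.device("mps")   # Apple GPU via Metal Performance Shaders
    else:
        device = torch.device("cpu")

    x = torch.ones(3, 3, device=device)
    print(x.device)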
But the speedup is pretty incredible, so once we get that, I think all of our lives will be much better, at least for the Apple people.

"There's this question by Ronald that I like: how does your work compare with the YOLO development in China?" Yeah, China has seen a lot of activity in YOLO model development. When I first started, I think most of the action was on the open source side in the US, but more recently we've seen very impressive efforts by different Chinese groups, usually backed by well-funded institutions. For example, the PaddlePaddle team is backed by Baidu; we have Megvii; and we've also got Meituan, a recent entrant. Meituan is a very big company, and they're using it for little delivery robots: sometimes in the streets you see these little robots going around delivering things, they're cropping up around the world, and YOLO is inside them, helping them not trip over people and find the right way to go. I think the work they're putting in is pretty admirable, and of course so is the fact that they're open sourcing it. "Also, we want to work with them, and we've already been working with most of these companies and labs; you'll hear more from us on that in the coming days. You can already use the Ultralytics repository to train YOLO models and even non-YOLO models, but we'll provide official support soon; we are working with these companies."

Let's see, Future Bim asks: after retraining with a custom dataset, do I need to create my own YAML? No, you never need to create a YAML. The YAML holds the configuration used to train the model, and you can leave the default YAML alone; you can just use command line arguments if you want. But if you prefer, we have a special command, yolo copy-cfg: it copies the default YAML into your current working directory so you can see it, modify it, and point to it for new trainings. (A short sketch of this workflow appears below.) That's explained in the docs, and we also have a super simple help command that I should tell everybody about: it's just yolo help, or just yolo; if you type yolo with nothing, it figures you need some help and shows you a few examples.

All right, let's see what else we've got. What training steps do you recommend to train YOLO on very small objects? This question always has the same answer. On the architecture side we've tried to improve the models, but ultimately for small objects you want larger image sizes; there's no way around that, and you'll always get better performance at larger image sizes. If your objects are really small, like two or three pixels, probably double your image size and your accuracy will improve. It's a sacrifice on the FLOPs and the inference speed, obviously, but that's the way the game works: it's always a compromise. You can have really fast or you can have really accurate, and there's a whole world in between those two to play with.

Okay, what else? "I was looking for some feedback too. I got nice feedback from Justice; he says we have to fix the resume functionality."
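The configuration workflow mentioned a moment ago, sketched end to end (the copied filename shown here is an assumption that may differ across 8.0.x versions):

    # CLI workflow (sketch):
    #   yolo copy-cfg                  # copy default.yaml into the working directory
    #   yolo cfg=default_copy.yaml     # run with your modified defaults
    #   yolo help                      # or just `yolo` for usage examples

    # Python equivalent of passing overrides directly, no YAML editing needed:
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")
    model.train(data="coco128.yaml", epochs=3, imgsz=640)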
"Oh yeah, we have a list of things we want to work on; the ones we know are broken are listed on the readme, including the link and including the paper that we want to publish. We'll work on those as soon as we get some time. We're currently fixing some P0 bugs, the most crucial ones, but as soon as we get some time we'll get to the paper, the resume functionality, and the DDP stuff too." Yeah, I'd say in general, if you have any bugs or you're frustrated with something, assume it's going to get better in a few days or a week; just come back, and if it hasn't, definitely let us know. We are fully on top of making everything work. The intention is to provide all the feature functionality we have with YOLOv5, but we're not there yet. A few things are missing, like resume; another big one is TensorFlow exports; and we haven't fully validated inference with exported formats. YOLOv5 is very mature and capable in that sense: you can export to any format and then run inference and validate those exported models. We want to do the same thing here; we just haven't gotten to it yet, because we're still checking the boxes on the basic functionality.

"Glenn, right now we're not showing any demos, so maybe it's better to stop sharing the screen and just read off your own screen." Yeah, sure. Okay, let's continue with the questions. Compare YOLOv8 with YOLOv5 in terms of speed? Oh, I guess I should share my screen again for this one.

Okay, so another really good thing we've done is profile the models against all the good stuff out there today. The first thing you see, front and center, is this table, and we put a lot of effort into it. We're going to open source the script we used to create it, but what happens is: we go to the different repos, download their models, export all of them to ONNX format, and run an ONNX profiler; then we export to TensorRT and run a TensorRT profiler with trtexec. We save all those times, then we go to the repos and the papers, get the accuracies, the mAP, and plot these charts. (A sketch of this recipe follows below.) We plotted two comparisons: one against the size of the model, the parameters, and one against speed. We got ourselves a super fast A100 to profile on, and what we see is that we're leading the edge here. The best place to be is the top-left spot: to the left you're faster (or smaller, on the parameters plot), and up means more accurate. This is a difficult game, because we try to optimize for really wide regions of this space: the n model, like I said, is only three million parameters, and the x model is, I think, almost 100 million. These are vastly different sizes, and they require slightly different hyperparameters to avoid overfitting and get the best results. We'll publish code to reproduce this in the paper we're releasing, along with code to reproduce the plot. But what we see in general is that the YOLOv8 models, compared to the YOLOv5 models, are smaller and much more accurate.
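A sketch of that profiling recipe, using the public export API plus ONNX Runtime for timing (the TensorRT half would use trtexec on an exported engine; details here are illustrative, not the exact benchmark script they plan to release):

    import time

    import numpy as np
    import onnxruntime as ort
    from ultralytics import YOLO

    # Export a model to ONNX, then time raw inference on a dummy input.
    YOLO("yolov8n.pt").export(format="onnx")   # writes yolov8n.onnx

    session = ort.InferenceSession("yolov8n.onnx", providers=["CPUExecutionProvider"])
    name = session.get_inputs()[0].name
    dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)

    session.run(None, {name: dummy})           # warmup
    start = time.perf_counter()
    for _ in range(100):
        session.run(None, {name: dummy})
    print(f"{(time.perf_counter() - start) / 100 * 1e3:.2f} ms/image")

    # The TensorRT side would be analogous, e.g.:
    #   yolo export model=yolov8n.pt format=engine
    #   trtexec --loadEngine=yolov8n.engine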
The biggest improvement is with the smallest model: to give a comparison, the YOLOv8n model is about 37 mAP, compared to 28 for the YOLOv5n model. And on the large end we've also focused on some of the competition, like YOLOv7 and YOLOv6, plus other models we didn't plot but compared against and tested, for example RTMDet, PP-YOLOE, and others. Our typical approach was to export these to ONNX and look at them in Netron, and I would study the architecture that way. Publications sometimes say certain things and it's hard to figure out exactly what's in the model, so I'd go straight to the ONNX model, because the ONNX model and the Netron viewer don't lie: they show you the actual underlying structure. I'd start to see patterns: the same sort of CSP blocks, the C3 blocks we've used for a long time; some of them tried attention modules, like RTMDet. So we ran experiments with everything we saw the other models doing, and where we could extract gains we put those in our list of changes for YOLOv8. The accumulated list is what we trained, and what we have here is YOLOv8, so it's really something that has folded in the improvements from all the other groups out there. (Sorry, I'm losing my voice; I'm just getting over a cold here.)

"While you're on the repo, there's another question about using Ultralytics with other architectures: what is required to use the Ultralytics repository with different architectures, like YOLOX, YOLOv7, etc.?" Oh yeah, that's a great question. We can show them: we already have YAMLs for YOLOv5. If you go inside the models directory, you'll see three subdirectories, v3, v5, and v8: v8 has the v8 YAMLs, v5 has the v5 YAMLs, and, as you can guess, v3 has the v3 YAMLs. To train on any one of these, you can simply go to the command line. We have a train cell here that points to a pretrained yolov8n model, but if I change this to, say, yolov3.yaml, that's it, that's the only change you need, and now you'll be training a YOLOv3 model, but using the new advanced split head, and that's a major difference. (This single-argument swap is sketched below.) The YOLOv3 YAML, when you go inside it, is the exact architecture Joseph Redmon created; the only difference is that we have a detect head, and this detect head now uses a much more advanced loss function, the removal of objectness, the removal of anchors, the DFL loss, and the split head from YOLOX. Just this single change will produce a better model. So this is no longer an identical replica of Joseph Redmon's v3, but it is an improvement; we don't want people to think they're going to train an identical model from the paper. We're providing an Ultralytics-flavor YOLOv3 here, and the same for YOLOv5: an Ultralytics-repo-flavor YOLOv5. YOLOv5 is very popular, and some people may want to continue using it, especially if you want the very smallest or very fastest model. The YOLOv8 models in general are much more accurate, but they may be a little larger or a little slower, so it's a trade-off we've tried to balance to provide the best of both worlds.
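A sketch of that swap (building from a YAML trains the architecture from scratch; how the bundled YAML names resolve is my assumption based on the session, and whether alternate YAMLs ship pretrained weights is covered a bit further down):

    from ultralytics import YOLO

    # Pretrained YOLOv8 checkpoint:
    model = YOLO("yolov8n.pt")

    # Or build a YOLOv3-architecture model from its YAML: random weights,
    # trained with the new head, losses, and augmentations.
    model = YOLO("yolov3.yaml")
    model.train(data="coco128.yaml", epochs=3, imgsz=640)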
The plot we show here, the comparison plot: if we look at it, the red line is YOLOv5 and the blue line is YOLOv8, so it looks like blue is outperforming red everywhere, which is true, but again, this is for A100 TensorRT. Your application may be different: if you're on a CPU, for example, running ONNX or OpenVINO, you may want to reconstruct this table, or maybe we should work on providing one of those too, Ayush, so we can see what the CPU comparison looks like. "Yeah, we have a lot of to-dos. Also, those are not official YOLOv8 models; there are no pretrained weights for them, so if you use those YAMLs you have to train from scratch, you cannot use .pt files to get pretrained models from the release assets. But we plan to add them soon, and we also plan to add more architectures, not just YOLOv3, and not just architectures by Ultralytics, so stay tuned for that; we're working on it."

Yep. And you can see I've clicked the play button here: it's constructing the YOLOv3 model, it tells you what it looks like, and it starts training. So we're using the Ultralytics repo here, now training a YOLOv3 model, and you can see the new losses we talked about: box, classification, and now DFL, our new loss.

Okay, I haven't looked at the question list in a bit; let me see what else we've got. Wow, this is definitely our most popular video. All right, I've seen this question before, about 3D convolutions for medical imaging analysis. This is interesting. In general, the input to a YOLO model can be any dimension in the channel space: you can do single-channel images, black and white, you can do hyperspectral images, and everything in between. Naturally it's three channels for RGB, but you don't actually need 3D convolutions to work with multi-channel or spectral imagery. Maybe if you're getting into video and you want a custom solution that reads multiple frames, you might be looking at 3D convolutions, but to answer the question specifically: we haven't experimented with 3D convolutions, though I don't see any reason why you couldn't extend what we have to them.

Let's see, Melissa says she's encountering some issues: YOLOv5 is great for her custom dataset, but YOLOv8 seems a little more tricky. Melissa, this doesn't seem too strange. The YOLOv8 models have different losses, and they are not as mature as YOLOv5. Oftentimes when I talk to business customers or enterprise users, I advise them to stick with YOLOv5 for now, because we have what's called maturity with YOLOv5 that we just don't have with YOLOv8 yet; YOLOv8 is more for when you're willing to take on a little risk, some excitement, and some debugging. Obviously we're working on making it smoother. Some places we could look at for improving training stability are things like warmup, and examining what the new losses are doing: some of them may not be as stable, like DFL perhaps, and maybe we'd want to introduce things like gradient clipping to improve stability on custom datasets. But as we get more feedback from users like you, and if you can give us reproducible examples where it's unstable, that will help us a lot.
Reproducibility on issues is super helpful: if we can reproduce a bug, that's ten times more useful.

"There's another question: could you explain the logic behind the DFL loss?" Hmm, I can try; I'm not the DFL author, but the way I understand it is that the boxes are constructed at different scales, so rather than solving for one box, you solve for, say, a very small box inside a larger box, sort of like a matryoshka doll, and then you merge all the boxes together using a weighted combination. Since you have more estimates, and more estimates always means less noise, DFL is intended to regress a box more accurately than a traditional single-box regression. (A tiny sketch of the usual formulation appears below, after these questions.)

I see a really interesting question here: are there any benefits or integrations with Python 3.11? This is interesting, because PyTorch doesn't officially support 3.11 yet. We allow people to install ultralytics on 3.11, but you may need to tweak your PyTorch installation, maybe build it from source. Python 3.11 was released around October or November last year, so it's been out for a few months, and there's an open issue on the PyTorch repo about official support, but for right now, if you try PyTorch with 3.11, you may be on your own. Hopefully that changes soon.

"Have I eaten my hat yet?" Okay, this is something I said last year: I claimed the paper would be out soon, and when I said that I really did think it would be, but life throws curveballs at you, and after that we had to work on other things and I wasn't able to dedicate the effort to the paper. So yes, I apologize, and we will have a paper out soon with YOLOv8, and we will also explain YOLOv5 as part of it.

Oh, this is interesting, a SAHI question. SAHI is a tool that takes an image, chops it into sub-images, runs inference on those, and then reconstructs the results together, which is useful for very large images. My take is that I wish it weren't needed: ideally, if you could pass a single image at a really large size, you wouldn't need to chop it into smaller pieces. Often I'd say try a very small model, like an n model, on a very large image at batch size one, and if that still doesn't work, then yes, it needs to be chopped up. Right now there's no official support, but it's on the list, we're aware of it, and we've actually talked to the SAHI author. It's funny, because when I first got into AI I worked on a satellite imaging competition, xView, and there I did have a function to chop images up, but that was my very first work, and I haven't included it officially since; it's really a workaround that ideally wouldn't be necessary.
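For the curious, the usual formulation behind DFL-style box regression (from the Generalized Focal Loss line of work; this is an illustrative sketch, not the exact YOLOv8 code) treats each box side as a discrete distribution over a fixed number of bins and takes its expectation:

    import torch

    # Illustrative DFL-style decoding: each box side is predicted as logits over
    # 16 discrete bins; the regressed offset is the expectation of that distribution.
    reg_max = 16
    logits = torch.randn(4, reg_max)        # one row of logits per box side (l, t, r, b)
    probs = logits.softmax(dim=-1)          # per-side probability over the bins
    bins = torch.arange(reg_max, dtype=torch.float32)
    offsets = (probs * bins).sum(dim=-1)    # weighted combination = expected offset
    print(offsets)                          # four continuous distances, in stride units

The "many estimates merged by a weighted combination" Glenn describes is this expectation over the bin distribution.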
Let me see: can we give a NumPy array image to the pip-installed version of YOLOv8? Yes, you can pass a lot of things to YOLOv8: a NumPy array if you have one, or just a filename, or even an online file, or a YouTube link, a directory of local images, a video directly, or a batch of, say, PIL images or OpenCV-loaded images. There's one very important thing I should say, though. With YOLOv5 the inputs are all in RGB order, so if you load an image with cv2 as a NumPy array, it has to be converted from BGR to RGB. With YOLOv8 we've taken a different approach: you can load an image directly with cv2 and pass that NumPy array without switching the order, but that means that if YOLOv8 receives a NumPy array, it assumes it is in BGR order. That's the only input type it assumes is in BGR order; everything else should be RGB. (A short sketch of the accepted sources appears below.)

The Python interface is really fun, and when we created it we were actually inspired by Elon Musk's first-principles idea. We set aside the real world and asked: in an ideal world, how could we use this from the Python interface most easily? What we have right now is that. We first put it on a whiteboard, what would be the easiest way to use YOLO, we imagined it, and then we went about making it happen. That's why it's really cool and really different.

Next question: I'm just wondering if there will be a model natively trained at 1280. Yeah, these are called P6 models, because they have what's known as a P6 output, at stride 64. So yes, there will be. Oh, by the way, we should talk about the release schedule. YOLOv8 is just like YOLOv5: it's going to improve over time, and it's actually going to improve faster and better than YOLOv5. We have specific dates for minor releases: YOLOv8.1 will be released on April 1st, 2023, YOLOv8.2 three months later, and so on; on a quarterly basis we'll release minor versions that include architecture changes, and P6 models are scheduled for 8.1 on April 1st. In the meantime we're releasing patch versions, like 8.0.18, which we're on right now, and we're releasing those pretty aggressively, just because we're trying to improve and stay on top of things. Hopefully in the future we'll slow that schedule down a little, but for now we're releasing patches almost on a daily basis, which is another thing worth pointing out: if you hit a bug on Monday, maybe it's solved on Tuesday, so just update your package.

Okay, Reuben asks: why did you decide to increase the size of the nano model? Good question, and the answer is: because our competition came out with a nano model that was pretty big. Meituan has a nano model, YOLOv6n, and it's pretty big, about six megabytes, but it gets similar accuracy to ours at three megabytes, so to compete with that six-megabyte model we couldn't keep the n model at two megabytes, even though I wanted to. That was an internal discussion, a big debate we had, but we'd like to make the model smaller if we can; there's always a tug of war between smaller and more accurate.

Let's see here: is it possible to customize the training with respect to different metrics, for example training to maximize recall, or to maximize precision? Okay, when I first started AI I was really confused about mAP and recall and precision, but I've learned a lot since then, and I've learned that these two metrics specifically, recall and precision, are not absolute metrics: they're relative to arguments that you pass. You can have any recall you want from the model simply by modifying the confidence threshold. If you set the confidence threshold to zero, for example, you'll have 100% recall, that easy. Of course, then you're going to say, oh, I got a bunch of false positives, but then it's on you to adjust that confidence threshold.
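A sketch of the accepted prediction sources mentioned above, including the BGR convention for raw NumPy arrays (the source list reflects what was described in the session):

    import cv2
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")

    # A cv2-loaded NumPy array is passed as-is: YOLOv8 assumes BGR for raw arrays,
    # so no BGR-to-RGB conversion is needed (unlike YOLOv5).
    frame = cv2.imread("bus.jpg")
    results = model.predict(source=frame)

    # Other source types mentioned in the session (illustrative):
    #   model.predict(source="bus.jpg")                                  # local file
    #   model.predict(source="https://ultralytics.com/images/bus.jpg")   # URL
    #   model.predict(source="path/to/images/")                          # directory
    #   model.predict(source="video.mp4")                                # video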
So when you ask for improved recall or improved precision, you're not really asking for those things, because you can have them any time you want simply by adjusting the confidence threshold. What you really want to ask is: how can I improve the mAP? And of course that's the game we're all playing here, trying to approach 100 mAP. We'll never get there, of course, but we can get closer, and that's what we spend our time doing; that's what the YOLOv8 R&D has been about, improving mAP without sacrificing speed or size.

Okay: can you comment on the Android app, is that pipeline deprecated? Oh no, we definitely have an Android app. (Am I still sharing my screen? Okay, perfect.) Here, if you go to the repository and you've got an Android phone, just scan this QR code; we've got this app that works for iOS and for Android. Oh, actually, hold on, it doesn't have the YOLOv8 models yet; maybe that's what they're asking. It has YOLOv5 models, and yes, we're working on support for YOLOv8 models, which should be out, I'd say, probably in February also. The Android models are a little more difficult, since we've changed the output: we've permuted the dimensions to make the models much simpler. A comparison I like to do is to open up a YOLOv5 ONNX model and a YOLOv8 ONNX model side by side; I don't have one handy right now, but if you do it you'll see a dramatic difference in the output. The head is much, much simpler, we've gotten rid of maybe 80 or 90 percent of the stuff in the head, and that's great, but it means the output format is different. So we're adapting the Android app to the new format, and it's going to take a little time, especially since we don't have TensorFlow exports working yet; that's one of the prerequisites.

Okay, Future Bim is asking about the data argument. The data argument is your dataset; by default, what we see in the data argument here is coco128. We have a few datasets available automatically in the repository. The place to find them: you go into ultralytics, then yolo, then click on data, and there's a datasets directory with a bunch of official YAMLs. We've got some common datasets: Argoverse, Global Wheat, ImageNet for classification, and actually Objects365 too, which is a really cool detection dataset, much bigger than COCO. You can point data at any of these to start training, and they will all automatically download the dataset locally. coco128 is a really small example dataset drawn from the COCO dataset, and COCO is one of our auto-download datasets: it has a field called download, and when you run with data=coco.yaml, it runs that code, downloads the dataset, installs it, and starts training on it. (A sketch of what these dataset YAMLs look like appears below.)

"Right, Glenn, and for classification you don't need a data YAML, you can just pass the folder?" Yeah, if you have a classification dataset, you just pass the directory that contains the dataset to the data argument.
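Roughly, a detection dataset YAML in the repo looks like the skeleton below (an illustrative file modeled on coco128.yaml; the paths, class names, and download URL are placeholders):

    # my_dataset.yaml -- illustrative skeleton of an Ultralytics dataset YAML
    path: ../datasets/my_dataset    # dataset root
    train: images/train             # train images, relative to path
    val: images/val                 # val images, relative to path

    names:                          # class index -> class name
      0: person
      1: bicycle

    # Optional: auto-download URL or code, run when the dataset is missing
    download: https://example.com/my_dataset.zip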
So we have a user asking about pose estimation. "What do you think about that, Ayush?" "I'd say we plan to do everything: we want to do semantic segmentation, we want to do tracking, we want to do pose estimation. It's just that we're working on a case-by-case basis, sorted by urgency: the first priority is to fix really critical bugs, and the second priority is R&D, and these are all parts of R&D. So we will expand on that, but we can't provide a date right now; we do plan to have it, but no dates as of yet." We always get in trouble when we provide a date, don't we? "Yeah, exactly."

We have another question here, from Aurelian Majette: do we provide pre-trained YOLO models on ImageNet or other datasets? I have a good answer for you: yes, we have pretrained models on COCO and on ImageNet. When you scroll to the models section, by default you see the detection section open, and we have these models trained on COCO. We train them from scratch, there's no pre-training: we start from the YAML argument, so we say model=yolov8n.yaml and train that way. When you click on the classification models, those are trained on ImageNet, again from scratch, so we do have ImageNet-pretrained models, and for segmentation we also train on the COCO dataset. All these models are available; you can click on any one of them, for example the YOLOv8n classification model, and it'll download, right there, trained on ImageNet.

Let's see: is there a tutorial for training YOLO on a Kaggle dataset? This is interesting. We've been working on supporting different dataset providers, like Roboflow, but Kaggle is unique in that the datasets come in all kinds of formats; they're more of a host than a traditional dataset company, and that makes it difficult to support Kaggle in general, because anybody can upload anything to Kaggle. So that's a complicated one; right now there's no direct support for it. "Is there a tutorial for training YOLO?" I just did that one!

When do we have to use the YAML and when the CFG, can you mention the circumstances? Okay, cfg is just an abbreviation for configuration, and the configuration is just the default.yaml that we have in the cfg directory. It contains training arguments, things like the image size or the optimizer you want to use for training. It doesn't need to be passed; it's automatically inherited, but if you want to change any of its values you can. If you just run yolo train, it uses batch size 16; if you say yolo train batch=32, then suddenly you're training with batch size 32. So you never need to modify the YAML or pass any YAML directly, but obviously if you modify that value in the default YAML, then the new default is 32.
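In other words, a sketch of the same default-vs-override behavior from both interfaces:

    # CLI: defaults come from cfg/default.yaml; any key=value overrides them.
    #   yolo train model=yolov8n.pt data=coco128.yaml            # batch defaults to 16
    #   yolo train model=yolov8n.pt data=coco128.yaml batch=32   # one-off override

    # Python: keyword arguments play the same override role.
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")
    model.train(data="coco128.yaml", batch=32)   # overrides the default batch size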
Let's see here. Oh, you actually meant to say .pt, okay, now I get the question. Models are available in two different ways. If I go to the Colab notebook and look at a training command, what we usually recommend is to train from a .pt model. Right here we see model=yolov8n.pt: this pulls down the pretrained YOLOv8n model (downloading it if it's not available locally), and it means you start off from a pretty good spot; your model performs pretty well from the very first step. If you change this to the YAML, which you can do, we construct a model for you from that YAML with the exact same architecture; it's just that the weights are not trained, they're randomly initialized, and the model will take a lot longer to produce good results. So in every circumstance I would advise you to use .pt. We train from scratch as a sort of benchmark, to show how well the model improves in a certain amount of time, but in real-world applications always start from .pt; you'll always get better results, no matter the size of your dataset, the model, or anything else. I've never seen a situation where you're worse off starting from a pretrained model. (The two starting points are sketched below.)

Okay: with an RTX card, how can I train my own model on my computer? RTX should be supported. Local environments can be tricky, so sometimes we just point users to our online environments, but obviously if you've invested in your own hardware you should use it. On a Linux computer it should be pretty easy; on Windows it's sometimes a little trickier to get Nvidia graphics cards set up correctly with CUDA. If you have problems on Windows, using our Docker image can be one way to sidestep them, because it creates a local, isolated Linux environment where everything works. So you should definitely be able to train YOLO with an RTX card locally.

There are so many questions. There was another about a P6 model; did I answer that already? Will there be a P6 model: yes, P6 is scheduled for release with 8.1 on April 1st, as part of our quarterly minor updates. And I saw something about a P2 model. P2 is all the way at the opposite end of the spectrum: it has a P2 output, intended for really, really small objects, and the P2 stride is only four pixels. The YOLOv5 repo has a lot of custom model YAMLs in the hub directory, and P2 is one of them. We don't have a YOLOv8 P2 right now. Cutting off outputs is really easy: if I go into the models directory, into v8, and click on any of these v8 YAMLs, you simply delete the outputs you don't want and the convolutions related to them. But adding a P2 takes a bit of work, so right now we don't offer it, though I think in the future it might be a good idea.

All right, I think we're out of time; we've used up our hour. We've got a bunch of questions left and can't get to everybody, but let us know on GitHub: just raise an issue, ask a question there, and we'll try to get to it.
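The two starting points mentioned above, side by side (sketch):

    from ultralytics import YOLO

    # Recommended: start from pretrained weights -- good results from the first epoch.
    model = YOLO("yolov8n.pt")

    # From-scratch alternative: same architecture, randomly initialized weights;
    # expect to train much longer before results are comparable.
    model_scratch = YOLO("yolov8n.yaml")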
So thanks, everybody, for joining. Ayush, did you have anything else? Okay. All right, thanks everybody. Awesome, bye, take care, bye.
Info
Channel: Ultralytics
Views: 10,985
Keywords: Ultralytics, AI, Artificial Intelligence, Object Detection, YOLOv5, Ultralytics Live Sessions, YOLOv8, Ultralytics YOLOv8
Id: IPcpYO5ITa8
Length: 64min 12sec (3852 seconds)
Published: Tue Jan 24 2023