Top Object Detection Models in 2023 | Model Selection Guide sponsored by Intel

Video Statistics and Information

Captions
Computer vision moves fast. Every year dozens or even hundreds of different object detectors are released, so how do you know which model to use in your project? Well, we'll answer that question today. I read the papers, crunched the numbers, spoke with members of the community, and overall spent probably around 40 hours putting this video together. So here comes the list of top object detectors to use in 2023.

But before I show you the models, let me tell you a little bit about the methodology we used to compare them. First, we grouped the models according to their task. Wait, didn't I just say that we are focusing strictly on object detection models? That's right, but depending on your project you may have different expectations. If you process video streams, you need a model capable of running in real time, often on edge devices without access to enormous cloud GPUs, and you'll probably be happy to sacrifice a little of the model's accuracy in exchange for a sizable performance boost. On the other hand, if you, for example, detect tumors on medical images, prediction quality will be your primary concern. Not to mention zero-shot object detectors: models that combine information from text and images and allow you to detect objects without training on a specific set of classes. Those models are becoming more and more important; however, they are not there yet when it comes to overall accuracy and performance. All in all, we decided that comparing models across those categories would simply be unfair, so we added an additional column to our table to make that distinction.

Next up: mAP, mean Average Precision. I'll be honest, I don't like that metric, or should I say I don't like the idea of boiling down the whole of model benchmarking into a single number. It's not very transparent and it hides important nuances. But evaluating models using mAP on the COCO dataset is pretty much the industry standard, and it's the only consistent metric that we have for all of the models. Remember that mAP on COCO is only a hint of a model's capability, and your score on a custom dataset will probably be completely different. We actually created a separate benchmark called RF100 that aims to measure a model's trainability; unfortunately, we didn't have enough data points to compare all the models on the list.
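For reference, computing that COCO mAP number yourself typically goes through the pycocotools library, which is the standard COCO evaluation tool. A minimal sketch, assuming placeholder file names for the ground-truth annotations and the model's predictions:

    # Minimal COCO mAP evaluation sketch using pycocotools (pip install pycocotools).
    # "instances_val2017.json" and "predictions.json" are placeholder file names.
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    # Ground-truth annotations in COCO format.
    coco_gt = COCO("instances_val2017.json")

    # Detections as a list of {"image_id", "category_id", "bbox", "score"} dicts.
    coco_dt = coco_gt.loadRes("predictions.json")

    evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()  # prints AP@[.50:.95], AP50, AP75, and size-based breakdowns

    # evaluator.stats[0] is the single mAP number (AP@[.50:.95]) quoted on leaderboards.
    print(evaluator.stats[0])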
Speed, well, that's the metric we are not going to use today. Don't get me wrong, I tried. I crunched the numbers, I have the spreadsheet with multiple columns, but it's very difficult to compare models using only information from papers and GitHub READMEs. There is no standardization: authors use different hardware for evaluation, some of them use NVIDIA T4s, some A100 or V100 GPUs, and some of them even an RTX 3080 or 3090. On top of that, we have different runtimes. I've seen models being benchmarked in a default PyTorch environment, but also on ONNX and TensorRT, not to mention quantization and different batch sizes. All of those factors have a dramatic impact on the final result. I think the only way you can truly compare the inference speed of those detectors is to run a clean cloud instance and benchmark each of those models individually. Maybe we'll do it someday; it's just a topic for a completely different video and it would be a lot of work, so let me know in the comments if you would really like to see that.

Paper, that's pretty straightforward: the model was released with a paper or not. Packaging, simple as well. And the license: this factor can be critical, especially if you plan to use the model in a closed-source project. If the model is distributed under the MIT or Apache 2.0 license, you're pretty much good to go. In other cases it's always a good idea to read the license and understand the limitations it carries. If you are not sure whether you can use the model, it's always a good idea to ask the authors; after all, they are probably the most reliable source of information. If you plan to use the model for open source or for research, you should be fine with any model from the list. As you may imagine, taking into account all of those factors and picking the top models is a lot of work, so thanks a lot to Intel for sponsoring this research.

Okay, enough of the theory, let me show you the models. Let's start with YOLOv8, one of the latest installments of the YOLO architecture, released by Ultralytics at the beginning of 2023. The model comes in different sizes, ranging from nano to extra large. Depending on the selected version, it scores between 37.3 and 53.9 mAP on the COCO validation dataset. Importantly, all of those metrics were calculated using 640 input resolution; that will be quite important in a later part of the video. There is no paper, unfortunately, so the model is not included in the Papers with Code benchmark, but its accuracy and speed seem to be on par with the top models in the real-time object detection category.
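Because YOLOv8 ships as a pip package, getting predictions takes only a few lines. A minimal sketch of the Ultralytics API, assuming a pretrained nano checkpoint and a placeholder image path:

    # Rough sketch of using YOLOv8 through the ultralytics pip package
    # (pip install ultralytics). "image.jpg" is a placeholder path.
    from ultralytics import YOLO

    # Load a pretrained checkpoint; sizes range from yolov8n.pt (nano) to yolov8x.pt (extra large).
    model = YOLO("yolov8n.pt")

    # Run inference on a single image; results contain boxes, class ids, and confidences.
    results = model.predict("image.jpg", imgsz=640, conf=0.25)
    for box in results[0].boxes:
        print(box.xyxy, box.cls, box.conf)

    # Fine-tuning on a custom dataset described by a YAML file uses the same object.
    # model.train(data="custom_dataset.yaml", epochs=50, imgsz=640)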
Next up, YOLOv7. The YOLOv7 paper was showcased at CVPR 2023 and is currently holding the number three spot in the real-time object detection category on Papers with Code, with 56.8 mAP. However, it is important to point out that the score was achieved using 1280 input resolution. If you remember, YOLOv8 used strictly 640, so if we only consider versions trained at 640 input, the top score for YOLOv7 drops to 53.1, actually lower than YOLOv8 itself. YOLOv7 doesn't have a pip package or SDK; you can only install it from source.

I mentioned that YOLOv7 is number three in the real-time object detection category. The model that currently holds the state-of-the-art title is YOLOv6 v3, with a slightly higher mAP equal to 57.2. Just like before, that score was achieved using 1280 input resolution, and if we go through a similar process and only keep the versions trained at 640, the mAP score drops to 52.8. Like I said at the beginning, all of those nuances make it super hard to compare results. Either way, just like YOLOv7, YOLOv6 can only be installed from source; no pip package is available. Interestingly, Ultralytics recently added both of those models to their library; unfortunately, I'm not sure which license you should follow in that case.

RTMDet is a model that was originally not on my list. During my video research I asked our community for advice, and RTMDet was one of the models you recommended. Just like the previous models it comes in different sizes, and the largest one scored 52.8 on the COCO validation dataset. At the same time it is also quite fast, reaching over 300 FPS on an RTX 3090 in a TensorRT environment. To top all of that, it is distributed in the MMDetection package under the Apache 2.0 license, which is awesome if you would like to use it in an enterprise project. Overall a very solid choice, and thanks a lot for the suggestion.

Next up, RT-DETR. Finally we have the first transformer on the list, and that's another model proposed by the community. Some say that RT-DETR is the last nail in the coffin of convolutional neural networks; after all, transformer-based models are slowly taking over computer vision, and real-time object detection is the last line of defense. RT-DETR is still not as fast as other architectures on the list, reaching between 74 and 114 FPS on an NVIDIA T4 in a TensorRT environment, but it compensates when it comes to accuracy, scoring a whopping 54.8 mAP on the COCO validation dataset with 640 input resolution. The model was originally distributed in the PaddlePaddle package, but since recently it is also available, in a limited set of sizes, in Ultralytics.

When creating this list, I decided I needed to make room for at least one model deployable with the Transformers package. It offers a lot of options when it comes to transformer-based object detectors, most notably the original DETR; we already have a video about that model, the link is in the top right corner and in the description below. But in the end I decided to go with DETA, with mAP equal to 63.5. DETA is currently number 13 on the Papers with Code object detection leaderboard. Unfortunately, I didn't manage to find any good speed benchmark for this model; all I know is that, depending on the backbone, it can reach between 4 and 13 FPS on a Tesla V100, but with batch size equal to 4. I guess it makes sense somehow, because we are no longer talking about real-time object detectors. By the way, Intel HPUs have great support for Transformers package models. We tested HPUs and GPUs and found great results, especially when it comes to cost: the HPUs were 25% cheaper. This is something worth considering.

And last but not least, my favorite model of 2023: Grounding DINO, a multi-modal model that is trained on both text and images and allows you to detect objects of any class without explicitly training for it. My favorite example is when I asked it to detect the tail of my dog, and it managed to do that. All you need is a text prompt. It is currently state of the art in zero-shot object detection on both the Object Detection in the Wild and COCO datasets. Make sure to take a look at our previous videos, where we experimented with automated dataset annotation using Grounding DINO and SAM.

There is one more important factor that is often overlooked, and that's the community. Why should you care? Because if the community is there, there will probably be documentation, tutorials, and most importantly, a ton of other people who will have the same problems and questions as you. Life used to be simple: you just opened up GitHub, searched for the repository, checked the number of stars, and got an idea about the size and strength of the community. However, since AutoGPT has surpassed Python itself in this category, I'm not so sure trusting only stars is a good idea anymore. That's why we need to look for a better way to estimate the actual community. After some consideration, I decided to build a completely separate table with star count, but also the number of forks, projects using this project, contributors, as well as active PRs and issues over the last 30 days. I hope that should give us enough information to figure out which projects are truly powerful and which are simply overhyped.

Putting together all that information was quite challenging, so I really hope it will help you pick the right model for your project. Let me know in the comments which one you choose. Also make sure to check our notebooks repository; we have a lot of examples showing you how to train and use the models we discussed today. That's all. Like and subscribe and stay tuned for more computer vision content coming to this channel soon. My name is Peter, and I'll see you next time. Bye.
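As a companion to the zero-shot detection discussion above, here is a minimal text-prompted detection sketch using the Grounding DINO integration in the Hugging Face transformers library rather than the original GroundingDINO repository. The checkpoint name, image path, prompt, and thresholds are illustrative assumptions, and the post-processing signature can differ between transformers versions:

    # Hedged sketch: zero-shot, text-prompted detection with Grounding DINO via
    # the transformers library (pip install transformers torch pillow).
    # Checkpoint, image path, prompt, and thresholds are illustrative assumptions.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

    checkpoint = "IDEA-Research/grounding-dino-tiny"
    processor = AutoProcessor.from_pretrained(checkpoint)
    model = AutoModelForZeroShotObjectDetection.from_pretrained(checkpoint)

    image = Image.open("dog.jpg")   # placeholder image path
    prompt = "a dog tail."          # classes as a lowercase, period-separated string

    inputs = processor(images=image, text=prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Convert raw outputs into boxes, scores, and matched phrases.
    results = processor.post_process_grounded_object_detection(
        outputs,
        inputs.input_ids,
        box_threshold=0.35,
        text_threshold=0.25,
        target_sizes=[image.size[::-1]],
    )[0]

    for box, score, label in zip(results["boxes"], results["scores"], results["labels"]):
        print(label, round(score.item(), 3), box.tolist())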
Info
Channel: Roboflow
Views: 18,405
Keywords: Computer Vision, Object Detection, Real-Time Object Detection, Zero-Shot Object Detection, Accuracy, Benchmarking, Leaderboard, Model Selection, Hardware, Model Licensing, GitHub, Transformers, YOLO, YOLOv6, YOLOv7, YOLOv8, DETR, DETA, DINO, GroundingDINO
Id: dL9B9VUHkgQ
Length: 12min 12sec (732 seconds)
Published: Mon Oct 02 2023