Speed Estimation & Vehicle Tracking | Computer Vision | Open Source

Captions
Have you ever wondered how to calculate the speed of moving vehicles using computer vision? In this video I will explore the whole process, from object detection to tracking to speed estimation. Along the way we'll confront the challenge of perspective distortion and learn how to overcome it with OpenCV and a little bit of math. So fasten your seat belts and enjoy the ride; you see what I did there? By the way, the whole code you will see in this tutorial is open source; you will find it on GitHub, and the link is in the description below.

Okay, we start with a pretty much empty Python script; all I have is a way to pass the source video path as an input. Let's go ahead and import supervision as sv; that's a computer vision library that we will use to do all sorts of useful things in this project. We start by creating an instance of a frame generator that we will use to loop over the frames of our input video; for now the for loop will be empty. We go back to our imports: we will need to import get_roboflow_model from inference.models.utils, and just above our frame generator we will load our model. To do it, we call get_roboflow_model and pass the name of our checkpoint, in our case YOLOv8x with 640 input resolution. Now we just run the inference on every frame in our for loop and convert the result into a supervision Detections object. Once we do it, we can use supervision to easily create nice-looking visualizations. Supervision has almost 20 different annotators, and you can customize and combine them for even more unique results; if you want to explore, the link to the supervision docs is in the description below. Now we will use probably the simplest annotator in the supervision package, the bounding box annotator, to just draw the boxes on the frame; we'll select a hardcoded thickness of four for now. Inside the for loop we'll create a copy of our current frame, call it annotated frame, and reassign to it the result of the bounding box annotator's annotate method. The last thing that we need to do is use OpenCV to display the result on the screen; we'll add a small break mechanism. I know it looks simple, but trust me, it will get a lot more complex along the way. By the way, in my script I'm using the Roboflow Inference pip package, but you can very easily swap it for Ultralytics YOLO or YOLO-NAS or any other model; you just need to change a few lines in your code and you should be good to go.

It would be nice to make our annotations just a little bit more interesting, so to do it we add an additional label annotator that will, for now, display the class name of the object that we detected. But before we do this, let's use two smart methods coming from the supervision pip package to figure out the optimal line thickness and text scale for our frame resolution. Now we pass the thickness to our bounding box annotator, and the thickness and text scale into our label annotator, and once we do this we go back into our for loop, and below the bounding box annotate call we add an additional overlay with our labels.
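Here is a minimal sketch of what that detection-and-annotation script might look like. Treat it as an outline under assumptions: the checkpoint string, the thickness and text-scale helper names, and the use of class IDs for the labels are my guesses (exact names differ between supervision and Inference releases).

import argparse

import cv2
import supervision as sv
from inference.models.utils import get_roboflow_model

parser = argparse.ArgumentParser(description="Vehicle speed estimation")
parser.add_argument("--source_video_path", required=True, type=str)
args = parser.parse_args()

# load the detection checkpoint (model id assumed; any detector can be swapped in here)
model = get_roboflow_model("yolov8x-640")

video_info = sv.VideoInfo.from_video_path(args.source_video_path)

# pick a line thickness and text scale that suit the frame resolution
# (helper names vary between supervision releases)
thickness = sv.calculate_optimal_line_thickness(resolution_wh=video_info.resolution_wh)
text_scale = sv.calculate_optimal_text_scale(resolution_wh=video_info.resolution_wh)

bounding_box_annotator = sv.BoundingBoxAnnotator(thickness=thickness)
label_annotator = sv.LabelAnnotator(text_scale=text_scale, text_thickness=thickness)

frame_generator = sv.get_video_frames_generator(args.source_video_path)

for frame in frame_generator:
    # run detection and convert the raw result into a supervision Detections object
    result = model.infer(frame)[0]
    detections = sv.Detections.from_inference(result)

    # labels here are just class ids; the video displays class names instead
    labels = [str(class_id) for class_id in detections.class_id]

    annotated_frame = frame.copy()
    annotated_frame = bounding_box_annotator.annotate(
        scene=annotated_frame, detections=detections)
    annotated_frame = label_annotator.annotate(
        scene=annotated_frame, detections=detections, labels=labels)

    cv2.imshow("frame", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # small break mechanism
        break

cv2.destroyAllWindows()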
Unfortunately, object detection is not enough to perform speed estimation. To calculate the distance traveled by each car we need to be able to track them, and to do that we will use ByteTrack, which comes together with the supervision pip package. If you want to learn more about plugging it into your detection project, head over to the supervision docs page; there you will find end-to-end examples showing how you can do it with different detection models. The link is in the description below. Let's do it. Adding tracking to your existing object detection project is actually very simple: all we need to do is create an instance of ByteTrack, and we will pass a frame rate, because ByteTrack depends on that information in the constructor. Then in our for loop, below our detection, we just call ByteTrack's update with detections, and that's it. Now we can add a little bit more flavor to our annotators: we just loop over our tracker ID values, create labels, and pass them into the label annotator. It seems like our tracking works as expected; however, I'm a little bit concerned about those small detections far away from the camera. Those tend to be less stable, blinking from time to time, and anytime this happens we risk the tracker incrementing the tracker ID. To get rid of some of those unwanted detections we will once again use supervision, to be precise its polygon zone filtering feature. We start by just pasting in the list of vertices of our polygon zone; by the way, don't worry, I will explain how I got those coordinates in just a few minutes. Now let's go just below the frame generator definition and create our polygon zone: we pass the NumPy array with the vertices as well as information about the frame resolution, and between our detection and tracking we add detection filtering based on whether the detection is inside or outside the polygon zone. Let's draw our newly added polygon zone on the frame just for debugging purposes; it will make it a lot easier to confirm that all unwanted detections are indeed removed. Now let's take a look side by side, and we can see that the polygon zone has managed to successfully remove those unwanted detections. By the way, if you would like to learn more about using zones in computer vision, not only for filtering, we quite recently released an awesome video covering this topic, showing how you can leverage zones for advanced traffic analysis; the link is in the top right corner.
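A rough sketch of how the tracker and the polygon zone slot into the loop, reusing the model and annotators from the previous sketch. The zone vertices are the source-region coordinates explained a little later in the video; note the video also passes the frame resolution to PolygonZone, which older supervision releases required and newer ones dropped, so it is omitted here.

import numpy as np
import supervision as sv

# vertices of the polygon zone (the same source region of interest
# derived in the perspective-transform section later in the video)
SOURCE = np.array([[1252, 787], [2298, 803], [5039, 2159], [-550, 2159]])

video_info = sv.VideoInfo.from_video_path(args.source_video_path)

# ByteTrack uses the frame rate to reason about motion between frames
byte_track = sv.ByteTrack(frame_rate=video_info.fps)

# zone used to filter out unstable detections far away from the camera
polygon_zone = sv.PolygonZone(polygon=SOURCE)

for frame in sv.get_video_frames_generator(args.source_video_path):
    result = model.infer(frame)[0]
    detections = sv.Detections.from_inference(result)

    # keep only detections inside the zone, then hand them to the tracker
    detections = detections[polygon_zone.trigger(detections)]
    detections = byte_track.update_with_detections(detections=detections)

    labels = [f"#{tracker_id}" for tracker_id in detections.tracker_id]

    # draw the zone for debugging so we can confirm the filtering works
    annotated_frame = sv.draw_polygon(frame.copy(), polygon=SOURCE, color=sv.Color.RED)
    annotated_frame = bounding_box_annotator.annotate(
        scene=annotated_frame, detections=detections)
    annotated_frame = label_annotator.annotate(
        scene=annotated_frame, detections=detections, labels=labels)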
Finally, we can talk about speed. In principle it is very simple: speed is a measure of how fast an object moves, calculated by dividing the distance the object travels by the time it takes to cover that distance. So how can we estimate the speed of a moving object using computer vision? For a second, let's consider a simplistic approach where the distance is estimated based on the number of pixels the bounding box moves. Here's what happens when you use dots to mark the position of the car every one second: as we can see, even when the car moves at a consistent speed, the pixel distance it covers varies; the further away it is from the camera, the smaller the distance it covers. As a result, it would be very hard for us to use raw image coordinates to calculate the speed. We need a way to transform the coordinates in the image into actual coordinates on the road, removing the perspective-related distortion along the way. Fortunately, we can do that fairly easily using OpenCV, so let's go to the drawing board and let me show you how to perform a perspective transformation with OpenCV.

Here's a single frame from our video. On the left and on the right side of the road there are vertical markings, and on the image the distance between those markings varies; they get closer and closer together the further away they are from the camera, yet in reality there is always 50 meters between them. So how do we convert our perspective to make those distances equal? We first create a trapezoidal shape and call it our source region of interest, and what we would like to do is convert it into our target region of interest. The target region of interest looks more like a road seen from a bird's-eye view, so it's pretty much a rectangle. I did my research: the road is 25 m wide and around 250 m long. This is because we have five sections between those markings, and each section is, like I said, 50 m, so the target region is 25 m by 250 m.

Now, to convert between those regions we will need to convert between coordinate systems. The first coordinate system originates in the top left corner of the image; here is our zero. The y axis runs from the top to the bottom of the image, the x axis runs from left to right, and the resolution of the image is 3840 by 2160. Our source region of interest is defined, as I said, as a trapezoidal shape with A, B, C, D vertices, and we would like to convert it into a rectangle with A', B', C', and D' vertices. I spent a little bit of time and figured out the coordinates of points A, B, C, and D: A is 1252 by 787 and B is 2298 by 803. Now, it gets a little more interesting with points C and D, because they are outside of the image, so I assumed that they lie on the bottom edge of the image. That's why I get the y coordinate 2159; I get it by subtracting one from 2160, because we start to count from zero, so we need to compensate for that, and that's why we subtract one. The final coordinate of point C is 5039 by 2159. In the case of point D it gets even crazier, because the y coordinate stays the same but we go into the negative side on the x axis, and we actually get -550. The target region is actually way simpler: all we need to do is use the dimensions of our rectangle. The coordinate system starts in point A', and we just use the dimensions to figure out the coordinates of points B', C', and D'; we only need to remember to subtract one, because once again we start to count from zero, not from one.

Now, our transformation is really interesting, because the whole red section, our source region, will get transformed into this fairly small section of the target region, and what is even cooler is that the small blue section of the source region of interest will in the end get transformed into an equally sized section of the target region. So we see that stuff that is far away will get bigger and stuff that is close will get smaller. How do we get that transformation? OpenCV allows us to do it. All we need to do is prepare our region of interest data: we put our source region of interest vertices into a matrix, the matrix needs to be two-dimensional, and every row in that matrix is the coordinate of one point from our source region of interest. We do the same for our target region, this time with A', B', C', and D', so the first row in our case will be 0, 0, etc. That allows us to calculate the M matrix, which is the result of calling the getPerspectiveTransform method from OpenCV; all we need to do is pass our source and target NumPy arrays and we will get our M matrix. Then we can use that M matrix to do something magical. Our object detector provides us with results in the form of bounding boxes, and we can use one of the points of the bounding box to represent it; I usually use the bottom center of the bounding box. Here I'm calling those points d1, d2, d3, and d4, and we can use matrix M to convert them from the source region of interest into the target region of interest; you can see those small dots on the edge of the red section, those are the transformed points. All we need to do is define the input data, once again in the form of a matrix where the coordinates of each point are a row, and then we can calculate our transformed points, which are pretty much the points in our target region of interest, by calling perspectiveTransform and passing those points together with our M matrix. And that's it.
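In code, the whiteboard math above comes down to two OpenCV calls. Here is a sketch using the vertex values read off in the video; the example points at the end are purely illustrative.

import cv2
import numpy as np

# source region of interest: trapezoid A, B, C, D read off the 3840 x 2160 frame
SOURCE = np.array([
    [1252, 787],    # A
    [2298, 803],    # B
    [5039, 2159],   # C (beyond the right image edge)
    [-550, 2159],   # D (beyond the left image edge)
], dtype=np.float32)

# target region of interest: the 25 m x 250 m road rectangle,
# minus one on each dimension because we count from zero
TARGET = np.array([
    [0, 0],         # A'
    [24, 0],        # B'
    [24, 249],      # C'
    [0, 249],       # D'
], dtype=np.float32)

# homography M mapping image coordinates onto road coordinates
m = cv2.getPerspectiveTransform(SOURCE, TARGET)

# transform a batch of bottom-center points d1..d4 (values here are illustrative)
points = np.array([[1400, 900], [2000, 1200], [2600, 1600], [3200, 2000]], dtype=np.float32)
transformed = cv2.perspectiveTransform(points.reshape(-1, 1, 2), m).reshape(-1, 2)
print(transformed)  # each row is now expressed in metres within the target region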
need to do is Define um a input data uh in the form of once again to the uh metrix we put uh coordinates of each point as the row in that Matrix and we can calculate our points Prim which is pretty much points in our Target region of Interest by calling a perspective transform pass those points and additionally our M Matrix and that's it now let's try to Cod it I just start by pasting the information about with and height as well as the vertices of the target region of Interest now let's Implement a small utility called view Transformer that will pretty much execute the logic that we discussed a few minutes ago so in Constructor it will take source and target number arrays those will be the vertices of our source and Target region of Interest we just need to make sure that those num arrays are in float 32 uh format because get perspective transform function expects those number array to be in this D type things get slightly more complicated in transform points method and that's because perspective transform from open CV expects points to be defined in 3D space not on the 2D plane so we need to add this additional dimension of those kind of like empty dummy information we need to do that so that our data will go through perspective transform and after that's done we just remove this extra Dimension now that we are ready we can create an instance of our view Transformer we can do it right over our for Loop and now inside the for Loop below our tracker we first convert our bounding boxes into a list of points we can do it using supervision we just need to specify the point that we are looking for as I said I'm going for bottom center and then we transform that list using our view Transformer to go from our source to our Target region of Interest last but not least we can use label annotator to display those coordinates on the output video we can see that as Vehicles distance themselves from the top left corner of our region of interest there is a corresponding increase in the values of X and Y that's exactly what we wanted as the bonus we can calculate the relative distance between the cars here we witness a hazardous situation when one car is dangerously close to another now let's go back to drawing board and learn how we can use those coordinates to finally calculate our speed we already have our detector and tracker set up that means that every frame we get a set of bounding boxes along with tracker IDs assigned to them so here it is number one two three and four and that allows us to track those objects how they move in time so here is the position of the object now but here is the position of the object a second ago and two seconds ago and each of those positions get a separate set of coordinates now it's XY about a second ago it was X1 y1 and two seconds ago it was X2 Y2 now of course those points will get transformed and they will end up on our Target region of interest but when that happens it will turn out that uh those points will will be pretty much on a straight line because that's a straight road and that means that the x coordinate uh and XY and X2 coordinates are pretty much the same all that changes is the y coordinate so to calculate the distance traveled by the car in the last second all we need to do is subtract y from y1 and take it all in the absolute and that's our distance the time is like I said 1 second so in the end our speed is distance divided by the time um and that will be defined in me per second so if we would like to get the speed in kilomet per hour we would 
Like you saw, to calculate the speed we need to be able to look into the past; we need to be able to say where the car was, let's say, a second ago. To do it, we will create a Python dictionary that stores the coordinates of each car over time, so every frame we just add the current coordinate to that dictionary. Inside the dictionary we will use a deque, and that deque will have a set length, in our case 25, because that's our frame rate. That means we will always store the coordinates of the car over the last second, sampled 25 times per second. Here we just loop over our tracker IDs and points, and, like I said, we add the y coordinates to our coordinates dictionary. One thing that I forgot to mention is bounding box flickering: this is a tiny movement of bounding boxes up and down and left and right that can be pretty much described as noise, but when our coordinate measurements are taken very close together in time, that noise can lead to gigantic inaccuracies in our speed estimation. Optimally, what we would like to have is two coordinate measurements that are half a second, maybe a second, apart; then the distance traveled by the car is proportionally much larger than those small inaccuracies and our speed estimation should be just fine. We get rid of our initial label formatting, define an empty labels list above our for loop, and check whether the deque related to that specific tracker ID is at least half full. If that's not the case, then as a fallback we'll just display the tracker ID; however, if we have enough information, we'll calculate the speed. We'll use the first and the last element in the coordinates deque to calculate the distance and the time, because if our deque is not full it can cover less than a second. Finally, we just divide the distance by the time, multiply by 3.6, and that's our speed, which we can now display as a label using the label annotator. Well, it took us a long time to get here, but we finally see the speed next to our moving cars. We can also see that there is a short period where there's only a tracker ID; this is the time when we don't yet have enough information to calculate the speed.

Now it's time for some final touches. One thing that I always like to have when I track objects is the trace annotator; it displays the route that the object traveled over the past few seconds, in our case two seconds, and we'll pin that route to the bottom center of the bounding box of each object. Now let's do some formatting so that the lines won't be so long, and we can change the position of the label; the default one is top left, we will do bottom center, so that the label sits just below the bounding box. Now we can go into the annotation section, remove the line that was drawing our zone, and replace it with trace annotation. The last small thing we can do is change the color mapping: up until now the colors of the bounding boxes were related to the class, and now we will change the mapping to the tracker ID. It means that every object on the frame will have a different color, and that will just create a more interesting visualization. Let's take a look at the final results. [Music]
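Putting the history buffer, the speed formula, and the final annotator touches together, a sketch might look like the following. The speed_labels helper is a hypothetical wrapper (in the video this logic lives inline in the for loop), and it assumes the video_info, thickness, and text_scale values from the earlier sketches.

from collections import defaultdict, deque

import numpy as np
import supervision as sv

fps = video_info.fps  # video_info, thickness, text_scale come from the earlier sketches

# one fixed-length buffer of road-space y coordinates per tracker ID;
# maxlen equals the frame rate, i.e. roughly one second of history
coordinates = defaultdict(lambda: deque(maxlen=fps))

# final touches: two-second traces pinned to the bottom center of each box,
# labels drawn below the box, and colors keyed to tracker ID instead of class
trace_annotator = sv.TraceAnnotator(
    thickness=thickness,
    trace_length=fps * 2,
    position=sv.Position.BOTTOM_CENTER,
)
label_annotator = sv.LabelAnnotator(
    text_scale=text_scale,
    text_thickness=thickness,
    text_position=sv.Position.BOTTOM_CENTER,
    color_lookup=sv.ColorLookup.TRACK,
)


def speed_labels(detections: sv.Detections, points: np.ndarray) -> list:
    # hypothetical helper: update the per-tracker history, then build labels
    for tracker_id, [_, y] in zip(detections.tracker_id, points):
        coordinates[tracker_id].append(y)

    labels = []
    for tracker_id in detections.tracker_id:
        if len(coordinates[tracker_id]) < fps / 2:
            # less than half a second of history: fall back to the tracker ID
            labels.append(f"#{tracker_id}")
        else:
            # metres between the oldest and newest measurement, over however
            # much time the buffer actually covers (it may not be full yet)
            coordinate_start = coordinates[tracker_id][-1]
            coordinate_end = coordinates[tracker_id][0]
            distance = abs(coordinate_start - coordinate_end)
            time = len(coordinates[tracker_id]) / fps
            speed = distance / time * 3.6  # m/s -> km/h
            labels.append(f"#{tracker_id} {int(speed)} km/h")
    return labels

Dividing by the actual buffer length rather than a fixed one second keeps the estimate sensible while the deque is still filling up.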
And that's it. I'm super happy that we finally had an opportunity to record a video on speed estimation; it had been on my personal to-do list for probably a year, so having it as the topic of the first video of 2024 is just awesome. Having said that, it took me a lot of time to produce this video; making all the demos, visualizations, and whiteboard explanations altogether took several dozen hours, so I really, really hope that you will like it, and if you did, make sure to like and comment to help the algorithm find it. Of course, there is so much more you can do with speed estimation, like, for example, this small app that color-codes cars based on their speed. I hope this video will inspire you to build something even cooler. As usual, make sure to like and subscribe, and stay tuned for more computer vision content coming to this channel soon. My name is Peter and I'll see you next time. Bye! [Music]
Info
Channel: Roboflow
Views: 35,144
Keywords: yolo, tracking, multiple object tracking, multitarget tracking, traffic control, vehicle speed estimation, roboflow inference, yolo nas, yolov8, bytetrack, deepsort, yolo tracking, yolo v8, object tracking, sort, yolo object tracking, yolov8 object tracking, yolo v8 object tracking, python real time object tracking, video object tracking, deep sort, vehicle tracking, conveyor tracking, computer vision tutorial, computer vision
Id: uWP6UjDeZvY
Length: 24min 32sec (1472 seconds)
Published: Wed Jan 10 2024