Python: Real-time Multiple Object Tracking (MOT) with Yolov3, Tensorflow and Deep SORT [FULL COURSE]

Captions
In this video we'll discover how to perform multiple object detection and tracking with YOLO, TensorFlow and Deep SORT. First of all I will cover a bit of how to set up a local GPU environment, because we are going to use a GPU for the object detection as well as the tracking; however, if you already have this set up nicely, please feel free to skip this part. I will also show you how to convert a YOLO model to a TensorFlow format, because later on we will need this TensorFlow-format YOLO model for object tracking. Then I will briefly introduce Deep SORT for object tracking, and finally demonstrate how we can detect, recognize and track objects. After watching this video I hope you will be able to develop your own multiple object detection and tracking models for your own problems.

So let's see how we can set up a GPU environment. If you want to use the GPU version of TensorFlow you must have a CUDA-enabled GPU, so first of all you need to check the list on NVIDIA's website to see if your GPU is on it. If it is, it means that your computer has a modern GPU that can take advantage of CUDA-accelerated applications. Once you identify your GPU, next you need to install the latest GPU driver, which can also be downloaded from the NVIDIA website; you can manually find the driver for your GPU. For example, I'm using a GeForce 1050 Ti and Windows 10, so I just select them appropriately, then search for the driver, download and install it. Once you install the driver you also need to install the CUDA Toolkit. Here I select version 10 and the Windows operating system, because this fits the TensorFlow version I'm going to use, TensorFlow 2.0.0, so you just need to download the toolkit from NVIDIA's website and install it.

In the next step you need to download the cuDNN files, so you probably need to log in to your NVIDIA account; first-time users need to set up an NVIDIA account. On the website there are many different cuDNN versions; in my case I download cuDNN v7.6.4 for CUDA 10, which I set up in the previous slide. Once you download the zip file you should find three folders inside it. Now you need to go to the CUDA install directory: open Program Files, open NVIDIA GPU Computing Toolkit, open CUDA and open the version 10 folder.
You should see there are many folders inside that directory, but three of them should have exactly the same names as in the zip, that is bin, include and lib. What you need to do is open each of the folders in both the zip file and the toolkit, and copy the files into the corresponding folders one by one. Now you are all set with CUDA.

In this final setup step you just need to set up the Python packages and make sure the packages are compatible with each other in order to avoid any warnings or fatal errors. In my case I use tensorflow-gpu 2.0.0, with which I find it is better to have NumPy version 1.16.4 to avoid some warnings; although a newer version is still okay to use, I downgrade my NumPy just to avoid the warnings. For the programming exercise you might also need some other Python packages, so I list out all the packages and their versions in case you want to take a look: this is the first slide of my package setup, this is the second slide, and this is the last slide. Finally, you can simply type in two lines of code to check whether TensorFlow can access your GPU, just to make sure that your GPU environment is set up correctly. If everything is fine you should see a True result. Great, we are all set now.

Now let's see how we can convert a YOLO model to a TensorFlow format, because in the programming exercise we will need this TensorFlow format for object detection as well as tracking. There are many GitHub repositories where you can find a YOLO-to-TensorFlow converter; here are a few suggestions that I find working very well, and my GitHub structure follows The AI Guys' code, so you can also visit my GitHub to download the convert Python file. It's very simple: just visit the website, click on Code, download the zip file, extract it and place it on your local computer appropriately. Once you download the convert Python program and set up your folder structure according to my GitHub, you will need to download the YOLO weight file and save it into the weights folder. For example, in our programming exercise I'm going to use YOLOv3, so I just download it via the link provided here and save the weight file into the weights folder.

Now we are ready to run the Python program. The program provides a command-line interface, so you just need to open a terminal that can run a Python script; in my case I use PyCharm, so I click on View, then Tool Windows, and open a Terminal for running the convert program. First of all we use the cd command to change into the directory where the convert program and folders are stored, and then we simply type python convert.py to convert the YOLOv3 weights into a TensorFlow format. If we are using YOLOv3-tiny we need to specify three things: firstly the path to the weight file, secondly the path to the output file, and thirdly the tiny flag. After we execute the convert program we should see a few messages confirming whether the conversion is successful or not, for example "weights loaded" and "weights saved". But this is not the best way to check if everything is done correctly, so we should go to the weights folder for checking, and there should be four newly created files, which include three TensorFlow files and one checkpoint file. If that is the case, congratulations, you've successfully converted your weights into TensorFlow format and are ready to apply the network for detection and tracking.
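For quick reference, here is a minimal sketch of the GPU check and the conversion commands described above. The exact paths and the --weights/--output/--tiny flags follow the yolov3-tf2 style converter this walkthrough is based on, so treat them as assumptions and adapt them to your own folder structure.

```python
# Check that TensorFlow 2.0 can see the GPU (should print True and list a GPU device).
import tensorflow as tf

print(tf.test.is_gpu_available())                                   # TF 2.0-style check
print(tf.config.experimental.list_physical_devices('GPU'))

# Converting the Darknet weights to TensorFlow format is done from the terminal,
# roughly like this (paths and flags assumed from the yolov3-tf2 style converter):
#   python convert.py --weights ./weights/yolov3.weights --output ./weights/yolov3.tf
#   python convert.py --weights ./weights/yolov3-tiny.weights --output ./weights/yolov3-tiny.tf --tiny
```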
In multiple object tracking problems, in order for us to perform object tracking there are two generic steps: detection and association. First, all the objects are detected in the frame; these can be single or multiple detections performed by different types of algorithms or by different types of convolutional neural networks. On the other hand, once we have the detections for the frames, a matching procedure is performed for similar detections with respect to the previous frame, and this helps us confirm whether an object in the current frame is associated with an object in the previous frame.

Object detection and recognition help us in the first step. Object detection detects the objects in an image; in computer vision this process very often provides you a bounding box which contains the object's x and y coordinates and the object's width and height. However, it will not give information regarding what the object is. Therefore in computer vision we also have another subject called object recognition, and this process tells us what kind of objects the detected objects are; very often the recognition output will be a class label. For the detection and recognition, YOLO version 3 is used in our exercise. YOLO is a detector that applies a single neural network to the full image: the network predicts bounding boxes and probabilities for different regions of the image, and the bounding boxes are weighted by the predicted probabilities. It predicts detections across three different scales, and for each bounding box the classes it might contain are predicted using multi-label classification.

For the second generic step, association, which I just mentioned, object tracking is needed. Object tracking is a field within computer vision that involves tracking objects as they move across several video frames. The general aim of object tracking is to associate detections across frames by localizing and identifying all objects of interest. An ideal tracker should provide a constant ID for each of the objects within the scene by keeping track of objects even when the detections are missing or false positive. The multiple object tracking problem is challenging since objects can be occluded or temporarily leave the field of view, and the appearance of an object can change within the scene because of scale, rotation and illumination variations. In general, object tracking is the task of taking an initial set of object detections, creating a unique ID for each of the initial detections, and then tracking each of the objects as they move across frames in the video while maintaining the ID assignment.

One can simply ask: why can't we just run object detection on each frame of the whole video and thereby track the objects? There are a few problems with that. If the image has multiple objects, we have no way of connecting the objects in the current frame to the previous frame. If the object you were tracking goes out of the camera's field of view for a few frames and another one appears, we have no way of knowing if it is the same object. Essentially, during detection we work with one image at a time, and we have no idea about the motion and the past movement of the object, so we cannot track objects in a video by simply detecting objects in each frame. One of the well-known online tracking frameworks is Simple Online and Realtime Tracking (SORT), which attempts to
overcome the challenges that I just mentioned. It uses Kalman filtering and the Hungarian method to handle motion prediction and data association respectively. In short, the SORT algorithm keeps track of each object by estimating an object model for every frame; the object model contains the current spatial information about the object's position, scale and bounding box ratio. The object model also contains a motion prediction for the next frame, which is estimated using the Kalman filter. SORT solves the data association by calculating the bounding box similarity between objects and detections, and this is done by calculating the bounding box IoU distance. After calculating the IoU distance, the final assignment problem is solved using the Hungarian method. SORT is fast and simple and has high precision and accuracy, but the problem with SORT is frequent ID switches, and it cannot handle occlusions very well because it uses only a simple motion model. This motivated an improvement of the SORT algorithm and the development of Deep SORT, whose full name is SORT with a deep association metric. Deep SORT is an extension of the SORT algorithm described in the previous slide: it adds appearance features of an object to strengthen the associations. In particular, it utilizes descriptors for the visual appearance of the detected objects, which are used when matching detected objects from one frame to another to keep track of the identities throughout a video sequence. Therefore Deep SORT keeps track not only of motion but also of appearance, and because of this improvement Deep SORT can very often reduce the occlusion problem and achieve a higher accuracy.

The implementation of multiple object tracking can be divided into three generic steps: object detection and recognition, motion prediction and feature generation, and finally the tracking. In the first step, object detection and recognition techniques based on CNNs are used to calculate the detections; Faster R-CNN or YOLO are commonly used in this initial detection step on every frame, and in the programming exercise I use YOLO version 3 for illustration purposes. In the second step we need to predict the motion and generate appearance features. An estimation model is created before the data association; this utilizes the state of each track, such as the bounding box center, box height, box width and so on, and then the Kalman filter is used to model these states as a dynamic system and make motion predictions. In addition, in this step feature generation and bounding box descriptors are also computed using a pre-trained CNN. All of this information is then passed into the final step for tracking. In the final step, given the predicted states from the Kalman filter using the previous information and the newly detected boxes in the current frame, an association is made between the new detections and all object tracks from the previous frame. For each pair, the cosine feature distance, the IoU distance and the Kalman state distance are calculated for matching and updates. These three steps continue throughout the video frames and help us identify and track objects.

Deep SORT is a very useful tool. Apart from tracking, Deep SORT has also been applied for counting the number of people, counting the number of vehicles, creating density maps and heat maps, and many other exercises.
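Since SORT's data association is built on the bounding box IoU distance described above, here is a minimal, self-contained sketch of that computation. The (x1, y1, x2, y2) box convention is my assumption for illustration, not code from the video.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# SORT-style association cost: 1 - IoU (smaller means a better match).
print(1.0 - iou((0, 0, 10, 10), (5, 5, 15, 15)))
```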
For example, for counting we very often define a gate or a line, and if an object passes through it we make it count; and for a density map we very often create a zone or a grid in such a way that we can count how many objects are going in and out of a specific area, and also how many objects are within a specific area at the moment.

So let's go to the programming exercise to see how we can do that. First things first, we need to import the libraries for the object tracking. From absl we import flags, then import sys, then set FLAGS = flags.FLAGS and call FLAGS(sys.argv). Because the YOLO library here uses a command-line interface, we have to initialize the flag settings for using YOLOv3 in TensorFlow later on; we need these four lines just for that. Then we import time, numpy, cv2 and matplotlib.pyplot as plt. We import time because we are going to calculate the frames per second; we need numpy; we need cv2, that is OpenCV, which helps us visualize the tracking; and we need matplotlib just for the color map. Then we import TensorFlow of course.

Next, from the folder called yolov3_tf2 (you can check the folder structure; there is a folder named yolov3_tf2, which is YOLO version 3 in TensorFlow 2 format), we import YoloV3 from the models module, because we are going to use YOLO version 3. From the dataset module in the same folder we import transform_images, which is used for resizing our image into the YOLO format. We also import convert_boxes from the utils of yolov3_tf2; this helps us convert the boxes back into the Deep SORT format. Then we import from deep_sort: we import preprocessing, which is used for the non-maximum suppression, and nn_matching, which is for setting up the deep association metric. I forgot to mention that Deep SORT is also under a folder named deep_sort; we just import the program files inside this deep_sort module. Then we import Detection from the deep_sort detection module, which helps us represent a detected object, and we import Tracker from deep_sort.tracker for writing the track information. Finally, from the tools folder we import generate_detections as gdet; this gives us the feature generation encoder. So now we are all set.

First of all, let's initialize YOLO and load the YOLO class names as well as the weights. Let's load the class names into a list first, using c.strip() for c in open(...); I open the coco file that contains the 80 class names, the coco.names file, which I put under the data/labels folder. Then let's create the YOLO model: I use YoloV3, and the only thing I need to input is the number of classes, which equals the length of the class names list. Then I'm going to load the weights into this model.
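Put together, the import block and model setup described above look roughly like the sketch below. The module paths (yolov3_tf2, deep_sort, tools) and the data/labels/coco.names location follow the repository layout the walkthrough refers to, so treat them as assumptions if your folder structure differs.

```python
# Flag plumbing required by yolov3_tf2's command-line style configuration.
from absl import flags
import sys
FLAGS = flags.FLAGS
FLAGS(sys.argv)

import time
import numpy as np
import cv2
import matplotlib.pyplot as plt
import tensorflow as tf

# YOLOv3 in TensorFlow 2 format (assumed yolov3_tf2 package layout).
from yolov3_tf2.models import YoloV3
from yolov3_tf2.dataset import transform_images
from yolov3_tf2.utils import convert_boxes

# Deep SORT components (assumed deep_sort package layout).
from deep_sort import preprocessing
from deep_sort import nn_matching
from deep_sort.detection import Detection
from deep_sort.tracker import Tracker
from tools import generate_detections as gdet

# Load the 80 COCO class names and build the YOLOv3 model.
class_names = [c.strip() for c in open('./data/labels/coco.names').readlines()]
yolo = YoloV3(classes=len(class_names))
```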
I put the weights, in the yolov3.tf TensorFlow format, under the weights folder, and that's it for this part.

Once we have defined the YOLO model and loaded the weights into it, we can initialize Deep SORT, including its parameters and the encoder function. First of all, let's define the maximum cosine distance; I use 0.5. This is the threshold for deciding whether two objects are the same or not: if the cosine distance is below 0.5, the features of the object in the previous frame and the object in the current frame are considered very similar. Then we assign the nn_budget. The nn_budget is used to form a gallery: for each detection we use a deep network to extract a feature vector, and the nn_budget limits the library in which those feature vectors are stored. By default it is 100, but here I set it to None, which means the nn_budget is not enforced. Then we assign nms_max_overlap equal to 0.8. This is to avoid keeping too many detections for the same object; by default this value is 1, which means we keep all the detections, but that might not be a good idea because there could be too many similar detections of the same object, which we don't want.

Then we can initialize Deep SORT: we start up the application and generate the encoder function. The model file that we are going to use is model_data/mars-small128.pb; this is a pre-trained convolutional neural network for tracking pedestrians, and I use this model simply for illustration purposes. Then we create the encoder, which is actually the feature generator: we call gdet.create_box_encoder, pass in the model file, and set the batch size to 1. Now the encoder is ready. Next we create the association metric: here we use nn_matching.NearestNeighborDistanceMetric with the cosine distance function, passing in the maximum cosine distance threshold and the nn_budget. Now that the association metric is ready, we can pass it on to the Deep SORT Tracker.

Now we can capture our video and assign a file name for saving the output. Here I assign a video by using cv2.VideoCapture with the path to an mp4 file under the data/video folder; you can change the mp4 to whatever video you want, and if you replace this path with 0 it will use your computer's camera. Then, because we output an AVI file, we assign the codec with cv2.VideoWriter_fourcc(*'XVID'), which means I'm going to use an AVI file format. We also need the video frames per second as well as the video width and height: we get the FPS from the original video with cap.get.
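A minimal sketch of the Deep SORT initialization and video I/O described above. The model_data/mars-small128.pb path and the video file names are assumptions (placeholders), the VideoWriter line anticipates the output setup explained next, and the four-argument NearestNeighborDistanceMetric/Tracker calls follow the standard deep_sort package.

```python
# Load the converted TensorFlow-format YOLO weights (path assumed).
yolo.load_weights('./weights/yolov3.tf')

# Deep SORT parameters described above.
max_cosine_distance = 0.5   # threshold on the cosine feature distance
nn_budget = None            # None = keep every feature vector in the gallery
nms_max_overlap = 0.8       # discard extra near-duplicate detections

# Feature encoder (re-identification CNN) and the tracker itself.
model_filename = 'model_data/mars-small128.pb'            # assumed path
encoder = gdet.create_box_encoder(model_filename, batch_size=1)
metric = nn_matching.NearestNeighborDistanceMetric('cosine', max_cosine_distance, nn_budget)
tracker = Tracker(metric)

# Video input and output (paths are placeholders; use 0 instead of a path for a webcam).
vid = cv2.VideoCapture('./data/video/test.mp4')
codec = cv2.VideoWriter_fourcc(*'XVID')
vid_fps = int(vid.get(cv2.CAP_PROP_FPS))
vid_width = int(vid.get(cv2.CAP_PROP_FRAME_WIDTH))
vid_height = int(vid.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter('./data/video/results.avi', codec, vid_fps, (vid_width, vid_height))
```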
Because by default VideoCapture returns a float number instead of an integer, I use the int function to get an integer from it. For the width and height we do something similar: we use cap.get to get the frame width, and the same kind of call for the height. Finally we are ready to save the output: we create out as our video writer, and we need to provide a path name to it. I save everything back into the video folder as an .avi file, and we also pass the format, the frames per second, and the width and height to this VideoWriter function. That's it for saving the video.

Then we move on and create a while loop. This while loop is used to capture all the frames from the video; I read the frames one by one inside it. I also add an if statement so that at the end of the video, when there is no image, we print "Completed" and break out of the while loop.

Next we need to transform the image captured from each video frame a little, in order to feed it into the YOLO prediction model; I assign it to img_in as the input for the YOLO prediction. First we need to convert the color, because the color code in OpenCV is BGR whereas YOLO/TensorFlow uses RGB, so we convert from BGR to RGB. Then we expand the dimensions using the tf.expand_dims function on the image with axis 0, because the image originally has a 3D array shape containing the height, width and channels; we need to add one more dimension to it, a batch of one, meaning the batch size. After using this function we have a 4D array as the input to the YOLO predict, containing the batch size, height, width and depth; we are just adding one more dimension to the original image. Finally we need to resize the image for YOLO version 3; the default input size for YOLOv3 is 416.
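A minimal sketch of that frame loop and preprocessing, with assumed variable names:

```python
while True:
    _, img = vid.read()
    if img is None:
        print('Completed')
        break

    # OpenCV delivers BGR; YOLO expects RGB, batched, and resized to 416x416.
    img_in = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img_in = tf.expand_dims(img_in, 0)        # (H, W, 3) -> (1, H, W, 3)
    img_in = transform_images(img_in, 416)    # resize/normalize for YOLOv3
```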
So we resize it to 416 here with transform_images(img_in, 416), and the image is now ready for the YOLO prediction.

Here I start the timer, and then we pass the image to the YOLO prediction function. The YOLO prediction returns NumPy arrays that include the boxes, scores, classes and nums. By default a maximum of 100 bounding boxes per image can be detected. The boxes come back as a 3D shape, (1, 100, 4), because we can have at most 100 bounding boxes per image, and the four parameters are the x, y center coordinates, width and height. The scores come back as a 2D shape, (1, 100); these are the detected objects' confidence scores, and the remainder is padded with zeros. It also returns the classes, again a 2D shape of (1, 100); these are the detected objects' class indices, and the rest is again padded with zeros. Just a friendly reminder: the padding value is zero, but zero also has another meaning, it is the class index of 'person'. Finally, nums is a 1D shape with a single element, which is the total number of detected objects.

Because these four formats are not aligned with the non-maximum suppression function we will use later, nor with the Deep SORT functions, we have to modify them a little in order to perform the Deep SORT tracking. First of all, let's get the class values: I take the first row of the NumPy array, and then we extract the class names according to these class values. In a for loop over the classes, we append to a names list the class name corresponding to the integer value of each class, and then we convert everything into a NumPy array, which is the expected format for the non-maximum suppression and also for the Deep SORT algorithm. Then we convert the boxes: converted_boxes uses the function convert_boxes with the image and boxes[0], meaning we just need the first row of the boxes. This function helps us remove the zero padding, scale the boxes back according to the original size of the image, and convert the boxes into a list. Then, based on these converted boxes, we use the encoder to generate a feature vector for each of the detected objects. Finally we pass the converted boxes, scores, names and features into the Detection function, which is actually a class with four attributes. We put the detections into a list, and inside this list I call the Detection function with bbox, score, class_name and feature, which come from a for loop over a zip of the converted boxes, scores[0], the names and the features. So we have four attributes inside each detection, which include the ndarray for the box information in tlwh format, that is the top-left x, top-left y, width and height.
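Still inside the while loop, the conversion from YOLO output to Deep SORT Detection objects sketched above looks roughly like this. The four-argument Detection constructor (with a class name) and convert_boxes follow the Deep SORT fork the walkthrough is based on; in the original deep_sort package Detection takes only three arguments, so treat this as an assumption.

```python
# (inside the while loop, continuing from the previous sketch)
t1 = time.time()                                    # start timing for the FPS display
boxes, scores, classes, nums = yolo.predict(img_in)

# Map class indices to readable names (first batch element only).
classes = classes[0]
names = []
for i in range(len(classes)):
    names.append(class_names[int(classes[i])])
names = np.array(names)

# Strip the zero padding and rescale boxes to pixel coordinates (tlwh).
converted_boxes = convert_boxes(img, boxes[0])

# Appearance feature vector for each detection, then one Detection object per box.
features = encoder(img, converted_boxes)
detections = [Detection(bbox, score, class_name, feature)
              for bbox, score, class_name, feature
              in zip(converted_boxes, scores[0], names, features)]
```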
Then we have the confidence, which is actually the score, as an ndarray as well; we have the class name as an ndarray; and of course the feature vector that describes the object contained in the image. All this information will be used to perform non-maximum suppression on the detections in the frame, to eliminate multiple boxes on one target.

As I just mentioned, Detection is a class object, so in order to run non-maximum suppression we need to extract the information from the class objects back into NumPy arrays. For non-maximum suppression we need the boxes, the scores and the classes, so we build a NumPy array of d.tlwh for d in detections (our class objects), do a similar loop to get d.confidence for d in detections, which is the score, and then get d.class_name for d in detections as well. We get an index result from the non-maximum suppression: we pass in the boxes, the classes, the nms_max_overlap of 0.8 that we set at the very beginning, and the scores. These indices tell us which boxes should be kept, so we filter the detections with them to remove the redundancy. Now the detections are ready for Deep SORT, because we have just eliminated the multiple boxes on one target using non-maximum suppression.

We can then simply call tracker.predict to propagate the track distributions one time step forward based on the Kalman filter, and then call the tracker's update with the detections we just obtained; this updates the Kalman tracker parameters and the feature set, and additionally assesses target disappearance and new target appearance. So just two steps do all these calculations and give you the results.

Now we are ready to visualize the results. First of all, let's create a color map with matplotlib. What I'm going to use is 'tab20b'; this just generates a color map from the built-in 'tab20b' matplotlib object. You can think of a color map as a dictionary which maps numbers to colors; this is one of the built-in color maps provided by matplotlib, and the value at each point is a floating-point number between 0 and 1.
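Still inside the while loop, the non-maximum suppression, the two tracker calls and the colormap setup might look like the sketch below. Passing the classes into preprocessing.non_max_suppression matches the Deep SORT fork used here; the original package omits that argument, so treat the signature as an assumption.

```python
# (inside the while loop) pull plain arrays back out of the Detection objects for NMS
boxs = np.array([d.tlwh for d in detections])
scores = np.array([d.confidence for d in detections])
classes = np.array([d.class_name for d in detections])
indices = preprocessing.non_max_suppression(boxs, classes, nms_max_overlap, scores)
detections = [detections[i] for i in indices]   # keep only the surviving boxes

# One prediction step (Kalman filter) and one update step (association).
tracker.predict()
tracker.update(detections)

# Colormap used for per-track colors (a built-in qualitative matplotlib map).
cmap = plt.get_cmap('tab20b')
```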
Then I build a colors list from this color map: I take the first three values (the RGB part) of the color map entry for i in np.linspace, generating 20 evenly spaced colors in a list. So we have a color list mapped onto the color map generated by matplotlib.

Next we create a for loop to loop over all the results from the tracker: for track in tracker.tracks. If not track.is_confirmed() or track.time_since_update > 1, we continue to the next iteration. This statement means that if the Kalman filter could not assign a track, or if there has been no update in the track, we skip it; it is just to make sure there is an update and the track is confirmed. If everything is fine, we move on and get the bounding box from the track in tlbr format; this bounding box format will be used for the cv2 (OpenCV) output, and it gives you the minimum x, minimum y, maximum x and maximum y. Then we get the class name from this track, to get its corresponding class. Then we assign the color based on the color list we just created outside this for loop: we take the integer track ID and use the remainder after dividing by the length of the colors list; this assigns a color code to each object. And because, as I mentioned before, the color values are between 0 and 1, we multiply by 255 to convert back to the standard RGB scale, as a list.

Then we can draw a rectangle on the original image, which is the frame of the video. We need to provide the starting coordinates and the ending coordinates: the starting coordinates represent the top-left corner of the rectangle and the ending coordinates represent the bottom-right corner, and these are just the first two and last two entries of this bbox. So I provide the x, y coordinates of the top-left corner, then the ending coordinates representing the bottom-right corner, then the color and the thickness of the box.

After creating this bounding box we also create a filled rectangle above it, to show the class name as well as the track ID. We shift it up a little from the top of the bounding box for the starting position, and the x of the ending position should be wide enough to contain all the text including the track ID: that means I add the length of the class name and the length of the string of the track ID, and then, to give a little margin for the overall sizing, I multiply it by 17. And because I want to fill this rectangle with color, I use -1 for the thickness.
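A sketch of that drawing loop, still inside the while loop. The 30-pixel label offset, the factor of 17, and the '-' separator are rough sizing values read from the narration, and track.get_class() is specific to the Deep SORT fork used here, so treat them as assumptions.

```python
# 20 evenly spaced colors from the 'tab20b' colormap (RGB only, values in 0-1).
colors = [cmap(i)[:3] for i in np.linspace(0, 1, 20)]

# (inside the while loop) draw every confirmed, recently updated track
for track in tracker.tracks:
    if not track.is_confirmed() or track.time_since_update > 1:
        continue
    bbox = track.to_tlbr()                          # (min x, min y, max x, max y)
    class_name = track.get_class()
    color = colors[int(track.track_id) % len(colors)]
    color = [i * 255 for i in color]

    # Bounding box, then a filled label box above it with "<class>-<id>".
    cv2.rectangle(img, (int(bbox[0]), int(bbox[1])), (int(bbox[2]), int(bbox[3])), color, 2)
    cv2.rectangle(img, (int(bbox[0]), int(bbox[1] - 30)),
                  (int(bbox[0]) + (len(class_name) + len(str(track.track_id))) * 17,
                   int(bbox[1])), color, -1)
    cv2.putText(img, class_name + '-' + str(track.track_id),
                (int(bbox[0]), int(bbox[1] - 10)), 0, 0.75, (255, 255, 255), 2)
```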
Then we put the text inside this rectangle. First we provide the text, which is the class name plus a separator and the track ID; then the starting coordinates, which should be somewhere inside the label box, around bbox[1] - 10; then the default font, the font size, the text color (255, 255, 255), and a text thickness of 2. That's it: this code just means we put the text roughly in the middle of the rectangle, in white, and since we have already filled the rectangle with color, the white text gives a very nice contrast. (Here I also notice and fix a small typo in the code.)

We also want to print out the frames per second. This should go outside the track for loop, because the tracking is already done there. For every frame we print the FPS on the image at the top-left corner, formatted with two decimal places, using the default font style and size, in red with thickness 2. Then we resize the output window to 1024 by 768, a 4:3 ratio, and show the output image with imshow. Remember that at the very beginning we also decided to save the output to a file, so we call out.write to save the annotated image for every frame of the video. Finally we check for a key press to break the while loop, in case we want to stop early. Outside the while loop we release the video capture, release the file we are saving to, and destroy all windows. And that's it for all the coding.

Good, let's see if everything works. If everything goes well you should have results similar to mine: here I get around three point something frames per second, and people are tracked with the class name person and their associated track ID above the bounding box. After running through all the frames, the file is saved in the videos folder and you can watch a better and faster video result. Just for your quick reference, the output video should look something like this.

Now let's see how we can add the historical trajectory for each of the detected objects. First of all, we go to just before the while loop. Here I want to create bounded-length deques so that the path, or the historical data points, can be saved inside them. I write from collections import deque, and then I create a list of deques, each with a fixed maximum length of around 30 points (roughly 30 frames), for every index in range(1000). So I create a list of 1000 deques, each with a fixed length. Once a bounded-length deque is full, when new items are added, a corresponding number of items are discarded from the opposite end; that means the queue of data points cannot grow beyond that maximum.
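As a sketch, the FPS overlay, output writing, cleanup and the deque history buffer could be written as below. The 30-point maxlen, the 'q' quit key, the 'output' window name and the cv2.namedWindow call (added so the resize takes effect) are assumptions based on common practice.

```python
from collections import deque
pts = [deque(maxlen=30) for _ in range(1000)]     # one history buffer per track id, before the while loop

# (back inside the while loop, after the track-drawing for loop)
fps = 1.0 / (time.time() - t1)                    # frames per second from the timer started earlier
cv2.putText(img, 'FPS: {:.2f}'.format(fps), (0, 30), 0, 1, (0, 0, 255), 2)
cv2.namedWindow('output', cv2.WINDOW_NORMAL)      # resizable display window
cv2.resizeWindow('output', 1024, 768)
cv2.imshow('output', img)
out.write(img)                                    # save the annotated frame
if cv2.waitKey(1) == ord('q'):                    # press q to stop early
    break

# (after the while loop) release everything
vid.release()
out.release()
cv2.destroyAllWindows()
```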
Then I need to draw the motion path and visualize it in the image, so inside the track for loop I add a few lines of code. First I get the center x and y coordinates for each tracked object: the center x is (bbox[0] + bbox[2]) divided by 2, and the center y is (bbox[1] + bbox[3]) divided by 2. Once we have the center point, I append it to the deque for this track ID, so the deques store all the center points.

Now I just need to draw the motion path. For j in range(1, ...) up to the length of this track's deque: if the previous point, pts[track.track_id][j-1], is None, or the current point, pts[track.track_id][j], is None, we skip this segment and continue; we do not draw anything for it. Here I use a formula for the line thickness for each detected object, roughly 64 divided by (j + 1), times 2. What this equation means is that we keep the closer (more recent) segments of the trail thinner and the farther (older) segments thicker. Then I put everything into a line and visualize it with OpenCV: I pass the previous point pts[track.track_id][j-1], the ending point pts[track.track_id][j], the color we just assigned, and the thickness from this equation. And that's it, let's see whether it works.

If it works well, you should see that each tracked object shows its associated historical path, with the closer data points drawn as a thinner line and the further data points drawn as a thicker line. This is how it looks in the saved output video.

Now let's move on to see how we can perform counting. Outside the while loop we create a counter which is used to count the number of target objects detected, and again the per-object work goes inside the track for loop, at the end of that loop. I also need the height and width of the image in order to draw a line appropriately. Let's say we draw a line across the middle of the image: I compute a y position from the height (I divide the height by 6 so that the value is easy to tune to move the line up and down; any other value works, it's just a convention) and draw the line across the width of the image. The color of this line is green and the thickness is 2.
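Inside the track for loop, the trail drawing works out roughly as below, with the counting-line setup from the end of this part at the bottom. The square root in the thickness formula is my assumption to keep the line widths sensible (the narration only mentions a 64/(j+1) decay scaled by 2), and int(3 * height / 6) is my reading of the "height divided by 6" line position.

```python
# (inside "for track in tracker.tracks:") store the box center and draw the trail
center = (int((bbox[0] + bbox[2]) / 2), int((bbox[1] + bbox[3]) / 2))
pts[track.track_id].append(center)
for j in range(1, len(pts[track.track_id])):
    if pts[track.track_id][j - 1] is None or pts[track.track_id][j] is None:
        continue
    thickness = int(np.sqrt(64 / float(j + 1)) * 2)   # older points -> thicker line
    cv2.line(img, pts[track.track_id][j - 1], pts[track.track_id][j], color, thickness)

# Counting setup: a list created once before the while loop, plus a green gate line per frame.
counter = []                                          # before the while loop
height, width, _ = img.shape                          # per frame
line_y = int(3 * height / 6)                          # assumed: roughly mid-frame, easy to tune
cv2.line(img, (0, line_y), (width, line_y), (0, 255, 0), thickness=2)
```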
Then I get the center y for each tracked object from (bbox[1] + bbox[3]) divided by 2. Because we drew the line across the middle of the image, we can simply watch the center y: if anything moves across the line, whether from the bottom to the top or from the top to the bottom, we give it a count.

There are many ways of performing counting. Here I just create a very thin band in the image: if anything passes through this band it makes the count. This is a very simple method that works very well for one-way traffic, but of course you can create more sophisticated methods. So I create an if statement: if the center y is less than the line position plus a small margin (a fraction of the height, say the height divided by 30) and larger than the line position minus a similar margin, the object is inside this very thin band. Then I check its class name; say, for example, we want to count a car or a truck: if any vehicle passes through this gate, we count it by appending its track ID to the counter list. I then measure the length of this counter to get the total count, and output the result with cv2.putText on the image: the text is "Total Vehicle Count:" plus the string of the total count, positioned somewhere around the top-left corner, with the default font, font scale 1, the red color (0, 0, 255) and thickness 2.

I would like to switch to another video, a traffic one, to better illustrate this example, so let's see if it works. If everything works well you should see a line, or so-called gate, in the middle of the video, and when a car moves across the line, either from the top to the bottom or from the bottom to the top, the total count increases accordingly. However, as I mentioned before, this simple method works very well for one-way traffic, but if anything reverses its direction it might count wrongly, so a more sophisticated method would then be needed. This is how it looks in the final output video.
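A minimal sketch of that counting gate. The height/30 tolerance band is my reading of the narration, and wrapping the counter in set() so a lingering track ID is only counted once is an assumption beyond what the narration states.

```python
# (inside "for track in tracker.tracks:") count cars/trucks whose center enters a thin band
center_y = int((bbox[1] + bbox[3]) / 2)
if line_y - int(height / 30) <= center_y <= line_y + int(height / 30):
    if class_name in ('car', 'truck'):
        counter.append(int(track.track_id))

# (per frame, after the track loop) display the running total
total_count = len(set(counter))                       # set() so a lingering id counts once
cv2.putText(img, 'Total Vehicle Count: ' + str(total_count), (0, 80), 0, 1, (0, 0, 255), 2)
```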
Finally, let's create a zone, or a grid, so that we can count how many objects go in and out of a specific area, and also how many objects are within that area in the current frame. Before the for loop over the tracks we create a current_count counter, to count how many tracked objects are within the specific area in the current frame. Then we add one more line to visualize the zone: for example, I expand the band a little, by about 20 pixels, by increasing and decreasing the height, so there are two lines that mark out the zone, and we only need to change these parameters to adjust it. If anything is within this zone, we increase current_count by one, counting how many objects are in the zone in the current frame. Finally I add another putText for the current vehicle count: I change the position and change total_count to current_count, and that's it. Let's run the file and see what we get.

If everything works well you should see two lines, a so-called zone, in the middle of the video, and when a car moves into the zone, either from the top to the bottom or from the bottom to the top, the current vehicle count and the total vehicle count increase; when the car leaves the zone, the current count decreases. As with counting, there are many ways of creating a density map, heat map or zone; this is just a very simple example method to count how many objects move in and out of a specific area and how many objects are within that area in the current frame. The final output video looks like this.

And that's it for this video. I hope you enjoyed it, and thank you for watching. If you have any questions or suggestions about what we covered in this video, please feel free to ask in the comment section below and I will do my best to answer. If you enjoyed this video you can subscribe to my channel, simply like the video, and it is a great support to share this video with anyone you think would find it useful. Thank you all for watching.
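Rounding off the walkthrough, here is a hedged sketch of the zone counter described just above. The ±20-pixel zone size is the value mentioned in the narration; everything else mirrors the counting-gate sketch and its assumptions.

```python
current_count = 0                              # reset each frame, before the track for loop

# (per frame) visualize the zone as two lines around the counting line
cv2.line(img, (0, line_y - 20), (width, line_y - 20), (0, 255, 0), thickness=2)
cv2.line(img, (0, line_y + 20), (width, line_y + 20), (0, 255, 0), thickness=2)

# (inside "for track in tracker.tracks:") objects currently inside the zone
if line_y - 20 <= center_y <= line_y + 20:
    if class_name in ('car', 'truck'):
        counter.append(int(track.track_id))    # still feeds the total count
        current_count += 1

# (per frame, after the track loop) show both counters
cv2.putText(img, 'Current Vehicle Count: ' + str(current_count), (0, 130), 0, 1, (0, 0, 255), 2)
cv2.putText(img, 'Total Vehicle Count: ' + str(len(set(counter))), (0, 80), 0, 1, (0, 0, 255), 2)
```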
Info
Channel: eMaster Class Academy
Views: 52,086
Rating: 4.9699044 out of 5
Id: zi-62z-3c4U
Length: 90min 31sec (5431 seconds)
Published: Wed Jul 22 2020