Official YOLO v7 Instance Segmentation COMPLETE TUTORIAL | Windows & Linux

Video Statistics and Information

Captions
Hello everyone and welcome back to the channel. In today's video we will be running the official YOLOv7 instance segmentation. I will walk you through setting up the environment, installing the necessary libraries, and writing the code from scratch for instance segmentation on images, videos, and webcam. The code will be available for our Patreon supporters. Let's get started.

First, install Microsoft C++ Build Tools on Windows; you do not need it on Linux. Run the setup, select "Desktop development with C++", click Install, and restart the computer when the installation is finished. Now open the Anaconda prompt and create a virtual environment with conda create -n yolov7_mask python=3.9. Hit Enter, press y, hit Enter again, and once the process is finished activate the environment with conda activate yolov7_mask. Now install Cython with pip install cython. Next we need to install PyTorch with GPU support. Go to the official PyTorch website, click on "Previous PyTorch Versions", and I like to install PyTorch 1.11 with CUDA 11.3. However, this newer version has an error, which I will get to in a moment; if you want to avoid it you can install PyTorch 1.9.1 with CUDA 11.1, but I will go with version 1.11. Just copy the pip command and paste it into the Anaconda prompt, or the terminal if you are on Linux. If you also choose this newer version of PyTorch, we need to modify one file. Go to your C drive, then Users, then your username, then anaconda3, then envs (here you will find all the environments you have created), then your yolov7_mask folder, then Lib, site-packages, torch, nn, modules, and open upsampling.py. Go to line 154 and comment out the recompute_scale_factor parameter, making sure the closing bracket comes after the comment, then save the file. There is a quicker way to find this file as well: I use a small utility called Everything, and if I search the file name upsampling.py there, it gives me the file right away and I can double-click it to modify it directly.

Now go to the official detectron2 repository and copy its git link. You can install git for your operating system to use the git clone command; I already have it installed on Windows, so I can issue git clone followed by the link, and you will see a folder called detectron2 on your local hard drive. Once it is cloned, go to the official yolov7-mask repository and download it as a zip file. Extract it, move the files out of the subfolder, and delete the now-empty subfolder called yolov7-mask. Now that we have these two repositories, go to the detectron2 folder in the Anaconda prompt, issue pip install -e ., and hit Enter. It will download and install the required libraries and set up detectron2. Once it is done without any errors, go to the yolov7-mask folder in the Anaconda prompt. We will install additional libraries, but before that, open requirements.txt from the yolov7-mask repository and delete torch and torchvision from it, as we have already installed them, then save the file. Then from the Anaconda prompt issue pip install -r requirements.txt; this installs the remaining libraries required to run instance segmentation with YOLOv7. The installation is now complete without any errors. Let's run Python and import detectron2; it imports without any errors. Then import torch and issue torch.cuda.is_available(). It says True, which means we can use the GPU. Now go to the official YOLOv7 repository, click on Releases, then Assets, and find yolov7-mask.pt. Download it and save it in the yolov7-mask repository on the local machine. I have already downloaded this file, so I am just going to paste it here along with 1.jpg and a video called ac_parkour.mp4; we will use these files for instance segmentation.
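Collected in one place, the setup steps above look roughly like this. This is a sketch, not verbatim from the video: the PyTorch line is the published install command for 1.11.0 with CUDA 11.3 from the PyTorch previous-versions page, and the detectron2 URL is the official GitHub repository; adjust paths and versions to your machine.

```shell
# Create and activate the environment
conda create -n yolov7_mask python=3.9
conda activate yolov7_mask
pip install cython

# PyTorch 1.11 with CUDA 11.3 (copy the exact line from the
# PyTorch "previous versions" page if this one has changed)
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 \
    --extra-index-url https://download.pytorch.org/whl/cu113

# detectron2, installed from source
git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
pip install -e .
cd ..

# yolov7-mask: delete torch and torchvision from requirements.txt
# first (already installed above), then install the rest
cd yolov7-mask
pip install -r requirements.txt
```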
Now we have everything set up and we are ready to code. Open detect.py from the yolov7-mask repository; we will need a piece of code from this file. Create a new empty file called segment.py. From detect.py, copy the import of argparse and paste it into segment.py. Then in detect.py scroll all the way down, copy the code from if __name__ == '__main__' to the end, and paste it into segment.py. These are the arguments we can use in our code, but we do not need all of them, so remove --device, --save-txt, --save-conf, --classes, --agnostic-nms, --augment, --update, --exist-ok, and --no-trace. Now we make our own edits. The weights file is the one we just downloaded, so change its default to yolov7-mask.pt, and change the default value of --source to 1.jpg. The rest is fine, but we need to add a few more arguments. Copy an existing string argument definition, paste it at the end, and change it to --hyp. This is the hyperparameter file for YOLOv7 mask; it is located inside the data folder, so set its default to data/hyp.scratch.mask.yaml so that we do not need to specify it again and again when running the code. The next one is an integer parameter, so copy any existing integer argument and paste it; call it --seed. It will be used to change the colors of the instances if we need to; set it to 1 by default. We need another integer argument called --thickness, to control the size of the bounding boxes and the font; let's also set it to 1.
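As a sketch, the argument setup we end up with might look like the following. The argument names and defaults follow the video's description; the exact set of kept detect.py arguments is a minimal reconstruction, not the literal file.

```python
import argparse

def make_parser():
    parser = argparse.ArgumentParser()
    # Kept from detect.py, with defaults changed as in the video
    parser.add_argument('--weights', type=str, default='yolov7-mask.pt')
    parser.add_argument('--source', type=str, default='1.jpg')
    parser.add_argument('--img-size', type=int, default=640)
    parser.add_argument('--conf-thres', type=float, default=0.25)
    parser.add_argument('--iou-thres', type=float, default=0.45)
    parser.add_argument('--view-img', action='store_true')
    parser.add_argument('--nosave', action='store_true')
    parser.add_argument('--project', default='runs/detect')
    parser.add_argument('--name', default='exp')
    # New arguments added in the video
    parser.add_argument('--hyp', type=str, default='data/hyp.scratch.mask.yaml')
    parser.add_argument('--seed', type=int, default=1)       # instance colors
    parser.add_argument('--thickness', type=int, default=1)  # box/font size
    parser.add_argument('--no-bbox', action='store_true')    # hide boxes
    parser.add_argument('--no-label', action='store_true')   # hide labels
    parser.add_argument('--show-fps', action='store_true')   # overlay FPS
    return parser

opt = make_parser().parse_args([])  # empty list: use all defaults
```

Note that argparse turns the hyphens in option names into underscores, so --no-bbox is read back as opt.no_bbox; the video points this out later when it uses opt.view_img.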
Now we need Boolean parameters, so copy one, paste it at the end, and call it --no-bbox; if this is set, bounding boxes around the instances will not be shown. Create another Boolean argument and call it --no-label; if this is set, the label will not be shown on the instances. And finally another Boolean argument called --show-fps; if this is set, the FPS will be shown on the video or webcam stream. That is it for all the arguments. They will be parsed and stored inside a variable called opt, short for options; we will be using this variable a lot, so keep that in mind while coding along.

Now let's fix the NumPy random seed with the value from our parsed arguments. Then we check whether CUDA is available: if yes, we will use cuda:0 as the device, otherwise the CPU will be used. If we use the GPU we can utilize half precision, so we define a flag that is True if the device type is not cpu. Now load the weights using the opt.weights argument, get the model from the weights, place it on the device (the GPU in our case), and if half precision is possible, ask the model to use it. Now we open the hyperparameter file: with open(opt.hyp) as f, then hyp = yaml.load(f, Loader=yaml.FullLoader). We have not imported any of the libraries yet; we will do that in a moment. Next, if we have not set the --nosave argument it means we want to save the output, so we need to define a save directory: save_dir = Path(increment_path(Path(opt.project) / opt.name, exist_ok=False)), and then we create this directory, doing nothing if it already exists: save_dir.mkdir(parents=True, exist_ok=True). The save path is then this save directory plus the source file name, because we want the output to have the same file name as the source. We also need to determine whether we are using a webcam: if opt.source is numeric, we set a webcam flag. If webcam is True and we have not set the --nosave flag, we append .mp4 to the save path, because the source will be an integer representing the webcam number, so we need to add an extension to the output; for videos the extension is already there in the source, so we do not need it.

We also need to determine whether the input is a video or an image, so that we can call the relevant function later on. So we define a list of image formats (we allow bmp, jpg, jpeg, png, tiff, dng, webp, and mpo) and similarly a list of video formats (we allow mov, avi, mp4, mpg, m4v, wmv, and mkv). The with torch.no_grad() line is fine; the rest of the copied code is not needed. Now, if we split the source file name on the dot and look at its last entry, that gives us the file extension. If it is present in the image formats list we call an on_image function; else, if the extension is in the video formats list or the webcam flag is set, we call an on_video function, because webcam and video are handled identically; otherwise we print a message that it is not a valid source. Our main block is complete, so let's import the libraries: import time, from pathlib import Path, import torch, import cv2, import yaml, from torchvision import transforms, and import numpy as np. Now we import the utilities from the yolov7-mask repository: from utils.datasets import letterbox; from utils.general import non_max_suppression_mask_conf and increment_path; from detectron2.modeling.poolers import ROIPooler; from detectron2.structures import Boxes; from detectron2.utils.memory import retry_if_cuda_oom (oom stands for out of memory); and from detectron2.layers import paste_masks_in_image. That's it. Let's save the file and run it from the Anaconda prompt just to be sure everything works. There is an error; I have misspelled "memory". Let's correct it and run the code again. Now everything works fine.
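The source-handling logic described above can be sketched with just the standard library. Here increment_path is a simplified stand-in for the yolov7 utility of the same name, and classify_source only decides which handler would be called; on_image and on_video themselves are defined later in the video.

```python
from pathlib import Path

IMG_FORMATS = ['bmp', 'jpg', 'jpeg', 'png', 'tiff', 'dng', 'webp', 'mpo']
VID_FORMATS = ['mov', 'avi', 'mp4', 'mpg', 'm4v', 'wmv', 'mkv']

def increment_path(path, exist_ok=False):
    # Simplified stand-in for utils.general.increment_path:
    # runs/detect/exp -> runs/detect/exp2 -> runs/detect/exp3 ...
    path = Path(path)
    if exist_ok or not path.exists():
        return path
    n = 2
    while Path(f'{path}{n}').exists():
        n += 1
    return Path(f'{path}{n}')

def classify_source(source):
    # A purely numeric source string, e.g. "0", means a webcam,
    # which is handled by the same code path as video files.
    if source.isnumeric():
        return 'video'
    ext = source.split('.')[-1].lower()
    if ext in IMG_FORMATS:
        return 'image'
    if ext in VID_FORMATS:
        return 'video'
    return 'invalid'
```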
Of course, it is calling the on_image function, which we have not defined yet, so let's define it right above the if __name__ == '__main__' statement. image = cv2.imread(opt.source), and assert that image is not None; if it is None, print a message that the image was not found. If the image is found and read, we need to convert it from BGR to RGB, since cv2 reads images in BGR format, and we save the size of the image in a variable called img_size. Now we need another version of the image that will be used to show all the masks, labels, and bounding boxes for display, but we have not defined a function to get the predictions yet, so just make a copy of the original image for now and store it in a variable called image_display; we will replace it with predictions later. Then we resize image_display back to the original resolution we just stored above. If the --nosave option is not set we want to save the image, so call cv2.imwrite with the save path and image_display, then print a message that the output is stored at that particular path. Similarly, if opt.view_img is set (by the way, the hyphens in the argument names are parsed as underscores, so it is view_img here), call cv2.imshow in a window called "results" with image_display, then wait for a key press indefinitely. Let's run this piece of code and see if it works: python segment.py --source 1.jpg. I have missed the R in BGR2RGB, so let's type it in and try again. Everything works, and it says the output is stored at this location; if we go there, the image is stored, but of course nothing is plotted yet. We can also test the other flags, --nosave and --view-img, and this time it displays the result but does not save anything.

All right, now instead of copying the original image, let's call a function called process_frame and pass the image as a parameter. Define this function, which takes an image as input. We call the letterbox function to resize the image to 640x640 resolution and take the resized image from index 0 of the function's return value. Save a copy of this resized image as image_ for future use. Now convert the image to a tensor by calling transforms.ToTensor() on it, and then image = torch.tensor(np.array([image.numpy()])). We shift the image to the device we detected, which is the GPU in my case, and if the half flag is set we convert the image to half precision, otherwise to float. Now call the model, pass the image to it, and save the result in an output variable. The model produces an output dictionary with six things: inference output, train output, attention, mask IoU, bases, and semantic output. Let's unpack these: inf_out = output['test'], train_out = output['bbox_and_cls'], attn = output['attn'], mask_iou = output['mask_iou'], bases = output['bases'], and sem_output = output['sem']. Now we concatenate the bases and sem_output tensors. We also need the image size: number of batches, then channels (which we do not need, so just put an underscore there), then height and width, from image.shape. We can also extract the class names from the model: names = model.names. Now we define the pooler: pooler = ROIPooler(output_size=hyp['mask_resolution'], scales=model.pooler_scale, sampling_ratio=1, pooler_type='ROIAlignV2', canonical_level=2); mask_resolution comes from the yaml file we loaded earlier. Now we need to perform non-maximum suppression, which basically removes any overlapping bounding boxes.
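The letterbox resize used above (from utils.datasets in the yolov7 repository) pads the image to the target size while preserving aspect ratio. Here is a simplified NumPy-only sketch of the idea; it uses nearest-neighbour index mapping to stay dependency-free, whereas the real implementation resizes with cv2.resize and has more options, so treat this as an illustration rather than the repo's function.

```python
import numpy as np

def letterbox_sketch(img, new_shape=(640, 640), pad_value=114):
    # img: HxWxC uint8 array. Returns (padded_img, scale_ratio, (left, top)).
    h, w = img.shape[:2]
    r = min(new_shape[0] / h, new_shape[1] / w)        # scale to fit inside
    new_h, new_w = int(round(h * r)), int(round(w * r))
    # Nearest-neighbour resize via index mapping (stand-in for cv2.resize)
    rows = (np.arange(new_h) / r).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / r).astype(int).clip(0, w - 1)
    resized = img[rows][:, cols]
    # Center the resized image on a gray canvas of the target size
    dh, dw = new_shape[0] - new_h, new_shape[1] - new_w
    top, left = dh // 2, dw // 2
    out = np.full((new_shape[0], new_shape[1], img.shape[2]),
                  pad_value, dtype=img.dtype)
    out[top:top + new_h, left:left + new_w] = resized
    return out, r, (left, top)
```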
The NMS call produces five things: output and output_mask, which are the two most important, then output_mask_score, and two more we will call output_ac and output_ab, all from non_max_suppression_mask_conf. We pass inf_out, then attn, then bases, then the pooler, then hyp, with conf_thres=opt.conf_thres, iou_thres=opt.iou_thres, merge=False, and mask_iou=None. Now we unpack the information we need to plot on the images: pred and pred_masks are output[0] and output_mask[0]. We also need the base image, so base = bases[0]. We want to plot this information only if the prediction is not None, so: if pred is not None, and only then do we execute the code that follows. bboxes = Boxes(pred[:, :4]), that is, all rows and the first four columns. original_pred_masks = pred_masks.view(-1, hyp['mask_resolution'], hyp['mask_resolution']). Then pred_masks = retry_if_cuda_oom(paste_masks_in_image)(original_pred_masks, bboxes, (height, width), threshold=0.5). Now we detach everything from the GPU tensors and convert it into CPU arrays: pred_masks_np = pred_masks.detach().cpu().numpy(); pred_cls, the prediction class, is pred[:, 5].detach().cpu().numpy(); pred_conf, the prediction confidence, is pred[:, 4].detach().cpu().numpy(); and nbboxes = bboxes.tensor.detach().cpu().numpy(), converted to integers with .astype(int).

Now we need another variable, image_display, equal to image[0].permute(1, 2, 0) multiplied by 255; this stretches the range of the image from 0-1 to 0-255. Then image_display = image_display.cpu().numpy().astype(np.uint8), so we have converted the image to unsigned 8-bit integers. As you recall, this image was converted to RGB, so we need to convert it back to BGR: cv2.cvtColor on image_display with cv2.COLOR_RGB2BGR. Now we have the set of bounding boxes, masks, classes, and confidences, so we can use them to plot on our image. Write a for loop: for one_mask, bbox, cls, conf in zip(pred_masks_np, nbboxes, pred_cls, pred_conf). If the confidence is less than opt.conf_thres we do not want to do anything, so we just continue the loop. Otherwise we define a random color for each instance: color is a list of three random integers from 0 to 255, one for each of the r, g, and b channels. Then image_display[one_mask] = image_display[one_mask] * 0.5 + np.array(color, dtype=np.uint8) * 0.5, which pastes the mask on the image. Now we define a label with one string input and one float input: names[int(cls)] and then conf. Next we calculate the font and text size: tf = max(opt.thickness - 1, 1), which means you cannot have a thickness of zero or below, and t_size = cv2.getTextSize(label, 0, fontScale=opt.thickness / 3, thickness=tf)[0]. Then c2 = bbox[0] + t_size[0], bbox[1] - t_size[1] - 3.
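The mask-pasting line above is a 50/50 alpha blend between the image pixels under the mask and the instance color. A self-contained NumPy illustration, using a toy image and mask rather than the model's real outputs:

```python
import numpy as np

rng = np.random.default_rng(1)  # in the video, --seed controls instance colors

# Toy stand-ins: a 4x4 white BGR image and one boolean instance mask
image_display = np.full((4, 4, 3), 255, dtype=np.uint8)
one_mask = np.zeros((4, 4), dtype=bool)
one_mask[1:3, 1:3] = True

color = rng.integers(0, 256, size=3)  # random b, g, r for this instance

# Blend: 50% original pixel + 50% instance color, only where the mask is True
image_display[one_mask] = (image_display[one_mask] * 0.5
                           + np.array(color, dtype=np.uint8) * 0.5)
```

Because the assignment writes back into a uint8 array, the float result is truncated per channel, which is exactly what the inline expression in the tutorial does.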
Now we plot this bounding box, but only if the --no-bbox flag is not set: if not opt.no_bbox, call cv2.rectangle on image_display, with the first corner (bbox[0], bbox[1]), the second corner (bbox[2], bbox[3]), then the color we defined above, thickness opt.thickness, and line type cv2.LINE_AA, which is anti-aliasing. Similarly, if --no-label is not set we want to put the label on the image as well: if not opt.no_label, call cv2.rectangle(image_display, (bbox[0], bbox[1]), c2, color, -1, cv2.LINE_AA), which fills the background of the label text, then put the text with cv2.putText on image_display: the label, at (bbox[0], bbox[1] - 2), font 0, font scale opt.thickness / 3, in white (255, 255, 255), with thickness tf, which we calculated above, and line type cv2.LINE_AA. Once all of these bounding boxes are plotted, we return image_display; and if no instance was detected, we return the original image that we stored as image_.

All right, let's run this code. There is an error; I have typed a comma twice, so let's remove it and run again. I have made another mistake: model.pooler_scale should be a tuple, not a scalar, so let's add a comma to make it (model.pooler_scale,). If we run the code again it gives another error; this should be cv2 and not cv, so let's correct it and run once more. This time we can see the output: all the instances of the person class are detected and plotted on the screen, each with a different color. Now let's test the other arguments as well. Add --no-bbox, and this time no bounding box is shown on the instances; let's also try --no-label, and now the label is not shown either and only the segmentation map is plotted.

So now we need to define the on_video function as well; let's define it below the on_image function. If the webcam flag is set we already know the input is a webcam, so cap = cv2.VideoCapture(int(opt.source)); else cap = cv2.VideoCapture(opt.source). Let's check whether the source opened successfully: if cap.isOpened() == False, print an error that the file cannot be opened and return from the function. Otherwise, success, image = cap.read(). We also need to determine the FPS, width, and height of the source: fps_source = cap.get(cv2.CAP_PROP_FPS), width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), and for the height the same command with FRAME_HEIGHT. Now we define a video writer if the --nosave flag is not set: vid_writer = cv2.VideoWriter with the save path, the encoding cv2.VideoWriter_fourcc(*'mp4v'), the fps_source we detected above, and the width and height we detected above. Define start_time = 0; this will be used to calculate FPS. Then, while success: convert the image from BGR to RGB with cv2.cvtColor, then image_display = process_frame(image), passing this image. If opt.show_fps is set we calculate the FPS: current_time = time.time(), fps = 1 / (current_time - start_time), start_time = current_time, and then we put this text on the frame with cv2.putText on image_display: the string 'FPS ' plus str(int(fps)), at offset (20, 70), with font cv2.FONT_HERSHEY_PLAIN, size 2, in green with thickness 2.
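The FPS overlay logic is just the reciprocal of the time between consecutive frames. A tiny self-contained version of that calculation (wrapping it in a class and using time.monotonic instead of time.time are my choices, not the video's; the video does it inline and draws the number with cv2.putText):

```python
import time

class FpsMeter:
    """Tracks instantaneous FPS between successive frames."""
    def __init__(self):
        self.start_time = None

    def tick(self):
        # Returns FPS since the previous tick, or 0.0 on the first frame.
        now = time.monotonic()
        if self.start_time is None:
            self.start_time = now
            return 0.0
        elapsed = now - self.start_time
        self.start_time = now
        return 1.0 / elapsed if elapsed > 0 else 0.0
```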
If the opt.view_img flag is set, we show it in real time: cv2.imshow in the result window with image_display. If the --nosave flag is not set, we call vid_writer.write(image_display) so that the frame is written into the video. Then we monitor for a key press: key = cv2.waitKey(1) & 0xFF; if key == ord('q'), break the loop; otherwise read the next frame. Once all the frames are read and processed, call cv2.destroyAllWindows, and finally, if --nosave is not set, we also print where the video is saved: print output saved at save_path.

For a video, the same command is used; the only difference is that the source is now ac_parkour.mp4. And here we have the result. We can experiment with all the arguments: remove --no-bbox, --no-label, --view-img, and --nosave and run the command again. You can see in the video output that the color is not consistent for each instance of person; that is because we are initializing the color randomly for each instance on every frame. To get consistent colors we could actually do video instance segmentation, which is a separate topic; maybe I will make a video on it in the future. We can see this output is stored here, but it will not play, because we have to make one more modification in the code. As you can see, vid_writer was defined with a specific width and height, but the image we are storing has a different resolution. So after processing the image, do image_display = cv2.resize(image_display, (width, height)) with the same width and height. This time if we run the command again, the output is a lot bigger and it is stored on the hard drive as well.

So now let me tell you how to run this on a webcam: python segment.py --source 0, which is your webcam number, --thickness 2 (or 3 or 4, depending on the resolution of your webcam), then --view-img --nosave --no-bbox --no-label --show-fps. Execute this and you will be able to see the results on the webcam in real time.

With that, I think I am done. If you have learned something of value today, leave a like and subscribe to the channel to watch more videos like this, and consider supporting on Patreon to help the channel out. I will see you next time. [Music] Thank you.
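For reference, the run commands used across the video, collected in one place (file names as used in the tutorial; adjust them to your own inputs):

```shell
# Image
python segment.py --source 1.jpg

# Image, display only, nothing written to disk
python segment.py --source 1.jpg --view-img --nosave

# Video
python segment.py --source ac_parkour.mp4

# Webcam (device 0) with the FPS overlay, masks only
python segment.py --source 0 --thickness 2 --view-img --nosave \
    --no-bbox --no-label --show-fps
```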
Info
Channel: TheCodingBug
Views: 28,024
Keywords: yolo v7, yolov7, yolo, YOLOv7, Official YOLOv7, yolov5, yolov7 tutorial, install yolov7, train yolov7, yolo v7 tutorial, yolo v7 object detection, yolo v7 windows, yolo v7 linux, yolov7 windows, yolov7 linux, yolov7 python, yolo v7 python, yolo7, yolo v7 official, official yolov7, official yolo v7, yolov7 instance segmentation, instance segmentation, yolo v7 instance segmentation, yolov7 segmentation, official yolo v7 instance segmentation, official yolov7 instance segmentation
Id: tq0GI4FahWU
Length: 33min 57sec (2037 seconds)
Published: Mon Sep 12 2022