Video Data Processing with Python and OpenCV

Captions
Hey YouTube, my name is Rob, and I'm a data scientist who makes videos about machine learning and coding in Python. Today we're going to be talking all about working with video data in Python. We're going to be using the package OpenCV to read in video files and explore them a little bit. By the end of this video you should be able to open a video file in Python, understand how to iterate over all the different frames in that video, edit those images, and then save them as a new file. Using machine learning on video data is a field of active research and there are a lot of cool things you can do with it, but before you get to machine learning you need to get comfortable working with video data, and that's what we're going to do in this video.

The dataset we're going to be working with is footage of a car driving through a neighborhood, and we're going to use the labels provided to us to add annotations to the video. I'm going to do a lot of this work in a Kaggle notebook, which I'll link in the description, and you can click on that link and follow along with the code yourself if you'd like. If you enjoy this video, make sure you like and subscribe. Alright, let's get to the code.

Okay, so here we are in a Kaggle notebook. Before we jump in too far, I want to show you, here on the right side, the data we'll be working with. This is the driving video object tracking dataset. It has over a thousand videos of cars driving, along with labels for all the different objects in them, including cars and pedestrians, so it's a really cool dataset that you can explore later, and we'll be using it for the explanation here.

A little background on what we'll be learning today: first, what video data is and how we work with it in Python; how we use Python to convert video file formats; how we can pull the metadata from a video, including the frame rate and the size of the video; how to open a video and iterate through each of the images in it; and finally how to add the annotations as bounding boxes to this video and save it off as a brand new video.

Okay, before we get too far into this, we need a general understanding of what video data actually is. This might seem pretty obvious, but video data is really just a series of images. You could think about it like a flip book: the images appear in sequence so quickly that our eye perceives them as objects in motion. Understanding this is really important, because when we open up a video file in Python we're actually going to be working with images. If you're brand new to working with image data, I do have an introductory video on working with images that you should check out.

Because video data is just a sequence of images, there are a few things we can know about a video. One of those is the video resolution, which is basically just the size of those images. You probably recognize a lot of these sizes already; they're commonly referred to by the height of the image in pixels, so 720p is 720 pixels high, 1080p is 1080 pixels high, and so on. In addition to resolution there's also the frame rate. This can really be anything for a given video, but there are some standard frame rates that are very common. The frame rate is essentially the number of images you see per second, and it's commonly given in fps (frames per second) or in hertz, which is a general unit for frequency. For one second of video at 60 frames per second you would have 60 images, and the lower the frame rate goes, the fewer images you'd have.
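To make the resolution and frame-rate ideas concrete, here is a tiny illustrative calculation (the numbers are examples, not taken from the notebook):

```python
# A 720p frame is 720 pixels high by 1280 wide, with 3 color channels (as a NumPy array).
frame_shape = (720, 1280, 3)

# The frame rate ties the number of frames to the clip length.
fps = 60                 # frames per second
duration_s = 30          # example clip length in seconds
total_frames = fps * duration_s
print(total_frames)      # 1800 individual images make up this example clip
```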
Okay, so to get set up to work with video in Python you'll need a few packages installed. You can pip install these: OpenCV (the opencv-python package, which we'll import as cv2), matplotlib, which we'll use for plotting, and we'll also need NumPy and pandas. There's also a tool that is not a Python package called ffmpeg, which is very important for converting file formats, and we'll call it from within Python. If you plan to explore this in a Kaggle notebook, all of this is already installed, so you don't have to worry about it, but if you're working on your local machine you will have to install a few packages.

Okay, let's get started with the imports. We import pandas, import numpy, import cv2, and import matplotlib.pyplot as plt, and from glob we import glob. Then, to display the video in this notebook, we import IPython.display as ipd. We also import tqdm from tqdm.notebook; this will let us track our progress with a progress bar when we're iterating over the images. Finally, we import subprocess, which lets us run the same kinds of commands we would run on the command line when we're calling ffmpeg, and that makes things easier.

As I showed, we have these videos nested in this directory as MOV files, and because we can't display those directly in the notebook, I'm going to use ffmpeg to convert one of the examples from a MOV file to an MP4 file, which is a standard, commonly used video format. I have the directory where I know this video file sits (the path is pretty long), and we're going to convert it locally using ffmpeg. We run subprocess, which allows us to run something we would run at a command line: ffmpeg, then -i with our input file, then -qscale 0, and then the output, which is the same file name but ending in .mp4. I'm going to run this first without suppressing the output so we can see what's going on: it's converting this video file to an MP4 file, and you can see the progress. Running this is the same as if we had typed all of these arguments, with spaces in between, into a command prompt, but subprocess lets us run it from within Python. If you're interested in learning more about ffmpeg, you can read the documentation on their site; we're basically just running the standard command to convert from a MOV file to an MP4.

Now that that's done, if I do an ls -l on this directory we can see the file is now here as an MP4. One thing I want to do is suppress that ffmpeg output, because it can be a lot to show in the notebook, so I'm going to rerun this with the log level set to quiet: I remove the file we created and re-run it. Okay, we can see that we now have a completed process (it returned a successful return code, so it's done), and we also have this MP4 file. If I run an ls on this directory, we can see that it's 14 megabytes in size.
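Here is a minimal sketch of the imports and the ffmpeg conversion described above. The input path is a placeholder, not the dataset's actual file name, and the exact flag order is just one reasonable way to arrange the call.

```python
import subprocess
from glob import glob

import cv2
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import IPython.display as ipd
from tqdm.notebook import tqdm

# Placeholder path; in the notebook this points at one of the dataset's MOV files.
input_file = "../input/example_driving_video.mov"
output_file = input_file.replace(".mov", ".mp4")

# Run ffmpeg just as we would on the command line.
# "-qscale 0" keeps the quality; "-loglevel quiet" suppresses the progress output.
subprocess.run(
    ["ffmpeg", "-loglevel", "quiet", "-i", input_file, "-qscale", "0", output_file],
    check=True,
)
```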
The next thing we might want to do is watch this video inside the notebook, and we can use that IPython display module to do it. It's pretty simple: we just use ipd.Video and feed it the video file name. By default it will show the video at its full size, but we can shrink it down by giving it a custom height or width. We can see it takes a width and a height, you can embed it, and it can even take a URL, but we're just going to set width=500 and see how that looks... actually, maybe 700, so we can view it a little wider. Let's watch some of this video to see the footage we're going to be working with. You can see it's a car driving through a neighborhood, and we do have labels for it that we're going to add, which will be very interesting once we get there.

The next thing we want to do is open the video and read its metadata. As I mentioned before, there are some things we want to know about the video, like the size of the images, the frame rate, and maybe the total number of frames available in the file. The way we read in the video with cv2 is the cv2.VideoCapture class: we feed it the name of the video and save it as cap, for video capture. With this capture object we can get information by calling its get method, and cv2 has built-in properties we can pull out. For instance, if we want the frame count property we can grab it here: this is the total number of frames in the video. We can also pull information like the height and the width, using CAP_PROP_FRAME_HEIGHT and the same for the width. Let's go ahead and print this: the height is 720 and the width is 1280. If you remember from our resolutions, this is a standard 720p video format, so we'll call these the video height and width. We also want to know the frame rate, so we can call get with cv2.CAP_PROP_FPS, and the frames per second is 59.94, around 60 frames per second, which is pretty standard. We'll print that out as well, showing it to two decimals.

One thing to keep in mind about this capture object: you want to make sure in your code that you release it once you're done working with the video file, so that the file is no longer being held by Python. We'll run that command.

The next thing we want to do is pull in the image from the first frame of this video, and this is pretty simple. We create the capture object again, and on it we run the read method. What read does is return two things: the return result and the image. Let's put this all on one line and print out what we get. We can see that it returned the value True and an image of shape 720 by 1280, which is good because it matches what our metadata said the height and width of this video file are, and we also have three channels here. Read will return True every time we call it until we've reached the end of the video; each time we call read on the capture object we're iterating through the frames of the video, starting with frame zero. But before we get too far with that, let's take a look at this image. It should look like the first image in the video. If you're not familiar with image data, I recommend you watch my video introducing image data, but this image is basically a NumPy array with three dimensions.
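A minimal sketch of the metadata and first-frame reads just described, reusing the output_file name from the earlier conversion sketch:

```python
cap = cv2.VideoCapture(output_file)

# Pull basic metadata off the capture object.
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
fps = cap.get(cv2.CAP_PROP_FPS)
print(f"{n_frames} frames, {width}x{height}, {fps:.2f} fps")

# Read the first frame; ret stays True until the end of the video is reached.
ret, img = cap.read()
print(ret, img.shape)   # e.g. True (720, 1280, 3)

# Always release the capture once you're done with the file.
cap.release()
```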
We can display it using Matplotlib's imshow, but we'll notice that when we do, it looks a little strange. That's because the channel order Matplotlib expects is different from the channel order cv2 uses, so I'm going to copy in a quick function here. All this function does is take an image in the cv2 format and convert it from BGR to RGB, which makes it nicer to visualize in this notebook. We run this display function on the image and see what it looks like. There we go: this looks like the very first frame of the video. Very good. Now that we're done with that, I'm going to release the video capture object.

The next thing I want to do is display multiple frames from the video. We're going to iterate over the capture object and display a bunch of the images together in a single plot. What I'm going to do here is create a Matplotlib subplot grid of 5 by 5 with a figure size of 30 by 20. Just to show you what this looks like, it's a grid of axes where we're going to place an image for each frame we keep. One thing we have to do is flatten the axes, which just turns them into a list we can index into as we place images.

Now let's make a new capture object for this video, pull out the total frame count like we did before, and start looping over the images in the video. We write "for frame in range" of the number of frames, and pull out the images by running cap.read(). This iterates over all the frames, reading from the video object each time. If the return value is False we break out of the loop, so once we're at the end of the video we're no longer inside it. If we remember, the length of this video in frames is over 2,000, so we're not going to be able to plot every one. Instead, let's display every 100th image in the grid we set up. So we say: if the frame number is a multiple of 100, we add it to the plot. We also need an image index that tracks which subplot we want to draw into. We take the axis at the image index and imshow the image, converting the color with cv2.COLOR_BGR2RGB. Then we do a few things like set the title to the frame number and turn the axis off, which removes the x and y grid numbers, and we increment the image index so the next image goes into the next cell of the grid. We finish with plt.tight_layout() and plt.show(), and we make sure we release the video (and I need to remove that extra show call up there).

Okay, there we go. After iterating over all of the video's images we've displayed every hundredth one. It looks like it ended before the last box, but you get the picture of all the different things going on in this video.
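Here is a sketch of that frame-sampling loop, assuming the capture setup from the earlier sketch (the BGR-to-RGB conversion is done inline with cv2.cvtColor):

```python
fig, axs = plt.subplots(5, 5, figsize=(30, 20))
axs = axs.flatten()   # turn the 5x5 grid of axes into a flat list

cap = cv2.VideoCapture(output_file)
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

img_idx = 0
for frame in range(n_frames):
    ret, img = cap.read()
    if not ret:
        break                       # reached the end of the video
    if frame % 100 == 0 and img_idx < len(axs):
        axs[img_idx].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        axs[img_idx].set_title(f"Frame: {frame}")
        axs[img_idx].axis("off")
        img_idx += 1
cap.release()

plt.tight_layout()
plt.show()
```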
All right, now on to the really cool part: we're going to add some annotations to the video images. This dataset, as I mentioned, has a bunch of object-tracking annotation boxes for each of the pedestrians, the vehicles, and a few other things, and we're going to read in that annotation label dataset using pandas with low_memory=False. If I run head on the label file you can see what the labels look like: for each frame index of the video we have the category, some information about whether the object is occluded or truncated, and the x and y coordinates of a bounding box. We filter this labels dataset down to the video we're looking at right now by running the query method, and we call the result our video labels (I'll move that up here).

One other thing I want to mention is that the video labels are not at the same frame rate as the video itself. While the video runs at about 60 frames per second, the labelers who went through this dataset only labeled at 5 hertz. So I'm going to take the frame index column and multiply it by the difference in frame rate, so that the label frame indices line up with the actual frames of the video we're iterating over.

Okay, now to the cool part. For the video we just looked at we have labels for many different objects, mostly cars, but there's also a motorcycle and a bus in there. We're going to plot these labels on an example image, and to do that we do a lot of the same things as before. We pull one of the frames that has the most labels in it, frame 1035. I'll paste this code in, but essentially it's the same as before: we make our capture object, loop over the frames until we hit frame 1035, then release the video, and we're left with the image from that frame. If I run our display function on it, we can see it's somewhere in the middle of the video, at the 1,035th frame.

The cool thing with cv2 is that it has a built-in function to easily draw a rectangle around an object given a bounding box. To use it we need to pull out the labels for this frame. If you remember, we have a DataFrame called video labels, and we're currently at video frame 1035, so we pull that subset out and call it our frame labels. Then we iterate over each of the frame label rows using iterrows. (It's okay to use iterrows here because there's no way to parallelize drawing these rectangles anyway.) What we do is call cv2's rectangle function on the image. It takes pt1 and pt2, which are two opposite corners of the rectangle, the color, and optionally a thickness and a line type, so we need to pull out those corner points. As we iterate, let's put a break in here so you can see what the row value d looks like: it has all the information for this one row of labels. We create pt1 and pt2 from the integer values of the box2d x1 and y1 columns and the box2d x2 and y2 columns, and for each row we add a rectangle to the image at pt1 and pt2. Then we give it a color. The color is in blue, green, red order, and 255 is a full value, so this makes a red box, and we'll set the line thickness to 3 just to make it look nice. After this we display our cv2 image and see what the labels look like. There we go: we've taken this image from this part of the video and used cv2 to draw the bounding boxes of all the labels provided in our dataset.
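A sketch of that single-frame annotation step. The DataFrame name, the frame-index column name, and the box2d column names are assumptions based on the narration and the dataset's label schema; img is the frame pulled out above.

```python
# Assumed column names (frameIndex, box2d.x1, ...) based on the label schema.
frame_labels = video_labels.query("frameIndex == 1035")

for _, d in frame_labels.iterrows():
    pt1 = (int(d["box2d.x1"]), int(d["box2d.y1"]))
    pt2 = (int(d["box2d.x2"]), int(d["box2d.y2"]))
    # OpenCV colors are (blue, green, red), so (0, 0, 255) is a red box, 3 px thick.
    cv2.rectangle(img, pt1, pt2, (0, 0, 255), 3)
```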
Now we want to scale this up and do it for every frame we have labels for, but before we do, let's make it look a little nicer by coloring the boxes differently based on the label of each object. If we look at d, we have a category field that tells us whether it's a pedestrian or a car, so let's give each category its own color. I'm just going to bring in a mapping dictionary I created before. To show you the unique values: we have car, truck, pedestrian, and other vehicle, and we give each of them its own unique color, which I've added here. To display colored by category, we can take everything we used before and run it again down here, but instead of the color always being red, we set the color to the value in the color map dictionary for that row's category, replace the red with it, and see how this looks. There we go: all the objects now have slightly different colors. You can see blue looks like it's for pedestrians, green is for a bike, and red is for vehicles.

Another thing I'll quickly show you is that OpenCV can also add text labels. It's pretty busy in this image, with a lot going on, so the text labels end up a little hard to read, but the way we add them is similar to adding the rectangle: cv2 has a putText method that we run on the image. Let me put this down here as an example; we'll rename this image "example". You also need to pass in the font, and there are a few that cv2 gives us; we're going to use Hershey Triplex and see how it looks. We also give the size of the font and the color of the font, which we'll keep the same as the color of the bounding box. If you look closely, you can see that each box now has a text label. It's a little hard to read, as I said, but you get the idea: if you wanted the labels shown with each of the boxes, that's definitely possible.

All right, now to the fun part: we're going to label and output an annotated video. We're going to create a brand new video with all these annotated images and then show it. Like before, when we were running this annotation code, what we're going to do is abstract it out into its own function. It pulls, from the video labels, the labels for the most recent frame we have labels for, and then adds the rectangles using the colors from the color map we created before, and we can run this function on each frame as we iterate over it. (One thing I'm realizing is that we want to release the video capture object.)

So how do we output a new video from the images we have? Similar to VideoCapture, cv2 has something called a VideoWriter, so we can call cv2.VideoWriter to create a writer object, and it takes a few different things. We're going to call the output out_test.mp4. It also needs the codec we plan on using, so we'll use an MP4 codec and write that down here. It takes our frames per second, which we define here, and it takes the width and height of the video we plan to write. You need to make sure those are correct, because if you try to create a video with the wrong width and height for the images you're writing, it's not going to work.
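Before wiring up the writer loop, here is a sketch of the per-category colors and the annotation helper described above. The color values are illustrative, and the category names, column names, and helper name are assumptions rather than the notebook's exact code.

```python
# Illustrative BGR colors per category (not the exact values from the notebook).
color_map = {
    "car": (0, 0, 255),
    "truck": (0, 0, 100),
    "pedestrian": (255, 0, 0),
    "other vehicle": (0, 255, 0),
    "bus": (0, 255, 255),
    "motorcycle": (0, 165, 255),
}

def add_annotations(img, frame, video_labels):
    """Draw labeled boxes (and category text) for this frame onto img."""
    # Labels only exist at 5 Hz, so use the most recent labeled frame at or before this one.
    labeled = video_labels.query("frameIndex <= @frame")
    if labeled.empty:
        return img
    last = labeled["frameIndex"].max()
    for _, d in labeled.query("frameIndex == @last").iterrows():
        pt1 = (int(d["box2d.x1"]), int(d["box2d.y1"]))
        pt2 = (int(d["box2d.x2"]), int(d["box2d.y2"]))
        color = color_map.get(d["category"], (255, 255, 255))
        cv2.rectangle(img, pt1, pt2, color, 3)
        cv2.putText(img, d["category"], pt1, cv2.FONT_HERSHEY_TRIPLEX, 0.75, color)
    return img
```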
If we remember from before, these were our frames per second, our width, and our height. We also pull the total number of frames in the video, and we loop over it like we did before. First we create our video capture object so we can iterate over the images in the existing video, and then we write "for frame in" the range of the number of frames (and I'm actually going to use tqdm here so we can track the progress). We call cap.read() to get the return value and the image, and if the return value is False we break out of the loop. Otherwise we add the annotations like we did before, which takes the image, the frame number, and our video labels. And then here's the key part: we take our "out" object, which is our VideoWriter (oh, we actually needed to call it out), and we simply write the new image to it. Once it's done iterating we do a few things: we release the out file, and we take the capture object and release that as well. I do need to add the total to the tqdm call to make it display correctly, which means we need to read the number of frames after we create the capture object. And there we go: it's running through, looping over each image, adding our annotations, and saving them to this out_test file.

Okay, now that it's done running, if I do an ls (with my usual flags for a colored listing, just my way of displaying the files in this root directory), we now have this out_test.mp4 file, and it's 72 megabytes. Before we can view it, we need to use ffmpeg again to convert it into a compressed MP4. I'll paste in the subprocess command that converts out_test into a compressed version of the MP4 that we can then view. We can see it running here, and like last time I can re-run it afterwards with the output suppressed. There, it's complete; you can see the process finished. I'm going to remove this out_test_compressed file and add the loglevel quiet option to the ffmpeg call so we don't see all that output when I rerun the notebook. There we go, it completed, and we can see the compressed version is here, a good bit smaller.

The final thing we want to do is display the video, using IPython display on out_test_compressed.mp4, and let's do width=600 again. Now if we play this video, we can see that the labels from our label dataset are overlaid on the video. It might look a little bit like it's not smooth; remember, that's because the labels are at 5 hertz while the video is at about 60 hertz, so five times a second the video updates the labels and shows us all these objects. So this is cool, and that's the last thing I wanted to show you about how to work with video data in Python.

Thanks so much for watching this video on how to work with video data in Python using OpenCV. I hope you learned a lot and that it encourages you to explore some video data. It's a really awesome field, and there's a lot you can do with machine learning once you have a basic understanding of how to interact with video files. Again, if you enjoyed this video, please like and subscribe, and I'll see you in the next one.
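For reference, here is a sketch of that final write-and-compress step, assuming the add_annotations helper and variables from the earlier sketches. The mp4v fourcc and the -crf value are reasonable defaults, not necessarily the exact arguments used in the notebook.

```python
# The writer needs a codec, the frame rate, and the (width, height) of the frames it will receive.
fourcc = cv2.VideoWriter_fourcc(*"mp4v")   # a common codec choice for .mp4 output
out = cv2.VideoWriter("out_test.mp4", fourcc, fps, (width, height))

cap = cv2.VideoCapture(output_file)
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

for frame in tqdm(range(n_frames), total=n_frames):
    ret, img = cap.read()
    if not ret:
        break
    img = add_annotations(img, frame, video_labels)
    out.write(img)          # append the annotated frame to the new video

out.release()
cap.release()

# Re-encode to a smaller, notebook-friendly mp4 before displaying it.
subprocess.run(
    ["ffmpeg", "-loglevel", "quiet", "-i", "out_test.mp4",
     "-crf", "28", "out_test_compressed.mp4"],
    check=True,
)
ipd.Video("out_test_compressed.mp4", width=600)   # displays the player in a notebook cell
```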
Info
Channel: Rob Mulla
Views: 59,181
Keywords: video data in python, video processing opencv python, opencv tutorial for beginners, computer vision basics, rob mulla, computer vision on video, video processing for machine learning, machine learning video data, how to open video in python, how to read video in opencv python, edit video in python, video editing python, opencv python tutorial, computer vision course, image processing python, learn opencv, video analysis, computer vision, how to open video file in python
Id: AxIc-vGaHQ0
Length: 32min 4sec (1924 seconds)
Published: Thu Jun 09 2022