Jetson AI Fundamentals - S3E5 - Training Object Detection Models

Video Statistics and Information

Captions
Hey everyone, Dusty from NVIDIA here. In this video from the Jetson AI Fundamentals we're going to train our own object detection networks using PyTorch onboard Jetson Nano, then we'll collect our own datasets and test them on a live camera stream. So with that, let's get started.

Go to the jetson-inference page on GitHub at the URL shown here and go down to the "Re-training SSD-Mobilenet" page under object detection training. You might remember from the previous video on object detection inference that SSD-Mobilenet was the network architecture we used with the pre-trained MS COCO model. We're going to use that same architecture again, but retrain it on a couple of custom datasets.

The first one we're going to try is the Open Images dataset, which we can download online. Open Images has a lot of different object classes in it, over 600 different types of objects that you can pick and choose from, and you can browse them all here in a big list to explore what type of source material you're dealing with. There's also a searchable list included here on the GitHub in case you're wondering whether a particular object class is included in the dataset. What I'm going to do is download a bunch of different types of fruits and retrain the model on that, but you're welcome to pick whichever classes you're interested in if you have a particular application in mind or just want a particular kind of model.

There's a script included with the project, the Open Images downloader, that will automatically download the images for particular classes for us. So let's fire up a Docker container: cd into your jetson-inference directory and use the docker/run.sh script. Then we'll go into the python/training/detection/ssd directory, which is where all of these PyTorch training scripts and utilities live. What we'll do is pass a list of class names to the Open Images downloader, along with where we want to save the data. You can just copy and paste this, and if you want to train on your own classes from the Open Images dataset, you can substitute those into this string here.

All right, so we'll kick this off. First it's going to download a bunch of annotation data, then it'll download all of the images for us. Okay, so it's done downloading. What I will say is that for certain object classes, Open Images has really a lot of data; images of people or vehicles, for example, can run to hundreds of gigabytes. When we're training on the Jetson Nano, you generally want to keep it under 10,000 images so the training time doesn't get too high. So what I recommend first is running the downloader script with the stats-only option, which prints out exactly how many images are in the dataset you selected. Once you know that, you can tell it to only download, say, 2,500 or 10,000 images. I think this fruits one I downloaded has 6,500 or so images in it. Depending on your disk space and how much time you want to spend training, you can limit the number of images it downloads that way; otherwise it could start pulling hundreds of gigabytes if you train a model of people, for example, because the Open Images dataset really is quite large. The gist of these download steps is sketched below.
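A rough sketch of that download workflow, based on the jetson-inference re-training docs; the exact flag names and the fruit class list are worth double-checking against the repo:

```bash
# on the host: start the jetson-inference container
cd jetson-inference
docker/run.sh

# inside the container: go to the SSD training scripts
cd python/training/detection/ssd

# first, just report how many images the selected classes contain
python3 open_images_downloader.py --stats-only \
    --class-names "Apple,Orange,Banana,Strawberry,Grape,Pear,Pineapple,Watermelon" \
    --data=data/fruit

# then download, capping the total image count to keep training time reasonable
python3 open_images_downloader.py --max-images=2500 \
    --class-names "Apple,Orange,Banana,Strawberry,Grape,Pear,Pineapple,Watermelon" \
    --data=data/fruit
```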
Okay, so now that we've got our dataset, we're going to use the train_ssd.py script, which uses PyTorch to retrain the model for us. So we'll run python3 train_ssd.py and first pass in the directory we saved the data to, which was data/fruit; that directory is mounted from the host into the container, so the whole dataset is saved for us. Then we tell it where we want the trained models to live, models/fruit, another directory that's mounted from the host into the container so all of our models get saved. Then we can set the batch size. The default batch size is 4, but since we're running on Jetson Nano, which has a little less memory (like the Jetson Nano 2GB), and object detection training is very heavy on memory usage, we're going to reduce the batch size to 2 and also tell it to use only one data loader worker thread. Next you can set how many epochs you want it to run for; normally that would be 30 or up to 100, but in the interest of time I'm just going to do one for illustrative purposes. All right, let's kick this off.

Okay, so it's done training. Next, like we did when we trained our image classification model, we want to export it from PyTorch to ONNX so we can load it into TensorRT and run it on a bunch of images. So we'll use the onnx_export.py script, which is very similar to the classification one, point it to the model directory we chose, models/fruit, and it'll export it to ONNX for us.

Okay, all done. Next we'll use the detectnet program like before, but with the extended command-line parameters that let us load a custom model into it. So we run detectnet, our model is models/fruit/ssd-mobilenet.onnx, the labels are models/fruit/labels.txt, and then we specify the input and output layers: input_0 for the input, and since the network has two different outputs, the confidence grid and the bounding box data, the output coverage and output bounding box layers as well. Then we're going to run this on a bunch of fruit test images included in the repo, using a jetson-inference/data/images/fruit_* wildcard, and save the results to the images/test directory so we can view them.

Okay, so we'll fire this up. All right, we see it start to run, and it looks like it's getting some detections; that's really good, our retraining worked. So let's go into the file explorer and actually view these, under data/images/test. Here they are, saved from the container to the host. This one's good, and in a bunch of these the confidence values are very high, which is a good sign. It's getting some small ones too. Interestingly, it did miss this strawberry that wasn't ripe yet, so apparently it's only trained on ripe strawberries. I've got some grapes, awesome. This is a good one with a bunch of different oranges in it, that's great. Here are two different types of fruit in the same image, another one with multiple types of fruit, and one with a lot of different fruits in it. Alrighty, it looks like it worked pretty well, and it was able to detect objects in a bunch of these test images. Again, you're welcome to play around, iterate on this, and download new classes depending on what you have in mind; the commands used in this section are sketched below.
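Roughly, the training, export, and test commands look like this. Again this is a sketch following the jetson-inference docs; the flag names, the /jetson-inference container path, and the fruit_*.jpg test image names are assumptions to verify against your checkout:

```bash
# inside the container, from python/training/detection/ssd

# re-train SSD-Mobilenet on the downloaded fruit data
# (only 1 epoch here for illustration; 30 or more is typical)
python3 train_ssd.py --data=data/fruit --model-dir=models/fruit \
    --batch-size=2 --workers=1 --epochs=1

# export the trained PyTorch checkpoint to ONNX for TensorRT
python3 onnx_export.py --model-dir=models/fruit

# run detectnet with the custom model over the bundled fruit test images
detectnet --model=models/fruit/ssd-mobilenet.onnx \
          --labels=models/fruit/labels.txt \
          --input-blob=input_0 --output-cvg=scores --output-bbox=boxes \
          "/jetson-inference/data/images/fruit_*.jpg" \
          /jetson-inference/data/images/test/fruit_%i.jpg
```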
But what we're actually going to do next is collect our own detection dataset that's totally custom, using the camera-capture tool that we used previously to collect our image classification dataset. It turns out this tool also has a mode for detection datasets and bounding boxes. So we'll fire up the camera-capture tool again and point it at our video device, /dev/video0.

Okay, to do a detection dataset as opposed to classification, you just change the Dataset Type drop-down to Detection, and a bunch more options show up. The GUI is documented on this page, so if you're wondering what all of the options below mean, you can figure them out there. First it wants the path to your dataset and the path to your labels file, so let's create a new directory for the dataset. We'll store it under python/training/detection/ssd in the data directory, so let's create a new folder. What I'm going to do is train this on a bunch of little toy tractors; a little tidbit about myself, I actually grew up on a dairy farm, so I have a bunch of these toys left over from when I was a kid. I'm going to call my dataset "tractors". While we're at it, let's make the labels file: call it labels.txt and open it in a text editor. My dataset is going to have four classes: a John Deere, "cat" for Caterpillar, a Case International, and a flatbed truck that I have. Save that out, then go back into the GUI and point it to the path we just created, data/tractors this time, and point it to the label file.

Okay, so we'll start putting the object in front of the camera, and there are a couple of ways you can use this tool. If you have live objects you're trying to annotate, like a person in front of the camera, generally you'd want to freeze the frame: press the Freeze button, which freezes the camera frame so you can start drawing bounding boxes over things. Once you draw a bounding box you'll see an entry appear here, and you can change its class with this drop-down. There are a couple of other options down here too. "Save on Unfreeze" automatically saves the image and annotation when you unfreeze it. "Clear on Unfreeze" clears all the bounding boxes when you unfreeze; I'm going to uncheck that, because the way I like to work is to move the object around and keep the bounding box, so I don't have to redraw it from scratch every time. Then there's the "Merge Sets" option, which duplicates the data between the train, val, and test sets. For a production model you don't want to use that, because you want to maintain complete independence between those sets, but for this test example it makes things go a lot faster to essentially have one dataset. Then you unfreeze the image, it saves, you move the object around a little, move the bounding box to the new location (you can resize them here), and press Save again. It's generally important to keep the bounding box tight around the object, and to get a lot of different viewpoints and orientations of the objects. If you're building a real dataset, you'd also want multiple objects on screen at the same time, collecting several of these at once, so you can draw multiple bounding boxes and set their classes. The setup for this part is sketched below.
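A minimal sketch of that setup, assuming the tool and paths from the repo; the four class names are inferred from the narration, so the exact strings in the video's labels.txt may differ:

```bash
# inside the container, from python/training/detection/ssd

# create a directory for the custom dataset
mkdir -p data/tractors

# write the labels file, one class name per line
# (class names here are placeholders inferred from the narration)
cat > data/tractors/labels.txt <<EOF
john_deere
cat
case
truck
EOF

# launch the capture tool on the USB camera, then set Dataset Type to
# Detection in the GUI and point it at data/tractors and its labels.txt
camera-capture /dev/video0
```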
In my example here I'm not going to do too many images that actually have multiple objects, because it really increases the number of images; I'm just going to collect 100 images per class, which should be enough for a test example. In reality you can get into several thousands of images pretty easily just by running through lots of different orientations, different numbers of objects on screen at the same time, different camera viewpoints, and different backgrounds. To make a really robust model you want several thousands of images, but for this test I'm just going to collect a hundred per class, so I'm going to get set up and collect a bunch of this data now.

Okay, so I've collected 400 images in my dataset, 100 for each object class, and we can shut down this tool now. What we're going to do is retrain the network the same way we did previously; the only difference is that we're going to specify the --dataset-type=voc argument, which just means the data is in Pascal VOC format, which you can see here. What that means is there are just a couple of different folders: the annotations, all of the images, and a list of the images for each of the train, val, and test sets. The annotations are just XML files that you can open and check out; each one tells you the image file name, the objects in the image, and their bounding boxes. This is referred to as the Pascal VOC format, which comes from the very popular PASCAL VOC dataset. It just turns out to be an easier format to work with than the Open Images format, which was designed for a very large dataset, so this one's much more manageable.

So we'll run training again, but use the dataset-type flag and point it to our new dataset instead of the fruits one we previously downloaded. Let's run python3 train_ssd.py with --dataset-type=voc, point it to our dataset, data/tractors in my case, and set the model output directory to models/tractors. Let's use the same batch size of 2 and the single data loader worker we used last time. I'm just going to train for one epoch, but you can do as much as you want, 30 or 100; generally these custom datasets that you create yourself are smaller and will train faster than ones you download online. Okay, fire that up.

Okay, so it's finished training. Next we export it from PyTorch to ONNX again: python3 onnx_export.py, pointed at the same model directory, models/tractors. Once this is exported, we'll be able to load it with TensorRT into our camera program. Okay, all done. Now we'll use a very similar command line to the fruits test, except we're going to specify a camera device instead of file names: detectnet with the model models/tractors/ssd-mobilenet.onnx and the labels models/tractors/labels.txt. What I will say about the labels is that it's important to use the label file that gets saved into the models directory when you run inference, because when you train the model with PyTorch it adds a BACKGROUND class as the first class. If you instead load the labels file from the dataset you originally created, which has no background class, your labels will be mismatched during inferencing. So when you're running this inferencing test, use the labels file that gets saved automatically into your models directory. Then specify the input and output layers, the input blob input_0 plus the coverage and bounding box outputs, and my camera is /dev/video0, so let's fire this up; the commands are sketched below.
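A sketch of the re-training and live-camera run, with the same caveat that the flags follow the jetson-inference docs and should be checked against your version of the repo:

```bash
# inside the container, from python/training/detection/ssd

# re-train on the Pascal VOC dataset collected with camera-capture
python3 train_ssd.py --dataset-type=voc --data=data/tractors \
    --model-dir=models/tractors --batch-size=2 --workers=1 --epochs=1

# export the checkpoint to ONNX
python3 onnx_export.py --model-dir=models/tractors

# run the re-trained model live on the USB camera; use the labels.txt that
# training wrote into models/tractors/, which includes the BACKGROUND class
detectnet --model=models/tractors/ssd-mobilenet.onnx \
          --labels=models/tractors/labels.txt \
          --input-blob=input_0 --output-cvg=scores --output-bbox=boxes \
          /dev/video0
```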
All right, so we got the truck there, that's a good start. I tried a bunch of different orientations here. All right, let's try a different one, the John Deere. Alrighty, very good. It's doing pretty well actually for only having 100 images per class; it's very certain about these objects. I guess if I did this again I would put my arm in the picture a couple of times, because it seems I didn't have any images with my hand in there, so you could always retrain it like that and it would be more robust against the hand. Okay, let's try this earth mover here. All right, very good. And the last one, let's try the Case. Perfect. Okay, so it's recognizing all of the object classes, which is what we wanted.

I mentioned previously that I wasn't collecting images with multiple objects per labeled image, just to keep the complexity down, because otherwise you could be there all day doing all the different combinations, but I'm curious to see if it can still handle multiple objects. Okay, so it's got two there. All right, it's got three, and it's able to do all of them, that's awesome. So even though it wasn't trained with all of these in the same image, it's able to get them just from its independent training, so it's quite robust actually.

Okay, another little experiment I'm going to try: I have a couple of other tractors, again a John Deere, a Caterpillar, and a Case, but they're different from the tractors I collected the data on and trained the network with. So let's see if it's able to adapt. Wow, okay, 100% sure. So it seems that since it was trained on images of a green tractor that we told it was a John Deere, when it sees other green tractors like that, it's 100% sure, and that's awesome. Okay, here's another one, a Cat bulldozer, but I trained it on a different type of Caterpillar. Sweet. Okay, let's try this other one. It's got that, and I actually have a little wagon for this guy. He even got the wagon on there, funny. Okay, let's see if this works with multiple objects. It's got that; let's see about the John Deere here, we're getting a little... there we go, awesome. It's really done quite well at identifying these. Obviously it thinks green tractors are John Deere, yellow are Caterpillar, and red are Case International, so it all depends on what you train it on, and you can train it on whatever types of objects you like. If you have a particular application or demo or robot in mind, you can train it to detect whatever you want.

Alrighty, well, that brings us to the end of this object detection retraining video. Thanks a lot for joining us, and we'll see you next time. I'm Dusty from NVIDIA; to learn more, visit nvidia.com/dli or email us at nvdli@nvidia.com.
Info
Channel: NVIDIA Developer
Views: 33,588
Id: 2XMkPW_sIGg
Length: 24min 33sec (1473 seconds)
Published: Mon Nov 02 2020