Jetson AI Fundamentals - S3E2 - Image Classification Inference

Captions
Hey everyone, Dusty from NVIDIA here. In this video from the Jetson AI Fundamentals series, we're going to be covering image classification on Jetson Nano using deep neural networks, and then writing our own Python script that does recognition. Then we'll show real-time classification on a live video stream from a camera and play around with objects that you might have around your house or on your desk. So with that, let's get started.

So navigate your browser to the jetson-inference GitHub repository; the URL is shown below and in the description. What we're going to start with first is the "Classifying Images with ImageNet" section from the Hello AI World tutorial. What image classification does is return the object class that best represents the entire image, as opposed to object detection or segmentation, which can provide multiple objects per image. Detection, for example, can give you bounding boxes of multiple independent objects per image. But the basic building block of that is classification, and those networks serve as the backbone of the more complicated architectures, like those for object detection and segmentation.

The models that we'll be using in this example are GoogLeNet, ResNet, and a bunch of other pre-trained models that come with the repo. The ones that are automatically downloaded by default are GoogLeNet and ResNet-18, but if you like these other models you can download those as well. All of these pre-trained models are trained on the 1000-class ImageNet dataset, the ILSVRC benchmark, and you can see all the objects that these models were trained to detect here. There are lots of different animals, different animal species, different types of food, household objects, different types of vehicles. There's really a lot in here, almost too much. That's why in a later video we're actually going to be retraining our own classification model on a subset of this data, or on data that you collect yourself for custom objects that you might want to
detect. But when in doubt while running the demos here, consult this list to see if what you're trying to recognize is actually in the list of objects that the network was trained on.

Okay, so what we're going to do first is essentially run the image classification network on a bunch of different test images, just to familiarize ourselves with what it does and how it works. To do that we're going to be using this imagenet program. It's a sample that the repo comes with. Later we'll make our own sample that's boiled down and go through the code line by line, but the one that comes out of the box with the repo is shown here. Let me zoom in a bit so you can see the code more easily. Essentially it just loads the network from disk. By default this would be GoogLeNet, but you can pass in ResNet or any of those other ones from the pre-trained list, or even your own custom model. After the network is loaded, it creates video input and output streams, and then in a while loop it processes each of those frames, classifies it, gets the textual description of the class, and in this example overlays that text on top of the image and then saves the image back out or renders it to the screen.

Let me get this back to the normal zoom. Okay, so if you haven't already, fire up a terminal window and navigate to your jetson-inference directory, and then we're going to start a Docker container if you aren't already running one. So run the docker/run.sh script, and once that's started, cd into build/aarch64/bin. This isn't strictly necessary, because you can run these programs from wherever inside the container, but it does help in order to use autocomplete to find the test image files that we're going to be using. You can find those in your file browser here, in the jetson-inference data/images directory. So here's a bunch of example images that the repo comes with, and you can download your own images, capture them from a camera,
find them online, things like that, for testing. There's a bunch of examples that are already shown in the repo, but let's do some different ones that aren't already covered. These object ones look interesting, and we're actually able to process them all at the same time without having to run the program multiple times. In order to do that, we will just run the imagenet program and pass in this path, images/object, with a wildcard so it processes all of those, and then we're going to save the output to the images/test directory. I'll explain why we use that specific directory in a moment.

What it's going to do now is load the network, which is optimized with TensorRT. The very first time you start up a particular model, it's going to take a couple of minutes to run optimizations on the model, but after that it saves the result to disk and loads nearly instantly. After it does the optimizations and loads, it's going to process all those images that we passed in. Even though this is running in the Docker container, we can go to the test directory we saved those to from the host and actually view them, because this particular directory is mounted into the Docker container, meaning any changes here are reflected inside the container and vice versa. That makes it easy to browse these with your file browser, as opposed to trying to do that from inside the container. We have a bunch of different objects that it's classified. It thinks these are mopeds, but pretty close. Trains, things like that.

Let's see what else we can play around with here. Here's a bunch of different cats that the repo comes with, so let's try that: images/cat, using the wildcard again. I will mention that when you're using the wildcard, it's important to enclose it in quotes, because otherwise the terminal will automatically expand it. And so save it back out as cat_%i. The %i just means that's the index of the image that gets
processed, so that'll be expanded to be 0, 1, 2, 3. You can insert whatever pattern there that you want to use. Okay, let's check how these came out. It says it's a tiger cat. I think that's a small cat breed that looks like a tiger; you can kind of see why, it's got orange and black stripes on it. Tabby cat, Persian cat, and lynx. An interesting thing about the ImageNet dataset is that it has lots of different species and breeds of cats and dogs and bears and all types of different animals, so the network is actually able to tell these different species or breeds of cats apart from each other, which is quite interesting.

Okay, so now that we've played around with the sample a bit and processed a bunch of test images, we can see how to run this on a real-time video. The only difference in how we'll run it is that instead of passing in the filename of an image, we'll pass in the filename of a video. The types of videos that it supports loading are listed here on the Camera Streaming and Multimedia page. It supports MP4, MKV, AVI, and FLV with these codecs here, like H.264, H.265, and MPEG. These samples can actually do a wide variety of multimedia streaming, like from video files on disk, or the sequence of images that we just did, or USB cameras, MIPI CSI cameras, or even network video streams. Those are all documented here: how you run with those and how you use them. If you're using a compressed format, like this video we're going to load, it actually uses the Jetson's hardware video codec to do that decompression in hardware. Likewise, you can save out a video that gets compressed using the Jetson hardware, so it's not using the CPU for that.

Okay, so there's a test video that's linked to here on the GitHub page. Let's download that quickly, and then we can run it with the imagenet program like we did before. Same kind of thing, but this time we're just going to pass in the filename of this video file instead of the image. After it starts loading the video, we'll see the
screen pop up, and in this case it's running at 80 frames per second and processing all of these frames independently. This is just a real simple video of a bunch of jellyfish floating around, but you can supply your own videos, record videos and then show them. Later we'll show how to run it on a live camera stream, which is fun to play around with.

But next, what we're actually going to do is code our own recognition program in Python. So go to the next step in the GitHub tutorial here, and this is what it's going to look like. It's very simple, pretty similar to the previous example that we just ran, except with this one we're going to go through each line of code. It's going to load an image from disk and then print out the classification result.

In order to set this project up, what we're going to do is exit the Docker container, which you can do by pressing Ctrl+D or typing exit. The reason we're going to quit the Docker container and then later restart it is that we want to store this program on the host and mount it into the Docker container, so that we don't lose our program after we edit it and shut down the container. Any files that you save inside the container are going to be lost when you shut it down, unless you make a checkpoint or something like that, but oftentimes it's just simpler to mount a directory from your host device, the Jetson itself, into the container environment.

So let's make a directory in our user's home, and I'm just going to call mine my-recognition. cd into that and create a blank file with the touch command, my-recognition.py. Then there are some example images here that you can download right from the repo, or you can supply your own test images if you want. I think these are different kinds of bears. Okay, so download those and let's check that these are good to go. Okay, so now we're going to go back and fire up the container again, this time mounting our
my-recognition directory into the container. In order to do that, when we run the Docker run script this time we're going to use the --volume argument. What you do first is pass in the directory on the host that you want to mount, then separate it with a colon, and then it's the directory inside the container that you want it mounted to. You'll see here when it starts this time that the user volume flag is showing up. For sanity purposes, let's make sure we can navigate to this directory, and there are all the files, but this time it's being run from inside the container.

Okay, so now we're going to start editing the code, and we'll go through this line by line. You could edit this from within the container itself, but oftentimes it's just easier to open it in your text editor of choice from the host. The first thing we're going to do is import the jetson.inference module. That's a Python wrapper around the C++ version of the jetson-inference library, and because of that it has very fast performance. You should see the performance of jetson-inference in Python be almost identical to the C++ version, because it's actually implemented under the covers with C++, TensorRT, CUDA, and things like that. Then also import the jetson.utils library, which does all the video I/O, display windows, and things like that, whereas the jetson.inference library just does the image recognition, object detection, segmentation, and the deep neural network primitives.

Then we're going to import the argparse module and do some basic command-line parsing. Actually, we can just copy that code from the GitHub page, because it's just boilerplate. It takes in a filename and an optional network, the default of which is googlenet. Then we're going to load the image, so just do image equals jetson.utils.loadImage and pass in our filename argument. This loadImage function from jetson.utils is a bit different from other image loaders that
you might have worked with in Python, in that the image is loaded directly into GPU memory, so you don't need to do that extra step, and because of that it's a bit faster. Next we're going to load the network, so net equals jetson.inference.imageNet, and pass in this optional network argument. I manually write out the module names here, but if you prefer you can use an import statement and skip that. I just like to be explicit in this tutorial about which module those objects are located in.

Next we're going to actually classify the image. This is going to give us back the class index that the network thought the image represented, and also the confidence value of that, between zero and one, with one meaning a hundred percent confident. We use the net.Classify function and pass in our image. If you're curious about the different functions and objects that are available, you can consult the Python reference documentation that comes with the repo. So here's the imageNet Python object and all the different parameters that you can load it with, and the different functions like Classify and what objects and parameters those take. There's also C++ documentation if you prefer to use that.

Okay, so we got the class index and the confidence back, and in order to get the class name itself we're going to use a mapping function that looks up that big list of objects and returns the text from just the index. So class name equals net.GetClassDesc, and pass in the index. The index in this case will be between 0 and 999, since these models have a thousand different object classes in them, and it just looks that up in the big list of objects and gives you back the textual description. Next we're just going to do a real simple print statement of what the results were. So print: we're going to print out the string, the class ID, and the confidence. Let's format that: class name, class index, and the confidence. Okay, so that's
it. It's a simple example, but you can take this and build on it from here, outputting the value to whatever you're working on, like a project using GPIO or servos or things like that, to have this actually go and control something. But for illustrative purposes we're just going to output the result to the console. In order to run this, do python3 my-recognition.py, and then let's try this black bear image. Okay, so the result was that it thinks it's 98% black bear. Let's try the next one here, which is the brown bear. Run it again, this time specifying the brown bear image instead. This result was 99.9% confident that this is a brown bear. So again, kind of like the cat example, it's interesting that it's able to tell different species apart from each other, as opposed to just a high-level thing like "that's a bear." It can actually tell that's a black bear versus a brown bear versus a polar bear. So let's try it on this polar bear example too. In this case it's a hundred percent confident that it's a polar bear.

Okay, so now that we've covered how the code looks, how it works, and how we classify an image, let's test on a real-time camera stream and play around with a bunch of different objects that you might have around your house. I have some here on my desk. In order to do that, we're just going to run the imagenet program again, but this time pass in an identifier of your camera. You can find the types of cameras that are supported here on the Camera Streaming and Multimedia page again. I'm going to be using a Video4Linux2 (V4L2) camera, a Logitech C920, but there are lots of different cameras that are supported. You can use a MIPI CSI camera, or you can use an RTP or RTSP video stream. Generally the USB webcams are plug and play and really easy to get going. My camera is mounted at /dev/video0, which is a V4L2 device, so that's going to be the path that I pass in to the imagenet program. So that's going to fire up, and this is running on my
desk here. I have a couple of different objects that we can try. There's an apple that it actually picked out as a particular type of apple, Granny Smith. Some kids' toys from my kids. Let's see if it can recognize this hammerhead shark; my hand was in the way there. It's interesting to play around with it and see what confuses it. We'll see later, when we retrain our own model, that this type of experimentation is very useful for learning how the network responds and what type of additional training data you might need to gather. Also, it's kind of fun just to point it at your desk, because it can do TVs and common items that you might be working with. Let's see if we can do laptops, things like that.

Okay, so that concludes the video on image classification inference. In a later video we're going to retrain our own image classification models and also do object detection and other types of deep neural networks. So thanks for joining us, and we hope to see you next time. To learn more, visit nvidia.com/dli or email us at nvdli@nvidia.com.
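
The Python program coded in the middle of the video can be sketched roughly as follows. This is a reconstruction from the spoken walkthrough, not the exact file from the repo. The helper names `build_parser`, `format_result`, `classify_file`, and `main` are illustrative and do not come from the video, and the `jetson.inference` and `jetson.utils` imports are deferred into `classify_file` (an assumption made here so the parsing and formatting helpers can be tried on a machine without the Jetson bindings).

```python
import argparse


def build_parser():
    """Command line: an image filename plus an optional network name."""
    parser = argparse.ArgumentParser(description="Classify one image with imageNet")
    parser.add_argument("filename", type=str, help="path of the image to classify")
    parser.add_argument("--network", type=str, default="googlenet",
                        help="pre-trained model to load (default: googlenet)")
    return parser


def format_result(class_desc, class_idx, confidence):
    """Format the class name, class index, and a 0..1 confidence as one line."""
    return "image is recognized as '{:s}' (class #{:d}) with {:.2f}% confidence".format(
        class_desc, class_idx, confidence * 100.0)


def classify_file(filename, network="googlenet"):
    """Load an image and a network, then return (description, index, confidence)."""
    # Deferred imports (illustrative choice): these modules only exist where
    # the jetson-inference Python bindings are installed, e.g. in the container.
    import jetson.inference
    import jetson.utils

    img = jetson.utils.loadImage(filename)     # loads the image into GPU memory
    net = jetson.inference.imageNet(network)   # slow the first time a model is optimized
    class_idx, confidence = net.Classify(img)  # best class index, confidence in [0, 1]
    class_desc = net.GetClassDesc(class_idx)   # text label for that index
    return class_desc, class_idx, confidence


def main(argv=None):
    """Parse arguments, classify the image, and print the result."""
    args = build_parser().parse_args(argv)
    print(format_result(*classify_file(args.filename, args.network)))
```

To use this as my-recognition.py, add a call to `main()` at the bottom of the file and run it from inside the container with python3, passing an image path and optionally `--network` to pick another pre-trained model such as resnet-18.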
Info
Channel: NVIDIA Developer
Views: 18,175
Id: QatH8iF0Efk
Length: 22min 52sec (1372 seconds)
Published: Mon Nov 02 2020