Real-Time Object Detection in 10 Lines of Python Code on Jetson Nano

Captions
Hey everyone, Dusty from NVIDIA here. Today we're going to code real-time object detection in Python from a live camera stream running on Jetson Nano, NVIDIA's $99 AI computer for deep learning inference and computer vision. Jetson Nano is NVIDIA's smallest and lowest-power embedded system, with an integrated GPU on board with 128 CUDA cores and half a teraflop of performance in 5 to 10 watts of power. It's easy to use and runs lots of different neural networks from popular machine learning frameworks like TensorFlow, PyTorch, and Caffe, plus a full Linux desktop with graphics acceleration. The Nano is a complete computer with a quad-core ARM CPU, 4 gigabytes of RAM, the CUDA GPU, a dedicated 4K video encoder and decoder, and lots of I/O, including hardware-offloaded camera capture, four USB 3 ports, a PCI Express M.2 slot, Gigabit Ethernet, a microSD card slot, and the typical low-level interfaces like I2C, SPI, UART, and GPIO.

Before we dive into the code, if you haven't already, let's get your Nano set up by following the Getting Started URL shown here, or you can skip ahead a minute or two in the video if you already have it up and running. Out of the box, the dev kit includes a preassembled Nano compute module, heatsink, and carrier board, ready to be plugged into your typical connections like HDMI or DisplayPort, USB devices, and a 5-volt power supply. For the power supply you can use a 2.5-amp micro-USB charger or a 5-volt 4-amp DC barrel jack adapter; listed on the page are some recommended power supplies that work well. Next you'll flash your microSD card with NVIDIA's JetPack image, which contains the Ubuntu OS along with the Linux for Tegra kernel and NVIDIA's tools like the CUDA toolkit and deep learning libraries like cuDNN and TensorRT. You can flash the JetPack image to your SD card from a Windows, Mac, or Linux PC with the Etcher tool or similar, which is covered in the instructions here. Then, after plugging in an HDMI or DisplayPort monitor and a USB keyboard and mouse, you can take the SD card and plug it into the SD card slot on your Nano, plug in your power adapter, and the Nano will boot up automatically. On first boot you'll need to do some initial configuration steps for Linux to create a new user and set up your networking connection.

So now, on to the tutorial part of the video. We're going to be using a GitHub repo called Hello AI World; the URL will be shown here and is also in the description below. This repo includes a number of different deep learning components for inferencing, which is the runtime component of deep learning after the network is trained, including image recognition, object detection, and semantic segmentation. Hello AI World includes a runtime library in addition to a number of examples, in both C++ and Python, that you can use to deploy your own applications. The main capability we'll be focusing on today from the Hello AI World tutorial, which you can run at any time on your own, is object detection. The tutorial comes with a number of pre-trained networks that you can load on the Nano, which are downloaded when you download the repo. The one we'll mainly be using is the SSD-MobileNet-v2 model, which was trained on 91 different types of object classes using the MS COCO dataset. SSD-MobileNet-v2 is the default model that's loaded, although there are a number of different ones you can use as well. So after you go to the GitHub page and browse around a little bit, go to the Building the Project from Source page, which details how to download the repo and build it on the Nano.
If you already have this installed you can skip ahead a few minutes, but we're going to go through the steps to install it here. First we'll install a couple of prerequisite packages through the apt package manager, things like git and cmake. You'll want to do a sudo apt-get update first, and once that's done, install git and cmake; if you want to use Python 3.6, you'll also install libpython3-dev and python3-numpy. Again, all of these instructions are covered on that page we just showed.

After you have installed those packages, you're going to clone the repo. The address is github.com/dusty-nv/jetson-inference; jetson-inference is the name of the underlying library that the Hello AI World tutorial is built on. Cloning it also brings down a bunch of documentation and test images that we'll be able to use during this example. After it's all cloned, cd into the jetson-inference directory, make a build directory that the whole package will be built to, cd into that, and run the cmake step, which configures the whole thing. You'll see a screen pop up that lets you select which pre-trained networks you want to download. Generally it's okay just to select the defaults, which are the ones covered in this tutorial; SSD-MobileNet-v2 is one of those pre-selected networks, but you can select a bunch of others if you want, like the image recognition and other object detection networks, and there are a bunch of segmentation networks included as well. It will automatically download these, and if a download fails it will retry the download for you. As those complete, it's going to ask if you want to install PyTorch; we can skip that step at this time, since PyTorch is used for training networks onboard the Nano, which we'll cover in a future tutorial. After it's done configuring, we'll run the make command, which builds the underlying C++ library in addition to the Python bindings that let us interface with the C++ code from Python (we sped that up a bit so this video isn't so long). Then run sudo make install, which installs it onto your system, and one more thing to run is sudo ldconfig, just to link all the libraries appropriately across your system.

After the whole library is installed correctly, cd into the aarch64/bin directory underneath jetson-inference. A bunch of applications get built there, in addition to test images that we'll be able to use during the tutorial to check that our detector is working appropriately. What we're going to do is process a couple of test images from the console, before we run the full live camera recognition, just to test that the library is working correctly and that the object detection network is producing valid results. Under the hood it runs the inference through a library called TensorRT, which accelerates inferencing on Jetson and other NVIDIA platforms; this program is written using TensorRT underneath, so you don't necessarily need to know that API. Let's look into this Python program: it basically loads an image from disk and runs it through the detectNet object detection API, which is a wrapper we've essentially written around TensorRT for you. You can see here where the image is loaded, as specified from the command line. Then we create the detectNet object from the jetson.inference library; that's what uses TensorRT underneath to get the real-time performance. Then we use the net.Detect function, which is implemented in the detectNet object and returns to us a list of detections from the image. We can save out those detections, and the detections are also automatically overlaid on top of the image for us, including the bounding box information, the confidence values, and the label of the class that was detected.
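As a rough sketch, the core of such a console script looks something like the following with the Python API used in this tutorial. The actual sample in the repo does more thorough command-line handling, and the loadImageRGBA/saveImageRGBA helpers shown here are my assumption of the image I/O it uses; the script and file names are just examples.

    import sys
    import jetson.inference
    import jetson.utils

    # load the input image into shared CPU/GPU memory,
    # along with its width and height
    img, width, height = jetson.utils.loadImageRGBA(sys.argv[1])

    # load SSD-MobileNet-v2, pre-trained on the 91-class MS COCO dataset
    net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)

    # run inference; the bounding boxes, class labels, and confidence
    # values are overlaid onto the image automatically
    detections = net.Detect(img, width, height)

    # print out each detection that was returned
    for detection in detections:
        print(detection)

    # save the annotated image so we can inspect the results
    jetson.utils.saveImageRGBA(sys.argv[2], img, width, height)

You would invoke something like this from the aarch64/bin directory with an input and output file, for example: python my-console-test.py airplane_0.jpg output.jpg (the script name here is hypothetical).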
Okay, so generally the first time you run a model it'll take a couple of minutes to load while TensorRT does its optimizations, but when that's done you can navigate into the bin directory through your file browser, which is where the image we just processed is saved, and open it up to view the results. This is an image of a bunch of different pedestrians, and you can see here that the detector detected all four of them, and the confidence values are quite high. Let's now do a couple more test images. This next one has both a car and a human in it that are overlapping, so it serves as a challenging example. We run the detectnet-console program again on this different image, and when you view the output in the browser, you see it detects both the car and the human independently, with high confidence, even though there's a lot of overlap between them. The next example, called airplane_0.jpg, also has a lot of overlapping content, which is challenging; if we open up the output image, we'll see that it detected three different humans and the airplane, even though there was a bunch of overlap between them. In addition to humans, cars, and different types of vehicles, the SSD-MobileNet network, which was trained on the COCO dataset, covers a bunch of different types of animals, like cats, dogs, and a bunch of different zoo animals as well. So this is an image of a cat, and you can see it also detects it in the image here. Let's do another one on a dog; this one is called dog_0.jpg, and there are a bunch of images in each of these sequences that you can try. Let's check out the results of this one: it has a dog and its human in it, and you can see it detected them both, even though they were kind of overlapping. We'll do one more here; this one is a horse, horse_0.jpg. Checking out the results from this one, we can see that it's a horse, and it detects the rider on top of it, which is actually pretty challenging because it wasn't really trained on images of people riding horses per se.

Okay, so now that we've built the repo, tested the code, and processed some test images, we're going to proceed to actually coding our real-time object detection program for the camera. Go to the Coding Your Own Object Detection Program step; you can follow along on the GitHub page, where every step we're going to do is also covered, which should make it easy to follow. The first thing to do is open up your text editor of choice; here we're just going to use Ubuntu's default text editor, gedit. Open a new file and save it as my-detection.py, or whatever you want, and save it wherever you like; in this case we're just saving it in the user's home directory. The very first thing we'll do, as you saw in the detectnet-console sample, is import the jetson.inference and jetson.utils libraries.
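Those first two lines of the script are simply:

    import jetson.inference
    import jetson.utils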
These are the Python bindings for the core C++ library from Hello AI World, which uses the TensorRT library underneath to accelerate the inferencing to real-time rates. The next thing we're going to do is create a detectNet object, just like we saw in the previous script, using jetson.inference.detectNet. Now, you could use a 'from ... import' statement here in Python so you don't have to type these all out, but for clarity about where these objects originate from, I just use the full module paths here. So we create the detectNet object, load the SSD-MobileNet-v2 model, and set the threshold to 0.5, or 50%, which is the default; we just set it here for clarity, so you know how to change it in the future if you need to. You can decrease the threshold and it will detect more objects, or increase the threshold and it will detect fewer objects, for example if you're getting lots of spurious detections or not getting enough detections. This 'ssd-mobilenet-v2' string corresponds to an entry in the table of pre-trained models that were downloaded with the repo, which we saw before; here is that table. You can swap this string out for the other models, like SSD-Inception-v2, which is a somewhat larger network, a little slower, but also more accurate. Here is also the API documentation for the detectNet object; this is where all of the parameters to the different detectNet functions are documented, and you can go in there and look at all the different options available to you.

The next thing we're going to do, now that we've loaded the object detection network, is create the camera object. This uses an object from the jetson.utils module called gstCamera, which uses GStreamer underneath to read from either a Video4Linux2 (V4L2) or a MIPI CSI camera. Example cameras that you can use can be found on the Jetson Nano wiki. The most common are generally the Raspberry Pi Camera Module v2, which is a MIPI CSI camera based on the IMX219 sensor, support for which is built into the JetPack L4T kernel, or common USB webcams like the Logitech C920, which is what I'm going to use in this example, because USB cameras typically have a cord, so you can move them around the room, test things out, and make it easier that way. There's also API documentation available for the jetson.utils module, and you can find the gstCamera documentation there, which shows all of the different options and functions available. Here we just specify the gstCamera object we're going to create: the first parameter is the desired width, followed by the height, followed by the device file for Video4Linux2. In this case my camera is on /dev/video1, so that's what goes here; or, if you're using a MIPI CSI camera, you just specify 0 or 1, the index of the MIPI CSI camera (in general you'll have one MIPI CSI camera plugged in, so it will be 0). You can list the cameras you have available with the v4l2-ctl --list-devices command, which lists all of the different Video4Linux2 cameras available in your system; this is how I found that my Logitech C920 camera was connected to /dev/video1. Then you can find which video formats are supported by using the --list-formats-ext option, which lists all of the valid resolutions you can set the camera to. Here's 1280 by 720 listed, which I've set in the code, and which is generally a good resolution to set on these USB or MIPI CSI cameras.
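In the script, those two steps look like this (the /dev/video1 device path is just what my camera happened to enumerate as; check v4l2-ctl --list-devices on your own system):

    # load the pre-trained SSD-MobileNet-v2 network with a 50% threshold
    net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)

    # open the camera at 1280x720; the third argument is the V4L2 device
    # file for a USB webcam, or the sensor index (e.g. "0") for MIPI CSI
    camera = jetson.utils.gstCamera(1280, 720, "/dev/video1")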
Next we're going to create an OpenGL display window for rendering the results of the overlays; that's just using the jetson.utils.glDisplay object, which is in that documentation we showed a minute ago. Then we're going to create a main application loop, which basically just loops forever until the display gets closed by the user. We'll use this display.IsOpen function, which returns true if the display window is still open, or false if it has been closed or exited by the user, which then terminates the program. The first command inside the main application loop is the camera capture command, which returns the image along with its dimensions from either the V4L2 or the MIPI CSI camera. This function blocks until the next frame is available; if the camera runs at 30 Hz, for example, it'll wait until the system has received the next frame. It then takes the raw format of the camera and converts it into floating-point RGBA on the GPU, so that we're able to use it with the neural network. Next we use the detectNet object to actually perform the detections; just like in the console example, it looks very similar: we pass in the image along with its dimensions, and it returns to us the list of detections. Here's the documentation for the Detect function; it also takes an optional string that specifies the format of the overlay, which can be bounding boxes, labels, confidence values, or any combination of those. By default it outputs the boxes, the confidence values, and the class labels, all on the same image. Each of the detections in the list that it outputs has a bunch of different members, including the bounding box, the confidence value, and the label, plus a bunch of different functions to use and manipulate those detections; those are all documented here. So if you want to build something custom of your own, you can interact directly with this list of detections. Next, we render the overlaid image out to the OpenGL window, using this display.RenderOnce function, which signifies to the window that we're just going to render this one texture; after rendering, it flips the back buffer, so you don't have to manually make another call to do that. The last line of the program updates the title of the window to include the current performance; we do that with this string formatting command, and we get the performance of the network via the net.GetNetworkFPS function, which uses the internal profiling mechanisms to report the frames per second the network is processing at. So that's it: it took all of ten lines of Python code. We imported the modules, loaded the network, created the camera and the OpenGL display, and then created our main loop, which captures an image, performs the detection, and renders it out to the display.
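Put together, the whole my-detection.py script reads along these lines (mirroring the steps described above; adjust the camera device to suit your setup):

    import jetson.inference
    import jetson.utils

    # load the object detection network
    net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)

    # create the camera (USB webcam on /dev/video1) and the display window
    camera = jetson.utils.gstCamera(1280, 720, "/dev/video1")
    display = jetson.utils.glDisplay()

    # main loop: runs until the user closes the window
    while display.IsOpen():
        # capture the next frame as floating-point RGBA on the GPU
        img, width, height = camera.CaptureRGBA()

        # detect objects (the overlay is drawn onto the image for us)
        detections = net.Detect(img, width, height)

        # render the frame and report the network's frames per second
        display.RenderOnce(img, width, height)
        display.SetTitle("Object Detection | Network {:.0f} FPS".format(net.GetNetworkFPS()))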
So now we can run the application and play around with it, detecting a bunch of different objects that you might have at your desk, in your household, or in your office space. cd your console back to where you stored the script, in this case the user's home directory, then launch it by running python my-detection.py, which loads the script with Python; it'll then load the network like before. Okay, here we go: it's detecting me and a desk chair in the background, running at 22 frames per second on the Nano. You can generally expect between 22 and 25 frames per second out of the SSD-MobileNet-v2 model on the Nano. It also detects a bunch of different types of animals, cats and dogs, so if you have furry friends at home you can play around with it. Here are some other zoo animals that it detects: an elephant, a bear, a horse. Funny enough, these are just little animal figures, but it detects them as the real objects. Here are some Matchbox cars that it detects as real cars, with high confidence values as you can see. It detects a bunch of different kitchen objects as well, cups and bowls that you might have around, including different stuff that you might have on your desk, like keyboards, mice, displays, and laptops. So here it's looking at my dirty desktop; here's the Nano that all of this is running on right now, and you can see I actually have a MIPI CSI camera attached to it. Here's a laptop that it's detecting on my workbench, and I actually have another laptop that I use for training networks. So it can do a lot in real time on the Nano, all in that little script that we wrote.

That concludes the object detection portion of the tutorial, but we hope you follow the full Hello AI World, which includes image recognition and also segmentation. The image recognition portion walks you through how to classify images with networks like GoogleNet or ResNet-based networks, and like object detection it includes a bunch of pre-trained networks; you start by processing images from the console and then do a similar live camera demo, in either Python or C++ depending on your preference. Then you can follow up with the semantic segmentation step, which is a lot like classification except that it classifies at a per-pixel level, and there are a bunch of cool pre-trained models included for that, for things like self-driving or off-road robot navigation. This is the Cityscapes model, a popular semantic segmentation dataset; this is an off-road trail dataset it was trained on called DeepScene; there's also a multi-human parsing model included for doing things like pose estimation; and one of the classic semantic segmentation datasets is called Pascal VOC, which includes 21 different types of objects. All of these models are actually 21 classes, except for DeepScene, which is five. The final model is based on the SUN RGB-D dataset, which has a bunch of different indoor scenes from office spaces, homes, bedrooms, and kitchens, and is really cool if you're working indoors.

So that's the Hello AI World tutorial. We'll be adding more features and hopefully doing more videos on Hello AI World walking you through the steps, but you can also follow the tutorial at your own pace. There are a bunch of other resources to help you get started with the Nano here on the dev kit page, including a bunch of different projects that have come from the community, like the JetBot, an autonomous open-source robot built around the Jetson Nano, and others like the JetRacer, an RC race car that can be trained to drive autonomously using the Nano, plus a bunch of different projects that the community has contributed. You can use those as inspiration for how to build your own applications on the Nano. If you need any help, there is also a community support forum for all of the Jetsons, including the Nano; you can find a link to it right here on your Jetson desktop. Thanks for joining us for the video; we hope that you'll try out the Hello AI World tutorial, along with a bunch of the other Jetson Nano projects available, and have a lot of fun, and we'll see you next time.
Info
Channel: NVIDIA Developer
Views: 407,031
Keywords: NVIDIA Jetson Nano, Python tutorial, Real-Time Object Detection, AI Object Detection
Id: bcM5AQSAzUY
Length: 26min 18sec (1578 seconds)
Published: Fri Jan 31 2020