AI on the Jetson Nano LESSON 53: Object Detection and Recognition in OpenCV

Captions
Hello guys, this is Paul McWhorter with toptechboy.com, and we are here today with lesson number 53 in our tutorial series where you are learning artificial intelligence on the Jetson Nano. I'm going to need you to pour yourself an enormous mug of iced coffee, and I'm going to need you to get ready to learn some cool new stuff, so let's get out our Jetson Nano gear. As you're getting out your gear, as always, I want to give a shout out to you guys who are helping me out over at Patreon; it's your help and encouragement that keeps this content coming. For you guys who are not helping out yet, look down in the description for a link to my Patreon account and think about hopping over there and hooking a brother up. But enough of this shameless self-promotion; let's talk about what we are going to learn today: object detection and object recognition. This is where we're really getting into machine learning and deep neural networks, into the heart of artificial intelligence. Now, last week we did image recognition, and the obvious question is: what is the difference between image recognition and object detection? Image recognition is where, if you show the computer something like this nice little teapot, it knows it's a teapot. But image recognition really works in the context of a very clear picture with nothing else in it, where the network obviously sees what you're showing it and then tells you what object you put there. Right now you can see there's a microphone, all this stuff, my face in the background; image recognition wants a much cleaner view than that. That is neat and exciting, and we had some fun with it last week. But what we're
going to do this week is go beyond that to something much more powerful, and that is object detection. With object detection you look at a much more complex scene, like pointing the camera at the room, and in that very general view the network goes in and picks out the things it recognizes. Object detection is actually quite a bit harder, because it's a much more complicated problem: it's detecting more than one object at once, and one object might be partially in front of another, so it can get complicated. The other thing object detection does that image recognition does not: it not only recognizes the object but also detects where it is, so it gives you a location. Where image recognition just said "I see a teapot," object detection says "I see a teapot, and it's at this location." So it's really quite a bit more complicated. I should probably get out of your way here, so let me do that, and then let's look at something important. I did a search on "hello ai world," and the first result is dusty-nv/jetson-inference, "Hello AI World," so we're going to that NVIDIA Hello AI World page on GitHub. If we scroll down toward the bottom, we can see the two things we are doing. First is image recognition, which we did last week, where you can use different pre-trained models; I believe we were having the most luck with GoogleNet, and I hope you played around with different ones of these models. Remember, when I had you install the jetson.utils and jetson.inference libraries, I had you install all of these models so you can play around with them; it's in the network setup, where you load the jetson.inference model, that you specify which one you're using. Okay, in today's
lesson, we're not going to use an image recognition model; we are going to use an object detection model, and you have all of these different object detection models to play around with. I think we're using one of the MobileNet ones today. But don't use an image recognition model to do object detection, and don't use an object detection model to do image recognition. Also, notice it's the center column of the table that gives the keyword you use in your program to load each model; the other columns are just descriptors. Similarly, over on the object detection side, we need to use one of the names in the second column if we want it to work right. Okay, enough of this introductory nonsense; let's jump over and get ready to learn some cool new stuff. We'll open up Visual Studio Code; I believe we are working in the nvidia folder, and it looks like we've done four programs so far on deep learning, so we will create a new one, which is going to be deep-learning-5.py. That looks good, and we come up with a fresh new program. I'm going to move the file navigator out of the way so you have a nice clear view of what we're doing. Well, what do we need? We need to import jetson.inference; that's the library that's going to let us do the object detection. And, as we did last week, we import jetson.utils; the jetson utilities let you work with the camera, work with a display, and do some conversions, so we'll get both of those libraries. We like to do frames per second, so we need to import the time library, and our old friend cv2 is always a good one to have. Now, since we'll be calculating frames per second in our loop, we need to set up a timestamp to take our first measurement: timeStamp = time.time(). This just grabs the time at which we
started the program, and then the loop will keep updating it. We're also going to calculate a filtered frames per second, so we need to seed that with a value so it doesn't die the first time through; we'll just set fpsFilt = 0. All right, now for the important one: we're going to set up our inference network. We'll call it net, and that is going to be net = jetson.inference.detectNet(), and now we've got to tell it what detection network we are using. I'm going to use 'ssd-mobilenet-v2', which I think I just showed you in that table. Then we need to give it a threshold; sorry, I've got to turn this phone off or it's going to keep beeping. We need a threshold value, like: how confident does it have to be before it says it found something? Let's say a 50 percent confidence level, so we add threshold=0.5, and that keyword goes outside the quotes. Now we tell it the display width we want, which we will make 1280 (I had an extra parenthesis there; let me close it properly), and the display height, 720. Then we say cam = jetson.utils.gstCamera(); gst stands for GStreamer camera, I believe. We pass it the display width we want, the display height we want, and then for the Raspberry Pi camera we just put '0' in single quotes. Now, for you home gamers, I will copy this line and paste it: if you are using a webcam instead of the Raspberry Pi camera, I do believe you would want to put '/dev/video1', and it's very important to spell 'video' right. So
I think we'll just start by using the webcam, so I'll comment out the Raspberry Pi camera line; remind me to come back and make sure the code works with both cameras. Okay, now that we have set up the camera, we need to set up the display: display = jetson.utils.glDisplay(). This just sets up whatever display you're using; it could not get much simpler. So now we need to start doing the real stuff. We're going to create a while loop: while display.IsOpen(): with the colon outside the parentheses. While the display is open, what are we going to do? We grab our image, and we also grab width and height: img, width, height = cam.CaptureRGBA(). So we're using the Jetson utilities to grab the frame, do the object detection, and display the results. What we're going to see later is that you can grab the frame with OpenCV and do the displaying with OpenCV, but we're starting with the simplest possible program. That all looks good. Now we do display.RenderOnce(). Why "display"? Because that's what we named the display object up above; if I'd called it screen, then I would call it screen here. RenderOnce is kind of like imshow in OpenCV, and we pass it img, width, height: we render once the image we got, with its width and height. Now we calculate our frames per second, so we say dt (that's change in time) equals time.time(), the present time, minus timeStamp, which we set up at the top of the program, and
now we need to grab a fresh timestamp for the next time around, because we're measuring elapsed time, so we say timeStamp = time.time(), the present time. How much time elapsed between the last pass through the loop and this one? That's dt, which we just calculated. So frames per second is fps = 1/dt: if it takes a tenth of a second to go through the loop, how many times can you go through the loop in a second? Ten times; it's 1 over the elapsed time. And because this reading can bounce all over the place, we low-pass filter it, the way we have before: fpsFilt = 0.9*fpsFilt + 0.1*fps, which is 0.9 times whatever we had before plus 0.1 times our latest reading. What this means is we don't fully trust a new reading; we gradually change over time, and that takes all the noise out of your data. The low-pass filter is a very, very powerful tool. Now we're going to print it, rounded off. I'll make a string of round(fpsFilt, 1), so we have just one decimal place, close the two parentheses, and then concatenate the label ' fps' onto the end of the string; the plus just adds one string to the end of another. All right guys, what is the chance that this thing is going to run? I've been doing a lot of typing here. Right mouse click, Run Python File in Terminal, and let's see what happens. Oh, right off the bat: the built-in network
was requested but not found. What is this nonsense? Oh, I misspelled "mobilenet"; you guys are yelling at me. That was a horrible mistake. This time it made it further than we did the first time; it's loading the COCO UFF model, and that's a big milestone. Okay, lots of stuff happening... boom, there's the camera. And it died. What did it not like? It didn't like my frames-per-second variable: I wrote fpsFilt with an uppercase F in one place. Did you guys catch that typo? All right, right mouse click, and I'll have my camera at the ready this time: Run Python File in Terminal. It's loading; I'm trying to get some stuff ready for you to look at... and now it froze up on me, which is not good, and this thing is frozen hard. Let's see if we can figure out what happened: it still didn't like fpsFilt. What is it not liking on line 20? I said fpsFilt in one place and a differently spelled fpsFilt in another, so I still had a typo. Let's try it again; it's taking me a second to get into my coding zone. Camera ready, still running down there, still pretty happy... okay, we've got something happening here, but not a whole lot. The camera is working, but it's not actually doing any detection. It's printing the frames per second, which is good, but... ah! We forgot the most important thing: we never actually did the detection. So: detections = net.Detect(), and what are we going to detect on? img, width, height. Right? Three things: you grab the frame,
you analyze the frame, and then you show the results of the frame with your analysis. We grab the frame, we do the detection, and then we display it with the RenderOnce command. Grab, analyze, show; grab, analyze, show. This time I feel like this is real, and it looks like we're operating at about 10 frames per second, which is not too shabby. Let's see what happens; it takes a second for these things to load. All right, look at that, Shazam! It found the keyboard; it sees two keyboards. Yes, I have a lot of keyboards going here, three keyboards, and then it sees two mice and a keyboard. Let's see if it can do the camera: it's having a little trouble with the camera, but it sees the cup, and it's interpreting the monitor as a TV, which kind of makes sense. Let's look up here: it sees my lighting as an umbrella; if I look over there, it sees a chair. Let's see if it'll do that desk. My desk has so much junk on it: it sees the stack of books, it sees the computer screens out there, it sees the chairs out there. I thought it should be able to do this camera; let's give it a better view. It might be that this model doesn't know cameras, but you can see it gets the cup and the keyboard. Let's see if we point it at the Jetson Nano: it thinks it's a motorcycle. The Jetson Nano thinks of itself as a motorcycle; the Jetson Nano dreams of one day being a motorcycle. So here we've got cup, it sees my coffee, and it sees the keyboard. Let's see if it'll do a pen... it thinks it's a knife, so it's not doing very well on the pen. But what's just amazing is all the things it can find when you point it out there: it sees the books, it sees the TVs in the background, so it can pick objects out of a very complicated scene; it sees the umbrellas and the chairs at the same time.
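The frames-per-second bookkeeping we wired in a moment ago is plain Python, so it's easy to pull out and look at on its own. Here's a minimal sketch of just that piece; the variable names follow the ones used in the lesson, and the sleep call merely stands in for the grab/detect/render work done in the real loop:

```python
import time

def filtered_fps(fps_filt, dt):
    """Low-pass filter the instantaneous frame rate, as in the lesson:
    keep 90% of the old estimate and blend in 10% of the new reading."""
    fps = 1.0 / dt                      # instantaneous frames per second
    return 0.9 * fps_filt + 0.1 * fps

# Usage, shaped like the capture loop:
timeStamp = time.time()                 # time of the previous pass
fpsFilt = 0                             # seeded so the first pass doesn't fail
for _ in range(3):                      # stands in for `while display.IsOpen():`
    time.sleep(0.05)                    # stands in for grab / detect / render
    dt = time.time() - timeStamp        # elapsed time since the last frame
    timeStamp = time.time()             # remember for the next pass
    fpsFilt = filtered_fps(fpsFilt, dt)
print(str(round(fpsFilt, 1)) + ' fps')
```

Because the filter only moves 10% of the way toward each new reading, a single noisy frame time barely nudges the displayed number, which is exactly why the on-screen FPS looks steady.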
That is pretty darn cool, really something neat, but now let me click out of this. So what do I love here? I love that in about 15 minutes of code we're running a pre-trained deep neural network, we're finding things, and we're doing it at 10 frames a second; that is really exciting. What do I not like so much? Well, really all we did was make three library calls to already-written NVIDIA Jetson libraries, and if you look at this, I don't have any parameters to play with. Let me show you one really simple thing; run it again. They're not giving us any handles, any parameters: it's grab the frame, analyze the frame, show the frame, and we have no hooks into the program to control exactly how it does any of it. Do you see how it opens up this window? We have no ability to control how big that window is; it just comes out full screen, whatever your screen is. Oh well, you can resize it, right? If I come in and resize the window, it crops the top part off and doesn't give me any scroll bars. So right off the bat I see that what they're doing here, no offense to them, is making it so simple that everyone can understand it, but so simple that it gives us no ability to do anything with it on our own. On this display.RenderOnce there's absolutely no control: that width and height is the width of the frame, not the width of the window, so they give us no ability to control that. So we're going to say this command is totally useless beyond the simple demonstration, and similarly the CaptureRGBA is just about as useless. All right, so that was useless; what
was very useful was this one command, net.Detect, where it goes in and does the detection; that is the thing that is really, really useful. So what we need to do is go back to the strategy we ended up with around lesson number 50: we're going to grab the frame using OpenCV, convert it using cudaFromNumpy, do the detections with net.Detect, and then display the result using OpenCV. We're going to work on this program to make it better, but I want to save it where it is, so I'll do a Save As and call the new one deep-learning-6.py. Now I'll be editing deep-learning-6, and we won't lose deep-learning-5. Okay, let's see if we can make this thing where we can actually start doing something, because if we're going to do things on our own, I need to know where the objects are and I need to know what they are. But if you look at this program, it didn't even return positions to us: it sort of passed the parameters from one of its routines to another and cut us out of the loop. So we've got to bust into this thing. Let's come up to the top and see what we need. Right off the bat, we still need jetson.inference and jetson.utils, because we are still doing the detection and a conversion; we're going to need time and OpenCV; and we're also going to need NumPy, so import numpy as np. We'll still do our timeStamp and our fpsFilt, and we're still going to set up our detection network, because we'll need it. All right, now
we have the display width and the display height. Also, because we might want to use the Raspberry Pi camera with that long GStreamer launch string, the string is going to want to know the flip method, so we say flip = 2. My camera needs a flip of 2 to be right-side up; yours might not, depending on how you have it mounted. And since we're going to be labeling our image with OpenCV, we need to set up a font: font = cv2.FONT_HERSHEY_SIMPLEX. You guys notice how in one of the earlier lessons I fixed that autocomplete thing, but it still seems to get confused? Sometimes it stops doing the OpenCV autocomplete, and you have to kill the program and start it again, and then IntelliSense works for a little while. It's still not working perfectly, and I am greatly disappointed with that. Okay, so we have set up our font. Now we need the GStreamer code for launching the camera, and the easiest way to get it is to go to the most excellent www.toptechboy.com and come back to AI on the Jetson Nano lesson 52. Remember how we came up with that optimized string that sets up a really clear, really high-quality image from the Raspberry Pi camera? We need it here, and it's just one line of code. Click the two little pages to copy, come back over here, right mouse click, paste, and now we have that really, really good GStreamer string.
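For reference, the lesson-52 launch string is built roughly like the sketch below. The exact capture mode (3264x2464 at 21 fps here) depends on your Pi camera, and the variable names are my guess at the lesson's spoken "display width/height," so treat the specifics as an example rather than the one true string:

```python
dispW = 1280
dispH = 720
flip = 2   # flip-method that turns my Pi camera right-side up; yours may differ

# One long GStreamer pipeline: capture on the CSI camera, flip it, scale it to
# the display size, and convert to the BGR frames OpenCV expects at the appsink.
camSet = ('nvarguscamerasrc ! '
          'video/x-raw(memory:NVMM), width=3264, height=2464, '
          'format=NV12, framerate=21/1 ! '
          'nvvidconv flip-method=' + str(flip) + ' ! '
          'video/x-raw, width=' + str(dispW) + ', height=' + str(dispH) + ', '
          'format=BGRx ! videoconvert ! video/x-raw, format=BGR ! appsink')

# cam = cv2.VideoCapture(camSet)   # hand the whole pipeline to OpenCV
```

The important structural points are that nvvidconv does the flip and the resize on the GPU, and the pipeline ends in appsink so cv2.VideoCapture can pull frames from it.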
After that we need to set up the camera, so we say cam = cv2.VideoCapture(camSet); that's if you're using the Raspberry Pi camera, as in the earlier program, and I'll just comment it out for now. We're not going to use the Jetson utility camera anymore, so we comment out that line too; I'm going to leave it in, because if we want to switch back and forth we can always uncomment it. And we're not going to use the Jetson utilities for the display, so we comment that out as well. Does that make sense? I hope it does. With those commented out, we've got to put in the webcam setup right here: cam = cv2.VideoCapture('/dev/video1'); /dev/video1 should be the webcam. Now remember, that net.Detect business down below is going to want the width and height. We set the width and height up top, but that's just what we're asking the camera for; do we really know that's what we got? With the Raspberry Pi camera we send those parameters in the GStreamer string, but for the webcam we simply launched it, so we need to tell it explicitly: cam.set(cv2.CAP_PROP_FRAME_WIDTH, dispW), and then, since the easiest thing is to copy and paste that line, cam.set(cv2.CAP_PROP_FRAME_HEIGHT, dispH). So now we either need to comment out the three Pi-camera lines or the webcam lines. I think I'll use the webcam, so I'm going to comment out the Raspberry Pi camera; remind me to come back
and try it with both later. So now we have set those things, and that is good; let me think for a second and make sure all of our setup bookkeeping is done. All right: since we're not using the Jetson display, we can't use while display.IsOpen() anymore, so we comment that out (I'm just going to comment things out rather than erase them) and go back to our old while True:. I think it wants an uppercase True, and True is always true, so we're creating an infinite loop. Now, instead of reading the camera with the Jetson utility's CaptureRGBA, we go back to our old friend: _, img = cam.read(), the old OpenCV command for reading the camera. All right, the good news is we have the image. But while we asked for a very specific size, we'd better check what size we actually have, because if you don't pass net.Detect the right width and height, you're going to get a crash; you've got to send it not what you asked for but what you actually got. So we say height = img.shape[0], the zeroth element, and width = img.shape[1], the first element. If you do img.shape, it takes your image and tells you the height comma width, and we're breaking that little array up into the two things we want, height and width. Very useful. Now here's the problem, which we talked about last week: the image that comes from OpenCV is a blue-green-red image with a depth of 3, and the values are 8-bit integers.
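Backing up to that img.shape bookkeeping for a second: it's easy to get height and width backwards, because NumPy puts rows first. A quick sketch with a dummy frame standing in for what cam.read() returns:

```python
import numpy as np

# A dummy 720x1280 BGR frame, shaped like the arrays cam.read() hands back
img = np.zeros((720, 1280, 3), dtype=np.uint8)

height = img.shape[0]   # rows come first in NumPy, so shape[0] is the height
width = img.shape[1]    # columns second; shape[2] would be the 3 BGR channels
```

If you swap these two, net.Detect gets a width/height pair that doesn't match the buffer and the program crashes, which is exactly the failure mode being avoided here.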
That 8-bit, depth-3 BGR array is not what net.Detect wants, so we have to do a conversion, and the first step of the conversion is to go to RGBA from BGR. I want to create a new image here, because I'll use the original down in imshow; I don't want to convert to and then have to convert back from, so I'm going to keep img for later and create a new variable, which I'll call frame: frame = cv2.cvtColor(img, cv2.COLOR_BGR2RGBA), with an uppercase C in cvtColor. I'm making frame, a new picture, out of img by converting the color, and the conversion code is cv2.COLOR_BGR2RGBA, because our friend Mr. net.Detect wants an RGBA image. Now, it's worse than that: after the parenthesis you have to give it the type of numbers it's going to be, so we add .astype(np.float32). That's a 32-bit floating-point number telling you the color strength in each position, as opposed to the simple 8-bit unsigned integers that OpenCV uses. This should give us the right array. But the problem is it's up here in normal memory, and we need it down in the CUDA cores, so we've got to convert it to a CUDA array: still using frame, we say frame = jetson.utils.cudaFromNumpy(frame).
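To see what that conversion actually does to the array, here is the same BGR-to-RGBA / float32 step reproduced in plain NumPy. The lesson does it with cv2.cvtColor, and the final cudaFromNumpy call only works on actual Jetson hardware, so it's left as a comment; the dummy red frame is just an assumption for illustration:

```python
import numpy as np

# A dummy 720x1280 BGR frame standing in for what cam.read() returns;
# every pixel is pure red, which in OpenCV's BGR order lives in channel 2.
img = np.zeros((720, 1280, 3), dtype=np.uint8)
img[:, :, 2] = 255

# NumPy equivalent of cv2.cvtColor(img, cv2.COLOR_BGR2RGBA).astype(np.float32):
alpha = np.full(img.shape[:2] + (1,), 255, np.uint8)   # fully opaque alpha plane
frame = np.concatenate([img[:, :, ::-1], alpha], axis=2).astype(np.float32)

# On the Jetson, the result then moves into GPU memory for net.Detect:
# frame = jetson.utils.cudaFromNumpy(frame)
```

The shape goes from (720, 1280, 3) to (720, 1280, 4), the channel order flips so red is first, and the dtype becomes float32, which is the layout the detection network expects.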
So we took img, turned it into frame as an RGBA float32 array, and changed that into a CUDA array, and now frame should be something the detection network is going to be happy with. Okay, now we get this detections object back, but it's going to be pretty complicated, because it has all the locations and all the names of everything it finds, and we'll have to figure out what to do with that. So I'm just going to try print(detections); it's probably an array of some type with a lot of information in it, but let's just look at it first. That looks pretty good. We print detections, and then we do our frames per second and all that good stuff. Now we have to clean up the program: I want to put a label on the frame showing how fast we're going, the frames per second, so let's do that with the command I mess up more than any other, our old friend cv2.putText. I'm not going to print the frames per second; I'm going to display it. We're going to put it on img (remember, that is our original image in the OpenCV format, not frame, which is in the CUDA format that net.Detect wants), and then the string of the rounded fpsFilt, which we just calculated, rounded to one place. Then we come over outside of those two parentheses, but before the end of the first parenthesis, and concatenate the label, a quote, space, 'fps'. So that label is concatenated with the string value of the rounded value of
fpsFilt. All right, now we have to tell it where we want the text, and that's a tuple, (0, 30): we go over 0 but come down 30, so the label is on the picture and not half off it. Then we scoot over from that and give it the font, which we defined above; a text height of about 1; a color, and let's go red, which is the tuple (0, 0, 255), not in quotes; and a weight of 2. Okay, now let's show our frame: cv2.imshow('detCam', img), with img because that is the original image, not the one we converted. And let's be good and do a cv2.moveWindow('detCam', 0, 0); (0, 0) will be the upper left of our screen. So now we have that done, and we've got to clean up for a clean kill. We will take care of all that with: if cv2.waitKey(1) == ord('q'): so waitKey waits for one millisecond, and if it sees the Q press, then, after the colon at the end of the if statement, we break. Then we come all the way back, un-indent all the way, and say cam.release() to let that camera go, and then cv2.destroyAllWindows(). So now we are grabbing the image with OpenCV, converting it to the CUDA-type array used by the object detection program, and then displaying things with the original OpenCV image. Now, this isn't going to label the image with the detections at all yet; we're just seeing whether the
program runs at this point, and the one thing it's going to do is print out the detections, so we can at least see how the data is coming to us. So let's Run Python File in Terminal... it hasn't crashed yet; maybe that's a good thing. You know this model takes a little while to load. Ooh, an error on line 36; did you guys see this? I need to do the detections not on img but on frame. I went to all that work to create frame and then I didn't use it, so let's try it again. It's doing some stuff here... ooh, what does it not like this time? Line 38, display.RenderOnce. Oh yeah, we don't want to do that; remember, we're not doing that display anymore. We should have commented that out: we didn't create the display, so it's certainly not going to be able to render to it. That was the old Jetson utility method of displaying the results, and we're using OpenCV, so we comment it out. Okay, looks like it is moving along. All right, now let's see what happens: the camera is working, and you see our frames per second of 10; that's what we want to see. But now let's see what it printed for that detection. Man, this thing is not wanting to focus; my webcam is not behaving very well, and usually it's such a well-behaved webcam. I'll not worry about that right now, because the real issue is what it printed out. This part is showing the results of the conversion, and then it's saying jetson.inference detectNet.Detection object at... and then this strange thing. What that's saying is that the detection object it's passing back to us is a complicated data structure that a simple print can't parse; it might be
It might be arrays inside of arrays inside of arrays, so printing it whole is not going to give us any useful information. What we know is that detections detects lots of different things, so it'll be sort of like an array, a comma, an array, a comma — and so instead of doing print(detections), I think we need a for loop where we try to step through that returned object.

This is just kind of a guess, but I think what I can do is say for detect in detections — detect would be like one object in this list of objects, and detections is that complicated object that was returned. I'm going to step through it and look at it object by object: for detect in detections. Now let me just print detect — that's like the one item inside of that complicated object it's sending back to me — and see if I can kind of get in and start looking at the data.

All right, right mouse click, Run Python File in Terminal. Whoa — print detect, what is this? Ah, old school: I'm still typing Python 2.7. I was very slow to upgrade to Python 3, but you've got to use the parentheses in your print statements. I do like Python 3 now that I finally moved to it, like five years after everyone else did.

Okay, so I seem to have something going here, and it looks like it's going to focus this time. What I'm going to do is give it a couple of things to look at and then quit out of it. Now let's see — ah, I am getting some results now. Look at this: it found two objects, an object of class 76 and an object of class 74, and it is telling me how confident it is — 98.92 percent confident, 99 percent confident. If I were going to guess, this is probably the keyboard and this is probably the mouse, but you notice it doesn't return 'keyboard' and 'mouse' to you; it returns an ID: class ID 76 and class ID 74. Not too useful yet, but don't worry, we will get to the bottom of it.

What do I really like? I really like that it is giving me the top left of the item it found and the bottom right of the item it found. It's also giving me the center and the area and a bunch of other stuff. The center — that's kind of neat; does it make sense? Yeah, the center would be between the top left and the bottom right, so that makes sense. But really, for me to do my magic in OpenCV, I want the top left, I want the bottom right, and I want to know what it is that it found. So we've got to kind of bust this up so that we have information we can actually use, and we need to do that inside of our for loop.

So in our for loop, we don't want to print detect anymore — we just did that to sort of see what it was. Now we need to try to get that information out of that list. What do I want? The first thing I want is the ID. So ID is going to be equal to — what did we call our object? It's detect: we're stepping through detections and finding each detect inside of detections. So what I want is detect.ClassID, because this is probably returning an object from a class, and this is probably the tag and this is probably the value. So if I say ID = detect.ClassID, that should give me the ID.

Okay, so now if we run this, we should see like a bunch of 74s and 76s, I think, if I'm parsing this right — I'm just kind of guessing. We'll run it and see what happens. What we should see is the two things that it's finding: a keyboard and a mouse. So I've got the keyboard and the mouse; I've got to get this over so it can see the keyboard and the mouse, and then I'll press q, and then let's come and see what was printed out.
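Stepping through the detections object item by item can be sketched with stand-in detection objects — the real ones come back from jetson.inference's net.Detect(), and the attribute names used here (ClassID, Confidence) are the ones read off in the lesson, so check them against your own version:

```python
from collections import namedtuple

# Stand-in for jetson.inference.detectNet.Detection objects
Detection = namedtuple('Detection', ['ClassID', 'Confidence'])

# Made-up values mimicking what the lesson sees on screen
detections = [Detection(76, 0.9892), Detection(74, 0.99)]

# print(detections) alone is unhelpful, so step through it item by item
for detect in detections:
    ID = detect.ClassID
    print(ID, detect.Confidence)   # Python 3: print needs parentheses
```

The same for-loop shape works on the real detections object, since it is iterable just like this list.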
Okay, was it finding anything? Oh — I said ID = detect.ClassID, but I never said print(ID). It was probably finding it; the good thing is it didn't crash. Let's add print(ID); now it should be printing out like 74s and 76s or whatever. All right, let's give it something to look at. Let's take some coffee — notice I'm drinking hot coffee and cold coffee today. Okay, there we go, give it something to look at, quit out of it, and now let's come down and see if it found anything. Oh yeah, look — it found a 74 and a 76. It found a 74 and a 76!

Okay, so now the million-dollar question is: what is a 74 and what is a 76? Well, luckily, we can get the name of the item. What is the name of the item that is 76, and the name of the item that is 74? We're going to say item = net.GetClassDesc(ID) — net is that network object we created, and GetClassDesc is a very useful command. I just found ID, and now I want the name that goes with ID, so I use my net object, call GetClassDesc, put the result in the variable item, and then print(item).

All right, now I think it should say keyboard and mouse. I really think it should — I am very confident. I'm not even going to hold my breath this time; you notice we don't hold our breath so much anymore, since these models take so long to load. Okay, there we go. Now I'm going to quit out of it. Boom, look at that: it found a keyboard, it found a person, and it found a mouse. Where was the person? It was just seeing my hand — it grabbed my hand as a person — and it found the mouse and the keyboard. Shazam! All right, we're busting this thing up, man. We're going to get some stuff that we can really do things with.

Okay, so now I know what things are, and I know where they are, but let's go ahead and bust out those coordinates — and I don't need to print these out anymore. So I've got the ID; what else do I care about? I want top = detect.Top, with an uppercase T, and then left — whoa, what did I do? I hit something strange on the keyboard. You guys let me know: can you see this font size well enough? If I make it bigger it's easier for you to see, but there's more stuff running off the end of the page, so give me feedback about whether this is about the right size or you would like it bigger or smaller.

Okay, so now left — man, let me get back up here. Left = detect.Left, with an uppercase L — and I need a t on 'detect'. And then bottom = detect.Bottom, and right = detect.Right. Okay, so now I'm going to print the item and then where it is, and that would be top, left, bottom, right. Could it really be that easy? Could it really be that easy? Run Python File in Terminal. I'm confident — I've got the camera at the ready just in case. Ah, what happened? Oh, I misspelled 'right' — did you guys catch that? Some of this is my keyboard, some of this is my wireless keyboard not working perfectly. If you were smart, any time you did video lessons on YouTube you would use a wired internet connection and a wired keyboard, but I live dangerously.

Okay, let's try it again. That is really good coffee — hey, you guys that are sending me coffee, thank you. I just got some very nice coffee from a young man; very, very much appreciated. Shazam, look at that! So now let's quit and see what we found here: a keyboard is at these top, left, bottom, right coordinates, and a mouse too. It found the keyboard and the mouse. All right, guys, that is really slick. That is really slick.

All right, this has been a little bit of a long lesson, so I'm going to go ahead and give you your homework now. Your homework is to take the image — not the frame; remember, the only thing we use frame for is to do the detections. Now, with your image, I want you to go in and box everything that it finds and label everything that it finds, kind of like that original program.
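Pulling everything out of each detection — ID, name, and corners — can be sketched with stand-ins like this; the Detection attributes mirror the ones read off in the lesson, net.GetClassDesc(ID) is imitated with a plain dict, and the IDs (74 = mouse, 76 = keyboard) follow the COCO-style labeling the default detectNet model appears to use, so treat the exact numbers as assumptions:

```python
from collections import namedtuple

# Stand-in for jetson.inference.detectNet.Detection
Detection = namedtuple('Detection', ['ClassID', 'Top', 'Left', 'Bottom', 'Right'])

# Stand-in for net.GetClassDesc(ID); verify IDs against your model
CLASS_DESC = {74: 'mouse', 76: 'keyboard'}

def center_of(detect):
    """The reported Center is just halfway between the corners."""
    return ((detect.Left + detect.Right) / 2,
            (detect.Top + detect.Bottom) / 2)

detections = [Detection(76, 80.0, 120.0, 200.0, 400.0),   # made-up keyboard box
              Detection(74, 150.0, 450.0, 210.0, 520.0)]  # made-up mouse box

for detect in detections:
    ID = detect.ClassID
    top, left = detect.Top, detect.Left
    bottom, right = detect.Bottom, detect.Right
    item = CLASS_DESC.get(ID, 'unknown')   # real code: net.GetClassDesc(ID)
    print(item, top, left, bottom, right)
```

With the real library, only the lookup line changes; the attribute reads are the same as in the lesson.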
Ooh, let's come back here — let's go back to where we were; that was deep learning 5, right? Let's run this again. Okay, there you go: mouse and keyboard, and let's see — yeah, you see the hand; it sees the hand as a person. That's kind of neat. Now, you don't have to make shaded boxes like this — I'm not sure how to do that in OpenCV — but I want you to draw a box around the items that are found, and then I want you to label them.

Now, is what that demo did really cool? Yes, it was, but the problem is you have absolutely no control over it — absolutely no control at all. So if you wanted to do something like have a robot do something, or move something, or follow something, you can't, because that demonstration only lets you grab a frame, analyze it, and show a frame. You can't get in there and start working with the numbers. So it doesn't look like a huge thing that we did in today's lesson, but it's the difference between just running someone else's demo and you being able to understand how to write your own programs, so you can start doing some really neat and really cool stuff.

All right, guys, I really hope you're having as much fun watching these things as I'm having making them. I've got some really, really exciting stuff coming up in the next few lessons — and man, fifty-three lessons; we've learned a lot. You guys might also think about following along with some of my Jetson Xavier NX lessons, because the early ones are going to be a little bit of a repeat of some of the stuff you learned in these early lessons, but then I'm going to be going faster and further in that set of lessons. So you guys that have made it to this point in the Jetson Nano series might enjoy the Jetson Xavier NX lessons, because those will also run on the Jetson Nano, but I'll be doing some stuff there that I haven't done in this series of lessons — so you might at least check out the video titles coming out there and see if there's anything that you want to do.

All right, guys, again, I really appreciate you tuning in and sticking with this. If you liked it, think about giving us a thumbs up, think about subscribing to the channel and hitting the bell so you'll get notifications on all my stuff that is coming out. And you guys, let's get more people involved: see if you can share these lessons. Is there a young person in your neighborhood, or a relative, that you could start teaching this stuff to — that you could start passing on the stuff you're learning to? I know a lot of dads are working with their kids, and a lot of granddads are working with their grandkids, and that's what I really want: to get the younger generation involved, not just using technology but knowing how to make technology, knowing how to create things. You know, the world has enough users; the world needs more creators, and that's what I'm hoping to do with this series of lessons. Okay, guys, having a lot of fun — Paul McWhorter with toptechboy.com. I will talk to you guys later.
Info
Channel: Paul McWhorter
Views: 7,842
Keywords: NVIDIA, Jetson Nano, Machine Learning, Deep Neural Networks, Deep Learning
Id: 3mo6vlz0qGo
Length: 61min 2sec (3662 seconds)
Published: Sat Aug 01 2020