AI on the Jetson Nano LESSON 50: Introduction to Deep Learning and Deep Neural Networks

Captions
Hello guys, this is Paul McWhorter with toptechboy.com, and we are here today with lesson number 50, can you believe it, in our tutorial series where you are learning artificial intelligence on the Jetson Nano. Pour yourself a nice big mug of iced coffee and get ready to learn some cool new stuff. As you are getting out your Jetson Nano gear, as always I want to give a shout out to you guys helping me out over at Patreon; your help and encouragement keeps this content coming. For those of you not helping out yet, look down in the description for the link to my Patreon account and think about hopping over there and hooking a brother up. But enough of this shameless self-promotion. Today is a really exciting lesson: it is where we get into deep learning and deep neural networks on the Jetson Nano, and really start working toward what real artificial intelligence is.

First, a little background and some definitions. You often hear artificial intelligence and machine learning mentioned together, but on the surface machine learning can be something a lot simpler, what I would call curve fitting. Say you measured reaction rate as a function of temperature: with all that data you do some linear regression or curve fitting, and then at a given temperature you can predict a reaction rate. That is maybe a bit of an oversimplification, but a lot of machine learning can be thought of as straightforward linear regression or curve fitting.

When we talk about deep learning and deep neural networks, we are instead trying to build a network that is not just a simple input and a simple output; there is a lot going on in between, loosely modeled on what goes on in your head. Your brain has neurons with connections between them, and it is not just the neurons, it is the connections between the neurons, that make up the neural network of your brain. That is what we are trying to simulate when we build a neural network, or a deep neural network, on the computer, specifically in our case on the Jetson Nano.

To do deep learning we have to have models, and there are really two parts to that: training the model, and then, once we have a trained model, using it. Our first look at this was really the face recognition program: we trained it on a set of faces, and once training was done we used that model to look at a face and figure out who it was. That was a lot of fun, but it was very specific to faces and identifying particular people. There is a lot more stuff in the world than faces, so to recognize more than faces you have to move to the next level. There will be a lot of things we do moving forward, but the next two things we will look at are image recognition and object detection.
The first and simplest deep learning task we are going to do is called image recognition. Let me define it: you show the system a picture of something, and it recognizes or identifies it. We will do this using models trained on ImageNet. What you have to understand is that, within the context of what normal people with normal equipment can do, it is very easy to use a model that someone else has developed, but very, very hard to develop a model of our own. Some of these models took years and probably millions of dollars to develop, so we are going to take advantage of models that already exist, and the first model set we will use is the imageNet family, for image recognition.

What do I mean by image recognition? You present the system with an object, and it figures out what the object is based on its training set. Right now the scene here has a lot of different stuff: the microphone, the curtain, me, my glasses; it is really kind of complicated. For image recognition you try to give it a simpler view, so I would move things out of the way and present it with, say, a coffee cup, an object in as simple a context as possible, and it figures out what it is. That is image recognition, we do it with the imageNet models, and that is what we will work on today.

Much more complicated than that is object detection. With object detection, if I pointed the camera at the room, it would figure out, here is a chair, here is a desk; it would recognize me as a person, and in front of me it would recognize the mouse. So object detection does a couple of things: it looks at a complicated scene and identifies all the things it can recognize, and it also places each object in space, giving you the location of the object it detected. In the earlier image recognition example, the system tells me nothing about where the teapot is (I think I am going to call this a teapot; I was struggling for words earlier); it just says, okay buddy, the thing you put in front of the camera was a teapot.

So today we begin our journey into deep learning and deep neural networks with image recognition, using one of these pre-trained models that already exist to identify objects for us. Call up our old friend Visual Studio Code. For those of you who might be randomly finding this video, realize this is video number 50, and in video number 49 we installed the libraries, utilities, and tools that allow us to do deep learning; if you just have a Jetson Nano and happened onto this video, you at least need to do the installations shown in video number 49. I don't want my videos to be too long, so I break them up like this. In a nice new Visual Studio Code window, working in our nvidia folder, I am going to create a new file called deep-learning-1.py.
You can name it and put it where you want; I am calling mine deep-learning-1.py. We have a fresh new Python file, and I know nothing makes you guys angrier than me standing in front of what I am typing, so I will move out of your way and try to be mindful not to cover up anything you need to see.

If we are going to do deep learning and deep neural networks, we need to import some libraries. The first is jetson.inference, which will allow us to do both image recognition and object detection; we start today with image recognition. We also need to import jetson.utils; the utilities are what let us interact with the camera and the screen, so those are important. A really big deal in deep learning is frame rate, how fast you can do what you are doing, and that depends on your hardware. It is great that we have a Jetson Nano: some of this would be very hard, really not practical, on a Raspberry Pi; we need those 128 or however many CUDA cores down there to do this type of deep learning. But even with the hardware, we have to write our programs in such a way that we take advantage of that speed. So I like to always measure frames per second to see whether we are doing the job at a fast enough frame rate, and for that I will import time.

Next we set the width and height of our display: width = 1280 and height = 720. You can't just set these to random values, because cameras will only accept certain specific allowed frame sizes. Today I am looking at examples using both the Raspberry Pi camera and a webcam, and the reason I use 1280x720 is that it is one of the resolutions that works for both cameras, which makes it a little easier to go through this code.

Now, there are two ways we can set up the camera, and I will do both. With the utilities loaded, I create the camera object: cam = jetson.utils.gstCamera (I think that refers to GStreamer). We pass it the width, then the height, and then we tell it which camera; for the webcam it is going to be '/dev/video1'. I think the webcam will probably always be video1, but if for some reason you don't have a Raspberry Pi camera plugged in, it might put the USB camera on video0, so if video1 doesn't work and you have a webcam, try video0; if you have two webcams, the next one would probably be at video2. You kind of see how that works. That takes care of the USB camera. I will copy and paste that line for the Raspberry Pi camera: you still pass width and height, but you tell it the camera is '0', in single quotes. I have the old Jetson Nano board with the single camera slot, and that camera is camera zero; my guess, which I can't confirm because I don't have the dual-camera Jetson Nano, is that the other slot would most likely be '1' in this call.

Let's start out using the webcam, so I comment out the Pi camera line; you comment out the camera you don't have and leave uncommented the one you do have. With that, we have a camera created.
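Putting the setup described so far together, a minimal sketch might look like this. It assumes the legacy jetson.utils API installed in lesson 49; the camera_device helper is my own convenience for illustration, not part of the library.

```python
# Sketch of the camera setup described above, based on the legacy
# jetson.utils API installed in lesson 49. The device-string helper
# is a hypothetical convenience, not part of the library.

WIDTH, HEIGHT = 1280, 720  # one of the frame sizes both cameras accept

def camera_device(usb, index=1):
    """Return the device string gstCamera expects.

    A USB webcam lives at /dev/videoN (usually video1 here, video0 if
    no Pi camera is attached); the CSI Raspberry Pi camera is addressed
    by its bare slot index, '0' on the single-slot board.
    """
    return "/dev/video{}".format(index) if usb else str(index)

def make_camera(usb=True, index=1):
    """Create the camera (requires a Jetson with jetson-inference set up)."""
    import jetson.utils
    return jetson.utils.gstCamera(WIDTH, HEIGHT, camera_device(usb, index))
```

Switching between the webcam and the Pi camera then just means changing the arguments, e.g. `make_camera(usb=False, index=0)` for the CSI camera, instead of commenting lines in and out.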
Now I need to create a display: display = jetson.utils.glDisplay(), open-close parentheses, and notice the uppercase D; I hope you guys know by now that uppercase versus lowercase is in fact very important. So we have created a camera and a display; next we create our deep learning network. I am going to call mine net (you can call yours what you want, but that is a good name), and it comes from jetson.inference. Since today we are doing image recognition, we will use imageNet, notice the capitalization, and I have to tell it which model to use: today, 'googlenet'. Darn it, I should have had this page called up already, but let me show you where the models are listed: search for "hello ai world jetson nano" and you will see Two Days to a Demo on the NVIDIA developer site, and Hello AI World. Click on Hello AI World, come down to Classifying Images with ImageNet, and there are the pre-trained models you can use. In lesson number 49 we downloaded all of these models, so if you followed along properly you will have them. We are using googlenet, but you can try the different ones, because different models find different things; some work better than others, and some work better for certain things, so you can play with all of them. That was a digression I did not mean to spend so much time on.

With the display and the network created, I grab the time: timeMark = time.time(). That just grabs what the clock time is at that point, and I will use it to calculate frames per second. I also have to initialize the filtered frames per second, fpsFilter = 0, because I am going to put a low-pass filter on the frame rate to smooth it out, and it has to be initialized up here.

Now we go into our while loop: while display.IsOpen(). As long as the display we created up there is open, we stay in the loop; when you kill the window, it kills the program. Inside the loop we grab our frame, and this frame grabber returns three values: the frame, the width, and the height. It is important that we use this returned width and height, because even though we gave the camera a width and height, it may give us something different back. So: frame, width, height = cam.CaptureRGBA(). Notice that we are working in RGBA, as opposed to our old friend OpenCV, which used BGR; that will cause us a little pain later, as you will see.

Now that I have a frame, I want to look at it and see if I can recognize an image in it. I will look for a class ID, because this classifier doesn't return a word like teapot; it returns a number like 3, and you take the 3 and look it up on a list to see which classification goes with that class ID. So we will get a classID and then how confident we are, a confidence.
The call returns those two things, classID and confidence, and it is made on net, the deep neural network we set up above: classID, confidence = net.Classify(frame, width, height). Those are again the width and height that cam.CaptureRGBA() sent us, now sent back to it. With a little luck this gives us a class ID, and then I need to find what the actual item is, which we do with net.GetClassDesc (get class description), passing it the classID: item = net.GetClassDesc(classID). So if it finds an item of class 4, I send GetClassDesc a 4, and it comes back and tells me it is a teapot or whatever. Now I should actually know what the item is.

Next I calculate the frames per second. I take a change in time, dt = time.time() - timeMark, that is, what time it is now minus the timeMark I grabbed above: the time that has elapsed. Then fps = 1/dt: if the change in time between trips through the loop is a tenth of a second, the frame rate is one over that. It is a really simple way to calculate frames per second, but there is typically a lot of noise in it, so I will smooth it out with a low-pass filter that keeps most of the old value each time through.
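The capture-classify-describe steps above can be sketched as follows. The jetson calls assume the legacy jetson-inference Python API from lesson 49, and the small describe helper is a toy stand-in of my own, showing what GetClassDesc does (a class ID indexing into a label list):

```python
def describe(class_id, labels):
    """Toy stand-in for net.GetClassDesc(): a class ID indexes a label list."""
    return labels[class_id]

def classify_frame(net, cam):
    """One classification pass, assuming the legacy jetson-inference API.

    Re-uses the width/height that CaptureRGBA() hands back, since the
    camera may return a different size than the one we asked for.
    """
    frame, width, height = cam.CaptureRGBA()
    class_id, confidence = net.Classify(frame, width, height)
    item = net.GetClassDesc(class_id)  # e.g. class 4 might map to 'teapot'
    return frame, width, height, item, confidence
```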
For the filter I use the fpsFilter variable I initialized above (make sure you use the same variable name): I keep 0.95, ninety-five percent, of the old value, and add 0.05, five percent, of the new fps, so fpsFilter = 0.95*fpsFilter + 0.05*fps. That smooths out the frames-per-second value so it is not jumping all over the place. If I am going to do that, I also need to grab timeMark again for the next trip through the loop, timeMark = time.time(), so that next time through, dt really is the elapsed loop time. So: dt is how long one trip through the loop takes, 1/dt is frames per second, and the low-pass filter smooths it out.

Now we are ready to show the frame we grabbed, but first we need to put some text on it, kind of like OpenCV's putText. You know what, I didn't create a font up in the setup, so I come back up to where I am doing all of this good stuff, after I created my display, and add font = jetson.utils.cudaFont(), with an uppercase F. Now I have a font, and I can do the equivalent of putText: font.OverlayText(...). Where do I want to put the text? On frame, which is width by height. And what text? The string str() of my frames per second, rounded, because I don't want it to read 21.36457: round(fpsFilter, 1), one decimal place.
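The frame-rate bookkeeping above boils down to a few lines; here is a minimal, self-contained sketch of the low-pass filter with the 0.95/0.05 weights from the lesson:

```python
def smooth_fps(fps_filter, dt, keep=0.95):
    """Low-pass filter: keep 95% of the old value, add 5% of the new reading."""
    fps = 1.0 / dt  # e.g. 0.1 s between trips through the loop -> 10 fps
    return keep * fps_filter + (1.0 - keep) * fps

# Usage pattern inside the loop:
# time_mark = time.time()          # before the loop
# fps_filter = 0                   # initialize; takes a moment to spin up
# ...
# dt = time.time() - time_mark     # elapsed time for one trip
# fps_filter = smooth_fps(fps_filter, dt)
# time_mark = time.time()          # re-mark for the next trip
```

Because each reading only contributes 5 percent, the displayed value takes a couple of seconds to come up to speed, which is exactly what you see when the program first starts.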
That should put me in pretty good shape; pardon me for just a second as I get this typed. So that is the string of the rounded, filtered frames per second, and before the last parenthesis I concatenate the string ' fps ' as a label; I am just taking the frames-per-second string and adding ' fps ' to it. Then I add what the item actually is, + item, the thing we actually found up in the classify step. (And you guys are going to yell at me because for one microsecond you didn't see the screen.) A couple of other things to pass: I don't want the label in the exact corner, so I come down 5 and over 5 pixels so it isn't off the screen; then a text color, font.Magenta, and a background color, font.Blue. So it is font.Magenta, comma, font.Blue, and that should work out very nicely if I am thinking about this right. After that, be a little careful with closing your parentheses: after fpsFilter, one parenthesis closes the round() function and the next closes the str() function.
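The label string being built above can be pulled into a small helper; the OverlayText call in the comment reflects the cudaFont usage described in the lesson:

```python
def make_label(fps_filter, item):
    """Build the overlay string: rounded FPS, a ' fps ' label, then the item."""
    return str(round(fps_filter, 1)) + " fps " + item

# With a cudaFont created during setup, the overlay call looks like:
# font = jetson.utils.cudaFont()
# font.OverlayText(frame, width, height, make_label(fps_filter, item),
#                  5, 5,                     # come down 5, over 5 pixels
#                  font.Magenta, font.Blue)  # text color, background color
```

Keeping the string building in one place also sidesteps the unbalanced-parentheses trouble mentioned above, since round() and str() close in a single short expression.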
Make sure you get all of your parentheses balanced; that is one of the hard things in troubleshooting, figuring out where you messed that up, and it can be quite annoying. Now we are ready to display. Our display object is called display, and we want to render once; this is kind of like cv2.imshow, where you are just showing the frame: display.RenderOnce(frame, width, height), and the frame already has the text on it.

So let's run this thing and see how many mistakes I made. Just a warning: the first time you run this it is going to take five minutes, because it has to optimize all the parameters for this imageNet googlenet model; but I have run it before, so it should take about 30 seconds for me. Ooh, what is this, height not found? Did I misspell height? Oh my goodness, were you guys screaming at me? I typed width in both places; width is 1280 and height is 720. I do not understand how I made that mistake. Okay, and in this call we will be running on the webcam. Run Python File in Terminal... "jetson is not defined", in line one? I think I might have started that line wrong; I hope I don't have to kill this whole thing, but there is a Kill Terminal. Now let me try this again, Run Python File in Terminal, carefully this time. It looks a little happier, loading up a model, got some good stuff happening here. Okay, it didn't like my variable name: I typed the fps filter variable inconsistently, and it needs to be fpsFilter; hopefully you guys caught that. Run Python File in Terminal again; it's thinking about it; will it work this time? What did it not like that time?
Did I misspell it twice? Wow, man, something is not working on my keyboard; I think this wireless keyboard is playing tricks on me. Let's try it again... ah, we have some signs of life here. Interesting: we are operating at 10 frames per second, and it sees the green screen behind me as a shower curtain. Look at this, this is really neat, and at 10 frames per second we are going really fast. Let me show it some other items. It sees the computer mouse; at 10 frames per second I am getting computer mouse, computer keyboard, spacebar; punching bag, no, it's a microphone, or mic, that looks good; here it sees a monitor; up here it sees my umbrella; over there it sees a desk. Boom, guys, do you see this? This is pretty incredible for just a few minutes' work. Notice how I am trying to give it a clear view of a dominant item: that is a pill bottle (you can never have too much vitamin C in the days we are living in right now). It sees this as a coffee mug, now a coffee pot; I really wish it would see it as a teapot, but it is kind of a bowl-cup type of thing, so coffee pot or coffee mug is pretty darn good. Here is a nice view of a pen: ballpoint pen, looking pretty good. I will show you my shoe, not a very fancy shoe, and it sees it as a loafer; that's pretty good. I show it this and it sees a doormat, no; a sleeping bag, no; jeans, blue jeans, or denim, okay. So that's pretty good; this is pretty darn exciting, and you can see a couple of things we could start playing around with.

Now remember, on these we don't press q to quit and we don't click to quit; we kill the window to quit. So that quit, and we now have it working. Let's sit and think about what I liked and what I didn't like. What I really liked was how fast it was and how bright it was. What I didn't like, when we run this thing again with Run Python File in Terminal, is that I can set the image size, but the program makes the window full screen, and there is no flag I can find to make the window the size of the image. That is kind of a silly way of doing it; in my mind they should have sized the window to match the size of the frame, so I don't like that.

Now for you guys running on the Pi camera: let's uncomment that line and comment out the webcam, and see how this thing runs on the Pi camera. Leave me a comment down below on who is using the webcam, who is using the Pi camera, and who has both, because it kind of helps me. Boom, now you can see it recognizes the microphone. What I like about it is that it is running at 23 frames per second, and watch this, boom, no latency at all, one two three four, so fast. But there are two things I do not like about this. First, I don't like the quality of the image compared to the webcam; did you notice how the webcam had much greater brightness and contrast? Second, what appears to be happening is that it is cropping the image rather than resizing it. Let me just test whether that is really true; I am going to kill this, there are a couple of things I need to do, and go back to the other one.
So that was about 22 or 23 frames per second on the Pi camera; let's check the frames per second on the webcam. It takes a second to get this thing going, and a second for the filter to come up to speed, but you can see we are operating at ten frames per second with the webcam, and do you see how much higher quality the webcam image is? It is getting "microphone". So: 10 frames per second on the webcam compared to 20-plus on the Raspberry Pi camera. Let me kill this. I really think what is happening on the Raspberry Pi camera is that when we set it to 1280x720, instead of making a 1280x720 image, it is cropping the bigger full-scale image rather than scaling it down, and that could be an issue later on.

So what is the biggest thing I do not like about this? I do not like that I get this great big window that you can't even resize; it sort of steals your screen, fills it up with a great big window, and then puts a smaller image in there. So we are going to try to alter this; let me do a little bit of window management here. You know the thing about our old friend OpenCV: we could control everything. We had much greater control; we could put the text however we wanted, we could draw on the image, we had all of these tools. But from OpenCV we couldn't access these Jetson inference tools, and I want to use the Jetson inference tools because you could see they were really fast: 10 frames per second on the webcam and 20 on the Raspberry Pi camera, which is pretty incredible. The thing is, with the Jetson camera and display utilities I don't have much control at all.

So here is my idea: could I grab the frame using the Jetson utilities camera, do the Jetson inference like we just did, but then, when we display it, go back to OpenCV and display with OpenCV? Let's see, number one, does it work, and number two, how much of a performance hit do we take by doing that? Does that sound reasonable? I think that's a pretty good thing to try. Just remember: webcam was about 10 frames per second, Raspberry Pi camera was over 20.

If we are going to use OpenCV we need to import it up at the top, so add import cv2, and we will also need numpy, because OpenCV images are numpy arrays: import numpy as np. That looks pretty good. We still have width = 1280 and height = 720, and we still create the cameras the same way. But now we are not going to display with glDisplay, so I comment that out, and we are not going to use the cudaFont overlay on the display, so I comment that out too, because we are going to do the display side of this thing in OpenCV, or at least see if that will work. What I also need to do is create the font OpenCV wants: font = cv2.FONT_HERSHEY_SIMPLEX. (Interesting, it seems like my autocomplete isn't working again; remember how we fixed that? It seems to have stopped working for now, and we'll have to figure out what is going on there.) We still get our frame, width, and height from cam.CaptureRGBA(), but now we are going to be doing some conversions, because remember, OpenCV later on is going to want BGR and not RGBA, so we are going to have to do some conversions.
going to have to do some conversions. I had to put a parameter in here, and that parameter is zeroCopy=1. What that says is: don't move that image around between two different places in memory; keep one place where the image is, and then reference that one place in memory. It's beyond the scope of what I can fully explain here, but in order for these conversions to work right, and for the program to work right, we've got to pass zeroCopy=1 inside of this camera call. Now, the class ID and confidence, that's going to stay the same; we're going to figure out what the item is, we're going to figure out what the frames per second is, and we're going to get the new time for the next loop. Let's see, have we done our detection yet? Yeah, up here we did our detection right off the bat, and then we figured out what the item was, and now when we get down here we are ready to display it. But we're not going to put the text with the Jetson utility, and we're not going to do the display with the Jetson utility. What we're going to do now is convert that frame from the CUDA format, which these Jetson utilities use, over to our OpenCV format, and that's going to take a couple of lines of code. So we're going to do a conversion where frame is equal to jetson.utils dot, and what do we want to do? We want cudaToNumpy, because this frame image is in the CUDA format, and we want to go from CUDA to numpy. Look at that kind of carefully: lowercase c, and then the T and the N are uppercase, cudaToNumpy. What are we going to convert? frame. How wide is it? width. How high is it? height. And now we've got to give a dimension to the array, the numpy array that we're creating, and
remember, there are four numbers per pixel in this CUDA format, R, G, B and A, so if we're going to convert, we've got to give it a place for that fourth one, the A. So our array is going to be frame = jetson.utils.cudaToNumpy(frame, width, height, 4). And so now where are we? We're out of CUDA and into numpy, which is good, but we're still in RGBA. So what do we do? We say frame is equal to cv2 dot, remember our friend, cvtColor, convert color, do you remember that one? What are we going to convert? frame. And then cv2.COLOR_ and can you guess where we're going from? We are going from RGBA to BGR, so cv2.COLOR_RGBA2BGR. Now there is one other thing that we have to do that you probably haven't seen before: we have to do .astype, because I believe those CUDA numbers came over as float32, and OpenCV wants an unsigned int. So we have to do the conversion .astype(np.uint8), and make sure you type this right: u, i, n, t, 8, and I did it wrong at first, np.uint8, that's unsigned int 8. That should now give us our frame in our old friendly OpenCV format of BGR, so now we should be able to show it. So what I'm going to do is put the text on there, so I will go back to cv2.putText. I'm going to do frame, and then I'm going to make a string out of the frames per second, kind of similar to what we did before: the filtered frames per second, rounded to one decimal place. So now I have a rounded string of frames per second; I come in before the last parenthesis, and I'm going to concatenate to that string a space and 'FPS', for frames per second, and then a couple of spaces, so we put in the label FPS, leave a little bit of space, and then, like we did before, we want to add or concatenate
what the item actually is, which we figured out above. Now, a comma, and where do we want to put it? We'll put it at (0, 30), so that should be roughly in the upper corner but not off the top of the screen. Then we will want a comma, the font, which we defined above, and a font height, which we'll put as one. Let me be very careful here, because you guys know I have a lot of trouble with this putText command. So the size was 1, and now we give it a color, which I'm going to make (0, 0, 255), and then a comma and 2 for the weight, and it looks like it's already closed up all of my parentheses for me, so that looks pretty good. So now we have put the text, and I think we are ready to show the frame, so we will do cv2.imshow, and what are we going to call the window? 'detCam', 'recCam', something like that, and then I am going to show frame, which we have converted into a format that hopefully OpenCV will like. Then we do a cv2.moveWindow, and what window do we want to move? 'recCam', and then, comma, we're going to put it at (0, 0) as we usually do. Now we've got to do all this housekeeping: remember, if cv2.waitKey(1), for one millisecond, is equal, double equals, to the value of ord, not org, ord('q'), that means we pressed q, and then we want to break out of this whole while loop. And what do we want to do when we break out of the while loop? We want to do a cam.release(), and then we want to destroy all windows, so we will do a cv2.destroyAllWindows(). All right, that's a lot of typing and I have no idea how many mistakes I made in there, but this should do the same thing as the last program, only it's allowing us to display it in our own window that we have better control of.
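The conversion path just dictated, jetson.utils.cudaToNumpy followed by cv2.cvtColor and .astype(np.uint8), can be sketched in plain NumPy so the channel shuffle is visible. The array below is a synthetic stand-in for a real captured frame (on the Nano it would come from the Jetson call), and the fancy-index line mimics what cv2.COLOR_RGBA2BGR does:

```python
import numpy as np

# Synthetic stand-in for what jetson.utils.cudaToNumpy(frame, width, height, 4)
# hands back: an RGBA image stored as float32 with values in 0-255.
height, width = 3, 4
rgba = np.zeros((height, width, 4), dtype=np.float32)
rgba[..., 0] = 255.0  # R channel: make the image pure red
rgba[..., 3] = 255.0  # A channel: fully opaque

# What cv2.cvtColor(frame, cv2.COLOR_RGBA2BGR) does: drop the alpha
# channel and reverse R/G/B; then .astype(np.uint8) gives OpenCV the
# unsigned 8-bit integers it expects.
bgr = rgba[..., [2, 1, 0]].astype(np.uint8)

print(bgr.shape)           # (3, 4, 3)
print(bgr.dtype)           # uint8
print(bgr[0, 0].tolist())  # [0, 0, 255] -> pure red in BGR order
```

On the Nano itself you would keep the cv2.cvtColor call as dictated in the video; this NumPy version just makes the channel reordering explicit.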
So let's run this thing and see how many mistakes we made along the way; have you guys been screaming at me? All right, what is going on? Let me check here and see what happened. Okay, 'display' is not defined. Where did I use display? What line? Oh, oh, of course: if we're not going to be using the display, we never created a display, so the loop has to go back to the old tried and true while True. That makes sense. Let's try it again. We've got a lot of stuff going on here and... oh, look at that, boom. Kind of blurry, all right, but this is the webcam, and note I am operating at ten frames per second. We've done this and we haven't taken any performance hit. We still have a really clear webcam image, that makes me happy; we're still operating at ten frames per second, that makes me happy; we're still recognizing things, that makes me happy. This is just super, super successful. So let me quit out of this, and before we celebrate too much, let's go try our old trusty Raspberry Pi camera. I'm going to create the Raspberry Pi camera, so I will need to comment the webcam line out, and then this should give me the Raspberry Pi camera, and we'll see what happens. All right, looking good so far. Okay, boom, boom, you see that? Operating at 20 frames per second. And let's check the latency: one, two, three, four, five, no latency, right? No latency, and we have got the full frames per second that we had a while ago. That is really cool, and that is working. Okay, so let's see where we are in our introduction to machine learning, deep learning, and deep neural networks. We have our first program up and going, and we've gone beyond the basic NVIDIA package, where they kind of gave us a way to show our rendered image without any ability
to control it at all, and we have taken the best of the Jetson utilities and the Jetson inference, and then we've added the things that we have more flexibility with from our old friend OpenCV, and we didn't pay any FPS price for that. So that was a pretty good move, a pretty smart move, and remember, there was no latency at all as we were doing it. So the question would be: why don't we just stop there? Well, I will show you the thing that I don't like. On the webcam we can set the size, and that's really about all we need to do; but that's also about all we can do, compared to the control we had over GStreamer when we were doing things the other way, with OpenCV. Let me show you what I don't like. Let's go back to the Raspberry Pi camera, which I believe we're on here; it takes a second to load; here we go, I think it's going to work, boom, all right. Now, I like that I'm getting 20 frames per second, and I like that there's no latency, one, two, three, four, five. This is what I don't like: on this image, when I told it that I wanted 1280 by 720, the way that the Jetson utility's gstCamera launches GStreamer, it's kind of like they did it wrong, because with the width they put in there, they just took the whole image and cropped it. In fact, I don't know if this would even run, but if I made it smaller, it would probably just crop more and more. It's not resizing the image the way you really want. I think it's a bug in the way that they launch their GStreamer command inside of that gstCamera command; I think they have an error in there, and what that does is kind of lock us out. The other thing I don't like is that they didn't give us any real handles on that GStreamer pipeline, so if you looked at that picture it was
kind of washed out, and because I've got the fluorescent lights it was kind of washed out in blue, but I don't have any ability to put parameters in there that would make it better. All right, so what do you think we could do? Hmm. What if we used the camera from OpenCV, converted the frame to CUDA and RGBA, then did the inference, the deep neural network, artificial intelligence, deep learning kind of business, and then converted back to show it? What price would we pay in frames per second, and would it really work? It worked when we tried the first conversion; let's see if we can go one step further and see what will happen. So we will come over here, I need to do a little bit of window management, all right, that looks pretty good, and we're going to come in here and do some new stuff. We still have jetson.inference, we still have jetson.utils, cv2, numpy, and time; I think that's what we need. We need width and we need height, and now, since we're going to launch the camera using our own GStreamer command, at least for the Raspberry Pi camera, what I need to do is go ahead and set flip equal to 2, and then we've got to get that GStreamer command. So we can come over to a browser, and I hate how it decides to resize things for me; I know there's a way you can turn that off, but I don't want to do it right now. Okay, we are going to go to toptechboy.com, www.toptechboy.com, and most of you guys that have been playing along know that you can just get it from one of your old programs, but I think if I just search on 'Nano camera', this looks like it, and there's that one line of code. Rather than trying to type it, I'm going to come and get it; it is this camSet string, so I'm going to select that whole line, copy it, come over here, and paste it. That is our old friend, the GStreamer string that will launch the Raspberry Pi camera properly.
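For reference, here is a sketch of that camSet line, close to the one published on toptechboy.com; the sensor width, height, and framerate in the first caps block depend on which Pi camera module you have, so treat those particular numbers as assumptions you may need to adjust:

```python
flip = 2           # flip-method 2 = rotate the sensor image 180 degrees
dispW, dispH = 1280, 720

# GStreamer pipeline: grab from the CSI camera, flip it, SCALE it down to
# dispW x dispH (scaling, not cropping), and hand OpenCV BGR frames.
camSet = ('nvarguscamerasrc ! video/x-raw(memory:NVMM), width=3264, height=2464, '
          'format=NV12, framerate=21/1 ! nvvidconv flip-method=' + str(flip) +
          ' ! video/x-raw, width=' + str(dispW) + ', height=' + str(dispH) +
          ', format=BGRx ! videoconvert ! video/x-raw, format=BGR ! appsink')

print(camSet)
```

This string is what gets handed to cv2.VideoCapture a moment later; because nvvidconv rescales, it avoids the cropping behavior complained about above.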
Okay, now I am NOT going to create the camera using the Jetson utilities, and I'm going to have two different options here. I could say that cam1 is equal to cv2.VideoCapture(camSet); that would launch the Raspberry Pi camera. Or the other way I could do it would be to say cam1 is equal to cv2.VideoCapture('/dev/video1'); I do believe that's the way we do it, if I'm not mistaken, if we wanted to get a webcam. But I'm going to comment that one out, and we will be going with the Raspberry Pi camera for this first run. So that sets the camera. Now, we still need the GoogLeNet, we still need the frames per second, we still need to grab a time, and we still need a font. We're still going to be doing while True, but we're not going to be grabbing a frame the Jetson way; we are going to have to grab the frame using our old friend that returns two parameters: underscore, comma, frame is equal to, and then what do I capture? What am I thinking, it's my camera: cam1.read(). We're reading that into frame. Now, I'm going to keep frame, because I'll use it later, but from frame I'm going to create img, and that's going to be equal to cv2.cvtColor, convert color. What do I want to convert? frame. And what do I want to convert it to? cv2.COLOR_ and, do you remember, what would you guess? BGR2RGBA, because the inference tools want RGBA. So now I'm in RGBA, which is good. But remember how before I had to convert to an unsigned int 8? Well, similarly here I have to go backwards, and I have to do that .astype, but this time it wants np.float32, numpy float32, because it's reading from the camera as unsigned int 8, and I've got to convert it to
what the inference tools want, which is a float32. So we've got that. Now the problem is, we've got a great numpy array in the right format, but the inference tools want a CUDA array, so now I've got to make img equal to jetson.utils dot, and this time we're going to go cudaFromNumpy, kind of the opposite of what we did before, of what? img. So we grab a frame, we convert it to something we call img, which converts the color and the type of number in the array, and now we make it a CUDA array. With those two conversions we are then ready to do our class ID, we're ready to figure out what the item is, and I think after this everything else should be the same. So this time we grab a BGR frame, convert it, and then convert back; but actually, you know what, we can make this a little bit simpler, because we saved frame up here, so we don't have to convert back. For the classification we have to use img; we can't use frame, because that's the wrong format, so we get the class ID from img, and then the item is the class description, same as before. And then down here, after we figure out what is in the picture, we don't have to convert back, because we still have our original frame; I created a new image, img, but we still have the original frame, so I don't have to go backwards. I can comment that out and comment that out, and now when I say cv2.putText, it is still using the original frame. So I'm kind of going down with both the original frame and the converted frame, img, if that makes sense. Let's see what happens here; this is going to be a wild ride. All right... oh, okay, yeah, remember how this GStreamer string wants a display width and a display height? That's what I need to fix.
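The opposite conversion just dictated, cv2.COLOR_BGR2RGBA plus .astype(np.float32), can likewise be sketched in plain NumPy. The frame below is a synthetic stand-in for what cam1.read() returns; on the Nano, the resulting array is what you would hand to jetson.utils.cudaFromNumpy before calling the classifier:

```python
import numpy as np

# Synthetic stand-in for a cam1.read() frame: BGR order, uint8.
frame = np.zeros((3, 4, 3), dtype=np.uint8)
frame[..., 2] = 255  # R sits last in BGR order, so this is pure red

# What cv2.cvtColor(frame, cv2.COLOR_BGR2RGBA).astype(np.float32) does:
# reverse B/G/R to R/G/B, append an opaque alpha channel, and promote
# to the float32 values the inference tools want.
rgb = frame[..., [2, 1, 0]].astype(np.float32)
alpha = np.full(frame.shape[:2] + (1,), 255.0, dtype=np.float32)
img = np.concatenate([rgb, alpha], axis=2)

print(img.shape, img.dtype)  # (3, 4, 4) float32
print(img[0, 0].tolist())    # [255.0, 0.0, 0.0, 255.0]
```

Only img goes through jetson.utils.cudaFromNumpy and into the classifier; the original frame stays untouched for the putText and imshow calls, exactly as described above.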
So I just need to say that the display width is equal to width and the display height is equal to height, and now I have these two numbers that are used in that string, rather than trying to go in and change the string that we've had all this time. I think that will fix the problem. Anyway, there's a lot going on there; it seems to kind of hang. That's awkward. It's not giving us an error, though, so it's kind of hard to see, my friend; it is frozen, but at least the mouse is working now. Okay, wow, is it still trying to go? Ah, look at that, it actually worked! Wow, I was fixing to try to kill it. Okay, so what do I like? I like that I'm at 15 frames per second; I like that I created the window; and do you see how it's not cropping anymore? But what do I not like? One, two, three, four, five: I've got like a two-second latency in there. I have a two-second latency, so let me quit that if I can. All right, the great thing is the program ran; the bad thing is I had a two-second latency. Let me comment out this camera line and uncomment this one, and let's see what happens on the webcam. VideoCapture... what did I do wrong there? Silly me, that needs an uppercase V; did you guys catch that? All right, let's go. Not sure why this one takes so long to fire up. Line 28: it doesn't like classifying the image. I wonder if it doesn't like it when it doesn't see anything; maybe if it didn't find anything, then item doesn't have a description. What if we give item a default value, like that, just in case that was the problem? I don't think that would be it, and I'm not sure why it would work with one camera and not the other. Is there anything else we would need to do? I think I know what we're not doing: we are not setting the size; right now it's just a random dimension, so I think that we're going to have to set that to something. Let's see if I can do this on the fly, or let me see if I can find a different program here in which I
have done this. I think it's these things here that I need, and don't worry, I'll show you how to do it. You see, this is what happened in this program: we went out and we turned on the video, but it doesn't know what the display width and the display height are, because that's not a parameter, at least not one I'm aware of, when you're doing a cv2.VideoCapture of a webcam. So what we need to do is go in and set those, and we're going to set them to the display width and display height which we have up here. Now it should know what the display width and display height are, and then when we send this command, those actually have real numbers in them. So let's run this thing now and see what happens; I really think that's going to fix it. All right, look at that! We are operating at ten frames per second, we launched the camera with our own parameters, and let's see: one, two, three, four, five, no latency. All right, so guys, let's kind of think about where we are in this crazy world; I've got to quit out of this. We have made a lot of progress today, right? We've done our first real, kind of hardcore artificial intelligence using deep learning and a deep neural network, and we were using the most excellent GoogLeNet, I believe, as our detection model, and that worked very well. You guys can go in and play around with the other models; for instance, if I just type in AlexNet, and it's this center column of words that you want to use, I can come up to my model here and change it to AlexNet. It's going to take a long time this time, because I haven't run AlexNet before, so it's going to go in and do that TensorRT optimization, I do believe, and so it might take a few minutes.
A few minutes? Huh, no, it didn't. Okay, so let's look at that; let's see if it still finds the things that it found before. It found the keyboard and the space bar; it's having a little more trouble with the microphone. You see, this is actually good, because you can see that it's not the same as GoogLeNet: GoogLeNet did great on the microphone, and AlexNet thinks the microphone is a barber chair. We've got the mouse; let's see how it does on a pen, a ballpoint pen. So you can see some similarities and some differences. It recognizes the monitor; and here's a kind of interesting thing you can do sometimes: this one recognizes it as a monitor, while I think GoogLeNet, if you look closer there, might actually recognize it as a website itself. This one doesn't recognize it as a website, but again, that's the difference between the different detection models. We'll leave it on the happy little pill bottle; yeah, there it goes, got the pill bottle. So you guys can play around with the different models and see which one you like the best, because we downloaded them all. Now, what do I like? I like that I have control of the camera, I like that I have control of the display, and I like that I have access to the Jetson inference tools, the Jetson tools for doing the image recognition and, in future lessons, the object detection. What I didn't like about the Jetson utilities is that it's kind of like: launch the camera, show it, and live with what you've got; it didn't give me any real ability to control parameters the way that I can in OpenCV. So what I like is that I'm taking the best of both worlds: the flexibility and power of OpenCV combined with the raw horsepower of the Jetson inference toolset. Okay, but now, what is the one thing still that we don't like? What we still don't like is
that, for whatever reason, something goes very, very wrong when we try to use the Raspberry Pi camera this way. Let's go back and look at that again by uncommenting that line and commenting out these. Let's run the Raspberry Pi camera again, because we've got to figure out where we're going moving forward as we do more development, to get the maximum power, the maximum frames per second, and the maximum flexibility; we've got to think about what our options are. What's really clear is that displaying in OpenCV really works, and it gives us complete flexibility to do things just the way we want to, so that is definitely the way to go. But we still have a little bit of an issue as far as how we're going to grab the frame from the camera, and, like I say, I don't know why this almost freezes up when I'm trying to run it on the Raspberry Pi camera, but we'll give it a second. Okay, it looks like it's going to go; there it is. All right, so now watch this: you see I've got that two-second latency in there. I've got good frames per second, but I've got latency, and that is unacceptable. And it must be the conversions, because we know everything else is working. The color conversion certainly isn't likely to create a problem, but when we do this cudaFromNumpy conversion, something in there is balling up, and you end up with a latency, and that latency is just really, really not acceptable. So I think the best solution is really the one where I was using the Jetson utilities as the camera; that worked well for both cameras, except it had messed up that GStreamer command for the Raspberry Pi camera, and therefore the Raspberry Pi camera was not behaving like we wanted it to. So what we're gonna do in the next lesson,
in the next week, is go in and edit that library, because there's an error in it. Well, maybe 'error' is the wrong way of saying it, but there is a poor implementation, not thinking about that GStreamer command correctly. We're going to go in and fix that GStreamer command for the Raspberry Pi camera, recompile the program, and then we'll have something that actually works. Your homework for next week is to play around with these different models and see which one you like the best, and then just make some comments down below about what you observed: which ones seemed to work the best, differences that you saw, similarities that you saw. Okay guys, I hope you don't think this was too long of a lesson; I thought it was kind of exciting, because we're really starting to get some major new horsepower in our artificial intelligence capabilities. Okay guys, this is Paul McWhorter. If you like this video, give us a thumbs up, think about subscribing to the channel, and think about sharing this with other people; let's see if we can get some more people working on this exciting world of artificial intelligence. This is Paul McWhorter, and I will talk to you guys later.
Info
Channel: Paul McWhorter
Views: 8,633
Id: HhkkO-uRNgs
Length: 81min 23sec (4883 seconds)
Published: Sat Jul 11 2020