Jetson Nano Custom Object Detection - how to train your own AI

Captions
Hey hey, robot makers! How you doing? Hope you're having a good day so far. So, do you want to know how to set up a Jetson Nano to detect custom objects, and see those objects detected in real time? Then this is the show for you, so let's dive straight in. Come with me as we learn to build the robots, bring them to life with code, and have a whole load of fun along the way.

Okay, let me get over to my keynote and we can... Richard says every day is Christmas for Kevin — I just get a lot of stuff! So yes, I was playing with this little M5Stamp before; we'll take a look at that towards the end of the show, along with some other things I've got to show you too. But this is all about object detection, so let's take a look at this, shall we?

Okay, so this session is all about how to train our neural network to detect custom objects. We have done something similar before on the Raspberry Pi, but this is considerably quicker — like, light years quicker. So we're going to look at types of computer vision; we're going to look at something called MobileNet SSD and what that stands for; we're going to look at how we prepare our model — capturing the assets (the images), labelling those images up, training the model, and then converting it to something called ONNX, and we'll have a look at what that is as well; and then finally we'll use the model and do a bit of a demo. It's so cool — I can't wait to show you this.

So this is what we're shooting for: we want to be able to detect, identify and localise different types of robots in a live video scene. In the little demo snapshot I've got there, you can see it's detected an Otto DIY in the scene, and it's also detected a SMARS robot as well. You can see over my shoulder here — this is where I've been doing that image capture — and if I go to my overhead camera, you can see that this little camera on this arm is actually plugged directly into the Jetson Nano. I'm just going to swap my other camera over to the Jetson Nano so we've got that ready to go. You can see it's pointing at the scene; I've used a white background just so it's not cluttered and not getting confused with other bits and bobs, and it gets quite a clean capture of the image. I've decided to capture four different types of robots. Ultimately I'd like to be able to detect lots of different types — maybe an InMoov head, maybe the weather bot, the robot cat, the OpenCat; there are lots of different robots we could get it to capture — but four was enough, and I'll explain why shortly.

Okay, so computer vision models. There are three things that go on with computer vision: classification, detection and segmentation. Segmentation provides an exact outline around the shape of an object, pixel by pixel, so it tells you exactly where it is. Imagine you're driving down a street and you want to pick out a person — not just a rectangle around them, but their exact boundary. Or maybe it's an office space and you want to know where the floor is, where the walls are and where the furniture is; knowing pixel by pixel which segment each point falls into would be really important there. We're not interested in that for what we're doing today. Classification identifies what is in an image, so we will be using that today — we'll definitely be detecting which type of robot we have.
And then detection places a bounding box around specific objects in the scene — and there might be more than one. In the little segment I've got there, you can see there's a house and there's a tree, with some coordinates as well (I just made those coordinates up), but that's what detection does.

Detection itself breaks down into a few steps. Detection is the branch of computer vision which deals with localisation and identification of objects. Localisation and identification are two different steps, and when we put them together we achieve the goal of object detection: localisation is specifically about locating the object within the image or video stream, and identification deals with assigning the object a specific class or label. Those are two separate things, and the MobileNet SSD we're going to use can do both of them very, very quickly together, because it's optimised for exactly that use case.

So, preparing our model — let's have a look at what we need to do here. This will take some time; it took me all Saturday. First of all we need to create a labels file: we literally create a text file, and each line of the file is a different class — a thing we want to detect and give a label to. I chose a SMARS robot, a quad SMARS robot, a SMARS mini and an Otto DIY, just for a bit of variation. We then want to collect a load of images — and by a load, we're shooting for about a thousand images of each class, so about 4,000 images ideally, to get this neural network really *chef's kiss*, the best it can be. We then need to label the objects within the images to produce an annotation file, and we can use a tool to do that — in fact, NVIDIA have merged those first two steps together, so as you're capturing an image you can label it too, just to speed up the whole process, because it does take a long time. Once we've got all that, we train our model using MobileNet SSD, and then we convert the output into the Open Neural Network Exchange (ONNX) model format. You can actually use an ONNX model on different devices once you've created it — for example, you could train it on your Jetson Nano and then move it to a Raspberry Pi, and it should work just as well; it's an open format.

So, Pascal VOC. There's a particular format we need for storing and creating these files, and a lot of the tools are based around this Pascal Visual Object Classes format. It's just a bunch of folders; within them it expects certain folder names and then the files inside. So, for example, in the images folder you have a whole bunch of images, and then you have a corresponding XML file for each image; those XML files in the annotations folder are created by your labelling tool, and each one simply says what the image file is, what objects are in it, and what the coordinates are for each of them. Because what our neural network will do is pull out each of those images, resize them, grayscale them and shove them through the network — and we'll have a look at that in a couple of minutes' time.
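As a rough sketch, the Pascal VOC layout the tools expect looks something like this (the folder names match what we'll browse later in the demo; the image filename below is made up for illustration):

```
smars/
├── Annotations/     # one XML file per image, written by the labelling tool
├── ImageSets/
│   └── Main/        # train / validate / test lists
├── JPEGImages/      # the captured images
└── labels.txt       # one class name per line
```

And a single annotation file contains roughly this — the values here are illustrative, but the fields are the ones we'll see when we cat one of these files later on:

```xml
<annotation>
  <filename>20211010-143000.jpg</filename>
  <folder>smars</folder>
  <source>
    <database>smars</database>
    <annotation>custom</annotation>
    <image>custom</image>
  </source>
  <size>
    <width>1280</width>
    <height>720</height>
    <depth>3</depth>
  </size>
  <object>
    <name>smars</name>
    <pose>unspecified</pose>
    <bndbox>
      <xmin>412</xmin>
      <ymin>188</ymin>
      <xmax>673</xmax>
      <ymax>440</ymax>
    </bndbox>
  </object>
</annotation>
```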
So, creating the labels file is dead easy — it's just a text file. Like I said, I've got SMARS, quad SMARS, Otto DIY and SMARS mini, and you can see in that screenshot it's detected all four types. I just put a little red bubble above each one so you can see a bit clearer, because sometimes the text is small. You also get the percentage confidence that it's detected each one correctly: the quad SMARS is at 99.5% — it's very, very confident about that; the Otto DIY is 92.1%, probably because I didn't take as many pictures of it as the others; and the SMARS mini is detected at 85.7%, down there — that's because it's physically smaller, so it's got fewer pixels to play with. Then over here we've actually got an overlap: it's detected a SMARS twice, once at 96.4% and again at 89.6%, probably because there are two types of SMARS it can detect — one with the matrix display and one with the rangefinder — but I also think there's a bit of a bug in detectnet where it can overlap them; it's to do with a threshold, and I've yet to fix that one.

Asset capture. This is a small video of me capturing some images: you literally draw a rectangle around the robot, move it in the scene, go back, draw the rectangle again, then go up to the top right and say what kind of object it is — this one's a SMARS robot — unfreeze the frame, move the object again, draw the rectangle again... it's like stop-motion animation. You can see the amount of effort it takes to produce a reasonable model. You need a lot of images, and ideally not just from one angle: you want them from above, from underneath, in different lighting conditions, with different backgrounds, overlapping other objects, and so on. The more effort you put into this, the better the model you end up with — but I knew I only had so much time to do this, plus an allowance for troubleshooting when it doesn't work. So yeah, you need to repeat this about a thousand times for each object class you're shooting for. The piece of software I'm using there is part of the NVIDIA jetson-inference GitHub library, which we'll have a look at, and it's called the camera-capture utility — it lets you do that image capture, freeze the frame, draw the rectangle and classify it.

Okay — Hypothetic says using different backgrounds also helps. Yeah, that way it can figure out what's just noise and discard it, whereas I've gone for that white background — probably not the best thing, but it made for a more pleasing video to record.
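Before moving on — the labels file mentioned at the top of this section really is just one class name per line. Based on the classes listed in the video it would look something like the following (the exact strings are my assumption; they just need to match the classes you pick in the capture tool):

```
smars
quad smars
otto diy
smars mini
```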
Training and testing — the process flow. I've taken this slide from a previous video I did on object detection using a Raspberry Pi, with the InMoov head as the camera, so if you want more on that I'll put a link in the video description (I've not done that yet, but by the time you watch this I will have). We need to load the data in and fit the model with the training data set. You normally separate your training data, your test data and your validation data — three different sets. Two of the sets the model will have seen before, but the third set it will never have seen, and that's how you can tell whether it's actually detecting things. What I've actually done here is: I've got three different types of quad SMARS. It's never seen this particular one; it has seen another one I've got on my wall over there; and it was able to detect this one even though it had never seen it before — which shows me the model's working well. So we test the model with the test data, the weight adjustment is done automatically by the training program, and we keep training until the results are acceptable. One way we control that is by defining how many epochs it goes through. For this one I trained for 30 epochs — I did one initially just to see, and after that worked well I went for about 30, which took around 30 to 45 minutes. Very quick for a mobile device.

These next two or three slides are from that previous presentation, so I'll whiz through them — if you want more detail, go back and watch that video. Essentially, this is how the machine learning works. We capture an image; the camera pixelates it, converting the real thing into a series of RGB values — red, green and blue, each between 0 and 255. It then creates a grayscale version, because we don't actually need red, green and blue for a lot of image processing; we just need the image itself, so a single value between 0 and 255 gives us the grayscale. Then we essentially split that into a big array. The neural network doesn't see like we do — it doesn't pick out areas and shapes; there are literally just pixels, values between 0 and 255, which eventually become numbers between 0 and 1 — floating-point values representing each cell in the array. We convert to grayscale because it's simpler to process and the colour information doesn't add much value for what we're doing. A lot of the time the image is also reduced in size: we might have captured it as HD, say 1920 by 1080, and it gets reduced to 300 by 300 pixels — a lot smaller and easier to process. In fact, one of the examples we looked at on a previous stream reduced images down to a 28-by-28 grid, whereas we're working with 300 by 300: a bit more resolution, so more accuracy, but it takes a bit more power.

So, as I've said, it treats the image as an array of values — what we see as colours are just values between 0 and 255, so each one is tiny to store, just a byte. Then there's the neural network itself: the blue blobs are the neurons, the "net" is the connections between them, and they line up in layers — each layer connects to every node in the next layer, and those connections carry the weights, the things the training algorithm adjusts. Hey Wayne, how you doing?

So, these neural nets: a neuron receives its inputs — initially that's just pixel values, and it might be a bunch of pixels per neuron rather than one; it might receive ten, for example. We then have the weights coming in — initially each might be a value like a half — and we multiply each input by its weight. It's simple maths: you can see those grayscale values multiplied by the weight of 0.5 to give a bunch of values; we add them all together, and then crunch the sum down using something called a sigmoid function, which squashes it into a value between 0 and 1, because the raw sum is too big for what we need. We looked at what the sigmoid function is in the other video, so I'll not cover it again here.
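Just to make that concrete, here is a minimal sketch of the arithmetic described above — grayscale values scaled to 0–1, multiplied by starting weights of 0.5, summed, and squashed with a sigmoid. The pixel values are made up:

```python
import math

def sigmoid(x):
    # squash any real number into the range 0..1
    return 1.0 / (1.0 + math.exp(-x))

grays = [105, 12, 230]                # raw grayscale pixel values, 0..255
inputs = [g / 255.0 for g in grays]   # normalised to 0..1
weights = [0.5] * len(inputs)         # every weight starts out at a half

total = sum(i * w for i, w in zip(inputs, weights))
print(sigmoid(total))                 # a single activation between 0 and 1
```

Training then nudges those weights up and down until the activations across the whole network produce the right answers.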
So, the type of neural network we're using is called MobileNet SSD — SSD for single-shot detection. Remember the earlier talk about classification, localisation and identification? Think about how you'd do localisation and identification as a programmer: you might scan every single pixel, or a block of pixels — like a moving window — to see whether it more or less matches the object you're looking for. If you can have several objects in the scene, several classes of things, you might have to make several passes, and that wouldn't be very fast. Single-shot detection essentially moves through the neural network once and detects all classes and all instances of objects in that image, so it's very fast and very efficient. You'll see SSD very commonly when neural networks are used for this kind of work, and MobileNet is what we're going to be using today.

So, if you like these videos, remember to give it a like — whether you're watching on Facebook or on YouTube, give me a thumbs up, and drop me a comment to let me know what you think: if you've used this, if you've got a Jetson Nano or you're thinking about getting one, or if you've used the Raspberry Pi, or something like an ESP32 — they can do an amount of processing, I believe, but I've not tried that myself yet. I do have the ESP32 camera with me — I brought it along — and I'm intending to stream video from it and then run that through the NVIDIA neural network, to see if it can handle several wireless cameras; that'd be quite cool. And if you've not subscribed to the channel — what are you doing? Subscribe now!

Oops, that looks interesting — I seem to have disappeared; let me just bring myself back in for a second. I've not adjusted these overlays in some time, so I think the camera must have changed. If I go to... there — there I am. Fantastic. It'll do this on the other two as well. So yes, I do a video every single Sunday, seven o'clock Greenwich Mean Time — I think we're on GMT+1 for about another month, and then it reverts back to Greenwich Mean Time — so, you know, whenever that is where you live around the world. Let me give this next one a go — I think it'll do the same thing, but I'll quickly swap it out. Let me just switch the camera back to me — there we go; I'll show you what this is about.
So yes, check out the website, smarsfan.com — that's where I put all the tutorials. I've actually given it a bit of a facelift since I recorded this little clip, so I probably need to update this call to action as well. I recently upgraded the capture device I use: I used to use those little HDMI video capture dongles to bring in the HDMI from my camera, and I'm now using a proper Elgato — the HD60 S+, I think it's called — so it's a bit more crisp and can do 60 frames a second. And the last thing I was going to show you — again, this probably needs fixing, let me do that as well... there I am, awesome. So yeah, if you want to support the show, you can go to buymeacoffee.com/kevinmcaleer. You can also download some of the stuff I've got on there — the Top Trumps cards I created in last week's video, for instance; quite a few people downloaded them, so if you've not grabbed a copy yet, head over there. That helps support the show and pays for new equipment: I'm looking to get a new camera for my overhead stuff, because the one I've currently got up there is a cheapish USB camera and, yeah, it's not the best camera in the world. I think that's everything I wanted to cover there, so let's get back over to our keynote. It talks about our new video — yes, we've covered that.

Right — training the model. This is where the magic happens. We can retrain an existing model, and in fact that's what we did with MobileNet SSD: we took an existing model that's already been trained to detect things like people and cats and pens — there are quite a few objects it can detect — essentially added our four things to it, and retrained. It doesn't take as long, and you build on the learning that's already taken place. We run the train_ssd.py script — these are all Python scripts; there are C++ versions in the repository as well, and we'll have a look at that in a minute.

Things I found out the hard way. The labels file simply needs to be called labels.txt, or it doesn't work properly — that must have wasted an hour of my life; I think I'd called it smars-labels or class-labels, something like that. And the path where you pass in the model location shouldn't have any kind of leading slash in it — I was getting ahead of myself and putting a forward slash, which meant the root directory rather than the current working directory; that threw me for probably another half an hour. Also, it takes a long time to capture a thousand images of each class, so I did skip that and probably only did a couple of hundred — more images means a better-quality model, and if I tilt these robots up they won't be detected; they only work straight on, as they were sat on the table. The training itself, I was actually surprised how quick it was: 30 epochs — 30 sessions of training — took between 30 and 45 minutes, which impressed me.
Okay, so I followed along with the Jetson Nano developer site. They've got a tutorial there called Hello AI World, with loads of walkthroughs and some nice YouTube videos — better than this one — explained by the people who wrote the software, by dusty-nv, whose GitHub repository we'll see in a moment. It's built on PyTorch, the neural network framework, with NVIDIA's TensorRT in there as well — they've thrown in their own stuff because it's obviously optimised for their hardware. They take you through all the different steps. We're going to have a bash at it now — I'm not going to do the entire thing, because it takes a long time, but I'll show you a flavour of how it actually works.

So, it's demo time — my favourite time. Let me get over to... if I wiggle the correct mouse, there we go. I'm on my Jetson Nano and I've got two windows open. Let's start with this window: I'm going to run the camera-capture command, and I'm simply going to pass it video0, which is the USB web camera I've got plugged in, just here. What I might do is hold various things up, and you'll see me wheel back to them — if I pick this up, you can see in real time that I've got access to these things.

So, if we want to capture that Otto DIY there, we go over to this window up here — let me move my mouse away and bring this down a bit. We're going for detection, and we need to set where the data path is and pick the class labels file. That's in our python/training/detection/ssd folder — within that I've got a models folder... in fact, let me jump back up a folder; it's the data folder we want, because we're creating the data. I've got a smars folder, and there we go: labels.txt. With labels.txt set, I can now freeze this image. What am I missing? I think it's just the data path — let me set that up... okay, let's choose that folder. There we go. Now I can freeze the frame: if I'm waving my hand here and then freeze, it freezes with my hand in shot — it's not showing you anything moving because the frame's frozen — and now I can draw the rectangle around the object we're interested in detecting.
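For reference, the command running in that first window is just the capture utility pointed at the camera — as documented in the Hello AI World tutorial, with the dataset path and labels file then set inside the tool's GUI (here assuming a USB webcam on /dev/video0, as in this video):

```bash
# from inside the jetson-inference build, with a USB webcam attached
camera-capture /dev/video0
```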
Now, you do need to be quite tight with these boundaries: anything you include that isn't exactly around the object isn't helpful, and drawing too many boxes isn't helpful either — let me just remove that one. There we go, that's what I'm looking for: I want to get from the foot to the head in one box, as tight as possible, because we don't want to include extra noise — that just makes for worse detection. Make sure all the parts of the thing you're detecting are within the bounding box. It doesn't matter if it overlaps with another one: if this one is overlapping — I'll just unfreeze — you can see it's slightly obscuring the Otto there, and that's fine; we'd still do the same thing, freeze it and draw around what we can see. The fact that only some little bits are visible is fine — it doesn't matter that it's obscured, because later images will flush that out of the neural net; over time it learns that the obscured part isn't an important aspect of what makes up an Otto DIY.

In the little window here — it's not easy for me to zoom in on this; I don't think I can, because it's a screen capture — it says "class", so I can drop down and pick from the labels.txt it read in. Otto DIY is a different class — you can see it changes the colour of the bounding box; that purple one is for the SMARS mini, but we want Otto DIY — and then it's got an x, a y, a width and a height: the x position, the y position, the width and the height. We can delete a box if it's a mistake. And we're not limited to one object: this scene has several things in it that we're interested in, so we can bring those in as well. I can draw a rectangle around that one and say it's a quad SMARS; there we go, that's a regular SMARS — I'm not differentiating between that one and this type of SMARS down here, I'm just drawing one box around it — and then at the very bottom there's a SMARS mini, so let's select that one too.

You can see it takes a bit of time, and once you've done it and saved it, you then move everything about a bit, give it a slightly different angle, and do it all again: go back, freeze the frame, draw around each one again and again — it takes ages. You've also got decisions to make. For example, should I include the wires in the box? What really makes up this particular robot — is it actually just that bit there? Because the wires aren't something you'll see on every version; these other ones haven't got them. We'll include them here; let me move that out of the way, bring this in slightly, and push that one out to get the edge in. There's another one here, so we'll draw a box around that and move on. Okay — each of them has been correctly assigned... actually, I don't think this one has; that one there needs to be a SMARS mini. And so on.

Once we've done that, we have to count roughly how many images we've saved: have we got a thousand of each type? Have we got them from every angle? I've just appended one there, so I unfreeze — and should we get one at an angle like that? It depends how we expect this to work: if our robots are never going to see another robot on its side, maybe we don't need to capture that, but for total detection — being able to recognise something from every angle — maybe we should. And you can see it says "current set": we've got train, we have validate and we have test. You can merge all the sets together just for speed and have all three the same, but that's not best practice — best practice is to keep them separate, with roughly half of your images for training and the rest split between validation and testing.
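Under the Pascal VOC layout, that train/validate/test choice just sorts image IDs into plain-text list files — this is my assumption of what the capture tool writes, based on the standard VOC structure shown earlier:

```
ImageSets/
└── Main/
    ├── train.txt   # image IDs the training step fits the weights on
    ├── val.txt     # image IDs checked during training
    └── test.txt    # image IDs held back for the final check
```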
So yeah — data collection, as Hypothetic says, is the most time-consuming part. Let's go over to the next thing. I'm just going to close that window, which stops that little Python script running — all of this has been done in Python, and you can see it says shutdown complete. It's been using something called GStreamer to bring in the video, and the video is very, very quick, as we'll see now, because we're going to run the finished version. I'll show you the intermediate steps in a second, but I want to get to the interesting part of the demo first, which is this one — we'll have a play with it and then look in a bit more detail at what's going on.
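The program being launched here is detectnet, pointed at the exported ONNX model. The Hello AI World tutorial documents the invocation along these lines — the paths assume the "smars" model directory used in this video, and the blob names are the ones the tutorial specifies for retrained SSD-MobileNet ONNX models:

```bash
detectnet --model=models/smars/ssd-mobilenet.onnx \
          --labels=models/smars/labels.txt \
          --input-blob=input_0 --output-cvg=scores --output-bbox=boxes \
          /dev/video0
```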
Right, so it's going to bring up a window in a second and detect all the objects in the scene — it takes about 30 seconds to fire up. There we go, everything's looking good; I've typed all the correct things in, because I tested it before I went live. Then it brings up the window with all the objects in it, with the bounding boxes around them. Come on — I knew this would take a while; it was 24 seconds when I was testing... there we go.

Right, okay — you can see it hasn't detected the SMARS that's on its side, but the second I flip it, it's detecting it correctly. Let's pull some things off this scene — sorry if the audio is dropping in and out; I'll just have that just there — I'm going to pull all of these away and try one thing at a time. So there's our Otto DIY: it's detecting it nicely, though it doesn't look as confident as you'd expect — 71-something percent. Let's rotate it round and see how that goes. I've not taken any captures of my hand, so it's guessing my hand is a SMARS, which is interesting, but you can see that as we rotate this round it's quite confidently detecting it. Let's try moving a leg... if we do an angle — yep, still happy with that, and I hadn't recorded any images at that angle, so that's quite interesting. Let's now drop in a SMARS robot: it's happily detecting that at 98% — there are loads of SMARS images in there from pretty much every angle, which is why it's so confident. I didn't take as many of the SMARS mini; put that down and it's detecting it. And you can see we can obscure them and it's still confidently detecting them — it's not detecting the Otto quite as well there; the second it goes behind, it's not as confident there's something there to detect.

One of the robots I didn't include in the image capture was the weather bot, so I'm going to bring it in now — oops, just destroyed my weather bot there. If I bring in the weather bot, it just ignores it: it doesn't understand that it's anything it should detect, so it's completely ignored in the scene. Similarly, this quad SMARS — it's never seen this one in any of the training, and it's guessing at 98% that it's a quad robot, so I'm very impressed with that. Again, if we bring in the other one that it has trained on, it's very happy — 98% that that's a quad robot from almost every angle. I was curious whether it keeps detecting it if you tilt it up — and you can see there that it hasn't got any data to say that's a quad robot; it's never seen it from that angle before, but side-on it's very confident it knows what it is.

Okay, let's move that out of the way. Here's another SMARS it's never seen before — this one has the line-sensor module on. It's still happily detecting that as a SMARS, even though we never trained on it — detecting those features quite confidently. Let's bring in some other SMARS robots... bring in that one — I think that's detected as a quad, which is interesting, because it's never seen this one before. Oh — it's now detecting it... maybe it was detecting the one behind; let me move out of the way. Yeah, it's detecting that as a quad, because this feature isn't something it's seen before; we take that off and... nope, still a quad... now it's detecting it as a SMARS. To be fair, I didn't train it on that, so it has no way of knowing that particular part belongs to a SMARS robot. But you can see how fast this is — it's running very, very fast, in real time; very impressive. I'm not saying we could use this in, you know, a road-traffic situation, but it's certainly good enough for doing real-time object detection for us.

Now, if you were watching James Bruton's video, he did something similar: he had a comparable setup, detected triangles, squares and circles, and then had his robot drive towards them. He was using the position of the object — he'd taken that out of the Python script and edited it — so from the x and y he could steer the robot left or right to centre the object, and once it was centred he'd drive towards it so the width increased; when the width reached a certain size it would turn and try to find the next object, and he had it going round in a circle detecting the circle, the triangle and the square, round and round. So I was thinking we could do something like that with one of our SMARS robots — this one has the wireless charger on the bottom, so we could have a symbol that means power (maybe not two zigzags — probably not one to bring back, from a historical point of view — maybe some kind of power symbol), and it could detect that, drive towards it, slow down as it gets closer to be a bit more accurate, and hop onto its little charging bay. That would be a really cool thing to do. As well as just detecting its friends — they could all swarm together, find each other and run away from enemy robots or something, I don't know. I've found this a really, really fun thing to do.

So let's get back over to the NVIDIA stuff — let me stop this for a second and show you how we go about running the training program. By the way, this is connected by HDMI input — I'm not doing any kind of screen-share thing, which is why it's so fast. I did look at using VNC, but it was very slow and you wouldn't have got the full flavour of just how fast this runs. Okay, let me exit this, and I'll take you through how we do it, one step at a time. Let's just type exit.
Okay, so the first thing I did was go to GitHub and clone the jetson-inference repository. Once I'd got that on my machine, I went into the folder, where there are a couple of scripts. One of the first is docker — let's cd into docker. Apologies that this is so small; let's zoom in a bit so we can all see what's going on... one or two more. Okay, so there's a folder called docker, and inside it there's a bunch of scripts: build, pull, push, run and tag. All I've done is run the run script — docker/run.sh. If you've not come across Docker before: Docker is a containerisation technology. Think of a container ship with lots of containers on it — Docker lets you run lots of different pieces of software in separation, each within its own container, and you can very quickly download containers, update them, distribute them; all kinds of clever stuff. Let me just type my password in properly — I changed it from the very simple one, good grief. There we go: we're now running the Docker instance. You can see the container is called dustynv/jetson-inference, tagged r32.6.1 — and Hypothetic says, "I do not grok Docker". The volume it mounts is a folder — the GitHub folder I downloaded — which effectively becomes the root of the container, so you can't see anything outside of it. The other thing it brings in is the device called video0, the USB webcam I've been using. I have actually got the Raspberry Pi CSI camera on there too, but that isn't working at the moment for some reason — I'm not sure what I'm doing wrong — so I plugged in a USB camera, which works fine.

Now that I'm in the Docker instance, we can see a bunch of folders. I'm going to go into the build folder — let's have a look at what's in there. There's a download-models script; let's run it and see. You can see there's a whole bunch of different models: GoogLeNet (similar to MobileNet), ResNet, AlexNet, Inception — different ones for different purposes. The object-detection ones, as you'd expect, are very good at detecting objects; the Inception one, which is huge, covers all kinds of office and household objects. There's PedNet, multi-ped, FaceNet, DetectNet — you can see dog, bottle, chair, airplane. There's MonoDepth, and pose estimation — think of the Xbox 360 with the Kinect sensor on it, which could detect what body pose you were in and where your limbs were; pose estimation does that, detecting the different parts of your body and what position they're in, using that segmentation-type approach. Then there's semantic segmentation trained on cityscapes, which is good for driving down a road and accurately detecting street signs and pedestrians and all that kind of stuff. There's a whole bunch of them — some legacy ones in there as well, and image processing — and that's it.
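For reference, getting to this point is just the three commands from the Hello AI World setup — cloning with --recursive pulls in the submodules, and run.sh mounts the repo and your cameras into the pre-built container:

```bash
git clone --recursive https://github.com/dusty-nv/jetson-inference
cd jetson-inference
docker/run.sh   # downloads and starts the pre-built container
```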
If you click on any of them, the script will download them for you — I had a play with the fruit one just to get familiar. Then if we go into aarch64 — the architecture of this machine — and into the bin folder, there's a whole bunch of programs. camera-capture is what we used in that window up there: when I ran it, the parameter I passed was simply /dev/video0, and video0 is the webcam. detectnet is the next program we ran — the one in the other window that took the model we'd built and ran it. And there are a few other pieces in there that we probably don't need to look at.

So what we're going to do is come out of that — oops, let me make sure I'm in the correct window — back out, get down to the first folder, and then go into the python folder. In the python folder there's a training folder; inside that are classification, detection and segmentation, and we're going into detection; then there's the ssd folder, and finally in here are the scripts we use to build and then test our model. There are three folders we're interested in: data, models and vision. The data folder is where I stored my SMARS dataset: I created a folder called smars, and in there are Annotations, ImageSets and JPEGImages, plus that labels.txt — let's get rid of that other "class" file; we don't need it. If we just cat labels.txt, we can see it simply contains smars, quad smars, otto diy and smars mini — that's all there is in that file.

If we now go into JPEGImages, we can see there's a whole load of images — in fact, we can open that folder up... which one is it? I'm rubbish at using Ubuntu; do you know, I can't remember how you open a folder in this thing... file thingy — there it was, File Manager; looks good enough for me. So if we go back to where we were: jetson-inference, python, training, detection, ssd, then data, then smars, then JPEGImages. Let's have a look at one of these — there we go, an awesome picture of some SMARS robots just hanging about. You can see what I did: took a picture, moved them around a bit, took another picture, and so on. And for each one of those images, the camera-capture software creates an annotation file.
If we have a look at this particular XML file — let me just move over a bit — you can see the filename, the folder is smars, the source is smars, the annotation is custom, the image is custom, and the size is 128... sorry, 1280 by 720, so it's 720p-ish; the depth is 3, so it's RGB; and there's no segmentation information in there. That particular one I actually hadn't classified, so let's pick one a bit further down that has got a classification in it. This one... where are we... the database is smars... I was looking for the actual object name; let's try another one randomly down here. There we go, that's better — this one has a few different objects in it. There's an object called smars; it hasn't got a pose, so that's unspecified; and then xmin, ymin, xmax and ymax — the coordinates of the bounding box for that object within the image — are specified there. This is what it generates: just raw data. Once we've got all those things done, we can jump back out of there, and jump back again.

Okay, so then we've got train_ssd.py — that's what we actually use. If I check python --version, we can see we're on 3.6.9, so we're not massively behind the times — what are we on now, 3.9, is it, for Python? So if we now run python train_ssd.py... it does need to know lots of information; you can't just run it bare. It needs to know the model directory — that's models/smars. It needs to know the labels file... hey, Adam... so, labels equals — that's under data, under smars, under labels.txt. What else does it need? The data — I can't remember off the top of my head what this argument is, but you need to specify where all those images are, so again that's under data/smars. Let's try that and see what happens — I might have missed something; it will definitely tell you if there's an error, and it's horrible, and it takes a couple of minutes to get started. Yes, I've missed something: it doesn't recognise labels equals. Is it label, singular? Let's try — I'll give up if that doesn't work, because it gets really complicated — but there's a whole bunch you can see there: datasets, base directory, scheduler, epochs... yeah, we've not specified the epochs. This is where it's best to go to the documentation, which we'll do now — let me load up GitHub and I'll show you how to get more information about this.

So, if I go there and share my screen — there we go. Dustin Franklin is the person who created this repository for NVIDIA, and this jetson-inference repo is where all the good stuff lives. They've got all kinds of detail: image classification, object detection, semantic segmentation — there's a better view of it there; you can see that each object is entirely coloured by class, right round its outline — you can see the street signs, and the trees all separated out, segmented. There's pose estimation — some people there, their limbs, which way up they are — really cool stuff. And simply what I did was go through one of these: how to set up your Jetson. You can build it from source, or use the Docker container, which is what I did — you just type in those three commands: clone the repository (this repository we're looking at now), go into the folder, and run the docker run command, and it grabs everything it needs,
sets everything up, and then you're good to go. It then talks you through all the other steps, such as training your model — you can find the bit on training... there we go. It talks about transfer learning: because this model has already been trained on other objects, it already knows how to detect — I say "knows"; the functions have been trained and tweaked to the point where they take on new information a lot more easily. We don't have to bother installing PyTorch — that's already there by default in the container — so no messing about with that. And... there it is: retraining your SSD-MobileNet. The command we're looking for — the droids we are looking for — is around about here somewhere. There we go: data, model directory, batch size and epochs.

Right, I'm going to keep that on one screen and flick back. Let me see if I can get this correct now. So instead of what I tried before, we say data equals data/smars; we then say the model directory — model-dir equals models/smars (it helps if you type everything correctly); then batch size — this is how many images it processes at a time, and I'm only on the 2GB version of the Jetson Nano, so I stick to a batch size of 2, two files at once; and then epochs is how many sessions of training it will run — let's just do one for now. Now, if I've typed everything correctly... what's it not happy with? What have I typed wrong... models, model directory — that looks correct... ah, there should be two dashes in there; that's why it's not happy. Let me go back and run it again — I just want to get to the point where you can see it doing stuff. Right, it doesn't know what the labels are, so let's tell it... where's the label argument... is it labels equals data/labels.txt? I'll give up if that doesn't work and you can have a play with it yourself. Yep, it's not happy with that labels argument — I can't remember if it's label, or label-file, or what; I could be here all day without knowing.

Anyway — I can jump back to the other screen and show you what you would see. There'd be a timestamp, it would say Epoch 0, and it would show a bunch of steps — I think there were about 70 when I was doing this — and then it tells you the loss. That starts out as a really high number and comes down, and you want it under 1, ideally something like 0.1 — the smallest possible number, because that means you've got a really well-trained network. After each epoch it saves the model out with a name something like: mb for MobileNet, ssd for single-shot detection, the epoch number, the loss it achieved, and then .pth as the file extension — the more epochs you run, the more of these it saves, one per epoch. Then, to convert the model to an Open Neural Network Exchange file, we run onnx_export.py, pointed at the particular model directory — in theory that's all you should need to do; I had to hack around with it a little bit.
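Putting the pieces together — with the argument names as documented in the Hello AI World tutorial, rather than my mistyped guesses above — the retraining and export steps look roughly like this:

```bash
# retrain SSD-MobileNet on the captured Pascal VOC dataset
python3 train_ssd.py --dataset-type=voc --data=data/smars \
        --model-dir=models/smars --batch-size=2 --epochs=30

# convert the best checkpoint to ONNX for use with detectnet
python3 onnx_export.py --model-dir=models/smars
```

Note there's no separate labels argument for training: the script reads labels.txt from the dataset directory, which is exactly why the file has to be named labels.txt.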
The export looks through each of the epoch checkpoints and finds the one with the lowest loss number, because it might not be the last one — you can actually overtrain these models and they'll get worse at detecting things. So there's an art to specifying how many epochs to run: it could be anywhere between 30 and 300, depending on how much time you're willing to throw at it.

Okay, so that's what we can do. There are a whole bunch of other test datasets you can run in there — fruit, airplanes, toys, all kinds of stuff — but I was really keen to detect my own objects, to see just how easy this is. And it is easy — just time-consuming and a bit fiddly. I've rated it as intermediate; I don't think it's an expert skill, because you're just following a script someone has written, so it's quite straightforward — that's why I've said it's an intermediate kind of skill.

Okay, let's have a look at some of the comments coming through — I've not been ignoring you, promise. Let's see what people have been talking about in the chat, and let me throw up the overlays, because there are a few I was missing. Right — we've got quite a few people on the stream today, good grief: twelve people on here. Carlos — hey Carlos, how are you doing? Nice to have you on the stream today. And Richard, you were saying every day is like Christmas for me — yes, I bought a whole bunch of things I want to show you. One of the first: I was talking with Adam on last week's show about these little M5Stamp Pico boards — an ESP32 in a tiny form factor. I've been having a play with that — I've not actually done anything with it yet, but I'd ordered it and it came in the post. I've got some other things to show too. One of the shows I'd like to do soon is on infrared — this robot comes with a little infrared remote control, so we can send data to it and have it move about; I think this one is send-only, I don't think you can send and receive with it. Also, in the small robots group somebody was having an issue with some nRF modules, so I bought a pack of stuff — let me show you what's in here. In here we have — I don't know if you can see that — a module with an antenna, and these come with a little adapter board that the transmitter/receiver module plugs into, and then there's also a whole bunch of Arduino Nanos — essentially two Arduino Nanos and two sets of senders and receivers. So I've got a bunch of them, and I'm going to have a play and see if we can get a remote-control SMARS using nRF as well. That's why it's like Christmas every day for me. I think those were the things I was going to show you — there probably is more; I'm getting so much stuff every day.

Hey, D Johnson, how are you doing? Hypothetic is saying: "I have an Adafruit Feather RP2040 and the Teensy 4 running MicroPython 1.16, and I've seen one of those running CircuitPython." Awesome. I do believe you can use TinyML on Raspberry Pi Picos and the ESP32s. I've got a Teensy as well — I was looking at that before, actually; I think it's a
version three. I digress — I've just found my... no, I don't know what that is; I get distracted easily. It's in there somewhere. But I did find the camera — the ESP32 camera. This thing isn't particularly easy to plug into; you need one of these FTDI adapters. But this camera here has an SD card slot on it as well, and it's just an ESP32 chip — you can see it on the back there — and apparently you can stream from these. So I was thinking we could stream from this video camera — it could be mounted on top of the SMARS robot or something like that; I'm sure Kevin Thomas has done something like this already — driving about, streaming its video, and then you could use the power of another machine, even a desktop computer, to process that through a neural network and do image detection and whatnot.

Hypothetic is also saying that using different backgrounds can help during capture — yes, you do want that background noise to be filtered out, and a good way to do it is to have a kind of busy background as well. "Question for later regarding training: would putting an object on a turntable to change the angle help at all?" It would — you'd certainly get many angles by doing that. I do actually have a turntable, just behind that white thing — sometimes you can see it; there's a whole bunch of stuff sat on it; it's that white thing just there. So that would certainly help, but you want the object from as many angles as possible, and a turntable might be a bit too uniform if the thing's just pivoting round. You also want it a bit closer and a bit further away, because the closer an object gets, the more distorted it looks on camera, and the further away it gets, the more you see that perspective effect — so you want these things from many different angles so the network can take that into account. Possibly even use more than one camera — though it probably makes sense to train on the camera you're going to be detecting with. So yeah, I'd say a turntable would certainly make capture easier.

Hey Wayne, how's it going? And Hypothetic says it's not actually bad to have images where part of the object is covered — the AI will get smarter that way. Absolutely: it then learns what is and isn't part of the object, so it's quite good to do that. "Collecting data is the most time-consuming part of deep learning" — amen to that. The data is most of the process: if you have garbage in, you get garbage out. I was even thinking about including that exact phrase on one of the slides — quality in, quality out. And Hackanis87 says that is so cool — it is so cool; honestly, I have so much fun playing with this. "Be sure to take some images of them falling down" — yes, if it's tipped over we want that included too, and from every angle; you just have to spend a whole load of time on this. It needs much more data.
In fact, one of the projects I've got on the back burner that this would be perfect for is Twitcher Pi — it's on my GitHub repository. The idea was: you've got a bird table with a Raspberry Pi Zero pointing at it, and you detect that there are birds on the table and classify them as different types of bird. I went to the extent of downloading the top 20 birds from the RSPCA — the Royal Society for Protection of Birds — you know, the most common English birds you'll see in a garden — and having it classify each of them. I didn't have a bird table at the time, or a camera set up outside, so I simply went on the web and downloaded 100 pictures of each type of bird — with 20 birds, that's 2,000-odd pictures; is that maths right? It took a very long time to do, drawing the rectangles rather than just classifying them, and so on. What worked well: if you gave it a picture of a bird from the side, or from the front doing some elaborate thing, it was fine — but the back side of it? No one takes pictures of the back sides of birds. Of the avian variety, certainly. So because of that, it only works when the birds are front-on, side-on, or taking off — something like that. That's one of the things I'm thinking about working on further, using the knowledge I've gained from this; I know how I could do it now.

I was also thinking about combining it with something to make it a bit more automatic. If I had a presence sensor — like one of the infrared motion sensors from one of the kits (that's the Pico starter kit just there; in there is an infrared motion detector — that's the one, a motion sensor) — set up to trigger taking pictures, it would do half the work for me: taking pictures of the bird table while there were birds on it, so I'd only have to go through and draw rectangles around a bunch of them. And I was thinking about having some logic so that when it triggers, it takes a picture but then waits two or three minutes before it does it again. That's one of the things I was thinking about there.
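A minimal sketch of that trigger-with-cooldown idea — the sensor hookup and camera call here are hypothetical stand-ins, just to show the timing logic:

```python
import time

COOLDOWN_SECONDS = 180  # wait two to three minutes between captures

def take_picture():
    # hypothetical stand-in for the real camera call
    print("click!")

last_capture = 0.0

def on_motion_detected():
    """Called whenever the PIR motion sensor fires."""
    global last_capture
    now = time.monotonic()
    if now - last_capture >= COOLDOWN_SECONDS:
        take_picture()
        last_capture = now
    # otherwise, ignore the trigger until the cooldown has passed

# example: a burst of triggers only produces one picture
for _ in range(3):
    on_motion_detected()
```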
Apparently the Jetson should just pull that feed straight in, but it's not working for me. "That is a real PITA." A real PITA to do, though. Is PITA some kind of gag that I'm not getting there? "That was not the droid you're looking for." Absolutely not. "I keep losing sync with the stream, I'm going to jump out and watch this later." When you say out of sync, is it that you're not following the conversational thread, or is it stuttering a bit? Because it looks healthy from here; I do check the stats, and I make sure I'm connected by cable, not Wi-Fi, before I do these streams.

"Does it work with other sources, e.g. audio? Could you get it to detect bird calls?" Absolutely. You don't have to do just image detection; there's all kinds of audio detection. In fact, think about it: I've got an Amazon one just there, and that's pure speech recognition. There's a whole load of technology around wake words. I thought that was as simple as running sound through a speech recognition engine, but wake words are a separate thing in themselves, because they've got to listen through all kinds of background noise and then fire off the speech recognition. So yes, you could definitely do that. I've not looked at any of them yet, but even though we think of the Jetson Nano as graphical, the GPU it's got is just very good at crunching numbers very, very quickly. It doesn't matter that it's a graphics processor; it can do audio just as fast.

"Still waiting on those prototype PCBs. The ESP cam has heating issues, right?" Mine has got that; it gets really hot. This was probably an early version, and I think I ordered a couple of new ones before this stream. "Garbage in, garbage out" is definitely a recognized phrase. "RSPB. Is that what I said? Did I say RSPCA?" I have no idea. It's the Royal Society for the Protection of Birds, not the prevention of birds; that's a different society. "Oh, George Jetson!" Of course, that's why he's called George. Yes, George Jetson; I follow you now.

"It's tidy compared to my flat." Right, this is really not tidy; this is really bad. I've just got a mound of stuff here that I really don't want to show you too much more of. There's just a whole load of bits that need tidying up and putting away. I mean, there's a failed version of that SMARS XXL: I blocked out the size of the Jetson Nano in the 3D model and then actually left that 3D block in the design, so when I started printing I was like, why is it printing that out? So yeah, that was kind of stupid. All right, so I said RSPCA, the Royal Society for the Prevention of Cruelty to Animals; yes, yes, it's the RSPB, protection of birds. "That's called the live adrenaline monster." I think when you're presenting and stuff, a lot of stuff just falls out of your head. It happens.

Cool, okay. So I hope you enjoyed this video today and seeing how we can detect objects. I'm just going to jump back to the demo and have a little play with it again, if I can remember which button it is. It is that button there, and if I just wiggle my mouse to get rid of the screensaver, we can hopefully page up through the shell history to the right command, which should just run. In fact, I'm going to have to jump back. Oh, we want to be in the detection folder,
then in the ssd folder, and then we want to find the command that runs it, which is detectnet. Let me just find that... I've clearly typed a lot of stuff in since then. There we go; we're not far off it now. That was the one I was talking about earlier. Oh, it's because I exited it, didn't I? That's why. So, detectnet... what's the option for that, is it --model? Do you know what, I can't remember the whole command. I need to write a script so I can just type "go" and it'll do it, because I'll forget what it is otherwise, and you don't want to see me badly typing in commands for the next half an hour. So I was just going to run that and have another play with it, but that's fine; you've seen it working.

"That's it. If you don't make mistakes, it's not live." Absolutely. This isn't Blue Peter: "here's one I prepared earlier". Blue Peter was a kids' programme from the '80s, and I think they still have it on the BBC actually. They always used to have this thing where they'd show a really complicated build and then just pull out, from under the desk, "here's one I prepared earlier". You'd like the production assistants for that; the production values were way, way higher than anything achievable by a child with some cardboard.

"Can it deal with multiple inputs, by the way? Say, three streams from three different sources?" Yes, it can. I think the chip's really designed for just one lot of that, and certainly the memory, that two gig of memory, gets used up pretty quickly; you get these low-memory warnings. But yes, you can do it, and it doesn't have to be a local camera either: you can use RTSP to bring the video across from something else. Which is why I was thinking about that ESP camera, wherever that's disappeared to... there it is. You could bring the video across from that and run it through the NVIDIA Jetson, so we could do detection from a humble SMARS. This thing just needs five volts, so it can literally be stuck onto another robot, broadcasting over its Wi-Fi. There's no processing going on on the robot itself; it's just sending out raw video. We could then bring that in, process it, and have the processor tell the robot what to do: go forwards, go backwards, and so on.

"Have you already tried a neural network directly on the microcontroller, with MicroPython?" Not yet, Tom. I've been looking for a good example of where that's really practicable. The Raspberry Pi Pico hasn't got a lot of RAM; it's got 264K, something tiny like that. So you have to crunch those MobileNets down even further; I think you have to quantize them to an 8-bit version so they're really, really lightweight. I have seen an example where somebody had an SPI touch screen and you could scribble a character, the digit three for example, and it would detect that it's a three, using a common MobileNet for detecting characters. There are only about 26 characters for it to grasp, so it's not a massive load, and I don't believe it's too slow either; it's okay at doing that. So yes, I'm looking for a good example. If you find one, drop me a message and I'll look into it. I'm definitely looking into how we could expand upon this and what else we could do with it.
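On that "just type go" script: the jetson-inference Python bindings make it a short one. Here is a sketch; the model and label paths follow the layout the jetson-inference SSD tutorial uses and are assumptions, not paths from the stream, and the RTSP comment marks where a remote camera like that ESP32-CAM could slot in:

```python
#!/usr/bin/env python3
# "go" script sketch: load the custom ONNX SSD model and run live
# detection, so the full detectnet command line never has to be
# remembered. The paths under models/robots/ are hypothetical.
import jetson.inference
import jetson.utils

net = jetson.inference.detectNet(
    argv=[
        "--model=models/robots/ssd-mobilenet.onnx",  # assumed path
        "--labels=models/robots/labels.txt",         # assumed path
        "--input-blob=input_0",
        "--output-cvg=scores",
        "--output-bbox=boxes",
    ],
    threshold=0.5,
)

# Swap "csi://0" for an RTSP URI such as "rtsp://<robot-ip>:8554/video"
# to process a stream coming off a remote camera instead.
camera = jetson.utils.videoSource("csi://0")
display = jetson.utils.videoOutput("display://0")

while display.IsStreaming():
    img = camera.Capture()
    detections = net.Detect(img)        # runs the SSD and overlays boxes
    for d in detections:
        print(net.GetClassDesc(d.ClassID), f"{d.Confidence:.2f}")
    display.Render(img)
```

Saved as go.py and made executable with chmod +x, it becomes a single ./go.py instead of a half-remembered command line.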
Someone's saying: have two SMARS looking for each other. Exactly; I was thinking maybe have them chasing each other using image recognition. Maybe they could have a little symbol on the back, or maybe you could just train it to detect different parts of the robot, so it can tell it's looking at the back of a SMARS rather than the front of one. We could certainly do that as well. "Did you see my message about the meaning of P-I-T-A?" I probably missed that, actually; let me just scroll back... no, I don't know what you said about it. I saw you said it was a pun, but I missed the meaning, so I didn't see that come up on the stream. Tom says "that's why I use ESPs". Absolutely: they're a bit faster, they've got loads more memory, and they've got the Wi-Fi as well, so I'm definitely sold on them.

Cool, cool. Okay, I think that's everything I wanted to cover on the show; I don't think there's anything else, and we have gone a little bit over. "Ah, YouTube probably blocked it." It probably did, yeah. You don't want to get yourself banned on YouTube; that wouldn't be fun at all. I can't see it in there. Got you, yep. I have to be careful what I put on screen as well, because if YouTube detects something and decides it's foul language or the like, I can get a takedown strike for it. So that's fine.

Cool, okay. Thanks for joining me on this one; I hope you enjoyed it as much as I did. Let's see where else this takes us. This was just my first introduction to the Jetson and the AI and deep learning stuff that's on there, and I'm really interested in your thoughts on where we should take this, what we should do, and how we should build it into our small robots to bring them to life and make them more interesting. So I shall see you next time. Hopefully we'll do a midweek video on one of these projects; if not, I shall see you for the stream on Sunday. Thanks, everybody, for watching. Bye for now!
Info
Channel: Kevin McAleer
Views: 1,490
Keywords: object detection, image processing, machine learning, deep learning, artificial intelligence, computer vision, jetson nano, custom model, deep neural networks, object detection using tensorflow, object detection python, object detection tensorflow, object detection deep learning, machine learning projects in python, deep learning tutorial, deep learning ai, artificial intelligence robot, Kevin McAleer, Small Robots, Python, Jetson Nano, nvidia jetson nano developer kit
Id: kJpLMBqNcIQ
Length: 73min 26sec (4406 seconds)
Published: Mon Aug 23 2021