Jetson Nano Custom Object Detection - how to train your own AI

Captions
Hey hey, robot makers! How you doing? Hope you're having a good day so far. So, do you want to know how to set up a Jetson Nano to detect custom objects, and see those objects detected in real time? Then this is the show for you, so let's dive straight in. Come with me as we learn to build the robots, bring them to life with code, and have a whole load of fun along the way.

Okay, let me get over to my keynote and we can... Richard says every day is Christmas for Kevin — I just get a lot of stuff! So yes, I was playing with this little M5Stamp before; we'll take a look at that towards the end of the show, along with some other things I've got to show you too. But this is all about object detection, so let's take a look at this, shall we?

Okay, so this session is all about how to train our neural network to detect custom objects. We have done something similar before on the Raspberry Pi, but this is considerably quicker — like, light years quicker. So we're going to look at types of computer vision; we're going to look at something called MobileNet SSD and what that stands for; we're going to look at how we prepare our model — capturing the assets (the images), labelling those images up, training the model, and then converting it to something called ONNX, and we'll have a look at what that is as well; and then finally we'll use the model and do a bit of a demo. It's so cool — I can't wait to show you this.

So this is what we're shooting for: we want to be able to detect, identify and localise different types of robots in a live video scene. In the little demo snapshot I've got there, you can see it's detected an Otto DIY in the scene, and it's also detected a SMARS robot as well. You can see over my shoulder here — this is where I've been doing that image capture — and if I go to my overhead camera, you can see that this little camera on this arm is actually plugged directly into the Jetson Nano. I'm just going to swap my other camera over to the Jetson Nano so we've got that ready to go. You can see it's pointing at the scene; I've used a white background just so it's not cluttered and not getting confused with other bits and bobs, and it gets quite a clean capture of the image. I've decided to capture four different types of robots. Ultimately I'd like to be able to detect lots of different types — maybe an InMoov head, maybe the weather bot, the robot cat, the OpenCat; there are lots of different robots we could get it to capture — but four was enough, and I'll explain why shortly.

Okay, so computer vision models. There are three things that go on with computer vision: classification, detection and segmentation. Segmentation provides an exact outline around the shape of an object, pixel by pixel, so it tells you exactly where it is. Imagine you're driving down a street and you want to pick out a person — not just a rectangle around them, but their exact boundary. Or maybe it's an office space and you want to know where the floor is, where the walls are and where the furniture is; knowing pixel by pixel which segment each point falls into would be really important there. We're not interested in that for what we're doing today. Classification identifies what is in an image, so we will be using that today — we'll definitely be detecting which type of robot we have.
And then detection places a bounding box around specific objects in the scene — and there might be more than one. In the little segment I've got there, you can see there's a house and there's a tree, with some coordinates as well (I just made those coordinates up), but that's what detection does.

Detection itself breaks down into a few steps. Detection is the branch of computer vision which deals with localisation and identification of objects. Localisation and identification are two different steps, and when we put them together we achieve the goal of object detection: localisation is specifically about locating the object within the image or video stream, and identification deals with assigning the object a specific class or label. Those are two separate things, and the MobileNet SSD we're going to use can do both of them very, very quickly together, because it's optimised for exactly that use case.

So, preparing our model — let's have a look at what we need to do here. This will take some time; it took me all Saturday. First of all we need to create a labels file: we literally create a text file, and each line of the file is a different class — a thing we want to detect and give a label to. I chose a SMARS robot, a quad SMARS robot, a SMARS mini and an Otto DIY, just for a bit of variation. We then want to collect a load of images — and by a load, we're shooting for about a thousand images of each class, so about 4,000 images ideally, to get this neural network really *chef's kiss*, the best it can be. We then need to label the objects within the images to produce an annotation file, and we can use a tool to do that — in fact, NVIDIA have merged those first two steps together, so as you're capturing an image you can label it too, just to speed up the whole process, because it does take a long time. Once we've got all that, we train our model using MobileNet SSD, and then we convert the output into the Open Neural Network Exchange (ONNX) model format. You can actually use an ONNX model on different devices once you've created it — for example, you could train it on your Jetson Nano and then move it to a Raspberry Pi, and it should work just as well; it's an open format.

So, Pascal VOC. There's a particular format we need for storing and creating these files, and a lot of the tools are based around this Pascal Visual Object Classes format. It's just a bunch of folders; within them it expects certain folder names and then the files inside. So, for example, in the images folder you have a whole bunch of images, and then you have a corresponding XML file for each image; those XML files in the annotations folder are created by your labelling tool, and each one simply says what the image file is, what objects are in it, and what the coordinates are for each of them. Because what our neural network will do is pull out each of those images, resize them, grayscale them and shove them through the network — and we'll have a look at that in a couple of minutes' time.
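As a rough sketch, the Pascal VOC layout the tools expect looks something like this (the folder names match what we'll browse later in the demo; the image filename below is made up for illustration):

```
smars/
├── Annotations/     # one XML file per image, written by the labelling tool
├── ImageSets/
│   └── Main/        # train / validate / test lists
├── JPEGImages/      # the captured images
└── labels.txt       # one class name per line
```

And a single annotation file contains roughly this — the values here are illustrative, but the fields are the ones we'll see when we cat one of these files later on:

```xml
<annotation>
  <filename>20211010-143000.jpg</filename>
  <folder>smars</folder>
  <source>
    <database>smars</database>
    <annotation>custom</annotation>
    <image>custom</image>
  </source>
  <size>
    <width>1280</width>
    <height>720</height>
    <depth>3</depth>
  </size>
  <object>
    <name>smars</name>
    <pose>unspecified</pose>
    <bndbox>
      <xmin>412</xmin>
      <ymin>188</ymin>
      <xmax>673</xmax>
      <ymax>440</ymax>
    </bndbox>
  </object>
</annotation>
```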
So, creating the labels file is dead easy — it's just a text file. Like I said, I've got SMARS, quad SMARS, Otto DIY and SMARS mini, and you can see in that screenshot it's detected all four types. I just put a little red bubble above each one so you can see a bit clearer, because sometimes the text is small. You also get the percentage confidence that it's detected each one correctly: the quad SMARS is at 99.5% — it's very, very confident about that; the Otto DIY is 92.1%, probably because I didn't take as many pictures of it as the others; and the SMARS mini is detected at 85.7%, down there — that's because it's physically smaller, so it's got fewer pixels to play with. Then over here we've actually got an overlap: it's detected a SMARS twice, once at 96.4% and again at 89.6%, probably because there are two types of SMARS it can detect — one with the matrix display and one with the rangefinder — but I also think there's a bit of a bug in detectnet where it can overlap them; it's to do with a threshold, and I've yet to fix that one.

Asset capture. This is a small video of me capturing some images: you literally draw a rectangle around the robot, move it in the scene, go back, draw the rectangle again, then go up to the top right and say what kind of object it is — this one's a SMARS robot — unfreeze the frame, move the object again, draw the rectangle again... it's like stop-motion animation. You can see the amount of effort it takes to produce a reasonable model. You need a lot of images, and ideally not just from one angle: you want them from above, from underneath, in different lighting conditions, with different backgrounds, overlapping other objects, and so on. The more effort you put into this, the better the model you end up with — but I knew I only had so much time to do this, plus an allowance for troubleshooting when it doesn't work. So yeah, you need to repeat this about a thousand times for each object class you're shooting for. The piece of software I'm using there is part of the NVIDIA jetson-inference GitHub library, which we'll have a look at, and it's called the camera-capture utility — it lets you do that image capture, freeze the frame, draw the rectangle and classify it.

Okay — Hypothetic says using different backgrounds also helps. Yeah, that way it can figure out what's just noise and discard it, whereas I've gone for that white background — probably not the best thing, but it made for a more pleasing video to record.
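Before moving on — the labels file mentioned at the top of this section really is just one class name per line. Based on the classes listed in the video it would look something like the following (the exact strings are my assumption; they just need to match the classes you pick in the capture tool):

```
smars
quad smars
otto diy
smars mini
```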
Training and testing — the process flow. I've taken this slide from a previous video I did on object detection using a Raspberry Pi, with the InMoov head as the camera, so if you want more on that I'll put a link in the video description (I've not done that yet, but by the time you watch this I will have). We need to load the data in and fit the model with the training data set. You normally separate your training data, your test data and your validation data — three different sets. Two of the sets the model will have seen before, but the third set it will never have seen, and that's how you can tell whether it's actually detecting things. What I've actually done here is: I've got three different types of quad SMARS. It's never seen this particular one; it has seen another one I've got on my wall over there; and it was able to detect this one even though it had never seen it before — which shows me the model's working well. So we test the model with the test data, the weight adjustment is done automatically by the training program, and we keep training until the results are acceptable. One way we control that is by defining how many epochs it goes through. For this one I trained for 30 epochs — I did one initially just to see, and after that worked well I went for about 30, which took around 30 to 45 minutes. Very quick for a mobile device.

These next two or three slides are from that previous presentation, so I'll whiz through them — if you want more detail, go back and watch that video. Essentially, this is how the machine learning works. We capture an image; the camera pixelates it, converting the real thing into a series of RGB values — red, green and blue, each between 0 and 255. It then creates a grayscale version, because we don't actually need red, green and blue for a lot of image processing; we just need the image itself, so a single value between 0 and 255 gives us the grayscale. Then we essentially split that into a big array. The neural network doesn't see like we do — it doesn't pick out areas and shapes; there are literally just pixels, values between 0 and 255, which eventually become numbers between 0 and 1 — floating-point values representing each cell in the array. We convert to grayscale because it's simpler to process and the colour information doesn't add much value for what we're doing. A lot of the time the image is also reduced in size: we might have captured it as HD, say 1920 by 1080, and it gets reduced to 300 by 300 pixels — a lot smaller and easier to process. In fact, one of the examples we looked at on a previous stream reduced images down to a 28-by-28 grid, whereas we're working with 300 by 300: a bit more resolution, so more accuracy, but it takes a bit more power.

So, as I've said, it treats the image as an array of values — what we see as colours are just values between 0 and 255, so each one is tiny to store, just a byte. Then there's the neural network itself: the blue blobs are the neurons, the "net" is the connections between them, and they line up in layers — each layer connects to every node in the next layer, and those connections carry the weights, the things the training algorithm adjusts. Hey Wayne, how you doing?

So, these neural nets: a neuron receives its inputs — initially that's just pixel values, and it might be a bunch of pixels per neuron rather than one; it might receive ten, for example. We then have the weights coming in — initially each might be a value like a half — and we multiply each input by its weight. It's simple maths: you can see those grayscale values multiplied by the weight of 0.5 to give a bunch of values; we add them all together, and then crunch the sum down using something called a sigmoid function, which squashes it into a value between 0 and 1, because the raw sum is too big for what we need. We looked at what the sigmoid function is in the other video, so I'll not cover it again here.
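Just to make that concrete, here is a minimal sketch of the arithmetic described above — grayscale values scaled to 0–1, multiplied by starting weights of 0.5, summed, and squashed with a sigmoid. The pixel values are made up:

```python
import math

def sigmoid(x):
    # squash any real number into the range 0..1
    return 1.0 / (1.0 + math.exp(-x))

grays = [105, 12, 230]                # raw grayscale pixel values, 0..255
inputs = [g / 255.0 for g in grays]   # normalised to 0..1
weights = [0.5] * len(inputs)         # every weight starts out at a half

total = sum(i * w for i, w in zip(inputs, weights))
print(sigmoid(total))                 # a single activation between 0 and 1
```

Training then nudges those weights up and down until the activations across the whole network produce the right answers.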
So, the type of neural network we're using is called MobileNet SSD — SSD for single-shot detection. Remember the earlier talk about classification, localisation and identification? Think about how you'd do localisation and identification as a programmer: you might scan every single pixel, or a block of pixels — like a moving window — to see whether it more or less matches the object you're looking for. If you can have several objects in the scene, several classes of things, you might have to make several passes, and that wouldn't be very fast. Single-shot detection essentially moves through the neural network once and detects all classes and all instances of objects in that image, so it's very fast and very efficient. You'll see SSD very commonly when neural networks are used for this kind of work, and MobileNet is what we're going to be using today.

So, if you like these videos, remember to give it a like — whether you're watching on Facebook or on YouTube, give me a thumbs up, and drop me a comment to let me know what you think: if you've used this, if you've got a Jetson Nano or you're thinking about getting one, or if you've used the Raspberry Pi, or something like an ESP32 — they can do an amount of processing, I believe, but I've not tried that myself yet. I do have the ESP32 camera with me — I brought it along — and I'm intending to stream video from it and then run that through the NVIDIA neural network, to see if it can handle several wireless cameras; that'd be quite cool. And if you've not subscribed to the channel — what are you doing? Subscribe now!

Oops, that looks interesting — I seem to have disappeared; let me just bring myself back in for a second. I've not adjusted these overlays in some time, so I think the camera must have changed. If I go to... there — there I am. Fantastic. It'll do this on the other two as well. So yes, I do a video every single Sunday, seven o'clock Greenwich Mean Time — I think we're on GMT+1 for about another month, and then it reverts back to Greenwich Mean Time — so, you know, whenever that is where you live around the world. Let me give this next one a go — I think it'll do the same thing, but I'll quickly swap it out. Let me just switch the camera back to me — there we go; I'll show you what this is about.
So yes, check out the website, smarsfan.com — that's where I put all the tutorials. I've actually given it a bit of a facelift since I recorded this little clip, so I probably need to update this call to action as well. I recently upgraded the capture device I use: I used to use those little HDMI video capture dongles to bring in the HDMI from my camera, and I'm now using a proper Elgato — the HD60 S+, I think it's called — so it's a bit more crisp and can do 60 frames a second. And the last thing I was going to show you — again, this probably needs fixing, let me do that as well... there I am, awesome. So yeah, if you want to support the show, you can go to buymeacoffee.com/kevinmcaleer. You can also download some of the stuff I've got on there — the Top Trumps cards I created in last week's video, for instance; quite a few people downloaded them, so if you've not grabbed a copy yet, head over there. That helps support the show and pays for new equipment: I'm looking to get a new camera for my overhead stuff, because the one I've currently got up there is a cheapish USB camera and, yeah, it's not the best camera in the world. I think that's everything I wanted to cover there, so let's get back over to our keynote. It talks about our new video — yes, we've covered that.

Right — training the model. This is where the magic happens. We can retrain an existing model, and in fact that's what we did with MobileNet SSD: we took an existing model that's already been trained to detect things like people and cats and pens — there are quite a few objects it can detect — essentially added our four things to it, and retrained. It doesn't take as long, and you build on the learning that's already taken place. We run the train_ssd.py script — these are all Python scripts; there are C++ versions in the repository as well, and we'll have a look at that in a minute.

Things I found out the hard way. The labels file simply needs to be called labels.txt, or it doesn't work properly — that must have wasted an hour of my life; I think I'd called it smars-labels or class-labels, something like that. And the path where you pass in the model location shouldn't have any kind of leading slash in it — I was getting ahead of myself and putting a forward slash, which meant the root directory rather than the current working directory; that threw me for probably another half an hour. Also, it takes a long time to capture a thousand images of each class, so I did skip that and probably only did a couple of hundred — more images means a better-quality model, and if I tilt these robots up they won't be detected; they only work straight on, as they were sat on the table. The training itself, I was actually surprised how quick it was: 30 epochs — 30 sessions of training — took between 30 and 45 minutes, which impressed me.
Okay, so I followed along with the Jetson Nano developer site. They've got a tutorial there called Hello AI World, with loads of walkthroughs and some nice YouTube videos — better than this one — explained by the people who wrote the software, by dusty-nv, whose GitHub repository we'll see in a moment. It's built on PyTorch, the neural network framework, with NVIDIA's TensorRT in there as well — they've thrown in their own stuff because it's obviously optimised for their hardware. They take you through all the different steps. We're going to have a bash at it now — I'm not going to do the entire thing, because it takes a long time, but I'll show you a flavour of how it actually works.

So, it's demo time — my favourite time. Let me get over to... if I wiggle the correct mouse, there we go. I'm on my Jetson Nano and I've got two windows open. Let's start with this window: I'm going to run the camera-capture command, and I'm simply going to pass it video0, which is the USB web camera I've got plugged in, just here. What I might do is hold various things up, and you'll see me wheel back to them — if I pick this up, you can see in real time that I've got access to these things.

So, if we want to capture that Otto DIY there, we go over to this window up here — let me move my mouse away and bring this down a bit. We're going for detection, and we need to set where the data path is and pick the class labels file. That's in our python/training/detection/ssd folder — within that I've got a models folder... in fact, let me jump back up a folder; it's the data folder we want, because we're creating the data. I've got a smars folder, and there we go: labels.txt. With labels.txt set, I can now freeze this image. What am I missing? I think it's just the data path — let me set that up... okay, let's choose that folder. There we go. Now I can freeze the frame: if I'm waving my hand here and then freeze, it freezes with my hand in shot — it's not showing you anything moving because the frame's frozen — and now I can draw the rectangle around the object we're interested in detecting.
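For reference, the command running in that first window is just the capture utility pointed at the camera — as documented in the Hello AI World tutorial, with the dataset path and labels file then set inside the tool's GUI (here assuming a USB webcam on /dev/video0, as in this video):

```bash
# from inside the jetson-inference build, with a USB webcam attached
camera-capture /dev/video0
```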
Now, you do need to be quite tight with these boundaries: anything you include that isn't exactly around the object isn't helpful, and drawing too many boxes isn't helpful either — let me just remove that one. There we go, that's what I'm looking for: I want to get from the foot to the head in one box, as tight as possible, because we don't want to include extra noise — that just makes for worse detection. Make sure all the parts of the thing you're detecting are within the bounding box. It doesn't matter if it overlaps with another one: if this one is overlapping — I'll just unfreeze — you can see it's slightly obscuring the Otto there, and that's fine; we'd still do the same thing, freeze it and draw around what we can see. The fact that only some little bits are visible is fine — it doesn't matter that it's obscured, because later images will flush that out of the neural net; over time it learns that the obscured part isn't an important aspect of what makes up an Otto DIY.

In the little window here — it's not easy for me to zoom in on this; I don't think I can, because it's a screen capture — it says "class", so I can drop down and pick from the labels.txt it read in. Otto DIY is a different class — you can see it changes the colour of the bounding box; that purple one is for the SMARS mini, but we want Otto DIY — and then it's got an x, a y, a width and a height: the x position, the y position, the width and the height. We can delete a box if it's a mistake. And we're not limited to one object: this scene has several things in it that we're interested in, so we can bring those in as well. I can draw a rectangle around that one and say it's a quad SMARS; there we go, that's a regular SMARS — I'm not differentiating between that one and this type of SMARS down here, I'm just drawing one box around it — and then at the very bottom there's a SMARS mini, so let's select that one too.

You can see it takes a bit of time, and once you've done it and saved it, you then move everything about a bit, give it a slightly different angle, and do it all again: go back, freeze the frame, draw around each one again and again — it takes ages. You've also got decisions to make. For example, should I include the wires in the box? What really makes up this particular robot — is it actually just that bit there? Because the wires aren't something you'll see on every version; these other ones haven't got them. We'll include them here; let me move that out of the way, bring this in slightly, and push that one out to get the edge in. There's another one here, so we'll draw a box around that and move on. Okay — each of them has been correctly assigned... actually, I don't think this one has; that one there needs to be a SMARS mini. And so on.

Once we've done that, we have to count roughly how many images we've saved: have we got a thousand of each type? Have we got them from every angle? I've just appended one there, so I unfreeze — and should we get one at an angle like that? It depends how we expect this to work: if our robots are never going to see another robot on its side, maybe we don't need to capture that, but for total detection — being able to recognise something from every angle — maybe we should. And you can see it says "current set": we've got train, we have validate and we have test. You can merge all the sets together just for speed and have all three the same, but that's not best practice — best practice is to keep them separate, with roughly half of your images for training and the rest split between validation and testing.
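Under the Pascal VOC layout, that train/validate/test choice just sorts image IDs into plain-text list files — this is my assumption of what the capture tool writes, based on the standard VOC structure shown earlier:

```
ImageSets/
└── Main/
    ├── train.txt   # image IDs the training step fits the weights on
    ├── val.txt     # image IDs checked during training
    └── test.txt    # image IDs held back for the final check
```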
So yeah — data collection, as Hypothetic says, is the most time-consuming part. Let's go over to the next thing. I'm just going to close that window, which stops that little Python script running — all of this has been done in Python, and you can see it says shutdown complete. It's been using something called GStreamer to bring in the video, and the video is very, very quick, as we'll see now, because we're going to run the finished version. I'll show you the intermediate steps in a second, but I want to get to the interesting part of the demo first, which is this one — we'll have a play with it and then look in a bit more detail at what's going on.
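The program being launched here is detectnet, pointed at the exported ONNX model. The Hello AI World tutorial documents the invocation along these lines — the paths assume the "smars" model directory used in this video, and the blob names are the ones the tutorial specifies for retrained SSD-MobileNet ONNX models:

```bash
detectnet --model=models/smars/ssd-mobilenet.onnx \
          --labels=models/smars/labels.txt \
          --input-blob=input_0 --output-cvg=scores --output-bbox=boxes \
          /dev/video0
```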
Right, so it's going to bring up a window in a second and detect all the objects in the scene — it takes about 30 seconds to fire up. There we go, everything's looking good; I've typed all the correct things in, because I tested it before I went live. Then it brings up the window with all the objects in it, with the bounding boxes around them. Come on — I knew this would take a while; it was 24 seconds when I was testing... there we go.

Right, okay — you can see it hasn't detected the SMARS that's on its side, but the second I flip it, it's detecting it correctly. Let's pull some things off this scene — sorry if the audio is dropping in and out; I'll just have that just there — I'm going to pull all of these away and try one thing at a time. So there's our Otto DIY: it's detecting it nicely, though it doesn't look as confident as you'd expect — 71-something percent. Let's rotate it round and see how that goes. I've not taken any captures of my hand, so it's guessing my hand is a SMARS, which is interesting, but you can see that as we rotate this round it's quite confidently detecting it. Let's try moving a leg... if we do an angle — yep, still happy with that, and I hadn't recorded any images at that angle, so that's quite interesting. Let's now drop in a SMARS robot: it's happily detecting that at 98% — there are loads of SMARS images in there from pretty much every angle, which is why it's so confident. I didn't take as many of the SMARS mini; put that down and it's detecting it. And you can see we can obscure them and it's still confidently detecting them — it's not detecting the Otto quite as well there; the second it goes behind, it's not as confident there's something there to detect.

One of the robots I didn't include in the image capture was the weather bot, so I'm going to bring it in now — oops, just destroyed my weather bot there. If I bring in the weather bot, it just ignores it: it doesn't understand that it's anything it should detect, so it's completely ignored in the scene. Similarly, this quad SMARS — it's never seen this one in any of the training, and it's guessing at 98% that it's a quad robot, so I'm very impressed with that. Again, if we bring in the other one that it has trained on, it's very happy — 98% that that's a quad robot from almost every angle. I was curious whether it keeps detecting it if you tilt it up — and you can see there that it hasn't got any data to say that's a quad robot; it's never seen it from that angle before, but side-on it's very confident it knows what it is.

Okay, let's move that out of the way. Here's another SMARS it's never seen before — this one has the line-sensor module on. It's still happily detecting that as a SMARS, even though we never trained on it — detecting those features quite confidently. Let's bring in some other SMARS robots... bring in that one — I think that's detected as a quad, which is interesting, because it's never seen this one before. Oh — it's now detecting it... maybe it was detecting the one behind; let me move out of the way. Yeah, it's detecting that as a quad, because this feature isn't something it's seen before; we take that off and... nope, still a quad... now it's detecting it as a SMARS. To be fair, I didn't train it on that, so it has no way of knowing that particular part belongs to a SMARS robot. But you can see how fast this is — it's running very, very fast, in real time; very impressive. I'm not saying we could use this in, you know, a road-traffic situation, but it's certainly good enough for doing real-time object detection for us.

Now, if you were watching James Bruton's video, he did something similar: he had a comparable setup, detected triangles, squares and circles, and then had his robot drive towards them. He was using the position of the object — he'd taken that out of the Python script and edited it — so from the x and y he could steer the robot left or right to centre the object, and once it was centred he'd drive towards it so the width increased; when the width reached a certain size it would turn and try to find the next object, and he had it going round in a circle detecting the circle, the triangle and the square, round and round. So I was thinking we could do something like that with one of our SMARS robots — this one has the wireless charger on the bottom, so we could have a symbol that means power (maybe not two zigzags — probably not one to bring back, from a historical point of view — maybe some kind of power symbol), and it could detect that, drive towards it, slow down as it gets closer to be a bit more accurate, and hop onto its little charging bay. That would be a really cool thing to do. As well as just detecting its friends — they could all swarm together, find each other and run away from enemy robots or something, I don't know. I've found this a really, really fun thing to do.

So let's get back over to the NVIDIA stuff — let me stop this for a second and show you how we go about running the training program. By the way, this is connected by HDMI input — I'm not doing any kind of screen-share thing, which is why it's so fast. I did look at using VNC, but it was very slow and you wouldn't have got the full flavour of just how fast this runs. Okay, let me exit this, and I'll take you through how we do it, one step at a time. Let's just type exit.
Okay, so the first thing I did was go to GitHub and clone the jetson-inference repository. Once I'd got that on my machine, I went into the folder, where there are a couple of scripts. One of the first is docker — let's cd into docker. Apologies that this is so small; let's zoom in a bit so we can all see what's going on... one or two more. Okay, so there's a folder called docker, and inside it there's a bunch of scripts: build, pull, push, run and tag. All I've done is run the run script — docker/run.sh. If you've not come across Docker before: Docker is a containerisation technology. Think of a container ship with lots of containers on it — Docker lets you run lots of different pieces of software in separation, each within its own container, and you can very quickly download containers, update them, distribute them; all kinds of clever stuff. Let me just type my password in properly — I changed it from the very simple one, good grief. There we go: we're now running the Docker instance. You can see the container is called dustynv/jetson-inference, tagged r32.6.1 — and Hypothetic says, "I do not grok Docker". The volume it mounts is a folder — the GitHub folder I downloaded — which effectively becomes the root of the container, so you can't see anything outside of it. The other thing it brings in is the device called video0, the USB webcam I've been using. I have actually got the Raspberry Pi CSI camera on there too, but that isn't working at the moment for some reason — I'm not sure what I'm doing wrong — so I plugged in a USB camera, which works fine.

Now that I'm in the Docker instance, we can see a bunch of folders. I'm going to go into the build folder — let's have a look at what's in there. There's a download-models script; let's run it and see. You can see there's a whole bunch of different models: GoogLeNet (similar to MobileNet), ResNet, AlexNet, Inception — different ones for different purposes. The object-detection ones, as you'd expect, are very good at detecting objects; the Inception one, which is huge, covers all kinds of office and household objects. There's PedNet, multi-ped, FaceNet, DetectNet — you can see dog, bottle, chair, airplane. There's MonoDepth, and pose estimation — think of the Xbox 360 with the Kinect sensor on it, which could detect what body pose you were in and where your limbs were; pose estimation does that, detecting the different parts of your body and what position they're in, using that segmentation-type approach. Then there's semantic segmentation trained on cityscapes, which is good for driving down a road and accurately detecting street signs and pedestrians and all that kind of stuff. There's a whole bunch of them — some legacy ones in there as well, and image processing — and that's it.
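For reference, getting to this point is just the three commands from the Hello AI World setup — cloning with --recursive pulls in the submodules, and run.sh mounts the repo and your cameras into the pre-built container:

```bash
git clone --recursive https://github.com/dusty-nv/jetson-inference
cd jetson-inference
docker/run.sh   # downloads and starts the pre-built container
```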
If you click on any of them, the script will download them for you — I had a play with the fruit one just to get familiar. Then if we go into aarch64 — the architecture of this machine — and into the bin folder, there's a whole bunch of programs. camera-capture is what we used in that window up there: when I ran it, the parameter I passed was simply /dev/video0, and video0 is the webcam. detectnet is the next program we ran — the one in the other window that took the model we'd built and ran it. And there are a few other pieces in there that we probably don't need to look at.

So what we're going to do is come out of that — oops, let me make sure I'm in the correct window — back out, get down to the first folder, and then go into the python folder. In the python folder there's a training folder; inside that are classification, detection and segmentation, and we're going into detection; then there's the ssd folder, and finally in here are the scripts we use to build and then test our model. There are three folders we're interested in: data, models and vision. The data folder is where I stored my SMARS dataset: I created a folder called smars, and in there are Annotations, ImageSets and JPEGImages, plus that labels.txt — let's get rid of that other "class" file; we don't need it. If we just cat labels.txt, we can see it simply contains smars, quad smars, otto diy and smars mini — that's all there is in that file.

If we now go into JPEGImages, we can see there's a whole load of images — in fact, we can open that folder up... which one is it? I'm rubbish at using Ubuntu; do you know, I can't remember how you open a folder in this thing... file thingy — there it was, File Manager; looks good enough for me. So if we go back to where we were: jetson-inference, python, training, detection, ssd, then data, then smars, then JPEGImages. Let's have a look at one of these — there we go, an awesome picture of some SMARS robots just hanging about. You can see what I did: took a picture, moved them around a bit, took another picture, and so on. And for each one of those images, the camera-capture software creates an annotation file.
If we have a look at this particular XML file — let me just move over a bit — you can see the filename, the folder is smars, the source is smars, the annotation is custom, the image is custom, and the size is 128... sorry, 1280 by 720, so it's 720p-ish; the depth is 3, so it's RGB; and there's no segmentation information in there. That particular one I actually hadn't classified, so let's pick one a bit further down that has got a classification in it. This one... where are we... the database is smars... I was looking for the actual object name; let's try another one randomly down here. There we go, that's better — this one has a few different objects in it. There's an object called smars; it hasn't got a pose, so that's unspecified; and then xmin, ymin, xmax and ymax — the coordinates of the bounding box for that object within the image — are specified there. This is what it generates: just raw data. Once we've got all those things done, we can jump back out of there, and jump back again.

Okay, so then we've got train_ssd.py — that's what we actually use. If I check python --version, we can see we're on 3.6.9, so we're not massively behind the times — what are we on now, 3.9, is it, for Python? So if we now run python train_ssd.py... it does need to know lots of information; you can't just run it bare. It needs to know the model directory — that's models/smars. It needs to know the labels file... hey, Adam... so, labels equals — that's under data, under smars, under labels.txt. What else does it need? The data — I can't remember off the top of my head what this argument is, but you need to specify where all those images are, so again that's under data/smars. Let's try that and see what happens — I might have missed something; it will definitely tell you if there's an error, and it's horrible, and it takes a couple of minutes to get started. Yes, I've missed something: it doesn't recognise labels equals. Is it label, singular? Let's try — I'll give up if that doesn't work, because it gets really complicated — but there's a whole bunch you can see there: datasets, base directory, scheduler, epochs... yeah, we've not specified the epochs. This is where it's best to go to the documentation, which we'll do now — let me load up GitHub and I'll show you how to get more information about this.

So, if I go there and share my screen — there we go. Dustin Franklin is the person who created this repository for NVIDIA, and this jetson-inference repo is where all the good stuff lives. They've got all kinds of detail: image classification, object detection, semantic segmentation — there's a better view of it there; you can see that each object is entirely coloured by class, right round its outline — you can see the street signs, and the trees all separated out, segmented. There's pose estimation — some people there, their limbs, which way up they are — really cool stuff. And simply what I did was go through one of these: how to set up your Jetson. You can build it from source, or use the Docker container, which is what I did — you just type in those three commands: clone the repository (this repository we're looking at now), go into the folder, and run the docker run command, and it grabs everything it needs,
sets everything up, and then you're good to go. It then talks you through all the other steps, such as training your model — you can find the bit on training... there we go. It talks about transfer learning: because this model has already been trained on other objects, it already knows how to detect — I say "knows"; the functions have been trained and tweaked to the point where they take on new information a lot more easily. We don't have to bother installing PyTorch — that's already there by default in the container — so no messing about with that. And... there it is: retraining your SSD-MobileNet. The command we're looking for — the droids we are looking for — is around about here somewhere. There we go: data, model directory, batch size and epochs.

Right, I'm going to keep that on one screen and flick back. Let me see if I can get this correct now. So instead of what I tried before, we say data equals data/smars; we then say the model directory — model-dir equals models/smars (it helps if you type everything correctly); then batch size — this is how many images it processes at a time, and I'm only on the 2GB version of the Jetson Nano, so I stick to a batch size of 2, two files at once; and then epochs is how many sessions of training it will run — let's just do one for now. Now, if I've typed everything correctly... what's it not happy with? What have I typed wrong... models, model directory — that looks correct... ah, there should be two dashes in there; that's why it's not happy. Let me go back and run it again — I just want to get to the point where you can see it doing stuff. Right, it doesn't know what the labels are, so let's tell it... where's the label argument... is it labels equals data/labels.txt? I'll give up if that doesn't work and you can have a play with it yourself. Yep, it's not happy with that labels argument — I can't remember if it's label, or label-file, or what; I could be here all day without knowing.

Anyway — I can jump back to the other screen and show you what you would see. There'd be a timestamp, it would say Epoch 0, and it would show a bunch of steps — I think there were about 70 when I was doing this — and then it tells you the loss. That starts out as a really high number and comes down, and you want it under 1, ideally something like 0.1 — the smallest possible number, because that means you've got a really well-trained network. After each epoch it saves the model out with a name something like: mb for MobileNet, ssd for single-shot detection, the epoch number, the loss it achieved, and then .pth as the file extension — the more epochs you run, the more of these it saves, one per epoch. Then, to convert the model to an Open Neural Network Exchange file, we run onnx_export.py, pointed at the particular model directory — in theory that's all you should need to do; I had to hack around with it a little bit.
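Putting the pieces together — with the argument names as documented in the Hello AI World tutorial, rather than my mistyped guesses above — the retraining and export steps look roughly like this:

```bash
# retrain SSD-MobileNet on the captured Pascal VOC dataset
python3 train_ssd.py --dataset-type=voc --data=data/smars \
        --model-dir=models/smars --batch-size=2 --epochs=30

# convert the best checkpoint to ONNX for use with detectnet
python3 onnx_export.py --model-dir=models/smars
```

Note there's no separate labels argument for training: the script reads labels.txt from the dataset directory, which is exactly why the file has to be named labels.txt.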
The export looks through each of the epoch checkpoints and finds the one with the lowest loss number, because it might not be the last one — you can actually overtrain these models and they'll get worse at detecting things. So there's an art to specifying how many epochs to run: it could be anywhere between 30 and 300, depending on how much time you're willing to throw at it.

Okay, so that's what we can do. There are a whole bunch of other test datasets you can run in there — fruit, airplanes, toys, all kinds of stuff — but I was really keen to detect my own objects, to see just how easy this is. And it is easy — just time-consuming and a bit fiddly. I've rated it as intermediate; I don't think it's an expert skill, because you're just following a script someone has written, so it's quite straightforward — that's why I've said it's an intermediate kind of skill.

Okay, let's have a look at some of the comments coming through — I've not been ignoring you, promise. Let's see what people have been talking about in the chat, and let me throw up the overlays, because there are a few I was missing. Right — we've got quite a few people on the stream today, good grief: twelve people on here. Carlos — hey Carlos, how are you doing? Nice to have you on the stream today. And Richard, you were saying every day is like Christmas for me — yes, I bought a whole bunch of things I want to show you. One of the first: I was talking with Adam on last week's show about these little M5Stamp Pico boards — an ESP32 in a tiny form factor. I've been having a play with that — I've not actually done anything with it yet, but I'd ordered it and it came in the post. I've got some other things to show too. One of the shows I'd like to do soon is on infrared — this robot comes with a little infrared remote control, so we can send data to it and have it move about; I think this one is send-only, I don't think you can send and receive with it. Also, in the small robots group somebody was having an issue with some nRF modules, so I bought a pack of stuff — let me show you what's in here. In here we have — I don't know if you can see that — a module with an antenna, and these come with a little adapter board that the transmitter/receiver module plugs into, and then there's also a whole bunch of Arduino Nanos — essentially two Arduino Nanos and two sets of senders and receivers. So I've got a bunch of them, and I'm going to have a play and see if we can get a remote-control SMARS using nRF as well. That's why it's like Christmas every day for me. I think those were the things I was going to show you — there probably is more; I'm getting so much stuff every day.

Hey, D Johnson, how are you doing? Hypothetic is saying: "I have an Adafruit Feather RP2040 and the Teensy 4 running MicroPython 1.16, and I've seen one of those running CircuitPython." Awesome. I do believe you can use TinyML on Raspberry Pi Picos and the ESP32s. I've got a Teensy as well — I was looking at that before, actually; I think it's a
version three. I digress — I've just found my... no, I don't know what that is; I get distracted easily. It's in there somewhere. But I did find the camera — the ESP32 camera. This thing isn't particularly easy to plug into; you need one of these FTDI adapters. But this camera here has an SD card slot on it as well, and it's just an ESP32 chip — you can see it on the back there — and apparently you can stream from these. So I was thinking we could stream from this video camera — it could be mounted on top of the SMARS robot or something like that; I'm sure Kevin Thomas has done something like this already — driving about, streaming its video, and then you could use the power of another machine, even a desktop computer, to process that through a neural network and do image detection and whatnot.

Hypothetic is also saying that using different backgrounds can help during capture — yes, you do want that background noise to be filtered out, and a good way to do it is to have a kind of busy background as well. "Question for later regarding training: would putting an object on a turntable to change the angle help at all?" It would — you'd certainly get many angles by doing that. I do actually have a turntable, just behind that white thing — sometimes you can see it; there's a whole bunch of stuff sat on it; it's that white thing just there. So that would certainly help, but you want the object from as many angles as possible, and a turntable might be a bit too uniform if the thing's just pivoting round. You also want it a bit closer and a bit further away, because the closer an object gets, the more distorted it looks on camera, and the further away it gets, the more you see that perspective effect — so you want these things from many different angles so the network can take that into account. Possibly even use more than one camera — though it probably makes sense to train on the camera you're going to be detecting with. So yeah, I'd say a turntable would certainly make capture easier.

Hey Wayne, how's it going? And Hypothetic says it's not actually bad to have images where part of the object is covered — the AI will get smarter that way. Absolutely: it then learns what is and isn't part of the object, so it's quite good to do that. "Collecting data is the most time-consuming part of deep learning" — amen to that. The data is most of the process: if you have garbage in, you get garbage out. I was even thinking about including that exact phrase on one of the slides — quality in, quality out. And Hackanis87 says that is so cool — it is so cool; honestly, I have so much fun playing with this. "Be sure to take some images of them falling down" — yes, if it's tipped over we want that included too, and from every angle; you just have to spend a whole load of time on this. It needs much more data.
In fact, one of the projects I've got on the back burner that this would be perfect for is Twitcher Pi — it's on my GitHub repository. The idea was: you've got a bird table with a Raspberry Pi Zero pointing at it, and you detect that there are birds on the table and classify them as different types of bird. I went to the extent of downloading the top 20 birds from the RSPCA — the Royal Society for Protection of Birds — you know, the most common English birds you'll see in a garden — and having it classify each of them. I didn't have a bird table at the time, or a camera set up outside, so I simply went on the web and downloaded 100 pictures of each type of bird — with 20 birds, that's 2,000-odd pictures; is that maths right? It took a very long time to do, drawing the rectangles rather than just classifying them, and so on. What worked well: if you gave it a picture of a bird from the side, or from the front doing some elaborate thing, it was fine — but the back side of it? No one takes pictures of the back sides of birds. Of the avian variety, certainly. So because of that, it only works when the birds are front-on, side-on, or taking off — something like that. That's one of the things I'm thinking about working on further, using the knowledge I've gained from this; I know how I could do it now.

I was also thinking about combining it with something to make it a bit more automatic. If I had a presence sensor — like one of the infrared motion sensors from one of the kits (that's the Pico starter kit just there; in there is an infrared motion detector — that's the one, a motion sensor) — set up to trigger taking pictures, it would do half the work for me: taking pictures of the bird table while there were birds on it, so I'd only have to go through and draw rectangles around a bunch of them. And I was thinking about having some logic so that when it triggers, it takes a picture but then waits two or three minutes before it does it again. That's one of the things I was thinking about there.
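A minimal sketch of that trigger-with-cooldown idea — the sensor hookup and camera call here are hypothetical stand-ins, just to show the timing logic:

```python
import time

COOLDOWN_SECONDS = 180  # wait two to three minutes between captures

def take_picture():
    # hypothetical stand-in for the real camera call
    print("click!")

last_capture = 0.0

def on_motion_detected():
    """Called whenever the PIR motion sensor fires."""
    global last_capture
    now = time.monotonic()
    if now - last_capture >= COOLDOWN_SECONDS:
        take_picture()
        last_capture = now
    # otherwise, ignore the trigger until the cooldown has passed

# example: a burst of triggers only produces one picture
for _ in range(3):
    on_motion_detected()
```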
Apparently the Jetson should just pull that feed straight in, but it's not working for me. "That is a real PITA." A real PITA to do, though. Is PITA some kind of gag that I'm not getting there? "That was not the droid you're looking for." Absolutely not. "I keep losing sync with the stream, I'm going to jump out and watch this later." When you say out of sync, is it that you're not following the conversational thread, or is it stuttering a bit? Because it looks healthy from here; I do check the stats, and I make sure I'm connected by cable, not Wi-Fi, before I do these streams.

"Does it work with other sources, e.g. audio? Could you get it to detect bird calls?" Absolutely. You don't have to do just image detection; there's all kinds of audio detection. In fact, think about it: I've got an Amazon one just there, and that's pure speech recognition. There's a whole load of technology around wake words. I thought that was as simple as running sound through a speech recognition engine, but wake words are a separate thing in themselves, because they've got to listen through all kinds of background noise and then fire off the speech recognition. So yes, you could definitely do that. I've not looked at any of them yet, but even though we think of the Jetson Nano as graphical, the GPU it's got is just very good at crunching numbers very, very quickly. It doesn't matter that it's a graphics processor; it can do audio just as fast.

"Still waiting on those prototype PCBs. The ESP cam has heating issues, right?" Mine has got that; it gets really hot. This was probably an early version, and I think I ordered a couple of new ones before this stream. "Garbage in, garbage out" is definitely a recognized phrase. "RSPB. Is that what I said? Did I say RSPCA?" I have no idea. It's the Royal Society for the Protection of Birds, not the prevention of birds; that's a different society. "Oh, George Jetson!" Of course, that's why he's called George. Yes, George Jetson; I follow you now.

"It's tidy compared to my flat." Right, this is really not tidy; this is really bad. I've just got a mound of stuff here that I really don't want to show you too much more of. There's just a whole load of bits that need tidying up and putting away. I mean, there's a failed version of that SMARS XXL: I blocked out the size of the Jetson Nano in the 3D model and then actually left that 3D block in the design, so when I started printing I was like, why is it printing that out? So yeah, that was kind of stupid. All right, so I said RSPCA, the Royal Society for the Prevention of Cruelty to Animals; yes, yes, it's the RSPB, protection of birds. "That's called the live adrenaline monster." I think when you're presenting and stuff, a lot of stuff just falls out of your head. It happens.

Cool, okay. So I hope you enjoyed this video today and seeing how we can detect objects. I'm just going to jump back to the demo and have a little play with it again, if I can remember which button it is. It is that button there, and if I just wiggle my mouse to get rid of the screensaver, we can hopefully page up through the shell history to the right command, which should just run. In fact, I'm going to have to jump back. Oh, we want to be in the detection folder,
then in the ssd folder, and then we want to find the command that runs it, which is detectnet. Let me just find that... I've clearly typed a lot of stuff in since then. There we go; we're not far off it now. That was the one I was talking about earlier. Oh, it's because I exited it, didn't I? That's why. So, detectnet... what's the option for that, is it --model? Do you know what, I can't remember the whole command. I need to write a script so I can just type "go" and it'll do it, because I'll forget what it is otherwise, and you don't want to see me badly typing in commands for the next half an hour. So I was just going to run that and have another play with it, but that's fine; you've seen it working.

"That's it. If you don't make mistakes, it's not live." Absolutely. This isn't Blue Peter: "here's one I prepared earlier". Blue Peter was a kids' programme from the '80s, and I think they still have it on the BBC actually. They always used to have this thing where they'd show a really complicated build and then just pull out, from under the desk, "here's one I prepared earlier". You'd like the production assistants for that; the production values were way, way higher than anything achievable by a child with some cardboard.

"Can it deal with multiple inputs, by the way? Say, three streams from three different sources?" Yes, it can. I think the chip's really designed for just one lot of that, and certainly the memory, that two gig of memory, gets used up pretty quickly; you get these low-memory warnings. But yes, you can do it, and it doesn't have to be a local camera either: you can use RTSP to bring the video across from something else. Which is why I was thinking about that ESP camera, wherever that's disappeared to... there it is. You could bring the video across from that and run it through the NVIDIA Jetson, so we could do detection from a humble SMARS. This thing just needs five volts, so it can literally be stuck onto another robot, broadcasting over its Wi-Fi. There's no processing going on on the robot itself; it's just sending out raw video. We could then bring that in, process it, and have the processor tell the robot what to do: go forwards, go backwards, and so on.

"Have you already tried a neural network directly on the microcontroller, with MicroPython?" Not yet, Tom. I've been looking for a good example of where that's really practicable. The Raspberry Pi Pico hasn't got a lot of RAM; it's got 264K, something tiny like that. So you have to crunch those MobileNets down even further; I think you have to quantize them to an 8-bit version so they're really, really lightweight. I have seen an example where somebody had an SPI touch screen and you could scribble a character, the digit three for example, and it would detect that it's a three, using a common MobileNet for detecting characters. There are only about 26 characters for it to grasp, so it's not a massive load, and I don't believe it's too slow either; it's okay at doing that. So yes, I'm looking for a good example. If you find one, drop me a message and I'll look into it. I'm definitely looking into how we could expand upon this and what else we could do with it.
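On that "just type go" script: the jetson-inference Python bindings make it a short one. Here is a sketch; the model and label paths follow the layout the jetson-inference SSD tutorial uses and are assumptions, not paths from the stream, and the RTSP comment marks where a remote camera like that ESP32-CAM could slot in:

```python
#!/usr/bin/env python3
# "go" script sketch: load the custom ONNX SSD model and run live
# detection, so the full detectnet command line never has to be
# remembered. The paths under models/robots/ are hypothetical.
import jetson.inference
import jetson.utils

net = jetson.inference.detectNet(
    argv=[
        "--model=models/robots/ssd-mobilenet.onnx",  # assumed path
        "--labels=models/robots/labels.txt",         # assumed path
        "--input-blob=input_0",
        "--output-cvg=scores",
        "--output-bbox=boxes",
    ],
    threshold=0.5,
)

# Swap "csi://0" for an RTSP URI such as "rtsp://<robot-ip>:8554/video"
# to process a stream coming off a remote camera instead.
camera = jetson.utils.videoSource("csi://0")
display = jetson.utils.videoOutput("display://0")

while display.IsStreaming():
    img = camera.Capture()
    detections = net.Detect(img)        # runs the SSD and overlays boxes
    for d in detections:
        print(net.GetClassDesc(d.ClassID), f"{d.Confidence:.2f}")
    display.Render(img)
```

Saved as go.py and made executable with chmod +x, it becomes a single ./go.py instead of a half-remembered command line.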
Someone's saying: have two SMARS looking for each other. Exactly; I was thinking maybe have them chasing each other using image recognition. Maybe they could have a little symbol on the back, or maybe you could just train it to detect different parts of the robot, so it can tell it's looking at the back of a SMARS rather than the front of one. We could certainly do that as well. "Did you see my message about the meaning of P-I-T-A?" I probably missed that, actually; let me just scroll back... no, I don't know what you said about it. I saw you said it was a pun, but I missed the meaning, so I didn't see that come up on the stream. Tom says "that's why I use ESPs". Absolutely: they're a bit faster, they've got loads more memory, and they've got the Wi-Fi as well, so I'm definitely sold on them.

Cool, cool. Okay, I think that's everything I wanted to cover on the show; I don't think there's anything else, and we have gone a little bit over. "Ah, YouTube probably blocked it." It probably did, yeah. You don't want to get yourself banned on YouTube; that wouldn't be fun at all. I can't see it in there. Got you, yep. I have to be careful what I put on screen as well, because if YouTube detects something and decides it's foul language or the like, I can get a takedown strike for it. So that's fine.

Cool, okay. Thanks for joining me on this one; I hope you enjoyed it as much as I did. Let's see where else this takes us. This was just my first introduction to the Jetson and the AI and deep learning stuff that's on there, and I'm really interested in your thoughts on where we should take this, what we should do, and how we should build it into our small robots to bring them to life and make them more interesting. So I shall see you next time. Hopefully we'll do a midweek video on one of these projects; if not, I shall see you for the stream on Sunday. Thanks, everybody, for watching. Bye for now!
Info
Channel: Kevin McAleer
Views: 1,490
Keywords: object detection, image processing, machine learning, deep learning, artificial intelligence, computer vision, jetson nano, custom model, deep neural networks, object detection using tensorflow, object detection python, object detection tensorflow, object detection deep learning, machine learning projects in python, deep learning tutorial, deep learning ai, artificial intelligence robot, Kevin McAleer, Small Robots, Python, Jetson Nano, nvidia jetson nano developer kit
Id: kJpLMBqNcIQ
Length: 73min 26sec (4406 seconds)
Published: Mon Aug 23 2021