Real Time Sign Language Detection with Tensorflow Object Detection and Python | Deep Learning SSD

Captions
What's happening guys, my name is Nicholas Renotte, and in this video we're going to be building our very own sign language detector using the TensorFlow Object Detection API and Python. Let's take a deeper look at what we're going to be going through.

In this video we're going to go from start to finish to build our sign language detector. First up, we're going to collect our images using Python and OpenCV. Then we're going to label them using the labelImg package. Then we're going to re-leverage some of the code that we wrote for our other object detector and build a sign language detector using transfer learning and the TensorFlow Object Detection API. Last but not least, we're going to be able to detect different sign language poses in real time.

Let's take a look at how all of this fits together. First up, we're going to use our very own webcam to collect images to train on. Those images are then going to be passed to labelImg, where we'll draw detection boxes around our different sign language poses. Once we've done that, we're going to use transfer learning with the TensorFlow Object Detection API to train an object detector, and last but not least we're going to use Python and OpenCV to detect those poses in real time. Ideally, what you should end up with is a real-time object detector that uses your webcam and can detect different sign language poses. Ready to do it? Let's get to it.

Alright, so let's get to building our real-time sign language detector. There are seven key steps we need to go through, so let's take a quick look at those. First up, we're going to clone our real-time object detection repo, which you can see in the background there; that lets us leverage all of the transfer learning, pre-processing and training code inside of it. Then we're going to collect some images, and for that we're going to write a little bit of code with OpenCV to leverage our webcam. We're then going to set up labelImg and label our images for sign language detection. Then we need to update a couple of lines of code in the Jupyter notebook inside of our repo, train, update our checkpoints, and detect. These are really easy steps, and the training step is especially simple: we just write a command and let it run.

So first up, we're going to start by cloning our repository. We can copy the link from GitHub and go into a command line: if you're on a Windows machine you'll open a command prompt, and if you're on a Mac or Linux machine you'll open a terminal. The command we want to issue is git clone followed by that link, as sketched below. This goes to GitHub, grabs the repository and puts it inside a directory; in this case we're inside our D drive, so it's going to appear there, and you can see it already. Give that a couple of seconds and the clone is done.
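As a rough sketch, the clone step looks something like this from a command prompt or terminal (the repository URL here is a hypothetical placeholder, since the actual link comes from the video description):

    cd D:\
    git clone https://github.com/<username>/RealTimeObjectDetection.git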
We can now open up our repository and take a look at what's actually in here. If you've watched the real-time face mask detection video, this is the exact same code; we're just going to be updating it to build our sign language detector. If we open it up, you can see that we've got one Jupyter notebook in there already, plus a folder called Tensorflow. Within the Tensorflow folder we've got a scripts folder, and the script in there is what we'll use to create our TF records. We've also got a workspace folder with a bit more structure inside it: an annotations folder (nothing in there yet, but we'll talk about that in a sec), a folder called images (again empty, because we're going to collect them), a folder called models (also empty, because we're going to set that up), and a pre-trained-models folder containing a model pre-downloaded from the TensorFlow Model Zoo. In this case it's the SSD MobileNet V2 FPNLite 320x320 COCO17 TPU model, a pre-trained model that lets us perform a technique called transfer learning and train our own model a whole heap faster.

Now let's check our to-do list. We've cloned our repo, so we can mark that as done; the next thing is to actually go and collect our images. To do that, we open up a new Jupyter terminal and go into the folder we just created, RealTimeObjectDetection, the same folder we just cloned down. We jump in there and create a new Jupyter notebook by hitting New and then Python 3.
This gives us a new Jupyter notebook to work with, and now we're going to write a little bit of code that allows us to collect images using our webcam. First up, we import some dependencies. We've got four key ones, OpenCV, uuid, os and time, and each of them makes it a little easier to collect our images. So: import cv2, which is really OpenCV; os, which is going to help us work with file paths; time, which we're going to use to take a little break between each of the images we collect, so we can move our hands and capture different angles for our sign language model; and uuid, which we're going to use to name our image files.

The next thing to specify is our images path. In this case we're putting everything inside Tensorflow/workspace/images/collectedimages, so inside of Tensorflow, inside of workspace, inside of images, we're going to have a new folder called collectedimages. This is where all the images we collect using OpenCV are going to reside, initially at least.

Next we define the labels we're going to collect and how many images we want per label. We're going to have five labels representing the different sign language poses: hello, thanks (it's actually going to be thank you, but that's fine, we just need a single word to represent it), yes, no, and i love you, and we're going to train a label for each of those poses. So we set up an array of all of our labels and a variable holding the number of images; in this case we're collecting 15 images per class, of which we'll probably use 13 for training and two for testing.
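As a rough sketch (not necessarily the exact notebook code), the setup described here might look like this in Python; the path and the label spellings are assumptions based on the walkthrough:

    import cv2   # OpenCV, for capturing frames from the webcam
    import os    # for building file paths
    import time  # for pausing between captures
    import uuid  # for generating unique image file names

    # Where collected images will live (folder structure as described above)
    IMAGES_PATH = os.path.join('Tensorflow', 'workspace', 'images', 'collectedimages')

    # One label per sign language pose, and the number of images per label
    labels = ['hello', 'thanks', 'yes', 'no', 'iloveyou']
    number_imgs = 15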
Now the next thing we need to do is actually go and collect our images, so let's write the code for that. Alright, that's quite a fair bit of code, but let's take a step back and look at what we've written; a sketch of the full loop is shown below. First, we loop through each of our labels in that array and create a directory for each one, so ideally we end up with five folders inside collectedimages, one per label. Then we start our video capture using the cv2.VideoCapture method, which initializes our webcam. You might need to play around with the device number your webcam represents: in my case it's zero on my PC, but it's two on my Mac, so experiment if it's not working, and drop me a mention in the comments below if you have any trouble at all.

We then print out that we're collecting images for whatever label we're up to, and sleep for five seconds to give ourselves time to get into position. Then we loop through the number of images we want to collect, 15 in this case. We grab our capture, and specifically we're interested in the frame, which represents the actual image. Then we define the image name, which is the entire path to our image: we use os.path.join to join the images path defined up above, then the label (because remember, we've created a folder per label), and the file name itself, which is the label, a dot, and some string formatting that passes through a unique identifier from uuid. That makes sure every one of our images has an individual name and we don't get duplicates. Then we use the cv2.imwrite method to write the frame out to that directory, and we also show it on screen so we can see what we've collected. We sleep for two seconds so we can get into another pose, and finally we release our video capture; if we've got any issues we can hit the 'q' key and break out of the loop.
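Here's a minimal sketch of the collection loop as described, assuming the variable names from the setup above (the actual notebook may differ slightly):

    for label in labels:
        # One folder per pose inside the collected images directory
        os.makedirs(os.path.join(IMAGES_PATH, label), exist_ok=True)
        cap = cv2.VideoCapture(0)  # device number may differ per machine
        print('Collecting images for {}'.format(label))
        time.sleep(5)  # time to get into position
        for imgnum in range(number_imgs):
            ret, frame = cap.read()
            # Unique file name per capture, e.g. hello.<uuid>.jpg
            imgname = os.path.join(IMAGES_PATH, label,
                                   '{}.{}.jpg'.format(label, str(uuid.uuid1())))
            cv2.imwrite(imgname, frame)
            cv2.imshow('frame', frame)
            time.sleep(2)  # time to change pose
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        cap.release()
    cv2.destroyAllWindows()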
In this particular case I haven't actually run it yet, so I'm going to take down the green screen, run it, and start collecting our images. Alright, the green screen's down, so let's collect. For the labels we're capturing: hello is this pose, thanks is this pose, yes is this, no is this, and i love you is this. As I'm collecting these images, I'm moving my hand around and using different hands, which boosts the likelihood that our model will be able to detect each of these poses. And we've got a bug there, this should just be a single ampersand, and there we go. Alright, hello's first... "image name is not defined", what have we done there? This should be that, let's try again. Okay, hello looks done; let's drag this over, and now we're collecting thanks... okay, thanks is done. Next is yes, switching hands... that looks like yes is done. Now no... cool, that's no done. And now i love you. Awesome, that's all of our images collected; you can see that we're now done.

Let's close this, and if we go into our images folder, you can see we've got a whole bunch of images. Remember, inside Tensorflow/workspace/images/collectedimages we've got one folder for each of our labels: one for hello, one for i love you, one for no, one for thanks and one for yes. Now we need to go ahead and label them, and for that we're going to use the labelImg package. labelImg is an open-source package that lets you label images for object detection really easily. We're not going to go through the entire setup here, since we did that in the full-blown object detection tutorial and it's only a couple of steps; I'll include a link in the description above. So we bring labelImg in, and then we start labeling.

Alright, I've got labelImg inside our Tensorflow folder; again, if you want to see how to set it up, I'll include a link somewhere above to the video where we went through all of that. Before labeling, we step into our workspace, into images and collectedimages, and take all of the images out of their individual folders, putting them directly inside collectedimages. That way, when we're working with labelImg, we can point at one single directory and label everything from there. Let's grab the rest... and yes... and then we can just delete the now-empty folders. Perfect. Ideally, you should now have one folder with all of your captured images. Now we start up labelImg: we go into the labelImg folder and run the Python command to start it, python labelImg.py, as sketched below.
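Assuming labelImg was cloned into the Tensorflow folder as in the earlier setup video (the folder name here is an assumption), starting it looks roughly like this:

    cd Tensorflow\labelimg
    python labelImg.py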
That starts up labelImg; it opened on the other screen, so I'm just dragging it over. Now we need to open our image directory and change our save directory. We open the directory first, remembering that our images are inside RealTimeObjectDetection/Tensorflow/workspace/images/collectedimages, and then we change the save directory, pointing it to that same folder. Now we label our images, specifically each one of our sign language poses. This one's hello, and it's not exactly pointing at the camera, but that's good, because it gives us more variety. All we need to do is hit the 'w' key, which gives us the labeling tool, and then 'a' and 'd' to move back and forth between images. A key thing to note whenever you're working with labelImg: make sure you've got autosave mode on, so it saves each of your annotations. So let's start labeling: we hit 'w', draw a box around the pose, type in the name of the label, hello in this case, and hit OK. As soon as we've typed that in, a little label box appears up here; then we go to the next image and do the same thing. Cool, you sort of get the idea. Now we'll go through each of our different labels, remembering we've got five to do: hello, yes, no, thank you and i love you. Let's power through those and then look at the results. Okay, now we're up to a different sign, so we just change the label; in this case it's i love you, and each time we're labeling a different sign we just make sure we pick the right label. So this one's i love you, we tick i love you, and we do the same for the remaining images.

Alright, that's our labeling done, so let's take a quick look at the results. Back in our collectedimages folder, you can see that next to each image we've also got an XML file. Inside these XML files is everything you need to represent your objects: the folder your image sits in, the file name, the path to that particular file, its source, and the size of the image, meaning its width, height and depth. Most important is the object element: we can see the label, hello in this case, and if we look at the attached image it does indeed show a hello sign. We can also see where the bounding box is, which is really, really critical, because this is what our object detection model is going to be trained on. A trimmed example of this annotation format is shown below. Scrolling all the way down, we've got quite a fair bit of data.
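For illustration, a Pascal VOC annotation of the kind labelImg writes looks roughly like this (the file name and box coordinates here are made up):

    <annotation>
        <folder>collectedimages</folder>
        <filename>hello.xxxx.jpg</filename>
        <size>
            <width>640</width>
            <height>480</height>
            <depth>3</depth>
        </size>
        <object>
            <name>hello</name>
            <bndbox>
                <xmin>200</xmin>
                <ymin>120</ymin>
                <xmax>420</xmax>
                <ymax>380</ymax>
            </bndbox>
        </object>
    </annotation>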
Now we need to split this up into training and testing partitions. This lets our model train on one set of data and then test and evaluate on a separate partition, which helps reduce the chance of overfitting. You can use a more scientific method of sampling to decide what goes into each partition; here we're simply going to select a bunch for training and put the rest into our testing folder. So let's close this, minimize our command prompt, and open up a new window in our drive. Going into the RealTimeObjectDetection folder, into Tensorflow, workspace and images, you can see we've already got two folders, train and test, with nothing inside them yet. We're going to move a portion of our files into the training folder and the rest into the testing folder, and we'll do it by class: for hello, we'll grab everything up to about here, leaving two images for our testing partition. When you're moving files into the training and testing folders, make sure you move both the image and its XML annotation. So we grab all of these, leave the matching pairs for the rest, put them into our training folder, and then do the same for the other classes. Alright, that's our training partition set up, with a whole bunch of images plus their XML annotations; now we just step into our testing folder and throw the rest in there. We can also delete the .gitkeep file, which was only there to make sure the empty folder stayed in the repo.

Alright, cool, those are our images done. Now we open up the Jupyter notebook that came inside the RealTimeObjectDetection GitHub repo and start setting up our pipeline; the notebook is called tutorial.ipynb and we're going to run through it. Before we do, let's check the to-do list: we've collected our images, so that's done; we've set up labelImg and labeled our images. What's left is to update our label map and train our model.

To work through this notebook, we just step through the cells and make the necessary updates. The first cell is fine to execute as is. The next cell, which creates the label map, needs a few changes. The label map is basically a representation of all the different objects within your model, so we're going to update it for our new labels: hello, yes, no, thank you and i love you. The existing label map is configured for Mask and NoMask, which is what we used for real-time face mask detection (if you want to check out that video, I'll include a link somewhere up there). We update it for sign language: five labels, one per class, and we also make the ids sequential, so one, two, three, four and five; a sketch of what this cell does is shown below. We run that, and we've got a new label map. Ideally, we should now have a new file called label_map.pbtxt inside our annotations folder; that annotations path is where all of our annotations, including our TF records, are going to go. Again, if you want more detail on how we wrote this code, by all means check out the face mask video, it's all explained there; here we're reusing code we already wrote. If we go into the annotations folder, you can see the label map file, and opening it up, it does indeed contain each of our labels, hello, yes, no, thank you and i love you, with their ids. That's looking good.
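A minimal sketch of what that label map cell might look like, assuming the annotations path used elsewhere in the walkthrough (the exact notebook code and label spellings may differ):

    import os

    ANNOTATION_PATH = os.path.join('Tensorflow', 'workspace', 'annotations')

    labels = [
        {'name': 'Hello', 'id': 1},
        {'name': 'Yes', 'id': 2},
        {'name': 'No', 'id': 3},
        {'name': 'ThankYou', 'id': 4},
        {'name': 'ILoveYou', 'id': 5},
    ]

    # Write the label map in the pbtxt format the Object Detection API expects
    with open(os.path.join(ANNOTATION_PATH, 'label_map.pbtxt'), 'w') as f:
        for label in labels:
            f.write('item {\n')
            f.write("\tname: '{}'\n".format(label['name']))
            f.write('\tid: {}\n'.format(label['id']))
            f.write('}\n')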
Back in the notebook, the next thing we need to do is create our TF records. Whenever you're working with the Object Detection API, the way it likes to be trained is with TF records, a special file format that the TensorFlow Object Detection API uses. To simplify generating those records, we can use the script included in the GitHub repository: from the top-level folder, inside Tensorflow/scripts, there's a generate_tfrecord script, which comes from the official TensorFlow object detection tutorial. When we run it, we should end up with two additional files inside our annotations folder, train.record and test.record, representing the TF records for each of our data partitions. We jump back into the notebook and generate them; we don't need to make any updates to this cell, so we just execute it, and ideally we get a "generated successfully" message. Awesome, it ran successfully: we've created the TF record file Tensorflow/workspace/annotations/train.record, and the second one, Tensorflow/workspace/annotations/test.record. Quick look at the to-do list: we've now updated our label map and generated our TF records, so in a couple of steps we're actually going to train.

Next we clone the official TensorFlow models repository, which is this GitHub repo here; it basically gives us everything we need in order to train. If you need a more detailed guide to setting up TensorFlow, I'd highly recommend the official TensorFlow object detection tutorial; I'll include all of the links in the description below. Let's clone it. This might take a little bit of time, so we'll be right back... and we're back. Now that that's done, if we look inside our main repository again, inside Tensorflow, we've now got a models folder, and this includes all of the TensorFlow good stuff, specifically the script we're going to use to train our model. The commands behind these last two steps are sketched below.
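Roughly, the commands behind those cells look like this; the generate_tfrecord.py flags follow the script from the official tutorial, and the paths are assumptions based on the folder structure described above:

    # Generate TF records for the train and test partitions
    python Tensorflow/scripts/generate_tfrecord.py -x Tensorflow/workspace/images/train -l Tensorflow/workspace/annotations/label_map.pbtxt -o Tensorflow/workspace/annotations/train.record
    python Tensorflow/scripts/generate_tfrecord.py -x Tensorflow/workspace/images/test -l Tensorflow/workspace/annotations/label_map.pbtxt -o Tensorflow/workspace/annotations/test.record

    # Clone the TensorFlow models repository (the Object Detection API lives under research/object_detection)
    git clone https://github.com/tensorflow/models Tensorflow/models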
Back in the Jupyter notebook, there are a couple of key steps left. Next we start setting up our configuration, which is basically the set of steps and the model information we'll use to train. Remember, we already have that pre-trained model inside our main repo, so we can just leverage it. Running the next cells creates a new folder called my_ssd_mobnet and copies the template pipeline configuration from our pre-trained model into it; once those two cells have run, if we look inside workspace/models, we've got the my_ssd_mobnet folder with the template pipeline configuration inside it.

Now we make the last couple of key updates to that config: we import our dependencies, grab our configuration path and open up the configuration, which is our baseline. If you scroll down to the bottom, there's a whole heap of paths that need to be configured, and we also need to change the number-of-classes figure, which should be the number of different types of objects you want your model to recognize. In our case that's five, one for each of the signs in our label map (hello, yes, no, thank you and i love you), so we change that first line to 5 and run this cell and the next. That next cell is pre-configured to redefine each of the paths you need, so we won't go into great detail, but I'll quickly explain what's there; a sketch of this configuration update, along with the training command we're about to use, is shown below. First we update the batch size, which is how many examples are processed in each training step. Then we update the fine-tune checkpoint, which is where our model starts training from; this is what lets us use transfer learning and train a whole heap faster. We change the checkpoint type to detection, we set the label map path for training and for evaluation, and we specify where our TF records are, meaning where train.record and test.record live. Then we write all of that out, which updates our configuration file; if we go back in, you can see it was just updated at 12:08, the current time.
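For reference, here's a hedged sketch of that configuration update using the Object Detection API's protobuf utilities (the my_ssd_mobnet folder and pre-trained model directory names follow the walkthrough; the batch size is an assumption), followed by the training command the notebook generates:

    import os
    import tensorflow as tf
    from object_detection.protos import pipeline_pb2
    from google.protobuf import text_format

    CONFIG_PATH = os.path.join('Tensorflow', 'workspace', 'models', 'my_ssd_mobnet', 'pipeline.config')
    ANNOTATION_PATH = os.path.join('Tensorflow', 'workspace', 'annotations')

    # Read the template pipeline config
    pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
    with tf.io.gfile.GFile(CONFIG_PATH, 'r') as f:
        text_format.Merge(f.read(), pipeline_config)

    # Point the config at our classes, checkpoint, label map and TF records
    pipeline_config.model.ssd.num_classes = 5
    pipeline_config.train_config.batch_size = 4  # assumption; tune for your hardware
    pipeline_config.train_config.fine_tune_checkpoint = os.path.join(
        'Tensorflow', 'workspace', 'pre-trained-models',
        'ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8', 'checkpoint', 'ckpt-0')
    pipeline_config.train_config.fine_tune_checkpoint_type = 'detection'
    pipeline_config.train_input_reader.label_map_path = os.path.join(ANNOTATION_PATH, 'label_map.pbtxt')
    pipeline_config.train_input_reader.tf_record_input_reader.input_path[:] = [
        os.path.join(ANNOTATION_PATH, 'train.record')]
    pipeline_config.eval_input_reader[0].label_map_path = os.path.join(ANNOTATION_PATH, 'label_map.pbtxt')
    pipeline_config.eval_input_reader[0].tf_record_input_reader.input_path[:] = [
        os.path.join(ANNOTATION_PATH, 'test.record')]

    # Write the updated config back out
    with tf.io.gfile.GFile(CONFIG_PATH, 'wb') as f:
        f.write(text_format.MessageToString(pipeline_config))

And the training command, run from the top-level folder:

    python Tensorflow/models/research/object_detection/model_main_tf2.py --model_dir=Tensorflow/workspace/models/my_ssd_mobnet --pipeline_config_path=Tensorflow/workspace/models/my_ssd_mobnet/pipeline.config --num_train_steps=10000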
Now all that's left to do is train our model and make our detections. Running the next cell gives you the Python command you can use to train; in this case we're going to change the last parameter from five thousand to ten thousand, because when I was training this previously, I found that around ten thousand steps gave the best results. Then we copy the command and run it in a command prompt, or in a terminal if you're on a Mac; this kicks off the training process and starts training our deep learning model. How long it takes is very much dependent on the hardware you've got available, in particular whether or not you've got a GPU. So we copy it, drag the command prompt over, go into our top-level folder (caps lock off), and trigger the command from there. Once we paste it in, ideally the model starts training, and as soon as you see some loss metrics appear, you know it's training successfully; again, if you want more detail on how we did all of this, check out the face mask video, where we went through writing this code in depth. And you can see our loss metrics have started appearing: we've got the time per 100 training steps, and the first loss metric is showing up, 0.752 in this case. I've seen the best results when the loss gets down to around 0.15 to 0.18, so we're going to let it train for those 10,000 steps, then come back and see how the model performs and start making some real-time detections.

Alright, you can see our model has finished training. All up we've got a loss of about 0.099, and I ended up extending the training to about 20,000 steps, so this should give us the best possible likelihood of detecting sign language in real time. Now we step back into our Jupyter notebook and make a few key updates. We've gone up to step six, where we trained our model by running that command in the command prompt. For the next steps I just run the cells: first we load some dependencies... it looks like we've got a dead kernel, so let's restart it. Right, with the kernel restarted, we import our dependencies again, and those look fine. Then the core thing to do is update the checkpoint: the latest checkpoint our model generated was ckpt-11, and you can see the checkpoints inside your models folder. So we point at ckpt-11 and run the cell, which loads our model and restores it from whatever we've trained. Below that sits the real-time detection code, and all we need to do to spin it up is step through each of those cells; hitting Shift-Enter through them gives you a Python pop-up that makes detections in real time. Because our capture hasn't been created yet, we can just delete the cell that releases it, that's fine. A sketch of this restore-and-detect code is included below.

Ideally, you should get a little pop-up, so let's wait for it. You can see the lighting's changed a little, I've just opened some blinds to make it easier to make our detections, and we've got our detection screen here. If we put a hand up, you can see it accurately detecting yes, it's detecting i love you, it's detecting thank you and hello, and it's also detecting no. Really quickly we're able to make those detections, and it can switch hands; let's put the green screen down and take a look at performance there: hello, i love you, yes, no and thank you.

Alright, just to recap: we captured our images using OpenCV, we labeled them using labelImg, we went through our pre-built Jupyter notebook and trained our model, and right now you can see we're making detections in real time as we face the camera. This could be deployed elsewhere if you wanted; it's a fully trained model, and you can take those checkpoints and work with them going forward. And that about wraps up this video. Thanks so much for tuning in, guys. Hopefully you found it useful; if you did, be sure to give it a thumbs up, hit subscribe and tick that bell so you get notified when I'm releasing future videos, and let me know how you go building your own sign language detector. Thanks again for tuning in. Peace.
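For completeness, here's a hedged sketch of the restore-and-detect cells described above, built on the Object Detection API; the checkpoint name and paths follow the walkthrough, while the window size and score threshold are assumptions:

    import os
    import cv2
    import numpy as np
    import tensorflow as tf
    from object_detection.utils import config_util, label_map_util
    from object_detection.utils import visualization_utils as viz_utils
    from object_detection.builders import model_builder

    CONFIG_PATH = os.path.join('Tensorflow', 'workspace', 'models', 'my_ssd_mobnet', 'pipeline.config')
    CHECKPOINT_PATH = os.path.join('Tensorflow', 'workspace', 'models', 'my_ssd_mobnet')
    ANNOTATION_PATH = os.path.join('Tensorflow', 'workspace', 'annotations')

    # Build the model from the pipeline config and restore the latest training checkpoint
    configs = config_util.get_configs_from_pipeline_file(CONFIG_PATH)
    detection_model = model_builder.build(model_config=configs['model'], is_training=False)
    ckpt = tf.compat.v2.train.Checkpoint(model=detection_model)
    ckpt.restore(os.path.join(CHECKPOINT_PATH, 'ckpt-11')).expect_partial()

    category_index = label_map_util.create_category_index_from_labelmap(
        os.path.join(ANNOTATION_PATH, 'label_map.pbtxt'))

    @tf.function
    def detect_fn(image):
        # Preprocess, predict and postprocess a single batched image tensor
        image, shapes = detection_model.preprocess(image)
        prediction_dict = detection_model.predict(image, shapes)
        return detection_model.postprocess(prediction_dict, shapes)

    cap = cv2.VideoCapture(0)  # device number may differ per machine
    while cap.isOpened():
        ret, frame = cap.read()
        image_np = np.array(frame)
        input_tensor = tf.convert_to_tensor(np.expand_dims(image_np, 0), dtype=tf.float32)
        detections = detect_fn(input_tensor)

        num_detections = int(detections.pop('num_detections'))
        detections = {k: v[0, :num_detections].numpy() for k, v in detections.items()}
        detections['detection_classes'] = detections['detection_classes'].astype(np.int64)

        # Draw boxes and labels onto a copy of the frame (label map ids are 1-based)
        image_np_with_detections = image_np.copy()
        viz_utils.visualize_boxes_and_labels_on_image_array(
            image_np_with_detections,
            detections['detection_boxes'],
            detections['detection_classes'] + 1,
            detections['detection_scores'],
            category_index,
            use_normalized_coordinates=True,
            min_score_thresh=0.5)

        cv2.imshow('object detection', cv2.resize(image_np_with_detections, (800, 600)))
        if cv2.waitKey(1) & 0xFF == ord('q'):
            cap.release()
            cv2.destroyAllWindows()
            break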
Info
Channel: Nicholas Renotte
Views: 552,518
Keywords: tensorflow object detection api, sign language detection, python object detection, python deep learning
Id: pDXdlXlaCco
Length: 32min 28sec (1948 seconds)
Published: Thu Nov 05 2020