Build a Deep Face Detection Model with Python and Tensorflow | Full Course

Captions
ever wanted to build your very own face detection model? well in this video we're going to be doing exactly that, using a deep object detection model. ready to do it? let's get to it.

[Music]

what's happening guys, my name is Nicholas Renotte and in this tutorial we are going to be doing exactly what i said at the start: building our very own face detection model. the cool thing about this is that it's actually based on an object detection architecture, so if you wanted to use this for something else, maybe detecting different types of objects, you'd be able to use this pipeline to do exactly that. as per usual we're going to be doing it with a client and developer relationship, so you'll see the client conversations as we go. specifically, we're first going to collect a bunch of images using OpenCV and label them using a library called labelme, and the real kicker is that we're going to be performing image augmentation with a separate library which i'll talk about a little later. in section two we'll annotate the data, then build our deep learning model for object detection from scratch, and then we'll test it out.

so nick, i tried your object detection course but i got some errors. how many? a million. well, how about we go ahead and do it from scratch? no, wait, really? yeah. the first thing we need to do is get some data and label it. for this particular use case we're going to be detecting our face using bounding box detection, which means you'll be able to capture where your head is in the frame. nice, what are some of the use cases for this? you could use it for facial sentiment analysis or facial verification, but keep in mind this is a generic object detection pipeline, so as long as you have one type of object you could use this flow for just about anything. cool, let's do it.

alrighty, welcome back to the breakdown board. if i sound a little stressed, it's because this is the sixth attempt at recording this. in this video we're doing an object detection tutorial, but we're going to be a bit more practical and actually build a face detection model. the first thing we're going to do is collect some images using our webcam. from those images we'll begin annotating, which means drawing a bounding box around our head, and to do that we'll use a library called labelme. in past object detection tutorials i've actually used a library called labelImg; what i've found is that labelme is really powerful because it lets you do more than just bounding box annotation, so if you wanted to do keypoint annotation or segmentation you could use this particular library to do just that.

now let's say we collect maybe a hundred-ish images. that's probably not going to be enough to train a deep learning model, and this is where data augmentation comes in. for this we're going to use a library called Albumentations, which lets us take our dataset and apply random cropping, random changes to brightness, random flips, random gamma shifts and RGB shifts. that means we can take our base dataset and flesh it out to over 30 times the amount of data, taking us from roughly 100 to 3,000 images, which should be enough to build up our deep learning model. this library is really powerful because it not only augments your images, it adjusts your annotations at the same time: if you crop an image, the bounding box coordinates need to shift as well, and Albumentations handles that for you.

once we've got our images, our annotations and our augmentations, it's time to train the deep learning model. if you think about what an object detection model is, it's really two models in one: a classification model, which classifies what type of object is present (in our case just one class, face), and a regression model, which estimates the coordinates of the bounding box. we only need two opposing corner coordinates to draw a box, so as long as we get two opposing points we can render a bounding box. before we train, we need to define the losses. for the classification component it's binary cross entropy, which is pretty standard for a classification model. the second loss is our localization loss, which estimates how far off our predicted box was. the first component compares the true top-corner coordinates to the predicted ones, so we compare the x coordinate to the predicted x and the y coordinate to the predicted y. the second component compares the width and height of the true box against the width and height of the predicted box. together, the localization loss ensures our box represents our object as closely as possible. once we've defined those losses, we're going to use the Keras functional API to build up our model.
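The actual model build and loss definition come much later in the video, but since the breakdown board just described both, here is a rough sketch of what they might look like, assuming a VGG16 backbone, boxes expressed as `[x1, y1, x2, y2]`, and a 120x120 input. The input size, layer widths and variable names are assumptions for illustration, not taken from this section.

```python
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, GlobalMaxPooling2D
from tensorflow.keras.applications import VGG16

def localization_loss(y_true, yhat):
    # first component: squared distance between the true and predicted
    # top-left corner coordinates (x1, y1)
    delta_coord = tf.reduce_sum(tf.square(y_true[:, :2] - yhat[:, :2]))

    # second component: squared difference between true and predicted
    # box width and height, derived from the two opposing corners
    h_true = y_true[:, 3] - y_true[:, 1]
    w_true = y_true[:, 2] - y_true[:, 0]
    h_pred = yhat[:, 3] - yhat[:, 1]
    w_pred = yhat[:, 2] - yhat[:, 0]
    delta_size = tf.reduce_sum(tf.square(w_true - w_pred) + tf.square(h_true - h_pred))

    return delta_coord + delta_size

# classification side is plain binary cross entropy (face / no face)
classification_loss = tf.keras.losses.BinaryCrossentropy()

def build_model():
    input_layer = Input(shape=(120, 120, 3))          # assumed input size
    vgg = VGG16(include_top=False)(input_layer)        # pre-trained backbone, no classifier head

    # classification head: one value between 0 and 1 (is a face present?)
    f1 = GlobalMaxPooling2D()(vgg)
    class_out = Dense(1, activation='sigmoid')(Dense(2048, activation='relu')(f1))

    # regression head: four values between 0 and 1 (x1, y1, x2, y2)
    f2 = GlobalMaxPooling2D()(vgg)
    bbox_out = Dense(4, activation='sigmoid')(Dense(2048, activation='relu')(f2))

    return Model(inputs=input_layer, outputs=[class_out, bbox_out])
```

The sigmoid activations assume the coordinates have been normalized to the 0-1 range, which is exactly the rescaling step the augmentation section walks through later.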
we're actually going to be using a vgg 16 model which is a classification model for images now the beauty of this is that it's been pre-trained on a ton of data already so we can actually just use it inside of our model and add in our final two layers which are going to be our classification model and our regression model to be able to give us our bounding boxes so once we've gone and built up that model what we'll get out of this model is six different values so our third well actually no five different values so our first set of values is going to be either a zero or one it's not going to be both of those it's going to be a range within that value and that's going to represent whether or not a face has been detected within our particular image and then the second set of values is going to be a set of four values and this is going to be x1 y1 x2 y2 which represents the coordinates for our box so we'll actually get five outputs out of this model which we can then use to be able to do detections so once we've gone and trained that we'll actually go and test it out in real time and we'll make detections which will effectively be able to go and detect our face and that in a nutshell is the breakdown board over let's jump on over and get to coding alrighty guys so face detection so in order to go through this i've gone and written a ton of code which you're going to be able to leverage and if you want to go through this code yourself it is all available on my github repo so you just need to go to knick knock knack forward slash face detection and it is all there now whilst i've called it face detection what we're really dealing with here is an end to end object detection pipeline because we're actually going to be doing bounding box detection which you'll see in a second so first things first what is it that we need to go on ahead and do well let's actually take a look at our pipeline so first things first we're going to deal with section one setting up and getting our data as well as doing a little bit of annotation so first things first we need to go and install some dependencies so if we open this up we can see that we have one two three four five six different dependencies that we need to go and install inside of our python environment now if you don't know how to work with jupyter or you don't know how to set up custom environments by all means go and check out the deep learning beginners tutorial i'll show you how to get all of this set up so effectively you will be starting where i'm showing you right now cool so first things first we've got to go and install a bunch of stuff so we need to run exclamation mark pip which is a standard python installation so pip install we're going to be installing label me let me actually show this so label me so this is quite possibly my one of my favorite libraries that i've dealt with lately so this allows you to do a ton of different types of annotation and we'll probably be exploring this a ton more in the upcoming deep learning series or wherever we're going with this stuff so we're going to be using labelme for our annotations tensorflow and tensorflow gpu so our friends are going to be helping us with deep learning opencv python so this is going to be used for real-time detection as well as capturing images which you'll see over here we're also going to be using matplotlib and this is really to do our rendering so right about here we'll be using matplotlib and then albumentation this is a new one so this is what we're going to be using to do our 
data augmentation. the nice thing is that if you go into the documentation (i'll link it in the description below), there's actually a walkthrough for doing bounding box augmentation. i mentioned it on the breakdown board: a really important thing when you're doing data augmentation for deep learning, and specifically for object detection, is that you don't just augment the images. if you start cropping images you need to augment the labels as well, and Albumentations does that for you, which is absolutely fantastic. cool, so let's go ahead and run the install: we're installing labelme, tensorflow, opencv-python, matplotlib and albumentations. if we scroll down it doesn't look like we've got any errors, so that's our first line of code successfully run and our dependencies installed and set up.

the next thing we need to do is collect some images, and to do that we import four different libraries. first up, os, which makes it a lot easier to work with file paths: joining paths together, listing what's inside directories, os is your friend. second, time: when we collect our images we want to give ourselves a little bit of time to move around, and that's exactly what the time library is going to help us do. third, uuid: uuid lets you create a universally unique identifier, so if i type uuid.uuid1() it gives us back a specific identifier, and we'll use that to create unique file names for our images rather than image one, image two, image three and so on. and finally OpenCV: by now, if you're doing object detection, you've probably heard about OpenCV a bunch; it lets us work with different sensors and cameras and generally makes computer vision a whole lot easier.

the next thing we need to do is define some paths. our images are going to go inside a folder called data, and specifically inside an additional folder called images, so we'll build that path with os.path.join. as for the number of images, i've got 10 there but we really need more than that; we'd probably want 100, but it's going to take a while to annotate those and i want to show you the end-to-end pipeline. so let's collect 30 to begin with, then move the camera a little and collect another 30, move it a little more, maybe change my shirt so we've got different samples, and collect another 30. ideally you want as much variability as possible in the datasets you're collecting, so we'll do 30, 30, 30.
okay, so this should be os.path.join: our images are going to go into a folder called data and then images, and we're going to collect 30 images to begin with. we first need to create those folder structures. i'm working inside Jupyter Lab, which makes it a little easier to navigate the file structure, so i'm going to create a folder called data, and inside data another folder called images and another called labels. this is where we'll store our raw, unpartitioned data. a little later on, once we get into partitioning, we'll create three additional folders, train, test and val, and split our data up manually. there is a way to do it automatically, but to be honest i was running out of time to get this tutorial done (it took five days to write in total), so if you have a better way to split it, let me know. for now, just know you need an images folder and a labels folder inside a data folder; both are blank right now.

now let's walk through the image collection code. the first line establishes a connection to our video camera: cap = cv2.VideoCapture() with our camera number. i know a lot of you have had problems with this before: the number for your particular webcam or video capture device might differ, so you may need to test it. the device i'm recording this tutorial on is 0, i believe, and the one we're going to capture our images with is 1, but worst case, if it doesn't work, just rerun it with a different capture device number and keep testing until you get the right one. then we loop through the number of images we want to collect: for imgnum in range(30), so effectively just a basic loop. inside the loop we print which image number we're up to, read from the capture device with ret, frame = cap.read() (a return value telling us whether the capture succeeded plus the frame itself), and build the file name to write out, which joins the images path we defined up above with a unique file name based on uuid1, as i mentioned before. then we write the frame out using cv2.imwrite, show it back to the screen, and use the time library to sleep for half a second between frames. that gives us a little bit of time to move around, move our head around the frame and even out of the screen entirely, because we also want a few frames without our face in them; these give us negative samples, which are particularly important for the classifier component of our deep learning model. everything after that is the usual cv2 code that lets us break out of the loop.

okay, that's looking pretty good. before we run it i'm going to take down the green screen so we have a little bit of noise in the background rather than a flat green screen, which would make it a little too easy for our deep learning model. let's run it; we should get a little pop-up towards the bottom of the screen assuming everything is working, and there we go. i'm just going to move around, and i'll also jump out of the screen completely. if we take a look inside our images folder we've got a bunch of images, including a couple with ourselves out of the frame, looking good. that's our first 30, but we want a lot more, so for the next set we'll move a little further back and collect another 30 (and again, if you've already got images collected, you can just annotate those instead). let's collect another 30.
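The collection cell just described looks roughly like this. It is a sketch reconstructed from the walkthrough above, so treat the camera index and the window name as assumptions to adjust for your own setup.

```python
import os
import time
import uuid
import cv2

IMAGES_PATH = os.path.join('data', 'images')
number_images = 30

# connect to the capture device; the index (0, 1, ...) varies per machine
cap = cv2.VideoCapture(1)

for imgnum in range(number_images):
    print('Collecting image {}'.format(imgnum))
    ret, frame = cap.read()

    # uuid1-based file name so repeated runs never collide
    imgname = os.path.join(IMAGES_PATH, f'{str(uuid.uuid1())}.jpg')
    cv2.imwrite(imgname, frame)
    cv2.imshow('frame', frame)

    # half a second to move around between captures
    time.sleep(0.5)

    # press q to bail out of the loop early
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```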
i'm moving around it hasn't even popped up yet okay there we go boom move over to here over here completely out of the frame move over here so close let's actually do a couple where we're kind of close to the screen as well i covered my face a little bit alrighty so that's 90 images now or it should be 90 images actually go and take a look how much shade do we have so if we go into images let's see we have 90 images okay i think that's probably going to be enough to begin with let's see bring the green screen back okay um all right so we've got images so we've got this i pi mb checkpoints we i'm going to delete that out of there because we don't really need it and then if we go into labels we still don't have anything so how we doing so far so we've got images where are we up to so we have successfully gone and set up and got some data we've installed our dependencies we've now gone and collected 90 different images and now what we're going to want to do is label those so what we're going to do is we are going to annotate those images using a library called label me so if i go and run exclamation mark label me this is going to trigger the label me annotation software or package right now again you can run this from a command prompt so if you open up a command prompt you activate your environment and run labelme at the command line this will allow you to do it but assuming you've got it installed from over here you should be able to successfully use it if you get stuck go and take a look at the documentation because there is a bunch so really all you need to do is go to the command line run pip install label me and then go to the whatever command line you're using well i'm using windows over here so it's really pip install label me and then you run label me to start it right i'm doing it inside of a jupiter notebook because i want it to be inside of a jupiter notebook but you sort of get the idea so if we go and run this we should get label me popping up towards the bottom here and there you go and what i'm going to do is we are going to open up the directory with our images so we can actually over here i don't know if i can zoom into this so open over here you can see it says opendr so we can actually open that directory go into our data folder go into our images folder and select our folder and look at that we got our images that we just collected so the other thing that we want to do is we want to change where our labels are going to be saved to because what we're effectively doing is we're creating labels now right pretty awesome part of the deep learning process so what we're going to do is hit file and then we are going to hit change output directory and i'm going to choose the labels folder the this isn't going to break anything the only thing that's going to happen if you don't change the annotation folder or the output folder it's going to save your annotation to a different folder you just need to move them around not no biggie right so i'm going to choose that folder then the other thing that i'm going to do and this is going to save us a bit of time is if you go to file save automatically it's automatically going to save those annotations so you don't need to hit save every single time just going to make your life a bit easier okay so that is the beginnings of this look how nice that plant looks that's a nice plant i don't know what it is we've got it randomly okay so that is our those are our images what we now need to do is annotate so what we can do is just go over to here or 
over to edit hit create rectangle and then you're going to get this little crosshair symbol to annotate all you need to do is click the starting point for the bounding box and then click again so if i click you can see i've got a little green thing and if i draw the box look at that and then click again and it's going to ask me to name what type of class this is and i'm just going to name it face i don't know if you can see that you can see i've just written face because we're only going to have one class and then we're going to hit ok and that is our annotation then if i hit d on my keyboard it's going to allow me to loop through each one of these so again i can draw another bounding box face this one's a bit blurry but we'll do it nonetheless bass so again all i'm doing is i'm clicking drawing and bounding box hitting okay clicking drawing a bounding box around my head okay clicking growing bounding box okay and i'm just going to keep going so for the images where we don't actually have a face in the frame we're actually going to do nothing so i've actually set up this code so that it will allow us to handle images where we don't have any examples and even though the mic is in the frame there we'll do it i'm just going to keep doing it so again so all i'm doing is i'm clicking to begin the annotation drawing the annotation clicking to end the annotation and then hitting ok and then i'm hitting d to go to the next image click draw click again next all right so for this one i'm going to draw like it's going to be a bit sketchy but i'm basically going to do that so we've at least got part all right so for this one our face isn't in the frame so i'm actually gonna skip that so that means there'll be no annotation for that particular file but that's fine the code is actually set up to handle that all right so i've drawn so bounding box there bounding box there bounding box there this is a close-up bounding box landing box and the cool thing about this is that like once you've gone to the effort of oh hold on no this is not d delete that want face boom bass how do we delete that annotation now that i've screwed that up uh that's fine we'll leave it we'll review boom base we delete that uh edit polygons so you can see that i've accidentally created a label list there with the d i don't know if that's gonna let's actually just quickly review our annotations actually this is a good point to to go and review so we've gone and started annotating what you'll actually see is if you go into uh the labels folder because that's where we've pointed our annotation so if we go into data and then labels you can see we've got all these json files now let's take a look at the last one because just want to make sure so this is what it looks like looking it's looking okay all right so you can see that this is what an annotation looks like so we've got the version i believe of label me any flags we've also got the shape so this over here is what actually represents our annotation so you can see that our label is saying face and it's actually got the points so these are the coordinates for our frame so i believe it'll be what will it be so it should be the width first and then the height and then the so this will be point the first point at the top this will be the second point at the bottom cool all right so these are our annotations so that doesn't look like adding that uh that oh hold on this one's got d let's actually clean this one up let's take a look to see if it screwed up any others that one's okay that 
one's okay which one was screwed up so let's actually go back and find that one that we screwed up so i'm just gonna hit there we go all right so you can see it there i'm gonna remove that annotation and save that so that should hopefully fix it so that was annotation for image 32ff8075 so these are things that happen right like sometimes it'll screw up you'll maybe write an incorrect annotation i sort of wanted to show you this rather than make it all perfect and then you're like nick it's not working okay cool so that looks like we've removed that let's just double check it's the same one super small no that is not the same one so 32 ff 805 it should be this one here see uh god that is a long all right let's zoom out look at my head there see if it's been no it's still in there why is it not deleting all right let's just go and do it manually so i'm just going to step in i think it was this one right so i'm actually just going to delete it manually so if we go in inside of this i'm going to delete all of that again this is purely optional but if you do screw it up accidentally put an incorrect annotation you can just delete it out of the shapes array so you can see that there i'm going to delete that hit save right so if we go and let's go back a little still looking good doesn't look like it's appearing anymore because we've gone and deleted it out okay i think we're looking good and again so you can just go backwards and forwards a and d it's like the normal wasd okay let's keep going so again we're going to draw another rectangle and we're going to set it up as d keep doing this so we're going to set it up as face not d really annoying me that i have to now go and delete let's actually save this let's close this and see if we can reopen it up and it's getting whether or not it gets rid of the d bit all right so let's open our directory so data images and then if we go and open our labels so we're going to change our output directories to data and then we can go into labels all right that looks like it's got rid of it so you can see that we no longer have that d label appearing anywhere because we've gone and deleted our app anyway okay so these are all the ones that have gone and successfully labeled so just restarting label me i think we'll pick up the new labels again all right so let's finish this off so again let's save automatically we're gonna go create rectangles so let's do this so we're going to finish it and hit face all right i'm probably going to fast forward through this so you'll see the end out come so let's uh probably speed this up key point to note for the ones where my face is in there but my hands are covering it i'm actually going to leave those in there and not label them so i'm gonna say the face is blocked so you can't actually see it so for this one for this one i'm gonna leave those so no annotation and again i'm just trying to capture the majority of my head as well almost did the d error again all righty those are all of our images labeled so again we've got some negative samples got plenty of images and plenty of annotations so again we've done 90 images and when we go and pump this through our augmentation pipeline we're going to get a ton more so i don't know what should that mean about 2 700 images all right so we've got a ton there or more i can't remember i can't remember how many we go and augment but okay so that is our set of annotations done so what you should have is inside of our images photo got a ton of images and then inside of our let's take 
a look so image we can open one of those up so again a bunch of images so we haven't actually gone and changed the image by annotating it we're creating that annotation inside of a separate file so those are our images which are looking brilliant and beautiful then if we go into our labels we've got our json labels so again at the actual annotation is inside of this shapes key here so shapes zero and then our points represent our coordinates so what you can see there now we've only got one label which is going to be face but we've got all of these different points to represent the bounding box coordinates for our face so we're going to be able to use those to do our object detection so that is our first bit now done so we've successfully gone and set up our implementation we've got and collected a bunch of images and we've gone and annotated using label me let's jump back on over to our client and have a chat and see what's next so we got data now what brace yourself we got a fair bit of data pre-processing coming we're now going to do a few things first we're going to take a look at some collected samples using matplotlib then we'll split it into a training testing and validation partition hold up why do we need to do that this is best practice the model is trained aka taught from the training partition but at the same time we use the validation partition to inform how we build the neural network so if we see the model is performing well on the training partition but it's shaky as hell on the validation set it might mean that we need to try some regularization or change our neural network architecture what about the test set we'll leave that bit right up until the end to finally see how a model has performed this should be a clear test of performance because we haven't used it in either our training or to determine how the model is built ah got it what else are we doing here a major key yeah all right dj khaled some may say i am the dj khaled of deep learning seriously now we're going to apply image augmentation to build out our data set we'll randomly crop and adjust our images to go from a small set of data to something way larger in this case 30 times bigger ah sweet let's do it then all righty and we're back so our client's pretty happy we've gone and collected a bunch of images of our face and we've gone and annotated them with label me as i showed you down there step two so we've got a bunch of stuff to do with this so we're gonna be going all the way up to step seven so first up we're gonna review our data set and build an image loading function we're then going to partition our unaugmented data so we're going to be splitting out the images and the labels that we just collected we're then going to apply our image augmentation and on our images and our labels using albumentations we're then going to build and run that augmentation pipeline and then we're going to prepare our labels and combine them all so by the end of this particular part we should effectively have a data pipeline so we'll have a training testing and validation data partition with our augmented data which has been gone and reshaped and pre-processed and we should be ready to pass this through to our deep learning model once we've gone and done all of this okay so first things first let's go on ahead and kick this off by reviewing our data set and building our image loading function so first things first we're going to import a number of key dependencies so we're going to be importing tensorflow and in order to do 
that, we're going to run import tensorflow as tf. tensorflow is going to be used to build our data pipeline as well as, a little later on, to build our deep learning model. we need to import it fairly early because we need to limit GPU memory growth: by default tensorflow will expand and use all of your VRAM, so we implement the memory growth limitation right up front. we're importing cv2 again, but we've already got it, so i can take that out. next we import json: if you take a look, our labels are in a json format, so we need the json library to load them into our python pipeline. then import numpy as np to help with our data pre-processing, and from matplotlib import pyplot as plt, which we'll use down below to visualize our images. tensorflow does take a little while to import every now and then, and keep in mind that if you've got a GPU this will train way faster; if you want to learn how to set up tensorflow for your GPU, jump back over to the deep learning for beginners tutorial and i'll walk you through it.

so those are our four dependencies imported: tensorflow, json, numpy and matplotlib. the next thing we need to do, step 2.2, is limit our GPU memory growth. this is pretty stock-standard code you'll have seen me use a ton of times: we grab all of our GPUs and set memory growth to true for each of them. i won't explain it in depth because we've gone through it a bunch of times; just know it's good practice when using tensorflow because it helps you avoid out-of-memory errors. we also want to test whether our GPU is available, so we can run tf.test.is_gpu_available(); it normally warns that it's going to be deprecated, but i still use it, and it has returned true. you can also list the physical GPU devices instead, which is the proper way to do it now, and you can see our GPU showing up there, which means it's available for deep learning.

okay, so that's steps 2.1 and 2.2 done: we've imported tensorflow and limited our GPU memory growth. the next thing to do is load our images into a tensorflow data pipeline. for that we write images = tf.data.Dataset.list_files(), and i can already see an error in this, so we'll need to tweak it: we need to pass through the full path to where our images are, and we also need to include a wildcard search. right now, if this is where our jupyter notebook is (you can see facedetection.ipynb right there), our images are currently sitting inside data and then images.
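A minimal sketch of those two setup cells, assuming TensorFlow 2.x; the variable names are illustrative.

```python
import tensorflow as tf

# limit GPU memory growth so TensorFlow doesn't grab all available VRAM up front
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

# confirm the GPU is visible (the non-deprecated alternative to tf.test.is_gpu_available())
print(tf.config.list_physical_devices('GPU'))

# build a dataset of image file paths; shuffle=False keeps the order stable for now
images = tf.data.Dataset.list_files('data/images/*.jpg', shuffle=False)
print(images.as_numpy_iterator().next())  # should print one full file path
```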
i think i was playing around with this to test out a training-data path, but because our data is currently inside data and then images we can get rid of the train part; once we set up our training folders we might want to tweak this again, and i'll probably update it in the code on github. so we pass through the full file path to where our images currently are, data and then images, and then the wildcard search: we're looking for anything with a .jpg extension, so *.jpg, and anything matching gets picked up as a file path inside the tensorflow data pipeline. we also set shuffle equal to false because we don't want to shuffle yet; i didn't realize shuffling was included by default and it was causing me a ton of headaches as i was building this up. if i run this and then type images.as_numpy_iterator().next(), it returns the full file path to an image. a really important thing to note: if you run that line and get nothing back, your images haven't been picked up and the rest of this won't work, so make sure the full file path is set correctly. the file name itself will be different for you because the unique identifiers are different every time, but you should at least get a file path out of this.

now you're probably thinking, well nick, this hasn't actually loaded an image, and we need actual images to do object detection. that's exactly what the next step does. we've written a load_image function: def load_image(x), where x is the full file path. it uses two lines of tensorflow code to read in the image: first tf.io.read_file takes the file path and returns a byte-encoded image, then tf.io.decode_jpeg takes that byte image and gives us our image back. in terms of using this with a tensorflow data pipeline, we use the map function; the documentation says map applies a map function to each element in the dataset, and in our case that map function is load_image, so it applies load_image to every file path in our dataset and returns the images. if we run the load_image function and then the map, we now get images back, and that's what it should look like after you've run through that data transformation.

so that's steps 2.1, 2.2 and 2.3 done: we've successfully got images inside our tensorflow data pipeline. and what i mean by a tensorflow data pipeline is, if i type type(images), you can see it's a tensorflow.python.data dataset; because the last transformation we chained on was the map, it shows up as a MapDataset. we can also visualize these images with matplotlib. first we batch them up, which is another function available in the tensorflow dataset api: rather than returning one image, it returns however many values are in a batch, which is pretty common in deep learning. here we batch into sets of four so we can visualize four at a time: image_generator = images.batch(4).as_numpy_iterator(), which lets us loop through our images, and then plot_images = image_generator.next() grabs the next batch. you can run that line multiple times and it returns a new batch of data every time; that's essentially what the tensorflow fit function does: get the next batch, train on it, get the next batch, train on it, and so on. the next cell just loops through the batch and visualizes it using matplotlib and the subplots function, so if i run it you can see four images, run next again and we get another four, pretty cool. right now they're not shuffled, so it almost looks like a moving image; if we take that shuffle=False out and visualize again, the images come back in a much more random order rather than a sequential set.
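Pulled together, those loading and visualization cells look roughly like this; the plotting layout is my assumption.

```python
import tensorflow as tf
from matplotlib import pyplot as plt

def load_image(x):
    # x is a file path tensor: read the raw bytes, then decode the JPEG
    byte_img = tf.io.read_file(x)
    img = tf.io.decode_jpeg(byte_img)
    return img

images = tf.data.Dataset.list_files('data/images/*.jpg', shuffle=False)
images = images.map(load_image)

# batch four images at a time and pull one batch out to plot
image_generator = images.batch(4).as_numpy_iterator()
plot_images = image_generator.next()

fig, ax = plt.subplots(ncols=4, figsize=(20, 20))
for idx, image in enumerate(plot_images):
    ax[idx].imshow(image)
plt.show()
```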
so that is part two now done: we've reviewed our dataset and built an image-loading function that we'll be able to use later to build up the full tensorflow data pipeline. we imported tensorflow, limited our GPU memory growth, loaded our images into a tensorflow data pipeline and visualized them with matplotlib. the next thing we need to do is partition our unaugmented data. this step is manual; i've done it that way because it gives you a little bit more control over the split, but if you wanted to use scikit-learn's train_test_split to do it, by all means go ahead; i was running out of energy by this point. full disclosure, and i always try to be real with you guys: it's not all that pythonic and there's probably some duplicate code, but it does work.

what we're going to do is move our data into train, test and validation partitions, so we need to create a few more folders. inside the data folder we create a train folder, a test folder and a val folder, and inside each of those we create an images folder and a labels folder. so the data folder now has the original labels and images folders, which hold the raw, unpartitioned data we collected, plus train, test and val, each with empty images and labels folders inside.

now we actually want to move some data in there. we've got 90 images in total, so let's assign roughly 70 percent to training, which is 63 images, and about 15 percent each to validation and test; 90 times 0.15 is roughly 13.5, so we'll do 14 and 13. that gives us 63 plus 27, which is the full 90, so the math checks out.
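The video creates these folders by hand in Jupyter Lab and drags the images across manually in the next step. If you would rather script that part, a rough sketch might look like the following; the helper names and the random 70/15/15 split are my own, not from the video.

```python
import os
import random
import shutil

random.seed(42)  # reproducible split

# create data/{train,test,val}/{images,labels}
for folder in ['train', 'test', 'val']:
    for sub in ['images', 'labels']:
        os.makedirs(os.path.join('data', folder, sub), exist_ok=True)

# shuffle the collected images and carve out roughly 70/15/15
all_images = [f for f in os.listdir(os.path.join('data', 'images')) if f.endswith('.jpg')]
random.shuffle(all_images)

n_train = int(len(all_images) * 0.7)
n_val = int(len(all_images) * 0.15)
splits = {
    'train': all_images[:n_train],
    'val': all_images[n_train:n_train + n_val],
    'test': all_images[n_train + n_val:],
}

# move each image into its partition; matching labels are moved by the script described below
for folder, files in splits.items():
    for fname in files:
        shutil.move(os.path.join('data', 'images', fname),
                    os.path.join('data', folder, 'images', fname))
```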
cool sorry mind blank okay cool so we've now successfully got our images now allocated what we actually need to do is actually go and split them up so let's go and grab what are we getting 63 for train so we're just going to randomly select and again this is not all that scientific but i'm just going to grab 63 images which is going to be the large majority right this is my random split function me just randomly clicking i'm just monitoring down the bottom to see once we get to 63 there's not too many we probably should have got some more data we could always add more later on but keep in mind we're going to augment as well right so we are going to be able to have a whole bunch more data so that's yeah okay boom boom all right that's 63 so i'm going to cut those and i'm going to put them inside of the train folder so i'm going to cut go to my train folder and go into the images folder inside a train paste all of those there so that is 63 images inside of the train images folder now we're going to go into images again and we are going to get 14 for what is it for val or test whatever one of those what's happened there why do we 39 okay wait why is there 39 looks like we copied and pasted them there so what do we need we need 14 right that's 14 images so i'm going to go and throw those into let's put that inside of test so 14 inch test all right so we've got a bunch of variations again some of us without a class or some of those without a face in it so let's go back into the let's grab the rest that's train so we want to go back into the raw images folder grab these last ones and we're going to throw those into the valve folder wait we've already got them no about cool all right so we've got 13 images inside a val we have how many inside a train we've got 63 inside a train and if we're going to test we've got 14 inside a test cool all right that is looking good so 63 assigned the train of 14 assigned to val i think no we assigned god i can't remember we assigned the rest to valentes but let's just make sure we've got this noted uh where are we we are inside of let's go check test so how many is inside a test so we've got 14 assigned to test okay 14 to test and 13 about okay so those are now successfully assigned now rather than doing the exact same for the labels i've gone and read in a script which basically loops through the train test and val folder and again it's looking for those folder names so you've got to ensure that they're the same so train test and valve so over here we have inside of data so we've got train test and val and inside of those we've got labels and images so what it's going to do is it's actually going to move the associated labels from the raw folder which is inside of currently in still inside of the root labels folder it's going to grab all of these and it's going to move them into their respective matching folders so it will match them up to val train and test so if i now go and run this all things holding equal we should effectively have our labels moved over so if we go into labels you can see there's nothing left there now if i go into train and labels you can see it's going to move those annotations so if we go into test and labels going to move those labels as well if we go into val and labels you can see it's going to move all those labels so that is our data now partitioned so we've now successfully gone and partitioned out our unaugmented data so again remember what you need to do is you need to go and move out of the the data out of the labels and images 
folders into those valve training test folders now if you go and add more data later effectively all you're going to need to do is go and push that data back into those folders again and push the annotations into their matching folders so that when we go and do augmentation it does the exact same okay so that is step three now done so we've successfully gone and moved over our unaugmented data and we've gone and moved over the labels now we are up to applying our image augmentation so first thing that we need to do is import albumentations so again albumentations is this library over here and if you actually scroll on down it shows you how to actually use this so we effectively set up a transformation pipeline which looks like you can basically assign a whole bunch of different transformations so over there you've got random crop horizontal flip random brightness so on and so forth there's a whole bunch of others i think we're going to use some others as well and then you need to pass through your bounding box parameters as well okay so first thing we need to do is we need to import it so i'm importing abumentations as alb so give that a sec water break running out of my throat is going okay so then the next thing that we're going to do is actually define our augmentation pipeline so this is what the recommended not the recommended one what the example one looks like i've gone and added a bunch of other things so let me actually break this down so you can see it boom and then uh yeah boom okay so what we've actually got is six different uh augmentations that we're going to apply we've got random crop horizontal flip random brightness contrast random gamma rgb shift and vertical flip so these ones are actually gonna the random crop is the trickiest one to handle if you were to do this well actually no random crop horizontal flip and vertical flip a tricky to handle if you didn't have something that does your annotation um augmentation as well mind blank okay so al alb dot random crop we're specifying how big augmented images are gonna be so we're actually going to cut them down to 450 by 450 pixels because right now they should be uh what is it so if we take a look at height so it should be 480 pixels by 480 by 640. so it's going to be basically 480 by 640. 
so we're going to cut them down a little bit so that means that we can do that dynamic crop then we are going to and let me actually just show you that so if we go and grab an image um cv2.i am read let's grab a image so if we go into train images this purely option i'm just showing you guys osl path to join at train so it should be inside of data dot train dot images and then let's grab one right so i'm just grabbing a random image at the moment just to take a look so uh let's actually assign it a variable name so image.shape yeah so 480 pixels high by 640 pixels wide by three channels deep so again we're going to be cropping that down and this might vary so if you capture images of different sizes or if you've captured images on your iphone again um you can still use this just be mindful of the image shape so you at least want it to be or that the minimum dimension needs to be should ideally be there should ideally be bigger than whatever you're going to crop it to so if your image is 100 pixels by 100 pixels well you're going to have a hard time cropping into 450 by 450 because it doesn't even meet those minimums so just something to keep in mind there okay so that is our augmentation pipeline so you can see inside of square brackets i've got those different uh transformations there this is my ocd kicking and it kills me that that wasn't aligned okay so we've got random crop horizontal flip random brightness random gamma rgb shift and vertical flip you could take some of these out you could add more in if you wanted to i've just found that those six give us a sufficient amount of data and then we are going to specify our bounding box parameters so in this particular case we're specifying the format of our annotation so the let me actually show you this because it's really important so if you get annotations different formats it's important to note that you need to go and change the format down here so right down here this is the the different formats that it's expecting so pascal voc is a really popular format so you've got x min y min x max y max and these are unnormalized images so they're not or unscaled images so this hasn't been adjusted for the size of the image so you can see it's got the raw image size so 98 3 45 420 462. 
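Before moving on to rescaling those coordinates, here is roughly what the augmentation pipeline defined above looks like in code. This is a sketch assuming the standard Albumentations API; the probabilities on each transform are my guesses rather than exact values from the video.

```python
import albumentations as alb

augmentor = alb.Compose(
    [
        alb.RandomCrop(width=450, height=450),   # dynamic crop, so boxes must shift with it
        alb.HorizontalFlip(p=0.5),
        alb.RandomBrightnessContrast(p=0.2),
        alb.RandomGamma(p=0.2),
        alb.RGBShift(p=0.2),
        alb.VerticalFlip(p=0.5),
    ],
    # coordinates will be passed in the normalized 'albumentations' format:
    # [x_min, y_min, x_max, y_max], each scaled to 0-1 by the image width/height
    bbox_params=alb.BboxParams(format='albumentations', label_fields=['class_labels']),
)
```

Calling `augmentor(image=img, bboxes=[coords], class_labels=['face'])` then returns a dictionary containing the augmented image, the shifted boxes and the labels, which is exactly what the next few steps walk through on a single example.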
we are going to go and divide our images by the shape of our or divide our labels by the shape of our image so you can see there that these ones have actually been normalized so they've been scaled for the size of the actual image so rather than having the raw coordinates they've actually been divided by the width by the height by the width by the height which is exactly what we'll do you can also use coco or yolo as well so those are again each one of those different models has a slightly different way of representing the annotation so we actually start off in pascal voc and then we go and adjust it to be inside of this arbumentations format as well okay so that is our augmented done so i've just gone and run that so that is our augmentation pipeline and now set up now what we can go ahead and do is go and test this out so the next thing that we're going to do is actually load up an image and run it through through the augmentation pipeline so first thing what we want to do is load up an image using cb2.imread and to that we're going to pass through an image from our training data set so data train images and then you need to go and replace whatever this file name is here if you go and run this right now let's run this oh that image exists weird okay so maybe it's gone and generated that actually no so if i go and run image you can see it's blank so right so if i'm running that it's blank right now so it won't throw an error it'll just return nothing and that is because this image doesn't exist right now which is what i was expecting so if we go and grab this image though so grab that image name and paste that is my head covering that yes it is and go and paste that in there now if i go and run this you can see our images return back so just something to keep in mind so it won't throw an error it just won't return an image and you'll be sitting there like what is happening it's not working nick just keep in mind that you need to go and pass through an image or the image name from your actual images folder okay then what we can go ahead and do is load in our matching annotation so we're going to go into our os.part.join we're going to go into our training folder and into our labels folder and we're going to grab the matching annotation now assuming we actually have a face inside of this image here the annotation name will be the exact same as the image name except the only thing is the extension will be json so if i go and paste that there and run this looks like no issues so we can actually take a look at our label boom look at that so we've got our label cool thing to know is if you wanted to go and grab the class so we can go into again this is a dictionary so let me show you so type it's a dictionary so we can just navigate it as a dictionary so we can go into shapes and that gives us our annotation go into you can see that that is inside of a list right to go into a list we can use number indexing so we go grab index zero and then we can go and grab our class so if we wanted to we've only got one class again let me know in the comments below if you wanted me to do more for multi-class or multi-object so that's our label then we can go and grab our points as well so if we go and grab points boom those are our points pretty cool right okay so that is our image successfully loaded our coordinates successfully loaded our annotations now what we can go ahead and do is extract coordinates so we are going to do exactly what i've shown you up here we're just going to store them inside of a 
coordinate array that you can see there. so if i run this and take a look at coords, and again this isn't super pythonic but for ease of reading i'm showing you what it looks like, we've now taken what you can see over here and transformed it into a simple flat vector rather than a nested array, so 191 191 77 77 350 350 315 315. quick drink break. alright, so this is going to be x1 y1 x2 y2, so the top-left coordinate and the bottom-right coordinate. now the next thing we want to do is that transformation i was telling you about, converting this set of coordinates from pascal voc, which is unadjusted for the size of the image, into the albumentations format. so this is based on the raw image over here, it's 98, 345 and then 420, 462, let me zoom in because you probably can't see that, and what you're effectively going to do is divide each x value by the width of the image, so 640, and each y value by the height of the image, so 480, and that gives you these sets of values over here. that's effectively what we're doing here: we grab our raw coordinates and divide x min and y min by the width and the height, then x max and y max by the width and the height. so we run this and take a look at our coords, boom, we've now transformed our coordinates from the raw pascal voc format, which is this over here, to the albumentations format, which is this over here. okay, cool, that's looking good. now we can run that image and that set of coordinates through our augmenter, and it actually returns a dictionary, so augmented equals augmenter and to that we pass through our image, our bounding box coordinates and our class label. if i run this and show you augmented, you can see you get a dictionary back, let me prove that too, if i type in type we've got a dictionary, and that dictionary has three keys: an image, the bounding boxes and the class labels. if we take a look at the image, that's our image, which should now be 450 by 450 because remember the cropping changes its shape. then if we take a look at the bboxes key, those are our coordinates, and you can see they differ from our raw coordinates, and that's because we've probably done a crop. now these two lines are going to allow you to visualize that, cv2.rectangle actually draws a rectangle on an image, so whenever you see the cool object detection tutorials that's probably what's being used, so to that we are going
to pass through our augmented image, and then we also extract the tuples you need to actually draw the bounding box. just know this is the top-left coordinate and this is the bottom-right coordinate, so over here what i'm doing is grabbing the first two values, let me show you, if we take a look at augmented bboxes you can see the first two values there, and remember these correspond to x min and y min, and then the last two values represent x max and y max, which is this value and this value over here. as for the other parameters, you also need to represent each point as a tuple, and we're rescaling it to the size of the image, because once you've transformed the coordinates into normalized values you need to scale them back up in order to render, otherwise the box is going to look really really small, so that's the transformation we're doing there, we're also casting to an integer, and we're passing it through as a tuple because that's what opencv expects. then we're specifying what the color is, and this should be in bgr format, so blue green red, and then the thickness of the actual rectangle. so if we go and run this, look at that, it's actually gone and flipped our image and it's drawn the bounding box around it. now even though it's showing up blue it's not actually blue in reality, it's showing up blue there because opencv reads an image as bgr but matplotlib renders it as rgb, so that's fine, just keep that in mind, but you can see that it's successfully augmented our bounding box and it's still sitting around our face. now we've only done this for one image, so i've sort of shown you and walked you through it for one image, but i really wanted to drill into this because i think it's such an important topic. so that is step four now done: we've set up albumentations and our pipelines, which we're going to use for our full-blown pipeline, we've loaded a test image and tested it out, we've taken a look at what the annotations look like, applied some augmentations and then visualized it.
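pulling those cells together, a minimal sketch of the single-image test might look like this, assuming the `augmenter` pipeline from earlier, the `img` and `label` loaded above, and a 640 by 480 source image:

```python
import cv2
import numpy as np
from matplotlib import pyplot as plt

# flatten the two labelme points [[x1, y1], [x2, y2]] into [x1, y1, x2, y2]
coords = [
    label["shapes"][0]["points"][0][0],
    label["shapes"][0]["points"][0][1],
    label["shapes"][0]["points"][1][0],
    label["shapes"][0]["points"][1][1],
]
# pascal_voc -> albumentations: divide by width, height, width, height
coords = list(np.divide(coords, [640, 480, 640, 480]))

# run the image, box and class label through the augmentation pipeline
augmented = augmenter(image=img, bboxes=[coords], class_labels=["face"])

# scale the normalised box back up to the 450x450 crop before drawing it
cv2.rectangle(
    augmented["image"],
    tuple(np.multiply(augmented["bboxes"][0][:2], [450, 450]).astype(int)),
    tuple(np.multiply(augmented["bboxes"][0][2:], [450, 450]).astype(int)),
    (255, 0, 0),  # bgr, so it renders blue when shown as rgb
    2,
)
plt.imshow(augmented["image"])
plt.show()
```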
we are now up to step five, and the first part of work inside step 5 is to run our augmentation pipeline over our training, testing and validation partitions for all of our images, because up until now we've only really done it for one image, but we need to do it for all of our data to get true value. now this is a ton of code and something i spent a while writing up, so it's probably not worth going through every single line in great detail, but i'll explain at a high level what's happening. first up we loop through our training, testing and validation folders and grab every single image inside of them. we then check whether an annotation exists for that image, because remember some images aren't going to have a head in them; if an annotation doesn't exist we create a default annotation, so we basically assign a zero set of coordinates and a class of 0. now assuming a set of coordinates does exist, we do that exact same transformation we did right up here, extracting the coordinates and taking them from a stacked array to a straight vector, which is what these two lines of code are doing. so up to this point all we're doing is loading up the image and loading up the labels. then we create 60 images per base image, so for x in range 60, which means that for our 90 base images we're going to end up with 90 multiplied by 60 augmented images to use. then we run our data through our augmentation pipeline, augmented equals augmenter, effectively doing what we did just here but 60 times to create 60 different images, and then we write out each augmented image into a folder called aug_data, which i'll show you how to set up in a second, and likewise we transform our coordinates and write out those annotations using json.dump. so by the end of this we should have 60 multiplied by 90 images and annotations across our data, split up into training, testing and val, because remember we've partitioned it out. this is going to do a ton of data augmentation for us, and if you wanted more data you could just bump up that number, say 120 images per base image, but i've found 60 is a good mix and gives you enough data, so let's zoom back in. cool, now what we need to do is set up where our augmented data is going to live. all we need to do is go back into our root folder, this is our raw data here, and create a new folder called aug_data, and inside of that create train, test and val folders, effectively replicating what we did for our data folder. so inside of val we create a folder called images and a folder called labels, inside of train we create images and labels, and inside of test, you guessed it, images and labels. boom, we've now got three folders inside aug_data, each with labels and images, but as of right now there's no data in there, we actually need to run this augmentation pipeline to fill them. so let's run it, and if we step out of it a little bit, this will take a little while because it's doing quite a fair bit of stuff, but if we run it you can see we've got the little star over there, it's happening over here, you can see that we are now running our augmentation pipeline.
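for reference, the loop that's running looks roughly like this; it's a trimmed-down sketch rather than the exact notebook code, assumes the `augmenter` pipeline and the aug_data folders described above, and uses a tiny non-zero default box so albumentations doesn't reject it:

```python
import os
import json
import cv2
import numpy as np

for partition in ["train", "test", "val"]:
    for image_name in os.listdir(os.path.join("data", partition, "images")):
        img = cv2.imread(os.path.join("data", partition, "images", image_name))

        # default annotation for images with no face in them
        coords = [0, 0, 0.00001, 0.00001]
        label_path = os.path.join("data", partition, "labels", f"{image_name.split('.')[0]}.json")
        if os.path.exists(label_path):
            with open(label_path) as f:
                label = json.load(f)
            # flatten the labelme points and scale pascal_voc -> albumentations
            coords = [
                label["shapes"][0]["points"][0][0],
                label["shapes"][0]["points"][0][1],
                label["shapes"][0]["points"][1][0],
                label["shapes"][0]["points"][1][1],
            ]
            coords = list(np.divide(coords, [640, 480, 640, 480]))

        # 60 augmented samples per base image
        for x in range(60):
            augmented = augmenter(image=img, bboxes=[coords], class_labels=["face"])
            cv2.imwrite(
                os.path.join("aug_data", partition, "images", f"{image_name.split('.')[0]}.{x}.jpg"),
                augmented["image"],
            )

            annotation = {"image": image_name}
            if os.path.exists(label_path) and len(augmented["bboxes"]) > 0:
                annotation["bbox"] = [float(c) for c in augmented["bboxes"][0]]
                annotation["class"] = 1
            else:
                annotation["bbox"] = [0, 0, 0, 0]
                annotation["class"] = 0

            with open(
                os.path.join("aug_data", partition, "labels", f"{image_name.split('.')[0]}.{x}.json"), "w"
            ) as f:
                json.dump(annotation, f)
```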
so let's give that a second to run and then we'll be able to test it out and see how it all looks. cool, so that is now done, you can see that our code cell has completed successfully and it doesn't look like we've got any errors. and there's this over here: i've actually set it up so that if we've got invalid annotations for whatever reason it's going to drop them as well, and it looks like we've dropped two images there, which is perfectly fine, you're still going to have a ton of data, which is why i set this to 60 up here. okay, let's zoom back in. now if we go into aug_data, into train and images, and open that up, yep, we've got a ton of images, and if we go into the train folder and into labels we should have a ton of json labels, there you go. this is what our augmented labels look like: we've got our bounding box, which holds our four coordinates, we've got our image, and we've got our class. so we now have a ton of augmented data, and you can see that our images are now 450 by 450 as well. let's just double check test, we've got labels and images in there, and if we go into val we should have labels, perfect, and images as well, all looking good. alrighty, cool, you can save this. so the next thing we want to do is load some augmented images and take a look at them inside of tensorflow. this is where it gets unpythonic, i wasn't too happy with what this code looks like, but it works, so if you've got a better method of building these up you could just loop through and build it better; in the interest of getting this tutorial out in time i just copied the code three times and relabeled the variables, but it does work. so we're now going to load our images into tensorflow datasets: our training images go into a variable called train_images, our test images into test_images and our validation images into val_images. all we're doing is exactly the same as what we did right up at the start, we're using the tf.data.Dataset.list_files method with that wildcard search to pick up data out of the train folder inside our augmented data directory, we're then using load_image with map to actually load each image up, we're resizing each image to 120 pixels by 120 pixels, so we're compressing it even more, and the reason we're doing that is so we can build a more efficient neural network, there's less data to pass through to the neural net and it still seems to work fine, and we're also scaling our images by dividing by 255, so rather than values between 0 and 255 they're now between 0 and 1, which means we can apply a sigmoid activation to the final layer of our neural network. and keep in mind we've set shuffle equal to false here, which is really really important because we're going to load our labels the same way and we need them to stay in the same order. okay, those are our image pipelines now set up, so if we run this, it runs pretty quickly, and if we take a look at train_images.as_numpy_iterator().next(), that is our image, and you can see that the values are now much smaller, and that's because we've gone and scaled them over here.
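one of those three pipelines looks something like this sketch (train shown; test and val just swap the folder), with the load_image helper from earlier in the tutorial reproduced so the snippet stands alone:

```python
import tensorflow as tf

def load_image(path):
    # read the file off disk and decode it into a uint8 image tensor
    byte_img = tf.io.read_file(path)
    return tf.io.decode_jpeg(byte_img)

# shuffle=False so the image order lines up with the label pipeline
train_images = tf.data.Dataset.list_files("aug_data/train/images/*.jpg", shuffle=False)
train_images = train_images.map(load_image)
train_images = train_images.map(lambda x: tf.image.resize(x, (120, 120)))
train_images = train_images.map(lambda x: x / 255)  # scale to 0-1 for the sigmoid outputs
```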
cool, right. okay, next up we want to prepare our labels, so again we're going to write a function to load our labels, def load_labels, and we're going to use it similarly to what we did for our images, with the map function inside the tensorflow dataset api. so def load_labels takes the label path, and we're really just going to use a with open statement here; to grab the actual path we need to use the .numpy function, then we read the file in and return the label, and what we're extracting is the class and the bounding box. so if i go into labels, we're extracting the class, which is this, and the bounding box, which is this. in this particular case there mustn't have been a face in that image, hence why these are all zero; let's find another one... that one's all zeros too, stop... alright, this one has an actual annotation, so you can see we've got our coordinates and we've got our class. what this is doing down here is returning two arrays, one of which is the class, so zero or one, and the second is the bounding box, but i'll show you that in more detail in a second. so that's our label loading function, we can run that cell, and then we're going to load the labels into our tensorflow datasets. also a key point to note: that's step five now done, we're now up to step six, which is what we're doing right now. so in step 6.2 we use tf.data.Dataset.list_files, pretty similar to what we did for our images, but this time we're loading our json objects, and over here we're just doing it for our training labels at the moment. then we use the .map function. this bit is a little weird, i had to do some screwing around to get the label loading to work with the tensorflow dataset pipeline because it doesn't by default hand you the actual path string, but just know that i've wrapped it inside this tf.py_function here to make the string available to our loading function, and it does work. so train_labels.map, we pass through a lambda function, which means it's going to loop through each individual file name, we wrap it inside a tf.py_function, and we use this load_labels method. let me actually show you rather than over-explaining: if we run train_labels.as_numpy_iterator().next() you can see we're getting the full file path to that particular json object, and that path is what gets passed through to the with open statement, so ignore the complexity you're seeing here, just know that the file path goes to load_labels and we get our annotations back. so let's run this, and then we do the same for our test labels and our val labels, again non-pythonic and duplicated code, i know, but it definitely does work. okay, so those are our labels, let's actually take a look at them.
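roughly, the label pipeline for the training split looks like this; the tf.py_function wrapper is what exposes the file path to plain python, and the output dtypes here are one reasonable choice rather than the only option:

```python
import json
import tensorflow as tf

def load_labels(label_path):
    # label_path arrives as a tf string tensor, so .numpy() gives the raw path
    with open(label_path.numpy(), "r", encoding="utf-8") as f:
        label = json.load(f)
    return [label["class"]], label["bbox"]

train_labels = tf.data.Dataset.list_files("aug_data/train/labels/*.json", shuffle=False)
train_labels = train_labels.map(
    lambda x: tf.py_function(load_labels, [x], [tf.uint8, tf.float16])
)
```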
so if we run train_labels.as_numpy_iterator().next() now, you can see that we have two values being returned: this first array is the class, so face or not face, and this second value over here is our set of coordinates. (i was just noticing there's a ton of sun coming in this way now, but that's fine.) so those are our labels and our images now loaded: we've run our load_labels function over each json file path inside that dataset and loaded it up. the next thing we need to do is combine our label and image samples. think about when you train something in scikit-learn or another machine learning framework, normally you have the input features and the labels together; right now we've got them separate, images over here and labels over here (my hands aren't inside the frame), and we want to combine them so each element represents one sample, the input features and the labels. first we'll just check our partition lengths, so i'm printing out the length of each partition: for our training images and labels we've got 3720 samples, for testing we've got 840 and for validation we've got 720. kind of crazy right, we've gone from 90 images to well over 4000, probably approaching 5000, samples, i think it's absolutely amazing. okay, so that's checking our lengths, then the next thing we need to do is actually join these up, and again i'm just repeating this three times for train, test and val, so if you've got a more pythonic way to do it let me know, update it, and i'll hook you guys up and we'll push it into the repo, but just know this does work. so what are we doing here: we're using the zip method to combine each of the examples in each of these datasets, so train equals tf.data.Dataset.zip and inside the parentheses we pass through our training images and our training labels. then we shuffle it up, and ideally you want the shuffle buffer to be bigger than the size of the dataset, so we could drop this down, we could make this 4000 over here, but 5000 should be fine for now, i think i had more images when i was building this up. then we batch it up, so each batch is going to be eight images and eight labels, train equals train.batch(8), and then we prefetch, which helps eliminate bottlenecks when you're loading and training on your data, so train equals train.prefetch(4), and then we do pretty much the same thing for our testing partition and our validation partition. so if we run these, boom boom boom, all things holding equal we're looking good. now if we run train.as_numpy_iterator().next() — and i know there's a lot of data pre-processing guys, this is just part of the deep learning process — we should get eight images and eight annotations, and if we grab the first element and check .shape it should be eight by 120 by 120 by three, not 450, because remember we've gone and resized everything down.
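the combine step is just tf.data.Dataset.zip plus the usual shuffle, batch and prefetch, something like this for the training split:

```python
# pair each image with its label so one element = (image, (class, bbox))
train = tf.data.Dataset.zip((train_images, train_labels))
train = train.shuffle(5000)   # buffer ideally larger than the number of samples
train = train.batch(8)        # eight images and eight labels per batch
train = train.prefetch(4)     # keep batches ready while the previous one trains
```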
so that's eight images, each 120 pixels wide by 120 pixels high by three channels, because it's a coloured image. if we index the second element here it should return the labels, it might not be a numpy array, that's fine, yeah, cool, so we've got all of our different classes and all of our bounding boxes down here, and you can see we've got some zero samples, so some negative samples there and there, looking good. alright, cool, that is our dataset now done. now before we go and start doing some deep learning, let's actually view these, so we'll grab one sample. again, you've seen me write this a few times, train.as_numpy_iterator() allows you to loop through all of the different batches, so data_samples equals train.as_numpy_iterator(), and then we grab the next batch, so res equals data_samples.next(), which might take a little while, and then we can plot it out. so if i run this, boom, look at that, we've got our images annotated and augmented, pretty cool right. let me zoom out a little bit so you can see that better, how cool is that. if we run another sample, look at that, we've now got a ton of augmented data, and you can see some of them are darker and some are lighter, because remember we had a random brightness step inside our augmentation pipeline, so all of this hard work has led us to this, and it's looking pretty good. and again you can keep running data_samples.next() to keep getting the next batch, that's effectively what i'm doing; eventually i'll reach the point where i've gone through all the batches and it'll stop returning data, so if you run this too many times and hit the end, just run the iterator cell again to restart it and you can grab the next batch again. okay, that is our data pre-processing now done. i know we did an absolute ton of stuff and went through a ton of code, so just know we've successfully reviewed our dataset and built our image loading function, we've partitioned it (remember we did that manually), we've applied albumentations for our augmentation pipeline, built that pipeline and generated a ton of data, so we've now got 3720 examples in train, 840 in test and 720 in val (let's zoom in so we can see that a bit better), and we've created our final datasets, which we've visualized down here, and this is all just done using matplotlib: again we're using subplots, we're drawing a rectangle using opencv and then using imshow to actually visualize the image.
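the plotting cell is essentially this sketch, assuming `train` is the batched dataset from above and showing the first four images of a batch:

```python
import cv2
import numpy as np
from matplotlib import pyplot as plt

data_samples = train.as_numpy_iterator()
res = data_samples.next()  # one batch: (images, (classes, bboxes))

fig, ax = plt.subplots(ncols=4, figsize=(20, 20))
for idx in range(4):
    sample_image = res[0][idx].copy()
    sample_coords = res[1][1][idx]

    # scale the normalised box back up to the 120x120 image before drawing
    cv2.rectangle(
        sample_image,
        tuple(np.multiply(sample_coords[:2], [120, 120]).astype(int)),
        tuple(np.multiply(sample_coords[2:], [120, 120]).astype(int)),
        (255, 0, 0),
        2,
    )
    ax[idx].imshow(sample_image)
plt.show()
```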
okay, that is our dataset now done, let's jump back over to our client and see what's next. so we're on to the fun bit, deep learning. how are we building this? well, if you think about it, object detection really has two parts: classification, so determining what the object is, which can be thought of as a binary classification problem, and finding the coordinates for the bounding box, x1 y1 x2 y2 and so on, and that second part of the problem is actually a regression problem. interesting. yep, so that's exactly what we're going to build: we're going to use a vgg16 base architecture and add on our final prediction layers, one for classification and one for regression, in order to do object detection. sweet, but how do you train it then? i thought the keras sequential api expects one input, one output and one loss function. you're mostly right, we're actually going to use the functional api for this model, which will allow us to have two different loss functions and combine them at the end, one for the classification model and one for the regression model, and the latter will actually be called localization loss, and we'll write that function ourselves. huh, nice, i guess let's do it then. alright, he just went and had some lunch, it's time to wrap this up. our client gave me a lunch break, so it is time to build our deep learning model. okay, step eight, we have four steps to do: we're going to import our layers and our base network, and by base network i mean this bad boy over here, so let's quickly take a look. i think i've got a bunch of extras here, we'll go back through this. first up we import the model class, from tensorflow.keras.models import Model, this is the base that all of the keras models are built from. then we import a number of layers, from tensorflow.keras.layers, and i probably should clean this up, so let's do it on the fly: we're using Input, vgg16... i think we can get rid of Flatten, yep, we don't need that, we don't need Add, i don't think we've got Dropout, we don't need that, we don't need ReLU, and wait, did we use MaxPooling2D? oh gosh, you guys are going to shoot me, we didn't use max pooling, we used global max pooling, so get rid of that as well. there you go, nice and simple, i just want to make this super clear for you guys so you're not looking at this going why the hell did he import that. so that's what we're importing: from tensorflow.keras.models import Model, that's our base model class, and for our layers, from tensorflow.keras.layers import Input, Conv2D, Dense and GlobalMaxPooling2D. the reason i had those extra layers is that as i'm prototyping this for you i'm testing out a bunch of stuff and trying to build specific types of neural networks, so if you ever see extra imports in there that aren't used, that's probably why. okay, so those are our layers, and then we're going to import VGG16, this big bad boy. vgg16 is a reasonably big neural network that i believe was built for image classification, and it's got a bunch of convolutional layers, that's what vgg16 looks like. the nice thing about keras is that we can actually pick this up and use it, so we're going to cut it off here and pass from this big block over there into our specific regression model. it's obviously been pre-trained as well, so you can leverage the knowledge already inside that neural network, and this is how it's intended to be used. so let's first up import this, and now what we need to do is create an instance of vgg, so we can write vgg equals vgg16, and this is a standard architecture. actually, good segue: a lot of the time i get comments like nick, how did you design this neural net, how do i pick the number of units? a large part of this is based on research that's out there
so i'll go back and take a look at different types of neural networks and go hey, they've built it like this, maybe i can tweak it for these particular sections and get my output. in this particular example i think i started off with the ssd architecture, so if you've ever seen the tensorflow object detection course that i've got, that's the model we used for that, an ssd model for object detection, and you can see there that the ssd model uses vgg16, so i figured why not capitalize on what the greats have done and use vgg16 as well, and that's exactly what i've done, though obviously it's not exactly the same as this, we've gone and tweaked it. so we are going to be using vgg16, and that's the first thing, and i'll probably explain a little bit more as to why i've made certain other design decisions as well, but just bear with me for now. so vgg equals vgg16, and then i'm saying include_top equals false, because the final couple of layers in vgg16 we don't actually need, we're going to apply our own. remember our object detection model is really two models, a classification model and a regression model, and the vgg16 model is itself a classification model, so we need to get rid of those final classification layers and sub in our own final layers, which is exactly what you're going to see in a sec. so include_top gets rid of those final layers. we'll run that, and then we can actually take a look at the vgg model. keep in mind if you haven't run this before it's going to take a little while to download, i think it took me about three minutes on my net, which is slow as hell. vgg.summary() will show you what the neural network looks like, and see, the beauty of this is that this is actually a pretty big neural network, about 14.7 million parameters, which we don't need to train from scratch, obviously we're going to fine tune it, but it's already giving us a bunch of knowledge inside of our model because it's been built for image classification and it's already got a number of convolutional layers in there. you can see we've got a conv2d layer there with 64 kernels, we've got a max pooling layer that condenses that information down, then conv layers, conv layers, max pooling, convolutions, convolutions, max pooling, and on the end here you can see that we've got none, none, none, 512.
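cleaned up, the imports and the base network boil down to something like this (the vgg16 weights download the first time it's instantiated):

```python
from tensorflow.keras.models import Model
# Conv2D is imported in the walkthrough even though the final model only uses the other three
from tensorflow.keras.layers import Input, Conv2D, Dense, GlobalMaxPooling2D
from tensorflow.keras.applications import VGG16

# include_top=False drops vgg16's own classification head so we can attach ours
vgg = VGG16(include_top=False)
vgg.summary()  # roughly 14.7 million parameters of pre-trained convolutional features
```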
so effectively those none dimensions get filled in by the number of samples and the height and width of whatever you feed in, with 512 channels on the end, and obviously that might get changed or tweaked a bit depending on what types of inputs and what other transformations have been done, for example whether the padding is set to same, but just know we're taking a really bulletproof architecture and plugging it into our object detection model, that's the beauty of it. so that's vgg.summary, and again you can use .summary to get information about a whole bunch of different types of neural networks inside of keras. now let's build our neural network, so let's run this and let me show you what we've actually gone and done. first up we're specifying an input layer, because whenever you're building a neural network remember you need to have a bunch of inputs and either an output or multiple outputs, so let's start with the top and the bottom first. our input layer is an Input, using this input class up here, and i believe this is using the functional api, which gives you a little bit more flexibility, and our shape is going to be 120 pixels by 120 pixels by 3 channels, don't worry about any of this for now. now let's take a look at our outputs, which are specified over here: we've got two outputs, a classification output and a regression output. and how crazy is it that that matches what our annotations look like. remember we've got one output for our classification, so let's take a look: if i go and type train.as_numpy_iterator().next() and grab our labels, look at that, we've got a classification output, 1 0 1 0 1 0, and then we've got some regression outputs which are four different values. now let's go back and break this down a little bit. our classification maps to class2, which is class2 over here, and that particular layer is a dense layer with one output, so it's going to output one value, and because it has a sigmoid activation that means it's going to be a value between 0 and 1.
remember that activation acts like a modifier to the output of a neural network, so if we take a look at what a sigmoid activation looks like, it takes any input and maps it through to a value between zero, which is the bottom value on our y axis, and one, which just so happens to line up with our classes. our second output is regress2, and that particular layer has four outputs, also with a sigmoid activation, because we've got four values in our labels, and remember they're scaled between zero and one since we divided them by the dimensions of the image. that is effectively the crux of building neural networks: focus on the input and getting to the output, and then you can tweak what happens in the middle, the hidden layers. so in terms of our hidden layers, let's take a look at what's happening in the middle of the network. once we've taken in our input of 120 by 120 by 3, we pass that input layer through to our vgg16 layer, so you can see here we're instantiating vgg as VGG16, specifying include_top equal to false because we want to drop off those final layers, and we're passing our input layer through it. what we're actually doing is two things here: we're creating the layer and we're telling the layer what's going to be passed through to it from our neural network, so effectively the input layer goes to our vgg layer, and then our vgg layer gets fed into something else, that's how the neural network is stacked up. from there we've effectively got two different output heads, or prediction heads, which is what people typically call the final outputs of a neural network. this first prediction head over here is our classification model, and what we're doing is first condensing all of the information from our vgg layer using the GlobalMaxPooling2D layer. think of this as looking across all of the different channels in our vgg output and only returning the max values, so rather than keeping all of the spatial dimensions from that 512-channel output, we effectively just get 512 values back, the max value out of each channel. we then take that output, which is stored in a value called f1, and pass it through a fully connected layer, so f1 goes to a dense layer with 2048 units and a relu activation, and that value, now stored inside class1, gets passed to our final classification layer, which is class2.
i've just called it class 2 because it's the classification head's second layer, you sort of get the idea, and remember it's got the one output because it maps back down to here. likewise we've got our regression model here, think of this as our bounding box model. it's again going to take in our vgg outputs and apply global max pooling, and that output, now stored inside f2, goes to the regression head, and this should actually be f2 down here, that's an error: f2 comes from over here and gets passed into this layer. so we take our vgg outputs, pass them through to f2, then regress1 is a dense layer of 2048 units with a relu activation, and that regress1 value goes to our final regression layer, so regress2 is a dense layer with four units and a sigmoid activation which maps through to our bounding box coordinates down here. all in all, think about it as an input which goes through a big pre-trained backbone, and then those outputs get broken down into a classification model and a bounding box model, and we combine it all together using the Model api down here, so to Model we pass through what our inputs are going to be, which is our input layer, and what our outputs are going to be, which are the outputs of the class2 layer and the outputs of the regress2 layer, pretty cool right, and then we return that back. so now if we run this we've created a function which instantiates our model, and we can go and create an instance of it, which is what we'll do inside of 8.4, so facetracker equals build_model. that runs the build_model function, which remember returns our neural network, and then we can take a look at that neural network, so let's disable scrolling and take a look at this. we've got our input layer over here which is 120 by 120 by 3, that gets passed through to our vgg16 layer, which outputs a spatial feature map with 512 channels, and then a global max pooling layer reduces the outputs of that vgg16 layer down to just 512 values per sample. those 512 values then get passed to our two dense heads: this first one over here is the output from our classification head, which goes to a dense layer that gives us our classification output, just one value, and the second dense layer over here is 2048 values, which again maps down to the four bounding box values. you can actually specify names for these layers so it's a lot clearer which layer maps to which, but i haven't done it here. all in all our neural network is 16.8 million parameters, so quite big. okay, that is our neural network now done. just remember, when i'm actually designing these i'm thinking about what's been done in the past, whether we can leverage that information, and what our final output needs to look like, that's the core crux of how to build these. okay, we've taken a look at the summary, so we can now go and grab a sample out of our training pipeline and unpack it, so x will be our images and y will be our labels, and we can view those.
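put together, the build step is a short functional-api sketch along these lines, with layer variable names following the ones mentioned in the walkthrough:

```python
def build_model():
    input_layer = Input(shape=(120, 120, 3))

    # pre-trained vgg16 backbone with its classification head chopped off
    vgg = VGG16(include_top=False)(input_layer)

    # classification head: is there a face in the frame?
    f1 = GlobalMaxPooling2D()(vgg)
    class1 = Dense(2048, activation="relu")(f1)
    class2 = Dense(1, activation="sigmoid")(class1)

    # regression head: where is the box? (x1, y1, x2, y2 scaled 0-1)
    f2 = GlobalMaxPooling2D()(vgg)
    regress1 = Dense(2048, activation="relu")(f2)
    regress2 = Dense(4, activation="sigmoid")(regress1)

    return Model(inputs=input_layer, outputs=[class2, regress2])

facetracker = build_model()
facetracker.summary()  # roughly 16.8 million parameters in total
```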
now we can take a look at x.shape, so again 8 by 120 by 120 by 3, and we can actually pass this through to our facetracker model, so facetracker.predict, and we're passing through our images. ideally what we'll get out of that is our classes and our coordinates; right now we haven't trained it so it's going to be crap, but this is how you're going to use it in the end. if i go and run that we should get some predictions, that might take a little second, and if we print it out, take a look, we're now getting all of our classification outputs as this first value and all of our coordinate values as this second value. once we go and train it, it should learn how to estimate these and return values which are closer to the actual classification and bounding box outputs. so that is step eight now done, we've successfully built our neural network, but keep in mind that because we've got two outputs we need to do a little bit of tweaking when we go and train, because we need a specific loss function for our classification model and a specific one for our regression model, and that's what we'll do in a sec. we are now up to step nine, defining our losses and our optimizer. first up we're going to specify what our learning rate decay is going to be, and i didn't get too fancy on this, i think i actually found how to calculate it on stack overflow, but basically what we're specifying here is how to decrease the learning rate in such a way that we're at 75% of the original learning rate after each epoch. this means we'll slow down the learning so that ideally we don't overfit and we don't blow out our gradients. the first thing we need to work out is how many batches we've got within our training data set, so if i type in len(train) we have 465, so we need to replace this value here, batches per epoch should be 465.
we could actually even just set it to len(train) so that it's correct each time, and that specifies our learning rate decay. then we set up our optimizer, we're going to be using the adam optimizer, and think about it, your optimizer is working out how to go and apply gradients and effectively apply backprop across our neural network. so opt equals tf.keras.optimizers.Adam, and to that we pass through our learning rate, which is going to be 0.0001, and our decay is going to be this decay value over here. if i actually show you this decay, that is effectively how much our learning rate is going to drop each time we've gone through one epoch, and again you could just plug in a value here if you didn't want to do this calc. then we're going to create our localization loss, so let me show you what this looks like, i saw a really good formula for it, i think it was this one, yeah, take a look. basically what we're calculating here is this component, this bit and this bit, and we're not square rooting it, which is perfectly fine. effectively we're getting the distance between our actual coordinate and our predicted coordinate, which is what this line over here is doing: we take our y true coordinates minus our predicted coordinates, and this does our x and y at the same time, then we square the difference and reduce the sum, so we're summing it all back up together, and that gives us our delta coord value over here. then we calculate the actual height and width of the box and the predicted height and width of the box, and we do pretty much the same thing you're seeing down here: the true width minus the predicted width, squared, plus the true height minus the predicted height, squared, and then we use tf.reduce_sum to reduce that into a single value, which gives us delta size. so the difference between the coordinates is stored inside a variable called delta coord, and delta coord plus delta size gives us our localization loss, which is what we're returning down here. so if we run that, we can go and create variables for those: our classification loss is going to be passed through to our training pipeline and our regression loss is going to be set to our localization loss. the classification loss, keep in mind, is just going to be binary cross entropy, because it's a straightforward classification problem, we don't need to get fancy there but we could if we wanted to. then we test it out, so for our localization loss we test against y[1], which is what we had from over here, and we pass through our predicted coordinates, and that looks okay, if we type .numpy we get the actual value, which is around 6 to begin with, but again we haven't trained our neural network so it's going to be pretty crappy at first. and we can test out our classification loss, which is binary cross entropy, and in this particular case it's 0.584, and again to get the actual number out you can just call .numpy on it.
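as a sketch, the optimizer and the localization loss come out roughly like this; note the decay argument assumes an older tf.keras Adam signature (newer releases want a learning-rate schedule instead), and the slicing assumes labels ordered x1, y1, x2, y2:

```python
import tensorflow as tf

batches_per_epoch = len(train)
# chosen so the learning rate is about 75% of its previous value after each epoch
lr_decay = (1.0 / 0.75 - 1) / batches_per_epoch
opt = tf.keras.optimizers.Adam(learning_rate=0.0001, decay=lr_decay)

def localization_loss(y_true, yhat):
    # squared distance between the true and predicted top-left corners
    delta_coord = tf.reduce_sum(tf.square(y_true[:, :2] - yhat[:, :2]))

    # squared difference between true and predicted box width and height
    h_true = y_true[:, 3] - y_true[:, 1]
    w_true = y_true[:, 2] - y_true[:, 0]
    h_pred = yhat[:, 3] - yhat[:, 1]
    w_pred = yhat[:, 2] - yhat[:, 0]
    delta_size = tf.reduce_sum(tf.square(w_true - w_pred) + tf.square(h_true - h_pred))

    return delta_coord + delta_size

classloss = tf.keras.losses.BinaryCrossentropy()
regressloss = localization_loss
```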
boom, okay, that's looking good, and this regression loss is effectively going to be the same as the localization loss, so 6.05. okay, that is step nine now done: we've defined our optimizer, created our localization loss and classification loss, and tested out those metrics. now what we need to do is actually train our neural network, but before we do that we need to create a training pipeline, and for that we're going to create this class over here, let me zoom out a bit so you can see it. what we've got is the base model class, so we're creating a new class called FaceTracker and to that we're passing through Model from over here. whenever you're subclassing the model class from keras there's a couple of things that you need to have: an init method, a compile method, a train step method and a call method, so let me explain what each of these does. the init method is where you pass through your initial parameters, and in this particular case we're passing through our pre-built neural network, which in the code is called eyetracker — it really shouldn't be eye tracker, that's a name left over from a previous project, but it'll still work — and it's actually the face tracker model from over here, remember we ran facetracker equals build_model, which creates an instance of this deep learning model. we pass that through to our init method and set it as self.model, so self.model equals eyetracker. then we compile it, so remember whenever you build a keras neural network you typically create the network, call compile, and pass through your loss and your optimizer, and that's exactly what we're doing here: def compile takes our optimizer, our classification loss and our localization loss and sets those as class attributes so we can get at them later. before we do that though we run super().compile, because this is a subclass model we're compiling that as well, and then we set all of those loss attributes: self.closs equals the class loss from up here, self.lloss equals the localization loss from up here, and self.opt equals the opt from up here. then this is where the magic happens: the train step is where a lot of the hardcore stuff happens and where we actually train our neural network, so let's get ready to explain this. train step takes in one batch of data and trains on that batch, that's the first thing to know. so the first thing we do is get that batch of data and unpack it into its x and y values. we then tell keras to start recording each of the different operations being applied to this information as we go and train our model, and the first thing we do inside that is make a prediction from our model, so we run self.model, which is taking in our face tracker model (it says eyetracker here but it's effectively our face tracker model), and we pass through our x values, which remember are our pre-processed images, and we're
setting training equal to true, because if we've got any layers which behave differently during training versus inference, that's going to activate them. now remember our model returns our classes and our coordinates, so we can take those and pass them through their respective loss functions. so batch class loss equals self.closs, which is really just our classification loss, and we pass through the true ones or zeros, which we're getting from y[0], and the predicted ones or zeros, which are the classes, so that's y true and y pred. then our batch localization loss — sorry, my throat is dying, let me grab a glass of water, this is always the problem with these huge tutorials, by the end i'm absolutely dead — batch localization loss equals self.lloss, and to that we pass through the true coordinates and the predicted coordinates. i noticed when i was building this up that i had to cast the value to tf.float32, so you can see i'm casting that value there, just know it'll work, it's something i had to do to get the loss function to behave appropriately. so we've got batch class loss and batch localization loss, and we add those together to get one loss metric, so total loss equals batch localization loss plus 0.5 times the class loss; that weighting is just something i chose, you could tweak it if you wanted to, but i found it tended to work. then we actually calculate the gradients, and this is really important when you're using a custom training step: you calculate the gradients and then you use your optimizer to apply them. with tf.GradientTape() as tape starts recording all of the operations happening inside our neural network, and then we can use tape.gradient to calculate those gradients, so grad equals tape.gradient, and to that we pass through total loss, because we're calculating the gradients with respect to our loss function, and self.model.trainable_variables. what you get out of this is the gradient for each one of those variables with respect to that loss function, and then we go and do gradient descent: optimizer.apply_gradients loops through each one of those gradients and applies one step of gradient descent, so we should be optimizing towards minimizing that loss, and we zip the gradients together with the trainable variables, self.model.trainable_variables — my head's blocking that, but that's what it is. i wanted to explain this in a little bit more detail because it's really important: train step is what actually performs the training of our neural network, and if you think about it there's a couple of key steps. first you trigger that recording so tensorflow starts tracking all the operations, you make a prediction, you calculate the loss, you calculate the gradients, and then you apply backprop, a step of gradient descent, against all of those different variables. so that, in a nutshell, is what's happening there.
and the reason we had to do this is because remember we've effectively got two prediction heads: classification, plus our regression model which gives us our bounding box coordinates. then we return those losses, total loss, batch class loss and batch regress loss, so what we get back is a dictionary, and when we run our training step we can see the progress. our test step is triggered whenever we pass through a validation data set, and it's almost identical to our train step, the only difference is that we're not applying backprop here, we're just calculating our total loss, batch class loss and batch regress loss. if you look at this versus this, the core difference is that we don't have the tf.GradientTape tracking and we don't have those two lines which calculate the gradients and do the backprop, but it still gives you the ability to go and validate your model. and then we've got def call; i don't think we're actually going to use it, but if you ever wanted to use .predict on a subclassed model you need to implement call. okay, so that is our training step now created, so we can go and run that, and again all of this code is going to be available via github, so if you want to go into it in a ton more detail, or you've got questions you want to ask me, hit me up in the comments below when my voice is back. okay, let's zoom back in, that's our neural network and our training step now defined, so what we now need to do is set it up. we subclass our model, so model equals FaceTracker, which is this subclass over here, and through that we pass the neural network we set up just before, the one with our vgg layer. this could be named anything, custom model, od model, whatever you wanted, you'd just pass through od model or facetracker or eyetracker over here, just know that the model at the top maps into this model over here to give us our custom training step. so we've created that, and then we can compile it. ah, we have not instantiated our optimizer — my mouse just died, there we go, we're back — so let's go back up and run that cell, and then when we come back down and compile we pass through our optimizer, our class loss and our regression loss. if we run that, it has compiled successfully, and then we can train. we're going to specify a log directory, which is where our tensorboard logs will be written out to, and then this line creates a tensorboard callback, so if you wanted to review your model's performance after training you'd be able to go into that log directory and pick it up; if you want more detail on that hit me up in the comments below. but for now this is the magic line: model.fit is going to call this model over here, and remember fit is going to trigger our train step, and if we pass through validation data it's going to trigger our test step.
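a condensed sketch of the subclassed model looks like this, assuming `facetracker`, `opt`, `classloss` and `regressloss` from above (the init argument is called eyetracker in the video's code; the name doesn't matter):

```python
class FaceTracker(Model):
    def __init__(self, eyetracker, **kwargs):
        super().__init__(**kwargs)
        self.model = eyetracker  # the pre-built functional model

    def compile(self, opt, classloss, localizationloss, **kwargs):
        super().compile(**kwargs)
        self.closs = classloss
        self.lloss = localizationloss
        self.opt = opt

    def train_step(self, batch, **kwargs):
        X, y = batch
        with tf.GradientTape() as tape:
            classes, coords = self.model(X, training=True)
            batch_classloss = self.closs(y[0], classes)
            batch_localizationloss = self.lloss(tf.cast(y[1], tf.float32), coords)
            total_loss = batch_localizationloss + 0.5 * batch_classloss
        # gradients of the combined loss, then one step of gradient descent
        grad = tape.gradient(total_loss, self.model.trainable_variables)
        self.opt.apply_gradients(zip(grad, self.model.trainable_variables))
        return {"total_loss": total_loss, "class_loss": batch_classloss, "regress_loss": batch_localizationloss}

    def test_step(self, batch, **kwargs):
        # same as train_step minus the tape and the gradient update
        X, y = batch
        classes, coords = self.model(X, training=False)
        batch_classloss = self.closs(y[0], classes)
        batch_localizationloss = self.lloss(tf.cast(y[1], tf.float32), coords)
        total_loss = batch_localizationloss + 0.5 * batch_classloss
        return {"total_loss": total_loss, "class_loss": batch_classloss, "regress_loss": batch_localizationloss}

    def call(self, X, **kwargs):
        return self.model(X, **kwargs)

model = FaceTracker(facetracker)
model.compile(opt, classloss, regressloss)
```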
so hist equals model.fit, and to that we pass through our training data, and if you wanted to train on a smaller amount of data you could use .take to grab a smaller number of batches, but we're going to pass through everything, so we'll get rid of that. we specify how long we want to train for, so epochs equals 40, which means we're going to train for 40 epochs, and then we specify our validation data — oops, sorry, i keep jumping over onto my trackpad, which is kicking off weird stuff, we zoomed out, let's zoom back in, alright, cool, that's looking good — so we pass through our validation data, which is the validation partition we set up right at the start, and then we specify callbacks equal to this tensorboard callback, which again is purely optional, i just do it as a backup in case i ever want to come back and take a look at my training runs. and because we're saving the result of fit to a variable called hist, we'll actually be able to get our training history back, which is exactly what we'll do down here when we plot out our performance. okay, that is our fit call, so assuming we've done everything successfully, if we run this line now we should kick off training. let's run this and see how we go... and we've got errors: batch_regress_loss is not defined. so what have we done there, it's failing in train step, do we have regress loss here? self.lloss... oh, it's in here, these should be batch localization loss, not batch regress loss; looks like we had a couple of bugs where we just didn't update this training step. okay, let's try that again, cross our fingers... okay, so we're training, you can see that our model is running through all of the different batches inside of our training epoch, and you can see it's 465.
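the training kickoff itself is just a couple of lines, sketched here with a hypothetical logs directory for the tensorboard output:

```python
logdir = "logs"
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)

# fit drives train_step; the val split drives test_step; hist keeps the loss history
hist = model.fit(train, epochs=40, validation_data=val, callbacks=[tensorboard_callback])
```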
for each batch the progress bar shows our total loss our classification loss and our regression loss or effectively what our bounding box regression loss looks like now ideally you want to see a smooth reduction in all of those loss metrics and hold on it looks like there's a massive disparity between our validation regression loss and our base regression loss so again it might be an indication of something but we've got a lot of data so it might make sense and our validation partition's pretty small i don't know let's wait for this to finish training and then we'll actually be able to see how this is performing let's give it a couple of epochs and see okay that looks a little bit more reasonable so our regression loss is pretty close to our val regression loss actually let me explain this so this is our total loss metric and remember that's going to be our batch localization loss plus our batch class loss multiplied by 0.5 and then we've got our val metrics as well sorry i'm losing my voice now so this is val total loss over here let me zoom out actually zoom in alright that's a bit easier so total loss class loss regression loss and then we've got the same metrics prefixed with val which is what we've gone and calculated on our validation partition so val total loss val class loss val regression loss so what you want to see is that all of these metrics reduce progressively and pretty consistently with each other so you don't want to see val regression loss drop massively or spike massively likewise you don't want to see classification loss drop massively or spike massively you want them all to be reducing pretty consistently so you can see that let's take a look so our base regression loss is 0.543 and our val regression loss is 0.0166 so that's looking okay ideally you don't want to see one drop way too much more than the other you don't want to see stuff just going crazy so what we'll do is we're going to review this in a lot more detail once we go and plot our performance down here inside of step 10.3 so let's let that finish training and then i'll be able to walk you through what that's looking like so we're up to epoch 6 now total loss is at 0.0394 so that value that you can see there and let me zoom in so you can see this so total loss is that value there and val total loss is that value there so we've got a little bit of a discrepancy or it's not really a discrepancy this is part of training so 0.0394 and 0.0713 now if we break that down classification loss on the training partition is 0.0148 and val classification loss is 0.038 so that must mean the regression loss is diverging so if we go and take a look val regression loss is 0.0695 and on the training partition it's 0.0321 so let's keep watching this and see how we go it looks like val regression loss is bumping up a little bit okay that looks a little bit more consistent so val regression loss is 0.018 and train regression loss is 0.0117 so ideally you want them to both be decreasing at a consistent rate you might get spikes occasionally but it shouldn't spike massively and then stay massively out of whack all right so let's let that train and then we'll be right back and we'll be able to see how this has actually performed so again we're training we're going we're getting stuff done all right so that is our deep neural network finished training now it'll be interesting to see what performance looks like because i saw validation regression loss bouncing up and down
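once training wraps up we'll pull those metrics out of the returned history object and plot them side by side so here's a rough sketch of that plot assuming hist is what model.fit returned and that the metric keys match the names our custom train and test steps return with keras adding the val_ prefix for the validation pass:

```python
from matplotlib import pyplot as plt

fig, ax = plt.subplots(ncols=3, figsize=(20, 5))

# total loss on the training and validation partitions
ax[0].plot(hist.history['total_loss'], color='teal', label='loss')
ax[0].plot(hist.history['val_total_loss'], color='orange', label='val loss')
ax[0].title.set_text('Loss')
ax[0].legend()

# classification loss
ax[1].plot(hist.history['class_loss'], color='teal', label='class loss')
ax[1].plot(hist.history['val_class_loss'], color='orange', label='val class loss')
ax[1].title.set_text('Classification Loss')
ax[1].legend()

# bounding box regression loss
ax[2].plot(hist.history['regress_loss'], color='teal', label='regress loss')
ax[2].plot(hist.history['val_regress_loss'], color='orange', label='val regress loss')
ax[2].title.set_text('Regression Loss')
ax[2].legend()

plt.show()
```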
we shall soon see so at least the nice thing that we can do is we can go to hist and type in .history and we can actually get all of our loss metrics and see how this was actually performing so training performance looks like it was reducing pretty consistently but at least on the validation partition we had a little bit of bouncing up and down now this might be caused by a whole bunch of things because you can see you've got a spike there possibly there's some weird data some weird annotations or some crappy annotations i don't know we'll actually find out soon so if we actually go and visualize this performance we can actually go i've got the exact same thing written there we can actually take a look at hist.history to get that history back and that's because we've gone and saved the output of model.fit to the variable hist so that actually allows us to get that performance but rather than look at it like this it's probably easier to actually plot it out let me zoom in we're a bit too zoomed in there so if we actually go and plot this this will actually plot the total loss the validation total loss the classification loss the validation classification loss the regression loss and the validation regression loss wow i'm really losing my voice so if we go and run this let's actually take a look so you can see regression loss was bouncing up and down now this is a little bit concerning but maybe we've got some weird data in that validation partition who knows so you can see classification loss performed pretty much perfectly but it looks like we had some weird stuff happening in the validation partition possibly i'm thinking maybe there were some weird annotations or something let's actually go and take a look what does our validation data look like so we're going to take a look at our images i don't know maybe it threw up on this or like it wasn't too happy with this particular image here let's actually test it out i mean let's jump back on over to our client and we'll actually be able to see what this performance looks like to begin with all right final stage making predictions you got it we're going to do this on our test set but also in real time to evaluate performance nice let's wrap this up alrighty so we're in the end game now so we're making some predictions so first things first let's go and make some predictions on our test set so to do this we can write test_data equals test.as_numpy_iterator so this will set up an iterator we've seen this a bunch of times we can then go and grab the next batch so test_sample equals test_data.next so this is just going to grab one batch of data and remember it's going to be eight examples and then we can go and run a prediction so we can use our face tracker model so facetracker.predict and we can pass through the test sample now remember we just want the x values not the labels so we could actually type in x comma y equals test_data.next and just pass through x here right it would do the exact same thing now we can take a look at these predictions using this plot so this is only going to plot out a box if the classification confidence is over 0.5 and if it is then it will actually go and draw the rectangle now this is exactly the same as the way that we drew the annotations towards the start after we'd done the augmentation but let's actually take a look
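here's a sketch of that test-set prediction and rendering step so the eight-sample batch and the 0.5 confidence cut-off follow the walkthrough while the 120 by 120 coordinate scaling and the exact variable names are assumptions based on how the pipeline was set up earlier:

```python
import cv2
import numpy as np
from matplotlib import pyplot as plt

test_data = test.as_numpy_iterator()         # test is the tf.data test pipeline from earlier
test_sample = test_data.next()               # one batch of eight images plus their labels
yhat = facetracker.predict(test_sample[0])   # only the images go into the model

fig, ax = plt.subplots(ncols=4, nrows=2, figsize=(20, 10))
for idx in range(8):
    sample_image = test_sample[0][idx].copy()
    sample_coords = yhat[1][idx]

    # only draw a box when the face-present confidence clears 0.5
    if yhat[0][idx] > 0.5:
        cv2.rectangle(sample_image,
                      tuple(np.multiply(sample_coords[:2], [120, 120]).astype(int)),
                      tuple(np.multiply(sample_coords[2:], [120, 120]).astype(int)),
                      (255, 0, 0), 2)

    ax[idx // 4][idx % 4].imshow(sample_image)
plt.show()
```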
okay so this is performing absolutely terribly so something's gone terribly wrong so you can see there that we are not picking up our faces whatsoever so this is a problem now i'm wondering whether or not changing that final output layer or that final connection over here caused an issue so we had this connection set to hook into f2 over here but previously when i tested it out f1 had seemed to work i wonder if changing that has influenced the model let's actually quickly double check our annotations first so if we go and grab something out of the training set let's keep going through and take a look at our training sample so if we notice that we've got weird images here then it's the data otherwise i'm wondering whether or not it's the architecture so if we go and run that that looks okay that looks okay that looks okay oh we need to get the next batch those annotations look fine they look fine let's just quickly take a look at the annotations here if we take the sample coords uh no we want sample classes so res one comma zero yeah so these are all effectively labeled correctly so one one one one that's fine i really wonder if changing the architecture here has screwed it up because previously i must have gone and trained it with f1 hooked in so this particular dense layer connected directly to the single global max pooling layer now let's see if changing this improves our model so we could actually get rid of this layer because this particular regress one layer is hooked into f1 which is this over there so we could actually take this out so it's almost like that now i wonder if that gives the model more connectivity i don't know let's see so if we go and run that we go and rebuild our model we've got 16.8 mil params still fine nothing else has changed kind of weird that we're getting such bad performance the learning rate's fine we haven't gone and done anything weird over here so let's take a look at our loss metrics again so batch localization loss equals that which equals self dot localization loss that's fine classification loss is fine and we're returning total loss which goes to there it's probably good that you see me debugging one of these because then you can see how to actually build these up and fix stuff when stuff goes wrong so batch localization loss gets passed to there that's okay batch localization loss that's okay so we're just double checking that the loss metrics are assigned correctly let's go and kick this off again let's train for a little less now so rather than training on everything i don't know maybe let's go and train on a smaller subset and let's drop the epochs to 10
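that quick experiment boils down to something like this where the actual counts passed to take are purely illustrative:

```python
# train on a smaller slice of the data for fewer epochs while debugging
hist = model.fit(train.take(80),
                 epochs=10,
                 validation_data=val.take(10),
                 callbacks=[tensorboard_callback])
```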
let's see what happens now so this is obviously going to allow us to train faster but it's not capitalizing on all the data okay so regression loss 0.8246 the only thing i'm thinking is that maybe we've got bad data and stuff is getting thrown out okay so what do we have so regression loss is decreasing pretty consistently val regression loss that's okay it's still decreasing in line but it's bumped up now weird okay so i'm just double checking that val regression loss is decreasing in line with regression loss if it's not then maybe we've got a problem okay so that's gone up again i wonder if we've got issues okay so this has gone down massively now and regression loss has gone up gone down gone down okay it's gone down this has gone up we've got issues i think okay so they've kind of converged i don't know let's take a look at our history that's looking better right so at least our classification loss is dropping our regression loss is dropping what do our predictions look like okay so we're getting weird bounding boxes over here why are we getting two bounding boxes okay this one at least looks okay let's actually go and retrain i wonder if there's weird data let's give it more data and see if that throws it out actually let's train for longer to begin with so let's go for i don't know 40 epochs 0.01 0.04 i think something's going wrong let's stop this hold on let's go and plot this out again this will be from the last run so that's not relevant just yet something's broken we shouldn't even be rendering two boxes go and get the next batch of data yeah something has clearly gone wrong because we really just shouldn't be rendering out two sets of boxes hold on test data what are we doing here okay so then we need to render on wait we changed this this should be test sample over here and then let's pass through test sample one zero okay so maybe we went and screwed up the rendering hold on so maybe we are rendering okay okay so that's picking that up it's not picking up anything there let's run through another sample okay so maybe the model has been working and we've just been rendering it incorrectly i think that's what's actually gone wrong because we went and changed this to x comma y let's just keep running through the rest of the sample data so that looks okay that looks okay that looks okay that looks okay all right so maybe we aren't so bad okay phew i think the model maybe had been working but we've gone and screwed up the visualization i need to validate this because i'm not confident that we needed to go and change this let's just go and put this back because i want to go and test it out properly so maybe it's the vis that we screwed up and not necessarily the model itself so let's go and set this equal to f2 cut this put that back there i'll include the final code in the github repo anyway so we went and reconnected this regression layer back to this f1 layer but i'm going to connect it back to this global max pooling layer to see if that was truly the issue or whether or not we just screwed up the visualization i've got a feeling it was the latter so let's go and redefine that we're going to redefine our neural network that's fine we're going to recompile let's give it all of the data back so we're just going to drop the dot take and let's give it i don't know let's train it for like 10 epochs we won't wait that long let's see what our history looks like we'll be right back
we'll see if that was the bug a little longer than a few minutes later okay so i've gone and retrained it for the 10 epochs now again we did have one big spike here for our validation regression loss so if we go and take a look at our history again and our plot you can definitely see that there was a massive spike but all in all it doesn't look too bad so it looks like we've been pretty steady in terms of regression loss and validation regression loss but we do have that big spike which is a little bit concerning but we can go and take a look at that data now so all we've gone and done is i think we went and screwed up the visualization here because we had x comma y which meant that we didn't actually go and update the variable we used when we went and rendered the image over here which would be sample image so this really should have been extracting the sample image here so my screw up might have actually gone and messed up the code here so if we just go and leave this as test sample let's go and take a look okay that is predicting faces accurately there and there it's boxed out our face there let's go and make some other predictions we might be okay guys let's go and print this out so that sort of reassures me that it wasn't the neural network architecture which was the issue over here it might have just been the fact that we screwed up the visualization i screwed up the vis i'll take full responsibility i went and screwed it up but that's fine at least you can see what happens when you're building these types of models okay so we've gone and successfully classified our bounding boxes let's go and take a look at some other examples it's looking good guys it's looking okay and if let's say for example you wanted to require a higher confidence of a face being in the scene you could actually increase this value here so set it to 0.9 still looks like we're picking up faces right so the classifier is really really good in this as well and it looks like it's not classifying our face when we cover our face so we're looking okay so predictions look alright so again if you do notice these spikes i mean a one-off spike every now and then isn't the worst-case scenario because keep in mind we do have less data in our validation partition versus our training partition so it may be more susceptible to changes in the weights but if you start getting massive amounts of variability or you just start to see them completely diverging then you know that you've probably got an overfitting error but in this case we went and retrained for 10 epochs and that seemed to be okay the real test will be when we actually go and test this with the real-time detection but for now let's actually go and save this model so i'm going to import the load model method so from tensorflow.keras.models import load_model and then we can go and save the face tracker model so facetracker.save and we're going to save it to facetracker.h5 so you can see that in a second we've got a facetracker.h5 file so let me zoom in so you can see it you can see that we've now got that there and if we go and reload it we should be good so if you wanted to go and reload it into a different app that's how you'd be doing it down here
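that save and reload step looks like this:

```python
from tensorflow.keras.models import load_model

facetracker.save('facetracker.h5')          # write the trained network out to disk
facetracker = load_model('facetracker.h5')  # reload it, e.g. inside a different app
```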
now we can go and try this out on real-time face detection so okay what we're doing here is we are effectively capturing our video frame we're then cutting it down to be 450 by 450 pixels because that's our augmented size we convert it from bgr to rgb because opencv reads it in as bgr and we need it to be rgb for tensorflow we then resize it to be 120 by 120 pixels and divide by 255 to scale it down and then we pass it to our facetracker.predict method and then all this down here is really just about rendering so if you wanted to increase the rectangle size or change the color it's really just a matter of changing these over here so this over here controls the main rectangle this over here controls the label rectangle and i know i always get questions like how do i increase the font size well this over here controls the text rendered so if you wanted to go and change the font face it'll be this if you wanted to change the font color it'll be this thing font size font width and the positioning is controlled by this tuple over here let's go and test this out so i'm very curious to see how this performs in real time so let's drop down to this and i've closed a lot of the blinds in the apartment so it might be a little bit dark in the background i don't know we'll see how it performs but if i go and run this now we should get a pop-up okay so what's happened so we are not picking up our video capture device all right let me explain this so you can see it says nonetype object is not subscriptable that is because it's probably not able to actually get a capture from our video camera now keep in mind when we initially captured our frames it was from up here and we had our video capture device set to one so we've got to change that down there as well to make sure we're pulling from the same camera both times so if i change this to one we should be good if you get that type of error just double check that you're picking up the right video camera just a key thing to keep in mind we're running this this is the final test the final countdown see if this works oh my gosh guys take a look at that we built it it's picking up my face so this is a custom object detection model working completely from scratch guys i know this took a while to get to and it was a really long tutorial but that's what's possible so we went and built it completely from scratch using tensorflow and an object detection architecture let's cover our face and see if it takes it away take a look at that it's no longer picking up our face you could use this for just about anything right like if you wanted to go and do number plate detection or if you wanted to capture different cars you could do that this is so powerful this is just the beginning but it at least shows you how you'd build a real-time object detection model one that works on custom data sets using deep learning and completely from scratch now let me open up the blinds to see if it works under different conditions right what about if we turned off the recording lights still picking up my face guys look how dark i am in the frame and it's still working how good is that pretty resilient lights back on so even with it really bright i mean this camera has got auto focus or auto brightness on so you can see it's gone a little bit crazy but take a look at that guys picking up a face if i duck down it disappears i can't get over this how awesome is that so i mean we had a little bit of trouble with the rendering but apart from that it wasn't too bad to actually get to this a lot of pre-processing but that sort of comes with the territory when you're doing deep learning
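for reference the real-time loop described a moment ago comes down to roughly this where the capture device index the crop offsets and the drawing details are illustrative while the 450 by 450 crop the bgr to rgb conversion the 120 by 120 resize and the divide by 255 follow the walkthrough:

```python
import cv2
import numpy as np
import tensorflow as tf

cap = cv2.VideoCapture(1)                    # same device index used when collecting images
while cap.isOpened():
    _, frame = cap.read()
    frame = frame[50:500, 50:500, :]         # cut the frame down to roughly 450x450

    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # opencv gives BGR, the model expects RGB
    resized = tf.image.resize(rgb, (120, 120))
    yhat = facetracker.predict(np.expand_dims(resized / 255, 0))
    sample_coords = yhat[1][0]

    if yhat[0][0] > 0.5:
        # main bounding box around the detected face
        cv2.rectangle(frame,
                      tuple(np.multiply(sample_coords[:2], [450, 450]).astype(int)),
                      tuple(np.multiply(sample_coords[2:], [450, 450]).astype(int)),
                      (255, 0, 0), 2)
        # label text just above the box
        cv2.putText(frame, 'face',
                    tuple(np.add(np.multiply(sample_coords[:2], [450, 450]).astype(int), [0, -5])),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)

    cv2.imshow('FaceTrack', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):    # hit q to close the window
        break
cap.release()
cv2.destroyAllWindows()
```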
how awesome is that guys a real-time face detection model working pretty well if i do say so myself again you could add more data so in the end let's quickly go through what went wrong there because i think that's important let's do a quick postmortem i've just opened up the blinds as well just to see what it looks like under different lighting conditions i think it still works okay guys like that's tracking my face pretty well and it's pretty quick it's just working you can move the mic and it works oh my god this is so good like i've wanted to do this ever since i started deep learning i'm so happy how good is that oh i'm going to be playing with this all night who knows what i'll be building all right oh it even works with the green screen guys oh come on as if that's not awesome keep in mind we didn't even give it any green screen samples so it's still able to pick up our face pretty well that is nuts okay postmortem though so what actually went wrong here i'm keeping this up here just to reiterate how awesome i think this is so what went wrong so i think the model itself trained okay so when we were actually training this we were perfectly fine it's just that when we went and visualized it we screwed this up so when i went and wrote x comma y what clicked for me is when it started rendering multiple boxes this is a single object detection model so if we start to see multiple boxes something has either gone wrong with the predictions or something has gone wrong with the rendering so if i go and do this right we're going to start to see multiple boxes or duplicate boxes oh this code is still running we've got to stop this code in order for this to run so i'm just going to hit q now all right so we're going to run it once now if i go and do it again let's get another sample another sample it's not happening something weird must have gone wrong oh no it's because we had x here hold on there we go all right so the fact that we're getting multiple boxes so you can see that one there that's a dead giveaway that something has gone horribly wrong so that indicates that something has gone really really bad but it's typically going to be something to do with the actual architecture or it's going to be something to do with the rendering so either the machine learning model the deep learning model is outputting incorrectly or we're rendering incorrectly you can see here as soon as we unscrew this so we're going to reset it back to test sample and test sample zero let's go back boom it's working again so we didn't need to go and change the deep learning architecture we didn't need to go and move around those layers i think it's okay guys this is working pretty well let's go and test it once more so if we go and run our real-time detection test so if you get that little pop-up and it closes just re-run it again sometimes it has to reset the video capture take a look guys face oh my god i'm so happy this is absolutely amazing that brings this tutorial to an end i know it's been a long one but we've gone through an absolute ton of stuff and the final code including all the things that i tweaked inside of this tutorial is going to be available on github i'll make sure to update it right after this so you've got it but that is it in a nutshell we've gone and effectively built our own object detection model that does face detection hopefully you've enjoyed this let's quickly recap
what we did in that last segment so we went and built our deep neural network we realized that it wasn't an issue with the neural network itself but i showed you how to go and double check that so if you wanted to go and change layers you can move them around they've just got to line up in terms of output shape so the linear algebra still works we then went and made some real-time detections we defined our localization loss we defined our training steps i explained the custom training step that we went and did as well as how to go and debug while training and what to look for when you're training so again a one-off big spike might be okay but if you start to see huge amounts of divergence you've got to go and double check your data and see whether or not you need to apply regularization or whether or not it needs a complete architectural redesign but in our particular case what did we end up with so our final validation regression loss was 0.025 our total loss overall was 0.065 and our total validation loss was 0.0297 so all in all it seems to be working pretty well again we could add more data we could do a whole bunch of additional things but on that note that about does wrap this up do check the github for the updated code thanks again for tuning in guys peace thanks so much for tuning in guys hopefully you enjoyed this video if you did be sure to give it a big thumbs up hit subscribe and click that bell and let me know how you went with this i did put in a ton of effort to be able to build up this full flow ideally from scratch and really with minimal dependencies so let me know how you went with it thanks again for tuning in peace
Info
Channel: Nicholas Renotte
Views: 193,592
Keywords: face recognition python, face recognition, deep learning, python, face detection, object detection
Id: N_W4EYtsa10
Length: 146min 5sec (8765 seconds)
Published: Thu May 05 2022