Deep Convolutional Neural Networks

Captions
I want to continue the discussion of neural networks by training a network that does one of the things people consider most successful for neural nets, which is the labeling and classification of images. In 2012, ImageNet really enabled the ability to train deep neural network architectures at scale: having that massive dataset, plus advances in computational power, allowed us to start seeing and exploiting the structure of deep neural nets. ImageNet was a critical transition in what has revolutionized modern deep learning architectures. Details of this code base can be found at databookuw.com; it is based on Chapter 6 of the book Data-Driven Science and Engineering. So let's get right to it. I want to show you an architecture, and it is something you might find all over the internet: this is the sort of canonical architecture people have developed for image labeling and classification. Let's go through some of its features. Typically there are many layers; this picture is just representative, and if you search for deep convolutional neural nets you will see many different architectures and styles, but most of them have this kind of base schematic. I want to highlight the structures that have been successful in image processing, and then we'll go through some code showing how you might implement something like this in practice. Typically the input is an image, and the goal of the image processing is to take that input and provide it with a label. In ImageNet, for instance, you had a set of images with classification labels.
You are going to go from the input layer all the way to the output, where the output is a label. Early in the lectures on neural nets we started with very simple labels and very simple images, dogs and cats, so there were only two labels at the back end. But in vision processing you might have a very large, diverse set of labels: pizza, car, house, anything you can imagine. The labels can become even more sophisticated, such as "people having a picnic." These labels are typically provided by someone who has gone through and hand-labeled the images, providing the outputs you need as training data for learning all the weights of this complex neural network architecture. By the way, this is also very important for, say, the cameras on self-driving cars: as the car drives around, the camera needs to label the entire scene (here is a pedestrian, a cyclist, other cars, here is the road), and all of that is done with an architecture of this nature. Let's walk through two functions I especially want to highlight for vision: convolution and pooling. These are often the two key components of computer-based vision algorithms. You start with your image, which is typically a high-dimensional picture; think of 4K video frames or HD cameras. Everybody carries smart technology, cameras or video, that produces very high-dimensional inputs in pixel space, and pixel space typically also carries RGB color, so it is three-dimensional. That high-dimensional input space gets mapped to the label space, and again the label space can be a variety of things, but the example we have worked with so far is cats and dogs.
You can have a much, much broader set of labels on the back end, perhaps thousands or tens of thousands, and your goal is to take an image, run it through this network, and produce an accurate label for that image. Here is how it works. One of the interesting things about a convolution is that it looks at parts of the picture: there is a little convolutional window that you slide across the entire image, and within that window you look for features. This is important because if there were a dog in the right corner of this picture, looking at the entire picture at once might not be the most useful thing; instead, a sliding window moves across the entire image, and in each little sub-block it tries to see which features in that sub-block are associated with the labels. That is the kind of thing the network should learn. In fact, you can create many layers of this convolutional window. Every time you randomly restart this network, you might produce a different filter from the input space to the convolutional space: the pixels in each convolution window pass through a nonlinear activation function, which could be a ReLU, linear, sigmoidal, whatever you happen to bring to it, and produce an output that gives some evaluation of what is going on in that window. You train the weights to get from the input to the output. Each time you train this, you produce a transfer function f; in other words, the transfer function f has all the neural network weights of this network conveyed into it, so if I retrain with a different random start I will generically get a different weighting function.
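To make the sliding-window idea concrete, here is a minimal pure-Python sketch of a single convolutional filter being slid across an image. This is my own illustration, not the book's or MATLAB's code; the function name and the toy 4x4 image and 2x2 edge-like filter are made up for the example.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (both lists of lists) and return the
    feature map of windowed dot products (a 'valid' convolution)."""
    H, W = len(image), len(image[0])
    k = len(kernel)
    out = []
    for i in range(H - k + 1):
        row = []
        for j in range(W - k + 1):
            # Dot product of the k-by-k window with the filter weights.
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(k) for b in range(k))
            row.append(s)
        out.append(row)
    return out

# A 4x4 image with a vertical edge, and a 2x2 edge-detecting filter.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
kern = [[1, -1],
        [1, -1]]
fmap = conv2d(img, kern)   # 3x3 feature map; responds only at the edge
```

The filter fires (value -2) only where its window straddles the edge between the 0s and the 1s, which is exactly the "look for features in each sub-block" behavior described above.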
You can do this multiple times to produce lots of different filters in the convolutional layer, and you can also apply different activation functions: a ReLU, a linear function, a sigmoid. So through this process of convolution, random starts, and different activation functions, you produce a set of features in the convolutional space representing the input image from the convolution window. Typically, in image processing, you follow this with a pooling window. Pooling takes a square such as this and produces an average of that square; often what is called max pooling is used instead, which takes the pixels in the square and simply produces the maximum value. Notice what happens: this process is a compression. You take blocks and replace them with their averages or their maximum values, and you get a compression of the image down to the pooling layer, to which you can then apply another convolutional layer, another pooling layer, and so forth. A lot of these network architectures start with a large convolutional window and progressively refine it through many layers, all the way down to a very small pooling layer at the back end, which you eventually splay back out into a fully connected network to the output. So let's call this the paradigm of computer vision structure: lots of convolution, lots of pooling, high dimensions compressed to low dimensions, convolution windows that start big and are progressively refined to be small. What is interesting is that there is a lot of information in the filters these layers produce as features for the image classification, and all of it is learned from a training set.
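The pooling compression just described can be sketched the same way; again this is my own pure-Python illustration with made-up names and a toy feature map, not code from the lecture.

```python
def max_pool(fmap, size=2, stride=2):
    """Downsample a feature map (list of lists) by taking the max over
    size-by-size blocks stepped with the given stride."""
    H, W = len(fmap), len(fmap[0])
    out = []
    for i in range(0, H - size + 1, stride):
        row = []
        for j in range(0, W - size + 1, stride):
            block = [fmap[i + a][j + b]
                     for a in range(size) for b in range(size)]
            row.append(max(block))   # keep only the strongest response
        out.append(row)
    return out

# A 4x4 map compresses to 2x2: each entry is the max of one 2x2 block.
fmap = [[1, 3, 0, 2],
        [4, 2, 1, 0],
        [0, 1, 5, 6],
        [2, 0, 7, 1]]
pooled = max_pool(fmap)   # [[4, 2], [2, 7]]
```

Each application halves the height and width, which is why stacking conv/pool stages walks the image down from high dimensions to a small representation.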
Now, as critical as the architecture and high-performance computing are, what is really important is having enough data to train all those network weights. Structures like this can have billions of weights, so the only way to use the full power of a deep convolutional neural network like this one is to have an equally sized training dataset, and this is why ImageNet was so important in 2012. Up to that point you had hundreds of thousands of labeled images; with ImageNet you had 15 million, and it was that very large dataset that allowed us to exploit an architecture like this to its maximum. So this is a paradigm, and I want to highlight that this structure is being used widely in the community: it is very expensive to train, but it is also extremely robust and successful, especially for images. A lot of people have vision applications where an architecture like this is exactly the paradigm of network you want to use. In fact, a lot of modern high-performance neural networks are basically playing around with different mutations of this basic core structure. Look at any of the leading conferences, say the Conference on Computer Vision and Pattern Recognition (CVPR) or NeurIPS, and you will see year-by-year improvements, especially in the vision field, that basically come from permuting this basic architecture to keep improving performance, even beating human performance on vision tasks. With that in mind, I want to show you that deep convolutional neural nets are extremely powerful, at least for problems where you are looking for structure with features across scales; convolution is really helpful for pulling those features out and helping you make a classification decision accordingly.
What I want to do now is walk you through code in MATLAB and show you this in practice. The code is not too hard to run or operate, and I am going to walk through it to show you how easy it is, in some sense, to train one of these neural networks. Whether you are using MATLAB, TensorFlow with a Keras front end, or PyTorch, all of these make it extremely easy to implement a deep convolutional neural network architecture. The first thing is to download the data. This is a dataset provided within the MATLAB framework, so this code is basically pulling that dataset in so we can do the training. The data are something like the MNIST dataset: handwritten digits. Here is an example where I randomly pulled out 20 of them. What this first block of code does is pull in the data (there are 10,000 images), randomly select 20 of them, and show them here. You can see these are handwritten digits; they have rotations, some are of high quality, some of low quality, but the point is: with 10,000 digits, can you write code that successfully takes any of these images and correctly labels it? This was one of the early successes of machine learning, the application of computer vision techniques to the MNIST dataset of handwritten digits. That work was done by Yann LeCun and others while he was at Bell Labs, because the post office wanted a computer vision reader that could read zip codes off envelopes.
The goal was to send mail to the right place just by having a computer read off the zip code, rather than having a human make that decision, so you would need to accurately recognize handwritten digits; that was in fact one of the earliest uses of computer vision. You start to see the difficulty of the task when you try to label some of the more problematic data, like this one here: I think that is a five, and that seven is a little bit faded and off to the side. Some of them are very easy (this is clearly a zero), but some are harder and have some ambiguity even to the human eye. So how would a computer do, and how would we set up a neural network architecture, with a set of layers, that takes an image like this and learns to map it from the input space, which is pixel space, to a label, which is zero through nine? Let's talk about how this algorithm works. The first part of the code just displays the images; now we are going to start training. We have labels in this data, which comes as a prebuilt MATLAB dataset, and it is nice that there are a lot of datasets like this that are easy to use, because they provide a perfect training ground for trying new architectures and seeing what the performance is like when you perturb or change things. Here is what we do first: I specify how many files I want for training, 750, and I believe this is per digit class. I have 10,000 images, so I am going to take 750 from each class and train with those. That provides a lot of data: 750 examples per class, each with a correct label.
From those labeled examples the goal is to learn a deep convolutional neural network that correctly maps an image to its label, and that is what you specify here. We take the data and split it into a training set and a test set, and we randomize which examples go into the training set. This line here has to do with how you seed the random number generator: you want to seed it so that if I run this here and you run it on your laptop, we both start with the same random initialization. So let's build the neural network; it is almost easier than baking a cake, because all you have to do is specify the layers. I have an image input layer, 28 by 28 by 1: these are black-and-white images, 28 pixels by 28 pixels, so I specify the size of what is coming in. It is interesting: I can run this very easily on my laptop, but imagine putting in 4K video or HD-quality images. The input layers would be much bigger, and usually by 3 as well, because color gives you the RGB cube, so your input data can be massive. Either way, that first layer tells the network what the input looks like. Then, coming out of the input layer, the first thing I do is go to a 2-D convolutional layer, where I can specify parameters: normally you specify the stride and the size of the convolutional window, and also how many filters you want to make.
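The per-class split and the seeding step can be sketched as follows. This is a hedged pure-Python illustration of the idea (fixed seed, shuffle, take 750 per class for training); the function and variable names are my own, not the MATLAB dataset's API.

```python
import random

def split_per_class(data_by_class, n_train=750, seed=0):
    """data_by_class: dict mapping label -> list of examples.
    Returns (train, test) lists of (example, label) pairs."""
    rng = random.Random(seed)      # fixed seed => same split on every machine
    train, test = [], []
    for label, examples in data_by_class.items():
        shuffled = examples[:]     # copy so the original order is untouched
        rng.shuffle(shuffled)
        train += [(x, label) for x in shuffled[:n_train]]
        test += [(x, label) for x in shuffled[n_train:]]
    return train, test

# Toy usage: two "classes" of 1000 examples each, 750 per class to train.
data = {0: list(range(1000)), 1: list(range(1000, 2000))}
train_set, test_set = split_per_class(data, n_train=750, seed=42)
```

Because the generator is seeded, rerunning the split reproduces exactly the same train/test partition, which is what the lecture's seeding line is for.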
Remember, I went from one filter to many filters, and the way you get those is through different random seeds, so that I can build different transfer functions. I believe here I put 20 filters with a convolutional window of size 5, so you slide a size-5 window across the image and get 20 feature maps coming out. I then specify that when I go through this layer the activation is a ReLU layer; that is what this line is. So that builds my first stage: from my input, through this convolution, with a ReLU. From there I go to a 2-D max pooling layer, and notice I pool over windows of size 2, grid sizes of 2, with a stride of 2; I am specifying how big the pooling level is, so it takes a 2-by-2 block of pixels at a time and slides it along, all the way across. That is the max pooling. So all I do here is one convolution and one pooling, then I come back out to a fully connected layer of size ten, then a softmax, which is a way of doing the classification on the back end, and finally a classification layer. There are lots of different layer types that can go into this: convolutional layers, pooling layers, and ways to do the classification like softmax and classification layers. If you use help in MATLAB or the MathWorks documentation, you can find a lot more information about what they actually do. I do not want to go into those details specifically, because these were picked generically and I could use other types of processing; this is just a representation of some possibilities. So what layers does is specify the structure of the neural network layers, basically the graphic I gave you; let me show it to you.
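As a back-of-the-envelope check on the layer stack just described (28x28x1 input, 5x5 convolution with 20 filters, ReLU, 2x2 max pool with stride 2, fully connected layer of 10), here is the size arithmetic in pure Python. This is my own illustration of the bookkeeping, not MATLAB's API; it assumes a "valid" convolution with stride 1, which is a common default.

```python
def conv_out(n, window, stride=1):
    """Output size along one dimension for a 'valid' convolution."""
    return (n - window) // stride + 1

def pool_out(n, window, stride):
    """Output size along one dimension for pooling."""
    return (n - window) // stride + 1

h = conv_out(28, 5)        # 5x5 window over 28 pixels -> 24, with 20 channels
h = pool_out(h, 2, 2)      # 2x2 pool, stride 2 -> 12, still 20 channels
flat = h * h * 20          # features feeding the fully connected layer
# flat == 2880 features, mapped by the fully connected layer to 10 scores
```

So the network squeezes 784 input pixels through a 24x24x20 feature bank down to a 12x12x20 pooled representation before the final classification layers.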
When you look at something like this, before I train anything I can specify the whole architecture: the input, the convolutions, which functions I want, how I want to go from the convolutions to the pooling layer, what sizes I want, more pooling, the fully connected layer, and the output thresholding. The architecture I drew here is fully specified in the layers command, so whatever you draw, you can specify through that layers function in MATLAB. We only did one convolution and one pooling; we took out a second convolution-pooling stage, but you could easily put it back in. In fact, you can see how easy it is: if you want another convolution after the max pooling, just copy and paste it in and set the size you want. It is that simple, and this is the case for MATLAB, Keras, and the rest; all of these are very simple command architectures that do very powerful things for you. Now, we have not trained anything yet; all we have done is say, here is the blueprint of the neural network I want to construct. What comes next are some options. The training option sgdm stands for the stochastic gradient descent with momentum method, so I am telling it to train using stochastic gradient descent; there are other optimizers as well. The maximum number of epochs is how many times I want to keep going through the data and updating the weights toward the final solution, and the initial learning rate is also associated with the gradient descent. Remember, the learning rate is how big a step you take each time you learn something new and compute a gradient toward updating the weights.
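The learning-rate and epoch options configure an update rule like the one sketched below: step against the gradient by the learning rate, repeated for a number of epochs. This is a deliberately tiny pure-Python illustration on a toy 1-D loss (w - 3)^2, not the MATLAB optimizer (and it omits the momentum term that sgdm adds).

```python
def gradient_descent(grad, w0, lr=0.1, epochs=20):
    """Repeatedly step weight w against its gradient with step size lr."""
    w = w0
    for _ in range(epochs):
        w -= lr * grad(w)      # the learning rate scales each update
    return w

# Loss (w - 3)^2 has gradient 2*(w - 3) and its minimum at w = 3.
w_final = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```

With lr = 0.1 and 20 epochs, w_final lands close to the minimizer 3; a larger learning rate takes bigger steps but can overshoot, which is why the initial learning rate is one of the options you tune.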
So that is what this does for you; you can set lots of functionality there, and then you train it in one line: trainNetwork, whose output I call convnet, for convolutional net. I give it the training data, the layers, and the options. Remember, the training digit data has the images and their labels, the layers give the architecture, and the options say how you want to optimize. This runs the stochastic gradient descent for 20 epochs and gives you a result, convnet, which is now my model: I can run new images through it and it will produce a label on the back end. So how do I test it? I use the classify command, which takes that convnet model, and now instead of the training digit data I use a withheld set called the test digit data. This is the test set I held out; I run it through the classify command using the convnet model I just built, get some results, and then compare my results to what the actual labels were supposed to be, which gives me my accuracy. So I trained on one set of data; once I have the model, I pull it out, run it on new data, and on that new data I get scores through the accuracy computation. You can see this is a very compact architecture, but by the way, it can take a very long time to run. In this example it is not too big, because my images are just 28 by 28, small compared to what they could be; if I had lots of data at, say, 3000 by 2000 pixels, this would be very expensive.
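The accuracy check at the end is simply the fraction of held-out test images whose predicted label matches the true label. A minimal sketch, with made-up example labels standing in for the classifier's output:

```python
def accuracy(predicted, actual):
    """Fraction of positions where the two label lists agree."""
    assert len(predicted) == len(actual)
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

# E.g. 19 of 20 held-out digits labeled correctly gives 0.95.
pred = [7, 2, 1, 0, 4] * 4
true = [7, 2, 1, 0, 4] * 3 + [7, 2, 1, 0, 9]
acc = accuracy(pred, true)   # 0.95
```

The important point from the lecture is which labels you feed in: the score that matters is computed on the withheld test set, never on the training data.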
I would have to let it run for quite a long time before I actually built the neural network model, but this one is not so bad. In fact, let me just run this section; I had pre-run it, but here is the training process starting. It is telling me the epochs, the iterations per epoch, and the time elapsed; let me widen the screen so we can see more of what is there. There you go: it is showing the accuracy and the time elapsed, and notice what happens as I progressively train. Look at the accuracy. The initial accuracy, and this is on a mini-batch, in other words testing on a small subset of the data first just to see how it is doing, started at 8%, which is even less than a ten-sided coin flip would give, since there are 10 digits. Within one epoch it gets to 55%, and after 11 or 12 epochs you are at 98%, a little higher. Remember, this is stochastic gradient descent and a mini-batch accuracy, which is why the accuracy will sometimes appear to go high and then come back down a little: you are looking at these mini-batches. Now you are in the very high percentages; after 16 or 17 epochs you are near 100%, again on a mini-batch, so what I have to do is test it over a broader range, since it is still iterating to bring the error down. This is a very simple toy dataset, so it trains pretty quickly, and your overall accuracy after training is 95%. Remember, the mini-batch was not a good indicator of how you are really doing, but the 95% on the test set is.
That 95% accuracy on a very simple test set gives you some idea of how well this thing can train a neural network for classifying images of the style I have shown you here. This is a nice little example, if only in the sense that you will obviously go on to consider harder problems, but look how easy this structure is to build. It is very modular and very simple, and you have a lot of flexibility in how you play with it: how you put layers together, how many layers, their sizes, the strides, the convolutions, the pooling. You get options for how you want to run the optimization, and in one line of code you train the network: you tell it the structure, tell it the options, it trains, and it gives you back a model, and then you can see how well it does. This is sort of the modern way of thinking about neural networks, because what it is really telling you is that within architectures like this, you can very easily specify them, determine how you want them optimized, and run; the optimization can of course take a very long time, but it gives you back some very interesting results. There is no arguing with the incredible success these neural networks have had on image and vision processing, and why they are now the canonical go-to method for computer vision applications such as self-driving cars and autonomous vehicles. So you now have all of this at your fingertips. Again, all of this lecture and all the code can be found at databookuw.com; this is from Chapter 6 of the book by Brunton and Kutz, and if you want to download the PDF of the book, it is available off that website. You can see all the details there, not only MATLAB code but equivalent Python code as well, which gives you a lot of flexibility to start training more complex neural nets within this kind of architecture.
Info
Channel: Nathan Kutz
Views: 4,582
Keywords: neural networks, deep learning, convolutional layers, convolutional networks, deep convolutional networks, machine learning, kutz, brunton
Id: PwIMCYUl0WA
Length: 30min 28sec (1828 seconds)
Published: Thu May 07 2020