Autoencoders in Python with TensorFlow/Keras

Captions
What is going on everybody, and welcome to a video on autoencoders. Autoencoders are neural networks trained to encode, and subsequently decode, input information. By encoding we generally mean compressing it, denoising it, or reducing it to less information than the input. It doesn't have to be that way; another form of encoding might just change the format of the data, but generally it involves some sort of shrinking of the size. You would not use an autoencoder (at least I haven't seen one used this way) for something like actual video compression, where you'd want a mathematically proven formula. Instead, autoencoders are very popular for more advanced and complex machine learning and deep learning problems.

I think the best way to understand stuff like this is just to do it, so let's jump in and learn a little about autoencoders. The first order of business is to get ourselves some data. We're going to import tensorflow as tf, from tensorflow import keras, and import matplotlib.pyplot as plt. We could also use cv2 for visualization; there is a text-based version of this tutorial (link in the description) where I use OpenCV instead, because you can cycle through things very quickly, but the way it behaves differs on each operating system, and I know matplotlib will act the way I'm used to, so I'll use matplotlib here. We'll probably still use cv2 in this tutorial to resize things, so I will import it, and then we'll import numpy as np.

To begin, we specify x_train, y_train and x_test, y_test (even though we don't actually need the labels), and we're going to use the MNIST dataset via tf.keras.datasets.mnist.load_data() (it's load_data, not load). Honestly, the dataset doesn't matter that much; we're going to be working with images, so if you've got your own images, especially ones you can quickly grayscale, feel free to pause and use something else.

If you're not familiar with MNIST, we can display a sample with plt.imshow(x_train[0], cmap="gray") and plt.show(); note it's imshow, not show, for displaying an image. Here we have a five, for example; we could check another data point and get a zero, and so on. They're just handwritten digits. Checking x_train[0].shape, they're 28 by 28, and as you can see they're grayscale; if they were RGB, the shape would be 28 by 28 by 3. If we run the numbers, 28 times 28 is 784, so when you feed one of these through a neural network, that's 784 unique features. If it were RGB, you'd multiply by 3 and have 2,352 features. This isn't RGB, but hopefully at the end I'll show an RGB example, because certain things change with the autoencoder; we'll get there.
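Pulled together, the setup looks something like this (a minimal sketch of the imports and data loading walked through above):

```python
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import cv2  # only used later, for resizing
import numpy as np

# load_data() returns (train images, train labels), (test images, test labels)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

plt.imshow(x_train[0], cmap="gray")  # a handwritten digit, e.g. a 5
plt.show()

print(x_train[0].shape)  # (28, 28): 28 * 28 = 784 features; RGB would be 2,352
```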
Anyway, in this case it's 28 by 28: 784 features, which is a lot of features. Luckily MNIST is actually a really easy dataset for neural networks to learn, and we could talk a little about why that might be, but 784 input features can often be very challenging. So the idea of compressing those features down in a way that makes learning easier for the network is always good, and if you've got noisy, essentially useless features, why not toss those out? This is us using a neural network to figure those things out for us. In a lot of cases the features may all be useful, but certain features have relationships with each other such that, say, five features could actually be condensed into two or three; now imagine that spread across hundreds of features. If you can do something like that, you can simplify the problem, and with neural networks, if you're having a hard time getting one to learn something, the first thing you should always think about is how to make the problem easier to learn. When a neural network trains, it's trying to do two things at once: figure out what the relationships between the features are, and do, say, classification. The idea of an autoencoder, as you'll soon see, is that the input is the output; it just maps the input to itself, with some sort of bottleneck along the way, and doing that hopefully makes it easier for the network to learn.

So we know we have 784 features. If we print out x_train[0], we can see the values run from 0 to 255 (I don't see a 255 here, just a bunch of 253s, but I assure you it goes to 255). One of the standard things we do with image data, including RGB, is divide by 255, so we'll say x_train = x_train / 255, and the same for x_test. I tried the in-place form x_test /= 255 and it failed, which surprised me; the reason is that the raw arrays are unsigned 8-bit integers, and NumPy refuses to cast the floating-point result of the division back into an integer array in place. Anyway, now we have our data, and just to show it real quick, it's kind of ugly, but all of these values are now between 0 and 1 instead of 0 to 255. For neural networks we generally like to keep data between -1 and 1, or 0 and 1.
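The scaling step, including the in-place form that errors out:

```python
# Scale pixel values from [0, 255] down to [0, 1]
x_train = x_train / 255.0
x_test = x_test / 255.0

# The in-place form fails on the raw arrays because they are uint8 and
# NumPy will not cast the float result back into an integer array in place:
# x_test /= 255  # raises TypeError on the original uint8 arrays
```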
So in this case it's between 0 and 1. We have our data and we've preprocessed it how we'd like, but the problem, again, is that it's 784 features, so we're going to try to condense those down. An autoencoder basically consists of two models in one: an encoder and a decoder. We'll start with encoder_input, a keras.Input layer with shape (28, 28, 1); the 1 is how many channels there are, so if it were RGB you'd put a 3 there instead, and we'll give it the name "img". That's the input to our neural network.

Now we want to attempt to condense it. In our case the MNIST data is grayscale, so we don't have three channels to work with, and it's pretty simple data, so we don't have to use any convolutional layers, which lets me keep things simple while showing you how this works. Also, a lot of problems that use autoencoders aren't on image data at all; the only reason we're using images is that you can actually visualize what the autoencoder does, which I think is really cool. Convolutional layers add a slight degree of complexity, but at the end of this tutorial I'll show you some code that does use them, including pooling and all that kind of stuff, because that makes things a little more challenging. For now, we'll keep it nice and simple and just use dense layers.

To begin, we flatten the input: keras.layers.Flatten() applied to the encoder input layer. Take note: this is the Keras functional API, as opposed to the Sequential type of model where you'd do model.add() with a Flatten layer and so on. I haven't really decided which one I like; I've started using this one, and I almost think it makes more sense. I believe this might be the first time I've used it on the channel. So we have a Flatten layer that flattens the input down to 784 values, and then we can have a dense layer: keras.layers.Dense with 64 units and activation="relu", applied to the flattened x. Honestly, that's probably all we need for our encoder; more complex data might require a more complicated encoder, but often you don't need more than this, and the problem we're trying to solve here is one of the easier problems neural networks face. Again, the benefit of an autoencoder is that separation of tasks: its only job is to solve this condensation problem. Rather than x, we'll call this final layer encoder_out. So that's our encoder.
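In functional-API code, that encoder is just:

```python
encoder_input = keras.Input(shape=(28, 28, 1), name="img")
x = keras.layers.Flatten()(encoder_input)                    # 28*28*1 -> 784 values
encoder_out = keras.layers.Dense(64, activation="relu")(x)   # the 64-value bottleneck
```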
In case we want to use the encoder on its own later (we will, to show the encoded layer), we wrap it as its own model: encoder = keras.Model(...), everything from the encoder input to the encoder output, with the name "encoder".

So now we have our encoder; next we need our decoder. Again, the autoencoder takes input and maps it to literally the exact same input, so the only constraint we have is that the output layer exactly matches the input; in this case it needs to match 28 by 28 by 1. We could probably go straight to the output, but we'll have a dense layer first. There are a couple of ways we could do this, but I think we'll just throw in a Dense layer of 784 units; we know we need to end on 784 values at some point, and a 784-unit dense layer shouldn't be too complicated, so: keras.layers.Dense(784) with activation="relu", applied to the encoder output.

So we have 784, and we know 784 is the result of 28 by 28, but we need to get this back to 28 by 28. There are a million ways you could do this; alternatively, you could flatten your data before feeding it through the network (it might even be quicker doing it with NumPy first, though I haven't tested that), in which case it would already be 784 and you could probably just end on 784. But we'll keep it this way, especially because eventually, if you use RGB data with three channels, flattening beforehand becomes way more taxing. The big takeaway, I hope, is that there are a million ways you can make an autoencoder; literally the only thing that matters is that you map input to input. That's it. So this 784 dense layer can be our decoder_input; you could also start the decoder with a 64-unit layer.
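In code:

```python
# Wrap the encoder so it can be used on its own later
encoder = keras.Model(encoder_input, encoder_out, name="encoder")

# Decoder: a single dense layer back up to 784 values
decoder_input = keras.layers.Dense(784, activation="relu")(encoder_out)
```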
A lot of times you'll see the decoder built as an almost exact mirror of the encoder, but I don't think that's necessary in this case, so I'm going to leave it as is, and if it doesn't work I'll fix it. Then we say decoder_output, and all we have to do here is keras.layers.Reshape; what shape do we need? The exact shape of the input: (28, 28, 1), applied to decoder_input. That is our decoder: all it needed to do was decode out to 784 values and reshape them to 28 by 28 by 1, so it's now an exact match of the input.

Now we specify an optimizer: keras.optimizers.Adam (we don't even need the tf. prefix, since we're importing keras already) with a learning rate of 0.001. For this problem we don't really need a decay, but we'll throw one in anyway. Then we specify our autoencoder model; like the encoder model, it contains everything, in this case from encoder_input to decoder_output, and we'll call it autoencoder. Finally, autoencoder.summary() to make sure we haven't made any mistakes. (One quick fix: I had passed the encoder model where I needed encoder_out, the encoder output tensor.)

As you can see from the summary, it does indeed end on the exact same shape; our Reshape would fail if we used anything other than 784 before it. After all, what is a dense layer? It produces literally just a vector of values, 784 of them in this case, and all we're doing is taking whatever those 784 neurons output, reshaping it to 28 by 28 by 1, and then, boom, we start calculating loss and optimizing, all that fun stuff.

Okay, I think we're good to start training. I roughly know how many epochs are required; in the text-based version of the tutorial I did a little loop, only because I didn't know how many it would take, but here we can just do autoencoder.fit, and in this case it's (x_train, x_train). This is your x and this is your y: with an autoencoder you're literally mapping x to x, features to features. The whole point is to hopefully go from 784 features down to 64, and then we can prove that it works by decoding from those 64 features, which are just a vector of 64 values, decompressing back to 784, and reshaping.

If this works, what does it tell us? It tells us we've trained an encoder that takes an image of 784 values and condenses it down to 64 values that still contain the same meaning. Then, before we pass data through some different neural network, if our training data are these MNIST images, rather than feeding in the 784-value samples we could feed in the encoded 64-value samples, making the problem easier for that model to learn. We can also change data types, and more on that later.
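Assembling the decoder and the full model (the exact decay value thrown in isn't shown on screen, so it's left out of this sketch):

```python
# Reshape the 784 values back to the input shape (28, 28, 1)
decoder_output = keras.layers.Reshape((28, 28, 1))(decoder_input)

opt = keras.optimizers.Adam(learning_rate=0.001)

# The full autoencoder: encoder input straight through to decoder output
autoencoder = keras.Model(encoder_input, decoder_output, name="autoencoder")
autoencoder.summary()
```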
But first, let's make sure this works. We fit x to x, x_train to x_train, with epochs=3, batch_size=32 (probably good enough), and validation_split=0.1. Let's see if that works... no. Oh, we didn't compile the model, dummy; let's get that done. autoencoder.compile with the optimizer we defined (I got so busy with the summary, but we did define it) and loss="mse", mean squared error. One more time, and the model is training.

While we let that go for one more epoch: we trained this model on the training data, x_train to x_train, so now we'll start referencing x_test. Training is done, so we have the trained autoencoder, which also means the encoder up above is trained. So we can say example = encoder.predict(...), and, again (I say "again" because I've had to say this in every tutorial), predict takes a list of samples even if you only have one, and it outputs a list of predictions even if there's only one, so we end with [0] and pass in x_test[0]. Moment of truth... we also have to reshape the input to (-1, 28, 28, 1): an unknown batch size, which here happens to be 1, by 28 by 28 by 1. Try again: cool.

So now we have the encoded version, and if we check example.shape, it's 64 values. We went from 784 down to 64, which is about eight percent of the original size. We can also visualize this to some extent with plt.imshow; again, this is not meant to be an image, so take it with a grain of salt, but since it's 64 values we can reshape it to 8 by 8 and show it with cmap="gray". This is our condensed version, and as you can see, we still have a lot of what I'd suggest are useless pixels, which is kind of interesting.

Let's also plt.imshow the original, x_test[0], with cmap="gray". I'm actually kind of worried this won't translate; we're going to find out, and the way we find out is by running the full autoencoder: ae_out = autoencoder.predict(...), really the same thing we did above, and then plt.imshow(ae_out) without the reshape... and no, that did not work. Fascinating. I almost wonder if we do need that extra layer; I'm actually kind of interested in whether that is truly necessary.
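For reference, here is the compile, fit, and encode sequence from this step; as the debugging below shows, the failed reconstruction turns out to be a data problem, not an architecture problem:

```python
autoencoder.compile(opt, loss="mse")

# x maps to x: the input is also the target
autoencoder.fit(x_train, x_train, epochs=3, batch_size=32, validation_split=0.10)

# predict() takes and returns batches, hence the reshape and the [0]
example = encoder.predict(x_test[0].reshape(-1, 28, 28, 1))[0]
print(example.shape)  # (64,): about 8% of the original 784 values

plt.imshow(example.reshape(8, 8), cmap="gray")  # just 64 values drawn as a grid
plt.show()
```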
I really did not think we were going to need that extra layer. Let's see: we went straight from 784 to the Reshape, but the way I initially did this had one more layer, so the decoder started with a 64-unit dense layer, which then went to 784, and that became the decoder input. Let me make sure that's exactly what I had; otherwise this should be the same... I really didn't think that extra 64-unit layer would do anything, so now I'm concerned there's some other issue. There's something else going on here; what is it? Flatten, dense, relu... what else could this be?

Part of me wonders if we ran that divide-by-255 twice. I don't work in notebooks often enough; I think I might have run the division two times, and that might have caused it. Yeah, I think that's what it was. In case you didn't follow, my guess is that for the training data we divided by 255, then the in-place x_test division errored out, and when I re-ran things I divided by 255 again. I'm going to wager that's exactly what happened; I'm not sure, but I really want to find out, because it would make sense, and if the double division is the culprit, it would also explain why so many of those encoded pixels looked dead. So let's run this one more time and do all of this again; sorry to waste everybody's time, but I'm super curious... yeah. Interesting. Okay.

So let me unscrew this. Where was the div? That's what I get for never working in notebooks; it made sense to use one for this tutorial, and it bit me in the butt. I'm going to restart the kernel and run all cells, and hopefully we don't run into that again. Actually, first, I want the extra 64-unit decoder layer commented out and the decoder input back to the single 784 dense layer, because I want to see whether it really works to go straight to 784; one dense layer should be more than enough to learn this data. Restart kernel, run all cells... and it looks like that worked. Cool.

So this was the original input data, and this is the output. We can test a couple of other samples: x_test[1] is a two, and the reconstruction is a two. And we can run the example code up above (variables would help a ton; we need to streamline this) to see our condensed two. So you can see this definitely worked. I think we'll make our change first, and then I'll clean up some of this code.
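After the clean rerun, the round trip looks something like this (a sketch; the index follows the video's example of a two):

```python
# Compare an original digit with its reconstruction
ae_out = autoencoder.predict(x_test[1].reshape(-1, 28, 28, 1))[0]

plt.imshow(x_test[1], cmap="gray")               # the original "2"
plt.show()
plt.imshow(ae_out.reshape(28, 28), cmap="gray")  # the reconstructed "2"
plt.show()
```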
But what's important to recognize, again, is that we went from 784 features down to 64, and then decoded back up to 784 to essentially reproduce the number that was the input. It's a little dimmer; it's not a perfect replication, and that's why this wouldn't be a great method for, again, something like video compression on a website; YouTube is probably not using autoencoders. The point is condensing features down, and it's doing so in an unsupervised machine learning way: as you can already see, this reconstructed two and the original two are slightly different; they're not exactly the same.

I think to make this even clearer, we can take the bottleneck down quite a bit. We've got 64 here; could we condense that down to 32? I'll go ahead and do that, and condense the code while I'm at it; we'll keep the example printout, including example.shape. So we're bottlenecking to 32 values: 32 out of 784, basically four percent. Rather than reshaping to 8 by 8, 32 values give us, I guess, an 8 by 4. Again, these image representations are just so we can see the values; they're not actually meant to be an image, because this output from our example is the output of the encoder, and what is that? Just a dense layer's output, a vector of values, period. We just happen to reshape it and make a pretty image out of it.

And it worked again. We have a couple of dead pixels, but for the most part everything is working pretty well. We can also try x_test[3], for example; rerunning these cells, it should be a zero, and there we go: we have a zero again, and it keeps a lot of the original's features. There's a little extra bit up at the top of this zero, and the output has that too; it's still missing some value here, and it is slightly dimmer, but it's very clearly a zero.

And we can keep going: how tiny can we get this? Let's do nine, so essentially a 3 by 3, and we'll retrain and rerun essentially everything here (I won't reiterate through a bunch of samples unless we see anything super interesting; you should get the idea, and, whoops, this reshape should be a 3 by 3). So, does this information contain a zero? It does. It's not the best zero, not the most beautiful zero we've ever seen, but it is a zero, and to me it's super cool that we condensed 784 values into a vector of nine values and it did a pretty good job.
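Shrinking the bottleneck is just changing that one Dense layer. A condensed rebuild for the 9-value case might look like this sketch (the 32-value version is identical apart from the layer size and an 8-by-4 reshape):

```python
# Same model, tighter bottleneck: 9 values (about 1% of 784)
encoder_input = keras.Input(shape=(28, 28, 1), name="img")
x = keras.layers.Flatten()(encoder_input)
encoder_out = keras.layers.Dense(9, activation="relu")(x)
encoder = keras.Model(encoder_input, encoder_out, name="encoder")

decoder_input = keras.layers.Dense(784, activation="relu")(encoder_out)
decoder_output = keras.layers.Reshape((28, 28, 1))(decoder_input)

autoencoder = keras.Model(encoder_input, decoder_output, name="autoencoder")
autoencoder.compile(keras.optimizers.Adam(learning_rate=0.001), loss="mse")
autoencoder.fit(x_train, x_train, epochs=3, batch_size=32, validation_split=0.10)

example = encoder.predict(x_test[3].reshape(-1, 28, 28, 1))[0]
plt.imshow(example.reshape(3, 3), cmap="gray")  # the 9 encoded values as a grid
plt.show()
```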
It's still very clearly a zero. Let's go back to index 0; was it a five? No, I guess that was a seven, so let's show that one real quick and make sure. Okay, these are both clearly sevens, but the output is not the same seven. What's happened here is that the neural network has essentially learned: okay, this data is a seven, so encode "seven." It's not the exact same seven, because again we've come down to literally only nine values, which I just find fascinating. I know MNIST is easy, but if anything this shows you how easy a problem MNIST is to solve: we condensed down to nine values and were still able to produce this. That's insane.

So that is autoencoders. I'll show one more interesting aspect, but first: imagine you're the autoencoder; how might you condense this information? Really quickly, with the MNIST dataset, one thing you could do is chop away all the corners: this corner, this corner, this one, and this one are almost always dead information, so you could immediately chop the data down fairly considerably. But I don't know about you, there's no way I could condense it down to nine values; I just think that's crazy. And again, we can visualize those values with print(example); those are the nine values. That's super cool.

Okay, so we've made it this far. The next thing I want to show, and I don't think there's really any benefit in me coding it live, is an example of noise reduction. I put it in the text-based version of the tutorial; it's a very poorly written function called add_noise. First, I'm going to re-expand the bottleneck, since condensing to 9 values is a little silly: we're pulling back to 64 here, and I'll retrain that model (I'll leave the 9-value results in place, just because they're interesting). Then the function: it literally iterates through every single pixel and, using random.choice (so let me import random as well), gives each pixel a 5-out-of-100 chance of being randomly changed. It just adds noise to an image.

So we say noisy = add_noise(x_test[0]), and plt.imshow the noisy image with cmap="gray"; now we have a noisy seven. And what happens if I copy down a quick ae_out, and rather than x_test[0] pass in noisy, then plt.imshow(ae_out, cmap="gray")? What you see is a pretty significantly noise-reduced variant: this is the noisy seven, and this is the seven that came out of our autoencoder.
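The actual add_noise function lives in the text-based tutorial; this is a hypothetical reconstruction from the description above (a per-pixel loop with roughly a 5 percent chance of replacing each pixel), run against the retrained 64-value model:

```python
import random

def add_noise(img, noise_level=0.05):
    # Hypothetical reconstruction of the tutorial's noise function: walk every
    # pixel and, with a 5-in-100 chance, replace it with a random value.
    noisy = img.copy()
    rows, cols = noisy.shape
    for i in range(rows):
        for j in range(cols):
            if random.choice(range(100)) < int(noise_level * 100):
                noisy[i][j] = random.uniform(0, 1)
    return noisy

noisy = add_noise(x_test[0])
plt.imshow(noisy, cmap="gray")  # the noisy seven
plt.show()

# Run the noisy digit through the autoencoder (trained only on clean digits)
ae_out = autoencoder.predict(noisy.reshape(-1, 28, 28, 1))[0]
plt.imshow(ae_out.reshape(28, 28), cmap="gray")  # noticeably denoised
plt.show()
```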
So what didn't we do? We did not train an autoencoder to take noisy examples of photos and clean them up. You could train a model like that, where your input is a bunch of add_noise examples and your target is the clean dataset, so it explicitly learns to remove noise, but that's not what we did. We trained an autoencoder to just take images and encode them into the meaning of the image, and in our case, again because it's unsupervised machine learning, it has simply figured out that this group of images goes together, and here's the generic version of that group. We can even see that here: this output seven doesn't quite match the original seven, but it did a pretty good job removing the noise, because the autoencoder has learned that the meaning of this input data is somehow "this number." Again, it's unsupervised; we don't know exactly how it has done this.

Hopefully by this point you've seen just how much autoencoders can compress data: we took 784 features and compressed them down to nine, and nine out of 784 is basically one percent, which is absolutely nuts. I'm sure quite a few digits condensed down to that 3-by-3 don't come out legible, probably quite a few; I wouldn't be surprised if something like 20 percent of the dataset were illegible at that size, and you wouldn't really condense to one percent in practice, but the idea is just to show that you can use autoencoders to compress data down.
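A hedged sketch of that explicit denoiser idea, which the video describes but doesn't train, using a vectorized stand-in for add_noise:

```python
# Corrupt ~5% of pixels with random values across the whole training set
mask = np.random.rand(*x_train.shape) < 0.05
x_train_noisy = np.where(mask, np.random.rand(*x_train.shape), x_train)

# Train noisy -> clean instead of clean -> clean
autoencoder.fit(x_train_noisy, x_train, epochs=3, batch_size=32, validation_split=0.10)
```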
Now, the last thing I want to show with autoencoders is that besides noise reduction and compression, you can also change the overall format of the data. (The notes on my screen are for something else; pay no attention to them.) I wasn't sure I'd run this one, and I may put it in the text-based version of the tutorial if you want to play with it, but essentially it takes photos of cats and dogs, which are color photos; I'm reshaping all photos to be 64 by 64, doing the divide-by-255 while keeping the three RGB channels, and running them through convolutional filters with max pooling. Actually, I kind of want to run it, so let me invoke it with python3 (the way I call Python differs between this Windows machine and my Linux machines, and I always forget which name and version to use). It trains for 15 epochs; I don't think I'll finish the training for you, but I'll at least let it start, because what I really wanted to show is the model summary.

This is what I was talking about before. We're taking in 64 by 64 by 3, and if we do the math on that, 64 times 64 times 3 is 12,288; if you were to just flatten a 64 by 64 RGB image, it would be 12,288 values. Now, why would you ever want to flatten it? For example, I was tinkering with feeding a sequence of images through a transformer model, and the transformer expects an encoded input, a vector of values, not a three-channel image. So I was curious whether we could condense an image down to a vector small enough to pass a sequence of images through a particular transformer model, and that's what's going on here.

Maybe by the time I'm done yakking we'll actually see some examples, barring some sort of stupid error. You have your input, which comes in as 64 by 64 by 3, and we do some convolutions to it. Pay attention each time you do a convolution: in this example it's 64 filters with a 3 by 3 window, and you can see what it winds up generating in terms of values. Each of these tensors, if you were to flatten it at any point, would be quite large: the input is 64 by 64 by 3, and if you flattened after the first convolution it would be 64 by 64 by 64; gigantic. But then we do the pooling, and because the pooling layer is a 2 by 2, we're down to 32 by 32 by 64. Still giant. Then another convolution and another pool; I thought the summary was only showing two poolings, but my eyes were deceiving me, we do three. So we pool down until the flatten gives 8,192 values, and we're not done yet: then I throw in a dense layer of 512, just to do it, and that's the end of the encoder. In the summary you can see the flatten, and then the encoder output is 512.
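Here is a sketch of a convolutional autoencoder matching the shapes read off the summary; the filter counts, the "same" padding, and the sigmoid output layer are my assumptions, chosen so the numbers line up (64 by 64 by 64 after the first convolution, 8,192 at the flatten, 512 at the bottleneck). The decoder half mirrors the encoder with UpSampling2D, which is discussed next:

```python
# Encoder: three conv + pool stages, then flatten and a 512-value bottleneck
encoder_input = keras.Input(shape=(64, 64, 3), name="img")
x = keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same")(encoder_input)
x = keras.layers.MaxPooling2D((2, 2))(x)   # 64x64 -> 32x32
x = keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same")(x)
x = keras.layers.MaxPooling2D((2, 2))(x)   # 32x32 -> 16x16
x = keras.layers.Conv2D(128, (3, 3), activation="relu", padding="same")(x)
x = keras.layers.MaxPooling2D((2, 2))(x)   # 16x16 -> 8x8
x = keras.layers.Flatten()(x)              # 8 * 8 * 128 = 8192
encoder_out = keras.layers.Dense(512, activation="relu")(x)  # the bottleneck

# Decoder: mirror image, upsampling to the same degree we pooled
x = keras.layers.Dense(8 * 8 * 128, activation="relu")(encoder_out)
x = keras.layers.Reshape((8, 8, 128))(x)
x = keras.layers.Conv2D(128, (3, 3), activation="relu", padding="same")(x)
x = keras.layers.UpSampling2D((2, 2))(x)   # 8x8 -> 16x16
x = keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same")(x)
x = keras.layers.UpSampling2D((2, 2))(x)   # 16x16 -> 32x32
x = keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same")(x)
x = keras.layers.UpSampling2D((2, 2))(x)   # 32x32 -> 64x64
decoder_output = keras.layers.Conv2D(3, (3, 3), activation="sigmoid", padding="same")(x)

autoencoder = keras.Model(encoder_input, decoder_output, name="autoencoder")
autoencoder.compile(keras.optimizers.Adam(learning_rate=0.001), loss="mse")
autoencoder.summary()
```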
So this 512 is officially the bottleneck. The question was: could we condense down to a vector of 512 values rather than a vector of 12,288, while getting the benefit of convolutional layers? And again, you could always simplify this; you may not need RGB images, and grayscaling them first divides your data by three right out of the gate. So take note. Anyway, you run convolutional filters down, down, down, along with your pooling, and then you do the exact opposite on the way back: you've got your convolutions, but paired with upsampling, and you upsample to the same degree that you pooled, literally using a layer that is an upsampling layer. The only thing you have to be careful about is that there's no padding going on in your pooling; if there were, there are probably ways you could work around it, but it becomes very challenging to upsample all the way back to the exact input shape you started with. So generally you want to work with a specific input shape that you can both pool down and upsample right back up.

Anyway, did it throw an error? It did: an internal error, "failed to get convolution algorithm," during initialization. Why would that happen? Let's run nvidia-smi; I'm actually not sure. We've got the two GPUs, and isn't this one running on GPU 1? Yes. That one's memory is full, but that's because we're training over here, so I'm really not sure why I hit that error. I'm not going to waste all this time chasing it; the output is literally just an RGB picture of a cat, and it would have been fun to show you, but I can post the code, and I'll put it in the text-based version of the tutorial if you want to tinker with it. It's the Cats and Dogs dataset from Microsoft.

Quite a long tutorial, but there's just so much you can tinker with here that we've really barely scratched the surface. We mostly showed the encoder side of things and didn't show too much decoder-wise, and while in many cases you're focused primarily on the encoder, there will be plenty of times you'd focus on the decoder as well. The idea is always pretty much the same, though: get a model to focus very specifically on a challenge you have, and give it only one job to do, rather than the many jobs a neural network generically has at the beginning of training on a dataset. Also, I just think it's freaking awesome that, even if it is only the MNIST dataset, we can condense images down to only nine values and still reproduce the input data. I think that's really cool.

So anyway, I'm going to stop here; we're at like 50 minutes, which is quite a long tutorial. If you have questions, comments, concerns, whatever, feel free to leave them below. Also, if you like getting into the more nitty-gritty with neural networks and tinkering, I suggest you check out the Neural Networks from Scratch book.
I don't know if you've ever heard of it, but it's a book by Daniel and myself; you can check it out at nnfs.io. Otherwise, I will see you guys in another video.
Info
Channel: sentdex
Views: 43,433
Id: JoR5HCs0n0s
Length: 49min 39sec (2979 seconds)
Published: Mon Mar 01 2021