Autoencoders Tutorial | Autoencoders In Deep Learning | Tensorflow Training | Edureka

Video Statistics and Information

Captions
Hi everyone, this is Shine Tina from Edureka. Welcome to today's tutorial session on autoencoders. In this session you will get a deeper insight into the working of autoencoders, which will help you build strong fundamental knowledge about the topic. So let's move forward and look at the agenda of today's session.

We will begin with the fundamental question of why we need autoencoders. Next we will start with an introduction to what autoencoders are and their various components. Then we will move on to the various properties of autoencoders. Next we will see how the training is done and discuss the architecture of autoencoders. Moving on, we will have a detailed discussion about the various types of autoencoders, such as convolutional autoencoders, sparse autoencoders, deep autoencoders, and contractive autoencoders, and we shall also see how they work, along with a short demo. Finally, we will have a look at the applications of autoencoders. So let's get started.

Data compression is a big topic that's used in computer vision, computer networks, and many more fields. The point of data compression is to convert our input into a smaller representation that we can recreate to a degree of quality. The smaller representation is what would be passed around, and when anyone needed the original, they would reconstruct it from the smaller representation. Autoencoders are unsupervised neural networks that use machine learning to do this compression for us. The aim of an autoencoder is to learn a compressed, distributed representation for the given data, typically for the purpose of dimensionality reduction.

Now, for that we already have principal component analysis, so why do we need autoencoders? An autoencoder can learn non-linear transformations, unlike PCA, with a non-linear activation function and multiple layers. It doesn't have to learn dense layers; it can use convolutional layers to learn instead, which could be better for video, image, and series data. It may be more efficient, in terms of model parameters, to learn several layers with an autoencoder rather than learn one huge transformation with PCA. An autoencoder also gives a representation as the output of each layer, and having multiple representations of different dimensions is always useful. An autoencoder could also let you make use of pre-trained layers from another model, applying transfer learning to prime the encoder or the decoder.

Despite the fact that the practical applications of autoencoders were pretty rare some time back, today data denoising and dimensionality reduction for data visualization are considered the two main interesting practical applications of autoencoders. With appropriate dimensionality and sparsity constraints, autoencoders can learn data projections that are more interesting than PCA or other basic techniques, and they also provide more accurate output when compared to PCA.

Autoencoders are simple learning networks that aim to transform inputs into outputs with the minimum possible error. This means that we want the output to be as close to the input as possible. An autoencoder neural network is basically an unsupervised machine learning algorithm that applies backpropagation, setting the target values to be equal to the inputs. So let's have a look at some of the key features of autoencoders. An autoencoder is an unsupervised machine learning algorithm that is similar to PCA and minimizes the same objective function; it is a neural network whose target output is its input.
Although autoencoders are quite similar to PCA, they are more flexible. Autoencoders can represent both linear and non-linear transformations in the encoding, whereas PCA can only perform linear transformations.

So now let's have a look at the components of autoencoders. There are basically three main layers: the encoder, the code, and the decoder. The first component, the encoder, is the part of the network that compresses the input into a latent-space representation. The encoder layer encodes the input image as a compressed representation in a reduced dimension; the compressed image typically looks garbled, nothing like the original image. The next component, the code, represents the latent space; it is the part of the network that represents the compressed input which is fed to the decoder. The third component is the decoder. This layer basically decodes the encoded image back to the original dimension. The decoded image is a lossy reconstruction of the original image; it is reconstructed from the latent-space representation.

Now let's talk about some of the properties of autoencoders. They are only able to compress data similar to what they have been trained on: an autoencoder which has been trained on human faces would not perform well with images of modern buildings. This differentiates autoencoders from MP3-like compression algorithms, which only hold assumptions about sound in general but not about specific types of sounds. Autoencoders are lossy, which means that the decompressed outputs will be degraded compared to the original inputs, just like what you see in JPEG or MP3. If you have appropriate training data, it is easy to train specialized instances of the algorithm that will perform well on a specific type of input, and it doesn't require any new engineering. Additionally, in almost all contexts where the term autoencoder is used, the compression and decompression functions are implemented with neural networks.

So now let's have a look at the training of an autoencoder. There are four hyperparameters that we need to set before training. The first one is the code size: the code size represents the number of nodes in the middle layer, and a smaller size results in more compression. The second parameter is the number of layers: the autoencoder can be as deep as we want it to be, and we can have two or more layers in both the encoder and the decoder, without counting the input and the output. Next is the loss function: we use either mean squared error or binary cross-entropy; if the input values are in the range 0 to 1 then we typically use cross-entropy, otherwise we use the mean squared error. Last is the number of nodes per layer: the number of nodes per layer decreases with each subsequent layer of the encoder and increases back in the decoder. Also, the decoder is symmetric to the encoder in terms of layer structure, but this is not necessary, and we have total control over these parameters.

Now let's have a look at the architecture of an autoencoder and get a deeper insight into the hidden layers. In an autoencoder we add a couple of layers in between the input and output, and the sizes of these layers are smaller than the input layer. Let's say the input vector has a dimensionality of n, which means that the output will also have a dimensionality of n. We make the input go through a layer of size p, where the value of p is less than n, and we ask it to reconstruct the input. The autoencoder receives unlabeled input, which is then encoded to reconstruct the input.
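For reference, a minimal Keras sketch of the structure just described might look like this; the specific sizes (a 784-dimensional input, 128-node hidden layers, a 32-node code) are illustrative assumptions rather than values from the video:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

code_size = 32      # hyperparameter 1: number of nodes in the middle (code) layer
input_dim = 784     # e.g. a flattened 28 x 28 image

# Encoder: compresses the input into the latent-space representation.
# Nodes per layer decrease with each subsequent encoder layer (hyperparameter 4).
encoder = models.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(code_size, activation="relu"),      # the code layer
])

# Decoder: reconstructs the input from the code, symmetric to the encoder.
decoder = models.Sequential([
    layers.Input(shape=(code_size,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(input_dim, activation="sigmoid"),   # output matches the input size
])

autoencoder = models.Sequential([encoder, decoder])
# Inputs scaled to [0, 1], so binary cross-entropy is the loss (hyperparameter 3).
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
```

Training then simply amounts to calling `autoencoder.fit(x, x, ...)`, with the same data used as both input and target.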
One important part of autoencoders is the bottleneck. The bottleneck approach is a beautifully elegant approach to representation learning, specifically for deciding which aspects of observed data are relevant information and which aspects can be thrown away. It does this by balancing two criteria: first, the compactness of the representation, measured as its compressibility, the number of bits needed to store the representation; and second, the information the representation retains about some behaviorally relevant variables. It assumes we know what the behaviorally relevant variables are and how they are related to the observed data, or at least that we have data from which to learn or approximate the joint distribution between the observed and relevant variables.

Now let's have a detailed description of the encoder. The encoder is a neural network. Its input is a data point x, its output is a hidden representation z, and it has weights and biases theta. To be concrete, let's say x is a 28 by 28 pixel photo of a handwritten number. The encoder encodes the data, which is 784-dimensional, into a latent representation space z with far fewer than 784 dimensions. This is typically referred to as a bottleneck, because the encoder must learn an efficient compression of the data into this lower-dimensional space.

Next is the decoder. The decoder is another neural network. Its input is the representation z, it outputs the parameters of the probability distribution of the data, and it has weights and biases phi. Running with the handwritten digit example, let's say the photos are black and white and represent each pixel as 0 or 1. The probability distribution of a single pixel can then be represented using a Bernoulli distribution. The decoder gets as input the latent representation of a digit z and outputs 784 Bernoulli parameters, one for each of the 784 pixels in the image. The decoder decodes the real-valued numbers in z into 784 real-valued numbers between 0 and 1. Information is lost because the data goes through the smaller dimensionality on its way back to the larger one, so how do we find out how much information is lost? The loss function helps us find that value: it measures how effectively the decoder has learned to reconstruct an input image x given its latent representation z.

The loss function of the variational autoencoder is the negative log-likelihood with a regularizer. Because there are no global representations shared by all data points, we can decompose the loss function into terms that each depend on a single data point. The first term is the reconstruction loss, or expected negative log-likelihood of the i-th data point; the expectation is taken with respect to the encoder's distribution over the representations. This term encourages the decoder to learn to reconstruct the data; if the decoder's output does not reconstruct the data well, it will incur a large cost in the loss function. The second term is a regularizer that we throw in: the Kullback-Leibler divergence between the encoder's distribution q and the prior p. This divergence measures how much information is lost when using q to represent p; it is one measure of how close q is to p. If the encoder outputs representations z that are different from those of a standard normal distribution, it will receive a penalty in the loss. This regularizer term means: keep the representation z of each digit sufficiently diverse. If we didn't include the regularizer, the encoder could learn to cheat and give each data point a representation in a different region of Euclidean space.
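As a concrete illustration of these two terms, here is a small sketch (an assumption on my part, not code shown in the video) of the variational autoencoder loss for an encoder that outputs the mean and log-variance of a diagonal Gaussian q(z|x) and a decoder that outputs 784 Bernoulli parameters:

```python
import tensorflow as tf

def vae_loss(x, x_decoded, z_mean, z_log_var):
    """x, x_decoded have shape (batch, 784); z_mean, z_log_var have shape (batch, latent_dim)."""
    eps = 1e-7
    x_decoded = tf.clip_by_value(x_decoded, eps, 1.0 - eps)
    # Reconstruction term: negative log-likelihood of each data point under the
    # 784 Bernoulli parameters produced by the decoder (binary cross-entropy).
    reconstruction = -tf.reduce_sum(
        x * tf.math.log(x_decoded) + (1.0 - x) * tf.math.log(1.0 - x_decoded),
        axis=-1)
    # Regularizer: KL divergence between the encoder's Gaussian q(z|x) and a
    # standard normal prior p(z), written in closed form.
    kl = -0.5 * tf.reduce_sum(
        1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)
    return tf.reduce_mean(reconstruction + kl)
```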
So now let's move on and have a look at the different types of autoencoders. The convolution operator allows filtering an input signal in order to extract some parts of its content. Autoencoders in their traditional formulation do not take into account the fact that a signal can be seen as a sum of other signals. Convolutional autoencoders use the convolution operator to exploit this observation: they learn to encode the input in a set of simple signals and then try to reconstruct the input from them, modifying the geometry or the reflectance of the image.

In the example architecture, the encoder consists of three convolutional layers. The number of features changes from 1, the input data, to 16 for the first convolutional layer, then from 16 to 32 for the second layer, and finally from 32 to 64 for the final convolutional layer. While moving from one convolutional layer to another, the shape undergoes an image compression. The decoder consists of three deconvolution layers arranged in sequence. For each deconvolution operation, we reduce the number of features to obtain an image that must be the same size as the original image, so in addition to reducing the number of features, deconvolution involves a shape transformation of the images.

A convolution, in the continuous case, is defined as the integral of the product of two functions after one is reversed and shifted; as a result, a convolution produces a new function, and it is a commutative operation. In the 2D discrete space, the convolution operation is defined by the corresponding summation, and in the image domain, where the signals are finite, the formula becomes the one shown on the slide, where O(i, j) is the output pixel at position (i, j), 2k + 1 is the side of the square, odd-sized convolutional filter, F is the convolutional filter, and I is the input image. This operation is done for every location of the input image that completely overlaps with the convolutional filter.

Now let's see the use cases of convolutional autoencoders. The first one is image reconstruction: convolutional autoencoders learn to remove noise from a picture or to reconstruct missing parts, so the noisy input version becomes a clean output version, and the network also fills gaps in the image. Next is image colorization: convolutional autoencoders map circles and squares from an image to the same image, but colored red and blue respectively; purple is sometimes formed because of a blend of colors where the network hesitates between circle and square. Next up are some of the more advanced applications, such as full image colorization, latent-space clustering, and generating higher-resolution images.
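A minimal Keras sketch of a convolutional autoencoder along these lines might look as follows; the kernel sizes and strides are my own assumptions, chosen so that the decoder brings a 28 x 28 input back to its original size, not values taken from the video:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Encoder: feature maps grow 1 -> 16 -> 32 -> 64 while the spatial shape shrinks.
encoder = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, 3, strides=2, padding="same", activation="relu"),  # 28 -> 14
    layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),  # 14 -> 7
    layers.Conv2D(64, 3, strides=1, padding="same", activation="relu"),  # 7 -> 7
])

# Decoder: transposed convolutions reduce the features and restore the shape.
decoder = models.Sequential([
    layers.Input(shape=(7, 7, 64)),
    layers.Conv2DTranspose(32, 3, strides=1, padding="same", activation="relu"),
    layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(1, 3, strides=2, padding="same", activation="sigmoid"),
])

conv_autoencoder = models.Sequential([encoder, decoder])
conv_autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
```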
Moving on to the next type of autoencoder: sparse autoencoders offer us an alternative method for introducing an information bottleneck without requiring a reduction in the number of nodes at our hidden layers. Rather, we construct our loss function such that we penalize activations within a layer, so for any given observation we encourage our network to learn an encoding and decoding which only relies on activating a small number of neurons. This is a different approach towards regularization, as we normally regularize the weights of a network, not the activations.

There are two main ways by which we can impose this sparsity constraint; both involve measuring the hidden-layer activations for each training batch and adding some term to the loss function in order to penalize excessive activations. The first one is L1 regularization: we can add a term to our loss function that penalizes the absolute value of the vector of activations a in layer h for observation i, scaled by a tuning parameter lambda. The second one is the KL divergence. In essence, KL divergence is a measure of the difference between two probability distributions. We can define a sparsity parameter rho, which denotes the average activation of a neuron over a collection of samples; this expectation is calculated by the equation on the slide. The KL divergence between two Bernoulli distributions can then be written out as shown, and this loss term is visualized on the slide for an ideal distribution of rho equal to 0.2, which corresponds to the minimum penalty at that point.
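For the L1 variant just described, Keras exposes an activity regularizer that adds lambda times the absolute value of the hidden activations to the loss; here is a minimal sketch, with an assumed lambda of 1e-5 and illustrative layer sizes:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

sparse_autoencoder = models.Sequential([
    layers.Input(shape=(784,)),
    # The hidden layer is not narrower than the input; sparsity of the
    # activations, enforced by the L1 penalty, provides the bottleneck instead.
    layers.Dense(1024, activation="relu",
                 activity_regularizer=regularizers.l1(1e-5)),  # lambda = 1e-5
    layers.Dense(784, activation="sigmoid"),
])
sparse_autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
```

The KL-divergence variant would instead compare the average activation of each hidden neuron against the target rho inside a custom regularizer, but the L1 penalty above is the simpler of the two approaches.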
The next one is the deep autoencoder. The deep autoencoder is basically an extension of the simple autoencoder; the only difference from its simpler counterpart is the number of hidden layers. The additional hidden layers enable the autoencoder to learn mathematically more complex underlying patterns in the data. The first layer of the deep autoencoder may learn first-order features in the raw input; the second layer may learn second-order features corresponding to patterns in the appearance of first-order features; deeper layers of the deep autoencoder tend to learn even higher-order features. To put everything together: we need additional layers to be able to handle more complex data, such as the data we use in collaborative filtering.

Let's have a look at some of the use cases of deep autoencoders. The first one is image search. Deep autoencoders are capable of compressing images into 30-number vectors, so image search becomes a matter of uploading an image, which the search engine will then compress to 30 numbers and compare that vector to all the others in its index; vectors containing similar numbers will be returned for the search query and translated into their matching images. A more general case of image compression is data compression, and deep autoencoders are useful for semantic hashing. Next up is topic modeling and information retrieval. Deep autoencoders are useful in topic modeling, or statistically modeling abstract topics that are distributed across a collection of documents. In brief, each document in a collection is converted to a bag of words, and those word counts are scaled to decimals between 0 and 1, which may be thought of as the probability of a word occurring in the document. For example, one document could be the question and others could be the answers, a match the software would make using vector-space measurements.

The next type of autoencoder is the contractive autoencoder. One would expect that for very similar inputs, the learned encoding would also be very similar. We can explicitly train our model in order for this to be the case by requiring that the derivative of the hidden-layer activations be small with respect to the input; in other words, for small changes to the input, we should still maintain a very similar encoded state. This is quite similar to a denoising autoencoder in the sense that these small perturbations to the input are essentially considered noise, and we would like our model to be robust against noise. We can accomplish this by constructing a loss term which penalizes large derivatives of our hidden-layer activations with respect to the input training examples, essentially penalizing instances where a small change in the input leads to a large change in the encoding space. In fancier mathematical terms, we can craft a regularization loss term as the squared Frobenius norm of the Jacobian matrix J of the hidden-layer activations with respect to the input observations. A Frobenius norm is essentially an L2 norm for a matrix, and the Jacobian matrix simply represents all first-order partial derivatives of a vector-valued function; for m observations and n_h hidden nodes, we can calculate the values using the equations shown on the slide.

So now that we know the various types of autoencoders, let's have a look at how one actually works. Let's see the code that provides us with a reconstructed image. This is the code for an autoencoder that generates an output similar to the input, with reduced dimensions. The packages that need to be pre-installed are NumPy, Keras, and matplotlib. First we define the size of our encoded representations. The encoded representation of the input is declared as encoded, while the lossy reconstruction of the input is declared as decoded. The input is then mapped to its encoded representation in order to create the decoder model. Next we have to prepare our input data and normalize all the values between 0 and 1; the 28 by 28 images are then flattened into vectors of size 784. Once the autoencoder has been trained on the training data, it will take certain values from the test set and encode and decode them. The visualization of the reconstructed inputs and the encoded representations is done with the help of the matplotlib library. So let's see what happens when we execute this code. We can see that the model is being trained and tested, and this will take a few minutes to complete. Once training is done, we will have an original image and a reconstructed image as our output, and our output will look like this: the first row shows the original input images and the second row shows the reconstructed images with reduced dimensions. So now we have seen how an autoencoder actually works and how it provides us with a reconstructed image.
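The transcript describes the demo but does not show the script itself; a minimal sketch along the lines described, using NumPy, Keras, and matplotlib on MNIST (the encoding size, epoch count, and batch size here are assumptions), might look like this:

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers, models, datasets

encoding_dim = 32  # size of the encoded representation

inputs = layers.Input(shape=(784,))
encoded = layers.Dense(encoding_dim, activation="relu")(inputs)   # compressed code
decoded = layers.Dense(784, activation="sigmoid")(encoded)        # lossy reconstruction

autoencoder = models.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# Prepare the data: normalize to [0, 1] and flatten 28 x 28 images to 784-vectors.
(x_train, _), (x_test, _) = datasets.mnist.load_data()
x_train = x_train.astype("float32").reshape(-1, 784) / 255.0
x_test = x_test.astype("float32").reshape(-1, 784) / 255.0

autoencoder.fit(x_train, x_train, epochs=10, batch_size=256,
                validation_data=(x_test, x_test))

# Visualize originals (top row) against their reconstructions (bottom row).
reconstructed = autoencoder.predict(x_test[:10])
fig, axes = plt.subplots(2, 10, figsize=(20, 4))
for i in range(10):
    axes[0, i].imshow(x_test[i].reshape(28, 28), cmap="gray")
    axes[1, i].imshow(reconstructed[i].reshape(28, 28), cmap="gray")
    axes[0, i].axis("off")
    axes[1, i].axis("off")
plt.show()
```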
So now let's move on to some of the applications of autoencoders. Autoencoders are used for converting black-and-white pictures into colored images: depending on what is in the picture, it is possible to tell what the color should be; for example, the leaves of trees are generally green, the sky is blue, and clouds are white, so all that needs to be done is to make a computer able to do this, and that is where autoencoders step in. The next application is feature variation: the autoencoder extracts only the required features of an image and generates the output by removing any noise or unnecessary interruption. Using autoencoders, we receive the same image as the input but with reduced dimensions, which helps in providing a similar image with a reduced pixel value. In denoising, during training the autoencoder learns to extract the important features from the input images and to ignore the image noise, because the labels have no noise; the input seen by the autoencoder is not the raw input but a stochastically corrupted version. Autoencoders are also used for removing watermarks from images, or to remove an object while filming a video or a movie.

Now we have reached the end of today's session, and I hope you have a clear idea about what autoencoders are and how they actually work. Do let us know if you have any questions in the comment section below. Till then, thank you and happy learning. I hope you have enjoyed listening to this video; please be kind enough to like it, and you can comment any of your doubts and queries and we will reply to them at the earliest. Do look out for more videos in our playlist, and subscribe to the Edureka channel to learn more. Happy learning!
Info
Channel: edureka!
Views: 63,693
Keywords: yt:cc=on, Autoencoders tutorial, autoencoders tensorflow, autoencoder tensorflow tutorial, autoencoder deep learning, autoencoder explained, autoencoder deep learning tutorial, autoencoder python code, autoencoder machine learning, what is autoencoder, autoencoders neural networks, autoencoder anomaly detection, autoencoder applications, autoencoder architecture, autoencoder backpropagation, autoencoder clustering, autoencoder classification, Tensorflow training edureka, edureka
Id: nTt_ajul8NY
Length: 23min 57sec (1437 seconds)
Published: Tue Oct 09 2018