Autoencoders - EXPLAINED

Video Statistics and Information

Captions
Say I have an image and I want to transfer it over a network. Raw image data isn't very compact and can take a long time to traverse a network, and in today's world of heavy image processing and video streaming, such low transfer rates aren't going to cut it. So how do we overcome this network bandwidth bottleneck? Well, at the source we can first compress the image, send this compressed version across the network, and then reconstruct the image at the destination. That sounds like a plausible solution.

Consider another scenario: self-driving cars have been the big talk for a while. How do they work, though? Just think big picture. For them to function, they need to keep themselves on the road, follow traffic rules, and not run over pedestrians. To accomplish this, they need to detect and delineate the objects in their immediate field of vision, a process called semantic segmentation. These two applications are crucial to our world in 2019, and a fundamental way we can implement them is through an architecture called autoencoders. In this video we're going to understand what they are, how they work, and how exactly they can be used in these cool applications. This is CodeEmporium, so let's get started.

Data around us, like images and documents, is very high-dimensional, and autoencoders can learn a simpler representation of it. They are a class of unsupervised neural networks; usually you wouldn't associate neural nets with being unsupervised, but today you will. The architecture consists of three parts: an encoder, a bottleneck, and a decoder. At the simplest level, we can have a single layer of fully connected neurons for each part. The output represents the reconstructed input, hence it has the same dimensions. The objective is to learn a representation that minimizes the reconstruction loss, and learning these weights can be done with techniques like recirculation or backpropagation.

However, there is a problem: a trivial way to get zero loss is to simply copy the input, but this means that the latent layer doesn't really learn anything, which is useless. One way to work around this is to constrain the properties of the hidden layer, those properties being its size and its activation. Consider size: if the number of neurons in h is less than that of the input layer, the autoencoder is said to be undercomplete. This is useful because the network has to capture the latent representation with only a small set of neurons, so we're sure that it's going to learn something. The second property is activation: if we don't use an activation function in the decoder and the loss function is the mean squared error, then the result is similar to PCA (principal component analysis), and the latent representation of k neurons will span the top k principal components. Adding a nonlinear activation in the encoder and decoder parts lets us perform a nonlinear version of PCA. But then we come back to the same problem: if h is unconstrained, it will simply copy the input to the output, since that leads to zero loss. We need to make sure that h is restrained to an extent, because we aren't really concerned with the output of the decoder so much as the latent representation that the autoencoder learns.

Already we can see a problem with this vanilla autoencoder: the latent feature space needs to be small and the encoder and decoder parts need to be shallow, otherwise the resulting latent representation becomes useless because it simply copies the image. This means more complex data cannot be modelled accurately, and we may end up with underfitted models.
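To make the encoder-bottleneck-decoder picture concrete, here is a minimal sketch of such a fully connected autoencoder trained with a mean-squared reconstruction loss. It is an illustration in PyTorch, not code from the video; the input dimension, layer sizes, and optimizer settings are assumptions.

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: compress the input down to a small latent code (the bottleneck).
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim), nn.ReLU(),
        )
        # Decoder: reconstruct the input from the latent code.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),  # outputs in [0, 1], like pixel values
        )

    def forward(self, x):
        h = self.encoder(x)      # latent representation h
        return self.decoder(h)   # reconstruction with the same shape as x

model = Autoencoder()
criterion = nn.MSELoss()                                   # reconstruction loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(64, 784)        # dummy batch of flattened 28x28 images
recon = model(x)
loss = criterion(recon, x)     # objective: minimize the reconstruction error
optimizer.zero_grad()
loss.backward()
optimizer.step()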
Ideally, we should be able to have a latent representation that is at least the size of the input, and also introduce nonlinear activations in the encoder and decoder parts; this would allow more complex models to be fitted. We can do this with regularized autoencoders. They follow the same principles as vanilla autoencoders but have a modified loss function. When you hear regularization you're probably thinking of L1 and L2 regularization, where the coefficients are penalized to mitigate overfitting; we have a similar intuition here.

For sparse autoencoders, the overall loss contains an additional sparsity penalty term on the code layer h. When thinking of sparsity, think of most neurons being turned off. You can relate this to the concept of dropout in neural networks, where neurons are randomly turned off; this forces the network to learn representations that generalize better beyond the training data. Let's say the hidden layer has a sigmoid activation, so the output values range from 0 to 1. We say a neuron is off if its output is close to 0 and on if it is close to 1. We impose a sparsity constraint by requiring the average activation of each hidden neuron to be some very low value; for the j-th neuron, call this average activation ρ̂_j, and call the target value ρ the sparsity parameter. The notation may look a bit confusing. Basically, if we have n samples, each neuron in the hidden layer is activated to some extent on each of them, and we want the average to be some low value. I write the activation as a function of the training example as well, to show that the activation a_j of the j-th neuron depends on the sample x_n; note that it's just a function, not a multiplication with the input vector. We want ρ̂_j to be a very low value like 0.05, but we cannot just set it to 0.05, since it's an aggregate, not a free parameter. So we set ρ to 0.05 and penalize every neuron whose ρ̂_j deviates from this value.

We can model two distributions, P and Q, with probabilities of success ρ and ρ̂_j respectively. The idea is to make the predicted distribution as close as possible to the target one, and we can measure this with the KL divergence. This is just one loss function that could work; remember we are considering a sigmoid activation here. The penalty is the KL divergence between two Bernoulli distributions with probabilities of success ρ and ρ̂_j. By the way, KL divergence is a measure of the difference between two distributions, and ideally we want to minimize it. The graph shown on screen plots this KL divergence for a fixed ρ of 0.2 with a sigmoid activation; it is clearly at its minimum when ρ̂_j equals ρ. We write the overall cost function as the cost incurred by the vanilla autoencoder plus a weighted sum of the KL divergence terms, one for each hidden neuron.

So how exactly does sparsity help? We can look at what the network learns, visually. Say we have an autoencoder whose input is a set of 10x10 images, so that's a 100-dimensional input, and say we use a hidden layer of size 100 as well. After training, we can visualize the learned components, as shown here. Consider the top-left patch, which was learned by the first neuron: this neuron learned to detect diagonal edges. The other neurons likewise learned to detect edges of different orientations, so together they learn a very effective representation.
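As a concrete illustration of that penalty, here is a short sketch of the Bernoulli KL-divergence sparsity term for a sigmoid hidden layer. It is my own PyTorch sketch, not code from the video; the target ρ = 0.05 follows the transcript, while the penalty weight beta and the dummy activations are assumptions.

import torch

def kl_sparsity_penalty(hidden_activations, rho=0.05, eps=1e-8):
    # hidden_activations: (batch_size, num_hidden) sigmoid outputs in (0, 1).
    rho_hat = hidden_activations.mean(dim=0)   # average activation of each hidden neuron
    # KL divergence between Bernoulli(rho) and Bernoulli(rho_hat_j), summed over neurons.
    kl = rho * torch.log(rho / (rho_hat + eps)) \
        + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat + eps))
    return kl.sum()

# Usage: the penalty, weighted by some beta, is added to the reconstruction loss.
h = torch.sigmoid(torch.randn(64, 100))        # dummy hidden-layer outputs
beta = 0.1                                     # penalty weight (illustrative)
sparsity_loss = beta * kl_sparsity_penalty(h, rho=0.05)
# total_loss = reconstruction_loss + sparsity_loss
print(sparsity_loss.item())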
Another type of regularized autoencoder is the denoising autoencoder. Autoencoders learn a latent representation from the input in an unsupervised way; denoising autoencoders learn such a representation from a corrupted input, which allows for better generalization. You take the input x and add some noise to it, and the resulting corrupted input is x̃ (or x'). The hidden layer can no longer simply copy the input to the output, so it is forced to learn some latent representation (a small sketch of this corrupt-and-reconstruct step follows the transcript below).

For the most part until now, we've only been talking about autoencoders with a single hidden layer for the code, a single input layer for the encoder, and a single output layer for the decoder. Such a shallow network can actually be quite useful; you can attribute this to the universal approximation theorem in neural network theory, which says that a neural network with a single hidden layer and a finite number of neurons can approximate any function, under some assumptions about the activation. However, there is nothing stopping us from building deeper architectures, which allows faster training, as we don't need as much training data for good generalization.

I hope you can now see how autoencoders can be used for data transfer across a network, the example application we talked about at the beginning of the video. They can also be used in semantic segmentation: by swapping out the fully connected layers for convolution layers, the autoencoder can take image inputs while preserving the spatial structure, but instead of the original image we aim to output a semantically segmented counterpart that outlines the contours of interest. For example, this tech can be used in self-driving cars to segment different objects of interest. Xander from Arxiv Insights does a pretty good job of explaining this in his variational autoencoder video, so check that out. Another application is neural inpainting: instead of adding noise, you remove rectangular sections from an image and try to reconstruct the original, which lets you do things like remove watermarks from images. And I can keep going: in semantic hashing, an autoencoder takes documents as input and outputs a reduced 32-bit address; documents that map to nearby addresses are considered similar, and the distance between two documents is given by the Hamming distance, that is, the number of positions in the address where they differ. The list of applications just goes on. Ever since the introduction of generative adversarial nets in 2014, generative models have been investigated in a new light; there are so many interesting applications coming out, and it's always fun to read about and play around with them. That's all I've got for you now, so be sure to like, subscribe, hit that bell, share, watch my videos until the end, and I'll be dishing out new content soon, so stick around.
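For reference, here is a minimal sketch of the corrupt-and-reconstruct step described in the transcript for denoising autoencoders. It is an illustration in PyTorch, not code from the video; the Gaussian noise, the tiny stand-in model, and the optimizer settings are assumptions.

import torch
import torch.nn as nn

# Tiny stand-in autoencoder, just for illustration.
model = nn.Sequential(
    nn.Linear(784, 32), nn.ReLU(),       # encoder to a 32-dimensional code
    nn.Linear(32, 784), nn.Sigmoid(),    # decoder back to pixel space
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def denoising_step(x_clean, noise_std=0.3):
    # Corrupt the input: x' = x + noise, clipped back to the valid pixel range.
    x_noisy = (x_clean + noise_std * torch.randn_like(x_clean)).clamp(0.0, 1.0)
    recon = model(x_noisy)                         # reconstruct from the corrupted x'
    loss = nn.functional.mse_loss(recon, x_clean)  # compare against the CLEAN input
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

x = torch.rand(64, 784)                            # dummy batch of flattened images
print(denoising_step(x))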
Info
Channel: CodeEmporium
Views: 26,754
Keywords: Machine Learning, Deep Learning, Data Science, Artificial Intelligence, Neural Network, autoencoders, types of autoencoders, neural inpainting, generative adversarial networks, convolution neural networks, deep learning applications, applications of autoencoders, document clustering with neural networks
Id: 7mRfwaGGAPg
Length: 10min 53sec (653 seconds)
Published: Sat Nov 17 2018