Standardization Vs Normalization- Feature Scaling

Captions
Hello, my name is Krish and welcome to my YouTube channel. In this video we'll be discussing the basic difference between normalization and standardization. I hope you have heard of this topic; it is a very important topic in feature scaling, which is an integral part of feature engineering. So we should try to understand why we should use feature scaling, what the basic difference between normalization and standardization is, when to use each of them, and for which algorithms feature scaling is suitable, because it is not necessary to apply feature scaling for each and every algorithm. I'll also show you the coding part: how we can perform normalization and standardization with the help of a small dataset that I downloaded from Kaggle. So make sure you watch this video till the end.

Now, suppose you have a use case. The most important thing for a use case is data, so initially you will be collecting the data, and once you collect it you will have a lot of features: independent features and a dependent feature. With the help of the independent features you try to predict the dependent feature in supervised machine learning. Now, these features have two important properties: one is the unit, the other is the magnitude. Suppose I have collected some data with respect to a person, like age, weight and height. If I consider the feature age, the unit that is used to calculate age is the number of years since the date of birth.
The magnitude is the value: if I say the person's age is 25 years, then 25 is the magnitude and years is the unit. This is the basic idea, and it holds for each and every feature: it will be expressed with the help of a unit and a magnitude. Now, the main thing to understand is that if you have many features, they will definitely be computed with different units and magnitudes; they need not always be the same. If I take the example of the height feature, it may be measured in feet or it may be measured in inches. So the unit and magnitude will always vary between features, and that is why it is very, very necessary for a machine learning algorithm that we scale the data down to some common scale. What kind of scale? I'll discuss that in just a while.

The two most common techniques used are normalization and standardization. A simple definition of normalization: normalization helps you scale down your feature values between 0 and 1. I will show you the formula when I get to the practical application, but in simple terms, normalization scales your feature between 0 and 1. Now what about standardization? Standardization will help you scale down your feature based on the standard normal distribution, where the mean is 0 and the standard deviation is 1. So this is the basic difference between standardization and normalization.
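The two definitions above can be sketched directly in plain Python. This is only a minimal illustration with made-up age values, not the scikit-learn API used later in the video:

```python
# Plain-Python sketches of the two scaling formulas described above.
# The sample values are made up for illustration.

def min_max_scale(values):
    """Normalization: (x - min) / (max - min); results lie in [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Standardization: (x - mean) / std; results have mean 0, std 1."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

ages = [25, 30, 35, 40, 45]
print(min_max_scale(ages))  # smallest age maps to 0.0, largest to 1.0
print(standardize(ages))    # values are now centred on 0
```

Note that the min-max result always pins the smallest value to 0 and the largest to 1, while the standardized result is centred on 0 but is not bounded.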
As for which one to use when, I will discuss that once I show you the practical application; then you will be able to understand it more clearly. So let us go ahead and see a practical application. I will show you the formula for normalization, which is also called min-max scaling, show you how we can use the scikit-learn library to perform it with Python code, and also discuss standardization, whose scikit-learn class is called StandardScaler.

Now, if I talk about normalization, the main definition is that we need to scale down the values of the feature between 0 and 1, and this is the formula: X_scaled = (X − X_min) / (X_max − X_min). This is the min-max scaling formula, which actually scales your values down between 0 and 1. Here I am taking an example where I have written pd.read_csv, which is a pandas function, to read the contents of a CSV file. All of this code will be given in the GitHub link I provide in the description box, so you can download it from there. What I am doing is taking the first three columns from this CSV file, which is the wine dataset, and renaming those columns as Class, Alcohol and Malic. These are properties present inside that dataset: wine is prepared from a mixture of various chemicals, and those show up as its features.
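The loading step can be sketched like this. The in-memory CSV text below is only a stand-in for the wine file read in the video (the actual file lives in the notebook linked from the description), with a few illustrative rows:

```python
import io
import pandas as pd

# Stand-in for the wine CSV read in the video: no header row, and the
# first three columns are the class label, alcohol, and malic acid.
csv_text = """1,14.23,1.71
1,13.20,1.78
2,12.37,0.94
3,13.17,2.59
"""
df = pd.read_csv(io.StringIO(csv_text), header=None, usecols=[0, 1, 2])
df.columns = ["Class", "Alcohol", "Malic"]
print(df.head())  # top records with the three renamed columns
```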
Once I call df.head(), the output looks like this: the top five records with Class, Alcohol and Malic. Now, to show you how to perform min-max scaling, which is also called normalization (there are varieties of normalization, but many people prefer min-max scaling in a number of use cases, and I'll discuss when you should use it), I have written from sklearn.preprocessing import MinMaxScaler. I create a MinMaxScaler object, stored in a variable called scaling, and then call fit_transform, passing the features I need to scale down. Understand, guys, that the Alcohol and Malic values were recorded with different units and magnitudes, so there is a huge difference between them: one may be a much bigger number than the other. Since the gap in magnitude is huge, we should scale them down to the same scale, and for that I am using MinMaxScaler. As soon as I call fit_transform and pass those attributes, you can see that all the values get scaled down between 0 and 1: the maximum value present is 1, and the remaining values lie between 0 and 1. Always remember that the formula I showed you is applied to each and every feature separately; that is how it works.
Similarly, if you want to perform standardization, which is also called Z-score normalization, this is the formula: z = (x − μ) / σ, where x is the feature value, μ is the mean, and σ is the standard deviation. And you know why we use standardization: all the features will be transformed in such a way that they have the properties of a standard normal distribution, with mean equal to 0 and standard deviation equal to 1. To perform this, you have to import the StandardScaler class from sklearn.preprocessing. StandardScaler, again I am telling you, scales the values so that the mean becomes 0 and the standard deviation becomes 1, and whichever feature you pass gets transformed, or scaled down, to those values. So as soon as I create a StandardScaler object and call fit_transform, passing the attributes Alcohol and Malic, you can see that all the values get transformed so that each feature has mean 0 and standard deviation 1. If a feature follows the standard normal distribution, its curve will look like a bell curve, but the mean will be at 0, and each step of one standard deviation to the right takes you to 1, 2, 3, and so on. So that is how it is done. In a very simple way, you just have to use StandardScaler and MinMaxScaler, and trust me guys, these are the most commonly used scaling techniques.
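The standardization step looks almost identical in code; again the numbers below are stand-ins for the real columns:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up values standing in for the Alcohol and Malic columns.
df = pd.DataFrame({"Alcohol": [12.0, 13.5, 14.8],
                   "Malic":   [0.9, 1.7, 5.8]})

scaler = StandardScaler()
z = scaler.fit_transform(df[["Alcohol", "Malic"]])
print(z.mean(axis=0))  # each column's mean is now ~0
print(z.std(axis=0))   # each column's standard deviation is now ~1
```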
Now I will go ahead and make you understand when you should use standardization and when you should use min-max scaling. In most scenarios, guys, whenever you are using a machine learning algorithm that involves Euclidean distance, or deep learning techniques where gradient descent is involved, you need feature scaling. Gradient descent basically works over a curved loss surface where you need to find the global minimum point, and in order to converge to that point quickly, you have to scale down the values. So for algorithms like KNN (k-nearest neighbours), k-means clustering, and all the deep learning architectures such as artificial neural networks and convolutional neural networks, you have to perform scaling.

For some algorithms, you don't have to perform scaling: decision trees, random forests, XGBoost, and the bagging and boosting techniques that involve decision trees. There is no use in scaling down the values, because at the end of the day you are just creating a decision tree, and a decision tree splits based on feature conditions; whether you keep the values big or small won't affect it much, since the branches are created based on threshold conditions. But definitely, for algorithms like KNN, k-means clustering, linear regression and logistic regression (in linear regression, too, we use gradient descent to reach the global minimum point), you have to do feature scaling.
Now, if I talk about normalization and standardization, which technique should be used and when? Based on my experience, for many of the use cases where I have used standardization (mean 0, standard deviation 1), it has performed better than min-max scaling, which is a normalization technique. That does not mean min-max scaling is bad or should not be used: for most deep learning techniques using convolutional neural networks and artificial neural networks, you perform min-max scaling, because you need to scale your values down between 0 and 1. If I take the example of images, pixel values lie between 0 and 255, so when you scale them down you always bring them between 0 and 1; that is what is usually done for images. Similarly, for the neural networks you create using libraries like TensorFlow and Keras, inputs scaled between 0 and 1 help them learn the weights quickly. So this is the basic difference between normalization and standardization; I have explained how and when you should use each, and for most machine learning scenarios, standardization performs well, based on my experience. I have also explained it with the help of code.
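The pixel example above is just min-max scaling with a known range: since intensities already span 0 to 255, the formula collapses to a division by 255. A small sketch:

```python
import numpy as np

# A tiny 2x2 grayscale "image" with intensities in [0, 255].
image = np.array([[0, 64],
                  [128, 255]], dtype=np.uint8)

# Min-max scaling with known bounds min=0, max=255 reduces to a division.
scaled = image.astype(np.float32) / 255.0
print(scaled.min(), scaled.max())  # 0.0 1.0
```

The cast to float32 before dividing matters: integer division on uint8 data would throw away all the fractional detail.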
Now guys, if you are looking for some career-transition advice with respect to data science, I have given a link in the description. I found that YouTube channel very good, because most of the advice given there comes from data scientists: how they work and how they made the transition are all covered on that channel, so go ahead and watch it; it will definitely give you a whole lot of ideas. So this was all about this video. I hope you liked it. Please do subscribe to the channel if you haven't already, and I wish you a great day. Thank you, one and all.
Info
Channel: Krish Naik
Views: 131,024
Rating: 4.9324765 out of 5
Keywords: intellipat, upgrad, coursera, Normalization, edwisor, Standardization, great learning, krish, appliedaicourse, Feature Scaling
Id: mnKm3YP56PY
Length: 12min 51sec (771 seconds)
Published: Thu Nov 07 2019