Feature Selection techniques in Python | feature selection machine learning | machine learning tips

Video Statistics and Information

Captions
Welcome to Unfold Data Science, friends. My name is Aman and I am a data scientist. I am sure you have been using feature selection techniques from the very first day of your data science journey — correlation analysis, recursive feature elimination, the chi-square test, the ANOVA test, and so on. But do you know the different varieties of feature selection techniques? And before that, do you know what we are actually trying to achieve when we do feature selection? Or are we just running some predefined function in Python, checking some numbers, and accepting or rejecting features? Do we really know what the definition of the "right" feature is? All of this I am going to explain in this video, with theory and a Python demo. Let me take you to the whiteboard and try to explain, with a simple data set, what the varieties of feature selection techniques are and what we are trying to achieve.

Okay, so guys, I have a very simple data set here: employee data for an organization. You have employee name, employee gender, years of experience, and salary. From this data we will first understand the purpose of feature selection — this is very important to understand, and we will not go into any mathematics or Python implementation yet. Then I will tell you the different categories of feature selection techniques, take one of those categories, and show you a Python demo: how it works and in what kind of scenario you can use it. There are multiple categories and multiple techniques; one category I will cover in detail, and if you want, I will cover more in subsequent videos.

Suppose salary is your target column — this is what you have to predict. Now, from common sense alone, without any knowledge of statistics or of how machine learning works, you can very easily say, "Aman, I don't think employee name is a useful feature here." Why? Because how can an employee's name determine the employee's salary? So based on domain knowledge alone, I can mark it as not useful. Let us come to the second feature, gender. If you have a little understanding of how machine learning works, you will say, "Aman, every value in the gender column is M, which means there is no variation — the variance of this column is zero." So this is also not useful. And the third one: look carefully at years of experience. If you multiply it by 10, you get the salary exactly. So you will say this is a very useful feature. Obviously, then, you will take the first two features out of the analysis before model training.
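A minimal pandas sketch of this eyeball check, assuming a toy frame like the one on the whiteboard — the names and exact values are illustrative, not taken from the video's notebook:

```python
import pandas as pd

# Toy employee data, as on the whiteboard (illustrative values).
df = pd.DataFrame({
    "employee_name": ["Ravi", "Priya", "Arun", "Meena"],
    "gender": ["M", "M", "M", "M"],
    "years_of_experience": [1, 2, 3, 4],
    "salary": [10, 20, 30, 40],
})

# gender has a single unique value, i.e. zero variance -> nothing to learn.
print(df.nunique())

# years_of_experience is perfectly correlated with salary (salary = 10 * years).
print(df["years_of_experience"].corr(df["salary"]))  # 1.0

# Drop the columns that cannot help the model.
X = df.drop(columns=["employee_name", "gender"])
```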
But why would you do that? Forget all the mathematical jargon; I am writing one simple line that you have to remember: by doing all of this, you are making the model's life easy. Remember this always, guys — make the model's life easy. What does making the model's life easy mean, and what does making it difficult mean? Let me go back and add one more column, location, and put "Bangalore" in every entry. There is no learning for the model from this new column, so if I feed it in, I am making the model's life difficult. If you take this column out, you are making the model's life easy. The same goes for the employee name column: at the end of the day it is a machine, it will treat the names as categories and try to learn something, but there is nothing to learn there.

How else can you make the model's life easy? Suppose I go back and change some of the gender entries to F. Now there is some variation in that column, which means there is some possibility for the model to learn a pattern from it. The column is no longer a not-useful column; it can now be a useful column, because there is something to learn. Because what is machine learning, guys? Machine learning is learning patterns from data. Give the model data where there are patterns, and its life will be easy. You complicate the model's life by adding many features that are not meaningful, that add nothing to the learning; you make its life easy by keeping the features that help it learn, like years of experience here. That is the entire purpose of feature selection.

And of course other problems come with unnecessary features: computational complexity goes up, model interpretation becomes harder, and tomorrow, when you want to explain your model to someone, you can explain it with two features far better than with twenty. All of these are problems of having more features in your data. But how do we know which feature is good and which is not, which to keep and which to drop? Here I am looking at just four columns, so I am able to take a call by eye; in the real world I will have many features, and hence there are different techniques. So let us talk about the categories of feature selection. There are three main categories, guys, that you have to keep in mind: one is known as filter-based feature selection, another is known as wrapper-based, and the third is known as embedded.
Sometimes a fourth one, hybrid, is also mentioned, but I am not going to cover it here — it would only confuse you unnecessarily. Just think of these three categories: filter, wrapper, embedded.

What is a filter method? I am sure you have heard of correlation analysis. You may not have heard of something called a variance threshold — I am going to show it to you in Python, don't worry. Then there is the chi-square test, ANOVA (the F-test), and information gain. All of these are filter-based techniques: you compute a score per feature, put a boundary on it, and say that if the number is on one side of the boundary the feature is useful, otherwise not.

In the wrapper methods there is something known as recursive feature elimination — I have a detailed video on it that you can watch. Any kind of feature selection where you take a group of features, fit a model, and decide which subset works for your model is called a wrapper: you put a wrapper on top of multiple features and then see which features make more sense for the model. Forward selection and backward elimination also belong here. And in the embedded methods, I am sure you know about L1 and L2 regularization, and about selecting features through a decision tree — sometimes we do that, using the model itself to select the features.

In this video I am going to cover the filter methods in detail; if you want me to cover the other two categories in detail, I will create separate videos for the wrapper and embedded methods. So, for the filter methods, let's go to Python and try some of the techniques.

First of all, I am doing correlation and variance threshold. What is the meaning of correlation? Correlation measures how two variables vary with each other — I have a detailed video on it if you have any confusion. I have knowingly kept the data set very simple, because I want you to understand the concept without complicating the data. This is the Boston housing data, and it has different columns: for example the crime rate, the air quality, the average age of the houses, and so on, with the housing price as the target. How do you create a correlation chart? On any DataFrame, if you call df.corr() you get the correlation matrix, and I am plotting it here. All the diagonal elements are 1, because every variable is perfectly correlated with itself. Wherever you see a very high number off the diagonal — for example between RAD and TAX — it means those two are highly correlated variables, and highly correlated variables more or less carry the same information. So what we generally do is put a threshold of 0.9 or 0.85 and remove one variable from each highly correlated pair.
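A minimal sketch of this correlation filter. The video uses the Boston housing data, which recent scikit-learn releases no longer ship, so the sketch below uses a tiny made-up frame instead; the 0.9 threshold matches the one mentioned above:

```python
import numpy as np
import pandas as pd

def drop_correlated(X: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one column from every pair whose absolute correlation exceeds threshold."""
    corr = X.corr().abs()
    # Keep only the strict upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)

# Example with a frame where two columns carry the same information:
X = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [7, 1, 5, 3]})
print(drop_correlated(X).columns.tolist())  # ['a', 'c'] -- 'b' duplicates 'a'
```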
That is one way we can do feature selection. Now, what is this variance threshold? I am not explaining correlation much, because I know you are aware of that concept; let us understand the variance threshold. Say that in the same Boston data I add a new column and put 100 in every row — remember, on the whiteboard I created a location column and put Bangalore in every entry; this is the same thing. This new column is a useless column. Why? Because there is no variation in it. That is exactly what VarianceThreshold checks. I have opened the scikit-learn documentation; see here: sklearn.feature_selection.VarianceThreshold, a "feature selector that removes all low-variance features". This feature selection algorithm looks only at the features, not at the desired output, so it can be used for unsupervised learning. Features with a training-set variance lower than the threshold will be removed. In this case I give a threshold of 0, run it, and call get_support(): every column qualifies except the new one, which comes back False because all its values are identical. If I make the threshold, say, 5, you will see that some more columns qualify for removal. What I am trying to tell you is: if there is enough variance in a column, it can be useful for your model; otherwise it is not, and this is the way you use VarianceThreshold. You can refer to the scikit-learn documentation — that is why I showed it. Those are the two basic filter techniques, guys: correlation and variance threshold.

Now I am going to show you two more techniques: one is chi-square and the other is the ANOVA F-test. You can see I am importing both, taking the iris data set, and separating the features and the target. Then I call SelectKBest with k=2, for example — I will show you SelectPercentile as well in a moment. What I am saying here is: give me the k best features, using the chi-square test as the criterion. When I run this, guys, you will see that the original number of features is 4 and the number of reduced features is 2, and the features left after filtering are petal length and petal width. If I want to see the scores, I can print them: petal length and petal width have the highest numbers, hence those two are selected. If I say k=3, the top 3 are selected — sepal length comes in as well, because it has the next highest score. So you are getting the idea: I run a chi-square test and say how many features I want; if I say three, it gives me three; if I say two, it gives me two. (Minimal sketches of the VarianceThreshold and SelectKBest calls follow below.)
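First, a minimal sketch of the VarianceThreshold step, assuming a small frame with one constant column standing in for the all-100 column added in the video:

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

X = pd.DataFrame({
    "crime_rate": [0.1, 0.3, 0.2, 0.9],
    "age":        [65.0, 78.0, 61.0, 45.0],
    "constant":   [100, 100, 100, 100],   # no variation -> nothing to learn
})

selector = VarianceThreshold(threshold=0)  # drop features with variance <= 0
selector.fit(X)
print(dict(zip(X.columns, selector.get_support())))
# {'crime_rate': True, 'age': True, 'constant': False}

X_reduced = X.loc[:, selector.get_support()]
```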
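And a minimal sketch of SelectKBest with chi-square on iris, matching the k=2 run described above:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)          # 4 features, all non-negative
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

print(X.shape[1], "->", X_new.shape[1])    # 4 -> 2
print(selector.scores_)                    # per-feature chi-square scores
# The two highest scores belong to petal length and petal width.
```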
Similar to this, I can run f_classif — f_classif means you are running ANOVA. Let me take you to the scikit-learn documentation. For chi2, which I ran just now, it says: compute chi-squared stats between each non-negative feature and class. And remember what kind of variable it works on: categorical variables. Recall that the chi-square test measures dependence between stochastic variables; that is how it works. And f_classif, which I am running now — here is its documentation — computes the ANOVA F-value for the provided samples. So you can use either chi2 or f_classif. With f_classif you again see three features getting selected, but the scores change, because we are now using a different scoring function.

Also, in place of SelectKBest I can use SelectPercentile. SelectKBest means how many features you want in the end; SelectPercentile means what percentile of top features you want. Very simple. Run it and you will see only one feature getting selected, because SelectPercentile defaults to the top 10 percent. You can use SelectPercentile with chi-square as well; it also selects one feature, but the numbers change, because obviously we are using a different test. So how many tests have we covered here, guys? Chi-square and ANOVA, and I showed you two selectors: the k best features, or the top percentile of features. I have knowingly used a simple data set, iris, so that it is easy for you to understand; I will put this file in my Google Drive, so you can take it, don't worry.

For the information-gain category, guys, I am using a different data set, insurance.csv, which has insurance data: age, sex, BMI, children, smoker, region, and the insurance charges for each person. I am using the function mutual_info_classif. What does this function do? The documentation says: mutual information between two random variables is a non-negative value which measures the dependency between the variables. Here the features are all continuous variables and the target is a categorical variable — I have knowingly set it up like this — and you can measure the information gain from a particular variable about the target. You pass the features and the target to mutual_info_classif and then look at the feature scores: the highest score is for BMI, which means BMI is the most important feature here; after that charges, and after that age. You can keep as many features as you want based on these numbers. (Minimal sketches of the f_classif, SelectPercentile, and mutual_info_classif calls follow below.)

So what techniques did we discuss, guys? Starting from the basic ones, correlation and variance threshold; then ANOVA and chi-square; then the SelectKBest and SelectPercentile selectors; and then information gain. All of these come under the filter category, which is what I have covered in detail. If you want me to cover the wrapper category and the embedded category, please drop me a comment saying so; the moment I see 15-20 comments, I will definitely create a video on them. Let me know what doubts you have, guys. Please subscribe to the channel if you have not done so yet. I'll see you all in the next video. Wherever you are, stay safe and take care.
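A minimal sketch of the ANOVA and percentile variants described above, again on iris:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, SelectPercentile, chi2, f_classif

X, y = load_iris(return_X_y=True)

# ANOVA F-test: same selector as before, different scoring function than chi2.
anova = SelectKBest(score_func=f_classif, k=3).fit(X, y)
print(anova.scores_)                 # scores differ from the chi2 run

# SelectPercentile keeps the top percent of features; the default is 10,
# which on 4 features leaves just one.
pct = SelectPercentile(score_func=f_classif).fit(X, y)
print(pct.get_support())             # one True entry

# The same selector works with chi2 -- only the scores change.
pct_chi2 = SelectPercentile(score_func=chi2).fit(X, y)
print(pct_chi2.get_support())
```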
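And a sketch of the mutual-information step. The column list matches the insurance data described above, but the file path and the choice of "smoker" as the categorical target are assumptions on my part — the video only states that the features are continuous and the target is categorical:

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Path and target column are assumptions; adjust to your copy of the data.
df = pd.read_csv("insurance.csv")
X = df[["age", "bmi", "charges"]]    # continuous features
y = df["smoker"]                     # assumed categorical target

scores = mutual_info_classif(X, y, random_state=0)
print(pd.Series(scores, index=X.columns).sort_values(ascending=False))
# In the video's run, bmi scores highest, then charges, then age.
```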
Info
Channel: Unfold Data Science
Views: 26,907
Keywords: Feature Selection techniques in Python, feature selection machine learning, machine learning tips, feature selection unfold data science, Python feature selection techniques, feature selection, feature selection playlist, python feature selection techniques, unfold data science, feature reduction techniques, feature engineering in machine learning, feature engineering in machine learning in hindi, how to do feature engineering in machine learning, feature engineering
Id: LTE7YbRexl8
Length: 18min 44sec (1124 seconds)
Published: Sun Mar 27 2022