Covariance vs Correlation with simple data | Covariance vs Correlation Coefficient

Video Statistics and Information

Captions
Welcome to Unfold Data Science, friends. My name is Aman and I am a data scientist. Some of the terms you search for on the internet tend to confuse you, because many people write their definitions in different ways. One such pair of terms is covariance and correlation. I am going to take a small data set and explain what covariance is and what correlation is. This is a fundamental concept that should be absolutely clear to you, and it is also a favorite interview question for many data science interviewers. So make sure you watch the video till the end, and then you will be able to explain correlation and covariance with pen and paper, without any tools.

Let us start, guys. First we have to understand what this word is made up of. Covariance is co-variance: "co" means together, and variance, as you know, is a statistical term. Now let me ask you a simple question. Forget all these calculations for now. What is variance, guys?

I am writing some numbers here. Let us say that after your office work, college, and your day's work, you go and watch Netflix. Every day you watch for 2 hours; some days you watch for 2.5 hours, some days 1.5 hours; and suddenly one day your favorite web series has come out and you watch for 10 hours. Due to this 10, will the overall variance of this week increase or decrease? You have the answer: it will increase. And let us say every day you are watching 2, 2.5, 1.5, and some day you make it 0. Will your variance increase or decrease? It will increase again.

So what we have to understand is that the closer your observations are to the mean, the lower your variance. Take any distribution in the world. Let us say this is your mean line. From the mean, your observations can be either above or below. The closer they are to this line, the lower the variance of your overall data; and if more observations are far from this line, like the example I gave of 10
here, your variance is going to be high. That is variance in plain English.

Now, what we are trying to do here, guys, is measure the variance of two variables together. The reason I told you that the internet sometimes confuses you is that on the internet they will write "measures how two variables vary together". Fine, but we will understand it with an example.

By definition, covariance is (x_i minus x-bar), x-bar standing for the mean of x, multiplied by (y_i minus y-bar), y-bar standing for the mean of y. I have taken two variables here, x and y. The mean of x is 11 and the mean of y is 44. Again you can take the same example: how many hours you spend in a week on Netflix, let us say, is variable x, and how many hours you sleep in a week is variable y. We will calculate the covariance of these two variables.

What is the mean of x? 11. Mean of y? 44. Let us plug in the values, guys. First we take the difference from the mean, which means 10 minus 11; that is what I have written here, minus 1. For the other variable, take the difference from the mean: 40 minus 44, that is minus 4. Next observation: 12 minus 11, how much? 1. 48 minus 44, how much? 4. Next observation: 14 minus 11, how much? 3. 56 minus 44, how much? 12. All these products will be summed up, because it is a summation from i equal to 1 to n. And 8 minus 11, how much? Minus 3. 32 minus 44, how much? Minus 12. So we have to sum all these products and divide by 4. Previously it was 27.6 with that calculation. Whatever that number is, guys, that is your covariance.

Now try to understand how covariance will change when we change one of these numbers. Let me make this 32 into 48. So somebody watches only 8 hours of Netflix but still sleeps for 48 hours in a week. Now, with this 48, this minus 12 will change to what
number, guys? This minus 12 will change to 48 minus 44, which means 4. Due to this, the overall numerator will decrease. Why? Because now these two numbers are not both negative: one is positive and one is negative. The denominator will remain fixed at 4, because the number of observations is 4. So what will happen to our covariance? It will come down.

Now imagine there are 100 records in your data. Some values will be plus, some values will be minus. I told you why it is plus or minus, right? You take the difference from the mean, and the product is sometimes plus, sometimes minus. In how many ways can a product be plus, guys? Two ways: if you multiply a minus with a minus, or a plus with a plus. Here we are multiplying minus with minus, hence the product is plus. In this case it is minus, because the signs are opposite.

Now let us draw a logical conclusion about what we are trying to do with this covariance. Let us say, guys, this is your mean line of x and this is your mean line of y. We come to the first observation and we see whether it is above the mean or below the mean. The first observation, 10, is below the mean. What happens to y? y is also below the mean. Fine, we have a number here. In the next observation x moves above the mean. Where is y moving: above the mean or below the mean? If it moves above the mean, then one of these two conditions will be satisfied, and the numerator of this formula will go up. If y comes below the mean, then opposite signs will come, and the numerator of this formula will come down. In the end, whatever the sign of your covariance is, that tells you the overall direction of correlation between x and y. As you are seeing here, some terms are making it positive, some terms are making it
negative. When will all the terms be positive? If the movements are in the same direction: both above the mean, or both below the mean. If this is happening in all the rows, then your covariance will be a large positive number. If it is violated in some of the rows, the covariance will be somewhere in between. If it is violated in many places, look at the covariance sign: when I say number, you have to consider the final plus or minus. So you will get a plus and a number, or a minus and a number. But in the end what we look at is whether it is plus or minus; that tells you the direction of your covariance. That is why we say it measures how two variables vary with each other: whether they deviate from the mean in the same direction in most places, or in opposite directions. That is case one. Case two is one up, one down, which means the signs are opposite. In the end, if we get a plus sign we say the variables are moving in the same direction; if we get a negative sign we say the variables are moving in opposite directions. Clear?

Now, in statistics, mathematics, and data analysis, guys, it is all numbers, right? We cannot rely on just these signs. I will give you an example. Let us say I write here 1 millimeter. You cannot say whether this 1 millimeter is a large number or a small number. The first thing you should tell me, as a data analyst, is the context: whether you are measuring the size of a molecular device in the pharma industry, the size of a bacterium, the size of the earth, or how many kilometers Bangalore is from Delhi. Depending on the context, 1 millimeter will be small or large. Hence we should have a scale on which we can measure. In covariance we do not have any such scale. To give covariance that scale, a new term is introduced, which is known as correlation. What is correlation,
guys? I will just write the formula here. Correlation is very simple to understand. You say the correlation of x and y is equal to the covariance of x and y, and in the denominator you write it like this [the product of the standard deviations of x and y]. Now why do you do this? Because we want to limit the range, so that we can say what is more and what is less. The moment we put this quantity in the denominator, the correlation value will always be between minus 1 and plus 1. If our correlation value is near minus 1, we say strong negative correlation; if it is near plus 1, we say strong positive correlation.

The basic difference between covariance and correlation is this: covariance gives you just the direction, while correlation gives you both direction and strength. Also, correlation does not have any unit of measurement, for example kilograms, centimeters, meters, or cubic meters, whereas covariance carries units. That is the difference between correlation and covariance.

I hope you understood. I am sure you can now explain this to someone with pen and paper, guys. In statistics, as I always tell you, if you are able to explain something to me with pen and paper and an example, that is when you have understood the concept. Let me know in the comments what doubts you have. Guys, please subscribe to the channel if you have not done so yet, and kindly share my videos in the data science groups you are part of; that will help me a lot. I will see you all in the next video. Till then, wherever you are, stay safe and take care.
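
The pen-and-paper calculation described in the captions can be sketched in a few lines of Python. This is only an illustration, not code from the video: the function names, the `ddof` parameter, and the variable names are assumptions made here, and the data are the four (Netflix hours, sleep hours) pairs read out in the captions. The modified case recomputes the mean of y after 32 is changed to 48 (the mean becomes 48 rather than 44); the covariance still comes down. The last lines rescale sleep from hours to minutes to show that covariance depends on the unit while correlation does not.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def covariance(x, y, ddof=0):
    """Sum of (x_i - x_bar) * (y_i - y_bar), divided by n - ddof.

    ddof=0 divides by n, as in the captions; ddof=1 divides by n - 1.
    The ddof parameter is an assumption added for this sketch.
    """
    x_bar, y_bar = mean(x), mean(y)
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return total / (len(x) - ddof)


def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


# Data read out in the captions: weekly Netflix hours (x) and sleep hours (y).
netflix_hours = [10, 12, 14, 8]
sleep_hours = [40, 48, 56, 32]

print(covariance(netflix_hours, sleep_hours))          # 20.0 (divide by n = 4)
print(covariance(netflix_hours, sleep_hours, ddof=1))  # about 26.67 (divide by n - 1)
print(correlation(netflix_hours, sleep_hours))         # 1.0, since y is exactly 4 * x

# The change discussed in the captions: the last sleep value 32 becomes 48.
changed_sleep = [40, 48, 56, 48]
print(covariance(netflix_hours, changed_sleep))        # 8.0, lower than before
print(correlation(netflix_hours, changed_sleep))       # about 0.63

# Same data with sleep measured in minutes: covariance scales, correlation does not.
sleep_minutes = [h * 60 for h in sleep_hours]
print(covariance(netflix_hours, sleep_minutes))        # 1200.0
print(correlation(netflix_hours, sleep_minutes))       # 1.0
```

With these four pairs, dividing the summed products (80) by 4 gives 20, and dividing by 3 gives about 26.67. Either convention gives the same sign and the same correlation.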
Info
Channel: Unfold Data Science
Views: 10,267
Rating: 4.8939757 out of 5
Keywords: Covariance vs Correlation with simple data, Covariance vs Correlation Coefficient, What is difference between correlation and covariance, Understanding Correlation vs Covariance, How is Covariance different from correlation, correlation vs covariance, correlation vs covariance matrix, Unfold data science, covariance vs correlation, correlation vs covariance difference, correlation vs covariance in statistics
Id: sU8RsIsZ6Dg
Length: 12min 5sec (725 seconds)
Published: Fri Feb 12 2021