Tutorial 24-Z Score Statistics Data Science

Captions
In this video we'll be discussing z-scores. The z-score is a pretty important concept; many of you know that, but I'll try to explain with a very good example why exactly it is used and how it is used. I'll take a very small example, a population of scores of students in a class, and show you how it works, so make sure that you watch this video till the end.

So let me go ahead and try to understand the z-score. You know about the normal, or Gaussian, distribution. In a normal distribution you basically have a bell curve, right? In the bell curve over here, the center element is basically the mean. If I go one standard deviation to the right, I will have mean plus one standard deviation; if I go another standard deviation, it becomes mean plus two standard deviations. Similarly, if I go to the left, it will be mean minus one standard deviation and mean minus two standard deviations. So this is basically my mean, and consider that this is my Gaussian, or normal, distribution.

Now, suppose I need to convert this into a standard normal distribution. What are the properties of a standard normal distribution? My mean is actually 0 and my standard deviation is just 1; this is the property of a standard normal distribution (SND). To my right it would be +1; if I go further to the right it will become +2; if I go to the left it becomes -1, and then -2.

Now, one more thing that you have to remember is a very good empirical formula for the Gaussian distribution: within the first standard deviation over here there is around 68 percent of the data of your total distribution, whenever you have a Gaussian distribution, and similarly
within the second standard deviation you have somewhere around 95 percent of the total distribution, and within the third standard deviation you have something like 99.7 percent of the distribution. So this is basically the empirical formula.

Now, if I want to convert this whole data to a standard normal, we apply a very simple formula, which I will call the z-score: z = (x_i - μ) / σ, that is, x_i minus the mean, divided by the standard deviation. By using this formula I'll be able to convert the whole data so that my mean will be 0 and my standard deviation will be 1. In feature engineering we use a normalization technique, also called standardization, where you basically apply this formula to each and every feature. But I'll try to explain why this is important.

Let me consider a very good example over here. I have 1, 2, 3, 4, 5 in my distribution. If I try to find the mean, it will be 3. Why am I saying 3? 5 plus 4 is 9, 9 plus 3 is 12, 12 plus 2 is 14, 14 plus 1 is 15, and 15 divided by 5 is nothing but 3. And let me consider that I have computed the standard deviation as 1. If I apply the same formula to the value 3, which is my mean, I get (3 - 3) / 1, so this gets converted to 0. Similarly, 4 gets converted to 1, 5 gets converted to 2, 2 gets converted to -1, and 1 gets converted to -2. These are the values that we basically have.

Based on the empirical formula, we also know that within the first standard deviation we have 68 percent and within the second standard deviation we have 95 percent. Now the most important thing: suppose I want to find out, within 1.5 standard deviations of the mean, what will be this
distribution, I won't be able to say, because the empirical formula does not help us find this particular answer. Specifically for that we use the concept of the z-score, and for this we use a table called the z-score table. I'll be explaining the z-score table, and I'll also take a very good example to explain it. So let's go ahead and try to understand with an example.

So guys, I'm going to consider an example wherein I take a population of students in a classroom and look at their scores. The mean score is 75 and the standard deviation is 10; suppose this is my problem statement. My problem statement says I need to find the probability that a student will score greater than 60.

Now, when I have this problem statement, see that my mean is 75, so here I have 75. My standard deviation is 10, so to the right I have 85 and then 95, and to the left 65 and 55. But my question is about the probability that the student scores greater than 60, so let us mark 60 over here; 60 will be somewhere here. If I convert this into a standard normal distribution, 75 gets converted to 0, 65 gets converted to -1, and 60 actually gets converted to -1.5. So this is my point for 60.

Now I want to find out the probability that the student scores greater than 60, so I have to find the area under this whole curve to the right of that point, and with the empirical formula we will definitely not be able to find it. For this we use a z-score table. Now what is this z-score table? The z-score table basically says that, suppose I want to find out this,
what the distribution is below -1.5 standard deviations, I'll be able to find that particular value. So basically there is a table wherein you can check the value for -1.5, and I'll show you that table at the end. But just understand that it gives us only the left-hand side; it does not give us the right-hand side.

Now, in order to compute it, I will divide this whole area into three main regions: this is my region one, this is my region two, and this is my region three. The region that I need to find is this whole region to the right of 60, that is, the probability that my x value is greater than 60. But with the help of the z-score I will only be able to get this value on the left, because that is what the z-score says: if we know how many standard deviations we are from the mean, currently -1.5, I will be able to get this particular region. And if I go and look up the z-score value, it will be somewhere around 0.0668. I'm just taking it approximately, but at the end I'll show you the z-score table and you will be able to understand it. So for this third region I am getting 0.0668.

Now remember, for the second region we know that the standard normal distribution is symmetrical: this part is symmetrical to this part. That basically means 50 percent of the distribution is present on the right-hand side of the mean, so this is my 50 percent value. And this part over here is 0.0668, which means this region is 6.68 percent. Now the main thing is that I need to compute this remaining region, because if I add this region and this region I will be able to get this
particular answer that I require. So let me compute the region. The first region is the one between this point and the mean. I don't know it, so I'll just call it X for now. Let me call the second region Y, and I know that it is 50 percent. So I add 50 percent, and after that I add the third region, which is around 6.68 percent. Now I have all these values, and I know the total will be equal to 100, because the whole area is 100 percent. So if I want to compute X, it becomes 100 minus (50 plus 6.68), that is, 100 minus 56.68, and X will be around 43.32 percent. Again, you can do the computation yourself.

So now I have this region's value: I have about 43.32 percent over here, and this is my 50 percent. To answer the problem statement I just combine 43.32 plus 50, which is about 93.32 percent. So the probability of a student scoring greater than 60 is around 93 percent. I am saying approximately, guys, because the exact value comes from the z-score table; when I show you the z-score table you will see this value over there.

So this is how you actually use the z-score. And trust me, guys, if I wanted to find this particular distribution I could not use the empirical formula, because with the empirical formula I know only the values within this and
this, and within the third standard deviation: within this there will be around 68 percent, within this around 95 percent, and within this around 99.7 percent of the distribution. But when I want to solve this kind of problem, whether my score will be greater than 60, I will not be able to compute it that way. For that we use the z-score table. What we do is divide the area into regions. The right side of the mean will obviously be 50 percent, because both halves of the curve in a standard normal or Gaussian distribution are symmetrical, and from that we are able to find the rest. This is how the computation is basically done, so I hope you understood this.

Let us go ahead and see the z-score table, and understand what a z-score table looks like. This is what a z-score table looks like, guys; you can just go to Google and search for z-score table, and you'll get the same thing. Always remember: whatever number of standard deviations you are looking at, that is basically the z value, and the table shows you the area to the left of it.

Now, in my case, the value 60 was at -1.5 standard deviations, so let me go and look at the z-score table over here. You can see the row for -1.5 and the column for 0.00, so the entry is for z = -1.50, and here I have 0.0668. So this value as a percentage is 6.68. I can take this value, apply it to my area, and compute the remaining value. And on the right-hand side of the standard normal distribution, since it is symmetric, we have 50 percent, and I have already shown you how to do the computation. This is how you basically look at a
z-score table, and you can find all the other values in the same way. Suppose the value is at -1.51: then you go to the -1.5 row and the 0.01 column, and you get somewhere around 0.0655. This is basically the area under the curve to the left of whatever z value we are checking.

So yes, that was all about this video. I hope you liked it. Please do subscribe to the channel if you have not already. I'll see you all in the next video. Have a great day. Thank you, one and all.
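
The arithmetic in the captions above can be checked with a few lines of Python. This is only a minimal sketch, not something shown in the video: the helper names z_score and normal_cdf are made up for illustration, and the table lookup is replaced by the standard normal CDF computed from math.erf, which gives the same left-hand area that a z-score table lists. It also shows that the population standard deviation of 1, 2, 3, 4, 5 is about 1.41; the value 1 used in the captions is a simplifying assumption.

```python
import math
import statistics


def z_score(x, mean, std):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std


def normal_cdf(z):
    """Area under the standard normal curve to the left of z (what a z-table lists)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Small example from the captions: data 1..5, standard deviation ASSUMED to be 1.
data = [1, 2, 3, 4, 5]
mu = statistics.mean(data)
z_assumed = [z_score(x, mu, 1.0) for x in data]
sigma = statistics.pstdev(data)  # actual population standard deviation, about 1.41
print("z-scores with assumed std 1:", z_assumed)
print("actual population std:", round(sigma, 4))

# Classroom example from the captions: mean 75, standard deviation 10, P(score > 60).
z = z_score(60.0, 75.0, 10.0)
left_tail = normal_cdf(z)    # region three: area below 60
between = 0.5 - left_tail    # region one: area between 60 and the mean
p_greater = between + 0.5    # region one plus the right half (region two)
print("z for a score of 60:", z)
print("P(X < 60):", round(left_tail, 4))
print("P(X > 60):", round(p_greater, 4))

# Areas within k standard deviations of the mean, including the 1.5 case
# that the empirical rule does not cover.
for k in (1, 1.5, 2, 3):
    print("within", k, "std:", round(normal_cdf(k) - normal_cdf(-k), 4))
```

Under these assumptions the left tail for z = -1.5 comes out at about 0.0668 and the probability of scoring above 60 at about 0.9332, which matches the table-based computation. The loop reproduces the 68, 95 and 99.7 percent figures and gives about 86.6 percent for 1.5 standard deviations.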
Info
Channel: Krish Naik
Views: 75,158
Keywords: Z score, Statistics, Machine Learning, Deep Learning, appliedaicourse, upgrad
Id: 4Fta6KQ1QHQ
Length: 11min 58sec (718 seconds)
Published: Tue Dec 03 2019