Inferential vs Observed Statistics...

Video Statistics and Information

Captions
Welcome to Complexity Made Simple. My name is Paul Allen, and before we get into today's video, just a reminder of some great news: Design of Experiments for 21st Century Engineers, the Minitab version, has just been released. I know that those of you unfortunate enough to have selected Minitab have a great deal of difficulty in understanding this software, so we've created this special version of the text with the Minitab screenshots. The link to lulu.com, where you can buy this book, is in the description below. And of course you also have the option of purchasing Drink Tea and Read the Paper, which is the perfect book to go with your Green Belt or Six Sigma Black Belt training. The link to lulu.com for that book is also in the description below. And of course the other thing that we'd really love you to do: please go to buymeacoffee.com and make a small donation. All of these things, the purchase of the books and the donations, help keep the channel moving. I'm really grateful to all of those people who are currently donating. Many thanks for your support and your help. And now let's get on with today's video.

Welcome to Complexity Made Simple. My name is Paul Allen, and the subject of today's video? Well, we're going to take a look at the difference between summary statistics and inferential statistics. I'm also going to call summary statistics "observed statistics", because that's typically how you deal with the numbers: it's what you're observing in your data sets. And we'll look at how important it is to drop the observed number and understand inferential statistics, because this is where statistics goes completely wrong for most people. Most people believe: "I'm going to use statistics. It's not difficult, it's not clever. All I have to do is collect some data, look at the numbers, they're telling me something, and everything will be right with the world." That is where so many mistakes are made. So let's take a look at this. We are going to compare observed stats versus
inferential, drawing a deeper conclusion from the data that's in front of you. Observed versus inferential. Now, as I say, the way this is going to show up in most manufacturing companies is the idea of taking a sample. Often we're taking a sample right at the beginning of a new product. So what will often happen is that someone will take a sample, they'll get, I don't know, maybe 20 data points, they'll take a look, and one of the things they will do is compare it to the tolerance. They might work out the mean: in other words, what am I observing from my 20 data points? And what's the defect rate? What am I observing from my 20 data points? And they say, "Well, okay, what I'm observing is that all 20 data points are defect-free. Therefore what's my conclusion when I release this product into production and produce the hundred thousand that I'm going to need for the life cycle of this product?" Well, of course, what they conclude is that they're going to be defect-free. You're using the observed results in order to conclude what's going to happen, in order to conclude how you're doing. If you do this, you are going to make some of the biggest mistakes ever, because if we look at it from an inferential point of view, we're going to see things that the observed data has not seen.

So let's turn this thing around and look at it from the point of view of a different pattern: look at it from the point of view of a distribution rather than individual data points and whether they passed or failed. Now, of course, in order to get this distribution we need some summary stats. We're going to need the mean and we're going to need the spread, the standard deviation. If we work those two out, they will work out this distribution for us. We're assuming, by the way, that when we plot the histogram of this data set, some normality is appearing in the data as well. So the pattern is there; we're not making the pattern up. But once we understand that a pattern is present, what we're
going to understand, of course, is this: if we've only taken a sample of 20 data points, and if that's the shape up there, where are the 20 data points likely to have come from? Well, they're likely to have come from the middle. So let's put tolerances in here. They come from the middle, they're all inside the tolerance, they're all defect-free. But look, there's more data in the middle, so they're bound to come from the middle. I've only taken twenty. I've taken twenty out of maybe ten thousand, twenty thousand, thirty thousand that I might be making in the next year. If I only take twenty, where are they going to come from? Well, they're going to come from the middle.

So what we need to do is infer how extreme the results could get. How much variability could the results produce? If I had let this process produce a thousand or five thousand, what would I have seen? And this, of course, is where inferential statistics comes in, because once you understand the pattern and you have the pattern drawn, you have it sized up, in a way, by having the mean and the standard deviation. You've quantified the pattern; you've quantified what's going to happen. And what we're able to do is calculate how much data we expect to appear in the tails. In other words, what's the defect rate going to be long term?

So you've taken 20 data points, they're all defect-free, and now you're about to say, "Release the product, it's going to run brilliantly." You are about to make the biggest mistake ever, and you're about to cost your company thousands and thousands of pounds. But if you use inferential statistics, something that you were probably taught in your degree but brushed off, thinking you didn't need this stuff, here we go: now we're going to be able to say that even though the sample was defect-free, we predict two percent in that tail and two percent in that tail. We're going to have a four percent defect rate if we decide to
switch this process on. Okay, now think about it: 20 data points. What's the chance that I'm going to pick up a defect if I'm running at a four percent defect rate? What's the chance that I'm going to pick up a defect in those 20 data points? It's going to be a pretty lucky day at the office. But if I take the statistics, the mean and the standard deviation, and I assume that there is a pattern present because I'm beginning to see it in the histogram, then I can draw a much deeper conclusion. I can predict the long-term defect rate. I can draw a much deeper inference. And this is the power of real statistical analysis: drawing proper inferences from what you see.

By the way, making sure that you get proper sample sizes would also be part of inferential statistics, because in order to get this prediction right, in order to get this estimate right, quite honestly a sample size of 20 is terrible. If you're measuring something, you should be taking a sample size of 30 to 50, and at all times you should be trying to measure things, not just say pass or fail. So the sample size is not great, and if we understood inferential statistics, we'd know that in order to get a good estimate of what's going to happen long term, we need a good sample size. Don't forget, this is cheap at this point, at the pilot run. You want to do a pilot run? How many pieces do you want? 30 to 50. That's cheap, because you are about to make a hundred thousand and you are potentially going to make piles and piles of defects. Don't you want to get just a little bit of data to make sure you don't make that mistake? And that's what inferential statistics are for.

So, you know the mean and the standard deviation. Maybe you were taught them in a class at school, maybe you were taught them in a class at college or university, and you didn't think they were any use at all apart from passing your exams. That's not true. You can get a much deeper understanding of your
data sets, a much deeper understanding of your processes. You can predict your long-term defect rates from the first 30 data points if you use inferential statistics: use the mean and the standard deviation. And of course, where this really crops up is in Cpk calculations. Cpk calculations are using inferential statistics. Use inferential statistics and you will make better business decisions, and you will make piles and piles of cash. [Music]
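
The reasoning in the captions above (fit a normal distribution from the sample mean and standard deviation, then read the expected defect rate off the tails) can be sketched in a few lines of Python. This is only an illustrative sketch, not the presenter's own calculation: the 20 measurements, the tolerance limits of 9.0 and 11.0, and all function names are hypothetical, and the approach assumes the data are roughly normal, as the video does. The Cpk formula used is the common textbook form, min(upper - mean, mean - lower) / (3 × standard deviation).

```python
from statistics import NormalDist, mean, stdev


def predicted_defect_rate(sample, lower_tol, upper_tol):
    """Long-run fraction expected outside the tolerances, assuming normality."""
    fitted = NormalDist(mean(sample), stdev(sample))
    below = fitted.cdf(lower_tol)        # area in the lower tail
    above = 1.0 - fitted.cdf(upper_tol)  # area in the upper tail
    return below + above


def cpk(sample, lower_tol, upper_tol):
    """Common textbook Cpk: distance to the nearest limit over 3 sigma."""
    m, s = mean(sample), stdev(sample)
    return min(upper_tol - m, m - lower_tol) / (3 * s)


def chance_of_clean_sample(defect_rate, n):
    """Probability that n independent pieces are all defect-free."""
    return (1.0 - defect_rate) ** n


# Hypothetical pilot run: 20 measurements, all inside the tolerances.
SAMPLE = [9.2, 9.4, 9.5, 9.6, 9.7, 9.8, 9.8, 9.9, 9.9, 10.0,
          10.0, 10.1, 10.1, 10.2, 10.2, 10.3, 10.4, 10.5, 10.6, 10.8]
LOWER_TOL, UPPER_TOL = 9.0, 11.0  # hypothetical tolerance limits

if __name__ == "__main__":
    observed = sum(not (LOWER_TOL <= x <= UPPER_TOL) for x in SAMPLE)
    print("observed defects in sample:", observed)
    print("predicted long-run defect rate:",
          predicted_defect_rate(SAMPLE, LOWER_TOL, UPPER_TOL))
    print("Cpk:", cpk(SAMPLE, LOWER_TOL, UPPER_TOL))
    print("chance of 20 clean pieces at a 4% defect rate:",
          chance_of_clean_sample(0.04, 20))
```

With this made-up sample, the observed defect count is zero, yet the fitted distribution still puts roughly 1.5% of long-run output outside the tolerances. The last function puts a number on the video's question about 20 pieces at a four percent defect rate: 0.96 to the power 20 is about 0.44, so a defect-free sample of 20 would turn up a little under half the time even though the process is producing defects.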
Info
Channel: Paul Allen
Views: 202
Keywords: lean, six sigma, Six Sigma greenbelt training, Six Sigma Blackbelt Training, Shewhart, Juran, Deming, Taguchi, SPC, MSA, FMEA, DOE, X bar chart, Wheeler, Janam Sandhu, Mrnystrom, Gemba Academy, Full Factorial, Central Composite Design, Ronald Fisher, Hypothesis Test, p value, Histrogram, minitab, Pareto, multi-vari chart, https://youtu.be/QH984PnwRDE, https://youtu.be/f_fjqCpd67Q, https://youtu.be/AGJ1QYI2B4c, https://youtu.be/gsD8V2_eZ0A, https://youtu.be/mM6EyMvvAKk, quality hub india, simplilearn
Id: 3J-8IanN-5M
Length: 11min 37sec (697 seconds)
Published: Tue Nov 09 2021