100+ Statistics Concepts You Should Know

Captions
The world is full of information, and information is power. As Tom Clancy once said, if you can control information, you can control the people. But he never told us what to do with that information once we got it. This video won't teach you how to control people, but it'll at least tell you how to control information, and that's what statistics is for. For many people, statistics was just one course we took and forgot about, but those who take it further see the possibilities that strong statistical skills unlock. You're not controlling people, but you're paid nicely to handle data and information. If you need more stats in your life but don't know where to start learning, then welcome to the statistical 100. If you can master these 100 topics, then you'll be one step closer to tricking people into thinking you're good at board games.

In the beginning, there was data. Data is a general catch-all term for information that we observe. This information can be in number form, also known as quantitative data, or in word form, known as qualitative data. For this video, we focus on the former: numbers. Data can come in different flavors. One of those flavors is discrete data, represented by the integers. One important example of discrete data is binary data, which can only take two values: zero and one. Zero and one are useful because they can represent on and off states, such as true and false. We can represent group membership with binary data, using one to indicate that someone is part of one group while zero represents the other group. This logic can be taken further with categorical data, where there can be more than two groups. Another important example of discrete data is count data, represented by the positive integers. The other flavor of data is continuous data, represented by the entire number line. Continuous data is useful for things that fall on a continuum, like a clinical biomarker or age. One important example of continuous data is time-to-event data, which represents the amount of time until some event, such as death or remission.

There are many types of data, but they're all haunted by the same demon: randomness. Randomness prevents us from making perfect predictions with data and winning big at the casino 100% of the time. This raises an important question: how can we still learn from data despite this randomness? Statistics can be viewed as the study of data, which includes everything from data collection to data presentation. Statisticians are in the business of looking past the randomness in data. When we think of randomness, we usually think of it as chaotic and uncontrolled, but if we assume that this randomness has some kind of structure behind it, then we can use this to our advantage. This is where probability theory enters statistics.

The central object that we use to represent data is the random variable. A random variable can take on different values with different probabilities. Random variables are usually represented by capital letters, while actual values taken from a random variable are usually lowercase letters. Random variables have special functions that describe the probabilities associated with each value. This function is called a probability density function (PDF) or probability mass function (PMF), depending on the nature of the data; the random variable needs to match the data, so it can be either discrete or continuous. The shape of a PDF describes the structure behind the randomness in the data; you may also hear it described as the law of the data.
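The transcript itself contains no code, but a minimal R sketch can make these flavors of data concrete. R is assumed here because the video recommends it later; the sample size, the distributions, and their parameters (including the Poisson choice for counts) are arbitrary assumptions made purely for illustration.

# Minimal sketch: random variables of different flavors in R.
# Sample size and distribution parameters are arbitrary choices.
set.seed(1)
n <- 10

binary_data     <- rbinom(n, size = 1, prob = 0.3)   # Bernoulli(0.3): zeros and ones
count_data      <- rpois(n, lambda = 2)              # counts, here assumed Poisson
continuous_data <- rnorm(n, mean = 50, sd = 10)      # continuous, e.g. a biomarker

# PMF: probability that a Bernoulli(0.3) variable equals 1
dbinom(1, size = 1, prob = 0.3)   # returns 0.3

# PDF: density of a normal(50, 10) variable evaluated at 50 (a density, not a probability)
dnorm(50, mean = 50, sd = 10)

In base R, the r* functions draw random values from a distribution and the d* functions evaluate its PMF or PDF.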
Technically, a distribution can take any shape, but there are a few common ones you need to know about. The uniform distribution tells us that all values are equally likely. The Bernoulli distribution describes the distribution of binary data and tells us how likely we are to observe a 1, which we call a success. The binomial distribution is similar, but tells us the probability for multiple coin flips, or successes. The Poisson distribution describes counts. And the most famous distribution of all is the normal distribution, which has a characteristic bell shape. The normal distribution is used everywhere in statistics, and it's the namesake of the channel.

The PDF is not the only way to describe randomness in the data. There's also the cumulative distribution function, or CDF, and the CDF is useful for defining quantiles and percentiles of a random variable. There are other useful values that we use to characterize random data. We might want to know what a typical value is, and this typicalness is captured in the measures of central tendency. The one most people know is the mean, known in technical terms as the expectation or expected value. Another measure of typicalness is the median, which marks the middle of a dataset in terms of the CDF. Finally, there's the mode, which describes the most common value, defined by the peak of the PDF. We might also be interested in the range of values a random variable might take. This can be described using the measures of scale. The variance gives us a sense of how far values can be from the expected value, and the standard deviation tells us the spread of data in terms of the original data units. The measures of shape tell us more specific details about the shape of a probability distribution. The skewness tells us the imbalance in the distribution towards one side or the other; skewness implies the presence of outliers, or extreme values, in a distribution. Kurtosis tells us about the pointiness of a distribution.

Often we have to deal with functions or transformations of a random variable. Most commonly, we deal with functions of normal random variables, such as the t and chi-squared distributions. Other times we want to see how the probability of one random variable is influenced by another; in this case we want to know about the conditional probability of that random variable. When random variables don't influence each other, we think of them as independent. Another important concept using conditional probability is Bayes' rule. Bayes' rule tells us that we should change our beliefs based on the data that we observe. Humans are natural Bayesians, but maybe not so much in this polarized world. Bayes' rule gave rise to a framework of statistics called Bayesianism, but most students are trained in frequentist statistics instead. These two schools have a lot of beef, but that's a subject for another video.

Now we'll use these probability tools to our advantage. Statisticians start by defining a population. A population is a group that we're interested in but often don't have the resources to fully observe. Instead, we're forced to collect data from a small subset, which we call a sample. Next, we assume that the data was generated by a probability distribution or mathematical formula. This assumption is our statistical model. Statisticians translate an aspect of this population into a parameter within this model. Because the population is unobservable, this parameter is also unknown. Our goal is to use the data to construct a guess for the parameter, which we call an estimator. This process is called inferential statistics, because we're trying to infer something about an unknown population based on collected data.
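Here is a minimal R sketch of the measures of central tendency, scale, and quantiles described above. The simulated data and its parameters are arbitrary assumptions for illustration, not values from the video.

# Minimal sketch: central tendency, scale, and quantiles in base R.
set.seed(2)
x <- rnorm(500, mean = 10, sd = 3)   # simulated continuous data

mean(x)      # central tendency: the sample mean (estimate of the expectation)
median(x)    # central tendency: the middle of the data in terms of the CDF
var(x)       # scale: sample variance
sd(x)        # scale: standard deviation, in the original data units

# Quantiles and percentiles come from the (empirical) CDF
quantile(x, probs = c(0.25, 0.5, 0.75))
ecdf(x)(10)  # empirical CDF evaluated at 10: proportion of values <= 10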
This is distinct from descriptive statistics, which are used to describe the data we collect.

The first estimator that people learn about is the sample mean. We learn about the sample mean because it has many good qualities as an estimator. The law of large numbers tells us that when we collect large amounts of data, the sample mean will get very close to the population mean; we call this consistency. Because samples are random, by extension the sample mean is also random. This means that estimators are also random variables, and it's crucial that we understand the distribution of the estimator. This distribution is so special that we give it a name: the sampling distribution. If we know what it is, we can tell whether observing a single sample mean is likely or rare, depending on the distribution we find. The central limit theorem tells us that a function of the sample mean follows a standard normal distribution, assuming that we have lots of data. The law of large numbers and the central limit theorem are examples of asymptotic theorems, and they're the reason we try to get the biggest sample sizes we can. When asymptotics don't apply, we can possibly turn to other methods, like the bootstrap.

Statisticians translate beliefs about a population into statements about the population parameters. Sampling distributions are crucial to understanding hypothesis tests. There are two hypotheses we create in a hypothesis test. The first is the null hypothesis, which represents a belief about the world that we want to disprove. The second is the alternative hypothesis, which opposes the null. As an example, our parameter of interest will be the difference between treatment groups. Our null hypothesis is that there is no difference between the groups, so the parameter is equal to zero. As we collect data, we have a decision to make. We assume the null hypothesis is correct and extend our logic from there. If we assume the null hypothesis is true, it suggests a particular sampling distribution. Then we see where our estimator lies relative to this null distribution. We want to know the probability of observing our sample mean, or a more extreme value. This probability is known as the infamous p-value. If this probability is low enough, it suggests that it's unlikely that the world under the null hypothesis would have produced the sample mean that we got. We can also make a decision based on a confidence interval: a range of parameter values that could have realistically produced our sample mean. If this interval doesn't contain the null hypothesis value, then we can also reject. There's a duality between p-values and confidence intervals, so we know they'll lead to the same decision.

After making a decision, there are two ways that we can be wrong. A type 1 error happens when the truth is that the null hypothesis is actually correct, but we decide to reject it; this is like saying the treatment works when it actually doesn't. A type 2 error happens when the null hypothesis is actually false, but we fail to reject it; this is like saying good medicine doesn't work. Ideally, we want to minimize the probability that both of these errors occur, but minimizing one increases the chances of the other. Instead, we define a low enough probability that we can tolerate for a type 1 error, which we call the significance level. After setting this, we minimize the probability of a type 2 error, which is also known as maximizing power.
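A small simulation can show the law of large numbers and the central limit theorem in action. This is a rough sketch assuming a skewed exponential population with arbitrary sample sizes and replication counts, not a reproduction of anything in the video.

# Minimal sketch: the sampling distribution of the sample mean.
set.seed(3)
n <- 50

# Draw many samples from a skewed population and record each sample mean
sample_means <- replicate(5000, mean(rexp(n, rate = 1)))

# Law of large numbers: the sample means cluster around the population mean (1)
mean(sample_means)

# Central limit theorem: the sampling distribution looks approximately normal
hist(sample_means, breaks = 40, main = "Sampling distribution of the mean")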
There are lots of hypothesis tests we can conduct, depending on the question we want to answer. If we want to characterize the population, we can perform a one-sample test, but if we want to compare two groups, we can use a two-sample test. The central limit theorem tells us that the sampling distributions for these will be normal, and we can take advantage of this to produce a z statistic. Z is usually used to denote a standard normal variable, with zero mean and unit variance. Z statistics assume that we know the population variance, but this is unrealistic in practice. If we have to estimate this too, it converts the z statistic into a t statistic. Don't be surprised, but a t statistic comes from a t distribution, which has a slightly wider shape than a normal. If we're comparing three groups, we can use an analysis of variance, or ANOVA, to check whether they all have the same mean. All of these tests assume continuous data or large sample sizes, but if we're dealing with binary data, we can construct a contingency table and perform a chi-square test.

These hypothesis tests are all types of univariate analyses, since they focus on single random variables. If we want to check relationships between variables, we need regression. Linear regression lets us see how one variable influences another. If the outcome is binary or a count, then we can use a generalized linear model instead. To estimate the parameters in these regression models, we need to turn to maximum likelihood estimation or an optimization algorithm like Newton-Raphson. You might suspect that treatment effects will vary over time, so you can collect data from people across multiple occasions. If you do this, you'll no longer have independent and identically distributed data, so you'll need to shift to longitudinal modeling. For this we can use a GEE model or a mixed effects model to account for the clustering effects. We may also want to include multiple predictors in all of these models, and choosing this set is called variable selection. We can do this by asking experts in the field or checking past research. Another rule of thumb is to include potential confounders, which can muddy the predictor-outcome relationship if you don't account for them.

It's very important to know how the data has been collected. If you have data from a randomized controlled trial, the results from your experiment could be considered causal instead of correlational. Without this randomization, our data comes from an observational design, and we definitely can't make causal statements with it. This is why statisticians care heavily about experimental design: so that we know precisely what we can conclude from our data.

In statistics, we must always keep our assumptions in mind. Models are assumptions themselves, and we may have more depending on the model. The models I've discussed so far are examples of parametric statistics. If we don't want to use a parametric model, we can use a non-parametric model instead; for example, the Mann-Whitney test is a non-parametric form of the two-sample t-test. Then there's also semi-parametric statistics. The most famous semi-parametric model is the Cox model, used in survival analysis. One part of the model is parametric and describes how survival changes with the treatment, while the non-parametric part is an entire function called the hazard function. Inference and description are not the only statistical goals. There's also prediction, where we try to predict the value of a future observation based on a model we've estimated. This starts to venture into the field of machine learning, where you'll start to see black-box models.
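As a rough sketch of how these tests and models look in practice, the base R calls below run a two-sample t-test, its Mann-Whitney counterpart, a chi-square test on a contingency table, a linear regression, and a logistic generalized linear model. The simulated data, effect sizes, and variable names are arbitrary assumptions, not examples from the video.

# Minimal sketch: common hypothesis tests and regression models in base R.
set.seed(4)
group_a <- rnorm(30, mean = 5,   sd = 1)
group_b <- rnorm(30, mean = 5.5, sd = 1)

t.test(group_a, group_b)        # two-sample t-test (Welch version by default)
wilcox.test(group_a, group_b)   # Mann-Whitney: its non-parametric counterpart

# Chi-square test on a 2x2 contingency table of binary variables
treated <- rbinom(100, 1, 0.4)
outcome <- rbinom(100, 1, 0.5)
chisq.test(table(treated, outcome))

# Regression: a linear model for a continuous outcome,
# and a generalized linear model (logistic) for a binary outcome
x      <- rnorm(100)
y_cont <- 2 + 0.5 * x + rnorm(100)
y_bin  <- rbinom(100, 1, plogis(-0.2 + 0.8 * x))

summary(lm(y_cont ~ x))
summary(glm(y_bin ~ x, family = binomial))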
Modern statisticians deal with very specific but exciting modeling problems. It's very common to assume that the number of predictors is much less than the sample size, but in fields like genetics this is rarely the case, so we need high-dimensional statistics. Another exciting area is causal inference, which encompasses a set of techniques that allow us to make causal statements from observational data. One of the pivotal ideas in causal inference is the counterfactual framework. As I've mentioned before, we need several assumptions in statistics; if these assumptions are violated, then our models are useless. Some researchers develop robust statistics that enable proper inference even if these assumptions are wrong.

If you're excited to start doing statistics, how do you start? I recommend learning a statistical programming language so you can start playing with these ideas. Python is good, but it's a general-use language; I recommend picking up R, since it's dedicated to statistical analysis. R makes it easier to examine data and perform exploratory data analyses that can help inform how we model data. Statisticians use programming to do extensive simulation studies to test new models before they can publish their work, so that's also another skill to learn. Finally, there's no free lunch in statistics. Statistics is hard work, and cushy jobs aren't easily earned. Everyone has data, so everyone will eventually need a statistician. Hopefully, by watching this video, someone will finally need you as well. Thanks for watching, and I'll see you in the next one.
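To make the simulation-study idea from the transcript concrete, here is a minimal R sketch that estimates the type 1 error rate of a two-sample t-test when the null hypothesis is true. The sample size, number of simulations, and significance level are arbitrary choices, not values from the video.

# Minimal sketch of a simulation study: type 1 error rate of a two-sample t-test
# under a true null hypothesis (no difference between groups).
set.seed(5)
n_sims <- 2000
n      <- 30
alpha  <- 0.05

p_values <- replicate(n_sims, {
  x <- rnorm(n, mean = 0, sd = 1)
  y <- rnorm(n, mean = 0, sd = 1)   # same mean, so the null hypothesis holds
  t.test(x, y)$p.value
})

# Proportion of false rejections; should be close to alpha
mean(p_values < alpha)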
Info
Channel: Very Normal
Views: 12,814
Keywords: statistics, biostatistics
Id: UhJ_F4uovgE
Length: 13min 46sec (826 seconds)
Published: Mon Jul 31 2023