Introduction to Statistics: What Are They? And How Do I Know Which One to Choose?

Video Statistics and Information

Captions
Welcome. This is Dr. Amanda Rockinson-Szapkiw, and this tutorial will provide an introduction to statistics. We will specifically define the two branches of statistics, descriptive and inferential; we'll look at the factors that are needed for choosing a statistical method; and finally, we'll define and look at the difference between parametric and nonparametric procedures. Let's get started.

The field of statistics exists because it is usually impossible to collect data from all individuals of interest, that is, from an entire population. Our only solution, then, is to collect data from a subset, or sample, of the individuals of interest, because what we really want is to know more about the population.

Whenever we collect data from a sample, there are two different types of statistics we can run. The first is descriptive statistics. Descriptive statistics are used to describe the sample or summarize information about it; examples include the mean, median, mode, and standard deviation. Then there are inferential statistics. Inferential statistics are used to make inferences, or generalizations, about the broader population based on the sample data; examples include t-tests, ANOVAs, and correlation analyses.

The type of descriptive statistic that is reported, or the type of inferential method that we run, depends on several factors, and we're going to talk about those next. As we begin, it's important to note that there are some differences of opinion and controversy about these factors and how they should inform the choice of statistical analysis. Our discussion here will draw primarily from Warner, which is a really good text to reference in order to better understand both the concepts and the controversies that surround them. She discusses this information in depth; we're only going to do a brief overview here. For many research situations, there
is usually more than one reasonable and appropriate statistical analysis. So, when selecting a statistical method, convention says that the following should be considered: research design, level of measurement (or type of variable), and assumption violations, including whether the population is normally distributed. As I said, we're going to briefly discuss these factors here; however, I encourage you to read texts such as Warner and Tabachnick and Fidell, as they discuss these factors in more depth.

The first factor we're going to discuss is research design, and here you can see the quantitative research designs listed. We're going to talk about quantitative research designs as Campbell and Stanley and Creswell discuss them. The first type of research design we're going to look at is descriptive studies. The aim of a descriptive study is to understand what is in a specific situation with an identified population. For example, let's say an educational researcher wants to know the overall sense of community of online students. The researcher does not attempt to manipulate or exert control over the phenomenon being studied; rather, he or she simply wants to observe and measure it as it occurs. The researcher does not seek to examine any cause-and-effect relationship between variables, so there is no independent variable and no dependent variable in a descriptive study. Descriptive studies are often used simply to gain knowledge about an identified problem before more sophisticated research is done. Descriptive studies vary in rigor, so if you are conducting a research study for, let's say, a thesis or a dissertation, it's important to know your university's policy on the use of descriptive studies for this endeavor. What's important to understand here is that descriptive studies are used to understand what is within a specific situation and identified population. Next, we are going to discuss correlational research.
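To make the descriptive statistics mentioned at the start of this tutorial concrete, here is a minimal Python sketch that summarizes a hypothetical descriptive study of online students' sense of community. The ten scores are invented for illustration, `describe` is a hypothetical helper name, and only the standard library's `statistics` module is used. Note that nothing here generalizes beyond the sample; that is the job of inferential statistics.

```python
import statistics


def describe(scores):
    """Summarize a sample with common descriptive statistics."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "sd": statistics.stdev(scores),  # sample standard deviation (divides by n - 1)
        "min": min(scores),
        "max": max(scores),
    }


# Hypothetical sense-of-community scores for ten online students (invented data)
scores = [32, 28, 35, 30, 28, 40, 33, 29, 31, 34]

for name, value in describe(scores).items():
    print(f"{name}: {value}")
```

`statistics.stdev` uses the sample (n - 1) formula; `statistics.pstdev` would be the choice if the scores were treated as an entire population.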
Correlational research examines the extent to which two or more variables relate to one another. For example, a researcher doing a correlational study may want to know whether a relationship exists between online students' sense of community and their course grades, or how well sense of community scores predict course grades. As these example questions imply, correlational research designs include two types of studies: prediction studies and correlational studies. Prediction studies focus on the ability of one or more variables to predict another variable, whereas correlational studies simply examine the relationship between variables. Researchers often conduct a correlational study as exploratory or beginning research to determine whether more rigorous research is warranted. Some researchers even consider correlational research to be observational research, because the researcher does not manipulate any of the variables. Based on the example questions we just looked at, the researcher would only need to collect students' sense of community scores and course grades to conduct his or her study; no manipulation is needed. The researcher who chooses to conduct correlational research is simply examining whether a relationship exists between or among variables. The researcher cannot make any statements about a cause-and-effect relationship, because he or she does not know the direction of the cause and cannot guarantee that one variable is influencing another. This is why you might have heard a statistician say that correlation does not equal causation; experimental research is needed to determine whether a cause-and-effect relationship exists.

That brings us to group comparison studies. We're going to start with the causal-comparative study, which is actually a
nonexperimental design. A causal-comparative study examines the possible cause-and-effect relationship between variables. It's an ex post facto study, which means that the phenomenon is studied after the fact, that is, after it occurs naturally in the environment or has already been manipulated. For example, a researcher may want to examine whether males and females differ in their sense of community when taking an online course. It's impossible for the researcher to manipulate the independent variable of gender, so the researcher would conduct a causal-comparative study. Like a descriptive study, a causal-comparative study does not attempt to manipulate or exert control over the phenomenon being studied; however, unlike a descriptive study, it does seek to examine the possible cause-and-effect relationship between the variables, so there is an independent variable and a dependent variable. I'll stop here and note that I said "possible" cause and effect: because there are extraneous or confounding variables that are not controlled for in this type of design, the results of the research can only suggest that one variable may cause another, and more rigorous experimental designs are needed to verify these results.

So let's go ahead and talk about experimental designs. Campbell and Stanley (1963) proposed that there are three experimental designs: pre-experimental, quasi-experimental, and true experimental. The distinguishing characteristic of all experimental designs is that the researcher manipulates the independent variable. Let's start with the true experimental design. The purpose of a true experimental design is to examine the cause-and-effect relationship between variables. True experimental designs are often referred to as the gold standard in research practice because of their rigor and their controls for threats to internal
and external validity. The best way to think about a true experimental design is in terms of its three characteristics. The first is manipulation. Remember, I just said that the characteristic of all experimental studies is manipulation, so the true experimental design is characterized by manipulation: the researcher manipulates the independent variable, or implements an intervention, and observes the effect of that intervention on a dependent variable. The second characteristic is control. A control or comparison group, which is any group that does not serve as the treatment group, is used so that the researcher can compare the treatment group to the control or comparison group. A great resource for learning more about comparison groups is Kazdin (2003). So now we've talked about two characteristics of a true experimental design, manipulation and control; let's move on to the third, which is randomization. Randomization is not to be confused with random sampling. Randomization involves the assignment of participants to groups on a random basis, which means that every participant has an equal chance of being assigned to any group. Let's say we have a control group and a treatment group: every participant who agrees to be part of the study has an equal chance of being assigned to either the treatment group or the control group. According to Campbell and Stanley, randomization allows the researcher to assume group equivalence. However, it is important to note that researchers such as Gall and Borg suggest that group equivalence cannot be certain without a pretest, while Campbell and Stanley note that adding a pretest introduces a threat to internal validity. We won't go into great detail about that here; let's just say that randomization allows the researcher to assume group equivalence. So now, to
review: the three characteristics of a true experimental design are manipulation, control, and randomization. I'll also note here that many researchers and research texts say that random sampling from the population is also needed for a true experimental design, but of the three characteristics we just talked about, randomization is really what distinguishes a true experimental design from a quasi-experimental design.

Now let's talk a little bit about quasi-experimental designs. Quasi-experimental designs aim to determine the causality of an intervention or treatment with a target population. Very similar to the true experimental design, quasi-experimental designs use a control group and allow the researcher to manipulate the treatment, but they do not include random assignment of participants. So there are two primary characteristics of a quasi-experimental design: manipulation and control. Quasi-experimental designs use existing groups, such as classrooms of students, and assign them to either a treatment or a control group. Quasi-experimental designs can also use participants as their own controls, as we see with within-group designs and time-series designs. But let's look at the idea that a quasi-experimental design uses existing groups. Let's say a researcher conducting a quasi-experimental study asks: what effect does participation in an online educational statistics course developed using a problem-based pedagogy have on online students' achievement, compared to participation in an online statistics course developed using a traditional pedagogy? And let's say in this example that Mrs. Smith's class, which has already been formed, receives statistics lessons developed using the problem-based pedagogy, and Mr.
Jones's class, which has also already been formed, receives lessons developed using the traditional pedagogy. Because a quasi-experimental design uses groups that have been previously formed, the researcher needs to ensure that the groups are relatively similar to one another; in our example, that Mrs. Smith's class and Mr. Jones's class are similar in terms of their previous achievement in statistics, as well as other variables that could affect the dependent variable, such as gender and ethnicity. Any extraneous variable needs to be identified and controlled for. Often, the way this is done in quasi-experimental research is to use a pretest: a pretest is given, and it is controlled for statistically using analyses such as ANCOVA or MANCOVA, which we'll discuss in depth in later tutorials.

Last but not least is the pre-experimental design. We're not going to spend a lot of time discussing it, because Campbell and Stanley say these designs are of little value due to severe threats to internal validity. What's important to note here is that pre-experimental designs are usually conducted to obtain preliminary research data on the effectiveness of an intervention or treatment. They are distinguished from true experimental and quasi-experimental designs because they often do not have a control group or do not include a pretest, and, as I said, they have severe threats to internal validity.

As we conclude our discussion of the different types of research design, what's really important to understand is the reason for, and purpose of, each of these designs. I want to make a quick note here: every research and statistical text talks about designs in a different manner. The purposes of the designs do not change; however, sometimes they're classified in different ways or called by different names. Statistical texts such as Warner classify research designs as they're listed
here: experimental, quasi-experimental, nonexperimental, and other approaches such as case studies, time series, qualitative methods, and instrument development. One distinction that Warner discusses in more depth is the difference between within-subjects and between-subjects designs. Let's take a look at and define each of these, because they become very important in choosing statistical analyses.

Let's start with between-subjects, or between-groups, designs. This means that every participant is tested under one and only one condition. For example, in a randomized experiment with a treatment condition and a control condition, each participant is tested either under the treatment condition or under the control condition. Another way to say this, with slightly different terminology, is that a between-groups design is an experimental design in which different participants are assigned to different conditions in the experiment; that is, the individuals who make up the control group are different people from those who make up the experimental group.

Now let's talk a little bit about the within-subjects design. Sometimes it's desirable to use an experimental design in which every participant is tested under all of the conditions. This is called a within-subjects design; you'll also hear the terms repeated measures design or time series design. For example, let's say that we want to know about students' level of concentration when studying statistics, and we want to look at two different conditions, a quiet condition and a noisy condition. We take a group of participants and test their level of concentration first in a noisy condition, and then we put the same group in a quiet condition and test their level of concentration again. This is an example of a within-subjects design. Now, Campbell
and Stanley discuss within- and between-subjects designs under what they term quasi-experimental designs. Understanding the difference is important because different analyses are used for within-group designs and for between-group designs. So it's important to know the different classifications within your field, but the important thing to understand, as I said when we were discussing the previous slide, is the purpose of the design, because ultimately the purpose of the design will help you choose the appropriate statistical method.

Let's take a look at an example of this. The type of research design does not determine the choice of statistical analysis, but it can guide it, because the purpose of the design and the analysis should align. Let's first look at between-group experiments. Because between-group experiments often involve the comparison of means for quantitative variables across groups, independent t-tests and between-group ANOVAs are often applied to examine this type of experimental data, whereas a dependent t-test or within-group ANOVA would not be appropriate. A dependent t-test or within-group ANOVA would be more appropriate for a within-group experiment, which involves comparing means for quantitative variables measured repeatedly on one group. And sometimes a researcher simply wants to conduct a nonexperimental design, such as a correlational study, which, remember, involves measuring quantitative variables and examining the relationship between them. An analysis aimed at the relationship between variables, such as Pearson's r or Spearman's rho, is often used, so in these nonexperimental situations the data are examined using a correlational analysis. We'll talk about each of these types of analyses in more depth in future tutorials, but here it's simply important for you to understand how the purpose of the research design informs and
guides the choice of the statistical analysis.

While the research design guides the choice of the analysis, the choice of the actual statistic is based on the type of variable used, either quantitative or qualitative, or on how the variable is measured, that is, its level of measurement. We can first discuss variables as they're described in many statistical texts, that is, as measured on one of four levels of measurement, which were first described by Stevens in 1946. The four levels of measurement are nominal, ordinal, interval, and ratio. Nominal, or categorical, variables are variables that can be put into categories, such as religion or gender. Ordinal variables are categorical and rank-ordered: they can be put into categories and then ranked, such as socioeconomic status described as low, moderate, or high. Finally, there are interval and ratio variables, which are measured variables; interval variables do not have an absolute zero, whereas ratio variables do. We can also distinguish between two major types of variables: qualitative variables and quantitative variables. Qualitative variables are non-numeric, categorical variables, often described with words, and quantitative variables are variables that are measured numerically. For the purpose of this tutorial, we're not going to discuss these different types of variables in depth; for a more in-depth discussion, I encourage you to review the variable tutorial, part 1.

The type of variable that you have in a research study, or the variable's level of measurement, influences the type of graph that's used and the choice of descriptive statistics reported. Here you can see an example: nominal, qualitative, or categorical variables are often reported and described using frequency charts, bar charts, numbers, percentages, and modes, whereas
quantitative variables, that is, interval and ratio level variables (as well as ordinal variables), are described using frequency tables and histograms and with the mean, median, standard deviation, minimum, and maximum. The level of measurement, or type of variable, also determines the choice of inferential statistic: whether a nonparametric or parametric analysis is used. Here you'll see that nominal and ordinal levels of measurement can only be analyzed using nonparametric analyses, whereas interval and ratio levels of measurement, which are both quantitative, can be analyzed using parametric analyses. I will note one exception here, especially in educational research: Likert-type scale data. If we really think about it, Likert-type scale data are ordinal data; however, they are treated like interval data in educational statistics and are therefore analyzed like interval and ratio level data. So Likert-type scale data on surveys, especially the validated surveys that we use in educational research, can be analyzed using parametric statistics. But that's just a small aside; the point here is that the level of measurement, or type of variable, determines the choice of either a nonparametric or parametric analysis.

Parametric and nonparametric are two broad classifications of statistical procedures. It's generally easier simply to list examples of each type than to actually define the terms, and we'll talk about why in a moment. Take a look at the list of parametric techniques and their alternative nonparametric techniques. Examples of parametric techniques include the independent samples t-test, the one-way ANOVA, and the repeated measures ANOVA, whereas nonparametric techniques include the chi-square test, the Kruskal-Wallis test, and the Mann-Whitney U test. As I said,
it is generally easier to list examples of each type of procedure, parametric versus nonparametric, than to define them. In fact, the Handbook of Nonparametric Statistics from 1962 (I know that's quite a few years ago, but this still holds true today) says this: "A precise and universally accepted definition of nonparametric is presently not available. The viewpoint adopted in this handbook is that a statistical procedure is of nonparametric type if it has properties which are satisfied to a reasonable approximation when some assumptions that are at least of a moderately general nature hold." As you can tell, this definition is not in the least bit helpful, but it does underscore the fact that it is sometimes difficult to define the terms parametric and nonparametric. So, for our practical purposes, we are going to say that nonparametric statistical procedures rely on no or few assumptions about the shape or parameters of the population distribution from which the sample is drawn. Alternatively, a parametric technique makes a number of assumptions about the population from which the sample is drawn, for example, that scores in the population are normally distributed, and it also makes assumptions about the nature of the data, such as that they are measured at the interval or ratio level.

So what does this mean for us? You conduct a nonparametric analysis if you have nominal or ordinal data, but when working with interval or ratio level data, as we've already said, you plan to use a parametric analysis. However, if the data you collect, even when measured at the interval or ratio level, majorly violate the assumptions, then it is more appropriate to use the nonparametric alternative. This brings us to our third factor for choosing a statistical procedure: assumption violation. Parametric procedures are based on a set of assumptions, such as that the distribution of the underlying
population from which the sample is taken is normally distributed, that there are no extreme outliers, and that the variances of scores are approximately equal across the populations that correspond to the groups being studied, which is called homogeneity of variance. Nonparametric tests, by contrast, do not rely on these assumptions about the shape and parameters of the underlying population distribution. For each parametric analysis there is a specific list of assumptions that need to be met, so when you collect data at the interval or ratio level, you do assumption testing. It's important for you to understand what the assumptions are for each parametric analysis and how they can be tested, because if the data deviate strongly from these assumptions, a parametric procedure may not be the best choice, and a nonparametric procedure may be the better one. If you continue with the parametric procedure despite major violations of its assumptions, you could draw wrong conclusions. Warner, Tabachnick and Fidell, and other statistical resources note that some minor violations of assumptions are acceptable, and they discuss when that is the case. For example, the parametric assumption of normality is particularly worrisome if you have a small sample size, let's say under 30; with a larger sample size it may not be as worrisome, and you may be able to continue with the parametric analysis. As you start looking at individual procedures, it's important to consult statistical texts about this, but the important thing to remember is that when you collect interval or ratio level data and find that assumptions are violated, the parametric analysis may not be the best choice, and a nonparametric analysis may be a better option.

So if parametric procedures have all these rules and regulations, and you have to meet all these assumptions, why not simply do a nonparametric
analysis? Well, although nonparametric procedures have very desirable properties, such as making fewer assumptions about the distribution of measurements in the population from which we draw the sample, there are two major drawbacks. The first is that nonparametric procedures generally have less statistical power than their alternative parametric procedures, especially when the data are approximately normal. Less power means a smaller probability that the procedure will tell us that the variables under study are associated or different; in other words, there's less of a chance that we'll find statistical significance. For a nonparametric analysis to have as much power as a parametric analysis, it often requires a larger sample size. One of the nice things about nonparametric analyses is that we can conduct them with smaller sample sizes, but again, we have less power. The second drawback is that the results of nonparametric procedures are often less easy to interpret and make sense of than the results of parametric analyses, because many nonparametric procedures use the ranks of the values in the data, rather than the actual raw data, in the analysis. So even though a nonparametric analysis may seem preferable because you don't have to meet all the assumptions, the reality is that nonparametric analyses are less powerful and often more difficult to interpret. If given the option, the parametric option is the better option.

We've now identified three factors for choosing a statistical method, and sometimes the easiest way to look at these three factors is to put them in question form. When you conduct a study, you can ask the following questions. The first question has to do with the type of variable, or level of measurement: what level of
measurement are the variables in my study? In an experimental or causal-comparative study (the group comparison studies, as Campbell and Stanley categorized them), the dependent variable is really what we're concerned about. In a correlational study, whether it focuses on relationship or prediction, we're concerned about all of the variables, but in a predictive study we're mainly concerned about the criterion variable. Once we've identified the level of measurement of the dependent variable, the variables of interest, or the criterion variable, we can decide whether to conduct a parametric or nonparametric analysis. Remember, if our variables are nominal or ordinal, a nonparametric analysis is going to be the best choice; if our data are interval or ratio, a parametric analysis may be the best choice. However, if the variables are measured at the interval or ratio level, remember the third factor for choosing a statistical method: assumption violation. So we ask, are the assumptions for the parametric analysis that we chose reasonably met? If they are, we can continue with the parametric analysis; if they're not, we may want to consider the nonparametric analysis.

Next, we consider research design. Research design can be very helpful in guiding us to a specific statistical approach. The first question we want to consider in terms of research design is: is the design concerned with within-group or between-group comparisons? If it's a within-group comparison, then we're going to do a within-group analysis; if it's between-group, we're going to do a between-group analysis. The next question about design, which will lead us to a more specific analysis, is: is the design experimental or nonexperimental in nature, and is it concerned with differences or with relationships? Because if you
remember when we talked about research design, if it's concerned with differences, we may consider something such as a t-test or ANOVA, whereas if it's concerned with the relationship between variables, a correlational analysis such as Pearson's r may be more appropriate. And then finally: how many groups are we looking to study? One, two, three, four, five?

Now that we've talked about the factors and the questions that you can ask, let's practically apply them. Let's say that a researcher is interested in whether there is a difference in college students' course points based on whether they participate in an online statistics course or a residential statistics course. Here, our independent variable is the type of course, online or residential, and the dependent variable is course points. Remember, the first question we're going to ask is: what are the levels of measurement of the variables under study? Our independent variable is categorical, or nominal; however, our dependent variable is measured at the ratio level, and that's really what we're concerned about. Since our dependent variable is measured at the ratio level, the next question is: are the assumptions of the parametric analysis reasonably met? Remember, if we're dealing with ratio or interval variables, we're going to consider a parametric analysis as the appropriate analysis. Let's say, for the sake of this study, that our assumptions are met, so we can continue with a parametric analysis. The next question we're going to ask is about research design: is the design concerned with within-group comparisons or between-group comparisons? In this study, are we taking one group of students, putting them in an online course and then a residential course, and measuring their course points both times, or do we have two separate groups that we're comparing? Well, if we look at the study, it says
that we're looking at course points based on whether students participate in an online course versus a residential course, which implies that there are two separate groups, so we're going to do a between-group comparison. Finally, we ask: is the design experimental or nonexperimental? Let's say that in this case the researcher actually assigned the participants to a treatment, either residential or online, so the design is experimental, and it is concerned with the differences between the two groups. Because we're concerned with differences, that rules out correlational analyses such as Pearson's r. The next question is: how many groups are there? We have one independent variable with two groups, online and residential. Now that we have all of this information, we know that we're going to do a parametric analysis, and we can use one of Warner's decision trees to help us identify the specific statistical method. Here we see one of the decision trees that Warner uses for hypotheses of difference. Remember, in our example we determined that we were going to use a parametric analysis, that it was a between-group study, and that we had two groups, online and residential. If we follow the tree down, parametric, between groups, two groups, we find that the appropriate analysis is probably the independent samples t-test.

As I said before, sometimes it's not quite so simple. This is a simple scenario that we walked through; sometimes there are multiple analyses you could use. For example, here you could use an independent samples t-test, but you could also use a one-way ANOVA, and when we talk a little bit more about these different analyses
we'll talk about when one may be more appropriate than another. However, this helps you see how you can use the factors for choosing statistical analyses, and, as I said, we'll talk in more depth later about choosing specific statistical analyses. At this point, what's most important for you to know and understand is the difference between parametric and nonparametric analyses, the factors that go into choosing a parametric versus a nonparametric analysis, and other considerations, such as research design, that will help you later choose a specific statistical analysis.

This brings us to the end of the introduction to statistics tutorial. At this point, you should understand that there are two types of statistics, descriptive and inferential, and be able to define them; be able to identify the three factors for choosing a statistical method and how they inform the choice; and, finally, understand the difference between parametric and nonparametric procedures and when one is more appropriate to use than the other.
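The question sequence walked through in this tutorial (level of measurement, assumptions, within versus between groups, differences versus relationships, number of groups) can be sketched as a small rule-based helper. This is a simplified illustration under stated assumptions, not a reproduction of Warner's decision trees: `suggest_test` and `independent_t` are hypothetical names, only one independent variable is handled, and the Wilcoxon signed-rank and Friedman tests (standard nonparametric alternatives for within-group comparisons) are not named in the tutorial. The sketch also computes Student's pooled-variance t statistic for invented course-point data from the online-versus-residential example.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical encoding of the tutorial's questions; a simplified sketch,
# not Warner's actual decision tree.
PARAMETRIC_LEVELS = {"interval", "ratio"}


def suggest_test(dv_level, assumptions_met, purpose, design="between", n_groups=2):
    """Suggest a candidate analysis for a study with one independent variable.

    dv_level: "nominal", "ordinal", "interval", or "ratio"
    purpose: "difference" or "relationship"
    design: "between" or "within" (only used when purpose is "difference")
    """
    if dv_level == "nominal":
        # Categorical outcomes: compare frequencies rather than means or ranks
        return "chi-square test"
    # Parametric only if the DV is interval/ratio AND assumptions are reasonably met
    parametric = dv_level in PARAMETRIC_LEVELS and assumptions_met
    if purpose == "relationship":
        return "Pearson's r" if parametric else "Spearman's rho"
    if design == "between":
        if n_groups == 2:
            return "independent samples t-test" if parametric else "Mann-Whitney U test"
        return "one-way ANOVA" if parametric else "Kruskal-Wallis test"
    # Within-group (repeated measures) comparisons
    if n_groups == 2:
        return "dependent samples t-test" if parametric else "Wilcoxon signed-rank test"
    return "repeated measures ANOVA" if parametric else "Friedman test"


def independent_t(group_a, group_b):
    """Student's t statistic with pooled variance (assumes homogeneity of variance)."""
    n1, n2 = len(group_a), len(group_b)
    pooled_var = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, n1 + n2 - 2  # statistic and degrees of freedom


# The tutorial's example: ratio-level course points, assumptions assumed met,
# two separate groups (online vs. residential), concerned with differences.
print(suggest_test("ratio", True, "difference", "between", 2))

# Invented course points for illustration only
online = [82, 75, 90, 68, 85]
residential = [78, 88, 92, 80, 87]
t, df = independent_t(online, residential)
print(f"t({df}) = {t:.3f}")
```

The helper only returns a starting point; as the tutorial stresses, the assumptions still have to be tested before trusting a parametric result. To obtain a p-value, the t statistic is compared against a t distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(online, residential, equal_var=True)` runs the same pooled-variance test. With exactly two groups, a one-way ANOVA yields F equal to the square of this t, which is why either analysis could be used here.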
Info
Channel: The Doctoral Journey
Views: 316,116
Rating: 4.8524518 out of 5
Keywords: Statistics, descriptive statistics
Id: HpyRybBEDQ0
Length: 39min 53sec (2393 seconds)
Published: Mon Aug 26 2013