Crosstabulations and their Interpretation. Part 1 of 2 on Crosstabulations and Chi-square

Video Statistics and Information

Captions
Okay, as you can see, this week's topic is tables and cross tabulations, chi-square, and reporting results. This is the week in which you actually do a statistical test, and I'll talk more about what that is in a few moments. Tables and cross tabulations are the main thing: laying data out in tables, otherwise called cross tabulations, and seeing how you can talk about that data, what you can analyze from it and what you can see in it.

Let me start with this slide: descriptive versus inferential stats. So far, the statistics I've talked about, particularly in the session two weeks ago, are what are called descriptive statistics: things that describe the distribution. They tell you where the middle is, how spread out it is, and so on. Things like the mean, the median, the mode, the standard deviation and the interquartile range are all ways of describing the figures you've got, and the statistics that do that are referred to as descriptive statistics.

What I'm going to go on to talk about this week, though, is a different kind of statistic, the inferential statistic. This enables us to make decisions about how variables are related, so how one variable in your data set relates to another variable, to draw conclusions, or even to make generalizations about those figures. By generalizations, I mean taking it from your sample to the population from which that sample was drawn. So we can make decisions about whether those relationships are significant, what they mean, and so on, on the basis of these inferential statistics, and the chi-square I'll talk about this week is one of those inferential statistics. They're called inferential because we can infer a certain kind of significance based on the values we get in those statistics.

Okay, I've already talked about relationships. We're looking at relationships here, and the kind of
relationships I'm talking about this week, and that you'll be doing in the labs, are those expressed through tables, which are often called contingency tables or cross tabulations. What they are is an arrangement of rows and columns, just like the data matrix I talked about a few weeks ago. In this case, of course, both the rows and the columns are variables from your data set, and what's included in the cells is the actual counts: how many individuals fell into that cell, in other words how many had the value for that column and the value for that row.

The variables we can use here are generally nominal or categorical variables, but you can include some ordinal ones too, things where you've got a scale from, say, one to five. You don't want too many values, though; it usually makes sense to have just a few, and that's why nominal or categorical variables tend to be what we use. In fact, if you're using ordinal variables, there are better tests to use than chi-square, so you might still produce a table, but there are better ways of testing the relationships. So what I'll be focusing on today is nominal or categorical variables, that is, variables where we're simply sorting our respondents into categories: they're one of those, or one of those, or one of those. They're male or female, their age is in this group or that group, and so on. That's what nominal or categorical means.

As I said, the table contains the counts or frequencies of those individuals: any particular cell contains the number of individuals that fall into that column and that row of the table. One variable is spread across the horizontal axis, so the different columns represent different values of that variable, and the other variable is spread down the vertical axis, across the different rows, so that each row indicates a particular value of that variable. We also have the margins: what we
tend to do is, on the right-hand side and at the bottom of the table, put the totals. So there's the total number of individuals in each row, across all the different values of the column variable, and then the same for the columns: the column totals are the numbers of individuals that fall into each column in total.

Let's have a look at an example. I've gone back to this data set again, which I've used before. You don't need to look at the details of the actual values; it's a very simple one, from a class held on this course many years ago. It's quite a small data set, 34 individuals. I've got information here about their gender or sex in this column, male or female, and I'm going to use the one on the far right, which is the age group. I recoded age into those who are under 21 and those who are 21 and over. So I'm going to ask the question: are women more likely to be under 21 than men? In other words, do more of the women tend to be in the younger age group than the men?

If I simply count these up (and I'll show you in a moment how to do this in SPSS; it's a couple of clicks) and rearrange the data, this is the table you get. It's a very simple table, in fact the simplest contingency table you can produce: a two-by-two table. There's actually a reason for focusing on that, which I'll tell you in a moment. What you can see is that you've got two columns for the two values of the variable sex, male and female, and two rows, under 21 and 21 and over, for the two values of the age group variable. Inside the cells, the numbers 3, 13, 6 and 12 are the actual numbers of individuals. You can see I've got the marginals in on the far right: I've got 16 altogether under 21, male and female together, and 18 who are 21 and over, making a total of 34 people in the whole group. Now, what we want to ask is:
is there a relationship between gender and the age groups in the rows? Is there some sense in which there are more males in the younger age group than we might expect, or more females than we might expect in the younger age group? Now the problem here, of course, is that the column totals are different. It's not easy to compare 3 with 13, because you've got 9 males altogether and 25 females altogether, so you have to do a bit of calculation in your head to work out what that means. How does 3 compare with 13, given that you've got 25 women altogether in the group? What we do is produce some percentages that help us understand that.

Here's one way of doing that: comparing the ages of the males and females by producing column percentages. What we're doing is expressing the number of individuals in each cell as a percentage of the column as a whole. So I'm asking, what percentage are the three males under 21 of all the males in that column? Three out of nine is one third, and you can see here I've got 33%. For the females under 21, you take the actual number in here, that's 13, divide by 25 and multiply by 100 to make it a percentage: that's 52 percent. Now, you don't have to do this yourself. All of this is done for you by the program; that's what we have computers for. But I'm illustrating it because this is what's going on behind the scenes.

The point about doing this is that we can now compare the columns. Both columns add up to 100%, so we've evened out the difference in the numbers of males and females, and we can now see more clearly that of the males only a third are under 21, but of the females 52% are under 21. So in this data set we've got a clear difference: proportionally, there are more females in the younger age group than males. Now, that's certainly true of that particular group of individuals. Is it true of all people on all courses and so on? If we sample the
larger population, is it true of all of those? We don't know that. We know it's true of our particular sample, but we might want to talk about whether it's true of the more general population from which our sample is taken: students doing this kind of course at university. I'll come back in a moment to how you do that.

Before I do, we're going to be using SPSS for this. So that's my very simple two-by-two table. If you do a printout from SPSS and you tick all the boxes for all the options to include things, you get a rather complex table like this, but in fact it's the same table as the one I've just shown you. If I go back, the 3, 13, 6 and 12 are all here. The 13 is at the top, in the first box, for the female under-21s. Sorry, I think I've swapped round the female and male columns, so the 13 that was on the right is now on the left; that doesn't really matter at all. The 12 and 6 are down below.

In addition, I've got an expected count. I've got a percentage within the age group, that is, the percentage across the row, so it adds up to a hundred percent across the row. And I've also got the column percentages, which I've already worked out for you, the percentage within sex of students: 52 percent of females are under 21 and 48 percent are 21 and over. So that's done for you as well. It also gives percentages and expected values for the marginals, so you get a lot of numbers on the sheet if you're not careful and you tick all the boxes. You don't have to do that: you have an option in SPSS to decide which of these extra percentages and expected values you want, so it's up to you to print out what makes sense for talking through the table and seeing the pattern. Some people, including the presenter in the video I'm about to show you, actually use the
expected counts as a way of working out what the pattern is, and you can do that here. You could say: if we assume that the females are distributed across the age groups as the group as a whole is, that's male and female together, then we'd expect 11.8 females in the first cell rather than 13. If the males were spread across the groups in that expected fashion, we'd expect 4.2 under 21, but we've actually got three. So for the females we have more than expected and for the males fewer than expected, and that suggests there's a pattern there, the pattern I've just told you about: there are more males in the older age group, and more females in the younger age group, than you might expect.

There's one other thing to bear in mind about this table, and it concerns the expected counts. One of the problems with using the chi-square statistic on this is that if you've got cells with expected values of less than five, it can be an unreliable statistic. You can see I've actually got that here: an expected value of 4.2 for the males under 21 and 4.8 for the males 21 and over. So you have to be a bit careful about using these particular figures. In fact, the example I'll show you will be different.
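
The arithmetic described in the captions (marginal totals, column percentages, expected counts, and the check for expected values below five) can be reproduced outside SPSS. What follows is a minimal sketch in plain Python, not the SPSS procedure used in the lecture. The counts 3, 13, 6 and 12 come from the lecture's table; the function names and the nested-dictionary layout are assumptions made purely for illustration.

```python
# Observed counts from the lecture's 2x2 table (rows: age group, columns: sex).
observed = {
    "Under 21":    {"Male": 3, "Female": 13},
    "21 and over": {"Male": 6, "Female": 12},
}

def margins(table):
    """Return (row_totals, column_totals, grand_total) for a nested-dict table."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    return row_totals, col_totals, sum(row_totals.values())

def column_percentages(table):
    """Express each cell as a percentage of its column total."""
    _, col_totals, _ = margins(table)
    return {row: {col: 100 * n / col_totals[col] for col, n in cells.items()}
            for row, cells in table.items()}

def expected_counts(table):
    """Expected count for a cell = row total * column total / grand total."""
    row_totals, col_totals, total = margins(table)
    return {row: {col: row_totals[row] * col_totals[col] / total for col in cells}
            for row, cells in table.items()}

col_pct = column_percentages(observed)
expected = expected_counts(observed)

# Cells whose expected count is below 5, the caveat raised in the lecture.
low_expected = [(row, col) for row in expected for col in expected[row]
                if expected[row][col] < 5]

for row in observed:
    for col in observed[row]:
        print(f"{row:12} {col:7} count={observed[row][col]:2d} "
              f"col%={col_pct[row][col]:5.1f} expected={expected[row][col]:5.1f}")
print("Expected count below 5:", low_expected)
```

Under these assumptions the sketch gives column percentages of about 33.3% (males) and 52% (females) for the under-21 row, and expected counts of about 4.2 and 4.8 for the two male cells, matching the figures quoted in the captions.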
Info
Channel: Graham R Gibbs
Views: 72,644
Keywords: Chi-squared Distribution, Quantitative, Quantitative social research, Nominal, Categorical, Crosstabulation, Cross tabulation, Contingency tables, Percentages, Marginals
Id: B6bqHNVd-Kw
Length: 12min 32sec (752 seconds)
Published: Tue Mar 11 2014