The SPSS Item response theory (IRT) analysis | New

Captions
[Music] Hi, this is Dr. Vahid Aryadoust. In this video I'm going to demonstrate how to do a three-parameter logistic IRT analysis. Now, there are three IRT models, or item response theory models: one being the 1PL, or one-parameter logistic model, which is similar to the Rasch model; the second one is the 2PL, or two-parameter logistic model; and the third one is the 3PL, that's the three-parameter logistic model. If you're interested to learn more about the technicalities of this kind of model, I would recommend that you read Birnbaum (1968). In addition to that, what I would like to suggest is that you take a look at our recently published paper in Studies in Educational Evaluation, which is a systematic review of item response theory in language assessment. If you are still interested to learn more about the technicalities of IRT models, one of the best books is Item Response Theory: Principles and Applications by Hambleton and Swaminathan.

Okay, so let's get started with our IRT analysis and see what we can get from the analysis today. There are a few steps to take before we start the IRT analysis using SPSS. We should remember that we need to have R 3.5 on our PC. If you have not installed it, please go to this website here and install it on your machine; that will be the first step. I'll leave this link in the comments section below this video so you can easily go there and download it. I think it should also work on Mac, but I'm not sure if it's exactly the same version; I do know that people have been able to install R 3.5 on their Mac machines. I've already done that; let me show you: this is my R 3.5. Then, after you install R 3.5, let me go to stage 4 and just quickly tell you what these four packages are. One of them is called the MASS package. You can copy this command, as I will leave it in the comments section again; copy it, paste it into R, and press Enter. After this window pops up, which is a CRAN mirror window, you can choose any of these
options. I'm going to choose the first one on top and click OK. Since I've already installed this, I got this sort of error message, but it should be fine. The rest of them are going to be installed in the same way: you just copy and paste each command into the R interface, press Enter, and they will be installed.

After doing this installation, we go to SPSS, and here is stage two. In SPSS we need to install STATS_IRM, and therefore I'm going to go back to SPSS, through the Extensions menu. In the previous video I discussed how to connect R with SPSS. Some of you might not have watched that video, so I'll quickly tell you how to do that. The first thing is to make sure that you have this additional option, which by default is not there in the SPSS interface. What you should do is add it before creating the connection between R and SPSS. So the first step is to go to Extension Hub. Once the connection to the Extension Hub is created, since I've already installed the R 3.5 package, as I discussed in the previous video, I cannot find it under "Not installed", but you will be able to find it under that section. So what I want to do is toggle between "Explore" and "Installed" and show you where that extension is and what it looks like, so that you can find it too. If you scroll down, you will find STATS R35, which is actually R 3.5. What you will need to do is click on a box like this, in exactly the same place, but for you that box will say "Install extension"; because I've already installed it, I've got "Uninstall this extension". If I click this, it will be uninstalled, so I'm not going to do that because I need it. After you select it, you just click OK and it will be installed for you. That's the first package that you need to install in SPSS. The other package that you need to install is called STATS_IRM, and that's again within SPSS itself. So let's look for STATS_IRM; it should
be around here. I have already toggled back to "Explore", under "Not installed", and I found STATS_IRM. An easy way to find it is to sort everything by name. Then tick "Get extension", click OK, Accept, and Finish, and you've got one extension installed. Click OK. Now we'll wait for the results of this installation; hopefully it will be installed fine. Okay, the bundle has been successfully installed, as you see from this message.

The last thing we need to do is to go back to Extensions, click on R 3.5 Configuration, and enter the home directory of R 3.5. We should go to our C drive, copy the address of R 3.5, paste it right here, and then click OK. I've already done that before, so I don't need to do this; I'm going to click Cancel. After this, the installation and the extension are ready for the start of the analysis.

Okay, what we need to do is go to Analyze; under this drop-down menu go to Scale, and we'll find our Item Response Theory model extension. Previously it was not there, but since we have installed it, we can find it. I'm going to click on Item Response Theory. Remember that this is going to be a three-parameter logistic model, so we will not get a 1PL or 2PL. I have a few items here: five of them are grammar items and five of them are vocabulary. I can change the display so that it shows variable names. Out of these 10, I think I should have a kind of unidimensional scale. I'm going to move these to the right-hand side, under the Items box. The next step is quite straightforward: I click on Output, and under Output I'll get all the tables as well as all the plots. Next, I click Continue. So now we're ready to go. We don't want to save anything, and we can just let SPSS delete the missing values listwise. Let's click OK.

Okay, nice, the results are out, and I'm going to quickly walk you through them. The first thing that we see here is the global fit statistics. As I
mentioned in a previous video, we can use global fit statistics to compare different models. For example, you can compare the three-parameter logistic IRT model with the Rasch model. These are available in SPSS: as I discussed in the previous video, you can do a Rasch analysis using SPSS under Scale and Rasch Model. Please watch that video to see how you can get this Rasch model extension. If you click on this, you'll see more or less the same sort of interface for Rasch analysis: you can move all the items into the analysis, click on Output, and then click OK. Why am I suggesting that you do this while you are doing a three-parameter logistic model? The reason is that you can then get the global fit statistics, which you can compare with the global fit statistics of the 3PL. In other words, you will have a chance to compare the fit of your one-parameter logistic model, which is more or less similar to Rasch measurement, and your three-parameter logistic model. Ideally we should be looking into a comparative study of the 1PL, 2PL, and 3PL; unfortunately, so far in SPSS we can only compare the 1PL and 3PL, which is not too bad really.

Since I've already done that before, I want to show you the results. This is the output of the 3PL, which I copied over here. We are looking for the model with the smaller AIC statistic among two or more models, and the smaller AIC belongs to the 3PL. On the other hand, the BIC statistic is smaller for the 1PL model. There are two schools of thought here. I'm not going to go through them in too much detail, but remember that some scholars prefer to look at AIC over BIC, whereas other scholars prefer to look at BIC and would say that the model with the smaller BIC statistic is preferable. I leave it to you to decide whether a 3PL or 1PL model is preferable in this scenario.

I'm going back to the output of my
3PL model, and as you know, the 3PL model can give you three parameters: a guessing parameter, an item difficulty parameter, and a discrimination parameter. Guessing is students guessing on the item, and the range for guessing should be between 0 and 0.4. I'm going to create a graph out of this guessing parameter. We already see that everything falls between 0 and 0.4, and therefore these items do not invite a lot of guessing, which is good news for us; if it's larger than 0.4, then we'll have a problem. So I'm going to click on Line, and here is [Music] the line graph for the guessing parameter. I double-clicked on it, and I just want to display the data labels; I have to click on that option, I can close these, and I'll get the labels for the data. This is really interesting, because the highest guessing is right here, and it goes up to 0.355. That's really what we need to have in a good and productive statistical analysis.

Item difficulty hypothetically ranges from minus infinity to plus infinity. Those items whose difficulty is closer to the main bulk of the ability of the population, or of the cohort you're measuring, would be better. Of course, we can also look at the standard error of measurement here, and if you have a very high error of measurement, for example like this one, that item might not be useful at all. So this item V2, that's vocabulary 2, might have a kind of problem; we need to look at that, or we might have to leave it out of the analysis.

Next is the discrimination parameter. The discrimination parameter is basically analogous to the standardized loading coefficient in factor analysis. As a result, anything that falls above 0.3 seems to be acceptable, because we need our items to discriminate between high-ability and low-ability people. For this discrimination parameter, which I'm going to convert into a line graph again, I would say that since quite a few of them are falling above 1, they are over-discriminating among
our test takers, for example items one, two, three, four, and five under grammar, and also item three under vocabulary. What this means is that these items are either too difficult or too easy, and they are over-discriminating in the sense that, for example, if an item is too difficult, a lot of people have failed to answer that question, so the discrimination parameter gets inflated a little bit under these circumstances. We don't really want an item that is over-discriminating among our test takers, and unfortunately quite a few of these are over-discriminating. Again, this is the graph.

In the same way, you can create a graph for the difficulty parameter of your items. Let me quickly create a graph and visually conceptualize the difficulty of the items. Let me also get the labels for our graph. Based on this graph, the most difficult item is item V4, with this difficulty, and the easiest one is perhaps item G1. Where am I? Is it difficulty? Yeah, this is the difficulty parameter, so it's minus 2.31, right? Let me just double-check. Yes, this is the difficulty parameter.

Okay, next is our fit statistics. This is different from the global fit statistics in the sense that we're looking at the fit statistics for every single item. Preferably we should not have any significant values, but you see that every fit statistic here is significant. SPSS outputs only this kind of statistic; in other software packages like BILOG or Winsteps you can also get infit mean square and outfit mean square values, which are perhaps more suitable for a sample size like the one that I have in the present data set. My sample size is around 1,800-plus people.

This graph is also useful in the sense that it plots the proportion correct against the total score. For example, if you have a total score of eight, for those people who have a total score of eight, for item nine, the
proportion correct on item nine is around fifty percent. Let me draw this for you so it will be easier to imagine. So, for the total score of eight and item nine: those people whose proportion correct is around 0.5 for item 9 typically have a total score of 8. On the other hand, if you are looking at an item like item one, which seems to be very easy, and your total score on the entire test is eight, then the proportion correct on item one is around 0.9-something, or even 1, meaning that if your total score is eight, the chances are very high that you will answer item number one correctly.

Next is our kernel density estimation graph, which is basically a frequency graph. The frequency of those people whose ability fell around here, between minus one and plus one logits, is a lot higher than that of those people who fell on this side of the graph, and we have a lower frequency on this side as well. So this is basically what kernel density estimation is about: it provides information similar to the Wright map, but only for the person ability parameters rather than the item difficulty parameters, because, as you remember, the Wright map has two sections, the person parameters and the item parameters.

Okay, now I'm going to move down to the item characteristic curves. As you see, these item characteristic curves are quite different from those that I presented in a previous video in which I used Rasch measurement. That's because the slope and the intercept of the items have changed. For example, for item number one, which is G1, we've got this slope, which is kind of similar to the slope of the second item, that is G2, but these two items are obviously different from an item like V2, if you can see that, because its slope is quite different and it is intersecting them. Items V4 and also
V1 are also intersecting other items, because they have different degrees of slope, and their intercepts are also different. This indicates that their discrimination and guessing parameters are quite different. So it might be a better idea to use a three-parameter logistic model under this circumstance, if, as I mentioned before, we go by the AIC being superior to the BIC metric.

Okay, finally we can look at this item information curve, which shows the amount of information that each item is providing. I have explained this in another video, so I'll just very briefly touch on what I have talked about previously. For an item like the one in blue (I suppose this might be one of the grammar items, I'm not quite sure about it), the main bulk of information falls in this area, which means that in this area we can get the highest amount of information about the ability level of each of the test takers, and as a result we can claim that this item is most construct-valid in this region. Whereas for another item, this one that is represented in green, the main bulk of information falls within this region, which again means that the amount of information that we can get about our test takers using this item will be a lot higher in this region, and therefore the amount of construct validity is much higher under this part of the curve for this item. I don't know what item it is; it's very difficult to identify. So that's how we can make sense of this item information curve.

So thank you very much for paying attention to this video, and I really hope that you find it useful. If you did, please give it a like and stay tuned; there will be more videos on doing different sorts of
analysis in IRT in the near future. Have a good day.
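
The video reads the guessing, difficulty, and discrimination parameters, the item information curves, and the AIC/BIC comparison straight off the SPSS output. As a rough companion, here is a minimal Python sketch of the standard 3PL formulas behind those outputs. It is not the code the SPSS extension runs. The function names, the example item values, the log-likelihoods, and the parameter counts (one per item for the 1PL, three per item for the 3PL) are all illustrative assumptions; only the sample size of roughly 1,800 comes from the video.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct answer under the 3PL model.

    theta: person ability (logits); a: discrimination;
    b: difficulty; c: guessing (lower asymptote).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def item_information_3pl(theta, a, b, c):
    """Item information at ability theta for a 3PL item."""
    p = p_correct_3pl(theta, a, b, c)
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def aic(log_lik, n_params):
    """Akaike information criterion (smaller is better)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_persons):
    """Bayesian information criterion (smaller is better)."""
    return n_params * math.log(n_persons) - 2 * log_lik


# Hypothetical item: these values are invented, not taken from the video.
a, b, c = 1.2, -0.5, 0.2
# When ability equals difficulty, the probability is c + (1 - c) / 2.
p_at_b = p_correct_3pl(b, a, b, c)
info_near = item_information_3pl(b, a, b, c)
info_far = item_information_3pl(b - 3.0, a, b, c)

# Hypothetical log-likelihoods for a 10-item test (assumed values).
n_persons = 1800                   # approximate sample size from the video
ll_1pl, k_1pl = -10050.0, 10       # assumed: one difficulty per item
ll_3pl, k_3pl = -10000.0, 30       # assumed: a, b, c for each item

aic_1pl, aic_3pl = aic(ll_1pl, k_1pl), aic(ll_3pl, k_3pl)
bic_1pl = bic(ll_1pl, k_1pl, n_persons)
bic_3pl = bic(ll_3pl, k_3pl, n_persons)

print(f"P(correct) at theta = b: {p_at_b:.3f}")
print(f"Information near b: {info_near:.4f}, far below b: {info_far:.4f}")
print(f"AIC  1PL: {aic_1pl:.1f}  3PL: {aic_3pl:.1f}")
print(f"BIC  1PL: {bic_1pl:.1f}  3PL: {bic_3pl:.1f}")
```

With these invented numbers, AIC is smaller for the 3PL while BIC is smaller for the 1PL, which is the same kind of disagreement described in the video. BIC's penalty grows with the logarithm of the sample size, so with about 1,800 test takers it penalizes the 3PL's extra parameters more heavily than AIC does.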
Info
Channel: Statistics & Theory
Views: 3,918
Rating: 4.9473686 out of 5
Keywords: item response theory, IRT, 3 parameter logistic model, 3PL, SPSS, fit, discrimination, guessing
Id: z-VqRE3NybY
Length: 23min 2sec (1382 seconds)
Published: Sat Jan 23 2021