Advanced Regression Analysis for Behavioral Sciences/Generalized Estimating Equations

Captions
So what we'll talk about today is GEE (generalized estimating equations) in contrast to other longitudinal methods: where it fits, how it's similar, how it's different; specifying the distribution and the correlation structure in GEE; looking at time as a continuous predictor versus a categorical predictor; and then interpreting the intercept and slope effects, so interpreting the output.

Before we move into that: who ran some moderated mediation last week? Any questions from last week? All right. And then the reading. What is in the folder for this week is a GEE 101, a nice brief paper distinguishing GEE from MLM, and then a paper on testing mediation in the context of GEE. Three good papers.

In terms of last week: conditional indirect effects broadly encompass a range of greater-than-three-variable relationships where the presence or the magnitude of one or more of the indirect (mediating) effects is conditional upon a specified value of one or more moderators. Last week we talked about one moderator, but you could have more than one: moderation at the a path or the b path, two moderators at the a path or at the b path, et cetera. There are many scenarios for this type of model, but generally moderation occurs at the a and/or b path of the mediation chain, and Preacher, Rucker, and Hayes, the paper we really focused on in learning moderated mediation, described five different forms that conditional indirect effects can take. In terms of running a power analysis for mediation, this is an area that is still developing; there are some programs and macros out there, but as I said last week, I've had good luck just using power tables, describing a range of effect sizes for the a path versus the b path and different combinations of magnitude for the effects at the a and b paths, and determining power based on a table. And then we briefly touched on extensions: SEM, longitudinal SEM, multilevel mediation, and a brief introduction to causal modeling. Does that all sound familiar?

All right, so GEE. Why not just use repeated measures ANOVA? GEE is definitely a more flexible approach than repeated measures ANOVA; it has more flexibility and more capabilities. Some examples: ANOVA assumes categorical predictors, whereas GEE can have both categorical and continuous predictors. ANOVA doesn't handle time-varying covariates; a repeated measures ANOVA gives us a time-varying outcome but only static predictors, whereas in GEE we can have both time-varying independent variables and time-varying dependent variables. That's not required, but it's an advantage if those are the kinds of models we want to run. ANOVA also treats time as a categorical predictor, which means it assumes that everyone is measured at the same time points and that those time points are equally spaced. As far as missing data is concerned, in a repeated measures ANOVA you either have to impute it or the case is listwise deleted; GEE will maintain all available pairs of data, so as long as you have two observations for a given case, that case remains in the model and contributes information.

Going back to time as categorical versus continuous: the difference with GEE is that, because you can treat time as a continuous variable,
you're able to have unequally spaced intervals, say month zero, month two, month six, month nine. You're also allowed to have people coming in at different times: say you had a longitudinal study and, at your second-year follow-up, some people come in at month twelve, some at month fifteen, some at month eighteen, et cetera. GEE will accommodate that.

Finally, repeated measures ANOVA assumes an equal correlation between observations over time, what's called a compound symmetric correlation structure. That can be fine, but the advantage of GEE is that you can be a little more sensitive in how you assume the relationships between adjacent observations behave. For time one, time two, time three, time four: is there an equal correlation between 1 & 2, 2 & 3, 3 & 4, 1 & 4, et cetera, or does the correlation maybe diminish over time, or increase over time? There are other variations besides those, but a diminishing correlation might be more realistic to what we're thinking; in a treatment trial we would probably expect the correlation between time points to diminish over time, because there is more life happening in between. Does that make sense? Any questions on that?

This is just looking at a wide data set. In ANOVA we work with a wide, or cross-sectional, data set, also called an unstacked data set. In this example you have four time points and six cases: the predictor measured at times one through four and the outcome measured at times one through four. Here, cam is the independent variable, and the score on some dependent variable is measured at four time points. So you have a time-varying predictor and a time-varying outcome.

This is just some simple code for flipping the data. Who has flipped their data for a longitudinal data set, moving from a cross-sectional to a longitudinal layout? Some of you, not all of you. OK. When you want to do GEE, the first thing you have to do is flip the data from the cross-sectional, wide data set to a longitudinal, long data set. Here's an example of SAS code and an example of SPSS code (a data-step sketch also appears just below). The new data set we can call LONG, and the set we're reading in we can call WIDE. In the SAS example we add time as a continuous variable, here 0, 2, 3, and 6, and then score, the dependent variable: we're creating the score variable, which holds the score at time 1, time 2, time 3, time 4, and then the cam variable; that is, creating single stacked variables out of the multiple observations. In SPSS it's pretty similar: the command is called VARSTOCASES, and you have the MAKE subcommand, so you make cam and you make score just like above, and there's an INDEX, which creates a time index of 1, 2, 3, 4 that you would then recode into 0, 2, 3, 6 in a separate step (not shown here), and then these are the variables you're going to spit out, plus the index variable for time. Make sense?

This is what it'll look like: now we've got four observations for each of the six cases, we have that time variable coded 0, 2, 3, 6, and then the score and the cam. That's what the long-form data set looks like, in contrast to the wide data set. Any questions on that?
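As a concrete illustration of the wide-to-long flip just described, here is a minimal SAS data-step sketch. The data set and variable names (wide, long, id, cam1-cam4, score1-score4) are placeholders standing in for whatever the slides actually use, and the times 0, 2, 3, 6 are the unequally spaced assessments from the example.

```sas
/* Sketch: reshape one row per person (wide) into one row per person-time (long). */
data long;
  set wide;                            /* assumed columns: id, cam1-cam4, score1-score4 */
  array c{4} cam1-cam4;                /* time-varying predictor                        */
  array s{4} score1-score4;            /* time-varying outcome                          */
  array t{4} _temporary_ (0 2 3 6);    /* unequally spaced assessment times             */
  do i = 1 to 4;
    time  = t{i};
    cam   = c{i};
    score = s{i};
    output;                            /* write one record per time point               */
  end;
  keep id time cam score;
run;
```

The SPSS VARSTOCASES route described above does the same thing; the only extra step there is recoding the 1, 2, 3, 4 index into the actual 0, 2, 3, 6 times.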
Next, thinking about the difference between a GEE model and a linear regression model: at the top we have our familiar regression equation, the predicted mean of Y, which equals the starting point when the predictors are at zero (the intercept), plus the slope effect (in this same example, the slope effect for cam), plus error. The difference with the GEE model is that we have a range of scores over time for the dependent variable, and in this example we also have a range of scores for the independent variable, although that isn't required; then we also have the time predictor, plus a weight adjustment for the assumed correlation structure that we impose on the model, plus error. If this were a multilevel model, the difference would be that the dependency would be modeled rather than just fit; well, actually, in a multilevel model you would do both: you would assume a certain correlation structure, but you would also model the random intercepts and slopes across individuals.

Thinking about those differences a little more: starting with a linear model (linear regression, ANOVA, ANCOVA), this assumes that the responses are independent. If you took longitudinal data like the wide data set we looked at initially and ran an ordinary regression model on the stacked observations, it would give you six times four; it would treat it like there were 24 cases and assume those were independent observations. That's not what we want, because they are not independent observations. With a generalized linear model, the only difference is that the distribution is generalized: we're generalizing to different distributional assumptions, like logistic regression, Poisson regression, negative binomial, et cetera, but we are still assuming responses are independent. In GEE we assume responses are correlated, which is closer to reality, and we fit, or impose, a correlation structure that we think is operating between the observations. In a linear mixed model (or multilevel model, hierarchical model, random-effects regression, however you like to call it), responses are also correlated, but the correlation is also modeled, to a certain extent, by the random effects. And then the generalized linear mixed model just extends that to the wider range of distributional assumptions. So in both of these, well, actually in three of these, we can change the distribution if we want to.

Now, how do I discriminate the interpretation of the two? I've wrestled with this, and if anyone wants to add to how I describe it, I would love to hear it, because I don't feel I have a totally satisfying way of putting it. In terms of inference, there is a subtle difference with GEE: it gives a population-average inference. What that means is that the analysis describes differences in predicted mean Y scores across the population; we're looking at between-person effects. Those between-person effects give us interpretations about populations: about depressed people, about men, about women, about people who get the control versus people who get the treatment. This analysis is informative from a population perspective. This could be for policy makers, or even providers, wanting to optimize outcomes across a population of, say, depressed individuals.
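One compact way to write the contrast being set up here, using notation of my own rather than anything from the slides, is to compare the marginal mean that GEE models with the conditional mean that a mixed model works with (Y_it is person i's outcome at time t, b_i that person's random effects):

```latex
\begin{align*}
\text{GEE (population average):} \quad
  & E[Y_{it}] = \beta_0 + \beta_1 X_{it} + \beta_2\,\text{time}_{it} \\
\text{MLM (subject specific):} \quad
  & E[Y_{it} \mid b_i] = (\beta_0 + b_{0i}) + \beta_1 X_{it} + (\beta_2 + b_{1i})\,\text{time}_{it}
\end{align*}
```

With an identity link (a plain linear model like this one) the fixed coefficients from the two approaches coincide numerically even though they answer different questions; with a non-identity link such as the logit, the population-average and subject-specific coefficients differ in value as well as in interpretation.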
In terms of mixed models, it's a subject-specific inference: the analysis describes differences in the predicted mean of Y conditional on the person-specific random effect. Said another way, these are differences within person, from a population characterized by the average random effect; it's the averaged within-person effect. That's the part where I get a little fuzzy, because the difference between an averaged within-person effect and an average between-person effect is hard to think about; hypothetically the within-person change could end up looking like the between-person difference. This framing would be most relevant from an individual patient's perspective.

So let's just say it one more way: parameter interpretation in GEE versus MLM, for the parameters in an equivalent MLM versus GEE model. In GEE, the parameters are expected differences within the population, given a change in everyone's X from one value to another, and conditional upon the intercept and the adjustment for the correlation structure. In MLM, they are expected differences within the individual, given a change in their X from one value to another, and conditional upon the average random intercept, the average intercept across individuals. Does anyone want to add to how they think about the differences between the two? Do you have preferences for one over the other, "I like thinking about within-person differences" or "I like thinking about between-person differences"? I like thinking about between-person differences personally, because I do a lot of treatment research, but I also do a lot of within-condition analysis, and that is more about within-person change. Any questions on that? "To GEE or Not to GEE, That Is the Question" is a good little article that also talks about this difference and when you might choose one approach over the other.

In terms of standard error estimation: like I said, GEE allows different working correlation structures, and each correlation structure is used to create weights for a weighted regression model. When deriving inferences from the coefficients, though, it calculates two standard errors. The naive standard error assumes that our assumed correlation structure is correct; for example, if we assumed compound symmetry, that would be the same as a repeated measures ANOVA, basically an equal correlation over time, where the correlation between adjacent time points doesn't change. The robust standard error, which is the default and which is what we report, does not assume that what we imposed for the correlation is necessarily correct, so it adds some additional adjustment there.

So the "G" in GEE means that you can specify your distribution: a continuous linear model versus dichotomous outcomes, count outcomes, negative binomial, zero-inflated models. And you can choose your working correlation structure. It is the case that GEE is pretty robust to the wrong choice, and the way you can check that is by running the model a couple of different ways and seeing whether the results change based on how you specify the correlation. The choices are: independent, which you wouldn't choose, because that's just like ordinary regression; exchangeable, for which the command keyword is something like EXCH,
and that is compound symmetric, so equal correlations. Autoregressive is one correlation that is then exponentiated, so the correlation goes down over time. M-dependent is where a certain number of correlations between adjacent time points are fit, and you assume the rest are independent, so the rest are zero. And unstructured will fit the whole correlation structure for you.

The goal is to find the simplest structure that fits the data well; "simplest" meaning that, depending on your sample size, you would want to minimize the number of parameters you have to estimate. This is an approach where we can compare model fit: you could run, say, the simplest (an exchangeable or an autoregressive) and then run an unstructured, which is the most complex and uses the most degrees of freedom, and see which one fits better and whether there are any differences in the results or in the model fit. Make sense?

All right, just looking at them. Independence is what no correlation would look like. Exchangeable costs one degree of freedom, because it estimates only one correlation that is the same for the entire matrix. Autoregressive is also only one parameter estimated, because you just add an exponent, but like I said, the correlation decreases over time. M-dependent, say for example two-dependent, will fit correlation one and correlation two for adjacent and two-apart observations, and then assumes the rest, further out, are independent, so zero correlation. You could specify three-dependent, four-dependent, five-dependent, depending on how many time points you have and what you think is going on. And unstructured will fit all the correlations for you, based on the data. Make sense?

I tend to use AR(1), autoregressive; sometimes I just use compound symmetric; sometimes I use unstructured. Does anyone have a preference, a favorite correlation structure? Yeah, exactly: if you think about a treatment study, you would think autoregressive; versus, say, an EMA study, some type of longitudinal study where the time points are closer together and you're not delivering any sort of stimulus, then you might think compound symmetric. Make sense? (A sketch of how these structures are specified follows below.)
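Here is a rough SAS sketch of how those working correlation structures get specified, using the toy long data set from earlier (again, names are placeholders). The naive versus robust standard errors mentioned above show up here too: PROC GENMOD reports empirical (robust) standard errors by default when a REPEATED statement is present, and the MODELSE option adds the model-based (naive) ones.

```sas
/* Sketch: a GEE with an exchangeable (compound-symmetric) working correlation. */
proc genmod data=long;
  class id;
  model score = cam time / dist=normal link=identity;
  repeated subject=id / type=exch corrw modelse;  /* corrw prints the working correlation */
run;

/* Other TYPE= choices correspond to the structures described above:
     type=ind      independence (no within-person correlation)
     type=ar(1)    autoregressive: one correlation, decaying with lag
     type=mdep(2)  2-dependent: lags 1 and 2 estimated, farther lags set to 0
     type=unstr    unstructured: every pairwise correlation estimated
   Refitting with each and comparing the QIC that GENMOD prints among its GEE fit
   criteria (smaller is better) is one way to pick the simplest structure that fits. */
```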
All right, so here's an example, one I did with Chris Taylor. It's a GEE example, but it's also an example of some fun coding and ways to play around with time. This was a naturalistic longitudinal study of college student drinking, and all of the students got bi-weekly online assessments, maybe a five-minute online assessment. A portion of them were randomly selected and then randomized into getting an in-person assessment versus no assessment, so this was an embedded test of assessment reactivity. The question of interest was whether drinking changes in response to just your typical baseline assessment, as opposed to baseline assessment plus some type of treatment; there was no treatment in this study.

It was a two- or three-year longitudinal study, but we pulled out nine bi-weekly assessments and anchored them around the report of an alcohol-related event. That could be something like puking or passing out; so not necessarily a huge event like getting in trouble, but essentially a consequence, self-reported through the bi-weekly online assessment. One of the coding methods we used was: negative two and negative one were the pre-event codes, zero was when the event occurred, and one through six were the post-event time points. So this is an example where you've got time as a continuous predictor, because people's events are happening at different times.

This was used to examine both event reactivity and assessment reactivity; we were interested in both, and also in whether or not they interacted. The assumption is that heavy drinking would go down in response to an alcohol-related event, some type of consequence, that naturalistic "oh, I messed up, so I'm going to change my drinking for a while." But there also might be a synergistic effect with the assessment: if they had an event and they also had that longer in-person assessment, they might change their drinking even more, because the assessment would make the consequences of the event more salient; they have to talk about it. The assessment was about one hour and was designed to look like a typical baseline assessment, but it was in person, versus just that five-minute online self-report.

Here's the code; this is the SPSS code. The command is under the generalized linear models commands. The dependent variable is number of heavy drinking days. When you have a BY statement, those are the categorical variables, so the categorical variables are sex, race (here white and multiracial combined, and the reason we combined those groups is that they had systematically different and higher drinking patterns than all the other groups), and condition, which is assessment versus no assessment. ORDER=DESCENDING just says, for a categorical predictor, what the reference category is, what you're comparing to; descending means comparing higher to lower. The WITH statement is where we put our continuous covariates; time0 goes there. Here's our model statement: we've got all the controls, then condition, time0, and condition by time0. DISTRIBUTION=NORMAL, so just a regular linear model. There's your estimation method; I think you can switch it to restricted maximum likelihood, but you can't do full-information maximum likelihood, because that would be switching over to MLM. The within-subject part is where we plug in our correlation structure, compound symmetric in this case. The robust standard error is just the default. And then this part deals with missing data: it says exclude, but really it should say include. In this analysis I think cases were excluded, but exclude is the default and you should
change it to include, which keeps all available pairs of data and maximizes the information used.

Here's the same thing in SAS: react is the name of the data set we're using, here's the same model statement, here's where we specify the distribution, we're asking for Type III sums of squares, and here's where we specify the correlation structure. That's all annotated in the slides. Any questions on the syntax for either one? A lot of this SPSS plugs in by default, tons of default information that I didn't break out, whereas SAS hides it.

Next we created a variable called event, which is this sort of countdown to the event... no, I'm sorry, I misspoke: this variable is just zeros for the pre-event time points and ones for the post-event time points. Then we add all possible terms: condition; time0, which is the countdown variable, our main time variable (the negative 2, negative 1, 0, all the way up); the event variable, which is three zeros and then six ones; and all the possible interactions between them. What we're asking is: Does the heavy drinking intercept differ pre and post event? That's where we look at the event effect; does it go down, is the intercept lower post-event? Does the heavy drinking slope differ post-event? That's the event by time interaction, which says not only does drinking go down on average, but is there a more negative slope, a higher rate of change in drinking reduction, post-event? And does the heavy drinking post-event slope differ by assessment versus no assessment? Will being in that one-to-one assessment predict a more negative slope, greater reductions in heavy drinking? That would be the three-way interaction. Make sense? So: fun with time. (A sketch of this full model appears below.)

This is the SPSS output. For the dependent variable we can see number of heavy drinking days; we've got a normal distribution with an identity link; here's our subject ID variable, our within-subject time variable, our working correlation structure. Just check that the model is what you wanted it to be. The case processing summary gives the full number of observations: the available cases times the number of time points, which in this case was nine, so 431 subjects by nine time points, and you can see we do have some missing data excluded. This is the descriptive information on your categorical variables, and here's the descriptive information for your continuous variables; apparently this output has AUDIT added in as a continuous covariate, which wasn't in the code, just FYI. And you've got these fit statistics with a smaller-is-better criterion, which allow you to compare your models if you, say, play around with the correlation structure and see whether you can get a better fit. Then here's the output proper: the coefficients, confidence intervals, and significance. These are the exponentiated regression coefficients; if we had a dichotomous outcome, these would give us odds ratios.
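Before turning to the results, here is what the full event-reactivity model just described might look like as a SAS PROC GENMOD call. The data set name react comes from the lecture, but the variable names (hvydays, time0, event, audit, and so on) are guesses for illustration, not the actual study code.

```sas
/* Sketch: event reactivity, assessment reactivity, and their interaction.
   Robust (empirical) standard errors are the default with the REPEATED statement. */
proc genmod data=react;
  class id sex race condition;
  model hvydays = sex race audit                 /* controls                        */
                  condition time0 event          /* main effects                    */
                  condition*time0 condition*event time0*event
                  condition*time0*event          /* three-way interaction           */
        / dist=normal link=identity type3;
  repeated subject=id / type=exch corrw;         /* compound-symmetric working correlation */
run;
```

The terms mirror the questions listed above: event tests the intercept shift after the alcohol-related event, time0*event tests the change in slope post-event, and condition*time0*event tests whether that slope change differs by assessment condition.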
So what's the deal? We've got sex, race, condition, time0, AUDIT, event, time by event, and all the interactions. What's the story? Heavy drinking increases on average pre-event; that's the time effect here, non-significant. There's a non-significant decrease in heavy drinking on average post-event; sorry, I'm having a hard time saying this, that's actually the event effect. The slope in heavy drinking is significantly more negative post-event, and that's our time by event interaction: a negative coefficient, statistically significant. So they are reducing their drinking post-event, but event is not interacting with assessment to produce greater reductions; they're not reacting to the assessment here, the condition by time by event term. We thought about why that was; mostly we thought the events themselves weren't necessarily strong enough, or perhaps were too variable. But then again, do you want assessment reactivity or not? Any questions on that?

This is just a little bit of why GEE is good. It increases power by increasing the degrees of freedom, allowing for those multiple observations. It also increases power by accounting for the dependencies among the observations, getting a better fit of the correlation structure. It maximizes the information by using all available pairs of data. And it is not just for repeated measures over time; that's the only way I use it, but it's appropriate for any nested data. That's just something to keep in mind.

OK, that's it; it's a short one. Questions on GEE? Do you feel like you can go out and do some GEE now? It's not so bad: just think regression, with a correlation structure that you think is right, in a longitudinal framework. Then you're interpreting coefficients in longitudinal terms, intercepts and slopes, where the slopes are with respect to time, for example: where people start and then how quickly or slowly they change.

Yes? The question is whether there's any literature comparing a good imputation of missing data with what happens behind the scenes in a full-information maximum likelihood approach; in other words, imputation, some of which actually uses maximum likelihood estimation, versus full-information modeling: is it simply ease of use for the user, or does full-information modeling do a better or a worse job? I think that comparison is on a colleague's to-do list, and if you asked him, he would say that with imputation you really have to have good predictors; the question is what makes a good imputation model and whether you actually have enough of the right variables for one. But the difference is that full information is using all the observed data directly. So, yeah, I don't know of a comparison. Other questions or thoughts, or plans for next steps? What are you going to do with all this great information? Mull it over and file it into your toolbox. OK, all right, happy spring. Thank you.
Info
Channel: Brown University DPHB
Views: 9,950
Rating: 4.951807 out of 5
Id: tnVP53VI3Kg
Length: 40min 18sec (2418 seconds)
Published: Tue May 19 2015