PMAP 8521 • Regression discontinuity 5: Regression discontinuity with R

Captions
Okay, so we've talked about the theory behind regression discontinuity, we've talked about how to actually measure the gap at the threshold or cutoff, and we've talked about the main concerns we have to think about when we do this analysis. So now we're actually going to do it with R, so this should be fun. We'll use the AIG example that we've been talking about throughout these videos. Here's a roadmap of the analysis. We're going to start by checking whether assignment to the treatment or program is rule-based. If it is, we can continue and do regression discontinuity; if it's not, we can't. The only way this works is if some sort of arbitrary rule decides whether somebody can access the program. Then we have to determine if assignment to treatment is fuzzy or sharp. Either one works, there are ways to do fuzzy or sharp analysis, but sharp is easier because you can just use regular regression and you don't have to worry about compliers and non-compliers. In the example we're going to do it's sharp, in your problem set it's sharp, and in exam two it will be sharp; I'm keeping everything as easy as possible. If you start doing this stuff in real life and you see that it's fuzzy, email me, look at the documentation, Google around; you'll find lots of resources to help you with fuzzy analysis. Then we're going to see if there's a discontinuity in the running variable at the cut point, and we don't want to see one there. This is the McCrary test: we don't want to see any manipulation in the test score, like a whole bunch of people at 75 or 76 who shouldn't be there compared to 74. It needs to be nice and smooth. Then we want to check if there's a discontinuity in the outcome, the final test score that we've been using, and that's where we do want to see a discontinuity. We want to see that delta, that size
of the gap. If we can see it, then we have a causal effect, and then we're going to measure how big that gap is a whole bunch of different ways. In real life there's no single true value of the gap, because it all depends on how you draw the lines, and there's no one true way to draw the lines, so we'll draw a ton of different lines, get a whole bunch of different measures of the gap size, and see how big it potentially is. So let's do this. Go ahead and download the zip file that's posted on the class website for today, or you can go to the RStudio Cloud project that's there, and it should load everything with all the data and R Markdown files in the correct locations. If you download the zip file, make sure you unzip it and then open the .Rproj file so you open it as a project. Your screen should look something like this after you open the AIG program .Rmd file, so go ahead and do that. You can also click on the finished version of the AIG program file, which has all of the analysis there for your reference; it's also pretty much identical to what's on the class website, so you can access it in a whole bunch of different places. But what we care about is the unfinished .Rmd, because if you scroll through it, it's pretty empty; it just has a bunch of chunks with no code in them. We're going to add all of the code and do the analysis. Again, this is based on the idea that hypothetical students take a test in 6th grade to determine if they can get into an academically and intellectually gifted (AIG) program, and if they do the AIG program, they might have higher test scores at the end of their high school experience. There's some final mythical, hypothetical test that they take at the end of high school, scored out of 100. So what we want to see
is: does this AIG program boost test scores at the end of school? Again, this is all fake, just data we can play with. If you look at this first chunk, named settings, it just has some settings you've seen in other examples before. We're going to use the huxtable package to show side-by-side regression tables, but one thing huxtable likes to do when you knit is reformat all of your tables so they're shaded differently and have borders and stuff, and I don't like that. If you run this options() line right here, it will turn off the fancy formatting that huxtable does. And then this next bit just makes it so all of the figures we make will be the same width and height and centered, and retina means they'll be double resolution, so if you have a fancy MacBook or a fancy Windows machine with a good screen, they'll look nice and clear. Again, I've been copying and pasting this stuff from document to document forever; I have no idea how to type it by hand, especially the huxtable printing thing. I found that on the internet once and I've been using that same line ever since, so don't feel bad about copying and pasting. Go ahead and run this chunk; it's not going to do much, but it'll still run. The next chunk is where we start loading libraries. We're going to use tidyverse so we can plot things and manipulate data frames like we've been doing, and broom, which lets us convert models from lm() into data frames so we can do nice things with them. The two new packages we're going to use are rdrobust and rddensity: rdrobust lets us do nonparametric regression discontinuity analysis, and rddensity gives us the McCrary density test. Then we'll also load the huxtable library so we can get the side-by-side regression tables. The last line loads the data, so that we have this AIG program data to work
with. So go ahead and run this chunk, and you'll see a data frame in your environment panel named aig_program. If you click on it you can see all of our fake data: the test score for getting into the AIG program, the final test score, some demographics, whether or not they're in the AIG program, and an ID column. The only columns we're really going to work with are the test score, the final score, and AIG; if we want, we can control for the other demographics in the models, but we don't need to, since they were just randomly assigned anyway, so it doesn't really matter. If you go back to your markdown file, we're going to go through the same steps we looked at in the PowerPoint at the beginning of this video. First we want to determine if the process of assigning treatment was rule-based. There's no statistical way to do this; we can't look at the data set and see if it was rule-based, we just have to know about the program. Because we made up this program and we know there was a 75-point threshold, we can say it was rule-based, so we can just type that here: yep, it was rule-based. In real life you'd explain why it's rule-based, how it was assigned, what the arbitrary threshold is, and all sorts of stuff, but because we made up this data, we'll just say yay. All right, step two. This is where we want to determine if assignment to treatment was fuzzy or sharp. We want to see if any people who scored above 75 didn't get into the program, or if any people below 75 did get in, because ideally we want it to be nice and sharp with no non-compliers on either side. We can check this graphically, and we can check it with numbers by counting how many people scored higher than 75 and then got in the program, and vice versa. So we're going to plot two variables first: we want the running variable, which is the test score, and then we want to have AIG, whether
or not they were in the program, on the y-axis. This should hopefully show a nice clean break where people below 75 were not in the program and people above 75 were. So in this chunk we can start plotting. We'll type ggplot(); our data is aig_program, and then for our mapping we say aes(), for aesthetics, with x = test_score (I couldn't remember the name of the column for a second, it's the test score) and y = aig, and we'll also color the dots by aig, whether or not they were in the program, so we can see it better. If we ran this right now we'd see an empty plot, so we want to show this with points: geom_point(). Let's go ahead and run that just to see what the preliminary plot looks like. There we go: a whole bunch of red dots and a whole bunch of turquoise dots, and that shows a pretty good discontinuity right there at 75. One thing we can do to help see the actual cutoff is to add a line at 75 so it's clearer. That's just adding another geometry on top: geom_vline(), for vertical line, and the way we tell it to sit exactly at 75 is xintercept = 75. Go ahead and run that chunk again and you'll see a nice line at 75. We have some issues with overplotting; those solid-looking lines aren't actually lines, they're a whole bunch of dots on top of each other. One thing we can do to fix that is add some transparency: alpha = 0.5, so they're 50% transparent. If you do that, it's still pretty heavy there, with some semi-transparent clumps, so we could shrink it even more with alpha = 0.1. Still pretty heavy. So another thing we can do is jitter the points, shift them around, specifically up and down; we don't care if they're plotted a little more TRUE or a little less TRUE, they can go up and down here. We don't
want to jitter them side to side, because we don't want somebody at 80 accidentally shifted to 85. So inside geom_point() we can say position = and use a special jitter function called position_jitter(), and we can tell it how much to jitter widthwise: width = 0, because we don't want these dots to randomly move side to side, and height = something, maybe 0.3, so they'll move up and down randomly within 0.3. Go ahead and run that chunk, and there we go. This shows a good distribution of people: nobody above 75 points on the test was FALSE, and nobody below 75 points was TRUE. It's a nice sharp discontinuity, everything's happy. We can also check this numerically to see the exact count of people in these groups: one grouping for whether or not they were in the program, and another for whether or not they scored at least 75 points on the test. We can do some group_by() and summarize() from dplyr to see the exact numbers. So we say aig_program, then a pipe (Cmd+Shift+M or Ctrl+Shift+M), and we group_by() two things: aig, and whether or not test_score is greater than or equal to 75. Now we have the different groups, and then we summarize() and get the count of people in each group using the n() function, which tells us how many rows there are in each of the invisible subgroups it makes. If we run this, we'll see there were 350 people who were not in the AIG program and did not score above 75, and 600 people who were in the AIG program and did score above 75. Importantly, there are zero people who were in the program and didn't score high enough, or who were not in the program and did score high enough. So it's nice and sharp
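The sharp-assignment check described above can be sketched in R roughly like this. This is a minimal sketch, not the course's actual file: I'm simulating a stand-in data set so the snippet is self-contained, and the column names test_score and aig are assumptions based on how they're spoken in the video.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for the course data: 1,000 students with an
# entrance test score and perfectly sharp assignment at the 75-point cutoff
set.seed(1234)
aig_program <- tibble(
  test_score = round(runif(1000, 30, 100)),
  aig        = test_score >= 75
)

# Graphical check: jitter vertically only (width = 0), never horizontally,
# so no test score appears to drift across the 75-point cutoff
ggplot(aig_program, aes(x = test_score, y = aig, color = aig)) +
  geom_point(alpha = 0.5,
             position = position_jitter(width = 0, height = 0.3)) +
  geom_vline(xintercept = 75)

# Numeric check: count compliers and non-compliers in each cell
aig_program %>%
  group_by(aig, above_cutoff = test_score >= 75) %>%
  summarize(count = n())
```

With truly sharp assignment, the off-diagonal cells (in the program but below 75, or out of the program but at or above 75) have zero rows, so they simply don't appear in the summarized output.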
and everything's great there. Okay, step 3: we want to check for a discontinuity in the running variable. We don't want to see one, because again, that would be manipulation; we don't want a whole bunch of people scoring 75 or 76, or too few people scoring 74. We don't want the test score to have been manipulated to get people into the program or keep them out. We can check this a couple of different ways. One easy way is to draw a histogram of the distribution of test scores and see if it looks weird at the cutoff, with more people just inside or just outside the program. We'll make a histogram with ggplot(): data = aig_program, and for the mapping we map the x-axis to the test score column, then say geom_histogram(). If we just do this by itself it will automatically use 30 bins; we want to set our own bin width here, so we might use binwidth = 2, so that every one of those columns represents two points on the test. Let's see what that looks like; run that chunk. Neat. One thing we can do is fill the bars by whether they scored at least 75, or, because we know it's a sharp discontinuity, we can fill by whether or not they were in the program, so we'll fill by aig. If we plot that, you can see this section of the distribution was not in the program and this section was. We can add a line for the cutoff, again using geom_vline() with xintercept = 75, and there's our nice line right at 75. One last thing we can do to the histogram: if you add a color, like color = "white", that adds a small border around each of the bars, so it's easier to see each of the two-point chunks. What we care about is the histogram bars right before and right after the cut point. At first glance it looks like we might have too many people on the program side of
the cutoff compared to the non-program side. If this were perfectly smooth, you might expect more people scoring 74 than scoring 76 over here. So that might be something to worry about, but we don't have any statistical measure of it; we don't know if that's a significant jump, something we need to worry about, or just chance. We can't just draw a histogram and call it good; we need a more systematic test, and that's where the McCrary density test comes in. Because we've loaded the rddensity package, there's a function called rdplotdensity() that lets us do the official McCrary test to see if the density is statistically significantly different at the boundary. The syntax for this is kind of weird. The authors of these regression discontinuity packages didn't write them to fit within the tidyverse world, so things aren't easily pipeable, it's hard to deal with the plots that get spat out of them, and you have to do some older, standard base R stuff. So the syntax is weird, but again, you have this for reference; you don't need to memorize it. The way this works is that we have to give rdplotdensity() a regression discontinuity object built from our running variable, through an argument called rdd. So we say rdd = and use a function called rddensity(); you just have to memorize this or look at the documentation for rdplotdensity(), there's no easy way to remember all of it. We have to feed rddensity() our running variable as capital X. This is our aig_program data, and we have to give it a specific column inside aig_program, which is the base R way of doing things: we type the data frame name and then a dollar sign, the regular R way of finding a column inside a data frame. You'll notice that once we hit the dollar sign
it gives us a pop-up menu, so we can choose test_score, because that's our running variable and the thing we care about. Then we need to tell it the cutoff: c (lowercase c, because that's what it says in the help file) = 75, since that's our cutoff for the running variable. Then, in between these two closing parentheses, so outside of rddensity() but still inside rdplotdensity(), we type a comma, press Enter, and, because it's a strange function, we have to tell it the running variable again with capital X, but this time without rddensity(): just aig_program$test_score. If we do that and run the chunk, we should see a plot. We actually see two plots, which is another kind of annoying thing that happens with rdplotdensity(): it spits out the plot twice. One way around that is to assign the output of the function to some variable and store it there; one of the plots gets stored and the other just gets spat out, because that's what it does. Again, that's kind of annoying, but that's just how it works. So we can call it something like asdf, or whatever you want, it doesn't matter; we're just sticking the results somewhere so it doesn't make two plots. If we run this again, now we have just one plot, and if we look at it, there is a gap between the black line and the red line, but it's not statistically significant: those confidence intervals overlap, so it's not something we need to worry about. We can check off step three: there's no discontinuity in the running variable at the cut point, so we can move on; we're good. Next we need to see if there really is a gap before we measure its size officially. So again we're going to make another plot, with the running variable, our test score, on the x-axis, and on the y-axis our outcome variable, the
final test score that they take at the end of high school. So we'll make another plot with ggplot(): our data is once again aig_program; in our mapping aesthetics, x is our running variable (I forgot the name of it for a second: test_score), and our outcome is y, which is called final_score (I probably should have named these better, oh well). Then, just to make the graph easier to follow, we'll color each of those points by whether or not they were in the program, so we color by aig. We want to add some points, so geom_point(); let's go ahead and run that and see what it looks like. So we have a scatterplot, and there seems to be a jump here at 75-ish. We have some overplotting; some of these points are big and overlap each other. We can fix that by shrinking the points: size = 0.5, alpha = 0.5, sure. If we run that, we get smaller, more transparent dots. Cool. We want to see the actual line for the cutoff, so geom_vline(xintercept = 75), and there's our line. Just for fun, if you want to make it dotted you can say linetype = "dotted", and now you have a dotted cutoff line, or you can say "dashed" and now it's a dashed line. The last thing we want to do is add a best-fit line that goes on the red side and then on the blue side, because again we want the size of that gap. The easiest way to do that is geom_smooth(). If you just use regular geom_smooth(), it does a nonparametric LOESS curve (it even tells you it's using LOESS), and you can see it's kind of curvy, it wiggles around. We don't want that right now; we just want a nice straight line, so inside geom_smooth() we can say method = "lm" so it uses a linear model. Now if we run it, there's our nice linear model for the regression discontinuity. So what we care about now is that we have a gap, and it looks like it's statistically significant, but we want to measure how big it is: how many points does this AIG
program boost your final score? We need to measure it, and that gets us to step five. We're going to measure it both parametrically and nonparametrically; later we'll use a LOESS curve, a kind of wiggly line, to figure out how big the gap is, but first we're going to use regular regression. We don't need to add any squared or cubed coefficients, because the relationship looks pretty straight; there's no weird curviness there. So we'll go ahead and do a regular regression to figure out how big that gap is. If you remember from the PowerPoint, the easiest way to do this is to center our running variable: instead of looking at the actual test score, we look at how many points above or below 75 people scored, and that helps us get an accurate size for the gap. So what we're going to do is make a new data set based on aig_program, adding a new column. We'll call it aig_centered, and it equals aig_program, then a pipe, then mutate() to make a new variable called test_centered, which equals the score minus the cutoff: test_score - 75. If you run this, we should have a second data set up here called aig_centered; it has all the same columns as before, plus the new test_centered column. This person was 17 points above the threshold at 92, this person was about 2 points under the threshold at about 73 points, so that worked, hooray. So now we're going to build a model using that new centered variable. We'll give it the super exciting name model_1, and say lm(): our outcome is final_score, so final_score is explained by test_centered plus aig, the indicator variable for whether or not they were in the program (the TRUE/FALSE variable), and then data = aig_centered. And we want to see the results
of this, so we'll say tidy(model_1). If we run that, here are our results. The intercept is 63; like we said during the PowerPoint, that's essentially where the red line sits right at the cutoff: somebody who scored, say, 74.9 on their test would on average have a final test score of 63 points. The test_centered coefficient means that for every point you go up above the threshold, your final score goes up by about half a point, 0.53 points. The last one, aigTRUE, is the thing we care about most: that's how big of a gap there is, or how big of a boost you get when aig is TRUE at the cutoff. So you're going from 63-ish to 63 plus 8, which (I can't do the math in my head, but we'll use R) is 71. People who scored 75 on the AIG test have an average final score of 71, compared to 63 for those who were not in the program. That is the size of the gap, and that's good. We can also check whether it's statistically significant: the p-value is super tiny and the t-statistic is big, so that's good. Cool. Right now we're using the full width of the data, everybody from people who scored 30 on the test to people who scored 100; we haven't used any bandwidth yet. But we want to limit our analysis to just the people around the cutoff, so we need to shrink the data down when we run the models, so that we're not looking at everybody, and see if that changes the estimate. So we're going to make a couple of new data frames. In this next chunk, we'll make one called aig_program_10, for a bandwidth of 10. We're just going to base this on aig_centered, and all we do is filter() it so that it only includes people whose centered test score is below 10 and above negative 10, so we're only looking at the people within plus or minus 10. To
do that, we can say test_centered is greater than -10 and test_centered is less than 10. If you run that, you should have a new data set over here called aig_program_10, and if you look at it, nobody's test_centered value should be above 10 or below -10. We used to have 1,000 rows; now we have 497 rows, so we threw away half our data. We also want to make a data set for people within a bandwidth of plus or minus 5. The easiest way to do this is just to copy the aig_program_10 code and change the 10s to 5s: we want test_centered greater than -5 and less than 5. If we run that, we should see a similar result: in aig_program_5 nobody has a test_centered score greater than 5 or less than -5, and now we're down to only 257 rows. We got rid of almost three quarters of our data, but we have a very, very narrow bandwidth now, so that's good. Now we want to run our models again, this time using the bandwidth data. The easiest way to do this is to come back up to model_1, copy it, and just change a few things. Paste it, and paste it again: we'll make model_2, or whatever you want to call it, still predicting final_score from test_centered plus aig, but instead of the full centered data we use the bandwidth-of-10 data, and then we look at the results. Then we'll make model_3, and that's the same thing; the formula doesn't change, the only thing that changes is the data set we're using, aig_program_5, and then we look at those model results too. If you run the whole chunk, it will show two different models. The first is with a bandwidth of 10, and notice how our coefficient is now 9 instead of 8: if we narrow down to plus or minus 10 on each side, the program effect seems bigger by a point. If we look at the bandwidth of 5, even narrower, then the program
effect shrinks, and now it's 7.3 instead of 8 or 9. Which one of those is most accurate? I have no idea. I don't know what the true program effect is, because it's just fake data, but notice how it changes. Another sensitivity analysis we can do is shrink the bandwidth even more, or make it wider, use half the bandwidth, twice the bandwidth, something, and see how much the estimate changes; see if it ever drops below zero, maybe it's a negative effect, maybe it's a giant effect, who knows. Finally, we want to see all three of these parametric models at once. The way we do this is with the huxreg() function: we just feed it our three models, and we can actually name them to make the table easier to read. To name them we use a list: "Full data" = model_1, "Bandwidth = 10" = model_2, and "Bandwidth = 5" = model_3, each on its own line. Here we're just feeding it a list of models, naming each of them. If you run this chunk, you'll see a side-by-side regression table for the full data, the bandwidth of 10, and the bandwidth of 5, and you can see the effect changing from 8.4 to 9.2 to 7.4; you can see how much it changes. So those are some preliminary results for our gap with parametric models. Next we want to do some nonparametric stuff and use the curvy lines instead of fitting exact straight lines. For the nonparametric approach we're not going to run any lm() models, and we're not going to worry about centering variables or anything; we can use our original data. The function we use here is rdrobust(), and if you notice the pop-up here, there are a whole bunch of different arguments you can feed into it; if you look at the help file for rdrobust(), you can read all about them. The three most important things you need to feed it are y, which is your outcome; the
x, which is your running variable; and c, which is the cutoff. If we feed it just those three things, it should tell us the effect size, the size of that gap. So we say y = and here we have to use the old-style, standard base R way of giving a column name: our outcome is aig_program$final_score; then x = our running variable, which is aig_program$test_score; and our cutoff c = 75. If we just run this, it will give us some output, but for whatever reason it doesn't actually show us the size of the gap, just diagnostic stuff. To see the size of the gap we have to feed the results of this function into another function called summary(), and that will tell us the size of the gap. So add a pipe and then summary(), just like that, and it should give us the size of the gap. Okay, it actually gives you a ton of information here. If you scroll down to the bottom, that's the thing we care about: the coefficient for the conventional method for robust, nonparametric regression discontinuity. It says 8; that's the size of the gap. We have a z-statistic for it that's big, and it's statistically significant. It gives us a confidence interval, so it could be between 5 and 10 if we did this a whole bunch of times, but that's definitely not zero, so we can say it's statistically significant, and it's big. If we scroll up to the top, it gives us a bit more information about how it actually calculated that effect size. It says it started off with 1,000 observations; it decided to use a triangular kernel, because it did, which again gives tons of weight to the points right next to the cutoff and a lot less weight as you go further away; and it's using an optimal bandwidth based on this mserd algorithm decision. If you look down here, it actually tells you what the bandwidth was: it's using plus or minus six point five
eight four, i.e. a bandwidth of about 6.584, because that's what it decided was best. You can actually feed it your own bandwidth; one of the arguments to rdrobust() is whatever bandwidth you want, so we can tell it to be 5, or 10. It also tells you how many observations it's actually looking at: it's only using 346, of which 138 got treatment, or 654 with the wider, more robust bandwidth, of which 198 got treatment. That's what all of this shows, but again, the most important final number you care about is that 8. Cool. If we want to plot this, there's a built-in function in the rdrobust package that will plot the difference, the size of that gap. The easiest way to do it is to grab the code from the rdrobust() call, because the plotting function takes all the same arguments; the only thing that changes is the name of the function. So grab it (don't worry about the summary() part), copy everything from rdrobust down to here, and change rdrobust to rdplot, for regression discontinuity plot; everything else is the same. We can re-indent this just to make things line up: if you select those lines and press Ctrl+I or Cmd+I, it'll fix the indentation. If you run that chunk, there's our regression discontinuity plot. You can see that there's a gap right there, and the size of the gap is 8 points. One odd thing the rdplot() function does is that these points aren't actually the data points in the real data. It essentially makes a kind of histogram: it bins the data, and these are the average y-values for each specific bin (you can choose the bin width), so it's like a scatterplot version of a histogram. Even if you had a million points, you would still see just 50 or however many you see here. So that's not the actual data, those are just averages, but you can still see that eight-point increase. So, about the bandwidth: we saw that it
So, the bandwidth: we saw that it chose this mserd bandwidth up here, which is based on mean squared error; the algorithm is trying to minimize the mean squared error. But we can look at a whole bunch of different algorithms and see what each one would choose, so we know which is best. You can use a function called rdbwselect(), which tells you a whole bunch of different bandwidths, and it uses the same syntax as rdrobust() and rdplot() again. So copy that code, come to this next chunk, paste it, and change the function to rdbwselect(); select the lines and press Ctrl+I or Cmd+I to re-indent. This one needs the summary() because that's how they wrote the function. If you run the chunk like this, without telling it to show a bunch of different bandwidths, it just tells you the optimal one: 6.5, and it happens to be symmetrical, so 6.5 to the left of the cutoff and 6.5 to the right. That's not always the case; some optimal bandwidths might use a little on the left side and a bunch on the right side, depending on how the algorithm optimizes. So that shows you the best bandwidth. You can also see all of the bandwidths: copy this rdbwselect() code again and come to the next chunk, the one that says rdbwselect all. One of the arguments you can feed it is all = TRUE. Run this and it shows all of the potential bandwidths rdrobust can use, all the different ones that could potentially work for finding the size of that gap. It happened to choose the mean-squared-error one, but there's a whole bunch of others, and they mostly range between about 4.5 and 6-ish, though there is one that's 14 on one side and 6 on the other.
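A sketch of that rdbwselect() call with all the selectors shown (again assuming the aig_program data frame from the narration):

```r
library(rdrobust)

# Show every bandwidth-selection algorithm's choice, not just mserd.
rdbwselect(y = aig_program$final_score,
           x = aig_program$test_score,
           c = 75,
           all = TRUE) |>
  summary()
```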
There might be situations where that asymmetric one is best, I don't know, but generally you just want to use whatever it tells you to use. If you don't want to, you can specify your own bandwidth by adding one more argument to rdrobust() instead of using whatever it automatically finds for you. Scroll back up to the very first rdrobust() call, copy it, and paste it down here. To specify your own bandwidth, for whatever reason the argument is named h, which stands for bandwidth, so we can say h = 5 for plus or minus 5. Run that and those are the results when the bandwidth is 5, and now we have a point estimate. Something that's common to do, like we talked about in the PowerPoint, is to double the bandwidth and halve it to see how much the estimate changes. So paste this twice in the same chunk: one with h = 5 * 2 (you could just type 10, but you can have R do the math for you) and one with h = 5 / 2, or 2.5. Run the whole chunk and you get three versions. With a bandwidth of 5, the gap is 8.2; with a bandwidth of 10, it's 8.5, so it didn't really change much; with a bandwidth of 2.5, it's 8.2 again, which also didn't change much. That's good: it shows our results are pretty robust to different bandwidth sizes. We can shrink it or make it big, and it's generally going to be around 8.2-ish regardless of what we do.

The last thing we can do is change the kernel. Right now it's using the triangular kernel, which again gives lots of weight to the points right next to the cut point and less weight as you go out, but we can choose different kernels too, so go ahead and copy the rdrobust() function again. Different kernels are another argument to rdrobust(); instead of specifying our own bandwidth, we'll stick with whatever it chooses (it was 6.5-something), but we'll change the kernel with kernel =. If you look at the help file for rdrobust() and scroll down to kernel, it says there are three possible kernels: triangular, which is the default; Epanechnikov, which is that curvy one, where the points right by the cutoff are important but not super important and the weight gradually lessens; and uniform, where all the points have the same importance all the way across. So we can say "triangular", the default; then "epanechnikov" (I have no idea how to spell that, so copy it from the help file); and then "uniform". Run all three of these, using the default bandwidth with different kernels. If we wanted to go super wild we could do double and half bandwidths for each of the different kernels and have a billion different models, but so far everything's been pretty consistent no matter how big the bandwidth is, and I'm guessing the kernels will be pretty consistent too. Triangular gives 8; the Epanechnikov kernel gives 7.8; the uniform kernel gives 7.7. A little bit less when we change the kernel, but it's not dropping down to like 4 or 1 or jumping up to like 20; it stays in the range of 7 to 8, and that's kind of the main effect.

So the last step is to compare all the effects: we can systematically go back and see what the estimate was when we did it parametrically, with each of the parametric models.
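The bandwidth and kernel checks described above might be coded like this sketch (same assumed aig_program data frame; the specific h values follow the narration):

```r
library(rdrobust)

# Robustness to bandwidth: chosen value, double, and half
rdrobust(y = aig_program$final_score, x = aig_program$test_score,
         c = 75, h = 5) |> summary()
rdrobust(y = aig_program$final_score, x = aig_program$test_score,
         c = 75, h = 5 * 2) |> summary()
rdrobust(y = aig_program$final_score, x = aig_program$test_score,
         c = 75, h = 5 / 2) |> summary()

# Robustness to kernel, keeping the automatically chosen bandwidth
rdrobust(y = aig_program$final_score, x = aig_program$test_score,
         c = 75, kernel = "epanechnikov") |> summary()
rdrobust(y = aig_program$final_score, x = aig_program$test_score,
         c = 75, kernel = "uniform") |> summary()
```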
For the parametric models, we can write down 8.4, then 7.3, then 8.4 again, and if we collect all of those, at the end we have one big table that shows all of the different effects we found. If you look at the final finished R Markdown file and scroll all the way to the bottom, you can see that I did that: there's a fancy Markdown table showing all of the different estimates with the different bandwidths, kernels, and methods. In the end you choose one of those estimates, or report all of them, and say: here are the potential program effects. They range from 7.3 to 9, but in general it looks like this AIG program does boost your final high school test score, so we should probably roll it out, because it has a good, strong effect. You do need to remember that this is just the local average treatment effect: it's not for the whole population, just for people in that bandwidth. But for people in that bandwidth, it looks like the AIG program is effective and is doing good things in boosting scores. And that's how you do this regression discontinuity stuff with R: you throw a whole bunch of different models at the data and see how consistent your estimates are, and that's how you do it.
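One way to collect the estimates programmatically instead of writing them down by hand is to loop over specifications and pull out the conventional coefficient from each fit. This is a sketch: the layout of the coef element of an rdrobust object should be checked against the package documentation before relying on it.

```r
library(rdrobust)

# Fit the model once per kernel and collect the "Conventional" point
# estimate from each fit (check ?rdrobust to confirm the coef slot layout).
kernels <- c("triangular", "epanechnikov", "uniform")
estimates <- sapply(kernels, function(k) {
  fit <- rdrobust(y = aig_program$final_score,
                  x = aig_program$test_score,
                  c = 75, kernel = k)
  fit$coef["Conventional", 1]
})
data.frame(kernel = kernels, estimate = estimates)
```

The same pattern extends to looping over bandwidths with h, which would reproduce the full comparison table described above.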
Info
Channel: Andrew Heiss
Views: 1,966
Rating: 5 out of 5
Id: 8vHQCj5ploM
Length: 47min 56sec (2876 seconds)
Published: Tue Mar 31 2020