[MUSIC] In this final lecture, I'll go through an alternative method to infer causal effects when data from a randomized control trial or lottery-like variation in the independent variable is unavailable. This alternative method is called difference-in-differences and generally requires longitudinal data. All this will be explained in this lecture.

The idea behind the difference-in-differences method is actually very simple, and it is illustrated in the figure on the slide. Imagine, as with a randomized control trial, that our independent variable, referred to as x in the previous lectures, only has two values, indicating whether the individual is treated or not treated. We have data on the relevant outcome for a group of treated individuals, before and after treatment, and similarly for a control group. Then we can estimate the average difference in outcomes before treatment across the two groups.

Imagine that we measure the amount of unemployment experienced by some individuals. Some of them are selected to participate in a labor market training program and some are not. If the two groups have experienced different amounts of unemployment before treatment, this is an indication of selection into treatment. Imagine, for instance, that the average amount of unemployment is lower for those selected to participate in the subsequent training program than for those assigned to the control group. This could be because individuals who volunteer for training are more motivated, or have better skills, than those who do not choose to sign up. Obviously, comparing average unemployment between treated and controls after treatment does not, in this case, estimate the causal effect of the training program, because the difference also includes pre-treatment differences. In order to obtain a better estimate of the causal effect of the training program, we would like to subtract the pre-program differences from the post-program differences before inferring the causal effect of the training program. Therefore, in order to calculate the causal effect of the training program, we estimate the difference in differences across treated and controls. Hence the name of the method, the difference-in-differences method. This is all there is to it. The following slides will just formalize the method in terms of a regression model and finally offer an empirical example.

In what follows, we use the capital letter T to indicate treatment. However, you should just think of it as x, the independent variable that might be correlated with the error term in a regression of outcomes for treated and controls. It is this correlation that induces pre-treatment differences between treated and controls. We now write a regression model with two independent variables: the treatment indicator, capital T, and a time-period indicator, lower-case t. Both have two values. The treatment indicator is one if the individual is in the treatment group and zero otherwise, and the time indicator has the value zero in the pre-program period and one in the post-program period. We now postulate that the coefficient on the interaction term between the treatment indicator and the period indicator is equal to the causal effect of the treatment, the difference in differences. The regression coefficient for the treatment indicator captures any pre-program differences, and the coefficient for the period indicator captures any shared differences across the pre- and post-program periods.
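A minimal way to write this model down, assuming the standard two-group, two-period setup and using beta labels of my own choosing (the lecture only names the variables verbally), is the following:

```latex
% Difference-in-differences regression for individual i in period t
% T_i : treatment indicator (1 = treatment group, 0 = control group)
% t   : period indicator    (0 = pre-program,    1 = post-program)
% beta_1 captures pre-program differences, beta_2 captures shared changes
% across periods, and beta_3 (the interaction) is the difference in differences.
y_{it} = \beta_0 + \beta_1 T_i + \beta_2 t + \beta_3 (T_i \times t) + e_{it}
```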
In the case of unemployment, we could imagine that if there is a period of economic recovery between the two time periods, both the treatment and control groups might experience a decrease in unemployment across the two periods. This would be captured by the period indicator. The following slides will show that this regression model does actually estimate the difference in differences.

First, we have that both the control group and the treatment group can experience a change across periods, for instance due to changes in economic conditions in the unemployment example. That is, the difference in the average outcome across periods could be non-zero. We also have that the difference in average outcomes across the two groups could be non-zero. These are the changes that could occur in the data and that should be captured accordingly by the model. So now we investigate how our proposed regression model captures those differences.

First, we state our regression model again to keep track of the different coefficients. Next, we see how the model captures the difference in average outcomes in the pre-program period, that is, when the time indicator is zero and the treatment indicator varies. Inserting the appropriate values of the treatment indicator, which varies across groups, and the time indicator, which is zero as we only look at the pre-program period, into the regression model yields the regression-model equivalent of the pre-program average difference. As can be seen from the slide, this is equal to the regression coefficient for the treatment indicator. This regression coefficient captures pre-program differences across groups. If it is different from zero, there is selection into treatment. That is, those who are selected, or select themselves, into treatment are on average different from those in the control group. Had the data been generated from a successful randomized control trial, this coefficient would be zero.

Next, we want to see how the model captures the change across time for the treatment group. This change is due both to shared changes in the environment, for instance economic conditions in the unemployment example, and to any potential effect of the treatment. We write the regression model once again to keep track of all the coefficients. Again inserting the appropriate values of the treatment indicator (which is now one, as we only look at the treatment group) and the time indicator (which is one in the last period and zero in the first period), we get that the average difference across time for the treatment group is the coefficient for the time indicator plus the coefficient for the interaction term.

For the third time, we restate our regression model in order to see how it captures changes across time for the control group. Again inserting the appropriate values of the treatment indicator (which is now zero, as we only look at the control group) and the time indicator (which is one in the last period and zero in the first period), we get that the average difference across time for the control group is the coefficient for the time indicator.

Now, finally, we want to see how the model captures the difference in differences between the control and treatment groups across time. In the data, this is the average difference across time for the treatment group minus the average difference across time for the control group.
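Written out as conditional expectations, and assuming the error term e has the same average in all four group-by-period cells, the comparisons just described reduce to the following (using the beta labels from the sketch above):

```latex
% Pre-program difference between treated and controls
E[y \mid T = 1, t = 0] - E[y \mid T = 0, t = 0] = \beta_1
% Change across time for the treatment group: shared change plus treatment effect
E[y \mid T = 1, t = 1] - E[y \mid T = 1, t = 0] = \beta_2 + \beta_3
% Change across time for the control group: shared change only
E[y \mid T = 0, t = 1] - E[y \mid T = 0, t = 0] = \beta_2
% Difference in differences
(\beta_2 + \beta_3) - \beta_2 = \beta_3
```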
For the regression model, we found on the previous slides that the difference in the average outcome across time for the treatment group was the coefficient for the time indicator plus the coefficient for the interaction term, while the difference for the control group was just the coefficient for the time indicator. The difference between the two groups is therefore just the coefficient for the interaction term. Hence the interaction term captures the difference in the change over time across the two groups, or the difference in differences.

It seems as if the difference-in-differences model is a magic tool. It removes all selection effects and therefore allows interpretation of the interaction term as the average causal effect of the treatment, without the need for randomized data or instrumental variables. All we need is average outcomes for individuals in the treatment group and the control group before and after the treatment. We do not even need data for the same persons across time. That is, the individuals observed in the treatment and control groups do not need to be the same across periods. This is neat: if, for instance, you want to study the effect of a smoking policy in one state or region, all you need is health outcomes in that region and in a control group in a neighbouring region, before and after the implementation of the smoking policy. Any difference in differences across the two regions is then interpreted as the causal effect of the smoking policy.

This might sound too good to be true, and it is. In our derivation of the difference in differences from the regression model, we assumed that the average value of the residual term e is the same across all four groups: treated in both periods and controls in both periods. If this is true, the difference-in-differences estimator is estimating the average causal effect of the treatment. If it is not true, however, then obviously the difference-in-differences estimator is estimating something else, and the interpretation of the interaction term as a causal effect may be biased. Therefore, the interesting question is when it is likely that this does not hold and the average error term is not the same across groups. It turns out that the crucial question is whether the change in the error term across time is the same for the two groups; that is, whether the treatment group would have had the same change across time as the control group in the absence of the treatment, the counterfactual change. In the case of the smoking policy example, if other health-related policies were introduced in the two regions between the periods, then changes in health outcomes could be due to a wide range of factors and not just the introduction of the smoking policy. The validity of the difference-in-differences approach to causal inference therefore depends on how comparable the change across time would have been in the two groups in the absence of the treatment (the so-called parallel trends assumption).

As an example of the application of the difference-in-differences approach, we now turn to the STAR example again. We use the difference-in-differences approach to see if attending a small class in first grade improved math achievement. We saw earlier that attending a small class in kindergarten had a positive effect on math achievement; there we relied on the randomized control trial to infer causality. Now we want to use a different approach to see if a small class in first grade also improved math achievement, because we cannot completely rely on randomization to infer the effect of a small class in first grade: some non-random drop-outs and crossovers took place between kindergarten and first grade.
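As a bridge to the estimates discussed next, here is a minimal sketch of how such a difference-in-differences regression could be run. The file name, the column names (score, grade, small) and the use of pandas and statsmodels are my own illustrative assumptions, not the course's actual materials.

```python
# Minimal difference-in-differences sketch (illustrative assumptions throughout).
# Data is expected in "long" format: one row per student-by-grade observation.
# The same individuals do not have to appear in both periods.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file with columns:
#   score : math achievement
#   grade : 0 = kindergarten (pre-treatment period), 1 = first grade (post-treatment period)
#   small : 1 if the student attends a small class in first grade, 0 otherwise
df = pd.read_csv("star_math_long.csv")

# score = b0 + b1*small + b2*grade + b3*(small*grade) + e
#   b1 : pre-program difference between the groups (selection)
#   b2 : shared change from kindergarten to first grade
#   b3 : the difference-in-differences estimate of the small-class effect
model = smf.ols("score ~ small + grade + small:grade", data=df).fit()
print(model.summary())
```

The coefficient on small:grade plays the role of the interaction term, called inter, in the estimates discussed below.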
We now run two regressions on the kindergarten and first-grade math outcomes. In our first regression we only include a constant term; this is just to get a feel for the data. We find a constant term of 566. This means that the average student in the two grades scores 566 on the math outcome scale.

Now we rerun the regression model, including a time indicator, grade, equal to one in first grade and zero in kindergarten; a treatment indicator, small, equal to one if the student attends a small class in first grade and zero otherwise; and an interaction term, called inter, between the treatment indicator and the time indicator. From the estimates we find that, on average, students in small classes scored 8.7 points higher in math achievement after kindergarten, our pre-treatment period. That is, students in a small class in first grade are, on average, better in math than students in ordinary classes when they enter first grade. This is because students in small classes in first grade typically also attended a small class in kindergarten, and this had a positive effect on their achievement in kindergarten. Therefore, the two groups are not equal before the treatment, a small class in first grade. We also find that the average student improved 44 points on the math scale between kindergarten and first grade. We finally find that the interaction term between the treatment indicator and the time indicator is negative. Hence students who are in a small class have a lower gain in math achievement during first grade than students in ordinary classes. The difference is, however, small. Nevertheless, the difference-in-differences approach indicates a negative, albeit small, effect of attending a small class in first grade.

This seems somewhat counterintuitive. It could be expected that the effect was zero, but it seems unlikely that it should be negative. However, if you remember the Hawthorne effect from the last lecture, it could be that students and teachers in small classes in kindergarten made an extra effort because they thought this was appropriate, as they had been selected into the treatment. Therefore, the negative effect in first grade could be a kind of burnout effect: teachers and students in small classes in first grade were resting after the previous large effort in math achievement. This effect was then apparently not due to the small class itself, but rather an effect of being assigned to the treatment.

This concludes the course on inferring cause and effect in the social sciences. Hopefully you have enjoyed the course and appreciated that it is indeed possible to conduct field work that enables estimation of cause and effect, but that you have also realized that it is far from an easy task, and that many studies and data sets are not suited to inferring causality. [MUSIC]