Now, we're going to switch gears and take up the topic of longitudinal causal inference, in which a sequence of treatments is applied over time and one wants to compare the effects of different sequences on outcomes of interest. This is a very important generalization, and it's due primarily to the work of James Robins and his collaborators. Now, much of the work is fairly technical and difficult, and I don't intend to do a whole lot more than give a brief introduction to the topic. Other places you might look: there's a recent book by Hernan and Robins, and there's a chapter by Hernan and Robins in 2009, and both of these provide a more extended yet somewhat accessible introduction, especially the book, which provides some intuition as well; both provide references to the literature. So, let me start with a few examples. In neuroimaging research, measurements of neural activity are taken at periods t equals one through capital T. During each period, a stimulus Z may or may not be given. Another example: a medical researcher may wish to test whether a medicine Z with nasty side effects, administered every day t, yields better outcomes Y than administering the medication every other day. Or the researcher might want to know whether administering the medication only on days when a patient's symptoms were worse than those of the previous day produces the same outcome the day after as administering the medication every day. You can see that last one's a little more complicated. Similarly, a company that sends its clients promotional material might want to know whether more frequent mailings are more effective than less frequent mailings. So, those are a few motivating examples, and in order to keep the notation manageable, I'm going to get rid of the subscript i and simply refer to an arbitrary subject.
So, let's observe a subject over periods t equals one through capital T, and let Zt be one if the subject is assigned to the treatment in period t and zero otherwise. Let's put an overbar on Z to denote the cumulative sequence through period capital T, which is the assigned treatment regimen, and let Z bar with a little t be the sub-regimen consisting of assignments through period little t. Now, the regimen with the big T is one of the possible regimens; I write it with a little z because now I'm going to index all the members of some set of interest, and of course this set of interest, Omega z, is a subset of the Cartesian product zero-one to the T. So, let Yt of the treatment regimen (the whole regimen), measured at the end of period t, denote the potential outcome under that regimen, and we're going to assume this depends only on treatments administered prior to the time of measurement. That's pretty sensible almost all the time, and so we'll hereafter write this as Yt of z bar with the little t instead of the big T. Now, there are many estimands we might be interested in, and we shall focus on the case where Y is continuous and interest resides in averages of unit differences, which I've written here; we could have other metrics for measuring average effects, but that's what we're going to do. So, for the next several lessons, for each t we're going to consider the average effect of treatment regimen z bar t versus z bar t star, given some covariates X1 that are observed prior to period-one treatment. That's the estimand, along with the average effect, which we get by marginalizing out X1, and we'll generally be interested in doing this at each time point t, although sometimes you may just be interested in it at the very end of the study, in which case you wouldn't look at all of these. So, in neuroimaging research, subjects are sometimes randomly assigned to treatment regimens.
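To pin down the spoken notation, here is one way to write the setup in symbols; this is my own transcription and may differ cosmetically from the slides:

```latex
\begin{aligned}
& Z_t \in \{0,1\}, \qquad
  \bar{Z}_T = (Z_1,\dots,Z_T), \qquad
  \bar{z}_t = (z_1,\dots,z_t), \qquad
  \bar{z}_T \in \Omega_z \subseteq \{0,1\}^T, \\[4pt]
& Y_t(\bar{z}_T) = Y_t(\bar{z}_t)
  \quad \text{(outcomes depend only on treatments given before measurement)}, \\[4pt]
& \tau_t(\bar{z}_t, \bar{z}_t^{*} \mid x_1)
  = E\bigl[\, Y_t(\bar{z}_t) - Y_t(\bar{z}_t^{*}) \mid X_1 = x_1 \,\bigr],
  \qquad
  \tau_t(\bar{z}_t, \bar{z}_t^{*})
  = E\bigl[\, Y_t(\bar{z}_t) - Y_t(\bar{z}_t^{*}) \,\bigr].
\end{aligned}
```

The second line of the display is the conditional estimand given baseline covariates, and the last expression is the marginal version obtained by averaging over X1.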
To ascertain whether the same outcomes can be attained when patients are given medication every day versus every other day, the researcher might randomly assign half the subjects to receive daily treatment and half to receive the treatment every other day. In both these cases, the neuroimaging case and the medication case, no issues arise that we have not already addressed. But in the case where the administration of the medication on day t depends on day t symptoms, and subsequent outcomes such as Yt also depend on day t symptoms, day t symptoms are a confounder and it is necessary to adjust for them. At the same time, day t symptoms are an outcome of previous assignments and symptoms, and adjusting for such intermediate outcomes will generally create comparisons that are not causal, as we saw before. Fortunately, it is possible to extend the unconfoundedness conditions developed in part one to cover time-varying confounding. So, we'll let Xt denote measurements of the time-varying confounders taken in period t, and these may or may not include the previous outcome Y t minus one from period t minus one, or for that matter even several previous outcomes. We're going to assume that in each period t, measurements Xt are first recorded, then treatment Zt is given, followed by measurement of Yt. It's important to keep that ordering in mind. Now we're going to write the cumulative history up to time little t by putting a bar over the X. The identification conditions extend Rosenbaum and Rubin's strongly ignorable treatment assignment given covariates, which we studied in part one: the potential outcomes are independent of treatment assignment given the covariates, and zero is less than the probability of treatment given X1 is less than one. That's the so-called positivity condition: at each value of X1, you can have both treated and untreated subjects.
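For comparison, the single-period conditions from part one, Rosenbaum and Rubin's strong ignorability, can be written as:

```latex
\bigl(Y(1),\, Y(0)\bigr) \perp\!\!\!\perp Z \mid X_1,
\qquad
0 < P(Z = 1 \mid X_1 = x_1) < 1 \quad \text{for all } x_1 .
```

The first statement is unconfoundedness given the baseline covariates; the second is the positivity condition just described.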
Now specifically, we're going to assume (see Robins and Hernan, for example) that for all regimens of interest, for all values of the covariates, and for all t, the following holds: potential outcomes after time little t are independent of treatment assignment at time little t, given the complete history of the past. And we're going to make a positivity condition, which is the analog of the positivity condition in the case where there's just one assignment; indeed, we can think of the single-assignment case above as the special case where the regimen is just Z1. The first condition is often referred to as sequential randomization or conditional exchangeability, and the second is the positivity condition. Under these conditions, the distributions of the potential outcomes within covariates are identified, and of course once we have that, we marginalize over X1, so the marginal distributions are identified as well. For example, if the covariates are discrete with probability function f, this is the formula we get; I'm going to explain a little bit about that in a moment. Here, f of X1 given X naught and Z naught is just, by definition, f of X1, because that's where things start: X1 is prior to treatment assignment, and there is no X zero. Now, if the probability function is identified, of course the conditional expectation is identified as well, as below. Robins and Hernan call this result the g-formula, and it is not difficult to establish; it is evident from the following derivation for the case t equals two, which you should follow and then extend to t equals three, et cetera. Make sure that you understand it. Now, we're writing on the left-hand side the potential outcomes, and we're assuming covariates are discrete. The first equation is just the law of total probability, which you know, and in the second equation we can stick the Z1 in there; that's the only difference.
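In symbols, the two assumptions and the resulting g-formula for discrete covariates can be sketched as follows; this is my rendering of the slide, with the convention that f of X1 given the empty history is just f of X1:

```latex
\begin{aligned}
&\text{Sequential randomization: for all } t,\ \text{all } s \ge t,\ \text{and all regimens of interest,} \\
&\qquad Y_s(\bar{z}) \perp\!\!\!\perp Z_t \mid \bar{X}_t = \bar{x}_t,\ \bar{Z}_{t-1} = \bar{z}_{t-1}; \\[4pt]
&\text{Positivity:} \qquad
  0 < P\bigl(Z_t = z_t \mid \bar{X}_t = \bar{x}_t,\ \bar{Z}_{t-1} = \bar{z}_{t-1}\bigr) < 1; \\[4pt]
&\text{G-formula (discrete covariates):} \\
&\qquad P\bigl(Y_t(\bar{z}_t) = y\bigr)
  = \sum_{\bar{x}_t}
    P\bigl(Y_t = y \mid \bar{X}_t = \bar{x}_t,\ \bar{Z}_t = \bar{z}_t\bigr)
    \prod_{s=1}^{t} f\bigl(x_s \mid \bar{x}_{s-1}, \bar{z}_{s-1}\bigr), \\
&\qquad E\bigl[Y_t(\bar{z}_t)\bigr]
  = \sum_{\bar{x}_t}
    E\bigl[Y_t \mid \bar{X}_t = \bar{x}_t,\ \bar{Z}_t = \bar{z}_t\bigr]
    \prod_{s=1}^{t} f\bigl(x_s \mid \bar{x}_{s-1}, \bar{z}_{s-1}\bigr).
\end{aligned}
```

Everything on the right-hand sides involves only observable quantities, which is what identification means here.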
We can stick the Z1 in there by the unconfoundedness assumption, and then in the third equation we stick in X2 and sum over the distribution of X2 given X1 and Z1, which is again essentially just the law of total probability, and we still have the rest. Now, in the next line, because of ignorability, we can stick in the Z2, and so we have our formula. The last line follows because, if we observe Z1 and Z2, the Y we see is the same Y2 that we would see when Z1, Z2, X1, and X2 are set to those values, and that gives us our identification condition, because in the last line we have everything written in terms of observables. To use the g-formula to compare different regimens and sub-regimens, it is then necessary to estimate those expectations above. So we have to estimate the conditional expectation of Yt given z bar t and x bar t, and we also of course need to estimate the probability functions in the previous slide. That can get tricky, because mis-specification will generally lead to biased estimates, and as t increases, one might expect such mis-specification to become more likely, because the number of things you have to estimate grows; additionally, as t increases, the number of sequences grows exponentially. In light of that exponential growth, consider the study Robins, Greenland, and Hu (1999) analyzed: an observational study of 167 children who were observed over 30 days. On each day, a child may be ill or not (the outcome), and the mother may be experiencing stress or not (the treatment). So there are two to the 30 possible treatment regimens. Robins and Hernan also point out a problem that they call the g-null paradox: briefly, under general conditions, using parametric models to compare the means under different regimens can lead to falsely rejecting the null hypothesis of no effect even when it is true. This is one reason the g-formula has been less used than marginal structural models.
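To make the t equals two case concrete, here is a minimal plug-in g-formula sketch in Python. The data-generating model, the coefficients, and the function name are all my own invented illustration, not from the lecture: X2 is a time-varying confounder that is affected by Z1, affects Z2, and affects Y2, which is exactly the situation where naive adjustment fails.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Invented two-period study with time-varying confounding:
# X1 is a baseline covariate, Z1 depends on X1, the confounder X2
# depends on (X1, Z1), Z2 depends on (X2, Z1), and Y2 on all of them.
x1 = rng.binomial(1, 0.5, n)
z1 = rng.binomial(1, 0.3 + 0.4 * x1)
x2 = rng.binomial(1, 0.2 + 0.3 * x1 + 0.3 * z1)
z2 = rng.binomial(1, 0.2 + 0.3 * x2 + 0.2 * z1)
y2 = x1 + x2 + 2.0 * z1 + 3.0 * z2 + rng.normal(0.0, 1.0, n)

def g_formula(z1_val, z2_val):
    """Plug-in estimate of E[Y2(z1, z2)]: sum over (x1, x2) of
    E[Y2 | x-bar 2, z-bar 2] * f(x2 | x1, z1) * f(x1)."""
    total = 0.0
    for a in (0, 1):                      # value of X1
        p_x1 = np.mean(x1 == a)           # f(x1)
        sel = (x1 == a) & (z1 == z1_val)  # condition on history (X1, Z1)
        for b in (0, 1):                  # value of X2
            p_x2 = np.mean(x2[sel] == b)  # f(x2 | x1, z1)
            cell = sel & (x2 == b) & (z2 == z2_val)
            total += y2[cell].mean() * p_x2 * p_x1
    return total

# "Always treat" versus "never treat"; by construction the true contrast
# is 2 + 3 + 0.3 = 5.3, where the extra 0.3 flows through the X2 pathway.
effect = g_formula(1, 1) - g_formula(0, 0)
print(round(effect, 2))  # close to 5.3
```

Note that simply comparing treated and untreated subjects within levels of X2 would condition on an intermediate outcome and produce a non-causal contrast; the g-formula instead standardizes over the distribution of X2 that each regimen induces, which is the point of the derivation above.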
That's the topic I'm going to take up next. Another reason is that only recently has commercial software, in SAS, become available for implementing this approach.