0:10

In this video, we're going to talk about stratification which is

one way that causal effects can be estimated or identified.

Essentially, you would stratify on important variables, and then average

over the distribution of those variables which is also known as standardization.

So we're going to investigate that and see how it relates to causal effects.

And then we're also going to talk about limitations with this standardization

approach.

And why it's something we normally can't do, and

why we need additional causal inference methods.

1:10

But recall that, we're typically interested in a marginal causal effect.

Meaning one that does not involve conditioning on X.

So previously, we had defined for example,

the average causal effect as the expected value of Y1 minus Y0.

And you'll notice in what I just said, I didn't say given X.

So we needed to condition on X to be able to

link the observed outcome to the potential outcome.

But now, we want to get rid of the X and so in order to do that,

we'll just average over the distribution of X.

To make things simple, we'll just imagine for

now that there's a single categorical x variable.

So just one x variable, it can just take on a finite number of possible values.

It could be binary, 0, 1, that sort of thing.

And then what we'll do is we'll be able to get the thing that we want,

which is this expected value of the potential outcome.

2:12

So the expected value of Y superscript a, and that's the thing we want.

Because remember, if we have that for every value of a, then we can

contrast them and get causal effects, in particular marginal causal effects.

So that's what we want, in order to get that what we'll do is we'll

take the expected value of Y given a and X, and average over the distribution of x.

2:36

And so to see what that is here, first,

let's note that this notation here, this capital sigma means summation.

And you'll notice there's a subscript x, so

that just means sum over all levels of x, okay?

Then we have this part here, which is the expected

value of Y among people who were treated at level little a,

and among people who have their covariates equal to little x.

So this is the subpopulation and this is something, of course,

we can observe, in the sense that this is all observed data.

We can restrict to the subpopulation of people who have values little a and little x.

And take the expected value of Y for

them, but that's a conditional expectation, right?

So that's the expected value of y among people who have X equal to little x, and

we can get that for every value of X.

So we would have a whole bunch of those expectations of Y, but

then we would need to average those.

And we'll average those in particular, over the marginal distribution of X.

So that's what this probability of X equal to little x means.

That's the distribution of this covariate in our population of interest.

So the thing that we want, this expected value of Y superscript a,

the expected value of the potential outcome, is just an expected

value of the observed outcome in these subpopulations, averaged

over the distribution of the covariate.

So this is known as standardization, and all we're doing is conditioning and marginalizing.

Conditioning means stratifying, and marginalizing means averaging over.

So what we end up with is a standardized mean, and

that happens to be the same as the average potential outcome.
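
Written out, the standardization formula described here looks roughly as follows. This is a sketch for a single categorical covariate X, where the symbol A for the treatment variable is assumed notation, and where the equality relies on treatment assignment being ignorable given X:

```latex
E(Y^a) = \sum_x E(Y \mid A = a, X = x)\, P(X = x)
```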

4:37

obtaining a treatment effect within each stratum.

And then pooling across strata, where you're weighting by the probability of

each stratum.

If you actually had data, you could estimate a treatment effect then

by just computing means under each treatment, within each stratum.

And then pooling across the strata, where we're weighting by the size of

the stratum.
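
As a minimal sketch of that recipe, here is one way it could be written in Python. The function name, the stratum labels, and all of the numbers are hypothetical and chosen purely for illustration; they are not data from this lecture.

```python
def standardized_mean(stratum_means, stratum_sizes):
    """Pool stratum-specific means, weighting each stratum by its share of the population."""
    total = sum(stratum_sizes.values())
    return sum(stratum_means[x] * stratum_sizes[x] / total for x in stratum_means)


# Hypothetical stratum-specific outcome means under each treatment.
means_treated = {"x=0": 0.20, "x=1": 0.40}
means_control = {"x=0": 0.10, "x=1": 0.30}

# Hypothetical stratum sizes (the weights come from the whole population,
# not from either treatment group on its own).
sizes = {"x=0": 300, "x=1": 100}

effect = standardized_mean(means_treated, sizes) - standardized_mean(means_control, sizes)
print(f"standardized treatment effect: {effect:.2f}")
```

The design choice to notice is that both treatment arms are averaged with the same weights, namely the stratum sizes in the full population.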

4:58

To illustrate that, we'll look at a hypothetical example.

And this will be kind of a simple example just to illustrate the main ideas.

So what we'll focus on is a population of diabetics, people with diabetes.

And we'll look at three different treatments.

5:53

One challenge here is that saxagliptin is a newer drug.

And users of saxagliptin are more likely to have had

some past use of other oral antidiabetic drugs.

And it's also the case that patients who've had past use of other

oral antidiabetic drugs,

which we'll call OADs, are at higher risk for MACE in general.

So these for example,

could be patients who have tried a number of different oral antidiabetic drugs.

And maybe treatment hasn't been very effective for them.

So they might be sicker patients, or

they might be patients that are just harder to help with medication.

6:46

So what can we do about that?

Well, the key idea here is we can then compute the rate of MACE for

saxagliptin versus sitagliptin in two populations.

So one is patients who have had no prior OAD use.

7:13

And then we'll also stratify on patients who have had prior OAD use.

So we'll look at those two populations, that's our x variable,

is this prior OAD use, yes or no.

And then we can compute rates of MACE among saxagliptin and

sitagliptin initiators.

7:32

So if we do that, if we calculate those rates of MACE in these subpopulations.

We could then average across these populations based on the size of each

population, and then this will end up being a causal effect.

If it's true that within levels of prior OAD use,

treatment can be thought of as randomized.

In other words, the treatment assignment is ignorable given prior OAD use.

So this would be the case if the main thing that

influences clinicians' treatment decision is prior OAD use. So they might be, for example,

more likely to give saxagliptin treatment to people who had prior OAD use.

8:22

And if there are not other key variables that are determining this

treatment decision, then this is the variable we would need.

And we could ignore treatment assignment given that variable.

Realistically, in practice we would need to collect more variables than just

this one.

But we're simplifying to illustrate the main ideas.

8:49

So what we're looking at here is raw data of what we observe,

what we might observe in practice.

So in reality some people received saxagliptin,

some people received sitagliptin.

9:02

And when we say saxagliptin equal no, that means sitagliptin.

And then we also have, some people have the outcome MACE and some people don't.

So MACE yes or no.

So we have this nice 2 x 2 table.

In this particular example we have 11,000 patients.

9:26

So whenever we say given, that just means restrict to that sub-population.

So in this case, we say, given saxagliptin equal yes.

So given means, restrict to this row.

Given saxagliptin equals yes means, just look at that row.

9:43

And then if we want to know the probability of MACE,

we would just take how many had MACE which is 350 and

we would divide by the total population size of saxagliptin users which is 4000.

We carry out the division, and we get a probability of 0.088.

Or about 8.8% of saxagliptin users ended up having the outcome.

10:20

And the probability of MACE for sitagliptin users, well,

500 of the sitagliptin users had the outcome MACE,

and then 7000 is the population size.

And we get a probability of 0.071.

So what we see is in our raw data,

we see that 7.1% of sitagliptin users had the outcome,

versus 8.8% of saxagliptin users.

And so just based on this raw data,

it looks like saxagliptin users are doing worse.
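
The two unadjusted probabilities can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the counts are the ones quoted from the 2 x 2 table, and the variable names are made up for illustration.

```python
# Counts quoted from the unstratified 2 x 2 table.
mace_saxa, n_saxa = 350, 4000   # saxagliptin users with MACE, all saxagliptin users
mace_sita, n_sita = 500, 7000   # sitagliptin users with MACE, all sitagliptin users

risk_saxa = mace_saxa / n_saxa  # P(MACE | saxagliptin = yes)
risk_sita = mace_sita / n_sita  # P(MACE | saxagliptin = no)

print(f"saxagliptin: {risk_saxa:.3f}")
print(f"sitagliptin: {risk_sita:.3f}")
print(f"crude risk difference: {risk_saxa - risk_sita:.3f}")
```

This crude difference mixes any effect of the drug with differences between the kinds of patients who receive each drug.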

10:54

The problem is, we don't know if that's due to saxagliptin being less effective

than sitagliptin, or if it's because perhaps saxagliptin was

preferentially assigned to people who were sort of worse off.

So maybe sicker patients were given saxagliptin.

So in general saxagliptin was observed to have higher risk, but

we're not quite sure why at this point.

11:52

And it turns out that saxagliptin users are more likely to have prior OAD use, and

we can see that because the total number of saxagliptin users

who had prior OAD use is 3,000.

The total number of saxagliptin users who did

not have prior OAD use is 1,000.

Whereas for sitagliptin it's 3,000 versus 4,000.

So if you look at those ratios,

what we see is that the majority of saxagliptin users had prior OAD use,

whereas the majority of sitagliptin users did not have prior OAD use.
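
To put numbers on those ratios, a small Python sketch using the counts just quoted is shown below. The dictionary layout and names are assumptions made for illustration.

```python
# Number of users of each drug by prior OAD use, as quoted in the lecture.
counts = {
    "saxagliptin": {"prior_oad_yes": 3000, "prior_oad_no": 1000},
    "sitagliptin": {"prior_oad_yes": 3000, "prior_oad_no": 4000},
}

# Share of each drug's users who had prior OAD use.
share_prior = {
    drug: c["prior_oad_yes"] / (c["prior_oad_yes"] + c["prior_oad_no"])
    for drug, c in counts.items()
}

for drug, share in share_prior.items():
    print(f"{drug}: {share:.1%} had prior OAD use")
```

Because prior OAD use is so unevenly spread across the two drugs, and is itself related to MACE, the crude comparison is not a fair one.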

12:42

So here what we're doing is

looking at prior OAD use, and then at the risk of MACE.

And so we see that in general, if you have prior OAD use equal no,

your risk of MACE was 250 out of 5000.

Whereas if prior OAD use was equal to yes,

it was 600 out of 6,000 and we see in general

a higher rate of MACE if you were a prior OAD user.

Next we're going to actually carry out

the first step that we need to do if we're going to standardize, and

that's to compute the probability of MACE within each group.

13:38

So on this left hand table, this is the group of people who had no prior OAD use.

And we see that among Saxa users,

saxagliptin users, there were 50.

14:42

of 3000 that have MACE that have the outcome.

And, for sitagliptin users, we restrict to this row and

it's 300 out of 3000, and that's 10%.

And so what we see is that among prior OAD users, which is the right-hand table,

we also see that the risk of MACE is 10% regardless of treatment.

So if we think about these two tables collectively, what we see is that

in either group, in either sub-population, based on x, based on prior OAD use,

the risk of MACE is the same regardless of whether you get saxagliptin or

sitagliptin.

So now in contrast to the prior table that didn't stratify,

it looks like there's no difference in terms of treatment effectiveness,

whereas if you, sort of naively, didn't stratify on prior OAD use,

it looked like saxagliptin was a less effective medication.
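
The stratum-specific risks can be checked with a short Python sketch. One cell is an assumption: the lecture does not read out the sitagliptin count in the no-prior-OAD table, so 200 out of 4,000 is inferred here from the stated totals (250 out of 5,000 overall in that stratum, minus 50 out of 1,000 for saxagliptin).

```python
# (MACE events, patients) for each drug within each prior-OAD stratum.
strata = {
    "prior_oad_no": {
        "saxagliptin": (50, 1000),
        "sitagliptin": (200, 4000),  # inferred from the totals, not read out in the lecture
    },
    "prior_oad_yes": {
        "saxagliptin": (300, 3000),
        "sitagliptin": (300, 3000),
    },
}

# Risk of MACE for each drug, conditional on the stratum.
risks = {
    x: {drug: events / n for drug, (events, n) in arms.items()}
    for x, arms in strata.items()
}

for x, arms in risks.items():
    print(x, {drug: f"{r:.0%}" for drug, r in arms.items()})
```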

15:46

look at the mean of the potential outcomes among saxagliptin users.

So on the previous slide we were just looking at

the rates of the outcome in these different sub-populations, but

those were always conditional on x and remember we want to marginalize.

We want to have an expected value of a potential

outcome that's not conditional on x, so we're going to have to marginalize.

So next, we'll go through how to do that.

So our goal here is this potential outcome,

which is the expected value of Y if everyone in our population had,

hypothetically, been assigned saxagliptin.

That's what we want to know.

16:30

The way we're going to do that is by calculating the expected value of

Y among saxagliptin users at each level of X.

And then take a weighted average of those based on the size of those

corresponding populations. So let's walk through each component here.

So first we're going to focus on saxagliptin users in the prior

OAD equal yes group.

17:10

And that's just 300 out of 3000, that's what we've seen before,

that's just the risk of the outcome.

If you're in the saxagliptin group and

you have prior OAD use equal yes. So that we've seen before, and

17:26

I should mention that this whole equation here is exactly the one we saw a few

slides ago where we're averaging over the marginal distribution of X.

It's an expected value times the probability of that value of X, plus

the expected value times the probability of that x. So it's just a weighted average,

and that's what we are trying to calculate. So we're filling in these pieces.

So next we want this one.

18:12

What proportion of them have prior OADs and

that'll tell us the probability of prior OADs.

Well, the number of them that have it is here

6,000, so we have 6,000 out of a total of 11,000.

That's the proportion that have prior OADs.

So that's the number we're going to fill in there.

18:39

the prior OAD equal no group, so now we're focused on this subpopulation.

Our first calculation is, just what we saw before,

50 out of 1,000, that's the expected

value of Y given saxagliptin and given no prior OAD use.

So we've walked through this before, that's 50 out of 1,000.

And next we're going to need to know what proportion of the population.

19:34

And what that will do then is we can calculate that and

actually get an expected value of the outcome.

Which the outcome here is, essentially you could think of it as the probability

of MACE, if everybody had been assigned saxagliptin.
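
Putting the pieces together for saxagliptin, the calculation can be written out as below. The labels "saxa", "yes", and "no" are shorthand assumed here for the treatment level and the two levels of prior OAD use; the numbers are the ones quoted in the lecture.

```latex
\begin{aligned}
E(Y^{\text{saxa}})
  &= E(Y \mid A = \text{saxa}, X = \text{yes})\, P(X = \text{yes})
   + E(Y \mid A = \text{saxa}, X = \text{no})\, P(X = \text{no}) \\
  &= \frac{300}{3000} \cdot \frac{6000}{11000}
   + \frac{50}{1000} \cdot \frac{5000}{11000} \\
  &= \frac{850}{11000} \approx 0.077
\end{aligned}
```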

20:00

So now we're going to talk about the expected value of the outcome which is

MACE here, if everybody had, possibly contrary to fact, been assigned sitagliptin.

We can basically walk through the exact same steps, but now restricting to

the second rows of these columns where we're dealing with sitagliptin.

So we had already walked through these calculations for

the other group, so it's probably not necessary to do all the details here, but

I'll just mention one of them.

So for example, the probability of prior OAD use equal yes.

This is just the same as what we saw in the previous example, right?

This hasn't changed.

This is our population probability of being in the prior OAD use

group, and that's 6,000 divided by 6,000 plus 5,000.

So that's that.

And we had previously on other slides calculated this one.

So we've actually already done these calculations before we

just have to put them together and we end up with 7.7%.

21:13

So what we can see here is that, once we marginalize,

we end up with a mean potential outcome for

saxagliptin and sitagliptin that is exactly the same:

it's 7.7% in each group.

So, in other words, the potential outcome is exactly the same if

you gave everybody saxagliptin versus everybody sitagliptin.
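
The whole standardization for both drugs can be sketched in a few lines of Python. The names are made up for illustration, and the sitagliptin risk in the no-prior-OAD stratum (200 out of 4,000) is inferred from the stated totals rather than read out in the lecture.

```python
# Stratum-specific risks of MACE for each drug.
risk = {
    "saxagliptin": {"prior_oad_yes": 300 / 3000, "prior_oad_no": 50 / 1000},
    # 200 / 4000 is inferred from the stated totals, not read out in the lecture.
    "sitagliptin": {"prior_oad_yes": 300 / 3000, "prior_oad_no": 200 / 4000},
}

# Stratum sizes in the whole population of 11,000 patients.
n_stratum = {"prior_oad_yes": 6000, "prior_oad_no": 5000}
n_total = sum(n_stratum.values())


def standardized_risk(drug):
    """Weighted average of the stratum-specific risks, with weights P(X = x)."""
    return sum(risk[drug][x] * n_stratum[x] / n_total for x in n_stratum)


for drug in risk:
    print(f"{drug}: {standardized_risk(drug):.3f}")
```

Both drugs are averaged over the same marginal distribution of prior OAD use, which is why the difference seen in the crude comparison disappears.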

In principle, that's a very effective way to get a causal effect.

We find these important X variables

21:50

that we need to make the ignorability assumption hold.

We stratify, we average, and then we can get a causal effect.

However, this becomes problematic very quickly,

because you can imagine having many X variables.

You might need many X variables to achieve ignorability.

So in practice, a clinician might not just look at your

history of medication use, but they might look at your history of many variables,

and even your own preferences, and your general health, your age.

So there might be a big collection of variables that we need to control for.

22:33

What we mean by many empty cells is there will be combinations of X variables for

which we just have no data.

There are no people that have that combination.

So there is no way for us to calculate a mean and then average.
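
To get a feel for how quickly this happens, here is a small Python sketch. It assumes, purely for illustration, that every covariate is binary, so k covariates define 2 to the power k strata; the function name and the chosen values of k are hypothetical, and the sample size is the 11,000 from the example above.

```python
def min_empty_cells(n_binary_covariates, n_patients):
    """Lower bound on the number of strata that must contain no patients."""
    n_strata = 2 ** n_binary_covariates
    # Each patient falls in exactly one stratum, so at most n_patients strata are occupied.
    return max(0, n_strata - n_patients)


n_patients = 11_000  # sample size from the lecture's example

for k in (1, 5, 10, 14, 20):
    print(f"{k} covariates: {2 ** k} strata, at least {min_empty_cells(k, n_patients)} empty")
```

This is only a lower bound. In real data patients cluster in common combinations, so empty and nearly empty strata show up with far fewer covariates than the bound suggests.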

22:57

So we're going to need alternatives to standardization.

The concept of standardization is extremely important because we're going to

keep that concept throughout the course.

We're always going to be doing something that's trying to get at standardization,

but we're going to have to do things that are slightly different.

23:16

So in much of the remainder of the course, we're going to explore different options.

So with standardization as sort of the ideal situation, if hypothetically

you could have that much data, we're going to think of alternatives.

And in particular we're going to focus on matching, inverse probability of treatment

weighting, and propensity score methods when it comes to observational studies.

And we'll also look at instrumental variable methods,