Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

From the lesson

Techniques

This module is a bit of a hodge podge of important techniques. It includes methods for discrete matched pairs data as well as some classical non-parametric methods.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Hi. My name is Brian Caffo.

This is Mathematical Biostatistics Boot Camp Two,

Lecture Nine on Simpson's Paradox and Confounding.

In this lecture, we're going to talk about a phenomenon called Simpson's paradox.

I don't find it to be a paradox, but it's called Simpson's paradox.

We'll talk about some examples of Simpson's paradox, like the Berkeley data.

Then we'll talk about how this relates to the treatment of confounding.

I want to cover a particular way to handle confounding through weighted estimators,

and then talk a little bit about the Cochran-Mantel-Haenszel estimator.

Okay, so consider this data right here, which is taken from Agresti's Categorical Data Analysis book.

I think I've mentioned this book in the past.

In this instance, defendants from criminal trials were cross-classified. These are all murder trials.

They are cross-classified by the race of the victim, the race of the defendant,

and whether or not the defendant got the death penalty.

Here I present all of the possible cells plus all the possible marginals.

So, for example, here you see the eight cells that classify victim's race, defendant's race,

and death penalty. We are only considering two race categories, white and black, so there are eight cells.

Then here I have the margin for the defendant, white versus black, summed over victim's race.

And here I have the margin for the victim, white versus black, summed over defendant's race.

Okay. So let's actually investigate this.

We're looking at the percentage of people that got the death penalty.

If you look, white defendants received the death penalty a smaller percentage of the time

than black defendants for both white victims and black victims.

For white victims it is 11% versus 22%.

For black victims, 0% of the white defendants received the death penalty versus 2.8% of the black defendants.

Okay?

But then something kind of paradoxical occurs.

If you disregard the race of the victim, it actually comes out that white

defendants receive the death penalty a greater

percentage of the time, 11% to 8%, okay?

And then if you look at the race of the victim, disregarding the

race of the defendant, in the instance where the victim was white,

the defendant received the death penalty

a higher percentage of the time, 12% to 2.5%.

But let's set aside this last one, the race-of-the-victim marginal.

Let's just compare the table itself with the defendant marginal.

In the table, in both cases, the white defendants got the death penalty a smaller

percentage of the time than the black defendants, regardless of the race of the victim.

In the marginal, here, it is 11% to 8%,

where the white defendants got the death penalty

a greater percentage of the time than black defendants.
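The comparison just described can be checked with a short Python sketch. The cell counts below are not stated in the lecture; they are the counts as recalled from Agresti's table, so treat them as an assumption and verify them against the book. They do reproduce the percentages quoted above (roughly 11% versus 22%, 0% versus 2.8%, and 11% versus 8%).

```python
# Assumed cell counts (recalled from Agresti's table; verify against the book).
# Key: (victim_race, defendant_race) -> (death penalty yes, death penalty no)
counts = {
    ("white", "white"): (53, 414),
    ("white", "black"): (11, 37),
    ("black", "white"): (0, 16),
    ("black", "black"): (4, 139),
}


def pct(yes, no):
    """Percentage of cases that ended in a death penalty."""
    return 100.0 * yes / (yes + no)


# Conditional rates: death penalty percentage within each victim-race stratum.
conditional = {key: pct(*cell) for key, cell in counts.items()}


def defendant_marginal(defendant):
    """Death penalty percentage for a defendant race, summed over victim race."""
    yes = sum(y for (_, d), (y, _) in counts.items() if d == defendant)
    no = sum(n for (_, d), (_, n) in counts.items() if d == defendant)
    return pct(yes, no)


marginal = {d: defendant_marginal(d) for d in ("white", "black")}

for (victim, defendant), p in conditional.items():
    print(f"victim {victim}, defendant {defendant}: {p:.1f}%")
for defendant, p in marginal.items():
    print(f"defendant {defendant}, all victims: {p:.1f}%")
```

Within each victim-race stratum the white-defendant rate is lower, yet the marginal white-defendant rate is higher, which is the reversal described in the lecture.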

So what's happening?

This was related to a court case about whether

or not the death penalty was being equally applied, so what would you conclude?

If you condition on the race of the victim,

you get the opposite answer than

if you look at the race of the defendant disregarding the race of the victim.

So what is the right answer?

So let's just discuss a little bit about what's going on.

Marginally, white defendants received the death penalty

a greater percentage of the time than black defendants.

Within both white and black victims, black defendants received the

death penalty a greater percentage of the time than white defendants.

Simpson's paradox refers to the fact

that marginal and conditional associations can be opposing.

In this case, if you take the margin across victim's race,

you get a different answer than if you condition on victim's race.

Here the death penalty was enacted more often for

the murder of a white victim than of a black victim.

And whites tend to kill whites, just demographically,

hence the larger marginal association.
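One way to see this explanation numerically: each defendant-race marginal rate is a weighted average of the two victim-specific rates, with weights equal to the share of that defendant's cases involving each victim race. The sketch below illustrates this in Python. The counts are again an assumption (recalled from Agresti's table, not given in the lecture), and the helper names are made up for illustration.

```python
# Assumed cell counts (recalled from Agresti's table; verify against the book).
# Key: (victim_race, defendant_race) -> (death penalty yes, death penalty no)
counts = {
    ("white", "white"): (53, 414),
    ("white", "black"): (11, 37),
    ("black", "white"): (0, 16),
    ("black", "black"): (4, 139),
}
RACES = ("white", "black")


def victim_share(defendant, victim):
    """Fraction of a defendant race's cases that involve the given victim race."""
    total = sum(y + n for (_, d), (y, n) in counts.items() if d == defendant)
    y, n = counts[(victim, defendant)]
    return (y + n) / total


def conditional_rate(defendant, victim):
    """Death penalty rate given defendant race and victim race."""
    y, n = counts[(victim, defendant)]
    return y / (y + n)


def marginal_rate(defendant):
    """Marginal rate rebuilt as a weighted average of the conditional rates."""
    return sum(
        victim_share(defendant, v) * conditional_rate(defendant, v) for v in RACES
    )


for d in RACES:
    weights = ", ".join(f"{v} victim {victim_share(d, v):.2f}" for v in RACES)
    print(f"defendant {d}: weights ({weights}), marginal {marginal_rate(d):.3f}")
```

Under these assumed counts, white defendants put nearly all of their weight on the white-victim stratum, where death penalty rates are high for everyone, while black defendants put most of their weight on the black-victim stratum, where rates are low. That difference in weights is enough to reverse the comparison.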

But I want to do a little bit of commentary before I go through more examples.

I'm going to cover several examples.

First of all, when you state Simpson's paradox in the

following way, it doesn't seem that paradoxical at all:

the apparent relationship between

two variables can change when factoring in a third variable.

That seems obvious.

Of course that's true.

It just seems difficult when you start to get mired in the specifics.

Later on I'll go through the math to show

that there's nothing paradoxical about the mathematics of Simpson's paradox.

Larry Wasserman, on his blog, The Normal Deviate,

has the most wonderful discussion of why Simpson's paradox is

difficult and what mistake people are making.

The mistake people are making is that they're equating the

causal statements with the probabilistic statements.

The probabilistic statements can be misleading and

paradoxical, and you're trying to draw difficult conclusions

in the light of noisy evidence.

In addition, even if you knew

the exact probabilities, the probabilities themselves can exhibit the paradox.

However, the causal statements cannot exhibit the paradox.

Okay?

So the problem, and I think this is his statement, is that

the real confusion is equating the causal statement with the probabilistic one.

In this case, the causal statement would be, for example,

that juries causally tend to convict

black defendants more often than they convict white defendants.

If you were to make that causal statement, then it is impossible

for marginal and conditional

causal statements to disagree.

As for the real details of this,

I've put up the link to his blog post, which is wonderful.

But the real details of investigating the causal

statements are beyond the scope of this class.

We're not going to cover causal inference in this class,

but I think it was a great discussion on his part.

It basically demonstrates that it is this conflation of cause

with describing

probabilities and associations that is the apparent paradox.

But mathematically there is no paradox, and I

think when we go through it you will see.
