0:00

[MUSIC]

Hi, in this module, I'm going to talk about the phenomenon of Type I Errors, or

false positives.

A Type I error occurs when we incorrectly

conclude that the null hypothesis is false.

That is, we incorrectly reject the null hypothesis, or

conclude that a relationship or a difference exists when there is none.

This occurs when a test suggests that a result is statistically significant,

when in fact it's the product of chance variation in the composition of

our sample.

Again, the p-value is an estimate of the probability of

the difference between the hypothesized parameter and

our observed measurement being the result of random chance.

But again, a p-value is never exactly zero; it's always positive.

So there's always a chance that any result we do observe

could be produced by random chance.
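This point can be checked with a short simulation (an illustrative Python sketch, not part of the lecture; the sample size and trial count are arbitrary choices). Even when the null hypothesis is exactly true, every test returns a strictly positive p-value, and small ones still occur:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Null world: the true mean really is 0, and the variance is known to be 1.
# Run a two-sided z-test of H0: mean = 0 on 10,000 independent samples.
n = 50
p_values = []
for _ in range(10_000):
    sample = rng.normal(0.0, 1.0, size=n)
    z = sample.mean() * math.sqrt(n)          # standard normal under H0
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided p-value
    p_values.append(p)

p_values = np.array(p_values)
print("smallest p-value:", p_values.min())            # small, but never zero
print("share below 0.05:", (p_values < 0.05).mean())  # close to 0.05
```

By construction, about 5% of these null samples cross the 0.05 threshold; those are exactly the Type I errors this module is about.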

1:05

Now, if we're worried about Type I errors, accidentally accepting

a result as statistically significant when in fact it shouldn't be,

there are several remedies.

We can try to increase the sample size:

larger samples tend to make our test statistics larger,

so we generally get smaller p-values.

We can make the criterion for the hypothesis test more strict.

So we can demand that we want a p-value of 0.01 or 0.001,

before we accept that a relationship is statistically significant.

So there is only a one-in-a-hundred or

one-in-a-thousand chance of committing a Type I error.

And we can try to reduce measurement error; that's a little bit technical.

But if we measure our outcome with more precision,

with careful protocols and so forth,

removing chance variation helps reduce the chances of a Type I error.
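The second remedy, demanding a stricter p-value, can be verified directly in a simulation (an illustrative Python sketch; the sample size and trial count are arbitrary). When the null hypothesis is true, the Type I error rate is simply whatever threshold we choose:

```python
import math
import numpy as np

rng = np.random.default_rng(1)

# Null world again: the true mean is 0, so every rejection is a Type I error.
n, trials = 50, 20_000
samples = rng.normal(0.0, 1.0, size=(trials, n))
z = samples.mean(axis=1) * math.sqrt(n)
p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])

# Tightening the criterion from 0.05 to 0.01 to 0.001 cuts the
# false-positive rate proportionally.
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: Type I error rate ~ {(p < alpha).mean():.4f}")
```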

2:07

Now, I want to talk about the phenomenon of mass significance,

which is actually associated with Type I errors.

So because of the prospect of Type I errors,

in fact the likelihood of Type I errors,

significance tests should not be used to evaluate large numbers of relationships,

sifting through them to find ones that appear worth looking at

because they seem to be statistically significant.

Significance tests should be used to evaluate pre-specified hypotheses, not,

again, for screening in exploratory analysis.

So, using p-values to screen for

statistically significant relationships among large numbers

of variables in exploratory analysis is extremely dangerous.

Think about it.

If we regress one variable made up entirely of random numbers

on 100 other variables that are all, again, made up entirely of random numbers,

and look at the coefficients for these 100 right-hand side variables,

then even though they have nothing to do with the outcome,

on average five of them will be statistically significant at the 5% level.

And on average, one of them will be statistically significant at the 1% level.

So, again, we want to avoid screening.

Because we're guaranteed, if we run enough regressions, that we're going to find

relationships that appear statistically significant, and we're going to have false positives.
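This thought experiment is easy to run (a Python sketch; the seed, sample size, and repeat count are arbitrary choices, not from the lecture). All 100 right-hand side variables are pure noise, so every "significant" coefficient below is a false positive:

```python
import math
import numpy as np

rng = np.random.default_rng(2)

# y and all 100 candidate predictors are independent random noise.
# Screen each predictor with a test of H0: no correlation with y.
n, k, runs = 200, 100, 200
hits_5 = hits_1 = 0
for _ in range(runs):
    y = rng.normal(size=n)
    X = rng.normal(size=(n, k))
    for j in range(k):
        r = np.corrcoef(y, X[:, j])[0, 1]
        t = r * math.sqrt((n - 2) / (1 - r * r))   # t-stat for the correlation
        p = math.erfc(abs(t) / math.sqrt(2))       # normal approx., fine at n = 200
        hits_5 += p < 0.05
        hits_1 += p < 0.01

print("false positives per 100 tests at 5%:", hits_5 / runs)   # about 5
print("false positives per 100 tests at 1%:", hits_1 / runs)   # about 1
```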

Some examples: geneticists worry a great deal about the prospect of

false positives, because quite often they are regressing

measures of some phenotype on thousands or tens of thousands of genetic variants.

This would be in Genome-Wide Association Studies.

So when they do an analysis, they have to set a very strict criterion for

assessing statistical significance,

perhaps demanding a p-value of 0.00001 or 0.00000001.

Regressions that include large numbers of interactions between categorical

variables will produce dozens or hundreds of estimated coefficients.

And again, we are guaranteed that a certain number of them,

even if they actually have no relationship to anything,

will appear to be statistically significant.
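A common way to set such strict criteria is a Bonferroni-style correction (sketched below in Python; the test counts are illustrative, not from the lecture): to keep the chance of any Type I error across m tests near a target level alpha, demand p < alpha / m from each individual test.

```python
# Bonferroni-style correction: with m tests, use a per-test threshold of
# alpha / m to keep the family-wise Type I error rate at roughly alpha.
alpha = 0.05
for m in (100, 10_000, 1_000_000):
    print(f"{m:>9,} tests -> per-test threshold {alpha / m:.1e}")
```

With a million tests, the per-test threshold is 5e-08, on the order of the very small p-values demanded in genome-wide studies.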

4:24

Another related issue is p-value mining.

Suppose you're looking at an association between some y variable and

some x variable, and you have a bunch of control variables.

You continually tweak the model to introduce or remove variables,

or somehow change the sample that you're conducting the analysis on.

It's a bit like drawing different samples, though not quite.

You run the risk that, eventually, just by luck of the draw,

you'll get a model and a sample in which

a relationship appears to be statistically significant, but it's actually not.

It's just chance variation: after trying enough different models and

enough different definitions of the sample,

you eventually get something that appears statistically significant.
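Here is what that process looks like in a simulation (a Python sketch; redefining a random subsample stands in for tweaking controls or the sample definition, and the seed and sizes are arbitrary). There is no real relationship between x and y, yet a "significant" specification turns up after a modest number of tries:

```python
import math
import numpy as np

rng = np.random.default_rng(3)

# x truly has no effect on y.
n = 300
y = rng.normal(size=n)
x = rng.normal(size=n)

found_at, best_p = None, 1.0
for attempt in range(1, 501):
    keep = rng.random(n) < 0.5                 # a new "sample definition" each try
    r = np.corrcoef(y[keep], x[keep])[0, 1]
    m = int(keep.sum())
    t = r * math.sqrt((m - 2) / (1 - r * r))   # test H0: no association
    p = math.erfc(abs(t) / math.sqrt(2))       # normal approximation
    best_p = min(best_p, p)
    if p < 0.05:                               # stop at the first "significant" model
        found_at = attempt
        break

print(f"'significant' after {found_at} tries, p = {best_p:.3f} -- a false positive")
```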

5:10

Now, a new issue that we're becoming increasingly aware of is publication bias.

Thus far, we've considered examples where mass significance leads to problems for

an individual researcher or a team.

Somebody sifting through hundreds, maybe thousands, of relationships or

coefficients, and digging out the ones that appear statistically significant.

But actually, we have to keep in mind, and we're recognizing this increasingly

as an issue, that we have around the world hundreds of researchers or

hundreds of teams working on related topics,

but using independent samples.

So if you have 100 teams around the world working on the same topic,

doing the same analysis,

but each on a sample that they've collected themselves,

then even if there's actually no relationship between whatever outcome

all these teams are looking at and

the right-hand side variable that they're testing,

five of these hundred teams, on average, would

come up with results that are significant at the 5% level.

Or, on average,

one of them would come up with a result that is significant at the 1% level.
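The arithmetic behind this is worth spelling out (a small Python check, following the lecture's example of 100 independent teams). The chance that at least one team lands a false positive at the 5% level is nearly 1:

```python
# 100 independent teams each test a nonexistent effect at the 5% level.
# P(at least one false positive) = 1 - P(no team gets one).
p_any = 1 - (1 - 0.05) ** 100
print(round(p_any, 3))   # 0.994 -- some team almost surely gets a "finding"
```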

Now, publication bias refers to the fact that the team or teams

that got the false positives will be able to go and publish their papers.

Because journals are interested in novel findings,

that's what drives the field forward.

The remaining teams will put their results in a drawer or

a filing cabinet, or just throw them away.

They may not be able to publish them until they have an opportunity to refute

the papers published by the teams that got the false positives.

So, these teams that have the false positives, they publish papers.

The other teams don't publish anything, they move on to something else.

Again, until perhaps there's been some controversy,

and then they can pull out their old negative results and

publish them.

So, what can we do about publication bias?

Well, we're moving towards, especially for medical or drug trials,

a situation in which researchers announce their studies before they conduct them.

This is in response to a problem we found with pharmaceutical companies that were

essentially repeating studies of the effects of drugs on particular diseases,

and then burying the results until, again, by chance variation,

they got a false positive suggesting an effect of some medication on a disease.

That became the study that they published.

So as long as people have to announce their studies, or

report and record them, before they conduct them,

then if somebody comes out and says, well, they've got an exciting new result,

we can check back to see if they've already run 15 or 20 studies

that were actually testing the same relationship and didn't find anything.

8:00

We'd also like to create repositories for negative results,

so that teams that tried something that didn't work out,

and can't publish it, have somewhere to put it.

Journals normally aren't interested in negative results,

unless you can refute somebody's controversial positive results.

Still, we should have some way of making these results available and

searchable online, so that people don't reinvent the wheel.

And we can have some overall assessment, looking at all of the studies

of a particular phenomenon, to figure out whether the studies that

are published are just false positives and the result of publication bias.

As a consumer of research results,

you want to be wary of studies that have small numbers of subjects.

These are much more at risk of Type I error.

So you should look for large studies, especially with controlled treatment

designs, where your chances of a Type I error are somewhat smaller.

And we should encourage the replication of published studies by using

independent samples.

So, again, I hope that I have sensitized you to some of the issues that

arise when we have to think about Type I errors in our research.

You may have learned about Type I errors when you took a statistics class.

But you probably didn't think about these broader implications for

the problem of mass significance or publication bias.

So I hope that you're now sufficiently aware of these issues to take account of

this, both in your own research, and as you consume other published research.