So instead, we define a couple of different error rates that are more

commonly used in genomics.

The first is the family wise error rate.

So this is the probability of finding even one false positive statistically

significant result if you've done, no matter how many tests that you do.

And so this is a very stringent error rate, it's basically requiring that

you not have more than one False Positive among all of these different tests.

Obviously that's going to be much more stringent than just calling p-value less

than 0.05 significant.

Another is the false discovery rate, and so this tells you something about

the noise among the discoveries that you're willing to tolerate.

So this is the expected # of False Discoveries divided by

the total # Of Discoveries, and so

you think about it like this, imagine that you get a false discovery to 5%.

That means you expect about 5% of the total number of discoveries that you've

made to be false positives.

So this helps you quantify sort of the noise level among the discoveries that

you've actually made, rather than noise level quantified versus the total number

of things that you're testing.

So, here's the difference in interpretation.

Suppose that I tell you that 50 out of 10,000 genes

are statistically significant at the 0.05 level.

Depending on what 0.05 level I've used, you get a different interpretation.

So if I just say out of the 10,000 genes I set

called all genes statistically significant that had a p-value less than 0.05,

then I expect 0.05*10,000 = 500 false positives, as we just talked about.

And since I only found 50 genes significant,

that doesn't seem like it's very good, there's probably mostly false positives.

On the other hand, if I found these fifty at a 0.05 false discovery rate,

then I expect there to be about 0.05 times the total number of discoveries I've made.

Because, remember the false discovery rate quantifies the fraction of discoveries

that you've made that are likely to be false.

And I get about an estimated 2.5 false positives among all of the genes

that I've called significant, so that's maybe more tolerable rate.

If I use family wise error rate, and I say at the 0.05 family wise error rate,

I found 50 genes that were significant, then that means that the probability

I'm controlling the probability of even one false positive to be less than 0.05.

And so basically I expect almost all of these 50 genes to be

statistically significantly different.