It turns out that permuting the labels is a better choice than permuting each gene's expression levels separately, because it leaves the expression levels across genes, and the relationships between them, intact. That's useful because you might need to model those relationships later in the analysis, which we'll come back to.
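To see why this matters, here is a minimal Python sketch with NumPy on simulated data. The two correlated genes, the sample size and the function name `correlation_after_permutation` are all hypothetical. Permuting the labels is equivalent to reordering whole sample columns, so every gene in a sample moves together and the gene-gene correlation is unchanged. Shuffling each gene's values separately breaks that correlation.

```python
import numpy as np

def correlation_after_permutation(seed=0):
    """Hypothetical toy example: compare gene-gene correlation under two permutation schemes."""
    rng = np.random.default_rng(seed)
    n = 20
    gene1 = rng.normal(size=n)
    gene2 = gene1 + 0.1 * rng.normal(size=n)   # gene 2 closely tracks gene 1
    expr = np.vstack([gene1, gene2])           # rows = genes, columns = samples

    original = np.corrcoef(expr)[0, 1]

    # Permuting the labels is the same as reordering whole sample columns:
    # every gene in a sample moves together, so the gene-gene structure survives.
    order = rng.permutation(n)
    label_perm = np.corrcoef(expr[:, order])[0, 1]

    # Permuting each gene separately shuffles the rows independently,
    # which breaks the link between gene 1 and gene 2.
    gene_perm_expr = np.vstack([rng.permutation(row) for row in expr])
    gene_perm = np.corrcoef(gene_perm_expr)[0, 1]
    return original, label_perm, gene_perm

orig, by_labels, by_gene = correlation_after_permutation()
print(f"original: {orig:.3f}, labels permuted: {by_labels:.3f}, each gene permuted: {by_gene:.3f}")
```

The first two numbers should agree (up to floating-point rounding), while the third should be much closer to 0.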

So the idea is that you do this permutation and then recalculate the statistic for each gene. Suppose the original statistic for gene one was equal to 2; that's where the observed statistic sits. Then you permute the labels many times, recalculating the statistic each time. You'd expect these permuted statistics to be centered near 0, because once the labels have been permuted there should, on average, be no difference between the two groups. You can then see how extreme the original statistic is relative to the permuted statistics. If it's really extreme, you might conclude that it's unlikely to come from this null distribution; if it's not very extreme, it might well come from that distribution.
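A minimal sketch of this procedure for a single gene, assuming NumPy, a pooled-variance two-sample t-statistic and a two-sided comparison. The helper names `t_stat` and `permutation_pvalue`, the 1,000 permutations and the simulated gene are illustrative choices, not a specific package's API. The p-value counts how often a permuted statistic is at least as extreme as the observed one; adding 1 to the numerator and denominator is one common convention that keeps it from being exactly 0.

```python
import numpy as np

def t_stat(x, labels):
    """Pooled-variance two-sample t-statistic, group 1 minus group 0."""
    a, b = x[labels == 1], x[labels == 0]
    n1, n0 = len(a), len(b)
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n0 - 1) * b.var(ddof=1)) / (n1 + n0 - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n0))

def permutation_pvalue(x, labels, n_perm=1000, seed=0):
    rng = np.random.default_rng(seed)
    observed = t_stat(x, labels)
    # Recompute the statistic with shuffled labels; the data themselves never change.
    null = np.array([t_stat(x, rng.permutation(labels)) for _ in range(n_perm)])
    # Two-sided: fraction of permuted statistics at least as extreme as the observed one.
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_perm + 1)
    return observed, null, p

rng = np.random.default_rng(1)
labels = np.array([0] * 10 + [1] * 10)
x = rng.normal(size=20) + 3.0 * labels  # hypothetical gene shifted up by 3 in group 1
observed, null, p = permutation_pvalue(x, labels)
print(f"observed t = {observed:.2f}, mean of permuted t = {null.mean():.2f}, p = {p:.4f}")
```

With a shift this large, the observed statistic should land far out in the tail of the permuted statistics, which themselves should be centered near 0.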

This permutation idea is used all the time in genomics: not just for simple two-group comparisons, but also for network comparisons, enrichment analyses, and much more.

Permutation assumes that if you switch the labels, the data come from exactly the same distribution. So by permuting the labels we're making the assumption that the labels don't matter: that the gene's expression levels are completely independent of the labels.

This also means a permutation test isn't necessarily just a comparison of means. The statistic we calculated, the t-statistic, measures a distance between the two group means. But by permuting the labels, we're assuming that the entire distribution is exactly the same in both groups. So with this permutation approach, a significant t-statistic can reflect any difference between the groups, including a difference in variance or in other moments of the generating distribution, not only a difference in means.
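One way to see this is a small simulation, sketched below under assumptions chosen purely for illustration: two groups with the same mean (0) but different standard deviations (4 vs 1) and unbalanced sizes (5 vs 25), tested with a pooled t-statistic and label permutations. The function names and settings are hypothetical. If the test only responded to differences in means, the rejection rate at alpha = 0.05 would be close to 0.05; in this setup we'd expect it to come out well above that, though the exact value depends on the random seed and the number of simulations.

```python
import numpy as np

def t_stat(x, labels):
    """Pooled-variance two-sample t-statistic, group 1 minus group 0."""
    a, b = x[labels == 1], x[labels == 0]
    n1, n0 = len(a), len(b)
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n0 - 1) * b.var(ddof=1)) / (n1 + n0 - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n0))

def perm_pvalue(x, labels, rng, n_perm=200):
    """Two-sided permutation p-value for the t-statistic."""
    obs = abs(t_stat(x, labels))
    null = np.array([abs(t_stat(x, rng.permutation(labels))) for _ in range(n_perm)])
    return (1 + np.sum(null >= obs)) / (n_perm + 1)

def rejection_rate(n_sims=200, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    labels = np.array([1] * 5 + [0] * 25)   # hypothetical unbalanced design
    sd = np.where(labels == 1, 4.0, 1.0)    # group 1 is much more variable
    hits = 0
    for _ in range(n_sims):
        x = rng.normal(0.0, sd)             # both groups have mean 0
        hits += int(perm_pvalue(x, labels, rng) <= alpha)
    return hits / n_sims

rate = rejection_rate()
print(f"rejection rate with equal means, unequal variances: {rate:.2f}")
```

The unbalanced design is deliberate: with equal group sizes, the pooled standard error matches the one that allows unequal variances, so the effect of the variance difference on this particular statistic is much weaker.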

Permutation is actually quite a complicated topic. We've covered it only briefly here, and we'll cover it a bit more in the assessments, but you can learn more about it in the Advanced Statistics for the Life Sciences course.