More importantly, specifically for Student's t-test, the two groups also have to have equal variances.

Remember, we can work out the mean for each group. But we can also work out the standard deviation, or its square, which is the variance. Both describe how spread out the values are around the mean (the variance is, loosely, the average squared difference between each value and the mean), and that spread has to be roughly equal in the two groups.
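A minimal sketch of those summary numbers, using Python's standard `statistics` module. The data are hypothetical, and the sketch assumes the usual sample formulas, which divide the sum of squared differences by n - 1 rather than n:

```python
import statistics

def summarize(values):
    """Return the mean, sample standard deviation and sample variance of one group."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)       # sample standard deviation (divides by n - 1)
    var = statistics.variance(values)   # sample variance, equal to sd ** 2
    return mean, sd, var

# Hypothetical measurements for two independent groups (illustrative only).
group_a = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9]
group_b = [5.9, 6.1, 5.4, 6.3, 5.8, 6.0, 5.7, 6.2]

for name, values in [("A", group_a), ("B", group_b)]:
    mean, sd, var = summarize(values)
    print(f"group {name}: mean={mean:.3f}  sd={sd:.3f}  variance={var:.4f}")
```

Comparing the two variance columns is the kind of side-by-side look the judgement call below is based on.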

Now, there's no absolute rule for how different the variances can be. You have to look at the sample sizes and whatever else is at hand with that specific data, and make a judgement call on whether the variances are equal enough for you to use Student's t-test.

We also have to have unpaired groups. The individuals in one group must be independent of the individuals in the other group: the same individuals cannot appear in both groups, and individuals in the two groups cannot be connected to or dependent on each other in any way.

Now I want to delve a bit deeper into some of these aspects, most importantly the requirement that the data points come from a population in which the parameter being measured has a normal distribution.

So, if you look at the data from a sample (remember, that's a sample from a population), how would you know? All you have is that set of sample data points. How can you know that they come from a population in which that parameter is normally distributed?

Well, one thing you could do is simply make a histogram of the data values for each group that you have and see whether they form a normal distribution. In the example there, a density estimate plot is drawn over the histogram, and we can see a rough normal distribution. That would be one way to be relatively sure that these data points come from a population in which the parameter has a normal distribution, and that I can probably use Student's t-test under these circumstances.
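As a rough illustration of the histogram check, here is a minimal text-only sketch in plain Python. The data are simulated stand-ins (not study data), and `text_histogram` is a hypothetical helper; in practice a plotting library such as matplotlib (`plt.hist`) would draw a proper histogram:

```python
import random

def text_histogram(values, bins=8):
    """Print a crude text histogram (one row of '#' per bin) and return the bin counts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to width 1 if all values are equal
    counts = [0] * bins
    for x in values:
        i = min(int((x - lo) / width), bins - 1)  # the maximum value goes in the last bin
        counts[i] += 1
    for i, c in enumerate(counts):
        print(f"{lo + i * width:8.1f} | {'#' * c}")
    return counts

# Simulated, roughly normal data standing in for one group (not real study data).
random.seed(1)
sample = [random.gauss(120, 10) for _ in range(200)]
counts = text_histogram(sample)
```

A roughly symmetric, single-peaked shape is what you would hope to see; this is done separately for each group.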

A perhaps better way to do it is what is called a q-q plot, the q standing for quantile. On such a plot you're going to see a red line and some blue dots.

The red line represents where the dots would fall if the data were perfectly normally distributed: a straight line going from the bottom left to the top right. Each blue dot takes one individual value and plots it against its quantile. The quantile here is based on the percentage of values that are less than that specific value, translated into the value you would expect at that position in a normal distribution.
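A minimal sketch of how the points of a normal q-q plot can be computed, using only Python's standard library. Two assumptions worth stating: the plotting position (i - 0.5) / n is one common convention for "the proportion of values below this one" (SciPy's `scipy.stats.probplot` uses a slightly different formula), and drawing the reference line as mean + sd * z is one common choice among several. The data and helper names are hypothetical:

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Pair each sorted sample value with the matching standard-normal quantile."""
    xs = sorted(values)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting position (i - 0.5) / n approximates the proportion of values below x_i.
    zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(zs, xs))

def reference_line(values):
    """One common choice of reference line: x = mean + sd * z."""
    m, s = mean(values), stdev(values)
    return lambda z: m + s * z

# Hypothetical measurements from one group (illustrative only).
group = [118, 125, 109, 131, 122, 115, 127, 120, 112, 124]
line = reference_line(group)
for z, x in qq_points(group):
    # Columns: theoretical quantile, sample value (blue dot), reference line (red line)
    print(f"{z:6.2f}  {x:6.1f}  {line(z):6.1f}")
```

The closer the second column tracks the third, the closer the blue dots sit to the red line.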

Do you see the red line there? If all the blue dots fell exactly on that line, that would be a very good indication that the sample data points come from a population in which the parameter has a normal distribution.

So each individual blue dot is one of the values from a group, plotted against its quantile. You've got to do this for each group, because both groups have to come from a normal distribution.

And you can see that these blue dots very nearly follow the red line. So we can assume that the data points in this sample come from a population in which the parameter has a normal distribution, and hence we can use Student's t-test.
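Judging how closely the dots follow the line is done by eye. As a supplementary numeric summary (a swapped-in idea, not something this lecture uses), one can compute the correlation between the sorted sample values and the normal quantiles, similar in spirit to a probability plot correlation coefficient. This sketch assumes the (i - 0.5) / n plotting position and uses simulated data; no cut-off value is implied, and it does not replace looking at the plot:

```python
import random
from statistics import NormalDist, mean

def qq_correlation(values):
    """Correlation between sorted sample values and standard-normal quantiles.
    Values close to 1 mean the q-q points lie close to a straight line."""
    xs = sorted(values)
    n = len(xs)
    zs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / (sxx * szz) ** 0.5

random.seed(2)
normal_like = [random.gauss(50, 5) for _ in range(100)]         # simulated, roughly normal
right_skewed = [random.expovariate(1 / 5) for _ in range(100)]  # simulated, long right tail
print(f"normal-like:  r = {qq_correlation(normal_like):.3f}")
print(f"right-skewed: r = {qq_correlation(right_skewed):.3f}")
```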

Now, if you see that the blue dots on a q-q plot are all over the show, the data probably do not come from a normal distribution, and you cannot use a parametric t-test. Then we have to look at non-parametric tests instead.

And that's why it's important to look in the methods section of any journal article: do the authors talk about these things? It would be even better if we all had access to the data that was actually used for the analysis, so we could see for ourselves whether it was appropriate to use a parametric test.

Another important assumption that we make with Student's t-test is equal variances.

As I said, there isn't an absolute criterion for how different the variances of the two groups can be. But you do see situations that should alert you to the possibility that the variances are so unequal that you can't use Student's t-test and have to use another t-test instead.
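The lecture doesn't name the alternative test here; one common choice when variances are unequal is Welch's t-test, which does not pool the variances and adjusts the degrees of freedom. A minimal sketch comparing the two statistics on hypothetical data, with only the standard library (p-values would need a t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`). The variance ratio is printed only as a descriptive number, since rules of thumb for "too large" vary:

```python
from statistics import mean, variance

def student_t(a, b):
    """Student's t statistic with pooled variance (equal variances assumed) and its df."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = (pooled * (1 / na + 1 / nb)) ** 0.5
    return (mean(a) - mean(b)) / se, na + nb - 2

def welch_t(a, b):
    """Welch's t statistic (variances not assumed equal) with Welch-Satterthwaite df."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical data: a small, widely spread group and a larger, tightly clustered one.
a = [12.1, 15.4, 9.8, 18.2, 11.0, 16.7]
b = [13.0, 13.4, 12.8, 13.1, 13.6, 12.9, 13.3, 13.2, 12.7, 13.5]

ratio = max(variance(a), variance(b)) / min(variance(a), variance(b))
print(f"variance ratio (larger / smaller): {ratio:.1f}")
print("Student: t = %.3f, df = %d" % student_t(a, b))
print("Welch:   t = %.3f, df = %.1f" % welch_t(a, b))
```

With equal group sizes the two t statistics coincide; the Welch degrees of freedom are never larger than the pooled na + nb - 2, which makes the test more cautious when one group is much more variable.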

Such large differences in variance are usually seen when the data values are skewed, and they're also regularly seen when the sample sizes are quite small, when you only have a few patients or participants in each group.

Then you have to use another type of t-test, and you've seen an example of this. There, the distribution is also quite skewed, with the tail going off to the right.
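To put a number on "a tail going off to the right", a minimal sketch of the simple moment-based skewness estimate is below. The lecture doesn't specify a skewness measure, so this choice is an assumption; statistical software often applies a small-sample bias correction, so its values can differ slightly. The data are hypothetical:

```python
from statistics import mean

def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2 ** 1.5 (positive means a longer right tail)."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5, 6, 7]                  # hypothetical, symmetric around 4
right_tail = [1, 1, 2, 2, 2, 3, 3, 4, 6, 9, 15]    # hypothetical, long right tail
print(f"symmetric:  {sample_skewness(symmetric):.3f}")
print(f"right tail: {sample_skewness(right_tail):.3f}")
```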