[MUSIC]
Hi guys.
Welcome to the 27th lecture of the course Biological Diversity, Theories, Measure,
and Data sampling techniques.
Today, I will show you the fourth part of statistics applied to the study of
biological diversity.
When we are conducting statistical comparisons between two samples,
besides being interested in the association between the observed frequencies and
the effectiveness of the sampling itself
(traps, sampling techniques, and so on, as explained in the previous lecture),
one of the questions to answer in order to demonstrate the validity of
our hypothesis is whether the two samples are significantly different,
from a statistical point of view, of course.
For example, we may ask whether the biodiversity of a contaminated site is statistically
different from that of a non-contaminated site: it can happen, in fact, that samples
with a similar mean or median have completely different variances.
In general, comparison tests between samples are divided into parametric tests,
which compare the means or the variances using the actual values,
and nonparametric tests, which compare the medians using ranks.
Today I will describe some simple parametric tests.
These tests assume that the observations are on an interval or ratio scale, so
they are continuous,
otherwise the data should be transformed, for example with a logarithmic transformation.
They also assume that the data are drawn from a normally distributed population.
If the distribution is not symmetrical, or
you are not sure about that, you must use a nonparametric test.
Generally, biometric measurements such as weight, height, and length can be considered
normally distributed when applied to homogeneous categories, males or females, adults or
young individuals, for instance, because within such categories they are likely to be symmetrical.
Before proceeding to the comparison of samples with a parametric test,
you must make sure that the two samples are actually, or at least probably,
extracted from two different populations.
Because although they can have similar means,
at least the variances have to be different.
Otherwise, you would have immediately confirmed the null hypothesis.
To check this, conduct a preliminary test called the F-test,
which evaluates the ratio of the two variances.
So the F-test simply calculates the maximum
variance divided by the minimum variance.
After calculating the variances of the two samples,
as described in the previous lectures, derive the degrees of freedom:
for sample 1 it is n1 minus 1, and for sample 2 it is n2 minus 1, where n1 and
n2 correspond to the number of sampling units of samples 1 and 2.
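In symbols (the slide formula is not reproduced in the transcript, so this is simply a standard restatement of what was just said):

\[
F = \frac{s^2_{\max}}{s^2_{\min}}, \qquad \nu_1 = n_1 - 1, \qquad \nu_2 = n_2 - 1
\]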
Subsequently, look up in the F-distribution probability table the critical value
at the intersection of the two degrees of freedom.
If the obtained value of F is lower than that in the table,
it is not possible to reject the null hypothesis immediately.
So the variances are similar, because the two samples probably come from
the same population, that is, there are no statistically significant differences.
We can only conclude that, at this level of analysis, the two variances are similar;
to properly reject the null hypothesis, and so
confirm the statistical significance, we must proceed to perform a parametric test.
If instead the calculated value of F is higher than the critical value in the table for
the respective degrees of freedom, it is not necessary to proceed further, and
it can be concluded that the null hypothesis is
rejected because the samples are significantly different.
They do not derive from the same population.
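As an illustration only (this is not the lecturer's own code, and the sample values are invented), the whole preliminary check could be sketched in Python with NumPy and SciPy:

```python
# Sketch of the preliminary F-test: larger variance over smaller variance,
# compared against the critical value of the F-distribution.
import numpy as np
from scipy import stats

sample1 = np.array([4.1, 5.0, 4.7, 5.3, 4.9, 5.1])   # made-up sampling units
sample2 = np.array([3.8, 4.2, 6.1, 5.9, 3.5, 6.4])

var1 = sample1.var(ddof=1)   # sample variance, n - 1 in the denominator
var2 = sample2.var(ddof=1)

# Put the larger variance in the numerator, with its degrees of freedom first.
if var1 >= var2:
    F, df_num, df_den = var1 / var2, len(sample1) - 1, len(sample2) - 1
else:
    F, df_num, df_den = var2 / var1, len(sample2) - 1, len(sample1) - 1

# Critical value at P 0.05 (the printed F-probability table done numerically).
F_crit = stats.f.ppf(0.95, df_num, df_den)

if F < F_crit:
    print(f"F = {F:.2f} < {F_crit:.2f}: variances similar, proceed to the Z- or t-test")
else:
    print(f"F = {F:.2f} >= {F_crit:.2f}: variances differ, samples already significantly different")
```

Here stats.f.ppf simply replaces the lookup in the F-probability table described above.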
So only in the first case should the following tests be carried out.
We can conduct the Z-test to compare the means of large samples,
meaning more than 25 sampling units in both samples,
or we can carry out the t-test to compare the means of small samples,
meaning fewer than 25 sampling units in both samples.
The Z-test compares the means of large samples as the ratio between
the difference of the two sample means and
the standard error of that difference, estimated from the variances of the two samples,
and it can be calculated with the following formula.
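The formula shown on the slide is not reproduced in the transcript; the standard form consistent with the description above is:

\[
Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
\]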
If the calculated value of Z is greater than 1.96 or 2.58,
the values of the normal distribution corresponding to P 0.05 and
P 0.01, you can reject the null hypothesis and conclude that the difference between
the two means is statistically significant or highly significant, respectively.
The t-test, instead, is useful to compare the means of small samples;
it assumes that the variances of the two samples, being small, may be similar, and
that the sampling units of both can be pooled together.
So this test introduces the common variance into the formula, and,
using the square root of the common variance, the formula becomes the following.
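Again, the slide formula is not in the transcript; the usual pooled-variance form matching the description is:

\[
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}},
\]

with n1 + n2 − 2 degrees of freedom.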
Checking the probability table of this distribution,
if the calculated value is larger than that in the table for the related degrees
of freedom of a two-tailed test, you can reject the null hypothesis, and
it can be concluded that the difference between the two means is significant,
when t is greater than the value in the table for P 0.05, or highly significant,
when t is greater than the value in the table for P 0.01.
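As a minimal sketch (again not the lecturer's own material, with invented species counts), the pooled-variance t-test can be computed by hand and cross-checked against SciPy's ttest_ind, which implements the same pooled form when equal_var=True:

```python
# Pooled-variance two-sample t-test for small samples, as described above.
import numpy as np
from scipy import stats

site_a = np.array([12, 15, 14, 10, 13, 16, 11])   # made-up species counts
site_b = np.array([18, 21, 17, 22, 19, 20, 23])

n1, n2 = len(site_a), len(site_b)

# Common (pooled) variance, weighting each sample by its degrees of freedom.
pooled_var = ((n1 - 1) * site_a.var(ddof=1) + (n2 - 1) * site_b.var(ddof=1)) / (n1 + n2 - 2)

# t = difference of the means over the standard error built from the pooled variance.
t_manual = (site_a.mean() - site_b.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Cross-check with SciPy's pooled-variance t-test (two-tailed P-value).
t_scipy, p_two_tailed = stats.ttest_ind(site_a, site_b, equal_var=True)

print(f"t (manual) = {t_manual:.3f}, t (scipy) = {t_scipy:.3f}, two-tailed P = {p_two_tailed:.4f}")
# Following the rule in the lecture: reject the null hypothesis when |t| exceeds
# the table value for P 0.05 (significant) or P 0.01 (highly significant).
```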
So guys, today I showed you how to calculate parametric tests for
normally distributed data.
But in case your data are not normally distributed,
you need a nonparametric test, which I will explain to you during the next lecture.