Up to now, we've looked at a few examples where we have by

default assumed a normally distributed population.

Now, for many situations such as product characteristics, e.g.

the volume of water in these mineral water bottles could indeed

feasibly follow approximately a normal distribution

and it serves as a very reasonable model.

However, thinking back to the tail end of week two,

we introduced a few different families of

probability distributions such as the Bernoulli,

the Binomial, and the Poisson distributions,

and these clearly exhibit very different characteristics from the normal distribution.

So in this section, we're going to consider

a remarkable mathematical result known as the Central Limit Theorem or CLT, for short.

Now, what is this all about?

Well, recall when we had a normally distributed population,

we noted the sampling distribution of the sample mean x bar,

whereby x bar followed a normal distribution

with a mean of mu and a variance of sigma squared over

n. Such that this sampling distribution

would be centered on the true population mean mu.

So on average, the sample mean would correctly estimate the population mean.

And then there was some variation in the observed sample mean values.

However, that variation would reduce as we

increase the sample size n. So given there will be

many other situations in life where a normal distribution

could not reasonably be supposed or assumed,

then we can defer to the Central Limit Theorem.

Which says, with one or two minor technical caveats,

which we will gloss over here,

that when sampling from any non-normally distributed population,

then asymptotically, that's a big word.

This simply means as the sample size n tends to infinity i.e.

as the sample gets larger and larger and larger,

then the sampling distribution of x bar converges to

the same normal distribution, with a mean of mu and a variance of sigma squared over n.
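
Written out in symbols, the result just described looks roughly like this. This is only a sketch of the statement: it assumes independent observations from a population with a finite mean mu and a finite variance sigma squared, and the "approx." label is my own shorthand for "for large n".

```latex
\bar{X} \;\overset{\text{approx.}}{\sim}\; N\!\left(\mu,\ \frac{\sigma^2}{n}\right)
\qquad \text{as } n \to \infty
```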

So I'd like us to consider one special application of the CLT results.

One which is very useful in opinion surveys,

for example opinion polls perhaps in the sort of political science sphere.

So let's imagine we have a population which follows a Bernoulli distribution.

Remember the Bernoulli distribution,

one of my personal favorites,

whereby we divide the population into successes and failures, ones and zeros.

Such that in any member of the Bernoulli family,

there is a proportion of successes in the population denoted by

some success probability parameter phi and the remaining failures denoted by zero,

we would say occur with probability one minus phi.

So, here we have a two point probability distribution

whereby if we were to draw a sample from this population,

then each time we would either get a success coded as one or a failure coded as zero.

Therefore, our entire sample dataset would simply consist of all of the ones

and zeros reflecting the number of successes and failures we observed respectively.

So this is very different from a continuous smooth normal distribution.

But there is one way we can usefully apply the Central Limit Theorem.

So what is this saying?

So, let's take a very simple example and consider a sample of size five.

So n equal to five drawn from this Bernoulli distribution.

Such that we are doing some opinion polling.

I'm asking people whether they intend to vote for

the governing party or some other party.

So even though there could be multiple parties in this democracy,

it's very easy to reduce this to a dichotomy.

We could say you're voting for the governing party or some other party which means

all of the opposition parties and we don't need to make a distinction between them.

So let's imagine five people are randomly selected.

And suppose they are giving us honest answers so there's no response bias.

And they all give us an answer,

hence there is no non-response bias.

Let's say the first person says,

no, I will not vote for the governing party,

we will denote this as a failure and record this as an observation of zero.

The second person says, yes I will,

an observed value of one.

The third person says, yes I will.

The fourth person says,

no I won't, so a failure, a zero.

And let's say the final person says,

yes I will, and hence we get a one.

So our pattern of observations would be a failure, a success,

success, failure, success denoted by zero,

one, one, zero, and one.

So let's imagine now we'd like to take the sample mean of all observations.

Well, we learned back in our earlier work on descriptive statistics,

that to calculate the sample mean we just add up all of

the observations and divide by the number of observations.

Well, we have no need to deviate from that principle here.

So we simply add up our data values.

So zero plus one plus one plus zero plus one gives us

a grand total of three, divided by our sample size n of five.

So three over five gives us a value of nought point six.

So here we have calculated the sample mean.

But this has a very special interpretation because effectively we

have calculated the sample proportion of successes.

Because when we're adding up the zeros and ones,

the numerator term of our sample mean statistic

(remember, the sum of the xi over n), well, if we're adding up a load of zeros and ones,

the zeros do not contribute anything to the aggregate total.

So the numerator will simply represent the number of successes we've observed.

In this instance, three, divided by our sample size n of five.

Giving us a sample proportion of nought point six i.e.

60% of the respondents

indicated that they would vote for the governing party and hence we got a success.
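
The arithmetic above can be checked in a couple of lines. This is a minimal sketch using the five responses from the example; the variable names are my own and purely illustrative.

```python
# Responses from the five people polled in the example:
# 1 = will vote for the governing party (success), 0 = will not (failure).
responses = [0, 1, 1, 0, 1]

n = len(responses)

# The zeros add nothing to the total, so the sum is just the count of successes.
number_of_successes = sum(responses)

# The usual sample mean: sum of the observations divided by n.
sample_mean = sum(responses) / n

# The sample proportion of successes: count of successes divided by n.
sample_proportion = number_of_successes / n

print(number_of_successes, sample_mean, sample_proportion)
```

Both calculations use the same numerator and the same denominator, which is why the sample mean of zeros and ones is the sample proportion.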

So given we're starting from a Bernoulli distributed population which is very non-normal,

we can now appeal to this Central Limit Theorem.

Now, we previously derived that the expected value of

a Bernoulli random variable was equal to

phi on the back of that probability weighted calculation.

So here phi, the expectation of X, really serves as

the Bernoulli case of the population mean mu.

Now, we'll just note the result that the population variance sigma

squared for a Bernoulli distribution is phi times one minus phi.
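
If you would like to confirm those two results numerically, here is a small sketch of the probability weighted calculation. The function name is hypothetical and the value phi equal to 0.2 is just an illustrative choice.

```python
def bernoulli_mean_and_variance(phi):
    """Probability weighted mean and variance of a Bernoulli(phi) variable."""
    # The two possible values and their probabilities.
    distribution = {0: 1 - phi, 1: phi}
    # Mean: each value weighted by its probability.
    mean = sum(x * p for x, p in distribution.items())
    # Variance: squared deviation from the mean, weighted by probability.
    variance = sum((x - mean) ** 2 * p for x, p in distribution.items())
    return mean, variance


phi = 0.2  # illustrative value, an assumption for this sketch
mean, variance = bernoulli_mean_and_variance(phi)

# Compare against the closed forms: phi and phi times one minus phi.
print(mean, phi)
print(variance, phi * (1 - phi))
```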

So if we now invoke our Central Limit Theorem result,

we can say our sample mean x bar which in

this particular application refers to the sample proportion,

which we may wish to denote let's say by the letter P, for proportion.

This is our special case of the sample mean here.

As n tends to infinity so asymptotically,

then the sample proportion P will be approximately

normally distributed with a mean of mu and a variance of sigma

squared over n. But since the population

from which we are drawing our sample is the Bernoulli distribution,

its mean is phi and its variance is phi times one minus phi and hence we can

now use the Central Limit Theorem to

derive the sampling distribution of the sample proportion.

Such that P is approximately, i.e.

for a large sample size, normally distributed with a mean

of phi and a variance of phi times one minus phi over n. So again,

we note that the expectation of the sample mean,

specifically this sample proportion,

is equal to the true parameter it is trying to estimate i.e.

phi. That means on average,

our sample proportion will be equal to the true population proportion.

The phi in this Bernoulli distribution.

We also see that the variance of the sampling distribution will

decrease as our sample size n increases as we see n in

the denominator of the variance of P. So this serves

as an excellent illustration of how we can still use the normal distribution.

Yet again another example of its usefulness to us as statisticians in being able to

approximate the sampling distribution of a sample

mean when sampling from a non-normally distributed population.
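
To see how quickly that variance shrinks, here is a rough sketch that evaluates the mean and standard deviation of the approximating normal distribution for a few sample sizes. The function name, the value phi equal to 0.5 and the sample sizes are all assumptions chosen for illustration.

```python
import math


def proportion_sampling_distribution(phi, n):
    """Mean and standard deviation of the approximate normal
    sampling distribution of the sample proportion P."""
    variance = phi * (1 - phi) / n  # n sits in the denominator
    return phi, math.sqrt(variance)


phi = 0.5  # hypothetical population proportion
for n in (5, 100, 1000):  # hypothetical sample sizes
    mean, standard_deviation = proportion_sampling_distribution(phi, n)
    print(n, mean, round(standard_deviation, 4))
```

The mean stays at phi whatever the sample size, while the standard deviation falls as n grows.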

So just perhaps to end this section,

let's just run a few simulations to see this Central Limit Theorem,

this approximation to normality really come to life.

Because I said this was an asymptotic result i.e.

one which holds increasingly well as the sample size gets larger and larger and larger.

But how large does n need to be for this approximation to be reasonable?

So, as an illustration,

here's one I prepared earlier.

I actually used a computer to simulate many random samples of

different sample sizes from

the Bernoulli distribution with the success parameter phi equal to nought point two.

So in the simplest case where n is equal to one,

this just means taking multiple random observations,

random drawings from this Bernoulli nought point two distribution.

So, as we have a very large number of

samplings from this distribution then the proportion of

successes that we observe should be approximately equal to

the true proportion of successes in this population i.e.

20% successes and 80% failures.

So here, we can produce a histogram of those simulated results.

And clearly this looks nothing like a normal distribution.

Unsurprising because as we only have samples of size one,

each observation is either a success and hence a value of one which is itself its sample

mean or when we have failures a value of

zero and hence also the value of its sample mean.

But now see what happens as we increase the sample sizes across all of these simulations.

By the time we reach sample sizes of about 50,

you can see the histogram which represents the sample means

calculated across this large number of randomly simulated samples,

we see the histogram is converging to a normal distribution.

And if we decide to increase the sample size n yet further,

so asymptotically, as n tends to infinity,

you really do see this histogram converging very nicely to

a normal distribution and hence a great example of the Central Limit Theorem.
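
If you'd like to try a simulation of this kind yourself, here is a minimal sketch using only the Python standard library. It is not the code behind the histograms above. The function name, the seed, the 2,000 samples per setting and the sample sizes other than one and 50 are my own assumptions.

```python
import random
import statistics


def simulate_sample_proportions(phi, n, num_samples, rng):
    """Draw num_samples random samples of size n from a Bernoulli(phi)
    population and return the sample proportion from each one."""
    return [
        sum(1 if rng.random() < phi else 0 for _ in range(n)) / n
        for _ in range(num_samples)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
phi = 0.2               # success probability used in the lecture

for n in (1, 5, 50, 500):
    proportions = simulate_sample_proportions(phi, n, 2000, rng)
    print(
        n,
        round(statistics.mean(proportions), 4),       # should sit near phi
        round(statistics.pvariance(proportions), 5),  # simulated variance
        round(phi * (1 - phi) / n, 5),                # phi(1 - phi) / n
    )
```

For each sample size the simulated mean should sit close to phi and the simulated variance close to phi times one minus phi over n. Passing the list of proportions to any plotting tool as a histogram should show the shape moving from two spikes at zero and one towards a bell curve.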

So in our final section to come,

I'd just like to consider a couple of

statistical inference examples related to the sample proportion namely,

an example of a confidence interval and a hypothesis test.