
There's an important result that arises out of these facts and the important

result is that the variance of the sample mean of a collection of independent and

identically distributed random variables is sigma squared over n.

So let's assume that we have a collection Xi, i equals one to n, that are independent

and identically distributed, IID, and that the variance of the distribution that

they're drawn from is sigma squared. Okay?

So let's calculate variance of X bar. Well that's just the variance of one / n

times the sum of the Xs, right. That's just the sample mean formula, the

sum of everything divided by the number of things you added up.

The one over n is a constant, so we can pull it out of the variance; it comes out squared as

one over n squared, and we get the variance of the sum.

The variance of the sum is the sum of the variances, because the Xs are independent.

Hence uncorrelated. And then, because they're IID, the variance of each Xi is

the same, it's sigma squared. And we've added up n of those, so we

get n sigma squared, and it works out to be sigma squared over n for the final line

here. So, what does this mean?
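Written in symbols, that same chain of steps is:

```latex
\begin{aligned}
\operatorname{Var}(\bar X)
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^n X_i\right)
   = \frac{1}{n^2}\operatorname{Var}\!\left(\sum_{i=1}^n X_i\right)
   && \text{(constants come out squared)} \\
  &= \frac{1}{n^2}\sum_{i=1}^n \operatorname{Var}(X_i)
   && \text{(independence)} \\
  &= \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.
   && \text{(identically distributed)}
\end{aligned}
```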

It's really quite an interesting fact. What this says is, if I want to know

what's the variance of the distribution of averages of ten random variables, say,

from a distribution, I don't actually have to know what that

distribution of averages actually is. So I don't have to do that.

All I have to know is the variance of the original distribution that the

individual observations are drawn from, and that gives it to me.

I just have to divide that by n. Right?

So if I want the variance, I divide by n. If I want the standard deviation, take the

original standard deviation and divide by square root n.

And why is this important? Because remember, eventually we'd like to

connect all of these ideas, these population model ideas, to data.

And, if we have a bunch of things that we're willing to model as if they were

IID. Well, we get multiple draws from the

distribution of individual observations. All the XI's are separate draws from the

original distribution. So we can estimate things like sigma

squared. But we only get one sample mean.

Let's say we have a sample of 100 observations, we only get one sample of

100. So if we calculate the sample mean of all

those 100 observations we have nothing empirically to estimate the variance of

sample means of 100 variables; we don't have repeated samples of 100 variables.

We only have the one. What this result says is, you don't need

that, right? Because all you need is the variance of

the original population and divide it by n.

The variance of the original population is something we can estimate.

And so it's a very nifty result. Let me give you an example of this

property that you could do at home to just test this result to make sure it's true.

Recall in the last lecture we said the variance of a die roll, which

takes values one to six with equal likelihood,

one-sixth for each number. The variance of a die roll was about 2.92.

Okay. So what that says is if you roll a die

over and over and over and look at the distribution, you'll get about one-sixth

of each number. And that the variance of that

distribution, so if you were to roll it thousands and thousands of times and take

the variance of the thousands of measurements, you would get around 2.92.

So do that, roll a die a lot of times and take the sample variance of the thousand

die rolls for example and you'll get about 2.92.

Why is that? Because the sample variance

of lots of die rolls estimates the variance of the population of die

rolls, which is this uniform distribution on one to six, and its variance is 2.92, so

you'll get that. Now here's the question that the

calculation on the slide is answering. Suppose now instead of rolling a die over

and over again you roll ten dice and took their average.

And repeated that process over and over again.

Right? So now this would no longer be uniform on

the numbers one to six. Still, the minimum would be one, right?

If you got all ten 1s, the average of ten 1s is one.

And it, the maximum would still be six. The average of ten 6s is still six.

And so, the bounds are one and six. But it would not look like a uniform

distribution on the numbers between one and six because you can get all sorts of

different numbers, right? You can get numbers between one and two,

two and three, and so on. So it has kind of a funny distribution,

the distribution of averages of ten die rolls.

So imagine if you were to do that. Roll your ten dice, and take the average.

And do that over and over again so that you got, say, 10,000 averages of ten die

rolls. Right?

And you wanted to know what was the variance of that distribution.

Well it seems kind of like a hard calculation.

First you'd have to figure out what's the distribution of the average of ten die

rolls which seems kinda like a hard distribution.

We'll actually later on discuss that that's maybe even a little bit easier to

calculate than you might've thought. But this calculation says you don't even

have to worry about that. We know that the variance of the

distribution of individual die rolls is 2.92, so the variance of the distribution

of averages of ten die rolls is 2.92 / ten, so it will be 0.292.
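Here is a minimal simulation sketch of both experiments (in Python; the lecture suggests running the same experiment in R):

```python
import random
from statistics import pvariance

random.seed(1)

# Variance of individual die rolls: should come out near 2.92.
rolls = [random.randint(1, 6) for _ in range(100_000)]
print(pvariance(rolls))  # ≈ 2.92

# Variance of averages of ten die rolls: should come out near 2.92 / 10 ≈ 0.292.
means = [sum(random.randint(1, 6) for _ in range(10)) / 10 for _ in range(10_000)]
print(pvariance(means))  # ≈ 0.292
```

The seed and sample sizes are arbitrary choices; any large number of repetitions gives values close to 2.92 and 0.292.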

And so we could run this experiment in R, for example, where we rolled a digital die

thousands of times and took the variance of 1,000 die rolls, and you'd find it's

about 2.92. And then we could also do this experiment

where we roll ten dice, took the average, and repeated that process over and over

again and got 10,000 averages of ten die rolls and you would find that the variance

of those averages was about 0.292. Very interesting, and so it's a very

simple formula. And so, let's belabor this point, on the

next slide. So when the Xs are independent with a

common variance, the variance of X bar is sigma squared over n.

The square root of this quantity, sigma over square root n, is so important that we give it a

name, and we call it the standard error of the sample mean.

Basically, a standard error is nothing other than the standard deviation of a

statistic; in this case the statistic is the sample mean, but you might have a standard

error for another statistic, for example the median, which then itself has a standard error.

It may be a little harder to calculate, but nonetheless it has a standard error.

So, what is the standard error? The standard error of a sample mean is the

standard deviation of the distribution of the sample mean.

So, sigma, the standard deviation talks about how variable the population is.

Sigma over square root n talks about how variable averages of

size n from that population are. So, two different statements, and they

estimate different things. So, for example, if the Xs are IQ

measurements, Sigma talks about how variable IQs are.

Sigma over square root ten, say, then talks about how variable averages of ten

IQs are. Okay, so they're different, they're

obviously related, but they're different concepts, and it's easy to confuse the

two. An easy way to remember this, by the way,

is that the sample mean has to be less variable than a single observation,

therefore its standard deviation is divided by square root n, so that also

gives you a sense of the rate at which standard deviations decline as you collect

more data. So, since we've talked about the sample

variance a lot why don't we actually define it.

So the sample variance is that entity where we use data to estimate the population

variance. So recall the population variance was the

expected value, or the average, of the squared deviation of a random variable

around its population mean. Right?

So what is the sample variance? Well, it's the average squared deviation of the

sample values around the sample mean. So it's quite convenient.

Now notice it's not exactly the average. We divide by n - one instead of n, which

is a little annoying, but we do it. So imagine for the time being that this is an

n in the denominator, not an n - one. Then the sample variance is nothing other

than the average square deviation of the observations around the sample mean.

So the sample variance is an estimator of the population variance sigma squared.

And just like the population variance has a short-cut formula, the sample variance

also has a short-cut formula. Summation (Xi minus X bar) squared, the top

of the variance calculation, is summation Xi squared minus n X bar squared.

So, if someone gives you the sum of the squared observations and the sample mean,

then you can calculate the sample variance really quickly.

So, why do we divide by n - one instead of n?

And again, for large samples it's irrelevant right.

The factor n - one / n is close to one. So you are going to get about the same

answer either way. But for small samples it can make a

difference. So why do we choose to divide by n - one?

So, recall we have this property, unbiasedness.

And the property of unbiasedness meant that the statistic's expected value is equal

to the quantity that it's estimating. So, just to remind you, the sample

variance is a function of our observed data.

It's a function of our random variables, right?

So it itself is a statistic; it is a random variable itself.

So it has a distribution, and so, that distribution has a variance, and that

distribution has a mean. Okay?

That's what we're going to talk about right now, is that the mean of that distribution

turns out to be sigma squared if you happen to do the calculation where you

divide by n - one. So I'm going to show it by showing that

the expected value of the numerator of the statistic is equal to n - one times sigma

squared; that's the same thing as showing that the sample variance is unbiased,

because then you just divide both sides of this equation by n - one and you get the

result. So let's do that.

Just to say it again because it's important, what are we doing?

Remember the sample variance is itself a random variable, that random variable has

a distribution, that distribution has a population mean, and we want to say that,

that population mean is in fact sigma squared.

Okay. So expected value of the numerator part of

the sample variance calculation, the sum of the squared deviations around the

sample mean. If we use the shortcut formula, that's the sum

of the expected values of the Xi^2 minus n times the expected value of X bar squared.

Okay. And, now let's use a really kind of nifty

fact. Recall for the variance.

The shortcut variance formula was defined as the expected value of a random variable

squared, minus the expected value of the random variable quantity squared.

Well, we can shift that formula around, to get it to say that the expected value of a

random variable squared is the variance plus the mean squared.

And that's what we do right here, so the expected value of Xi^2, is variance Xi +

mu^2. Okay.

And then the same thing is true of course for the mean because the mean itself is

another random variable. So expected value of X bar squared is

variance of X bar + mu^2, and then we have this n out front.

Okay. And so the variance of Xi is sigma

squared, so you wind up with sigma squared + mu^2, which is a constant, so we

wind up with n of those. And then the variance of X bar, as we just derived a little

bit ago, is sigma squared over n. So we get n times sigma squared over n +

mu^2; just collect terms now and you get n - one times sigma squared.
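Written out, the calculation just narrated is (using E[Y^2] = Var(Y) + (E[Y])^2 for both Xi and X bar):

```latex
\begin{aligned}
E\!\left[\sum_{i=1}^n (X_i - \bar X)^2\right]
  &= \sum_{i=1}^n E[X_i^2] \;-\; n\,E[\bar X^2]
   && \text{(shortcut formula)} \\
  &= \sum_{i=1}^n \left(\sigma^2 + \mu^2\right)
     \;-\; n\left(\frac{\sigma^2}{n} + \mu^2\right)
   && \text{(}E[Y^2]=\operatorname{Var}(Y)+(E[Y])^2\text{)} \\
  &= n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2
   \;=\; (n-1)\,\sigma^2 .
\end{aligned}
```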

So this is a really interesting fact. So this says that the expected value of

the sample variance is in fact the quantity it's trying to estimate if you divide

by n - one instead of n, and that's why we divide by n - one.

Another way to think about this is that well, you know, we don't know the

population mean, mu, and if we knew it, instead of plugging X bar into the sample

variance formula, we would plug mu into the sample variance.

We would calculate the deviations of the observations around the

population mean rather than the deviations around the sample mean.

And so, the idea is that we will sort of lose a degree of freedom by plugging in X

bar, its sample analog, instead of plugging in that mu.

So that's the kinda heuristic behind why you divide by n - one.

It's an interesting fact, though. It's not 100 percent clear that you do

want to divide by n - one, it's sort of every introductory statistics textbook

divides by n - one but there's this interesting phenomenon called the

bias-variance tradeoff and in this case we've obtained an unbiased estimator by

dividing by n - one instead of n but what if we'd divided by n.

Maybe as an exercise, I could ask you to calculate the expected value of the sample

variance. If it was calculated with n in the

denominator instead of n - one. Okay, so basically, what is the expected

value of (n - one) / n times s^2? And you can calculate that very easily; it is not

sigma squared, but it is quite close to it. So it's a biased estimator, but the

other thing I would ask is well which of the two estimators, the estimator s^2

calculated with n - one in the denominator or calculation of the variance with an n

in the denominator, has a lower variance, and what do I mean by that?

Remember the sample variance is a random variable.

It has a distribution, that distribution has a variance.

And the question is, which of the two calculations dividing by n or dividing by

n - one results in a smaller variance of that distribution.

And what does that mean, that would mean how precise your estimate of the variance

is. I'll give you the punch line.

The sample variance divided by n has a slightly lower variance, than the sample

variance divided by n - one. So, it's another kind of classic bias

variance trade-off. In this case, we divide by n - one because

we want unbiasedness. But then we wind up with slightly greater

variance. If we divide by n, we wind up with a

slightly lower variance of our sample variance but it's slightly biased.
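Here is a small simulation sketch of that trade-off: it computes both versions of the estimator over many repeated samples (standard normals with sigma squared equal to one, an arbitrary choice for illustration) and compares their average value (bias) and their spread (variance).

```python
import random
from statistics import fmean, pvariance

random.seed(2)
n, reps = 10, 20_000  # small samples, many repetitions

s2_unbiased, s2_biased = [], []
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(n)]  # true variance is 1
    xbar = fmean(x)
    ss = sum((xi - xbar) ** 2 for xi in x)
    s2_unbiased.append(ss / (n - 1))  # the usual sample variance
    s2_biased.append(ss / n)          # divide by n instead

print(fmean(s2_unbiased))  # ≈ 1.0, unbiased for sigma squared
print(fmean(s2_biased))    # ≈ 0.9, i.e. (n - 1)/n times sigma squared
print(pvariance(s2_biased) < pvariance(s2_unbiased))  # True: the n-divisor version varies less
```

The last comparison is forced by the algebra: both estimators scale the same sum of squares, and dividing by the larger n shrinks the spread.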

I know extremely well-established statisticians who say they would prefer

to have the lower variance. But pretty much every introductory

statistics textbook divides by n - one. It's kind of an interesting discussion,

you know, one of the confusions that always comes up seems quite simple.

We divide by n - one when we calculate

People have a tendency to confuse that with the n that we divided by when we

talked about the standard error of the mean.

And so let's just try to avoid some of this confusion.

Suppose you have a bunch of observations that you're willing to model as IID with

population mean mu and population variance sigma squared.

Then the sample variance, S^2 estimates the population variance, sigma squared.

The calculation of S^2 involves dividing by n - one, and we just spent forever

talking about the difference between dividing by n and dividing by n - one.

Then, the standard error of the mean is sigma over square root n.

So, S over square root n will estimate the standard error of the mean.

So in S^2 we've already divided by n - one, and then we square rooted.

And then we divide by an additional square root of n if we want the standard error of

the mean. Okay, and I am just trying to avoid some

confusion because people seem to get confused by that.

So, I guess if you wanted to attach a label to the quantity S over square root

n, it's the sample standard error of the mean.

What does it estimate? It estimates the population standard error

of the mean, sigma over square root n. Let's tie this down with some actual

numbers. So I was involved in a study where there

were a lot of organolead workers. In this case, I took a subset of 495 of them and

looked at the total brain volume for the lead workers; they were interested in studying

how their exposure to lead in their job changed their brain volume.

So TBV stands for total brain volume, in this case as a measure of the brain volume

on the inside of the skull, and all of the measures are in cubic centimeters.

So the mean, in this case, is 1151. If we're willing to assume these

organolead workers are, say, an IID draw of organolead workers from a population

that we're interested in. Then the sample mean, 1151, would be an

estimate of that population mean. The sum of the squared observations works

out to be this number. So the standard deviation, the sample

standard deviation works out to be that number minus 495 times the sample mean

squared, all divided by 494, that n minus one in the denominator.

Square root the whole thing and you end up with 112.

So what does 112 describe? 112 describes the standard deviation of the

population of brain volumes of organolead workers.

Okay, so it's a direct estimate of my sample variation, right, and then it's an

attempted estimate, if you view my data as a sample from a population of organolead

workers. It attempts, then, to estimate the

population standard deviation of that distribution.

So we can, for example, use Chebyshev's rule to interpret what the combination of

the mean and the standard deviation say about brain volumes of lead workers in the

population. Now, what does it give me if I take this 112.6 and

divide it by square root 495? It gives me five as the numerical result, but

what does that five actually estimate or do for us?
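The arithmetic behind that five, using the numbers from the slide (sample standard deviation 112.6, n = 495), is just:

```python
import math

s = 112.6  # sample standard deviation of the 495 brain volumes (cubic cm)
n = 495    # number of organolead workers in the subset

se = s / math.sqrt(n)  # estimated standard error of the mean
print(round(se, 2))    # 5.06
```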

Well, the five is no longer talking about the variation in total brain volumes in

the population. It's talking about the variation in

averages of 495 organolead workers. So the idea is if we're willing to model

our 495 organolead workers as a draw from a population of organolead workers,

then five estimates the standard deviation of the distribution of averages of 495 draws of organolead

workers from that population. It talks about how variable averages of

495 brain volumes are. The 112 talks about how variable brain

volumes are, okay? So, let me just repeat that because it's very

important. The 112 talks about how variable brain

volumes are of organolead workers in the population and it directly talks about it

in the sample. But it's an estimate of our population

standard deviation, and the five is an estimate of the population standard deviation of

averages of 495 organolead workers. So, I hope you're getting a sense of what

these numbers are trying to

So, there's several concepts that are being used here, first we have our

observed data, right? And these quantities, the sample mean,

sample standard deviation, and standard error tell us things about our observed

data. Right?

And then, there's the assumptions, for example, that they're IID, that help to

try and connect it to a population. So that we can maybe generalize the

results from this data to a population of organolead workers.

Say, for example, if you wanted to use this data to inform policy and then, these

numbers would then be estimates of these population quantities.

And then, dividing by the square root of 495, it's telling us things about

how variable this mean is relative to the variability in the population.

Okay? So, that's the concepts that we're trying

to use, and we'll formalize these much more when we actually do things like

generate confidence intervals and perform hypothesis tests.