So the variance of a random variable is another expected value property of a distribution. Recall that the mean measures the center of a distribution; the variance measures how spread out it is.
So, if x is a random variable and it has mean mu, that is, the expected value of x equals mu, then the variance of x is defined as the expected value of the quantity x minus mu, the whole thing squared. So what does that mean?
So, the expected value is in essence an average, right? It's sort of the average or the typical value that the random variable takes, the center of the distribution. The variance, on the other hand, is sort of the average squared distance the random variable is from the mean.
So what that implies is that random variables with higher variances come from distributions that are more spread out than ones that have a lower variance.
>> That makes sense, and I'm just kind of thinking of that fulcrum point still.
>> Yeah.
>> How things are more spread out [inaudible].
>> Yeah, exactly, exactly, great.
>> Alright.
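Just so the formula is written down once in symbols, here is the definition the lecture just gave (mu is the mean and E denotes expected value):

    \mu = E[X]
    \mathrm{Var}(X) = E[(X - \mu)^2]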
>> And so, let me just remind you what this formula, the variance formula, means again. If you were to take the random variable x and figure out what its distribution was, and then subtract off its population mean, you would get the exact same distribution, just with all the possible values of x shifted by the value mu, so that it has mean zero.
And then if I were to take that shifted random variable, figure out what the distribution of its square is, and then take the expected value of the resulting random variable, I would have the variance. That's hard, so we don't ever calculate the variance that way. We typically calculate the variance by a convenient shortcut, and that is that the variance of a random variable is the expected value of x squared minus the expected value of x, quantity squared, and again this expected value of x, quantity squared, is just mu squared.
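In symbols, the shortcut formula just described is:

    \mathrm{Var}(X) = E[X^2] - (E[X])^2 = E[X^2] - \mu^2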
This shortcut formula, then, requires you to calculate the expected value of x squared. Typically the more convenient way to do that is, if the random variable is discrete, to use the summation of t squared times p of t, where p is the probability mass function, or, if it's continuous, the integral of t squared times f of t, where f is the density function.
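Written out, those two recipes for the expected value of x squared are:

    E[X^2] = \sum_t t^2 \, p(t)        (discrete case, p = probability mass function)
    E[X^2] = \int t^2 \, f(t) \, dt    (continuous case, f = density function)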
It would be a nice exercise for you to show that this original variance calculation equals the shortcut variance calculation, just by expanding the square and using the expected value rules.
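Here is a sketch of that exercise, expanding the square and using the linearity of the expected value:

    E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2]
                   = E[X^2] - 2\mu E[X] + \mu^2
                   = E[X^2] - 2\mu^2 + \mu^2
                   = E[X^2] - \mu^2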
It would be convenient if the variance operator were also linear. It's not.
As an example, if you pull a constant multiplier out of the variance, it gets squared. So the variance of a times x, where a is a constant and not a random variable, is a squared times the variance of x.
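That rule follows directly from the shortcut formula, since a constant a can be pulled out of expected values:

    \mathrm{Var}(aX) = E[(aX)^2] - (E[aX])^2 = a^2 E[X^2] - a^2 (E[X])^2 = a^2 \, \mathrm{Var}(X)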
The square root of the variance is called the standard deviation, and the reason we often use the standard deviation instead of the variance is that the standard deviation has the same units as the random variable.
So let's say x, as a random variable, has units of inches; then the variance has units of inches squared, whereas the standard deviation has units of inches.
So it's often quite convenient to talk about the spread in the same units as the random variable itself. The standard deviation is a common summary of variation.
Well, let's calculate an example variance. What's the variance of a toss of a die? In this case, the expected value of x is 3.5; we've covered that already.
And the expected value of x squared, let's calculate that. Well, we have one squared times a sixth, plus two squared times a sixth, plus three squared times a sixth, plus four squared times a sixth, plus five squared times a sixth, plus six squared times a sixth. That works out to be about 15.17.
And then you subtract: 15.17 minus 3.5 squared works out to be about 2.92.
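As a quick check of that arithmetic, here is a minimal Python sketch; it assumes a fair six-sided die, so each face has probability one sixth:

    # Variance of a fair six-sided die via the shortcut formula
    faces = [1, 2, 3, 4, 5, 6]
    p = 1 / 6                             # each face is equally likely
    ex = sum(t * p for t in faces)        # E[X] = 3.5
    ex2 = sum(t**2 * p for t in faces)    # E[X^2] is about 15.17
    var = ex2 - ex**2                     # about 2.92
    print(ex, ex2, var)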
Now let's go through a very important formula. Let's suppose we flip a coin.
But let's make it slightly more interesting.
Instead of the coin having probability one-half of a head, let's say that it has
probability p of a head. So here expected value of x equals zero
times the probability of a tail, which is one minus p, plus one times the
probability of a head, which is p, so it works out to be p as the expected value.
And of course this agrees with our earlier calculation when p happens to be one-half, for a fair coin.
Now let's calculate the expected value of x squared. Actually, it's pretty easy to do in this case, because x only takes on the values zero and one: if you square zero you get zero, and if you square one you get one. So x squared is in fact exactly x, and therefore the expected value of x squared is equal to the expected value of x, which we already calculated to be p.
So the variance of x in this case is expected value of x squared minus the
expected value of x quantity squared, which is p minus p squared, which works
out to be p times one minus p, which is a formula you may have encountered before.
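Collecting those steps in one place:

    E[X] = 0 \cdot (1 - p) + 1 \cdot p = p
    E[X^2] = E[X] = p                         (since X^2 = X when X is 0 or 1)
    \mathrm{Var}(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1 - p)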
It's interesting to note that this variance formula is maximized when p is 0.5. Simply plot the function p times one minus p between zero and one, and you'll see that it is maximized at 0.5. So the most variable a coin flip can be is if it is, in fact, exactly a fair coin.
It's interesting to note that the most variable a random variable can be, in general, is if you shove all its mass to two endpoints, equally distributed between those two endpoints. So if you have a continuous random variable and you want to make it more variable, chop out the middle and spread the mass out equally between the two ends.
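Here is a tiny Python sketch of that plot idea, just evaluating p(1 - p) on a grid; the grid spacing of 0.01 is an arbitrary choice:

    # Evaluate the Bernoulli variance p*(1 - p) on a grid of p values in [0, 1]
    ps = [i / 100 for i in range(101)]
    variances = [p * (1 - p) for p in ps]
    best_p = ps[variances.index(max(variances))]
    print(best_p, max(variances))   # 0.5 0.25 -- the maximum is at a fair coin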
And in fact, let's talk about this in greater detail. Suppose that you have any random variable that takes values between zero and one, like a uniform random variable, and its expected value is p. Since the variable takes values between zero and one, p has to be a number between zero and one.
Now notice that if x is a random variable between zero and one, x squared has to be less than or equal to x, because if you take any number between zero and one and square it, you get a number that is no bigger. And so the expected value of x squared has to be less than or equal to the expected value of x, which is p.
Therefore the variance of x, which is the expected value of x squared minus the expected value of x quantity squared, has to be less than or equal to the expected value of x minus the expected value of x quantity squared, which is p times one minus p.
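Written as a chain of inequalities, for any random variable X taking values between zero and one with E[X] = p:

    0 \le X \le 1  \Rightarrow  X^2 \le X
                   \Rightarrow  E[X^2] \le E[X] = p
                   \Rightarrow  \mathrm{Var}(X) = E[X^2] - p^2 \le p - p^2 = p(1 - p)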
And basically, this is then just a proof that the Bernoulli variance, this binary variance where the random variable can only take the values zero or one, is the largest possible for a random variable that takes values between zero and one and has expected value p. And we also noted earlier that the maximum value you can get is when p is, in fact, 0.5. So this is basically a simple little proof that the largest variance you can get for such a random variable comes from shoving its mass to the two endpoints, and the closer you can get to an equal mass at both endpoints, the larger the variance is.
I'm not sure if I mentioned this previously, but I called the random variable for a coin flip that comes up heads with probability p a Bernoulli random variable.
This is named after the mathematician Jacob Bernoulli, who is one of the fathers of probability, and Jacob Bernoulli is an interesting character; you should read up on him. The Bernoullis were a very famous mathematical family. They came up with lots of discoveries; Jacob was a particularly influential member of the Bernoulli family, and he discovered quite a bit of probability theory very early on. At any rate, when you have a random variable that takes the value one with probability p and the value zero otherwise, we call that a Bernoulli random variable.
So here we are, back talking about variances. Variances are kind of difficult things to understand, and, equivalently, standard deviations; I prefer to interpret standard deviations.
Intuitively we know that bigger variances mean distributions are more spread out, but we need some way to actually interpret what "bigger" means. Now, in the context of a specific distribution, we might learn the kinds of quantities associated with that distribution, so that we know what one, two, or three standard deviations mean. That's particularly true of the Gaussian, or bell-shaped, density; we tend to know the values associated with those standard deviations by heart. But there is a general rule that applies to all distributions, the so-called Chebyshev inequality, named after the Russian mathematician Chebyshev. At any rate, Chebyshev gave a really useful inequality for interpreting variances.
Basically, the inequality says that the probability that a random variable is k standard deviations or more from its mean is less than or equal to 1/k^2. Let me repeat that because it's so important: the probability that a random variable is more than k standard deviations from its mean is less than or equal to 1/k^2.
Let's just look at some simple benchmarks for k. The probability that a random variable is more than two standard deviations from its mean is 25 percent or less. The probability that the random variable is more than three standard deviations from its mean is eleven percent or less. The probability that the random variable is more than four standard deviations from its mean is six percent or less.
And again, note that this is a bound on the probability statement; it's not an equality. It's the worst that it could possibly be. For lots of distributions, the probability of being four or more standard deviations beyond the mean is far lower than six percent, but six percent is the worst it can be. So it's unlikely, for example, that if you observe a random variable you will see it fall, say, six standard deviations from the mean; that has probability less than one over 36, regardless of the distribution.
What's interesting about Chebyshev's inequality is that it's quite easy to prove, so let's just go through the proof really quickly. Well, let's look at this probability statement: the probability that a random variable is more than k standard deviations from its mean. And let's do it in the continuous case; you can prove it more generally, but this just gives you the intuition behind the proof.
Well, that probability is the integral of f(x) dx over the set of x values that are more than k standard deviations from the mean. Here the little x in the domain of integration is a dummy variable of integration; we could replace it by another letter on the right-hand side, but on the left-hand side it has to be the capital X.
Now notice that, on that set, the absolute value of x minus mu over k sigma has to be bigger than one, so if we square it, it has to be bigger than one as well; you take a number that's bigger than one and square it, and it's still bigger than one. So we can multiply the integrand by x minus mu squared over k squared sigma squared.
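For reference, here is the argument so far written out in symbols, with the remaining steps sketched to show where it is headed (the last equality just uses the definition of the variance from earlier):

    P(|X - \mu| > k\sigma) = \int_{|x - \mu| > k\sigma} f(x)\, dx
                           \le \int_{|x - \mu| > k\sigma} \frac{(x - \mu)^2}{k^2\sigma^2} f(x)\, dx
                           \le \frac{1}{k^2\sigma^2} \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx
                           = \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2}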
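And to get a feel for the benchmark numbers quoted above, here is a minimal Python sketch that just evaluates the 1/k^2 bound for a few values of k:

    # Chebyshev bound: P(|X - mu| >= k*sigma) <= 1/k^2, for any distribution
    for k in [2, 3, 4, 6]:
        bound = 1 / k**2
        print(f"{k} standard deviations: at most {bound:.1%}")
    # 2 standard deviations: at most 25.0%
    # 3 standard deviations: at most 11.1%
    # 4 standard deviations: at most 6.2%
    # 6 standard deviations: at most 2.8%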