Now we proceed to Markov's Inequality.
Suppose there is a non-negative random variable f. That is, f takes only non-negative values.
Then for any positive number a we get the following: the probability that f is greater than or equal to a is at most the expectation of f divided by a.
This is a useful inequality, and we can use it in the following way: it allows us to bound probabilities, that is, to show that certain probabilities are small, using expectations. We have discussed how we can compute expectations, so now we can use this to bound some probabilities.
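Before the proof, here is a small numerical sketch of the statement; the values and probabilities below are made up purely for illustration.

```python
# Check Markov's inequality P(f >= a) <= E[f] / a on a small
# made-up distribution of a non-negative random variable f.
values = [0.0, 1.0, 3.0, 10.0]    # hypothetical non-negative values of f
probs  = [0.5, 0.3, 0.15, 0.05]   # hypothetical probabilities, summing to 1

expectation = sum(v * p for v, p in zip(values, probs))

for a in (1.0, 2.0, 5.0):
    prob_at_least_a = sum(p for v, p in zip(values, probs) if v >= a)
    print(f"a={a}: P(f >= a) = {prob_at_least_a:.3f} <= E[f]/a = {expectation / a:.3f}")
```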
Okay, for the proof of this inequality, it is convenient to rewrite it. It is convenient to multiply both sides by a; note that a is positive, so we can do this.
And we will show that a times the probability that f is greater than or equal to a is at most the expectation of f.
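In symbols, the original statement and the rewritten form we will actually prove are equivalent:

```latex
% Markov's inequality and the equivalent form obtained by multiplying by a > 0
\[
\Pr[f \ge a] \le \frac{\mathbb{E}[f]}{a}
\qquad\Longleftrightarrow\qquad
a \cdot \Pr[f \ge a] \le \mathbb{E}[f].
\]
```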
So we need to show this inequality, and for this let us consider the following new random variable g. It is a random variable on the same probability space, and it is defined in the following way: consider some outcome and suppose the value of f on this outcome is a_i. If a_i is at least a, then g is equal to a on this outcome, and if a_i is less than a, then g is equal to 0 on this outcome.
So g indicates a certain event; g indicates the following: if on some outcome the value of f is at least a, then g is a, and otherwise it is 0. Okay, note that g is always less than or equal to f on each outcome.
Since on all outcomes g is at most f, the average value of g is also at most the average value of f. So we have this inequality: the expectation of g is at most the expectation of f. Again, this follows since g is at most f on each possible outcome.
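Here is a small sketch of this auxiliary variable on the same made-up distribution as above; the threshold a is also chosen arbitrarily.

```python
# g equals a whenever f >= a, and 0 otherwise, so g <= f on every outcome.
f_values = [0.0, 1.0, 3.0, 10.0]   # hypothetical values of f on four outcomes
probs    = [0.5, 0.3, 0.15, 0.05]  # hypothetical probabilities of those outcomes
a = 2.0

g_values = [a if v >= a else 0.0 for v in f_values]
assert all(g <= v for g, v in zip(g_values, f_values))  # g <= f pointwise

E_f = sum(v * p for v, p in zip(f_values, probs))
E_g = sum(g * p for g, p in zip(g_values, probs))
print(E_g, "<=", E_f)  # monotonicity of expectation: E[g] <= E[f]
```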
Okay, now what is the expectation of g? Note that g has only one nonzero value. If we write out the definition of expectation, then we see that the expectation of g is this value multiplied by the sum of the probabilities of the outcomes on which g takes this value. Now we have to see what these outcomes are.
Note that these outcomes are exactly the outcomes of the following event: f is at least a. Indeed, g is equal to a on an outcome if and only if f is at least a on this outcome. So these are the outcomes of the event that f is at least a, and thus, by the definition of the probability of an event, the sum of their probabilities is exactly equal to the probability of the event that f is at least a.
And so now we know the expectation of g: we have to multiply a by the sum of the probabilities of these outcomes, and this sum is equal to the probability that f is at least a. So the expectation of g is a times the probability that f is at least a.
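Written out with the notation from above (a_i for the values of f and p_i for their probabilities), the computation looks like this:

```latex
% The only nonzero value of g is a, taken exactly on the outcomes of {f >= a}.
\[
\mathbb{E}[g] \;=\; a \cdot \sum_{i \,:\, a_i \ge a} p_i \;=\; a \cdot \Pr[f \ge a].
\]
```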
Now let's sum up what we already have. We have that the expectation of g is at most the expectation of f, and on the other hand we have already computed the expectation of g: it is equal to a times the probability that f is at least a.
So, if we combine these, we see that the expectation of f is at least a times the probability that f is at least a. And this is exactly what we needed to show, so we have proven Markov's inequality.
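In one line, the whole argument is the following chain:

```latex
% Combining E[g] <= E[f] with E[g] = a * Pr[f >= a] gives Markov's inequality.
\[
a \cdot \Pr[f \ge a] \;=\; \mathbb{E}[g] \;\le\; \mathbb{E}[f]
\quad\Longrightarrow\quad
\Pr[f \ge a] \;\le\; \frac{\mathbb{E}[f]}{a}.
\]
```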
So the idea was that we replaced f by a smaller and simpler random variable, and in this way we obtained a bound relating the expectation to the probability of the event we are interested in.
Now, just to get more intuition, let's see a geometric interpretation of this inequality. We have this inequality we would like to show, and suppose, for simplicity of presentation, that f takes four values a_1, a_2, a_3, and a_4, with probabilities p_1, p_2, p_3, and p_4.
Now, again, recall that we can represent f in a system of coordinates: we can split the interval from zero to one into four intervals of lengths p_1, p_2, p_3, and p_4, and recall that p_1 plus p_2 plus p_3 plus p_4 is exactly one, so we can indeed split the interval of length one into these parts. We can then consider the graph that corresponds to the function f. Informally speaking, if we pick a random point in the interval from zero to one and look at the value of the function on this graph at that point, then with probability p_1 it will be equal to a_1, with probability p_2 it will be equal to a_2, and so on. This is exactly our random variable f. So here is the graph.
Here is how it corresponds to the vertical axis. And here is a, from the inequality at the top of this slide. And recall that the expectation of f is just the area of the gray region.
On the other hand, we can also see a times the probability that f is at least a in this picture: it's the area of the red region. Indeed, f is at least a on two outcomes in this picture, the outcome with value a_1 and the outcome with value a_3, so the corresponding probabilities are p_1 and p_3. So the product of a and the probability that f is at least a splits into two parts: a times p_1 plus a times p_3. And these are exactly the areas of the red rectangles.
So now it is just left to observe that the gray region covers the red one, and so its area is at least as large. That's the geometric intuition behind Markov's inequality.
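To make the picture concrete, here is a small sketch of the two areas; the specific values a_1 through a_4, the probabilities, and the threshold a are made up, chosen so that only a_1 and a_3 are at least a, as in the picture.

```python
# The unit interval is split into pieces of lengths p_1..p_4, f takes the value
# a_i on the i-th piece, and a is a threshold. The gray area is E[f]; the red
# area is a * P(f >= a). The gray region always covers the red one.
a_vals = [5.0, 1.0, 3.0, 0.5]   # hypothetical a_1..a_4 (a_1 and a_3 are >= a)
p_vals = [0.2, 0.4, 0.1, 0.3]   # hypothetical p_1..p_4, summing to 1
a = 2.0

gray_area = sum(ai * pi for ai, pi in zip(a_vals, p_vals))            # E[f]
red_area  = sum(a * pi for ai, pi in zip(a_vals, p_vals) if ai >= a)  # a * P(f >= a)
print(f"gray (E[f]) = {gray_area:.2f} >= red (a * P[f >= a]) = {red_area:.2f}")
```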