
So in this lecture, I'm gonna talk about basically three distributions.

And really two, if you kind of think about it.

So the Bernoulli distribution is one, the binomial distribution, which is a

distribution based on the Bernoulli distribution, and then I'll talk about the

Gaussian or normal distribution, which is a very important one as well.

In fact, I think if all the distributions were to get together and nominate a king,

it would definitely be the normal distribution.

So the Bernoulli distribution is named after Jacob Bernoulli.

We've already talked about it before but let's formalize it a little bit and remind

ourselves of some notation. A Bernoulli random variable is just a fancy name for a coin flip: it takes the value one with probability p and the value zero with probability 1 - p, where p is some number between zero and one. The probability mass function for a Bernoulli random variable we've seen before: the probability that X takes a specific value x, either zero or one, is P(X = x) = p^x (1 - p)^(1 - x). The mean of a Bernoulli random variable is

simply p and the variance is p(1 - p), facts that we've proven before but are just restating now. In general, for Bernoulli random variables, if you code a coin flip as a one, or a head, people often call that a success and zero a failure. And they tend to do this regardless of whether the outcome classified as a success is actually something desirable.

So we might call, say, a person getting side effects from a medication a

success, in terms of the Bernoulli coin flip.

It's just a little bit of odd nomenclature, I guess.
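To make that concrete, here's a small Python sketch (mine, not from the lecture; the value p = 0.3 is an arbitrary choice) checking the Bernoulli mass function, mean, and variance:

```python
def bernoulli_pmf(x, p):
    # P(X = x) = p^x * (1 - p)^(1 - x) for x in {0, 1}
    return p**x * (1 - p)**(1 - x)

p = 0.3  # arbitrary success probability for illustration
# The mass function puts probability p on one and 1 - p on zero.
print(bernoulli_pmf(1, p), bernoulli_pmf(0, p))

# Mean E[X] = p, and variance E[(X - p)^2] = p * (1 - p).
mean = 1 * p + 0 * (1 - p)
variance = (1 - p)**2 * p + (0 - p)**2 * (1 - p)
print(mean, variance)  # 0.3 and 0.21
```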

We've also already talked about the Bernoulli likelihood function.

So if we have x1 to xn as our observed data points, the x1 to xn here are numbers that we recorded, a collection of zeros and ones. Then, because we're going to model these as independent coin flips, the likelihood is the product over i of p^xi (1 - p)^(1 - xi), which, as we've already discussed, equals p^(summation xi) (1 - p)^(n - summation xi).

So notice again that the likelihood only depends on the sum of the x's: each x is zero or one, so the sum of the x's is just the total number of successes, and n minus the sum of the x's is the total number of failures. So if you know n and you know the total

number of successes, then you know the Bernoulli likelihood.

It doesn't matter what order the heads or tails occurred in, as far as the information contained in the data about the parameter p is concerned.
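A quick numerical sketch of that point (my own illustration; the two data sets are made up): the likelihood ignores the order of the flips and matches the closed form based only on the sum.

```python
def bernoulli_likelihood(data, p):
    # Product of p^xi * (1 - p)^(1 - xi) over the observations.
    like = 1.0
    for x in data:
        like *= p**x * (1 - p)**(1 - x)
    return like

# Two made-up data sets: same number of successes, different order.
a = [1, 1, 0, 0, 1]
b = [0, 1, 0, 1, 1]
for p in [0.1, 0.25, 0.5, 0.9]:
    # Both match the closed form p^(sum x) * (1 - p)^(n - sum x).
    closed = p**sum(a) * (1 - p)**(len(a) - sum(a))
    assert abs(bernoulli_likelihood(a, p) - closed) < 1e-12
    assert abs(bernoulli_likelihood(b, p) - closed) < 1e-12
```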

Because n is fixed and assumed known, this implies that the sample proportion

contains all the relevant information that you need to know about p insofar as the

likelihood codifies it, simply because summation xi and summation xi over n are

one to one, you can get from one to the other easily by either multiplying or

dividing by n. So, the sufficiency result is basically

that the proportion of successes is all the relevant information you need to make

Bernoulli inference about the parameter p. Now again, this all depends on the model being correct, that is, that you have correctly modeled the data as iid Bernoulli.

When we talked about maximum likelihood, we also showed that if you maximize the Bernoulli likelihood over p, you obtain p hat, equal to summation xi over n, as the maximum likelihood estimator. And we went over an example.
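Here is a small sketch (the data are invented for illustration) confirming numerically that the Bernoulli likelihood peaks at the sample proportion:

```python
data = [1, 0, 1, 1, 0, 1, 0, 1]  # invented coin-flip data
n, s = len(data), sum(data)

def likelihood(p):
    # The likelihood depends on the data only through s, the sum of the x's.
    return p**s * (1 - p)**(n - s)

# Grid search over p; the maximizer lands on the sample proportion.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=likelihood)
print(p_hat, s / n)  # both 0.625
```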


So here, just to give you an example of Bernoulli likelihoods and what they look

like, imagine if we flipped a coin four times.

So if we flipped a coin four times, there are only five possible values of the sufficient statistic that we could obtain: zero, one, two, three, or four heads out of the four coin flips. And here are the five possible likelihoods that you could obtain from this experiment, all normalized to have height one. The leftmost one is the likelihood if you had no heads. The second one is shifted a little to the right, with its peak right at p hat, one quarter. If you get two heads, the peak, the MLE, is right at 0.5. If you've gotten three heads, the likelihood shifts a little more to the right, peaking at three quarters, and if you've gotten four heads, it shifts closer to one. Notice that even in the event that you get all

tails or all heads, the likelihood is not entirely at zero, or entirely at one.

There is substantial uncertainty. The likelihood is correctly codifying the information that it is possible, even if the coin is fair, to get four consecutive tails.
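To check that claim numerically (a sketch; 0.1 is an arbitrary example of a heads probability biased toward tails):

```python
# Chance of four consecutive tails when the success (heads)
# probability is p: (1 - p)^4.
prob_fair = (1 - 0.5) ** 4    # fair coin
prob_biased = (1 - 0.1) ** 4  # coin biased toward tails
print(prob_fair, prob_biased)  # 0.0625 versus about 0.656
```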

It's just much less likely than if the coin were unfair, biased toward some value closer to zero. So the likelihood is not entirely shoved up against the vertical line at zero, and it just gets closer and closer to that vertical line as you continue to flip and

get tail after tail after tail. Binomial random variables are nothing

other than the sum of iid Bernoulli trials.

We've seen that the key variable from a Bernoulli experiment is the number of

heads, so why don't we just create a random variable that is the number of

heads? Specifically, if X1 to Xn are iid

Bernoulli, then the random variable X defined as the sum of the individual Xi's

is the so-called binomial random variable. And the binomial mass function is just the probability that X takes any specific value x: n choose x, times p^x, times (1 - p)^(n - x), where p is the success probability from each of the Bernoulli coin flips. Here, the values that x can take are zero,

if every single coin flip was a tail. All the way up to N, where every single

coin flip was a head. Just to remind everyone of the notation n choose x: this is written n over x in parentheses, and it equals n factorial over x factorial times n minus x factorial, where zero factorial we're going to treat as one. This formula counts the number of ways of selecting x items out of n without replacement, disregarding the order of the items. Okay, let's consider an example.

Imagine I have ten neckties and I pick out three and put them on a bed, let's say, not caring about the order from left to right or anything like that. Then ten choose three is the number of different configurations of ties that I could have obtained by picking three ties out of my ten. So that's an example.

I can't think of any reason why you would want to do this with your neckties, but

whatever. In fact, certain special cases are very easy. Imagine I was only picking one necktie: how many different combinations can I get? Intuitively we know the answer has to be ten, because there are only ten possibilities. And if you plug into the formula, you have ten factorial divided by one factorial, which is just one, times n minus x factorial, which is ten minus one, or nine, factorial. So we have ten factorial divided by nine factorial, which is just ten.

Okay. Another quite useful one is n choose two: how many ways can you pick two things out of n objects? That one seems to come up a lot in my daily life for some reason. In this case that would be ten factorial over two factorial, which is just two, right, two times one, then divided by eight factorial. So that's ten times nine divided by two. At any rate, the general rule for n choose two is n times n minus one over two, and it's one of the special cases worth memorizing because it seems to pop up a lot.
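These counts are easy to verify with Python's math.comb (my sketch, reusing the numbers above):

```python
from math import comb

print(comb(10, 3))  # 120 ways to pick three neckties from ten
print(comb(10, 1))  # 10, matching the intuitive answer
print(comb(10, 2))  # 45, i.e. 10 * 9 / 2

# The general rule: n choose 2 is n * (n - 1) / 2.
for n in range(2, 20):
    assert comb(n, 2) == n * (n - 1) // 2
```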

So why is n choose x the factor out in front of the binomial mass function? Let's now consider the possibility of

getting six heads out of ten coin flips, from a coin with success probability P.

Well, if you account for the order, you say what's the probability of getting

tail, tail, tail, tail? That's four tails.

And the remaining are heads, right? In that specific order.

Right, we know what that probability is: we would just plug into the joint Bernoulli mass function, obtained by multiplying the Bernoulli mass functions for each coin flip in order, and we would get p to the sixth times one minus p to the fourth. And as we know, because of the fact that

it's only the total number of heads that's sufficient, if it wasn't the first four coin flips that were tails, if instead the last four coin flips were tails and the first six were heads, then we would get the same number: p to the sixth, one minus p to the fourth. And if we had the four tails sprinkled into any configuration among the six heads, we would still get the same answer, p to the sixth, one minus p to the fourth. The result is that, basically, for any

collection of instances where you get six heads and four tails, no matter what the order is, the probability is going to be p to the sixth, one minus p to the fourth.

So what we need is to count how many such configurations there are.

Well, in this case there are ten flips, hence ten positions that could be heads, and we want to know how many different collections of six positions we can obtain, and that's just ten choose six. It's exactly the necktie problem, but now we're picking the positions of the coin flips that are heads, right? And so in this case there are ten choose six possible orders of six heads and four tails. We don't actually have to go through

the general construction of the binomial distribution, because I think that's a pretty clear demonstration of how you would wind up with the probability of the sum of a collection of ten Bernoulli random variables being six, right?

You would sum over all the possible ways you could get six heads.

And it turns out that they all have the same probability.

p to the sixth, one minus p to the fourth. So we just need the number of terms that go into that sum, and it's pretty clear that it's ten choose six. So yeah, we only did this for a specific

instance here. I hope you can see that if ten instead were n, and six instead were x, you would wind up with the general answer: n choose x, p to the x, one minus p to the n minus x. So that's the motivation.
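A brute-force check of that argument (my sketch; p = 0.3 is an arbitrary choice): enumerate all 2^10 ordered sequences of ten flips and confirm that exactly ten-choose-six of them contain six heads, each contributing the same probability.

```python
from itertools import product
from math import comb

# Count ordered sequences of ten flips with exactly six heads.
count = sum(1 for seq in product([0, 1], repeat=10) if sum(seq) == 6)
print(count, comb(10, 6))  # both are 210

# Each such sequence has probability p^6 * (1 - p)^4, so summing
# them reproduces the binomial mass function at x = 6.
p = 0.3  # arbitrary success probability
pmf = count * p**6 * (1 - p)**4
```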

And you can actually mathematically check that the binomial mass function sums to

one. In fact, this follows from a relatively famous formula, the binomial theorem. If you look up the binomial theorem on Wikipedia, for example, you'll see that that formula by itself tells you exactly that the binomial mass function sums to one. But right now we're just going to trust that we did our calculation right.
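We can also just check it numerically (a sketch; n and p are chosen arbitrarily):

```python
from math import comb

n, p = 10, 0.3  # arbitrary choices for the check
# Sum the binomial mass function over x = 0, ..., n.
total = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1))
print(total)  # 1.0 up to floating-point rounding
```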

So let's just go through an example of using the binomial mass function.

So suppose you have an intrepid friend that has eight children and it turns out

that seven of them are girls and none of them are twins.

If you're very persnickety, let's just forget about all the little objections you could possibly raise related to this problem, like the possibility of twins. So let's just think about the problem in

the conceptual way. We're going to think of each child in this family's gender as a coin flip. And the question is: what's the chance of getting seven girls out of eight children, if each child's gender really is independent of the other children's and there's a 50 percent probability at each birth of having a girl?

And so, what's the probability of getting seven or more? Well, the probability of seven or more is the probability of getting seven girls out of eight plus the probability of getting eight girls out of eight. In this case, for the fair coin, that's eight choose seven, times 0.5 to the seventh, times one minus 0.5 to the first, plus eight choose eight, times 0.5 to the eighth, times one minus 0.5 to the zero.
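The lecture suggests checking with a calculator or with R; the same calculation in Python looks like this:

```python
from math import comb

p = 0.5  # probability of a girl under the null hypothesis
# P(7 girls) + P(8 girls) out of eight births.
prob = comb(8, 7) * p**7 * (1 - p)**1 + comb(8, 8) * p**8 * (1 - p)**0
print(prob)  # 0.03515625, i.e. 9/256, about 4%
```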

or, even better, with R, you know, that you can get this number.

It's about four%. This is an example of using the binomial

formula. I wanted it, to mention, this particular

example because this calculation is an example of a so-called P value.

And a P value is always the probability under a known hypothesis of getting a

result as extreme or more extreme than the one actually obtained.

So the logic behind the P value in this specific instance is that you have this evidence that makes you think: wow, seven girls out of eight children, that seems pretty odd. Maybe the 50 percent chance of girls versus boys for this particular family is off for whatever reason. So the P value says: why don't we calculate the probability, if the null hypothesis were true, of getting an event this extreme? If that probability is very low, then maybe that's an indication that our hypothesis, that the 50 percent is correct, is not right. So, at any rate, we'll talk about p values

later. Well, I hope we'll get to talking about p

values later. But I just wanted to mention that that's

where the intuition behind this calculation is coming from.

Right now, we're only using it as an illustration of plugging into the binomial

formula. But I wanted to foreshadow kind of an

important statistical technique. And then here on this page is the likelihood associated with p for this particular binomial experiment, if we're willing to model births as if they were binomial. And here you can see that 0.5 is inside the one-sixteenth likelihood interval but not the one-eighth likelihood interval. You can also see that this likelihood is far more peaked around seven eighths, where it reaches its maximum, and its curvature gives you a sense of the relative evidence for the collection of possible values of p.
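As a sanity check on those reference lines (my sketch; the binomial coefficient cancels in the ratio, so it is omitted):

```python
def likelihood(p, x=7, n=8):
    # Binomial likelihood up to the constant n-choose-x factor.
    return p**x * (1 - p)**(n - x)

p_hat = 7 / 8  # the maximum likelihood estimate
rel = likelihood(0.5) / likelihood(p_hat)
print(rel)  # about 0.0796: above 1/16 but below 1/8
```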