0:17

So in this lecture, I'm gonna talk about basically three distributions. And really two, if you kind of think about it. So the Bernoulli distribution is one; the binomial distribution, which is a distribution based on the Bernoulli distribution; and then I'll talk about the Gaussian or normal distribution, which is a very important one as well. In fact, I think if all the distributions were to get together and nominate a king, it would definitely be the normal distribution.

So the Bernoulli distribution is named after Jacob Bernoulli. We've already talked about it before, but let's formalize it a little bit and remind ourselves of some notation. So a Bernoulli random variable is just a fancy name for a coin flip. A Bernoulli random variable takes the values zero and one with probabilities p and one minus p, where p is some number between zero and one. So the probability mass function for a Bernoulli random variable we've seen before: it's the probability that X takes the specific value zero or one, and it's p to the x times one minus p to the one minus x. The mean of a Bernoulli random variable is simply p and the variance is p times one minus p, facts that we've proven before but are just restating now.

And, in general, for Bernoulli random variables, if you code a coin flip as a one, or a head, people often call that a success, and zero is a failure. And they tend to do this regardless of whether the classification is something that's actually successful. So we might call, you know, a person getting side effects from a medication a success, in terms of the Bernoulli coin flip. It's just a little bit of odd nomenclature, I guess.
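A tiny numerical sketch of the mass function, mean, and variance just stated (the function and variable names here are mine, not the lecture's):

```python
# Bernoulli pmf: P(X = x) = p^x * (1 - p)^(1 - x), for x in {0, 1}.
def bernoulli_pmf(x, p):
    return p ** x * (1 - p) ** (1 - x)

p = 0.3
# Mean and variance by summing directly over the two outcomes:
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))               # equals p
var = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))  # equals p(1 - p)
```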

We've also already talked about the Bernoulli likelihood function. So suppose we have x1 to xn, our observed data points. The x1 to xn in this case are numbers that we recorded, a collection of zeros and ones. Then the likelihood is the product, because we're going to model these as independent coin flips: the product of p to the xi times one minus p to the one minus xi, which, as we've already talked about, equals p to the summation xi times one minus p to the n minus summation xi.

So notice again that the likelihood only depends on the sum of the x's. Since each x is zero or one, the sum of the x's is just the total number of successes, and n minus the sum of the x's is the total number of failures. So if you know n and you know the total number of successes, then you know the Bernoulli likelihood. It doesn't matter what order the heads or tails occurred, as far as the information contained in the data about the parameter p. Because n is fixed and assumed known, this implies that the sample proportion contains all the relevant information that you need to know about p, insofar as the likelihood codifies it, simply because summation xi and summation xi over n are one to one; you can get from one to the other easily by either multiplying or dividing by n.

So the sufficiency result is basically that the proportion of successes is all the relevant information you need to make Bernoulli inference about the parameter. Now again, this all depends on the model being correct, that you have correctly modeled the data as iid Bernoulli. We also showed, when we talked about maximum likelihood, that if you maximize the Bernoulli likelihood over p, then you obtain that p hat, which is summation xi over n, is the maximum likelihood estimator. And here, you know, we went over an example.
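The two facts just stated can be checked numerically on a made-up dataset: the product likelihood collapses to p to the sum of the x's times one minus p to the n minus the sum, and the sample proportion maximizes it. A sketch (the data and names here are my own):

```python
from math import prod

def bernoulli_likelihood(p, xs):
    # Product of the individual Bernoulli mass functions (independent flips).
    return prod(p ** x * (1 - p) ** (1 - x) for x in xs)

xs = [1, 0, 1, 1, 0, 1]   # a made-up record of coin flips
s, n = sum(xs), len(xs)
# The collapsed form p^s (1-p)^(n-s), evaluated at p = 0.4:
collapsed = 0.4 ** s * (1 - 0.4) ** (n - s)
# Grid search for the maximizer; it should land at the sample proportion s/n:
p_hat = max((i / 1000 for i in range(1, 1000)),
            key=lambda p: bernoulli_likelihood(p, xs))
```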

Â 6:00

So here, just to give you an example of Bernoulli likelihoods and what they look like, imagine we flipped a coin four times. If we flipped a coin four times, there are only five possible sufficient statistics that we could obtain: zero heads, one head, two heads, three heads, or four heads out of the four coin flips. And here are the five possible likelihoods that you could obtain from this experiment, all normalized to have height one. So the leftmost one is the likelihood if you had no heads. The second one you see is shifted a little bit to the right, and its peak is going to be right at p hat at a quarter. And then if you get two heads, you see the peak, the MLE, is right at 0.5, and the likelihood shifts a little bit to the right; if you've gotten three heads it shifts a little bit more to the right, and if you've gotten four heads it shifts closer to one.

Notice, even in the event that you get all tails or all heads, the likelihood is not entirely at zero, or entirely at one. Right? There is substantial uncertainty. The likelihood is correctly codifying the information that it's possible, even if the coin is, say, fair, to get four consecutive tails. It's just much less likely than if the coin is unfair towards some value closer to zero. So the likelihood, you know, is not entirely shoved up against the vertical line at zero. And it just gets closer and closer to that vertical line as you continue to flip and get tail after tail after tail.

Binomial random variables are nothing other than the sum of iid Bernoulli trials.

We've seen that the key variable from a Bernoulli experiment is the number of heads, so why don't we just create a random variable that is the number of heads? Specifically, if X1 to Xn are iid Bernoulli, then the random variable X defined as the sum of the individual Xi's is the so-called binomial random variable. And the binomial mass function, the probability that X takes any specific value x, is n choose x times p to the x times one minus p to the n minus x, where p is the probability from each of the Bernoulli coin flips. Here, the values that X can take are zero, if every single coin flip was a tail, all the way up to n, where every single coin flip was a head.

Just to remind everyone of the notation: n choose x, written as n over x inside parentheses, is n factorial over x factorial times n minus x factorial, where zero factorial we're going to treat as one. And this formula counts the number of ways of selecting x items out of n without replacement, disregarding the order of the items. Okay, let's consider an example. Imagine I have ten neckties and I pick out three and put them on a bed, let's say. And I'm not caring about what order they're in on my bed, from left to right or something like that. And so ten choose three is the number of different configurations of ties that I could have obtained by picking three ties out of my ten possible. So that's an example.

I can't think of any reason why you would want to do this with your neckties, but whatever. Certain special cases are very easy. So imagine I was only picking one necktie; how many different combinations can I get? Well, intuitively we know that answer has to be ten, right? Because, you know, there's only ten possibilities. Well, if you plug into the formula, you have ten factorial divided by one factorial, which is just one, times n minus x factorial, which is ten minus one, or nine factorial. So we have ten factorial divided by nine factorial, which is just ten.

Okay. You know, another quite useful one is n choose two: how many ways can you pick two things out of n objects? That one seems to come up a lot in my daily life for some reason. And in this case that would be ten factorial over two factorial, which is just two, right, two times one, then divided by eight factorial. So that's ten times nine divided by two. At any rate, the general rule is, for n choose two, you just want to take n times n minus one over two. And it's one of the ones where it's worthwhile just to memorize the formula for that special case, because it seems to pop up a lot.
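The necktie counts above can be checked with Python's built-in binomial coefficient (a sketch, not part of the lecture):

```python
from math import comb

ways_three_ties = comb(10, 3)   # picking 3 neckties out of 10
ways_one_tie = comb(10, 1)      # picking one: must be 10
ways_two = comb(10, 2)          # the "n choose 2" shortcut: 10 * 9 / 2
```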

So why is that factor, n choose x, out in front of the binomial mass function? Let's now consider the possibility of getting six heads out of ten coin flips, from a coin with success probability p. Well, if you account for the order, you say: what's the probability of getting tail, tail, tail, tail, that's four tails, and the remaining are heads, in that specific order? Right, we know what that probability is; we would just plug into the multivariate Bernoulli mass function, obtained by multiplying the Bernoulli mass function for each coin flip in order, and we would get p to the sixth times one minus p to the fourth.

And as we know, because it's only the total number of heads that's sufficient, if it wasn't the first four coin flips that were tails, if it was the last four coin flips that were tails and the first six were heads, then we would get the same number. We'd get p to the sixth, one minus p to the fourth. And if we had the four tails sprinkled in any configuration among the six heads, then you would still get the same answer, p to the sixth, one minus p to the fourth. The result is that, basically, for any collection of instances where you get six heads and four tails, no matter what the order is, the probability is going to be p to the sixth times one minus p to the fourth.

So what we need is to count how many such configurations there are. Well, in this case there are ten flips and ten positions that could be heads, and we want to know how many different collections of positions we can obtain, and that's just ten choose six. It's exactly the necktie problem, but now we're picking the positions of the coin flips that are heads, right? And so in this case there are ten choose six possible orders of six heads and four tails.

We don't actually have to go through the specific construction of a binomial distribution, because I think that's a pretty clear demonstration of how you would wind up with the probability of the sum of a collection of ten Bernoulli random variables being six. You would sum over all the possible ways you could get six heads, and it turns out that they all have the same probability, p to the sixth times one minus p to the fourth. So we just need the number of things that go into that sum, and it's pretty clear that it's ten choose six. So, yeah, we only did this for a specific instance here, but I hope you can see that if ten instead was n, and six instead was x, you would wind up with the same answer: n choose x times p to the x times one minus p to the n minus x. So that's the motivation.
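The counting argument above can be verified by brute force for the specific instance in the lecture: summing the probability of every head/tail sequence with exactly six heads should reproduce the n choose x formula. A sketch, with an arbitrary p of my choosing:

```python
from itertools import product
from math import comb

# Check: P(X = x) = C(n, x) p^x (1-p)^(n-x) equals the sum over all
# sequences of n flips that contain exactly x heads.
n, x, p = 10, 6, 0.3
formula = comb(n, x) * p ** x * (1 - p) ** (n - x)
brute = sum(
    p ** sum(seq) * (1 - p) ** (n - sum(seq))   # each sequence's probability
    for seq in product((0, 1), repeat=n)        # all 2^10 sequences
    if sum(seq) == x                            # keep those with six heads
)
```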

And you can actually mathematically check that the binomial mass function sums to one. In fact, it's a relatively famous formula, the binomial theorem. And so if you, for example, look up the binomial theorem on Wikipedia, you'll see that that formula by itself tells you exactly that the binomial mass function sums to one. But right now we're just going to trust that we did our calculation right.
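For those who don't want to take it on trust, the sum-to-one property is quick to check numerically (n and p here are arbitrary choices of mine):

```python
from math import comb

# Binomial-theorem check: the mass function sums to one over x = 0..n.
n, p = 8, 0.37
total = sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1))
```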

So let's just go through an example of using the binomial mass function. Suppose you have an intrepid friend who has eight children, and it turns out that seven of them are girls and none of them are twins. So if you're very persnickety, let's just forget about all the little persnickety things that you could possibly think about related to this problem, like having twins, and think about the problem in a conceptual way. We're going to think of every child from this family, its gender being a coin flip. And the question is: what's the chance of getting seven out of eight children that are girls, if it really is the case that every child's gender is independent from the other children and there's a 50 percent probability at each birth of having a girl?

So what's the probability of getting seven or more? Well, the probability of seven or more is the probability of getting seven girls out of eight plus the probability of getting eight girls out of eight. And so in this case, for the fair coin, it would be eight choose seven times 0.5 to the seventh times one minus 0.5 to the one, plus eight choose eight times 0.5 to the eighth times one minus 0.5 to the zero. And, you know, check with your calculator, or, even better, with R, that you can get this number. It's about four percent. This is an example of using the binomial formula.

I wanted to mention this particular example because this calculation is an example of a so-called P value. And a P value is always the probability, under a null hypothesis, of getting a result as extreme or more extreme than the one actually obtained.
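The lecture suggests checking the number with R; the same tail probability can be sketched in Python:

```python
from math import comb

# P(7 or more girls out of 8) under the fair-coin null hypothesis:
p_value = sum(comb(8, x) * 0.5 ** 8 for x in (7, 8))
# This works out to 9/256, roughly the "about four percent" from the lecture.
```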

So the logic behind the P value in this specific instance is that you have this evidence here, and you think, wow, seven girls out of eight children, that seems pretty odd; maybe the 50 percent chance of girls versus boys for this particular family is off, for whatever reason. So the P value is saying: okay, why don't we calculate the probability, if the null hypothesis were true, of getting an event this extreme? And if that probability is very low, then maybe that's an indication that our hypothesis, that the 50 percent is correct, is not right. So, at any rate, we'll talk about P values later. Well, I hope we'll get to talking about P values later. But I just wanted to mention that that's where the intuition behind this calculation is coming from. Right now, we're only using it as an illustration of plugging into the binomial formula, but I wanted to foreshadow kind of an important statistical technique. And then here on this page is the

likelihood associated with p for this particular binomial experiment, if we're willing to model births as if they were binomial. And so here you can see, you know, that 0.5 is in the one-sixteenth likelihood interval but not in the one-eighth likelihood interval. And then you can see that this likelihood reaches its maximum at seven eighths and is far more peaked around seven eighths, and the curvature of it sort of gives you a sense of the relative evidence for the collection of possible values of p.
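The claim that this binomial likelihood peaks at seven eighths can be verified with a quick grid search (a sketch; the function name is my own):

```python
from math import comb

# Binomial likelihood for observing 7 girls out of 8 births.
def lik(p):
    return comb(8, 7) * p ** 7 * (1 - p)

# Grid search over p; the maximizer should be the MLE, 7/8 = 0.875.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lik)
```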
