Next we're going to talk about marginal probabilities and the sum rule. It often happens in probability problems that we know the joint probabilities that two things will happen together, but what we want to know is the individual probability that just one of those things will happen, regardless of the other event. When we want to know the probability of x1, and we are given the probabilities of (x1, y1), (x1, y2), and (x1, y3), we refer to the probability of x1 as the marginal probability of x1, okay? You can think of it as sitting out on the margins of the joint probability matrix.

The sum rule tells us that the marginal probability, the probability of x1, is equal to the sum of the joint probabilities, assuming that y is a proper probability distribution, meaning its statements are exclusive and exhaustive. So the probability of x1 equals 1% + 10% + 4% = 15%, okay? Similarly, the marginal probability of y2 equals P(x1, y2) + P(x2, y2) + P(x3, y2), which is 79%, or 0.79. That we can add joint probabilities together to get a marginal probability is due to something called the sum rule.

Here are two versions of the sum rule written out. The first is for a binary probability distribution. We have the probability of B and the probability of not-B, and the joint probability of A and B plus the joint probability of A and not-B together sum to the probability of A. Similarly, if we have a whole series of n probabilities, we can sum the n joint probabilities to get the marginal probability of A. It's exactly the same principle.

Next we're going to talk about conditional probability. Conditional probability is defined as the probability that a statement is true given that some other statement is true with certainty. Everything to the right of the line is considered true with certainty. So what the notation P(A | B) means is: if B is true with certainty, what is the probability of A in that case?
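The sum-rule computation above can be sketched in a few lines of Python. The x1 row uses the 1%, 10%, 4% values from the lecture; the other two rows are hypothetical fillers chosen so the table sums to 1 and the y2 column matches the 79% figure.

```python
# A hypothetical 3x3 joint probability table P(x_i, y_j). Only the x1 row
# (1%, 10%, 4%) comes from the lecture; the rest are made-up fillers
# chosen so the whole table sums to 1 and the y2 column sums to 0.79.
joint = {
    ("x1", "y1"): 0.01, ("x1", "y2"): 0.10, ("x1", "y3"): 0.04,
    ("x2", "y1"): 0.02, ("x2", "y2"): 0.39, ("x2", "y3"): 0.05,
    ("x3", "y1"): 0.03, ("x3", "y2"): 0.30, ("x3", "y3"): 0.06,
}

def marginal_x(x):
    """Sum rule: P(x) = sum over all y of P(x, y)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    """Sum rule: P(y) = sum over all x of P(x, y)."""
    return sum(p for (_, yj), p in joint.items() if yj == y)

print(round(marginal_x("x1"), 2))  # 0.01 + 0.10 + 0.04 -> 0.15
print(round(marginal_y("y2"), 2))  # 0.10 + 0.39 + 0.30 -> 0.79
```

Marginalizing over rows or columns of the same table is the same operation; only the index we hold fixed changes.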
So for example, if I throw a six-sided die and it comes up odd, what is the conditional probability that it is a 3? Well, there are three odd rolls, one, three, and five, and one of them is a three, so the conditional probability is one-third. What about if I throw a three with certainty? What is the conditional probability that my throw is odd? Well, in this case the probability is one. It's odd with certainty if it's a three with certainty. So what we're looking at here are relationships of dependence rather than independence.

The general formula that we use for calculating conditional probabilities is that we take the relevant outcomes, the ones that meet our definition of A, and we divide them by the total outcomes in our universe. However, our universe has shrunk; it is cut down because B must be true. In the example of the die that I just gave you, we are cutting the universe down from six possibilities, one, two, three, four, five, six, to the odd possibilities, one, three, and five. In that case, the relevant outcome is the three, and the odd outcomes are one, three, and five; there's one of those and three of those, so my conditional probability of throwing a three, if I know that the die I threw is odd, is one-third.

Now we want to relate our concept of joint probability, our concept of marginal probability, and our concept of conditional probability. And we do this using a very important rule called the product rule. The product rule tells us that the conditional probability of A, given that B is true with certainty, is equal to the joint probability that both A and B are true, divided by the marginal probability that B is true: P(A | B) = P(A, B) / P(B), okay? So you notice that on the right-hand side we're not assuming that B is true. The only place we assume B is true is to the right of that special magic line; the line is what makes it true. On the right-hand side, P(B) need not equal anything in particular, except that it must not equal zero, since we divide by it. The product rule allows us to develop a new definition of independence.
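The counting recipe for conditional probability, restrict the universe to B, then count the outcomes of A that survive, can be sketched directly with the die example. This is a small illustrative sketch, not part of the lecture itself.

```python
from fractions import Fraction

# Sample space of a fair six-sided die, and the two events from the lecture.
universe = {1, 2, 3, 4, 5, 6}
odd = {1, 3, 5}
three = {3}

def conditional(a, b):
    """P(A | B) by counting: restrict the universe to B, then divide the
    number of A-outcomes that survive by the size of the restricted universe.
    Assumes all outcomes are equally likely and B is non-empty."""
    return Fraction(len(a & b), len(b))

print(conditional(three, odd))  # P(roll is 3 | roll is odd) = 1/3
print(conditional(odd, three))  # P(roll is odd | roll is 3) = 1
```

Using exact fractions avoids any floating-point fuzz in the one-third result.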
You may remember that our old definition of independent distributions is that the joint distribution is equal to the product distribution: P(A, B) = P(A) P(B), okay? Well, how do we get from our old definition to our new definition? The answer is that we divide both sides by the probability of B, assuming the probability of B does not equal 0. So on one side we have the joint probability P(A, B) divided by P(B), and on the other side we have P(A) P(B) divided by P(B), which is just P(A). And now we use the product rule to say that the first term is equal to the probability of A given B. So P(A | B) = P(A).

Our intuition about what this means is that knowing that B is true tells us nothing about the probabilities of A. The outcome B has no effect on the probability of A, and therefore they are independent. The converse is also true: if the conditional probability of A given B does not equal the probability of A, then they are dependent. So any two distributions are either independent or dependent; there's no middle way.
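The independence test P(A | B) = P(A) can be checked numerically against a joint table. As a sketch, I reuse the hypothetical table from earlier, where only the x1 row (1%, 10%, 4%) comes from the lecture and the other entries are fillers chosen so the table sums to 1.

```python
# Hypothetical joint table; x1 row matches the lecture's 1%, 10%, 4%,
# the other rows are fillers so the table sums to 1 and P(y2) = 0.79.
joint = {
    ("x1", "y1"): 0.01, ("x1", "y2"): 0.10, ("x1", "y3"): 0.04,
    ("x2", "y1"): 0.02, ("x2", "y2"): 0.39, ("x2", "y3"): 0.05,
    ("x3", "y1"): 0.03, ("x3", "y2"): 0.30, ("x3", "y3"): 0.06,
}

def marginal(index, value):
    """Sum rule: marginalize the joint table over the other variable."""
    return sum(p for key, p in joint.items() if key[index] == value)

p_x1 = marginal(0, "x1")                    # 0.15
p_y2 = marginal(1, "y2")                    # 0.79
p_x1_given_y2 = joint[("x1", "y2")] / p_y2  # product rule: P(A, B) / P(B)

# Independence test: A and B are independent iff P(A | B) == P(A).
print(abs(p_x1_given_y2 - p_x1) < 1e-9)  # False -> x1 and y2 are dependent
```

Here P(x1 | y2) is about 0.127 while P(x1) is 0.15, so under these made-up numbers the two events come out dependent; there is no middle option between the two verdicts.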