In life, we have to make a lot of decisions. For example, we may have several schools to choose from and have to decide which one to attend, or we have to choose which book to read at a particular moment. These decisions can affect the probabilities of various events, for example the probability of getting a particular job. We can use probability theory to model, to some extent, this kind of relation between events, such as choosing a school and getting a job. Let us consider one prominent example of this kind of reasoning, which is called the law of total probability.
Let us consider the following situation. We have a university with a three-year educational program, and we are interested in some ability of the students in this program, for example knowledge of calculus. Consider the following random experiment: we pick a random student. Then we can consider the following events. First, the student can be in the first, second, or third year of education. The event that the student is in the first year is denoted by H_1, and the events H_2 and H_3 are defined similarly for the second and third years. Second, we are interested in knowledge of calculus, so event C is that the student knows calculus.
Now, assume that we know the probabilities of these events. Assume that the probability that the randomly chosen student is a first-year student is one-half, so P(H_1) = 0.5, and for the second year the corresponding probability is P(H_2) = 0.3. This probability is less than the first one, for example because some students are expelled. The probability of H_3 is 0.2. Note also that a student cannot belong to two years at the same time, so H_i and H_j do not intersect if i is not equal to j. We also said that we pick a random student, and every student is in either the first, second, or third year, so at least one of these events has to occur. So we have the following: the union of all three events is the whole probability space. This means that these events act as alternatives.
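To make the two conditions on H_1, H_2, H_3 concrete, here is a minimal sketch with a finite cohort. The cohort size of 1000 students and the way students are labelled are assumptions made only for illustration; the lecture gives just the probabilities 0.5, 0.3, and 0.2.

```python
from fractions import Fraction

# Hypothetical cohort: 1000 students labelled 0..999 (an assumption for illustration).
students = set(range(1000))
H1 = set(range(0, 500))      # first-year students
H2 = set(range(500, 800))    # second-year students
H3 = set(range(800, 1000))   # third-year students
hypotheses = [H1, H2, H3]


def prob(event):
    """Probability of an event when every student is equally likely to be picked."""
    return Fraction(len(event), len(students))


# No student is in two years at once: the events are pairwise disjoint.
pairwise_disjoint = all(
    not (hypotheses[i] & hypotheses[j])
    for i in range(3)
    for j in range(i + 1, 3)
)

# Every student is in some year: the union is the whole probability space.
covers_everything = (H1 | H2 | H3) == students

print([float(prob(h)) for h in hypotheses])
print(pairwise_disjoint, covers_everything)
```

Exact fractions are used here so that the check that the three probabilities add up to one does not depend on floating-point rounding.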
We are choosing among three alternatives: we have to choose at least one and we cannot choose two, so we have to choose exactly one. Now, assume that we know the conditional probabilities of event C given each of these alternatives. For example, the probability of C given H_1 equals 0.1. It means that the probability that a randomly chosen first-year student knows calculus is one-tenth. This is quite small because few first-year students know calculus, but in the second year this probability increases. The probability of C given H_2 can be 0.8, and let us assume that the probability of C given H_3 is 0.7, because by the third year some students have already forgotten everything they knew about calculus. So we know the probability that a randomly chosen first-year student knows calculus, and likewise for the second and third years. What we want to do is combine this information to get the probability that a student chosen at random from the whole program knows calculus. So we are interested in the probability of event C. How do we do it?
Let me show you a picture. Assume that this square is the whole probability space. This part is H_1, with probability 0.5; this part is H_2, with 0.3; and this part is H_3, with 0.2. Now we have to take the conditional probabilities into account. The probability of C given that H_1 holds is 0.1. It means that one-tenth of the H_1 rectangle belongs to C. For the H_2 rectangle the corresponding conditional probability is 0.8, which means that 80 percent of this rectangle belongs to C. The same goes for the third rectangle, where the fraction is 0.7. So the heights of the smaller rectangles are 0.1, 0.8, and 0.7, and the union of these smaller rectangles is exactly the event C. Now we have to find the area of this figure relative to the area of the whole square. This can be done quite simply: we find the areas of all the smaller rectangles and then sum them up. Let me emphasize this figure.
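The area computation described with the picture can be sketched in a few lines of Python. The variable names are chosen here for illustration; the numbers are the ones from the example.

```python
# Probabilities of the hypotheses (widths of the rectangles in the picture).
p_h = {"H_1": 0.5, "H_2": 0.3, "H_3": 0.2}
# Conditional probabilities of C given each hypothesis
# (the fraction of each rectangle that belongs to C).
p_c_given_h = {"H_1": 0.1, "H_2": 0.8, "H_3": 0.7}

# Area of each smaller rectangle = width * height.
areas = {name: p_h[name] * p_c_given_h[name] for name in p_h}

# The smaller rectangles do not overlap, so the area of C is the sum of their areas.
p_c = sum(areas.values())

for name, area in areas.items():
    print(f"{name}: {area:.2f}")
print(f"P(C) = {p_c:.2f}")
```

Because the whole square has area one, the summed area is already the relative area, so no further normalization is needed.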
This figure is our event C. Now let us find the probability of C. It equals the area of the first rectangle, which is 0.5 times 0.1, plus the area of the second rectangle, which is 0.3 times 0.8, plus the area of the third rectangle, which is 0.2 times 0.7. That is 0.05 + 0.24 + 0.14, so the answer is 0.43.
Now let us find the general formula that allows us to obtain results like this. Consider the probability of C and decompose C into a union of three separate events: C ∩ H_1, C ∩ H_2, and C ∩ H_3. Each of these events corresponds to one of the rectangles in the picture. These events cannot intersect each other, because H_1 does not intersect H_2, and so on. So we can write P(C) = P(C ∩ H_1) + P(C ∩ H_2) + P(C ∩ H_3). Now we will use the definition of conditional probability to find these intersections. The probability of an intersection is the probability of one event times the probability of the other event conditional on the first one: P(C ∩ H_1) = P(H_1) P(C | H_1). This product is the same as the product 0.5 times 0.1, the area of the first smaller rectangle. In the same way, we can rewrite the other terms. So we have a sum that uses only the information we have, the probabilities of H_1, H_2, H_3 and the conditional probabilities of C given these events: P(C) = P(H_1) P(C | H_1) + P(H_2) P(C | H_2) + P(H_3) P(C | H_3). We can write it more compactly using the summation sign: P(C) is the sum over i from 1 to 3 of P(H_i) P(C | H_i).
This result is the law of total probability. It allows us to recover the total probability of C from its conditional probabilities given the events H_i. We proved it for three events, H_1, H_2, H_3, but it can be proved in the same way for two such events or for more than three. However, we have to take into account the conditions: the events must not intersect each other, and their union must be the whole probability space. These events are usually called hypotheses. They form a partition of the whole probability space into non-overlapping parts. In other words, we have alternatives: exactly one of these hypotheses has to hold.
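The same argument can be written out for any number of hypotheses. The block below is a sketch of that general derivation; the symbols n (the number of hypotheses) and \Omega (the whole probability space) are notation introduced here and are not used in the lecture.

```latex
% Assumptions: H_1, ..., H_n are pairwise disjoint, their union is \Omega,
% and P(H_i) > 0 for every i, so that P(C \mid H_i) is defined.
\begin{align*}
C &= C \cap \Omega = \bigcup_{i=1}^{n} (C \cap H_i)
  && \text{(the $H_i$ cover $\Omega$)} \\
P(C) &= \sum_{i=1}^{n} P(C \cap H_i)
  && \text{(the sets $C \cap H_i$ are pairwise disjoint)} \\
     &= \sum_{i=1}^{n} P(H_i)\, P(C \mid H_i)
  && \text{(definition of conditional probability)}
\end{align*}
% With n = 3 and the numbers from the example:
% P(C) = 0.5 \cdot 0.1 + 0.3 \cdot 0.8 + 0.2 \cdot 0.7 = 0.43
```

Each step uses exactly one of the conditions on the hypotheses, which is why neither condition can be dropped.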
The law of total probability allows us to find the probability of C, provided that we know the probabilities of the hypotheses and the conditional probabilities of C given each of them. It is interesting to consider the inverse question. Assume that we know that event C occurred; for example, we know that the randomly chosen student knows calculus. What is the probability that this student is in the first year? So it is interesting to find the probability of a hypothesis given that the event C holds. To find it, we have to use Bayes' rule, which we will discuss later.
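Before any formula for the inverse question, both quantities can be estimated by simply repeating the random experiment many times. The following is a minimal simulation sketch, not part of the lecture: the seed, the number of trials, and all variable names are assumptions chosen for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so that the run is repeatable

# Numbers from the example: P(H_i) and P(C | H_i) for years 1, 2, 3.
years = [1, 2, 3]
p_year = [0.5, 0.3, 0.2]
p_calc_given_year = {1: 0.1, 2: 0.8, 3: 0.7}

n_trials = 100_000  # hypothetical sample size chosen for this sketch
knows_calc = 0
knows_calc_and_first_year = 0

for _ in range(n_trials):
    # Pick a random student: first the year, then whether they know calculus.
    year = rng.choices(years, weights=p_year)[0]
    if rng.random() < p_calc_given_year[year]:
        knows_calc += 1
        if year == 1:
            knows_calc_and_first_year += 1

# Fraction of all students who know calculus: an estimate of P(C).
est_p_c = knows_calc / n_trials
# Fraction of first-year students among those who know calculus:
# an estimate of P(H_1 | C).
est_p_h1_given_c = knows_calc_and_first_year / knows_calc

print(f"estimated P(C)       = {est_p_c:.3f}")
print(f"estimated P(H_1 | C) = {est_p_h1_given_c:.3f}")
```

The first estimate should land close to the 0.43 given by the law of total probability. The second is a ratio of counts, so it only relies on the definition of conditional probability; the exact formula is left for the later discussion.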