[MUSIC] In everyday life we don't tend to think of establishing cause and effect as a particularly hard problem. Answering the question, did the plate break because I dropped it on the floor, is straightforward. Similarly, if we bump into a friend and he moves away from us, it does not take a rocket scientist to figure out that we caused the friend to move. It is usually easy to see that an action caused an outcome, because we often observe the mechanism by which the two are linked. For example, if we drop a plate, we can see the plate falling, hitting the floor, and breaking. Moreover, we handle plates every day and they tend not to break unless we drop them. Establishing cause and effect becomes a hard problem when we don't observe the mechanism by which an action is linked to an outcome. Regrettably, this is true for most marketing analytics. For example, it's exceedingly rare that we can describe, let alone observe, the exact process by which an ad persuades a consumer to buy. This makes the question, did this ad cause the consumer to buy my product, very tricky to answer. The question we would ideally want to answer is, how would a consumer behave in two alternative worlds that are identical except for one difference? In one world they see the ad, and in the other world they do not see the ad. What is important in considering this question is that the two worlds are literally identical, meaning that they match on the most minute details, like how many people were born on March 23rd, 1983, or how many hairs each of them has on their head. Now, if there existed two such worlds, we could observe the difference in outcomes, for example, purchases, visits, clicks, or attention. Because the worlds are the same except for the ad, we could conclude that the ad caused the difference in outcomes.
And by caused we mean that, if the ad had also been shown in the world in which it was not shown, the effect of the ad on outcomes would have been the same as in the world in which the ad actually was shown. While the above, of course, serves as a nice thought experiment, the core problem in establishing causality is that consumers can never be in two worlds at once. If a consumer has seen an ad, this precludes that consumer from not having seen the ad. Conversely, if a consumer has not seen the ad, this precludes the consumer from having seen the ad at the same time. This problem goes far beyond marketing measurement. In fact, if applied to other units of comparison, this is the fundamental problem of establishing causal relationships, or cause and effect, in science. Now, the solution to the problem that consumers cannot be in two worlds at once is what we call a randomized experiment. The idea is to assign consumers randomly to one of several worlds, or conditions, as they're typically called. In practice, there are many different names for such conditions, for example, experimental conditions, A/B conditions, or treatment and control conditions. Going back to our advertising example, suppose we want to perform an experiment with 100,000 consumers. We would start with consumer one, and randomly assign them with a 50% probability to either see the ad, what we call the ad condition, or not to see the ad, what we call the no ad condition. Now, repeat this with consumer two, and three, and four. After repeating this random assignment for all 100,000 consumers, we will have close to 50,000 consumers assigned to see the ad and close to 50,000 assigned not to see the ad. Notice that the groups of consumers in the two conditions are not identical, because, of course, the groups consist of different consumers.
However, they are probabilistically equivalent, meaning that there are no systematic differences between the groups, either in their characteristics or in how they would respond to the ad. To understand why this is the case, suppose we knew that the product appeals more to women than to men, and because of that, women are more likely to buy the product than men. Now suppose we find that the consumers in the ad condition are more likely to purchase than consumers in the no ad condition. Given that the product is more liked by women, normally we might be concerned that this is because there are more women in the ad condition than in the no ad condition. However, this can't be the case, because men and women were randomly assigned to both conditions. Therefore, as long as there is a sufficient number of consumers, the proportion of men and women in both conditions should be close to equal, and therefore gender cannot be the reason why purchases are higher in the ad condition than in the no ad condition. What makes randomization so powerful is that it works on every consumer characteristic at the same time. For example, suppose we also knew that the product appealed more to Democrats than to Republicans. And that the product appealed more to millennials than to baby boomers. And that the product appealed more to urban than rural consumers. Randomization, for large enough sample sizes, ensures that every one of these characteristics will also be present in close to equal proportions in the ad and no ad conditions. Finally, add to the list of consumer characteristics I just listed a characteristic that we as marketers can neither measure nor even conceive of. Even for this unknown, unmeasurable characteristic, randomization will ensure that the characteristic will be present in close to equal proportions in the ad and no ad conditions. That is what probabilistic equivalence means.
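To make the random assignment concrete, here is a minimal simulation sketch in Python. It is an illustration under stated assumptions, not a description of any real experiment: the 50% share of women, the 30% share of a hidden trait, and the names `assign_conditions` and `share` are all made up for this example. It assigns 100,000 simulated consumers to the ad or no ad condition with 50% probability, then checks that both the observed and the hidden characteristic show up in close to equal proportions in the two conditions.

```python
import random


def assign_conditions(n_consumers, p_ad=0.5, seed=0):
    """Create simulated consumers and randomly assign each one to a condition."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    consumers = []
    for _ in range(n_consumers):
        consumers.append({
            # Observed characteristic (assumed 50% of consumers are women).
            "is_woman": rng.random() < 0.5,
            # Characteristic we cannot measure (assumed 30% prevalence).
            "hidden_trait": rng.random() < 0.3,
            # The coin flip: drawn independently of every characteristic.
            "condition": "ad" if rng.random() < p_ad else "no_ad",
        })
    return consumers


def share(consumers, condition, trait):
    """Proportion of consumers in a condition who have the given trait."""
    group = [c for c in consumers if c["condition"] == condition]
    return sum(c[trait] for c in group) / len(group)


consumers = assign_conditions(100_000)
n_ad = sum(c["condition"] == "ad" for c in consumers)
print("consumers in ad condition:", n_ad)
print("consumers in no ad condition:", len(consumers) - n_ad)

for trait in ("is_woman", "hidden_trait"):
    print(trait,
          "| ad:", round(share(consumers, "ad", trait), 3),
          "| no ad:", round(share(consumers, "no_ad", trait), 3))
```

Notice that the assignment step never looks at `is_woman` or `hidden_trait`. That is why the proportions line up in both conditions, even for the trait we pretend we cannot measure.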
All characteristics that might influence the effect of the ad on purchases can be found in close to equal proportions in the ad and the no ad conditions. And therefore, any difference in purchases between the conditions cannot be explained by differences between consumers. It has to be caused by the ad. Probabilistic equivalence allows us to compare conditions as if consumers were in two worlds at once. [MUSIC]