So what if we go back to our question of whether the explanatory variable was manipulated, and the answer is yes? Data that come from studies in which the explanatory variable is manipulated are called experimental data. Experimental data come from studies in which groups of observations are either pre-selected or randomly assigned to values of an explanatory variable and then observed on some response variable. There are two major types of experimental studies: true experimental studies and quasi-experimental studies. A true experimental study has three components. First, only one explanatory variable is manipulated, meaning that all other variables that could also be related to the response variable are held constant. The only thing that changes is the value of the explanatory variable being manipulated by the experimenter. Second, there must be a control group, to which the other values of the explanatory variable are compared on the response variable. And third, observations must be randomly assigned to values of the explanatory variable. This means that every observation starts out with an equal probability of being in each group and is then randomly placed in one group or another. For example, an agricultural researcher might be interested in determining the effect of a new fertilizer on plant growth. In this study, each plant is an observation, fertilizer application is the explanatory variable, and plant growth is the response variable. The researcher takes a sample of seedlings and randomly divides the sample into two groups. The first group of seedlings is fertilized and kept for three months in a room with a controlled amount of sunlight, watering, and air temperature. The second group is kept in the same room under identical conditions for the same three months, except that the plants in this group are not fertilized.
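To make the random assignment step concrete, here is a minimal sketch of how a researcher might split a sample of seedlings into two groups at random. The function name `randomly_assign`, the seedling labels, and the sample size of 20 are all hypothetical choices for illustration. Shuffling the whole sample and then cutting it in half gives every seedling the same chance of landing in either group (for an even sample size).

```python
import random

def randomly_assign(units, seed=None):
    """Hypothetical helper: shuffle the units and split them into two equal-sized groups."""
    rng = random.Random(seed)      # seeded generator so the split can be reproduced
    shuffled = list(units)         # copy so the caller's list is left untouched
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    treatment = shuffled[:half]    # e.g. seedlings that receive the fertilizer
    control = shuffled[half:]      # seedlings left unfertilized
    return treatment, control

# Hypothetical sample of 20 seedlings.
seedlings = [f"seedling_{i}" for i in range(20)]
treatment, control = randomly_assign(seedlings, seed=42)
print(len(treatment), len(control))  # prints: 10 10
```

Which seedlings end up in which group depends only on the random shuffle, not on anything the researcher knows about the plants, which is what lets the two groups be compared fairly.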
Because the researcher is interested in the effect of the new fertilizer on plant growth, the plants that were not fertilized are the control group, and the plants that were fertilized are the treatment group. After the three-month period, the researcher measures the height of each plant in both groups and finds that the fertilized plants grew an average of two inches taller than the unfertilized plants. As a result, the researcher concludes that the fertilizer significantly increased plant growth and recommends that farmers be encouraged to use it. So you can see that in this experimental study, all variables other than the explanatory variable of interest are held constant across the groups as a result of the experimental design. Because all other factors that could affect plant growth were held constant, the researcher could conclude that the fertilizer caused the plants to grow taller. Most of the data we work with, however, is not produced by a true experiment. Most of the time we can't physically control all, or even any, of the other factors that might affect our response variable. So for most studies we are not able to determine whether one variable causes another, but we are able to determine associations. Random assignment is another way we can control for these other factors. The idea is that if every observation in the sample has an equal probability of being in each of the groups, and truly ends up in one group or another at random, then the groups end up balanced on the other factors. So if age is a factor, the groups should have similar age distributions, and this balance essentially controls for age. The same should hold for any other factor. However, randomization doesn't always work the way we want it to. In fact, randomization works best as the sample size grows; only as it approaches infinity can we count on the groups being perfectly balanced.
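The claim that randomization balances groups better as the sample grows can be checked with a small simulation. The sketch below is an illustration under assumed conditions: it invents an "age" covariate drawn from a normal distribution (mean 40, SD 12), randomly splits each simulated sample into two halves, and records how far apart the two group means end up. The function name `mean_age_gap` and all parameter values are hypothetical.

```python
import random
import statistics

def mean_age_gap(n, reps=200, seed=0):
    """Average absolute difference in mean age between two randomly assigned
    groups (n units in total), over `reps` simulated studies."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(reps):
        # Hypothetical covariate: ages drawn from a normal distribution (mean 40, SD 12).
        ages = [rng.gauss(40, 12) for _ in range(n)]
        rng.shuffle(ages)                      # random assignment to the two groups
        half = n // 2
        group_a, group_b = ages[:half], ages[half:]
        gaps.append(abs(statistics.fmean(group_a) - statistics.fmean(group_b)))
    return statistics.fmean(gaps)

for n in (20, 200, 2000):
    print(f"n = {n:>4}: average age gap = {mean_age_gap(n):.2f} years")
```

Because the spread of a difference in means shrinks roughly with the square root of the group size, the typical gap should fall by about a factor of three (√10 ≈ 3.2) for each tenfold increase in n. With only 20 units, the groups can easily differ by several years in average age purely by chance.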
Unfortunately, we work with finite samples, which can often be pretty small. The smaller the sample, the greater the risk that the groups will be unbalanced on factors that could affect how the treatment affects the response variable. If part of your job as a data analyst is to evaluate data from studies with random assignment, one of the first things you'll want to do is check for imbalances between your treatment and control groups on key variables that could change how the treatment affects the response variable. If imbalances are identified, those variables can be included in the statistical model that predicts the response variable, so that they are statistically controlled. Statistical control is another commonly used strategy. If we include additional explanatory variables that could affect the association between the treatment and the response, then we can examine that association after adjusting for the other explanatory variables. While these are all good strategies for imposing as much control on a study as possible, they're not perfect, nor can we possibly control for everything that could affect the association between the treatment and the response variable. For that reason, unlike in a true experiment, in which we are able to hold every other possible variable constant, we cannot determine causality. We can only determine whether the treatment is associated with the response variable. Sometimes we can't randomly assign people to a treatment or control group at all; in many cases, it would be unethical to do so. For example, if we're conducting a study to examine the association between cocaine use and memory processing, there's no way we could assign some participants to use cocaine. This would be completely unethical and would put our participants at significantly greater risk of harm, a risk that any knowledge gained from the study certainly would not justify.
Instead, we would have to identify people who either test positive for, or self-report, cocaine use, and then test for differences in memory processing between users and non-users. Here the manipulation of the explanatory variable comes from the fact that our treatment and control groups are pre-selected: cocaine users would be in our treatment group, and non-users would be in our control group. So while it looks like an experimental design, it is missing the random assignment piece, and we call this a quasi-experimental design. We can increase the rigor of a quasi-experimental design by measuring as many confounding variables as possible, having a control group, and using a pre- and post-test design whenever possible. Even so, a quasi-experimental design will not allow us to infer causality between an explanatory variable and our response variable.
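Checking for imbalance and then statistically controlling for a measured confounder can be sketched on simulated data. Everything in this example is made up for illustration: the pre-selected `group` variable, the `age` confounder, the `score` response, and all effect sizes are hypothetical, and none of it reflects real findings. The balance check uses a standardized mean difference (SMD); one commonly cited rule of thumb treats |SMD| above 0.1 as a meaningful imbalance, but that threshold is a convention, not a law. The adjustment is ordinary least squares with both the group indicator and age as explanatory variables.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical pre-selected groups (no random assignment). Members of the
# treatment group are older on average, and age also lowers the response,
# so age is a confounder.
group = rng.integers(0, 2, size=n)                 # 1 = treatment, 0 = control
age = rng.normal(35, 8, size=n) + 6 * group        # treatment group ~6 years older
score = 50 - 2.0 * group - 0.5 * age + rng.normal(0, 5, size=n)  # built-in group effect: -2.0

# Balance check: standardized mean difference (SMD) in age between the groups.
age_t, age_c = age[group == 1], age[group == 0]
pooled_sd = np.sqrt((age_t.var(ddof=1) + age_c.var(ddof=1)) / 2)
smd = (age_t.mean() - age_c.mean()) / pooled_sd

# Unadjusted association: raw difference in mean scores between groups.
naive = score[group == 1].mean() - score[group == 0].mean()

# Statistical control: ordinary least squares of score on group and age.
X = np.column_stack([np.ones(n), group, age])
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
adjusted = coef[1]   # group difference holding age constant

print(f"SMD for age:             {smd:.2f}")
print(f"unadjusted difference:   {naive:.2f}")
print(f"age-adjusted difference: {adjusted:.2f}")
```

In this simulation the unadjusted difference mixes the group effect with the age difference, while the adjusted coefficient lands near the built-in value of -2.0. That recovery works only because age is the sole confounder and it was measured; with real data, unmeasured confounders can remain, which is why the adjusted estimate is still an association rather than proof of causation.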