Now, let's do a prediction. Regression says that if we are given x, we use the regression line to predict y. That means we compute the regression line, plug in x, and see what predicted value y hat we get. As noted before, to compute a regression line, all we need are the five summary statistics: the two averages, the two standard deviations, and the correlation coefficient. Computing the regression line can be done very quickly in software. For example, the command lm in the computer language R will do that for you. But it turns out you can do it rather quickly by hand.
Let's look at an example. Let's say the average midterm score was 49.5, the average final score was 69.1, the standard deviation on the midterm was 10.2, the standard deviation on the final was 11.8, and the correlation coefficient r was 0.67. Now, suppose somebody tells you a student scored 41 on the midterm, and you have to predict the final exam score of that student. Remember, if we didn't have that information about the midterm score, the best predictor for the final would simply be the average, 69.1. But regression gives us a tool to come up with a better predictor by incorporating the additional information that the student scored 41 on the midterm.
Here's how we can do that regression by hand very quickly. First, note that 41 is 8.5 below the average. Here, average refers to the average of the midterm scores, which is 49.5. Now we standardize: 8.5 below average means 8.5 divided by the midterm standard deviation of 10.2, which is 0.83 standard deviations below average. Looking at the formula for the slope of the regression line, we predict the final exam score to be only r times 0.83 standard deviations below average. So now we can simply plug in the numbers. We take the average for the final exam, 69.1, and, because we are below average, we subtract r, which is 0.67, times 0.83 times the standard deviation of the final exam scores, which is 11.8, and we arrive at 62.5. This will be the prediction that we get from regression.
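The by-hand steps can also be written out as a short Python sketch. The function name predict and its argument names are made up for illustration; the numbers are the five summary statistics from the example.

```python
def predict(x, mean_x, sd_x, mean_y, sd_y, r):
    """Regression prediction of y from x using only the five summary statistics.

    Hypothetical helper for illustration; not taken from any library.
    """
    z_x = (x - mean_x) / sd_x    # standardize: how many SDs is x from its average?
    z_y = r * z_x                # the prediction is only r times as many SDs from average
    return mean_y + z_y * sd_y   # convert back to the units of y


# Midterm (x): average 49.5, SD 10.2. Final (y): average 69.1, SD 11.8. r = 0.67.
final_hat = predict(41, mean_x=49.5, sd_x=10.2, mean_y=69.1, sd_y=11.8, r=0.67)
print(round(final_hat, 1))  # 62.5
```

Because the midterm deviation is negative here, the sign takes care of itself: there is no need to decide separately whether to add or subtract.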
Now, let's turn the prediction around. Suppose I told you that a student scored 79 on the final, and your task is to predict the midterm score of that student. Here's one important thing about regression. When you predict x from y, it is a mistake to take the regression line you got for predicting y from x and simply solve for x. It's somewhat tempting, but it's the wrong thing to do. The reason it's wrong is that there are two regression lines. There's one regression line for predicting y from x, and there's a different regression line for predicting x from y. These two lines will typically be very different. To avoid confusing these two lines, it's always best to put the predictor on the x axis and whatever you want to predict on the y axis, and then simply do what we did before.
So, in this case, that means we look at an x axis which corresponds to the final exam score, and a y axis which shows us the midterm scores. That's because the final exam score is the thing we base our regression on, that is, the final exam score is the predictor. We know the average of the final exam scores was 69.1, the average on the midterms was 49.5, and the information we are given is that the final exam score is 79. So, it's above average. We know that the regression line goes through the point of averages, that is, the average on the horizontal axis and the average on the vertical axis, and it slopes upward because r is positive. So, the regression line looks somewhat like this. Since we base our regression on a final exam score of 79, we expect to end up somewhere above average for the midterm. Exactly how much above average comes from the same calculation we just did before. We say that 79 is 9.9 above average. So, if we standardize, we get 9.9 divided by the standard deviation of 11.8, which equals 0.84 standard deviations above average.
Therefore, our prediction for the midterm will also be above average, and it will be not 0.84 standard deviations above average, but r times 0.84 standard deviations above average. And now, you can just plug in. We take the average for the midterms, which is 49.5. Since we are above average, we add r, which is 0.67, times 0.84 times the midterm standard deviation, which is 10.2, and that gives us 55.2. This will be our prediction for the midterm score. Now, keep in mind, if r were negative, then the line would slope downward, and we would end up below average. So, whether we predict above or below average depends on whether the correlation coefficient is positive or negative, and it also depends on whether we start out above average here or below average there. It's best to draw a picture and see on which side you have to end up.
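Here is a rough Python sketch of the reverse prediction, set next to the tempting but wrong shortcut of solving the other regression line for x. It assumes a final exam score of 79, the value that is 9.9 above the final average of 69.1 as in the calculation. The helper predict and all variable names are made up for illustration.

```python
MID_MEAN, MID_SD = 49.5, 10.2   # midterm average and standard deviation
FIN_MEAN, FIN_SD = 69.1, 11.8   # final average and standard deviation
R = 0.67                        # correlation coefficient


def predict(x, mean_x, sd_x, mean_y, sd_y, r):
    """Hypothetical helper: regression prediction of y from x."""
    return mean_y + r * (x - mean_x) / sd_x * sd_y


final_score = 79  # assumed value: 9.9 above the final average

# Correct: the final is the predictor, so it plays the role of x.
midterm_hat = predict(final_score, FIN_MEAN, FIN_SD, MID_MEAN, MID_SD, R)

# Wrong: take the line for predicting the final from the midterm and solve for the midterm.
slope_final_on_midterm = R * FIN_SD / MID_SD
midterm_wrong = MID_MEAN + (final_score - FIN_MEAN) / slope_final_on_midterm

print(round(midterm_hat, 1))    # 55.2
print(round(midterm_wrong, 1))  # 62.3
```

Solving the wrong line for x effectively divides by r instead of multiplying by it, so it lands much further from the average than the regression prediction does. With a negative r, the same predict call would return a value below the midterm average.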