Welcome back to our experimental design class. We're continuing our lectures in Module 10 on inference on regression coefficients. In the last lecture we talked about hypothesis testing, and here we're going to talk about confidence intervals in regression. It's often very useful to construct confidence intervals on the individual model coefficients to give you an idea about how precisely they've been estimated. And since we sometimes use the models to make predictions of Y, or estimates of the mean of Y, at different combinations of the Xs, it's useful to have confidence intervals on those expressions as well. We're going to continue to make the assumption about the errors that we made for hypothesis testing; that is, the model errors are normally and independently distributed with mean zero and constant variance sigma squared. Now let's talk about confidence intervals on the individual regression coefficients first. Your least squares estimator, beta hat, is a linear combination of the observations Y. Since the observations Y have a normal distribution, because the errors do, it seems reasonable that beta hat would also have a normal distribution. It has a multivariate normal distribution with mean vector beta and covariance matrix sigma squared times X prime X inverse. So each of the ratios that you see here has a t distribution with N minus P degrees of freedom. Now, in this expression, CJJ is the Jth diagonal element of the X prime X inverse matrix, and sigma hat squared is the estimate of the error variance; that's just the mean square error from your analysis of variance. So we can take this ratio and rearrange it to produce a confidence interval, and equation 10.38 is the equation for the 100 times one minus alpha percent confidence interval on the regression coefficient.
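To make the pieces of that t ratio concrete, here is a minimal Python sketch (not from the lecture; the data are synthetic and the names are my own) that computes the C_jj diagonal elements, the coefficient standard errors, and the resulting t ratios from a least squares fit:

```python
import numpy as np

# Synthetic data, for illustration only (not the lecture's example).
rng = np.random.default_rng(1)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # model matrix with intercept
y = X @ np.array([5.0, 2.0, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y            # least squares estimates
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)    # mean square error, the estimate of sigma^2
C = np.diag(XtX_inv)                    # C_jj: jth diagonal element of (X'X)^-1
se = np.sqrt(sigma2_hat * C)            # standard error of each beta_hat_j
t_ratios = beta_hat / se                # each has a t distribution with n - p df under H0
```

Each entry of `t_ratios` is compared against a t distribution with n minus p degrees of freedom, exactly as in the hypothesis tests from the last lecture.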
It's just the point estimate of the coefficient plus or minus an appropriate t quantile times the standard error of the coefficient. The t quantile would be a t alpha over two quantile, or percentage point, with N minus P degrees of freedom. If, for example, you wanted a 95 percent confidence interval, then that would be T of 0.025 with the appropriate number of degrees of freedom. So you could actually write this confidence interval as you see at the bottom of the slide, because that quantity inside the square root is sometimes also written as the standard error. Just to illustrate this, let's find a 95 percent confidence interval for the parameter beta one in our regression model example. Now beta hat one is 7.62129, and we already know from having fit this model that sigma hat squared is 267.604. That's the mean square error from the ANOVA. C11 is 1.429184 times ten to the minus three, and so all we have to do is substitute these quantities into equation 10.38 and do some arithmetic. By the way, the t percentile that you need here, the upper 2.5 percentage point of t with 13 degrees of freedom, is 2.16. So the 95 percent confidence interval turns out to be this expression. There's your t multiple, there's the standard error, and there's your point estimate, and so the 95 percent confidence interval reduces to the expression that you see at the bottom of the slide. That is, the lower confidence limit on beta one is 6.2855, and the upper confidence limit is 8.9570. A fairly wide confidence interval, probably because the sample size here is not terribly large. How about confidence intervals on the mean response? This is something we very often use a regression model to do: to estimate the mean response at a particular point of interest in the x-space. So let's let X0 be a vector that represents this point.
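The arithmetic for equation 10.38 can be sketched in a few lines of Python; the inputs below are exactly the values quoted above from the lecture:

```python
import math

beta1_hat = 7.62129
sigma2_hat = 267.604       # mean square error from the ANOVA
C11 = 1.429184e-3          # diagonal element of (X'X)^-1 corresponding to beta_1
t_crit = 2.16              # upper 2.5 percentage point of t with 13 df

se_beta1 = math.sqrt(sigma2_hat * C11)    # standard error of beta_hat_1
lower = beta1_hat - t_crit * se_beta1
upper = beta1_hat + t_crit * se_beta1
# lower ≈ 6.2855, upper ≈ 8.9570, matching the slide
```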
The elements of X0 are one, because of the intercept, and then X01, X02, on down to X0K; those are the coordinates of the point at which you're interested in estimating the mean. So your estimate of the mean at that point is just found by plugging those values into your regression equation. The mean response at that point would be X0 prime beta, and the estimated mean at that point, Y hat at X0, would be X0 prime times beta hat. So this is the estimated mean response at the point of interest. This is an unbiased estimator because beta hat is unbiased for beta. So now what we need is the variance of this expression in order to be able to find the confidence interval. The variance of that expression is very easy to find. This is the variance expression: it's sigma squared times X0 prime, that's the point of interest, times X prime X inverse times X0. Substituting sigma hat squared for sigma squared and taking the square root of that gives the standard error of the mean at that point. So your 100 times one minus alpha percent confidence interval on the mean response at that point would be given by equation 10.41. Again, this is the predicted or estimated value of the mean at that point, this is the appropriate t quantile, and this is the standard error of the mean at that point. How about predicting new observations? Regression models are very frequently used to predict some future value of the response that corresponds to a point of interest in the factor space. Once again, let's let that point be represented by x_01, x_02, on out to x_0k, and we can write that in vector form as x_0 prime, a row vector made up of a one and then x_01, x_02, on up to x_0k. So a point estimate for that future observation would be found by simply multiplying x_0 prime times beta hat, the vector of coefficients. This is the expression for the prediction of this future value.
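Equation 10.41 can be sketched as a small Python helper; this is an illustration, not code from the lecture, and the function name and caller-supplied t quantile are my own choices:

```python
import numpy as np

def mean_response_ci(X, y, x0, t_crit):
    """Confidence interval on the mean response at x0 (equation 10.41).

    X is the model matrix, y the responses, x0 the point of interest
    (including the leading 1), and t_crit the t quantile for the level.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)                 # mean square error
    y0_hat = x0 @ beta_hat                               # estimated mean at x0
    se_mean = np.sqrt(sigma2_hat * (x0 @ XtX_inv @ x0))  # standard error of the mean
    return y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean
```

Note the quadratic form x0 prime times X prime X inverse times x0 inside the square root; that is exactly the variance expression described above, with sigma hat squared substituted for sigma squared.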
So now what you need is a prediction interval on this future value, and this is the expression for that prediction interval. Notice how similar it is to the confidence interval. This portion of the expression appeared in the confidence interval, but there's an extra term here, and the reason for that extra term is that there's extra variability in this interval associated with the future observation itself, in addition to the variability from estimating the coefficients. So there are really two sources of variability here. In the confidence interval, you only have to worry about the error in estimating the parameters. Here, you have to worry about the error in estimating the parameters and the error associated with the future observation. This interval will always be wider than the confidence interval. These prediction intervals can be very useful in designed experiments when we are running confirmation experiments. Remember, we talked about confirmation experiments previously and said that a really good way to run a confirmation experiment is to choose a point of interest in your design space, use the model associated with your experimental results to predict the response at that point, and then actually go and run that point. If the observation at this new point lies inside the prediction interval for that point, then there's some reasonable evidence that your model is, in fact, reliable, that you've interpreted the experiment correctly, and that you're probably going to get useful results from this equation. Let's illustrate this using the situation back in example 8.1. Remember, this was a fractional factorial experiment. The results of the experiment seemed to indicate that three main effects, A, C, and D, and two two-factor interactions, AC and AD, were important, and the point with A, B, and D at the high level and C at the low level was considered to be a reasonable confirmation run. The model's predicted response at that point was 100.25.
Now, if this fractional factorial has been interpreted correctly and the model is valid, then we would expect the observed value at this point to fall inside the prediction interval that's computed from this last equation, 10.42, that you see here. This interval is pretty easy to calculate. The design used here was a half fraction of a 2 to the 4; it's an orthogonal design. The model has six terms: the intercept, the three main effects, and the two two-factor interactions. So the X prime X inverse matrix is very simple: it's a diagonal matrix of order 6 with 1 over 8 everywhere on the main diagonal. The coordinates of this point are x1 equal to 1, x2 equal to 1, x3 equal to minus 1, and x4 equal to 1. Since B, or x2, really isn't in the model, and the two interaction terms, AC and AD, or x1 x3 and x1 x4, are in the model, the coordinates of the point of interest are very easy to find. The vector is 1, x1, x3, x4, x1 times x3, x1 times x4. It's easy to show that that vector is, as you see here, 1, 1, minus 1, 1, minus 1, 1. Then the estimate of sigma squared for this model is 3.25. So we can plug all of this into equation 10.42, and that's going to give us the prediction interval that you see being calculated on this page. These are the matrix expressions that we just defined, this is the mean square for error, 4.30 is the appropriate t percentile here (the upper 2.5 percentage point of t with 2 degrees of freedom), and 100.25 is the point estimate of this future value. So when we plug in all of these numbers and do the arithmetic, this is the prediction interval at that new point. We would expect the confirmation run with A, B, and D at the high level and C at the low level to produce an observation that falls somewhere between 90 and 110. The actual observation was 104. So we got good confirmation here, and the successful confirmation run provides some assurance that we did interpret this fractional factorial design correctly.
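Here is the same prediction-interval arithmetic for the confirmation run, sketched in Python using the quantities quoted above for equation 10.42 (the variable names are mine):

```python
import numpy as np

# Quantities from the example 8.1 confirmation run.
x0 = np.array([1, 1, -1, 1, -1, 1])  # (1, x1, x3, x4, x1*x3, x1*x4) with A, B, D high and C low
XtX_inv = np.eye(6) / 8              # orthogonal half fraction of the 2^4: (X'X)^-1 = (1/8) I
sigma2_hat = 3.25                    # mean square error of the six-term model
t_crit = 4.30                        # upper 2.5 percentage point of t with 8 - 6 = 2 df
y0_hat = 100.25                      # predicted response at the confirmation point

# Note the extra 1 inside the square root for the new observation's own error.
se_pred = np.sqrt(sigma2_hat * (1 + x0 @ XtX_inv @ x0))
lower = y0_hat - t_crit * se_pred
upper = y0_hat + t_crit * se_pred
# interval ≈ (90.0, 110.5); the observed confirmation value, 104, falls inside
```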
I want to conclude this section by talking for just a couple of minutes about measures of influence. One of the things we often worry about in linear regression is influential observations. Influential observations have a tendency to pull your regression coefficients in a direction that is biased by that point. Generally, influential points are more remote in the design, or in the x-space, than points that are not overly influential. It's desirable to take the location of the point, as well as the response variable, into account when you measure influence. Dennis Cook from the University of Minnesota has suggested a measure of influence that uses the squared distance between your least squares estimate based on all n points and the estimate obtained by deleting the ith point. So beta hat is the parameter vector estimated with all n sample points, beta hat (i) is the estimate of that vector with the ith point deleted, or removed, from the sample, and the expression in equation 10.54, D_i, is the influence measure that Dr. Cook suggested. You'll notice that this is just the squared distance between the vector beta hat with the ith observation deleted and the full beta hat vector, projected onto the contours of X prime X. Dr. Cook suggested that a reasonable cutoff value for this statistic D_i is unity. This is a heuristic, but large values of D_i do indicate points that could be influential, and certainly any value of D_i that's larger than one does point to an observation that is more influential than it really should be on your model's parameter estimates. If you had to compute the D statistic directly from equation 10.54, you wouldn't like that very much; it's hard to do. But it turns out that D_i can actually be computed very simply using standard quantities that are available from multiple linear regression. Equation 10.55 gives you the equation for computing D_i. h_ii, by the way, is the hat diagonal corresponding to the ith observation.
Basically, apart from the constant p, which is the number of parameters in the model, D_i is the square of the ith studentized residual, that's r_i squared, times the ratio h_ii over 1 minus h_ii. That ratio can be shown to be the distance from this particular point x_i to the centroid of the remaining data in your sample. So Cook's distance measure is made up of a component that reflects how well the model fits the ith observation, and another component that measures how far away that point is from the rest of your data. Either one of these, or both, can contribute to a large value of D_i. Table 10.3 in the book shows the value of D_i for the regression model fit to all the viscosity data from our example. None of those D_i values exceeds one, so there's no real strong indication of influence here in the model. Here is table 10.3 from the book, with all the values of D_i from this model. You notice that none of them are anywhere close to being large enough to cause us any concern.
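The equivalence between the deletion definition (equation 10.54) and the computational formula (equation 10.55) is easy to check numerically. This Python sketch (synthetic data, not the lecture's viscosity example) computes D_i both ways and confirms they agree:

```python
import numpy as np

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
n, p = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(scale=0.5, size=n)

XtX = X.T @ X
XtX_inv = np.linalg.inv(XtX)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat
mse = e @ e / (n - p)
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)     # hat diagonals h_ii

# Computational formula (equation 10.55).
r = e / np.sqrt(mse * (1 - h))                  # studentized residuals
D_formula = (r**2 / p) * h / (1 - h)

# Deletion definition (equation 10.54): refit with each point removed.
D_deletion = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    d = beta_i - beta_hat
    D_deletion[i] = d @ XtX @ d / (p * mse)

# The two should agree up to floating point error.
```

This is why nobody computes D_i by refitting the model n times in practice; the hat diagonals and studentized residuals are already available from the full fit.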