In the previous lectures, we've talked about the mechanics of least squares and fitting multiple linear regression models, and we've looked at some applications of regression models in dealing with designed experiments: analyzing a designed experiment as a regression model, and then looking at how we might deal with certain problems such as missing values, inaccurate levels, or de-aliasing interaction effects. Well, in this class, we're going to talk about another important topic, which is hypothesis testing in multiple regression. Tests of hypotheses about the model parameters are very, very helpful, and in this class we're going to describe several important procedures. These hypothesis testing procedures assume that the errors, the epsilons in your model, are normally and independently distributed with mean 0 and variance sigma squared. Now, these tests are relatively robust to departures from these assumptions, but nonetheless, those assumptions are required in order to develop the tests. The first test that we're going to talk about is popularly called the test for significance of regression. Sometimes it's called the overall model test, but I use the terminology significance of regression. It's a test to determine whether there is a linear relationship between the response variable y and some subset of your regressor or predictor variables x1, x2, on up to x sub k, and the appropriate hypotheses are shown in equation 10.20. The null hypothesis is that all of the individual regression coefficients beta 1, beta 2, on up to beta sub k are equal to 0, against the alternative that at least one of those coefficients is not 0. That coefficient, or those coefficients, would be the ones for which there is a linear relationship between y and the corresponding x's. So how do we do this test? Well, the test uses an analysis of variance approach, and the regression or model sum of squares is given by equation 10.23.
And that's beta hat prime X prime y, minus what I'll call the usual correction factor, just the sum of the y's, squared, divided by n. And the error sum of squares, of course, which we've seen before, is y prime y minus beta hat prime X prime y. So basically that is the foundation of an ANOVA partition, and the total corrected sum of squares then would be y prime y minus the correction factor. Table 10.6 shows how you would array an analysis of variance table to test for significance of regression: the regression or model sum of squares, SSR, would be computed from equation 10.23, the total sum of squares would be found from equation 10.25, and the error sum of squares you would get by subtraction. There are k degrees of freedom for the model, or for regression, one for each model parameter; n - 1 total degrees of freedom; and n - k - 1 degrees of freedom for error. The ratios of the sums of squares to the degrees of freedom produce mean squares, and then the appropriate statistic for testing the null hypothesis that we saw earlier is the ratio of mean square regression over mean square error. Under the null hypothesis that all the betas are 0, that ratio has a central F distribution with k numerator and n - k - 1 denominator degrees of freedom. So you could use a critical value from the F table, or you could compute a p-value based on that F reference distribution. And there's a note here on the slide that says, see table 10.4 in the book for the viscosity regression model. That table also reports something called the coefficient of multiple determination, R square, and R square can be computed as the regression sum of squares over the total sum of squares, or equivalently, 1 minus the error sum of squares over the total sum of squares. And so, just like in the ANOVA for designed experiments, R square is a measure of the amount of reduction in the variability of y that is obtained by fitting these regressor variables in your model.
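To make this concrete, here is a minimal sketch of the ANOVA partition and the significance-of-regression F test in matrix form. The data are simulated purely for illustration (they are not the book's viscosity data), and the variable names are my own assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated data: n observations, k regressors (illustrative only)
n, k = 20, 2
X0 = rng.normal(size=(n, k))               # regressor columns
y = 5 + 2 * X0[:, 0] + rng.normal(size=n)  # assumed true model for the simulation

X = np.column_stack([np.ones(n), X0])      # model matrix with intercept column

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

correction = y.sum() ** 2 / n              # the "usual correction factor"
SSR = beta_hat @ X.T @ y - correction      # regression (model) sum of squares, eq. 10.23
SSE = y @ y - beta_hat @ X.T @ y           # error sum of squares
SST = y @ y - correction                   # total corrected sum of squares, eq. 10.25

MSR = SSR / k                              # mean square regression, k df
MSE = SSE / (n - k - 1)                    # mean square error, n - k - 1 df
F0 = MSR / MSE
p_value = stats.f.sf(F0, k, n - k - 1)     # upper-tail area of the F reference distribution
```

Note that SSR + SSE reproduces SST exactly, which is the ANOVA partition the lecture describes.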
Now, just as we talked about in designed experiments, a large value of R square does not necessarily tell you that your regression model is a good one. You can always make R square bigger simply by adding variables to the model, and this is going to happen whether the added variables are statistically significant or not. In other words, you could put in a variable that's really not very useful, and the R square will still go up. So it's very possible for models that have large values of R square to produce really poor predictions of new observations or estimates of the mean response. We'd like to have some way to counteract this, some alternative to looking at R square. Some regression model builders and computer packages report something called an adjusted R square, and the equation for the adjusted R square statistic is shown in equation 10.27. You notice that the adjusted R square is 1 minus the ratio of SSE over n - p to SST over n - 1. That numerator, SSE over n - p, is basically the mean square for error, and SST over n - 1 is sort of like a total mean square. You can manipulate that expression mathematically to get the expression on the far right-hand side of the equals sign, which is 1 minus the quantity n - 1 over n - p, times the quantity 1 - R square. Now, it turns out the adjusted R square statistic does not always increase when you add variables to the model. In fact, if you put in unnecessary terms, the value of adjusted R square will often decrease. Why is that true? Well, because the numerator in that ratio is the mean square for error, and if you put in variables that are not useful, that mean square for error can get larger. And that's what would cause the adjusted R square to decrease.
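Here is a small sketch of both formulas from equation 10.27, again on simulated data of my own (the junk variable and sample sizes are assumptions for illustration). It fits a model with and without a pure-noise regressor, so you can compare how ordinary R square and adjusted R square respond:

```python
import numpy as np

rng = np.random.default_rng(1)

def r2_stats(X, y):
    """Return (R^2, adjusted R^2) for a least-squares fit; X includes the intercept column."""
    n, p = X.shape
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_hat
    SSE = resid @ resid
    SST = np.sum((y - y.mean()) ** 2)
    R2 = 1 - SSE / SST
    R2_adj = 1 - (SSE / (n - p)) / (SST / (n - 1))   # eq. 10.27
    return R2, R2_adj

n = 30
x1 = rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)   # assumed true model
junk = rng.normal(size=n)             # pure noise, unrelated to y

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, junk])

R2_small, adj_small = r2_stats(X_small, y)
R2_big, adj_big = r2_stats(X_big, y)
# Ordinary R^2 can only go up when a column is added; adjusted R^2 need not.
```

The two forms in equation 10.27 agree: adjusted R square equals 1 - ((n - 1)/(n - p)) * (1 - R square).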
For example, if you compute the adjusted R square for the model that we've been looking at, using the quantities shown in table 10.4 in the book, it turns out to be 0.9157, and that's actually very close to the ordinary R square. As long as the ordinary R square and the adjusted R square are similar to each other, you can feel like you're in pretty good shape so far as the variables in the model are concerned. But if they differ dramatically, then there's a chance that you've put in some nonsignificant terms. Okay, how about testing individual regression coefficients? After you've done the test for significance of regression, a very obvious question is, are all of the variables significant, or are only a few of the variables important? That means we need to be able to test hypotheses on the individual regression coefficients, and this is a very useful test to be able to perform. For example, your model might be better if you could delete one or more of the variables that you've already included. Basically, remember that adding a variable to the model causes the sum of squares for regression to increase and the error sum of squares to decrease. What you have to decide is, is the increase in the regression sum of squares large enough to warrant putting the additional variable in the model? And remember, putting in an unimportant variable can actually cause your mean square error to increase, and that limits the usefulness of the model. The hypotheses for testing the significance of an individual regression coefficient are shown here. The null hypothesis, H naught, is that beta j is equal to 0, and the alternative is that beta j is not equal to 0. If that null hypothesis is not rejected, then that suggests that you can remove, or eliminate, that variable x j from the model. This test is usually performed as a t-test, and equation 10.28 shows the t-statistic for this test.
The t-statistic is beta hat sub j, the estimate of the j-th regression coefficient, divided by its standard error. That square root of sigma hat squared times C j j is the standard error of beta hat j, where C j j is the j-th diagonal element of X prime X inverse, and sigma hat squared would be estimated by the mean square for error. I view this as a partial or marginal test, because the numerical value of beta hat j depends on all the other variables that are in the model. So this is telling you the contribution of x j to the model, given all of the other variables that are included in the model. Sometimes you will see this statistic written as in equation 10.30; that is, the t-statistic is beta hat sub j over the standard error of beta hat sub j. Here is table 10.4 from the book, which we've alluded to a couple of times; this is Minitab output. It shows you the fitted regression equation and it shows you the overall test for significance of regression. That's the portion that you see down here, and for the overall test for significance of regression, the statistic is 82 and a half. The p-value, it says, is zero. It's not really zero; it's just smaller than the smallest value that Minitab decides to print. And then here are the t-statistics for the individual model coefficients. For the intercept, the t-value is 25.4; for temperature it's 12.3; and for the feed rate it's 3.5. All of those t-statistics are significant, so there's no indication here that any of these variables are not contributing usefully to the model. And by the way, here are the R-squared and adjusted R-squared statistics. You can see that the ordinary R-squared is 92.7 and the adjusted R-squared, as we computed earlier, is 91.6, very similar. So overall it looks like both of these variables are important contributors to the model. Now, another test that sometimes comes in very handy is to examine the contribution of an individual variable.
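The t-test just described can be sketched directly from the matrix quantities in equation 10.28. Again this uses simulated data of my own rather than the viscosity data, and the names are assumptions for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Simulated data: x2's true coefficient is zero in the assumed model
n = 25
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 3 + 1.5 * x1 + 0.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
n, p = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)          # MSE estimates sigma squared

C = np.diag(XtX_inv)                          # C_jj, diagonal of (X'X)^{-1}
se = np.sqrt(sigma2_hat * C)                  # standard error of each beta hat j
t0 = beta_hat / se                            # the t-statistics of eq. 10.28
p_values = 2 * stats.t.sf(np.abs(t0), n - p)  # two-sided p-values
```

Because each beta hat j depends on the other columns of X, each t0 is a partial (marginal) test, exactly as described above.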
Or of a group of variables, using something called the extra sum of squares method. It turns out that if you have only a single variable involved in this test, it is equivalent to the t-test, but it's a very useful test when we want to look at tests on groups of coefficients. So let's suppose we have our standard multiple linear regression model, y equal to X beta plus epsilon, and we want to determine whether there is a subset of predictor variables x1, x2, on out to x sub r, where r is less than k (remember, there are k variables), that contributes significantly to the model. So let's partition our vector beta into two pieces, beta 1 and beta 2, where beta 1 is going to be r by 1 and beta 2 is going to be p minus r by 1. And we want to test the null hypothesis, H naught, that beta 1 is equal to 0 against the alternative that beta 1 is not equal to 0. In other words, your model can be written in terms of a partitioned beta and a partitioned X. That is, y equal to X beta plus epsilon can be written as y equal to X1 beta 1 plus X2 beta 2 plus epsilon; that's equation 10.32. X1 are the columns of X that are associated with the parameters in beta 1, and X2 are the columns of X that are associated with the parameters in beta 2. So how would we go about testing this hypothesis? Well, the first thing you do is fit the full model, and the full model contains both beta 1 and beta 2. For the full model, we know that beta hat is equal to X prime X inverse times X prime y, and the regression sum of squares for all the variables in the model is simply beta hat prime X prime y. That has p degrees of freedom, so the intercept is actually included there. The mean square for error, of course, is y prime y minus beta hat prime X prime y, divided by n minus p. And SSR of beta, this thing that you see up here, is often called the regression sum of squares due to beta.
Or sometimes it's called the full model regression sum of squares. Now, to find the contribution of the terms in beta 1 to the model, we fit what we call a reduced model, and the reduced model is restricted by the null hypothesis. That is, beta 1 equal 0 is assumed to be true. So the reduced model now is y equal to X2 beta 2 plus epsilon, and the least squares estimate of beta 2 for that model is beta hat 2 equal to X2 prime X2 inverse times X2 prime y. The model sum of squares, SSR of beta 2, would be beta hat 2 prime times X2 prime y; that's equation 10.34. So then the regression sum of squares due to beta 1, given that beta 2 is already in the model, has got to be the difference in those two model, or regression, sums of squares, and that's what we're denoting here: SSR of beta 1 given beta 2 is SSR of beta, the full model regression sum of squares, minus SSR of beta 2, the reduced model regression sum of squares. This sum of squares has r degrees of freedom, and it's usually called the extra sum of squares due to beta 1. In other words, SSR of beta 1 given beta 2 is the increase in your regression or model sum of squares due to including those additional variables x1, x2, on up to x sub r in the model. And that extra sum of squares is independent of the mean square error, so you can use it to form an F statistic, as you see in equation 10.36. That is a statistic for testing the contribution of the extra variables x1, x2, on up to x sub r to the model. If that F statistic is large, if it's significant, that tells you that at least one of the parameters in beta 1 is not zero. Some people call this the extra sum of squares test, some people call it the partial F test, and it can be a very useful test in multiple linear regression. Let's talk about this viscosity example for just a moment. Suppose we want to investigate the contribution of the variable x2, that's feed rate, to the model.
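The full-model/reduced-model recipe above can be sketched as follows. This is simulated data of my own (two of the three regressors are pure noise in the assumed model), and it tests a group of two coefficients at once, which is where the extra sum of squares method earns its keep:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Simulated data: y depends on x1 only; we test the group (x2, x3),
# which plays the role of beta 1 in the lecture's partition.
n = 40
X_all = rng.normal(size=(n, 3))
y = 2 + 1.0 * X_all[:, 0] + rng.normal(size=n)

ones = np.ones((n, 1))
X_full = np.hstack([ones, X_all])        # full model: intercept + x1 + x2 + x3
X_red = np.hstack([ones, X_all[:, :1]])  # reduced model: intercept + x1

def ssr(X, y):
    """Regression sum of squares, corrected for the mean."""
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta_hat @ X.T @ y - y.sum() ** 2 / len(y)

extra = ssr(X_full, y) - ssr(X_red, y)   # SSR(beta 1 | beta 2), r = 2 df

beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0]
resid = y - X_full @ beta_full
p = X_full.shape[1]
MSE = resid @ resid / (n - p)            # mean square error from the FULL model

r = 2
F0 = (extra / r) / MSE                   # the partial F statistic, eq. 10.36
p_value = stats.f.sf(F0, r, n - p)
```

Note the mean square error always comes from the full model, never the reduced one; that is what makes the extra sum of squares independent of MSE.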
So the hypothesis that you want to test is H naught, beta 2 equal to 0, against the alternative that beta 2 is not equal to 0. Your full model would have both beta 1 and beta 2; the reduced model would not have the term beta 2 x2. So the reduced model would simply be the simple linear regression model that you see here. The least squares fit for that model is straightforward to obtain, and the regression sum of squares, with again one degree of freedom, turns out to be 40,840.8; that's actually shown down at the bottom of the Minitab output in table 10.4. There it is, the quantity that you see right here. So SSR of beta 2, given that beta 1 (and the intercept, of course) is already in the model, would be the difference between the model sum of squares for the full model and 40,840.8, which is 3,316.3. So this is the extra sum of squares due to beta 2, or it's just the difference between the full model regression sum of squares, SSR of beta 1 and beta 2 given beta 0, which is 44,157.1, and SSR of beta 1 given beta 0, which is 40,840.8, and those two quantities are numbers that we can compute from table 10.4. So this is the increase in the regression or model sum of squares that comes about from adding x2 to a model that already has x1, and it's also shown at the bottom of the Minitab output in table 10.4. So to test that beta 2 is equal to 0, to test that null hypothesis, your F statistic would be SSR of beta 2 given beta 0 and beta 1, divided by one, because there's only one degree of freedom, only one parameter here, and you would use the full model mean square error. That F statistic turns out to be 12.3926, and by the way, the five percent value of F with 1 and 13 degrees of freedom is 4.67. So we would clearly reject the null hypothesis and conclude that you do need feed rate in the model. Now, here's something kind of interesting: this partial F test involves only a single parameter, only a single regressor.
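We can check the arithmetic in this example directly from the sums of squares reported in table 10.4; the only freshly computed quantity here is the 5% F critical value:

```python
from scipy import stats

# Quantities reported in table 10.4 of the book for the viscosity model
SSR_full = 44157.1     # SSR(beta 1, beta 2 | beta 0), full model
SSR_reduced = 40840.8  # SSR(beta 1 | beta 0), temperature-only model
extra = SSR_full - SSR_reduced  # SSR(beta 2 | beta 1, beta 0) = 3,316.3, 1 df

F0 = 12.3926           # the partial F statistic quoted in the lecture
MSE = extra / F0       # implied full-model mean square error, roughly 268

F_crit = stats.f.ppf(0.95, 1, 13)  # 5% critical value of F with 1 and 13 df, about 4.67
# F0 far exceeds F_crit, so we reject H0: beta 2 = 0
```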
So it's equivalent to the t-test, because the square of a t random variable with v degrees of freedom is an F random variable with 1 and v degrees of freedom. If you go back to table 10.4 and look at the t-statistic for testing the null hypothesis that beta 2 is equal to 0, it was 3.5203, and if you square that t-statistic you get 12.3925, which matches, up to rounding, the partial F statistic that we obtained here. So this shows you that the partial F procedure, the extra sum of squares procedure, applied to a single predictor variable is really the same as doing the t-test. But the real utility of the extra sum of squares technique is when we have more than one variable included in that extra sum of squares.
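That t-squared-equals-F identity is easy to verify numerically, both for the statistic reported in table 10.4 and for the critical values themselves:

```python
from scipy import stats

t0 = 3.5203       # t-statistic for beta 2 from table 10.4
F0 = t0 ** 2      # about 12.3925, matching the partial F statistic up to rounding

# The distributional fact behind it: the square of a t with v df is F with 1 and v df,
# so the two-sided t critical value squared equals the one-sided F critical value.
v = 13
t_crit = stats.t.ppf(0.975, v)   # two-sided 5% t critical value
F_crit = stats.f.ppf(0.95, 1, v)
# t_crit ** 2 and F_crit agree
```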