Welcome back. We're still talking about regression modeling. In this class, I'm going to show you examples of how regression methods can be helpful in analyzing data from designed experiments. We're going to start by simply looking at a 2^3 factorial design and showing you the regression analysis of that factorial design. So this is a 2^3 factorial. A chemical engineer is looking at the effect of three process variables on yield: temperature, pressure, and catalyst concentration. Each factor can be run at two levels, and so we run a 2^3 design, but he's also included four center points. The design and the yields are shown in the book in Figure 10.5. There is a table that shows you the design, both in terms of the natural or engineering units and the coded units. Well, we're going to fit the model in terms of the coded units. So we're going to fit a main-effects model. So we have only the terms x_1, x_2, and x_3 to represent the main effects. The X matrix and y vector are shown about the middle of this left-hand panel. The way we get those, of course, is we simply take the coded variables from the data table, enter those into the X matrix, and add a column of ones to the left. This model matrix is found by taking the design matrix and expanding it to model form; in this case, the only thing we had to do is add a column of ones for the intercept. Then the y vector is just the vector of observations on yield. X'X turns out to be very easy to find because this is an orthogonal design. So even with the added center runs, everything is still easy. The main diagonal elements are 12 in the first position, because we have 12 ones, and then eights in the remaining main diagonal positions, because we have eight plus and minus ones, with zeros everywhere else. Then X'y is found in the usual way. The X'X matrix is of course diagonal, so the inverse is easy.
The inverse is a diagonal matrix with 1/12 in the first position and 1/8 in all of the other diagonal positions. So the fitted regression model turns out to be as you see here. As we've talked about before, back in Chapters 6, 7, and 8, the regression coefficients here are very closely related to the effect estimates. For example, the effect of temperature would be the average of all the runs where temperature is at the high level minus the average of all the runs where temperature is at the low level. If you do that calculation, you get 11.25. Well, the regression coefficient for temperature would be half of that effect estimate, or 5.625. You'll notice that that's exactly what we got when we did the linear regression fit. So again, this is demonstrating that the effect estimates from a 2^k design are least squares estimates, and that the regression coefficients are exactly one-half of the effect estimates. The variances of your model parameters are found from the diagonal elements of the inverse of X'X. The variance of Beta hat zero would be Sigma squared over 12, and the variance of all the other Beta hats, the other regression coefficients, is Sigma squared over 8. So the relative variance of the intercept is 1/12, and the relative variance of the other regression coefficients is 1/8. The picture that you see beside the data table in Figure 10.5 shows you the geometry of this design, with the eight factorial runs at the corners of the cube. This was easy because X'X is diagonal, because the design is orthogonal. So if we could design experiments so that we always have orthogonal designs, life would be easy, and in practice, it is often fairly easy to do this: simply making the columns orthogonal, as we do in a 2^k design, produces this sort of situation. Now, this is the easy case, but regression methods are sometimes useful when things go wrong in a designed experiment. Let's see what I mean by that.
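As a quick sketch of what we just did, here is the regression fit for the 2^3 design with four center points. The yields below are illustrative values I made up for the demonstration, not the book's data; the structural facts (the diagonal X'X and the coefficient-equals-half-effect relationship) hold regardless of the responses.

```python
import itertools
import numpy as np

# 2^3 factorial in coded units, plus four center points
corners = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
D = np.vstack([corners, np.zeros((4, 3))])            # 12 runs total
X = np.column_stack([np.ones(len(D)), D])             # add intercept column

# Orthogonality makes X'X diagonal: 12 for the intercept, 8 for each factor
XtX = X.T @ X
print(np.diag(XtX))                                   # [12. 8. 8. 8.]

# Illustrative yields (NOT the book's data), to show beta_j = effect_j / 2
y = np.array([32, 46, 57, 65, 36, 48, 57, 68, 50, 44, 53, 56], dtype=float)
beta = np.linalg.solve(XtX, X.T @ y)

x1 = D[:8, 0]                                         # temperature, factorial runs only
effect_T = y[:8][x1 == 1].mean() - y[:8][x1 == -1].mean()
print(np.isclose(beta[1], effect_T / 2))              # True
```

Because the center points sit at zero on every factor column, they change only the intercept's diagonal entry, so the effect-to-coefficient relationship is untouched.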
Let's take this same example that we just solved, but let's suppose that one of the runs is missing. The run with all variables at the high level is gone; that run is lost. This can happen for a lot of reasons, maybe a faulty measurement, or maybe some of our experimental material or experimental units were damaged. So we still want to fit the main-effects model, but now we only have 11 observations. So the X matrix and y vector are as you see here, and you notice that we're missing that run with everything at the high level. When we form the X'X matrix, it is no longer diagonal. The main diagonal elements are 11 and sevens, and the off-diagonal elements are all minus ones. So the inverse of X'X is no longer so easy to find; we have to use the general methods for inverting a matrix in order to do that. But we can do that, and then we solve the normal equations. Look at your fitted regression model. Look how similar this is to what we found when we actually had all 12 observations. Our regression coefficients are very similar to what we saw before. Now, the model terms are no longer orthogonal to each other; there is some correlation between the coefficient estimates. But the effect estimates and regression coefficients are very similar in size to what they were when we had all the observations. So regression analysis can be a very convenient method for analyzing the data if you have one or more missing observations in a factorial. Another situation that occurs in practice is inaccurate levels in your design factors. Sometimes it's difficult to actually hit, achieve, and hold the design levels that you're trying to use for your experiment. To illustrate, here's our previous example, but now we're having trouble hitting and holding the design levels accurately. It looks like most of the difficulty is with the temperature variable. I can still put everything in terms of coded units, and that's what you see here.
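The missing-run calculation can be sketched the same way. Again the yields are made-up illustrative values, not the book's; what matters is the structure of X'X after the all-high run is dropped, which matches what was just described: 11 and sevens on the diagonal, minus ones everywhere else.

```python
import itertools
import numpy as np

# 2^3 plus four center points, with the (+1, +1, +1) run lost
corners = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
D = np.vstack([corners[:-1], np.zeros((4, 3))])       # 11 runs remain
X = np.column_stack([np.ones(len(D)), D])

# X'X is no longer diagonal: 11 and 7s on the diagonal, -1s off it
XtX = X.T @ X
print(XtX)

# Illustrative yields (NOT the book's data); solve the normal equations anyway
y = np.array([32, 46, 57, 65, 36, 48, 57, 50, 44, 53, 56], dtype=float)
beta = np.linalg.solve(XtX, X.T @ y)
print(np.round(beta, 2))
```

Note that `np.linalg.solve` handles the general, non-diagonal X'X without any special treatment; the loss of one run costs us orthogonality but not the ability to fit the model.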
So I can still fit a model in terms of the coded units, and that's what you see in the right-hand panel of this slide. We are fitting the model using the actual levels that we achieved in the experiment. So we find X'X; it is, of course, no longer diagonal. We find X'y, then we get the inverse of X'X, we solve the normal equations, and the fitted model, with the regression coefficients reported to two decimal places, is shown here. If you compare that to the original model, where we had the accurate factor levels, you'll notice there's not very much difference. So the practical interpretation of this experiment would not be seriously impacted by the inability of the experimenter to achieve and hold the desired factor levels exactly. Okay. The final example that I want to show you is something that is potentially quite useful, and that is de-aliasing interactions in a fractional factorial. We talked about, back in Chapter 8, various schemes that can be used to de-alias interactions in a fractional factorial through a process called foldover. For a resolution III design, the full foldover is conducted by simply running a second fraction in which we reverse all of the signs in the original fraction. Then you can use the combined design to de-alias all the main effects from all of the two-factor interactions. Now, the difficulty with full foldover is that it requires a second group of runs that is the same size as the original design. That can be expensive. But sometimes, regression methods can show you a way to do a partial foldover with even fewer runs than you might get with a standard partial foldover. Standard partial foldovers use half the number of runs associated with a full foldover, but sometimes we can be even more efficient. Let's illustrate this. Let's suppose we've done a 2^{4-1} resolution IV design.
Table 8.3 in the book shows you the principal fraction of this design, with I = ABCD. Now, suppose after the data from those first eight trials were observed, the largest effects were A, B, C, and D, and then the AB + CD interaction alias chain. So is it AB, or is it CD, or is it both? Well, to find out, of course, we could run the alternate fraction. You could do another eight runs by simply reversing the signs in, say, column A. That would enable you to de-alias all of the two-factor interactions involving A. You could also do a partial foldover. But you could actually use fewer than four trials. Suppose you want to fit the model that you see here: the four main effects and the AB and CD two-factor interactions. Using the design from Table 8.3, this is the X matrix that you would get. Now, I've labeled the columns. Look at the columns for x_1x_2 and x_3x_4. Notice that they are identical. Well, no surprise, because AB, or x_1x_2, is aliased with CD, which is x_3x_4. So there's a linear dependency; we can't separate Beta_12 from Beta_34. But suppose you add one more run. Suppose you add the run where x_1, x_2, and x_3 are all at the low level and x_4 is at the high level. If you do that, then the X matrix looks like this. I want you to consider those last two columns. Notice that they are no longer identical. So you can now fit the model including both the x_1x_2 and the x_3x_4 terms. In other words, you could de-alias AB from CD with as few as a single run. Very, very nice. But the disadvantage is that if there is a time or block effect between the first eight runs and the time when the additional run is made, we've got a problem. If you add a column to the X matrix for blocks, you get the matrix X that you see at the bottom of the first column. I've assumed here that the block factor was minus during the first eight runs and plus during the last run.
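The single-run de-aliasing argument can be checked numerically. This sketch builds the principal fraction with the generator D = ABC, confirms that the x_1x_2 and x_3x_4 columns coincide, and then shows that adding the one run with x_1, x_2, x_3 low and x_4 high makes the model matrix full rank.

```python
import itertools
import numpy as np

# Principal fraction of the 2^(4-1) design, I = ABCD (so D = ABC)
abc = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
frac = np.column_stack([abc, abc[:, 0] * abc[:, 1] * abc[:, 2]])

def model_matrix(runs):
    """Columns: intercept, A, B, C, D, AB, CD."""
    a, b, c, d = runs.T
    return np.column_stack([np.ones(len(runs)), a, b, c, d, a * b, c * d])

X8 = model_matrix(frac)
print(np.array_equal(X8[:, 5], X8[:, 6]))             # True: AB column == CD column
print(np.linalg.matrix_rank(X8))                      # 6: Beta_12, Beta_34 not separable

# Add the single run with x_1 = x_2 = x_3 = -1 and x_4 = +1
X9 = model_matrix(np.vstack([frac, [-1, -1, -1, 1]]))
print(np.linalg.matrix_rank(X9))                      # 7: AB and CD are de-aliased
```

On the added run, AB evaluates to +1 while CD evaluates to -1, which is exactly what breaks the linear dependency between the two columns.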
Well, blocks are now no longer orthogonal to treatments. You can see that simply by taking the cross-product of each column with the block column and noticing that it's not zero. To make the block effects orthogonal to the other variables, you have to have an even number of runs. You could use four runs. If you use the four runs that you see here, you de-alias AB from CD, and you've preserved orthogonal blocking. Now, this is equivalent to a partial foldover in terms of the total number of runs. In general, it is often straightforward to look at the X matrix for the reduced model from your fractional factorial and figure out which runs you might want to add to augment the original design. If you can do this with fewer runs than you would get from a foldover or a partial foldover, this could be very useful. By the way, there are also some computer-based optimal design tools for constructing optimal designs that can be used for design augmentation purposes to de-alias effects just like this. We talked about some of those techniques in Chapter 9.
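The four-run augmentation can be verified the same way. The specific runs below are a candidate set I constructed, not necessarily the ones shown in the book: two mirror-image pairs chosen so that A, B, C, D, AB, and CD all sum to zero over the new block, which keeps blocks orthogonal to treatments while still separating AB from CD.

```python
import itertools
import numpy as np

# Original principal fraction of the 2^(4-1), I = ABCD
abc = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
frac = np.column_stack([abc, abc[:, 0] * abc[:, 1] * abc[:, 2]])

# Candidate second block of four runs (an assumption; the book's set may differ)
block2 = np.array([[-1, -1, -1,  1],
                   [ 1,  1,  1, -1],
                   [-1,  1, -1, -1],
                   [ 1, -1,  1,  1]], dtype=float)

runs = np.vstack([frac, block2])
a, b, c, d = runs.T
z = np.r_[-np.ones(8), np.ones(4)]                    # block indicator column
X = np.column_stack([np.ones(12), a, b, c, d, a * b, c * d, z])

# Blocks are orthogonal to every treatment column, and AB/CD are de-aliased
cross = [float(z @ col) for col in X[:, 1:7].T]
print(cross)                                          # all zeros
print(np.linalg.matrix_rank(X))                       # 8: full column rank
```

So with four well-chosen runs we get both properties at once: every treatment cross-product with the block column is zero, and the model matrix with the block term included has full column rank.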