So let's talk about another instance of an orthonormal basis that comes up quite frequently. Imagine our X is n by p, and let's still assume that p is less than or equal to n, but imagine we have a large number of subjects, or records, so n is large, and p is also large. So we want some way to reduce the dimension of X to make it a little more manageable. Consider the singular value decomposition of X, where X is n by p of full column rank. We can decompose it as U D V transpose, where U is n by p, D is a p by p diagonal matrix of singular values, and V is a p by p matrix. And these are such that U transpose U equals V transpose V equals I. Okay, so one thing I want to note: imagine for the time being that X has been centered, in the sense that all of its column means are zero, and consider X transpose X, which is effectively the variance-covariance matrix of X, disregarding the n minus 1. Well, using those facts, that's equal to V D U transpose U D V transpose, which is equal to V D squared V transpose. So the eigenvalue decomposition of X transpose X is related to the singular value decomposition of X itself. The squared singular values, the eigenvalues from the eigenvalue decomposition, help summarize the variance in the X transpose X matrix, which is itself a variance-covariance matrix. And these are usually ordered so that the larger D squared values come earlier; they're in decreasing order, so the first one is the largest, the second one is the second-largest, and so on. And so, consider the fact that the trace of X transpose X is equal to the trace of V D squared V transpose. Then I can move that V transpose around to the front, since trace of AB is trace of BA, and V transpose V is I. So the trace of X transpose X is equal to the trace of D squared, the sum of my squared singular values, my eigenvalues.
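These identities are easy to check numerically. Here's a minimal numpy sketch; the matrix X is just simulated data for illustration, not anything from the lecture:

```python
import numpy as np

# A hypothetical centered data matrix: n = 100 records, p = 5 variables
rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)          # center so X'X is the covariance up to the n - 1

# Thin SVD: X = U D V', with U (n x p), d the p singular values, Vt = V'
U, d, Vt = np.linalg.svd(X, full_matrices=False)

# X'X = V D^2 V', the eigenvalue decomposition of X'X
XtX = X.T @ X
print(np.allclose(XtX, Vt.T @ np.diag(d**2) @ Vt))

# trace(X'X) = sum of the squared singular values (the eigenvalues of X'X)
print(np.allclose(np.trace(XtX), np.sum(d**2)))
```

Note that numpy returns the singular values already sorted in decreasing order, matching the convention described above.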
And so what this means is the eigenvalues summarize the variability, in the sense that the trace is the total variability in my X transpose X matrix: the sum of all the diagonals, the sum of all the variances. Okay, so what we could do is take the first three components of our decomposition and see what percentage of the total variation they explain. Let's suppose they explain 90% of the variation. Then we might consider reducing X by only taking those first three eigenvectors. So consider the way to do this: X times V times D inverse is equal to U, since X V equals U D. So a way to think about how we get at the scores, these so-called scores, the columns of U, is by multiplication of X by V. What V does is combine the columns of X in such a way that it gives us these scores. And then the D inverse is sort of a normalization term, right? The D squared values you can think of as variances, so multiplication by D inverse is sort of like normalizing, in the sense of dividing by a standard deviation. Okay, so to every column of V, which is an eigenvector, there is associated an eigenvalue, which is an element of D squared. We might take, say, the top three, and then only take the first three columns of this U matrix. By only taking the first three columns of V, we'd only be taking the first three elements of D. Or we could of course just do the singular value decomposition, which gives us U, D, and V, and take the first three columns of U directly. So we could then try to minimize the norm of y minus U3 gamma, squared, where I'm going to put a little 3 under my U because I just happened to grab the first three columns of U. What this would mean is I'm trying to regress y on the design matrix U3, where my U3 was selected in a way to capture as much variation in my X as possible.
Of course I'm just using three as an example; you could use any number of the columns of U to do this with. But you want to explore what percentage of the variation they explain, and how tolerable that percentage is given your goals. At any rate, our discussion from the orthonormal bases notes applies: because U is orthonormal, grabbing any three columns of U, in particular the first three, also gives an orthonormal matrix. And so our estimate of gamma, our gamma hat, is just going to be U3 transpose times y. Okay, so what we find is that the way we get principal component regressors is simply by taking the singular value decomposition of our centered X matrix, taking the relevant columns of our left singular vectors, which, if we think about it in terms of principal components, are our scores, and then simply multiplying their transpose times y. That gives us the associated coefficients. So this just goes to show how we can use these nice operations that we get out of least squares in this particular case. Using the singular value decomposition to come up with an orthonormal basis represents, I think, one of the three most important basis concepts in statistics; certainly I would describe wavelets, Fourier transforms, and principal component bases as the three. And I think you can see that in this case it fits very nicely into the topic of regression. It also fits very nicely when we have a large X matrix with a lot of columns that we want to summarize. One caveat I would suggest being careful of: again, we get U, and we can think of U as linear combinations of the columns of X. If the units of X don't make sense to combine, then this procedure may not make a lot of sense to do. So if the first column of X is in one kind of units and the second column of X is in a different kind of units, then the interpretability of your scores may really suffer as a result.
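The final point, that with an orthonormal design the coefficient estimate is just U3 transpose y, can be sketched as follows; the data are simulated, and the comparison against a generic least squares solver is just a sanity check:

```python
import numpy as np

# Hypothetical setup: simulated centered X and an outcome y
rng = np.random.default_rng(2)
n, p = 100, 5
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)
y = rng.normal(size=n)

U, d, Vt = np.linalg.svd(X, full_matrices=False)
U3 = U[:, :3]                    # first three left singular vectors (the scores)

# Because U3'U3 = I, the least squares coefficients reduce to U3'y
gamma_hat = U3.T @ y

# Sanity check against a general-purpose least squares fit of y on U3
gamma_ls, *_ = np.linalg.lstsq(U3, y, rcond=None)
print(np.allclose(gamma_hat, gamma_ls))
```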
So again, there are a lot of intricacies to this, and I think if you wanted to learn more about it, a class on multivariate statistics would be the way to go. But I just wanted to reinforce the point that when we have a design matrix that's orthonormal, we end up with a really simple solution for the coefficients. Okay, and next we'll go through a coding example covering some of these sorts of calculations.