In this lecture, we are going to do a set of hand calculations of our principal components transform. You will not have to do this in practice. The purpose of the exercise is to demonstrate the importance of the eigenvalues and eigenvectors of the covariance matrix, and how we should interpret their specific values. We reiterate the important aspects of the principal components transformation. The equation at the top shows how we generate the principal components, that is, the new bands y, from the original set of bands x; it relies on the elements of the transformation matrix G. We get the elements of G by computing the eigenvectors of the original covariance matrix C_x. Remember, G is the transpose of the matrix of column eigenvectors. The covariance matrix of the transformed data, that is, in the y-coordinate space, is the diagonal matrix of the eigenvalues of C_x. Remember, there are four steps in obtaining the principal components, as we saw in the previous lecture. We will compute each of those steps in our example. Once we have finished the example, you should be quite familiar with how the principal components of an image come about.

We will do our calculations using the highly correlated data set from the previous lectures. Here we see the data distribution again, its covariance matrix, and the equation we need to solve to find its eigenvalues and eigenvectors. That equation is called the characteristic equation, and it is in the form of a determinant. Lambda is an unknown whose values are given by the solution to the equation; those values are the eigenvalues we are looking for. As we noted in the last lecture, the identity matrix is just a diagonal matrix with ones in the diagonal positions. When multiplied by the scalar Lambda, it gives a matrix of zeros except with Lambda in each diagonal position. To solve the characteristic equation, we substitute for the covariance matrix as shown here.
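As a quick sketch of this step, the characteristic equation det(C_x − Lambda·I) = 0 for a two-band covariance matrix expands to a quadratic, Lambda² − trace(C_x)·Lambda + det(C_x) = 0. The transcript does not reproduce the slide's actual covariance matrix, so the matrix below is an assumed illustrative one, chosen so its eigenvalues match the lecture's values of 2.67 and 0.33:

```python
import numpy as np

# Assumed illustrative covariance matrix (the slide's exact values are not
# given in this transcript); chosen so the eigenvalues come out as the
# lecture's 2.67 and 0.33.
C_x = np.array([[1.5, 7/6],
                [7/6, 1.5]])

# For a 2x2 matrix, det(C_x - Lambda*I) = 0 expands to the quadratic
# Lambda^2 - trace(C_x)*Lambda + det(C_x) = 0.
coeffs = [1.0, -np.trace(C_x), np.linalg.det(C_x)]
eigenvalues = np.sort(np.roots(coeffs))[::-1]  # largest first
print(eigenvalues)  # roots are approximately 2.667 and 0.333
```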
When matrices are added or subtracted, that is done element by element, as also seen here. For this simple two-dimensional case, the determinant is evaluated using the expression in gray on the right-hand side of the slide. When we use that formula, a quadratic equation in the unknown Lambda results, as shown. For more than two dimensions, the evaluation of the determinant is less straightforward, but software procedures are available for that purpose. The characteristic equation has the two solutions shown. What do they represent? The values of Lambda are the eigenvalues of the covariance matrix of the data, that is, of the pixels in the original coordinate system, in the original set of bands. We can therefore write down immediately the covariance matrix in the new principal components coordinates, which, remember, is the diagonal matrix of eigenvalues. This tells us that the variance of the data along the first of the new principal component axes is 2.67, and that along the second principal component is 0.33. Thus, we expect to see the data scattered predominantly along the first principal component, as expected from the diagram we used initially to start our thinking about the principal components transformation. It is common to express the amount of the variance accounted for by each of the principal components. Here we see that the first component accounts for 89% of the scatter of the data.

Having found the eigenvalues, we can now proceed to find the corresponding eigenvectors. When we have done so, we will then be able to form the transformation matrix G. There are as many eigenvectors as there are eigenvalues, one corresponding to each. Let's start with the largest eigenvalue, which we now call Lambda_1. The corresponding eigenvector is a solution to the vector equation shown on this slide. When we substitute in the values we know, we get a pair of simultaneous equations in the unknown components of the eigenvector.
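The 89% figure comes from dividing each eigenvalue by the sum of all eigenvalues, which is the total variance of the data. Using the lecture's own values:

```python
import numpy as np

# Eigenvalues from the lecture's example; their diagonal matrix is the
# covariance of the data in the principal components coordinates.
eigenvalues = np.array([2.67, 0.33])
C_y = np.diag(eigenvalues)

# Fraction of the total variance carried by each principal component.
fractions = eigenvalues / eigenvalues.sum()
print(fractions)  # first component ~0.89, second ~0.11
```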
Both equations are in fact the same, and yield the single relationship between the two eigenvector components shown. We now introduce a new piece of information: the eigenvectors have unit magnitude, as seen in the middle of the slide. The magnitude of a vector is the square root of the sum of the squares of its elements. Since that sum has to be unity, we don't need to worry about the square root operation; it is sufficient to say that the sum of the squared vector elements has to be unity. When we substitute into that the solution to the eigenvector equation above, we have unique values for the two components of the eigenvector, allowing the eigenvector to be written as shown in the second-last equation on the slide. If we go through the same process using the second eigenvalue, we get the last equation on the slide. We now write the two eigenvectors side by side in matrix form and then transpose that matrix to give the required matrix G, which transforms the original data set into the set of principal components. The second half of the slide applies the transformation matrix to the data set we are working with, yielding the new brightness values for the pixels in the new y coordinates.

Here we see the data plotted in the new principal components coordinates. Even visually, we can see that the data shows no obvious correlation. Its maximum spread, or variance, is in the first principal direction, and the second-largest spread is along the second principal axis. Note that the scales of the abscissa and ordinate are different here, which tends visually to mask the fact that the variance horizontally is much greater than the variance vertically. If we were dealing with data of high dimensionality, that trend of decreasing spread would continue, with each subsequent component having progressively less variance. We now need to see what all this means in the context of a real image example, which is the subject of the next lecture.
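The whole procedure, finding unit-magnitude eigenvectors, transposing them into G, and checking that G diagonalizes the covariance, can be sketched in a few lines. The covariance matrix here is again an assumed illustrative one (the transcript does not give the slide's values), chosen to reproduce the lecture's eigenvalues of 2.67 and 0.33, and the pixel vector is hypothetical:

```python
import numpy as np

# Assumed illustrative covariance matrix, chosen so the eigenvalues match
# the lecture's 2.67 and 0.33.
C_x = np.array([[1.5, 7/6],
                [7/6, 1.5]])

# eigh returns eigenvalues in ascending order along with unit-magnitude
# column eigenvectors; reverse so the largest-variance component is first.
vals, vecs = np.linalg.eigh(C_x)
vals, vecs = vals[::-1], vecs[:, ::-1]

# G is the transpose of the matrix of column eigenvectors.
G = vecs.T

# Transforming the covariance gives the diagonal matrix of eigenvalues:
# the covariance of the data in the principal components coordinates.
C_y = G @ C_x @ G.T
print(np.round(C_y, 6))  # diagonal entries ~2.666667 and ~0.333333

# Applying y = G x rotates a pixel vector into the new coordinates.
x = np.array([2.0, 2.0])  # hypothetical pixel brightness values
y = G @ x
```

Note that library routines may return an eigenvector with its sign flipped relative to a hand calculation; both signs are valid, and the variances in C_y are unaffected.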
Here, we reiterate the essential steps in producing a principal components transformation. The first two questions here are just to test your understanding of the meaning of the eigenvalues in principal components analysis. The last question is particularly important and will arise time and again whenever you use principal components analysis. It is a common question in any situation where we seek to discard low variance components of a transformed data set, not just with PCA, but with other transforms as well that can compress data variance into a small number of components.