With this set of lectures on accuracy assessment, we come to the end of our material on general remote sensing. Before completing the course, there is a set of lectures on remote sensing with imaging radar. Assessing how a classifier performs, and the accuracy of the thematic map it generates, is one of the most important topics in operational thematic mapping in remote sensing. We will see that classifier performance and map accuracy are not the same thing. This is a crucial matter in practical remote sensing. If significant economic decisions are going to be based on the thematic maps and class hectarages generated by the machine learning techniques we have covered in this course, then we have to be as sure as we can that we are interpreting the results correctly.

In this series of lectures, we will look at how we assess the performance of a classifier, the relationship between classifier performance and the accuracy of a thematic map, the number of testing samples that should be used to test thematic map accuracy, and the number of testing samples needed to estimate class areas accurately.

Recall from our previous work on machine learning methods that classifier performance is assessed by selecting a sample of testing pixels previously unseen by the classifier. The algorithm is applied to that set to check the performance of the classifier. In selecting the set of testing pixels, the analyst has to ensure that biases are not introduced by the fact that the areal sizes of the classes might be quite different. If care is not taken, more testing pixels in a random sample will come from large classes, whereas smaller classes will have fewer pixels with which to test the operation of the classifier. We will assume we have avoided such a bias in what we are now going to develop. The set of testing pixels is often referred to as reference data or, less frequently nowadays, as ground truth. The same terms are generally applied to training data.

It is common to express the results of testing the classifier by setting up an error matrix, as on the next slide. In the past it was sometimes called a contingency matrix or a confusion matrix, terms you may still find in use.

Here we see an error matrix which has been compiled from a thematic mapping exercise involving just three classes: A, B and C. The cells are populated according to whether the pixels have been placed into their correct class, assessed by comparing the classifier output with the reference data, or whether they have been put into the wrong class; that is, a classification error has been made. The rows represent the classifier. For example, it put 39 pixels into class A, but when we look at what the reference data tells us for each of those pixels, we see that some were really from class B and some from class C. The same thing happens for its labeling of pixels as class B pixels and class C pixels. Down the columns, we see how each of the pixels in the reference data has been handled by the classifier. For example, in the first column, the 50 pixels that are class A in the reference data have not all been put into that class by the classifier; some have been put into class B and others into class C. If there were no classification errors at all, the matrix would be diagonal, with zeros for all the off-diagonal entries. In such a case, the classifier and reference data agree on the label for every pixel.

In this slide, we just emphasize that the rows tell us about the classifier outputs.
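Before looking at the rows and columns in more detail, here is a minimal sketch, in Python with NumPy and not part of the original lecture, of how such an error matrix can be tallied by comparing the labels the classifier assigns with the labels in the reference data. The pixel labels below are made up purely for illustration.

```python
# Minimal sketch: tallying an error matrix from classifier and reference labels.
# Convention as in the lecture: rows are the labels assigned by the classifier
# (the thematic map), columns are the labels in the reference data.
import numpy as np

classes = ["A", "B", "C"]
index = {c: i for i, c in enumerate(classes)}

# Hypothetical labels for a handful of testing pixels.
map_labels = ["A", "A", "B", "C", "B", "C", "A", "B"]   # classifier output
ref_labels = ["A", "B", "B", "C", "B", "A", "A", "C"]   # reference data

error_matrix = np.zeros((len(classes), len(classes)), dtype=int)
for m, r in zip(map_labels, ref_labels):
    error_matrix[index[m], index[r]] += 1

print(error_matrix)   # diagonal entries are pixels on which the classifier and
                      # the reference data agree; off-diagonal entries are errors
```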
Remember that the classifier has labeled 39 pixels as class A, 50 as class B, and 47 as class C. So that is how many pixels for each class we will find on the thematic map. In contrast, the column sums, as we noted on the previous slide, are the numbers of pixels in the reference data set for each class.

We now introduce some new nomenclature. In placing some class A pixels into the other classes, the classifier has committed errors, so we refer to the off-diagonal entries along a row as errors of commission. The errors down a column represent pixels from the reference data that the classifier has failed to label properly. We call those errors of omission.

We now need to ask ourselves how to determine the accuracy of a classification exercise using the entries in the error matrix. Well, the answer actually depends on whether you are the user of the thematic map produced by the classifier or the analyst who ran the classifier to label the pixels which led to the map. They are not the same. A measure of how well the classifier has performed is how well it has labeled the reference pixels. If we choose class B as an example, the classifier was presented with 40 class B testing pixels and got 37 of them right. That is an accuracy of 92.5 percent. We call this the producer's accuracy, since it is the accuracy with which the results were produced. However, the user of the thematic map is interested in how many pixels labeled B on the map are correct. There are 50 class B pixels on the map, only 37 of which are correct; the others are the result of the classifier committing errors on class A and class C pixels. We call this the user's accuracy. For this example, it is 74 percent.

Here we see the full set of user's and producer's accuracies, along with a commonly used measure of classifier performance: the overall accuracy, which is computed as the total number of pixels labeled correctly by the classifier as a fraction of the total number of testing pixels. Note the range of producer's and user's accuracies compared with the overall accuracy.
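To make those definitions concrete, here is a minimal sketch in Python of computing the producer's, user's and overall accuracies from an error matrix. The matrix below is consistent with the totals quoted in the lecture (row sums of 39, 50 and 47; 50 class A and 40 class B reference pixels; 37 correctly labeled class B pixels), but the class A and class C diagonal entries are assumed values for illustration only, not the figures on the slide.

```python
# Minimal sketch: producer's, user's and overall accuracy from an error matrix.
# Rows are classifier (map) labels, columns are reference labels.
import numpy as np

# Illustrative matrix: consistent with the totals mentioned in the lecture,
# but the class A and class C diagonals are assumed, not taken from the slide.
error_matrix = np.array([
    [35,  2,  2],   # pixels the classifier labeled A
    [10, 37,  3],   # pixels the classifier labeled B
    [ 5,  1, 41],   # pixels the classifier labeled C
])

correct = np.diag(error_matrix)
producers_accuracy = correct / error_matrix.sum(axis=0)  # column-wise: reference pixels labeled correctly
users_accuracy = correct / error_matrix.sum(axis=1)      # row-wise: map pixels that are actually correct
overall_accuracy = correct.sum() / error_matrix.sum()

print("producer's:", producers_accuracy)   # class B gives 37/40 = 0.925
print("user's:    ", users_accuracy)       # class B gives 37/50 = 0.74
print("overall:   ", overall_accuracy)
```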
There is an alternative technique for assessing the accuracy of a classifier which does not use a separate set of testing or reference pixels. It is called cross validation and involves the following steps. First, a single labeled set of reference pixels is used for both training and testing; it is divided into k separate, equally sized subsets. Next, one subset is put aside for accuracy testing, and the classifier is trained on the pixels in the remaining k minus 1 subsets. The process is then repeated k times, with each of the k subsets excluded in rotation. At the end of those k trials, k different checks of classification accuracy have been generated. The final classification accuracy is the average of the k trial outcomes. A variation of this approach is to use subsets of size one. It is then called the leave-one-out approach. There will be as many trials as there are reference pixels. Again, after all the trials have been performed, we will have an estimate of accuracy.

By way of summary, note first that correctly assessing the accuracy of a thematic map is an essential step in operational remote sensing. Next, accuracy assessment is carried out using labeled testing pixels, better referred to as reference data. The error matrix summarizes the performance information needed to assess classification accuracy. There is a difference between the producer's accuracy, that is, how well the classifier has performed, and the user's accuracy, that is, the accuracy of the thematic map produced by the classifier. Cross validation is an alternative means of assessing classifier performance. Finally, the leave-one-out approach is a special case of cross validation often found in practice.

These questions direct your attention to understanding fully the concepts involved in assessing classifiers and thematic maps. Pay particular attention to the third question about the cost overheads of the cross validation approach.
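In connection with that third question, the following is a minimal sketch of k-fold cross validation and its leave-one-out special case using scikit-learn on synthetic data; both the data and the choice of a Gaussian naive Bayes classifier are assumptions made purely for illustration. The accuracies it reports are meaningless because the labels are random, but it makes explicit how many times the classifier must be trained under each approach.

```python
# Minimal sketch: k-fold cross validation versus leave-one-out, with the
# number of classifier trainings each approach requires made explicit.
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))        # 120 synthetic "reference pixels", 4 bands
y = rng.integers(0, 3, size=120)     # random labels for 3 classes (illustration only)

clf = GaussianNB()

# k-fold: train k times, each time holding one of the k subsets out for testing.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
kfold_scores = cross_val_score(clf, X, y, cv=kfold)
print("5-fold accuracy:", kfold_scores.mean(), "from", kfold.get_n_splits(X), "trainings")

# Leave-one-out: subsets of size one, so one training per reference pixel.
loo = LeaveOneOut()
loo_scores = cross_val_score(clf, X, y, cv=loo)
print("leave-one-out accuracy:", loo_scores.mean(), "from", loo.get_n_splits(X), "trainings")
```

The contrast between the five trainings of the 5-fold case and the one training per reference pixel required by leave-one-out is the cost overhead the third question asks you to consider.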