In this lecture we are going to look at training methods for linear classifiers, and in particular the minimum distance method. We will then use that material to help us understand and develop training methods for more complex classification techniques. Remember, we are assuming at this stage that the two classes of pixel that we're dealing with are linearly separable. That is, a straight line can be placed between them. Many datasets, however, are not linearly separable. We will meet some later, but recall for the moment that the maximum likelihood classifier is able to separate datasets with at least quadratic hypersurfaces.

There are many acceptable linear decision surfaces that can be placed between two linearly separable classes, as illustrated in this slide. One of the earliest methods for training, which goes back to the 1960s, involves choosing an arbitrary linear surface. That choice will almost certainly not be in the right position, in that it will not have the classes on the correct sides of the hyperplane, but then, by repeated reference to each training pixel in turn, the hyperplane is gradually iterated into an acceptable position. The book by Nilsson referenced here shows that method in detail. Note the restriction, however: we are only dealing with two data classes. We will have to embed this method into some form of multi-class process later on.

A better approach might be to choose as a separating hyperplane the one which is the perpendicular bisector of the line joining the means of the classes, as shown here. We can find that line as the locus of the points that are equidistant from the two class means. Note that we use the nomenclature d(x, m_i) to represent the distance between the point x and the class mean m_i. If the distances are equal, then so are their squares, saving the need to compute the expensive square root operation in software. So we equate the squares of the two distances from the pixel position x to the class means, leading to the equation of the linear surface at the bottom of the slide. Note that it has the same structural form as the equation of a straight line that we are familiar with in two dimensions. Note also that we have had to use two vector identities in coming to this result.

Although we have computed the equation of the decision surface in the minimum distance rule, we actually don't compute the hyperplane explicitly. Instead, to label an unknown pixel, we just compare the squared distances to the class means and allocate the pixel to the class of the closest mean. That suggests that we can actually account for as many classes as we need to. In this slide we have shown three classes and given a general decision rule for the minimum distance classifier.

So, in summary, with the minimum distance classifier: one, training data is used to estimate the class means; fewer pixels are needed compared with the maximum likelihood classifier, since no covariance matrix estimation is required. Two, unknown pixels are allocated to, or labelled as, the class of the closest mean. Three, it is a multi-class technique. And four, it is fast in both training and classification. A rule along these lines is sketched below.
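To make the derivation concrete, equating the squared distances, d(x, m_1)^2 = d(x, m_2)^2, and expanding each side with the identity (x − m)·(x − m) = x·x − 2 m·x + m·m (the common x·x term cancels) gives 2(m_2 − m_1)·x + (m_1·m_1 − m_2·m_2) = 0, which has the same structural form as a straight line, w·x + w_0 = 0; the exact arrangement of terms on the lecture slide may differ. Below is a minimal sketch in Python of the classifier itself, under the same idea of comparing squared distances to class means. The function names, the NumPy array layout and the tiny three-class example are illustrative assumptions, not material from the lecture.

```python
import numpy as np

def class_means(training_pixels, training_labels):
    """Training: estimate one mean vector per class from labelled pixels.

    training_pixels: array of shape (num_pixels, num_bands)
    training_labels: array of shape (num_pixels,), one class name per pixel
    """
    classes = np.unique(training_labels)
    return {c: training_pixels[training_labels == c].mean(axis=0) for c in classes}

def minimum_distance_classify(pixels, means):
    """Classification: label each pixel with the class of the closest mean.

    Squared Euclidean distances are compared directly, so the expensive
    square root never has to be computed.
    """
    classes = list(means)
    # squared_d[i, j] = squared distance from pixel i to the mean of class j
    squared_d = np.stack(
        [((pixels - means[c]) ** 2).sum(axis=1) for c in classes], axis=1
    )
    return np.array(classes)[squared_d.argmin(axis=1)]

# Tiny illustrative example: three spectral classes in two bands.
train_x = np.array([[10.0, 12.0], [11.0, 13.0],   # "water" pixels
                    [40.0, 42.0], [41.0, 40.0],   # "vegetation" pixels
                    [80.0, 20.0], [82.0, 22.0]])  # "soil" pixels
train_y = np.array(["water", "water", "vegetation", "vegetation", "soil", "soil"])

means = class_means(train_x, train_y)
unknown = np.array([[12.0, 11.0], [79.0, 21.0]])
print(minimum_distance_classify(unknown, means))  # -> ['water' 'soil']
```

Note how little is involved: training is just a per-class mean, and classification is a nearest-mean lookup, which is why the method is fast in both phases and extends naturally to any number of classes.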
We are now at the stage where we can look in summary at the two classifiers we have treated so far and see the steps the user follows in practice. Although in our lectures we have looked at the mathematics of the techniques, you do not need to know that material in practice, although it does help you understand how the algorithms work and their comparative limitations. Note that three of the steps are common to both approaches, as highlighted in blue, and will also be common to any of the classifiers we look at in this course. They are training, thematic mapping and accuracy assessment. The approaches differ only in how the classifiers are trained and how they are applied to unseen pixel data. If class conditional probabilities are used with the maximum likelihood method, there is an assumption that prior probabilities are available. Again, the software looks after that step, provided the user can assign values to the priors.

Note particularly the last step. Rarely, in a real classification task, will the accuracy on all classes be acceptable the first time around. Instead, the analyst may have to consider whether some classes have been missed, leading to their pixels being misallocated to another class, or whether some classes are too broad spectrally and should perhaps be subdivided into constituent subclasses or spectral classes. Again, we will have more to say about this when we look at classification methodologies in module 3.

We are going to meet a number, but not all, of the classifiers that are used regularly in remote sensing. Some are quite complex and require a lot of user effort to make them work effectively, but they can give exceptionally good results. When selecting a method, though, it is important not to go overboard in always choosing the newest and potentially the most complex one. Often the simpler algorithms will work just as effectively in many real-world applications. Remember what we said at the start of this course? While we are necessarily spending a lot of time developing classifier algorithms, our course objectives are remote sensing and its applications. So we need to keep that in mind when evaluating classifier methods. Ultimately, we will have to embed them into a methodology, which we will do in module 3. The first two questions here just test your knowledge of vector algebra, while the last two set you up for what is to come.