We are now at the stage in our course where we can start seriously looking at how to interpret images recorded by remote sensing missions, particularly using computer-based algorithms. Ultimately, this will take us on a mathematical journey, sometimes quite complex, but we will ensure that if you do not have the required mathematical background, the accompanying description should be sufficient for you to understand how the techniques operate. We start, though, with a descriptive outline of the essential aspects of computer image analysis, which in remote sensing is often called classification or sometimes quantitative analysis. The first concept to keep in mind is that we almost always work at the level of the individual pixel. There are procedures that take a different approach, but the great majority that we will encounter focus on the individual pixel because it is the smallest quantifiable element in an image. Remember, for each pixel we have a set of recordings composed of the brightness values detected in each of the individual wavebands. That set of measurements is collected together and written in the form of a column surrounded by square brackets, which we call a column vector or a pixel vector. In classification, we use the properties of the pixel vector to help find the label for the pixel, a label that represents one of the classes of ground cover in which we might be interested. For example, if we wanted to use remote sensing to map the natural landscape, we might employ computer classification to analyze each pixel in the image and label each as grass, soil, water, forest, and so on. Once all pixels are labeled, we then have a map of cover types which we call a thematic map because it is a map of themes or classes. The labels are specified by the end user. What the image analyst has to do is come up with a computer-based approach that allows those labels to be applied to the pixels through an analysis of the pixel vectors.
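The pixel vector idea can be sketched in a few lines of code. This is a minimal illustration, assuming a hypothetical four-band sensor with invented brightness values; the column shape mirrors the square-bracket column notation described above.

```python
import numpy as np

# Hypothetical four-band sensor: the brightness values recorded for one
# pixel are collected into a column vector -- the "pixel vector".
pixel_vector = np.array([[52],    # band 1 brightness (invented value)
                         [47],    # band 2 brightness
                         [38],    # band 3 brightness
                         [112]])  # band 4 brightness

print(pixel_vector.shape)  # (4, 1): four wavebands, one column
```

Classification algorithms then operate on these vectors rather than on single brightness values.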
In this slide, we develop several of the most important concepts in image analysis. To do so, we have assumed for convenience that the sensor we are dealing with has only two wavebands, one in the visible red and one in the near infrared. Again for convenience, we assume that the scene being imaged by the remote sensing instrument consists of just three natural cover types: vegetation, soil, and water. Invariably, the first step in any classification process is for the analyst to find sets of pixels in the recorded image for which the cover type is known. In the upper left-hand depiction of the scene, we have indicated by the shaded shapes that the analyst knows that the pixels in those regions are actually pixels of vegetation, soil, and water, as indicated. We now need to remember something from the previous lecture, and that is the shapes of the spectral reflectance curves for the three cover types of interest. They are shown on the bottom left-hand side of the slide, and in each case several examples of reflectance curves are given, indicating that in practice there is natural variability among pixels, even of the same cover type. If our sensor takes measurements in the red and the infrared, that amounts to sampling the cover type reflectance curves at those wavelengths, as seen in the figure. For the vegetation pixels, the red samples will have low brightness, whereas the infrared samples will have high brightness values. By contrast, the samples for soil would be moderately bright in both bands and those for water will be moderately dark in both bands. We now introduce one of the most important concepts in classification: the representation of pixels in a geometric space defined by the sets of spectral samples. Such a space is shown on the right-hand side of the slide, in which individual pixels are plotted according to their sets of brightness values. This is just a Cartesian coordinate system in which the axes are the pixel brightness values.
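The spectral behaviour just described can be simulated numerically. This is a sketch with invented brightness values on a 0-255 scale: vegetation dark in red and bright in the near infrared, soil moderately bright in both, water moderately dark in both, with Gaussian scatter standing in for the natural variability among pixels of the same cover type.

```python
import numpy as np

rng = np.random.default_rng(42)

# Each row is one pixel as [red, near-infrared] brightness.
# The mean positions and scatter are invented for illustration.
vegetation = rng.normal([30, 180], 8, size=(25, 2))  # dark red, bright NIR
soil       = rng.normal([120, 130], 8, size=(25, 2))  # moderate in both
water      = rng.normal([60, 40], 8, size=(25, 2))   # moderately dark in both

# Plotting red against near-infrared for every row places each pixel in
# the two-dimensional spectral space, where the three cover types form
# separate clusters.
```

Scatter-plotting these arrays reproduces the clustered structure shown on the right-hand side of the slide.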
If we now return to the sets of pixels for which we know the correct labels, we can plot them in our coordinate system as shown on the diagram. Because pixels from the same cover type will have similar spectral reflectance curves, when they are plotted in our coordinate system they will tend to group or cluster as indicated. Effectively, those pixels for which we know the labels define regions in the coordinate system as their own. There is a region in which we expect to find water pixels, a region in which we expect to find pixels of vegetation, and a region in which we expect to find pixels of soil. In other words, the pixels for which we know the labels effectively segment the coordinate system into a set of regions corresponding to the classes of interest. We could make those regions explicit by drawing lines or boundaries that separate the classes as indicated. We have now introduced some terminology. Besides pixel vector, we use the term class to describe a particular cover type on the ground, and we call the coordinate system a spectral space. The process of using pixels for which the class labels are known to find regions in the spectral space corresponding to each class is called training. That is the first step in classification: to use known data to find corresponding regions in the spectral space. How does the analyst know the labels for some pixels? That can sometimes be a long and expensive task depending on the complexity of the exercise. Often, field visits are required so that the class labels for groups of image pixels can be found. Sometimes, particularly in simpler classification exercises, human image interpretation or even the analysis of accompanying photos can be used to establish the training pixels. Finally, this example has been based on a two-band sensor, which has led to a two-dimensional spectral space. In practice, of course, the dimensionality can be much larger.
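The training step can be sketched as follows. The training pixels here are invented [red, NIR] brightness values whose labels are assumed already known, for example from field visits or photo-interpretation; each class's cluster is summarised by its mean position in the spectral space, one simple way of characterising the class regions.

```python
import numpy as np

# Invented training pixels ([red, NIR] brightness) with known labels.
training = {
    "vegetation": np.array([[28, 175], [33, 182], [30, 178]]),
    "soil":       np.array([[118, 128], [124, 133], [121, 130]]),
    "water":      np.array([[58, 38], [62, 42], [60, 41]]),
}

# Training: summarise each labelled cluster by its mean position.
class_means = {label: pixels.mean(axis=0) for label, pixels in training.items()}

for label, mean in class_means.items():
    print(label, mean)
```

More sophisticated training characterises each region by its spread as well as its mean, but the mean alone is enough for the simple classifier discussed next.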
A sensor with 10 wavebands will generate a 10-dimensional spectral space. While we can easily envisage a two- and even three-dimensional space, we are not equipped mentally with the capacity to go beyond that, but that is not a problem. While we will develop many of our techniques on two- and three-dimensional examples, the corresponding models and mathematics are easily extensible to any dimension, so all we have to do is develop our ideas in, say, two dimensions and understand that mathematically any number of dimensions or wavebands can be handled. We can now use the results of the training stage of the previous slide to help us add labels to all of the other pixels in an image, that is, those we do not currently have labels for. This slide shows the process. We take an unknown pixel and plot it in the spectral space according to its brightness values. Because, using the training pixels, we had previously segmented the spectral space into regions representing each of the cover types, we can now add a label to the new pixel by seeing where it falls in the space. In this particular case, we have labeled a vegetation pixel. As we noted, the step in the previous slide was called training. This second step is called classification, labeling, or sometimes generalization. In summary, classification or quantitative analysis is a two-stage process. The first stage is understanding the structure of the spectral space by using sets of training pixels. The second is to use the results of the training to enable us to add labels to pixels we do not yet know about. Looking at the diagrams in this lecture, we could create a simple classifier by setting up the equations that describe the lines separating the different segments in spectral space. Indeed, a very simple and early classifier did just that by making the lines lie midway between the mean positions of two groups of pixels. See the quiz question following.
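The simple early classifier just mentioned can be sketched directly: labelling each unknown pixel with the class of the nearest mean is equivalent to separating the classes with lines lying midway between pairs of means. The class means below are invented [red, NIR] values consistent with the earlier two-band example.

```python
import numpy as np

# Training results (invented [red, NIR] mean brightness per class).
class_means = {
    "vegetation": np.array([30, 178]),
    "soil":       np.array([121, 130]),
    "water":      np.array([60, 40]),
}

def classify(pixel):
    """Label a pixel by the nearest class mean (Euclidean distance).

    The implied decision boundaries are the lines midway between
    each pair of class means.
    """
    return min(class_means, key=lambda c: np.linalg.norm(pixel - class_means[c]))

print(classify(np.array([35, 170])))   # → vegetation
print(classify(np.array([115, 125])))  # → soil
```

This works in any number of dimensions: with a 10-band sensor the pixel vectors and means simply become 10-element arrays and the same nearest-mean rule applies.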
We're going to look at much more complex classifiers in this course because of their improved performance. The questions in this quiz focus your attention on the structure of the spectral space. Even though most of the time the dimensionality is too high for us to envisage, it is nevertheless important to understand as much as possible about the structure of the data domain with which we are dealing.