We now come to the fourth of the classification techniques we are going to consider in this course. We will develop it in two stages: first in its original form, and then in its return, in a more powerful and flexible form, over the past five to ten years. The neural network, sometimes called the artificial neural network, was popular in remote sensing in the 1990s. But because it was complex to design and required significant computational power, it was overtaken by other techniques, such as the support vector machine. However, with some simple modifications that led to improvements in performance and training, it has gained popularity again over the past decade, now called the convolutional neural network. These later variants also go under the name of deep learning, although that description could just as easily have applied to the neural network in its original form. As with the support vector machine, we start by returning to the simple linear classifier. Again, we will do our development with a simple two-dimensional, two-class situation, but it will generalize to any number of classes and dimensions. And again, we use our standard decision rule for linear classifiers, which we will now represent diagrammatically in the form of what is called a threshold logic unit, or TLU. In this diagram, the elements up to the thresholding block create the linear function used in the decision rule. The thresholding operation then checks whether the value of the function is positive or negative, and thus whether the pixel vector is in class 1 or class 2, as required by the algebraic expression of the decision rule. Sometimes we represent the overall operation as a single block called the TLU. As noted on the previous slide, this was one of the building blocks used in early machine learning theory, which led to a classification machine called the perceptron. It is also where we start the development of the neural network.
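The threshold logic unit just described can be sketched in a few lines of code. This is only an illustration of the hard-limiting decision rule for the two-class case; the function name, the weight values, and the class labels 1 and 2 are assumptions for the example, not anything taken from the slides.

```python
import numpy as np

def tlu(x, w, w_offset):
    """Threshold logic unit: hard-limits the linear decision function.

    Returns class 1 if w.x + w_offset is positive, otherwise class 2,
    mirroring the sign test in the algebraic decision rule.
    """
    z = np.dot(w, x) + w_offset
    return 1 if z > 0 else 2

# A made-up 2-dimensional, 2-class example
w = np.array([1.0, -1.0])
print(tlu(np.array([3.0, 1.0]), w, 0.5))  # positive side of the boundary -> 1
print(tlu(np.array([0.0, 2.0]), w, 0.5))  # negative side -> 2
```

The thresholding block of the diagram corresponds to the final `if z > 0` test; everything before it is the linear function.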
A breakthrough came when the hard limiting operation in the TLU was replaced by a softer function, and in particular one that could be differentiated. As we will see, that allows a training procedure to be derived, which is not otherwise possible. We call the soft limiting operation an activation function. Typical examples include the inverse exponential operation, or sigmoid, and the hyperbolic tangent, as shown in this slide. As can be seen, the behavior still represents what we want in terms of specifying the class to which a pixel belongs, because it implements our decision rule, but without a hard limit. The old TLU, but with a soft limiting operation, is now called a processing element, or PE. In the nomenclature of neural networks, we also replace the offset w subscript N+1 by the symbol theta. As in the bottom right hand drawing, we will write the output of the processing element as g = f(z), where the function f is the chosen form of the activation function, and z is the linear function in our normal decision rule. The classical neural network, and the one that was applied widely in remote sensing, is called the multilayer perceptron, or MLP. It is composed of layers of processing elements which are fully connected with each other, as seen in this slide. The blocks in the first layer are not actually processing elements; they just distribute the input pixel vector elements to each of the processing elements in the next layer. The outputs of those processing elements then form the inputs to another layer of PEs, and so on, for as many layers as chosen by the user. The outputs from the last layer of PEs determine the class of the pixel vector fed into the first layer. The user can choose how that is done. Options are that each class could be represented by a single output, or that the set of outputs could represent, for example, a binary code that specifies the class. Note the nomenclature used with the layers of a neural network.
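A single processing element, g = f(z) with z the usual linear function, can be sketched as follows. The sigmoid is used here simply because it is one of the two activation functions mentioned; the input and weight values are invented for the example.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: a differentiable soft limiter replacing the hard threshold."""
    return 1.0 / (1.0 + np.exp(-z))

def processing_element(x, w, theta, activation=sigmoid):
    """Processing element (PE): computes g = f(z), where z = w.x + theta.

    Variable names follow the lecture's notation (theta is the offset
    that replaces w subscript N+1).
    """
    z = np.dot(w, x) + theta
    return activation(z)

g = processing_element(np.array([0.4, 0.7]), np.array([1.0, -0.5]), 0.1)
# g lies between 0 and 1; outputs above 0.5 correspond to z > 0,
# the positive side of the original hard decision rule.
```

Note that on the decision boundary, where z = 0, the sigmoid output is exactly 0.5, so the soft limiter still encodes the same class separation as the TLU.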
In particular, the first layer which does any real analysis is called the first hidden layer. Note also the letter designations we apply to the layers. And while we have shown only one, there can be many hidden layers, each being fed from the outputs of the previous layer. As the number of hidden layers increases, training the neural network becomes increasingly time consuming. In many remote sensing exercises of several decades ago, only one hidden layer was used, and it was found sufficient to handle many situations. But training still took a long time, as we will see later. Having chosen a topology, we now need to work out how to find its unknown parameters: that is, the weights w and the offsets theta for each processing element. As with all supervised classifiers, that is done through training on labeled reference data. But to make that possible, we need to understand the network equations, so that the training process can be derived. That is now our immediate objective. The equation describing each processing element is, as we noted earlier, g = f(wᵀx + theta), where wᵀx + theta is the linear function. However, we need a naming convention to keep track of where the inputs come from and where the outputs go. We do that by adding subscripts to the weights and offsets, as shown here. The nomenclature is simplified in that it is layer specific, but not PE specific. We could add a third subscript to indicate each actual PE in a layer, but that turns out to be unnecessary: we can derive the relevant network equations for training without going to that added complexity. To start developing the training procedure, we need a measure of what we are trying to achieve. Clearly, if the network is functioning well as a classifier, we want the output to be accurate when the network is presented with a previously unseen pixel vector. We check the network's performance by setting up an error measure.
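The layer-by-layer flow described above, with each layer's outputs feeding the next layer's inputs, can be sketched as a simple forward pass. The topology (2 inputs, 3 hidden PEs, 2 outputs) and the random weight values here are purely illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_pass(x, layers):
    """Propagate a pixel vector through fully connected layers of PEs.

    `layers` is a list of (W, theta) pairs, one per layer;
    W holds one row of weights per PE in that layer, so the
    subscripting is layer specific, as in the lecture's convention.
    """
    out = x
    for W, theta in layers:
        out = sigmoid(W @ out + theta)  # g = f(z) for every PE in the layer
    return out

rng = np.random.default_rng(0)
# Illustrative topology: 2 inputs -> 3 hidden PEs -> 2 output PEs
layers = [(rng.normal(size=(3, 2)), rng.normal(size=3)),
          (rng.normal(size=(2, 3)), rng.normal(size=2))]
outputs = forward_pass(np.array([0.2, 0.8]), layers)
# With one output per class, the larger output indicates the predicted class.
```

Training, discussed next, amounts to adjusting every W and theta in `layers` so that these outputs match the target labels of the training pixels.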
That measure looks at how closely the actual output matches what we expect when a training pixel is fed into the network. We choose for that measure the squared difference error measure shown in the center of the slide. Remember, the set of actual outputs of the network are g subscript k. This measure tells us how well those actual outputs match the desired, or target, outputs t subscript k for a given training pixel vector. In the next lecture, we will use that expression to help set up the necessary equations with which we can train the neural network. Clearly, our objective is to find the unknowns, the weights and offsets, that minimize the error measure. In other words, we want the actual class labels to match, as nearly as possible, the correct or target labels. In this part of the course, we are examining the use of the popular multilayer perceptron as a classifier in remote sensing. It consists of layers of processing elements, where each PE contains a differentiable activation function. We are now at the stage where we can set up the network equations. The second question here is important, because it starts to develop a feel for the complexity of training a neural network.
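As a small numerical illustration of the squared difference error measure, the sketch below compares the actual outputs g_k against target outputs t_k for one training pixel. The slide's exact expression is not reproduced here; the 1/2 factor is an assumption, included because it is a common convention that simplifies the derivative used in training. The output and target values are made up.

```python
import numpy as np

def squared_error(g, t):
    """Squared difference error for one training pixel.

    g: actual network outputs g_k; t: target outputs t_k.
    The 1/2 factor is assumed (it cancels neatly on differentiation).
    """
    g, t = np.asarray(g), np.asarray(t)
    return 0.5 * np.sum((g - t) ** 2)

# Target coding with one output per class: this pixel belongs to class 1
t = np.array([1.0, 0.0])
g = np.array([0.8, 0.3])   # made-up actual network outputs
print(squared_error(g, t))  # -> 0.065
```

Training searches for the weights and offsets that drive this quantity toward zero over the whole training set, which is exactly the minimization objective stated above.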