So in a previous lecture, we introduced some code and some equations that allowed us to balance model performance versus model complexity. So we had some objective that said, how accurate is the model versus how complex is the model. In-between those two things, we had some trade-off parameter. We didn't know how to set the trade-off parameter yet. So the purpose of this lecture is going to be to introduce the concept of a validation set, which we might use to select those trade-off parameters, and then we'll explore the relationship between parameters like Theta and these hyperparameters like Lambda, which is a trade-off, and finally, we'll introduce the complete training, validation, and test pipeline. Okay. So just to recap what we saw in the last few lectures. We saw how the training set can be used to evaluate model performance, but it can only do so on data that we've seen before. We need to introduce a test set if we'd like to estimate how well the model will actually generalize to unseen data, and we saw how a regularizer can be used to mitigate overfitting. In other words, how I can balance or trade-off model performance versus model complexity. So specifically, how we did that in a previous lecture, was to optimize an equation that looks like the following. We have on the left-hand side of this equation, a mean squared error, which says essentially how accurate is a particular model defined by the parameter vector Theta, and on the right-hand side, we have this term that penalizes model complexity. In this case, we're penalizing the sum of squared parameters, which encourages our model to choose parameters that are approximately uniform or close to zero. We could instead use something like the sum of absolute values. But in any case, we have one part of the objective that rewards accuracy and another part which penalizes complexity. So we would like both of these things to be low.
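As a minimal sketch, the objective described above, a mean squared error plus Lambda times the sum of squared parameters, might be written as follows. The variable names (`X`, `y`, `theta`, `lam`) are illustrative choices, not from the lecture.

```python
import numpy as np

def regularized_objective(theta, X, y, lam):
    """MSE plus an L2 penalty on the parameters, weighted by lam."""
    residuals = X @ theta - y
    mse = np.mean(residuals ** 2)      # accuracy term: lower means more accurate
    complexity = np.sum(theta ** 2)    # complexity term: sum of squared parameters
    return mse + lam * complexity

# Tiny example: two points on the line y = 2x, fit exactly by theta = [2].
X = np.array([[1.0], [2.0]])
y = np.array([2.0, 4.0])
theta = np.array([2.0])
print(regularized_objective(theta, X, y, lam=0.0))  # 0.0: perfect fit, no penalty
print(regularized_objective(theta, X, y, lam=0.5))  # 2.0: penalty is 0.5 * 2^2
```

Swapping `np.sum(theta ** 2)` for `np.sum(np.abs(theta))` gives the absolute-value penalty mentioned as an alternative.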
The MSE should be low or the model should be accurate, and the complexity should be low. So the right-hand side of the equation should be small as well. Then, we have this value in the middle, Lambda, which trades-off how much accuracy we want versus how much complexity we want. Okay. So what value should Lambda take? We would like some value of this regularization parameter that gives us good model accuracy, or low MSE, on the left-hand side, and also low complexity, or a low sum of squared parameters, on the right-hand side. But we can't select Lambda using the training set, since the training error alone would always prefer less regularization, and we can't use the test set, since we're only allowed to look at it once. Okay. So to get around this, we'd have to introduce a third partition of our data, which is going to be similar to the test set, in the sense that it's not actually used to train our model or select the value of Theta. But we are going to be allowed to look at this set multiple times in order to choose the best hyperparameters like Lambda, and this is going to be called a validation set. Okay. So comparing this to the training and test sets we developed before, we now have three partitions of our data; a training set, a validation set, and a test set. Again, if we have some model which has a feature matrix X and a label vector y, the training, validation, and test data will just correspond to three different partitions of that matrix. So some fraction of that matrix is used for training, some fraction for validation, and some fraction for testing, and the same corresponding fraction of the label vector will be used for training, validation, and testing. Okay. So just to summarize what the purposes of these data sets are. The training data is going to be used to optimize our model, it's going to be used to choose the best possible values of Theta given some particular value of Lambda. So given a value of Lambda which trades-off model accuracy versus model complexity, the training data will be used to select the best value of Theta for that trade-off level.
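The two steps above, partitioning the feature matrix and label vector into three sets, and using the training set to choose Theta for a given Lambda, might be sketched as follows. The 60/20/20 split is an arbitrary illustrative choice, and the closed-form ridge solver is just one concrete way of minimizing an MSE-plus-squared-parameters objective; neither is prescribed by the lecture.

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the rows once, then carve X and y into three partitions."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))

def fit_ridge(X, y, lam):
    """Given Lambda, the training data selects Theta: for the MSE plus
    lam * sum(theta^2) objective this has the closed form
    theta = (X^T X + n * lam * I)^-1 X^T y."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)
```

Note that only the training partition is ever passed to `fit_ridge`; the other two partitions play their roles later.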
Now, the validation data will be used to say: okay, I've just trained a model on my training set; how well do I think it's going to perform on new data? This is roughly a measure of the model's generalization ability. The only difference is we're allowed to look at this data multiple times. It's not the true held-out performance; it's just similar to a held-out data set. So we can keep looking at the performance on our validation data, to say which value of Lambda is going to be the best one in terms of our accuracy-complexity trade-off. Then only once we've selected our favorite or the best value of Lambda in terms of its validation performance, do we then look at the test accuracy. So we're really only allowed to look at our test data once if we truly want to measure generalization ability in order to evaluate the model right at the very end. Okay. So in this lecture, we introduced this concept known as a validation set, which we can use to tune what are called hyperparameters. So these are parameters that are not selected from the training set, but rather they tend to correspond to things like trade-off parameters or different model choices we might make, and we'd like to evaluate the estimated generalization ability using some held-out data. Okay. So in the following lectures, we'll look at the full pipeline of how training, validation, and test sets can be used to optimize model performance.
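The selection loop described above can be sketched as follows: train once per candidate Lambda, score each fit on the validation set (which we may look at repeatedly), keep the best, and only then touch the test set a single time. The closed-form ridge solver inside `fit_model` is one illustrative choice consistent with an MSE-plus-squared-parameters objective, and the candidate Lambda grid is made up for the example.

```python
import numpy as np

def mse(theta, X, y):
    return np.mean((X @ theta - y) ** 2)

def fit_model(X, y, lam):
    # Illustrative choice: closed-form ridge regression for this Lambda.
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

def select_lambda(train, val, lambdas):
    """Train on the training set for each Lambda; pick the Lambda whose
    model has the lowest validation MSE. The validation set is consulted
    once per candidate, which is allowed; the test set is not touched."""
    (Xtr, ytr), (Xva, yva) = train, val
    best_lam, best_theta, best_err = None, None, np.inf
    for lam in lambdas:
        theta = fit_model(Xtr, ytr, lam)  # training data chooses Theta given Lambda
        err = mse(theta, Xva, yva)        # validation data scores this Lambda
        if err < best_err:
            best_lam, best_theta, best_err = lam, theta, err
    return best_lam, best_theta

# Only after Lambda and Theta are fixed do we evaluate on the test set,
# exactly once, e.g.: test_err = mse(best_theta, Xte, yte)
```

If the test set were consulted inside the loop instead, it would stop being a measure of true generalization, which is exactly the point the lecture makes.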