In this lecture, we'll introduce the concept of training versus testing. In a previous lecture, we saw that there could be some dangers in evaluating a model on the same data that was used to train it. In other words, we could be overly confident about the model's correctness and then find that it's actually not very accurate when we apply it to unseen data. So in this lecture, we'll discuss the importance of evaluating models on fresh, unseen data, and I'll also show how to adjust the Python code we saw previously to make use of these new ideas.

Okay. So previously, we saw a simple classification example where we got very good performance from our classifier. But we saw one possible issue: we were evaluating the classifier on exactly the same data that was used to train the system, and that may lead to overestimation of its performance. The model may, for instance, have just memorized the data, but not actually perform so well on new data. So really, what we would like to know is how well the method is going to work on unseen data.

Okay. So in order to estimate this, we would like to split our dataset into two components. One is going to be called the training set, which we'll use to train our machine learning model just like we did before. But we'll now have a new set called the test set, which is going to be used to estimate how well the model is likely to perform once we expose it to new data. In course three, we're going to examine both of these concepts in a lot more detail. But for the moment, we'll just quickly adapt our previous Python code to incorporate these ideas and see how much effect it really has. Does it actually change our estimate of how good our classifier is?

Okay. So first of all, we read our dataset as we did in previous lectures. I'm reading this dataset, skipping the first few lines until I reach the dataset's real header. Then I'm reading it line by line, converting all of the features to floating-point values and converting the label to a bool: a true or false value.

All right. Now, the first thing we want to do in order to build our training and test sets is to randomly shuffle our data. What you would not want to do, for example, is to say that my training set is the first half of my data and my test set is the second half of my data, because there could be distinct characteristics between those two samples of your dataset. Imagine, for instance, that your dataset was sorted alphabetically, or sorted by time, or grouped, in this instance, by different types of companies. It could even be the case that all of the companies that went bankrupt appear in the first half of the data and none of the companies that went bankrupt appear in the second half. In that case, it would be totally invalid to train a model on the first half, evaluate it on the second half, and then think that the evaluation you got from the second half was representative of the model's overall performance. Instead, you want these two samples to look roughly similar to each other; you want them to contain independent samples of the data points. So the way we do that is to randomly shuffle our dataset and then build two samples that each contain half of the data.

Okay. So now that we have our randomly shuffled dataset, we build our training and test instances. So we build a training set and a test set, both the features and the labels, as in the sketch below.
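To make this concrete, here is a minimal sketch of the whole adapted script that the rest of this lecture walks through. The file name, the "@data" marker, and the column layout are assumptions on my part, so adjust them to match the bankruptcy dataset from the previous lectures.

```python
import random
from sklearn.linear_model import LogisticRegression

# Placeholder path and header marker -- adjust both to match the bankruptcy
# dataset used in the earlier lectures
path = "bankruptcy_data.arff"

dataset = []
with open(path) as f:
    for line in f:                            # skip the preamble...
        if line.strip().lower() == "@data":   # ...until the real data begins
            break
    for line in f:                            # then read the data line by line
        if not line.strip():
            continue
        fields = line.strip().split(",")
        features = [float(x) for x in fields[:-1]]  # features as floats
        label = fields[-1] == "1"                   # label as a bool
        dataset.append((features, label))

# Randomly shuffle so the two halves are independent samples of the data
random.shuffle(dataset)

X = [features for features, label in dataset]
y = [label for features, label in dataset]

# Non-overlapping split: first half for training, second half for testing
N = len(dataset)
X_train, y_train = X[:N // 2], y[:N // 2]
X_test, y_test = X[N // 2:], y[N // 2:]

# Fit the model using only the training portion
model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate separately on the data the model has and hasn't seen
predictions_train = model.predict(X_train)
predictions_test = model.predict(X_test)

accuracy_train = sum(p == l for p, l in zip(predictions_train, y_train)) / len(y_train)
accuracy_test = sum(p == l for p, l in zip(predictions_test, y_test)) / len(y_test)
print("Training accuracy:", accuracy_train)
print("Test accuracy:", accuracy_test)
```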
We get X_train and y_train, and X_test and y_test, just by taking half of our shuffled data for each sample. And now we train the model just as we did before, but we're only using the training data and the training labels to train it. So we take an instance of the LogisticRegression class from the sklearn library, and we train our model on X_train and y_train. We now have a model that's going to have a value of theta somewhere internal to its implementation, but the point is that it has never seen the testing features or the testing labels while fitting this model.

So now we can compute the accuracy like we did before, but this time we're going to compute the accuracy separately for the training data, which was used to train the model, and the test portion, which has never been seen before. Okay. So we make predictions for both by calling the model's predict function on the training features and on the test features. We can then see what fraction of them are correct for the training sample and the testing sample, and we get two different accuracy measurements. The first accuracy measurement is similar to what we had before, where we're asking how well the model makes predictions on the same data that was used to train it. The second is a better estimate of how well the model is likely to perform on new data that was not seen during training.

Okay. So just to summarize, in this lecture, I presented a brief introduction to training and testing. We'll cover a lot more of this in course three, but the basic concepts are the following. Just training a model on some dataset does not give us a sense of how well the model will generalize to new data. A test set is going to be used to help us estimate that generalization ability. When we build these training and test sets, we want non-overlapping, random splits of our data, in order to get a representative estimate of how well the model will perform on unseen data. So this lecture has introduced the concepts of training and testing sets, we described the idea of training performance versus generalization ability, and we showed how to adapt our Python code to measure performance on the training set and the test set separately.

So on your own, I would suggest trying to repeat this experiment for the regression examples we saw before. In other words, take the Python regression code and try splitting the data into a training set and a test set, and then measure the performance, perhaps in terms of the sum of squared errors, on the training and the test samples separately; a starting sketch follows below.
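As a starting point for that exercise, here is one possible sketch. The toy X and y below are placeholders that you would replace with the features and labels from the regression examples in the earlier lectures.

```python
import random
from sklearn.linear_model import LinearRegression

# Placeholder data -- replace X and y with the features and labels from the
# regression examples in the earlier lectures
X = [[float(i)] for i in range(100)]
y = [2.0 * x[0] + random.random() for x in X]

# Shuffle and split into non-overlapping halves, just as for the classifier
pairs = list(zip(X, y))
random.shuffle(pairs)
N = len(pairs)
train, test = pairs[:N // 2], pairs[N // 2:]
X_train, y_train = [x for x, _ in train], [label for _, label in train]
X_test, y_test = [x for x, _ in test], [label for _, label in test]

# Fit the regressor using only the training half
model = LinearRegression()
model.fit(X_train, y_train)

# Sum of squared errors, computed separately on each half
def sse(features, labels):
    predictions = model.predict(features)
    return sum((p - l) ** 2 for p, l in zip(predictions, labels))

print("Training SSE:", sse(X_train, y_train))
print("Test SSE:", sse(X_test, y_test))
```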