So now let's compare good and bad features. What makes a good feature? Well, you want to take your raw data and represent it in a form that's amenable to machine learning. First, a good feature has to be related to the objective. You can't just throw random data in there; that would only make the ML problem harder, and the idea is to make the problem easier, right? Easier to find a solution for. So if a data field isn't related to what you're trying to do, throw that field away. Second, you have to make sure that it's known at production time. This can be surprisingly tricky, and we'll talk about some instances of this. Third, it's got to be numeric. Fourth, you've got to have enough examples of it in your data set. And lastly, you need to bring your own human insight into the problem.

So let's start with the first one. A good feature needs to be related to what you're actually predicting, so you need some kind of reasonable hypothesis for why a particular feature might matter for this particular problem. Don't just throw arbitrary data in there and hope that you can get some kind of relationship out of it. You don't want to do what's called data dredging: you don't want to dredge your large data set and find whatever spurious correlations might exist, because the larger the data set is, the more likely it is that there are a lot of these spurious correlations, and your ML model will just get confused by the mass of data you're throwing at it.

Take a housing example. Just because we have a data point on whether there are chairs on the porch in a house photo, or on how many concrete blocks make up the driveway, doesn't mean that we should include them in our housing model. You should have some reasonable idea of why these data points and these features could actually affect the outcome.
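To make the data dredging warning concrete, here is a minimal sketch, not anything from the course material. It assumes that "larger" means more candidate fields, and it uses made-up data: a random label and a pile of pure-noise features that by construction have nothing to do with it. The function names `pearson` and `strongest_noise_correlation` are hypothetical helpers written for this illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_noise_correlation(num_features, num_rows=30, seed=0):
    """Largest absolute correlation between a random label and noise features.

    Every feature is drawn independently of the label, so any correlation
    found here is spurious by construction.
    """
    rng = random.Random(seed)
    label = [rng.gauss(0, 1) for _ in range(num_rows)]
    best = 0.0
    for _ in range(num_features):
        noise = [rng.gauss(0, 1) for _ in range(num_rows)]
        best = max(best, abs(pearson(noise, label)))
    return best


# Same seed, so the 2000-feature run includes the same first 5 features.
few = strongest_noise_correlation(num_features=5)
many = strongest_noise_correlation(num_features=2000)
print(f"strongest correlation among 5 noise features:    {few:.2f}")
print(f"strongest correlation among 2000 noise features: {many:.2f}")
```

The more unrelated fields you dredge through, the stronger the best-looking coincidence gets, which is why you want a hypothesis for each feature before it goes in.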
The outcome is basically what's represented by the label we're putting on them, and you have to have some reasonable idea of why the features could be related to that output. So, why would the number of concrete blocks in the driveway affect the ultimate price of a house? Does that make sense? No. Now, you might be thinking that if you could tell from the photo whether the driveway had cracks in it, that could be a good feature for a housing problem. Keep that in mind; we're going to come back to it later.

So, what are the good features shown here for this horse problem? If you said it depends on what you're predicting, you're exactly right, and you've been paying attention to me for the last five minutes. If the objective is to find what features make a good racehorse, you might want to go with the data points on breed and age. However, if your objective is to determine whether horses are more predisposed to eye disease, eye color may also be a completely valid feature. The key learning here is that different problems in the same domain may need different features, and it depends on you and your subject matter expertise to determine which fields you want to start with for your hypothesis.
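As a rough sketch of that last point, you could write the hypothesis down explicitly as a mapping from objective to starting fields. Everything here is assumed for illustration: the records, the field names, the objective names, and the `select_features` helper are all made up, and the field lists simply mirror the horse example.

```python
# Hypothetical raw records; the fields and values are invented for illustration.
horses = [
    {"name": "Comet", "breed": "Thoroughbred", "age": 4, "eye_color": "brown"},
    {"name": "Daisy", "breed": "Appaloosa", "age": 9, "eye_color": "blue"},
]

# One starting hypothesis per objective: which fields might plausibly matter.
FEATURES_BY_OBJECTIVE = {
    "race_performance": ["breed", "age"],
    "eye_disease_risk": ["breed", "age", "eye_color"],
}


def select_features(records, objective):
    """Keep only the fields hypothesized to matter for the given objective."""
    columns = FEATURES_BY_OBJECTIVE[objective]
    return [{col: rec[col] for col in columns} for rec in records]


racing_rows = select_features(horses, "race_performance")
eye_rows = select_features(horses, "eye_disease_risk")
print(racing_rows[0])
print(eye_rows[0])
```

Same raw data, same domain, but the two problems start from different fields, and fields with no hypothesis behind them (here, the horse's name) are left out of both.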