As we said before, if you can't do basic analysis on your data, you can't do machine learning. So in this topic, we're going to focus exclusively on how to create and manipulate features from your raw data.
Welcome to Feature Engineering. Recall that we said there are three things that you need to do to build an effective machine learning model. First, you need to scale it out to large datasets; we just looked at that with Cloud ML. The second thing you need to do is what's called feature engineering. So what we're going to talk about in this module is how to create those good features and how to transform your inputs to get them ready for a machine learning model. We'll also look at creating synthetic features, which are features that aren't in your dataset to begin with but are going to make your model perform a lot better. Creating good features, transforming them, and creating synthetic features: together, these three things are called preprocessing. So we'll take a look at how to do preprocessing within the context of Cloud ML, which allows you to do it at scale. After you've built a model, we'll also look at hyperparameter tuning. It's the way to make these features better in the context of the dataset that they're ultimately going to be trained against.
So let's first start with how you can turn your raw data into useful feature vectors that can then be used properly inside your ML models. Let's take a problem. Your end objective is to build a model to predict the price of a house for a given set of inputs. What types of data points would you want to know about this house to begin with? Somebody said things like the square footage of the house, maybe the size of the land. What about the number of rooms? Or, if it was sold in the past, how much was it sold for? You've probably already guessed that location, location, location could be a prime influencer of housing prices.
For me, in the California Bay Area, I'm painfully aware.
Wouldn't it be great if your raw data for this housing problem were already clean, with just the key fields that you need? Oh, and also in a format that you can just pass into your ML model for training? Well, I hate to break it to you: that's just never going to be the case. Good feature engineering, this process that we're going to go through, can take on average 50 to 75 percent of the time that you spend working on an ML project. We haven't even started on the ML algorithm side, right? This is just getting the data right, and it's critical that we do so.
What we ultimately want to do here, shown in a quick example, is this: you've got raw data for houses on the left, and you need to map it into a vector, to one or more fields on the right. That is how we can use it inside of our ML model for training. So this might look like an easy mapping exercise for some of you. But wait, how do you even know what features to use, or what makes a good feature in the first place?
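
Before we get to that question, here is a rough sketch of what that mapping exercise can look like in plain Python. Everything in it is an assumption made up for illustration: the field names, the sample values, the list of cities, and the helper `to_feature_vector` are not from the course material. The idea is just that numeric fields map to one field each, while a categorical field like the city maps to several fields, one per known value.

```python
# Hypothetical raw record for one house; names and values are illustrative only.
raw_house = {
    "square_footage": 1800.0,
    "lot_size": 5000.0,
    "num_rooms": 4,
    "last_sold_price": 650000.0,
    "city": "Oakland",
}

# Assumed numeric fields: each one maps to exactly one field in the vector.
NUMERIC_FIELDS = ["square_footage", "lot_size", "num_rooms", "last_sold_price"]

# Assumed fixed vocabulary for the categorical field: it maps to one field per city.
CITY_VOCAB = ["Oakland", "San Jose", "San Francisco"]


def to_feature_vector(record):
    """Map a raw house record to a flat list of floats."""
    # Numeric values are copied over as floats.
    features = [float(record[name]) for name in NUMERIC_FIELDS]
    # The city becomes a one-hot group: 1.0 for the matching city, 0.0 elsewhere.
    features += [1.0 if record["city"] == city else 0.0 for city in CITY_VOCAB]
    return features


vector = to_feature_vector(raw_house)
print(vector)
```

So one raw field, the city, ends up as three fields in the vector, which is the "one or more fields" part of the mapping. A real pipeline would also need to handle missing values, for example a house that was never sold before, and this sketch does not.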