All right. So, now that you're getting the hang of these, it's time for yet another quiz. Are these features that I'm going to show you knowable at prediction time or not? Let's look at our discount coupon code case again.

First up, the total number of discountable items that have been sold. Well, how long a period are we looking at for this total? How long does it take for us to actually get that value? This is not a yes or no answer. There is a question that you need to ask before you can even consider using it as an input. So, number one needs a lot more clarification.

Now number two, the total number of discountable items sold in the previous month. Yeah, this is getting a lot closer to where we want to be. This seems like something that should be available to you at prediction time. So, let's see if there is a way of defining these things. If it's something as vague as the total number of discountable items sold for all eternity, that's way too vague. You don't have the time period. You don't know how long it takes to collect all these items. But if you make it a lot more practical, like this one, the total number of discountable items sold in the previous month, sure, that's something we can definitely work with. At this point, you've defined it in a way that you can actually have it. And of course the time frame is going to depend on the latency in your system. So, that is a prompt for you to find out the answers to these types of things. How long does it actually take for you to get this data in before you can use it in real time?

Last one, the number of customers who have viewed ads about a particular item that you have. Again, this is ultimately a question about timing. How long does it take for you to get the ad analysis back from your other systems before you can potentially use it inside of your prediction model?

Here's another quiz.
This one is about fraudulent credit card transactions, and whether or not these features will be known at prediction time.

First up is whether or not a cardholder has purchased these items before from our store. Again, we're going to define this very, very carefully. What you might find out is that your credit card system takes three days to process before you can see which customers have purchased what items in your data warehouse. So, what this really means is that when somebody uses a credit card, we don't know about it immediately, because it takes the store three days to actually send the transaction in to your warehouse. So, if it takes three days before we'll have that data on hand during prediction, then when we do our model training, we have to train on the data as of three days ago. This is really important.

So, let me talk you through this one a little bit more. You can't train with current data and then predict with stale data. So, if you go into your data warehouse for training, you can't use all the values for a customer's credit card history, because not all of those values are going to be available at prediction time. So, what you have to do is actually modify your training data inside of your warehouse to be as of three days ago, right? To reflect that lag. And the key point is that you have to train with stale data if stale data is all that you're going to have during prediction in real time.

So, let's do a little thought experiment. Say you're doing a prediction on May 15th. The data in your database is only going to be current as of May 12th at prediction time. Which means that during training, if you're training on an example from, say, February 12th, you can only use as your input the number of times that the credit card had been used as of February 9th. Again, that's the three-day lag. You have to carry the staleness of the data at prediction time through into your training.
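The thought experiment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transaction records, the `card_uses_as_known` helper, the card ID, and the year 2024 are all made up for the example; only the three-day lag and the May and February dates come from the discussion.

```python
from datetime import date, timedelta

# Assumed warehouse delay, taken from the three-day example in the lecture.
LAG = timedelta(days=3)

def card_uses_as_known(transactions, card_id, as_of, lag=LAG):
    """Count the card's transactions the warehouse would already hold on `as_of`.

    Hypothetical helper: anything dated after `as_of - lag` has not yet
    arrived in the warehouse, so it must not be counted.
    """
    cutoff = as_of - lag
    return sum(
        1 for t in transactions
        if t["card_id"] == card_id and t["date"] <= cutoff
    )

# Made-up transaction history for one card (the year is an assumption).
transactions = [
    {"card_id": "A", "date": date(2024, 2, 1)},
    {"card_id": "A", "date": date(2024, 2, 8)},
    {"card_id": "A", "date": date(2024, 2, 11)},  # not yet visible on Feb 12
    {"card_id": "A", "date": date(2024, 5, 10)},
]

# Training example dated February 12th: only data as of February 9th counts.
train_feature = card_uses_as_known(transactions, "A", date(2024, 2, 12))

# Prediction on May 15th: the warehouse is current as of May 12th.
serve_feature = card_uses_as_known(transactions, "A", date(2024, 5, 15))
```

The point of routing both training and prediction through the same function is that the lag is applied identically in both places, so the feature means the same thing to the model each time.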
So, if you train your model assuming that you know the data for your credit card transactions down to the second or the minute at prediction time, you won't have a high-performing machine learning model, simply because the lags won't correspond between prediction and training. So, you really have to think about the temporal nature of all the input variables that you're using.

Okay. On to the next one: is the item new at the store? Well, if it's new, it can't have been purchased before. Yeah, sure. This is a great feature. This is something that you should know from your catalog immediately. It's a perfectly valid input.

Next up, the category of the item being purchased. No problem. This is a super easy one. We'll know it at prediction time. We'll know if it's a grocery item, an apparel item, or an electronics item. We can look it up in real time.

Now, whether it's an online purchase or an in-person, in-store purchase. Absolutely. Yeah, we'll know this too in real time. It's not a problem, so let's use it. Again, think about the timing of a lot of these things and what other systems could be involved.
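One way to keep track of this is to write down, for each candidate feature, how long it takes to become available. Here is a minimal sketch, assuming the latencies below: the feature names and the `lagged_features` helper are hypothetical, the zero latencies reflect the "we can look it up in real time" answers, and the three-day figure is the example lag from the purchase-history discussion. Your own systems will have different numbers.

```python
from datetime import timedelta

# Hypothetical latency catalog for the fraud quiz features.
# timedelta(0) means the value can be looked up in real time.
FEATURE_LATENCY = {
    "item_is_new": timedelta(0),
    "item_category": timedelta(0),
    "is_online_purchase": timedelta(0),
    "cardholder_bought_item_before": timedelta(days=3),
}

def lagged_features(latencies):
    """Return the features whose training data must be shifted back in time."""
    return {name: lag for name, lag in latencies.items() if lag > timedelta(0)}

# Features that need the "train as of N days ago" treatment.
needs_shift = lagged_features(FEATURE_LATENCY)
```

Anything that shows up in `needs_shift` is a feature where the training data has to be built as of the lagged date rather than the transaction date.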