In this lecture, we will investigate different strategies for extracting features from temporal or possibly seasonal data, and we'll extend the concept of one-hot encodings introduced previously to represent temporal information. To motivate this problem, imagine trying to build regression models that incorporate features like the following: can we predict sales or preferences as a function of time? What are the long-term trends in sales, and what short-term trends are due just to the day of the week or the season?

In principle this seems fairly straightforward. For example, if we were trying to model ratings as a function of time, we might observe data like the following, where we have different timestamps on the x-axis and ratings on the y-axis. We have different observations of time-rating pairs, and we can try to fit a line of best fit that explains the relationship between them. This looks okay compared to our previous features, where we saw big issues when trying to model categorical data the same way. Here we're trying to predict a real-valued quantity from real-valued data, so assuming we perform the correct conversion of the date string to a number, everything seems fine.

Okay, so what would happen if, for example, we tried to predict how ratings look based on the month of the year? To do this, let's first try a simple feature representation: we'll map the month name to a month number. So all I'm doing here is saying that the rating is equal to Theta naught plus Theta one times month, where to encode the month I'm just saying January equals zero, February equals one, March equals two, April equals three, et cetera. Then we'll have a model that looks something like this: our rating is equal to Theta naught plus Theta one times month. On the x-axis, I have all 12 months encoded as a number from January equals zero up to December equals 11, and I fit some line of best fit explaining the observations I get of rating as a function of month.

That seems perfectly reasonable, until I try extending my data to look at multiple years at the same time. In this case what the model is really doing is not just fitting this line; the line is going to wrap back around between its December 31st and January 1st values. All I've really done in this picture is make multiple copies of the data and put them next to each other for visualization purposes, just to show that by fitting a line, I'm really making a weird assumption about what happens when I wrap back around from a feature value of 11 to a feature value of zero, which happens between December 31st and January 1st. So essentially, by accident, this representation implies that the model wraps around from its December 31st value to its January 1st value when the feature value changes from 11 back to zero. Most likely, this type of sawtooth pattern isn't very realistic. What the model would essentially be saying is that ratings go up or down as a function of time throughout the year, but then between December 31st and January 1st something very weird happens and they change very abruptly.
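To make this concrete, here is a minimal sketch of the naive month-number encoding just described. The ratings here are made up purely for illustration, and the fit uses ordinary least squares:

```python
import numpy as np

# Hypothetical average ratings, one per month (January..December),
# invented purely for illustration.
months = np.arange(12)                      # January = 0, ..., December = 11
ratings = np.array([3.1, 3.0, 3.3, 3.5, 3.6, 3.8,
                    3.9, 4.0, 3.8, 3.6, 3.4, 3.2])

# Fit rating = theta0 + theta1 * month by ordinary least squares.
X = np.column_stack([np.ones(12), months])
theta, *_ = np.linalg.lstsq(X, ratings, rcond=None)

# The awkward part: the December (month = 11) and January (month = 0)
# predictions differ by 11 * theta1, so the model jumps abruptly
# when the year wraps around.
print(theta[0] + theta[1] * 11)   # December prediction
print(theta[0])                   # January prediction
```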
So can we come up with a more realistic shape to capture data like this? Again, for visualization purposes, I've just shown multiple copies of the data next to each other to cover a two-year period. Maybe you would think: well, this is periodic data, so why not try representing it using a periodic function, like a trigonometric function? I might write down something like the following. I would say the rating equals Theta naught plus Theta one times sine of Alpha plus month times 30, so really all I'm doing is using Alpha and Theta one to control the phase and magnitude of the sine wave. Again, that's a valid solution; it's perfectly reasonable to try to fit some periodic function like a sine wave. But it's going to be really difficult to get this right, it's not a particularly flexible approach, and it's also not a linear model. Once we start putting parameters inside a trigonometric function like that, we're no longer able to solve a simple system of matrix equations to get our unknowns.

So is there a class of functions we can use to capture a more flexible variety of shapes? One possible answer is to look at piecewise functions. Rather than using a trigonometric function, could we fit a piecewise function on our 12 different months, that is, on our 12 different feature values for the month? We would have one prediction for each month value, and you can see how that could fit a fairly large class of shapes, where trends increase for part of the year, decrease for another part of the year, and then wrap around to the January value at the end of December.

In fact, this is an easy thing to do; it's even a linear model. The function would look like the following. We'd say that the rating is equal to Theta naught, plus Theta one if it's February, plus Theta two if it's March, plus Theta three if it's April, et cetera. In this equation I'm introducing some new notation, the delta function: delta of month equals February takes the value one if the month is February and zero otherwise. Note again, as with the previous lecture on categorical features, we don't need a special dimension for January, because January is going to be captured by the offset term Theta naught. So essentially, in this case, Theta naught says what the prediction is for January, Theta one says what the difference in prediction is for February versus January, Theta two says what the difference in prediction is for March versus January, et cetera.

Equivalently, just to make it clear that this really is a linear model, we can write it down as an inner product like the following, where the rating is equal to Theta dot product with x, and our feature vector x is a 12-dimensional vector with a one in the first position for our offset term, and then an additional one in the second position if it's February, in the third position if it's March, in the fourth position if it's April, et cetera. So again, this is a one-hot encoding, exactly like we saw in the categorical features lecture. This is a very flexible type of feature representation: it's going to be able to handle complex shapes and periodicity, just by using the one-hot encoding to fit a piecewise function. If we wanted, we could easily increase or decrease the resolution of this function, say to a week or an entire season, just by changing how fine-grained our encoding is.
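As a rough sketch of this one-hot construction (the ratings are again made up, and month_one_hot is just an illustrative helper name, not anything standard):

```python
import numpy as np

# Hypothetical average rating observed for each month (Jan..Dec).
ratings = np.array([3.1, 3.0, 3.3, 3.5, 3.6, 3.8,
                    3.9, 4.0, 3.8, 3.6, 3.4, 3.2])

def month_one_hot(month):
    """One-hot encode a month (0 = January, ..., 11 = December).

    The first component is a constant 1 for the offset term; January is
    captured by the offset alone, so only February..December get their
    own dimension."""
    x = np.zeros(12)
    x[0] = 1.0            # offset term, Theta naught
    if month > 0:
        x[month] = 1.0    # delta(month = Feb), delta(month = Mar), ...
    return x

X = np.array([month_one_hot(m) for m in range(12)])
# The model is still linear in Theta, so ordinary least squares applies.
theta, *_ = np.linalg.lstsq(X, ratings, rcond=None)

# theta[0] is the January prediction; theta[k] is the difference between
# month k and January, so e.g. the March prediction is theta[0] + theta[2].
print(theta @ month_one_hot(2))
```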
Also, we could extend this by combining multiple types of encodings together. We might think that some dataset has seasonality at the level of a week and also seasonality at the level of a year. In other words, maybe people who make purchases have certain preferences depending on whether it's Monday or Friday, and they also have certain preferences depending on whether it's winter or summer. We can easily combine those two things together just by concatenating two one-hot encodings. Here we would just say the rating is equal to Theta dot product with x1 concatenated with x2, where x1 is our one-hot encoding for the month, and x2 is our one-hot encoding for the day of the week.

Okay, so to summarize, we have motivated the use of piecewise functions as a means of modeling temporal or periodic data, and we've described how one-hot encodings can be used to do this. On your own, you might think about the types of piecewise functions you would use to model demand on Amazon. If you're trying to predict demand, is it going to be important to capture the day of the week or the day of the month, and how would you incorporate one-off events like significant holidays? How would you extend your one-hot encoding to account for the day being Christmas, or New Year's, or July 4th?
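To make the concatenation described above concrete, here is a rough sketch. The helper names and the weekday indexing are my own, made up for illustration, and the final comment only hints at one possible way the holiday question could be approached:

```python
import numpy as np

def one_hot(value, num_values):
    """Generic one-hot encoding that drops the first category, since the
    first category is absorbed by the shared offset term."""
    x = np.zeros(num_values - 1)
    if value > 0:
        x[value - 1] = 1.0
    return x

def temporal_features(month, weekday):
    """Concatenate an offset, a month encoding (0..11), and a day-of-week
    encoding (0..6): 1 + 11 + 6 = 18 dimensions in total."""
    return np.concatenate([[1.0], one_hot(month, 12), one_hot(weekday, 7)])

# e.g. a purchase made on a Friday (weekday = 4) in July (month = 6),
# taking Monday = 0 as an arbitrary convention.
x = temporal_features(month=6, weekday=4)
print(x.shape)    # (18,)

# The model is still linear: rating = theta @ x, fit exactly as before.
# One-off events could be handled the same way, e.g. by concatenating an
# extra indicator dimension that is 1 only on Christmas, New Year's, etc.
```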