In this lecture I will demonstrate how to use feature transformations in order to incorporate non-linear functions into linear models. To motivate the problem, imagine an example like the one we've been working on for several lectures, where we're trying to understand the relationship between weight and height. So far, to solve this problem, we've imagined coming up with the line of best fit that explains the relationship between the observed weight and height values. Coming up with this linear relationship seems perfectly reasonable for modeling this data, or at least it seems okay, and in practice we'd probably get away with using this type of model most of the time. But it certainly makes some assumptions that are not totally justified. For example, we know that weight can never be below zero, and we know that even though weight increases with height for a while, it eventually levels off. So can we fit more suitable, or more general, functions to this kind of data? First of all, what should the right model for weight versus height look like? Is it a linear function? A quadratic or higher-order polynomial? An asymptotic function? To begin with, let's just imagine fitting a polynomial function, e.g. a cubic function, to model the relationship between height and weight. So here we'd have weight equals theta naught, plus theta one times height, plus theta two times height squared, plus theta three times height cubed. How do we do that? This actually turns out to be perfectly straightforward, and we can already do it using our existing linear regression models, because the model still takes the form weight equals theta dot product with x, where x is our feature vector. All we have to do is include these transformed versions of our feature in the feature vector, so x would not just be the height; it would be the height concatenated with height squared and height cubed.
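To make this concrete, here is a minimal sketch of fitting that cubic model with ordinary least squares. The height and weight values below are synthetic, made up purely for illustration; the course's actual dataset is not reproduced here.

```python
import numpy as np

# Hypothetical data for illustration only (heights in metres, weights in kg).
heights = np.array([1.5, 1.6, 1.7, 1.8, 1.9])
weights = np.array([55.0, 60.0, 67.0, 75.0, 82.0])

# Build the transformed feature vector x = [1, h, h^2, h^3] for each point.
# The model is still linear in theta: weight ≈ theta . x.
X = np.stack([np.ones_like(heights), heights, heights**2, heights**3], axis=1)

# Solve the ordinary least-squares problem for theta = (theta_0, ..., theta_3).
theta, *_ = np.linalg.lstsq(X, weights, rcond=None)

predicted = X @ theta
```

The key point is that nothing about the solver changed: we only changed what goes into the feature vector before handing it to linear regression.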
If we expand that, we still get theta dot x equals theta naught, plus theta one times height, plus theta two times height squared, plus theta three times height cubed. Okay. So what about more complex feature transformations? Imagine trying to fit a model where the estimated weight is theta naught, plus theta one times height, plus theta two times height squared, plus theta three times the exponential of height, plus theta four times the sine of height, or whatever you like. Note that this still works. In fact, we can apply arbitrary transformations to the features and the model will still be linear in the parameters, which is what's critical for a linear model. So we can still write weight equals theta dot product with our features x. Note, on the other hand, what we can't do: the same approach would not work if we wanted to apply transformations to the parameters themselves. For example, y equals theta naught, plus theta one times height, plus theta two squared times height, or some function sigma of theta three times height. The types of linear models we've seen so far would not support these transformations. What we need, in order to be able to solve the system of matrix equations using linear algebra, is that the model is linear in the parameters theta, and these models are not. On the other hand, there are alternative models, just not ones we've seen yet, that do support nonlinear transformations of the parameters; for example, this is exactly what neural networks do. So, to summarize, we've seen how to apply arbitrary transformations to features while keeping our model linear, and we've explained better the restrictions and assumptions of linear models. On your own, I'd suggest you go ahead and extend our previous code where we used temperature to predict pm2.5 levels, and see if you can improve it by incorporating simple polynomial functions of the temperature.
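As a sketch of that more general recipe, the example below mixes polynomial, exponential, and sine transforms of a single feature and still solves the problem with ordinary least squares. The data here is synthetic (generated from a made-up function plus noise), so the numbers are illustrative only; the same pattern is what you'd follow to add polynomial features of temperature in the pm2.5 exercise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs and a synthetic target built for illustration.
h = rng.uniform(1.4, 2.0, size=50)
y = 10.0 + 30.0 * h + 2.0 * np.sin(h) + rng.normal(0.0, 0.1, size=50)

# Feature vector x = [1, h, h^2, exp(h), sin(h)] -- arbitrary transforms
# of the input, but the model is still linear in theta.
X = np.stack([np.ones_like(h), h, h**2, np.exp(h), np.sin(h)], axis=1)

# Same least-squares solve as before; only the features changed.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Contrast this with something like sigma(theta * h): once a parameter sits inside a nonlinearity, the problem is no longer a linear system in theta, and this solver no longer applies.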