One thing to keep in mind, again, is what happens
when we apply a prediction algorithm to the test set.
We have to be aware that we can only
use parameters that we estimated from the training set.
In other words, when we apply this same standardization
to the test set, we have to use the
mean and the standard deviation from the training set
to standardize the test set values.
What does this mean?
It means that after standardization the
mean will not be exactly zero in the test set,
and the standard deviation will not be exactly one, because
we've standardized using parameters estimated in the training set.
Hopefully they'll be close to those values, even though we're
not using parameters computed from the test set itself.
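To make this concrete, here is a minimal Python sketch (Python rather than R, and not caret code) using a hypothetical toy predictor. The mean and standard deviation are estimated on the training values only and then reused for the test values, so the standardized training data has mean 0 and SD 1 while the standardized test data does not. `stdev` is the sample standard deviation, which matches R's `sd()`.

```python
from statistics import mean, stdev

def standardize(values, center, scale):
    """Subtract `center` and divide by `scale` for each value."""
    return [(v - center) / scale for v in values]

# Hypothetical toy data standing in for one predictor (e.g. capitalAve).
train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [2.0, 3.5, 6.0]

# Estimate the parameters on the training set only.
train_mean = mean(train)   # 3.0
train_sd = stdev(train)    # sample SD, like R's sd()

train_s = standardize(train, train_mean, train_sd)
# The test set is standardized with the *training* mean and SD.
test_s = standardize(test, train_mean, train_sd)

# Test-set mean and SD after standardization: not exactly 0 and 1.
print(round(mean(test_s), 2), round(stdev(test_s), 2))  # 0.53 1.28
```

With these particular toy numbers the test set lands fairly far from 0 and 1; with a larger test set drawn from the same population as the training data, you would hope for values much nearer 0 and 1.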
You can also use the preProcess function to do a lot of the standardization for you.
The preProcess function is built into the caret package.
Here I'm passing it all of the training variables except
the 58th one in the data set, which is the outcome that we care about.
And I'm telling it to center and scale every variable.
That applies the same transformation we talked about previously
to the data: subtract the mean and divide by the standard deviation.
You can check this by looking at the
mean and standard deviation of the capitalAve variable, just like we did before.
After using the preProcess function,
the mean is zero and the standard deviation is one.
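In caret the call looks like `preProcess(training[, -58], method = c("center", "scale"))`. As a rough illustration of what that computes, the Python sketch below (not caret; the function `center_scale_columns` and the toy table are hypothetical) estimates a mean and SD for every column except the outcome column and standardizes each one. Afterwards every predictor column has mean 0 and SD 1, up to floating-point error.

```python
from statistics import mean, stdev

def center_scale_columns(rows, outcome_col):
    """Center and scale every column except `outcome_col`.

    Returns the transformed predictor rows (outcome dropped) and the
    per-column (mean, sd) pairs that were used.
    """
    n_cols = len(rows[0])
    predictor_cols = [j for j in range(n_cols) if j != outcome_col]
    params = {}
    for j in predictor_cols:
        col = [row[j] for row in rows]
        params[j] = (mean(col), stdev(col))  # estimated from these rows only
    scaled = [
        [(row[j] - params[j][0]) / params[j][1] for j in predictor_cols]
        for row in rows
    ]
    return scaled, params

# Hypothetical training table: two predictors plus a 0/1 outcome in the
# last column (standing in for column 58 of the spam data).
training = [
    [1.0, 10.0, 0],
    [2.0, 20.0, 1],
    [3.0, 15.0, 0],
    [6.0, 35.0, 1],
]
scaled, params = center_scale_columns(training, outcome_col=2)

# Each predictor column now has mean ~0 and SD ~1.
for k in range(len(scaled[0])):
    col = [row[k] for row in scaled]
    print(k, mean(col), stdev(col))
```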
So preProcess can be used to perform a lot of the preprocessing
techniques that you used to have to do by hand.
The other thing you can do is use the object created
by preProcess to apply that same preprocessing to the test set.
Here, preObj is the object from the previous slide:
the object we created by preprocessing the training set.
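In caret, a fitted preProcess object is applied to new data by calling `predict()` on it, for example `predict(preObj, testing[, -58])`. The Python sketch below mimics that pattern with a hypothetical `CenterScale` class (not part of any library). The parameters are learned once from the training rows and stored, and `predict` reuses them for any new rows, so the test set is transformed with training means and SDs rather than its own.

```python
from statistics import mean, stdev

class CenterScale:
    """Hypothetical stand-in for a fitted preprocessing object like preObj.

    It stores the per-column means and SDs learned from the training set,
    so the exact same transformation can later be applied to other data.
    """

    def __init__(self, training_rows):
        columns = list(zip(*training_rows))
        self.means = [mean(c) for c in columns]
        self.sds = [stdev(c) for c in columns]

    def predict(self, rows):
        # Always uses the stored training parameters, never the new data's own.
        return [
            [(x - m) / s for x, m, s in zip(row, self.means, self.sds)]
            for row in rows
        ]

# Hypothetical predictor-only tables (outcome column already removed).
training = [[1.0, 10.0], [2.0, 20.0], [3.0, 15.0], [6.0, 35.0]]
testing = [[2.0, 12.0], [4.0, 30.0], [5.0, 18.0]]

pre_obj = CenterScale(training)   # fitted once, on training data only
train_s = pre_obj.predict(training)
test_s = pre_obj.predict(testing)

test_first_col = [row[0] for row in test_s]
print(mean(test_first_col))  # not exactly 0: parameters came from training
```

Keeping the fitted parameters in one object guarantees that the training and test sets receive the same transformation. Refitting on the test data instead would let test-set information leak into the preprocessing.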