In this video, we'll use the sampling distribution that we derived in the previous video to derive the t-tests for individual regression parameters. Let's suppose that we want to test the null hypothesis that one of the regression parameters in a linear regression model is equal to some constant. Of course, this beta j can be beta zero, beta one, all the way up through beta p, supposing we have p predictors. Now, we're testing that the parameter is equal to some constant, some real number. Typically, we take this real number to be equal to zero, and that's what the standard R regression output does. The reason for that is: suppose that this is one of the slope parameters, the one multiplying x_j. If this parameter actually is equal to zero, then the predictor is not entering the model, which means that there is no linear relationship between that predictor x_j and the response. That's an interesting hypothesis to test. Now of course, in general, we could choose a one-sided or two-sided alternative: lower-tailed, upper-tailed, or two-tailed. The standard output in R uses a two-tailed alternative hypothesis, namely that the regression parameter beta j is not equal to that constant (again, typically zero, and in the R output it's always defaulted to zero). Now, the next thing that we do in hypothesis testing is to specify a significance level alpha, sometimes also called the size, defined as the probability of committing a type one error for this test. Typically, this is chosen to be five percent; there are some arguments suggesting that maybe it should be lower in a typical scenario, maybe half a percent. But we choose alpha based on the rate of type one error that we're willing to commit. Then we have to come up with a test statistic. Thinking back to standard statistical inference, a test statistic, very generally, can be written in the following way.
Suppose we're testing a claim that some parameter theta is equal to theta naught, versus theta is not equal to theta naught. Typically we write the test statistic as a function of the data, so we're supposing that we have some x data bearing on these hypotheses. We write our test statistic as an estimator of theta, minus the true value of theta under the null hypothesis (this theta naught matches the value in our hypotheses), divided by the standard error of theta hat, the estimator of theta. Now, as you might recall, sometimes the standard error itself has unknown parameters in it that we have to estimate. When we have to estimate that standard error, we'll put a hat on top of it. The classic case of this is the test statistic x bar minus mu naught, divided by s over the square root of n, where s is the sample standard deviation estimating the unknown population standard deviation. Here, s over the square root of n is an estimate of the standard error of x bar, and this is the general notation in the context of a parameter theta. From there, we try to figure out the distribution of that test statistic. Sometimes the distribution is normal, sometimes it's t, and it could be many other things. Based on that distribution, we try to figure out when getting this test statistic, or one more extreme, would be very rare, so we can calculate a p-value, or decide whether the test statistic calculated for our data is in a rejection region, and so on. In the case of our regression parameters, we should choose a test statistic that is a function of our data, which is the response data y, and we follow the same formula. We take our estimator for beta j, so beta j hat, the least squares estimator for beta j, and we subtract off the value under the null hypothesis. This c here is the constant c in our null hypothesis. Again, that's typically zero, so when c is zero, your numerator is just beta j hat, your estimate for beta j.
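The classic one-sample case just described can be sketched in code. This is an illustrative Python sketch with made-up numbers (the course itself works in R): we compute (estimator minus null value) over the estimated standard error by hand, and check it against scipy's built-in one-sample t-test.

```python
import numpy as np
from scipy import stats

# Made-up sample; testing H0: mu = 5 against H1: mu != 5.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.5, scale=2.0, size=30)

x_bar = x.mean()
s = x.std(ddof=1)                 # sample sd, estimating the unknown sigma
se_hat = s / np.sqrt(len(x))      # estimated standard error of x bar
t_stat = (x_bar - 5) / se_hat     # (estimator - null value) / estimated SE

# scipy's one-sample t-test computes the same statistic
t_ref, p_val = stats.ttest_1samp(x, popmean=5)
```

Here `t_stat` and `t_ref` agree, which confirms that the built-in test is following the same general recipe.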
Then we divide by the standard error of beta j hat. Now, we typically don't know the standard error itself and have to estimate it, so I'll put a hat over the standard error to denote that we're estimating it. Now, to calculate that standard error: we learned in the last video that the vector beta hat has variance-covariance matrix sigma squared times (X transpose X) inverse. This is a (p+1)-by-(p+1) matrix, which means that the diagonal terms are the variances of each one of the estimators. If we took the j-th diagonal term, the (j, j) entry of that matrix, we'd have the variance of beta j hat. The standard error isn't the variance, it's the standard deviation, so we should take the square root of that. Then further notice that this standard error contains an unknown parameter, namely sigma, and we know how to estimate sigma. I'll put a hat on top of sigma, and that means I'll put a hat on top of the standard error, because it's now an estimate of the true quantity: we've estimated a quantity inside it. Just as a reminder, you get sigma hat squared by taking the residual sum of squares and dividing by the degrees of freedom for the model. All right, so the next question we should address is: what is the distribution of this t? Based on the fact that the response is assumed to be normal and that sigma is unknown, this should remind you of a fact from your basic statistical inference course: in general, this test statistic will have a t distribution. The degrees of freedom for the t distribution will be the number of data points minus the number of parameters that you're estimating, namely p plus 1. So our test statistic has a t distribution with n minus (p plus 1) degrees of freedom.
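The whole computation above can be sketched numerically. This is a Python illustration on simulated data (the sample size, predictor count, and true coefficients are all assumed, not from the course data): we form the least squares estimates, estimate sigma squared from the residual sum of squares over n minus (p plus 1), take square roots of the diagonal of the estimated variance-covariance matrix, and divide.

```python
import numpy as np

# Simulated setup (assumed for illustration): n = 50 rows, p = 2 predictors.
rng = np.random.default_rng(1)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix with intercept
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(scale=1.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                       # least squares estimates

resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - (p + 1))         # RSS / (n - (p + 1))
se_hat = np.sqrt(sigma2_hat * np.diag(XtX_inv))    # sqrt of (j, j) entries of sigma^2 (X'X)^-1

t_stats = beta_hat / se_hat                        # testing H0: beta_j = 0 for each j
```

Each entry of `t_stats` is the t statistic for one parameter, with n minus (p plus 1) degrees of freedom.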
Now, a note about the size of the sample: if we had a dataset with many, many rows, so many, many units in the sample, and a relatively low number of parameters in the model, then this t distribution would be very close to a normal distribution, and this test statistic would be approximately normal. So you could work with the normal distribution, depending on the balance of n and p. Let's see how this plays out in an example in R. Here we're using the marketing data that we've analyzed in previous videos. If you're unfamiliar with that data, you should go back to those videos just to get a sense of the variables. But basically, we have three predictors: marketing budgets for different channels, one for Facebook, another for YouTube, and another for a local newspaper. The response variable is units sold, measured in thousands of units. So we've run this regression before, but really we've only analyzed this estimate column, because it holds the least squares estimates. Now we're in a position to analyze the rest of these columns. The standard error values shown here are exactly the ones that I wrote down on the previous slide: take the diagonal entries of the estimated variance-covariance matrix and take the square roots. So we've got standard errors for each of our parameters. To get this t-value, all that you do is take the ratio of the estimate to the standard error. The reason is that, as I said, in R the default null hypothesis is that the parameter is equal to zero, so the numerator is just the estimate, and the test statistic is the estimate over the standard error. Now from there, to get the p-value, we calculate from a t distribution with a two-sided alternative. That means we take the area beyond the observed t-value in one tail and double it, since values at least that extreme lie in both tails.
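The two-sided p-value calculation, and the large-n normal approximation mentioned above, look like this in code. The numbers here are hypothetical, purely to show the mechanics, not values from the marketing data.

```python
import numpy as np
from scipy import stats

# Hypothetical t-value from a model with n = 200 observations, p = 3 predictors.
t_value = 3.2
n, p = 200, 3
df = n - (p + 1)

p_value = 2 * stats.t.sf(abs(t_value), df)   # double the upper-tail area
p_normal = 2 * stats.norm.sf(abs(t_value))   # large-n normal approximation
```

With 196 degrees of freedom the two answers are already nearly identical, which is the "balance of n and p" point: the t p-value is slightly larger because the t distribution has heavier tails.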
For the first three parameters, you have a p-value of effectively zero, which means you reject the null hypothesis that the true parameter value is equal to zero. Now, rejecting the null hypothesis does not tell you that the true value is very far from zero; it gives you no evidence about how far. In fact, if you have a really large dataset, you'll almost always reject the null hypothesis, even though the effect size, that is, the distance from zero, could be relatively small. So the statistical question, do I have reason to reject the null hypothesis that the parameter is equal to zero, is different from the more practical, or you might call scientific or business, question, which is: is this value importantly different from zero? You'd have to ask yourself that for each of these values, and the answer might actually depend on the cost of the item that you're selling. Remember, the marketing budgets are all measured in thousands of dollars and the sales are measured in thousands of units. So depending on how much you're selling your items for, and how much of an increase in sales you're getting from a $1,000 increase in your budgets, that might make the practical difference in whether you should continue allocating money to these different sources. Now, take a look at the newspaper predictor. In this case we have a p-value that's large, so we don't have evidence against the claim that the parameter associated with newspaper is equal to zero. We might act as though it is equal to zero, and what that would mean is that we could take out the newspaper predictor: if we increase the newspaper budget, we're not seeing an increase in sales, and so it doesn't seem to make sense to keep it in the model. Now, I think it's worth mentioning that in this analysis we've just done, we've actually run a few different t-tests at the same time.
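The large-dataset point above can be demonstrated with a quick simulation sketch. Everything here is assumed for illustration: the true slope is set to a tiny 0.01, yet with a million observations the t-test rejects H0: beta_1 = 0 decisively, even though the effect is arguably too small to matter in practice.

```python
import numpy as np
from scipy import stats

# Simulated data with a deliberately tiny true slope of 0.01 (assumed value).
rng = np.random.default_rng(2)
n = 1_000_000
x = rng.normal(size=n)
y = 0.01 * x + rng.normal(size=n)

# Same t-test machinery as before, for the slope parameter.
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - 2)
se1 = np.sqrt(sigma2_hat * XtX_inv[1, 1])

t1 = beta_hat[1] / se1
p1 = 2 * stats.t.sf(abs(t1), n - 2)   # tiny p-value despite a tiny effect
```

The p-value is minuscule while the estimated slope stays near 0.01, which is exactly the statistical-versus-practical-significance gap described above.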
So we've run four t-tests at the same time, and we did some reasoning about them individually. Typically, this is not great practice, so in a future video we'll look at an approach that better controls the error rates of these tests, and that will tell us, perhaps, which predictors to keep in the model and which ones we might leave aside.