Multiple regression is the appropriate statistical tool when your response variable is quantitative. If your response variable is categorical with two levels, we need to use another multivariate tool, LOGISTIC REGRESSION. The SAS command that we'll use is called PROC LOGISTIC. Just like PROC GLM, in the model statement you type in your response variable, the equal sign, and then your explanatory variable, followed by a semicolon. Our response variable, Nicotinedep, is binary, yes or no to nicotine dependence, and so we should use a logistic regression. Note, however, that our variable is coded zero and one, where zero is the absence of nicotine dependence and one is the presence of nicotine dependence. Because of that coding, we also need to add the word descending to the code to ensure that we're predicting the presence of nicotine dependence rather than its absence. We also have an explanatory variable called SOCPDLIFE that indicates the presence or absence of social phobia, which is an anxiety disorder marked by a strong fear of being judged by others and of being embarrassed. Thus, our code would be PROC LOGISTIC descending; model Nicotinedep = SOCPDLIFE;. Let's take a look at the output here. Similar to the multiple regression output, we can see the number of observations and the number of observations with complete data that were used in the model. Here, we see the name of our response variable, Nicotinedep. Also similar to the multiple regression output, we see a table with the parameter estimates and the p value. Notice also that our regression is significant, with a p value of 0.0002. Of course, using the parameter estimates, we could generate the linear equation: Nicotinedep is a function of 0.38 + 1.23 (SOCPDLIFE). >> But let's really think about this equation some more. In a multiple regression model, our response variable was quantitative, and so it could theoretically take on any value.
In a logistic regression, our response variable only takes on the values 0 and 1. Therefore, if I tried to use this equation as a best fit line, I would run into some problems. Instead of talking in decimals, it may be more helpful for us to talk about how the probability of being nicotine dependent changes based on the presence or absence of social phobia. For example, are those with social phobia more or less likely to be nicotine dependent than those without social phobia? Instead of true expected values, we want probabilities. >> Described visually, we'll no longer find the best fit line, shown in red, very helpful to us, as our outcome variable cannot take on just any value. Instead, we're seeing that there is somewhere along our X-axis where our outcome variable moves from being more likely to be a 0 to being more likely to be a 1. Our goal will be to quantify the probability of getting a 1 versus a 0 for a given value on our X-axis. >> In order to better answer our research question, we will use odds ratios as opposed to coefficients. The odds ratio compares the odds of an event occurring in one group with the odds of that event occurring in another group, where the odds are the probability that the event occurs divided by the probability that it does not. Odds ratios are always given in the form of odds and are not linear. Odds ratios are often a confusing topic for students when they're first introduced to them. So it will be important to go through the idea conceptually and better understand exactly what an odds ratio is and what it means. >> An odds ratio can range from zero to positive infinity and is centered around the value one. If we ran our model and got an odds ratio of one, it would mean that there's an equal probability of nicotine dependence among those with and without social phobia. Those with social phobia are just as likely to be nicotine dependent as those without. It's also likely then that our model would be statistically nonsignificant.
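To make the idea of an odds ratio concrete, here's a minimal sketch in Python. Python is used only for illustration and isn't part of the SAS workflow in this lesson. The probabilities are hypothetical numbers chosen for the example, not results from our data, and the function names `odds` and `odds_ratio` are made up here. The sketch converts each group's probability to odds and then takes the ratio.

```python
def odds(p):
    """Convert a probability (strictly between 0 and 1) to odds."""
    if not 0 < p < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    return p / (1 - p)


def odds_ratio(p_group1, p_group0):
    """Odds of the event in group 1 divided by the odds in group 0."""
    return odds(p_group1) / odds(p_group0)


# Hypothetical probabilities of nicotine dependence (illustration only).
p_with_phobia = 0.60
p_without_phobia = 0.60

# Equal probabilities in the two groups give an odds ratio of 1.
equal_groups = odds_ratio(p_with_phobia, p_without_phobia)

# A higher probability in the first group gives an odds ratio above 1.
higher_in_group1 = odds_ratio(0.75, 0.50)

print(equal_groups, higher_in_group1)
```

Swapping the two arguments in the last call would give an odds ratio below 1, since the event would then be less likely in the first group.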
If an odds ratio is greater than 1, it means that the probability of being nicotine dependent is higher among those with social phobia than among those without. In contrast, if the odds ratio is below 1, it means that the probability of being nicotine dependent is lower among those with social phobia than among those without. >> So how do we calculate the odds ratio? It is possible to do this by hand. The odds ratio is the exponential of our parameter estimate, that is, e raised to the power of the estimate. However, we could also let SAS do this for us. As you can see, the odds ratio, or point estimate, and its associated confidence interval are part of the SAS output for logistic regression. Because both my explanatory and response variables in this model are binary, coded zero and one, I can interpret this odds ratio in the following way. In my sample, young adult daily smokers with social phobia are 3.4 times more likely to have nicotine dependence than young adult daily smokers without social phobia. >> We also get a confidence interval for our odds ratio. Remember that our data set is just a sample of a population. We don't have every young adult daily smoker in the US. This confidence interval tells us that we can be 95% confident that the odds ratio in the population falls somewhere between these two numbers. So for example, my odds ratio for social phobia is 3.4, and its 95% confidence interval runs from 1.78 to 6.61. If I were to draw additional samples of young adult daily smokers in the US and compute an interval from each one, about 95 out of 100 of those intervals would contain the population odds ratio. It's important to keep in mind that the odds ratio is simply a statistic calculated for this sample. So by looking at the confidence interval, we can get a better picture of how much this value could change for a different sample drawn from the population.
Based on our model, those with social phobia are anywhere from 1.78 to 6.61 times more likely to have nicotine dependence than those without social phobia. The odds ratio is a sample statistic, and the confidence interval is an interval estimate of the population parameter.
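As a rough check on the by-hand calculation, here's a minimal sketch in Python, again for illustration only, since the lesson itself relies on the SAS output. It exponentiates the SOCPDLIFE parameter estimate of 1.23 to get the odds ratio, and it builds a 95% interval on the log-odds scale before exponentiating the endpoints. The standard error of 0.335 is an assumption: it isn't read from the output, it's back-calculated from the reported interval of 1.78 to 6.61, so the endpoints only approximately match. The function name `odds_ratio_with_ci` is also made up for this example.

```python
import math


def odds_ratio_with_ci(estimate, std_error, z=1.96):
    """Return (odds ratio, lower limit, upper limit) from a logistic
    regression parameter estimate and its standard error.

    The interval is built on the log-odds scale and then exponentiated.
    z=1.96 corresponds to a 95% Wald-type interval.
    """
    point = math.exp(estimate)
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    return point, lower, upper


# 1.23 is the SOCPDLIFE parameter estimate from the lesson.
# 0.335 is an ASSUMED standard error, back-calculated from the reported
# interval (1.78 to 6.61); it is not taken from the SAS output.
point, lower, upper = odds_ratio_with_ci(1.23, 0.335)

print(f"odds ratio {point:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The point estimate comes out at about 3.4, in line with the odds ratio reported by SAS. Notice that the interval isn't symmetric around 3.4, because it's symmetric on the log-odds scale and the exponential stretches the upper end.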