Hello again everyone and welcome back. In the past lesson, I listed and described five important steps that are usually necessary when performing risk stratification to solve health care problems. In this lesson, I describe technical steps that statisticians and data scientists usually have extensive knowledge of and experience with. The general steps involve creating and evaluating the model; the final steps, often considered the most important, are the actual scoring of new data and the deployment of the analytics to solve problems. At the end of the lesson, you will be able to list and articulate the meaning of the last five steps when performing risk stratification: create a model, evaluate the model, score new data, rank and stratify, and deploy the analytical output. Let me show you what I mean.

In our prior lesson, we covered steps one through five: one, decide on algorithms, whether to buy or to build; two, select the target variables; three, consider groupers; four, specify the time periods; and five, select candidate predictors. This lesson picks up where we left off with five additional steps: number six is creating the final model, seven is evaluation, eight is scoring new data, nine is ranking and stratifying, and step 10 is deployment of the analytical output. Let's go a bit deeper into each of these in turn.

Let's get started with step six, creating the final model. This step involves deciding which variables from step five, selecting candidate predictors, you want to include. Recall what you have learned about model selection in your role as a data scientist. For example, A is the principle of parsimony. This assumes that simple models often fit the data as well as or better than complex models. Next, B: collinear variables should be avoided. In other words, variables that are strongly correlated with one another are often problematic. C: test with holdout or validation datasets. Analysts should pick the most important variables or create an index that blends them into one variable. After careful analysis, you'll arrive at D, a final list of modeling variables to select. For example, you might have found that including 20 or 30 Hierarchical Condition Category groups for an outcome of interest creates a very high R-squared value. However, your comparison of the training and holdout data may reveal that you are at risk of overfitting the data. Thus, you might smartly select a smaller subset of variables. Finally, E: once you have a final model, you should run the model to obtain the coefficients, or the weights. That gives you a picture of what takes place within step six, creating the final model.

Step seven involves evaluating the model to make sure the global fit of the model is adequate. A global measure for the overall fit of an ordinary least squares regression model is the R-squared value. For logistic regression, you can use the receiver operating characteristic, or ROC, curve. Evaluation of the model is likely to be an iterative process that involves the previous steps. Our model evaluation steps bring up the question: did we really choose the correct patients or members to target? To answer this question, let's briefly review some common parameters to evaluate predictive models. The R-squared evaluation method is common with regression models that have continuous outcomes.
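To make step six concrete, here is a minimal sketch in Python of fitting a cost model, comparing training and holdout R-squared values to check for overfitting, and extracting the coefficients. The synthetic data, the HCC-style column names, and the scikit-learn workflow are illustrative assumptions, not part of the lesson.

```python
# A minimal sketch of step six, using synthetic stand-in data:
# binary HCC group indicators predicting an annual cost outcome.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
claims = pd.DataFrame(rng.integers(0, 2, size=(n, 20)),
                      columns=[f"hcc_{i:02d}" for i in range(20)])
claims["annual_cost"] = claims.sum(axis=1) * 800 + rng.exponential(4000, n)

hcc_columns = [c for c in claims.columns if c.startswith("hcc_")]
X, y = claims[hcc_columns], claims["annual_cost"]

# C: hold out data so we can compare fit and watch for overfitting.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = LinearRegression().fit(X_train, y_train)

# A large gap between these two values suggests overfitting.
print(f"Training R-squared: {model.score(X_train, y_train):.3f}")
print(f"Holdout R-squared:  {model.score(X_holdout, y_holdout):.3f}")

# E: run the model to obtain the coefficients, or weights.
weights = pd.Series(model.coef_, index=hcc_columns)
print(weights.sort_values(ascending=False).head())
```

If the holdout R-squared falls well below the training R-squared, that is the signal, mentioned above, to smartly select a smaller subset of variables and refit.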
The R-squared value shows the percentage of the total variance of the observed data that is explained by the model. A value between 10 and 20 percent is typical for most models. This is usually what we see with models that try to predict costs using administrative healthcare data. You might ask, is 10 to 20 percent really poor performance? I would say no, because the measure describes how well the model predicts each value rather than simply the rank of the observations. In other words, the metric looks at how well the model predicts costs for each observation, but for stratification, we mainly care about ranking. Thus, an R-squared value of about 15 percent might be quite effective.

Now, let's discuss some traditional parameters for dichotomous models. Logistic regression models are within this category and are commonly used in health care. This table summarizes the common yet powerful accuracy measures used with logistic regression. Sensitivity is the true positive rate, or in the context of our confusion matrix shown here, a divided by a plus c. Specificity is the true negative rate, or d divided by b plus d. Positive predictive value is the percentage of the predicted high-cost cases that are indeed truly high cost. The negative predictive value is not used very often for risk stratification. This is because it is usually the positives, or the high-cost cases, that we care the most about.

We're still in step seven. Now, let's move on to evaluating the various cut points that will define the strata, or risk groups, as we evaluate the model. Life is all about trade-offs, and we face tough choices when assigning the cut points for strata. This is important because different cut points that define the strata will impact the accuracy measures in different ways. If you maximize the true positive rate, or sensitivity, the true negative rate, or specificity, will likely be reduced. This figure illustrates how different cut points flag different numbers of observations for intervention. As one moves up and down the ranking, the accuracy metrics can help evaluate what it really means to get the high-value targets. If the users of risk stratification information are not willing to miss any high-value targets, then they risk adding in extra low-value targets. Thus, the real question is: what are the costs and benefits associated with mixing lower- and higher-value targets?

Next, as we evaluate the model, consider that cost concentration is an alternative accuracy measure. It is nicely described in an article titled Accuracy of Predictive Models for Disease Management, in which the authors illustrate the drawbacks of the traditional parameters just mentioned, such as R-squared and receiver operating characteristic curves. They illustrate how a measure called cost concentration can help with risk stratification from the perspective of costs. They define cost concentration as the percentage of the true costs of the total population that is concentrated among the sub-population predicted to be high cost. As an example, imagine that five percent of the population was predicted to have a disproportionate amount of the costs after following the stratification steps described earlier. If the model had no predictive value, then five percent of the costs would be expected in this group. If the model is more accurate, this group would have a larger fraction of the costs; for example, maybe 25 percent or more of the costs would be associated with this small group. This measure is like Pareto's rule that was described earlier: a small fraction of people are expected to have a disproportionate share of the illness or the costs.
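Here is a minimal sketch, under hypothetical assumptions, of how these step-seven measures might be computed for one cut point: members predicted to be in the top five percent are flagged, the confusion matrix cells a through d are tallied against the members who truly landed in the top five percent of actual costs, and the cost concentration of the flagged group is reported. The function name, the input arrays, and the five-percent cut point are illustrative, not from the lesson.

```python
# A minimal sketch of step-seven accuracy measures at one cut point.
# `actual_cost` and `predicted_cost` are hypothetical NumPy arrays.
import numpy as np

def evaluate_cut_point(actual_cost, predicted_cost, top_fraction=0.05):
    n = len(actual_cost)
    n_flagged = int(n * top_fraction)

    # Flag the members with the highest predicted costs...
    predicted_pos = np.zeros(n, dtype=bool)
    predicted_pos[np.argsort(predicted_cost)[::-1][:n_flagged]] = True
    # ...and mark the members with the highest actual costs.
    actual_pos = np.zeros(n, dtype=bool)
    actual_pos[np.argsort(actual_cost)[::-1][:n_flagged]] = True

    # Confusion matrix cells, labeled as in the lesson's table.
    a = np.sum(predicted_pos & actual_pos)    # true positives
    b = np.sum(predicted_pos & ~actual_pos)   # false positives
    c = np.sum(~predicted_pos & actual_pos)   # false negatives
    d = np.sum(~predicted_pos & ~actual_pos)  # true negatives

    return {
        "sensitivity": a / (a + c),   # true positive rate, a / (a + c)
        "specificity": d / (b + d),   # true negative rate, d / (b + d)
        "ppv": a / (a + b),           # positive predictive value
        # Share of total actual costs captured by the flagged group.
        "cost_concentration":
            actual_cost[predicted_pos].sum() / actual_cost.sum(),
    }
```

Rerunning this function at several values of top_fraction is one way to see the trade-offs described above: a cost_concentration near top_fraction suggests no predictive value, while a much larger value, say 25 percent for a five-percent group, reflects the kind of concentration the article describes.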
Let's move on to step eight, which is about scoring a new dataset. This assumes that we follow the guideline of building models on training datasets but then deploying them on new datasets. Models have structures and parameters that can easily be applied to new datasets. The parameters of the model are the coefficients, or the weights, that were fit to the training data.

Now, let us move on to step nine, ranking and stratifying. This is where the model output is used to rank and stratify the observations. For most predictive models, it is possible to obtain a score for each observation or instance; in linear regression, for example, the predicted value is the weighted sum of the inputs, while logistic regression yields a probability estimate. Once each observation has a score from the model output, it is easy to rank the observations. One way to do this is by sorting the data by the scores to obtain the ranks. Once ranked, the analyst can assign cut points to the rankings to define the groups, or strata, of interest. For example, cases identified by the model to be in the top percentiles can be assigned to a high-cost group.

Finally, in step 10, we're ready for deployment of the model. Although we've scored the data and stratified the observations in steps eight and nine, this step reminds us that the final output from the scoring and ranking must be deployed in some manner. This may sound obvious, but information is not useful unless it has been placed within a context that helps people solve problems and make decisions. I say this because I've seen output from many models be ignored even after a huge effort was made to create and validate them. There are many low- and high-tech ways to deploy a model. In the context of disease or case management, a list of members with their assigned rankings and strata can be sent to the intervention team via email in an Excel spreadsheet. A higher-tech method would be to incorporate the output into a database and then use some type of IT application that scores the new patient data and presents it to doctors and nurses within an electronic health record system. A short sketch that pulls steps eight through ten together appears after this lesson's wrap-up.

That concludes this lesson. Indeed, it has a lot of technical details, but I am confident that this conceptual process is a common and high-value endeavor within health care. You would do well to review the five steps in this lesson and the five steps in the previous lesson, aiming to be able to recall and explain them. This will give you a strong foundation for your work in healthcare data analytics. In our next lesson, we'll switch back to data and look specifically at Medicare claims data. I look forward to seeing you soon.
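As promised, here is a minimal sketch of steps eight through ten. It reuses the fitted model, the hcc_columns list, and the random generator from the earlier step-six sketch, and it invents a hypothetical new_claims DataFrame of members to be scored; the percentile cut points and the file name are illustrative choices, not prescriptions from the lesson.

```python
# A minimal sketch of steps eight through ten, reusing `model`,
# `hcc_columns`, `rng`, and `pd` from the earlier step-six sketch.
# Hypothetical new members to be scored.
new_claims = pd.DataFrame(rng.integers(0, 2, size=(1000, 20)),
                          columns=hcc_columns)
new_claims.insert(0, "member_id", range(1, 1001))

# Step eight: score the new data by applying the trained model's weights.
new_claims["predicted_cost"] = model.predict(new_claims[hcc_columns])

# Step nine: rank the members, then assign cut points to define strata
# (here, an illustrative top-5-percent high-cost group).
new_claims["rank"] = (new_claims["predicted_cost"]
                      .rank(ascending=False, method="first").astype(int))
pct = new_claims["predicted_cost"].rank(pct=True, method="first")
new_claims["stratum"] = pd.cut(pct, bins=[0, 0.80, 0.95, 1.0],
                               labels=["low", "medium", "high"])

# Step ten, the low-tech route: export the ranked list so it can be
# emailed to the intervention team (to_excel requires openpyxl).
new_claims.sort_values("rank").to_excel("risk_stratification.xlsx",
                                        index=False)
```

A higher-tech deployment, as described above, would replace the spreadsheet export with a write to a database table that an application inside the electronic health record system reads and presents to clinicians.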