Okay. So, right now,

we're going to talk about the importance of

defining good research questions for making sound inference.

So, the overall idea here is that if

we want to start applying statistical procedures to data,

we have to have a good idea of what question we're answering in the first place.

So, we're going to talk about some examples and then moving forward,

we're always going to refer back to

a research question when we go through these different examples.

Okay. So, we're going to review the importance of

well-formulated research questions for quality statistical inference.

We're going to look at examples of different inferential approaches

using the NHANES data to address explicit research questions.

So, like I mentioned, we're always going to look back and say okay,

exactly what research question are we trying to answer.

In these different examples,

the examples will be supplemented with working Python code.

So, you can go through, run the code,

replicate the analysis results,

and follow along with the different inferential examples.

But, we always want to make sure that we're answering

a good well-formulated research question when we perform a statistical analysis.

So, a little bit about good research questions.

Well, we know data are everywhere these days;

big data, small data sets,

designed studies, process data.

It's really easy to find a dataset,

an electronic version of a dataset,

import it into some statistical software and run some analysis.

We can get data very easily from many different sources or again we could design studies,

design survey samples and collect data from populations.

But, wherever we get the data from,

inferences based on those analyses,

they're going to tend to miss the mark,

if we don't have a well formulated research question

underlying the study that we're trying to perform.

So, we have to ask the question,

what really defines a good research question?

So, some key aspects to think about when we're defining a research question.

First of all, what is the target population of interest?

It's a really good idea to write down a very clear and concise statement of what

the target population of interest is and then

make that clear when you're writing your research question.

Second, is the research question descriptive or analytic?

Now, what's the difference between those?

So for a descriptive question,

we might be interested in the mean income in a specific population.

So, we're interested in estimating a descriptive parameter,

the average income for

that population. Or we might be interested in more of an analytic question.

Analytic questions generally refer to the relationships between different variables.

So for example, we might be interested in the relationship

between income and quality of life in a specific population.

So not just estimating a mean or a total or a standard deviation,

but rather quantifying the relationship between two variables.

Those types of questions are generally referred to as analytic.
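
To make that distinction concrete, here's a minimal sketch in Python. The numbers are made up purely for illustration, and the names income, quality_of_life, and pearson_r are assumptions for this example, not variables from any real dataset. The first calculation answers a descriptive question (a mean), and the second answers an analytic question (a relationship between two variables).

```python
from math import sqrt
from statistics import mean

# Made-up values for five people (illustration only, not real data).
income = [30, 45, 50, 62, 80]        # annual income, in thousands
quality_of_life = [5, 6, 6, 8, 9]    # hypothetical 1-10 score

# Descriptive question: estimate a single parameter, the mean income.
mean_income = mean(income)


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Analytic question: quantify the relationship between two variables.
r = pearson_r(income, quality_of_life)

print(f"Mean income (thousands): {mean_income}")
print(f"Correlation between income and quality of life: {r:.3f}")
```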

Third, has the question been asked before?

And will the new study add knowledge that didn't exist before?

So, many studies build on prior studies that may have been asking similar questions,

but we need to make it clear whether

the question has ever been asked before and what exactly

the new study is going to be adding to

the knowledge that we already have about this particular topic.

Then fourth, are the variables readily available,

measured appropriately, or feasible to measure using well-established tools?

So, you need to make sure that it's going to be

possible to actually collect the data that we're

interested in and are we using appropriate measures

for what exactly it is that we're trying to measure.

We're going to talk a little bit more about that with different examples.

But, we need to make sure that the variables that we're interested in

are readily available and straightforward to measure, and we need to make sure that

what we're measuring is actually

capturing the concepts that we have in mind.

So for good research questions,

we think about these four properties that we just discussed and if we

craft the research question following those four essential properties or

aspects that we just went over and we use

an appropriate statistical procedure that's well aligned

with that research question given the four properties,

we can make very good inferences related to that question,

but we need to make sure all five of these things go together:

the four key aspects that we just talked about, plus choosing

an appropriate statistical procedure that will lead to good inferences.

Without a good research question, it's easy to just blindly run analyses:

we bring a dataset into some software and just start running different analyses,

generating different results, writing up those results.

If we do all this in the absence of a good research question,

this could very easily lead to poor insights and incorrect decisions.

We need to make sure that, the analyses that we're running are well aligned with

a carefully crafted research question that will

maximize the quality of the inferences that we make.

So, here's a bad question,

what is the relationship between academic performance and summer internship success?

So sounds straightforward on the surface.

We're interested in an analytic question

what's the relationship between these two variables,

one called academic performance,

one called summer internship success.

But, let's break this question down in a little bit more detail.

First of all, what's the target population?

Well, we have no idea.

What population is the author of this question talking about?

We really have no idea from the way the question is stated.

Second of all, is the question descriptive or analytic?

Well, we see that the author is interested in the relationship between

performance and success so this would be framed as an analytic question.

That's good because it's making clear what type of analysis the author wishes to perform.

Third, will answering the question provide new knowledge?

Again, we have no idea.

The question just states that,

we want to look at the relationship between performance and success.

It doesn't say whether it's adding on to any existing knowledge.

Number four, how are performance and success even measured?

Well, again, we have no idea.

What measure of academic performance does the author have in mind? Is it GPA?

Is it final exam performance?

Is it class attendance?

Is it being able to make adequate progress towards a major? We really have no idea.

What about success?

What defines the success of a summer internship?

Is it finishing the internship?

Is it getting a positive evaluation from whoever your supervisor was at that internship?

We really have no idea how

these different concepts are going to be measured the way that this question is written.

So, pretty much this question only hits on

one of the four key properties of a well-written question.

So definitely we need to rethink how to write this.

Here's a good question that we're going to build on as we

go through different examples using the NHANES data.

When considering Hispanic adults age 18 plus in the United States in 2015-2016,

what is the difference between males and females in mean systolic blood pressure?

So, a few extra words, but those words are

providing additional detail about what we're interested in.

So, let's break down this question.

First of all, what's the target population?

Well, in this case it's clearly defined.

We're talking about Hispanic adults age 18 and above in the United States in 2015-2016.

So the target population becomes clear: the who,

the what, the when, and the where.

That's the population that we want to make conclusions about.

Number two, the objectives are clear.

This particular question is focused on a descriptive comparison of means.

We want to calculate the mean systolic blood pressure for both males and

females and then compare those means for this particular target population.

Number three, has the question been asked before?

We don't know on the surface the way the question has been stated.

Probably, but perhaps for other years.

We're making it clear that we're interested in generating new knowledge from

2015 and 2016, maybe based on a recently collected or recently available dataset.

We're getting new knowledge for this specific population in these specific years.

Then number four, the measures are made clear.

So we have gender or sex; we want to compare groups defined by male and

female, and then we also have systolic blood pressure.

So it's clear that we're measuring

this physiological characteristic and we want to calculate

the averages of systolic blood pressure for each of these two groups and compare them.

So additional detail here makes the objectives of the study clear and this is

a good question that we can build on when choosing an appropriate statistical procedure.
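
Here's a rough sketch of how the pieces of that question map onto an analysis in Python. The records below are made up for illustration, and the field names (age, hispanic, sex, sbp) are hypothetical; they are not the actual NHANES variable names. The point is just that each part of the question (target population, groups, measure) turns into one step.

```python
from statistics import mean

# Made-up records (illustration only); field names are hypothetical.
records = [
    {"age": 34, "hispanic": True,  "sex": "male",   "sbp": 128},
    {"age": 52, "hispanic": True,  "sex": "female", "sbp": 122},
    {"age": 16, "hispanic": True,  "sex": "male",   "sbp": 110},  # under 18
    {"age": 45, "hispanic": False, "sex": "female", "sbp": 118},  # not Hispanic
    {"age": 61, "hispanic": True,  "sex": "male",   "sbp": 136},
    {"age": 29, "hispanic": True,  "sex": "female", "sbp": 114},
]

# Step 1: restrict to the target population (Hispanic adults age 18+).
target = [r for r in records if r["hispanic"] and r["age"] >= 18]

# Step 2: compute the mean systolic blood pressure within each group.
mean_male = mean(r["sbp"] for r in target if r["sex"] == "male")
mean_female = mean(r["sbp"] for r in target if r["sex"] == "female")

# Step 3: compare the two means.
difference = mean_male - mean_female

print(f"Records in target population: {len(target)}")
print(f"Mean SBP, males: {mean_male}")
print(f"Mean SBP, females: {mean_female}")
print(f"Difference (males - females): {difference}")
```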

So, good questions make it very easy to choose inferential procedures.

Let's suppose we have a data set collected from a sample of

Hispanic adults age 18 and above in the United States in 2015-2016.

In this case, that sample is going to come from the NHANES

2015-2016, and we want to compare means between two groups,

males and females on a continuous variable of interest, systolic blood pressure.

Given this information, the inferential procedure that we would

likely choose is an independent samples t-test.

This type of test allows us to compare means in

two independent groups on a continuous variable of interest.
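
As a minimal sketch of what that test computes, here's the pooled-variance version of the independent samples t statistic written out by hand in Python. The blood pressure values are made up for illustration, and the function name pooled_t_test is an assumption for this example. This version assumes the two groups have equal variances, and it stops at the test statistic and degrees of freedom; in practice you would typically use a library routine such as scipy.stats.ttest_ind, which also returns a p-value.

```python
from math import sqrt
from statistics import mean, variance

# Made-up systolic blood pressure readings (illustration only).
males = [128, 136, 124, 131, 140]
females = [122, 114, 119, 125, 117]


def pooled_t_test(a, b):
    """Return (t statistic, degrees of freedom) for two independent samples,
    assuming equal variances in the two groups."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    # Standard error of the difference in sample means.
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, df


t_stat, df = pooled_t_test(males, females)

print(f"Difference in means: {mean(males) - mean(females):.1f}")
print(f"t statistic: {t_stat:.3f} on {df} degrees of freedom")
```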

Now, an important caveat moving forward.

We're going to be treating the data from

the NHANES as if they come from a simple random sample.

So moving forward when we start to introduce applications of

the statistical procedures that we'll be talking about in this particular course,

we're going to start simple and we're going to assume that

the NHANES data come from a simple random sample.

Now, as we learned in course one when talking about where data come from, remember that,

complex sample design features for probability samples like

the NHANES sample generally need to be accounted for in inferential procedures.

We will talk more about complex sample survey analysis later but as we're

introducing these procedures and the basics of applying these different procedures,

we're going to assume that the NHANES data come from a simple random sample,

and start simple with examples of these procedures.
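
Just to give a flavor of why that assumption matters, here's a small sketch comparing an unweighted mean, which is what we get when we treat the data as a simple random sample, with a weighted mean. The readings and the weights are made up, and the name weighted_mean is an assumption for this example. Survey weights are only one of the complex design features; getting correct standard errors also involves the strata and clusters, which this sketch does not attempt.

```python
from statistics import mean

# Made-up readings and made-up survey weights (illustration only).
# Each weight is the number of people in the population that the
# sampled person is taken to represent.
sbp = [120, 130, 140]
weights = [4000, 1000, 1000]


def weighted_mean(values, w):
    """Weighted mean: each value counts in proportion to its weight."""
    return sum(v * wi for v, wi in zip(values, w)) / sum(w)


# Treating the data as a simple random sample: every person counts equally.
unweighted = mean(sbp)

# Using the weights: people who represent more of the population count more.
weighted = weighted_mean(sbp, weights)

print(f"Unweighted mean: {unweighted}")
print(f"Weighted mean:   {weighted}")
```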

Later on in this specialization,

we're going to revisit the same examples and take the complex sampling features of

the NHANES into account in the analysis when generating

estimates and making conclusions about the population.

So, when we talk about these different examples,

we're more or less setting a baseline under the assumption of

a simple random sample when generating our estimates and performing analyses.

Later on, we're going to account for complex sampling features and

revisit the conclusions that we make about these target populations.