So a Q-Q plot is exactly what it sounds like,

it's just plotting quantile versus quantile for a given data set.

And so we decide how many points we want in a Q-Q plot based

on the number of quantiles we want to calculate for a data set.

So remember, in our first example we said, okay, we wanted the number

of quantiles to be equal to 5.

And so we calculated the values that 20%, 40%,

60%, 80%, and 100% of the data fall at or below.
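
As a rough sketch of that calculation, here is one way to get those five quantiles in Python. The data values are made up for illustration, and the `quantile` helper uses the simple nearest-rank rule, which is an assumption; other conventions interpolate between neighboring values and can give slightly different numbers.

```python
import math

def quantile(data, percent):
    """Return the value that `percent` percent of the data falls at or below.

    Nearest-rank rule (an assumption, chosen for simplicity).
    """
    ordered = sorted(data)
    # Smallest rank such that at least `percent` percent of the data is at or below it.
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical data set, made up for illustration.
data = [12, 7, 3, 18, 25, 9, 14, 21, 5, 16]

# Five quantiles, matching the 20%, 40%, 60%, 80%, 100% levels.
percents = [20, 40, 60, 80, 100]
quantiles = [quantile(data, p) for p in percents]
print(quantiles)  # [5, 9, 14, 18, 25]
```

Each of those five numbers becomes one coordinate of one point in the Q-Q plot, so five quantiles give five points.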

If I calculate the same quantile set for two different data sets,

I can create a plot where this is Q1 and this is Q1,

and this is data set one, and this is data set two.

And as I plot how the quantiles map,

I may get some sort of picture like this, like we're seeing here.

What a quantile-quantile, or Q-Q, plot shows us is this:

if the points fall along this 45 degree line, it means the two

data sets came from similar statistical distributions.

And so what people often do is they'll plot the quantiles for

a normal distribution against the quantiles for a different data set,

to see whether that data set is consistent with an assumption of normality.
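
A minimal sketch of that normality check, using only Python's standard library. The sample values are hypothetical, and the plotting positions `(i + 0.5) / n` are one common convention that I am assuming here; they avoid asking for the 100% quantile, which is infinite for a normal distribution.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample, made up for illustration.
sample = [4.1, 5.3, 4.8, 6.2, 5.0, 5.7, 4.5, 5.1, 5.9, 4.9]

ordered = sorted(sample)  # the sample quantiles
n = len(ordered)

# Normal distribution with the same mean and standard deviation as the sample.
fitted = NormalDist(mean(sample), stdev(sample))

# One quantile level per data point, strictly between 0 and 1 (assumed convention).
positions = [(i + 0.5) / n for i in range(n)]
theoretical = [fitted.inv_cdf(p) for p in positions]

# Each row is one Q-Q point: normal quantile on x, sample quantile on y.
for t, s in zip(theoretical, ordered):
    print(f"{t:6.2f}  {s:6.2f}")
```

If the two columns track each other closely, the points would sit near the 45 degree line, which is what we would expect from roughly normal data.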

So again, trying to think about what are the different properties of our data set?

Is this multimodal?

Is it normal?

Is it skewed?

How can we explore and understand this?

Now this is a lot more powerful for comparing two different

distributions than histograms are.

So for example,

if I'm looking at a histogram that sort of winds up maybe looking like this,

versus a histogram that sort of ends up looking like this, how similar are those?

It's a little bit difficult to tell.

On top of that, this one may have been created from a sample size of 100.

This other one may have used a sample size of 1,000.

What we could be comparing is library number one versus library number two,

and we want to see what the distribution of page counts for

books is in library one and library two.

The libraries have different books.

They have different budgets.

But if we're using a Q-Q plot, sample sizes don't need to be equal.

Because again,

we're just calculating the quantiles for the data set from library one,

the data set from library two, and then plotting the quantiles in an x,y plot.
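
Here is a small sketch of the library comparison with unequal sample sizes. The page counts are randomly generated stand-ins, not real data, and the choice of nine cut points (10% through 90%) is an assumption for illustration. It uses `statistics.quantiles` from the standard library, which interpolates between values.

```python
import random
from statistics import quantiles

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical page counts; the two libraries hold different numbers of books.
library_one = [max(20, round(rng.gauss(300, 80))) for _ in range(100)]
library_two = [max(20, round(rng.gauss(320, 90))) for _ in range(1000)]

# Same quantile levels for both data sets: cut points at 10%, 20%, ..., 90%.
q_one = quantiles(library_one, n=10)
q_two = quantiles(library_two, n=10)

# Each (x, y) pair is one point of the Q-Q plot.
points = list(zip(q_one, q_two))
for x, y in points:
    print(f"{x:7.1f}  {y:7.1f}")
```

Both data sets reduce to the same number of quantiles, so the 100 books and the 1,000 books pair up point for point even though the raw samples do not.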

Another alternative is a probability plot, which we're not going to talk about now,

but again, these are different tools to be aware of for your data detective role.