Percentile is the percentage of observations
that fall below a given data point.
Graphically it's the area below the probability distribution curve,
to the left of that observation.
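To make that definition concrete, here is a minimal sketch in Python that counts the share of observations falling below a data point. The `scores` list and the `percentile_of` helper are made up for illustration; they are not part of the course materials.

```python
# Hypothetical sample of observations, made up for illustration.
scores = [52, 61, 67, 70, 74, 78, 81, 85, 90, 96]


def percentile_of(value, observations):
    """Return the percentage of observations strictly below `value`."""
    below = sum(1 for x in observations if x < value)
    return 100 * below / len(observations)


# Six of the ten scores fall below 81.
pct_81 = percentile_of(81, scores)
print(pct_81)  # 60.0
```

With a smooth probability distribution instead of a finite sample, the same idea becomes the area under the curve to the left of the observation.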
So why is it that we can only use Z-scores to find percentiles under normal curves, but
not in a distribution of a different shape?
Well, we can always calculate percentiles for any sort of distribution,
but if the distribution does not follow this nice unimodal symmetric normal shape,
you'd need to use calculus to find the area under the curve.
And for the purposes of this course, we're not going to be using calculus, so
we're going to be sticking to normal distributions for
calculating percentiles or areas under the curve.
In this day and age, percentiles are easily calculated using computation.
For example, in R, the function pnorm gives the percentile of an observation,
given the mean and the standard deviation of the distribution.
So pnorm of negative 1, for a distribution with mean 0 and
standard deviation of 1, is about 0.1587.
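For anyone following along without R, the same lower-tail area can be sketched in Python. This assumes the standard library's `statistics.NormalDist` as a stand-in for pnorm; it is an analogue for illustration, not the tool used in the lecture.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# cdf(x) is the area under the curve to the left of x,
# the same quantity that R's pnorm(x, mean, sd) reports.
p_below = standard_normal.cdf(-1)
print(round(p_below, 4))  # 0.1587
```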
We can also obtain the same probability using a web applet, so
no need for access to R to use this one.
So let's go to the URL that's on the slide to the web applet and
do a live demo of how we would use the applet to calculate this percentile.
So to use the applet the first thing we do is to select our distribution to be
normal.
We can change our mean as we desire,
but we're going to leave it at 0, since the distribution
we're working with for now is the standard normal distribution.
We could also slide our standard deviation around but let's leave that at 1 for
now as well.
And we are interested in the area under the curve below the cutoff
value of negative 1, so we want to pick the lower tail here.
Once again we get to the same answer, 15.9%.
Lastly, we can also avoid computation altogether and
use a normal probability table.
We locate the Z-score on the edges of the table and
grab the associated percentile value given in the center of the table.
So, for a Z-score of negative 1 we look in the negative 1.0 row and
0.00 column for the second decimal and
arrive at the same answer, 0.1587 or roughly 15.9%.
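To see where the table entries come from, here is a rough Python sketch that rebuilds a single row of a normal probability table. The `table_row` helper is hypothetical, and it assumes the usual layout in which the row label carries the first decimal of the Z-score and the ten columns (0.00 through 0.09) carry the second decimal.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def table_row(row_label):
    """Rebuild one table row: lower-tail areas for the ten Z-scores
    formed by appending a second decimal (0.00 to 0.09) to `row_label`.

    Assumes a nonzero row label; a "-0.0" row would need special handling.
    """
    # On the negative page of the table, the second decimal moves Z
    # further from zero, so it is subtracted rather than added.
    sign = -1 if row_label < 0 else 1
    return [standard_normal.cdf(row_label + sign * col / 100) for col in range(10)]


row = table_row(-1.0)
# The 0.00 column of the -1.0 row is the entry for Z = -1.00.
print(round(row[0], 4))  # 0.1587
```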
Obviously, we don't have to keep using all methods here.
We've talked about three different methods: using R, using our web applet, or
using the table.
You're welcome to use whichever you like in your calculations.
While the computational approaches are a little less archaic,
the tables are actually very useful for
getting a conceptual understanding of what we mean by area under the curve.
So I encourage you to use the applet or R approaches.
But for the time being as you're learning this material,
also make sure that you get a chance to interact with the tables and
make sure that you sketch out your distributions.
And don't just rely on the numbers that the computer is spitting out at you but
make sure that you confirm them by hand as well.
Let's take a look at a quick example.
We know that SAT scores are distributed normally with mean 1,500 and
standard deviation 300.
We also know that Pam earned an 1,800 on her SAT, and
we want to find out what her percentile score is.
As soon as we find out that the distribution is normal, the first thing to do is
always to draw the curve, mark the mean, and shade the area of interest.
Here we have a normal distribution with mean 1,500, and
to find the percentile score associated with an SAT score of 1,800,
we shade the area under the curve below 1,800.
We can do this using R and the pnorm function.
So here, the first argument is the observation of interest.
The second argument is the mean.
And the third argument is the standard deviation.
The function returns an associated percentile of 0.8413,
meaning that Pam scored better than 84.13% of the SAT takers.
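As a check on Pam's result, here is a small Python sketch that computes the percentile two ways: through her Z-score on the standard normal curve, and directly on the SAT distribution. It again assumes the standard library's `statistics.NormalDist` as an analogue of pnorm, and the variable names are chosen just for this illustration.

```python
from statistics import NormalDist

sat_mean, sat_sd = 1500, 300
pam_score = 1800

# Z-score: how many standard deviations Pam's score sits above the mean.
z_score = (pam_score - sat_mean) / sat_sd

# Route 1: area below the Z-score on the standard normal curve.
via_z = NormalDist().cdf(z_score)

# Route 2: area below 1,800 on the SAT distribution itself,
# mirroring a pnorm call with observation, mean, and standard deviation.
direct = NormalDist(mu=sat_mean, sigma=sat_sd).cdf(pam_score)

print(z_score, round(direct, 4))  # 1.0 0.8413
```

Both routes shade the same region of the curve, which is why they agree.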