0:00

[SOUND]

[MUSIC]

So next, let's move on to what we start doing with data.

And here, what you have in front of you is an example,

this is data that I just made up.

So this is simply looking at time for service at a restaurant.

So if you think about time for service at a restaurant,

if I were to collect data on this, what am I going to get?

I'm going to get different times for different people.

Now without getting into reasons for why there may be different times for

different people, what might happen is in this case,

you can see it ranges from 10 minutes to above 20 minutes.

So the last category on this chart is above 20 minutes.

And what have I done here? I've taken the data from roughly 105 observations

that I can imagine I would have collected if I was standing at the restaurant and

collecting data on times that customers took, and I've looked at the frequency.

So what is it saying?

The first bar is telling me that on 3 of those 105 occasions,

the time taken was between 10 and 10.9 minutes, and

we can keep going from there, all the way to the last bar, which shows that

3 of the parties, 3 of the customers, took more than 20 minutes.

So it's giving you a range of values that were possible in this particular instance.

In this context, I had a range going from 10 to above 20,

and it's also giving me frequencies.

So what do we mean by frequencies?

They're simply saying, how often did this occur in my data?

And you can see that it's creating what we call in statistics a data distribution.

So a data distribution is nothing but simply taking data and

getting their frequencies and then drawing a picture of it,

drawing a bar chart of it and saying how does it look?

And then you start looking at the shape of this distribution, and

you can say something about the shape of this distribution.

So you may recognize this as looking somewhat like a bell curve distribution,

which you may already be familiar with and which we are going to look at next.
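
As a rough sketch of the frequency counting described above, the following builds a frequency table and a text bar chart. The 105 service times are generated at random purely for illustration (they are not the values on the slide), and the names `bin_label` and `frequency_table` are made up for this example. It assumes every time is at least 10 minutes and places anything of 20 minutes or more in the last category.

```python
import random

def bin_label(t, top=20):
    """Label the one-minute bin that a service time falls into."""
    minute = int(t)
    return f"above {top}" if minute >= top else f"{minute} to {minute}.9"

def frequency_table(times, low=10, top=20):
    """Count how often each one-minute bin occurs (assumes low <= time)."""
    labels = [f"{m} to {m}.9" for m in range(low, top)] + [f"above {top}"]
    counts = dict.fromkeys(labels, 0)
    for t in times:
        counts[bin_label(t, top)] += 1
    return counts

# Made-up data: 105 simulated service times in minutes, clipped at 10.
random.seed(0)
service_times = [max(10.0, random.gauss(15, 2.5)) for _ in range(105)]

table = frequency_table(service_times)
for label, count in table.items():
    print(f"{label:>12} | {'#' * count} ({count})")
```

Each row of the printout plays the role of one bar in the chart: the label is the range of values and the count is the frequency.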

2:28

So here's the normal distribution.

This is the distribution that is very common and very commonly used.

We hear about it all the time, and it's because we like to convert things to

the normal distribution as much as possible.

It's because it gives us this power of being able to use z-scores.

And we'll talk about z-scores in a minute and what that means.

But basically, it has properties that we can use in order to make

any kind of inferences about populations based on samples that we've collected.

So that's why it's very popular.

So what is the normal distribution?

It is a distribution where, if you were to actually collect data,

it would look like this, and the frequencies would take this shape.

It would be bell-shaped. There are two parameters to this.

When we say parameters, there are two things in this distribution that matter.

One is the mean and one is the standard deviation.

The mean is indicated by the Greek letter mu, and

the standard deviation is indicated by sigma.

This is where the idea of Six Sigma comes from.

So sigma stands for standard deviation of the population.

When we have population parameters, we talk about it in Greek letters.

The corresponding statistics that we get from samples are written as x-bar for the mean and

sd, or s, for the standard deviation.

So those are the two main things that we look at when we look at a normal

distribution.
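
To make the distinction between sample statistics and population parameters concrete, here is a minimal sketch using Python's standard `statistics` module. The eight service times are made up for illustration. `stdev` treats the data as a sample (dividing by n - 1, giving s), while `pstdev` treats the same numbers as if they were the whole population (dividing by n, giving sigma).

```python
import statistics

# Made-up sample of service times in minutes (illustration only).
sample = [12.4, 15.1, 13.8, 16.9, 14.2, 15.6, 13.1, 17.3]

x_bar = statistics.mean(sample)    # sample mean, x-bar
s = statistics.stdev(sample)       # sample standard deviation, s (divides by n - 1)

# If these eight values were the entire population, the standard
# deviation would instead be computed with n in the denominator.
sigma_if_population = statistics.pstdev(sample)

print(f"x-bar = {x_bar:.2f}")
print(f"s     = {s:.2f}")
print(f"sigma (treating data as the full population) = {sigma_if_population:.2f}")
```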

Now the cool thing about this normal distribution or

a couple of cool things about this normal distribution: one is that it's symmetric.

So 50% of the data is to the left of the mean,

50% of the data is to the right of the mean.

And that's what we mean when we say that the median and

the mean are exactly the same.

So those two measures of central tendency are exactly the same.

The mean is simply the calculated average.

And then the median is where 50% of the values lie below that,

50% of the values lie above that.

And then the third measure of the central tendency here is the mode,

which is the value that has the highest frequency.

So in the case of the normal distribution, all three of these are identical, so

the centermost value is the mean, it is also the median, it is also the mode.
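
A small sketch of the three measures of central tendency, using a made-up, perfectly symmetric set of whole-minute values so that the mean, median, and mode coincide as they do for the normal distribution. Real data will rarely line up this neatly.

```python
import statistics

# Made-up symmetric data: one 13, two 14s, three 15s, two 16s, one 17.
values = [13, 14, 14, 15, 15, 15, 16, 16, 17]

mean_value = statistics.mean(values)      # calculated average
median_value = statistics.median(values)  # half the values below, half above
mode_value = statistics.mode(values)      # value with the highest frequency

print(mean_value, median_value, mode_value)
```

Adding one very long service time to the list would pull the mean upward while leaving the median and mode where they are, which is one way to see that the three only agree when the data are symmetric.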

4:57

As for the probabilities of the values within this normal distribution,

we know that within plus or minus 1 standard deviation of the mean, we have 68%.

So if you go from the mean to the right that's 34%,

if you go from mean to the left, that's 34%, so 1 standard deviation to the left,

1 standard deviation to the right, that encompasses 68%.

Similarly, when you go to 2 standard deviations on either side,

that's about 95%, and when you go to 3 standard deviations, that's about 99.7%.
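
The 68%, 95%, and 99.7% figures can be checked numerically. This sketch relies on a standard identity: for a normal distribution, the share of values within k standard deviations of the mean equals erf(k / sqrt(2)), where erf is the error function from Python's `math` module. The function name `share_within` is made up for this example.

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```

This prints 68.27%, 95.45%, and 99.73%, which are the values usually rounded to 68%, 95%, and 99.7%. Half of the first figure, about 34%, is the share on each side of the mean.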

And theoretically speaking, although this is never true in reality,

this distribution has an infinite range.

So if you notice the way the normal distribution curve

has been drawn in this picture, it does not touch the x-axis.

It gets closer and closer to the x-axis without ever reaching it.

And the point is that it keeps going out to plus or minus infinity on either side,

and it's an infinite distribution, theoretically speaking.

So those are the characteristics of the normal distribution.

And the way we use this normal distribution a lot, or

the reason we use this normal distribution a lot, is the central limit theorem.

So what is the central limit theorem?

It's basically saying, if we were to take random samples from any population,

the probability distribution of the sample means starts to become

approximately normal as the sample size becomes large.

And what do we mean by sample size becomes large?

We like to think of the number 30.

A sample of size 30 is considered a good sample size for

the central limit theorem to apply.

And there may be debate about whether that's a good number or not.

There are other characteristics of the distribution that you need to think about,

but generally speaking, that's what we use

as a rule of thumb in terms of applying the central limit theorem.

We start talking about z values and

things like that when we have a sample size of 30 or greater.
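
The central limit theorem can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is chosen here to be exponential with mean 1 (strongly skewed, so clearly not normal), the sample size is the rule-of-thumb 30, and the number of repeated samples (2000) is arbitrary. The function name `sample_means` is made up for this example.

```python
import random
import statistics

def sample_means(n, repeats, rng):
    """Return `repeats` sample means, each from n draws of a skewed population."""
    means = []
    for _ in range(repeats):
        # Assumed population: exponential with mean 1 and standard deviation 1.
        draws = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.mean(draws))
    return means

rng = random.Random(42)
means_30 = sample_means(n=30, repeats=2000, rng=rng)

center = statistics.mean(means_30)
spread = statistics.stdev(means_30)
# Share of sample means that fall within 1 standard deviation of their center.
within_one = sum(abs(m - center) <= spread for m in means_30) / len(means_30)

print(f"mean of sample means: {center:.3f}  (population mean is 1)")
print(f"sd of sample means:   {spread:.3f}  (theory: 1/sqrt(30), about 0.183)")
print(f"share within 1 sd:    {within_one:.1%}  (a normal curve gives about 68%)")
```

Even though individual draws are far from bell-shaped, the sample means cluster around the population mean, with a share within one standard deviation close to what a normal curve would give. Rerunning with a much smaller `n` shows the approximation getting worse.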

So what do we mean by z values?

So here are two characteristics of the standard normal distribution.

So we talked about the normal distribution earlier,

what's a standard normal distribution?

A standard normal distribution is one where the mean is 0

and the standard deviation is 1.

So it has a fixed mean and a fixed standard deviation.

And we can get to it once we calculate the z value. So

what do we mean by the z value?

We're basically taking any normal distribution and we are converting

it into the standard normal distribution by doing some calculations on it,

and we'll see those calculations on the next slide.
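
As a preview, here is a minimal sketch of the conversion using the standard formula z = (x - mu) / sigma, which expresses a value as a number of standard deviations away from the mean. The mean of 15 minutes and standard deviation of 2.5 minutes are assumed values for illustration, not figures from the lecture, and the function name `z_score` is made up for this example.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Assumed values for illustration: service times with mean 15 and sd 2.5 minutes.
mu, sigma = 15.0, 2.5
for minutes in (12.5, 15.0, 20.0):
    print(f"{minutes} minutes -> z = {z_score(minutes, mu, sigma):+.1f}")
```

Under these assumed values, a 20-minute service time sits 2 standard deviations above the mean, and the mean itself always maps to z = 0.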