0:05

Let's take a look at our first experiment where we

can measure a result with an analysis of variance, and

we'll start with a common experiment that you may have even done yourself:

a website A/B test.

An A/B test has visitors who come to a website, and

some are exposed to one version of the site and

others are exposed to another version, hence the A and B terms.

0:43

So here's the scenario we'll work with.

First we'll talk about the design considerations of this experiment.

Then we'll talk about some of the considerations when we're running the experiment,

and then, as we've done before, we'll move to the R code and

show how we would analyze this experiment statistically and report the result.

1:04

Let's say that on a given day,

500 visitors to a website are treated as part of the experiment,

perhaps the first 500 who visit the website on that designated day, and

let's say half of them are exposed

to website A and half of them are exposed to a variation of it, website B.
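As a rough illustration of that setup, here is a minimal Python sketch that shuffles 500 visitor IDs and gives exactly half of them site A and the rest site B. It is only an illustration: the course itself uses R, and the function name `assign_exact_split` and the integer visitor IDs are hypothetical.

```python
import random

def assign_exact_split(visitor_ids, seed=None):
    """Shuffle visitors, then give the first half site A and the rest site B.

    Hypothetical helper: an exactly balanced 50/50 split of a fixed list of visitors.
    """
    rng = random.Random(seed)
    ids = list(visitor_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {vid: ("A" if i < half else "B") for i, vid in enumerate(ids)}

# 500 hypothetical visitor IDs, numbered 1..500.
assignment = assign_exact_split(range(1, 501), seed=1)
counts = {"A": 0, "B": 0}
for site in assignment.values():
    counts[site] += 1
print(counts)  # {'A': 250, 'B': 250}
```

Shuffling first means which visitors land in A or B is random, while the group sizes are fixed in advance.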

1:28

Now, that may not be the optimal way to run an A/B test. Perhaps it shouldn't just be on

one day, for example, and perhaps it should be more than 500 people, or

perhaps it should be a certain number of people on a given day.

All of those are good variations to consider, but for

now we're going to keep it simple and just keep it to the scenario I described.

1:58

So maybe we think that a redesign of the website, say version B of this site,

will have people stay on the site longer and view more pages.

So distinct pages viewed will be our measure, and

you could imagine that in a real-world A/B test, we might also count time on site,

perhaps total page loads or page views, and other measures like that,

maybe even clicks.

So we're interested in the number of distinct pages that they view.

3:24

Dependent variables are the things that result from our manipulation,

sometimes called our treatment, which here would be the site they're exposed to.

The dependent variable is really the measure, and as I said before, we're

interested in the number of distinct pages that are viewed, so we can call that pages.

4:04

Then we have some independent variables, let's say X. We just have one here, so

we'll call it X, but if we had more than one, which we will

see later in the course, we might have X1, X2, X3, and so

on. So Y is related to X, and then we have to add a plus term,

which is traditionally measurement error.

4:28

The idea here, in our case, is that the number of pages viewed, we think,

might depend on the value that X takes:

is X website A or B? Plus measurement error.
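Written as a formula, the relationship just described might be sketched as below. The notation is an assumption for illustration. $f$ stands for however the expected page count depends on the site, and $\varepsilon_i$ is the measurement-error term for visitor $i$. That term is commonly assumed to be normally distributed with mean 0 and some unknown variance $\sigma^2$. The second line is the same idea with several independent variables.

$$
\text{pages}_i = f(\text{site}_i) + \varepsilon_i,
\qquad \text{site}_i \in \{A, B\},
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
$$

$$
Y_i = f(X_{1i}, X_{2i}, \dots, X_{ki}) + \varepsilon_i
$$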

What's measurement error?

Well, this is actually a very deep issue, but you can think of it as the randomness,

or error, or noise, that's in the measurements that we're

taking over people, over subjects for this experiment.

4:58

You might say, why is there any measurement error?

We know how many distinct pages they visit on the website.

That's true.

In that case, we know the measurement of the page count, presumably without error,

although there could perhaps be some error in our code that's logging it, or

maybe some edge case that's not handled, but

that's not all that measurement error is.

Measurement error in this sense also covers the variation that naturally

takes place when we measure things.

So it doesn't have to be that we're logging it wrong.

It could be that if I measured the same person on Tuesday

and then measured them again on Wednesday, they might in fact have a different result.

If I measured two different people, they might have different results

due purely to the fact that they're different people,

not because the website is really causing that.

These errors are taken to be random, and

are usually assumed to be normally distributed, and they are part of any experiment, any measurement.

In fact, we don't know how much error may be in a measurement,

how much natural variation there may be, and

that's why we need enough statistical power

to draw inferences about the population that we're after.

That is, we want to know: is there a true difference between website A and B

in this case, in spite of the fact that we have some error in

every single measurement because of the so-called natural variation

6:28

of any human behavior that we might be measuring? So that's what that term is, and

it's inescapable, and its exact value, of course, is unknowable.
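To make "natural variation" concrete, here is a small Python simulation under assumed numbers. Each visitor's distinct-page count is a site-specific mean plus normally distributed noise. The means (3 pages for A, 4 for B) and the noise level are invented for illustration and do not come from the lecture's data.

```python
import random

# Hypothetical "true" mean distinct-page counts per site (NOT the lecture's data).
TRUE_MEAN = {"A": 3.0, "B": 4.0}
NOISE_SD = 1.5  # assumed standard deviation of the error term

def simulate_pages(site, rng):
    """One visitor's distinct-page count: site effect plus normal noise."""
    raw = TRUE_MEAN[site] + rng.gauss(0.0, NOISE_SD)
    return max(1, round(raw))  # counts are whole numbers, and a visitor views at least one page

rng = random.Random(42)
# Five visitors who all saw site A still get different counts, purely from the noise term.
same_site = [simulate_pages("A", rng) for _ in range(5)]
print(same_site)
```

Even with the site held fixed, the counts vary. That spread is why we need enough visitors to tell whether A and B really differ.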

So in our particular experimental case, we're looking at, as I said,

the number of distinct pages as being in some relation to

the value of the site, plus this error.

Now there's something else to be said about the design of this

experiment as well, and that is that these variables each have types and

it's important to be aware of variable types.

We saw in the previous section that we were recoding the subject variable as

a factor, which is R's term for a categorical or nominal variable type.

We also know that there are numeric variable types,

also sometimes called continuous or scalar. And there's even a third type called

ordinal, or ordered: variables whose values fall in a sequence

7:32

that has an order, like a Likert scale, such as a one-to-seven scale or

a one-to-five scale, or short, medium, tall, taller, tallest.

Things like that that have an order to them are called ordinal.

7:54

What's the variable type for pages?

It is numeric, or numerical, or scalar, or continuous; we'll treat those as synonyms.

I'll grab this color here and I'll make a note of that.

In this course's analysis of variance situations, we'll see some analyses where

this is not the case, but mostly we'll see that our Y value will be numeric.

It's a numeric outcome based on certain inputs, but what are those input types?

What is the type of X here?

It's the site, which can take on two values, A or B, so it's categorical, or nominal: in R, a factor.
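As an illustration of the three variable types, here is a Python sketch. In R you would use `factor()` for the site and `ordered()` for an ordinal variable. In this Python version, a plain `Enum` stands in for a nominal factor and an `IntEnum` stands in for an ordinal one. The class names `Site`, `Height`, and `Observation` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum

class Site(Enum):
    """Nominal (categorical) factor: levels have names but no order."""
    A = "A"
    B = "B"

class Height(IntEnum):
    """Ordinal variable: levels have a meaningful order."""
    SHORT = 1
    MEDIUM = 2
    TALL = 3
    TALLER = 4
    TALLEST = 5

@dataclass
class Observation:
    visitor_id: int
    site: Site   # independent variable (a factor with levels A and B)
    pages: int   # dependent variable (numeric: count of distinct pages)

obs = Observation(visitor_id=1, site=Site.B, pages=4)
print(obs.site.value, obs.pages)       # B 4
print(Height.TALL < Height.TALLEST)    # True: ordinal levels can be compared
# Site.A < Site.B would raise TypeError: nominal levels have no order.
```

One caveat: `IntEnum` also allows arithmetic on the levels, which ordinal data doesn't strictly justify. It is used here only to get the ordering.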

9:05

Okay, so those are variable types, and

we'll see them throughout some of our analyses.

Now, there are other terms relevant here that we'll use more commonly.

We won't say independent variables much beyond this moment.

We'll say factors, because certain experiments we look at in the future

will have multiple factors, and they'll be factorial designs.

That'll be later in the class.

So independent variables, let me use our other color here,

can also be called factors, and factors take on values, called levels,

just like site, which in this case has two levels, A and B.

10:04

Now, there's one last consideration to take into account, and that is that these

factors can also be between subjects or within subjects.

Well, what does that mean?

10:47

So in our case, each subject would experience either website A or website B,

but not both, which makes site a between-subjects factor. A within-subjects factor is one for

which a participant experiences more than one level of the factor.

In this case, that would be both website A and B.

In a website A/B test, when a visitor comes to a site, they're usually assigned

to one or the other variation of the website, and not both.

A piece of local storage, a cookie, or something similar is put on the machine

to remember which site they were exposed to,

so each time they go to the site, they get the same one.
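The lecture describes a cookie or local storage keeping a visitor on the same variant. A related technique, sketched below, is to hash a stable visitor ID so that the same visitor always maps to the same variant. This is an assumption for illustration, not the course's method. The function name `sticky_variant`, the experiment label, and the `share_b` parameter are all hypothetical. `share_b` also lets you try uneven splits such as 90/10.

```python
import hashlib

def sticky_variant(visitor_id: str, experiment: str = "site-redesign", share_b: float = 0.5) -> str:
    """Deterministically map a visitor ID to 'A' or 'B'.

    The same (experiment, visitor_id) pair always hashes to the same bucket,
    so a returning visitor keeps seeing the same site.
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "B" if bucket < share_b else "A"

print(sticky_variant("visitor-123") == sticky_variant("visitor-123"))  # True
```

A cookie remembers an assignment that was made once. Hashing recomputes the same answer every time, which can still work if the cookie is lost, as long as the visitor ID itself is stable.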

11:29

So that's what a between-subjects and a within-subjects factor is, and

when we have multiple factors, we can have some of them be between-subjects and

some of them be within-subjects.

For a factor to be within-subjects, a participant only needs to be exposed

to more than one level of the factor.

So if we had, say, versions A, B, C, and D of the site,

and a participant was exposed to A and B, but

maybe not C and D, it would still be a within-subjects factor.

It would be a partial within-subjects factor at that point.
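The definitions above can be turned into a rough check over exposure records, where each record is a (subject, level) pair. The sketch below is one reading of those definitions. The function name `classify_factor` is hypothetical. The "irregular" label, for data where some subjects saw one level and others saw several, is a catch-all invented here and is not a term from the lecture.

```python
from collections import defaultdict

def classify_factor(exposures):
    """Classify a factor from (subject, level) exposure pairs."""
    levels_seen = defaultdict(set)
    for subject, level in exposures:
        levels_seen[subject].add(level)
    all_levels = set().union(*levels_seen.values())
    counts = [len(seen) for seen in levels_seen.values()]
    if all(c == 1 for c in counts):
        return "between-subjects"            # each subject saw exactly one level
    if all(seen == all_levels for seen in levels_seen.values()):
        return "within-subjects"             # each subject saw every level
    if all(c > 1 for c in counts):
        return "partial within-subjects"     # each saw several levels, but not all
    return "irregular"                       # hypothetical catch-all for mixed exposure

# A typical website A/B test: each visitor sees only one version.
print(classify_factor([("v1", "A"), ("v2", "B"), ("v3", "A")]))  # between-subjects
```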

12:02

So these are some of the design considerations for this website A/B test.

What are some things to keep in mind when we run such a test?

This is by no means a comprehensive list of considerations, but

it covers a few things we'd want to think about.

12:16

One question is do we measure each visitor only once?

Remember we're measuring how many distinct pages they view.

What if they come back on the same day, or what if they come back

at a time when they're still within that group of 500 that we said we wanted?
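One possible policy for measuring each visitor only once is sketched below: keep each visitor's first visit and ignore later ones, until 500 distinct visitors have been collected. The policy, the function name `first_visits`, and the (visitor_id, site, distinct_pages) record layout are assumptions for illustration.

```python
def first_visits(visit_log, max_visitors=500):
    """Keep only each visitor's first visit, up to max_visitors distinct visitors.

    visit_log: iterable of (visitor_id, site, distinct_pages) records in arrival order.
    """
    seen = set()
    kept = []
    for visitor_id, site, pages in visit_log:
        if visitor_id in seen:
            continue  # returning visitor: already measured once, so skip
        seen.add(visitor_id)
        kept.append((visitor_id, site, pages))
        if len(kept) == max_visitors:
            break     # the experiment's quota of distinct visitors is full
    return kept

log = [(1, "A", 3), (2, "B", 5), (1, "A", 7), (3, "A", 2)]
print(first_visits(log))  # [(1, 'A', 3), (2, 'B', 5), (3, 'A', 2)]
```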

12:31

For that matter, how many visitors do we want? Why 500?

Should we want more, or fewer?

That depends, again, on how big the difference in pages visited is

between versions A and B of the website.

If the differences are great, we don't need so many

subjects; if the differences are smaller, we may need more to tell the difference.
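To see how the needed sample size grows as the difference shrinks, here is a sketch of a standard normal-approximation formula for comparing two group means with equal group sizes and a two-sided test. It is not the course's method. The standard deviation of 2 pages and the candidate differences are hypothetical. A t-test-based power calculation would give slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate visitors per group needed to detect a mean difference `delta`.

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2 (normal approximation).
    """
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) * sd / delta) ** 2)

# Hypothetical: page counts vary with SD of 2 pages; try several true differences.
for delta in (2.0, 1.0, 0.5):
    print(f"difference {delta} pages: {n_per_group(delta, sd=2.0)} visitors per group")
```

With these assumptions, the loop reports 16, 63, and 252 visitors per group. Halving the difference roughly quadruples the visitors needed.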

Is the split 50/50?

Do half the subjects get A and half get B?

You can run website A/B tests, of course, with any arbitrary split, say 90/10 or

80/20. In our case, for this data, we'll more or less do 50/50, but

it may depend on an algorithm that assigns people to the conditions in

a way that could get slightly unbalanced, and so that's a consideration as well.

Is the design a balanced design or an unbalanced design?

Balanced designs have the same number of data points in every condition.

Unbalanced designs do not.
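One way an assignment algorithm can end up slightly unbalanced is if it flips an independent coin for each visitor. The Python sketch below shows this. The function names are hypothetical, and `share_b` can be set to 0.1 or 0.2 for a 90/10 or 80/20 split. With `share_b=0.5`, the two groups come out near 50/50 but usually not exactly.

```python
import random
from collections import Counter

def coin_flip_assign(n_visitors, share_b=0.5, seed=None):
    """Assign each visitor independently, so group sizes are themselves random."""
    rng = random.Random(seed)
    return ["B" if rng.random() < share_b else "A" for _ in range(n_visitors)]

def is_balanced(assignments, levels=("A", "B")):
    """Balanced: every condition has the same number of data points."""
    counts = Counter(assignments)  # a missing level counts as 0
    return len({counts[level] for level in levels}) == 1

sites = coin_flip_assign(500, seed=7)
print(Counter(sites), is_balanced(sites))
```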

So those are some of the things to think about.

For our purposes in this particular study, we will have a near-50/50 split,

but it comes out, as we'll see, not quite exactly 50/50, and

that's okay, and we have a total of 500 visitors.

13:42

And we do measure each visitor only once.

So, we have one measure per visitor, the number of distinct pages they viewed,

either in website A or website B.

Let's go now to look at the R code and see how we would do the analysis for

this kind of experiment.