In this video, we're going to go through a worked example of using the chi-squared test, as you would in a common A/B test. On the web you'll often run an A/B test, where you want to compare behavior between subjects. So let's say, for example, that for our Design Lab homepage we have two alternative splash landing pages. We have this one here, which has a wonderful picture of our building. And then we have this one here, which is a call for design fellows, where you get to see a picture of our lab and all that cool stuff. Actually, I'm going to put these down here to give us some room to work. What we want to measure is which of these two alternative pages gives more click-throughs, or conversions. If this were an e-commerce site, this might be sales; for a political campaign, it might be donations. So here's what we do. On our site, using software that we write ourselves or with the help of a controlled experimentation platform, we're going to show half of our participants one design. So half the people that come to the Design Lab website are going to see our building as the splash page, and half the people that come to the website are going to see the inside of the lab, all of us, with the call for design fellows as the splash page. And we're going to work through this example by hand. Not because that's what grandpa did, or because that's how you would actually do things yourself in the age of computers, but because even though the computers can do the calculations for you, I really want you to understand the principles behind why the statistical tests we're doing work the way that they do. And what I love about the chi-squared test is that it's both so pervasively useful and also super simple to look under the hood and really understand what's going on. So here's our building page. And let's say that we had 120 people overall who saw this page. That's going to be our total.
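As an aside, that 50/50 split is usually done by seeding on a stable visitor id, so a returning visitor keeps seeing the same variant. Here's a minimal Python sketch of that idea; the function name and condition labels are mine, not from the video:

```python
import random

def assign_condition(visitor_id, conditions=("building", "fellows")):
    """Assign a visitor to one of the two splash-page variants.

    Seeding on the visitor's id (instead of flipping a fresh coin on
    every page load) keeps the assignment stable across repeat visits.
    """
    rng = random.Random(visitor_id)
    return rng.choice(conditions)
```

With enough visitors, each condition ends up with roughly half the traffic, and a given visitor always lands in the same bucket.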
And we're going to see how many of these people clicked. So if 20 of the people clicked, that means 100 of the people didn't click. Make sense? Now, just to make the numbers a little easier to understand, let's suppose that a slightly different number of people ultimately saw the other page. So this bottom row, this total, is how many people were exposed to each condition, or that you gathered data for in each condition. And this fellows page, where you get to see inside the lab, does better. So we get more clicks here: if 25 people did click, that means that 75 didn't. Here we've got a ratio of one to three of clicks to didn't-clicks on our fellows page, and here we have a ratio of one to five for the building page. And we want to know: is this difference significant? You'll remember our null hypothesis, our opening bid, is always that there's no real difference, no material difference; it's just, you know, a little bit of variation. So what we want to be able to do is figure out: if our null hypothesis were true, if there were no actual difference between these two conditions, what would be the expected values, given these totals here? The table that we've got so far, this is our observed behavior. If our null hypothesis were true, these two different observed behaviors would be drawn from the same underlying distribution. So what we can do is sum this row of people who clicked: there were 45 people who clicked across both, and there were 175 people, these poor souls, who didn't click across both, and we had a total of 220 participants across both of these pages. So if our null hypothesis is true, the behavior in this condition and the behavior in this condition are both drawn from that same overall distribution, and neither will differ significantly from it. What we would expect to see if each of these behaviors were drawn from this overall sample here, that's going to be our expected value.
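Those expected counts follow mechanically from the marginal totals. Here's a minimal Python sketch of the computation the video works through by hand next (the variable names are mine):

```python
# Observed counts: rows are (clicked, didn't click),
# columns are (building page, fellows page).
observed = [[20, 25],
            [100, 75]]

row_totals = [sum(row) for row in observed]        # [45, 175]
col_totals = [sum(col) for col in zip(*observed)]  # [120, 100]
grand_total = sum(row_totals)                      # 220

# Under the null hypothesis, each cell's expected count is
# (row total / grand total) * column total.
expected = [[rt * ct / grand_total for ct in col_totals]
            for rt in row_totals]
# expected ≈ [[24.5, 20.5], [95.5, 79.5]]
```

These match the 24.5, 20.5, 95.5, and 79.5 worked out on the board, to one decimal place.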
So for the building, again, we would have the same number of people who saw it, and here the number of expected clicks comes from that 45. So this would be 45 over 220, which is our overall click rate, times 120 people. And what that gives us is about 24.5 people. Obviously, in the real world it would have to be either 24 or 25, but the expected value is the average that you would expect. So if we exposed 120 people to a page, and there were no significant difference across these conditions, we would expect 24.5 people to click on it, on average. And that means that 120 minus 24.5, or 95.5, is the number of people who didn't click. And then we repeat the same cycle over here. So we would say, this is for our building, and this is for the lab. So here, we get our 45 over 220 again, and now that's going to be multiplied by 100 people, because that's how many people saw this page. So that's times 100, and this is going to give us about 20.5 people, and then 100 minus 20.5 gives us 79.5 people who didn't click. You'll remember from the previous video that our chi-squared value is going to be the sum, across all four of our cells, of the observed value (these values over here) minus the expected value (what we would see if the null hypothesis were true), squared, over the expected value for that cell. So let's start out with the number of clicks in the building condition. Our observed was 20, our expected was 24.5. We take the difference and square it, so that's going to work out to about 20.25, and then we divide that by our expected, which again is going to be 24.5. And let me ask Anita, who's our resident Design Lab statistician: what's 20.25 over 24.5? 0.83, all right, equals 0.83. And then we do our second condition here, so we'll do the 25 observed minus the 20.5 expected, which is also 4.5, and we square that again, which gives us another 20.25.
And this time, it's going to be divided by our expected here, 20.5. And what does that give us for this one? 0.99, all right. Now we go to the bottom row here, the didn't-clicks. So we do 100, those poor souls that didn't click, minus 95.5, and we square that, divided by, in this case, our expected of 95.5. And that's again going to be 20.25 on top. So what do we get here? 0.21. And then the last cell is going to be 75 minus 79.5, squared, over 79.5, and that's going to give us 0.25. Okay, so now we take the sum of these four values: 0.83, plus 0.99 is 1.82, plus 0.21 is 2.03, plus 0.25 is 2.28. Assuming I did that right. And so now our grand total, the chi-squared value, is our measure of the difference between what we observed and what would have happened if our expected values had happened. That sum is the chi-squared value here. And how unlikely is it that you would see this level of overall deviation in real observed behavior? That's when we can turn to our critical value table, or you can just use any statistical package, but today we're diving under the hood and so we're doing this old school. Now, you'll see that one of the questions here is how many degrees of freedom there are in your system. And the number of degrees of freedom is a little bit like a number puzzle. What it's asking is: how many different options do you have for filling in the cells? How many different things could there be? And the explanation that I like best for degrees of freedom is this: if I gave you the sums along the perimeter here, so 45, 175, 120, and 100, how many different values would I have to fill in before the rest became preordained? And you can see that example here. Let's say that we filled in just one value, like 25 clicks here. Well, at this point, everything is determined by the system. If this is 25, this has to be 20 to get the row to sum to 45. If this is 20 and the column total is 120, we know that we need 100 here.
If this is 25 and this is 100, we know that we're going to need 75 here to get this column and this row and everything else to sum. So in a 2 by 2 like this, an A/B test with two conditions and two values for each condition, click and didn't click, your degrees of freedom is going to be one. Obviously, if you have more conditions or more values, your number of degrees of freedom is going to go up. So here, with one degree of freedom and a chi-squared value of a little more than 2.25, we see that the odds of seeing this observed behavior, if the null hypothesis were true, are a little bit more than 0.1. So there is a little bit more than a 10% chance of seeing a deviation this large even if our observed values were drawn from the single underlying distribution represented by our null hypothesis, which says that these two sites are exhibiting the same behavior. So here we don't quite have enough data to be able to say that one of these is performing better than the other. There's a clear trend; we just don't have enough confidence to be able to reject the null hypothesis. So if we wanted to figure out, well, yeah, we've got a trend, what should we do? In this case, what we would want to do is go back and get more data. If the ratios hold while the number of participants keeps increasing, the odds that the difference happens by chance go down. If you've got a million people and a quarter of them click here and a sixth of them click here, that just ain't going to happen by chance. And so by increasing the number of participants, what you'll see is that, if the ratios hold, the difference will reveal itself.
And if the behavior was, in fact, drawn from the same underlying distribution, as the null hypothesis says, and the difference was just chance, then as you increase the number of participants, these two ratios will start to look more like each other. So that's the chi-squared test in a nutshell. When you see controlled experiments on the web, you know that this is the basic machinery that's under the hood. Have fun!
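The whole worked example, plus the scale-up argument at the end, fits in a few lines of Python. This is a sketch rather than anything from the video; note that keeping the expected values unrounded gives a statistic of about 2.33 rather than the 2.28 we got with the rounded 24.5 / 20.5 / 95.5 / 79.5, and in practice a single library call such as scipy.stats.chi2_contingency does all of this for you:

```python
def chi_squared(observed):
    """Chi-squared statistic for a contingency table (no continuity correction)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    stat = 0.0
    for i, rt in enumerate(row_totals):
        for j, ct in enumerate(col_totals):
            expected = rt * ct / grand  # cell's expected count under the null
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

print(chi_squared([[20, 25], [100, 75]]))      # ≈ 2.33, below the df=1 critical value of 3.84
# Same click ratios, ten times the participants:
print(chi_squared([[200, 250], [1000, 750]]))  # ≈ 23.3, far beyond 3.84
```

Multiplying every cell by ten multiplies the statistic by exactly ten, which is the point made above: hold the ratios, grow the sample, and a real difference eventually reveals itself.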