[MUSIC] To get started, let's talk about the background here, and then we'll come back to this specific article and explore the reasons why the truth wears off. Okay, this is not gonna be a replacement for an introductory college statistics course. This is gonna be a quick overview of the terminology and the concepts you should be familiar with.

Okay, so we're talking about statistical inference here. These are methods for drawing conclusions about a general population from sample data, and there are two key methods that you can use here: hypothesis tests and confidence intervals. We're gonna bring in confidence intervals again a little later on, but I'm not gonna talk about them directly right now.

All right, so what is hypothesis testing? Well, you're gonna be comparing an experimental group to a control group, and there's always gonna be a null hypothesis. The null hypothesis is that there's just no difference between these two groups, all right? The ones who received the treatment in question are no different from the ones who did not; the new website generates no more traffic than the old website, the control, the default, and so on. Okay, so that's the null hypothesis. The alternative hypothesis is that there is an effect, a statistically significant difference between the two. And here, difference is defined in terms of some test statistic. Most of the examples you'll find in an introductory course, or really any course, are gonna be about comparing the means, right? So the average effect in the control group was different from the average effect in the experimental group.

Okay, now, a lot of what statistics is about is actually designing the experiment to collect the data. And in a data science regime, in a big data regime, we're actually less frequently in a position to design the experiment in the first place. A lot of times we're dealing with data that we did not necessarily collect.
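The comparing-the-means idea above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the data are synthetic, and the test statistic is Welch's two-sample t, with a two-sided p-value approximated by the normal distribution for simplicity.

```python
import math
import random

def welch_t(a, b):
    """Welch's two-sample t statistic comparing the means of a and b."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

# Synthetic example: traffic under an old vs. a new website design
random.seed(0)
control = [random.gauss(10.0, 2.0) for _ in range(100)]    # old design
treatment = [random.gauss(11.0, 2.0) for _ in range(100)]  # new design

t = welch_t(treatment, control)
# Large |t| is evidence against the null hypothesis of equal means.
# Two-sided p-value, normal approximation (fine for n = 100 per group):
p_approx = math.erfc(abs(t) / math.sqrt(2))
print(t, p_approx)
```

In practice you would reach for a library routine such as `scipy.stats.ttest_ind`, which also uses the exact t distribution rather than the normal approximation.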
Okay, so that's maybe one difference between classical statistics and the way I want to present this material for purposes of data science. That being said, it's important to understand that careful experimental design is really the most important thing there is in all this work, right? The analysis techniques play second fiddle to the proper collection of data, okay? So this includes things like randomized trials, blinded and double-blinded. So what is blinded? It means that the participants themselves don't know which group they're in. And that's pretty much non-negotiable, right? You can't tell people that they're getting a placebo drug versus the actual drug, or they'll report their symptoms differently as an effect. Randomization would also be non-negotiable, except for the fact that it's difficult to achieve in practice in some cases. So randomized means we draw a sample through some method and then we assign participants to the groups, the control group and the experimental group, with no systematic process whatsoever, right? It's just purely random.

Okay, so this framework, expressed in just these few bullets at a high level, is unbelievably powerful, right? It's completely universal to data analysis. So it's really important to internalize these points, and we'll go into more detail on some of the aspects that aren't included in this slide. So, some examples you can dream up: measuring the effect of a new ad placement on your website compared to the control group with the existing placement; measuring the effect of a treatment against a sugar pill, or against the best existing treatment; and anything else you might imagine.

So to summarize hypothesis testing, you can organize the terminology into this grid here, where there are two possibilities for the true state of the world. One is that the null hypothesis is true: there is no difference between the control group and the experimental group.
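The random-assignment step described above is simple enough to sketch directly. This is an illustrative snippet (the participant list and seed are made up, not from the lecture): shuffle the sample, then split it, so group membership depends on nothing but chance.

```python
import random

def randomize(participants, seed=42):
    """Randomly assign participants to a control and a treatment group."""
    pool = list(participants)
    rng = random.Random(seed)   # fixed seed only so the example is repeatable
    rng.shuffle(pool)           # no systematic process: assignment is pure chance
    half = len(pool) // 2
    return pool[:half], pool[half:]   # (control, treatment)

control, treatment = randomize(range(10))
print(control, treatment)
```

Because the shuffle ignores every attribute of the participants, any confounding trait (age, prior behavior, and so on) ends up spread across both groups on average, which is exactly what makes the later comparison of means meaningful.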
And the other is that the null hypothesis is false: there is an effect that you're measuring. Okay, and then there are also two possibilities for the outcome of your statistical test. In one case you do not reject the null hypothesis, right? You find no evidence that there's any difference between the groups. In the other you reject the null hypothesis: you do find evidence that there's a difference between the groups.

Okay, so if the null hypothesis is true, there is no difference, but you detect a difference anyway, that's a type 1 error. And the rate at which that happens we'll refer to as alpha; that might come up at times as we have this discussion, right? And if you make the correct decision when the null hypothesis is true, then the probability of that is 1 - alpha. When the null hypothesis is false and you fail to reject it, all right, there is an effect and you fail to measure it, that's a type 2 error, and its rate is beta. And when you do reject the null hypothesis when it's false, you detect an effect when there is an effect to detect; that happens with probability 1 - beta, and this is called the power of the test, the statistical power. [MUSIC]
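The whole grid above can be checked by simulation. The sketch below (illustrative, with made-up parameters: 50 subjects per group, a 0.5-standard-deviation effect, a 1.96 critical value) runs many simulated experiments: when the null hypothesis is true, the fraction of rejections estimates alpha; when it is false, the fraction of rejections estimates the power, 1 - beta.

```python
import math
import random

def rejects_null(a, b, z_crit=1.96):
    """Reject H0 (equal means) when |t| exceeds the critical value."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    t = (ma - mb) / math.sqrt(va / len(a) + vb / len(b))
    return abs(t) > z_crit

rng = random.Random(1)
n, trials = 50, 2000

# H0 true: both groups come from the same distribution.
# The rejection rate estimates the type 1 error rate, alpha (~0.05 here).
alpha_hat = sum(
    rejects_null([rng.gauss(0.0, 1.0) for _ in range(n)],
                 [rng.gauss(0.0, 1.0) for _ in range(n)])
    for _ in range(trials)) / trials

# H0 false: a real effect of half a standard deviation.
# The rejection rate estimates the power, 1 - beta.
power_hat = sum(
    rejects_null([rng.gauss(0.5, 1.0) for _ in range(n)],
                 [rng.gauss(0.0, 1.0) for _ in range(n)])
    for _ in range(trials)) / trials

print(alpha_hat, power_hat)  # roughly 0.05 and 0.7 for these parameters
```

Notice that power depends on the effect size and the sample size: shrink the effect or the groups and `power_hat` falls, which is exactly the beta cell of the grid growing.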