Hi, my name is Brian Caffo and this is Mathematical Biostatistics Boot Camp Lecture 4 on Two Sample Binomial Tests. In this lecture we're going to talk about the score statistic, a specific binomial test that will also serve as motivation for creating a confidence interval. We'll talk about how you can do exact tests for two binomial proportions, we'll talk about comparing two binomial proportions, and then we'll go over a little bit about Bayesian and likelihood methods for comparing binomial proportions. We'll actually spend quite a bit of time on binomial proportions, because later on we'll also talk about relative risks and odds ratios and that sort of thing.

Okay, so let's put some context on this. Imagine a randomized trial with 40 subjects, where 20 each were randomized to two drugs. The two drugs have the same active ingredient, but let's assume they had different excipients, that is, different ways in which the drug was delivered. Let's just say one was a capsule and one was a different kind of pill. Consider counting the number of side effects for each drug among the 40 people. So here we have a table with 40 total people, 20 on Drug A and 20 on Drug B; that margin, the 20 and 20, is fixed. Eleven from Drug A had side effects and 9 didn't, and 5 from Drug B had side effects and 15 didn't. On face value there seems to be a greater propensity for side effects from Drug A than from Drug B, and what we'd like to do is test whether or not the propensity for side effects is the same within the two drugs. There are a lot of different ways we could formulate this problem, even from such a simple data set, and we'll talk about some of them. But for right now let's just start with score tests.

The score test should seem fairly familiar to you, because it's constructed in the same way that the ordinary Z tests we've seen before are constructed. So let's consider a single binomial proportion, not two binomial proportions; later on in the lecture we'll consider two binomial proportions. Imagine only looking at Drug A and testing whether or not Drug A has a specific population proportion of side effects. So we want to test H0: p = p0, and our obvious estimate of p is p hat, the sample proportion of side effects. Our obvious metric of the discrepancy between p hat and p0 would be the difference; maybe we could do a ratio or something like that, but for right now let's do the difference. In order to compare this to a statistical distribution, let's normalize it by the standard error of p hat. The variance of p hat is p times (1 minus p) over n, and under the null hypothesis that's p0 times (1 minus p0) over n, so the standard error is the square root of that. Notice that we're plugging in p0 under the null hypothesis, not plugging in p hat, which would give us the estimated standard error that we used, for example, in the construction of the confidence interval.
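To make that construction concrete, here's a minimal Python sketch of the score statistic just described; it isn't part of the lecture, and the function name is my own, but it just encodes the formula above.

```python
from math import sqrt

def score_test_z(x, n, p0):
    """Score test Z statistic for H0: p = p0, given x events out of n trials.

    The standard error plugs in p0 (the null value), not p-hat; plugging in
    p-hat instead would give the Wald statistic discussed next.
    """
    p_hat = x / n
    se0 = sqrt(p0 * (1 - p0) / n)  # standard error under the null
    return (p_hat - p0) / se0
```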
At any rate, this test, where we plug p0 into the denominator, performs a little bit better than the so-called Wald test, where in the denominator we plug in p hat rather than p0. Remember how we can invert a test statistic to create a confidence interval. And of course we can use the test statistic to just perform the test, so I should have said this on the previous slide: you compare that Z statistic to quantiles from the standard normal distribution. You use the upper alpha over 2 quantile if you're doing a two-sided test, the upper alpha quantile if you're doing the one-sided test where the alternative is p greater than p0, and the alpha quantile if you're doing the one-sided test where the alternative is p less than p0. It should be obvious to you at this point in the class how you would take a Z statistic and use it to perform a test. And of course we've already talked at length, in other settings, for example with the T test and the standard Z interval, about how we can invert a hypothesis test and create a confidence interval, namely calculate those values of p0 for which we would fail to reject.

If we invert the Wald test, we get the Wald confidence interval: p hat plus or minus Z of 1 minus alpha over 2 times the square root of p hat times (1 minus p hat) over n. And if we invert the score test we get the so-called score interval, which is a lot more complicated. I want to point out one thing about the slide: this is not a numerator here, by the way; it's the quantity p hat times one factor, plus one half times another factor, and then I ran out of space, so on the next line I put plus or minus the normal quantile times the standard error. Specifically, the center is p hat times n over (n plus Z squared), plus one half times Z squared over (n plus Z squared). Look at those two factors, n over (n plus Z squared) and Z squared over (n plus Z squared): they add up to one. So they are two proportions that add up to one, in other words a point on the two-dimensional simplex, as you might say in mathematical parlance. What's important is that as n gets very big, the first weight goes to one and p hat dominates; if n is small, the weight in front of the one half gets a little bit bigger, so hopefully the one half doesn't dominate, but a greater fraction is placed on the one half. Just to give you context, Z of 1 minus alpha over 2 is usually going to be around 2, so Z squared is about 4, and the center is about p hat times n over (n plus 4), plus one half times 4 over (n plus 4). At any rate, as n gets very large the score interval becomes very similar to the Wald interval. So what this does is take p hat and shrink it towards one half, and that turns out to be a good thing to do, because you don't want the binomial confidence interval centered exactly at p hat: the binomial distribution gets more asymmetric, more skewed, as p hat gets further away from one half, and because of that you don't want that point right in the middle.
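Here's a short Python sketch, not part of the lecture, of the two intervals just described: the Wald interval and the score (Wilson) interval obtained by inverting the score test. The function names and the printed values for the Drug A data are my own illustration.

```python
from math import sqrt

def wald_interval(x, n, z=1.96):
    """Wald interval: p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = x / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def score_interval(x, n, z=1.96):
    """Score (Wilson) interval, obtained by inverting the score test.

    The center shrinks p-hat toward 1/2:
        p-hat * n/(n + z^2)  +  (1/2) * z^2/(n + z^2),
    and the two weights add up to one, so as n grows the interval
    approaches the Wald interval.
    """
    p_hat = x / n
    weight = n / (n + z ** 2)
    center = p_hat * weight + 0.5 * (1 - weight)
    halfwidth = z * weight * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2))
    return center - halfwidth, center + halfwidth

# Drug A: 11 subjects with side effects out of 20
print(wald_interval(11, 20))   # roughly (0.33, 0.77)
print(score_interval(11, 20))  # roughly (0.34, 0.74)
```

Note that with z rounded up to 2, the center becomes (x + 2) / (n + 4), which is the "add two successes and two failures" idea behind the Agresti-Coull interval discussed next.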
At any rate, we talked previously about confidence intervals for binomial proportions, and plugging in Z of 1 minus alpha over 2 equal to 2 yields the so-called Agresti-Coull interval that we talked about before. This is actually the motivation for the Agresti-Coull interval: most people do 95% intervals, and if we take our 1.96 and just round it up to 2 and plug it into the score interval, we get exactly the Agresti-Coull interval.

Okay, so let's do our example. In our previous example, consider testing whether or not Drug A's percentage of subjects with side effects is greater than 10%. I made up the 10%; let's assume the FDA gets really upset for this kind of drug if you have more than a 10% prevalence of side effects. So H0: pA = 0.1 versus HA: pA > 0.1, where pA is the population proportion of side effects for Drug A. Our p hat is 11 over 20, which is 0.55. Our test statistic is 0.55 minus 0.1 divided by the square root of 0.1 times 0.9 divided by 20, remembering to plug in the p0 from the null hypothesis, and you get 6.7. We reject: our critical value for a one-sided test is going to be about 1.65, or about 2 for a two-sided test, and either way 6.7 is going to be bigger than it. Our P value, the probability of getting a Z bigger than 6.7, is of course nearly 0; six, almost seven standard deviations away from zero, for a standard normal, is quite far out in the tails. Remember that three standard deviations cover the vast majority of the distribution. And if we were doing a two-sided test, remember that we would double this P value.

Okay, now let's discuss this problem a little bit. What did we have to do to get this to work? Mechanically, I hope you find this easy, but let's talk a little bit about the thinking behind it. We're postulating that the number of side effects out of 20 is a binomial trial. Implicit in that is the idea that we have IID data: every person is an independent and identically distributed draw from a population. So we're using those IID assumptions to create the idea of a population, a super-population, that has a prevalence of side effects of pA. Of course, in general you can't know that unless you actually go to great pains to sample independently from the population you're interested in, which is usually not the case. In general, what you're doing is using a statistical model where you're hoping that the people are a representative sample. You're also hoping that there's no crossover of side effects. Suppose some of the people receiving the drug are friends, or in the same family, or in the same neighborhood, or something like that; you're assuming there's no reason that, if one person gets side effects, it becomes more likely for anyone else around them who also received the drug to get side effects. No interference is what they would call that. So our IID model is what's giving us this idea of a population proportion, and then we're testing relative to that proportion.
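Before moving on, here's the arithmetic of the Drug A test worked out in a short Python sketch; it isn't part of the lecture, and the use of `statistics.NormalDist` for the normal tail probability is just one convenient choice.

```python
from math import sqrt
from statistics import NormalDist

# Drug A: 11 subjects with side effects out of 20,
# testing H0: p = 0.10 versus HA: p > 0.10
x, n, p0 = 11, 20, 0.10

p_hat = x / n                                # 0.55
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # plug p0 into the standard error
p_value = 1 - NormalDist().cdf(z)            # one-sided p-value

print(round(z, 2))  # about 6.71, far beyond the ~1.65 one-sided critical value
print(p_value)      # essentially 0; double it for a two-sided test
```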
But whenever you're doing these kinds of tests, it's good, I think, to actually think about what you're modeling as random and what your population model is, because the calculations here are very simple. Right? I hope we all agree that the calculations are very simple. But the principles, the idea of how the modeling works, are a lot more delicate, and actually on much shakier ground, so it's a good idea to always think through your model assumptions and what they imply for your actual hypothesis. Just to give you an example, let's assume that our sample of people are all professional drug takers for pharmaceutical companies, sort of professional guinea pigs. Whenever a drug company says, well, we want to test out this headache medication, they sign up. And there are people who do this, of course; you can maybe not make a tremendous living doing it, but the drug companies of course pay people for their time to test out these sorts of drugs. Then it would be hard to declare that sample an IID draw from the general population, because these are very different sorts of people. They probably adhere to their medication schedule very precisely, they're probably very good takers of medication, they probably follow the instructions quite well, and so on. So anyway, my larger point is that when you get this number 6.7, which is very easy to obtain, think about what it means in the context of the model that you've used. And in this case the model is seemingly very benign, but the big part of the model is the IID draw from a population. Think about what that means in the context of the problem that you're studying.