
Hi, my name is Brian Caffo and this is Mathematical Biostatistics Boot Camp, Lecture twelve, on bootstrapping. Today, we're going to talk about the bootstrap, which is an incredibly useful, handy tool in statistics that you can use in a variety of settings. Its rise more or less coincided with the personal computer revolution, and it gives us a way to avoid an awful lot of mathematics in biostatistics.

Before we talk about the bootstrap, we're going to talk about the jackknife, which is a precursor to the bootstrap. The jackknife is, exactly as its name suggests, a handy little tool. The bootstrap, on the other hand, is like an entire workshop of tools. The key idea in both the jackknife and the bootstrap is to use the data, so-called resampling of the data, to get at quantities that are difficult to get at otherwise. For example, variances, and biases, and that sort of thing.

Now, we don't need either the bootstrap or the jackknife for something like the sample mean, where we know all its theoretical properties. But for other, less obvious statistics, we need something that does it for us, and it'd be preferable if that something didn't require a year of mathematics just to get us to the starting point. With the bootstrap, in contrast, you dream up a statistic, and if you want to estimate a standard error for it, you can start bootstrapping it immediately.

So, let's talk a little bit about what the jackknife does before we begin with the bootstrap because, historically, the jackknife came first. The first use of the jackknife was by the statistician, and I'm going to butcher his name, but I think it's pronounced Quenouille. He used the jackknife to estimate bias, I believe. Then the jackknife was really popularized and further refined by the extremely well-known statistician John Tukey, who we talked about a little bit in the lecture on plotting. Tukey had numerous inventions, including the fast discrete Fourier transform. He coined the term bit for binary digit; he was the first person to do that. And he did lots of things; he invented the box plot. I think when you see it, you'll conclude along with me that the jackknife's a handy and incredibly clever thing for someone to think of.

So, the idea behind the jackknife, and similarly the idea behind the bootstrap, is this: you have something you don't know, like the bias of a statistic or the standard error of a statistic, and the idea is to use the data to get a sense of it. What the jackknife does is it says, okay, well, one way to get at these quantities is to take one of the observations out, formulate the statistic on the remainder, and see how well the statistic does at estimating that one pulled-out observation. And this is very related to an idea you frequently hear of in machine learning and statistical prediction, so-called cross validation. The jackknife tends to have a different goal, in that the goal of the jackknife tends to be bias estimation or variance estimation, but the principle is very similar in that you're deleting observations. Leave-one-out cross validation is typically used as an estimate of prediction error. So, anyway, let's just focus on the jackknife; if you take classes in machine learning or something like that, you'll talk about cross validation.

The jackknife deletes one observation at a time and calculates whatever estimate you're thinking of based on the remaining n - 1 of them, so that you get n estimates, having left out one observation each time. It then uses these n estimates to do things like estimate biases and standard errors. And again, we don't need this for the sample mean: we know that the sample mean is unbiased under certain assumptions, and we know exactly what the standard error of the sample mean is under the standard setting. So the jackknife isn't necessary for those settings, but it's maybe necessary for other ones.

So, let's just consider the jackknife for univariate data. Let x1 to xn be a collection of univariate data points where we want to estimate a parameter theta. Let theta hat be the estimate based on the full data set, let theta hat sub i be the estimate of theta that you obtain using the n - 1 observations obtained by deleting observation i, and let theta bar be the average of the leave-one-out estimates.

With that notation in mind, the jackknife estimate of the bias of our statistic theta hat is just n - 1 times theta bar minus theta hat. So, let's consider the principle of this before we get to why in the world that n - 1 is there. Theta hat is our estimate. Looking at how close it is to the average of estimates where we deleted an observation each time is exactly going to give us a sense of population-level bias. And then, you might wonder, where in the world does this n - 1 come from?

It's a factor that's calibrated against statistics we actually know. For example, if you apply the jackknife to the bias of the sample variance, the n - 1 is what gives you the correct answer. So again, this estimate is really related to how far the average delete-one estimate is from the actual estimate, and the n - 1 is just a multiplier that turns that discrepancy into an estimate of the true bias.

Then the jackknife estimate of the standard error is the square root of n - 1 over n times the sum of the squared deviations of the delete-one estimates around the average of the delete-one estimates. So, it's sort of like the square root of n - 1 times the variance of the delete-one estimates. Again, you might ask about the rationale for the extra factor out front: why not just take the variance of the delete-one estimates as an estimate of the variance of the statistic? Well, it turns out that because the delete-one estimates each contain the majority of the data, n - 1 of the data points, they tend to be quite close to one another, excessively close to one another. So the variance by itself is not a good estimate of the variance of the statistic; we need a factor, and n - 1 was calibrated as a reasonable one. The same thing is true with the bias: the delete-one statistics tend to be a little too close to one another, so unless you multiply by a factor like this, you don't get a reasonable estimate. So, let's go through an example.
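Written out in symbols, the two jackknife estimates just described are (this is nothing beyond the spoken description, just notation):

```latex
\widehat{\mathrm{Bias}}_{\text{jack}} = (n-1)\,\bigl(\bar{\theta} - \hat{\theta}\bigr),
\qquad
\widehat{\mathrm{SE}}_{\text{jack}} =
\sqrt{\frac{n-1}{n}\sum_{i=1}^{n}\bigl(\hat{\theta}_{i} - \bar{\theta}\bigr)^{2}},
\qquad
\bar{\theta} = \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{i},
```

where theta hat is the full-data estimate and theta hat sub i is the estimate with observation i deleted.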

So, we had 630 measurements of gray matter volume from workers from a lead manufacturing plant. The median gray matter volume wound up being about 589 cubic centimeters, and we want to estimate the bias and the standard error of the median. I'll come back to this discussion of jackknifing the median, because that's where we're going to move forward to the bootstrap.

So, here's the gist of the code to do this. Now, you don't actually have to execute the code this way; I'll show you in a page how to do it more easily. But you can do it in any language, not just R; you just have to figure out how to delete observations one at a time. We let n be the number of observations we have, and theta hat is the median of these gray matter volumes. Then the jackknife estimates are the medians I obtain each time I delete the i-th observation; the sapply function does exactly that. Theta bar, exactly following the notation from the previous couple of slides, is just the mean of these delete-one jackknife estimates. Then my bias estimate is n - 1 times the difference between theta bar and theta hat, and the standard error is the square root of n - 1 times the average squared deviation of the jackknife estimates around their average.
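The lecture's code is in R; as a sketch of the same delete-one steps in Python (with simulated data standing in for the gray matter volumes, which aren't reproduced here), it would look something like this:

```python
import numpy as np

def jackknife(x, stat=np.median):
    """Delete-one jackknife estimates of the bias and standard error of stat.

    x    : 1-D array of data
    stat : the statistic being jackknifed (here, the median)
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    theta_hat = stat(x)                        # estimate from the full data
    # delete observation i and recompute the statistic, one i at a time
    theta_i = np.array([stat(np.delete(x, i)) for i in range(n)])
    theta_bar = theta_i.mean()                 # average delete-one estimate
    bias = (n - 1) * (theta_bar - theta_hat)
    se = np.sqrt((n - 1) / n * np.sum((theta_i - theta_bar) ** 2))
    return bias, se

# Simulated stand-in for the 630 gray matter volumes (NOT the real data)
rng = np.random.default_rng(1)
volumes = rng.normal(loc=589, scale=50, size=630)
bias, se = jackknife(volumes, stat=np.median)
```

As a sanity check, running this with `stat=np.mean` gives a bias of zero and a standard error equal to the usual s over the square root of n, which is the calibration mentioned above.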

And then, on the next page, it's a lot easier to do this. [laugh] If you want to just use the software in the bootstrap library, you call the jackknife function with the list of my gray matter volumes and the function I want the jackknife estimate of, which is the median. I assign that to a variable, out, and then pick out the standard error and the bias calculation. Both methods yield an estimated bias of zero and a standard error of 9.94.

Now, there's an odd little fact. The jackknife tends to work well for smooth functions, and empirical quantiles often don't satisfy that requirement; the median is an example. The odd little fact is that the jackknife estimate of the bias for the median is always zero when the number of observations is even.
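You can check this numerically with any even-sized sample (a Python sketch, not from the lecture): deleting a point above the middle pair leaves the lower middle value as the median, deleting one below leaves the upper middle value, and the two cases occur equally often, so the delete-one medians average back to the full-sample median.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)          # n = 100, an even number of observations
n = len(x)

theta_hat = np.median(x)          # full-sample median: average of middle pair
theta_i = np.array([np.median(np.delete(x, i)) for i in range(n)])
bias = (n - 1) * (theta_i.mean() - theta_hat)

print(bias)                       # 0, up to floating point error
```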

So, the median's an example where the jackknife isn't that good of a thing to do. In general, if the estimate that you're getting is a nice smooth function of the data, then the jackknife will work fine; but if it's not, then it tends to work pretty poorly. There was a very well-known paper by Efron, the inventor of the bootstrap, that illustrated this quite starkly. The jackknife has also been shown to be a linear approximation of the bootstrap. So, if you're in some setting where it's going to be difficult to program up the bootstrap, then doing a jackknife, which is a pretty simple thing to do, is a handy little tool to use.

And then, just to remind you: don't use the jackknife for sample quantiles. It's a handy procedure and it works in a lot of settings, but maybe not for sample quantiles, like the median, as it's been shown to have some poor properties there. And what could you possibly use then? Well, why not try the bootstrap?

So, let's move on to the bootstrap, which is maybe a little bit more of a complete toolbox, but certainly a little less compact of a tool than the jackknife, in exactly the way the analogy to the tools sounds. By the way, the term bootstrap comes from the idea of pulling oneself up by one's own bootstraps, right?

And, you know, of course, this has been discussed a lot. It's kind of an unfortunate title for a statistical procedure, because it makes it sound like the information's coming from nowhere, right? Because you can't pull yourself up by your own bootstraps; it's physically impossible. But there's been plenty of theoretical work that shows where the information is coming from in the bootstrap, and when it is applicable.

Another thing I would note is that this idea of pulling oneself up by one's own bootstraps is from the fable of Baron Munchausen. There's a great movie called The Adventures of Baron Munchausen, and it was done by some of the people who made the Monty Python series. If you get a chance, you should, in honor of this lecture, watch the Baron Munchausen movie. But at any rate, that fable is where the term pulling oneself up by one's own bootstraps comes from, and that's where they got the idea for the name of this procedure.

At any rate, back to the jackknife. Another way to think about the jackknife is this idea of so-called pseudo observations. If you take n times theta hat minus n - 1 times theta hat sub i, you can kind of think of these as whatever observation i contributes to the estimate of theta. And notice that if theta hat is the sample mean, then these pseudo observations are exactly the data themselves. So it's this idea of taking what worked in a very neat and tidy sense for the sample mean and trying to extend it to other statistics.

The standard error of the mean of these pseudo observations is the jackknife standard error, and the mean of these pseudo observations is a sort of bias-corrected estimate of the parameter that you're interested in; it takes your ordinary estimate and attempts to correct the bias. I have to admit, in my thinking about the jackknife, I kind of prefer to think about it this way, in terms of the pseudo observations, rather than in the classical development of it.
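Those two facts can be sketched in Python with some made-up data (the lecture itself works in R): for the sample mean, the pseudo observations reproduce the data exactly, and for a general statistic, their mean and their standard error of the mean give the bias-corrected estimate and the jackknife standard error.

```python
import numpy as np

def pseudo_obs(x, stat):
    """Jackknife pseudo observations: n*theta_hat - (n-1)*theta_hat_i."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    theta_hat = stat(x)
    theta_i = np.array([stat(np.delete(x, i)) for i in range(n)])
    return n * theta_hat - (n - 1) * theta_i

x = np.array([2.0, 5.0, 1.0, 8.0, 4.0])    # made-up data for illustration

# For the sample mean, the pseudo observations are the data themselves.
p = pseudo_obs(x, np.mean)                  # same values as x

# For a general statistic (the median here), the mean of the pseudo
# observations is the bias-corrected estimate, and sd / sqrt(n) is the
# jackknife standard error.
q = pseudo_obs(x, np.median)
corrected = q.mean()
se_jack = q.std(ddof=1) / np.sqrt(len(x))
```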

Â