0:37

And pick them one by one manually.

Another common way is by cross-correlation.

If you have some kind of template, and

you know at least a little bit about the particle you're looking for.

For instance, that it's round, or at least something about its size.

You can use that as the template and search computationally by cross-correlation for everything in the images that looks close to that.
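The idea can be sketched in a few lines of Python (a hypothetical toy image and disc template, not any particular package's picker): cross-correlate a round template with the image and take the correlation peak as the particle position.

```python
import numpy as np

def cross_correlate(image, template):
    """Cross-correlate a zero-mean template with an image using FFTs.

    The template is zero-padded to the image size and rolled so that its
    center sits at the origin, so the correlation peak lands on the center
    of the matching particle. This is circular correlation, which is fine
    for particles away from the image edges."""
    t = np.zeros_like(image, dtype=float)
    th, tw = template.shape
    t[:th, :tw] = template - template.mean()
    t = np.roll(t, (-(th // 2), -(tw // 2)), axis=(0, 1))
    return np.fft.irfft2(np.fft.rfft2(image) * np.conj(np.fft.rfft2(t)),
                         s=image.shape)

# Hypothetical test image: one round "particle" of radius 5 px in noise.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[-8:9, -8:9]
disc = (xx ** 2 + yy ** 2 <= 25).astype(float)   # the round template

image = rng.normal(0.0, 0.2, (64, 64))
image[20:37, 30:47] += disc                      # particle centered at (28, 38)

cc = cross_correlate(image, disc)
peak = np.unravel_index(np.argmax(cc), cc.shape)  # picked particle position
```

A real picker would repeat this over a whole micrograph and keep every sufficiently strong, well-separated peak.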

There have also been other methods tried: neural nets, edge detection, and

other things.

In fact, there's been competitions held to motivate

the development of new algorithms to find particles within these images.

So by whatever method it was done, let's assume that now we have images where each

of the particles has been picked, here shown by the crosshairs on each particle.

Now, this image makes the task look easy, because the particles are so

high contrast.

It's a highly focused image of a very large particle, and

it's a very good image at that.

And so it seems easy to pick out the individual particles.

In other cases, it's not so easy.

This, for instance, is a picture of a much smaller particle, GroEL,

taken closer to focus.

And a particle picking algorithm has already found this particle, for instance.

And this one and this one and this one.

So you can see that in this case it's very difficult to distinguish the particles

from the background, but nevertheless it needs to be done, and it can be.

2:42

In addition, there can be conformational heterogeneity.

If the particles have subtly different conformations,

we need to be able to detect that.

So the next step in a typical single-particle analysis image processing

flow is to take each of these individual particle images and

classify them into classes of images that all look the same.

And the hope is that within a single class of images, all of the particles in that

class are stoichiometrically homogeneous and conformationally homogeneous.

And they were all recorded from the same point of view.

Now note that if different copies of the particle all froze in the ice in the same

orientation, they could still have different in-plane rotations.

And their images should still be classified into the same class,

because the images look the same even though they have different rotations.

Now, there are lots of different algorithms to classify the particles.

One of them, for instance, is called Hierarchical Ascendant Classification.

And the basic idea is this: what if we started with N images and divided them into two classes?

4:05

We could ask that these two classes be defined by

maximizing the interclass variance.

And minimizing the intraclass variance.

And we could allow the computer to try different potential classifications and

pick the one which created two classes with the maximum interclass variance.

And the minimum intraclass variance.

Now let me rename this class A2.

And this will be class A1.

And then we could do the same thing for those classes.

We could take all the images in class A1 and

we could ask the computer to further divide them into two sub-classes.

Let's call them B1 and B2.

That would maximize the interclass variance between those two and

minimize the intraclass variance.

And likewise we could do that to A2, to create classes B3 and B4.

Then we could take a further step and classify these into C1,

and C2, and C3, and C4, et cetera.

We could do this in as many steps as needed until we had lots of different

classes.

And like I say, there's lots of different ways that one could classify the images

to try to create pure classes.
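As a rough sketch of this splitting idea (a simple two-means step on hypothetical data, standing in for whatever statistics a real package uses), each split divides the image stack into the two classes with minimum intraclass variance, which for a fixed set of images is equivalent to maximum interclass variance:

```python
import numpy as np

def two_class_split(images, iters=20):
    """One split: divide a stack of images into two classes so that the
    intraclass variance is minimized (equivalently, the interclass variance
    is maximized). Implemented as a 2-means step, seeded with the two most
    distant images."""
    flat = images.reshape(len(images), -1).astype(float)
    d = ((flat[:, None] - flat[None]) ** 2).sum(axis=2)
    i, j = np.unravel_index(d.argmax(), d.shape)
    centers = flat[[i, j]].copy()
    for _ in range(iters):
        labels = ((flat[:, None] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        for k in (0, 1):
            if (labels == k).any():
                centers[k] = flat[labels == k].mean(axis=0)
    return labels

def classify(images, depth):
    """Recursive splitting: A1/A2, then B1..B4, then C1..C8, and so on.
    Returns one array of image indices per final class."""
    if depth == 0 or len(images) < 2:
        return [np.arange(len(images))]
    labels = two_class_split(images)
    classes = []
    for k in (0, 1):
        idx = np.flatnonzero(labels == k)
        classes += [idx[sub] for sub in classify(images[idx], depth - 1)]
    return classes

# Hypothetical data: two obviously different groups of 4x4 "images".
rng = np.random.default_rng(1)
stack = np.concatenate([rng.normal(0, 0.1, (5, 4, 4)),
                        rng.normal(5, 0.1, (5, 4, 4))])
classes = classify(stack, depth=1)   # one split -> two classes
```

Increasing `depth` produces 2, 4, 8, ... classes, mirroring the A, B, C levels described above.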

5:29

But once that procedure is done, then you would align all

the images within a class so that they're translationally on top of each other, and

check for any in-plane rotations of one image with respect to the other.

And once they were all aligned in that fashion,

they're averaged together to produce what are called class averages.

And that's what is shown here.

And this again was taken from a tutorial for EMAN, one of the software packages for

single particle analysis found at this website.

And so this, for instance, is the class average of a lot of images

of the GroEL particle, and this is obviously a top view, and so

in the class average the signal-to-noise ratio is much higher.

Here is a class average showing particles from a side view, where you can see

the four different rings that form the GroEL molecule, and here again

the signal-to-noise ratio is much higher than in any individual image.
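The signal-to-noise gain from averaging can be illustrated with a toy example (hypothetical numbers): averaging N aligned copies of the same signal reduces the noise standard deviation by roughly sqrt(N), so 100 images give about a tenfold improvement.

```python
import numpy as np

# Hypothetical toy "particle": a bright square on a dark background.
rng = np.random.default_rng(0)
signal = np.zeros((16, 16))
signal[4:12, 4:12] = 1.0

n = 100
stack = signal + rng.normal(0, 1.0, (n, 16, 16))     # 100 noisy, aligned copies

noise_single = (stack[0] - signal).std()             # noise level of one image, ~1.0
noise_average = (stack.mean(axis=0) - signal).std()  # noise level of the average, ~0.1
gain = noise_single / noise_average                  # roughly sqrt(100) = 10
```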

And so we see a number of different class averages.

Some of them are similar, some look like side views, some are top views,

some are in between the views.

Now there are many ways to group images into classes.

7:11

this axis would be the density in pixel number one and

this axis here would be the density in pixel number two.

And this axis would be the density in pixel number three.

Obviously each image could then be plotted in this three dimensional space

according to its density values here, here, and here.

And it would produce a point at a particular spot somewhere.

Well now imagine in a computer if you defined an N dimensional space,

where here let's say there are N pixels total in the image, and

you defined an N-dimensional space, which of course we can't draw,

you could still plot the position of each image in that space.

And so you'd have scattered points throughout the space.

8:05

And it could be that you had a number of images that were

near each other in space, in different regions.

You might then find that your original axes, the density of each pixel,

were not nearly as useful in describing the diversity of the images.

And there are ways to calculate what are the so-called

principal axes through this space that best

characterize the big differences that exist between all the images.

For instance, in this kind of a space, if you have a lot of the images along this

line, this axis might well define the difference between these images that

cluster over here and a set of images over here that cluster over there.

And there might be a second principal axis that best reveals that

there's a cluster over here separated from the clusters over there.

So these are called principal axes, and finding them is called principal component analysis.

And images can be created which define the extremes

of these difference axes, and these are going to be called the eigenimages.

This one, and this one, and perhaps there's one over here.

There's a large number of eigenimages that go through this space

that define the extreme differences between all the images.
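One common way to compute these principal axes, sketched below on a hypothetical image stack, is a singular value decomposition of the mean-subtracted data matrix: the top right-singular vectors are the eigenimages, and each image's projections onto them are its coordinates along the principal axes.

```python
import numpy as np

def eigenimages(images, k):
    """Top-k principal axes of an image stack, returned as 'eigenimages'.

    Each image is a point in N-dimensional space (N = number of pixels);
    the principal axes are the directions of greatest variance, obtained
    here from an SVD of the mean-subtracted data matrix."""
    flat = images.reshape(len(images), -1).astype(float)
    mean = flat.mean(axis=0)
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    eig = vt[:k].reshape((k,) + images.shape[1:])   # the eigenimages
    coords = (flat - mean) @ vt[:k].T               # each image's position along the axes
    return eig, coords

# Hypothetical stack: images differ only by how much of one pattern they contain.
pattern = np.zeros((8, 8))
pattern[:4] = 1.0
amounts = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
stack = amounts[:, None, None] * pattern
eig, coords = eigenimages(stack, k=1)   # first eigenimage ~ the pattern itself
```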

10:06

number two, plus et cetera, to as many eigenimages as you want to consider.

So basically, we are decomposing the images with a basis set of

these special eigenimages that represent the diversity within the image set.

At this point, you can cluster the images according to their positions

along these principal axes and generate different classes.

And within each class, you can then align and average each of the images

in that particular class to give you some kind of a class average image.

which, if you were to plot it in this space,

would appear near the center of that cluster.

And so you could get a class average there, and a class average there.

Then you could take those class averages and

realign each of your experimental images with the class averages and proceed.

Once we have a set of class averages, the next step is to figure out

the relative orientation of each of those class averages,

so that we can merge the information that we get from each one

into a full three dimensional reconstruction.

And to illustrate that, I'll use slides, again taken from the EMAN tutorial here.

I showed this in the intro, but we'll go into a little bit more detail this time.

12:20

Then we go to the next class average.

So this is a different class average.

As you can see, this isn't really a top view.

It's not a side view either, it's some kind of oblique view.

And we calculate its Fourier transform to get the amplitudes and phases here.

And then we have to find what is its relative orientation

compared to the first one which is here.

Okay, now one principle is that both of these transforms

will share a common line of amplitudes and phases.

And so the goal is to discover that line.

And here, we've put that line on the axis of spatial frequency in the x direction.

So these are two central sections through the 3D Fourier transform:

this central section, which comes from the second class average, and

this central section, which came from the first class average.

Both of those, of course, cut through the origin, and

there's at least one line of amplitudes and phases that match, and

initially we'll just arbitrarily set that to be spatial frequency in the x direction.

14:30

And then let's say we took the amplitudes and phases on this line.

And we wrote them across in a table right there.

And then we took the next set here, across the line like that, and

we wrote them in the table here, and you see the pattern.

These amplitudes and phases we would write in the table here.

And so, we can take the amplitudes and phases of lines across this transform and

simply write them out, row by row, down this chart.

Then once we get all the way around 180 degrees,

we realize what the next line would be, were we to map it out.

The first line had amplitudes and phases across the transform right there,

and after we went around 180 degrees, we realize the next line, if we were to draw

it, would have amplitudes and phases going across in the other direction.

I mean, those two lines are superimposed; it's just that

one goes in this direction, and one goes in the other direction.

So we can see that the next line we would write would have the same amplitudes and

phases as this line, just in the opposite direction.

In other words, this one's going that way, this one is going that way.

And we could continue that all around the transform and

replicate these values.

My box wasn't quite big enough, I'm going to squeeze the last one in.

But you get the picture of setting up this table.

Now, we could take the Fourier transform of this second central

section and, let me try to draw this appropriately.

If we took the Fourier transform of the second central section,

again it's full of amplitudes and phases around there.

We could do a similar procedure and take all of the amplitudes and phases across

that line right there, and write them in a table.

And then we can take all the amplitudes and phases in the next line and

write them in a table, and all the amplitudes and phases here, and

write them in a table, and these, etc.

17:37

Okay, and at this point, we could then start to compare them.

And plot the correlation

of say this blue line with the blue line here.

Now their colors match.

But it's not necessarily the case that the amplitudes and

phases here would match the amplitudes and phases in this other blue line.

So the colors aren't meant to code for anything;

the different colored lines are just the amplitudes and phases

going across the transform at different locations.

Anyway, we could compare these amplitudes and phases with these amplitudes and

phases and there would be a value here for the correlation.

And we could compare these amplitudes and phases with these and

there would be another value for the correlation and these amplitudes and

phases with the first one and there would be a correlation.

And we fill out this whole chart of all the cross-correlation values

of all these lines compared to all these lines.

And in a successful case there will be one particular spot,

one peak that's much higher than the rest, telling us that it was the amplitudes and

phases here, say, that matched best against these amplitudes and phases.

Meaning, in this case, that this line across this transform

matched this line across that transform very well.

And this kind of map has been called a sinogram and

it allows us to find the relative orientation between this transform and

this transform, which allows us to find this common line and

fill three dimensional reciprocal space with amplitudes and phases appropriately.
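Here is a simplified real-space sketch of the sinogram idea (nearest-neighbor rotation and a hypothetical two-blob test image; real packages work in Fourier space with careful interpolation). By the projection-slice theorem, a 1D projection of a class average corresponds to a central line of its Fourier transform, so correlating every projection line of one image against every line of another finds the common line:

```python
import numpy as np

def nn_rotate(img, phi):
    """Rotate a square image by phi radians about its center
    (nearest-neighbor sampling; crude but dependency-free)."""
    n = img.shape[0]
    c = (n - 1) / 2.0
    ys, xs = np.mgrid[0:n, 0:n] - c
    xr = np.clip(np.rint(np.cos(phi) * xs - np.sin(phi) * ys + c), 0, n - 1)
    yr = np.clip(np.rint(np.sin(phi) * xs + np.cos(phi) * ys + c), 0, n - 1)
    return img[yr.astype(int), xr.astype(int)]

def common_line(img_a, img_b, n_angles=90):
    """Chart the correlation of every projection line of img_a against every
    projection line of img_b; the chart's peak identifies the common line."""
    thetas = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    la = np.array([nn_rotate(img_a, t).sum(axis=0) for t in thetas])
    lb = np.array([nn_rotate(img_b, t).sum(axis=0) for t in thetas])
    la -= la.mean(axis=1, keepdims=True)
    lb -= lb.mean(axis=1, keepdims=True)
    la /= np.linalg.norm(la, axis=1, keepdims=True)
    lb /= np.linalg.norm(lb, axis=1, keepdims=True)
    chart = la @ lb.T                                  # the sinogram-style chart
    i, j = np.unravel_index(chart.argmax(), chart.shape)
    return thetas[i], thetas[j], chart

# Hypothetical asymmetric test image (two unequal blobs) and a rotated copy.
yy, xx = np.mgrid[0:32, 0:32]
img = np.exp(-((xx - 20) ** 2 + (yy - 14) ** 2) / 8.0) \
    + 0.6 * np.exp(-((xx - 18) ** 2 + (yy - 22) ** 2) / 10.0)
phi = np.pi / 6
ta, tb, chart = common_line(img, nn_rotate(img, phi))
# The recovered angular offset (ta - tb) mod pi should be close to phi.
```

Here the two images are in-plane rotations of each other, so every line pair at the right offset matches; for two genuinely different views, only the single common line matches.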

Now we move on to the third class average.

Here it is.

Again, it doesn't really look like a top view or a side view.

It's some kind of oblique view.

We calculate its Fourier transform.

And then we find the common line that its transform has

with both the second transform and also the first transform.

So we already knew where the common line was

between the first transform and the second.

It was right here but we didn't know what that angle was.

Now given the third class average,

we can find the common line that the third class average has with the second.

So this is three versus two, and we can also find the common line

that the third transform has with the first transform, three versus one.

And now we have enough angular constraints to reveal the correct

angles, the correct orientation of all three cross-sections.

There's only one way that each of these

three central sections could cross through each other.

With these particular common lines.

And then we move on to the fourth class average.

Here it is, it's clearly a side view.

We calculate its Fourier transform, amplitudes and phases, and

then we find the common line that it has with the third, the second, and

the first, and together this is now an overdetermined set,

which reveals how to insert the amplitudes and

phases of this power spectrum into 3D Fourier space.

And so here the process is shown with four different central sections.

You might, though, have 20 good class averages,

giving you 20 different planes.

And so it becomes a least-squares fit problem

to determine the best relative orientations of all those cross sections,

to make the common lines fit as best they can.

It's heavily overdetermined at that point, and so you pick the best fit.

Once you fit the power spectra of all your class averages into reciprocal space,

you then interpolate that data onto a regular Cartesian coordinate system in

reciprocal space, and then calculate an inverse Fourier transform in 3D

to generate a three-dimensional reconstruction of your object.

The next step is to take this three dimensional reconstruction

22:01

and reproject it in all possible directions.

So if we were to take a projection image of the particle from that view,

it would look like a stack of four rings.

And if we took a projection of the particle from the top,

it would look like a circle.

And if we took a picture from this angle,

it would look like the stack of four rings.

And, of course, from the top it would look like a circle.

And we could imagine projections in all kinds of angles here,

which I don't even know if I can draw right.

Just imagine that we could calculate

the projections in a very large number of possible directions.

And not just in the plane.

Imagine an image coming out towards us and an image coming out there and

an image going out towards the back.

So in the full three-dimensional view, we calculate projections of what

a picture of this object would look like from all possible directions.

And we call this set of pictures

the reprojections of the initial model, in this case.

And so in the actual example of this GroEL model data set

in the tutorial, the reprojections of the initial model look like this.

Here's the top view, here are other views that are further and

further away from a top view and then here is a nice side view and so this is

a set of all the different views that you could obtain from that initial model.
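Reprojection itself is simple to sketch (here only along the grid axes; real packages project along many finely sampled Euler angles to cover all views): a projection image of a 3D density map is the sum of the map along the viewing direction. The hollow cylinder below is a hypothetical stand-in for a barrel-shaped particle like GroEL.

```python
import numpy as np

def reproject(volume, axis):
    """Project a 3D density map along one grid axis: the projection image is
    the line integral (here, a simple sum) of density along that direction."""
    return volume.sum(axis=axis)

# Hypothetical stand-in model: a hollow cylinder (a barrel along z).
z, y, x = np.mgrid[-16:16, -16:16, -16:16]
r = np.sqrt(x ** 2 + y ** 2)
model = ((r > 8) & (r < 12) & (np.abs(z) < 10)).astype(float)

top = reproject(model, axis=0)    # down the barrel axis: an annulus ("circle")
side = reproject(model, axis=1)   # from the side: a rectangular silhouette
```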

Now once we have the set of reprojections then we take all of our initial images and

we compare them to each possible reprojection and

reclassify: which orientation, or which view, of the particle

does each of our original images give us?

And this process is beautifully depicted in this nice figure from a recent review.

So the idea here is that these are the initial images,

individual images of the particle of interest.

And then there was an initial model, and

the initial model is reprojected into all different possible views.

And so here they are showing only five, but

there might actually be 40 or even hundreds later in the process.

And then we take our initial images and we ask, usually using cross-correlation,

which view of the particle each of these images represents.

And we might find that this image, say, and

this image match this reprojection the best.

So these two are now assigned to this class and

used to produce a new class average.

So if we average that image and that image,

we can get this new class average here for that view of the actual particle.

Then we take our next experimental image and

we check it against all the reprojections of the model.

And maybe it matches this one the best.

And we go through and we check all the experimental images, and in this case

they're showing maybe this one also looks very much like that reprojection.

So we take that image and this image, and we average them together

to produce a new class average of the particle, which is represented here.
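One round of this matching step can be sketched as follows (normalized cross-correlation against fixed reprojections, ignoring in-plane alignment; the two "views" are hypothetical test patterns):

```python
import numpy as np

def match_and_average(images, reprojections):
    """Assign each experimental image to the reprojection it cross-correlates
    with best (after normalization), then average the images assigned to each
    view into new class averages. Views with no assigned images keep the
    reprojection itself as a placeholder."""
    fi = images.reshape(len(images), -1).astype(float)
    fr = reprojections.reshape(len(reprojections), -1).astype(float)
    fi = fi - fi.mean(axis=1, keepdims=True)
    fr = fr - fr.mean(axis=1, keepdims=True)
    fi /= np.linalg.norm(fi, axis=1, keepdims=True)
    fr /= np.linalg.norm(fr, axis=1, keepdims=True)
    labels = (fi @ fr.T).argmax(axis=1)        # best-matching view per image
    averages = np.stack([images[labels == k].mean(axis=0)
                         if (labels == k).any() else reprojections[k]
                         for k in range(len(reprojections))])
    return labels, averages

# Hypothetical reprojections ("views") and noisy experimental images.
rng = np.random.default_rng(0)
view0 = np.zeros((8, 8)); view0[:, :4] = 1.0
view1 = np.zeros((8, 8)); view1[:4, :] = 1.0
images = np.concatenate([view0 + rng.normal(0, 0.2, (3, 8, 8)),
                         view1 + rng.normal(0, 0.2, (3, 8, 8))])
labels, averages = match_and_average(images, np.stack([view0, view1]))
```

Iterating this step with reprojections of each new refined model is the refinement loop described in the following paragraphs.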

25:35

In like fashion we go to the next experimental image, and we check it

against all these reprojections, and we find that it matches this one the best.

And perhaps there are many more.

I mean, there are seven example images shown.

The process might actually involve 2,000 particles or maybe 200,000.

Or even more.

And so in this case though, the simplified case, there's only one image that matches

that reprojection, so it becomes the new class average for that view.

In like fashion, this one is assigned to this view.

And is used to produce this new class average.

And finally this experimental image is found to match best this view

of the model, and so it's used to produce this new class average.

Once you have this set of new class averages, they are merged in reciprocal

space again to produce a new refined three dimensional model of the particle.

Then this refined model, here we go, the refined model,

can then be reprojected into another set of reprojections.

And again, our experimental images can be matched against them.

And so, individual experimental images

can be assigned to different views in different iterations of this process.

And the whole process is iterated until the classes converge and

the final model converges, and the resolution increases all along the way.

And the kind of resolution gains that can be achieved are illustrated well

in this figure from Steve Ludtke and his colleagues,

showing that given that initial model, it's reprojected,

the experimental images are reclassified, realigned,

and used to produce a next model of the object.

And then in the second iteration, the resolution goes up a little bit more and

the third iteration, and the fourth iteration, and

then here is a final reconstruction for this particular data set.

And it's compared to an X-ray crystal structure

that's been low-pass filtered to approximately the same resolution,

showing that the procedure does give a correct structure.

Now, an important variation on that basic process

takes advantage of what is called a maximum-likelihood method.

And in order to explain this, let's go back to this figure.

29:05

In like fashion, the likelihood that experimental image number

two represents this view of the particle is estimated, and

this view, and this view, and this view, and this view.

And we could represent these weights as, say, 2a, and 2b, and

2c, 2d, and 2e, et cetera.

The likelihoods that image three represents that class,

and this class, and this class, and this class,

and this class are calculated: 3a, 3b, 3c, 3d, and 3e.

Et cetera, et cetera for all the experimental images.

29:53

And then, in order to generate a refined class average, this

new class average will be the weighted sum of all of the experimental images.

For instance it will be the weight 1a times image number one,

plus the weight 2a times image number two,

plus 3a times image number three, et cetera,

to image n, in the entire data set.

So each of these refined averages becomes

a combination of all the experimental images with different weights according to

the probability that they represent that point of view.
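The weighted sum can be sketched directly (a toy soft assignment under a Gaussian noise model with a fixed, assumed sigma; real maximum-likelihood refinement also integrates over orientations and shifts):

```python
import numpy as np

def ml_class_averages(images, references, sigma):
    """Soft ('maximum-likelihood style') class averages: every image gets a
    likelihood-based weight for every reference view, assuming Gaussian noise
    of standard deviation sigma, and each refined average is the weighted sum
    of ALL images rather than only the best-matching ones."""
    fi = images.reshape(len(images), -1).astype(float)
    fr = references.reshape(len(references), -1).astype(float)
    d2 = ((fi[:, None, :] - fr[None]) ** 2).sum(axis=2)   # image-to-reference distances
    log_w = -d2 / (2.0 * sigma ** 2)
    log_w -= log_w.max(axis=1, keepdims=True)             # numerical stability
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)                     # the weights 1a, 1b, ... per image
    averages = (w.T @ fi) / w.sum(axis=0)[:, None]        # weighted sums of all images
    return w, averages.reshape(references.shape)

# Hypothetical references and images clustered around each of them.
rng = np.random.default_rng(0)
ref0 = np.zeros((4, 4))
ref1 = np.full((4, 4), 3.0)
images = np.concatenate([ref0 + rng.normal(0, 0.1, (4, 4, 4)),
                         ref1 + rng.normal(0, 0.1, (4, 4, 4))])
w, averages = ml_class_averages(images, np.stack([ref0, ref1]), sigma=1.0)
```

When an image clearly belongs to one view, its weight for that view approaches 1 and the soft average approaches the hard assignment; ambiguous images instead contribute partially to several views.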