Hi, in this module we're going to keep talking about group level analysis, but we're going to focus on the more statistical aspects of the problem. Whenever we apply statistics to real world problems, we usually tend to separate between the model used to describe the data, the method used for parameter estimation, and the algorithm used to obtain the estimates. The model uses probability theory to describe the parameters of some unknown distribution that is thought to be generating the data. The method defines the loss function that's minimized in order to find the unknown model parameters. Finally, the algorithm defines the manner in which the chosen loss function is minimized. Each of these components comes into play whenever we're doing a statistical analysis, and that's going to become important in this module.

Before performing a group analysis, we have to perform several preprocessing steps in order to ensure the validity of the results. These preprocessing steps include motion correction, which is intrasubject registration; spatial normalization, which is intersubject registration; and a bit of spatial smoothing to overcome limitations in the spatial normalization. Motion correction is always important when we're doing analysis, but here spatial normalization is critical, because we need the voxels to align across subjects. If I take a specific voxel at a specific location in one subject, I want to be able to obtain the same voxel in another subject, so the brains have to be aligned to one another. However, spatial normalization isn't perfect, so we need a degree of spatial smoothing in order to overcome its limitations. So all of these preprocessing steps should be performed before performing a group analysis.

When performing group analysis, we often use multi-level models, as Tor described in the previous modules. These are often specified in two levels: the first level deals with individual subjects, and the second level deals with groups of subjects. Here's a little cartoon of that, where we have a first level model for n different subjects, and then a second level model where we combine the subject data. And again, all inference is typically performed in this massively univariate setting, so this is the reason why it's so important that all the brains are aligned to one another, because we want the voxels to mean the same thing across subjects.

For the first level model, we basically go back to the traditional GLM approach that we used when we were talking about single subject data; it's going to be the same thing here. Suppose we have data from n different subjects, and for each subject we use the following model: Y_k = X_k beta_k + e_k. This is just a standard GLM analysis, such as we talked about in the single subject setting, but now we put a k here to indicate which subject we're interested in. So again, X_k is the design matrix for that subject, and it may look like this. In the first level, we have autocorrelated data, but with a relatively large number of observations.

We can combine the first level models for all subjects in the following way. We can concatenate all the data across subjects, from Y_1 up to Y_n, into this matrix Y; we do this for housekeeping purposes. We can combine all the design matrices for the subjects into a grand design matrix for the whole population, which is X, and we can similarly do this with beta and the noise e.
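To make the stacking concrete, here is a minimal sketch in Python, using simulated data and hypothetical dimensions, of how the subject-level design matrices can be placed into a block-diagonal grand design matrix, and of the fact that fitting the stacked model is equivalent to fitting each subject separately.

```python
# Minimal sketch (hypothetical dimensions, simulated data) of stacking the
# first-level GLMs Y_k = X_k beta_k + e_k for n subjects into one big model
# Y = X beta + e, where X is block-diagonal. The model is separable, so the
# stacking is purely for bookkeeping.
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
n_subjects, n_timepoints, n_regressors = 4, 100, 3

# One design matrix and one data vector per subject (random stand-ins here).
X_k = [rng.standard_normal((n_timepoints, n_regressors)) for _ in range(n_subjects)]
Y_k = [rng.standard_normal(n_timepoints) for _ in range(n_subjects)]

# Grand design matrix X (block-diagonal) and concatenated data vector Y.
X = block_diag(*X_k)      # shape: (n*T, n*p)
Y = np.concatenate(Y_k)   # shape: (n*T,)

# Fitting the stacked model with OLS gives the same betas as fitting each
# subject separately, because the model is separable.
beta_stacked, *_ = np.linalg.lstsq(X, Y, rcond=None)
beta_separate = np.concatenate(
    [np.linalg.lstsq(Xs, Ys, rcond=None)[0] for Xs, Ys in zip(X_k, Y_k)]
)
print(np.allclose(beta_stacked, beta_separate))   # True
```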
The variance-covariance matrix can also be combined in the same manner. The reason we do this is simply for bookkeeping purposes, and now Y, X, beta, e, and V contain information about every subject in our study. The full first level model using this notation can be written as Y = X beta + e. Note that this model is separable, and it's possible to fit each subject's data individually, but the reason we keep it this way is simply for bookkeeping purposes.

Now, once we have the first level model specified like this, we can move to the second level model. Here we want to take the beta parameters from the different subjects and tie them together in a group setting. The second level model can be written as beta = X_g beta_g + eta, where X_g is a new design matrix, the group-level design matrix; beta_g are the group level parameters; and eta is normally distributed with mean 0 and variance-covariance matrix V_g, which is the group level variance. Here X_g is the second level design matrix; for example, it might separate cases from controls. And beta_g is the vector of second level parameters. For the second level model, we typically have IID data, so usually the errors are independent, but we have relatively few observations; maybe we only have 20 subjects or something like that.

Here's an example of what the design matrix might look like if we had four subjects, two cases and two controls. We might have that beta_g0 is the amplitude for cases, and beta_g1 is the amplitude for controls. In this case, we want to separate cases and controls and get separate estimates for them, so this is the way we might set up the design matrix.

The second level model relates the subject-specific parameters contained in beta to the population parameters, which we're calling beta_g here. It assumes that the first level parameters are randomly sampled from a population of possible regression parameters, and this assumption is what allows us to eventually generalize the results to the whole population, which is what we want to do.

Okay, so let's depict this problem pictorially. Suppose that we have a population of subjects that we want to study, and that each subject has a specific beta parameter associated with them. If we look at the distribution of these beta parameters across the entire population, we might get a normal distribution like the one seen here. Let's assume that this normal distribution has mean beta_g and variance sigma_g squared. So beta_g is the population average, and sigma_g squared describes the variance in the population. These are the parameters that we want to be able to estimate. However, in general, we don't have access to this distribution, so what we need to do is take a random sample from it. Let's say that we have a study where we take a random sample of seven subjects, and that the seven subjects included in our study take the beta values depicted by these red crosses here. Some of these people have low values of beta, and they're on the left-hand side; some have high values of beta, which are on the right-hand side; and some are in the middle, around the average value.
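To make the second level model concrete, here is a minimal sketch, with made-up group parameters, of the four-subject cases-versus-controls design described above: subject betas are drawn from the group distribution and the group parameters are recovered by ordinary least squares.

```python
# Minimal sketch (made-up numbers) of the second-level model
# beta = X_g beta_g + eta, for four subjects: two cases and two controls.
# eta ~ N(0, sigma_g^2).
import numpy as np

rng = np.random.default_rng(1)

# Group-level design matrix: column 0 picks out cases, column 1 controls.
X_g = np.array([[1, 0],    # subject 1: case
                [1, 0],    # subject 2: case
                [0, 1],    # subject 3: control
                [0, 1]])   # subject 4: control

beta_g = np.array([2.0, 0.5])   # hypothetical group amplitudes (cases, controls)
sigma_g = 1.0                   # between-subject standard deviation

# Subject-specific betas are a random sample around the group means.
eta = rng.normal(0.0, sigma_g, size=4)
beta = X_g @ beta_g + eta

# OLS at the second level recovers estimates of the group parameters.
beta_g_hat, *_ = np.linalg.lstsq(X_g, beta, rcond=None)
print(beta_g_hat)   # noisy estimates of [2.0, 0.5]
```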
Now, the way this maps onto the first level results is as follows. If we have a person here whose beta value is beta_1, this person is going to tend to be a low responder; their beta value is going to be smaller, and there's going to be less activation from that subject. If we look at another person with a higher beta value, they're going to tend to have a higher amplitude. So each subject has their own amplitude, but these amplitudes are drawn from a larger population. Here, that population is described by the parameters beta_g and sigma_g squared, and these are the parameters we want to make inference about. By making inferences about the population parameters, we can extend our conclusions to the population of people that we're interested in. That's part of the power of these multi-level models: rather than only making inferences about the subjects that are in our study, we're also making inferences about the people that aren't in our study, by assuming that there is a distribution and that the people who are in our study are a random sample from that distribution.

So statistically, we can now summarize our entire model by the first level model, which is just Y = X beta + e, and our second level model, which is beta = X_g beta_g + eta. So we have the first level model for the individual subjects, and we have the second level model with the group parameters. This model can be expanded further to incorporate more levels if we have multiple sessions per subject, for example.

This two level model can be combined into a single level model as follows. If we take the first level model and express beta using the second level model, where beta is equal to X_g beta_g + eta, we can re-express the full model as Y = X X_g beta_g + X eta + e. This is what in statistics we call a mixed-effects model. In general, we can write this as a single model saying that Y follows a normal distribution with mean X X_g beta_g and variance X V_g X' + V. In statistics there are a lot of different ways of estimating such mixed-effects models.

So now we come back to the terminology that we talked about before. We have the model; now we have to figure out a way to estimate it, and this is where the statistical techniques and algorithms come into play. Again, statistical techniques define the loss function that should be minimized in order to find the parameters of interest in our model. Just like when we were talking about the single subject GLM, commonly used techniques include maximum likelihood estimation and restricted maximum likelihood estimation. Algorithms define the manner in which the chosen loss functions are minimized; here, commonly used algorithms include Newton-Raphson, Fisher scoring, the EM algorithm, and IGLS/RIGLS.

So let's talk about some of these statistical techniques. Maximum likelihood basically maximizes the likelihood of the data, and we've talked about how it produces biased estimates of the variance components. Restricted maximum likelihood, on the other hand, maximizes the likelihood of the residuals, and this produces unbiased estimates of the variance components. So typically in our group level models, we want to use restricted maximum likelihood if we want to get unbiased estimates of the variance components.
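As a small illustration of the bias issue, assuming a plain linear model for simplicity, the sketch below compares the ML residual-variance estimate, which divides the residual sum of squares by n, with the REML-style estimate, which divides by n minus the number of parameters; only the latter is unbiased.

```python
# Minimal sketch (simulated data) of why REML is preferred for variance
# components: in a simple linear model the ML estimate of the residual
# variance is RSS/n (biased downward), while the REML estimate is
# RSS/(n - p), which is unbiased.
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma2_true = 20, 4, 1.0      # few observations, as at the second level

ml_est, reml_est = [], []
for _ in range(5000):
    X = rng.standard_normal((n, p))
    y = X @ rng.standard_normal(p) + rng.normal(0, np.sqrt(sigma2_true), n)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta_hat) ** 2)
    ml_est.append(rss / n)          # ML: biased low
    reml_est.append(rss / (n - p))  # REML: unbiased

print(np.mean(ml_est), np.mean(reml_est))   # roughly 0.8 vs 1.0
```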
Now, for the algorithms, there's a whole cottage industry of different algorithms that we can use to find the maximum likelihood estimate or the restricted maximum likelihood estimate, and many of them are common across disciplines. For example, Newton-Raphson is an iterative procedure that finds estimates using the derivative at the current solution. Fisher scoring is very similar to Newton-Raphson, but finds the estimates using the Fisher information. And finally, we have the EM algorithm, which is also an iterative procedure that finds estimates for models that depend on unobserved latent variables, for example the random effects at the second level.

In general, how this is done depends on what software package you use. Different neuroimaging software packages have implemented different types of mixed effects models, like the ones we have discussed, and they differ in which method and algorithm they ultimately apply. However, as Tor mentioned, a simple non-iterative two-stage least squares approach is what's most commonly used in fMRI data analysis. This is the so-called summary statistics approach: here, results from individual subjects are used at the second level, and this allows us to reduce the computational burden of fitting the full model.

So just to summarize the summary statistics approach: we fit a model to each subject's data, then we construct contrast images for each subject, and then we conduct a t-test on the contrast images. Only the contrasts are recycled from the first level models, and only one contrast can be estimated at a time. This makes a number of assumptions, but if these assumptions hold true, then this is a very simple and straightforward way of doing a multilevel model, which circumvents some of the computational difficulties of using a full mixed-effects model (a small sketch of this two-stage approach follows at the end of this module).

When using temporal basis sets at the first level, it can sometimes be difficult to summarize the response with a single number, and this makes group inference difficult, because oftentimes we want to make inference on, say, the amplitude for each subject. But if we have basis sets, the amplitude is going to depend on all of the different basis functions, and that makes second level analysis very tricky in that setting. Here, we can perform group analysis using, for example, just the main basis function. So, for example, if we're using the canonical HRF and its derivatives, sometimes we just use the amplitude corresponding to the canonical HRF, and use this in our second level analysis in the summary statistics approach. Another way of doing it is to use all the basis functions and do an F-test. And finally, a third way would be to re-parameterize the fitted response: you take the different basis functions, reconstruct the HRF, estimate the magnitude of the reconstructed HRF, and then use this information at the second level. These are all different ways of addressing this problem.

Okay, so that's the end of this module, and this is the last module on group analysis. In the next couple of modules, we're going to be talking about the multiple comparisons problem and how we deal with it in neuroimaging, and fMRI in particular. Okay, see you then. Bye.
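And, as promised, here is a minimal sketch of the summary statistics approach for a single voxel, using simulated data and a toy design matrix (the design, amplitudes, and noise levels are all made-up stand-ins): fit a GLM per subject, compute one contrast value per subject, and run a one-sample t-test on those contrasts at the second level.

```python
# Minimal sketch (simulated data, one voxel) of the summary statistics
# approach: fit a GLM to each subject, compute one contrast per subject,
# then run a one-sample t-test on the contrast values at the second level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_subjects, n_timepoints = 12, 100

# A toy two-column design: a task regressor and an intercept (stand-ins for
# a real convolved design matrix).
task = np.tile(np.repeat([0.0, 1.0], 10), 5)
X = np.column_stack([task, np.ones(n_timepoints)])
c = np.array([1.0, 0.0])                     # contrast: task amplitude

contrasts = []
for _ in range(n_subjects):
    true_amp = rng.normal(0.5, 0.3)          # subject-level amplitude
    y = X @ np.array([true_amp, 10.0]) + rng.normal(0, 1.0, n_timepoints)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    contrasts.append(c @ beta_hat)           # first-level contrast estimate

# Second level: one-sample t-test on the subject contrasts.
t_stat, p_val = stats.ttest_1samp(contrasts, popmean=0.0)
print(t_stat, p_val)
```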