This course covers the design, acquisition, and analysis of Functional Magnetic Resonance Imaging (fMRI) data. A book related to the class can be found here: https://leanpub.com/principlesoffmri

From the course by Johns Hopkins University

Principles of fMRI 1

259 ratings

From the lesson

Week 3

This week we will discuss the General Linear Model (GLM).

- Martin Lindquist, PhD, MSc, Professor, Biostatistics, Bloomberg School of Public Health | Johns Hopkins University
- Tor Wager, PhD, Department of Psychology and Neuroscience, The Institute of Cognitive Science | University of Colorado at Boulder

Welcome back to Principles of fMRI. In this module, we're going to introduce the general linear model, which is the bread-and-butter workhorse of statistical analysis. In this module and the following series, we'll walk you through how to construct the GLM and how to use it for fMRI.

So there are multiple goals in the analysis of fMRI data. They include localizing areas activated by a task or in relation to a process of interest; determining networks corresponding to brain function, functional connectivity, and effective connectivity; and making predictions about psychological or disease states or other outcomes from functional imaging data. All of these can be handled in certain ways within the general linear modeling framework.

So in terms of situating us in the whole process, we are here in the data analysis portion. To give you a more detailed view and an overview of the GLM analysis process: it's typically a two-level hierarchical analysis. We analyze within-subject effects, individual by individual; that's the first level. Secondly, we do analysis across subjects or across groups in a group analysis, or second-level analysis. We can do this in stages, and that's one common approach. Alternatively, hierarchical models combine both levels into one integrated model.

So, where we are in the processing stream: the steps are, first, design specification, or building a model. Second, that model is combined with real data and estimated, and effects are estimated at each voxel. Then contrast images are calculated (we'll talk more about those later), and those are combined with images from other subjects into a group analysis. Finally, we can make inferences about the areas that are activated in that group, localize them anatomically, and discuss them.

So let's first introduce the GLM family of tests. The general linear model approach treats the data as a linear combination of model functions (predictors) plus noise, or error. You can think of it as breaking the data up into the part that I can explain with the model and the part that I can't explain. These model functions are assumed to have known shapes, the simplest being a straight line, but their amplitudes, or slopes, are unknown, and those are what need to be estimated when I fit the model. I'm not limited to straight lines: I can also fit smooth, pre-specified curves or other functions, and we'll look at more examples of that later on. The GLM encompasses many techniques in fMRI data analysis, and in data analysis more generally, so chances are you've used some version of it in your research before.

Let's look now at the entire GLM family. Many of you might be familiar with simple regression (one outcome, one predictor) or with ANOVA (an analysis of multiple categories). Those are both instances of the general linear model, so they fit within the broad GLM framework. In fact, they're instances of another subclass of the general linear model, multiple regression, which is the case where you have one outcome and multiple predictors. Any analysis that I can do in an ANOVA framework, I can do as the exact same analysis in a multiple regression framework, so those are interchangeable at a mechanistic level; they're all examples of the GLM. And more broadly, multiple regression is itself an instance, or class of instances, of the general linear model.

That broader class encompasses models like mixed-effects and hierarchical models, time-series models with autoregressive components, and other advanced variants, like robust models and penalized regression models; many of you have heard of LASSO or ridge, those kinds of things. And finally, there's the broad category of generalized linear models, where I can incorporate non-normal errors and different error distributions; logistic regression is one example of that. So all of these are different instances of the general linear model. In many cases there is a simple closed-form algebraic solution, so I can solve the equations and estimate the model in one step. Many other cases require iterative solutions, so I have to alternate between estimating the model parameters and estimating the error structure, for example.

So this is the simplest example: simple linear regression, one predictor, one outcome. For our purposes here, we'll just talk about four stages that we go through. One: we specify the model. In this case, it's very simple; we posit that there is a linear relationship between the predictor and the outcome. That's the model, the simplification of this complex data into a compact form. Two: we estimate the model. In this case, this means that we have to estimate the slope and the intercept of that model, or where it crosses the y-axis. Three: statistical inference. I'd like to test the significance of that slope and get a p-value, which relates to how likely it is that I'd observe a slope like this under the null hypothesis that there is no actual true relationship, that the line is actually flat. And finally, when I find significant effects, I want to make a scientific interpretation, which has to do with the meaning of the relationship.
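The four stages above can be sketched in a few lines of code. This is a minimal illustration, not part of the course materials; the data and all numbers are made up, and the estimation uses ordinary least squares via numpy.

```python
import numpy as np

# A minimal sketch of the four stages for simple linear regression,
# using made-up data (all names and numbers here are hypothetical).

# Stage 1 -- specify: posit a linear relationship y = b0 + b1*x + error.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=50)  # true slope 0.5, intercept 2

# Stage 2 -- estimate: fit the slope b1 and intercept b0 by least squares.
b1, b0 = np.polyfit(x, y, deg=1)

# Stage 3 -- inference: a t statistic for the slope; under the null
# hypothesis (flat line), a large |t| is unlikely, giving a small p-value.
resid = y - (b0 + b1 * x)
sigma2 = np.sum(resid**2) / (len(x) - 2)       # residual variance estimate
se_b1 = np.sqrt(sigma2 / np.sum((x - x.mean())**2))
t_stat = b1 / se_b1

# Stage 4 -- interpretation happens outside the code: what does a
# positive slope mean scientifically?
print(f"slope={b1:.3f}, intercept={b0:.3f}, t={t_stat:.2f}")
```

Comparing the t statistic to a t distribution with n-2 degrees of freedom would give the p-value from stage three.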

So this is another view of the GLM family; I think it's useful to situate us. All GLM models are characterized by the use of one continuous variable as the dependent variable, or outcome. So we have one continuous DV, and depending on the structure of the predictors, one is doing different kinds of tests. All GLM tests work with the same fundamental linear-algebraic equations. If I have one continuous predictor, that's simple regression. If I have two continuous predictors, that's multiple regression. Say I have a categorical predictor with two levels, male or female, and that's what I'm using to break up the outcome; that ends up being a two-sample t-test. If you have a categorical predictor with three or more levels (basketball players, football players, baseball players) and the outcome might be memory score performance, that's a one-way ANOVA. If I have two or more categorical predictors, arranged in factors, that's a factorial ANOVA. And we'll look at examples of that later as we go on.
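To make the claim concrete that these tests all share the same linear-algebraic machinery, here is a hedged sketch (with made-up data) showing that a pooled two-sample t-test gives exactly the same t statistic as a GLM with an intercept plus a +1/-1 group code:

```python
import numpy as np

# Made-up scores for two groups (e.g. coded +1 and -1).
rng = np.random.default_rng(1)
g1 = rng.normal(10.0, 2.0, size=20)
g2 = rng.normal(12.0, 2.0, size=20)

# Classic pooled-variance two-sample t statistic.
n1, n2 = len(g1), len(g2)
sp2 = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
t_classic = (g1.mean() - g2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

# The same test as a GLM: y = b0 + b1*code + error, with code = +1 or -1.
y = np.concatenate([g1, g2])
code = np.concatenate([np.ones(n1), -np.ones(n2)])
X = np.column_stack([np.ones(n1 + n2), code])
beta, ss_resid, _, _ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = ss_resid[0] / (n1 + n2 - 2)            # pooled residual variance
se_b1 = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
t_glm = beta[1] / se_b1

print(t_classic, t_glm)  # the two t statistics agree
```

The GLM slope here is half the difference of the group means, and its t statistic matches the classic test, which is the mechanistic interchangeability described above.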

Another important extension of the GLM is the case where we have multiple observations on the same people, or repeated measures. All of these in blue here are repeated-measures designs, and they essentially involve correlations in the error structure. So if I test one person at time one, time two, time three, that's repeated measures. Because I have multiple observations on the same people, those observations are linked by the person; they're no longer independent from one another. So that's a repeated-measures design, and if you have one within-person predictor, that's a paired t-test: time one, time two, two levels. If you have multiple repeated measures (time one, time two, time three, time four), we're doing a one-way repeated-measures ANOVA. We can also use MANOVA to analyze such data.

If I have four or more repeated measures organized in a factor structure: for example, I might have a memory experiment where I have words that are highly imageable or not, presented in a visual or auditory modality. That's a two-by-two factorial design, and that's a factorial repeated-measures ANOVA design, which is very common in functional neuroimaging experiments. And if I have the same repeated-measures structure, but now I start to add between-person predictors, like a moderation of that effect by age or performance, or by group status (patient versus control), then we'll just call that a GLM. This illustrates that I can mix and match within-person and between-person factors and variables, all within the same analysis framework.

Another important extension is the idea of introducing other kinds of correlated error structures. One example is time-series correlation: events that occur at one time point depend on what happens at time points before that. A quintessential example is the stock market, where things happen and influence multiple time points. So each measurement across time is not independent from each other one, and we can use the generalized least squares framework with iterative models to model that kind of structure.
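As a hedged illustration of the generalized least squares idea, here is a sketch for first-order autoregressive (AR(1)) errors. The autocorrelation parameter rho is treated as known; in practice it is estimated iteratively, alternating between fitting the betas and fitting the error structure, as described above. All values are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), np.arange(n) / n])  # intercept + trend
beta_true = np.array([1.0, 3.0])

# Simulate AR(1) noise: each error depends on the previous one.
rho = 0.5
e = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + rng.normal(0, 0.5)
y = X @ beta_true + e

# GLS by whitening: subtract rho times the previous row from y and X,
# so the transformed errors are (approximately) uncorrelated, then
# apply ordinary least squares to the whitened data.
Wy = y[1:] - rho * y[:-1]
WX = X[1:] - rho * X[:-1]
beta_gls, *_ = np.linalg.lstsq(WX, Wy, rcond=None)
print(beta_gls)
```

This drops the first observation for simplicity; full GLS would also rescale that first row.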

This is the basic structural model for the GLM, and we'll first break it down and then show it in one compact equation. Up on top here, what you see is: y is the dependent variable throughout the course, and x is going to refer to predictors. Here I've got time point i, so the outcome at time point i is broken down into beta naught, which is an intercept parameter that captures the average across time (constant across time), plus beta one times x one, plus beta two times x two, plus beta three, and so on for as many predictors as I have in the model. Those betas are regression slopes. When I estimate them they become beta hats; we always use a hat for an estimate. So this breaks the data down into the part that I can explain with the model, a combination of beta values, or slopes, times the predictors, plus errors; the residuals are everything left over. The job, then, when I estimate the GLM, is to solve for that beta vector: the series beta naught, beta one, beta two, beta three. I do that typically by minimizing the sum of squared residuals, although there are other options as well.

Now, what you see on the bottom here is that if I write that same somewhat messy equation in matrix form, I get one very compact, beautiful equation: Y = X times beta, plus error. Let's take a closer look at this and break it down again. So, Y = X beta + error. That's decomposed as follows: Y is a column of observed data, and that's modeled in terms of a design matrix, which is the intercept plus all the other predictors. The intercept is usually a constant value; in this case it's just a column of ones, plus a column for each predictor, or regressor, in the model. That's X. It's multiplied by beta, a vector of the model parameters, all the regression slopes in the model, plus error: all the residuals, or everything left over.
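The compact equation can be written out directly in code. This is a minimal sketch with made-up numbers, not course code: eight observations, a design matrix with an intercept column of ones plus two predictor columns, and the least-squares solve via the normal equations.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
X = np.column_stack([
    np.ones(n),            # intercept: a constant column of ones
    rng.normal(size=n),    # predictor (regressor) 1
    rng.normal(size=n),    # predictor (regressor) 2
])
beta = np.array([2.0, 1.5, -0.7])          # the unknown slopes
y = X @ beta + rng.normal(0, 0.1, size=n)  # data = model part + error

# Estimate beta by minimizing the sum of squared residuals:
# the normal equations give beta_hat = (X'X)^{-1} X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta_hat               # everything left over
print(beta_hat)
```

The estimated beta_hat recovers the true slopes up to noise, and the residuals are, by construction, orthogonal to every column of the design matrix.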

So here's a non-fMRI example, and later we'll map this onto the fMRI context. In this non-fMRI example, I'm interested in whether exercise predicts lifespan. The outcome (this is made-up data) [LAUGH] is going to be lifespan; a predictor is going to be exercise intensity level, or amount, which is a continuous variable; and we'll introduce one other variable, which may or may not be important here. Let's say it's a covariate, and this is going to be gender, male or female. So now I've got two predictors in my model. Let's look at how this works. First, let's look at the data. This is an example of what the data might look like: exercise is the predictor on the x-axis, and lifespan is the outcome on the y-axis.

Now I've got the group of females and the group of males, and I can see that females and males are different. That categorical variable is going to be an additional predictor, so I can look at the effect of exercise controlling for gender, and of gender controlling for exercise. Let's look at how this works in terms of our design matrix and outcome. The outcome data is lifespan, which is a continuous variable distributed across the series of observations; in this case observations are subjects, so this is what the lifespan data might look like. That's decomposed into the part that I can model with the design matrix, and there's the design matrix itself: the intercept, which is a constant; the effect of exercise, which is continuous; and now sex is included as a categorical variable. Because there are two levels, I can code that in regression with values of one and negative one, for example, maybe one for female and negative one for male, or vice versa. That's what it looks like as a predictor, and that design matrix is going to be multiplied by the model parameters, which I estimate when I fit the model, plus the error residuals, everything left over.
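The exercise/lifespan design above can be sketched directly. All data and effect sizes here are made up for illustration; the design matrix has an intercept column, a continuous exercise column, and sex coded +1/-1 (say, +1 for female and -1 for male):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
exercise = rng.uniform(0.0, 10.0, size=n)   # continuous predictor
sex = np.repeat([1.0, -1.0], n // 2)        # categorical: +1 / -1 coding

# Hypothetical "true" effects used only to simulate the outcome:
# lifespan rises with exercise and differs by sex, plus noise.
lifespan = 70.0 + 1.2 * exercise + 3.0 * sex + rng.normal(0.0, 2.0, size=n)

# Design matrix and least-squares fit: one beta per column, giving the
# effect of exercise controlling for sex, and of sex controlling for exercise.
X = np.column_stack([np.ones(n), exercise, sex])
beta_hat, *_ = np.linalg.lstsq(X, lifespan, rcond=None)
print(beta_hat)  # [intercept, exercise slope, sex effect]
```

With the +1/-1 coding, the sex beta is half the difference between the two group means after adjusting for exercise, which is exactly the ANCOVA structure described next.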

So this example illustrates how I can handle a rather complex design very simply using the general linear model. This design would be called an ANCOVA, an analysis-of-covariance design, in the traditional behavioral literature. So, that's the end of this module. In the next module, we'll begin to map the GLM onto fMRI data specifically.

Â Coursera provides universal access to the worldâ€™s best education, partnering with top universities and organizations to offer courses online.