0:26
So you will have seen in previous courses how well-conducted
genome-wide association studies (GWAS), followed by fine mapping,
can identify causative variants, that is, causative risk variants.
These are genetic variants that are directly associated with disease risk.
Their effect is independent, which means that it is not just a reflection
of linkage disequilibrium with other variants;
it is really an effect due to the variants themselves.
That is of course a very good first step in the analysis, but
what we really want, and what the pharmaceutical industry wants,
is to identify the genes that are affected by these causative variants,
as well as the downstream pathways that are perturbed as a result of
this genetic variation and hence contribute to an increased risk of developing the disease.
1:35
So remember that causative variants come in two types.
On the one hand, you have the variants that
affect the sequence of the gene product,
typically the amino acid sequence of a protein.
These are coding variants.
They can be stop-gain, stop-loss, frameshift,
splice-site, or missense variants, or large deletions.
The second type of causative variants
are regulatory variants.
They typically affect gene switches, or regulatory elements that
participate in post-transcriptional gene regulation.
Imagine that the causative variant that you have identified
by fine mapping, following GWAS, is a coding variant.
In that case, the target gene,
which we will call the causative gene, is of course identified beyond any doubt.
This is especially true if multiple independently
acting causative variants turn out to be coding variants in the same gene.
3:23
Remember that the majority of regulatory
variants operate on gene switches.
What that means is that their effect will, most of the time, go through
alterations in the expression levels of their target genes.
So to identify the target genes, what we can do is perform so-called
eQTL studies, where e stands for expression and
QTL for quantitative trait locus.
We are going to test whether the causative variant for
the disease, which we believe to be a regulatory variant,
is associated, in one or more disease-relevant cell types,
with variation in the expression levels of genes in its vicinity,
as assayed by steady-state transcript levels.
What we typically do is collect samples
from a cohort of, preferably, healthy individuals; I'll come back to that.
We try to collect multiple cell types to increase our chance
that at least one of them is disease relevant,
meaning that it participates in the onset of the disease.
We then extract total RNA
and measure the RNA levels of all genes using the methods
available for transcriptome analysis, which can be either array based or,
increasingly, based on next-generation sequencing, called RNA-seq.
Next, we sort the individuals
in this cohort by their genotype at
the disease-associated causative variant.
Say that the two alleles are A and B: we will have AA individuals,
AB individuals, and BB individuals.
In all available cell types, we then compare, across these three groups,
the expression levels of the genes
located in the vicinity of our causative variant.
What we hope to see is that the expression level of
one of these genes, in one of the examined tissues, is correlated
with the genotype at the causative SNP.
That would be a very good start toward identifying
the causative target gene perturbed by the disease-associated
variant that we identified by GWAS.
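The genotype-sorting step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the genotypes and expression values are hypothetical toy data for one gene in one tissue, genotypes are coded additively as 0, 1 or 2 copies of allele B, and the statistics are a plain least-squares slope and Pearson correlation. A real eQTL analysis would also compute p-values, normalize expression, and adjust for covariates such as batch or ancestry.

```python
from statistics import mean

# Hypothetical toy data: genotype at the causative SNP (alleles A and B) and
# the expression level of one nearby gene in one tissue, per individual.
genotypes = ["AA", "AB", "BB", "AB", "AA", "BB", "AB", "AA", "BB", "AB"]
expression = [5.1, 6.0, 7.2, 6.3, 4.8, 6.9, 5.8, 5.3, 7.4, 6.1]


def dosage(genotype, effect_allele="B"):
    """Number of copies of the effect allele carried (0, 1 or 2)."""
    return genotype.count(effect_allele)


def group_means(genotypes, expression):
    """Mean expression within each genotype group (AA, AB, BB)."""
    groups = {}
    for g, e in zip(genotypes, expression):
        groups.setdefault(g, []).append(e)
    return {g: mean(values) for g, values in sorted(groups.items())}


def additive_eqtl(genotypes, expression):
    """Least-squares slope (expression change per B allele) and Pearson r."""
    x = [dosage(g) for g in genotypes]
    mx, my = mean(x), mean(expression)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, expression))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in expression)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


means = group_means(genotypes, expression)
slope, r = additive_eqtl(genotypes, expression)
print(means, round(slope, 3), round(r, 3))
```

A consistent shift in mean expression from AA to AB to BB, reflected in a clearly non-zero slope, is the kind of genotype-expression correlation described here.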
6:31
Now it's important to realize that eQTL are in fact very common.
A very large proportion of our genes is under the influence of one or
more regulatory variants.
So the mere fact that your disease-causing variant shows an effect on expression
doesn't mean that it has an independent effect on the expression of the gene,
in the way it has an independent effect on the disease: it may simply be in
linkage disequilibrium with a different variant that drives the expression change.
So it's good to go a little bit further.
If we find a promising positive signal on expression, we should make sure
that the disease-causing variant not only affects, or
correlates with, the expression level, but that this is truly an independent effect.
One way to look at that is to check
whether what we refer to as the disease association pattern,
that is, the association of all the SNPs in the region with the disease,
is very similar to the expression association pattern, that is,
the effects of all the SNPs in the region on the expression level of the gene.
The desired scenario is that the disease-causing variant, which has the strongest effect on
the disease, also has the strongest effect on the expression level of the gene.
In a situation like that, we would have a very strong candidate
causative gene in our hands.
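A rough way to compare the two association patterns is to check whether both traits share the same lead SNP and whether the per-SNP signal strengths track each other across the region. The sketch below assumes hypothetical z-scores for the same SNPs, listed in the same order, for disease and for expression; it is only a simple heuristic, not a formal colocalization method such as the Bayesian approach implemented in the R package coloc.

```python
from math import sqrt

# Hypothetical per-SNP association z-scores for the same SNPs, in the same
# order, across one GWAS region: one set for disease, one for expression.
snps = ["rs1", "rs2", "rs3", "rs4", "rs5", "rs6"]
disease_z = [1.2, 3.5, 6.8, 4.1, 0.9, 0.3]
expression_z = [0.8, 3.1, 7.5, 3.9, 1.1, 0.2]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def compare_patterns(snps, disease_z, expression_z):
    """Lead SNP for each trait, plus the correlation of |z| across the region.

    Absolute values are used so that the comparison does not depend on which
    allele was coded as the effect allele in each study.
    """
    lead_disease = max(zip(snps, disease_z), key=lambda p: abs(p[1]))[0]
    lead_expression = max(zip(snps, expression_z), key=lambda p: abs(p[1]))[0]
    r = pearson([abs(z) for z in disease_z], [abs(z) for z in expression_z])
    return lead_disease, lead_expression, r


lead_d, lead_e, r = compare_patterns(snps, disease_z, expression_z)
print(lead_d, lead_e, round(r, 3))
```

A shared lead SNP together with a high correlation corresponds to the desired scenario; a different lead SNP for expression would hint that the eQTL signal is driven by another variant in linkage disequilibrium.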
This type of study, where we test the effect of a disease-associated
SNP on the expression levels of neighboring genes in
a variety of tissues, is called a cis-eQTL analysis,
because our assumption is that the SNP affects a gene switch
controlling a gene that has to be located somewhere in its vicinity,
even if that vicinity can be as broad as, let's say, 1 million base pairs.
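Choosing which genes to test in a cis-eQTL analysis amounts to a window search around the SNP. In this sketch the gene names and coordinates are hypothetical, and measuring distance to each gene's transcription start site (TSS) with a ±1 Mb window is an assumption; other conventions, such as distance to the gene body, are also used.

```python
# Hypothetical gene annotation: gene name -> (chromosome, transcription start site in bp).
GENES = {
    "GENE_A": ("chr6", 31_200_000),
    "GENE_B": ("chr6", 31_950_000),
    "GENE_C": ("chr6", 33_500_000),
    "GENE_D": ("chr2", 31_500_000),
}


def cis_genes(snp_chrom, snp_pos, genes, window=1_000_000):
    """Genes whose TSS lies on the SNP's chromosome within +/- window bp of it."""
    return sorted(
        name
        for name, (chrom, tss) in genes.items()
        if chrom == snp_chrom and abs(tss - snp_pos) <= window
    )


candidates = cis_genes("chr6", 31_500_000, GENES)
print(candidates)  # ['GENE_A', 'GENE_B']
```

GENE_C is excluded because it lies 2 Mb away, and GENE_D because it sits on another chromosome, even though its coordinate happens to be close.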
A complementary source of information in this context
can be obtained if expression levels are measured by RNA-seq,
where one can look specifically at heterozygous individuals.
The prediction is that in such individuals
there will be an allelic imbalance between the two alleles:
when we count the RNA-seq reads carrying each allele at a heterozygous site in the transcript,
we see that one allele is more strongly expressed than the other.
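Allelic imbalance can be tested by comparing the read counts for the two alleles against the 50:50 ratio expected without a cis effect. This sketch uses an exact two-sided binomial test written with the standard library; the read counts are hypothetical, and it assumes no reference-mapping bias, which real allele-specific expression analyses have to correct for.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so that outcomes tied with k in probability are included.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


def allelic_imbalance(ref_reads, alt_reads):
    """Fraction of reads carrying the reference allele, and the binomial p-value
    for departure from the 50:50 ratio expected without imbalance."""
    n = ref_reads + alt_reads
    return ref_reads / n, binomial_two_sided_p(ref_reads, n)


# Hypothetical read counts at a heterozygous transcribed site in two individuals.
balanced = allelic_imbalance(52, 48)
imbalanced = allelic_imbalance(70, 30)
print(balanced, imbalanced)
```

A 52:48 split is compatible with equal expression of both alleles, whereas 70:30 over 100 reads is very unlikely under a 50:50 null.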
9:21
A few more aspects of cis-eQTL analysis.
What is important to stress is that these
expression studies do not have to be performed in the same cohort
as the one in which the GWAS for the disease was done.
It's actually much better to do them in a healthy cohort. These risk
variants, if they are common, exist in all of us, whether we are affected or
not, and their deleterious effect operates in each one of us,
and that is what we're trying to capture when we perform cis-eQTL studies
in healthy individuals.
If we used affected individuals instead, we would have to fight the noise introduced by
a whole wave of secondary effects that reflect the disease condition,
rather than the primary effect of the risk variants.
A second point worth mentioning is that the transcriptome studies
that underlie eQTL, and especially cis-eQTL, studies
can be complemented by other omics technologies,
such as DNase-seq, ATAC-seq, ribosome profiling, and
proteomic analysis, to gain some insight into the mechanisms
that underlie the observed cis-eQTL effects.
11:00
It is noteworthy that when we confront the known disease-causing
variants with present eQTL results, we observe that for
a large proportion of what must be regulatory disease-causing variants,
we actually cannot find a matching cis-eQTL.
What we believe this means is that we're not looking at the right tissue,
or not at the right tissue under the right conditions, for
instance, stimulated immune cells.
This emphasizes the need to keep enlarging the panel of
tissues and cellular conditions in which eQTL studies are performed.
11:53
So much for the identification of the genes that are directly affected by
the causative variants, whether coding or regulatory.
The perturbation of the causative gene will very
seldom directly affect disease risk.
What it usually does is trigger a wave of secondary downstream effects.
And the pharmaceutical industry may be equally interested in identifying
these pathways, which upon perturbation affect disease risk.
So how could we use modern technologies to identify
components of these perturbed downstream pathways?
Well, one possible approach is very similar to cis-eQTL studies,
except that we are not going to limit ourselves to the expression patterns,
or transcript levels, of genes in the immediate vicinity of the causative variants.
We're going to explore all the genes,
in as many tissues as possible.
And we're not going to limit ourselves to RNA: we're going to look at potential
downstream effects on the amounts of proteins, but
also on the amounts of specific metabolites or glycans,
everything that we can assay effectively
using the presently available battery of omics techniques.
The same cohort that we used to identify cis-eQTL
effects can be used in the same way to search for
what we will now refer to as trans-QTL effects,
because we are no longer limiting ourselves to target molecules encoded in
the direct vicinity of the causative variant, on the same DNA molecule.
The prediction is that, for any molecular
component of the downstream pathways, its quantity should vary
in a way that is also correlated with the genotype at the causative SNP.
The correlation may not be as strong, because it is a downstream effect, but
if the sample size is sufficiently large, our prediction is that the disease-associated
SNPs will also be associated with the quantities
of components of the downstream pathways, whichever they are.
And again, this is best done using a healthy cohort,
to avoid the noise introduced by the barrage of secondary effects of the disease itself.
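A trans-QTL scan repeats the same genotype test for every measured molecule, wherever it is encoded. Because many molecules are tested, some correction for multiple testing is needed; this sketch assumes a simple Bonferroni threshold and a Fisher z-transform normal approximation for the p-value, and the molecule names and values are hypothetical. Real trans-QTL studies typically use larger samples and more refined statistics (for example false discovery rate control).

```python
from math import atanh, erfc, sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def approx_p(r, n):
    """Two-sided p-value for r via Fisher's z-transform (normal approximation)."""
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
    z = atanh(r) * sqrt(n - 3)
    return erfc(abs(z) / sqrt(2))


def trans_scan(dosages, molecules, alpha=0.05):
    """Test every measured molecule against the SNP genotype; return the
    molecules passing a Bonferroni threshold of alpha / (number of tests)."""
    threshold = alpha / len(molecules)
    hits = {}
    for name, values in molecules.items():
        r = pearson(dosages, values)
        p = approx_p(r, len(dosages))
        if p < threshold:
            hits[name] = (r, p)
    return hits


# Hypothetical data: genotype dosage at the causative SNP for 10 individuals,
# and quantities of molecules (transcripts, proteins, metabolites) measured
# without regard to where they are encoded in the genome.
dosages = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
molecules = {
    "PROTEIN_X": [1.0, 1.1, 0.9, 2.0, 2.1, 1.9, 2.0, 3.0, 3.1, 2.9],
    "METABOLITE_Y": [2.0, 1.0, 3.0, 2.0, 1.5, 2.5, 2.0, 1.0, 3.0, 2.0],
}
hits = trans_scan(dosages, molecules)
print(hits)
```

With thousands of molecules instead of two, the Bonferroni threshold becomes very strict, which is one reason why the weaker downstream effects require large sample sizes.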
14:57
I'll just briefly mention one specific type of measurement,
which relates to our microbiome.
We have come to realize that we live in symbiosis with
trillions of microorganisms that colonize especially our gut, but
also other body cavities, and even our skin.
We typically characterize these colonies of microorganisms
using the concept of microbiota
or the concept of microbiome.
The microbiota is a description of all the species
of bacteria, and their abundance, in a given part of our body.
The microbiome relates more to the genomic organization,
the complex of genes that are contributed jointly
by this community of microorganisms.
The microbiota can be studied very simply by
targeting the so-called 16S ribosomal RNA genes.
These genes are shared by virtually all bacteria (and archaea),
and they can be amplified using primers
targeting very conserved parts of the gene.
These conserved parts flank variable regions, which
allow us to identify the different bacteria that are present in a sample.
So by performing 16S ribosomal RNA gene amplification
in combination with next-generation sequencing, one can very effectively
get an idea of the species composition
of the explored microflora.
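Because sequencing depth differs from sample to sample, 16S read counts are usually turned into relative abundances before samples are compared. In this sketch the genus names and counts are hypothetical, and the reads are assumed to have already been assigned to taxa by a separate classification step against a reference database.

```python
def relative_abundance(counts):
    """Convert per-taxon read counts for one sample into proportions summing to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads assigned to any taxon")
    return {taxon: c / total for taxon, c in counts.items()}


# Hypothetical 16S read counts per genus for one gut sample, assumed to have
# already been assigned to taxa by a classification step.
sample_counts = {
    "Bacteroides": 4200,
    "Faecalibacterium": 2500,
    "Akkermansia": 1000,
    "Escherichia": 300,
}
profile = relative_abundance(sample_counts)
for taxon, frac in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{taxon:18s}{frac:.3f}")
```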
16:55
On the other hand, one can use so-called metagenomic approaches, where
all the bacterial DNA is shotgun sequenced, everything together,
and the genomes of the different organisms are reconstituted a posteriori,
rebuilt from the overlapping pieces.
This gives us a more comprehensive picture of the corresponding microflora,
and also allows us to make functional inferences, which cannot be done as accurately
using 16S ribosomal RNA microbiota approaches.
So we have the means to effectively study the composition
of the microflora that we live in symbiosis with.
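The idea of rebuilding genomes "from the overlapping pieces" can be illustrated with a toy greedy overlap assembler: repeatedly merge the two sequences that overlap the most. This is only a sketch under strong assumptions (short, error-free, hypothetical reads, each longer than the minimum overlap); real metagenomic assemblers typically rely on de Bruijn graphs and must cope with sequencing errors, repeats, and very uneven coverage across species.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a matching a prefix of b, if >= min_len, else 0.
    Assumes len(b) >= min_len."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap until
    no overlap of at least min_len remains; returns the resulting contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads fully contained in another read; they add no new sequence.
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical error-free reads from two different organisms, mixed together
# as in a shotgun metagenomic sample.
reads = ["ATGGCGTAC", "CGTACGTTA", "GTTAGC", "TTTCCCGG", "CCCGGGAAA"]
contigs = greedy_assemble(reads)
print(contigs)
```

The mixed reads end up as two separate contigs, one per source organism, because no read from one organism overlaps a read from the other.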
17:44
If we go back to the study of downstream effects
following the perturbation by the causative variants,
it is reasonable to hypothesize that our genome in fact
affects microbiome composition, that is, microflora composition,
and that this in turn affects the risk of developing a disease.
So, exactly as we performed association studies between
the genotype at the disease-causing SNP and expression, for cis-eQTL effects,
followed by trans-QTL studies to look for downstream effects,
we can now do the same to see whether the genotype at disease-causing
SNPs is correlated with variation in the microbiota or
microbiome in a disease-relevant part of our body.
These microbiome-QTL studies are now becoming
an integral part of the study of many complex diseases.
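A microbiome-QTL test follows the same pattern as the eQTL sketch, with the abundance of a microbial taxon as the quantitative trait. In this minimal sketch the dosages and abundances are hypothetical, and the log10 transform with a small pseudocount is an assumption made to tame the skewed, zero-prone abundance data; published microbiome-QTL studies often use dedicated methods that account for the compositional nature of these data.

```python
from math import log10, sqrt


def log_abundance(fractions, pseudocount=1e-4):
    """log10 of relative abundance; the pseudocount avoids log(0) for absent taxa."""
    return [log10(f + pseudocount) for f in fractions]


def slope_and_r(x, y):
    """Least-squares slope of y on x and the Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Hypothetical data: dosage of allele B at a disease-associated SNP and the
# relative abundance of one gut taxon (e.g. from 16S profiling) per individual.
dosages = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
taxon_fraction = [0.10, 0.12, 0.09, 0.05, 0.06, 0.04, 0.05, 0.01, 0.02, 0.015]

slope, r = slope_and_r(dosages, log_abundance(taxon_fraction))
print(round(slope, 3), round(r, 3))
```

A negative slope here would mean that each copy of allele B is associated with a lower abundance of the taxon, the kind of genotype-microbiota correlation a microbiome-QTL study looks for.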