0:04
Just this mic. Hi. It's nice to get the talk over with early.
So, thank you to the organizers for the invitation.
And, thank you to United for getting me here,
even though it was 24 hours delayed.
So, what I'd like to talk about today is
some work coming out of our group,
better understanding the constraints on
viral evolution, with a specific focus on influenza.
So, as probably all of you know,
RNA viruses have exceptionally large population sizes.
They have incredibly high mutation rates.
You can see that here in this diagram,
so you see on the order of 10 to the negative fourth to 10 to
the negative sixth mutation rate per site per replication cycle.
And so, because of both these large population sizes and high mutation rates,
there is just this general expectation that RNA viruses can very readily adapt, both to
new host populations as well as to new selection pressures
put on these viruses by existing host populations.
And so, even though what is generally talked about for RNA viruses
is their ability to very rapidly adapt,
what I'd like to talk about today are actually some barriers to
their rapid adaptation to host populations.
And so, what I'd like to talk about are
these two general factors that actually put constraints on viral adaptation.
The first are what are known as these transmission bottlenecks.
So, it's generally appreciated that when
a disease is spread from a donor to a recipient host,
there's some reduction in the viral population size that gets transferred.
And so, what role do these transmission bottlenecks play in
terms of constraining viral adaptation and how large are these transmission bottlenecks?
And the second factor is the factor of genetic linkage.
So, of course, mutations occur in the genetic background of other mutations.
So, how does that genetic linkage actually impact the rate of viral adaptation?
Again, I told you I was going to talk about this in the context of influenza.
So, influenza just as a reminder,
is a segmented RNA virus consisting of these eight different gene segments.
I don't need to go into the symptoms relating to influenza.
All of you, I'm sure, have gotten infected with influenza sometime in your lives.
And I'd like to talk about three specific studies that we've
done relating to these constraints on viral adaptation:
one relating to this transmission bottleneck constraint
and two others relating to genetic linkage.
Okay, so just to get everyone on the same page here:
when transmission bottleneck sizes are loose,
as you can see here diagrammatically,
the founding population size in the recipient host is pretty large,
and the population can then expand further in that host.
In contrast, when transmission bottleneck sizes are tight or small,
a much smaller founding population size gets into the recipient.
And, it's generally appreciated that looser bottlenecks, shown here,
enable much more rapid viral adaptation and that
tight bottlenecks slow the process of viral adaptation at the population level.
And, this is because, you can imagine,
especially for an acute infection such as flu,
a beneficial mutation might arise in a donor individual,
but there's not that much time for
that beneficial mutation to actually increase to very high frequency.
So, during the transmission event,
if bottleneck sizes are tight,
there's a high probability of
that beneficial variant actually being lost and not transferred to the recipient host.
And so, tight bottlenecks can constrain the rate of viral adaptation.
Okay, so over the last several years,
there's been increasing availability of viral NGS data, especially for flu.
And so, what you can see here is the viral population of
a donor individual from this NGS data.
What we can do is identify variants and determine their frequencies,
in both the donor individual and recipient individuals.
And, what these estimations of
transmission bottleneck sizes attempt to do is to figure out, quantitatively,
the size of the founding population in the recipient host.
What is this NB? Okay, there are several existing methods
already out there for estimating this NB founding population size using NGS data.
And you can lump these generally into two categories.
One is kind of what we can consider the presence/absence method.
The other one is the frequency method.
So, for the presence/absence method: given sequenced virus from the donor,
with identified variants and their frequencies,
we just look in the recipient host population to see which of the variants
identified in the donor are actually present in the recipient.
We can then calculate the probability
of a variant at a certain frequency in the donor being transmitted.
And, that's a function of this transmission bottleneck size.
So, we can come out with a likelihood estimate of NB given
variant frequencies in the donor and then presence and absence of
these variants in the recipient.
This method is not very high resolution.
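The presence/absence idea can be sketched in a few lines of Python. This is a hypothetical toy, not the published implementation: under binomial sampling of NB genomes, a donor variant at frequency p is transmitted with probability 1 - (1 - p)^NB, and NB is estimated by maximizing the likelihood over candidate sizes.

```python
import math

def log_likelihood_presence_absence(donor_freqs, transmitted, nb):
    """Log-likelihood of a bottleneck size nb, given donor variant
    frequencies and whether each variant was seen in the recipient.
    Under binomial sampling of nb genomes, a variant at donor
    frequency p is transmitted with probability 1 - (1 - p)**nb."""
    ll = 0.0
    for p, present in zip(donor_freqs, transmitted):
        p_transfer = 1.0 - (1.0 - p) ** nb
        ll += math.log(p_transfer if present else 1.0 - p_transfer)
    return ll

# Hypothetical donor variants and their presence/absence in the recipient
donor_freqs = [0.05, 0.10, 0.30]
transmitted = [False, True, True]

# Maximum-likelihood NB over a grid of candidate bottleneck sizes
nb_hat = max(range(1, 201),
             key=lambda nb: log_likelihood_presence_absence(donor_freqs, transmitted, nb))
```

Note how little information each variant contributes here (just one bit), which is the low-resolution problem mentioned above.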
So, this other method, the frequency method,
basically uses both the variants identified in the donor and their frequencies,
as well as the frequencies of those variants in the recipient population, to estimate NB.
And generally, what is used is what's known as a single-generation Wright-Fisher model.
So basically, it has an underlying assumption of binomial sampling,
such that the viral population in the recipient is just
binomially sampled from the donor.
Okay. So, what we wanted to do first was actually modify or
extend these frequency methods to be more sensitive and to
take into consideration some processes that hadn't been considered before.
So, one thing that we did here in terms of extending this method
is to think of the founding population size,
again, as being sampled from the donor,
but then to really consider this process of stochastic growth,
from the time of founding to the time of sampling,
when the viral populations are very large.
What we also did was we incorporated these variant-calling thresholds.
So, even if a variant is present in the recipient,
sometimes it's below the limit of detection,
below that threshold of about 2-3%.
So, adding these things onto the method
leads to a likelihood calculation,
which is based on a beta-binomial distribution.
I won't go into the details, unless asked,
but you can see the derivation in this most recent paper of ours.
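As a rough sketch of the idea (not the published derivation, which additionally handles variant-calling thresholds and the degenerate founding counts), the likelihood for one variant marginalizes over the founding count k: k is binomial given the donor frequency, and, conditional on k, stochastic growth to a large population makes the variant's read count in the recipient beta-binomial.

```python
import math
from scipy.stats import binom, betabinom

def log_lik_beta_binomial(nb, donor_freq, var_reads, total_reads):
    """Sketch of a beta-binomial likelihood for one variant.
    The founding count k of the variant is Binomial(nb, donor_freq);
    conditional on k, the variant read count in the recipient is
    Beta-binomial(total_reads, k, nb - k), capturing stochastic
    growth from founding to sampling."""
    lik = sum(binom.pmf(k, nb, donor_freq) *
              betabinom.pmf(var_reads, total_reads, k, nb - k)
              for k in range(1, nb))  # k = 0 and k = nb handled separately in the real method
    return math.log(lik)
```

In practice one would sum such terms over all variants in a transmission pair and maximize over nb.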
Okay. So, we first tested the method
on simulated NGS data.
You can see here 500 different variants,
their frequencies in the donor,
and their frequencies in the recipient.
We applied this beta binomial sampling method and you can see
that we can recover our bottleneck size of 50.
If we instead use the presence/absence method
or this single generation Wright-Fisher model,
we estimate transmission bottleneck sizes which are much lower than
the one that we can recover here which is the true one in a simulated data set.
So, this is where it becomes more
interesting: actually applying it to a real NGS data set.
This is a data set that had been previously published,
and the study actually took place here in Hong Kong.
So, that's really wonderful.
The study was performed by Ben Cowling,
Leo Poon and, as you can see, Elodie Ghedin,
and was published just a year ago.
So, the study was a cohort study that took place in July and August of 2009.
There were 84 individuals from whom virus was sampled,
at at least one time point.
And, individuals were found to be infected with either H3N2 or H1N1, the pandemic strain.
A few individuals were actually co-infected.
And, there were also metadata on these individuals, clinical data as well,
including things like their age,
vaccination status, and their symptoms, including temperature and so forth.
Okay, so in this previous study,
which was published and whose data we used,
there were nine transmission pairs identified with
this H1N1 pandemic strain and
seven transmission pairs identified with this H3N2 strain.
And, those transmission pairs were identified based
both on what was known about who was living in each household,
the times at which they developed symptoms,
and how similar their viral populations were.
So, those transmission pairs were already established.
So, what we did then was,
for each of these nine transmission pairs for H1N1 and the seven ones for H3N2,
we inferred the transmission bottleneck sizes by transmission pair.
In purple and in pink,
you can see an approximate and an exact version of
our beta binomial sampling approach,
and because of very high coverage,
those two estimates were pretty much the same.
And, what you can see here is that overall for H1N1,
the transmission bottleneck size was on the order of about 200.
For H3N2, it was a little bit higher.
But, you can see also here that there is
a huge amount of variation between the transmission pairs.
Just to note, NB here is across H1N1 and H3N2,
so taking into consideration both data sets.
And, here in black are the Poon, Song, et al.
estimates, which actually used a single-generation Wright-Fisher model,
but on a much smaller number of variants,
only about 18,
whereas we used variants numbering in the hundreds.
Okay. So here, in terms of overall bottleneck size estimates,
we estimated a transmission bottleneck size overall of about 196.
This is the confidence interval.
And, what you can see up here in dots is
the maximum likelihood estimate for each one of those transmission pairs.
So, just as a way to think about this transmission bottleneck size,
we can ask, as a function of the frequency of a variant in the donor
(10%, 20%, 30%),
what is the probability of the transfer of
that variant to the recipient, given this bottleneck size?
What you see up here in purple is that probability.
And so, you can see that even if a variant is at three, four, or five percent in the donor,
that variant is most likely going to be transferred over to the recipient.
What you see here in gray is what
you would actually observe as the probability of transfer to
the recipient given variant-calling thresholds, and this is in agreement with the data.
The true probability of transfer is actually much, much higher.
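This transfer-probability calculation is easy to reproduce. Here is a minimal sketch, in which the threshold effect is simplified to requiring the variant to land in the founding population at or above the calling threshold (ignoring drift during expansion):

```python
import math

def p_transfer(p_donor, nb):
    """True probability a donor variant at frequency p_donor ends up in
    a founding population of nb virions (binomial sampling)."""
    return 1.0 - (1.0 - p_donor) ** nb

def p_observed(p_donor, nb, threshold=0.03):
    """Simplified sketch of the observed transfer probability: the
    variant must be founded at a frequency at or above the
    variant-calling threshold."""
    k_min = math.ceil(threshold * nb)
    return sum(math.comb(nb, k) * p_donor**k * (1.0 - p_donor)**(nb - k)
               for k in range(k_min, nb + 1))

# With NB around 200, a variant at just 3% in the donor is almost
# certainly transferred, even though it is often missed by calling:
print(round(p_transfer(0.03, 200), 3))  # -> 0.998
```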
So, I mentioned before there was this variability between transmission pairs.
What we were interested in doing then
was understanding some of
the variability in these transmission bottleneck size estimates.
So, we went to these metadata,
and for the donors,
we looked at their symptom score.
This was the sum of headache,
sore throat, cough, myalgia, etc.
We plotted that symptom score against our bottleneck size estimate,
and you can see that there is no significant correlation there.
We did the same thing, or actually we did this in kind of
a multiple regression sort of approach:
we looked at donor temperature.
So here's the maximum donor temperature,
and here's our bottleneck size estimate, and we actually found
a significantly positive relationship between
donor temperature and transmission bottleneck size.
And so, this makes sense in light of some previous studies.
Specifically, one showing that host temperature was associated with viral load.
So presumably, these individuals who had higher temperature,
also had higher viral load.
And there's also studies that show that host temperature
was positively related to the degree of nasal shedding.
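The regression step itself is straightforward to reproduce; here is a sketch with made-up illustrative numbers, not the study's data:

```python
from scipy.stats import linregress

# Illustrative (donor max temperature in C, estimated NB) pairs --
# purely hypothetical values, just to show the analysis step:
temps  = [37.2, 37.8, 38.1, 38.6, 39.0, 39.4]
nb_est = [40, 90, 150, 210, 320, 400]

fit = linregress(temps, nb_est)
# A positive slope with a small p-value corresponds to the reported
# positive association between donor temperature and bottleneck size.
print(fit.slope > 0, fit.pvalue < 0.05)  # -> True True
```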
Okay. So, to summarize this first part of the talk:
we found that transmission bottleneck sizes were on the order of
about 200, which I consider generally very loose.
They were highly variable across transmission pairs,
and that variation could be explained, in part,
by donor temperature and viral load.
This makes me think that transmission bottleneck sizes may not play
a really substantial role in actually slowing influenza's rate of adaptation.
Okay. So, next, turning to these studies on genetic linkage.
The first one I'd like to talk about is
some evidence that reassortment within infected individuals
is actually rather limited, from a human challenge study.
And so, let me explain a little bit about this human challenge study first.
So, this is in collaboration with Chris Woods and Micah McClain,
two infectious disease clinicians at Duke,
who about a decade ago performed a human challenge study with influenza virus.
And so, the study involved isolating an H3N2 strain,
a Wisconsin strain,
from an infected individual.
This was taken as a reference strain.
That reference strain was propagated in both eggs and cells,
to create a viral stock which was then used to inoculate each one of these subjects.
So, they did a number of different studies,
with a number of
different individuals who were experimentally inoculated,
and these individuals were followed over the period of a week,
with viral samples being taken daily.
Unfortunately, we weren't able to get
virus sequence from all of those samples.
But, the samples were taken daily.
So in an earlier paper,
looking at just one of these studies for which we had data,
we actually found that the viral stock was adapted to the passage environments,
which they had not known, but which wasn't terribly surprising.
And one thing we found was that
the variants were distributed across different gene segments.
Five of them attained really high frequencies,
and three of those were on the HA gene segment and were non-synonymous.
And you can see here, the frequencies at which they were found in the viral stock.
These were very likely
to have been egg-adapted and cell-culture-adapted.
Some of that is consistent with
what is already known in the literature about certain amino acids being preferred
in eggs versus in mammalian cells.
So, what we could do for these different subjects was
look at the variants that were identified in the stock,
and their trajectories in infected individuals.
And basically, what we found, specifically in HA, is
a reduction in the allele frequencies
associated with these egg-adapted variants.
So, we generally see this reversion of the
intrahost viral population, within just a day or two,
to that mammalian-adapted reference strain.
So, in a subsequent paper,
we combined this dataset with another one we then had available,
and we wanted to ask:
was there any evidence for viral reassortment from the NGS data in these subjects?
So basically, you can imagine three different scenarios:
one where there's rapid reassortment, one
where there's no reassortment,
and one where there's limited reassortment.
So, in the rapid reassortment case:
this schematic kind of shows three different gene segments of flu,
and this allele, this allele,
and this allele all have a selective advantage in this scenario.
And the degree of the selective advantage differs between these three variants.
And so, in the rapid reassortment case,
if you simulated a model like this forward,
you would get allele frequency changes that look something like this.
However, if there's no reassortment,
these haplotypes actually interfere with one another,
and there would be this difference in terms of the predicted allele frequencies.
And in the limited reassortment case, intermediate between these two,
you'll see something somewhat different.
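These scenarios can be caricatured with a toy deterministic two-segment model; this is an illustration of the linkage-versus-reassortment contrast only, not the inference machinery in the paper, and all numbers are made up:

```python
def generation(f, w, r):
    """One generation of a deterministic two-segment toy model.
    f: haplotype frequencies over 'ab', 'aB', 'Ab', 'AB';
    w: haplotype fitnesses; r: per-generation reassortment probability
    (r = 0 is full linkage, r = 0.5 is free reassortment)."""
    mean_w = sum(f[h] * w[h] for h in f)
    g = {h: f[h] * w[h] / mean_w for h in f}        # selection
    D = g['AB'] * g['ab'] - g['Ab'] * g['aB']       # linkage disequilibrium
    return {'AB': g['AB'] - r * D, 'ab': g['ab'] - r * D,
            'Ab': g['Ab'] + r * D, 'aB': g['aB'] + r * D}

def simulate(r, gens=50):
    # Two beneficial alleles start on different backgrounds
    f = {'ab': 0.90, 'Ab': 0.05, 'aB': 0.05, 'AB': 0.0}
    w = {'ab': 1.0, 'Ab': 1.2, 'aB': 1.1, 'AB': 1.32}
    for _ in range(gens):
        f = generation(f, w, r)
    return f

no_reassort = simulate(r=0.0)
free_reassort = simulate(r=0.5)
# Without reassortment the double-mutant haplotype never forms and the
# weaker beneficial allele is simply outcompeted (interference); with
# free reassortment the double mutant is generated and takes over.
```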
And so, what we were trying to do here was the reverse, right?
We have the observed allele frequency changes,
and we're trying to work back
to determine the degree of reassortment, as well
as the selective advantage of each one of these alleles.
I won't go into the details of it,
just in the interest of time,
but you can kind of find them in this recently published paper.
So, somewhat surprisingly to us,
what we found was that the effective rate of reassortment
was actually really, really limited during these experimental human infections.
So basically, a model with almost no reassortment was much preferred, based on
likelihood, over one that had rapid reassortment.
So, a very, very low effective rate of reassortment.
And so, you can see why
the likelihoods are much better for the model without reassortment,
or with very low reassortment.
This is for an individual, subject 5009,
and this one, subject 5001.
These are different subjects,
where you can see here in black the observed allele frequency changes,
in blue the no-reassortment model predictions,
and here in red,
the rapid reassortment case.
And so, what you can see in the data
are all these coordinated changes
in allele frequencies across the genome.
And the no reassortment case can capture that,
whereas the rapid reassortment one cannot.
This is why we get a higher likelihood for the no reassortment case.
So just to summarize this part:
the effective rate of reassortment actually appears to be very limited,
at least in these experimental human infections.
We think that this might be due to
these spatial 'metapopulation' dynamics within
infected humans, because we know influenza infection is
a highly spatial process in the lungs, with
reassortment happening predominantly between identical viral genotypes.
So, there are some studies which indicate
there's a lot of reassortment in small mammal models,
but maybe the spatial dynamics are
less pronounced in some of those studies; we don't know.
Just to summarize: genetic linkage across gene segments may actually play
a more prominent role in constraining influenza evolution than previously thought,
especially at the within-host level.
Finally, on to the last study,
which is focused on deleterious mutations and how
they impact influenza's antigenic evolution.
So, what's been very well characterized for
influenza is its mode of antigenic evolution.
What we know is that
these antigenic clusters emerge and replace one another every two to eight years.
And these cluster transitions are caused by,
in the vast majority of cases,
just a single amino acid change in the hemagglutinin,
primarily in this receptor binding area.
It's a single amino acid change once every two to eight years.
So this really opens the question of, well,
why don't we see cluster transitions happen more quickly, given
these really high mutation rates?
Why does it take two to eight years?
And why don't we actually see explosive antigenic diversification?
And why is antigenic evolution so punctuated?
So, there have been a lot of models attempting
to answer one or more of these questions.
And the answer that we'd like to propose here is that it
might actually have to do with the genetic background, relating to mutational load.
So why even consider deleterious mutations?
Well, we used to have this kind of
distribution-of-fitness-effects data only for VSV,
but a recent study came out for flu here, which is great.
And what we see is that about 30 percent of
single point mutations are lethal,
and the vast majority of the remainder actually have some kind
of fitness cost; these are sub-lethal deleterious mutations.
These, of course, can persist for a long time in populations.
That they persist in flu
was first remarked upon in
this really seminal paper of Walter Fitch and colleagues,
where they found effectively a higher dN/dS
ratio on external branches relative to internal branches.
And so, they concluded that this is likely because of
deleterious mutation load, and this was just in the HA.
There was a nice study by Pybus and co-authors,
looking across about 140 different RNA virus data sets.
And using a similar approach,
they found that, in general,
RNA viruses carry deleterious mutation loads.
And they actually also looked at influenza;
they looked at a different gene segment,
I think it was the M gene segment.
So what we decided to do,
in terms of one of these phylodynamic models, was to develop an
epidemiological model in which host infection histories are tracked.
And we actually allow for antigenic mutations on
the hemagglutinin, with antigenicity as
well as the host's history of infection determining susceptibility to infection.
But then, we also have this other component: deleterious mutations occurring across
all gene segments with a constant fitness cost,
as is traditional in some of these population genetic models.
And we have these deleterious mutations lower the transmission rate beta.
So what population genetic theory tells us,
is that deleterious mutations will act to slow the rate of adaptive evolution,
and to actually make it also more punctuated in nature.
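One way to see the population-genetic intuition is the classic mutation-selection balance result; the numbers below are purely illustrative, not estimates for flu:

```python
import math

def zero_load_fraction(U, s):
    """Classic mutation-selection balance result: with genomic
    deleterious mutation rate U and a constant cost s per mutation,
    the number of deleterious mutations per genome is Poisson with
    mean U/s, so a fraction exp(-U/s) of the population carries a
    load-free background."""
    return math.exp(-U / s)

# Illustrative numbers: U = 0.1 deleterious mutations per genome per
# generation, each costing s = 0.05. An antigenic mutation then has to
# arise in the ~13.5% load-free class to avoid being dragged down by
# linked deleterious mutations.
print(round(zero_load_fraction(0.1, 0.05), 3))  # -> 0.135
```

This is why the "jackpot" combination discussed later in the talk, a large antigenic mutation on a good genetic background, is rare.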
So these are some simulations.
These were done by a former grad student
in my group, Dave Rasmussen.
So, you can see these full phylodynamic simulations
when there's no load, so we basically turn off deleterious mutations.
We see this very rapid explosion of antigenic diversity.
The colors here are actually reused for different antigenic clusters,
but we see explosive genetic and antigenic diversity,
and we see, overall in the population,
an increase in incidence and prevalence,
which is definitely not what we see empirically.
But if we actually turn on this deleterious mutation rate in these simulations,
and you just need some load, then a very general and robust result
is that we can recreate or reproduce the spindly phylogeny of the HA very easily.
So this phylogeny is the same exact phylogeny, color-coded by different attributes.
This one is color-coded by fitness R, and you
can see the backbone here has higher fitness, which makes sense.
We can decompose this fitness R into both the fraction of
the population that a viral lineage sees as susceptible,
as well as the mutation load of that viral lineage.
And we can see that fitness R
depends both on the fraction of susceptibles,
so on antigenic novelty,
with the lighter colors indicating that there are
a lot of hosts susceptible to that lineage,
but also on mutation load.
So you can see that the backbone here
is lighter blue,
so it carries a lower load.
Okay. So, a summary for this part three
is that genetic linkage within,
and to a lesser extent between,
gene segments can slow
influenza's antigenic evolution and make it more punctuated in nature.
And the reason why it slows influenza's antigenic evolution
is that it really requires
a large antigenic mutation to occur in a really good genetic background,
and that jackpot combination is very rare.
Genetic linkage, in the context of these commonly occurring
sub-lethal deleterious mutations, is likely to play
a prominent role in shaping and slowing influenza's adaptive evolution.
Just to go back to these three studies
and summarize them:
a question that I'd like to put out there at the end of this talk is,
is there a way for next-generation control strategies, for flu or in general for
RNA viruses, to leverage some of
these viral constraints or findings for more effective disease control?
I think there's a few different control strategies that are now being thought
about like therapeutic interfering particles and so forth,
that may actually be able to leverage some of these constraints.
With that, I want to thank all of my co-authors in all of this work.
So this is the Transmission Bottleneck work.
Ashley Sobel (you probably saw her name as
first author on a number of these papers)
is an M.D.-Ph.D. student in my group
who has finished her Ph.D. now
and is just slogging away on her M.D. portion.
Elodie Ghedin, Daniel Weissman and Ben Greenbaum were
folks I interacted with; on the reassortment rate project,
especially Chris Illingworth; and
Dave Rasmussen on the deleterious mutation work. Any questions?