Viruses
tobacco mosaic virus
(Thomas Splettstoesser, wikipedia)
bacteriophage
(adenosine, wikipedia)
influenza virus
(wikipedia)
human immunodeficiency virus
(wikipedia)
rely on host to replicate
little more than genome + capsid
genomes typically 5-200 kb (with exceptions)
many infectious diseases are caused by viruses
play an important role in microbial ecosystems
most abundant biological entities on Earth: $\sim 10^{31}$ virus particles
Evolution of HIV
Chimp → human transmission around 1900 gave rise to HIV-1 group M
~100 million people infected since then
subtypes differ at 10-20% of their genome
HIV-1 evolves ~0.1% per year
image: Sharp and Hahn, CSH Persp. Med.
HIV infection
$10^8$ cells are infected every day
the virus repeatedly escapes immune recognition
integrates into T cells as a latent provirus
image: wikipedia
HIV-1 evolution within one individual
silhouette: clipartfest.com; Zanini et al, 2015. Collaboration with Jan Albert and his group
Deep population sequencing to track all mutations above 1% frequency
diverge at 0.1-1% per year
almost whole-genome coverage in 10 patients
full data set at hiv.tuebingen.mpg.de
Zanini et al, eLife, 2015; antibody data from Richman et al, 2003
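Calling "all mutations above 1%" and measuring divergence can be sketched as below. This is a minimal illustration, not the pipeline used by Zanini et al: it assumes per-site nucleotide counts in a hypothetical (L, 4) array for A, C, G, T, a founder sequence given as allele indices, and a linear fit of divergence against time.

```python
import numpy as np

ALPHABET = "ACGT"

def allele_frequencies(counts):
    """counts: (L, 4) read counts per site for A, C, G, T -> per-site frequencies."""
    counts = np.asarray(counts, dtype=float)
    depth = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(depth, 1)  # avoid division by zero at uncovered sites

def minor_variants(freqs, threshold=0.01):
    """Return (site, allele, frequency) for non-majority alleles above the threshold."""
    major = freqs.argmax(axis=1)
    hits = []
    for site, allele in zip(*np.nonzero(freqs > threshold)):
        if allele != major[site]:
            hits.append((int(site), ALPHABET[allele], float(freqs[site, allele])))
    return hits

def divergence(freqs, founder):
    """Mean frequency of non-founder alleles per site (founder: array of allele indices)."""
    founder = np.asarray(founder)
    return float(1.0 - freqs[np.arange(len(founder)), founder].mean())

def divergence_rate(times, divergences):
    """Slope of a least-squares line through divergence vs. time (e.g. per year)."""
    return float(np.polyfit(times, divergences, 1)[0])

counts = [[980, 20, 0, 0], [0, 1000, 0, 0], [0, 0, 995, 5]]
freqs = allele_frequencies(counts)
print(minor_variants(freqs))  # only site 0 (C at 2%) passes the 1% cutoff
```

The 0.5% T at site 2 is below the cutoff and is not reported.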
Theoretical framework for virus evolution: population genetics
evolutionary processes ↔ trees ↔ genetic diversity
Neutral models and beyond
Neutral models
all individuals are identical → same offspring distribution
Kingman coalescent and diffusion theory are dual descriptions
most quantities can be calculated analytically
perturbations such as background selection can be included
But: neutral models are not suitable for RNA viruses!
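As an example of how tractable the neutral baseline is, in the Kingman coalescent each pair of lineages merges at rate 1 (time in units of the coalescence time), so with k lineages the next merger comes at total rate k(k-1)/2, and the expected time to the most recent common ancestor of n samples is 2(1 - 1/n). A minimal sketch that checks this closed form by simulation:

```python
import random

def kingman_tmrca(n, rng):
    """One draw of the TMRCA of n lineages; each pair coalesces at rate 1."""
    t, k = 0.0, n
    while k > 1:
        t += rng.expovariate(k * (k - 1) / 2)  # waiting time while k lineages remain
        k -= 1
    return t

def expected_tmrca(n):
    """Closed form: sum over k=2..n of 2/(k(k-1)) = 2(1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)

rng = random.Random(1)
draws = [kingman_tmrca(10, rng) for _ in range(20000)]
print(sum(draws) / len(draws), expected_tmrca(10))
```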
Clonal interference and traveling waves
RN, Annual Reviews, 2013; Desai & Fisher; Brunet & Derrida; Kessler & Levine
Figure: neutral model (Kingman coalescent) vs. strong selection (Bolthausen-Sznitman coalescent)
RN, Hallatschek, PNAS, 2013; see also Brunet and Derrida, PRE, 2007
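A traveling fitness wave can be reproduced with a very small Wright-Fisher model. The sketch below assumes, for illustration only, that each individual carries k beneficial mutations of equal effect s (fitness exp(s*k)) and acquires new ones at Poisson rate U per generation; the parameter values are arbitrary. The fitness distribution then moves steadily toward higher k, while many competing clones interfere with each other.

```python
import numpy as np

def traveling_wave(N=10000, s=0.01, U=1e-3, generations=500, seed=0):
    """Minimal Wright-Fisher model with beneficial mutations; returns mean k per generation."""
    rng = np.random.default_rng(seed)
    k = np.zeros(N, dtype=int)                  # number of beneficial mutations per individual
    mean_k = []
    for _ in range(generations):
        w = np.exp(s * (k - k.mean()))          # relative fitness (centred for numerical stability)
        parents = rng.choice(N, size=N, p=w / w.sum())
        k = k[parents] + rng.poisson(U, size=N) # inherit parent's mutations, add new ones
        mean_k.append(k.mean())
    return np.array(mean_k)

mean_k = traveling_wave(N=2000, generations=200, seed=1)
print(mean_k[::50])  # the mean of the fitness distribution keeps advancing
```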
Traveling waves and the Bolthausen-Sznitman coalescent
Branching process approximation: $P(n_i, t|x_i)$
Does a sample (blue dots) have a common ancestor $\tau$ generations ago?
$\quad Q_b = \langle \sum_i \left(\frac{n_i}{\sum_j n_j}\right)^b\rangle \approx \frac{\tau-T_c}{T_c(b-1)} $
All other merger rates are also consistent with the Bolthausen-Sznitman coalescent: $\quad\lambda_{b,k} = \frac{(k-2)!(b-k)!}{T_c (b-1)!}$
RN, Hallatschek, PNAS, 2013; see also Brunet and Derrida, PRE, 2007
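The merger rates above can be checked numerically. Summing $\binom{b}{k}\lambda_{b,k}$ over k = 2..b gives a total merger rate of (b-1)/T_c for b lineages, growing linearly in b rather than quadratically as in the Kingman coalescent (b(b-1)/2), and the rate at which all b lineages merge at once is $\lambda_{b,b} = 1/(T_c(b-1))$. The sketch below also evaluates the quantity inside $Q_b$ for one realization of family sizes; the slide's $Q_b$ is its average over realizations of the branching process, which is not simulated here.

```python
from fractions import Fraction
from math import comb, factorial

def bsc_rate(b, k, Tc=1):
    """Rate at which one specific group of k out of b lineages merges (Bolthausen-Sznitman)."""
    return Fraction(factorial(k - 2) * factorial(b - k), factorial(b - 1)) / Tc

def total_merger_rate(b, Tc=1):
    """Rate of any merger among b lineages: sum over k of C(b, k) * lambda_{b,k}."""
    return sum(comb(b, k) * bsc_rate(b, k, Tc) for k in range(2, b + 1))

def same_family_probability(family_sizes, b):
    """sum_i (n_i / sum_j n_j)^b: chance that b draws (with replacement) hit one family."""
    total = sum(family_sizes)
    return sum((n / total) ** b for n in family_sizes)

print([total_merger_rate(b) for b in range(2, 6)])  # exact Fractions equal to b - 1
```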
U-shaped polarized site frequency spectra
RN, Hallatschek, PNAS, 2013
Zanini et al, eLife, 2015
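A polarized site frequency spectrum counts, for each number i of sequences carrying the derived allele, how many sites have that count. Under neutrality its expectation falls off as 1/i; the U shape reflects an excess of sites where the derived allele is common. A minimal sketch, assuming an alignment of equal-length sequences and a known ancestral sequence (e.g. from an outgroup or the founder), and treating every non-ancestral base as derived:

```python
import numpy as np

def polarized_sfs(alignment, ancestral):
    """Return h[i-1] = number of sites where exactly i of n sequences carry a derived allele,
    for i = 1..n-1 (sites with no or fixed derived alleles are excluded)."""
    seqs = np.array([list(s) for s in alignment])
    anc = np.array(list(ancestral))
    n = seqs.shape[0]
    derived = (seqs != anc[None, :]).sum(axis=0)  # derived-allele count per site
    return np.bincount(derived, minlength=n + 1)[1:n]

alignment = ["AAAA", "AAAC", "AGAC", "TGAC"]
print(polarized_sfs(alignment, "AAAA"))  # derived counts per site are 1, 2, 0, 3
```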
Bursts in a tree ↔ high fitness genotypes
Can we read fitness off a tree?
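One crude way to turn "bursts in a tree" into a number is to measure how much tree lies just below each node, discounting branches exponentially with distance. The sketch below is a hypothetical proxy: it resembles, but is simpler than, the local branching index of Neher, Russell & Shraiman (2014), which also integrates tree length above the node. The tree format (a dict of children and a dict of branch lengths) and the time scale tau are assumptions.

```python
import math

def discounted_subtree_length(node, children, branch_length, tau):
    """Integral of exp(-distance / tau) over all branches below `node`."""
    total = 0.0
    for child in children.get(node, []):
        l = branch_length[child]
        # contribution of the branch itself plus the discounted subtree below it
        total += tau * (1.0 - math.exp(-l / tau))
        total += math.exp(-l / tau) * discounted_subtree_length(child, children, branch_length, tau)
    return total

# a node with a burst of four short branches vs. a node with two long ones
burst = discounted_subtree_length("a", {"a": ["1", "2", "3", "4"]},
                                  {"1": 0.2, "2": 0.2, "3": 0.2, "4": 0.2}, tau=0.5)
sparse = discounted_subtree_length("b", {"b": ["5", "6"]}, {"5": 0.2, "6": 0.2}, tau=0.5)
print(burst > sparse)
```

A chain of two unit branches scores the same as one branch of length 2, so the measure depends on how much tree lies nearby, not on how it is split into branches.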
Human seasonal influenza viruses
slide by Trevor Bedford
Influenza virus evolves to avoid human immunity
Vaccines need frequent updates
Predicting evolution
Given the branching pattern:
can we predict fitness?
pick the closest relative of the future?
RN, Russell, Shraiman, eLife, 2014
Fitness inference from trees
$$P(\mathbf{x}|T) = \frac{1}{Z(T)} p_0(x_0) \prod_{i=0}^{n_{int}} g(x_{i_1}, t_{i_1}| x_i, t_i)g(x_{i_2}, t_{i_2}| x_i, t_i)$$
RN, Russell, Shraiman, eLife, 2014
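Because the product runs over parent-child pairs of a binary tree, the normalization Z(T) and the marginal of the root fitness can be computed by passing messages from the leaves upward instead of summing over all fitness assignments. The sketch below shows only this factorization on a discretized fitness grid. The propagator `toy_propagator` is a hypothetical stand-in (Gaussian fitness diffusion times a growth factor exp(x dt) favoring fit lineages); the propagator in the paper is derived from a traveling-wave model and is not reproduced here. The flat prior p0 is also an assumption.

```python
import numpy as np

def toy_propagator(grid, dt, diffusion=0.5):
    """HYPOTHETICAL kernel G[a, b] for a branch of length dt from fitness grid[a] to grid[b]."""
    step = grid[None, :] - grid[:, None]                # step[a, b] = grid[b] - grid[a]
    gauss = np.exp(-step**2 / (4.0 * diffusion * dt))   # diffusion of fitness along the branch
    return gauss * np.exp(grid[:, None] * dt)           # fitter parents are up-weighted

def upward_message(node, children, times, grid, propagator):
    """U_node(x): sum over fitness values of all descendants of the product of branch factors."""
    u = np.ones(len(grid))
    for child in children.get(node, []):
        G = propagator(grid, times[child] - times[node])
        u = u * (G @ upward_message(child, children, times, grid, propagator))
    return u

def root_posterior(children, times, root, grid, p0, propagator=toy_propagator):
    """Posterior P(x_0 | T) over the grid and the normalization Z(T)."""
    weights = p0 * upward_message(root, children, times, grid, propagator)
    Z = weights.sum()
    return weights / Z, Z

children = {0: [1, 2], 1: [3, 4]}                 # root 0, internal node 1, leaves 2, 3, 4
times = {0: 0.0, 1: 0.3, 2: 1.0, 3: 1.0, 4: 1.0}
grid = np.linspace(-1.0, 1.0, 3)
post, Z = root_posterior(children, times, 0, grid, np.full(3, 1 / 3))
print(post)
```

Posterior marginals of the other internal nodes follow from an additional downward pass, omitted here for brevity.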
Validate on simulation data
simulate evolution
sample sequences
reconstruct trees
infer fitness
predict ancestor of future
compare to truth
RN, Russell, Shraiman, eLife, 2014
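The last two steps, "predict ancestor of future" and "compare to truth", need a score. The sketch below uses a hypothetical normalization, not necessarily the one in the paper: each sampled candidate gets its mean Hamming distance to the future population, and the prediction (e.g. the candidate with the highest inferred fitness) is scored as 0 if it is the closest candidate and 1 if it is only as close as the average candidate.

```python
import numpy as np

def hamming(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def mean_distance_to_future(candidate, future):
    return float(np.mean([hamming(candidate, f) for f in future]))

def prediction_score(candidates, predicted_index, future):
    """HYPOTHETICAL score: 0 = best possible pick, 1 = no better than the average candidate."""
    d = np.array([mean_distance_to_future(c, future) for c in candidates])
    best, avg = d.min(), d.mean()
    if avg == best:                  # all candidates equally close: any pick is optimal
        return 0.0
    return float((d[predicted_index] - best) / (avg - best))

candidates = ["AAAA", "AAAT", "AATT"]
future = ["AATT", "AATA", "AATT"]
print(prediction_score(candidates, 2, future))  # "AATT" is the closest candidate
```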
Validation on simulated data
RN, Russell, Shraiman, eLife, 2014
Prediction of the dominant H3N2 influenza strain
no influenza-specific input
how can the model be improved? (see model by Luksza & Laessig)
in what other contexts might this apply?
RN, Russell, Shraiman, eLife, 2014
Summary
RNA virus evolution can be observed directly
Extensive reversion toward the preferred amino acid sequence
Rapidly adapting populations require new population genetic models
These models can be used to infer fit clades
Future influenza populations can be anticipated
Automated real-time analysis can help fight the spread of disease
HIV acknowledgments
Fabio Zanini
Jan Albert
Johanna Brodin
Christa Lanz
Göran Bratt
Lina Thebo
Vadim Puller
Influenza and Theory acknowledgments
Boris Shraiman
Colin Russell
Trevor Bedford
Oskar Hallatschek
nextstrain.org
Trevor Bedford
Colin Megill
Pavel Sagulenko
Wei Ding
Sidney Bell
James Hadfield