Development of sequencing technologies
We can now sequence...
- thousands of bacterial isolates
- thousands of single cells
- populations of viruses, bacteria or flies
- diverse ecosystems
Virus genomes change rapidly through time
A/Brisbane/100/2014
GGATAATTCTATTAACCATGAAGACTATCATTGCTTT...
A/Brisbane/1000/2015
GGATAATTCTATTAACCATGAAGACTATTATTGCTTT...
A/Brisbane/1/2017
GGATAATTCTATTAACCATGAAGACTATCATTGCTTT...
... hundreds of thousands of sequences...
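As a rough illustration of how such changes can be located, the sketch below compares the sequence fragments shown above position by position. It uses only the truncated prefixes from this slide, so the result says nothing about the rest of the genomes; the helper name `differing_sites` is hypothetical, and the fragments are assumed to be already aligned.

```python
# Minimal sketch: list the positions at which two aligned sequences differ.
# The fragments are the truncated prefixes from the slide, not full genomes.

def differing_sites(a: str, b: str) -> list[int]:
    """Return 0-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


fragments = {
    "A/Brisbane/100/2014": "GGATAATTCTATTAACCATGAAGACTATCATTGCTTT",
    "A/Brisbane/1000/2015": "GGATAATTCTATTAACCATGAAGACTATTATTGCTTT",
    "A/Brisbane/1/2017": "GGATAATTCTATTAACCATGAAGACTATCATTGCTTT",
}

reference_name = "A/Brisbane/100/2014"
reference = fragments[reference_name]
for name, seq in fragments.items():
    if name == reference_name:
        continue
    sites = differing_sites(reference, seq)
    print(f"{name}: {len(sites)} difference(s) at positions {sites}")
```

Real analyses would first align full-length sequences; the position-by-position comparison is only valid once that has been done.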
Phylogenetic analysis of viral sequences
RNA viruses have a high mutation rate. New mutations arise every few weeks.
Some viruses evolve a million times faster than animals
Animal haemoglobin
HIV protein
Evolution of HIV
- Chimp → human transmission around 1900 gave rise to HIV-1 group M
- ~100 million infected people since
- subtypes differ at 10-20% of their genome
- HIV-1 evolves ~0.1% per year
image: Sharp and Hahn, CSH Persp. Med.
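A back-of-the-envelope check shows that the numbers above are roughly consistent with each other. Assume, as a simplification, that two lineages have evolved independently since a common ancestor a time $T$ ago, each at a constant rate $\mu \approx 0.1\%$ per year, and that repeated changes at the same site can be ignored. Both lineages accumulate changes, so the expected divergence $d$ between them is
$$d \approx 2\mu T = 2 \times 0.001\,\mathrm{yr}^{-1} \times 100\,\mathrm{yr} = 0.2$$
Taking $T \approx 100$ years (back to around 1900) gives about 20%, the upper end of the 10-20% range. Subtypes that split more recently than the origin of group M would be expected to differ by less.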
HIV infection
- $10^8$ cells are infected every day
- the virus repeatedly escapes immune recognition
- integrates into T-cells as latent provirus
image: wikipedia
HIV-1 evolution within one individual
silhouette: clipartfest.com, Zanini et al, 2015. Collaboration with Jan Albert and his group
Immune escape in early HIV infection
Selective sweeps
- Viruses carrying a beneficial mutation have more offspring: on average $1+s$ instead of $1$
- $s$ is called the selection coefficient
- Fraction $x$ of viruses carrying the mutation changes as
$$x(t+1) = \frac{(1+s)x(t)}{(1+s)x(t) + (1-x(t))}$$
- In continuous time → logistic differential equation:
$$\frac{dx}{dt} = sx(1-x) \Rightarrow x(t) = \frac{e^{s(t-t_0)}}{1+ e^{s(t-t_0)}}$$
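To see how closely the continuous-time solution tracks the discrete update, the sketch below iterates the recursion for $x(t+1)$ and compares it with the logistic curve. The function names and parameter values ($s = 0.01$, starting frequency 1%) are illustrative assumptions, and $t_0$ is chosen so that both curves start at the same frequency.

```python
import math


def sweep_discrete(x0: float, s: float, generations: int) -> list[float]:
    """Iterate x(t+1) = (1+s) x / ((1+s) x + (1-x)) starting from x0."""
    xs = [x0]
    for _ in range(generations):
        x = xs[-1]
        xs.append((1 + s) * x / ((1 + s) * x + (1 - x)))
    return xs


def sweep_logistic(x0: float, s: float, t: float) -> float:
    """Logistic solution x(t), with t_0 fixed by the condition x(0) = x0."""
    t0 = -math.log(x0 / (1 - x0)) / s
    return 1 / (1 + math.exp(-s * (t - t0)))


if __name__ == "__main__":
    x0, s, generations = 0.01, 0.01, 1000  # illustrative values only
    discrete = sweep_discrete(x0, s, generations)
    largest_gap = max(
        abs(x - sweep_logistic(x0, s, t)) for t, x in enumerate(discrete)
    )
    # Time at which the mutation reaches 50% in the continuous model
    t_half = math.log((1 - x0) / x0) / s
    print(f"largest gap between the two curves: {largest_gap:.4f}")
    print(f"time to reach 50% frequency: {t_half:.0f} generations")
```

The two curves agree well when $s \ll 1$, because the discrete update multiplies the odds $x/(1-x)$ by $1+s$ per generation while the continuous model multiplies them by $e^{s}$.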
Population sequencing to track all mutations above 1%
- diverge at 0.1-1% per year
- almost whole genome coverage in 10 patients
- full data set at hiv.tuebingen.mpg.de
Zanini et al, eLife, 2015; antibody data from Richman et al, 2003
Human Influenza A viruses
slide by Trevor Bedford
Influenza virus
- Surface proteins hemagglutinin (HA) and neuraminidase (NA)
- Influenza A virus
- Common in birds and mammals
- Many different subtypes defined by surface proteins
- H3N2, H1N1, H7N9, H5N1
- Influenza B virus
- infects mainly humans
- two lineages that split 30-40y ago
- B/Victoria vs B/Yamagata
Clonal interference and traveling waves
RN, Annual Reviews, 2013; Desai & Fisher; Brunet & Derrida; Kessler & Levine
Typical tree
Bolthausen-Sznitman Coalescent
RN, Hallatschek, PNAS, 2013; see also Brunet and Derrida, PRE, 2007
Bursts in a tree ↔ high fitness genotypes
Can we read fitness off a tree?
Predicting evolution
Given the branching pattern:
- can we predict fitness?
- pick the closest relative of the future?
RN, Russell, Shraiman, eLife, 2014
Fitness inference from trees
$$P(\mathbf{x}|T) = \frac{1}{Z(T)} p_0(x_0) \prod_{i=0}^{n_{int}} g(x_{i_1}, t_{i_1}| x_i, t_i)g(x_{i_2}, t_{i_2}| x_i, t_i)$$
RN, Russell, Shraiman, eLife, 2014
Validation on simulated data
RN, Russell, Shraiman, eLife, 2014
Prediction of the dominating H3N2 influenza strain
- no influenza specific input
- how can the model be improved? (see model by Luksza & Laessig)
- in what other contexts might this apply?
RN, Russell, Shraiman, eLife, 2014
Real-time tracking of SARS-CoV-2
- hundreds of new sequences every day
- more than 6M sequences right now
- comprehensive analyses require hours to days to complete
→ requires continuous analysis and easy dissemination
→ interpretable and intuitive visualization
Emergence and dominance of variants of concern (VoCs)
VoCs have more mutations than expected...
nextstrain
SARS-CoV-2 variants can become dominant without advantage
Hodcroft et al
High case numbers in Spain and high travel volume spread the variant
Hodcroft et al
Summary
- RNA virus evolution can be observed directly
- Rapidly adapting populations require new population genetic models
- These models can be used to infer fit clades
- Future influenza populations can be anticipated
- Automated real-time analysis can help fight the spread of disease
HIV acknowledgments
- Fabio Zanini
- Jan Albert
- Johanna Brodin
- Christa Lanz
- Göran Bratt
- Lina Thebo
- Vadim Puller
Influenza and Theory acknowledgments
- Boris Shraiman
- Colin Russell
- Trevor Bedford
- Oskar Hallatschek
SARS-CoV-2 acknowledgments
- Emma Hodcroft (now in Bern)
- Moira Zuber (Basel)
- Iñaki Comas and Fernando Gonzalez-Candelas, Valencia
- Martina Reichmuth and Christian Althaus (Bern)
- Tanja Stadler, Sarah Nadeau, Tim Vaughan at ETH
- Alberto Hernando and David Matteo at Kido Dynamics
- Jesse Bloom, Katherine Crawford at Fred Hutch
- David Veesler, Alex Walls, Davide Corti, John Bowen at UW
Acknowledgments
Trevor Bedford and his lab -- terrific collaboration since 2014
especially James Hadfield, Emma Hodcroft, Ivan Aksamentov, Moira Zuber, John Huddleston, and Tom Sibley
Data we analyze are contributed by scientists from all over the world
Data are shared and curated by GISAID