Richard Neher

Biozentrum, University of Basel

slides at neherlab.org/202012_Ringvorlesung.html

tobacco mosaic virus
(Thomas Splettstoesser, wikipedia)

bacteriophage (adenosine, wikipedia)

influenza virus (wikipedia)

human immunodeficiency virus (wikipedia)

- rely on host to replicate
- little more than genome + capsid
- genomes typically 5-200k bases (+exceptions)
- most abundant organisms on earth $\sim 10^{31}$

- thousands of bacterial isolates
- thousands of single cells
- populations of viruses, bacteria or flies
- diverse ecosystems

GGATAATTCTATTAACCATGAAGACTATCATTGCTTT...

GGATAATTCTATTAACCATGAAGACTATTATTGCTTT...

GGATAATTCTATTAACCATGAAGACTATCATTGCTTT...

... hundreds of thousands of sequences...

- Chimp → human transmission around 1900 gave rise to HIV-1 group M
- ~100 million infected people since
- subtypes differ at 10-20% of their genome
- HIV-1 evolves ~0.1% per year
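As a rough consistency check on the numbers above, the divergence between subtypes can be converted into a time to their common ancestor. This is only a back-of-the-envelope sketch: it assumes a constant rate of 0.1% per year and that differences accumulate along both lineages, so the pairwise distance grows at twice that rate. It ignores saturation and rate variation across the genome. The function name `years_to_common_ancestor` is hypothetical.

```python
def years_to_common_ancestor(pairwise_divergence, rate_per_year):
    """Time back to the common ancestor of two lineages.

    Assumes a strict molecular clock: each lineage accumulates changes at
    `rate_per_year`, so the distance between them grows at twice that rate.
    """
    return pairwise_divergence / (2.0 * rate_per_year)


rate = 0.001  # ~0.1% per year, from the bullet above
for divergence in (0.10, 0.20):  # subtypes differ at 10-20% of sites
    years = years_to_common_ancestor(divergence, rate)
    print(f"{divergence:.0%} divergence -> about {years:.0f} years")
```

Under these assumptions the subtypes would share an ancestor several decades to about a century ago, which is broadly compatible with a transmission around 1900.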

- $10^8$ cells are infected every day
- the virus repeatedly escapes immune recognition
- integrates into T-cells as latent provirus

- Viruses carrying a beneficial mutation have more offspring: on average $1+s$ instead of $1$
- $s$ is called selection coefficient
- Fraction $x$ of viruses carrying the mutation changes as $$x(t+1) = \frac{(1+s)x(t)}{(1+s)x(t) + (1-x(t))}$$
- In continuous time → logistic differential equation: $$\frac{dx}{dt} = sx(1-x) \Rightarrow x(t) = \frac{e^{s(t-t_0)}}{1+ e^{s(t-t_0)}}$$
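The two descriptions above can be compared numerically. The sketch below iterates the discrete update for $x(t+1)$ and evaluates the logistic solution with $t_0$ chosen so that both start at the same frequency. The values of `s` and `x0` are illustrative assumptions, not numbers from the talk.

```python
import math


def discrete_step(x, s):
    """One generation of selection: carriers leave 1+s offspring, others 1."""
    return (1 + s) * x / ((1 + s) * x + (1 - x))


def logistic(t, s, t0):
    """Continuous-time solution x(t) = e^{s(t-t0)} / (1 + e^{s(t-t0)})."""
    return 1.0 / (1.0 + math.exp(-s * (t - t0)))


def simulate(x0, s, generations):
    """Iterate the discrete update, returning the whole trajectory."""
    xs = [x0]
    for _ in range(generations):
        xs.append(discrete_step(xs[-1], s))
    return xs


s = 0.05    # assumed selection coefficient (illustrative)
x0 = 0.01   # assumed initial frequency (illustrative)
t0 = math.log((1 - x0) / x0) / s  # chosen so that logistic(0) == x0

traj = simulate(x0, s, 200)
for t in (0, 50, 100, 150, 200):
    print(t, round(traj[t], 4), round(logistic(t, s, t0), 4))
```

In the discrete model the odds $x/(1-x)$ grow by a factor $1+s$ per generation, while in the continuous model they grow by $e^{s}$, so the two curves agree closely only when $s$ is small.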

- diverge at 0.1-1% per year
- almost whole genome coverage in 10 patients
- full data set at hiv.tuebingen.mpg.de

- mutation away from preferred state with rate $\mu$
- selection against non-preferred state with strength $s$
- variant frequency dynamics: $\frac{d x}{dt} = \mu -s x $
- equilibrium frequency: $\bar{x} = \mu/s $
- fitness cost: $s = \mu/\bar{x}$
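The mutation-selection balance above can be illustrated with a short sketch. It uses the closed-form solution of $\frac{dx}{dt} = \mu - s x$, namely $x(t) = \mu/s + (x_0 - \mu/s)e^{-st}$, and then inverts the equilibrium to recover $s$. The parameter values and the function names `frequency` and `fitness_cost` are assumptions made for illustration only.

```python
import math


def frequency(t, mu, s, x0=0.0):
    """Solution of dx/dt = mu - s*x with x(0) = x0."""
    x_eq = mu / s
    return x_eq + (x0 - x_eq) * math.exp(-s * t)


def fitness_cost(mu, x_bar):
    """Invert the equilibrium x_bar = mu/s to estimate s."""
    return mu / x_bar


mu = 1e-5  # assumed mutation rate per site per unit time (illustrative)
s = 1e-2   # assumed selection strength in the same time units (illustrative)

for t in (0, 100, 300, 1000):
    print(t, f"{frequency(t, mu, s):.2e}")

x_bar = frequency(1e6, mu, s)  # long times: essentially at equilibrium
print("equilibrium frequency:", f"{x_bar:.2e}")
print("inferred fitness cost:", f"{fitness_cost(mu, x_bar):.2e}")
```

The frequency relaxes to $\mu/s$ on a time scale of $1/s$, so strongly selected sites equilibrate quickly while weakly selected sites may not have reached equilibrium within the observation period.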

- Surface proteins hemagglutinin (HA) and neuraminidase (NA)
- Influenza A virus
- Common in birds and mammals
- Many different subtypes defined by surface proteins
- H3N2, H1N1, H7N9, H5N1
- Influenza B virus
- infects mainly humans
- two lineages that split 30-40y ago
- B/Victoria vs B/Yamagata

- can we predict fitness?
- pick the closest relative of the future?

$$P(\mathbf{x}|T) = \frac{1}{Z(T)} p_0(x_0) \prod_{i=0}^{n_{int}} g(x_{i_1}, t_{i_1}| x_i, t_i)g(x_{i_2}, t_{i_2}| x_i, t_i)$$

RN, Russell, Shraiman, eLife, 2014
- no influenza specific input
- how can the model be improved? (see model by Luksza & Laessig)
- in what other contexts might this apply?

- hundreds of new sequences every day
- more than 200k sequences right now
- comprehensive analyses require hours to days to complete

- RNA virus evolution can be observed directly
- Rapidly adapting populations require new population genetic models
- These models can be used to infer fit clades
- Future influenza population can be anticipated
- Automated real-time analysis can help fight the spread of disease

- Fabio Zanini
- Jan Albert
- Johanna Brodin
- Christa Lanz
- Göran Bratt
- Lina Thebo
- Vadim Puller

- Boris Shraiman
- Colin Russell
- Trevor Bedford
- Oskar Hallatschek

- Trevor Bedford
- Colin Megill
- Pavel Sagulenko
- Sidney Bell
- James Hadfield
- Wei Ding
- Emma Hodcroft
- Sanda Dejanic