Virus evolution and population genetics


Richard Neher
Biozentrum, University of Basel


slides at neherlab.org/202004_computational_biology.html

Viruses

tobacco mosaic virus (Thomas Splettstoesser, wikipedia)

bacteriophage (adenosine, wikipedia)

influenza virus (wikipedia)

human immunodeficiency virus (wikipedia)
  • rely on host to replicate
  • little more than genome + capsid
  • genomes typically 5-200k bases (+exceptions)
  • most abundant organisms on Earth: $\sim 10^{31}$ particles

Some viruses evolve a million times faster than animals

Animal haemoglobin

HIV protein

Development of sequencing technologies

We can now sequence...
  • thousands of bacterial isolates
  • thousands of single cells
  • populations of viruses, bacteria or flies
  • diverse ecosystems

Evolution of HIV


  • Chimp → human transmission around 1900 gave rise to HIV-1 group M
  • ~100 million infected people since
  • subtypes differ at 10-20% of their genome
  • HIV-1 evolves ~0.1% per year
image: Sharp and Hahn, CSH Persp. Med.
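
A rough consistency check of the numbers on this slide, assuming a constant rate and ignoring repeated substitutions at the same site (both are simplifications introduced here, not statements from the slide): two lineages that split a time $T$ ago and each evolve at rate $\mu \approx 10^{-3}$ per site per year differ at a fraction $d \approx 2\mu T$ of sites, so

$$T \approx \frac{d}{2\mu} = \frac{0.1 \text{ to } 0.2}{2 \times 10^{-3}\,/\mathrm{year}} \approx 50 \text{ to } 100\ \mathrm{years}.$$

A subtype divergence of 10-20% is therefore compatible with diversification that began after a transmission around 1900.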

HIV infection

  • $10^8$ cells are infected every day
  • the virus repeatedly escapes immune recognition
  • integrates into T-cells as latent provirus
image: wikipedia

Population genetics & evolutionary dynamics

evolutionary processes ↔ trees ↔ genetic diversity

HIV-1 evolution within one individual



silhouette: clipartfest.com; Zanini et al., 2015. Collaboration with Jan Albert and his group

Mutation rates and diversity at neutral sites

Zanini et al, Virus Evolution, 2017

Time-scaled phylogenies

Tree building optimization with temporal constraints

  • Time stamps single out a root
  • Root can be found by optimizing root-to-tip regression
  • BEAST: Markov-Chain Monte Carlo tree sampler
  • If topology is correct, temporal constraints can be accounted for in linear time
  • Multiple tools: treedater, LSD, TreeTime
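
The root-to-tip regression mentioned above can be sketched as follows. This is a minimal illustration, not the implementation used by any of the listed tools: the function name, the toy dates, and the two candidate root placements are hypothetical. For a given root, tip distances are regressed against sampling dates; the slope estimates the rate and the x-intercept estimates the date of the root. Rooting then amounts to repeating the fit for candidate roots and keeping the most clock-like one.

```python
import numpy as np

def root_to_tip_regression(dates, distances):
    """Least-squares fit of distance = rate * (date - t_root).

    Returns the estimated rate, the estimated root date and R^2.
    """
    dates = np.asarray(dates, dtype=float)
    distances = np.asarray(distances, dtype=float)
    mean_date = dates.mean()
    # center the dates to keep the fit numerically well conditioned
    rate, offset = np.polyfit(dates - mean_date, distances, 1)
    t_root = mean_date - offset / rate
    residuals = distances - (rate * (dates - mean_date) + offset)
    r2 = 1.0 - residuals.var() / distances.var()
    return rate, t_root, r2

# hypothetical toy data: five tips, true root in 2000, rate 0.001 per site per year
dates = np.array([2005.0, 2008.0, 2011.0, 2014.0, 2017.0])
candidates = {
    # root-to-tip distances for two hypothetical root placements
    "root_A": 0.001 * (dates - 2000.0),
    "root_B": 0.001 * (dates - 2000.0) + np.array([0.004, -0.003, 0.002, -0.004, 0.003]),
}

fits = {name: root_to_tip_regression(dates, d) for name, d in candidates.items()}
# keep the candidate root with the most clock-like signal (highest R^2)
best_root = max(fits, key=lambda name: fits[name][2])
rate, t_root, r2 = fits[best_root]
```

In a real analysis the candidate distances come from re-rooting the tree on each branch, which shifts the distances of the tips on either side of the root in opposite directions.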

Time-scaled phylogenies

Attach sequences and dates
Propagate temporal constraints via convolutions
Integrate up-stream and down-stream constraints
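
A minimal sketch of the leaf-to-root half of this procedure on a discrete time grid. Everything named here (the grid, `leaf_distribution`, `branch_duration_distribution`, `message_to_parent`, the gamma-shaped duration weights) is an assumption made for illustration and is not taken from TreeTime or any other tool. The date of a parent equals the date of its child minus the branch duration, so the child's constraint on the parent is a convolution; constraints from several children are combined by multiplication.

```python
import numpy as np

dt = 0.1                              # grid spacing in years
t = 2010.0 + dt * np.arange(121)      # candidate node dates, 2010.0 to 2022.0
tau = dt * np.arange(101)             # candidate branch durations, 0 to 10 years

def leaf_distribution(sampling_date):
    """A sampled tip has a known date: all mass on the nearest grid point."""
    p = np.zeros_like(t)
    p[np.argmin(np.abs(t - sampling_date))] = 1.0
    return p

def branch_duration_distribution(typical_duration):
    """Hypothetical gamma-shaped weights for the time elapsed along a branch."""
    w = tau * np.exp(-2.0 * tau / typical_duration)
    return w / w.sum()

def message_to_parent(child, duration):
    """Distribution of the parent date implied by one child.

    parent date = child date - duration, so the child distribution is
    convolved with the reversed duration distribution.
    """
    msg = np.convolve(child, duration[::-1], mode="full")
    t_parent = t[0] + dt * (np.arange(msg.size) - (duration.size - 1))
    return t_parent, msg

# two tips sampled in 2020 and 2018 that share a parent node
t_parent, m1 = message_to_parent(leaf_distribution(2020.0), branch_duration_distribution(3.0))
_, m2 = message_to_parent(leaf_distribution(2018.0), branch_duration_distribution(2.0))

# combine the constraints from both children and normalize
posterior = m1 * m2
posterior /= posterior.sum()
```

The combined distribution has no mass after 2018, because the parent must predate both tips. The root-to-leaf pass, which sends constraints from parents back to children, works the same way and is omitted here.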

Time-scaled phylogenies

  • Calibration points can be longitudinal samples, ancient DNA or fossils
  • Rates can vary between proteins and organisms from 0.01/year to $<10^{-8}$/year
  • Some sites change often, some rarely → saturation
  • The apparent rate changes over time
  • Divergence times are often underestimated
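
The last three points can be illustrated with a small calculation. This is a sketch under assumptions that are not from the slides: sites fall into two hypothetical classes (fast and slow) with made-up rates and proportions, and each class follows the Jukes-Cantor model, in which two lineages separated by $d$ expected substitutions per site differ at a fraction $\frac{3}{4}\left(1 - e^{-4d/3}\right)$ of sites.

```python
import numpy as np

def observed_divergence(t, rates, weights):
    """Expected fraction of differing sites between two lineages that split t years ago."""
    d = 2.0 * np.outer(t, rates)                 # expected substitutions per site, both lineages
    p = 0.75 * (1.0 - np.exp(-4.0 * d / 3.0))    # Jukes-Cantor: observed differences saturate at 3/4
    return p @ weights                           # average over site classes

# hypothetical site classes: 10% fast sites, 90% slow sites (rates per site per year)
rates = np.array([1e-2, 1e-5])
weights = np.array([0.1, 0.9])

t = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])   # true divergence times in years
p = observed_divergence(t, rates, weights)

# rate one would infer from the observed differences at each time scale
apparent_rate = p / (2.0 * t)

# dating old splits with the rate measured on the shortest time scale
naive_time = p / (2.0 * apparent_rate[0])
```

Here `apparent_rate` decreases as the time scale grows, because the fast sites saturate first, and `naive_time` falls increasingly short of the true divergence time `t`.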

treetime.ch

SARS-CoV-2 talk

Exercise sheet