Real-time tracking and prediction of RNA virus evolution

Richard Neher
Biozentrum, University of Basel

slides at

Human seasonal influenza viruses

slide by Trevor Bedford

  • Influenza viruses evolve to avoid human immunity
  • Vaccines need frequent updates

joint work with Trevor Bedford & his lab

code at

Beyond tracking: can we predict?

Model of rapidly adapting virus populations

RN, Annual Reviews, 2013; Desai & Fisher; Brunet & Derrida; Kessler & Levine

Typical tree

Bolthausen-Sznitman Coalescent

RN, Hallatschek, PNAS, 2013; see also Brunet and Derrida, PRE, 2007; Desai, Walczak, Fisher, Genetics, 2013
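
A minimal sketch of how a Bolthausen-Sznitman genealogy differs from a Kingman one: several lineages can merge at once. It tracks only the number of surviving lineages, not the full tree. With b lineages present, mergers of exactly k lineages occur at total rate b / (k(k-1)), so the overall merger rate is b - 1. The function names, the sample size of 50, and the random seed are illustrative choices, not taken from the talk.

```python
import random

def merger_size_weights(b):
    """Total rate of mergers involving exactly k of b lineages, for k = 2..b.

    For the Bolthausen-Sznitman coalescent this rate is b / (k * (k - 1));
    summed over k it equals b - 1.
    """
    return [b / (k * (k - 1)) for k in range(2, b + 1)]

def simulate_block_counts(n, rng):
    """Return a list of (time, number_of_lineages) pairs, starting from n lineages."""
    t, b = 0.0, n
    history = [(t, b)]
    while b > 1:
        weights = merger_size_weights(b)
        t += rng.expovariate(sum(weights))            # waiting time to the next merger
        k = rng.choices(range(2, b + 1), weights)[0]  # how many lineages merge
        b -= k - 1                                    # k lineages collapse into one
        history.append((t, b))
    return history

rng = random.Random(1)  # fixed seed, only for reproducibility of the sketch
history = simulate_block_counts(50, rng)
merger_sizes = [prev[1] - cur[1] + 1 for prev, cur in zip(history, history[1:])]
```

Mergers with k > 2 in `merger_sizes` correspond to the bursts in the tree: many lineages tracing back to one unusually fit ancestor.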

Bursts in a tree ↔ high fitness genotypes

Predicting evolution

Given the branching pattern:

  • can we predict fitness?
  • pick the closest relative of the future?
RN, Russell, Shraiman, eLife, 2014

Fitness inference from trees

$$P(\mathbf{x}|T) = \frac{1}{Z(T)} p_0(x_0) \prod_{i=0}^{n_{int}} g(x_{i_1}, t_{i_1}| x_i, t_i)g(x_{i_2}, t_{i_2}| x_i, t_i)$$
RN, Russell, Shraiman, eLife, 2014
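
The full inference above needs the propagator g, which the slide does not spell out. As a lighter illustration, here is a sketch of the local branching index (LBI), the tree-shape heuristic proposed in the same eLife 2014 paper: for each node, sum the tree length in its neighbourhood, discounted exponentially with distance on a scale tau. The dictionary-based tree representation, the node names, and the toy branch lengths are assumptions made for this example.

```python
import math

def local_branching_index(children, branch_length, root, tau):
    """LBI of every node: tree length around the node, weighted by exp(-distance / tau).

    children:      dict mapping a node to the list of its child nodes
    branch_length: dict mapping a non-root node to the length of the branch above it
    """
    # Pre-order traversal: every parent appears before its children.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    # Upward pass: discounted tree length in the subtree below each branch.
    up = {}
    for node in reversed(order):
        if node == root:
            continue
        decay = math.exp(-branch_length[node] / tau)
        below = sum(up[c] for c in children.get(node, []))
        up[node] = tau * (1 - decay) + decay * below

    # Downward pass: discounted tree length seen through the parent.
    down = {root: 0.0}
    for node in order:
        kids = children.get(node, [])
        total_up = sum(up[c] for c in kids)
        for c in kids:
            decay = math.exp(-branch_length[c] / tau)
            down[c] = tau * (1 - decay) + decay * (down[node] + total_up - up[c])

    return {n: down[n] + sum(up[c] for c in children.get(n, [])) for n in order}

# Toy tree: one clade that bursts into six lineages, one that splits into two.
children = {
    "root": ["burst", "slow"],
    "burst": ["a", "b", "c", "d", "e", "f"],
    "slow": ["g", "h"],
}
branch_length = {name: 0.5 for kids in children.values() for name in kids}
lbi = local_branching_index(children, branch_length, "root", tau=0.5)
```

In this toy tree `lbi["burst"]` exceeds `lbi["slow"]`, matching the intuition that rapid branching signals high fitness.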

Prediction of the dominating H3N2 influenza strain

  • no influenza specific input
  • how can the model be improved? (see model by Luksza & Laessig)
  • in what other contexts might this apply?
RN, Russell, Shraiman, eLife, 2014

NextStrain architecture

Using treetime to rapidly compute timetrees

TreeTime: maximum likelihood phylodynamic analysis

Phylogenetic trees record history:
  • transmission
  • divergence times
  • population dynamics
  • ancestral geographic distribution/migrations
Typical approach: Bayesian parameter estimation
  • flexible
  • probabilistic → confidence intervals, etc.
  • but: computationally expensive
TreeTime by Pavel Sagulenko
  • probabilistic treatment of divergence times
  • dates trees with thousands of sequences in a few minutes
  • linear time complexity
  • fixed tree topology
West African Ebola virus outbreak

TreeTime: nuts and bolts

Attach sequences and dates
Reconstruct ancestral sequences
Propagate temporal constraints via convolutions
Integrate up-stream and down-stream constraints
Fit phylodynamic model → iterate
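
The convolution step can be illustrated on a discrete time grid. This is a sketch under stated assumptions, not TreeTime's implementation: a Poisson molecular clock with a hypothetical rate of 4 mutations per year, two leaves with known sampling dates, and only the leaf-to-parent direction. Each leaf's date constraint is convolved with the distribution of its branch duration, and the parent's date distribution is the normalised product of the two resulting messages.

```python
import math

T0, DT, N = 2010.0, 0.01, 600  # time grid in years: 2010.00 to 2015.99

def branch_kernel(n_mut, rate):
    """Unnormalised likelihood of a branch lasting j * DT years given n_mut mutations.

    Assumes a Poisson clock with `rate` mutations per year (hypothetical value).
    """
    return [(rate * j * DT) ** n_mut * math.exp(-rate * j * DT) for j in range(N)]

def leaf_constraint(date):
    """Sampling date of a leaf as a point mass on the grid."""
    dist = [0.0] * N
    dist[round((date - T0) / DT)] = 1.0
    return dist

def message_to_parent(child_dist, kernel):
    """Convolution: parent time = child time - branch duration."""
    return [sum(child_dist[i + j] * kernel[j] for j in range(N - i)) for i in range(N)]

def combine(messages):
    """Multiply the messages from all children and normalise."""
    prod = [math.prod(vals) for vals in zip(*messages)]
    z = sum(prod)
    return [p / z for p in prod]

# Leaf A: sampled 2014.0, 2 mutations from the parent; leaf B: sampled 2015.0, 4 mutations.
msg_a = message_to_parent(leaf_constraint(2014.0), branch_kernel(2, 4.0))
msg_b = message_to_parent(leaf_constraint(2015.0), branch_kernel(4, 4.0))
parent = combine([msg_a, msg_b])
mode = T0 + DT * max(range(N), key=parent.__getitem__)
```

The parent's distribution is zero after the earlier sampling date, and its mode is a compromise between the dates each leaf would suggest on its own. A full analysis would also pass constraints from the root back down and repeat the procedure.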

Molecular clock phylogenies of ~2000 A/H3N2 HA sequences -- a few minutes

What about bacteria?

  • vertical and horizontal transmission
  • genome rearrangements
  • much larger genomes
  • variation of divergence along the genome
  • NGS genomes tend to be fragmented
  • annotations of variable quality
panX by Wei Ding
  • pan-genome identification pipeline
  • phylogenetic analysis of each orthologous cluster
  • detect associations with phenotypes
  • fast: analyze hundreds of genomes in a few hours
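
Once genes are grouped into orthologous clusters, a basic pan-genome statistic is the split into core genes (present in every genome) and accessory genes. A minimal sketch follows; the cluster names, strain names, and the `core_fraction` threshold are hypothetical and do not reflect panX's actual data structures.

```python
def classify_clusters(presence, n_genomes, core_fraction=1.0):
    """Split gene clusters into core and accessory by how many genomes carry them.

    presence: dict mapping a cluster name to the set of genomes that contain it
    """
    core, accessory = [], []
    for cluster, genomes in presence.items():
        if len(genomes) >= core_fraction * n_genomes:
            core.append(cluster)
        else:
            accessory.append(cluster)
    return sorted(core), sorted(accessory)

# Hypothetical presence/absence data for three strains.
presence = {
    "geneA": {"strain1", "strain2", "strain3"},
    "geneB": {"strain1", "strain3"},
    "geneC": {"strain2"},
}
core, accessory = classify_clusters(presence, n_genomes=3)
```

Lowering `core_fraction` below 1 gives a "soft core" that tolerates genes missed in fragmented assemblies.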

panX @

S. pneumoniae data set by Croucher et al.

Pan-genome statistics and filters

Species trees and gene trees

Links between species trees and gene trees


  • Data sets are growing rapidly
    → tools for interpretation and exploration are crucial
  • Breadth and depth
    → provide an overview, integrate, and go deep
  • Actionable outputs require (near)real-time analysis
    → fast analysis pipelines are essential
  • We are just scratching the surface...

Interested in HIV NGS: come find me!

Influenza and Theory acknowledgments

  • Boris Shraiman
  • Colin Russell
  • Trevor Bedford
  • Oskar Hallatschek

  • All the NICs and WHO CCs that provide influenza sequence data

  • Trevor Bedford
  • Colin Megill
  • Sidney Bell
  • James Hadfield

  • All the scientists who share virus sequence data

TreeTime & panX

TreeTime: Pavel Sagulenko
webserver at
manuscript on bioRxiv
panX: Wei Ding
live site at
manuscript on bioRxiv