Human seasonal influenza viruses
slide by Trevor Bedford
- Influenza viruses evolve to avoid human immunity
- Vaccines need frequent updates
Beyond tracking: can we predict?
Model of rapidly adapting virus populations
RN, Annual Reviews, 2013; Desai & Fisher; Brunet & Derrida; Kessler & Levine
Typical tree
Bolthausen-Sznitman Coalescent
RN, Hallatschek, PNAS, 2013; see also Brunet and Derrida, PRE, 2007; Desai, Walczak, Fisher, Genetics, 2013
Bursts in a tree ↔ high fitness genotypes
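A minimal sketch of how the Bolthausen-Sznitman coalescent produces such bursts. It assumes the standard normalization in which, with b lineages present, the total merger rate is b - 1 and a merger involves k lineages with probability b / ((b - 1) k (k - 1)). The function names are illustrative and not taken from the cited papers.

```python
import random


def merger_size_probs(b):
    """P(k lineages merge | b lineages present), for k = 2..b."""
    return {k: b / ((b - 1) * k * (k - 1)) for k in range(2, b + 1)}


def simulate_bs_coalescent(n, rng):
    """Return (time, merger_size) events until a single lineage remains."""
    events = []
    b, t = n, 0.0
    while b > 1:
        # Waiting time to the next merger; the total rate is b - 1.
        t += rng.expovariate(b - 1)
        probs = merger_size_probs(b)
        k = rng.choices(list(probs), weights=list(probs.values()))[0]
        events.append((t, k))
        # k lineages are replaced by their common ancestor.
        b -= k - 1
    return events


rng = random.Random(1)
events = simulate_bs_coalescent(50, rng)
largest_merger = max(k for _, k in events)
```

Mergers with k > 2 correspond to bursts: many lineages descending from a single ancestor at once, unlike the Kingman coalescent, which only allows pairwise mergers.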
Predicting evolution
Given the branching pattern:
- can we predict fitness?
- pick the closest relative of the future?
RN, Russell, Shraiman, eLife, 2014
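One way to turn the branching pattern into a fitness proxy is to measure how much tree length lies close to each node, discounting distant branches exponentially with a scale tau. The cited eLife paper describes a local branching index in this spirit. The sketch below is a simplified illustration under that assumption, not the paper's implementation; the tree encoding (two dicts) and the toy tree are hypothetical.

```python
import math


def local_branching_index(parent, length, tau):
    """parent: node -> parent (root maps to None);
    length: node -> length of the branch above the node (root: 0.0).
    Returns node -> tree length near the node, weighted by exp(-distance/tau)."""
    children = {n: [] for n in parent}
    root = None
    for n, p in parent.items():
        if p is None:
            root = n
        else:
            children[p].append(n)

    # Preorder traversal: every parent appears before its children.
    order, stack = [], [root]
    while stack:
        n = stack.pop()
        order.append(n)
        stack.extend(children[n])

    def branch(n):
        decay = math.exp(-length[n] / tau)
        return tau * (1.0 - decay), decay  # weighted branch length, decay factor

    # up[n]: weighted length of branch n plus its subtree, seen from the parent.
    up = {}
    for n in reversed(order):
        own, decay = branch(n)
        up[n] = own + decay * sum(up[c] for c in children[n])

    # down[n]: weighted length of everything outside n's subtree, seen from n.
    down = {root: 0.0}
    for n in order:
        if n == root:
            continue
        p = parent[n]
        own, decay = branch(n)
        outside = down[p] + sum(up[c] for c in children[p]) - up[n]
        down[n] = own + decay * outside

    return {n: down[n] + sum(up[c] for c in children[n]) for n in order}


# Toy tree: clade A has a burst of four tips, clade B only two.
parent = {"R": None, "A": "R", "B": "R",
          "a1": "A", "a2": "A", "a3": "A", "a4": "A",
          "b1": "B", "b2": "B"}
length = {"R": 0.0, "A": 1.0, "B": 1.0,
          "a1": 0.5, "a2": 0.5, "a3": 0.5, "a4": 0.5,
          "b1": 0.5, "b2": 0.5}
lbi = local_branching_index(parent, length, tau=1.0)
```

In this toy tree `lbi["A"]` exceeds `lbi["B"]`, so ranking nodes by the index would favor the rapidly branching clade.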
Fitness inference from trees
$$P(\mathbf{x}|T) = \frac{1}{Z(T)} p_0(x_0) \prod_{i=0}^{n_{int}} g(x_{i_1}, t_{i_1}| x_i, t_i)g(x_{i_2}, t_{i_2}| x_i, t_i)$$
RN, Russell, Shraiman, eLife, 2014
Prediction of the dominating H3N2 influenza strain
- no influenza specific input
- how can the model be improved? (see model by Luksza & Laessig)
- in what other contexts might this apply?
RN, Russell, Shraiman, eLife, 2014
NextStrain architecture
Using treetime to rapidly compute timetrees
TreeTime: maximum likelihood phylodynamic analysis
Phylogenetic trees record history:
- transmission
- divergence times
- population dynamics
- ancestral geographic distribution/migrations
Typical approach: Bayesian parameter estimation
- flexible
- probabilistic → confidence intervals etc.
- but: computationally expensive
TreeTime by Pavel Sagulenko
- probabilistic treatment of divergence times
- dates trees with thousands of sequences in a few minutes
- linear time complexity
- fixed tree topology
- github.com/neherlab/treetime
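The molecular-clock signal behind dated trees can be illustrated with a root-to-tip regression: if divergence accumulates at a roughly constant rate, the distance of each tip from the root grows linearly with its sampling date. The sketch below is a much cruder illustration than the probabilistic treatment listed above and does not use the TreeTime API; the function name and the sample data are made up for the example.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of distance = rate * (date - root_date).

    dates: sampling dates in decimal years
    distances: root-to-tip distances in substitutions per site
    Returns (rate, root_date)."""
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    var = sum((t - mean_t) ** 2 for t in dates)
    rate = cov / var                      # substitutions per site per year
    intercept = mean_d - rate * mean_t
    root_date = -intercept / rate         # date at which the fitted distance is zero
    return rate, root_date


# Hypothetical, noise-free data: rate 0.003 per site per year, root in 2000.
dates = [2005.0, 2008.0, 2011.0, 2014.0]
distances = [0.003 * (t - 2000.0) for t in dates]
rate, root_date = root_to_tip_regression(dates, distances)
```

With real data the points scatter around the line, and the regression gives only a point estimate without confidence intervals, which is one motivation for a probabilistic treatment of divergence times.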
West African Ebola virus outbreak
Molecular clock phylogenies of ~2000 A/H3N2 HA sequences -- a few minutes
What about bacteria?
- vertical and horizontal transmission
- genome rearrangements
- much larger genomes
- variation of divergence along the genome
- NGS genomes tend to be fragmented
- annotations of variable quality
- pan-genome identification pipeline
- phylogenetic analysis of each orthologous cluster
- detect associations with phenotypes
- fast: analyze hundreds of genomes in a few hours
- github.com/neherlab/pan-genome-analysis
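Once genes are grouped into orthologous clusters, a common first summary is how many strains carry each cluster. The sketch below classifies clusters as core, accessory, or singleton from a presence table. It is an illustration only, not code from the pan-genome-analysis pipeline; the function name, the `core_fraction` threshold, and the toy data are assumptions.

```python
def summarize_pan_genome(clusters, strains, core_fraction=1.0):
    """clusters: cluster name -> set of strains carrying a gene in that cluster.
    Returns a dict with sorted lists of core, accessory, and singleton clusters."""
    strain_set = set(strains)
    n = len(strain_set)
    summary = {"core": [], "accessory": [], "singleton": []}
    for name in sorted(clusters):
        present = len(set(clusters[name]) & strain_set)
        if present >= core_fraction * n:
            summary["core"].append(name)       # found in (nearly) all strains
        elif present == 1:
            summary["singleton"].append(name)  # found in exactly one strain
        else:
            summary["accessory"].append(name)  # found in some strains only
    return summary


# Hypothetical toy data set with four strains.
strains = ["s1", "s2", "s3", "s4"]
clusters = {
    "geneA": {"s1", "s2", "s3", "s4"},
    "geneB": {"s1", "s2"},
    "geneC": {"s3"},
    "geneD": {"s1", "s2", "s3", "s4"},
}
summary = summarize_pan_genome(clusters, strains)
```

Lowering `core_fraction` (for example to 0.95) tolerates genes missed in fragmented assemblies or inconsistent annotations, at the cost of a looser definition of the core genome.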
S. pneumoniae data set by Croucher et al.
Pan-genome statistics and filters
Species trees and gene trees
Links between species trees and gene trees
Summary
- Data sets are growing rapidly
→ tools for interpretation and exploration are crucial
- Breadth and depth
→ provide an overview, integrate, and go deep
- Actionable outputs require (near)real-time analysis
→ fast analysis pipelines are essential
- We are just scratching the surface...
Interested in HIV NGS: come find me!
Influenza and Theory acknowledgments
- Boris Shraiman
- Colin Russell
- Trevor Bedford
- Oskar Hallatschek
- All the NICs and WHO CCs that provide influenza sequence data
nextstrain.org
- Trevor Bedford
- Colin Megill
- Sidney Bell
- James Hadfield
- All the scientists who share virus sequence data