### Human seasonal influenza viruses

Slide by Trevor Bedford
- Influenza viruses evolve to avoid human immunity
- Vaccines need frequent updates

## Beyond tracking: can we predict?

### Model of rapidly adapting virus populations

RN, Annual Reviews, 2013; Desai & Fisher; Brunet & Derrida; Kessler & Levine

#### Typical tree

#### Bolthausen-Sznitman Coalescent

RN, Hallatschek, PNAS, 2013; see also Brunet and Derrida, PRE, 2007; Desai, Walczak, Fisher, Genetics, 2013
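
A minimal simulation sketch of the Bolthausen-Sznitman coalescent, added to illustrate how it differs from a tree with only pairwise mergers. It relies on the standard result that, with `b` lineages present, mergers of exactly `k` lineages occur at total rate `b / (k (k - 1))`. Function names, the sample size, and the random seed are illustrative choices, and time is in coalescent units.

```python
import random


def merger_rates(b):
    """Total rate of mergers involving exactly k of b lineages, for k = 2..b."""
    return {k: b / (k * (k - 1)) for k in range(2, b + 1)}


def simulate_bs_coalescent(n, rng):
    """Simulate merger events backwards in time, starting from n lineages.

    Returns a list of (time, merger_size) tuples, ending when one lineage is left.
    """
    events = []
    b, t = n, 0.0
    while b > 1:
        rates = merger_rates(b)
        total = sum(rates.values())  # sums to b - 1
        t += rng.expovariate(total)  # waiting time to the next merger
        sizes = list(rates)
        k = rng.choices(sizes, weights=[rates[s] for s in sizes])[0]
        events.append((t, k))
        b -= k - 1  # k lineages collapse into one ancestor
    return events


rng = random.Random(1)  # arbitrary seed, for reproducibility only
events = simulate_bs_coalescent(50, rng)
multiple_mergers = [k for _, k in events if k > 2]
```

Mergers with `k > 2` are the bursts in the tree: many lineages tracing back to a single ancestor at once.
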
### Bursts in a tree ↔ high fitness genotypes

### Predicting evolution

### Given the branching pattern:

- can we predict fitness?
- pick the closest relative of the future?

RN, Russell, Shraiman, eLife, 2014
### Fitness inference from trees

$$P(\mathbf{x}|T) = \frac{1}{Z(T)} p_0(x_0) \prod_{i=0}^{n_{int}} g(x_{i_1}, t_{i_1}| x_i, t_i)g(x_{i_2}, t_{i_2}| x_i, t_i)$$

RN, Russell, Shraiman, eLife, 2014
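
Because $P(\mathbf{x}|T)$ factorizes over parent-child pairs, marginal fitness distributions can be obtained by standard sum-product message passing on the tree. A sketch for the root marginal, where the message $\phi_i$ is notation introduced here and leaves are assumed to carry no extra factor:

$$\phi_i(x_i) = \prod_{c \in \{i_1, i_2\}} \int dx_c \, g(x_c, t_c | x_i, t_i) \, \phi_c(x_c), \qquad \phi_{\mathrm{leaf}}(x) = 1$$

$$P(x_0|T) = \frac{1}{Z(T)} p_0(x_0) \, \phi_0(x_0), \qquad Z(T) = \int dx_0 \, p_0(x_0) \, \phi_0(x_0)$$

Each node is visited once on the way up the tree, so the cost grows linearly with the number of nodes.
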
### Prediction of the dominating H3N2 influenza strain

- no influenza-specific input
- how can the model be improved? (see the model by Luksza & Laessig)
- in what other contexts might this apply?

RN, Russell, Shraiman, eLife, 2014

## NextStrain architecture

#### Using treetime to rapidly compute timetrees

### TreeTime: maximum likelihood phylodynamic analysis

##### Phylogenetic trees record history:

- transmission
- divergence times
- population dynamics
- ancestral geographic distribution/migrations

##### Typical approach: Bayesian parameter estimation

- flexible
- probabilistic → confidence intervals etc.
- but: computationally expensive

##### TreeTime by Pavel Sagulenko

- probabilistic treatment of divergence times
- dates trees with thousands of sequences in a few minutes
- linear time complexity
- fixed tree topology
- github.com/neherlab/treetime
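
To make the idea of dating a tree concrete, here is a minimal sketch of root-to-tip regression, a common first step in molecular-clock analyses. It is not TreeTime's implementation, which treats divergence times probabilistically. The sampling dates and distances below are made up for illustration, and the function name is hypothetical.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of distance = rate * (date - root_date).

    dates: sampling dates of the tips (decimal years)
    distances: root-to-tip distances (substitutions per site)
    Returns (rate, root_date).
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    var = sum((t - mean_t) ** 2 for t in dates)
    rate = cov / var                    # substitutions per site per year
    intercept = mean_d - rate * mean_t
    root_date = -intercept / rate       # date at which the fitted distance is zero
    return rate, root_date


# Made-up example data: five tips sampled one year apart.
dates = [2010.0, 2011.0, 2012.0, 2013.0, 2014.0]
distances = [0.010, 0.014, 0.018, 0.022, 0.026]
rate, root_date = root_to_tip_regression(dates, distances)
```

With real data the points scatter around the line, which is one reason a probabilistic treatment of node dates is useful.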

##### West African Ebola virus outbreak

#### Molecular clock phylogenies of ~2000 A/H3N2 HA sequences: a few minutes

### What about bacteria?

- vertical and horizontal transmission
- genome rearrangements
- much larger genomes
- variation of divergence along the genome
- NGS genomes tend to be fragmented
- annotations of variable quality

- pan-genome identification pipeline
- phylogenetic analysis of each orthologous cluster
- detect associations with phenotypes
- fast: analyze hundreds of genomes in a few hours
- github.com/neherlab/pan-genome-analysis
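
One basic pan-genome statistic is the split into core genes (present in all strains) and accessory genes (present in only some). The sketch below illustrates that bookkeeping on a made-up presence table. It is not taken from the pan-genome-analysis pipeline, and the function name, cluster ids, and `core_fraction` threshold are assumptions.

```python
from collections import Counter


def classify_gene_clusters(presence, core_fraction=1.0):
    """Split orthologous clusters into core and accessory sets.

    presence: dict mapping strain name -> set of cluster ids found in that strain
    core_fraction: minimum fraction of strains a core cluster must occur in
    """
    n_strains = len(presence)
    # Count in how many strains each cluster occurs.
    counts = Counter(c for clusters in presence.values() for c in clusters)
    core = {c for c, k in counts.items() if k >= core_fraction * n_strains}
    accessory = set(counts) - core
    return core, accessory


# Made-up example with three strains and five clusters.
presence = {
    "strain_A": {"cluster_1", "cluster_2", "cluster_3"},
    "strain_B": {"cluster_1", "cluster_2", "cluster_4"},
    "strain_C": {"cluster_1", "cluster_2", "cluster_3", "cluster_5"},
}
core, accessory = classify_gene_clusters(presence)
```

Lowering `core_fraction` slightly below 1 is one way to tolerate genes missed in fragmented assemblies.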

#### S. pneumoniae data set by Croucher et al.

### Pan-genome statistics and filters

### Species trees and gene trees

### Links between species trees and gene trees

## Summary

- Data sets are growing rapidly
  → tools for interpretation and exploration are crucial
- Breadth and depth
  → provide an overview, integrate, and go deep
- Actionable outputs require (near) real-time analysis
  → fast analysis pipelines are essential
- We are just scratching the surface...

#### Interested in HIV NGS: come find me!

### Influenza and Theory acknowledgments

- Boris Shraiman
- Colin Russell
- Trevor Bedford
- Oskar Hallatschek

- All the NICs and WHO CCs that provide influenza sequence data

### nextstrain.org

- Trevor Bedford
- Colin Megill
- Sidney Bell
- James Hadfield

- All the scientists who share virus sequence data